I'm trying to extract publication years from ISI-style data exported from the Thomson Reuters Web of Science. The line for "Publication Year" looks like this (at the very beginning of a line):
PY 2015
For the script I'm writing I have defined the following regex function:
import re

# Read the whole export into one string
with open('savedrecs.txt') as f:
    wosrecords = f.read()

def findyears():
    # Capture the four digits that follow "PY "
    result = re.findall(r'PY (\d\d\d\d)', wosrecords)
    print(result)

findyears()
This, however, gives false positive results because the pattern may appear elsewhere in the data.
So I want to match the pattern only at the beginning of a line. Normally I would use ^
for this purpose, but r'^PY (\d\d\d\d)'
fails at matching my results. On the other hand, using
seems to do what I want, but that might lead to further complications for me.
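A minimal sketch of the behaviour in question, assuming the file is read into a single string as above: by default ^ matches only at the very start of the string, while the re.MULTILINE flag makes it match at the start of every line. The sample text and the find_years helper are made up for illustration and are not taken from a real export.

```python
import re

# Hypothetical sample standing in for the contents of savedrecs.txt
sample = (
    "PT J\n"
    "AU Doe, J\n"
    "TI A STUDY OF COPY 1999 ARCHIVES\n"
    "PY 2015\n"
    "ER\n"
    "\n"
    "PT J\n"
    "PY 2016\n"
    "ER\n"
)

def find_years(text):
    # re.MULTILINE lets ^ match at the start of each line,
    # not only at the start of the whole string
    return re.findall(r'^PY (\d{4})', text, flags=re.MULTILINE)

# Unanchored: also picks up "PY 1999" inside the title line
print(re.findall(r'PY (\d{4})', sample))   # ['1999', '2015', '2016']

# Anchored without the flag: the string starts with "PT J", so nothing matches
print(re.findall(r'^PY (\d{4})', sample))  # []

# Anchored with re.MULTILINE: only the real PY lines match
print(find_years(sample))                  # ['2015', '2016']
```

With the flag set, there is no need to match the preceding newline explicitly, so a PY line that happens to be the first line of the file would still be found.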