I'm writing a small tool to extract a bunch of values from a string (usually a tweet).
The string could consist of words and numbers, along with an amount prefixed by a currency symbol (£, $, € etc.) and a number of hashtags (#foo #bar). I'm running on App Engine and using tweepy to bring in the tweets.
The current code I have to find the values is below:
tagex = re.compile(r'#.*')
curex = re.compile(ur'[£].*')
for x in api.user_timeline(since_id = t.lastimport):
    tags = re.findall(tagex, x.text)
    amount = re.findall(curex, x.text)[0]
    logging.info("Text: " + x.text)
    logging.info("Tags: " + str(tags))
    logging.info("Amount: " + amount)
where x.text is, for example, "Taxi London £6.50 #projectfoo #clientmeeting".
The tagex finds the hashtags fine, but I can't get curex to extract just the amount. Currently I get:
Amount: £6.50 #projectfoo #clientmeeting.
I also need to separate off the currency symbol so as to get the amount as a float, but that should be pretty simple later.
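For illustration, here is a rough sketch of the result being aimed for. It rests on a few assumptions: the names CUREX, TAGEX and parse_tweet are hypothetical and not part of the code above, only £, $ and € are handled, and amounts are assumed to look like 6 or 6.50 with no thousands separators. The idea is that `.*` keeps matching to the end of the line, whereas a digits-only pattern should stop at the end of the number.

```python
# -*- coding: utf-8 -*-
import re

# Hypothetical patterns, for illustration only.
# \u00a3 is the pound sign and \u20ac is the euro sign.
# [0-9]+ stops at the first non-digit, whereas .* runs to the end of the line.
CUREX = re.compile(u'([\u00a3$\u20ac])([0-9]+(?:[.][0-9]+)?)')
# \w+ stops at whitespace, so each hashtag is captured separately.
TAGEX = re.compile(r'#(\w+)')


def parse_tweet(text):
    """Return (currency symbol, amount as float, list of tags).

    Symbol and amount are None when no amount is found.
    """
    match = CUREX.search(text)
    if match:
        symbol = match.group(1)
        amount = float(match.group(2))
    else:
        symbol = None
        amount = None
    tags = TAGEX.findall(text)
    return symbol, amount, tags


text = u"Taxi London \u00a36.50 #projectfoo #clientmeeting"
print(parse_tweet(text))
```

Capturing the symbol and the digits in separate groups would mean the float conversion needs no extra string slicing, and using search rather than findall(...)[0] avoids an IndexError on tweets with no amount.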