I would take advantage of pandas/NumPy indexing. Since your synonym mapping is many-to-one, you can re-index using the Word column.
sd = sd.applymap(str.strip).applymap(str.lower).set_index('Word').Synonyms
print(sd)
Word
drove drive
office downtown
everyday daily
day daily
Name: Synonyms, dtype: object
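For reference, here is a minimal sketch of how such a lookup Series could be built on a recent pandas version. The `raw` frame is a hypothetical stand-in for your synonym table (its values simply mirror the output above), and the vectorized `.str` accessor is used in place of `applymap`, which newer pandas releases deprecate.

```python
import pandas as pd

# Hypothetical input table; the messy casing/whitespace is only for illustration.
raw = pd.DataFrame({
    "Word": [" Drove", "office ", "Everyday", "day"],
    "Synonyms": ["drive", "Downtown ", "daily", " daily"],
})

# Strip and lower-case every column, then use Word as the index
# so that Synonyms can be looked up by token.
clean = raw.apply(lambda col: col.str.strip().str.lower())
sd = clean.set_index("Word")["Synonyms"]

print(sd)
```

Because the mapping is many-to-one, each Word appears only once in the index, which is what makes label-based lookup safe here.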
Then, you can easily align a list of tokens to their respective synonyms.
import nltk
words = nltk.word_tokenize(u'i drove to office everyday in my car')
# reindex leaves NaN for tokens that have no synonym entry
sentence = sd.reindex(words).reset_index()
print(sentence)
Word Synonyms
0 i NaN
1 drove drive
2 to NaN
3 office downtown
4 everyday daily
5 in NaN
6 my NaN
7 car NaN
Now, it remains to use the tokens from Synonyms, falling back to Word. This can be achieved with fillna:
sentence = sentence.Synonyms.fillna(sentence.Word)
print(sentence.values)
[u'i' 'drive' u'to' 'downtown' 'daily' u'in' u'my' u'car']
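Putting the steps together, the whole replacement might be wrapped up roughly as follows. This is only a sketch under a few assumptions: `replace_synonyms` is a hypothetical helper name, a plain whitespace split stands in for `nltk.word_tokenize` to keep the example dependency-free, and `Series.map` is used for the lookup instead of the re-indexing shown above (both leave NaN where no synonym exists).

```python
import pandas as pd

# Same lookup Series as above, built directly from a dict for brevity.
sd = pd.Series(
    {"drove": "drive", "office": "downtown", "everyday": "daily", "day": "daily"},
    name="Synonyms",
)

def replace_synonyms(text, synonyms):
    """Return the tokens of `text` with known words swapped for their synonyms."""
    # Assumption: whitespace tokenization instead of nltk.word_tokenize.
    words = pd.Series(text.lower().split())
    # map() looks each token up in the index of `synonyms`; misses become NaN.
    mapped = words.map(synonyms)
    # Fall back to the original token wherever no synonym was found.
    return mapped.fillna(words).tolist()

result = replace_synonyms("i drove to office everyday in my car", sd)
print(" ".join(result))
```

The fallback step is the same `fillna` idea as before; only the lookup differs.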