I'm writing a little Python script that parses Word docs and writes to a CSV file. However, some of the docs contain non-ASCII (Unicode) characters that my script can't process correctly.
Fancy quotes show up quite often (u'\u201c'). Is there a quick and easy (and smart) way of replacing those with the neutral ASCII quotes, so I can just write line.encode('ascii') to the CSV file?
I have tried to find the left quote and replace it:
val = line.find(u'\u201c')
if val >= 0: line[val] = '"'
But to no avail:
TypeError: 'unicode' object does not support item assignment
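The error comes from the fact that Python strings are immutable: a character cannot be assigned in place, but methods such as replace() or translate() return a new string. A minimal sketch of the replacement idea, assuming the only offenders are the four common curly quotes (QUOTE_MAP and straighten_quotes are names made up for illustration):

```python
# Hypothetical mapping from curly-quote code points to ASCII equivalents.
QUOTE_MAP = {
    0x201C: u'"',  # left double quotation mark
    0x201D: u'"',  # right double quotation mark
    0x2018: u"'",  # left single quotation mark
    0x2019: u"'",  # right single quotation mark
}


def straighten_quotes(text):
    """Return a copy of text with curly quotes replaced by ASCII quotes."""
    # translate() builds a new string; the original is left untouched.
    return text.translate(QUOTE_MAP)


line = u'\u201cHello\u201d, it\u2019s fine'
clean = straighten_quotes(line)

# Safe only because every non-ASCII character in this sample is in the map.
ascii_bytes = clean.encode('ascii')
```

Any other non-ASCII character (dashes, ellipses, accented letters) would still make encode('ascii') raise UnicodeEncodeError, so the map would need extending, or the encode call would need an errors argument such as 'replace'.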
Is what I've described a good strategy? Or should I just set up the csv to support utf-8 (though I'm not sure if the application that will be reading the CSV wants utf-8)?
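For comparison, the UTF-8 alternative might look roughly like the sketch below. It assumes Python 3, where the csv module works with text and the encoding is set when opening the file (under Python 2 the csv module does not handle unicode objects directly, so each cell would have to be encoded first). The output path and sample row are placeholders.

```python
import csv
import os
import tempfile

# Sample row containing curly quotes; stands in for text parsed from a doc.
rows = [[u'\u201cquoted\u201d', u'plain']]

# Placeholder output location, for illustration only.
path = os.path.join(tempfile.mkdtemp(), 'out.csv')

# newline='' lets the csv module control line endings itself.
with open(path, 'w', encoding='utf-8', newline='') as f:
    csv.writer(f).writerows(rows)

# Read the file back to confirm the characters survive the round trip.
with open(path, encoding='utf-8', newline='') as f:
    read_back = list(csv.reader(f))
```

Whether this is preferable depends on the consuming application being able to read UTF-8, which is the open question here.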
Thank you