I have used tweepy to store the text of tweets in a csv file using Python csv.writer(
), but I had to encode the text in utf-8 before storing, otherwise tweepy throws a weird error.
Now, the text data is stored like this:
"b'Lorem Ipsumxc2xa0Assignment '"
I tried to decode this using this code (there is more data in other columns, text is in 3rd column):
with open('data.csv','rt',encoding='utf-8') as f:
reader = csv.reader(f,delimiter=',')
for row in reader:
print(row[3])
But, it doesn't decode the text. I cannot use .decode('utf-8')
as the csv reader reads data as strings i.e. type(row[3])
is 'str'
and I can't seem to convert it into bytes
, the data gets encoded once more!
How can I decode the text data?
Edit: Here's a sample line from the csv file:
67783591545656656999,3415844,1450443669.0,b'Virginia School District Closes After Backlash Over Arabic Assignment: The Augusta County school district inxe2x80xa6 | @abcde',52,18
Note: If the solution is in the encoding process, please note that I cannot afford to download the entire data again.
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…