Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Extracting tweets through twitter using Tweepy

After successfully appending tweets to my CSV file, I saw that the tweets were shortened and had new text at the point where they were cut off.

For example, the original tweet looks like this:

Career in Risk Management Some of the programmes and qualifications in the field are:

  1. GARP’s Financial Risk Management (FRM) Certification
  2. IRM’s Enterprise Risk Management (ERM) Qualification
  3. MBA/Masters in Risk Management

The tweet in my CSV file has a body like this: Career in Risk Management Some of the programmes and qualifications in the field are: 1. GARP\xe2\x80\x99s Financial Risk Ma\xe2\x80\xa6 (add link here).

Any idea how I can solve this problem?

Sharing my code here:

import csv
import tweepy

auth = tweepy.OAuthHandler('xxxx', 'xxxx')
auth.set_access_token('xxxx', 'xxxx')
api = tweepy.API(auth)
search_words = "jobs"      # enter your words
new_search = search_words + " -filter:retweets"
csvFile = open('jobs.csv', 'a')
csvWriter = csv.writer(csvFile)
for tweet in tweepy.Cursor(api.search, q=new_search, count=100, lang="en", since_id=0).items():
    csvWriter.writerow([tweet.created_at, tweet.text.encode('utf8'), tweet.user.screen_name.encode('utf-8'), tweet.favorite_count, tweet.retweet_count, tweet.truncated, tweet.user.location.encode('utf-8'), tweet.source])
question from:https://stackoverflow.com/questions/65905714/extracting-tweets-through-twitter-using-tweepy


1 Reply


What's happening here is that you're also capturing the special characters. \n is a common one and is simply a line break. My first thought was the .split() function, which does delete the character but also splits the string into a list. Then I found the .replace() function, which gets rid of the line break characters like this:

tweetToCut.replace('\n', '')

That gets rid of the line breaks. You have to do this for every character you want to remove, but the calls can be chained (replace returns a new string, so assign the result), so it would look like:

tweetToCut.replace('\n', '').replace('\xe2', '')
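When there are several characters to handle, the chained calls can be driven from a small table instead. The following is only a sketch: the helper name clean_tweet_text and the choice of replacements are illustrative assumptions, not part of the original answer. It works on a decoded str, where the bytes e2 80 99 and e2 80 a6 appear as the single characters U+2019 (right single quotation mark) and U+2026 (horizontal ellipsis).

```python
# Hypothetical helper; the name and the replacement table are illustrative only.
REPLACEMENTS = {
    "\n": " ",        # line break -> single space
    "\u2019": "'",    # right single quotation mark (UTF-8 bytes e2 80 99)
    "\u2026": "...",  # horizontal ellipsis (UTF-8 bytes e2 80 a6)
}


def clean_tweet_text(text):
    """Apply each replacement in turn, then collapse runs of whitespace."""
    for old, new in REPLACEMENTS.items():
        text = text.replace(old, new)
    return " ".join(text.split())


sample = "GARP\u2019s Financial Risk Ma\u2026\nMBA/Masters"
print(clean_tweet_text(sample))
```

Replacing the line break with a space rather than an empty string keeps the words on either side from running together.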

Keep in mind that the characters you would be removing are part of the tweet's formatting. If you only need the plain text, it's fine to remove them. If you want to keep the formatting, I recommend leaving those characters in place, otherwise you will have to reformat the tweets yourself.
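A different angle from stripping characters, offered as a sketch rather than a definitive fix: in Python 3, calling .encode('utf8') produces a bytes object, and csv.writer writes its repr, which is where escapes such as \xe2\x80\x99 come from. The ellipsis followed by a link is how a truncated tweet is returned. The code below assumes Tweepy 3.x, where passing tweet_mode="extended" to api.search makes each status carry a full_text attribute. The helper names tweet_to_row and write_tweets and the stand-in tweet object are hypothetical, used so the example runs without network access.

```python
import csv
import io
from types import SimpleNamespace


def tweet_to_row(tweet):
    # full_text is assumed to exist only when the search was made with
    # tweet_mode="extended" (Tweepy 3.x); otherwise fall back to .text.
    text = getattr(tweet, "full_text", None) or tweet.text
    # Pass str values straight through; no .encode() call is needed.
    return [tweet.created_at, text, tweet.user.screen_name]


def write_tweets(tweets, fileobj):
    """Write one CSV row per tweet-like object."""
    writer = csv.writer(fileobj)
    for tweet in tweets:
        writer.writerow(tweet_to_row(tweet))


# Stand-in for a Tweepy status object (hypothetical data, for illustration).
fake = SimpleNamespace(
    created_at="2021-01-26",
    full_text="GARP\u2019s Financial Risk Management (FRM) Certification",
    text="GARP\u2019s Financial Risk Ma\u2026",
    user=SimpleNamespace(screen_name="example_user"),
)

buffer = io.StringIO()
write_tweets([fake], buffer)
print(buffer.getvalue())

# For contrast: a bytes value in the row is written as its repr.
bad = io.StringIO()
csv.writer(bad).writerow([fake.text.encode("utf8")])
print(bad.getvalue())
```

With a real file, opening it as open('jobs.csv', 'a', newline='', encoding='utf-8') lets the csv module handle line endings and non-ASCII characters, and the tweets would come from something like tweepy.Cursor(api.search, q=new_search, tweet_mode="extended").items() under the Tweepy 3.x assumption above.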

