I originally worked out a solution based on Yuva Raj's suggestion to use additional parameters in GET search/tweets: the max_id parameter, in conjunction with the id of the last tweet returned in each iteration of a loop that also checks for the occurrence of a TweepError.
However, I discovered there is a far simpler way to solve the problem using a tweepy.Cursor (see the tweepy Cursor tutorial for more on using Cursor).
The following code fetches the most recent 1000 mentions of 'python'.
import tweepy
# assuming twitter_authentication.py contains each of the 4 oauth elements (1 per line)
from twitter_authentication import API_KEY, API_SECRET, ACCESS_TOKEN, ACCESS_TOKEN_SECRET
auth = tweepy.OAuthHandler(API_KEY, API_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
api = tweepy.API(auth)
query = 'python'
max_tweets = 1000
searched_tweets = [status for status in tweepy.Cursor(api.search, q=query).items(max_tweets)]
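To give a feel for what a cursor-style iterator does, here is a minimal sketch of lazy, page-by-page iteration driven by max_id. It is not tweepy's actual implementation: iter_items, make_fake_fetch and fetch_page are hypothetical names, and the fake fetch function stands in for a real search call so the example runs without credentials.

```python
from itertools import islice


def iter_items(fetch_page, page_size=100):
    """Yield items one at a time, fetching older pages only when needed.

    fetch_page(count, max_id) is a hypothetical callable that returns a list
    of dicts with an integer 'id', newest first; max_id=None means
    "start from the newest item".
    """
    max_id = None
    while True:
        page = fetch_page(count=page_size, max_id=max_id)
        if not page:
            return  # no older items left
        for item in page:
            yield item
        # next request: only items strictly older than the last one seen
        max_id = page[-1]['id'] - 1


def make_fake_fetch(total):
    """Build a stand-in for a search endpoint that holds ids total..1."""
    def fetch_page(count, max_id=None):
        newest = total if max_id is None else min(max_id, total)
        oldest = max(newest - count + 1, 1)
        return [{'id': i} for i in range(newest, oldest - 1, -1)]
    return fetch_page


# take only the 250 most recent items; later pages are never requested
recent = list(islice(iter_items(make_fake_fetch(1000)), 250))
```

Because the generator only requests a page when the consumer asks for more items, stopping early (here via islice) also stops the requests.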
Update: in response to Andre Petre's comment about potential memory consumption issues with tweepy.Cursor, I'll include my original solution, replacing the single-statement list comprehension used above to compute searched_tweets with the following:
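searched_tweets = []
last_id = None  # id of the oldest tweet fetched so far
while len(searched_tweets) < max_tweets:
    # the search endpoint returns at most 100 tweets per request
    count = min(100, max_tweets - len(searched_tweets))
    # after the first request, only ask for tweets older than the last one seen
    paging = {} if last_id is None else {'max_id': str(last_id - 1)}
    try:
        new_tweets = api.search(q=query, count=count, **paging)
        if not new_tweets:
            break
        searched_tweets.extend(new_tweets)
        last_id = new_tweets[-1].id
    except tweepy.TweepError as e:
        # depending on TweepError.code, one may want to retry or wait
        # to keep things simple, we will give up on an error
        break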
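The comment in the loop mentions retrying or waiting instead of giving up. A minimal sketch of that idea follows, kept independent of tweepy so it runs on its own. call_with_retries and its parameters are hypothetical names, and which errors are worth retrying (and for how long) is an assumption you would need to adapt.

```python
import time


def call_with_retries(func, retries=3, wait_seconds=1.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call func(); on a listed exception, wait and try again.

    The wait doubles after each failed attempt. Once all attempts are
    used up, the last exception is re-raised.
    """
    for attempt in range(retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == retries:
                raise
            sleep(wait_seconds * (2 ** attempt))


# demo with a function that fails twice before succeeding
calls = []


def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError('temporary failure')
    return 'ok'


waits = []  # record the waits instead of actually sleeping
result = call_with_retries(flaky, retries=3, wait_seconds=15,
                           retry_on=(RuntimeError,), sleep=waits.append)
```

In the loop, one could wrap the api.search call in a lambda and pass retry_on=(tweepy.TweepError,), keeping the break only for the case where the retries are exhausted.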