I'm trying to scrape the likes and retweets from the results of a Twitter search.
After running the Python below, I get an empty list, []
. I'm not using the Twitter API because it doesn't look at the tweets by hashtag this far back.
The code I'm using is:
from bs4 import BeautifulSoup
import requests
url = 'https://twitter.com/search?q=%23bangkokbombing%20since%3A2015-08-10%20until%3A2015-09-30&src=typd&lang=en'
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data, "lxml")
all_likes = soup.find_all('span', class_='ProfileTweet-actionCountForPresentation')
print(all_likes)
I can successfully save the html to file using this code. It is missing large amounts of information when I search the text, such as the class names I am looking for...
So (part of) the problem is apparently in accurately accessing the source code.
filename = 'newfile2.txt'
with open(filename, 'w') as handle:
handle.writelines(str(data))
This screenshot shows the span that I'm trying to scrape.
I've looked at this question, and others like it, but I'm not quite getting there.
How can I use BeautifulSoup to get deeply nested div values?
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…