[Image: the web page being scraped]
[Image: the wrong output I get]
I am trying to scrape the rows of streamers on each page; each row is a "tr" element. Every row has several columns that I want in my output. I was able to extract almost all of them, but two columns (the two about followers) share the same tag name and class, and they have frustrated me a lot. I tried indexing them to take only the odd or even ones, but as the second picture shows, that did not work: the numbers keep repeating instead of advancing row by row as they should. Is there a way to get the "followers gained" column into the output correctly?
This is my first time asking here, so I am not sure whether this is enough. I am glad to add more information later if needed.
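The repetition described above is what you would expect if odd row numbers are bumped up by one before being used as a position in one page-wide list of follower cells. A minimal sketch with plain lists, assuming each row contributes exactly two follower cells (gained first, then total) and using made-up values:

```python
# Made-up flat list: two follower cells per row, in page order.
cells = ["+10", "1000", "+5", "500", "+7", "700"]
n_rows = len(cells) // 2

# Bumping odd row indices up by one maps rows 0, 1, 2 to cells 0, 2, 2,
# so the second and third rows read the same cell.
bumped = [idx + 1 if idx % 2 == 1 else idx for idx in range(n_rows)]
repeated = [cells[i] for i in bumped]

# Doubling the row index maps rows 0, 1, 2 to cells 0, 2, 4 instead.
gained = [cells[2 * idx] for idx in range(n_rows)]

print(repeated)  # ['+10', '+5', '+5']
print(gained)    # ['+10', '+5', '+7']
```

On a real page, header or ad rows would shift these positions, so arithmetic on a page-wide index is fragile even when the formula is right.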
import requests
from bs4 import BeautifulSoup

for i in range(30):  # Number of pages plus one
    url = "https://twitchtracker.com/channels/viewership?page={}&searchbox=Course+Name&searchbox_zip=ZIP&distance=50&price_range=0&course_type=both&has_events=0".format(i)
    headers = {'User-agent': 'Mozilla/5.0'}
    page = requests.get(url, headers=headers)
    soup = BeautifulSoup(page.content)

    # Every streamer is one <tr> row.
    channels = soup.find_all('tr')
    for idx, channel in enumerate(channels):
        # Attempt to take every other follower cell; this is the part that misbehaves.
        if idx % 2 == 1:
            idx += 1
        # These columns are looked up inside the current row.
        Name = ", ".join([p.get_text(strip=True) for p in channel.find_all('a', attrs={'style': 'color:inherit'})])
        Avg = ", ".join([p.get_text(strip=True) for p in channel.find_all('td', class_ = 'color-viewers')])
        Time = ", ".join([p.get_text(strip=True) for p in channel.find_all('td', class_ = 'color-streamed')])
        All = ", ".join([p.get_text(strip=True) for p in channel.find_all('td', class_ = 'color-viewersMax')])
        HW = ", ".join([p.get_text(strip=True) for p in channel.find_all('td', class_ = 'color-watched')])
        # The follower cells are looked up across the whole page and indexed with idx.
        FG = ", ".join([soup.find_all('td', class_ = 'color-followers hidden-sm')[idx].get_text(strip=True)])
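One possible way around the shared class is to search for the follower cells inside each row rather than across the whole page, and then take them by position within that row. The sketch below is not tested against the live site: `SAMPLE_HTML` and `extract_rows` are hypothetical names, the markup is a made-up stand-in that only reuses the class names from the code above, and it assumes the first follower cell in a row is "followers gained" and the second is the total.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the real table; only the class names come from the question.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Avg</th><th>Followers gained</th><th>Total followers</th></tr>
  <tr>
    <td><a style="color:inherit">alpha</a></td>
    <td class="color-viewers">100</td>
    <td class="color-followers hidden-sm">+10</td>
    <td class="color-followers hidden-sm">1000</td>
  </tr>
  <tr>
    <td><a style="color:inherit">beta</a></td>
    <td class="color-viewers">50</td>
    <td class="color-followers hidden-sm">+5</td>
    <td class="color-followers hidden-sm">500</td>
  </tr>
</table>
"""


def extract_rows(html):
    """Return one dict per data row, reading follower cells from that row only."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for channel in soup.find_all("tr"):
        # Searching within the row keeps positions from drifting across rows.
        follower_cells = channel.find_all("td", class_="color-followers")
        if len(follower_cells) < 2:
            continue  # skip header rows or rows without both follower columns
        name = channel.find("a", attrs={"style": "color:inherit"})
        rows.append({
            "name": name.get_text(strip=True) if name else "",
            # Assumption: first cell is followers gained, second is the total.
            "followers_gained": follower_cells[0].get_text(strip=True),
            "total_followers": follower_cells[1].get_text(strip=True),
        })
    return rows


rows = extract_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

Because each lookup is scoped to `channel`, no index arithmetic over the page is needed, and rows that lack the follower columns are simply skipped.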
question from:
https://stackoverflow.com/questions/65895834/how-to-extract-elements-belong-to-the-same-tag-name-in-web-scraping