web scraping - Multiple classes, unable to return desired page(s)

Question



Posted Oct 7, 2021 in Technique by 深蓝 (71.8m points)


First, I want to say that I am a first-time poster, so I apologize in advance if any part of my question, or the way it is asked or presented, "sucks." With that said, I've been trying to scrape a table from barchart.com using Jupyter and BeautifulSoup. The table spans multiple pages, and while I have been able to return the entire page as a whole, I haven't had much luck returning the specific pages I need. I included some images; the first three show the elements I am currently choosing between:

1. The 'div' element that highlights the entire table.

2. Another 'div' element, inside the first 'div', that also contains the entire table I need.

3. The 'table' element, which I would use, except that it doesn't include the leftmost column with the tickers/stock symbols.

No matter what I put in my code, I always get an empty list ("[]") back, and I haven't been able to figure out how to specify the multiple classes of each 'div' or 'table', if that makes sense.
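On the "multiple classes" part: BeautifulSoup treats `class` as a multi-valued attribute, so a space-separated `class_` string only matches the exact attribute text, in the same order. A minimal sketch, using made-up class names (not barchart.com's real ones), showing a CSS selector via `select()` that requires several classes regardless of order. This only helps once the HTML you parse actually contains the table.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the class names are invented for illustration.
html = """
<div class="bc-table-scrollable bc-table-wrapper">
  <table class="bc-table">
    <tbody>
      <tr><td>AAPL</td><td>150.00</td></tr>
      <tr><td>TSLA</td><td>700.00</td></tr>
    </tbody>
  </table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# CSS selector: a <div> that has BOTH classes, in any order.
wrappers = soup.select("div.bc-table-wrapper.bc-table-scrollable")

# class_ with a single name matches any element whose class list contains it.
tables = soup.find_all("table", class_="bc-table")

# class_ with a space-separated string must equal the attribute value exactly,
# so the same two classes in a different order do not match.
exact = soup.find_all("div", class_="bc-table-scrollable bc-table-wrapper")
reordered = soup.find_all("div", class_="bc-table-wrapper bc-table-scrollable")

print(len(wrappers), len(tables), len(exact), len(reordered))
```

In practice `select()` with chained `.class` names is usually the less fragile choice, since it does not depend on the order the site writes its classes in.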


from bs4 import BeautifulSoup as soup
from urllib.request import urlopen, Request

# Send a browser-like User-Agent; the site may reject urllib's default one
stonks_url = Request('https://www.barchart.com/options/unusual-activity/stocks', headers={'User-Agent': 'Mozilla/5.0'})
stonks_data = urlopen(stonks_url)
stonks_html = stonks_data.read()
stonks_data.close()
page_soup = soup(stonks_html, 'html.parser')

# Look for the <tbody> whose data-ng-repeat attribute is 'rows in content' (this prints [])
uoa_table = page_soup.find_all('tbody', {'data-ng-repeat': 'rows in content'})
print(uoa_table)
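An empty list from `find_all` doesn't by itself say whether the selector is wrong or the element simply isn't in the HTML that `urlopen` received. A minimal sketch for telling the two apart, using two hypothetical HTML strings instead of the real response (the helper names are made up for illustration): the attribute selector matches when the element exists, but it cannot match markup that JavaScript adds only after the page loads. For the real page, a quick check is `b'data-ng-repeat' in stonks_html`.

```python
from bs4 import BeautifulSoup


def find_repeated_bodies(html: str) -> list:
    """Return <tbody> elements whose data-ng-repeat attribute equals 'rows in content'."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find_all("tbody", attrs={"data-ng-repeat": "rows in content"})


def marker_in_raw_html(html: str) -> bool:
    """Plain substring check: does the attribute appear in the HTML text at all?"""
    return "data-ng-repeat" in html


# Hypothetical server response: an empty container, rows to be added later by scripts.
static_html = '<div class="table-container"></div>'

# Hypothetical rendered DOM, as a browser might hold it after the scripts ran.
rendered_html = (
    '<table><tbody data-ng-repeat="rows in content">'
    "<tr><td>AAPL</td></tr></tbody></table>"
)

for label, html in [("static", static_html), ("rendered", rendered_html)]:
    print(label, marker_in_raw_html(html), len(find_repeated_bodies(html)))
```

If the substring check is already `False` on the raw response, changing the selector will not help; the page has to be rendered first.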

Thanks in advance to any advice or guidance!

Question from: https://stackoverflow.com/questions/65843014/multiple-classes-unable-to-return-desired-pages


1 Reply

深蓝 · Answer 1 · 2021-10-06T19:31:28+0000

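This page builds its table with JavaScript, so a plain HTTP request (as with urlopen) only returns the initial HTML, without the table rows. Use Selenium to load the page in a real browser, then take the rendered page source and parse the table from that.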

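from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Launch Chrome so the page's JavaScript runs and builds the table
driver = webdriver.Chrome()
try:
    driver.get('https://www.barchart.com/options/unusual-activity/stocks')
    # Wait up to 20 s for a <tbody> to appear (assumes the rows are rendered in one)
    WebDriverWait(driver, 20).until(
        EC.presence_of_element_located((By.TAG_NAME, 'tbody'))
    )
    # Parse the rendered HTML, not the raw HTTP response
    soup = BeautifulSoup(driver.page_source, 'html.parser')
finally:
    driver.quit()

# get the text of the rendered page
text = soup.get_text()

print(text)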
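The code above prints the text of the whole page. To keep only the table, you can walk the rendered `<tbody>` rows instead. A minimal sketch, assuming the rendered page uses ordinary `<tbody>`/`<tr>`/`<td>` markup (not verified against barchart.com); the helper name `table_rows` and the sample HTML are hypothetical stand-ins for `driver.page_source`.

```python
from bs4 import BeautifulSoup


def table_rows(html: str) -> list[list[str]]:
    """Return each <tr> inside any <tbody> as a list of stripped cell texts."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.select("tbody tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])]
        if cells:  # skip rows with no cells
            rows.append(cells)
    return rows


# Hypothetical rendered markup standing in for driver.page_source.
sample = """
<table>
  <thead><tr><th>Symbol</th><th>Price</th></tr></thead>
  <tbody>
    <tr><td> AAPL </td><td>150.00</td></tr>
    <tr><td>TSLA</td><td>700.00</td></tr>
  </tbody>
</table>
"""

for row in table_rows(sample):
    print(row)
```

If the ticker column is rendered in a separate table from the other columns, as the question's third element suggests, one option is to run `table_rows` on each table and `zip` the results, assuming both list their rows in the same order.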
