Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

web scraping - Multiple classes, unable to return desired page(s)

First, I want to say that I am a first-time poster, so I apologize in advance if any part of my question or the way it is presented is unclear. I've been trying to scrape a table that spans multiple pages on barchart.com, using Jupyter and BeautifulSoup. While I have been able to return the entire page as a whole, I haven't had much luck returning the specific pages I need. I included some images, the first three of which show the elements I am currently choosing from:

the 'div' element that highlights the entire table

another 'div' element within the first 'div' that also has the entire table I need

the 'table' element that I would use, except that it doesn't include the leftmost column containing the tickers/stock symbols

Regardless of what I put in my code, I always get "[]" back, and I haven't been able to figure out how to write the multiple parts of each 'div' or 'table', if that makes sense.

from bs4 import BeautifulSoup as soup
from urllib.request import urlopen, Request

stonks_url = Request('https://www.barchart.com/options/unusual-activity/stocks', headers={'User-Agent': 'Mozilla/5.0'})
stonks_data = urlopen(stonks_url)
stonks_html = stonks_data.read()
stonks_data.close()
page_soup = soup(stonks_html, 'html.parser')

uoa_table = page_soup.findAll('tbody', {'data-ng-repeat': 'rows in content'})
print(uoa_table)

Thanks in advance for any advice or guidance!

Question from: https://stackoverflow.com/questions/65843014/multiple-classes-unable-to-return-desired-pages

Battle the dragon too long and you become the dragon yourself; gaze into the abyss too long and the abyss will gaze back…

1 Reply

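The table on this page is built by JavaScript after the page loads, so a plain HTTP request returns HTML that does not contain it. You need to use Selenium to load the page in a real browser, then take the rendered page source and use it to process the table: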

from bs4 import BeautifulSoup
from selenium import webdriver

# Launch a real browser so the page's JavaScript runs and builds the table
driver = webdriver.Chrome()
driver.get('https://www.barchart.com/options/unusual-activity/stocks')

# Parse the rendered HTML, then close the browser
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()

# Get the visible text of the page
text = soup.get_text()

print(text)
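
Once the rendered source is available, the rows still have to be pulled out of the HTML. Below is a minimal sketch of that step, assuming the data ends up in ordinary tr/td elements. The table_rows helper, the sample HTML and the ticker values are made up for illustration; the real Barchart markup may be structured differently, so the tag names would need to be adjusted after inspecting driver.page_source.

```python
from bs4 import BeautifulSoup


def table_rows(html):
    """Return each non-empty <tr> in the HTML as a list of stripped cell strings."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:  # skip rows that contain no header or data cells
            rows.append(cells)
    return rows


# Hypothetical stand-in for driver.page_source (assumed markup, not Barchart's).
SAMPLE_RENDERED = """
<table>
  <thead><tr><th>Symbol</th><th>Type</th><th>Strike</th></tr></thead>
  <tbody>
    <tr><td>AAA</td><td>Call</td><td>10.00</td></tr>
    <tr><td>BBB</td><td>Put</td><td>25.50</td></tr>
  </tbody>
</table>
"""

# Hypothetical stand-in for what a plain urlopen request might see:
# the container exists, but JavaScript has not filled in any rows yet.
SAMPLE_STATIC = "<div class='table-wrapper'></div>"

if __name__ == "__main__":
    print(table_rows(SAMPLE_STATIC))  # no rows to find in the unrendered HTML
    for row in table_rows(SAMPLE_RENDERED):
        print(row)
```

With Selenium in place, the same helper could be called as table_rows(driver.page_source). The empty result for the unrendered sample mirrors the "[]" from the question: the selector is not necessarily wrong, the rows simply are not in the HTML that urlopen receives.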

