I first want to say that I am a first-time poster, so I apologize in advance if any part of my question, or the way it is presented, "sucks." With that said, I've been trying to scrape a table from barchart.com that spans multiple pages, using Jupyter and BeautifulSoup. I have been able to return the page as a whole, but I haven't had much luck returning the specific pages I need. I have included some images; the first three show the elements I am currently choosing between:
The 'div' element that highlights the entire table
Another 'div' element, inside the first 'div', that also contains the entire table I need
The 'table' element, which I would use, except that it doesn't include the leftmost column with the tickers/stock symbols
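If the ticker column really does live in a separate element from the rest of each row, one option is to read both and pair the rows up by position. This is a minimal sketch on a static HTML string; the layout and the ids `symbols` and `data` are hypothetical, not barchart.com's actual markup, and it assumes both tables list their rows in the same order.

```python
from bs4 import BeautifulSoup

# Hypothetical layout (NOT barchart.com's real markup): a narrow table holding
# only the ticker column, next to a main table holding the remaining columns.
html = """
<div id="wrapper">
  <table id="symbols"><tbody>
    <tr><td>AAPL</td></tr>
    <tr><td>MSFT</td></tr>
  </tbody></table>
  <table id="data"><tbody>
    <tr><td>150.00</td><td>Call</td></tr>
    <tr><td>300.00</td><td>Put</td></tr>
  </tbody></table>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

symbol_rows = soup.select("#symbols tbody tr")
data_rows = soup.select("#data tbody tr")

# Pair the two tables row by row (assumes the same row order in both)
# and put the ticker in front of the remaining cells.
rows = [
    [sym.get_text(strip=True)] + [td.get_text(strip=True) for td in data.find_all("td")]
    for sym, data in zip(symbol_rows, data_rows)
]
for row in rows:
    print(row)
```

Pairing by position is fragile if either table can have extra rows (headers, spacers), so checking that both lists have the same length is worth doing on real data.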
No matter what I put in my code, I always get "[]" back, and I haven't been able to figure out how to specify the multiple parts of each 'div' or 'table' in my search, if that makes sense.
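The "[]" itself only means that no element matched every condition passed to findAll. For reference, here is a minimal sketch of the ways BeautifulSoup can match an element on several attributes or classes at once. The HTML snippet, its class names, and the `data-role` attribute are all made up for illustration.

```python
from bs4 import BeautifulSoup

# Made-up markup: an outer div with two classes and a data-* attribute,
# an inner div with two classes, and a table nested inside.
html = """
<div class="wrapper outer" data-role="table">
  <div class="inner scrollable">
    <table class="main">
      <tbody>
        <tr><td>AAPL</td><td>150.00</td></tr>
      </tbody>
    </table>
  </div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# 1) Several attributes at once: every key/value pair in attrs must match.
#    A single class name matches if it is one of the element's classes.
outer = soup.find_all("div", attrs={"data-role": "table", "class": "outer"})

# 2) Several classes at once: chain them in a CSS selector.
inner = soup.select("div.inner.scrollable")

# 3) A nested path: the cells of the table inside the inner div.
cells = [td.get_text() for td in soup.select("div.inner table tbody td")]

print(len(outer), len(inner), cells)
```

These selectors only help once the elements are actually present in the HTML being parsed; if they are not, every variant returns an empty list.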
Here is my code:
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen, Request

stonks_url = Request('https://www.barchart.com/options/unusual-activity/stocks', headers={'User-Agent': 'Mozilla/5.0'})
stonks_data = urlopen(stonks_url)
stonks_html = stonks_data.read()
stonks_data.close()

page_soup = soup(stonks_html, 'html.parser')
uoa_table = page_soup.findAll('tbody', {'data-ng-repeat': 'rows in content'})
print(uoa_table)
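A quick way to narrow down why this returns "[]" is to check whether the HTML that urlopen receives contains any table rows at all, before searching for one specific tbody. This is a minimal sketch; `table_in_static_html` is a hypothetical helper, and the two sample strings are hand-written stand-ins for an unrendered and a rendered page, not real barchart.com responses.

```python
from bs4 import BeautifulSoup


def table_in_static_html(html) -> bool:
    """Hypothetical helper: True if any <tbody> in the raw HTML (str or bytes) already has a <tr>."""
    soup = BeautifulSoup(html, "html.parser")
    return any(tbody.find("tr") is not None for tbody in soup.find_all("tbody"))


# Hand-written stand-ins: a page whose table is filled in later by JavaScript
# typically ships only an empty container, while a rendered page has rows.
skeleton = '<div id="table-container"></div><script src="app.js"></script>'
rendered = "<table><tbody><tr><td>AAPL</td></tr></tbody></table>"

print(table_in_static_html(skeleton))  # False
print(table_in_static_html(rendered))  # True
```

Running it on stonks_html from the code above would show whether the rows were in the response at all. A False result is consistent with the table being built by JavaScript in the browser, which no choice of 'div' or 'table' selector can fix.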
Thanks in advance for any advice or guidance!
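This page builds the table with JavaScript (the data-ng-repeat attribute in your selector is an AngularJS directive), so the HTML that urlopen returns does not contain the table rows. You need to use Selenium to load the page in a real browser, take the browser's page source, and pass that to BeautifulSoup to process the table: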
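from bs4 import BeautifulSoup
from selenium import webdriver

# Open the page in a real Chrome browser so its JavaScript runs
driver = webdriver.Chrome()
driver.get('https://www.barchart.com/options/unusual-activity/stocks')

# page_source is the HTML as the browser currently has it, including content added by JavaScript
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()  # close the browser once the HTML has been captured

# get text
text = soup.get_text()
print(text)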
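Printing soup.get_text() dumps the text of the whole page. To get the table itself, one option is to wait explicitly for the rows to render and then collect the cell text row by row. This is a minimal sketch: `fetch_rendered_html` and `table_rows` are hypothetical helpers, the default selector "table tbody tr" and the 10-second timeout are assumptions rather than anything specific to barchart.com, and the Selenium part needs Chrome installed.

```python
from bs4 import BeautifulSoup


def table_rows(html):
    """Return the text of each <td>/<th> cell, grouped by <tr>, for every row in html."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        [cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])]
        for tr in soup.find_all("tr")
    ]


def fetch_rendered_html(url, css_selector="table tbody tr", timeout=10):
    """Load url in Chrome, wait until css_selector matches something, return the HTML."""
    # Imported inside the function so table_rows() works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Explicit wait: raises TimeoutException if nothing matches within `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return driver.page_source
    finally:
        driver.quit()  # close the browser even if the wait times out


# Offline check of the parsing step on a hand-written snippet.
sample = "<table><tr><th>Symbol</th><th>Price</th></tr><tr><td>AAPL</td><td>150.00</td></tr></table>"
print(table_rows(sample))

# Live use (needs Chrome; not run here):
# rows = table_rows(fetch_rendered_html("https://www.barchart.com/options/unusual-activity/stocks"))
```

The explicit wait matters because, without it, page_source may be read before the table has been filled in. Note that table_rows collects rows from every table on the page, so on the real site you would likely narrow the soup to the table's container first.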