I want to scrape the table that appears at this link. To do so I decided to use Selenium.
My first attempt was:
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup
import pandas as pd

driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get(url)
html_source = driver.page_source
driver.quit()
soup = BeautifulSoup(html_source, "html5lib")
table = soup.find('table', {'class': 'heavy-table ncpulse-fav-table ncpulse-sortable compressed-table'})
df = pd.read_html(str(table), flavor='html5lib', header=0, thousands='.', decimal=',')
However, this raised the error:
'no tables found'
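For reference, the parsing step itself behaves as expected once the table markup is actually present in the HTML, which points to the page source being the problem rather than the pandas/BeautifulSoup calls. A minimal sketch against a hypothetical inline table (the rows and numbers here are invented for illustration; `flavor='html5lib'` as in the question also works if html5lib is installed):

```python
from io import StringIO

from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical stand-in for the real page source; the data is made up.
html_source = """
<table class="heavy-table ncpulse-fav-table ncpulse-sortable compressed-table">
  <tr><th>Name</th><th>Value</th></tr>
  <tr><td>Foo</td><td>1.234,5</td></tr>
  <tr><td>Bar</td><td>6.789,0</td></tr>
</table>
"""

soup = BeautifulSoup(html_source, "html.parser")
# Searching with the full space-separated string matches the exact
# value of the class attribute.
table = soup.find(
    "table",
    {"class": "heavy-table ncpulse-fav-table ncpulse-sortable compressed-table"},
)

# read_html returns a list of DataFrames; European-style separators
# are handled by thousands='.' and decimal=','.
df = pd.read_html(StringIO(str(table)), header=0,
                  thousands=".", decimal=",")[0]
print(df)  # the Value column parses to the floats 1234.5 and 6789.0
```

Since this succeeds on static markup, the 'no tables found' error suggests the table is simply not in `page_source` at the moment it is captured.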
Then I tried using the expected_conditions class because, as I found on SO, the page source may have been "pulled out even before the child elements have completely rendered".
So I tried something like this:
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.support.ui import WebDriverWait

driver.get(route)
element_present = expected_conditions.presence_of_element_located(
    (By.CLASS_NAME, 'heavy-table ncpulse-fav-table ncpulse-sortable compressed-table'))
WebDriverWait(driver, 20).until(element_present)
html_source = driver.page_source
driver.quit()
However, this time it raised:
selenium.common.exceptions.TimeoutException: Message
My questions are therefore: How can I obtain the desired output? What am I doing wrong in my use of the expected_conditions
class? And what is the issue, or the front-end technology behind the page, that makes this table such a struggle to scrape?
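One observation on the second attempt: Selenium's By.CLASS_NAME locator expects a single class name, so passing the space-separated list of four classes can never match any element, which would produce a TimeoutException on its own, regardless of rendering. Compound classes are normally targeted by chaining them with dots in a CSS selector. The distinction can be checked offline with BeautifulSoup (a sketch on a hypothetical fragment, not the real page):

```python
from bs4 import BeautifulSoup

# Hypothetical fragment carrying the same compound class list as the page.
html = ('<table class="heavy-table ncpulse-fav-table '
        'ncpulse-sortable compressed-table"><tr><td>x</td></tr></table>')
soup = BeautifulSoup(html, "html.parser")

# Dot-chained classes: this is the selector form that By.CSS_SELECTOR
# would take in Selenium.
selector = ("table.heavy-table.ncpulse-fav-table"
            ".ncpulse-sortable.compressed-table")
print(soup.select_one(selector) is not None)  # the compound selector matches
```

In the Selenium attempt, the equivalent wait would be `presence_of_element_located((By.CSS_SELECTOR, selector))`. If the wait still times out after that change, it is worth checking whether the table is rendered inside an iframe, which would additionally require switching frames before locating it.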