Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
659 views
in Technique[技术] by (71.8m points)

python - pandas read_html ValueError: No tables found

I am trying to scrap the historical weather data from the "https://www.wunderground.com/personal-weather-station/dashboard?ID=KMAHADLE7#history/tdata/s20170201/e20170201/mcustom.html" weather underground page. I have the following code:

import pandas as pd 

page_link = 'https://www.wunderground.com/personal-weather-station/dashboard?ID=KMAHADLE7#history/tdata/s20170201/e20170201/mcustom.html'
df = pd.read_html(page_link)
print(df)

I have the following response:

Traceback (most recent call last):
 File "weather_station_scrapping.py", line 11, in <module>
  result = pd.read_html(page_link)
 File "/anaconda3/lib/python3.6/site-packages/pandas/io/html.py", line 987, in read_html
  displayed_only=displayed_only)
 File "/anaconda3/lib/python3.6/site-packages/pandas/io/html.py", line 815, in _parse raise_with_traceback(retained)
 File "/anaconda3/lib/python3.6/site-packages/pandas/compat/__init__.py", line 403, in raise_with_traceback
  raise exc.with_traceback(traceback)
ValueError: No tables found

Although, this page clearly has a table but it is not being picked by the read_html. I have tried using Selenium so that the page can be loaded before I read it.

from selenium import webdriver
from selenium.webdriver.common.keys import Keys

driver = webdriver.Firefox()
driver.get("https://www.wunderground.com/personal-weather-station/dashboard?ID=KMAHADLE7#history/tdata/s20170201/e20170201/mcustom.html")
elem = driver.find_element_by_id("history_table")

head = elem.find_element_by_tag_name('thead')
body = elem.find_element_by_tag_name('tbody')

list_rows = []

for items in body.find_element_by_tag_name('tr'):
    list_cells = []
    for item in items.find_elements_by_tag_name('td'):
        list_cells.append(item.text)
    list_rows.append(list_cells)
driver.close()

Now, the problem is that it cannot find "tr". I would appreciate any suggestions.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Here's a solution using selenium for browser automation

from selenium import webdriver
import pandas as pd
driver = webdriver.Chrome(chromedriver)
driver.implicitly_wait(30)

driver.get('https://www.wunderground.com/personal-weather-station/dashboard?ID=KMAHADLE7#history/tdata/s20170201/e20170201/mcustom.html')
    df=pd.read_html(driver.find_element_by_id("history_table").get_attribute('outerHTML'))[0]

Time    Temperature Dew Point   Humidity    Wind    Speed   Gust    Pressure  Precip. Rate. Precip. Accum.  UV  Solar
0   12:02 AM    25.5 °C 18.7 °C 75 %    East    0 kph   0 kph   29.3 hPa    0 mm    0 mm    0   0 w/m2
1   12:07 AM    25.5 °C 19 °C   76 %    East    0 kph   0 kph   29.31 hPa   0 mm    0 mm    0   0 w/m2
2   12:12 AM    25.5 °C 19 °C   76 %    East    0 kph   0 kph   29.31 hPa   0 mm    0 mm    0   0 w/m2
3   12:17 AM    25.5 °C 18.7 °C 75 %    East    0 kph   0 kph   29.3 hPa    0 mm    0 mm    0   0 w/m2
4   12:22 AM    25.5 °C 18.7 °C 75 %    East    0 kph   0 kph   29.3 hPa    0 mm    0 mm    0   0 w/m2

Editing with breakdown of exactly what's happening, since the above one-liner is actually not very good self-documenting code:

After setting up the driver, we select the table with its ID value (Thankfully this site actually uses reasonable and descriptive IDs)

tab=driver.find_element_by_id("history_table")

Then, from that element, we get the HTML instead of the web driver element object

tab_html=tab.get_attribute('outerHTML')

We use pandas to parse the html

tab_dfs=pd.read_html(tab_html)

From the docs:

"read_html returns a list of DataFrame objects, even if there is only a single table contained in the HTML content"

So we index into that list with the only table we have, at index zero

df=tab_dfs[0]

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...