Here's a solution using selenium for browser automation
from selenium import webdriver
import pandas as pd
driver = webdriver.Chrome(chromedriver)
driver.implicitly_wait(30)
driver.get('https://www.wunderground.com/personal-weather-station/dashboard?ID=KMAHADLE7#history/tdata/s20170201/e20170201/mcustom.html')
df=pd.read_html(driver.find_element_by_id("history_table").get_attribute('outerHTML'))[0]
Time Temperature Dew Point Humidity Wind Speed Gust Pressure Precip. Rate. Precip. Accum. UV Solar
0 12:02 AM 25.5 °C 18.7 °C 75 % East 0 kph 0 kph 29.3 hPa 0 mm 0 mm 0 0 w/m2
1 12:07 AM 25.5 °C 19 °C 76 % East 0 kph 0 kph 29.31 hPa 0 mm 0 mm 0 0 w/m2
2 12:12 AM 25.5 °C 19 °C 76 % East 0 kph 0 kph 29.31 hPa 0 mm 0 mm 0 0 w/m2
3 12:17 AM 25.5 °C 18.7 °C 75 % East 0 kph 0 kph 29.3 hPa 0 mm 0 mm 0 0 w/m2
4 12:22 AM 25.5 °C 18.7 °C 75 % East 0 kph 0 kph 29.3 hPa 0 mm 0 mm 0 0 w/m2
Editing with breakdown of exactly what's happening, since the above one-liner is actually not very good self-documenting code:
After setting up the driver, we select the table with its ID value (Thankfully this site actually uses reasonable and descriptive IDs)
tab=driver.find_element_by_id("history_table")
Then, from that element, we get the HTML instead of the web driver element object
tab_html=tab.get_attribute('outerHTML')
We use pandas to parse the html
tab_dfs=pd.read_html(tab_html)
From the docs:
"read_html returns a list of DataFrame objects, even if there is only
a single table contained in the HTML content"
So we index into that list with the only table we have, at index zero
df=tab_dfs[0]
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…