I'm trying to download an Excel table from this website and load it into a pandas DataFrame. Here is what I have so far:
import pandas as pd
from urllib.request import urlopen
from bs4 import BeautifulSoup
url = 'https://tedb.ornl.gov/data/'
html = urlopen(url)
soup = BeautifulSoup(html, 'html.parser')
# Select all 'a' elements with href attributes containing URLs starting with https://
for link in soup.select('a[href^="https://"]'):
    href = link.get('href')
    print(href)
I'd like to grab Table 4.01, whose link, when inspected, is contained within the HTML element:
<a href="https://tedb.ornl.gov/wp-content/uploads/2020/06/Table4_01_06242020.xlsx">xlsx</a>
However, when I run my code, all I get back are the links below:
https://www.ornl.gov
https://tedb.ornl.gov/
https://tedb.ornl.gov/data/
https://tedb.ornl.gov/archive/
https://tedb.ornl.gov/citation/
https://tedb.ornl.gov/contact/
https://tedb.ornl.gov/wp-content/uploads/2020/02/TEDB_Ed_38.pdf
https://tedb.ornl.gov/wp-content/uploads/2020/08/TEDB_38.2_Spreadsheets_08312020.zip
https://tedb.ornl.gov/wp-content/uploads/2020/08/Updates_08312020.pdf
https://www.ornl.gov/ornl/contact-us/Security--Privacy-Notice
https://www.ornl.gov/content/accessibility
https://www.ornl.gov/content/notice-nondiscrimination-and-accessibility-requirements
Does anyone know why the Excel link I'm looking for does not show up in the output?
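One way to narrow this down is to check whether any .xlsx link appears at all in the HTML the server returns. The following is a minimal sketch of that check, not a confirmed diagnosis. It uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra dependencies. The names XlsxLinkParser, find_xlsx_links, and sample_html are hypothetical, and the sample markup is modeled on the element quoted above rather than fetched from the site.

```python
from html.parser import HTMLParser


class XlsxLinkParser(HTMLParser):
    """Collect href values of <a> tags that point at .xlsx files."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchor tags are of interest here.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Keep the link only if it has an href ending in .xlsx.
        if href and href.lower().endswith(".xlsx"):
            self.links.append(href)


def find_xlsx_links(html_text):
    """Return every .xlsx href found in the given HTML string."""
    parser = XlsxLinkParser()
    parser.feed(html_text)
    return parser.links


# Hypothetical sample markup (an assumption, not the real page source).
sample_html = (
    '<a href="https://tedb.ornl.gov/archive/">Archive</a>'
    '<a href="https://tedb.ornl.gov/wp-content/uploads/2020/06/'
    'Table4_01_06242020.xlsx">xlsx</a>'
)
xlsx_links = find_xlsx_links(sample_html)
print(xlsx_links)
```

Calling find_xlsx_links on the decoded result of urlopen(url).read() would show whether the link is present in the raw response. An empty list would suggest, though not prove, that the link is added after the page loads (for example by JavaScript), in which case BeautifulSoup never sees it. With BeautifulSoup, the equivalent filter is soup.select('a[href$=".xlsx"]'). Once the URL is known, pd.read_excel accepts it directly, assuming an Excel engine such as openpyxl is installed.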
question from:
https://stackoverflow.com/questions/66055635/i-need-help-extracting-embedded-xlsx-link-from-a-webpage-using-python-beautiful