The 'a' tag in your html does not have any text directly, but it contains a 'h3' tag that has text. This means that text is None for that tag, so .find_all() fails to select it. As a general rule, do not use the text parameter if a tag contains any other html elements besides text content.
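To see the failure in isolation, here is a minimal sketch. The markup is made up for illustration (it is not the html from the question), and it assumes a bs4 version that accepts string, the newer name for the text argument. The filter is compared against the tag's .string, which is None when a tag has more than one child.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first link wraps other tags, the second holds plain text.
sample_html = (
    '<a href="/nested"><h3>Title</h3><span>more</span></a>'
    '<a href="/plain">Plain</a>'
)
sample_soup = BeautifulSoup(sample_html, "html.parser")

nested, plain = sample_soup.find_all("a")
print(nested.string)  # None, because the tag has two children
print(plain.string)   # Plain

# The string filter only keeps tags whose .string is not None.
matched = [a["href"] for a in sample_soup.find_all("a", href=True, string=True)]
print(matched)
```

The nested link is dropped even though it visibly contains text, which is the behaviour described above.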
You can resolve this issue if you use only the tag's name (and the href keyword argument) to select elements. Then add a condition in the loop to check if they contain text.
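from bs4 import BeautifulSoup

# html is the markup from the question
soup = BeautifulSoup(html, 'html.parser')
links_with_text = []
for a in soup.find_all('a', href=True):
    # a.text joins the text of all descendants, so nested tags count too
    if a.text:
        links_with_text.append(a['href'])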
Or you could use a list comprehension, if you prefer one-liners.
links_with_text = [a['href'] for a in soup.find_all('a', href=True) if a.text]
Or you could pass a lambda to .find_all().
tags = soup.find_all(lambda tag: tag.name == 'a' and tag.get('href') and tag.text)
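One caveat with the a.text check: a link that contains only whitespace still counts as having text. If that matters for your data, a possible variation is to test a.get_text(strip=True) instead. The markup below is invented for illustration only.

```python
from bs4 import BeautifulSoup

# Hypothetical markup covering the edge cases.
edge_html = """
<a href="/nested"><h3>Title</h3><span>more</span></a>
<a href="/blank">   </a>
<a href="/empty"></a>
<a name="no-href">Anchor only</a>
"""
edge_soup = BeautifulSoup(edge_html, "html.parser")

# a.text is truthy for whitespace-only content.
loose = [a["href"] for a in edge_soup.find_all("a", href=True) if a.text]

# get_text(strip=True) drops surrounding whitespace before the check.
strict = [
    a["href"]
    for a in edge_soup.find_all("a", href=True)
    if a.get_text(strip=True)
]

print(loose)
print(strict)
```

Here loose keeps the whitespace-only link while strict does not; the anchor without href is skipped by both.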
If you want to collect all links whether they have text or not, just select all 'a' tags that have a 'href' attribute. Anchor tags usually have links but that's not a requirement, so I think it's best to use the href argument.
Using .find_all():
links = [a['href'] for a in soup.find_all('a', href=True)]
Using .select() with CSS selectors:
links = [a['href'] for a in soup.select('a[href]')]
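As a quick sanity check, both approaches should give the same list. This sketch uses made-up markup, and .select() assumes a bs4 install with CSS selector support (the soupsieve package in recent versions).

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two real links and one anchor without href.
demo_html = '<a href="/a">A</a><a name="top"></a><a href="/b"><h3>B</h3></a>'
demo_soup = BeautifulSoup(demo_html, "html.parser")

via_find_all = [a["href"] for a in demo_soup.find_all("a", href=True)]
via_select = [a["href"] for a in demo_soup.select("a[href]")]

print(via_find_all)
print(via_select)
```

Both skip the anchor that has no href, so a['href'] never raises a KeyError.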