python - Get all HTML tags with Beautiful Soup

Question



posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)


I am trying to get a list of all HTML tags from Beautiful Soup.

I see find_all(), but I have to know the name of the tag before I search.

If there is text like

html = """<div>something</div>
<div>something else</div>
<div class='magical'>hi there</div>
<p>ok</p>"""

How would I get a list like

list_of_tags = ["<div>", "<div>", "<div class='magical'>", "<p>"]

I know how to do this with regex, but I am trying to learn BS4.


1 Reply

深蓝 · Answer 1 · 2021-10-23T17:46:40+0000

You don't have to pass any arguments to find_all(). When it is called with no arguments, BeautifulSoup returns every tag in the tree, recursively. Sample:

>>> from bs4 import BeautifulSoup
>>>
>>> html = """<div>something</div>
... <div>something else</div>
... <div class='magical'>hi there</div>
... <p>ok</p>"""
>>> soup = BeautifulSoup(html, "html.parser")
>>> [tag.name for tag in soup.find_all()]
['div', 'div', 'div', 'p']
>>> [str(tag) for tag in soup.find_all()]
['<div>something</div>', '<div>something else</div>', '<div class="magical">hi there</div>', '<p>ok</p>']
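The session above returns whole elements, while the question asks only for the opening tags. A minimal sketch, assuming Python 3 with bs4 installed: a hypothetical helper, opening_tag(), rebuilds each start tag from tag.name and tag.attrs. Quoting is normalized to double quotes, so class='magical' comes out as class="magical". The result is therefore not a byte-for-byte copy of the source markup.

```python
from html import escape

from bs4 import BeautifulSoup

html = """<div>something</div>
<div>something else</div>
<div class='magical'>hi there</div>
<p>ok</p>"""


def opening_tag(tag):
    """Hypothetical helper: rebuild only the start tag, e.g. <div class="magical">."""
    parts = [tag.name]
    for key, value in tag.attrs.items():
        # Multi-valued attributes such as class come back from bs4 as lists.
        if isinstance(value, list):
            value = " ".join(value)
        # Escape quotes and ampersands so the rebuilt attribute stays valid HTML.
        parts.append(f'{key}="{escape(value, quote=True)}"')
    return "<" + " ".join(parts) + ">"


soup = BeautifulSoup(html, "html.parser")
# find_all() with no arguments yields every Tag in document order.
list_of_tags = [opening_tag(tag) for tag in soup.find_all()]
print(list_of_tags)  # ['<div>', '<div>', '<div class="magical">', '<p>']
```

soup.find_all(True) is documented to match every tag as well, so it can be used in place of the argument-less call if you prefer to be explicit.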
