python - BeautifulSoup: what's the difference between 'lxml' and 'html.parser' and 'html5lib' parsers?

Question

posted Oct 17, 2021 in Technique by 深蓝 (71.8m points)

When using Beautiful Soup, what is the difference between the 'lxml', 'html.parser', and 'html5lib' parsers?

When would you use one over the other, and what are the benefits of each? When I used them, they seemed interchangeable, but people here have corrected me and said I should be using a different one. I'd like to strengthen my understanding. I've read a couple of posts on here about this, but they barely cover when to use each, if at all.

Example:

```python
from bs4 import BeautifulSoup
soup = BeautifulSoup(response.text, 'lxml')  # response: e.g. a requests.Response
```
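In the snippet above, the second argument to `BeautifulSoup` names the parser. It only works if that parser's library is installed: `lxml` and `html5lib` are separate packages, while `html.parser` ships with Python. If the named parser is missing, Beautiful Soup raises `bs4.FeatureNotFound`. The following is a minimal sketch of falling back through the parsers in a chosen order. The helper name `make_soup` and the preference order are illustrative assumptions, not part of Beautiful Soup:

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Assumed preference order (illustrative): fastest first, stdlib fallback last.
PREFERRED_PARSERS = ("lxml", "html5lib", "html.parser")


def make_soup(markup: str) -> BeautifulSoup:
    """Hypothetical helper: parse with the first installed parser in PREFERRED_PARSERS."""
    for parser in PREFERRED_PARSERS:
        try:
            return BeautifulSoup(markup, parser)
        except FeatureNotFound:
            # The library behind this parser isn't installed; try the next one.
            continue
    # Not reached in practice, because html.parser is part of the standard library.
    raise RuntimeError("no usable HTML parser found")


soup = make_soup("<p>Hello, <b>world</b></p>")
print(soup.p.get_text())  # Hello, world
```

A fallback like this keeps a script running on machines without `lxml`. However, explicitly pinning one parser, as the question does, is the safer choice when you need identical results everywhere. The parsers do not always build the same tree from invalid markup.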

1 Reply

深蓝 · Answer 1 · 2021-10-17T02:51:17+0000

From the summary table of advantages and disadvantages in the Beautiful Soup documentation:

html.parser - BeautifulSoup(markup, "html.parser")
- Advantages: Batteries included, Decent speed, Lenient (as of Python 2.7.3 and 3.2)
- Disadvantages: Not very lenient (before Python 2.7.3 or 3.2.2)
lxml - BeautifulSoup(markup, "lxml")
- Advantages: Very fast, Lenient
- Disadvantages: External C dependency
html5lib - BeautifulSoup(markup, "html5lib")
- Advantages: Extremely lenient, Parses pages the same way a web browser does, Creates valid HTML5
- Disadvantages: Very slow, External Python dependency
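The parsers look interchangeable on well-formed pages. The practical difference appears with invalid markup: the Beautiful Soup documentation's section on differences between parsers notes that each one can turn the same broken document into a different tree. The following minimal sketch parses the documentation's example fragment `<a></p>`, which contains a stray closing `</p>`, with each parser. Any parser that isn't installed is skipped. The helper name `parse_with_each` is hypothetical:

```python
from bs4 import BeautifulSoup, FeatureNotFound

BROKEN = "<a></p>"  # invalid: a closing </p> with no matching opening <p>


def parse_with_each(markup: str) -> dict:
    """Hypothetical helper: map each parser name to its serialized tree, or None if not installed."""
    results = {}
    for parser in ("html.parser", "lxml", "html5lib"):
        try:
            results[parser] = str(BeautifulSoup(markup, parser))
        except FeatureNotFound:
            results[parser] = None  # this parser's library isn't installed
    return results


for name, tree in parse_with_each(BROKEN).items():
    print(f"{name:<12} {tree}")
```

According to the documentation, the three parsers handle the stray tag differently:

- `html.parser` drops it and gives `<a></a>`.
- `lxml` drops it and adds `<html><body>` wrappers.
- `html5lib` turns it into an empty `<p>` inside the `<a>`, as a browser would.

Exact output may vary between library versions. In terms of the table above, this suggests `lxml` when speed matters, `html5lib` when you need browser-like handling of badly broken pages, and `html.parser` when you can't install extra dependencies.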
