python - Using BeautifulSoup to search HTML for string

Question

Welcome To Ask or Share your Answers For Others

python - Using BeautifulSoup to search HTML for string

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

python - Using BeautifulSoup to search HTML for string

I am using BeautifulSoup to look for user-entered strings on a specific page. For example, I want to see if the string 'Python' is located on the page: http://python.org

When I used: find_string = soup.body.findAll(text='Python'), find_string returned []

But when I used: find_string = soup.body.findAll(text=re.compile('Python'), limit=1), find_string returned [u'Python Jobs'] as expected

What is the difference between these two statements that makes the second statement work when there are more than one instances of the word to be searched?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-17T00:10:17+0000

The following line is looking for the exact NavigableString 'Python':

>>> soup.body.findAll(text='Python')
[]

Note that the following NavigableString is found:

>>> soup.body.findAll(text='Python Jobs') 
[u'Python Jobs']

Note this behaviour:

>>> import re
>>> soup.body.findAll(text=re.compile('^Python$'))
[]

So your regexp is looking for an occurrence of 'Python' not the exact match to the NavigableString 'Python'.

Categories

python - Using BeautifulSoup to search HTML for string

python - Using BeautifulSoup to search HTML for string

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags