regex - Python regular expression for Beautiful Soup

posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)

I am using Beautiful Soup to pull out specific div tags, and it seems I can't use simple string matching.

The page has some tags in the form of

<div class="comment form new"...>

which I want to ignore, and also some tags in the form of

<div class="comment comment-xxxx...">

where the x's represent an integer of arbitrary length, and the ellipsis represents an arbitrary number of other values separated by whitespace (which I'm not concerned about). I can't figure out the correct regular expression, especially since I've never used Python's re module.

Using

soup.find_all(class_="comment")

finds all tags starting with the word comment. I have tried using

soup.find_all(class_=re.compile(r'(comment)( )(comment)'))
soup.find_all(class_=re.compile(r'comment comment.*'))

and lots of other variations, but I think I'm missing something obvious here about how regex expressions or match() work. Can anyone help me out?

1 Reply

深蓝 · Answer 1 · 2021-10-23T18:42:21+0000

I think I've got it:

>>> [div['class'] for div in soup.find_all('div')]
[['comment', 'form', 'new'], ['comment', 'comment-xxxx...']]

Notice that, unlike the equivalent in BS3, it's not this:

['comment form new', 'comment comment-xxxx...']

And that's why your regexps won't match.
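A minimal sketch of that difference, using only Python's re module rather than Beautiful Soup itself. It assumes, as described above, that each class token is tested on its own; the value comment-12345 is a made-up stand-in for the elided comment-xxxx... class.

```python
import re

# Class values as shown above: one list of tokens per div.
# "comment-12345" is a hypothetical stand-in for "comment-xxxx...".
token_lists = [
    ["comment", "form", "new"],
    ["comment", "comment-12345"],
]

# One of the patterns from the question.
pattern = re.compile(r"comment comment.*")

# Tested token by token, the pattern never sees the space it requires.
per_token_hits = [
    tokens for tokens in token_lists
    if any(pattern.search(token) for token in tokens)
]

# Tested against a BS3-style joined string, the same pattern does match.
joined_hits = [
    " ".join(tokens) for tokens in token_lists
    if pattern.search(" ".join(tokens))
]

print(per_token_hits)  # []
print(joined_hits)     # ['comment comment-12345']
```

No single token contains a space, so a pattern that requires one cannot succeed under per-token matching.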

But you can match, e.g., this:

>>> soup.find_all('div', class_=re.compile('comment-'))
[<div class="comment comment-xxxx..."></div>]

Note that BS does the equivalent of re.search, not re.match, so you don't need 'comment-.*'. Of course, if you want to match 'comment-12345' but not 'comment-of-another-kind', you'd want, e.g., r'comment-\d+'.
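To see how the search/match distinction and the digit pattern play out, here is a small stand-alone sketch using only the re module. The token list is invented for illustration (new-comment-7 in particular is hypothetical), and it models the per-token matching described above rather than calling Beautiful Soup.

```python
import re

# Hypothetical class tokens, chosen to exercise each case.
tokens = ["comment", "comment-12345", "comment-of-another-kind", "new-comment-7"]

loose = re.compile(r"comment-")      # any token containing "comment-"
strict = re.compile(r"comment-\d+")  # "comment-" followed by at least one digit

# search() looks anywhere in the token, so no trailing ".*" is needed.
loose_search = [t for t in tokens if loose.search(t)]
strict_search = [t for t in tokens if strict.search(t)]

# match() only succeeds at the start of the token.
strict_match = [t for t in tokens if strict.match(t)]

# fullmatch() requires the whole token to fit the pattern.
strict_full = [t for t in tokens if strict.fullmatch(t)]

print(loose_search)   # ['comment-12345', 'comment-of-another-kind', 'new-comment-7']
print(strict_search)  # ['comment-12345', 'new-comment-7']
print(strict_match)   # ['comment-12345']
print(strict_full)    # ['comment-12345']
```

Because search-style matching accepts a hit anywhere in the token, a pattern anchored with ^ and $, such as r'^comment-\d+$', is one way to exclude tokens like the hypothetical new-comment-7.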
