I am using Beautiful Soup to pull out specific div tags, and it seems I can't do it with simple string matching.
The page has some tags of the form
<div class="comment form new"...>
which I want to ignore, and also some tags of the form
<div class="comment comment-xxxx...">
where the x's represent an integer of arbitrary length and the ellipsis represents an arbitrary number of other whitespace-separated class values (which I'm not concerned about). I can't figure out the correct regular expression, especially since I've never used Python's re module.
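Since the wanted tags are described entirely by their class attribute, here is a minimal sketch, using only Python's re module, of testing a class string against that description. It assumes the class value is split on whitespace (as HTML does) and that a wanted tag is one carrying both the plain token comment and a token of the form comment-<digits>. The helper name wanted_class is hypothetical.

```python
import re

# One class token such as "comment-1234": the literal "comment-"
# followed by one or more digits.
NUMBERED_COMMENT = re.compile(r"comment-\d+")


def wanted_class(class_value: str) -> bool:
    """Return True for class strings like 'comment comment-1234 ...'.

    Hypothetical helper: splits the attribute on whitespace and requires
    both the plain 'comment' token and a numbered 'comment-<digits>' token.
    """
    tokens = class_value.split()
    has_plain = "comment" in tokens
    # fullmatch anchors the pattern at both ends, so 'comment-12x' is rejected.
    has_numbered = any(NUMBERED_COMMENT.fullmatch(tok) for tok in tokens)
    return has_plain and has_numbered


if __name__ == "__main__":
    for value in ["comment form new", "comment comment-42", "comment comment-7 odd even"]:
        print(repr(value), wanted_class(value))
```

Splitting first means the pattern never has to account for spaces or for the arbitrary extra classes. re.fullmatch (Python 3.4+) must match the whole token, whereas re.match only anchors at the start and re.search can match anywhere.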
Using
soup.find_all(class_="comment")
finds every tag that has comment as one of its classes, so it picks up both kinds of tag. I have tried using
soup.find_all(class_=re.compile(r'(comment)( )(comment)'))
soup.find_all(class_=re.compile(r'comment comment.*'))
and lots of other variations, but I think I'm missing something obvious about how regular expressions or match() work. Can anyone help me out?
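A minimal sketch of selecting only the comment comment-xxxx divs inside Beautiful Soup. It relies on bs4 treating class as a multi-valued attribute (tag["class"] is a list of tokens) and passes find_all a function that is called with each tag. The sample HTML and the function name has_numbered_comment are illustrative assumptions, not taken from the real page.

```python
import re

from bs4 import BeautifulSoup

# One class token such as "comment-1234".
NUMBERED_COMMENT = re.compile(r"comment-\d+")

# Illustrative markup only; the real page's structure is not known here.
HTML = """
<div class="comment form new">form</div>
<div class="comment comment-1234 odd">first</div>
<div class="comment comment-56 even highlighted">second</div>
<div class="other">ignored</div>
"""


def has_numbered_comment(tag) -> bool:
    """True for <div> tags whose classes include 'comment' and 'comment-<digits>'."""
    if tag.name != "div":
        return False
    # bs4 splits the class attribute on whitespace into a list of tokens;
    # tags without a class attribute fall back to an empty list.
    classes = tag.get("class", [])
    return "comment" in classes and any(
        NUMBERED_COMMENT.fullmatch(c) for c in classes
    )


soup = BeautifulSoup(HTML, "html.parser")
matches = soup.find_all(has_numbered_comment)
print([div.get_text() for div in matches])
```

Because bs4 checks a regular expression given as class_ against the individual class values, soup.find_all("div", class_=re.compile(r"^comment-\d+$")) should behave similarly here; the function form just makes the "both tokens" rule explicit and does not depend on the order or spacing of the classes.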