The webpage looks something like this:
<h2>section1</h2>
<p>article</p>
<p>article</p>
<p>article</p>
<h2>section2</h2>
<p>article</p>
<p>article</p>
<p>article</p>
How can I find each section together with the articles in it? That is, after finding an h2, collect
its next siblings until the next h2.
If the webpage were structured like this instead (which is normally the case):
<div>
<h2>section1</h2>
<p>article</p>
<p>article</p>
<p>article</p>
</div>
<div>
<h2>section2</h2>
<p>article</p>
<p>article</p>
<p>article</p>
</div>
I could write code like:
for section in soup.findAll('div'):
    ...
    for post in section.findAll('p'):
        ...
But what should I do with the first webpage if I want to get the same result?
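One possible approach, sketched below, is the one described above: for each h2, walk its following siblings and stop at the next h2. This is only a sketch under assumptions: it assumes the bs4 package (BeautifulSoup 4), where the methods are spelled find_all and find_next_siblings (older releases use findAll and findNextSiblings), and the helper name posts_by_section is made up for illustration.

```python
from bs4 import BeautifulSoup

# Sample markup matching the first (flat) page layout from the question.
html = """
<h2>section1</h2>
<p>article</p>
<p>article</p>
<p>article</p>
<h2>section2</h2>
<p>article</p>
<p>article</p>
<p>article</p>
"""

soup = BeautifulSoup(html, "html.parser")


def posts_by_section(soup):
    """Hypothetical helper: pair each h2 with the p tags that follow it."""
    sections = []
    for heading in soup.find_all("h2"):
        posts = []
        # With no arguments, find_next_siblings() yields only tags,
        # so whitespace text nodes between elements are skipped.
        for sibling in heading.find_next_siblings():
            if sibling.name == "h2":
                break  # reached the next section heading
            if sibling.name == "p":
                posts.append(sibling)
        sections.append((heading, posts))
    return sections


for heading, posts in posts_by_section(soup):
    print(heading.get_text(), [p.get_text() for p in posts])
```

Each (heading, posts) pair then plays the role of section in the div-based loop, so the inner loop over posts can stay the same.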