Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

Python: getting the body paragraphs of several articles with bs4

I need the body of several articles on a page. Each article is written in a section tag that contains one p tag per paragraph, like this:

<section class="...">
 <div>...</div>
 <figure>...</figure>
 <p id='...' class='...'></p>
 <p id='...' class='...'></p>
 <p id='...' class='...'></p>
</section>

<section class="...">
 <div>...</div>
 <figure>...</figure>
 <p id='...' class='...'></p>
 <p id='...' class='...'></p>
 <p id='...' class='...'></p>
</section>

If I use the code below:

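import requests
from bs4 import BeautifulSoup

r = requests.get('url')
# Parse the response body; the original snippet used `soup` without creating it
soup = BeautifulSoup(r.text, 'html.parser')

all_bodies = soup.find_all('section')
for section in all_bodies:
    print(section)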

This prints the complete content of each section. If I use find_all('p') instead, it returns each p tag as a separate list element, but I want all the p tags of a section together in one list element.



1 Reply


Add an inner loop that finds all <p> tags within each section:

for i in all_bodies:
    for p in i.find_all('p'):
        print(p)
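To keep the paragraphs of each section together (one list element per section, as the question asks) instead of printing them one by one, the inner loop can become a list comprehension that calls find_all('p') on every section. A minimal sketch, assuming hypothetical sample markup with the same shape as in the question; the paragraph text and the html.parser choice are illustrative:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup mirroring the structure in the question.
html = """
<section class="article">
  <div>header</div>
  <figure>image</figure>
  <p>First article, paragraph 1</p>
  <p>First article, paragraph 2</p>
</section>
<section class="article">
  <div>header</div>
  <figure>image</figure>
  <p>Second article, paragraph 1</p>
  <p>Second article, paragraph 2</p>
  <p>Second article, paragraph 3</p>
</section>
"""

soup = BeautifulSoup(html, "html.parser")

# One list element per <section>; each element holds that section's <p> tags.
bodies = [section.find_all("p") for section in soup.find_all("section")]

for n, paragraphs in enumerate(bodies, start=1):
    print(f"Article {n}: {len(paragraphs)} paragraphs")
```

Here bodies[0] holds the two paragraphs of the first article and bodies[1] the three of the second.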

Alternatively, use a CSS selector to avoid the additional loop:

for p in soup.select('section p'):
    print(p) 
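Note that section p is a descendant selector, so it also matches <p> tags nested inside a section's <div> or <figure>. If only the section's own paragraphs are wanted, the child combinator section > p narrows the match. A small sketch, assuming hypothetical markup in which the <div> contains a byline paragraph that is not part of the article body:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the <div> holds a <p> that is not part of the body text.
html = """
<section>
  <div><p>byline</p></div>
  <p>body 1</p>
  <p>body 2</p>
</section>
"""

soup = BeautifulSoup(html, "html.parser")

# Descendant combinator: every <p> anywhere inside a <section>.
descendants = [p.get_text() for p in soup.select("section p")]

# Child combinator: only <p> tags whose direct parent is the <section>.
children = [p.get_text() for p in soup.select("section > p")]

print(descendants)  # ['byline', 'body 1', 'body 2']
print(children)     # ['body 1', 'body 2']
```

Whether the nested paragraph should count depends on the real page, so it is worth inspecting the actual markup before picking one selector.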

Example with additional for loop

from bs4 import BeautifulSoup

html = '''
<section class="...">
 <div>...</div>
 <figure>...</figure>
 <p id='...' class='...'></p>
 <p id='...' class='...'></p>
 <p id='...' class='...'></p>
</section>

<section class="...">
 <div>...</div>
 <figure>...</figure>
 <p id='...' class='...'></p>
 <p id='...' class='...'></p>
 <p id='...' class='...'></p>
</section>
'''
soup = BeautifulSoup(html, 'lxml')

all_bodies = soup.find_all('section')

for i in all_bodies:
    for p in i.find_all('p'):
        print(p)

Output

<p class="..." id="..."></p>
<p class="..." id="..."></p>
<p class="..." id="..."></p>
<p class="..." id="..."></p>
<p class="..." id="..."></p>
<p class="..." id="..."></p>
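The loops above still yield one <p> at a time. If only the text of each article is needed, the paragraphs of every section can be joined into a single string, giving exactly one list element per article. A sketch, assuming the article text sits directly in the <p> tags; the sample HTML and the blank-line separator are illustrative choices, not part of the original answer:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup with the same shape as the example above.
html = """
<section class="article">
 <div>...</div>
 <figure>...</figure>
 <p>Intro of article one.</p>
 <p>Conclusion of article one.</p>
</section>
<section class="article">
 <div>...</div>
 <figure>...</figure>
 <p>Only paragraph of article two.</p>
</section>
"""

soup = BeautifulSoup(html, "html.parser")

# One string per <section>: the text of its <p> tags, separated by blank lines.
articles = [
    "\n\n".join(p.get_text(strip=True) for p in section.find_all("p"))
    for section in soup.find_all("section")
]

print(len(articles))  # 2
print(articles[0])
```

get_text(strip=True) removes surrounding whitespace from each paragraph, so the separator alone controls the spacing between paragraphs.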

...