Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
683 views
in Technique[技术] by (71.8m points)

python - Using BeautifulSoup to extract text without tags

My webpage looks like this:

<p>
  <strong class="offender">YOB:</strong> 1987<br/>
  <strong class="offender">RACE:</strong> WHITE<br/>
  <strong class="offender">GENDER:</strong> FEMALE<br/>
  <strong class="offender">HEIGHT:</strong> 5'05''<br/>
  <strong class="offender">WEIGHT:</strong> 118<br/>
  <strong class="offender">EYE COLOR:</strong> GREEN<br/>
  <strong class="offender">HAIR COLOR:</strong> BROWN<br/>
</p>

I want to extract the info for each individual and get YOB:1987, RACE:WHITE, etc...

What I tried is:

subc = soup.find_all('p')
subc1 = subc[1]
subc2 = subc1.find_all('strong')

But this gives me only the values of YOB:, RACE:, etc...

Is there a way that I can get the data in YOB:1987, RACE:WHITE format?

Question&Answers:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Just loop through all the <strong> tags and use next_sibling to get what you want. Like this:

for strong_tag in soup.find_all('strong'):
    print(strong_tag.text, strong_tag.next_sibling)

Demo:

from bs4 import BeautifulSoup

html = '''
<p>
  <strong class="offender">YOB:</strong> 1987<br />
  <strong class="offender">RACE:</strong> WHITE<br />
  <strong class="offender">GENDER:</strong> FEMALE<br />
  <strong class="offender">HEIGHT:</strong> 5'05''<br />
  <strong class="offender">WEIGHT:</strong> 118<br />
  <strong class="offender">EYE COLOR:</strong> GREEN<br />
  <strong class="offender">HAIR COLOR:</strong> BROWN<br />
</p>
'''

soup = BeautifulSoup(html)

for strong_tag in soup.find_all('strong'):
    print(strong_tag.text, strong_tag.next_sibling)

This gives you:

YOB:  1987
RACE:  WHITE
GENDER:  FEMALE
HEIGHT:  5'05''
WEIGHT:  118
EYE COLOR:  GREEN
HAIR COLOR:  BROWN

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...