Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

html - How to extract the text from a div tag using BeautifulSoup and Python

I am trying to extract the text inside a div tag using the BeautifulSoup package in Python.

For example, I want to extract the text inside the <p></p> tags and the text inside <dt> and <dd>.

When I run the code, it crashes and displays the error below:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
     60 # # # article_body = s.find('div', {'class' :'card-content t-small bt p20'}).text
     61 # text_info = s.find_all("div",{"class":"card-content is-spaced"})
---> 62 text_desc = text_info.find('div', attrs={'class':'card-content t-small bt p20'}).getText(strip=True)
     63
     64 print(f"{date_published} {title} {text_desc} ", "-" * 80)

f:\aienv\lib\site-packages\bs4\element.py in __getattr__(self, key)
   2172         """Raise a helpful exception to explain a common code fix."""
   2173         raise AttributeError(
-> 2174             "ResultSet object has no attribute '%s'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?" % key
   2175         )

AttributeError: ResultSet object has no attribute 'find'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?

html

<div class="card-content t-small bt p20" style="max-height:50vh" data-viewsize='{"d":{"height": {"max": 1}}, "offset":"JobSearch.jobViewSize"}'>
<h2 class="h6">Job Description</h2>
<p>The Executive Chef has full knowledge and capability of managing the general operations of the kitchen, specialty outlets kitchen including Stewarding.</p>
<h2 class="h6 p10t">Skills</h2>
<p>?  Provide, develop, train and maintain a professional workforce? Excellent in English both in oral and written.? Computer knowledge is required and good in correspondences and reports writing.</p>
<h2 class="h6 p10t">Job Details</h2>
<dl class="dlist is-spaced is-fitted t-small m0">
<div>
<dt>Job Location</dt>
<dd> Al Olaya, Riyadh , Saudi Arabia </dd>
</div>
<div>
<dt>Company Industry</dt>
<dd>Food & Beverage Production; Entertainment; Catering, Food Service, & Restaurant</dd>
</div>
<div>
<dt>Company Type</dt>
<dd>Employer (Private Sector)</dd>
</div>
<div>
<dt>Job Role</dt>
<dd>Hospitality and Tourism</dd>
</div>
<div>
<dt>Employment Type</dt>
<dd>Unspecified</dd>
</div>
<div>
<dt>Monthly Salary Range</dt>
<dd>$4,000 - $5,000</dd>
</div>
<div>
<dt>Number of Vacancies</dt>
<dd>1</dd>
</div>
</dl>
<h2 class="h6 p10t">Preferred Candidate</h2>
<dl class="dlist is-spaced is-fitted t-small m0">
<div>
<dt>Career Level</dt>
<dd>Management</dd>
</div>
<div>
<dt>Years of Experience</dt>
<dd>Min: 10 Max: 20</dd>
</div>
<div>
<dt>Residence Location</dt>
<dd> Riyadh, Saudi Arabia ; Algeria; Bahrain; Comoros; Djibouti; Egypt; Iraq; Jordan; Kuwait; Lebanon; Libya; Mauritania; Morocco; Oman; Palestine; Qatar; Saudi Arabia; Somalia; Sudan; Syria; Tunisia; United Arab Emirates; Yemen</dd>
</div>
<div>
<dt>Gender</dt>
<dd>Male</dd>
</div>
<div>
<dt>Age</dt>
<dd>Min: 26 Max: 55</dd>
</div>
</dl>
</div>

================================================

code:

import time
import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    requests.get("https://www.bayt.com/en/international/jobs/executive-chef-jobs/").content,
    "lxml"
)

links = []
for a in soup.select("h2.m0.t-regular a"):
    if a['href'] not in links:
        links.append("https://www.bayt.com"+ a['href'])

for link in links:
    s = BeautifulSoup(requests.get(link).content, "lxml")
    text_info = s.find_all("div",{"class":"card-content is-spaced"})
    text_desc = text_info.find('div', attrs={'class':'card-content t-small bt p20'}).getText(strip=True)
    
    print(f"{date_published} {title}\n\n{text_desc}\n", "-" * 80)

Question from: https://stackoverflow.com/questions/65905303/how-to-extract-the-text-from-the-div-tag-using-beautifulsoup-and-python


1 Reply


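You are calling find_all, which returns a ResultSet (a list of elements), and then calling find on that list as if it were a single element. That is what the error message is telling you. Either loop over the results (for text in text_info:) and extract the information inside the loop, or, if you only want the first matching div, use find instead of find_all.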
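
Here is a minimal sketch of both options. The HTML string is a made-up, trimmed stand-in for one job page (I am assuming the description div sits inside the "card-content is-spaced" div, which may not match the real site), and it uses the built-in "html.parser" so it runs without lxml or a network request.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for one job page; the nesting is an assumption.
SAMPLE_HTML = """
<div class="card-content is-spaced">
  <div class="card-content t-small bt p20">
    <h2 class="h6">Job Description</h2>
    <p>The Executive Chef manages the kitchen.</p>
  </div>
</div>
"""

s = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Option 1: find() returns a single Tag (or None), so the text can be read directly.
inner = s.find("div", attrs={"class": "card-content t-small bt p20"})
first_text = inner.get_text(" ", strip=True) if inner is not None else ""

# Option 2: find_all() returns a ResultSet, so loop over it and call
# find() on each element instead of on the list itself.
texts = []
for block in s.find_all("div", {"class": "card-content is-spaced"}):
    desc = block.find("div", attrs={"class": "card-content t-small bt p20"})
    if desc is not None:  # skip blocks that have no description div
        texts.append(desc.get_text(" ", strip=True))

print(first_text)
print(texts)
```

The None checks matter on real pages: if a listing has no matching div, find returns None and calling get_text on it would raise another AttributeError.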

Hope that helps!
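
As a follow-up on the <p>, <dt> and <dd> part of the question, here is a rough sketch of one way to pull those out once you have the single div. The function name extract_job_text is my own invention, and the HTML is a shortened copy of the markup in the question, so treat it as an illustration under those assumptions rather than something tested against the live site.

```python
from bs4 import BeautifulSoup

# Shortened copy of the markup shown in the question.
SAMPLE_HTML = """
<div class="card-content t-small bt p20">
<h2 class="h6">Job Description</h2>
<p>The Executive Chef has full knowledge of the kitchen.</p>
<h2 class="h6 p10t">Job Details</h2>
<dl class="dlist is-spaced is-fitted t-small m0">
<div>
<dt>Job Location</dt>
<dd> Al Olaya, Riyadh , Saudi Arabia </dd>
</div>
<div>
<dt>Job Role</dt>
<dd>Hospitality and Tourism</dd>
</div>
</dl>
</div>
"""


def extract_job_text(html):
    """Return (paragraphs, details) for the job-description card.

    Hypothetical helper: paragraphs is a list of <p> texts and
    details maps each <dt> label to the text of its <dd>.
    """
    soup = BeautifulSoup(html, "html.parser")
    card = soup.find("div", attrs={"class": "card-content t-small bt p20"})
    if card is None:
        return [], {}

    paragraphs = [p.get_text(strip=True) for p in card.find_all("p")]

    details = {}
    for dt in card.find_all("dt"):
        # Each <dt> is followed by its <dd> inside the same wrapper <div>.
        dd = dt.find_next_sibling("dd")
        if dd is not None:
            details[dt.get_text(strip=True)] = dd.get_text(strip=True)
    return paragraphs, details


paragraphs, details = extract_job_text(SAMPLE_HTML)
print(paragraphs)
print(details)
```

Pairing each <dt> with its next <dd> sibling keeps labels and values together, which is usually easier to work with than one long string from getText on the whole div.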

