Python - Extracting data from a web page using BeautifulSoup

Question

Posted Mar 6, 2021 in Technique by 深蓝 (71.8m points)

I am trying to scrape some data from a web page using bs4. Here is what I have done so far:

import requests
from bs4 import BeautifulSoup


url = 'https://www.website.com'  # placeholder URL; requests needs an explicit scheme
response = requests.get(url)

soup = BeautifulSoup(response.text, "html.parser")

for article in soup.find_all('section'):
    print(article)

The code above prints the following output:

<section>
<ul class="row-full-width" style="margin:0; list-style: none; padding-left: 0; font-size: 120%">
<li class="four columns">

  Comp A:

  <i class="icon-rupee"></i>
<b>136.90</b>

  Cr.


 </li>
<li class="four columns">

  Comp B:

  <i class="icon-rupee"></i>
<b>10.95</b>
</li>
<li class="four columns">

  Comp C:
  <i class="icon-rupee"></i> <b>49.60</b> / <b>10.20</b>
</li>
<li class="four columns">

  Comp D:

  <i class="icon-rupee"></i>
<b>6.61</b>
</li>
<li class="four columns">

  Comp E:

  <b>25.78</b>
</li>
<li class="four columns">

  Comp F:

  <b>0.00</b>

  %


</li>
<li class="four columns">

  Comp G:

  <b>9.39</b>

  %


</li>
<li class="four columns">

  Comp H:

  <b>6.54</b>

  %


 </li>
<li class="four columns">

  Comp I:

  <b>19.39</b>

  %


</li>
<li class="four columns">

I am trying to extract each Comp label and its corresponding value.

Expected output:

Comp A,136.90 Cr
Comp B, 10.95
Comp C, 49.60/10.20
Comp D, 6.61
Comp E, 25.78
Comp F, 0.0%
Comp G, 9.39%
Comp H, 6.54%
Comp I, 19.39%

Asked by scott martin; translated from Stack Overflow.

1 Reply

深蓝 · Answer 1 · 2021-03-06T03:31:49+0000

You can use the get_text() method with the separator= parameter and then split the resulting string.
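
To see what that call returns before any splitting, here is a minimal sketch on a single list item. The `fragment` string is a hypothetical one-item sample modeled on the HTML in the question, not the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical one-item sample modeled on the HTML in the question.
fragment = (
    '<li class="four columns">\n'
    '  Comp A:\n'
    '  <i class="icon-rupee"></i>\n'
    '<b>136.90</b>\n'
    '  Cr.\n'
    '</li>'
)

li = BeautifulSoup(fragment, 'html.parser').li

# strip=True trims each text node and skips the ones that end up empty;
# separator='|' is placed between the pieces that remain.
joined = li.get_text(strip=True, separator='|')
parts = joined.split('|')

print(joined)  # Comp A:|136.90|Cr.
print(parts)   # ['Comp A:', '136.90', 'Cr.']
```

The first piece is the label and everything after it belongs to the value, which is what the split relies on. This assumes the text itself never contains the separator character.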

For example (data contains your HTML string):

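from bs4 import BeautifulSoup

soup = BeautifulSoup(data, 'html.parser')

# print(soup.prettify())  # uncomment to inspect the parsed HTML

for li in soup.select('li'):
    # Join the stripped text nodes with '|' and split them back into a list.
    row = li.get_text(strip=True, separator='|').split('|')
    # First piece is the label (drop the colon); the rest form the value.
    col1, col2 = row[0].replace(':', ''), ' '.join(row[1:])
    print('{:<20}{:<20}'.format(col1, col2))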

Prints:

Comp A              136.90 Cr.          
Comp B              10.95               
Comp C              49.60 / 10.20       
Comp D              6.61                
Comp E              25.78               
Comp F              0.00 %              
Comp G              9.39 %              
Comp H              6.54 %              
Comp I              19.39 %
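
If the comma-separated layout from the question is preferred over fixed-width columns, the same idea can be sketched as below. This is only one possible variant, not part of the original answer: it uses the `stripped_strings` generator instead of joining and splitting, `extract_rows` is a hypothetical helper name, and `data` here is a shortened, hypothetical copy of the markup in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: a shortened copy of the markup shown in the question.
data = """
<section>
<ul class="row-full-width">
<li class="four columns">
  Comp A:
  <i class="icon-rupee"></i>
<b>136.90</b>
  Cr.
</li>
<li class="four columns">
  Comp C:
  <i class="icon-rupee"></i> <b>49.60</b> / <b>10.20</b>
</li>
<li class="four columns">
  Comp F:
  <b>0.00</b>
  %
</li>
<li class="four columns">
</li>
</ul>
</section>
"""


def extract_rows(html):
    """Return (label, value) pairs, one per non-empty <li>."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for li in soup.select('li'):
        # stripped_strings yields each text node with whitespace trimmed,
        # skipping nodes that are empty after trimming.
        parts = list(li.stripped_strings)
        if not parts:  # skip empty items such as a trailing <li>
            continue
        label = parts[0].rstrip(':')
        value = ' '.join(parts[1:])
        rows.append((label, value))
    return rows


rows = extract_rows(data)
for label, value in rows:
    print(f'{label},{value}')
```

On this sample the values keep the spacing of the page text (for example "0.00 %" rather than "0.0%"), so any further tightening of units would need an extra formatting step.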
