Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


json - BeautifulSoup <small> tag

I want to scrape the text inside a <small> element, in this case "Was: £27.00". The HTML is:

<small class="product-views-price-old">Was: £27.00</small>

My code is:

from bs4 import BeautifulSoup
import requests
url = "https://www.petshop.co.uk/Dog"
r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")
for old_price in soup.find_all("small", class_ = "product-views-price-old"):
    print(old_price)

The code above prints nothing and raises no error. How can I scrape the text between the <small> tags?

Question from: https://stackoverflow.com/questions/65888966/beautifulsoup-small-tag


1 Reply


The content is served dynamically, so you won't get it this way with requests. Take a look at this Selenium code.
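One way to confirm that the prices are missing from the static response is to count the matching tags in whatever HTML you have in hand. The following is only a minimal sketch: `count_old_prices` is a hypothetical helper, and the two HTML strings are made-up stand-ins for the raw response and the browser-rendered page, not the real markup of the site.

```python
from bs4 import BeautifulSoup


def count_old_prices(html: str) -> int:
    """Count <small class="product-views-price-old"> tags in an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.find_all("small", class_="product-views-price-old"))


# Made-up stand-in for a response whose products are filled in later by JavaScript
static_html = '<div id="products"></div><script src="app.js"></script>'

# Made-up stand-in for the same page after a browser has rendered it
rendered_html = (
    '<div id="products">'
    '<small class="product-views-price-old">Was: £27.00</small>'
    '</div>'
)

print(count_old_prices(static_html))    # 0
print(count_old_prices(rendered_html))  # 1
```

Passing `requests.get(url).text` to the same helper and getting 0, while the browser clearly shows prices, would suggest the tags are injected after the initial page load.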

To strip the surrounding whitespace and the "Was: " prefix, you can use:

.get_text(strip=True).replace('Was: ','')

Example

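from selenium import webdriver
from bs4 import BeautifulSoup
import time

url = "https://www.petshop.co.uk/Dog"
# Path to the local ChromeDriver binary; the raw string keeps the backslashes literal.
# Note: newer Selenium 4 releases expect the driver path via a Service object instead.
driver = webdriver.Chrome(r'C:\Program Files\ChromeDriver\chromedriver.exe')
driver.get(url)
# Give the page's JavaScript time to render the prices
time.sleep(3)

# Parse the rendered page source rather than the raw HTTP response
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
for old_price in soup.find_all("small", class_="product-views-price-old"):
    # Strip whitespace, then drop the "Was: " prefix
    print(old_price.get_text(strip=True).replace('Was: ', ''))

driver.quit()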

Output

£2.20
£18.61
£27.00
£38.39
£38.39
£20.65
£1.30
£67.99
£20.65
£1.30
£54.95
£30.99
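
The cleanup step used in the loop can also be tried on its own, without a browser, by running it against a static snippet. This is a sketch under stated assumptions: the sample markup is made up to resemble the tag from the question, and `extract_old_prices` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Made-up sample markup modeled on the tag from the question
html = """
<small class="product-views-price-old">
    Was: £27.00
</small>
<small class="product-views-price-old">Was: £2.20</small>
"""


def extract_old_prices(html: str) -> list:
    """Return the old prices with whitespace and the 'Was: ' prefix removed."""
    soup = BeautifulSoup(html, "html.parser")
    tags = soup.find_all("small", class_="product-views-price-old")
    # get_text(strip=True) trims leading/trailing whitespace from the tag's text
    return [tag.get_text(strip=True).replace("Was: ", "") for tag in tags]


prices = extract_old_prices(html)
print(prices)  # ['£27.00', '£2.20']
```

Note that `replace` only removes the prefix when it matches exactly, so a differently worded label would be left in place.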
