python - Scraping data from href

I am trying to get the postcodes for DFS. For that I tried getting the href for each shop and then clicking on it; the next page has the shop location, from which I can get the postal code. But I am not able to get it working. Where am I going wrong? I first select the upper-level elements with td.searchResults, then for each of them I try to click the link whose title contains DFS and, after clicking, read the postalCode. Eventually I want to iterate over all three result pages. If there is a better way to do it, let me know.

from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Firefox()
driver.get('http://www.localstore.co.uk/stores/75061/dfs/')
html = driver.page_source
soup = BeautifulSoup(html)
listings = soup.select('td.searchResults')
for l in listings:
    while True:
        driver.find_element_by_css_selector("a[title*='DFS']").click()
        shops = {}
        #info = soup.find('span', itemprop='postalCode').contents
        html = driver.page_source
        soup = BeautifulSoup(html)
        info = soup.find(itemprop="postalCode").get_text()
        shops.append(info)

Update:

driver = webdriver.Firefox()
driver.get('http://www.localstore.co.uk/stores/75061/dfs/')
html = driver.page_source
soup = BeautifulSoup(html)
listings = soup.select('td.searchResults')

for l in listings:
    driver.find_element_by_css_selector("a[title*='DFS']").click()
    shops = []
    html = driver.page_source
    soup = BeautifulSoup(html)
    info = soup.find_all('span', attrs={"itemprop": "postalCode"})
    for m in info:
        if m:
            m_text = m.get_text()
            shops.append(m_text)
    print(shops)

1 Reply

So after playing with this for a little while, I don't think the best way to do this is with selenium. It would require using driver.back(), waiting for elements to re-appear, and a whole mess of other stuff. I was able to get what you want using just requests, re and bs4. re is included in the Python standard library; if you haven't installed requests, you can do it with pip: pip install requests
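For comparison only, here is a rough, untested sketch of what the selenium route would involve. It assumes the results table comes back with the same td.searchResults and a[title*='DFS'] markup after driver.back(); the waits and the re-finding of links are exactly the extra machinery I'd rather avoid:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

driver = webdriver.Firefox()
driver.get('http://www.localstore.co.uk/stores/75061/dfs/')

shops = []
links = driver.find_elements_by_css_selector("a[title*='DFS']")
for i in range(len(links)):
    # re-find the links on every pass, because the old elements go stale
    # once the page is left and reloaded with driver.back()
    link = driver.find_elements_by_css_selector("a[title*='DFS']")[i]
    link.click()
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, "span[itemprop='postalCode']")))
    soup = BeautifulSoup(driver.page_source)
    shops.append(soup.find('span', itemprop='postalCode').get_text())
    driver.back()
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, "td.searchResults")))

print(shops)

The requests version is much simpler: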

from bs4 import BeautifulSoup
import re
import requests

base_url = 'http://www.localstore.co.uk'
url = 'http://www.localstore.co.uk/stores/75061/dfs/'
res = requests.get(url)
soup = BeautifulSoup(res.text)

shops = []

# each shop detail page is linked with an href containing /store/
links = soup.find_all('a', href=re.compile('.*/store/.*'))

for l in links:
    full_link = base_url + l['href']
    # take the town from the link's title attribute (second comma-separated part)
    town = l['title'].split(',')[1].strip()
    res = requests.get(full_link)
    soup = BeautifulSoup(res.text)
    info = soup.find('span', attrs={"itemprop": "postalCode"})
    postalcode = info.text
    shops.append(dict(town_name=town, postal_code=postalcode))

print(shops)
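Since you also want all three result pages: the code above only reads the first page. Below is a hedged sketch of one way to extend it. The way it collects the paging links (looking for 'page' in the href) is an assumption on my part, since I haven't checked how localstore.co.uk names its pagination links, so adjust that check to the actual markup:

from bs4 import BeautifulSoup
import re
import requests

base_url = 'http://www.localstore.co.uk'
url = 'http://www.localstore.co.uk/stores/75061/dfs/'

# collect the result pages first: the first page plus whatever paging links it exposes
# (the 'page' substring check is an assumption about the site's pagination links)
first = BeautifulSoup(requests.get(url).text)
page_urls = [url]
for a in first.find_all('a', href=True):
    if 'dfs' in a['href'] and 'page' in a['href']:
        page_urls.append(base_url + a['href'])
page_urls = list(dict.fromkeys(page_urls))   # drop duplicate paging links

shops = []
for page_url in page_urls:
    soup = BeautifulSoup(requests.get(page_url).text)
    for l in soup.find_all('a', href=re.compile('.*/store/.*')):
        detail = BeautifulSoup(requests.get(base_url + l['href']).text)
        info = detail.find('span', attrs={"itemprop": "postalCode"})
        shops.append(dict(town_name=l['title'].split(',')[1].strip(),
                          postal_code=info.text))

print(shops)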
