I'm new to web scraping, so I'm not sure how to approach this. I am trying to extract the images from the site at this URL:
Here are the loops that got the closest to working:
For loop with parsing function
import os
import requests
from tqdm import tqdm
from bs4 import BeautifulSoup as bs
from urllib.parse import urljoin, urlparse

url = "https://www.legacysurvey.org/viewer/data-for-radec/?ra=55.0502&dec=-18.5790&layer=ls-dr8&ralo=55.0337&rahi=55.0655&declo=-18.5892&dechi=-18.5714"

def is_valid(url):
    """
    Checks whether `url` is a valid URL.
    """
    parsed = urlparse(url)
    return bool(parsed.netloc) and bool(parsed.scheme)

def get_all_images(url):
    """
    Returns all image URLs on a single `url`
    """
    soup = bs(requests.get(url).content, "html.parser")
    urls = []
    for img in tqdm(soup.find_all("img"), "Extracting images"):
        img_url = img.attrs.get("src")
        if not img_url:
            # if img does not contain src attribute, just skip
            continue
        # resolve relative paths against the page URL
        img_url = urljoin(url, img_url)
        if is_valid(img_url):
            urls.append(img_url)
    return urls

# prints an empty list if the page contains no <img> tags
print(get_all_images(url))
# only displays a value when run in a REPL or notebook
os.getcwd()
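A minimal sketch of the URL-handling step on its own, separated from the network request so it can be checked in isolation. It assumes the raw `src` (or `href`) values have already been pulled out of the page. The name `normalize_image_urls` and the extension list are hypothetical, chosen for illustration, and filtering by file extension is only a heuristic.

```python
from urllib.parse import urljoin, urlparse

# Assumption: these extensions are what counts as "an image" here.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")

def normalize_image_urls(page_url, raw_values):
    """Resolve raw src/href values against `page_url` and keep
    unique, absolute http(s) URLs whose path looks like an image."""
    result = []
    for raw in raw_values:
        if not raw:
            # skip missing or empty attributes
            continue
        absolute = urljoin(page_url, raw)
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            continue
        # compare against the path only, so query strings are ignored
        if not parsed.path.lower().endswith(IMAGE_EXTENSIONS):
            continue
        if absolute not in result:
            result.append(absolute)
    return result
```

Calling it with the page URL and the list of attribute values returns absolute URLs in first-seen order, with duplicates and non-image links dropped.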
While loop - image scraping
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

# link to first page - without `page=`
url = 'https://www.legacysurvey.org/viewer/data-for-radec/?ra=55.0502&dec=-18.5799&layer=ls-dr8&ralo=55.0337&rahi=55.0655&declo=-18.5892&dechi=-18.5714'

# only for information, not used in url
page = 0

while True:
    print('---', page, '---')
    r = requests.get(url)
    soup = BeautifulSoup(r.content, "html.parser")

    # Print the source of every <img> tag (images use `src`, not `href`)
    for link in soup.find_all("img"):
        print("<img src='%s'>" % link.get("src"))

    # Fetch and print general data from title class
    general_data = soup.find_all('div', {'class' : 'title'})
    for item in general_data:
        print(item.contents[0].text)
        print(item.contents[1].text.replace('.',''))
        print(item.contents[2].text)

    # link to next page
    next_page = soup.find('a', {'class': 'next'})
    if next_page and next_page.get('href'):
        # resolve a relative href against the current page URL
        url = urljoin(url, next_page.get('href'))
        page += 1
    else:
        break # exit `while True`
I tried to adapt both of these to download the images they find, but nothing I've tried has produced any output. Any help is greatly appreciated!
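A minimal sketch of the download step I am aiming for, assuming a list of absolute image URLs has already been collected. The names `filename_from_url`, `fetch_bytes`, and `download_all` are hypothetical, and the sketch uses the standard library's `urlopen` instead of `requests` so that it has no third-party dependencies.

```python
import os
from urllib.parse import unquote, urlparse
from urllib.request import urlopen

def filename_from_url(url, default="image"):
    """Use the last path segment of `url` as the local filename."""
    name = os.path.basename(unquote(urlparse(url).path))
    return name or default

def fetch_bytes(url, timeout=30):
    """Return the raw response body for `url`."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()

def download_all(urls, folder, fetch=fetch_bytes):
    """Save each URL into `folder` and return the paths written.
    `fetch` is injectable so another HTTP client can be swapped in."""
    os.makedirs(folder, exist_ok=True)
    saved = []
    for url in urls:
        path = os.path.join(folder, filename_from_url(url))
        with open(path, "wb") as fh:
            fh.write(fetch(url))
        saved.append(path)
    return saved
```

To keep using `requests`, something like `download_all(urls, "images", fetch=lambda u: requests.get(u).content)` should work. Note that two URLs ending in the same filename would overwrite each other in this sketch.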
question from:
https://stackoverflow.com/questions/66052559/im-trying-to-image-scrape-this-website-but-it-seems-that-the-site-im-scraping