python - I'm trying to image scrape this website but it seems that the site I'm scraping doesn't respond by actually outputting images

posted Oct 6, 2021 in Technique by 深蓝 (71.8m points)

I'm new to web scraping, so I'm not sure what to do here. I am trying to extract the images from the site at this URL:

Here are the loops that got the closest to working:

For loop with parsing function

import requests
import os as os
from tqdm import tqdm
from bs4 import BeautifulSoup as bs
from urllib.parse import urljoin, urlparse

url = "https://www.legacysurvey.org/viewer/data-for-radec/?ra=55.0502&dec=-18.5790&layer=ls-dr8&ralo=55.0337&rahi=55.0655&declo=-18.5892&dechi=-18.5714"
def is_valid(url):
    """
    Checks whether `url` is a valid URL.
    """
    parsed = urlparse(url)
    return bool(parsed.netloc) and bool(parsed.scheme)

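def get_all_images(url):
    """
    Returns all image URLs on a single `url`
    """
    soup = bs(requests.get(url).content, "html.parser")
    urls = []
    for img in tqdm(soup.find_all("img"), "Extracting images"):
        img_url = img.attrs.get("src")
        if not img_url:
            # if img does not contain src attribute, just skip
            continue
        # resolve relative paths against the page URL
        img_url = urljoin(url, img_url)
        if is_valid(img_url):
            urls.append(img_url)
    return urls

# print every image URL found on the page
for img_url in get_all_images(url):
    print(img_url)
print("Working directory:", os.getcwd())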

While loop - image scraping

import requests
from bs4 import BeautifulSoup

# link to first page - without `page=`
url = 'https://www.legacysurvey.org/viewer/data-for-radec/?ra=55.0502&dec=-18.5799&layer=ls-dr8&ralo=55.0337&rahi=55.0655&declo=-18.5892&dechi=-18.5714'

# only for information, not used in url
page = 0 

while True:

    print('---', page, '---')

    r = requests.get(url)

    soup = BeautifulSoup(r.content, "html.parser")

    # Print the source of every <img> tag (images use `src`, not `href`)
    for link in soup.find_all("img"):
        print("<img src='%s'>" % link.get("src"))

    # Fetch and print general data from title class
    general_data = soup.find_all('div', {'class' : 'title'})

    for item in general_data:
        print(item.contents[0].text)
        print(item.contents[1].text.replace('.',''))
        print(item.contents[2].text)

    # link to next page

    next_page = soup.find('a', {'class': 'next'})

    if next_page:
        url = next_page.get('href')
        page += 1
    else:
        break # exit `while True`

I tried to adapt both of these to download the images from the links they print, but I haven't been able to get any output from anything I've tried. Any help is greatly appreciated!
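One way to narrow this down is to check whether the page exposes its images through plain links rather than `<img>` tags. The sketch below is only an illustration under that assumption, which the page itself has not confirmed. It uses the standard-library `html.parser` so it runs without extra installs. The names `ImageLinkCollector`, `find_image_urls`, and `IMAGE_EXTENSIONS` are hypothetical, and the extension list (including `.fits`) is a guess at what the site might serve.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumption: file extensions treated as "images" in this sketch.
# `.fits` is a guess for astronomy data, not something the page confirms.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".fits")


class ImageLinkCollector(HTMLParser):
    """Collects candidate image URLs from <img src> and <a href> attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img":
            candidate = attrs.get("src")
        elif tag == "a":
            candidate = attrs.get("href")
        else:
            return
        if not candidate:
            # Tag has no usable attribute, so there is nothing to collect.
            return
        # Resolve relative paths against the page URL.
        absolute = urljoin(self.base_url, candidate)
        path = urlparse(absolute).path.lower()
        # Keep every <img> source, but only image-like <a> targets.
        if tag == "img" or path.endswith(IMAGE_EXTENSIONS):
            if absolute not in self.urls:
                self.urls.append(absolute)


def find_image_urls(html, base_url):
    """Returns candidate image URLs found in `html`, in document order."""
    collector = ImageLinkCollector(base_url)
    collector.feed(html)
    return collector.urls
```

With `requests`, this could be called as `find_image_urls(requests.get(url).text, url)`. An empty result would suggest the images are not referenced in the static HTML at all, for example because they are loaded by JavaScript.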

question from:https://stackoverflow.com/questions/66052559/im-trying-to-image-scrape-this-website-but-it-seems-that-the-site-im-scraping
