python - BeautifulSoup: Extract img alt data

posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)

I have the following image HTML and I am trying to parse the information in its alt attribute. Currently I am able to extract the images successfully.

HTML (what I currently parse):

<img class="rslp-p" alt="Sony Cyber-shot DSC-W570 16.1 MP Digital Camera - Silver" src="http://i.ebayimg.com/00/$(KGrHqZ,!j!E5dyh0jTpBO(3yE7Wg!~~_26.JPG?set_id=89040003C1" itemprop="image" />

I construct the image name from what I parse:

Current Code

import os
from urllib.parse import urljoin
from urllib.request import urlopen, urlretrieve

from bs4 import BeautifulSoup as bs


def main(url, output_folder="~/images"):
    """Download the images at url"""
    # Expand "~" so the files land in the user's home directory
    output_folder = os.path.expanduser(output_folder)
    soup = bs(urlopen(url), "html.parser")
    count = 0
    for image in soup.find_all("img"):
        print(image)
        count += 1
        print(count)
        print("Image: %s" % image["src"])
        image_url = urljoin(url, image["src"])
        # Build the file name from the last path segment of src
        filename = image["src"].split("/")[-1].split("?")[0].replace("$", "").replace(".JPG", ".jpg").replace("~~_26", str(count)).lstrip("(")
        outpath = os.path.join(output_folder, filename)
        urlretrieve(image_url, outpath)

What I would like to extract is:

alt="Sony Cyber-shot DSC-W570 16.1 MP Digital Camera - Silver"

I also want to use the alt data as the file name when I save the image.

1 Reply

深蓝 · Answer 1 · 2021-10-23T19:24:58+0000

Inside your for loop, you can get the alt text with:

image.get('alt', '')

This is explained in BeautifulSoup's documentation ("The attributes of Tags").
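To cover the second half of the question (using the alt text as the file name), here is a minimal sketch. It assumes the bs4 package and Python 3. The helper name alt_to_filename, the ".jpg" extension, the "imageN" fallback, and the second img tag without an alt attribute are all made up for illustration and are not part of the original code.

```python
import re

from bs4 import BeautifulSoup

# First tag is the one from the question; the second (no alt) is a made-up
# example to show what the default value in .get() is for.
HTML = '''<img class="rslp-p" alt="Sony Cyber-shot DSC-W570 16.1 MP Digital Camera - Silver" src="http://i.ebayimg.com/00/$(KGrHqZ,!j!E5dyh0jTpBO(3yE7Wg!~~_26.JPG?set_id=89040003C1" itemprop="image" />
<img src="no-alt.png" />'''


def alt_to_filename(alt, fallback, ext=".jpg"):
    """Turn alt text into a conservative file name (hypothetical helper)."""
    # Keep letters, digits, dots and hyphens; collapse anything else to "_".
    stem = re.sub(r"[^A-Za-z0-9.-]+", "_", alt).strip("_.")
    # Fall back to a generic name when the alt text is missing or empty.
    return (stem or fallback) + ext


soup = BeautifulSoup(HTML, "html.parser")
names = []
for count, image in enumerate(soup.find_all("img"), start=1):
    # .get returns "" instead of raising KeyError when alt is absent.
    alt = image.get("alt", "")
    names.append(alt_to_filename(alt, fallback="image%d" % count))

print(names)
# ['Sony_Cyber-shot_DSC-W570_16.1_MP_Digital_Camera_-_Silver.jpg', 'image2.jpg']
```

In the loop from the question, the resulting name would replace the filename built from image["src"] before the os.path.join call. Sanitizing matters because alt text can contain slashes or other characters that are not valid in file names, and two images with the same alt text would still overwrite each other unless the counter is added to the name.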
