You can use BeautifulSoup to extract the src attribute of an HTML img tag. In my example, htmlText contains the img tags themselves, but the same approach works for a URL when combined with urllib2.
For URLs
from BeautifulSoup import BeautifulSoup as BSHTML
import urllib2
page = urllib2.urlopen('http://www.youtube.com/')
soup = BSHTML(page)
images = soup.findAll('img')
for image in images:
    # print the image source
    print image['src']
    # print the alternate text (None if the tag has no alt attribute)
    print image.get('alt')
For text containing img tags
from BeautifulSoup import BeautifulSoup as BSHTML
htmlText = """<img src="https://src1.com/" /> <img src="https://src2.com/" /> """
soup = BSHTML(htmlText)
images = soup.findAll('img')
for image in images:
    print image['src']
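The snippets in this answer are written for Python 2 and BeautifulSoup 3. As a rough sketch of the same idea on Python 3, assuming the beautifulsoup4 package (imported as bs4) is installed, something like the following should work. The function name image_attributes and the alt attribute in the sample markup are made up for illustration.

```python
from bs4 import BeautifulSoup  # assumption: beautifulsoup4 is installed


def image_attributes(html_text):
    """Return a (src, alt) pair for every img tag in html_text.

    Hypothetical helper, not part of BeautifulSoup itself.
    """
    soup = BeautifulSoup(html_text, "html.parser")
    # .get() returns None rather than raising KeyError when an attribute is missing
    return [(img.get("src"), img.get("alt")) for img in soup.find_all("img")]


html_text = '<img src="https://src1.com/" alt="first" /> <img src="https://src2.com/" />'
for src, alt in image_attributes(html_text):
    print(src, alt)
```

For a URL, the page body could be fetched first with urllib.request.urlopen(url).read() and passed to the same function.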