python - How to pull out CSS attributes from inline styles with BeautifulSoup

Question

posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)

I have something like this:

<img style="background:url(/theRealImage.jpg) no-repeat 0 0; height:90px; width:92px;" src="notTheRealImage.jpg"/>

I am using BeautifulSoup to parse the HTML. Is there a way to pull out the "url" in the "background" CSS attribute?

1 Reply

深蓝 · Answer 1 · 2021-10-23T21:38:03+0000

You've got a couple of options: quick and dirty, or the Right Way. The quick and dirty way (which will break easily if the markup changes) looks like this:

>>> from bs4 import BeautifulSoup  # BeautifulSoup 4; the old "BeautifulSoup" module is Python 2 only
>>> import re
>>> soup = BeautifulSoup('<html><body><img style="background:url(/theRealImage.jpg) no-repeat 0 0; height:90px; width:92px;" src="notTheRealImage.jpg"/></body></html>', 'html.parser')
>>> style = soup.find('img')['style']
>>> # The literal parentheses of url(...) must be escaped; the inner group captures the path.
>>> urls = re.findall(r'url\((.*?)\)', style)
>>> urls
['/theRealImage.jpg']

Obviously, you'll have to play with that to get it to work with multiple img tags.
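
For several img tags, one way to extend the quick and dirty approach is to loop over every img that carries a style attribute. This is only a sketch: it assumes the bs4 package (BeautifulSoup 4), and the helper name background_urls and the sample markup are made up for illustration. The regex also tolerates optional quotes around the URL, which the one-liner above does not.

```python
import re

from bs4 import BeautifulSoup

# Matches url(...) with optional single or double quotes around the target.
URL_RE = re.compile(r"url\(\s*['\"]?(.*?)['\"]?\s*\)")


def background_urls(html):
    """Collect every url(...) target from the inline styles of img tags.

    Hypothetical helper: still regex-based, so it shares the fragility
    of the quick and dirty approach.
    """
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    # style=True keeps only the img tags that actually have a style attribute.
    for img in soup.find_all("img", style=True):
        urls.extend(URL_RE.findall(img["style"]))
    return urls


sample = (
    '<img style="background:url(/a.jpg) no-repeat 0 0;" src="x.jpg"/>'
    '<img src="y.jpg"/>'
    "<img style=\"background-image: url('/b.png')\" src=\"z.jpg\"/>"
)
print(background_urls(sample))  # ['/a.jpg', '/b.png']
```

The img without a style attribute is skipped rather than raising a KeyError, which is the main thing the single-tag version would trip over.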

The Right Way, since I'd feel awful suggesting someone use regex on a CSS string :), uses a CSS parser. cssutils, a library I just found on Google and available on PyPI, looks like it might do the job.
