First set up a test document and open up the parser with BeautifulSoup:
>>> from BeautifulSoup import BeautifulSoup
>>> doc = '<html><body><div><a href="something">yep</a></div><div><a href="http://www.nhl.com/ice/boxscore.htm?id=3">somelink</a></div><a href="http://www.nhl.com/ice/boxscore.htm?id=7">another</a></body></html>'
>>> soup = BeautifulSoup(doc)
>>> print soup.prettify()
<html>
<body>
<div>
<a href="something">
yep
</a>
</div>
<div>
<a href="http://www.nhl.com/ice/boxscore.htm?id=3">
somelink
</a>
</div>
<a href="http://www.nhl.com/ice/boxscore.htm?id=7">
another
</a>
</body>
</html>
Next, we can search for all <a>
tags with an href
attribute starting with http://www.nhl.com/ice/boxscore.htm?id=
. You can use a regular expression for it:
>>> import re
>>> soup.findAll('a', href=re.compile('^http://www.nhl.com/ice/boxscore.htm?id='))
[<a href="http://www.nhl.com/ice/boxscore.htm?id=3">somelink</a>, <a href="http://www.nhl.com/ice/boxscore.htm?id=7">another</a>]
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…