Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
395 views
in Technique[技术] by (71.8m points)

python - Find specific link w/ beautifulsoup

Hi I cannot figure out how to find links which begin with certain text for the life of me. findall('a') works fine, but it's way too much. I just want to make a list of all links that begin with http://www.nhl.com/ice/boxscore.htm?id=

Can anyone help me?

Thank you very much

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

First set up a test document and open up the parser with BeautifulSoup:

>>> from BeautifulSoup import BeautifulSoup
>>> doc = '<html><body><div><a href="something">yep</a></div><div><a href="http://www.nhl.com/ice/boxscore.htm?id=3">somelink</a></div><a href="http://www.nhl.com/ice/boxscore.htm?id=7">another</a></body></html>'
>>> soup = BeautifulSoup(doc)
>>> print soup.prettify()
<html>
 <body>
  <div>
   <a href="something">
    yep
   </a>
  </div>
  <div>
   <a href="http://www.nhl.com/ice/boxscore.htm?id=3">
    somelink
   </a>
  </div>
  <a href="http://www.nhl.com/ice/boxscore.htm?id=7">
   another
  </a>
 </body>
</html>

Next, we can search for all <a> tags with an href attribute starting with http://www.nhl.com/ice/boxscore.htm?id=. You can use a regular expression for it:

>>> import re
>>> soup.findAll('a', href=re.compile('^http://www.nhl.com/ice/boxscore.htm?id='))
[<a href="http://www.nhl.com/ice/boxscore.htm?id=3">somelink</a>, <a href="http://www.nhl.com/ice/boxscore.htm?id=7">another</a>]

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

1.4m articles

1.4m replys

5 comments

57.0k users

...