Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
335 views
in Technique[技术] by (71.8m points)

python - Search and Replace in HTML with BeautifulSoup

I want to use BeautfulSoup to search and replace <a> with <a><br>. I know how to open with urllib2 and then parse to extract all the <a> tags. What I want to do is search and replace the closing tag with the closing tag plus the break. Any help, much appreciated.

EDIT

I would assume it would be something similar to:

soup.findAll('a').

In the documentation, there is a:

find(text="ahh").replaceWith('Hooray')

So I would assume it would be along the lines of:

soup.findAll(tag = '</a>').replaceWith(tag = '</a><br>')

But that doesn't work and the python help() doesn't give much

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

This will insert a <br> tag after the end of each <a>...</a> element:

from BeautifulSoup import BeautifulSoup, Tag

# ....

soup = BeautifulSoup(data)
for a in soup.findAll('a'):
    a.parent.insert(a.parent.index(a)+1, Tag(soup, 'br'))

You can't use soup.findAll(tag = '</a>') because BeautifulSoup doesn't operate on the end tags separately - they are considered part of the same element.


If you wanted to put the <a> elements inside a <p> element as you ask in a comment, you can use this:

for a in soup.findAll('a'):
    p = Tag(soup, 'p') #create a P element
    a.replaceWith(p)   #Put it where the A element is
    p.insert(0, a)     #put the A element inside the P (between <p> and </p>)

Again, you don't create the <p> and </p> separately because they are part of the same thing.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...