python - Extract text and nodes from <p> using lxml in the same array index

Question


posted Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)


Hello, I need to get all the text and other nodes inside a paragraph, something like this:

<div>
<p>
Whatever you want type <strong>here is great</strong>
</p>
<p>
Whatever you want type <strong>here is great</strong>
</p>
</div>

I am using the XPath below to get all the text and strong text from the paragraphs. The problem is that this way the text and the strong text come back as separate items, so I get an array like ['Whatever you want type','here is great']. I need each paragraph's nodes in the same array index, something like ['Whatever you want type here is great'].

content = html.xpath('.//p/text() | .//p/strong/text()')

I found a way to extract the text inside them:

.text_content(): Returns the text content of the element, including the text content of its children, with no markup.

https://lxml.de/lxmlhtml.html
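A minimal sketch of how .text_content() could be applied per paragraph, assuming the markup from the question is parsed with lxml.html. The names html_string, tree and content are only illustrative, and the whitespace collapsing with split()/join() is an added assumption about the desired output, not something .text_content() does by itself:

```python
from lxml import html

# Sample markup copied from the question.
html_string = """
<div>
<p>
Whatever you want type <strong>here is great</strong>
</p>
<p>
Whatever you want type <strong>here is great</strong>
</p>
</div>
"""

tree = html.fromstring(html_string)

# One list entry per <p>: text_content() returns the paragraph's own text
# together with the text of its children, with no markup.
# split()/join() then collapses newlines and repeated spaces (an assumption
# about the wanted formatting).
content = [" ".join(p.text_content().split()) for p in tree.xpath(".//p")]

print(content)
# ['Whatever you want type here is great', 'Whatever you want type here is great']
```

Selecting the <p> elements first and calling .text_content() on each one keeps the plain text and the <strong> text together in a single index, which the union XPath on text() nodes cannot do.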

question from:https://stackoverflow.com/questions/65830421/extract-text-and-nodes-from-p-using-lxml-in-the-same-array-index


1 Reply

深蓝 · Answer 1 · 2021-10-06T19:36:47+0000

You could use BeautifulSoup for this.

from bs4 import BeautifulSoup

html_string = """<p>
 Whatever you want type <strong>here is great</strong>
</p>
    """

soup = BeautifulSoup(html_string, 'html.parser')
mytext = [soup.find('p').get_text().strip()]
#['Whatever you want type here is great']
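The snippet above only reads the first <p>. A possible extension to every paragraph in the question's markup, as a sketch: it assumes find_all('p') plus the same get_text() call, and the names all_text and the split()/join() whitespace cleanup are illustrative choices rather than part of the original answer:

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question (two paragraphs).
html_string = """
<div>
<p>
Whatever you want type <strong>here is great</strong>
</p>
<p>
Whatever you want type <strong>here is great</strong>
</p>
</div>
"""

soup = BeautifulSoup(html_string, "html.parser")

# One entry per <p>; get_text() concatenates the text of the paragraph and
# of its children, and split()/join() normalizes the whitespace.
all_text = [" ".join(p.get_text().split()) for p in soup.find_all("p")]

print(all_text)
# ['Whatever you want type here is great', 'Whatever you want type here is great']
```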
