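I think that lxml's strip_tags and strip_elements are what you want in each case: strip_tags removes the tags but keeps their contents, while strip_elements removes the elements together with their contents. For example, this script: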
from lxml import etree

text = "<x>hello, <z>keep me</z> and <y>ignore me</y>, and here's some <y>more</y> text</x>"
tree = etree.fromstring(text)
# encoding='unicode' makes tostring() return str rather than bytes on Python 3
print(etree.tostring(tree, pretty_print=True, encoding='unicode'))

# Remove the <z> tags, but keep their contents:
etree.strip_tags(tree, 'z')
print('-' * 72)
print(etree.tostring(tree, pretty_print=True, encoding='unicode'))

# Remove all the <y> elements including their contents,
# but keep the text that follows each one (with_tail=False):
etree.strip_elements(tree, 'y', with_tail=False)
print('-' * 72)
print(etree.tostring(tree, pretty_print=True, encoding='unicode'))
... produces the following output:
<x>hello, <z>keep me</z> and <y>ignore me</y>, and here's some <y>more</y> text</x>
------------------------------------------------------------------------
<x>hello, keep me and <y>ignore me</y>, and here's some <y>more</y> text</x>
------------------------------------------------------------------------
<x>hello, keep me and , and here's some  text</x>
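The with_tail=False argument in the script above matters more than it looks. As a minimal sketch (the shortened input string and the variable names here are my own, chosen only for illustration), this compares it against the default, with_tail=True, which also throws away the text that follows each removed element:

```python
from lxml import etree

# Shortened, illustrative input (not the string from the script above).
text = "<x>some <y>more</y> text</x>"

# with_tail=False: the <y> element and its contents are removed,
# but the text that follows it (" text") is kept.
keep_tail = etree.fromstring(text)
etree.strip_elements(keep_tail, 'y', with_tail=False)
kept = etree.tostring(keep_tail, encoding='unicode')

# with_tail=True is the default: the text following <y> is removed as well.
drop_tail = etree.fromstring(text)
etree.strip_elements(drop_tail, 'y')
dropped = etree.tostring(drop_tail, encoding='unicode')

print(kept)
print(dropped)
```

Note that with with_tail=False the whitespace on both sides of the removed element survives, so you may end up with a double space where the element used to be.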