I want to parse an XML file, then process the resulting tree by removing selected elements. My problem is that removing an element disrupts the loop that iterates over the elements.
Consider the following XML data:
```xml
<results>
  <group>
    <a />
    <b />
    <c />
  </group>
</results>
```
and the code:
```python
import xml.etree.ElementTree as ET

def showGroup(group, s):
    print(s + '  len=' + str(len(group)))
    print('<group>')
    for e in group:
        print('  <' + e.tag + '>')
    print('</group>\n')

def processGroup(group):
    for e in group:
        if e.tag != 'a':
            group.remove(e)
            showGroup(group, 'removed <' + e.tag + '>')

tree = ET.parse('x.xml')
root = tree.getroot()
for group in root:
    processGroup(group)
```
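A self-contained reproduction may make the problem easier to check. It assumes the sample XML is inlined with `ET.fromstring` instead of being read from `x.xml`, and `prune_in_place` is a hypothetical name for the same loop with the printing removed.

```python
import xml.etree.ElementTree as ET

# Same document as x.xml, inlined so the sketch runs without a file.
XML = "<results><group><a /><b /><c /></group></results>"

def prune_in_place(group):
    """Reproduces the original loop: removes children while iterating."""
    for e in group:
        if e.tag != 'a':
            group.remove(e)

root = ET.fromstring(XML)
group = root[0]
prune_in_place(group)
remaining = [e.tag for e in group]
print(remaining, len(group))  # ['a', 'c'] 2
```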
I expected the `for` loop to process elements `<a>`, `<b>`, and `<c>` in order. In particular:

- processing `<a>` should not remove any element
- processing `<b>` should remove `<b>`
- processing `<c>` should remove `<c>`
I expected the resulting tree to have a single element inside `<group>` (the `<a>` element), and `len(group)` to return 1.
Instead, after processing `<b>`, the `for` loop decides that its end condition has been met and never processes element `<c>`. If it did, `<c>` would be removed. Instead, I am left with a tree containing elements `<a>` and `<c>`, and `len(group)` returns 2.
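The same skip can be reproduced with a plain Python list, which points at a likely cause: the loop advances by position, so removing the current item shifts the next one into the slot that was just visited. This is an illustration with a list rather than ElementTree itself, and the `visited` list exists only to trace what the loop sees.

```python
items = ['a', 'b', 'c']
visited = []  # records which items the loop actually sees

for x in items:
    visited.append(x)
    if x != 'a':
        # Removing 'b' (index 1) shifts 'c' down to index 1;
        # the loop's next step is index 2, which no longer exists.
        items.remove(x)

print(visited)  # ['a', 'b']
print(items)    # ['a', 'c']
```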
What do I need to do to process all three elements while removing selected elements?

PS: Comments on style, or better ways to do any of this, are welcome.
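One common approach (a suggestion, not something from the original post) is to iterate over a snapshot of the children, such as `list(group)`, while removing from the real `group`. The loop then walks a list that never changes, so every child is visited exactly once. `prune_group` and its `keep` parameter are hypothetical names, and the sample XML is again inlined with `ET.fromstring`.

```python
import xml.etree.ElementTree as ET

XML = "<results><group><a /><b /><c /></group></results>"

def prune_group(group, keep=('a',)):
    """Remove every child of `group` whose tag is not in `keep`."""
    # list(group) copies the child references up front, so removing
    # from `group` inside the loop cannot shift what we iterate over.
    for e in list(group):
        if e.tag not in keep:
            group.remove(e)

root = ET.fromstring(XML)
for group in root:
    prune_group(group)

print([e.tag for e in root[0]], len(root[0]))  # ['a'] 1
```

The copy holds only references to the existing elements, so it is cheap, and any code after the loop sees the fully pruned group. `group.findall('*')` also returns a new list of the direct children and could serve as the snapshot instead.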
Update: an ugly hack "fixes" the problem, at some cost in efficiency, as long as there is no code after the element is removed. In my real program, however, there is a lot of code after the pruning loop.
```python
def processGroup(group):
    for e in group:
        if e.tag != 'a':
            group.remove(e)
            showGroup(group, 'removed <' + e.tag + '>')
            # Restart from the beginning on the modified group.
            processGroup(group)
```
I assume that if the for loop is disrupted, then starting again with the group at the beginning might solve the problem. Recursion is a tidy way of doing that - at the expense of reprocessing all elements that have already been checked but not removed.
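To put a number on that reprocessing cost, a small sketch can count how many tag checks the recursive restart performs compared with a single pass over a snapshot. The `checks` counter and both function names are hypothetical, added only for measurement; `recursive_prune` mirrors the hack in the update without the printing.

```python
import xml.etree.ElementTree as ET

XML = "<results><group><a /><b /><c /></group></results>"

def recursive_prune(group, checks):
    # Mirrors the "restart from the beginning" hack.
    for e in group:
        checks[0] += 1
        if e.tag != 'a':
            group.remove(e)
            recursive_prune(group, checks)

def snapshot_prune(group, checks):
    # Single pass over a copy of the children.
    for e in list(group):
        checks[0] += 1
        if e.tag != 'a':
            group.remove(e)

rec_checks = [0]
group1 = ET.fromstring(XML)[0]
recursive_prune(group1, rec_checks)

snap_checks = [0]
group2 = ET.fromstring(XML)[0]
snapshot_prune(group2, snap_checks)

print(rec_checks[0], snap_checks[0])  # 5 3
```

Both versions end with only `<a>` left, but the recursive one re-checks `<a>` once per restart, so the extra work grows with the number of removed elements.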
I am not satisfied with this solution.