python - LXML - Sorting Tag Order

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

I have a legacy file format which I'm converting into XML for processing. The structure can be summarised as:

<A>
    <A01>X</A01>
    <A02>Y</A02>
    <A03>Z</A03>
</A>

The numerical part of the tags can go from 01 to 99 and there may be gaps. As part of the processing certain records may have additional tags added. After the processing is completed I'm converting the file back to the legacy format by iterwalking the tree. The files are reasonably large (~150,000 nodes).

A problem with this is that some software which uses the legacy format assumes that the tags (or rather fields, by the time the file is converted) will be in alphanumeric order. By default, new tags are appended to the end of the branch, so they come out of the iterator in the wrong order.

I can use XPath to find the preceding sibling based on tag name each time I add a new tag, but my question is whether there's a simpler way to sort the whole tree at once just prior to export?

Edit:

I think I've over summarised the structure.

A record can contain several levels as described above to give something like:

<X>
    <X01>1</X01>
    <X02>2</X02>
    <X03>3</X03>
    <A>
        <A01>X</A01>
        <A02>Y</A02>
        <A03>Z</A03>
    </A>
    <B>
        <B01>Z</B01>
        <B02>X</B02>
        <B03>C</B03>
    </B>
</X>

1 Reply

深蓝 · Answer 1 · 2021-10-23T19:32:19+0000

It's possible to write a helper function to insert a new element in the correct place, but without knowing more about the structure it's difficult to make it generic.
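
As a rough illustration of such a helper, here is a minimal sketch. The name insert_sorted is hypothetical. It assumes the parent's existing children are already in tag order and that the parent holds only elements (no comments or processing instructions). The import falls back to the standard library's ElementTree, which shares this part of the lxml API.

```python
try:
    from lxml import etree
except ImportError:  # the calls used below also exist in the standard library
    import xml.etree.ElementTree as etree


def insert_sorted(parent, child):
    """Insert child so that parent's children stay in ascending tag order.

    Hypothetical helper: assumes the existing children are already sorted
    by tag and that every child is a regular element.
    """
    for index, sibling in enumerate(parent):
        if sibling.tag > child.tag:
            # First sibling that sorts after the new tag: insert before it.
            parent.insert(index, child)
            return child
    # No later sibling found, so the new element belongs at the end.
    parent.append(child)
    return child


record = etree.XML("<A><A01>X</A01><A03>Z</A03></A>")
new_field = etree.Element("A02")
new_field.text = "Y"
insert_sorted(record, new_field)
print([el.tag for el in record])
```

Each insertion scans the siblings linearly, which is likely fine with at most 99 fields per record. Sorting everything once before export, as in the next example, avoids the per-insert cost entirely.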

Here's a short example of sorting child elements across the whole document:

from lxml import etree

data = """<X>
    <X03>3</X03>
    <X02>2</X02>
    <A>
        <A02>Y</A02>
        <A01>X</A01>
        <A03>Z</A03>
    </A>
    <X01>1</X01>
    <B>
        <B01>Z</B01>
        <B02>X</B02>
        <B03>C</B03>
    </B>
</X>"""

doc = etree.XML(data,etree.XMLParser(remove_blank_text=True))

for parent in doc.xpath('//*[./*]'): # Search for parent elements
  parent[:] = sorted(parent,key=lambda x: x.tag)

print(etree.tostring(doc, pretty_print=True, encoding="unicode"))

Yielding:

<X>
  <A>
    <A01>X</A01>
    <A02>Y</A02>
    <A03>Z</A03>
  </A>
  <B>
    <B01>Z</B01>
    <B02>X</B02>
    <B03>C</B03>
  </B>
  <X01>1</X01>
  <X02>2</X02>
  <X03>3</X03>
</X>
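
Note that a plain sort on the tag name puts the nested groups A and B ahead of X01, whereas the sample in the question lists the numbered fields first. If the legacy format needs that order (an assumption, since the question does not say so), a custom sort key could handle it. The sketch below uses the hypothetical names field_order and sort_children and assumes the tree contains no comments or processing instructions.

```python
try:
    from lxml import etree
except ImportError:  # the calls used below also exist in the standard library
    import xml.etree.ElementTree as etree


def field_order(element):
    # Leaf fields (no children) sort before nested groups, then by tag name.
    return (len(element) > 0, element.tag)


def sort_children(root, key=field_order):
    # Collect the elements first so that reordering children does not
    # disturb the tree traversal while it is still in progress.
    for parent in list(root.iter()):
        if len(parent):
            parent[:] = sorted(parent, key=key)
    return root


doc = etree.XML(
    "<X><B><B02>X</B02><B01>Z</B01></B><X02>2</X02>"
    "<A><A01>X</A01></A><X01>1</X01></X>"
)
sort_children(doc)
print([child.tag for child in doc])     # ['X01', 'X02', 'A', 'B']
print([child.tag for child in doc[3]])  # ['B01', 'B02']
```

Passing key=lambda el: el.tag instead reproduces the purely alphabetical order shown above.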
