Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.0k views
in Technique[技术] by (71.8m points)

xml - How to keep whitespace before document element when parsing with Java?

In my application, I alter some part of XML files, which begin like this:

<?xml version="1.0" encoding="UTF-8"?>
<!-- $Id: version control yadda-yadda $ -->

<myElement>
...

Note the blank line before <myElement>. After loading, altering and saving, the result is far from pleasing:

<?xml version="1.0" encoding="UTF-8"?>
<!-- $Id: version control yadda-yadda $ --><myElement>
...

I found out that the whitespace (one newline) between the comment and the document node is not represented in the DOM at all. The following self-contained code reproduces the issue reliably:

String source =
    "<?xml version="1.0" encoding="UTF-16"?>
<!-- foo -->
<empty/>";
byte[] sourceBytes = source.getBytes("UTF-16");

DocumentBuilder builder =
    DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc =
    builder.parse(new ByteInputStream(sourceBytes, sourceBytes.length));

DOMImplementationLS domImplementation =
    (DOMImplementationLS) doc.getImplementation();
LSSerializer lsSerializer = domImplementation.createLSSerializer();
System.out.println(lsSerializer.writeToString(doc));

// output: <?xml version="1.0" encoding="UTF-16"?>
<!-- foo --><empty/>

Does anyone have an idea how to avoid this? Essentially, I want the output to be the same as the input. (I know that the xml declaration will be regenerated because it's not part of the DOM, but that's not an issue here.)

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

I had the same problem. My solution was to write my own XML parser: DecentXML

Main feature: it can 100% preserve the original input, whitespace, entities, everything. It won't bother you with the details, but if your code needs to generate XML like this:

 <element
     attr="some complex value"
     />

then you can.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...