Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

xml - Preserving attribute whitespace

Disclaimer: the following is a sin against XML. That's why I'm trying to change it with XSLT :)

My XML currently looks like this:

<root>
    <object name="blarg" property1="shablarg" property2="werg".../>
    <object name="yetanotherobject" .../>
</root>

Yes, I'm putting all the textual data in attributes. I'm hoping XSLT can save me; I want to move toward something like this:

<root>
    <object>
        <name>blarg</name>
        <property1>shablarg</property1>
        ...
    </object>
    <object>
        ...
    </object>
</root>

I've actually got all of this working so far, with the exception that my sins against XML have been more... exceptional. Some of the tags look like this:

<object description = "This is the first line

This is the third line.  That second line full of whitespace is meaningful"/>

I'm using xsltproc under Linux, but it doesn't seem to have any option to preserve whitespace. I've tried xsl:preserve-space and xml:space="preserve", to no avail. Every option I've found seems to apply to whitespace within the elements themselves, not within attributes. Every single time, the above gets changed to:

This is the first line This is the third line.  That second line full of whitespace is meaningful

So the question is, can I preserve the attribute whitespace?


1 Reply


This is actually a raw XML parsing problem, not something XSLT can help you with. An XML parser must convert the newlines in that attribute value to spaces, as per section 3.3.3, "Attribute-Value Normalization", of the XML standard. By the time the stylesheet sees the attribute, the newlines are already gone, so anything currently reading your description attributes and keeping the newlines in is doing it wrong.
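You can see the normalization happen without XSLT in the picture at all. The following is only a small sketch using Python's standard xml.etree.ElementTree rather than xsltproc; the sample attribute text is made up for illustration, and any conforming parser should behave the same way.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: two literal newlines inside the attribute value.
doc = '<object description="first line\n\nthird line"/>'

# The parser applies attribute-value normalization while parsing,
# so each literal newline is delivered as a single space.
value = ET.fromstring(doc).get("description")

has_newline = "\n" in value
print(repr(value), has_newline)
```

Each newline is replaced by one space, so the blank line is no longer distinguishable from ordinary spacing once parsing is done.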

You may be able to recover the newlines by pre-processing the XML to escape the newlines to &#10; character references, as long as you haven't also got newlines where charrefs are disallowed, such as inside tag bodies. Charrefs should survive as control characters through to the attribute value, where you can then turn them into text nodes.
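A minimal sketch of that pre-processing step, in Python. The function name escape_attr_newlines and the regular expression are my own illustration, not part of any library, and the pattern is deliberately simple: it assumes the file has no comments, CDATA sections, or processing instructions containing text that looks like ="...". It only rewrites the quoted value itself, so a newline between the = and the opening quote is left alone, since a charref is not allowed there.

```python
import re
import xml.etree.ElementTree as ET

# Simplified, hypothetical pattern: "=" followed by a quoted value.
# "<" cannot appear raw in an attribute value, so excluding it keeps
# the match from running across tags.
ATTR_VALUE = re.compile(r"""(=\s*)("[^"<]*"|'[^'<]*')""")


def escape_attr_newlines(xml_text: str) -> str:
    """Replace literal newlines inside quoted attribute values with &#10;."""

    def fix(match: re.Match) -> str:
        prefix, value = match.group(1), match.group(2)
        # Treat \r\n and a lone \r as one line break, as XML line-end handling does.
        value = value.replace("\r\n", "\n").replace("\r", "\n")
        return prefix + value.replace("\n", "&#10;")

    return ATTR_VALUE.sub(fix, xml_text)


sample = (
    "<root>\n"
    '    <object description="This is the first line\n'
    "\n"
    'This is the third line."/>\n'
    "</root>"
)

fixed = escape_attr_newlines(sample)

# Parse the escaped text to confirm the newlines now reach the attribute value.
desc = ET.fromstring(fixed)[0].get("description")
print(repr(desc))
```

Run this over the file first and feed the result to xsltproc; the stylesheet should then see real newline characters in the attribute value and can copy them into the new element's text.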

