[jdom-interest] jdom 1.0 XMLOutputter

Bradley S. Huffman hip at cs.okstate.edu
Tue Sep 21 09:32:14 PDT 2004


Bill Leng writes:

> This does not explain the reason. With a raw Format, XMLOutputter is 
> supposed to return the exact text content. No normalization is involved 
> here. If that change is for the sake of normalization, I think it 
> should be included in the normalization section.

It is returning the exact text content; the \r is just represented as a
character reference instead of the actual character. That is fine: any
character in PCDATA can be replaced by its equivalent character reference
without changing the "meaning" of the document, even though character for
character the two are not the same.

The problem is that section 2.11, "End-of-Line Handling", in XML 1.0 says every
single \r that is not followed by a \n is replaced by a single \n on parsing.
So the document (here \r represents the actual carriage return)

    <root>\r</root>

when parsed becomes the jdom tree

    Element root = new Element("root");
    root.setText("\n");

and not

    Element root = new Element("root");
    root.setText("\r");

To get the latter, the original document would have to have been

    <root>&#xD;</root>
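
The difference can be checked with a short sketch. It assumes the JDK's
built-in DOM parser rather than jdom's SAXBuilder, and the class and method
names (EolParseDemo, rootText) are made up for illustration. It parses the root
element once with a literal carriage return and once with the character
reference &#xD;, then prints the code of the character that ends up in the tree.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EolParseDemo {
    // Parse an XML string and return the text content of its root element.
    static String rootText(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Literal carriage return: subject to end-of-line handling (section 2.11).
        String literal = rootText("<root>\r</root>");
        // Character reference: resolved after end-of-line handling has run.
        String reference = rootText("<root>&#xD;</root>");

        System.out.println((int) literal.charAt(0));   // 10, i.e. \n
        System.out.println((int) reference.charAt(0)); // 13, i.e. \r
    }
}
```

A jdom build of the same two documents should show the same difference, since
the end-of-line handling happens in the underlying parser.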

So if jdom didn't escape the \r on output, then when the document is re-parsed
the new jdom tree would differ from the one that produced it.  The old jdom tree
and the new jdom tree would represent two different documents.
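
A rough sketch of that round trip follows. It is not XMLOutputter's actual
code: the write method is a hypothetical stand-in for an outputter, the names
are invented, and the JDK's built-in parser is assumed for the re-parse. It
writes the same text with and without escaping the \r and compares what comes
back.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class RoundTripDemo {
    // Hypothetical outputter: wraps text in <root>, optionally escaping \r.
    static String write(String text, boolean escapeCr) {
        // Escape markup characters first so the reference added below is kept as-is.
        String body = text.replace("&", "&amp;").replace("<", "&lt;");
        if (escapeCr) {
            body = body.replace("\r", "&#xD;");
        }
        return "<root>" + body + "</root>";
    }

    // Re-parse the serialized document and return the root element's text.
    static String read(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String original = "a\rb";
        // With escaping the text survives the round trip.
        System.out.println(original.equals(read(write(original, true))));  // true
        // Without escaping the \r comes back as \n.
        System.out.println(original.equals(read(write(original, false)))); // false
    }
}
```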

Brad

