[jdom-interest] Need to optionally cancel automatic escaping
Bradley S. Huffman
hip at cs.okstate.edu
Fri Jul 11 14:23:33 PDT 2003
Perfect timing. A while back, James Clark posted the following on the xml-dev mailing list:
If your infoset contains a carriage return, you have to output
it as a numeric character reference, otherwise line-end
normalization will turn it into a line-feed. Similarly, if
attribute values in the infoset contain line-feeds or tabs, they
need to be output as numeric character references, otherwise
attribute value normalization will turn them into spaces... When
I'm creating XML, some parts of what I am creating may well have
come from parsing an XML document. That means if there's any
XML infoset that my program cannot serialize correctly, it's
potentially a bug.
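Clark's point is easy to check with any conforming parser. The sketch below is illustrative only. It uses the JDK's built-in DOM parser (javax.xml.parsers.DocumentBuilderFactory) instead of JDOM's SAXBuilder, and the class NormalizationDemo and its helper methods are hypothetical names. Literal CR, LF and tab characters are normalized away, while the same characters written as numeric character references survive.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class (not part of JDOM); uses the JDK's DOM parser.
final class NormalizationDemo {
    // Parses a small XML string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed: " + xml, e);
        }
    }

    // Text content of the root element, after line-end normalization.
    static String rootText(String xml) {
        return parseRoot(xml).getTextContent();
    }

    // Value of attribute "v" on the root element, after attribute value normalization.
    static String rootAttr(String xml) {
        return parseRoot(xml).getAttribute("v");
    }

    public static void main(String[] args) {
        // A literal CR in content comes back as LF ...
        System.out.println(rootText("<a>x\ry</a>").equals("x\ny"));
        // ... but a CR written as a character reference is preserved.
        System.out.println(rootText("<a>x&#xD;y</a>").equals("x\ry"));
        // A literal tab and LF in an attribute value become spaces ...
        System.out.println(rootAttr("<a v='x\ty\nz'/>").equals("x y z"));
        // ... but tab and LF written as character references are preserved.
        System.out.println(rootAttr("<a v='x&#x9;y&#xA;z'/>").equals("x\ty\nz"));
    }
}
```

With a conforming parser, each comparison should print true.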
In response, Elliotte Rusty Harold asked on his XOM mailing list (XOM's Serializer
and JDOM's XMLOutputter are similar, so issues affecting one usually affect
the other):
I don't think the XOM serializer bothers to escape such carriage
returns, line feeds, tabs and the like where Clark suggests it
should. Should it? Or should this at least be an option in the
Serializer? And if it is an option, should it be the default option?
Thoughts?
That led to a two-day thread about what, if anything, should be done about
carriage returns, line feeds, and tabs in attribute values and text content,
in which John Cowan proposed the following algorithm:
In that case, the default mode should:
1) Escape all \r characters;
2) Escape \t and \n characters in attribute values;
3) Output \n characters in character content as the line terminator;
4) Escape all non-encodable characters;
5) Encode everything else.
Doing anything else will not preserve the infoset through a round trip.
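John's five rules map fairly directly onto two escaping routines, one for character content and one for attribute values. This is a minimal sketch under stated assumptions, not XMLOutputter's actual code: CowanEscaper, escapeText, escapeAttribute and parseRoot are hypothetical names, attribute values are assumed to be written in double quotes, rule 4 is decided with java.nio.charset.CharsetEncoder.canEncode, and the round-trip check uses the JDK's DOM parser in place of SAXBuilder. It does not handle characters that XML 1.0 cannot represent at all (such as most C0 control characters).

```java
import java.io.StringReader;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration; not part of JDOM.
final class CowanEscaper {
    // Character content: rules 1, 3, 4 and 5. Each '\n' is written as
    // lineSeparator ("\n", "\r\n" or "\r"), which a parser reads back as '\n'.
    static String escapeText(String s, CharsetEncoder enc, String lineSeparator) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '&':  out.append("&amp;"); break;
                case '<':  out.append("&lt;"); break;
                case '>':  out.append("&gt;"); break;
                case '\r': out.append("&#xD;"); break;       // rule 1
                case '\n': out.append(lineSeparator); break; // rule 3
                default:   appendEncodable(out, cp, enc);    // rules 4 and 5
            }
        }
        return out.toString();
    }

    // Double-quoted attribute value: rules 1, 2, 4 and 5.
    static String escapeAttribute(String s, CharsetEncoder enc) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '&':  out.append("&amp;"); break;
                case '<':  out.append("&lt;"); break;
                case '"':  out.append("&quot;"); break;
                case '\r': out.append("&#xD;"); break;    // rule 1
                case '\t': out.append("&#x9;"); break;    // rule 2
                case '\n': out.append("&#xA;"); break;    // rule 2
                default:   appendEncodable(out, cp, enc); // rules 4 and 5
            }
        }
        return out.toString();
    }

    // Rule 5: write the character as-is if the output encoding can represent it.
    // Rule 4: otherwise write it as a hexadecimal character reference.
    private static void appendEncodable(StringBuilder out, int cp, CharsetEncoder enc) {
        String ch = new String(Character.toChars(cp));
        if (enc.canEncode(ch)) {
            out.append(ch);
        } else {
            out.append("&#x").append(Integer.toHexString(cp).toUpperCase()).append(';');
        }
    }

    // Parses a small XML string with the JDK's DOM parser (standing in for SAXBuilder).
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed: " + xml, e);
        }
    }

    public static void main(String[] args) {
        CharsetEncoder ascii = StandardCharsets.US_ASCII.newEncoder();
        String text = "line1\r\nline2\tcaf\u00e9";
        String attr = "a\tb\nc\rd";
        String xml = "<r v=\"" + escapeAttribute(attr, ascii) + "\">"
                + escapeText(text, ascii, "\r\n") + "</r>";
        Element root = parseRoot(xml);
        // Both values should survive the write/parse round trip unchanged.
        System.out.println(root.getTextContent().equals(text));
        System.out.println(root.getAttribute("v").equals(attr));
    }
}
```

With US-ASCII as the output encoding, the é is written as &#xE9; (rule 4), the CR in the text as &#xD; (rule 1), and the LF as the chosen line terminator (rule 3), so both comparisons in main should print true.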
#1-#3 would be fairly easy to do in XMLOutputter, since we already escape & and
>. I think #4 and #5 are already handled by the default escape strategy, but
I haven't looked deeply enough to give a definitive answer. Together, this
would give us round-tripping by default in both of these cases:
text -> SAXBuilder -> JDOM tree -> XMLOutputter -> text
JDOM tree -> XMLOutputter -> text -> SAXBuilder -> JDOM tree
neither of which JDOM currently guarantees.
Thoughts?
Brad