[jdom-interest] Need to optionally cancel automatic escaping
Bradley S. Huffman
hip at cs.okstate.edu
Thu Jul 17 07:30:03 PDT 2003
If
Text text = new Text("The rain in Spain,\nfalls mainly on the plain");
and the line separator is "\r\n", then XMLOutputter will convert "\n"
to "\r\n" as in
The rain in Spain,\r\nfalls mainly on the plain
Since JDOM already scans for < and &, this will cost almost nothing and may
help applications that rely on a specific line terminator.
However, if the user does
Text text = new Text("The rain in Spain,\r\nfalls mainly on the plain");
and the line separator is "\r\n", then XMLOutputter will output
The rain in Spain,&#xD;\r\nfalls mainly on the plain
so on the return trip through an XML parser the original line
The rain in Spain,\r\nfalls mainly on the plain
is built.
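
To see the return trip concretely, here is a minimal sketch using the JDK's
built-in DOM parser rather than JDOM (an assumption made only to keep the
example dependency-free; the class and method names are hypothetical). It
parses one document with a literal carriage return and one where the
carriage return is written as a numeric character reference.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class LineEndRoundTrip {
    // Hypothetical helper: parse a small document and return the text
    // content of its root element.
    static String parseText(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)))
                      .getDocumentElement()
                      .getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Literal \r\n: XML line-end normalization collapses it to \n,
        // so the carriage return is lost on the way back in.
        String raw = parseText("<t>Spain,\r\nfalls</t>");

        // &#xD; followed by the \r\n line separator: the character
        // reference is not subject to line-end normalization, while the
        // separator is normalized to \n.
        String escaped = parseText("<t>Spain,&#xD;\r\nfalls</t>");

        System.out.println("raw keeps \\r:     " + raw.contains("\r"));
        System.out.println("escaped keeps \\r: " + escaped.contains("\r"));
    }
}
```

Only the second form gives back the original "\r\n" text.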
Brad
"Alex Rosen" writes:
> Sounds fine to me.
>
> I didn't understand #3 though.
>
> Alex
>
>
> >>> "Bradley S. Huffman" <hip at cs.okstate.edu> 7/11/2003 5:23:33 PM >>>
> Perfect timing. A while back James Clark posted on the xml-dev mailing
> list.
>
> If your infoset contains a carriage return, you have to output
> it as a numeric character reference, otherwise line-end
> normalization will turn it into a line-feed. Similarly, if
> attribute values in the infoset contain line-feeds or tabs, they
> need to be output as numeric character references, otherwise
> attribute value normalization will turn them into spaces...When
> I'm creating XML, some parts of what I am creating may well have
> come from parsing an XML document. That means if there's any
> XML infoset that my program cannot serialize correctly, it's
> potentially a bug.
>
> To which Elliotte Rusty Harold asked on his XOM mail-list (XOM's Serializer
> and JDOM's XMLOutputter are similar so issues affecting one usually affect
> the other).
>
> I don't think the XOM serializer bothers to escape such carriage
> returns, line feeds, tabs and the like where Clark suggests it
> should. Should it? Or should this at least be an option in the
> Serializer? And if it is an option, should it be the default
> option?
> Thoughts?
>
> Which led to a two-day thread about what, if anything, should be done about
> carriage returns, line feeds, and tabs in attribute values and text content.
>
> To which John Cowan came up with the following algorithm.
>
> In that case, the default mode should:
>
> 1) Escape all \r characters;
> 2) Escape \t and \n characters in attribute values;
> 3) Output \n characters in character content as the line
> terminator;
> 4) Escape all non-encodable characters;
> 5) Encode everything else.
>
> Doing anything else will not preserve the infoset through a round
> trip.
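
Rules 1 to 3 above could be sketched roughly as follows. This is not
XMLOutputter's actual code; the class and method names are hypothetical, the
line separator is passed in as a plain parameter, and rules 4 and 5
(non-encodable characters and encoding) are left out entirely.

```java
public class RoundTripEscaper {
    // Character content: escape \r (rule 1), write \n as the configured
    // line terminator (rule 3), plus the usual markup escapes.
    static String escapeText(String s, String lineSeparator) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\r': out.append("&#xD;"); break;
                case '\n': out.append(lineSeparator); break;
                case '&':  out.append("&amp;"); break;
                case '<':  out.append("&lt;"); break;
                case '>':  out.append("&gt;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // Attribute values: escape \r (rule 1) and also \t and \n (rule 2),
    // since attribute value normalization would turn them into spaces.
    static String escapeAttribute(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\r': out.append("&#xD;"); break;
                case '\n': out.append("&#xA;"); break;
                case '\t': out.append("&#x9;"); break;
                case '&':  out.append("&amp;"); break;
                case '<':  out.append("&lt;"); break;
                case '"':  out.append("&quot;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Text containing \r\n, written with "\r\n" as the line separator.
        System.out.println(escapeText("Spain,\r\nfalls", "\r\n"));
        System.out.println(escapeAttribute("a\tb\nc"));
    }
}
```

Since the loop already has to look at every character for & and <, the extra
cases add very little work.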
>
> #1-#3 would be fairly easy to do in XMLOutputter since we already escape
> & and >. #4 and #5 I think are already handled by the default escape
> strategy, but I haven't looked deep enough to give a definitive answer.
> This would provide for roundtripping by default in the two cases of
>
> text -> SAXBuilder -> JDOM tree -> XMLOutputter -> text
> JDOM tree -> XMLOutputter -> text -> SAXBuilder -> JDOM tree
>
> which currently JDOM doesn't do.
>
> Thoughts?
>
> Brad
> _______________________________________________
> To control your jdom-interest membership:
> http://lists.denveronline.net/mailman/options/jdom-interest/youraddr@yourhost.com