[jdom-interest] Yet another TODO (entity escapes)
Jason Hunter
jhunter at collab.net
Tue Jun 19 15:19:05 PDT 2001
guru at stinky.com wrote:
>
> * Figure out how to deal with XMLOutputter writing of special characters like
> &#160;. Should it char escape only chars unprintable in the current
> character set? Or should there be a fancy API for selecting what's escaped?
> http://lists.denveronline.net/lists/jdom-interest/2001-February/004521.html
>
> XMLOutputter is faithfully outputting what it was given; if it's a
> high Unicode value inside a Java String, then Java takes care of
> converting it to the right bytes for the stream's output encoding.
Right, but raw bytes are generally not what people want. People want
round-tripping.
We also have to address the situation where you call setEncoding("latin1")
and then output the char \ucafe, which Latin-1 can't represent. Now, we
could add logic to escape the chars which aren't in the specified character
set, but that checking would be very expensive, and the mapping from
encoding name to character set isn't easily available. Plus we'd still be
stuck when such chars appear in element names, where escaping isn't
possible and it's simply an error we can't help with.
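
To make the "escape what the encoding can't represent" idea concrete, here is a minimal sketch. It is not JDOM code: the class EncodingAwareEscaper and its method escapeText are hypothetical names, and it assumes a JDK that provides java.nio.charset.CharsetEncoder.canEncode. It only handles text content and does not escape markup characters such as & or <.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper, not part of JDOM: replaces characters the target
// encoding cannot represent with numeric character references.
class EncodingAwareEscaper {
    private final CharsetEncoder encoder;

    EncodingAwareEscaper(String encodingName) {
        // Charset.forName throws if the encoding name is unknown or unsupported.
        this.encoder = Charset.forName(encodingName).newEncoder();
    }

    // Escapes text content only. Element and attribute names cannot be
    // escaped this way, so they are out of scope for this sketch.
    // Lone surrogates are not handled specially here.
    String escapeText(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            int len = Character.charCount(cp);
            String piece = text.substring(i, i + len);
            if (encoder.canEncode(piece)) {
                out.append(piece);
            } else {
                // Not representable in the target encoding: emit &#xHHHH;
                out.append("&#x")
                   .append(Integer.toHexString(cp).toUpperCase())
                   .append(';');
            }
            i += len;
        }
        return out.toString();
    }
}
```

With "ISO-8859-1" as the encoding, a string containing \ucafe would have that one char written as &#xCAFE; while Latin-1 chars pass through untouched. The per-character canEncode call is exactly the kind of cost mentioned above.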
So currently we punt, but the todo item is to debate the punting. Guess
we're attacking it now. :-)
> OTOH, if someone wants to make sure that all unicode characters turn
> into their corresponding escapes on the way out, or vice versa, then
> that's a good use for a filter stream.
Except your filter stream wouldn't know that a #160 char in an element
name can't be escaped. You need to know the structure to do the right
thing.
I think the right action is to allow some way for the user to plug in
logic as needed, either with a subclass or a pluggable API.
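
As a rough illustration of what a pluggable API might look like, here is a sketch under stated assumptions. The names EscapeStrategy, shouldEscape, escapeText and checkName are all hypothetical and are not taken from XMLOutputter. The point is that the outputter knows the structure: it can escape a char in text content, but can only report an error for the same char in an element name.

```java
// Hypothetical sketch of a pluggable escaping hook; not the JDOM API.
class EscapeStrategyDemo {

    // The user supplies the policy: which code points should be escaped.
    interface EscapeStrategy {
        boolean shouldEscape(int codePoint);
    }

    // Example policy: escape everything outside printable ASCII.
    static final EscapeStrategy NON_ASCII = cp -> cp > 0x7E;

    // Text content: markup characters are always escaped, and the
    // strategy decides which other chars become character references.
    static String escapeText(String text, EscapeStrategy strategy) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            if (cp == '&') {
                out.append("&amp;");
            } else if (cp == '<') {
                out.append("&lt;");
            } else if (strategy.shouldEscape(cp)) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    // Element names: a character reference is not allowed here, so a char
    // the strategy wants escaped can only be reported as an error.
    static String checkName(String name, EscapeStrategy strategy) {
        name.codePoints().forEach(cp -> {
            if (strategy.shouldEscape(cp)) {
                throw new IllegalArgumentException(
                    "Cannot escape char #" + cp + " in element name: " + name);
            }
        });
        return name;
    }
}
```

A subclass that overrides an escaping method would serve the same purpose; the interface just keeps the policy separate from the outputter.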
-jh-