[jdom-interest] Re: Getting original Encoding and changing the default UTF-8
Jason Hunter
jhunter at xquery.com
Fri Sep 10 00:53:43 PDT 2004
Young Matthew wrote:
> Hi,
>
> Regarding the default encoding, I'm thinking more about the front end than about
> printing. In other words, before parsing a document it would be cool if I could
> switch the encoding to something other than UTF-8 to handle Swedish characters.
XML files generally list their encoding in the XML declaration if
they're not UTF-8, so the parser can automatically determine the proper
encoding to use. Getting the data in correctly isn't an issue; the
issue arises if you want to encode the document the same way on output
instead of using the universal UTF-8 encoding. SAX doesn't report what
the original encoding was; it just returns the already-decoded characters.
Another builder, like an XNI builder, could report the encoding. The
Document class doesn't currently have an encoding property but we could
add one if we had a parser that reported it. That is, assuming it's a
document-level notion. The story's less clear when pulling together
elements from multiple documents. If the original Document node was
Latin-1 but you included an Element from a Shift_JIS document, you can't
reliably assume Latin-1 for the new document.
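
As a rough illustration of the two halves of this (the parser picking up
the declared encoding on input, and reusing that encoding on output),
here is a minimal sketch. It is an assumption-laden stand-in: it uses the
JDK's own DOM parser and JAXP Transformer rather than JDOM or SAX, and
relies on DOM Level 3's Document.getXmlEncoding() to recover the declared
encoding. The class name, method names, and sample document are
hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Hypothetical helper class, not part of JDOM.
class EncodingRoundTrip {

    // Hypothetical sample input: Latin-1 bytes whose XML declaration names the encoding.
    static byte[] sampleLatin1() {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<ord>sm\u00f6rg\u00e5sbord</ord>";
        return xml.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Parse raw bytes; the parser reads the declaration and decodes accordingly.
    // Checked exceptions are wrapped to keep the sketch short.
    static Document parse(byte[] bytes) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Serialize the document using the requested output encoding.
    static byte[] write(Document doc, String encoding) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.ENCODING, encoding);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toByteArray();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(sampleLatin1());
        // getXmlEncoding() returns null when the input had no encoding declaration.
        String declared = doc.getXmlEncoding();
        String encoding = (declared != null) ? declared : "UTF-8";
        byte[] out = write(doc, encoding);
        System.out.println(doc.getDocumentElement().getTextContent());
        System.out.println(encoding + ", " + out.length + " bytes");
    }
}
```

Note that this treats the encoding as a property of the whole document,
so it does nothing to settle the mixed-source case described above.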
-jh-