[jdom-interest] What does the encoding really mean?

Alex Rosen arosen at silverstream.com
Tue Nov 27 17:00:58 PST 2001


... and that's why "encoding" is not a property of the JDOM Document, but of
the XMLOutputter that serializes it. When it's an in-memory tree, an XML
document doesn't really have any encoding.

You say that the XML data still says encoding="UTF-8". That's only because
XMLOutputter defaults to UTF-8 when you don't explicitly tell it what
encoding to use. So just tell XMLOutputter to use UCS-2 instead.
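
To illustrate the point that the encoding belongs to the serializer and not to the tree, here is a minimal sketch. It uses the JDK's built-in javax.xml.transform serializer as a stand-in for XMLOutputter, purely so that it runs without JDOM on the classpath; the class name, method names, and sample content are made up for illustration. The same in-memory tree is written out twice, and each time the declaration follows whatever encoding the serializer was told to use.

```java
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class EncodingAtOutput {

    // Build a small in-memory tree. Nothing here mentions an encoding.
    static Document buildTree() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = doc.createElement("greeting");
        root.setTextContent("caf\u00e9"); // one non-ASCII character
        doc.appendChild(root);
        return doc;
    }

    // The encoding is chosen here, at serialization time.
    // (Stand-in for configuring XMLOutputter; not the JDOM API.)
    static byte[] serialize(Document doc, String encoding) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, encoding);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        Document doc = buildTree();
        // Same tree, two different byte representations and declarations.
        System.out.println(new String(serialize(doc, "UTF-8"), "UTF-8"));
        System.out.println(new String(serialize(doc, "UTF-16"), "UTF-16"));
    }
}
```

With JDOM itself the equivalent step is telling the outputter which encoding to use before writing to a stream; check the JavaDoc of the JDOM version in use for the exact call.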

The JavaDoc for outputString() does say: "Warning: a String is Unicode,
which may not match the outputter's specified encoding." But maybe we should
change it, since it does the wrong thing by default: simply creating a new
XMLOutputter and calling outputString() gives you the incorrect result
(encoding="UTF-8"). Seems like we're leading users down the garden path
here.
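
A small sketch of why the label in a String can be misleading, using only the JDK. The XML constant below is a hypothetical stand-in for what outputString() hands back by default, and the helper names are invented for illustration. The declaration is just characters inside the String; the actual bytes are decided later by whoever encodes it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StringHasNoEncoding {

    // Hypothetical stand-in for a default outputString() result.
    static final String XML =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<greeting>caf\u00e9</greeting>";

    private static final Pattern DECL_ENCODING =
            Pattern.compile("encoding=\"([^\"]+)\"");

    // Pull the encoding name out of the XML declaration, or null if absent.
    static String declaredEncoding(String xml) {
        Matcher m = DECL_ENCODING.matcher(xml);
        return m.find() ? m.group(1) : null;
    }

    // True only if the declaration names the charset actually used for the bytes.
    static boolean declarationMatches(String xml, Charset actual) {
        String declared = declaredEncoding(xml);
        return declared != null && Charset.forName(declared).equals(actual);
    }

    public static void main(String[] args) {
        // The same String, turned into bytes two different ways.
        byte[] asUtf8 = XML.getBytes(StandardCharsets.UTF_8);
        byte[] asUtf16 = XML.getBytes(StandardCharsets.UTF_16BE);

        System.out.println("declared: " + declaredEncoding(XML));
        System.out.println("UTF-8 bytes:    " + asUtf8.length
                + ", declaration matches: "
                + declarationMatches(XML, StandardCharsets.UTF_8));
        System.out.println("UTF-16BE bytes: " + asUtf16.length
                + ", declaration matches: "
                + declarationMatches(XML, StandardCharsets.UTF_16BE));
    }
}
```

If the String is going to be sent as two-byte Unicode, the declaration should be produced with the matching encoding rather than left at the default.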

Alex

> -----Original Message-----
> From: jdom-interest-admin at jdom.org
> [mailto:jdom-interest-admin at jdom.org]On Behalf Of Jason Hunter
> Sent: Tuesday, November 27, 2001 7:16 PM
> To: Fred Clewis
> Cc: jdom-interest at jdom.org
> Subject: Re: [jdom-interest] What does the encoding really mean?
>
>
> This is the problem with XML putting the encoding information within the
> text format itself.  If you change the encoding of the string
> representation, you should change the encoding in the decl.  It's not
> pretty, but that's how XML was designed.  :-)
>
> -jh-
>
> Fred Clewis wrote:
> >
> > I'm using JDOM beta7 and xerces 2 beta3.  I have a question about the
> > XML decl encoding attribute and when it should be altered.
> >
> > Suppose you have a UTF-8 (with multibyte encodings) XML file and parse
> > it in to build a document and then output it to a unicode string in
> > Java that perhaps you use MQSeries to send somewhere.   In the MQSeries
> > transport it is described as CCSID 1200, unicode, and it is stored as
> > twobyte unicode.  The xml data still says encoding="UTF-8".  Well, at
> > that moment in memory, that is untrue.   Is that OK?  Does the original
> > encoding from file, "UTF-8", need to be preserved like this for some
> > subsequent purpose?   Does it need to be changed to "UCS-2"?
> >
> > thanks for any ideas,
> >
> > _______________________________________________
> > To control your jdom-interest membership:
> > http://lists.denveronline.net/mailman/options/jdom-interest/youraddr@yourhost.com