[jdom-interest] non-ascii characters in xml document

Jason Hunter jhunter at servlets.com
Thu Nov 29 16:19:25 PST 2001


UTF-8 is an encoding that can represent every Unicode character, so on
output the text should be properly encoded.  Each curly quote will be
encoded as 3 bytes though, so readers that don't know the byte stream is
in UTF-8 format will be confused.
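
A minimal sketch of that effect, using only the standard Java library
(no JDOM involved). The class and method names are made up for
illustration, and it assumes a JDK recent enough to have
java.nio.charset.StandardCharsets. It encodes a left curly double quote
(U+201C) as UTF-8, then shows what a reader sees when it decodes those
bytes with the right and the wrong charset, and what happens when the
character is forced through ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of JDOM or of the original application.
class CurlyQuoteDemo {

    // Encode the text as UTF-8 and return the raw bytes.
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // A reader that knows the stream is UTF-8 recovers the original text.
    static String readAsUtf8(String s) {
        return new String(utf8(s), StandardCharsets.UTF_8);
    }

    // A reader that wrongly assumes ISO-8859-1 turns each byte into one char.
    static String misreadAsLatin1(String s) {
        return new String(utf8(s), StandardCharsets.ISO_8859_1);
    }

    // ISO-8859-1 has no curly quotes, so encoding substitutes '?'.
    static String roundTripLatin1(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String quote = "\u201C"; // left curly double quote

        System.out.println(utf8(quote).length);               // prints 3
        System.out.println(readAsUtf8(quote).equals(quote));  // prints true
        System.out.println(misreadAsLatin1(quote).length());  // prints 3
        System.out.println(roundTripLatin1(quote));           // prints ?
    }
}
```

The third line of output is the "bogus characters" case: one quote comes
back as three unrelated characters. The fourth matches the question
marks seen after switching the output encoding to ISO-8859-1.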

-jh-

Dave Neuendorf wrote:
> 
> The problem originated when I was using the default UTF-8. The quote
> characters were not properly represented in the xml file, which is why I
> decided to try ISO-8859-1 encoding (which didn't help). The JTextArea into
> which the text had been pasted was properly displaying the quote characters
> until the text was read back in from xml, at which point the bogus characters
> from the xml were displayed.
> 
> Jason Hunter wrote:
> 
> > > I'm working on an application, in which the user is allowed to paste
> > > text into a JTextArea. The text can include "curly" single and double
> > > quotes, and presumably other non-ascii characters. When the text is
> > > written to an xml file from a jdom Document, each such character is
> > > replaced in the file with some other non-ascii character. I tried
> > > changing the encoding from the default UTF-8 to ISO-8859-1, but the
> > > result is that now the replacement character is always a question mark.
> >
> > If you're using UTF-8, all Unicode characters can and will be
> > represented and you'll have them nicely encoded in UTF-8 format.  If it
> > shows up as a ? for you, it's probably because your viewer isn't
> > recognizing that the characters are encoded as UTF-8, or it doesn't have
> > the glyph necessary to display the chars.
> >
> > -jh-
> 