[jdom-interest] Character encodings...

Mark Schmeets mschmeets at hotmail.com
Tue Oct 1 16:54:32 PDT 2002


Thanks for your reply, and the suggestions. I had two problems: one was not
really understanding how to get the correct results from PrintWriter (in the
output servlet) and, more importantly, the other was not paying attention to
the character set of the database. The database was storing ISO-8859-1,
stripping one byte and leaving me with the unfortunate 0x1C. Changing the
database character set to UTF-8 was the first step; then, after fixing the
encoding on the OutputStreamWriter/PrintWriter, all is well.
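The servlet-side half of that fix comes down to never relying on the platform default encoding when turning characters into bytes. Below is a minimal sketch, assuming the servlet writes to a raw `OutputStream` such as the one returned by `response.getOutputStream()`. A `ByteArrayOutputStream` stands in for it here so the example runs without a servlet container, and `Utf8ResponseWriter`/`encode` are hypothetical names. `StandardCharsets` needs Java 7 or later; on older JVMs pass the charset name `"UTF-8"` instead.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class Utf8ResponseWriter {

    // Wrap a raw byte stream in a PrintWriter that always encodes as UTF-8,
    // instead of the JVM's platform default (e.g. windows-1252 on an NT box).
    static PrintWriter utf8Writer(OutputStream out) {
        Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        return new PrintWriter(w);
    }

    // Hypothetical helper: write text through the UTF-8 writer and return the raw bytes.
    static byte[] encode(String text) {
        // Stands in for response.getOutputStream() in a real servlet.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        PrintWriter pw = utf8Writer(buf);
        pw.print(text);
        pw.flush(); // PrintWriter buffers; flush so the bytes reach the stream
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        // U+201C LEFT DOUBLE QUOTATION MARK becomes three bytes in UTF-8.
        for (byte b : encode("\u201C")) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println(); // prints: E2 80 9C
    }
}
```

In a real servlet it is also worth declaring the charset in the response header, for example `response.setContentType("text/xml; charset=UTF-8")`, so the client knows which decoder to use. The XMLOutputter's encoding should be kept at the same value so the XML declaration agrees with the bytes actually written.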

Mark


>From: Jason Hunter <jhunter at servlets.com>
>To: Mark Schmeets <mschmeets at hotmail.com>
>CC: JDOM Interest <jdom-interest at jdom.org>
>Subject: Re: [jdom-interest] Character encodings...
>Date: Sun, 29 Sep 2002 20:44:30 -0700
>
>I don't have a good answer, except that you're right: it's an encoding issue.
>I'd guess your content is in a multibyte encoding and your parser is
>treating the content as a single-byte or UTF-8 encoding.  Look at the
>XML file declarations and the character content, and make sure the
>characters' encoding is appropriate for the declaration at every step.
>
>Or stop using left double quotes.  :-)
>
>-jh-
>
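Jason's advice, checking that the bytes really are in the encoding the XML declaration claims, can be illustrated with nothing but the JDK. This is a minimal sketch: it uses the standard JAXP DOM parser rather than JDOM, and `DeclarationCheck`/`roundTrip` are hypothetical names. The same UTF-8 bytes are parsed once under an honest declaration and once under a wrong one; parsing from an `InputStream` rather than a `Reader` is what lets the parser act on the declaration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class DeclarationCheck {

    // Build a tiny document whose prolog declares `declared`, but whose bytes
    // are always produced with UTF-8, then parse it back from the raw bytes.
    static String roundTrip(String declared, String text) {
        String xml = "<?xml version=\"1.0\" encoding=\"" + declared + "\"?><q>" + text + "</q>";
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // An InputStream (not a Reader) lets the parser honour the declared encoding.
            Document doc = db.parse(new ByteArrayInputStream(bytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String quote = "\u201C";
        // Declaration matches the bytes: the character survives intact.
        System.out.println(roundTrip("UTF-8", quote).equals(quote)); // true
        // Declaration lies about the bytes: E2 80 9C come back as three Latin-1 chars.
        System.out.println(roundTrip("ISO-8859-1", quote).length()); // 3
    }
}
```

The mismatched case does not throw; it silently produces the wrong characters, which is why every hop (file, database, writer, reader) has to be checked rather than just the last one.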
>Mark Schmeets wrote:
> >
> > Hi All,
> > I know this must be a character encoding problem, but I am at my wits'
> > end trying to figure out where I am making the mistake.
> >
> > I have a Swing application which programmatically converts CSV files
> > (produced on an NT client), creates a JDOM document, and posts that
> > document to a servlet, which passes the JDOM document to a builder class
> > that in turn creates SQL statements to insert the records into an Oracle
> > database. So far, so good, apparently.
> >
> > Another part of the system contains an applet which posts a request to a
> > servlet (that queries the database and creates a JDOM document from the
> > resultset) and then displays the data.
> > This touches a lot of stuff, I know. The problem character is the left
> > double quotation mark. In the CSV file it shows up as 0x93, which the
> > Windows-1252 code page maps to U+201C.
> > My applet throws a SAXParseException for an illegal XML character:
> > &#X1c. OK, I see that is "half" of the Unicode value for the character,
> > but I do not understand why I am getting the error.
> > I have looked at the XML on the input side; no apparent problems there.
> >
> > On the output side the data comes from JDBC, and I am specifying UTF-8
> > as the encoding for the XMLOutputter. The InputStreamReader that is
> > created in the applet is also specified for UTF-8. So it seems like the
> > output side should be OK, but to me it looks like we are dropping part
> > of the Unicode value (the 20) and passing along only the 1C.
> >
> > Any suggestions as to what I am doing wrong?
> >
> > Thanks,
> > Mark
> >
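Putting the pieces of this thread together, the corruption chain can be reproduced in a few lines of plain Java. This is a sketch only. `keepLowByte` is a hypothetical, simplified model of the database "stripping one byte" as Mark describes it; Java's own ISO-8859-1 encoder would substitute `?` for U+201C rather than truncate it. The class and method names are made up for illustration. The sketch shows why the applet's parser rejected the data: U+001C falls outside the XML 1.0 `Char` production, so no encoding setting on the reading side can make it parse.

```java
import java.nio.charset.Charset;

public class QuoteTruncation {

    // windows-1252 (the NT client's code page) maps byte 0x93 to U+201C.
    static char decodeCp1252(byte b) {
        return new String(new byte[] {b}, Charset.forName("windows-1252")).charAt(0);
    }

    // Hypothetical model of a one-byte-per-character store that keeps only the
    // low byte: for U+201C the high byte 0x20 is lost and 0x1C remains.
    static char keepLowByte(char c) {
        return (char) (c & 0xFF);
    }

    // XML 1.0 "Char" production restricted to BMP characters
    // (surrogate pairs are not handled in this sketch):
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
    static boolean isXml10Char(char c) {
        return c == 0x9 || c == 0xA || c == 0xD
            || (c >= 0x20 && c <= 0xD7FF)
            || (c >= 0xE000 && c <= 0xFFFD);
    }

    public static void main(String[] args) {
        char quote = decodeCp1252((byte) 0x93);
        char stored = keepLowByte(quote);
        System.out.printf("U+%04X -> U+%04X, legal XML char: %b%n",
            (int) quote, (int) stored, isXml10Char(stored));
        // prints: U+201C -> U+001C, legal XML char: false
    }
}
```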







