[jdom-interest] special characters problem

Alex Rosen arosen at novell.com
Sun Nov 2 12:12:40 PST 2003


You shouldn't need to set the file.encoding option. You just need to
make sure that you read and write the file using the encoding that the
file is actually in, rather than the platform's default encoding
(whatever that may be). The easiest way to do this is to let JDOM do it
for you - give it an InputStream (instead of a Reader or a String) when
parsing, and an OutputStream (instead of a Writer) when streaming. If
you can't do this, then you'll have to manually make sure that you're
using the right encoding when creating a Reader or Writer.
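
As a rough illustration of the InputStream approach, here is a minimal sketch. It uses the JDK's built-in DOM parser as a stand-in so that it runs without JDOM on the classpath; with JDOM the analogous step is handing the stream to SAXBuilder.build(InputStream). The class name, method name and sample document are made up for the example, and it assumes Java 7 or later. Because the parser sees raw bytes, it can take the encoding from the XML declaration rather than from the platform default.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class; the JDK parser stands in for JDOM here.
class ParseFromBytes {

    // Parses XML from raw bytes and returns the text of the root element.
    // The parser reads the encoding from the XML declaration itself.
    public static String rootText(byte[] xmlBytes) throws Exception {
        try (InputStream in = new ByteArrayInputStream(xmlBytes)) {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(in);
            return doc.getDocumentElement().getTextContent();
        }
    }

    public static void main(String[] args) throws Exception {
        // Sample document declared and encoded as ISO-8859-1 (assumption).
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<name>caf\u00e9 \u00a9 \u00ae</name>";
        byte[] latin1 = xml.getBytes(StandardCharsets.ISO_8859_1);

        String text = rootText(latin1);
        // The accented and symbol characters survive the parse intact.
        System.out.println(text.equals("caf\u00e9 \u00a9 \u00ae"));
    }
}
```

The same bytes wrapped in a Reader created with the wrong charset would already be damaged before the parser ever saw them, which is why passing the stream itself is the safer default.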

Alex 

>>> <manish.sharan at divlogic.com> 10/30/2003 3:52:44 PM >>>
Hi Pramodh
This is not a JDOM issue.

But I had the same problem: my app worked fine with a French XHTML file on
Windows, but on Linux it turned all special characters into '?'.

I was dealing with ISO-8859-1 encoded files (French XHTML). My app read the
file and simply saved it to a file on my disk. The resulting file looked OK
on Windows. However, when I ran this test on Linux, the result had a lot of
'?'.

So before you pass the string to JDOM, you need to make sure that you read
it correctly.

In my case, I fixed the problem by explicitly defining a character set with
the InputStream: InputStreamReader(inputStream, "ISO-8859-1")

In Java options, use -Dfile.encoding=ISO_8859-1 

Please note that I don't have my development machine in front of me, so my
code and sample may not be exactly correct.
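
To make the InputStreamReader point concrete, here is a small self-contained sketch using only the JDK (no JDOM involved). The class and method names are invented for the example, and US-ASCII merely stands in for whatever limited default charset a Unix box might have. It decodes the same ISO-8859-1 bytes twice, once with the right charset and once with the wrong one, and then shows where the literal '?' comes from when text is encoded to a charset that lacks the characters.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
class ExplicitCharsetRead {

    // Reads the whole stream as text, decoding with the named charset.
    public static String readAll(InputStream in, String charsetName) throws IOException {
        Reader reader = new InputStreamReader(in, charsetName);
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String original = "fran\u00e7ais \u00a9";
        byte[] latin1 = original.getBytes(StandardCharsets.ISO_8859_1);

        // Correct charset: the text comes back unchanged.
        String ok = readAll(new ByteArrayInputStream(latin1), "ISO-8859-1");
        System.out.println(ok.equals(original));

        // Wrong charset: undecodable bytes become the replacement character.
        String bad = readAll(new ByteArrayInputStream(latin1), "US-ASCII");
        System.out.println(bad.indexOf('\uFFFD') >= 0);

        // Encoding to a charset without these characters substitutes '?'.
        byte[] ascii = original.getBytes(StandardCharsets.US_ASCII);
        System.out.println(new String(ascii, StandardCharsets.US_ASCII));
    }
}
```

The last line is the likely shape of the Unix symptom: the characters are fine in memory, but get replaced as soon as they pass through a charset that cannot represent them.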


Regards
-manish

Quoting Pramodh Peddi <peddip at contextmedia.com>:

> Hi Manish,
> Thanks for responding! Did you have exactly the same problem, i.e.,
> working fine on Windows but not on Unix?
> 
> Can you tell me exactly what should be done in Java to do this? I am using
> Java 1.4.1. Should I mention the file.encoding in JAVA_OPTS? If so, what
> should I mention? And is this all I should do to make it work? Is the
> way I build the document OK?
> 
> Sorry for asking too many questions :-)!
> 
> You are right, I am using InputStreams to read external data.
> 
> Thanks,
> pramodh.
> ----- Original Message -----
> From: <manish.sharan at divlogic.com>
> To: "Pramodh Peddi" <peddip at contextmedia.com>
> Cc: <jdom-interest at jdom.org>
> Sent: Thursday, October 30, 2003 1:41 PM
> Subject: Re: [jdom-interest] special characters problem
> 
> 
> > I recently solved this kind of problem by enforcing the charset encoding
> > all the way, from the JVM "file.encoding" option to using the charset
> > encoding name whenever using any InputStreams to read external data.
> >
> > The Windows and Unix/Linux behavioral difference with respect to special
> > characters is due to the differing default charset encoding.
> >
> > Hope this helps.
> > -manish
> >
> >
> > Quoting Pramodh Peddi <peddip at contextmedia.com>:
> >
> > > Hi,
> > > I am using JDOM Beta 8 for XML parsing. We happen to have a lot of
> > > special characters (like registered marks, copyright symbols,
> > > trademarks, and many other funky chars). After building the document,
> > > the parser is converting the characters into "?" characters. This is
> > > what I am doing to build the document:
> > >
> > > ****************************************************************************
> > > // Method to return a Document object given an xml String
> > > public Document getDocumentfromString(String xmlString)
> > >         throws Exception {
> > >     Document schemaDoc = null;
> > >     SAXBuilder builder = new SAXBuilder(false);
> > >     String resultingXML = null;
> > >     if (!StringUtils.isEmpty(xmlString)) {
> > >         try {
> > >             schemaDoc = builder.build(new StringReader(xmlString));
> > >         } catch (JDOMException jdomex) {
> > >             throw new Exception("Document could not be built: " + jdomex);
> > >         }
> > >     } else {
> > >         log.info("xmlString is null");
> > >     }
> > >     return schemaDoc;
> > > }
> > >
> > > ****************************************************************************
> > >
> > > It is working fine on a Windows (2000) machine, but spitting out "?"
> > > symbols in place of special chars on UNIX machines.
> > >
> > > I used to use
> > >     schemaDoc = builder.build(new java.io.ByteArrayInputStream(xmlString.getBytes()));
> > > to build the document in place of StringReader, but it was changing the
> > > encoding and throwing an exception saying the special chars don't belong
> > > to UTF-8. So I changed it to StringReader, which doesn't throw
> > > exceptions but converts the special chars to "?".
> > >
> > > I also tried using
> > >     builder.build(new java.io.ByteArrayInputStream(xmlString.getBytes("UTF-8")));
> > > but that didn't help either.
> > >
> > > Again, "?" are occurring only on UNIX machines; it works fine on Windows
> > > machines.
> > >
> > > I would appreciate any help.
> > >
> > > Thank you,
> > > pramodh.
> > >
> > > _______________________________________________
> > > To control your jdom-interest membership:
> > > http://lists.denveronline.net/mailman/options/jdom-interest/youraddr@yourhost.com





