[jdom-interest] special characters problem

manish.sharan at divlogic.com manish.sharan at divlogic.com
Thu Oct 30 12:52:44 PST 2003


Hi Pramodh
This is not a JDOM issue.

But I had the same problem: my app worked fine with a French XHTML file on 
Windows, but on Linux it turned all special characters into '?'.

I was dealing with ISO-8859-1 encoded files (French XHTML). My app read 
the file and simply saved it to a file on disk. The resulting file looked 
OK on Windows. However, when I ran this test on Linux, the result had a lot 
of '?'.

So before you pass the string to JDOM, you need to make sure that you read it 
correctly.
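
As an illustration of why the read step matters, here is a minimal sketch of how the '?' can appear. It is not taken from the application described above. It assumes the default charset on the Linux side behaves like US-ASCII, which is only a guess about that setup. The class and method names are made up for the example, and StandardCharsets needs Java 7 or later (on Java 1.4 the String-name overloads, such as new String(bytes, "US-ASCII"), should do the same job).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of JDOM or the original application.
class CharsetLossDemo {

    // Decode raw bytes with the given charset, as a Reader using that charset would.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    // Encode text with the given charset, as a Writer using that charset would.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        // "cafe" with an e-acute, as ISO-8859-1 bytes: the last byte is 0xE9.
        byte[] raw = { 'c', 'a', 'f', (byte) 0xE9 };

        String good = decode(raw, StandardCharsets.ISO_8859_1);
        String bad = decode(raw, StandardCharsets.US_ASCII);

        // Print code points rather than the characters themselves, so the
        // output does not depend on the console encoding.
        System.out.println(Integer.toHexString(good.charAt(3))); // e9
        System.out.println(Integer.toHexString(bad.charAt(3)));  // fffd

        // Writing the damaged string back out produces a literal '?' (0x3f).
        byte[] written = encode(bad, StandardCharsets.US_ASCII);
        System.out.println(Integer.toHexString(written[3]));     // 3f
    }
}
```

Once the string holds U+FFFD, re-encoding it later (for example with getBytes("UTF-8")) cannot bring the original character back, which would be consistent with that attempt not helping.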

In my case, I fixed the problem by explicitly defining a character set with 
the InputStreamReader: new InputStreamReader(inputStream, "ISO-8859-1")

In the Java options, use -Dfile.encoding=ISO-8859-1

Please note that I don't have my development machine in front of me, so my 
code and sample may not be exactly correct. 
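
A minimal sketch of the InputStreamReader approach, assuming the input really is ISO-8859-1. The class name ExplicitCharsetReader and the helper readAll are hypothetical, and StandardCharsets needs Java 7 or later (on Java 1.4, pass the name "ISO-8859-1" to the InputStreamReader constructor instead). The returned String could then be handed to a StringReader for building the document.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the names are illustrative only.
class ExplicitCharsetReader {

    // Read the whole stream as text, decoding with the charset that is passed
    // in instead of the platform default.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[4096];
        try (Reader reader = new InputStreamReader(in, charset)) {
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // ISO-8859-1 bytes for "fran" + c-cedilla (0xE7) + "ais".
        byte[] raw = { 'f', 'r', 'a', 'n', (byte) 0xE7, 'a', 'i', 's' };
        String text = readAll(new ByteArrayInputStream(raw), StandardCharsets.ISO_8859_1);
        // The fifth character should be U+00E7 regardless of file.encoding.
        System.out.println(Integer.toHexString(text.charAt(4))); // e7
    }
}
```

To see which default a given JVM is actually using, printing System.getProperty("file.encoding") on both the Windows and the Linux machine is a quick check.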


Regards
-manish

Quoting Pramodh Peddi <peddip at contextmedia.com>:

> Hi Manish,
> Thanks for responding! Did you have exactly the same problem, i.e., working
> fine on Windows but not on Unix?
> 
> Can you tell me exactly what should be done in Java to do this? I am using
> Java 1.4.1. Should I set file.encoding in JAVA_OPTS? If so, what value
> should I use? And is that all I need to do to make it work? Is the
> way I build the document OK?
> 
> Sorry for asking too many questions :-)!
> 
> You are right, I am using InputStreams to read external data.
> 
> Thanks,
> pramodh.
> ----- Original Message -----
> From: <manish.sharan at divlogic.com>
> To: "Pramodh Peddi" <peddip at contextmedia.com>
> Cc: <jdom-interest at jdom.org>
> Sent: Thursday, October 30, 2003 1:41 PM
> Subject: Re: [jdom-interest] special characters problem
> 
> 
> > I recently solved this kind of problem by enforcing the charset encoding all
> > the way from the JVM "file.encoding" option to using the charset encoding name
> > whenever using any InputStreams to read external data.
> >
> > The Windows and Unix/Linux behavioral difference with respect to special
> > characters is due to the differing default charset encoding.
> >
> > Hope this helps.
> > -manish
> >
> >
> > Quoting Pramodh Peddi <peddip at contextmedia.com>:
> >
> > > Hi,
> > > I am using JDOM Beta 8 for XML parsing. We happen to have a lot
> > > of special characters (like registered marks, copyright symbols, trademarks,
> > > and many other funky chars). After building the document, the parser
> > > is converting the characters into "?" characters. This is what I am doing to
> > > build the document:
> > >
> > > ********************************************************************
> > > // Method to return a Document object given an XML String
> > > public Document getDocumentfromString(String xmlString)
> > >         throws Exception {
> > >     Document schemaDoc = null;
> > >     SAXBuilder builder = new SAXBuilder(false);
> > >     String resultingXML = null;
> > >     if (!StringUtils.isEmpty(xmlString)) {
> > >         try {
> > >             schemaDoc = builder.build(new StringReader(xmlString));
> > >         } catch (JDOMException jdomex) {
> > >             throw new Exception("Document could not be built: " + jdomex);
> > >         }
> > >     } else {
> > >         log.info("xmlString is null");
> > >     }
> > >     return schemaDoc;
> > > }
> > > ********************************************************************
> > >
> > > It is working fine on a Windows (2000) machine, but spitting out "?" symbols
> > > in place of special chars on UNIX machines.
> > >
> > > I used to use
> > >
> > > schemaDoc = builder.build(new java.io.ByteArrayInputStream(xmlString.getBytes()));
> > >
> > > to build the document in place of StringReader, but it was changing the
> > > encoding and throwing an exception saying the special chars don't belong
> > > to UTF-8. So I changed it to StringReader, which doesn't throw exceptions
> > > but converts the special chars to "?".
> > >
> > > I also tried using
> > >
> > > builder.build(new java.io.ByteArrayInputStream(xmlString.getBytes("UTF-8")));
> > >
> > > but that didn't help either.
> > >
> > > Again, the "?" are occurring only on UNIX machines; it works fine on Windows
> > > machines.
> > >
> > > I would appreciate any help.
> > >
> > > Thank you,
> > > pramodh.