[jdom-interest] encoding problem with xmloutputter

Jason Hunter jhunter at acm.org
Sun May 25 13:59:47 PDT 2003


Short answer: Try the latest code in CVS and it should just work.

Long answer: ISO-8859-1 can't represent characters > 255 (the bullet is 
U+2022, i.e. 8226), and in b9 we didn't do anything to check that the 
chars you were outputting fit within your chosen encoding's restrictions. 
After b9 we added a Format class with a setEscapeStrategy() method that 
lets you escape chars that would be illegal in your chosen charset. 
ISO-8859-1 is a charset we recognize by default, so without making any 
calls things should behave correctly.
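To illustrate what such an escape strategy does, here is a minimal stdlib-only Java sketch. It is not JDOM's code. It walks the text by code point, asks a java.nio CharsetEncoder whether each character fits in the target charset, and replaces any that don't with an XML numeric character reference. The class Latin1Escape and the method escapeUnencodable are hypothetical names, and the sketch assumes the caller has already escaped markup characters (&, <, >). In released JDOM 1.x the corresponding hook is, as far as I know, the org.jdom.output.EscapeStrategy interface (a shouldEscape(char) method) passed to Format.setEscapeStrategy().

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of JDOM: a stdlib-only sketch of an
// "escape whatever the output charset can't encode" strategy.
public class Latin1Escape {

    // Replaces every code point the target charset cannot encode with an
    // XML numeric character reference, e.g. U+2022 -> "&#x2022;".
    // Assumes markup characters (&, <, >) were already escaped by the caller.
    static String escapeUnencodable(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);  // handles surrogate pairs as one unit
            String ch = new String(Character.toChars(cp));
            if (encoder.canEncode(ch)) {
                out.append(ch);  // fits in the charset: keep as-is
            } else {
                out.append(String.format("&#x%X;", cp));  // numeric char reference
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+00E9 (233) fits in ISO-8859-1; U+2022 BULLET (8226) does not.
        String text = "caf\u00e9 \u2022 news";
        String escaped = escapeUnencodable(text, StandardCharsets.ISO_8859_1);
        System.out.println(escaped.equals("caf\u00e9 &#x2022; news"));  // prints "true"
    }
}
```

Passing StandardCharsets.UTF_8 instead leaves the bullet untouched, since UTF-8 can encode every code point.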

Let us know how it works.  You'll be among the first real-world test 
cases.  :-)

-jh-

manish sharan wrote:
> Hi,
> 
> I am creating a JDOM Document from a web page such as www.cnn.com (a good 
> example, as it uses the ‘•’ character for bullet points) by first 
> passing the page through JTidy and then into JDOM. I am able to build 
> the org.jdom.Document object without any problems. Now, my goal is to 
> save it to my local drive and read it with my browser. Encoding in 
> ISO-8859-1 is a requirement.
> 
> The problem is that when I open the saved file in the IE 
> browser, I see a lot of weird characters. www.cnn.com has a lot of 
> bulleted items, and the bullets (dots) are shown as weird 
> characters. When I switch the browser's encoding to UTF-8, the 
> file displays OK. This behavior is unexpected, because I 
> am explicitly setting the encoding to ISO-8859-1.
> 
> This is my code for the XMLOutputter:
> 
>     // JDOM beta 9 API; jdoc, fileName and jdomStr are declared elsewhere
>     org.jdom.output.XMLOutputter xmlOutputter = new XMLOutputter();
>     xmlOutputter.setEncoding("ISO-8859-1");
>     xmlOutputter.setOmitDeclaration(true);
>     xmlOutputter.setOmitEncoding(true);
>     ByteArrayOutputStream output = new ByteArrayOutputStream();
> 
>     // I have tried ISO8859_1 and ISO-8859-1
>     OutputStreamWriter osw = new OutputStreamWriter(output, "ISO8859_1");
>     xmlOutputter.output(jdoc, osw);
>     jdomStr = output.toString("ISO8859_1");
> 
>     // this is the part where I save the Document:
>     DataOutputStream dos = new DataOutputStream(new FileOutputStream(fileName));
>     // I have tried ISO8859_1 and ISO-8859-1
>     OutputStreamWriter fileWriter = new OutputStreamWriter(dos, "ISO8859_1");
>     fileWriter.write(jdomStr, 0, jdomStr.length());
>     fileWriter.close();  // flush buffered chars to the file
> 
> 
> When I open this file in the browser, I see weird characters in place of 
> the bullet points, until I explicitly set the browser's encoding to UTF-8.
> 
> Can someone please tell me why the encoding is not working? How can I get 
> it to work? I would deeply appreciate any help!
> 
> I am using JDOM beta 9 and JDK 1.4.1_02.
> Regards
> -manish
> 