[jdom-interest] encoding problem with xmloutputter
Jason Hunter
jhunter at acm.org
Sun May 25 13:59:47 PDT 2003
Short answer: Try the latest code in CVS and it should just work.
Long answer: ISO-8859-1 can't represent characters > 255 (the bullet is
U+2022), and in b9 we didn't do anything to check that the chars you were
outputting fit within your chosen encoding's restrictions. After b9 we
added a Format class with a setEscapeStrategy() that lets you escape chars
that would be illegal in your chosen charset. ISO-8859-1 is a charset we
recognize by default, so things should behave correctly without any extra
calls.
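
To illustrate the idea behind escaping by charset, here is a minimal sketch
that uses only the JDK. It is not the JDOM implementation; the class name
CharsetEscaper and the method escapeUnencodable are made up for this
example. It asks a java.nio CharsetEncoder whether each character fits the
target charset and, if not, writes a numeric character reference instead.
It does not escape XML markup characters such as < or &.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper for illustration only; not part of JDOM.
final class CharsetEscaper {

    /**
     * Returns the text with every character the named charset cannot
     * encode replaced by a numeric character reference (e.g. &#x2022;).
     */
    static String escapeUnencodable(String text, String charsetName) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            // Walk by code point so surrogate pairs stay together.
            int codePoint = text.codePointAt(i);
            int len = Character.charCount(codePoint);
            String piece = text.substring(i, i + len);
            if (encoder.canEncode(piece)) {
                out.append(piece);
            } else {
                out.append("&#x")
                   .append(Integer.toHexString(codePoint).toUpperCase())
                   .append(';');
            }
            i += len;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The bullet (U+2022) does not fit in ISO-8859-1, so it is escaped.
        System.out.println(escapeUnencodable("\u2022 Top story", "ISO-8859-1"));
    }
}
```

A browser reading the result as ISO-8859-1 resolves the reference back to
the bullet, so the bytes on disk never need to hold a character outside the
charset.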
Let us know how it works. You'll be among the first real-world test
cases. :-)
-jh-
manish sharan wrote:
> Hi,
>
> I am creating a JDOM Document from a web page such as www.cnn.com (a good
> example, as it uses the '•' character for bullet points) by first
> passing the page through JTidy and then into JDOM. I am able to build
> the org.jdom.Document object without any problems. Now, my goal is to
> save this on my local drive and read it with my browser. Encoding in
> ISO-8859-1 is a requirement.
>
> The problem is that when I open the saved JDOM output file in the IE
> browser, I see a lot of weird characters. www.cnn.com has a lot of
> bulleted items, and the bullets (dots) are shown as weird characters.
> When I switch the browser's encoding to UTF-8, the file displays
> correctly. This behavior is unexpected, as I am explicitly setting the
> encoding to ISO-8859-1.
>
> This is my code for XML Outputter
>
> org.jdom.output.XMLOutputter xmlOutputter = new XMLOutputter();
> xmlOutputter.setEncoding("ISO-8859-1");
> xmlOutputter.setOmitDeclaration(true) ;
> xmlOutputter.setOmitEncoding(true) ;
> ByteArrayOutputStream output = new ByteArrayOutputStream();
>
> // I have tried ISO8859_1 and ISO-8859-1 ,
> OutputStreamWriter osw=new OutputStreamWriter(output,"ISO8859_1") ;
> xmlOutputter.output(jdoc, osw);
> jdomStr= output.toString("ISO8859_1");
>
> // this is the part where I save the DOM:
>
> DataOutputStream dos= new DataOutputStream(new
> FileOutputStream(fileName));
> // I have tried ISO8859_1 and ISO-8859-1 ,
> OutputStreamWriter osw=new OutputStreamWriter(dos,"ISO8859_1") ;
> osw.write(s,0,s.length() );
>
>
> When I open this file in the browser, I see weird characters in place of
> bullet points until I explicitly set the encoding in the browser to UTF-8.
>
> Can someone please tell me why the encoding is not working? How can I get
> it to work? I will deeply appreciate any help!
>
> I am using JDOM beta 9 and JDK 1.4.1_02.
> Regards
> -manish