[jdom-interest] XMLOutputter problems with Unicode
Mad Einstein
madeinstein at hotmail.com
Wed Jul 3 02:29:36 PDT 2002
I tried it like this:
Element root = new Element("indexes");
Document doc = new Document(root); //sample JDom Document
FileWriter fw = new FileWriter("test.xml",false);
new XMLOutputter(" ", true, "UTF-8").output(doc,fw);
And the result was, as I said, one byte (93 hex) instead of \u8220.
Should I use a different writer? Do you know any writers that will give me
proper Unicode output?
Thanks,
Mad Einstein
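
A minimal sketch of the point about writers, not taken from the thread. The FileWriter constructor used above encodes with the JVM's default charset, not with the encoding named in the XMLOutputter constructor. Assuming the character meant is decimal 8220 (U+201C, written \u201C in Java source) and that the machine defaulted to windows-1252, that would explain the single byte 93 hex. The class and method names below are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows which bytes a Writer produces for a charset.
final class WriterEncodingDemo {

    // Sends text through an OutputStreamWriter bound to the given charset
    // and returns the bytes that would end up in the file.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, charset)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Formats bytes as space-separated upper-case hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String quote = "\u201C"; // decimal 8220, LEFT DOUBLE QUOTATION MARK
        System.out.println("UTF-8:        " + hex(encode(quote, StandardCharsets.UTF_8)));
        // windows-1252 is not a charset every JVM must provide, so check first.
        if (Charset.isSupported("windows-1252")) {
            System.out.println("windows-1252: "
                    + hex(encode(quote, Charset.forName("windows-1252"))));
        }
    }
}
```

With JDOM, the same idea would be to pass new OutputStreamWriter(new FileOutputStream("test.xml"), "UTF-8") to output() in place of the FileWriter, so that the bytes written match the encoding declared in the file.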
----- Original Message -----
From: "Jason Hunter" <jhunter at servlets.com>
To: "Mad Einstein" <madeinstein at hotmail.com>
Cc: <jdom-interest at jdom.org>
Sent: Tuesday, July 02, 2002 8:06 PM
Subject: Re: [jdom-interest] XMLOutputter problems with Unicode
> Your solution is one approach. However, if you simply leave the
> outputter's encoding as UTF-8 (the default) and pass in an output stream
> or a writer designed for UTF-8, then characters are encoded correctly
> without needing to be escaped. That should be faster than your
> solution. If you don't see that happening, you probably passed in an
> improper writer or changed the encoding.
>
> -jh-
>
> > Mad Einstein wrote:
> >
> > 
> > The current XMLOutputter class (Version 8) doesn't support Unicode
> > characters with character codes of 128 and above.
> >
> > I was trying to save the character \u8220 to XML using XMLOutputter,
> > and as a result the file contained one byte (93 hex) instead of two
> > bytes. I then couldn't parse the file with SAXBuilder, and I couldn't
> > open it in Internet Explorer either.
> >
> > I was reading about different algorithms that convert Unicode to XML
> > and HTML, and I think this one is the best:
> >
> > ----------------------------------------------------------------------
> > http://czyborra.com/utf/#UTF-8
> >
> > HTML's Numerical Character References
> >
> > A somewhat more standardized encoding option is specified by HTML. RFC
> > 2070 allows us to reference just any Unicode character within any HTML
> > document of any charset by using the decimal numeric character
> > reference &#12345; as in:
> >
> > putwchar(c)
> > {
> > if (c < 0x80 && c != '&' && c != '<') putchar(c);
> > else printf ("&#%d;", c);
> > }
> >
> > Decimal numbers for Unicode characters are also used in Windows NT's
> > Alt-12345 input method but are still of so little mnemonic value that
> > a hexadecimal alternative &#x1bc; is being supported by the newer
> > standards HTML 4.0 and XML 1.0. Apart from that, hexadecimal numbers
> > aren't that easy to memorize either. SGML has long allowed symbolic
> > character entities for some character references like &eacute; for é
> > and &euro; for the € but the table of supported entities differs
> > from browser to browser.
> >
> > ----------------------------------------------------------------------
> >
> > I wrote this method for the conversion
> >
> > It converts these 3 characters (&, <, >), as well as all characters
> > with codes of 128 and above, to decimal character references in the
> > format &#1234;. The output now works with any parser supporting XML 1.0.
> >
> > /**
> >  * Converts a Unicode character to an HTML decimal character reference.
> >  * Characters with a code below 128 (decimal), apart from '>', '<' and
> >  * '&', need no conversion, which is signalled by returning null.
> >  * Every other character becomes the decimal reference &#{char_code};
> >  * Example: '?' (U+003F) needs no conversion, so null is returned.
> >  * @param value Unicode character
> >  * @return the decimal character reference, or null if the character
> >  *         needs no conversion
> >  */
> > public String convertTEXTtoHTML(char value)
> > {
> >     int code = value; // a char widens to its UTF-16 code unit value
> >     if (code < 128 && value != '&' && value != '<' && value != '>')
> >     {
> >         return null; // plain ASCII: no replacement needed
> >     }
> >     return "&#" + code + ";";
> > }
> >
> > and I changed the XMLOutputter.escapeElementEntities(String str) method:
> >
> > default :
> > entity = convertTEXTtoHTML(ch);
> > break;
> >
> > Maybe there is a different solution for this problem, but it works
> > fine.
> >
> > Mad Einstein
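
A hedged sketch of the same escaping idea applied to a whole string, not code from the thread. The quoted method works on one char at a time, so a character outside the Basic Multilingual Plane (stored as a surrogate pair) would come out as two references. Iterating by code point, which assumes a JDK with String.codePointAt, emits one reference per character. The class name and the escape method are made up for illustration, and the sketch does not handle control characters or unpaired surrogates, which XML 1.0 does not allow even as references.

```java
// Hypothetical helper: escapes text using decimal numeric character references.
final class NumericCharRefs {

    // Returns text with '&', '<', '>' and every code point of 128 and above
    // replaced by a decimal reference of the form &#NNNN;.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp); // 2 for a surrogate pair, else 1
            if (cp < 128 && cp != '&' && cp != '<' && cp != '>') {
                sb.append((char) cp);     // plain ASCII passes through
            } else {
                sb.append("&#").append(cp).append(';');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a < b & \u201Cquoted\u201D"));
    }
}
```

Escaping everything above ASCII keeps the output independent of the writer's encoding, at the cost of larger files than a correctly encoded UTF-8 stream.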