<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=utf-8">
<META content="MSHTML 6.00.2716.2200" name=GENERATOR>
<STYLE></STYLE>
</HEAD>
<BODY bgColor=#ffffff>
<DIV><FONT face=Arial size=2>The current XMLOutputter class (Version 8) does not
correctly write Unicode characters with code values of 128 and above.</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>I was trying to save the character \u201C (decimal
8220, the left double quotation mark) to XML using XMLOutputter. The resulting
file contained a single byte (0x93) instead of a proper multi-byte sequence, so
I could not parse the file with SAXBuilder, and Internet Explorer could not open
it either.</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>I have read about several algorithms that convert
Unicode text to XML and HTML, and I think the following one is the best:</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV>
<HR>
</DIV>
<DIV><FONT face=Arial size=2><A
href="http://czyborra.com/utf/#UTF-8">http://czyborra.com/utf/#UTF-8</A></FONT></DIV>
<DIV>
<H2><A name=HTML>HTML's Numerical Character References</A></H2>A somewhat more
standardized encoding option is specified by HTML. <A
href="ftp://ftp.isi.edu/in-notes/rfc2070.txt">RFC 2070</A> allows us to
reference just any Unicode character within any HTML document of any charset by
using the decimal numeric character reference &amp;#12345; as in:
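<P><PRE>#include &lt;stdio.h&gt;

/* Write c unchanged if it is ASCII other than '&amp;' and '&lt;';
   otherwise write it as a decimal numeric character reference. */
void putwchar(int c)
{
  if (c &lt; 0x80 &amp;&amp; c != '&amp;' &amp;&amp; c != '&lt;') putchar(c);
  else printf ("&amp;#%d;", c);
}
</PRE>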
<P>Decimal numbers for Unicode characters are also used in Windows NT's
Alt-12345 input method but are still of so little mnemonic value that a
hexadecimal alternative &amp;#x1bc; is being supported by the newer standards <A
href="http://www.w3.org/TR/REC-html40/charset.html">HTML 4.0</A> and <A
href="http://www.w3.org/XML/">XML 1.0</A>. Apart from that, hexadecimal numbers
aren't that easy to memorize either. SGML has long allowed <A
href="http://czyborra.com/yudit/SGML.kmap">symbolic character entities</A> for
some character references like &amp;eacute; for é and &amp;euro; for the € but
the table of supported entities differs from browser to browser. </P>
<P>
<HR>
</P>
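<P><FONT face=Arial size=2>As a small illustration of the two numeric formats described in the quoted passage, the hypothetical helpers decimalRef and hexRef below build both forms for a given code point. Note that XML 1.0 predefines only five named entities (amp, lt, gt, quot and apos), so named references such as eacute need a DTD declaration, while numeric references work in any XML document.</FONT></P>

```java
public class CharRefs {
    // Decimal numeric character reference, e.g. 0x1BC -> "&#444;"
    static String decimalRef(int codePoint) {
        return "&#" + codePoint + ";";
    }

    // Hexadecimal form allowed by HTML 4.0 and XML 1.0, e.g. 0x1BC -> "&#x1bc;"
    static String hexRef(int codePoint) {
        return "&#x" + Integer.toHexString(codePoint) + ";";
    }

    public static void main(String[] args) {
        int[] samples = {0x1BC, 0x201C, 0xE9};
        for (int cp : samples) {
            // prints: &#444;  &#x1bc;   then  &#8220;  &#x201c;   then  &#233;  &#xe9;
            System.out.println(decimalRef(cp) + "  " + hexRef(cp));
        }
    }
}
```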
<P><FONT face=Arial size=2>I wrote the following method for the conversion.</FONT></P>
<P><FONT face=Arial size=2>It converts the three characters &amp;, &lt; and
&gt; to numeric character references, as well as every character with a code
value of 128 or above, using the decimal format &amp;#1234;. With this change,
the output can be read by any parser that supports XML 1.0.</FONT></P>
<PRE>/**
 * Converts a Unicode character to an HTML/XML decimal character reference.
 * Characters with a code value below 128, apart from '&gt;', '&lt;' and '&amp;',
 * need no conversion, and null is returned for them. Every other
 * character is converted to a reference of the form &amp;#{code};
 * Example:
 *   U+201C --&gt; &amp;#8220;
 * @param value Unicode character
 * @return the decimal character reference, or null if the character
 *         can be written unchanged
 */
public String convertTEXTtoHTML(char value)
{
    int code = value; // the char's code value (same as Character.hashCode())
    if (code &lt; 128 &amp;&amp; value != '&amp;' &amp;&amp; value != '&lt;' &amp;&amp; value != '&gt;')
    {
        return null; // plain ASCII: the caller writes the character as-is
    }
    return "&amp;#" + code + ";";
}
</PRE>
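<P><FONT face=Arial size=2>The method above works on one char at a time. Java strings are UTF-16, so a character outside the Basic Multilingual Plane (for example U+1F600) arrives as two surrogate chars, and converting each half separately yields two references to surrogate code points, which XML 1.0 does not allow. A minimal sketch that applies the same rule to a whole string, one code point at a time, could look like this; the class and method names are hypothetical, and String.codePointAt requires Java 5 or later.</FONT></P>

```java
public class NumericEscaper {
    // Escape a whole string: ASCII other than & < > is copied unchanged,
    // everything else becomes &#decimal; using the full code point.
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i); // combines a surrogate pair into one code point
            if (cp < 128 && cp != '&' && cp != '<' && cp != '>') {
                out.append((char) cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
            i += Character.charCount(cp); // advance by 1 or 2 chars
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a<b & \u201Cq\u201D"));                    // a&#60;b &#38; &#8220;q&#8221;
        System.out.println(escape(new String(Character.toChars(0x1F600)))); // &#128512;
    }
}
```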
<P><FONT face=Arial size=2>Then I changed the default branch of the
<STRONG>XMLOutputter.escapeElementEntities(String str)</STRONG> method to call it:
</FONT></P>
<PRE>default:
    entity = convertTEXTtoHTML(ch);
    break;</PRE>
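<P><FONT face=Arial size=2>To show where that branch sits, here is a self-contained sketch of a switch-based escaping loop. It is not JDOM's actual source: the class EscapeSketch and the surrounding loop are assumptions, and only the default branch corresponds to the change described in this post. Because the earlier cases already handle the three markup characters, only non-ASCII characters reach convertTEXTtoHTML in practice.</FONT></P>

```java
public class EscapeSketch {
    // Same rule as the post's convertTEXTtoHTML: null means "write the char as-is".
    static String convertTEXTtoHTML(char value) {
        if (value < 128 && value != '&' && value != '<' && value != '>') {
            return null;
        }
        return "&#" + (int) value + ";";
    }

    // Hypothetical stand-in for XMLOutputter.escapeElementEntities(String)
    static String escapeElementEntities(String str) {
        StringBuilder buffer = new StringBuilder(str.length());
        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i);
            String entity;
            switch (ch) {
                case '<':
                    entity = "&lt;";
                    break;
                case '>':
                    entity = "&gt;";
                    break;
                case '&':
                    entity = "&amp;";
                    break;
                default:
                    // the modified branch: non-ASCII chars become &#NNNN;
                    entity = convertTEXTtoHTML(ch);
                    break;
            }
            if (entity != null) {
                buffer.append(entity);
            } else {
                buffer.append(ch);
            }
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeElementEntities("x < y & \u201Cz\u201D")); // x &lt; y &amp; &#8220;z&#8221;
    }
}
```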
<P><FONT face=Arial size=2>There may be a better solution to this problem,
but this one works for me.</FONT></P>
<P><FONT face=Arial size=2>Mad Einstein</FONT></P></DIV></BODY></HTML>