<HTML >
<HEAD>
<META http-equiv="Content-Type" content="text/html; charset=iso-2022-jp">
<TITLE></TITLE>
<META content="MSHTML 6.00.2900.3059" name=GENERATOR></HEAD>
<BODY text=#000000 bgColor=#ffffff>
<DIV>
<DIV> </DIV>
<DIV>
<DIV dir=ltr align=left><SPAN class=013040417-22052007><FONT
face="Lucida Sans Unicode" color=#0000ff size=2>This is really why (IMHO) it's
very dangerous to represent XML documents as 'Strings': the 'encoding=' portion
of the prolog tells the parser how the characters are encoded in the stream of
bytes that follows. A Java String has already been decoded, so a declaration
sitting inside it no longer describes anything.</FONT></SPAN></DIV>
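<DIV>A minimal sketch of the danger, using the JDK's built-in DOM parser (javax.xml.parsers) rather than JDOM so it runs without extra jars; the class name EncodingMismatchDemo and the sample &lt;word&gt; element are made up for illustration. It builds the same one-element document three ways and lets the parser decode the bytes according to the declaration: matching encodings round-trip, UTF-8 bytes labelled ISO-8859-1 silently turn "caf&eacute;" into "caf&Atilde;&copy;", and Latin-1 bytes labelled UTF-8 are rejected by the UTF-8 decoder.</DIV>

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

class EncodingMismatchDemo {
    // Builds a one-element document whose prolog *claims* declaredEncoding,
    // turns it into bytes with actualEncoding, and lets the parser decode it.
    static String parseRootText(String declaredEncoding, Charset actualEncoding) {
        String xml = "<?xml version=\"1.0\" encoding=\"" + declaredEncoding + "\"?>"
                + "<word>caf\u00E9</word>";
        byte[] bytes = xml.getBytes(actualEncoding);
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes))
                    .getDocumentElement().getTextContent();
        } catch (Exception e) {
            // Wrap parser/IO failures so callers need not handle checked exceptions.
            throw new IllegalStateException("parse failed: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        // Declaration and bytes agree: the text survives intact.
        System.out.println(parseRootText("UTF-8", StandardCharsets.UTF_8));
        // Bytes are UTF-8 but the prolog says Latin-1: silent mojibake.
        System.out.println(parseRootText("ISO-8859-1", StandardCharsets.UTF_8));
        // Bytes are Latin-1 but the prolog says UTF-8: the lone 0xE9 byte is invalid UTF-8.
        try {
            parseRootText("UTF-8", StandardCharsets.ISO_8859_1);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The point is that nothing in the String itself is wrong in any of the three cases; only the pairing of declaration and bytes differs.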
<DIV dir=ltr align=left><SPAN class=013040417-22052007><FONT
face="Lucida Sans Unicode" color=#0000ff size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=013040417-22052007><FONT
face="Lucida Sans Unicode" color=#0000ff size=2>I believe it's legal (but
somebody might shoot me down on this) for the prolog to declare an encoding of
'us-ascii' (i.e. 7-bit, single-byte characters) and then to use XML numeric
character references (&amp;#xhhhh;) to represent any characters outside that
set.</FONT></SPAN></DIV>
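<DIV>As far as I can tell this is allowed by XML 1.0, with one caveat: character references are only recognised in character data and attribute values, so non-ASCII element or attribute names cannot be written this way. A rough sketch of the escaping side (the class AsciiEscapeDemo and its methods are hypothetical; it uses the JDK parser, not JDOM): every code point above 0x7F becomes a hex reference, the document is declared as US-ASCII, and parsing it back yields the original kana.</DIV>

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

class AsciiEscapeDemo {
    // Replaces every code point outside 7-bit ASCII with a hex character reference.
    // ASSUMPTION: markup characters such as '<' and '&' were already escaped;
    // this helper only handles the non-ASCII part. Works on code points, so
    // characters outside the BMP become a single reference, not two surrogates.
    static String escapeNonAscii(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                sb.appendCodePoint(cp);
            } else {
                sb.append(String.format("&#x%X;", cp));
            }
        });
        return sb.toString();
    }

    // Declares US-ASCII, encodes as US-ASCII (lossless, since everything is
    // escaped), and returns the root element's text as the parser sees it.
    static String parseAsciiDocument(String content) {
        String xml = "<?xml version=\"1.0\" encoding=\"US-ASCII\"?>"
                + "<hiraganaNum>" + escapeNonAscii(content) + "</hiraganaNum>";
        byte[] bytes = xml.getBytes(StandardCharsets.US_ASCII);
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes))
                    .getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(escapeNonAscii("\u307B\u3093")); // &#x307B;&#x3093;
        System.out.println(parseAsciiDocument("\u307B\u3093").equals("\u307B\u3093")); // true
    }
}
```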
<DIV dir=ltr align=left><SPAN class=013040417-22052007><FONT
face="Lucida Sans Unicode" color=#0000ff size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=013040417-22052007><FONT
face="Lucida Sans Unicode" color=#0000ff size=2>So, when using any XML
serializer (and JDOM falls into this category) you need to ensure that the
character encoding of your java.io.Writer (for example, the charset given to the
OutputStreamWriter wrapping your output stream) matches the encoding declared in
the prolog. </FONT></SPAN></DIV>
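<DIV>One way to make a mismatch impossible is to derive both the declaration and the Writer from a single Charset. A rough sketch with plain java.io classes rather than JDOM's outputter (MatchedEncodingWriter and its methods are hypothetical names; the root name is assumed to be a valid XML name and the text already XML-escaped):</DIV>

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

class MatchedEncodingWriter {
    // The declared encoding and the Writer's encoding both come from `charset`,
    // so the prolog cannot describe bytes it does not match.
    // ASSUMPTION: rootName is a valid XML name and text is already escaped.
    static byte[] writeDocument(String rootName, String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, charset)) {
            w.write("<?xml version=\"1.0\" encoding=\"" + charset.name() + "\"?>\n");
            w.write("<" + rootName + ">" + text + "</" + rootName + ">\n");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray(); // the writer has been closed, and so flushed, by now
    }

    // Parses raw bytes, letting the parser pick the decoding from the prolog.
    static String parseRootText(byte[] xmlBytes) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes))
                    .getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String hon = "\u307B\u3093";
        for (Charset cs : new Charset[] {StandardCharsets.UTF_8, StandardCharsets.UTF_16}) {
            byte[] bytes = writeDocument("hiraganaSym", hon, cs);
            System.out.println(cs.name() + ": " + bytes.length + " bytes, round-trip ok = "
                    + hon.equals(parseRootText(bytes)));
        }
    }
}
```

The same idea applies to any serializer: either hand it a raw OutputStream and let it build the Writer to match the encoding it declares, or, if you must pass your own Writer, configure the declared encoding to be whatever that Writer actually uses.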
<DIV dir=ltr align=left><SPAN class=013040417-22052007><FONT
face="Lucida Sans Unicode" color=#0000ff size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=013040417-22052007><FONT
face="Lucida Sans Unicode" color=#0000ff size=2>Jem...</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=013040417-22052007><FONT
face="Lucida Sans Unicode" color=#0000ff size=2></FONT></SPAN> </DIV><BR>
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> jdom-interest-bounces@jdom.org
[mailto:jdom-interest-bounces@jdom.org] <B>On Behalf Of </B>Alan
Deikman<BR><B>Sent:</B> Tuesday, May 22, 2007 9:44 AM<BR><B>To:</B>
jdom-interest@jdom.org<BR><B>Subject:</B> [jdom-interest] Kana symbols and
UTF-8? (was Re: Kana characters?)<BR></FONT><BR></DIV>
<DIV></DIV>OK, now I'm a little confused. I guess this is an XML
question and not really a JDOM question, but perhaps someone can explain
it.<BR><BR>Angela Amoateng wrote:
<BLOCKQUOTE cite=mid:20070521225023.1852uiurvcccso0k@impmail.kcl.ac.uk
type="cite"><BR>This is the code in my XML document (by the way, romaji is
romanised Japanese): <BR><BR><PRE>&lt;?xml version="1.0" encoding="UTF-8"?&gt;

&lt;dictionary&gt;
  &lt;word&gt;
    &lt;noun&gt;
      &lt;english&gt;book&lt;/english&gt;
      &lt;romaji&gt;hon&lt;/romaji&gt;
      &lt;hiraganaSym&gt;ほん&lt;/hiraganaSym&gt;
      &lt;hiraganaNum&gt;&amp;#x307B;&amp;#x3093;&lt;/hiraganaNum&gt;
    &lt;/noun&gt;</PRE>
<BR></BLOCKQUOTE><BR>Where I get lost is the &lt;hiraganaSym&gt;
element. The characters inside it are not part of any 8-bit code (ASCII,
UTF-8 or whatever). Java has no problem with them because all String objects
are built on Unicode, but what does the <U>encoding="UTF-8"</U> mean in the
header if these symbols can show up in the document?<BR><BR><PRE class=moz-signature cols="72">--
Alan Deikman
ZNYX Networks</PRE> </DIV>
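<DIV>On Alan's question: UTF-8 is not a fixed 8-bit code but a variable-width encoding that covers all of Unicode, so encoding="UTF-8" just says how those kana are stored as bytes (three bytes each). A small sketch (hypothetical class KanaUtf8Demo, JDK parser rather than JDOM; the closing &lt;/word&gt;&lt;/dictionary&gt; tags are added only so Angela's fragment parses) showing that the literal &lt;hiraganaSym&gt; text and the escaped &lt;hiraganaNum&gt; text come out as the same String, and what the kana look like as UTF-8 bytes:</DIV>

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class KanaUtf8Demo {
    // Angela's snippet; the closing </word></dictionary> tags are a hypothetical
    // completion so that the fragment is well-formed.
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
      + "<dictionary><word><noun>\n"
      + "  <english>book</english>\n"
      + "  <romaji>hon</romaji>\n"
      + "  <hiraganaSym>\u307B\u3093</hiraganaSym>\n"
      + "  <hiraganaNum>&#x307B;&#x3093;</hiraganaNum>\n"
      + "</noun></word></dictionary>\n";

    static Document parse() {
        // Encode with the same charset the prolog declares.
        byte[] bytes = XML.getBytes(StandardCharsets.UTF_8);
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Text of the first element with the given tag name.
    static String text(Document doc, String tag) {
        return doc.getElementsByTagName(tag).item(0).getTextContent();
    }

    // Space-separated hex dump of a string's UTF-8 bytes.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Document doc = parse();
        System.out.println(text(doc, "hiraganaSym").equals(text(doc, "hiraganaNum"))); // true
        System.out.println(utf8Hex("\u307B\u3093")); // E3 81 BB E3 82 93
    }
}
```

Once parsed, the parser has already resolved the character references, so the application cannot tell which of the two spellings was in the file.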
</DIV></BODY></HTML>