[jdom-interest] Kana symbols and UTF-8? (was Re: Kana characters?)

Day, Jem BGI SF Jem.Day at barclaysglobal.com
Tue May 22 10:14:38 PDT 2007


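This is really why (IMHO) it's very dangerous to represent XML documents as 'Strings': the 'encoding=' portion of the prolog tells the parser how the characters are encoded in the stream of bytes that follows. A Java String is a sequence of characters, not bytes, so the declaration only becomes meaningful once the String is written out in some encoding.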
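
As a rough illustration of the character/byte distinction (a sketch only; the class and method names are made up for this example), the two hiragana from the <hiraganaSym> element quoted below are the same two characters in a String, but turn into different byte sequences depending on the encoding chosen when they are written out:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class KanaBytes {
    // Render the bytes of s in the given charset as space-separated hex.
    static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String hon = "\u307B\u3093"; // U+307B U+3093, the text of <hiraganaSym>
        System.out.println(hex(hon, StandardCharsets.UTF_8));    // E3 81 BB E3 82 93
        System.out.println(hex(hon, StandardCharsets.UTF_16BE)); // 30 7B 30 93
    }
}
```

UTF-8 is a variable-length encoding, so each of these characters takes three bytes; the prolog is what tells the parser to read the bytes that way.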
 
I believe it's legal (but somebody might shoot me down on this) for the prolog to have an encoding of 'us-ascii' (i.e. single-byte characters) and then to use XML character references (&#xnnnn;) to represent the extended character set.
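
A minimal sketch of that idea, using the JDK's built-in DOM parser rather than JDOM so it runs without extra jars (the class name, helper name, and the <hiragana> element are invented for the example). It parses a pure-ASCII document that uses character references and a UTF-8 document that contains the characters directly, then compares the resulting text:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class AsciiEscapes {
    // Parse raw bytes and return the text content of the root element.
    static String parseText(byte[] xmlBytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Every byte of this document is 7-bit ASCII.
        String ascii = "<?xml version=\"1.0\" encoding=\"US-ASCII\"?>"
                + "<hiragana>&#x307B;&#x3093;</hiragana>";
        // Same content, with the characters written directly.
        String utf8 = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<hiragana>\u307B\u3093</hiragana>";

        String fromAscii = parseText(ascii.getBytes(StandardCharsets.US_ASCII));
        String fromUtf8 = parseText(utf8.getBytes(StandardCharsets.UTF_8));
        System.out.println(fromAscii.equals(fromUtf8));
    }
}
```

Both documents should yield the same two characters, since the parser expands the references into the code points they name.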
 
So, when using any XML serializer (and JDOM would fall into this category) you need to ensure that the character encoding of your java.io.Writer matches the encoding specified in the prolog.
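
To show what goes wrong on a mismatch, here is a small sketch using only JDK classes (again not JDOM itself; the class and method names are hypothetical). The same String, whose prolog claims UTF-8, is pushed through a Writer with two different charsets and the resulting bytes are parsed back:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

class WriterEncodingDemo {
    // The prolog promises UTF-8 regardless of how the Writer is configured.
    static final String XML = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<hiraganaSym>\u307B\u3093</hiraganaSym>";

    // Write XML through a Writer using writerCharset, then parse the bytes back.
    static String roundTrip(Charset writerCharset) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (Writer w = new OutputStreamWriter(bytes, writerCharset)) {
                w.write(XML);
            }
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes.toByteArray()))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String hon = "\u307B\u3093";
        // Writer charset matches the prolog: the kana survive.
        System.out.println(roundTrip(StandardCharsets.UTF_8).equals(hon));
        // Writer charset cannot represent the kana: they are replaced on output.
        System.out.println(roundTrip(StandardCharsets.ISO_8859_1).equals(hon));
    }
}
```

In the mismatched case the Writer silently substitutes a replacement character for anything its charset cannot represent, so the document still parses but the text is lost. With JDOM, I believe the usual way to avoid this is to pass XMLOutputter an OutputStream rather than a Writer, so that the encoding set on its Format is the one actually used for the bytes.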
 
Jem...
 

________________________________

From: jdom-interest-bounces at jdom.org [mailto:jdom-interest-bounces at jdom.org] On Behalf Of Alan Deikman
Sent: Tuesday, May 22, 2007 9:44 AM
To: jdom-interest at jdom.org
Subject: [jdom-interest] Kana symbols and UTF-8? (was Re: Kana characters?)


OK, now I'm a little confused. I guess this is an XML question and not really a JDOM question, but perhaps someone can explain it.

Angela Amoateng wrote: 


	This is the code in my XML document (by the way, romaji is romanised Japanese): 
	
	<?xml version="1.0" encoding="UTF-8"?> 
	
	<dictionary> 
	   <word> 
	       <noun> 
	           <english>book</english> 
	           <romaji>hon</romaji> 
	           <hiraganaSym>ほん</hiraganaSym> 
	           <hiraganaNum>&#x307B;&#x3093;</hiraganaNum> 
	       </noun> 
	


Where I get lost is in the <hiraganaSym> tag. Those characters inside are not part of any 8-bit code (ASCII, UTF-8 or whatever). Java has no problem with it because all String objects are built on Unicode, but what does the encoding="UTF-8" mean in the header if these symbols can show up in the document?


-- 
Alan Deikman
ZNYX Networks 
 