[jdom-interest] Fwd: Re: Kana symbols and UTF-8? (was Re: Kana characters?)

Tue May 22 13:44:26 PDT 2007

----- Forwarded message from angela.amoateng at kcl.ac.uk -----
    Date: Tue, 22 May 2007 21:42:42 +0100
    From: Angela Amoateng <angela.amoateng at kcl.ac.uk>
Reply-To: Angela Amoateng <angela.amoateng at kcl.ac.uk>
Subject: Re: Kana symbols and UTF-8? (was Re: Kana characters?)
      To: Alan Deikman <Alan.Deikman at znyx.com>

Hi Alan,

To clear up the confusion: the symbols in the <hiraganaSym> tags are the 
actual characters that the hexadecimal values in <hiraganaNum> stand for. 
As long as they are real characters and not images, writing them directly 
is exactly the same as putting the hexadecimal references in their place. 
When I viewed the document in the browser, the literal symbols (which is 
what the encoding produces) and the hexadecimal values both came up as 
the same symbols (check the other message for the output).
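
For anyone who wants to check this outside the browser, here is a rough 
sketch. It is only an illustration under a few assumptions: it uses the 
JDK's built-in DOM parser rather than JDOM, it trims the dictionary down 
to a single <noun>, and the class and method names (KanaDemo, utf8Bytes, 
textOf) are made up for the example. The kana are written as \u escapes 
so the Java source itself stays plain ASCII. The hexadecimal numbers are 
Unicode code points; UTF-8 is only the way those code points are stored 
as bytes in the file, which is why each kana takes three bytes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class KanaDemo {
    // Trimmed-down version of the dictionary entry (hypothetical test data).
    // \u307B\u3093 is the literal text "hon" in hiragana.
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
      + "<noun>"
      + "<hiraganaSym>\u307B\u3093</hiraganaSym>"
      + "<hiraganaNum>&#x307B;&#x3093;</hiraganaNum>"
      + "</noun>";

    // The document as it would be stored on disk: a sequence of UTF-8 bytes.
    static byte[] utf8Bytes() {
        return XML.getBytes(StandardCharsets.UTF_8);
    }

    // Parse the bytes and return the text content of the first <tag> element.
    static String textOf(String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(utf8Bytes()));
            return doc.getElementsByTagName(tag).item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String sym = textOf("hiraganaSym");
        String num = textOf("hiraganaNum");
        // Literal characters and character references parse to the same string.
        System.out.println(sym.equals(num));   // prints "true"
        // Two kana, three UTF-8 bytes each.
        System.out.println(sym.getBytes(StandardCharsets.UTF_8).length); // prints "6"
    }
}
```

With JDOM the comparison should look much the same once the document is 
built; only the parsing calls differ.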

I hope that clears things up!

Angela

Quoting Alan Deikman <Alan.Deikman at znyx.com>:

> OK, now I'm a little confused.   I guess this is an XML question and 
> not really a JDOM question, but perhaps someone can explain it.
>
> Angela Amoateng wrote:
>>
>> This is the code in my XML document (by the way, romaji is romanised 
>> Japanese):
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>>
>> <dictionary>
>>    <word>
>>        <noun>
>>            <english>book</english>
>>            <romaji>hon</romaji>
>>            <hiraganaSym>ほん</hiraganaSym>
>>            <hiraganaNum>&#x307B;&#x3093;</hiraganaNum>
>>        </noun>
>
> Where I get lost is in the <hiraganaSym> tag.  Those characters 
> inside are not part of any 8-bit code (ASCII, UTF-8 or whatever).  
> Java has no problem with it because all String objects are built on 
> unicode, but what does the _encoding="UTF-8"_ mean in the header if 
> these symbols can show up in the document?
>
> -- 
> Alan Deikman
> ZNYX Networks
>
>

-- 
Angela Amoateng
angela.amoateng at kcl.ac.uk

----- End forwarded message -----
