[jdom-interest] UTF8 charset issues...
Patrick JUSSEAU
patrick at openbase.com
Fri Oct 10 05:35:20 PDT 2003
Hi all,
I am trying to understand how JDOM handles character encodings. Here is
what I am doing:
I have a Java app that reads data from an XML file (UTF-8 encoded). I
can get text just fine using
String str = anElement.getText();
The resulting str string (a Unicode Java String) contains exactly what was
defined in my XML file; the charset translation is transparent to me
here. For example, if my XML document is:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE DOCUMENT SYSTEM "annonce.dtd">
<DOCUMENT>
<TEXT>Æ</TEXT>
</DOCUMENT>
I get Æ in my str string.
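A minimal JDK-only sketch of the decoding step described above (an illustration, not JDOM's own code): conceptually, the parser turns the file's UTF-8 bytes into a Java String, where Æ is the single character U+00C6. The class `DecodeOnRead` and method `decodeUtf8` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

class DecodeOnRead {
    // Decodes raw bytes as UTF-8, conceptually what the parser does for a
    // document whose declaration says encoding="UTF-8".
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // C3 86 is the UTF-8 encoding of U+00C6 (the AE ligature).
        byte[] raw = {(byte) 0xC3, (byte) 0x86};
        String text = decodeUtf8(raw);
        // Two bytes on disk become one char in the String.
        System.out.println(text.length());                    // 1
        System.out.printf("U+%04X%n", (int) text.charAt(0));  // U+00C6
    }
}
```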
However, when I try to generate an XML document with this exact same
Æ value, just calling Element.setText("Æ") does not produce a correctly
UTF-8 encoded document. I first have to do this manually in my code:
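String text = "Æ";
try {
    // Encode the string as UTF-8 bytes (two bytes, C3 86, for Æ)...
    byte[] bytes = text.getBytes("UTF8");
    // ...then decode them again using the platform default charset
    String newText = new String(bytes);
    anElement.setText(newText);
} catch (UnsupportedEncodingException uee) {
    uee.printStackTrace();
}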
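A JDK-only sketch of what the workaround above does to the string, assuming the platform default charset is ISO-8859-1 (the original `new String(bytes)` uses whatever the default is; other single-byte defaults behave similarly for these two bytes). `WorkaroundEffect` and `workaround` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

class WorkaroundEffect {
    // Same steps as the workaround, with the default charset made explicit.
    // ASSUMPTION: the platform default charset is ISO-8859-1.
    static String workaround(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);   // C3 86 for U+00C6
        return new String(utf8, StandardCharsets.ISO_8859_1);  // chars U+00C3, U+0086
    }

    public static void main(String[] args) {
        String original = "\u00C6";  // escaped so the source file's encoding does not matter
        String munged = workaround(original);
        // One character has become two.
        System.out.println(munged.length());  // 2
        // Writing the munged string through an ISO-8859-1 writer emits one byte
        // per char, which happens to be exactly the UTF-8 encoding of U+00C6.
        byte[] written = munged.getBytes(StandardCharsets.ISO_8859_1);
        System.out.printf("%02X %02X%n", written[0] & 0xFF, written[1] & 0xFF);  // C3 86
    }
}
```

If that assumption about the output side holds, the workaround does not fix the encoding; it cancels out a second, mismatched encoding step on output.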
Why do I have to do this for the XML generation to work? Why isn't JDOM
taking care of the charset translation for me, since the resulting file
has UTF-8 encoding specified in it?
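One possible explanation, assumed here because the post does not show how the document is written out: if the output goes through a java.io.Writer that uses the platform default charset, Æ is written in that charset even though the declaration says UTF-8. The JDK-only sketch below (hypothetical `EncodingMismatch.encodeText`, not a JDOM API) compares the bytes produced by an explicit UTF-8 writer and by an ISO-8859-1 one.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingMismatch {
    // Writes text through an OutputStreamWriter using the given charset and
    // returns the bytes that would end up in the file.
    static byte[] encodeText(String text, Charset cs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, cs)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();  // writer is closed (and flushed) by now
    }

    // Formats bytes as space-separated uppercase hex, e.g. "C3 86".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String ae = "\u00C6";
        // Writer created with UTF-8: bytes match the encoding="UTF-8" declaration.
        System.out.println(hex(encodeText(ae, StandardCharsets.UTF_8)));       // C3 86
        // ASSUMPTION: a default-charset writer on an ISO-8859-1 platform.
        // The lone byte C6 is not valid UTF-8, so a parser trusting the
        // declaration would reject or misread it.
        System.out.println(hex(encodeText(ae, StandardCharsets.ISO_8859_1)));  // C6
    }
}
```

Under that assumption, creating the writer with an explicit UTF-8 charset keeps the bytes consistent with the declaration, with no getBytes/new String round trip.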
Thanks for any help
Patrick