[jdom-interest] Encoding not working as expected - Copyright Symbol
Christian Cabanero
chumpboy at yahoo.com
Thu Jun 21 23:59:03 PDT 2001
First off, let me just say that JDom is DA BOMB DIGITY! Congrats on
building such a great product.
But unfortunately, something's been confusing me. I have an XML document
that is UTF-8 encoded and contains the (C) symbol encoded with UTF-8 (at
least I assume that it is). It shows up in my XML as...
    <?xml version="1.0" encoding="UTF-8"?>
    <article section="ECONOMIC">
      <copyright>\302\251 Copyright 2001 USA TODAY, a division of Gannett Co.
    Inc.</copyright>
    </article>
...the "\302\251" being the character for (C). I load this XML file using a
SAXBuilder and then just spit it right out again into another file using an
XMLOutputter like so...
    public static void main(String[] args) throws Exception {
        SAXBuilder builder = new SAXBuilder();
        // pass in the xml file containing the copyright symbol
        Document doc = builder.build(args[0]);
        XMLOutputter out = new XMLOutputter(" ", true, "UTF-8");
        FileWriter writer = new FileWriter("output.xml");
        out.output(doc, writer);
        writer.close();
    }
BUT, for some reason this modifies the XML data and messes up the copyright
symbol...
output.xml:
    <?xml version="1.0" encoding="UTF-8"?>
    <article section="ECONOMIC">
      <copyright>\251 Copyright 2001 USA TODAY, a division of Gannett Co.
    Inc.</copyright>
    </article>
What happened to the copyright symbol? Am I missing something?
Subsequently, if I try to read in the resulting output.xml file, I get a
JDOMException which reports: Character conversion error: "Unconvertible
UTF-8 character beginning with 0xa9" (line number may be too low).
Any help would be very much appreciated. I've been using JDom with a lot of
success so far and just hit this snag, but otherwise have found it to be an
exceptional product.
Thanks in advance!
-Christian Cabanero
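
One possible explanation, offered as a sketch rather than a confirmed
diagnosis: a FileWriter encodes characters with the platform default
encoding, not with the encoding named in the XML declaration. If that
default is ISO-8859-1 (an assumption about the machine in question), the
copyright character U+00A9 is written as the single byte \251 instead of
the UTF-8 pair \302\251, which would match the output shown above. The
example below uses only JDK classes; the class name CopyrightBytes and
its helper methods are hypothetical and exist only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CopyrightBytes {
    // Push text through a Writer bound to the given charset and
    // return the raw bytes that reach the underlying stream.
    public static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(sink, charset)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    // Render bytes in the backslash-octal notation used in the message.
    public static String octal(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append('\\').append(Integer.toOctalString(b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String copyright = "\u00A9";
        // UTF-8 needs two bytes for U+00A9.
        System.out.println(octal(encode(copyright, StandardCharsets.UTF_8)));
        // ISO-8859-1 (assumed platform default here) needs only one.
        System.out.println(octal(encode(copyright, StandardCharsets.ISO_8859_1)));
    }
}
```

If this is indeed the cause, wrapping the file in a writer with an
explicit encoding, for example
new OutputStreamWriter(new FileOutputStream("output.xml"), "UTF-8"),
should keep the bytes on disk consistent with the encoding declaration.
Handing the XMLOutputter an OutputStream directly may also work, if the
JDOM version in use offers such an overload.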