[jdom-interest] SAXbuilder and escape sequences
Luke Majewski
luke.majewski+jdom at gmail.com
Wed Oct 12 07:37:23 PDT 2005
Hi all,
I have scoured the web for a solution to this and I am stumped. I have an
XML file with elements like:
<pr type="US">&stress1;ræbit </pr>
When reading this in through SAXBuilder, I get question marks and
strange characters instead of the actual text.
Here is the code I am currently using. I figured it was an encoding issue,
but setting the encoding is not doing the trick:
SAXBuilder sb = new SAXBuilder("org.apache.crimson.parser.XMLReaderImpl");
InputSource is = new InputSource("file:///d:/workspace/OACD/OACD_rz.xml");
is.setEncoding("UTF-8");
sb.setEntityResolver(new EntityResolver() {
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        return new InputSource("file:///d:/workspace/oup-character-entities.ent");
    }
});
document = sb.build(is);
The XML header is:
<?xml version='1.0' encoding='UTF-8'?>
<?xml-stylesheet type='text/xsl' href="http://somestyle.xsl"?>
<!DOCTYPE dictionary SYSTEM "dictionary.dtd">
<dictionary xml:space='preserve'>
When I call getText() on the pr element, I get back "?r?bit".
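To rule out the console as the culprit, a small check like the one below could
show whether the string really contains question marks (U+003F) or whether the
characters are only lost when printed. This is just a sketch: CodePointDump and
dump are hypothetical names, not part of JDOM, and the string literals stand in
for the value returned by getText().

```java
public class CodePointDump {
    // Hypothetical helper: renders each code point of s as U+XXXX,
    // separated by single spaces.
    static String dump(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
            // Advance by one or two chars depending on the code point.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed stand-ins for the getText() result: the text with the
        // ae ligature intact, and the text as it shows up on my console.
        System.out.println(dump("r\u00E6bit"));
        System.out.println(dump("?r?bit"));
    }
}
```

If the dump of the real getText() value shows U+00E6, the parse is probably
fine and the damage happens on output; if it shows U+003F, the characters were
already lost before getText() returned.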
I assume I am missing something obvious; a pointer to the right section of
the documentation would be sufficient.
Thank you,
Luke Majewski