Hi all,<br>
<br>
I have scoured the web for a solution to this and I am stumped. I have an XML file with elements like:<br>
<br>
<div id="mb_1"><pr type="US">&stress1;r&aelig;bit</pr><br>
<div>
<br>
When I read this in through the SAXBuilder, I get question marks and strange characters instead of the actual text.<br>
<br>
Here is the code I am currently using. I figured it was an encoding issue, but setting the encoding is not doing the trick:<br>
<br>
<br>
SAXBuilder sb = new SAXBuilder("org.apache.crimson.parser.XMLReaderImpl");<br>
<br>
InputSource is = new InputSource("file:///d:/workspace/OACD/OACD_rz.xml");<br>
is.setEncoding("UTF-8");<br>
sb.setEntityResolver(new EntityResolver() {<br>
public InputSource resolveEntity(String publicId, String systemId) throws SAXException, IOException {<br>
return new InputSource("file:///d:/workspace/oup-character-entities.ent");<br>
}<br>
});<br>
document = sb.build(is);<br>
<br>
and the xml header is:<br>
<br>
<?xml version='1.0' encoding='UTF-8'?><br>
<?xml-stylesheet type='text/xsl' href="http://somestyle.xsl"?><br>
<!DOCTYPE dictionary SYSTEM "dictionary.dtd"><br>
<dictionary xml:space='preserve'><br>
<br>
What I get back when I call getText() on the pr element is "?r?bit".<br>
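<br>
One way to narrow this down is to check whether the characters are already lost in the parsed string or only when it is printed. The sketch below is a minimal, JDK-only illustration and not part of JDOM: the class and method names are hypothetical, the hard-coded string stands in for the value returned by getText(), and U+02C8 is only a guess at what the stress1 entity maps to, since the entity file is not shown here.<br>

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic helper; not part of JDOM or any XML library.
final class CodePointDump {

    // Renders each code point as U+XXXX, so the result does not depend on
    // how the console or log file encodes non-ASCII characters.
    static String describeCodePoints(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
        });
        return out.toString();
    }

    // Simulates writing the text through a charset and reading it back.
    // Characters the charset cannot represent are replaced on encoding.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // Assumption: stand-in for element.getText(); U+02C8 is a guess for &stress1;.
        String parsed = "\u02C8r\u00E6bit";

        // Safe to print anywhere, because the output is pure ASCII.
        System.out.println(describeCodePoints(parsed));

        // Shows what an ASCII-only output path would do to the same string.
        System.out.println(roundTrip(parsed, StandardCharsets.US_ASCII));
    }
}
```

If the dump of the real getText() value shows code points such as U+00E6, the document was parsed correctly and the question marks are probably introduced at the output step. If it already shows U+003F, the loss happens earlier, for example while the entity file is being read.<br>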
<br>
I assume I am missing something obvious, so a pointer to the
right section of the documentation would be sufficient.<br>
<br>
Thank you,<br><span>
<br>
Luke Majewski</span></div></div>