[jdom-interest] SAXbuilder and escape sequences
Tatu Saloranta
cowtowncoder at yahoo.com
Wed Oct 12 10:31:37 PDT 2005
--- Luke Majewski <luke.majewski+jdom at gmail.com>
wrote:
> Hi all,
>
> I have scoured the web for a solution to this and I
> am stumped. I have an
> xml file with elements like:
>
> <pr type="US">&stress1;ræbit </pr>
>
> When reading this in through the SAXbuilder, I get
> question marks and
> strange characters instead of the actual text.
>
> Here is the code I am currently using, I figured it
> was an issue of encoding
> but it's not doing the trick:
One thing you could check with respect to encoding is:
....
> sb.setEntityResolver(new EntityResolver() {
> public InputSource resolveEntity(String publicId,
> String systemId) throws
> SAXException, IOException {
> return new InputSource("file:///d:/workspace/oup-character-entities.ent");
> }
Note that here you are not forcing this input source
to use UTF-8 encoding. That may not be necessary (if
the file has an XML declaration, or the entities are
declared with numeric character references rather than
embedded UTF-8 chars, etc.), but you may want to look
into that file. At any rate, encodings are not
inherited via entity expansion (as far as I know).
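As a rough sketch of what forcing the encoding could look like, assuming the entity file really is UTF-8 (the class name Utf8EntityResolver is made up for illustration, and how much weight the parser gives to the declared encoding may vary between parsers):

```java
import java.io.IOException;

import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: always resolves to one entity file and tags it as UTF-8.
public class Utf8EntityResolver implements EntityResolver {
    private final String entityFileUri;

    public Utf8EntityResolver(String entityFileUri) {
        this.entityFileUri = entityFileUri;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        InputSource source = new InputSource(entityFileUri);
        // Assumption: the file is stored as UTF-8. Without this call the
        // parser has to work the encoding out from the file itself.
        source.setEncoding("UTF-8");
        return source;
    }
}
```

With the code from the original post this would be plugged in as sb.setEntityResolver(new Utf8EntityResolver("file:///d:/workspace/oup-character-entities.ent")).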
Also, are you sure the question marks are not just
produced by your outputter? It is possible that the
characters contained in the string are correct, but if
you print them to stdout, it may just use ISO-8859-1
(Latin-1, 8-bit) encoding, and might not have a way to
represent those chars. Or if you output to a file, you
also need to specify the encoding on the output side:
JDOM may not be able to pass the necessary encoding
(it doesn't have it, or you are passing a Writer that
has been constructed with defaults).
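To show how the output side alone can produce question marks, here is a small stand-alone sketch using only the JDK. The class OutputEncodingDemo is made up for illustration, and U+02C8 is just a stand-in for whatever &stress1; expands to in your entity file, which I have not seen:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: the same correct String written through two encodings.
public class OutputEncodingDemo {

    // Writes text through a Writer bound to the given charset and returns the bytes.
    static byte[] encode(String text, Charset charset) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, charset)) {
            writer.write(text);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // "r", U+00E6 (ae ligature), "bit ", then U+02C8 (assumed stress mark).
        String text = "r\u00e6bit \u02c8";

        byte[] latin1 = encode(text, StandardCharsets.ISO_8859_1);
        byte[] utf8 = encode(text, StandardCharsets.UTF_8);

        // Decode each byte array with the charset that produced it and compare.
        boolean latin1Intact = new String(latin1, StandardCharsets.ISO_8859_1).equals(text);
        boolean utf8Intact = new String(utf8, StandardCharsets.UTF_8).equals(text);

        System.out.println("ISO-8859-1 round trip intact: " + latin1Intact);
        System.out.println("UTF-8 round trip intact: " + utf8Intact);
    }
}
```

The String is fine in both cases; only the Latin-1 writer has no byte for U+02C8 and substitutes a question mark. The same applies when a Writer built with platform defaults is handed to the outputter.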
Hope this helps,
-+ Tatu +-