[jdom-interest] SAXBuilder.setExpandEntities

Jason Hunter jhunter at acm.org
Thu May 16 14:39:07 PDT 2002


> Our application provides another application with an XML document. The
> applications talk to each other over HTTP. The remote application stores the
> file and returns it at a later time.
> 
> The remote application first parses the XML document using JDOM and stores it
> in the file system. At a later stage, when it is requested for this file
> again, it gets the file from the file system, passes it through JDOM again
> and sends it over HTTP to our application.
> 
> Problem is that when the document is stored, the character entities have been
> interpreted by JDOM and are replaced with '?' question marks in the file. So
> we never get back what we originally sent. All character entities are
> replaced by question marks. Perhaps special characters should be replaced by
> character entities by the XMLOutputter then?

The default encoding in XML and JDOM is UTF-8, and in UTF-8 chars above
127 take at least 2 bytes.  That may make the file look odd when viewed
in an editor which doesn't know UTF-8, but an XML parser will read it
correctly!
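
To make that concrete, here is a small sketch using only the standard Java
library (no JDOM involved). The class and method names are made up for
illustration. It encodes "café" as UTF-8, then decodes the same bytes once as
Latin-1 (roughly what a non-UTF-8 editor does) and once as UTF-8 (what an XML
parser would do).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of JDOM.
class Utf8Demo {
    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Simulates a viewer that wrongly treats UTF-8 bytes as Latin-1.
    static String viewedAsLatin1(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Simulates a UTF-8 aware reader decoding the same bytes.
    static String viewedAsUtf8(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "café": 4 chars, the last one above 127
        System.out.println(utf8Length(text));                 // 5 bytes
        System.out.println(viewedAsLatin1(text).length());    // 5 chars, last two are junk
        System.out.println(viewedAsUtf8(text).equals(text));  // true
    }
}
```

The bytes on disk are the same in both cases; only the decoding differs.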

If you output using Latin-1 (ISO-8859-1) then chars 0-255 each fit in
1 byte, and many viewers will display the right thing since they often
know about Latin-1.
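
A quick sketch of the Latin-1 side, again with plain Java and a made-up class
name. It also shows one way literal '?' bytes can end up in a file: when
String.getBytes is asked for a charset that cannot represent a char, it
substitutes the charset's replacement byte, which for ISO-8859-1 is '?'.
Whether that is what happened in your setup is only a guess.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of JDOM.
class Latin1Demo {
    // Encodes the string as ISO-8859-1; unmappable chars become '?'.
    static byte[] toLatin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] eAcute = toLatin1("\u00e9"); // U+00E9 is inside 0-255
        byte[] euro = toLatin1("\u20ac");   // U+20AC is outside Latin-1

        System.out.println(eAcute.length);            // 1
        System.out.println(eAcute[0] == (byte) 0xE9); // true
        System.out.println(euro.length);              // 1
        System.out.println((char) euro[0]);           // ?
    }
}
```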

The main point, though, is that it doesn't really matter so long as your
consumer is an XML parser; it'll just work.  If you absolutely need the
chars escaped, then you can subclass XMLOutputter to escape any
characters you like.  A pluggable API where you can escape char ranges
is on the plan for XMLOutputter.
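
For the escaping itself, something along these lines would do. This is a
standalone helper with a hypothetical name, not XMLOutputter code and not the
planned pluggable API; how you wire it into a subclass depends on your JDOM
version. It only turns chars above a cutoff into numeric character references
and does not handle markup characters such as & or <.

```java
// Hypothetical helper; not part of JDOM.
class CharRefEscaper {
    // Replaces every code point above maxPlain with a decimal
    // character reference such as &#233; and leaves the rest untouched.
    static String escapeAbove(String text, int maxPlain) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp > maxPlain) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
            // Step by 1 or 2 chars so surrogate pairs stay together.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Escape everything outside 7-bit ASCII.
        System.out.println(escapeAbove("caf\u00e9", 127)); // prints caf&#233;
    }
}
```

With a cutoff of 127 the output is pure ASCII, so it survives whatever
encoding the file ends up in; with 255 it leaves Latin-1 chars alone.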

-jh-


