[jdom-interest] SAXBuilder.setExpandEntities
Jason Hunter
jhunter at acm.org
Thu May 16 14:39:07 PDT 2002
> Our application provides another application with an XML document. The
> applications talk to each other over HTTP. The remote application stores the
> file and returns it at a later time.
>
> The remote application first parses the XML document using JDOM and stores it
> in the file system. At a later stage, when it is requested for this file
> again, it gets the file from the file system, passes it through JDOM again
> and sends it over HTTP to our application.
>
> Problem is that when the document is stored, the character entities have been
> interpreted by JDOM and are replaced with '?' question marks in the file. So
> we never get back what we originally sent. All character entities are
> replaced by question marks. Perhaps special characters should be replaced by
> character entities by the XMLOutputter then?
The default encoding in XML and JDOM is UTF-8, and in UTF-8 chars above
127 require 2 or more bytes. That may make it look odd when viewed in an
editor which doesn't know UTF-8, but an XML parser will read it
correctly!
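To make the byte counts concrete, here is a minimal sketch in plain Java (no JDOM involved). The class and method names are made up for illustration, and it uses java.nio.charset.StandardCharsets, so it assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

// Illustrative helper only; not part of JDOM.
class Utf8ByteCount {
    // Number of bytes the given text occupies once encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("e"));       // plain ASCII letter
        System.out.println(utf8Length("\u00E9"));  // e with acute accent, U+00E9
        System.out.println(utf8Length("\u20AC"));  // euro sign, U+20AC
    }
}
```

An editor that assumes a single-byte encoding will show each of those extra bytes as a separate odd-looking character, which is the effect described above.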
If you output using Latin-1 (ISO-8859-1) then chars 0-255 each fit in
1 byte, and many viewers will display the right thing since they often
know about Latin-1.
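The same kind of check for Latin-1, again as a rough sketch in plain Java with a made-up class name: a char in the 128-255 range comes out as exactly one byte whose value equals its code point.

```java
import java.nio.charset.StandardCharsets;

// Illustrative helper only; not part of JDOM.
class Latin1Bytes {
    // Encodes the text as ISO-8859-1 (Latin-1).
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] bytes = encode("\u00E9");  // e with acute accent, U+00E9
        // Mask with 0xFF because Java bytes are signed.
        System.out.println(bytes.length + " byte, value 0x"
                + Integer.toHexString(bytes[0] & 0xFF));
    }
}
```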
The main point, though, is that it doesn't really matter so long as your
consumer is an XML parser; it'll just work. If you absolutely need the
chars escaped, then you can subclass XMLOutputter to escape any
characters you like. A pluggable API for escaping char ranges is on the
plan for XMLOutputter.
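For the escaping itself, something along these lines could do the work. This is only a sketch: CharRefEscaper and escapeAbove are hypothetical names, not JDOM API, and which XMLOutputter method a subclass would override to call it depends on the JDOM version, so that part is not shown. It also leaves markup characters such as & and < alone, on the assumption that the outputter already handles those.

```java
// Hypothetical helper; not part of JDOM.
class CharRefEscaper {
    // Replaces every code point above maxPlain with a decimal numeric
    // character reference such as &#233; and copies the rest unchanged.
    static String escapeAbove(String text, int maxPlain) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp > maxPlain) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
            // Advance by 1 or 2 chars so surrogate pairs stay together.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Escape everything outside 7-bit ASCII.
        System.out.println(escapeAbove("caf\u00E9", 127));
    }
}
```

With a threshold of 127 the output is pure ASCII, so it survives any viewer or transport; a threshold of 255 would suit Latin-1 output instead.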
-jh-