[jdom-interest] Yet another TODO (entity escapes)
Joseph Bowbeer
jozart at csi.com
Tue Jun 19 15:48:05 PDT 2001
I think EntityRef is only part of the story. Here's another thread to
review:
Attribute.getSerializedForm bug [eg]
http://lists.denveronline.net/lists/jdom-interest/2001-April/005644.html
http://lists.denveronline.net/lists/jdom-interest/2001-April/005649.html
http://lists.denveronline.net/lists/jdom-interest/2001-April/005669.html
-- original message --
[jdom-interest] Yet another TODO (entity escapes)
From: alex at jguru.com
Date: Tue, 19 Jun 2001 10:12:18 -0700
* Figure out how to deal with XMLOutputter writing of special characters
like &nbsp;. Should it char escape only chars unprintable in the current
character set? Or should there be a fancy API for selecting what's
escaped?
http://lists.denveronline.net/lists/jdom-interest/2001-February/004521.html
It seems to me that this is a parser issue. If XMLOutputter is passed
an EntityRef containing the special character code, it outputs it as
an entity reference.
Element funky = new Element("funky")
    .addContent( new EntityRef("#x2022") )
    .addContent("Bullet one")
    .addContent(" ")
    .addContent( new EntityRef("#160") );

Passing funky to XMLOutputter produces
<funky>&#x2022;Bullet one &#160;</funky>, as expected.
However, the SAX parser expands &#xxx; character references into the
corresponding Unicode characters, *even when you call setExpandEntities(false)*.
Sounds like either a bug or a design flaw in SAX parsers, or in the
SAXBuilder (which I haven't looked closely at). Shouldn't they return
EntityRef objects?
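To see what SAXBuilder has to work with, here is a minimal sketch that uses the JDK's built-in JAXP SAX parser directly rather than JDOM (CharRefDemo and parseText are made-up names for illustration). A plain ContentHandler only ever sees the references as already-resolved characters through characters(), so at that level &#160; and a literal U+00A0 look identical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: parses a document with the JDK's JAXP SAX parser
// and collects every chunk of text the ContentHandler is given.
public class CharRefDemo {
    public static String parseText(String xml) throws Exception {
        StringBuilder text = new StringBuilder();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // Character references arrive here already resolved.
                text.append(ch, start, length);
            }
        });
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String text = parseText("<funky>&#x2022;Bullet one &#160;</funky>");
        // The references are gone: only U+2022 and U+00A0 remain.
        System.out.println(text.equals("\u2022Bullet one \u00A0")); // true
    }
}
```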
XMLOutputter is faithfully outputting what it was given; if it's a
high Unicode value inside a Java String, then Java takes care of
converting it to the right bytes for the stream's output encoding.
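A quick illustration of that conversion, assuming the output goes through a java.io.OutputStreamWriter (EncodingDemo, encode and hex are hypothetical names). U+2022 becomes the three bytes E2 80 A2 in UTF-8, but it has no ISO-8859-1 mapping, so the writer silently substitutes its replacement byte 0x3F ('?'). That substitution is the "unprintable in the current character set" case the TODO item is worried about.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Writes a Java String through an OutputStreamWriter and returns the raw bytes.
    public static byte[] encode(String s, Charset charset) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(bytes, charset)) {
            w.write(s);
        }
        return bytes.toByteArray();
    }

    // Formats bytes as space-separated uppercase hex, e.g. "E2 80 A2".
    public static String hex(byte[] b) {
        StringBuilder sb = new StringBuilder();
        for (byte x : b) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", x & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String bullet = "\u2022";
        System.out.println(hex(encode(bullet, StandardCharsets.UTF_8)));      // E2 80 A2
        System.out.println(hex(encode(bullet, StandardCharsets.ISO_8859_1))); // 3F ('?')
    }
}
```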
OTOH, if someone wants to make sure that all non-ASCII characters turn
into their corresponding character references on the way out, or vice
versa, then that's a good use for a filter stream.
Either way, I think we can check this one off the todo list, at least
for XMLOutputter.