[jdom-interest] Not legal JDOM characters?
Elliotte Rusty Harold
elharo at metalab.unc.edu
Mon Aug 23 12:48:38 PDT 2004
At 11:18 AM -0700 8/23/04, Dave Byrne wrote:
>The data "?" is not legal for a JDOM character content: 0xd835 is not a
>legal XML character.
Notice that this error message does not reference the character
you're including. Furthermore, 0xd835 is a Unicode high surrogate. I
therefore surmise that this is a bug in JDOM, where JDOM is not
decoding surrogate pairs before checking them in the Verifier. Looking
at the source, my surmise is correct:
    for (int i = 0, len = text.length(); i<len; i++) {
        if (!isXMLCharacter(text.charAt(i))) {
            // Likely this character can't be easily displayed
            // because it's a control so we use its hexadecimal
            // representation in the reason.
            return ("0x" + Integer.toHexString(text.charAt(i))
                    + " is not a legal XML character");
        }
    }
JDOM should be recognizing that this character is half of a surrogate
pair, decoding the surrogate pair, and checking the resulting code
point. That it's failing to do so is a bug.
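
A minimal sketch of what a surrogate-aware check could look like, not
JDOM's actual fix. It assumes Java 5 or later for String.codePointAt
and Character.charCount. The class name SurrogateAwareCheck and the
int-based isXMLCharacter are hypothetical, written here only to
illustrate decoding the pair before testing it against the XML 1.0
Char production.

```java
public class SurrogateAwareCheck {

    // XML 1.0 Char production, applied to a full Unicode code point
    // instead of a single 16-bit char. (Hypothetical helper.)
    static boolean isXMLCharacter(int c) {
        if (c == 0x9 || c == 0xA || c == 0xD) return true;
        if (c >= 0x20 && c <= 0xD7FF) return true;
        if (c >= 0xE000 && c <= 0xFFFD) return true;
        return c >= 0x10000 && c <= 0x10FFFF;
    }

    // Returns null if the text is legal, otherwise a reason string.
    static String checkCharacterData(String text) {
        if (text == null) {
            return "A null is not a legal XML value";
        }
        for (int i = 0, len = text.length(); i < len; ) {
            // codePointAt combines a valid high/low surrogate pair into
            // one code point; an unpaired surrogate is returned as-is
            // and then rejected by isXMLCharacter.
            int cp = text.codePointAt(i);
            if (!isXMLCharacter(cp)) {
                return "0x" + Integer.toHexString(cp)
                        + " is not a legal XML character";
            }
            // Advance by 2 chars for a supplementary code point, else 1.
            i += Character.charCount(cp);
        }
        return null;
    }

    public static void main(String[] args) {
        // U+1D538 is encoded in UTF-16 as 0xD835 0xDD38.
        String pair = new String(Character.toChars(0x1D538));
        System.out.println(checkCharacterData(pair));     // legal pair
        System.out.println(checkCharacterData("\uD835")); // lone high surrogate
    }
}
```

With this approach a well-formed pair beginning with 0xd835 passes,
while a lone 0xd835 is still reported as illegal.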
--
Elliotte Rusty Harold
elharo at metalab.unc.edu
Effective XML (Addison-Wesley, 2003)
http://www.cafeconleche.org/books/effectivexml
http://www.amazon.com/exec/obidos/ISBN%3D0321150406/ref%3Dnosim/cafeaulaitA