[jdom-interest] Not legal JDOM characters?
Dave Byrne
dave-lists at intelligentendeavors.com
Mon Aug 23 14:54:34 PDT 2004
In Verifier.isXMLCharacter() I added:
if (c >= 0xD800 && c <= 0xDBFF) return true;
if (c >= 0xDC00 && c <= 0xDFFF) return true;
which seems to work for me, but is definitely not the correct way to go
about it.
I looked into the surrogate pair decoding that you brought up, and from what
I can tell from the J2SE docs, using String.charAt() will always split
surrogate pairs in half.
I'm not familiar with the low-level handling of UTF-16 in Java, but is there
a way to examine strings char by char without splitting the surrogate pairs
in half? It seems that it may offer a better long-term way of handling
these chars.
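One possible answer, sketched here under the assumption that a Java 5 or later runtime is available: String.codePointAt() and Character.charCount() let a loop step through a string one code point at a time, so a valid surrogate pair is read as a single value. The class and method names below (CodePointWalk, codePoints) are hypothetical and exist only for illustration; they are not part of JDOM.

```java
import java.util.ArrayList;
import java.util.List;

class CodePointWalk {
    // Collects the Unicode code points of a string. A valid surrogate
    // pair is returned as one code point instead of two separate chars.
    static List<Integer> codePoints(String text) {
        List<Integer> result = new ArrayList<Integer>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);   // decodes a pair if one starts at i
            result.add(cp);
            i += Character.charCount(cp);   // advances by 1 or 2 chars
        }
        return result;
    }

    public static void main(String[] args) {
        // 0xD835 0xDC00 is the surrogate pair for U+1D400.
        String s = "a\uD835\uDC00b";
        for (int cp : codePoints(s)) {
            System.out.println("U+" + Integer.toHexString(cp).toUpperCase());
        }
    }
}
```

Note that codePointAt() returns an unpaired surrogate unchanged, so a caller that needs to reject lone surrogates still has to test for them.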
Thanks
Dave Byrne
-----Original Message-----
From: Elliotte Rusty Harold [mailto:elharo at metalab.unc.edu]
Sent: Monday, August 23, 2004 12:49 PM
To: Dave Byrne
Cc: jdom-interest at jdom.org
Subject: Re: [jdom-interest] Not legal JDOM characters?
At 11:18 AM -0700 8/23/04, Dave Byrne wrote:
>The data "?" is not legal for a JDOM character content: 0xd835 is not a
>legal XML character.
Notice that this error message does not reference the character
you're including. Furthermore, 0xd835 is a Unicode high surrogate. I
therefore surmise that this is a bug in JDOM, where JDOM is not
decoding surrogate pairs before passing them to the Verifier. Looking
at the source, my surmise is correct:
    for (int i = 0, len = text.length(); i<len; i++) {
        if (!isXMLCharacter(text.charAt(i))) {
            // Likely this character can't be easily displayed
            // because it's a control so we use its hexadecimal
            // representation in the reason.
            return ("0x" + Integer.toHexString(text.charAt(i))
                    + " is not a legal XML character");
        }
    }
JDOM should be recognizing that this character is half of a surrogate
pair, decoding the surrogate pair, and checking that. That it's
failing to do so is a bug.
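A rough sketch of what such a check could look like follows. It is not the actual JDOM fix. The class name SurrogateAwareCheck, the method checkCharacterData, and the int-based isXMLCharacter are all hypothetical stand-ins, and the character ranges are taken from the Char production in XML 1.0. The pair is decoded by hand, so nothing here depends on APIs newer than J2SE 1.4.

```java
class SurrogateAwareCheck {
    // XML 1.0 Char production, applied to a full code point
    // (hypothetical int-based variant of the Verifier method).
    static boolean isXMLCharacter(int cp) {
        if (cp == 0x9 || cp == 0xA || cp == 0xD) return true;
        if (cp >= 0x20 && cp <= 0xD7FF) return true;
        if (cp >= 0xE000 && cp <= 0xFFFD) return true;
        return cp >= 0x10000 && cp <= 0x10FFFF;
    }

    static boolean isHighSurrogate(char c) { return c >= 0xD800 && c <= 0xDBFF; }

    static boolean isLowSurrogate(char c) { return c >= 0xDC00 && c <= 0xDFFF; }

    // Returns null if every character is legal, otherwise a reason string.
    static String checkCharacterData(String text) {
        for (int i = 0, len = text.length(); i < len; i++) {
            char c = text.charAt(i);
            int cp = c;
            if (isHighSurrogate(c) && i + 1 < len
                    && isLowSurrogate(text.charAt(i + 1))) {
                // Combine the two halves into one code point and skip the low half.
                cp = 0x10000 + ((c - 0xD800) << 10) + (text.charAt(i + 1) - 0xDC00);
                i++;
            }
            // An unpaired surrogate keeps its raw value and is rejected here.
            if (!isXMLCharacter(cp)) {
                return "0x" + Integer.toHexString(cp)
                        + " is not a legal XML character";
            }
        }
        return null;
    }
}
```

With this approach a well-formed pair such as 0xD835 0xDC00 passes, while a lone 0xD835 is still reported as illegal, which simply whitelisting both surrogate ranges would not do.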
--
Elliotte Rusty Harold
elharo at metalab.unc.edu
Effective XML (Addison-Wesley, 2003)
http://www.cafeconleche.org/books/effectivexml
http://www.amazon.com/exec/obidos/ISBN%3D0321150406/ref%3Dnosim/cafeaulaitA