[jdom-interest] Possible inconsistency in Verifier.isXMLCharacter()
Elliotte Rusty Harold
elharo at metalab.unc.edu
Fri Apr 11 04:41:23 PDT 2003
At 5:47 PM -0400 4/10/03, Rolf Lear wrote:
> Now, according to the Java spec, chars have values 0x0000 through 0xFFFF
> (http://java.sun.com/docs/books/jls/second_edition/html/typesValues.doc.html#9151).
> Thus, the line:
>
>     if (c < 0x10000) return false; if (c <= 0x10FFFF) return true;
>
> is redundant until there is a Java with chars wider than 2 bytes.
> So whatever characters are meant to be in the range 0x10000 through
> 0x10FFFF will never validate.
There's an impedance mismatch between Java and XML here, and JDOM
suffers as a result. Java chars are not Unicode characters. We are
currently checking Java chars as you note, though the code is really
designed to handle Unicode characters. Looking at it now, there's a
major bug here: we reject all surrogate characters, which means that
we reject all characters beyond the basic multilingual plane,
including musical notation, mathematical symbols, various obscure
parts of Chinese, and more.
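
To make the failure concrete, here is a minimal sketch of a per-char check along the lines described above. The class and method names are hypothetical, the ranges are taken from the XML 1.0 Char production, and this is not JDOM's actual source. It assumes a Java version that provides Character.toChars(int). A character beyond the basic multilingual plane, such as U+1D11E MUSICAL SYMBOL G CLEF, reaches the check as two surrogate chars, and each one is rejected on its own.

```java
class PerCharCheckDemo {
    // Hypothetical per-char check modeled on the XML 1.0 Char production.
    // Illustration only; not copied from JDOM.
    static boolean isXMLCharacter(char c) {
        if (c == 0x9 || c == 0xA || c == 0xD) return true;
        if (c < 0x20) return false;
        if (c <= 0xD7FF) return true;
        if (c < 0xE000) return false; // surrogates 0xD800-0xDFFF are rejected
        // A further test against 0x10000-0x10FFFF could never succeed,
        // because a Java char cannot exceed 0xFFFF.
        return c <= 0xFFFD;
    }

    public static void main(String[] args) {
        // U+1D11E is stored in a String as a surrogate pair (two chars).
        String clef = new String(Character.toChars(0x1D11E));
        System.out.println("chars in string: " + clef.length());
        for (int i = 0; i < clef.length(); i++) {
            char c = clef.charAt(i);
            System.out.println(Integer.toHexString(c) + " accepted? " + isXMLCharacter(c));
        }
    }
}
```

Both halves of the pair fall in the surrogate range, so a check that only ever sees one char at a time has no way to accept the character they encode together.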
Probably the way this should be handled is by verifying only strings.
The strings can be decoded into Unicode characters represented as
ints rather than Java chars and then the ints can be verified. This
is the way XOM does it.
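
A rough sketch of that string-based approach follows. It is not XOM's or JDOM's code: the class and method names are hypothetical, and it relies on String.codePointAt and Character.charCount, which appeared in Java releases later than this message. The idea is to walk the string one Unicode code point at a time, so that a valid surrogate pair is combined into a single int before it is checked.

```java
class StringVerifierSketch {
    // Checks one Unicode code point against the XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXMLCodePoint(int cp) {
        if (cp == 0x9 || cp == 0xA || cp == 0xD) return true;
        if (cp < 0x20) return false;
        if (cp <= 0xD7FF) return true;
        if (cp < 0xE000) return false;  // unpaired surrogate
        if (cp <= 0xFFFD) return true;
        if (cp < 0x10000) return false; // 0xFFFE and 0xFFFF
        return cp <= 0x10FFFF;
    }

    // Hypothetical string-level check: decode to code points, then verify each.
    static boolean isXMLText(String s) {
        for (int i = 0; i < s.length(); ) {
            // codePointAt joins a valid surrogate pair into one int;
            // an unpaired surrogate comes back as its own value and is rejected.
            int cp = s.codePointAt(i);
            if (!isXMLCodePoint(cp)) return false;
            i += Character.charCount(cp); // advance by 1 or 2 chars
        }
        return true;
    }

    public static void main(String[] args) {
        String clef = new String(Character.toChars(0x1D11E));
        System.out.println("G clef accepted? " + isXMLText(clef));
        System.out.println("lone surrogate accepted? " + isXMLText("\uD834"));
    }
}
```

Because the range test runs on ints, the 0x10000 through 0x10FFFF branch is reachable here, while a surrogate that is not part of a valid pair is still refused.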
--
+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo at metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
| Processing XML with Java (Addison-Wesley, 2002) |
| http://www.cafeconleche.org/books/xmljava |
| http://www.amazon.com/exec/obidos/ISBN%3D0201771861/cafeaulaitA |
+----------------------------------+---------------------------------+
| Read Cafe au Lait for Java News: http://www.cafeaulait.org/ |
| Read Cafe con Leche for XML News: http://www.cafeconleche.org/ |
+----------------------------------+---------------------------------+