[jdom-interest] Possible inconsistency in Verifier.isXMLCharacter()

Elliotte Rusty Harold elharo at metalab.unc.edu
Fri Apr 11 04:41:23 PDT 2003


At 5:47 PM -0400 4/10/03, Rolf Lear wrote:

Now, according to Java spec, chars have value 0x0000 through 0xffff 
(http://java.sun.com/docs/books/jls/second_edition/html/typesValues.doc.html#9151)

Thus, the line:

if (c < 0x10000) return false;  if (c <= 0x10FFFF) return true;

is redundant until there is a Java whose chars are wider than 2 bytes.


So, whatever characters are meant to be in the range 0x10000 through 
0x10FFFF, they will never validate.


There's an impedance mismatch between Java and XML here, and JDOM 
suffers as a result. Java chars are 16-bit UTF-16 code units, not 
Unicode characters. We are currently checking Java chars, as you 
note, though the code is really 
designed to handle Unicode characters. Looking at it now, there's a 
major bug here: we reject all surrogate characters, which means that 
we reject all characters beyond the basic multilingual plane, 
including musical notation, mathematical symbols, various obscure 
parts of Chinese, and more.
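
To make the surrogate problem concrete, here is a minimal sketch of 
how one character beyond the basic multilingual plane turns into two 
Java chars. The class and method names are hypothetical and are not 
part of JDOM; the arithmetic is the standard UTF-16 encoding.

```java
// Illustrative sketch only: SurrogateDemo and toSurrogates are hypothetical names.
final class SurrogateDemo {

    // Splits a supplementary code point (U+10000 through U+10FFFF)
    // into its UTF-16 high and low surrogate chars.
    static char[] toSurrogates(int codePoint) {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        int offset = codePoint - 0x10000;               // 20 significant bits
        char high = (char) (0xD800 + (offset >> 10));   // top 10 bits
        char low = (char) (0xDC00 + (offset & 0x3FF));  // bottom 10 bits
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        // U+1D11E is MUSICAL SYMBOL G CLEF, one of the musical notation characters.
        char[] pair = toSurrogates(0x1D11E);
        // Prints: U+1D11E -> D834 DD1E
        System.out.printf("U+1D11E -> %04X %04X%n", (int) pair[0], (int) pair[1]);
    }
}
```

Both halves fall in 0xD800 through 0xDFFF, which is not a legal XML 
character range, so a check applied to each char on its own rejects 
them.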

Probably the way this should be handled is by verifying only strings. 
The strings can be decoded into Unicode characters represented as 
ints rather than Java chars and then the ints can be verified. This 
is the way XOM does it.
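
A rough sketch of that string-based approach follows. It is not the 
XOM or JDOM code; the class and method names are made up for 
illustration, and the ranges are the Char production from the XML 1.0 
specification.

```java
// Illustrative sketch only: XmlStringCheck and its methods are hypothetical names.
final class XmlStringCheck {

    // XML 1.0 Char production, applied to a full Unicode code point held in an int.
    static boolean isXmlCodePoint(int cp) {
        if (cp == 0x9 || cp == 0xA || cp == 0xD) return true;
        if (cp >= 0x20 && cp <= 0xD7FF) return true;
        if (cp >= 0xE000 && cp <= 0xFFFD) return true;
        return cp >= 0x10000 && cp <= 0x10FFFF;
    }

    // Returns the index of the first char that starts an illegal or
    // malformed character, or -1 if the whole string is legal XML text.
    static int firstIllegalIndex(String s) {
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            int cp = c;
            int width = 1;
            if (c >= 0xD800 && c <= 0xDBFF) {
                // A high surrogate must be followed by a low surrogate.
                if (i + 1 >= s.length()) return i;
                char low = s.charAt(i + 1);
                if (low < 0xDC00 || low > 0xDFFF) return i;
                // Combine the pair into one code point.
                cp = 0x10000 + ((c - 0xD800) << 10) + (low - 0xDC00);
                width = 2;
            }
            // An unpaired low surrogate falls through and fails this check.
            if (!isXmlCodePoint(cp)) return i;
            i += width;
        }
        return -1;
    }
}
```

With this shape, a well-formed surrogate pair is accepted as a single 
character in the 0x10000 through 0x10FFFF range, while an unpaired 
surrogate is still rejected.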


