[jdom-interest] XML Element name Verifier is overly strict and doesn't match current XML 1.0 REC

Klotz, Leigh Leigh.Klotz at xerox.com
Thu Mar 19 14:50:34 PDT 2009


JDOM 1.1 won't create elements whose names contain characters in the
following ranges:
  Unicode 0xFF41-0xFF5A (FULLWIDTH LATIN SMALL LETTER A to FULLWIDTH
LATIN SMALL LETTER Z)
  Unicode 0xFF21-0xFF3A (FULLWIDTH LATIN CAPITAL LETTER A to FULLWIDTH
LATIN CAPITAL LETTER Z)

The JDOM 1.1 source for org.jdom.Verifier.isXMLLetter cites production
84 of the XML 1.0 Recommendation for its table of allowed characters. 

However, according to http://www.w3.org/TR/REC-xml/ the whole of
Appendix B (which contains Production 84) is obsolete and is not used
within the recommendation.  The XML Rec instead uses production [4] for
NameStartChar and [4a] for NameChar, with [5] defining Name in terms of
those two.

The productions at [4] and [4a] are considerably smaller than those of
Appendix B, and are more inclusive, providing for greater utility in
I18N applications of XML.
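
As a rough illustration, the Fifth Edition productions could be checked
with something like the Java sketch below.  The class name XmlNameChars
and its method names are hypothetical and are not part of JDOM; the
ranges are transcribed from productions [4] and [4a], and should be
checked against the Recommendation before being relied on.

```java
// Sketch of XML 1.0 Fifth Edition productions [4], [4a] and [5].
// XmlNameChars is a hypothetical name, not an existing JDOM class.
final class XmlNameChars {

    // Inclusive code point ranges for NameStartChar, production [4].
    private static final int[][] NAME_START_RANGES = {
        {':', ':'}, {'A', 'Z'}, {'_', '_'}, {'a', 'z'},
        {0xC0, 0xD6}, {0xD8, 0xF6}, {0xF8, 0x2FF},
        {0x370, 0x37D}, {0x37F, 0x1FFF}, {0x200C, 0x200D},
        {0x2070, 0x218F}, {0x2C00, 0x2FEF}, {0x3001, 0xD7FF},
        {0xF900, 0xFDCF}, {0xFDF0, 0xFFFD}, {0x10000, 0xEFFFF}
    };

    // Extra ranges that production [4a] allows after the first character.
    private static final int[][] NAME_EXTRA_RANGES = {
        {'-', '-'}, {'.', '.'}, {'0', '9'}, {0xB7, 0xB7},
        {0x0300, 0x036F}, {0x203F, 0x2040}
    };

    private static boolean inRanges(int cp, int[][] ranges) {
        for (int[] r : ranges) {
            if (cp >= r[0] && cp <= r[1]) {
                return true;
            }
        }
        return false;
    }

    // Production [4]: NameStartChar.
    static boolean isNameStartChar(int cp) {
        return inRanges(cp, NAME_START_RANGES);
    }

    // Production [4a]: NameChar ::= NameStartChar | extra ranges.
    static boolean isNameChar(int cp) {
        return isNameStartChar(cp) || inRanges(cp, NAME_EXTRA_RANGES);
    }

    // Production [5]: Name ::= NameStartChar (NameChar)*.
    // Iterates by code point so supplementary characters are handled.
    static boolean isName(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!isNameStartChar(first)) {
            return false;
        }
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isNameChar(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        // U+FF41 is FULLWIDTH LATIN SMALL LETTER A.
        System.out.println(isNameStartChar(0xFF41));
        System.out.println(isName("\uFF41\uFF42\uFF43"));
    }
}
```

Under these ranges the fullwidth letters fall inside [#xFDF0-#xFFFD],
so both calls in main report them as acceptable.  Note that this sketch
includes ':' as production [4] does; a verifier for element local names
in a namespace-aware API would presumably still want to treat the colon
separately.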

Furthermore, according to http://www.w3.org/TR/REC-xml/ Appendix J
(Non-Normative), the characters I mention above are not only allowed,
but encouraged for use in XML Names, because the Unicode ID_Start and
ID_Continue properties of these code points are both True.

The XML REC says:

    1. The first character of any name should have a Unicode property of
ID_Start, or else be '_' #x5F.
    2. Characters other than the first should have a Unicode property of
ID_Continue, or ...

You can see that ID_Start and ID_Continue are True on the individual
pages for the small letters here:
http://unicode.org/cldr/utility/character.jsp?a=FF41
to
http://unicode.org/cldr/utility/character.jsp?a=FF5A
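
The same check can be approximated locally in Java.  This is only a
sketch: FullwidthIdCheck is a hypothetical class name, and it assumes
that java.lang.Character.isUnicodeIdentifierStart and
isUnicodeIdentifierPart are close enough stand-ins for the Unicode
ID_Start and ID_Continue properties.  The results depend on the Unicode
tables shipped with the JDK in use.

```java
// Approximate ID_Start / ID_Continue check using java.lang.Character.
// FullwidthIdCheck is a hypothetical name used only for illustration.
final class FullwidthIdCheck {

    // True if every code point in [first, last] is accepted by both
    // isUnicodeIdentifierStart and isUnicodeIdentifierPart.
    static boolean allIdStartAndContinue(int first, int last) {
        for (int cp = first; cp <= last; cp++) {
            if (!Character.isUnicodeIdentifierStart(cp)
                    || !Character.isUnicodeIdentifierPart(cp)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // FULLWIDTH LATIN SMALL LETTER A to Z
        System.out.println(allIdStartAndContinue(0xFF41, 0xFF5A));
        // FULLWIDTH LATIN CAPITAL LETTER A to Z
        System.out.println(allIdStartAndContinue(0xFF21, 0xFF3A));
    }
}
```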

I recommend that org.jdom.Verifier.isXMLLetter be updated to use
productions [4], [4a], and [5] of XML 1.0 Fifth Edition.
It's quite likely that some of the other character class verifiers need
updating as well, but I didn't examine them.

Leigh.


