[jdom-interest] XML Element name Verifier is overly strict and
doesn't match current XML 1.0 REC
Klotz, Leigh
Leigh.Klotz at xerox.com
Thu Mar 26 10:30:15 PDT 2009
I agree.
Thank you both for researching the issue and for getting the list back
up.
Leigh.
-----Original Message-----
From: Jason Hunter [mailto:jhunter at servlets.com]
Sent: Saturday, March 21, 2009 7:55 PM
To: jdom interest
Cc: Klotz, Leigh
Subject: Re: [jdom-interest] XML Element name Verifier is overly strict
and doesn't match current XML 1.0 REC
Note that this was debated at some length on xml-dev (while our list was
down):
http://markmail.org/message/wqcmohlf7srpqhkl
The general consensus seems to be that the current behavior is the lesser
of two evils.
-jh-
On Mar 19, 2009, at 2:50 PM, Klotz, Leigh wrote:
> JDOM 1.1 won't create elements whose names contain characters in the
> following ranges:
>
>   Unicode 0xFF41-0xFF5A (FULLWIDTH LATIN SMALL LETTER A to FULLWIDTH
>   LATIN SMALL LETTER Z)
>   Unicode 0xFF21-0xFF3A (FULLWIDTH LATIN CAPITAL LETTER A to FULLWIDTH
>   LATIN CAPITAL LETTER Z)
>
> The JDOM 1.1 source for org.jdom.Verifier.isXMLLetter cites production
> 84 of the XML 1.0 Recommendation for its table of allowed characters.
>
> However, according to http://www.w3.org/TR/REC-xml/ the whole of
> Appendix B (which contains Production 84) is obsolete and is not used
> within the Recommendation. The XML Rec instead uses production [4]
> for NameStartChar and [4a] for NameChar (combined in [5] for Name).
>
> The productions at [4] and [4a] are considerably shorter than the
> Appendix B tables, yet more inclusive, providing greater utility in
> I18N applications of XML.
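For concreteness, here is a minimal Java sketch of productions [4] NameStartChar, [4a] NameChar and [5] Name as written in XML 1.0 Fifth Edition. The class and method names (XmlNameSketch, isNameStartChar, isNameChar, isName) are hypothetical and not part of JDOM's Verifier. It works on code points rather than chars so that the supplementary range #x10000-#xEFFFF is covered and lone surrogates are rejected.

```java
/**
 * Hypothetical sketch of XML 1.0 Fifth Edition name productions.
 * Not JDOM's API; illustrates the character ranges only.
 */
public final class XmlNameSketch {

    // [4] NameStartChar
    static boolean isNameStartChar(int c) {
        return c == ':' || (c >= 'A' && c <= 'Z') || c == '_' || (c >= 'a' && c <= 'z')
            || (c >= 0xC0 && c <= 0xD6) || (c >= 0xD8 && c <= 0xF6)
            || (c >= 0xF8 && c <= 0x2FF) || (c >= 0x370 && c <= 0x37D)
            || (c >= 0x37F && c <= 0x1FFF) || (c >= 0x200C && c <= 0x200D)
            || (c >= 0x2070 && c <= 0x218F) || (c >= 0x2C00 && c <= 0x2FEF)
            || (c >= 0x3001 && c <= 0xD7FF) || (c >= 0xF900 && c <= 0xFDCF)
            || (c >= 0xFDF0 && c <= 0xFFFD) || (c >= 0x10000 && c <= 0xEFFFF);
    }

    // [4a] NameChar ::= NameStartChar | "-" | "." | [0-9] | #xB7
    //                   | [#x0300-#x036F] | [#x203F-#x2040]
    static boolean isNameChar(int c) {
        return isNameStartChar(c) || c == '-' || c == '.' || (c >= '0' && c <= '9')
            || c == 0xB7 || (c >= 0x300 && c <= 0x36F) || (c >= 0x203F && c <= 0x2040);
    }

    // [5] Name ::= NameStartChar (NameChar)*
    static boolean isName(String s) {
        if (s == null || s.isEmpty()) return false;
        int first = s.codePointAt(0);
        if (!isNameStartChar(first)) return false;
        // Walk by code point so surrogate pairs are checked as one character.
        for (int i = Character.charCount(first); i < s.length(); ) {
            int c = s.codePointAt(i);
            if (!isNameChar(c)) return false;
            i += Character.charCount(c);
        }
        return true;
    }

    public static void main(String[] args) {
        // U+FF41..U+FF43 (fullwidth a, b, c) fall in [#xFDF0-#xFFFD].
        System.out.println(isName("\uFF41\uFF42\uFF43")); // true
        System.out.println(isName("1abc"));               // false: a digit cannot start a Name
    }
}
```

Under [4], both fullwidth ranges from the report fall inside [#xFDF0-#xFFFD], so they are legal name characters in any position.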
>
> Furthermore, according to http://www.w3.org/TR/REC-xml/ Appendix J
> (Non-Normative), the characters I mention above are not only allowed
> but encouraged in XML Names, because these code points have the
> Unicode ID_Start and ID_Continue properties set to True.
>
> The XML REC says:
>
> 1. The first character of any name should have a Unicode property
> of ID_Start, or else be '_' #x5F.
> 2. Characters other than the first should have a Unicode property
> of ID_Continue, or ...
>
> You can see that ID_Start and ID_Continue are True on the individual
> pages for the small letters here:
> http://unicode.org/cldr/utility/character.jsp?a=FF41
> to
> http://unicode.org/cldr/utility/character.jsp?a=FF5A
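The Appendix J suggestions can be approximated in Java with the JDK's Character.isUnicodeIdentifierStart and Character.isUnicodeIdentifierPart. This is an approximation, not an exact ID_Start/ID_Continue test: the JDK predicates follow the Unicode version of the running JDK, isUnicodeIdentifierPart also accepts "identifier-ignorable" characters (filtered out below), and the alternatives elided by "or ..." in rule 2 are not modeled. The class name AppendixJSketch is hypothetical.

```java
/**
 * Hypothetical helper approximating the non-normative Appendix J
 * name suggestions with the JDK's Unicode identifier predicates.
 */
public final class AppendixJSketch {

    // Rule 1: the first character has ID_Start, or is '_' (#x5F).
    // '_' is connector punctuation, so it must be allowed explicitly.
    static boolean suggestedStart(int c) {
        return c == '_' || Character.isUnicodeIdentifierStart(c);
    }

    // Rule 2 (partial): other characters have ID_Continue.
    // The JDK predicate also accepts identifier-ignorable characters,
    // which are excluded here to stay closer to ID_Continue.
    static boolean suggestedContinue(int c) {
        return Character.isUnicodeIdentifierPart(c) && !Character.isIdentifierIgnorable(c);
    }

    public static void main(String[] args) {
        int failures = 0;
        for (int c = 0xFF41; c <= 0xFF5A; c++) { // fullwidth a..z
            if (!suggestedStart(c) || !suggestedContinue(c)) failures++;
        }
        for (int c = 0xFF21; c <= 0xFF3A; c++) { // fullwidth A..Z
            if (!suggestedStart(c) || !suggestedContinue(c)) failures++;
        }
        System.out.println("fullwidth letters failing the suggestions: " + failures); // 0
    }
}
```

All 52 fullwidth letters are Unicode letters (categories Lu/Ll), so they pass both checks, consistent with the ID_Start and ID_Continue values shown on the CLDR pages above.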
>
> I recommend that org.jdom.Verifier.isXMLLetter be updated to use
> productions [4], [4a], and [5] of XML 1.0 Fifth Edition.
> It's quite likely that some of the other character class verifiers
> need updating as well, but I didn't examine them.
>
> Leigh.