[jdom-interest] Text class
Elliotte Rusty Harold
elharo at metalab.unc.edu
Sun May 27 08:12:48 PDT 2001
At 7:48 PM -0500 5/26/01, Brett McLaughlin wrote:
>The only reason that I didn't do that was to ensure that we aren't
>"hard-wired" to any character based format. This was something Elliotte
>pointed out, that made sense to me. Like even if String wasn't final, I
>wouldn't extend it for the sake of Unicode and so forth...
>
>Elliotte, any thoughts here?
>
I agree. We don't want to hardwire it if we don't have to.
I've been rethinking my initial objections to Sun's approach to
handling non-BMP characters in Strings. I need to look more closely
at the just-released JDK 1.4 to see what they're up to, but I'm
thinking maybe surrogate pairs will work. However, our logic will
have to decode each surrogate pair into a single character before
processing it. In particular, this affects the Verifier class. For
name characters we're OK because XML names can't contain non-BMP
characters. However, half of a surrogate pair can't be judged on its
own, so verifying the text content of an element or attribute may
require the ability to ask for the next char inside the isCharacter()
method. Or we may need to rethink the API completely so it only
verifies whole strings, not individual characters.
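
A minimal sketch of the whole-string approach, for illustration only.
The class and method names (SurrogateAwareCheck, isXMLCodePoint,
firstIllegalIndex) are hypothetical and are not part of JDOM's
Verifier. It assumes the Char production from XML 1.0 and decodes
surrogate pairs by hand, so it doesn't rely on any particular JDK
release:

```java
// Hypothetical helper, not JDOM API: checks a whole string against the
// XML 1.0 Char production, decoding surrogate pairs along the way.
final class SurrogateAwareCheck {

    // XML 1.0 Char ::= #x9 | #xA | #xD | [#x20-#xD7FF]
    //                | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXMLCodePoint(int cp) {
        if (cp == 0x9 || cp == 0xA || cp == 0xD) return true;
        if (cp >= 0x20 && cp <= 0xD7FF) return true;
        if (cp >= 0xE000 && cp <= 0xFFFD) return true;
        return cp >= 0x10000 && cp <= 0x10FFFF;
    }

    // Returns the index of the first char that starts an illegal
    // character, or -1 if the whole string is legal XML text.
    static int firstIllegalIndex(String text) {
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c >= 0xD800 && c <= 0xDBFF) {
                // High surrogate: it must be followed by a low surrogate.
                if (i + 1 >= text.length()) return i;
                char low = text.charAt(i + 1);
                if (low < 0xDC00 || low > 0xDFFF) return i;
                // Combine the two halves into one code point.
                int cp = 0x10000 + ((c - 0xD800) << 10) + (low - 0xDC00);
                if (!isXMLCodePoint(cp)) return i;
                i += 2;
            } else {
                // BMP char; a lone low surrogate is rejected here because
                // 0xDC00-0xDFFF lies outside every legal range.
                if (!isXMLCodePoint(c)) return i;
                i++;
            }
        }
        return -1;
    }
}
```

Because the method sees the whole string, it can look ahead one char
whenever it meets a high surrogate, which a per-char isCharacter()
check cannot do.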
What I've come to realize is that we may be OK with strings and
string buffers if we no longer assume one Java char equals one
Unicode character. I need to do some more research and
experimentation.
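
To make the char-versus-character distinction concrete, here is a
small sketch. The CharCount class and countCharacters method are
made-up names for illustration, not anything in JDOM. It counts
Unicode characters in a String or StringBuffer by treating each
well-formed surrogate pair as one character:

```java
// Hypothetical helper, not JDOM API: counts Unicode characters rather
// than Java chars. An unpaired surrogate is counted as one character.
final class CharCount {

    static int countCharacters(CharSequence text) {
        int count = 0;
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            // A high surrogate immediately followed by a low surrogate
            // forms a single non-BMP character.
            boolean pair = c >= 0xD800 && c <= 0xDBFF
                    && i + 1 < text.length()
                    && text.charAt(i + 1) >= 0xDC00
                    && text.charAt(i + 1) <= 0xDFFF;
            i += pair ? 2 : 1;
            count++;
        }
        return count;
    }
}
```

For the string "a" + U+20000 + "b", length() reports 4 chars while
countCharacters reports 3 characters.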
On the other hand, it's very important that we do support all these
non-BMP characters. The latest discovery here is that the new Han
ideographs include some essential characters, for example the
ideograph for "I" (the first-person singular pronoun) used in one
dialect of Chinese spoken by more than 30 million people.
--
+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo at metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
| The XML Bible (IDG Books, 1999) |
| http://metalab.unc.edu/xml/books/bible/ |
| http://www.amazon.com/exec/obidos/ISBN=0764532367/cafeaulaitA/ |
+----------------------------------+---------------------------------+
| Read Cafe au Lait for Java News: http://metalab.unc.edu/javafaq/ |
| Read Cafe con Leche for XML News: http://metalab.unc.edu/xml/ |
+----------------------------------+---------------------------------+