[jdom-interest] Full Unicode support considered necessary
Kenworthy, Edward
edward.kenworthy at exchange.co.uk
Wed May 23 23:33:20 PDT 2001
I think the problem you will have is that you can't sub-class String to add
your extra Unicode support, whereas when Sun get around to doing it they can
simply replace String.
You can't even replace Sun's String (and be sure it will work everywhere)
because there's not even an interface for you to implement.
-----Original Message-----
From: Elliotte Rusty Harold [mailto:elharo at metalab.unc.edu]
Sent: 23 May 2001 13:55
To: jdom-interest at jdom.org
Subject: [jdom-interest] Full Unicode support considered necessary
There's been some question as to how important full Unicode
compatibility is in practice. In particular, does anybody actually
need the new characters defined in Unicode 3.1 and beyond? I've done a
little research on that, and I think the answer is clearly yes.
Here's what you lose by not supporting characters past 65,535:
1. Mathematical symbols (my personal interest). Needed for MathML.
This set is going to get even bigger in Unicode 3.2.
2. Musical notation as used in sheet music; e.g. quarter notes and
eighth notes and G-clefs and so forth (my wife's personal interest).
Needed for MusicML and MusicXML.
3. Old Italic scripts used for Etruscan and other languages of the
Italian peninsula. Unlike Latin, these really are dead languages.
Nonetheless, they're of significant interest to an active scholarly
community.
4. Deseret: a phonemic alphabet devised to write the English
language. It was originally developed in the 1850s at the University
of Deseret, now the University of Utah. It was promoted by The Church
of Jesus Christ of Latter-day Saints, also known as the "Mormon" or
LDS Church, under Church President Brigham Young (1801-1877).
5. About 40,000 new Han ideographs. Personally I'm not qualified to
judge how important they are.
6. Egyptian hieroglyphics will probably be added in Unicode 3.2, at
least the basic set used in elementary schools around the world.
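To make the 65,535 limit concrete, here is a small sketch of what happens
when one of these characters lands in a Java String. It uses the code point
methods that later Java releases added (Java 5 and up), which were not
available when this message was written. The class name SupplementaryDemo
and the helper gClef() are made up for illustration.

```java
// Sketch: a character beyond U+FFFF occupies two Java chars (a surrogate pair).
// Relies on code point methods added in Java 5; they did not exist in 2001.
class SupplementaryDemo {
    // MUSICAL SYMBOL G CLEF, U+1D11E, one of the musical notation characters.
    static final int G_CLEF = 0x1D11E;

    // Builds a String holding exactly one supplementary character.
    static String gClef() {
        return new String(Character.toChars(G_CLEF));
    }

    public static void main(String[] args) {
        String s = gClef();
        // length() counts 16-bit chars, not characters.
        System.out.println("chars:       " + s.length());
        // codePointCount() counts actual Unicode characters.
        System.out.println("code points: " + s.codePointCount(0, s.length()));
        // The first char on its own is only half of the character.
        System.out.println("first char is a high surrogate: "
                + Character.isHighSurrogate(s.charAt(0)));
    }
}
```

Code that indexes or truncates by char can split such a character in half,
which is the practical cost of treating a 16-bit char as a full character.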
I've half been waiting to see what Java 1.4 was going to do, but the
latest word seems to be that Sun is going to punt. They are going to
pretend the problem doesn't exist, at least until 1.5. Frankly that's
too long. I think I'm going to start work outside JDOM on a
UnicodeString class or some such that could be used to provide real
Unicode support, and then I'm going to start reinventing the rest of
Java's text handling and XML parsing on top of that.
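For illustration only, here is a minimal sketch of what such a class might
look like. Only the name UnicodeString comes from this message; the fields,
methods, and the choice to store one int per character are assumptions, not
a description of any actual design. Because String is final, the sketch
stands alone rather than extending String.

```java
// Hypothetical sketch: the name UnicodeString is from the message above,
// everything else here is an illustrative assumption.
final class UnicodeString {
    private final int[] codePoints; // one int per Unicode character

    // Decodes UTF-16 input, combining surrogate pairs into single code points.
    UnicodeString(String utf16) {
        int[] buf = new int[utf16.length()];
        int n = 0;
        for (int i = 0; i < utf16.length(); i++) {
            char c = utf16.charAt(i);
            if (c >= 0xD800 && c <= 0xDBFF && i + 1 < utf16.length()) {
                char d = utf16.charAt(i + 1);
                if (d >= 0xDC00 && d <= 0xDFFF) {
                    buf[n++] = 0x10000 + ((c - 0xD800) << 10) + (d - 0xDC00);
                    i++; // skip the low surrogate just consumed
                    continue;
                }
            }
            buf[n++] = c; // BMP character, or an unpaired surrogate kept as-is
        }
        codePoints = new int[n];
        System.arraycopy(buf, 0, codePoints, 0, n);
    }

    // Number of Unicode characters, not 16-bit units.
    int length() {
        return codePoints.length;
    }

    // Returns the full code point at the given character index.
    int charAt(int index) {
        return codePoints[index];
    }

    // Re-encodes as UTF-16 for interoperability with java.lang.String.
    public String toString() {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < codePoints.length; i++) {
            int cp = codePoints[i];
            if (cp < 0x10000) {
                sb.append((char) cp);
            } else {
                int v = cp - 0x10000;
                sb.append((char) (0xD800 + (v >> 10)));
                sb.append((char) (0xDC00 + (v & 0x3FF)));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "A", U+1D11E (written as a surrogate pair), "B"
        UnicodeString u = new UnicodeString("A\uD834\uDD1EB");
        System.out.println(u.length());
        System.out.println(Integer.toHexString(u.charAt(1)));
    }
}
```

Storing an int per character trades memory for simple indexing; a real
design might well choose differently.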
Like I said, this is not a JDOM project, and won't be ready for JDOM
anytime soon. But I would like to see that JDOM doesn't lock itself
into String at a very low level. I'd like JDOM to hide the
implementation details enough so that it's plausible that at some
point in the future, we could use real Unicode support when it
becomes available, either from Sun, from me, or from somebody else.
--
+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo at metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
| The XML Bible (IDG Books, 1999) |
| http://metalab.unc.edu/xml/books/bible/ |
| http://www.amazon.com/exec/obidos/ISBN=0764532367/cafeaulaitA/ |
+----------------------------------+---------------------------------+
| Read Cafe au Lait for Java News: http://metalab.unc.edu/javafaq/ |
| Read Cafe con Leche for XML News: http://metalab.unc.edu/xml/ |
+----------------------------------+---------------------------------+