[jdom-interest] Not legal JDOM characters?

Elliotte Rusty Harold elharo at metalab.unc.edu
Mon Aug 23 15:48:07 PDT 2004


At 2:54 PM -0700 8/23/04, Dave Byrne wrote:

>I'm not familiar with the low-level handling of UTF-16 in Java, but is there
>a way to examine strings char by char without splitting the surrogate pairs
>in half?  It seems that it may offer a better long-term way of handling
>these chars.

It's just a matter of coding the arithmetic that decodes surrogate
pairs. The algorithm is available in the Unicode spec, along with
pseudocode.  I've done it myself in XOM's Verifier class. It's tricky
to get right but doable.
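
For illustration, here is a minimal sketch of that arithmetic in Java. It is
not the code from XOM's Verifier class or from JDOM; the class name
SurrogateDecoder and its method names are hypothetical. The pair-combining
formula follows the UTF-16 definition in the Unicode standard, and the
legality check assumes the Char production of XML 1.0.

```java
final class SurrogateDecoder {

    // Decode the code point that starts at index i of s.
    // A high surrogate (D800-DBFF) must be followed by a low surrogate
    // (DC00-DFFF); the pair is combined into one value at or above 0x10000.
    static int codePointAt(String s, int i) {
        char high = s.charAt(i);
        if (high >= 0xD800 && high <= 0xDBFF) {
            if (i + 1 >= s.length()) {
                throw new IllegalArgumentException("Unpaired high surrogate at index " + i);
            }
            char low = s.charAt(i + 1);
            if (low < 0xDC00 || low > 0xDFFF) {
                throw new IllegalArgumentException("Unpaired high surrogate at index " + i);
            }
            return ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000;
        }
        if (high >= 0xDC00 && high <= 0xDFFF) {
            throw new IllegalArgumentException("Unpaired low surrogate at index " + i);
        }
        return high; // ordinary char outside the surrogate ranges
    }

    // XML 1.0 Char production, applied to a decoded code point.
    static boolean isXMLCharacter(int c) {
        return c == 0x9 || c == 0xA || c == 0xD
            || (c >= 0x20 && c <= 0xD7FF)
            || (c >= 0xE000 && c <= 0xFFFD)
            || (c >= 0x10000 && c <= 0x10FFFF);
    }

    // Walk the string one code point at a time, never splitting a pair.
    static boolean isLegalXMLText(String s) {
        int i = 0;
        while (i < s.length()) {
            int cp;
            try {
                cp = codePointAt(s, i);
            } catch (IllegalArgumentException e) {
                return false; // a lone surrogate is never legal
            }
            if (!isXMLCharacter(cp)) {
                return false;
            }
            i += (cp >= 0x10000) ? 2 : 1; // a decoded pair occupies two chars
        }
        return true;
    }
}
```

With this sketch, a string holding the pair D834 DD1E decodes to the single
code point U+1D11E and passes the check, while a string that ends in a lone
D834 is rejected.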

-- 

   Elliotte Rusty Harold
   elharo at metalab.unc.edu
   Effective XML (Addison-Wesley, 2003)
   http://www.cafeconleche.org/books/effectivexml
   http://www.amazon.com/exec/obidos/ISBN%3D0321150406/ref%3Dnosim/cafeaulaitA

