[jdom-interest] CDATA inconsistency
Malachi de AElfweald
malachi at tremerechantry.com
Sat Nov 2 11:32:12 PST 2002
"unmatched halves of surrogate pairs".... That would be assuming UTF-8 specifically,
would it not? ISO-8859-1, for example, does not have surrogate pairs.
Malachi
11/2/2002 8:22:01 AM, Elliotte Rusty Harold <elharo at metalab.unc.edu> wrote:
>At 11:08 PM -0800 11/1/02, Malachi de AElfweald wrote:
>>It would be against the XML spec to check the characters within the
>>CDATA, since the spec says that CDATA is "unparsed character data".
>>Seems like parsing it wouldn't fit the description, eh?
>>
>
>No, that's not quite true. There are a number of characters which
>cannot appear in a CDATA section. These include many C0 controls such
>as null and vertical tab, unmatched halves of surrogate pairs, and a
>few other undefined code points. The three character sequence ]]> is
>also illegal.
>
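To make the restrictions listed in the quoted message concrete, here is a minimal Java sketch. It is not JDOM's actual code: the class and method names (CdataCheck, isXmlChar, firstIllegalIndex) are hypothetical, and the ranges are taken from the Char production of the XML 1.0 spec. It assumes the text is already a Java String, so it works on Unicode code points rather than on the bytes of any particular encoding.

```java
// Hypothetical helper, for illustration only; not part of JDOM.
final class CdataCheck {
    private CdataCheck() {}

    /** True if the code point matches the XML 1.0 Char production. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)       // excludes most C0 controls
            || (cp >= 0xE000 && cp <= 0xFFFD)     // excludes surrogates, U+FFFE, U+FFFF
            || (cp >= 0x10000 && cp <= 0x10FFFF); // supplementary planes
    }

    /**
     * Returns the index (in UTF-16 code units) of the first character or
     * sequence that may not appear in a CDATA section, or -1 if none is found.
     */
    static int firstIllegalIndex(String text) {
        for (int i = 0; i < text.length(); ) {
            // The end delimiter cannot appear inside the section.
            if (text.startsWith("]]>", i)) {
                return i;
            }
            // codePointAt returns a lone surrogate unchanged when it is
            // unmatched, and isXmlChar then rejects it.
            int cp = text.codePointAt(i);
            if (!isXmlChar(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstIllegalIndex("plain text"));
        System.out.println(firstIllegalIndex("a" + (char) 0x0B + "b"));
        System.out.println(firstIllegalIndex("ab]]>cd"));
    }
}
```

A matched surrogate pair passes this check because it is read as a single supplementary code point; only an unmatched half is rejected.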