[jdom-interest] SAXHandler / CDATA / entities
Malachi de AElfweald
malachi at tremerechantry.com
Tue Nov 19 19:50:15 PST 2002
I have never really noticed, because I consistently use CDATA:
<SomeNode><![CDATA[Here is some embedded HTML with a <br> in
it.]]></SomeNode>
which, I would think, would be the fastest, since no character-based
handling is required.
It is also much more human-readable :)
Malachi
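
For what it is worth, here is a minimal sketch of a check that the CDATA form
and the entity-escaped form carry the same text. It uses the JAXP DOM parser
bundled with the JDK rather than JDOM, and the names CdataVsEscaped and
rootText are made up for illustration. It says nothing about which form is
faster.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CdataVsEscaped {
    // Parse an XML string and return the text content of its root element.
    // (Hypothetical helper; uses the JDK's JAXP parser, not JDOM.)
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String cdata =
            "<SomeNode><![CDATA[Here is some embedded HTML with a <br> in it.]]></SomeNode>";
        String escaped =
            "<SomeNode>Here is some embedded HTML with a &lt;br&gt; in it.</SomeNode>";

        // Both forms should yield the same character data.
        System.out.println(rootText(cdata));
        System.out.println(rootText(cdata).equals(rootText(escaped)));
    }
}
```

Both documents give the same string; only the serialized form differs.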
On Tue, 19 Nov 2002 23:11:50 +0100, Ingo Struck <ingo at ingostruck.de> wrote:
> Hi...
>
>> I am confused by your statement....
>>
>> JDOM does cope with CDATA just fine. You can put all of those characters
>> in a CDATA now.
> Right... I erred regarding this point - it really works.
>
> What does *not* work properly is the decoding of characters.
> The basic problem here is that the decoding happens *before* parsing,
> i.e. if I want to avoid the CDATA section, I would just write something like:
>
> <SomeNode>Here is some embedded HTML with a &#60;br&#62; in
> it.</SomeNode>
>
> (The reason for using numeric encoding is that most chars can be encoded with
> a uniform length, a fact that could be used to significantly speed up the
> escaping process; if you want all ASCII chars to have a uniform length, then
> it is even better to use the hexadecimal form.)
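
As a rough illustration of the uniform-length idea, the sketch below escapes
the five markup-significant ASCII characters as two-digit hexadecimal
references, so every replacement is exactly six characters long. The class
HexEscaper and its escape method are hypothetical, not part of JDOM, and
whether this is actually faster is not measured here.

```java
public class HexEscaper {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Replace <, >, &, " and ' with fixed-width references such as &#x3C;.
    // All five are ASCII, so two hex digits are always enough.
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '<' || c == '>' || c == '&' || c == '"' || c == '\'') {
                out.append("&#x")
                   .append(HEX[c >> 4])    // high nibble
                   .append(HEX[c & 0xF])   // low nibble
                   .append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("Here is some embedded HTML with a <br> in it."));
    }
}
```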
> If you feed this into JDOM, what happens is that the chars are decoded to
> <SomeNode>Here is some embedded HTML with a <br> in it.</SomeNode>
>
> which, of course, is not well-formed XML. The solution provided here (to
> exclude the five "named" entities and, as I proposed as a fix, the
> respective numeric entities) is the wrong approach imho. It would be much
> cleaner to parse the document and decode the characters *afterwards*.
> Then you can be 100% sure that the parsed document really contains only
> the nodes of the serialized form and not some "embedded" stuff that has
> been decoded/parsed in error.
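
One way to see what a given parser does with the numeric form is to feed it
the example and look at the result. The sketch below assumes the JAXP DOM
parser that ships with the JDK, not JDOM's SAXHandler, so it may not reflect
what JDOM did at the time; NumericRefCheck and parseRoot are names invented
for this example. With this parser the references arrive as plain text and no
br element is created.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class NumericRefCheck {
    // Parse an XML string with the JDK's parser and return the root element.
    // (Hypothetical helper for this example only.)
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String xml =
            "<SomeNode>Here is some embedded HTML with a &#60;br&#62; in it.</SomeNode>";
        Element root = parseRoot(xml);

        // The decoded characters show up in the text content...
        System.out.println(root.getTextContent());
        // ...and the number of <br> elements under the root is reported here.
        System.out.println(root.getElementsByTagName("br").getLength());
    }
}
```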
>
> Kind regards
>
> Ingo Struck
>
> -- ingo at ingostruck.de
> Use PGP: http://ingostruck.de/ingostruck.gpg with fingerprint
> C700 9951 E759 1594 0807 5BBF 8508 AF92 19AA 3D24