[jdom-interest] don't validate comments
Grinvald, Edward
Edward.Grinvald at ca.com
Thu Dec 5 08:04:01 PST 2002
If the comments in the HTML are of no concern to you, then you might
want to do a little preprocessing and simply strip them out of the
document before you parse.
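
As a rough sketch of that preprocessing step (the class name
CommentStripper and the regular expression are my own assumptions, not
anything JDOM provides), something along these lines would drop
everything between "<!--" and the nearest "-->" before the text reaches
the parser:

```java
import java.util.regex.Pattern;

final class CommentStripper {
    // Non-greedy match from "<!--" to the nearest "-->".
    // DOTALL lets a single comment span several lines.
    private static final Pattern COMMENT =
            Pattern.compile("<!--.*?-->", Pattern.DOTALL);

    /** Returns the markup with every "<!-- ... -->" span removed. */
    static String strip(String markup) {
        return COMMENT.matcher(markup).replaceAll("");
    }

    public static void main(String[] args) {
        // Hypothetical input: a comment with a stray "--" inside it.
        String html = "<p>text<!-- bad -- comment --></p>";
        System.out.println(strip(html)); // prints <p>text</p>
    }
}
```

The cleaned string could then be handed to the parser through a
StringReader. This is a purely textual guess at where each comment
ends, so it can misfire on "<!--" inside a script or CDATA section, and
the rest of the page still has to be well-formed for the parse to
succeed.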
= eg
-----Original Message-----
From: Elliotte Rusty Harold [mailto:elharo at metalab.unc.edu]
Sent: Thursday, December 05, 2002 10:53 AM
To: Christian Peter
Cc: jdom-interest at jdom.org
Subject: Re: [jdom-interest] don't validate comments
At 4:05 PM +0100 12/5/02, Christian Peter wrote:
>Well, you are right that I don't quite know about the difference
>between validation and well-formedness check (I thought the latter
>is part of the first).
Well-formedness is a prerequisite for validity, but it is not the
same thing. A document can be invalid but still well-formed.
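
To make the distinction concrete, here is a minimal sketch using the
JDK's own JAXP parser rather than JDOM. The class name, the helper
method, and the tiny inline DTD are assumptions made up for
illustration. The same text is parsed twice: once checking only
well-formedness, once also validating against the DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

final class WellFormedVsValid {
    /**
     * Returns true if the text parses. With validate=false only
     * well-formedness is checked; with validate=true the document
     * must also satisfy its DTD.
     */
    static boolean parses(String xml, boolean validate) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(validate);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Validity problems arrive via error(), well-formedness
            // problems via fatalError(); treat both as failures.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXParseException { throw e; }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Illustrative DTD: <a> must contain exactly one <b>.
        String dtd = "<!DOCTYPE a [<!ELEMENT a (b)><!ELEMENT b EMPTY>]>";
        String invalid = dtd + "<a><c/></a>"; // tags balance, but <c> is not allowed

        System.out.println(parses(invalid, false)); // true: well-formed
        System.out.println(parses(invalid, true));  // false: not valid
        System.out.println(parses("<a><b></a>", false)); // false: malformed
    }
}
```

The first document is invalid but still well-formed, so a
non-validating parser accepts it. The last one is not well-formed, and
no parser setting will accept it.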
>However, I think it should be possible to take an HTML document with
>some incorrect comment content and extract the content of the
>document, ignoring the comments. Isn't it the content of the
>document which is of interest, not the comments? And as you can see,
>even such official governmental sites have non-valid HTML comments.
>In my opinion we should provide the option not to regard the
>comment's content. Don't you agree?
No. I don't. If it's not well-formed it isn't an XML document,
period. In a malformed document there is no way to tell what is and
is not a comment. All well-formedness rules must be adhered to without
exception. Short of that you don't have an XML document.
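
For illustration only, and assuming the offending comments are the
common case of a double hyphen inside the comment body (the thread does
not show the actual markup), the sketch below asks the JDK's SAX parser
whether a string is well-formed. The class and method names are
hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

final class CommentCheck {
    /** Returns null if the text is well-formed, else the parser's message. */
    static String wellFormednessError(String xml) {
        try {
            // DefaultHandler ignores content events and rethrows fatal errors.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return e.getMessage();
        } catch (Exception e) {
            return e.toString();
        }
    }

    public static void main(String[] args) {
        // XML 1.0 forbids "--" inside a comment.
        System.out.println(wellFormednessError("<p><!-- fine --></p>"));
        System.out.println(wellFormednessError("<p><!-- a -- b --></p>"));
    }
}
```

The first call returns null. The second returns an error message whose
exact wording depends on the parser, because the parser stops at the
first violation instead of guessing where the comment was meant to end.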
--
+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo at metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
| XML in a Nutshell, 2nd Edition (O'Reilly, 2002) |
| http://www.cafeconleche.org/books/xian2/ |
| http://www.amazon.com/exec/obidos/ISBN%3D0596002920/cafeaulaitA/ |
+----------------------------------+---------------------------------+
| Read Cafe au Lait for Java News: http://www.cafeaulait.org/ |
| Read Cafe con Leche for XML News: http://www.cafeconleche.org/ |
+----------------------------------+---------------------------------+
_______________________________________________
To control your jdom-interest membership:
http://lists.denveronline.net/mailman/options/jdom-interest/youraddr@yourhost.com