[jdom-interest] don't validate comments
Todd O'Bryan
toddobryan at mac.com
Thu Dec 5 11:58:27 PST 2002
After being bitten in the butt by bad entity names (undefined entities
are well-formedness violations, not validity violations), I can
appreciate Christian's point, but in the end I have to side with the XML
community.
It's easy enough to write a subclass of BufferedReader that, as soon as
it reads <!--, eats everything until it sees -->, and then to pass an
instance of that RemoveCommentReader to your XML parser. Now, it may be
that certain things happen often enough (someone mentioned <p> and <br>
in (X)HTML) that JDOM should have some Readers pre-built to handle
those cases, but they definitely should be considered utility classes,
not part of the core.
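
Here is a rough sketch of what such a RemoveCommentReader might look like.
Only the class name and the BufferedReader base come from the description
above. The stripComments helper is hypothetical and exists only for the
demonstration. The filter is deliberately naive: it does not know about
CDATA sections or attribute values, so a literal <!-- inside one of those
would be eaten too. It also uses mark/reset internally, and readLine() is
not filtered.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Sketch only: drops everything from "<!--" through the next "-->"
// before the parser ever sees it.
class RemoveCommentReader extends BufferedReader {

    RemoveCommentReader(Reader in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        while (true) {
            int c = super.read();
            if (c != '<') {
                return c; // ordinary character, or -1 at end of stream
            }
            // Peek at the next three characters to see whether this is "<!--".
            super.mark(3);
            if (super.read() == '!' && super.read() == '-' && super.read() == '-') {
                skipPastCommentEnd();
                continue; // the next thing may be another comment
            }
            super.reset(); // not a comment: un-read the lookahead
            return '<';
        }
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int n = 0;
        while (n < len) {
            int c = read(); // go through the filtering single-char read
            if (c == -1) {
                break;
            }
            buf[off + n++] = (char) c;
        }
        return n == 0 ? -1 : n;
    }

    // Consume characters up to and including the first "-->".
    // An unterminated comment swallows the rest of the stream.
    private void skipPastCommentEnd() throws IOException {
        int dashes = 0;
        int c;
        while ((c = super.read()) != -1) {
            if (c == '-') {
                dashes++;
            } else if (c == '>' && dashes >= 2) {
                return;
            } else {
                dashes = 0;
            }
        }
    }

    // Hypothetical helper for trying the reader out on a String.
    static String stripComments(String text) {
        StringBuilder out = new StringBuilder();
        try (Reader r = new RemoveCommentReader(new StringReader(text))) {
            int c;
            while ((c = r.read()) != -1) {
                out.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

With that in place, a comment containing an illegal "--" such as
<a><!-- bad -- comment --><b/></a> comes out as <a><b/></a>, and you
would hand the wrapped reader to the parser the same way you would any
other Reader, e.g. via SAXBuilder's build(Reader).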
Todd
On Thursday, December 5, 2002, at 10:53 AM, Elliotte Rusty Harold
wrote:
> At 4:05 PM +0100 12/5/02, Christian Peter wrote:
>
>
>> Well, you are right that I don't quite know about the difference
>> between validation and well-formedness check (I thought the latter is
>> part of the first).
>
> Well-formedness is a prerequisite for validity, but it is not the same
> thing. A document can be invalid but still well-formed.
>
>> However, I think it should be possible to take an HTML document with
>> some incorrect comment content and extract the content of the
>> document, ignoring the comments. Isn't it the content of the document
>> which is of interest, not the comments? And as you can see, even such
>> official governmental sites have non-valid HTML comments.
>> In my opinion we should provide the option not to regard the
>> comment's content. Don't you agree?
>
> No. I don't. If it's not well-formed it isn't an XML document, period.
> In a malformed document there is no way to tell what is and is not a
> comment. All well-formedness rules must be adhered to without
> exception. Short of that you don't have an XML document.