[jdom-interest] Internal DTD subset verification

Alex Rosen arosen at silverstream.com
Wed May 1 08:03:28 PDT 2002


I see. This is designed for people who don't know that you can't put binary
data into an XML document, and are using non-compliant parsers that allow
this. That's at least plausible. This really is the responsibility of the
parser, though, and maybe it's a little excessive to take on the
responsibility of finding errors that are missed by both the programmer and
the parser.
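
For concreteness, here is a minimal sketch of the kind of character check
under discussion. It is not JDOM's actual code: the class and method names
are hypothetical, and the ranges are simply the Char production from the
XML 1.0 Recommendation.

```java
// Hypothetical illustration; not taken from JDOM.
final class XmlCharCheck {

    // XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXmlChar(int codePoint) {
        return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
            || (codePoint >= 0x20 && codePoint <= 0xD7FF)
            || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
            || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    // Returns the index of the first char that is not legal in XML 1.0
    // content, or -1 if every code point in the text is legal.
    static int firstIllegalIndex(String text) {
        int i = 0;
        while (i < text.length()) {
            // An unpaired surrogate comes back as itself and fails the check.
            int codePoint = text.codePointAt(i);
            if (!isXmlChar(codePoint)) {
                return i;
            }
            i += Character.charCount(codePoint);
        }
        return -1;
    }
}
```

The cost is one pass over every string handed to the object model, which is
the overhead being weighed in this thread.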

> For instance, the JDOM character checks you
> dislike are substantially more accurate than what at least one major
> parser does.

Which parser? It's not one of the Big 2 is it? Is it a parser like MinML
that explicitly trades off correctness for speed and/or size?

> Turning off verification in tests really has not proven to greatly
> speed anything up that I've seen. Maybe 20% at most, and probably
> less than that.

If the cost is negligible I can't object. Of course your definition of
negligible may depend on how useful you think the feature is :) I would say
that 10% on average, or 20% in a significant minority of cases, would not be
negligible.

> >Two of the philosophies of the design of C++ were "you don't pay for
> >stuff you don't use", and "trust the programmer". These aren't quite as
> >central to the philosophy of Java, but I think they're still useful to
> >consider, and it seems like we're almost going out of our way here to
> >break these rules.
>
> They're even less part of the design of XML. XML is deliberately
> draconian. There are very good reasons for XML to be inflexible about
> what it allows, even at the cost of convenience, even at the cost of
> performance.

That depends on how you look at it. XML is deliberately draconian in its
specification of the parser. Does that imply that an object model should be
draconian to match, or does it mean that the object model can be more
relaxed, because it knows it can rely on the parser to be the picky one?
Hand-written XML will always exist, and I think that means you can never
trust a document to be well-formed - it's the parser that has the
responsibility of checking.

> The whole spirit of XML is that it simply does not allow
> malformedness at any time.

If that's your goal for JDOM, I think that it's impossible, or at least
unrealistic. Here are a couple of quick examples, both well-formedness
constraints on entities (Entity Declared and No Recursion):

http://www.w3.org/TR/2000/REC-xml-20001006#wf-entdeclared
http://www.w3.org/TR/2000/REC-xml-20001006#norecursion
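
The two links above point at the Entity Declared and No Recursion
well-formedness constraints. As a rough sketch of how a parser enforces them
at parse time, the snippet below feeds small documents to the JAXP parser
bundled with the JDK. The class and method names are hypothetical, and the
exact error text depends on the parser implementation.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration; not part of JDOM.
final class WellFormedProbe {

    // Returns true if the parser accepts the document, false if it
    // reports a fatal (well-formedness) error.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and keeps stderr quiet.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Reference to an entity that is never declared.
        System.out.println(isWellFormed("<a>&undeclared;</a>"));
        // An entity whose replacement text refers to itself.
        System.out.println(isWellFormed(
            "<!DOCTYPE a [<!ENTITY e \"&e;\">]><a>&e;</a>"));
    }
}
```

An in-memory object model that wanted the same guarantee would have to
recheck every entity reference against the current set of declarations each
time either one changes.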

Anyway, in actuality I don't think we're too far from each other. Free
runtime checks are good, and expensive runtime checks are bad, right? It's
just a matter of where you draw the line.

Alex



