[jdom-interest] Internal DTD subset verification
Elliotte Rusty Harold
elharo at metalab.unc.edu
Wed May 1 08:39:45 PDT 2002
At 11:03 AM -0400 5/1/02, Alex Rosen wrote:
>I see. This is designed for people that don't know that you can't put binary
>data into an XML document, and are using non-compliant parsers that allow
>this. That's at least plausible. This really is the responsibility of the
>parser, though, and maybe it's a little excessive to take on the
>responsibility of finding errors that are missed by both the programmer and
>the parser.
If JDOM were a read-only API, I'd agree with you, but it's not. JDOM
also creates and writes XML. As I said, I can live with letting the
parser check constraints when a parser is used to create a node.
However, in many cases a parser is not used to create a node. I'm not
willing to forgo the checks then.
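
To make the kind of check under discussion concrete, here is a minimal sketch of a character test based on the Char production in the XML 1.0 specification. It is not JDOM's actual Verifier code; the class and method names (XmlCharCheck, isXmlChar, firstIllegalIndex, requireLegal) are hypothetical and chosen only for illustration.

```java
// Hypothetical helper, not part of JDOM: checks text against the
// XML 1.0 Char production before it is stored in a programmatically built node.
final class XmlCharCheck {
    private XmlCharCheck() {}

    /** True if the code point is allowed by the XML 1.0 Char production. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Index of the first illegal character, or -1 if the text is clean. */
    static int firstIllegalIndex(String text) {
        for (int i = 0; i < text.length(); ) {
            // An unpaired surrogate is returned as-is and fails isXmlChar.
            int cp = text.codePointAt(i);
            if (!isXmlChar(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    /** Returns the text unchanged, or throws if it contains an illegal character. */
    static String requireLegal(String text) {
        int bad = firstIllegalIndex(text);
        if (bad >= 0) {
            throw new IllegalArgumentException(
                "Illegal XML character 0x"
                + Integer.toHexString(text.codePointAt(bad))
                + " at index " + bad);
        }
        return text;
    }
}
```

A constructor or setter that accepts text from a program rather than from a parser could call requireLegal on its argument, which is where a parser's own checks are not available.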
>
>> For instance, the JDOM character checks you
>> dislike are substantially more accurate than what at least one major
>> parser does.
>
>Which parser? It's not one of the Big 2 is it? Is it a parser like MinML
>that explicitly trades off correctness for speed and/or size?
>
It's not MinML. It's GNU JAXP, a pretty significant parser, I think.
I've seen others in the past, though none as major as this.
>That depends on how you look at it. XML is deliberately draconian in its
>specification of the parser. Does that imply that an object model should be
>draconian to match, or does it mean that the object model can be more
>relaxed, because it knows it can rely on the parser to be the picky one?
>Hand-written XML will always exist, and I think that means you can never
>trust a document to be well-formed - it's the parser that has the
>responsibility of checking.
>
>> The whole spirit of XML is that it simply does not allow
>> malformedness at any time.
>
>If that's your goal for JDOM, I think that it's impossible, or at least
>unrealistic. Here are a couple of quick examples:
>
>http://www.w3.org/TR/2000/REC-xml-20001006#wf-entdeclared
>http://www.w3.org/TR/2000/REC-xml-20001006#norecursion
>
We could check both of these once we add a DTD model. I'm not
suggesting we do that now, but it's worth considering further down
the road. For now, I suggest the EntityRef only be used for skipped
entities reported by SAX. I tell readers not to add new ones to their
trees precisely because of the first constraint you reference.
(In fact, I wonder if we shouldn't hide those constructors. Looking
at it now I don't immediately see how we could do it. Once again, I
really wish Java had friend functions. You know, I'm really beginning
to wonder if maybe we shouldn't just merge org.jdom with
org.jdom.input and org.jdom.output. It would help a lot, especially
with turning off redundant checks on input.)
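
As a rough illustration of what checking the two constraints linked above (Entity Declared and No Recursion) might involve once declarations are available, here is a minimal sketch. It assumes a hypothetical DTD model reduced to a map from entity name to replacement text; JDOM has no such model today, and the names EntityChecks, references, and check are invented for this example. The name pattern is a simplified ASCII approximation of an XML Name, and the finer points of when the Entity Declared constraint applies are ignored.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not part of JDOM.
final class EntityChecks {
    // Simplified, ASCII-only approximation of a general entity reference.
    // Character references such as &#10; do not match because '#' cannot start a name.
    private static final Pattern REF =
        Pattern.compile("&([A-Za-z_:][A-Za-z0-9_:.\\-]*);");

    // The five entities every XML processor must recognize without a declaration.
    private static final Set<String> PREDEFINED =
        Set.of("amp", "lt", "gt", "apos", "quot");

    private EntityChecks() {}

    /** Names of all general entity references in the text, in document order. */
    static List<String> references(String text) {
        List<String> names = new ArrayList<>();
        Matcher m = REF.matcher(text);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    /** Throws IllegalStateException on an undeclared or recursive entity. */
    static void check(String text, Map<String, String> declarations) {
        check(text, declarations, new ArrayDeque<>());
    }

    private static void check(String text, Map<String, String> declarations,
                              Deque<String> open) {
        for (String name : references(text)) {
            if (PREDEFINED.contains(name)) {
                continue;
            }
            String replacement = declarations.get(name);
            if (replacement == null) {
                throw new IllegalStateException("Undeclared entity: " + name);
            }
            // An entity already being expanded refers to itself, directly or indirectly.
            if (open.contains(name)) {
                throw new IllegalStateException("Recursive entity: " + name);
            }
            open.push(name);
            check(replacement, declarations, open);
            open.pop();
        }
    }
}
```

This sketch re-expands an entity every time it is referenced, so heavily nested declarations would make it slow; a real implementation would probably cache the entities it has already verified.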
>Anyway, in actuality I don't think we're too far from each other. Free
>runtime checks are good, and expensive runtime checks are bad, right? It's
>just a matter of where you draw the line.
>
Yes, but it's not just a question of expense. It's a question of
expense vs. benefit. I think guaranteed well-formedness is a huge
benefit, and I want to pay as little for it as possible but no less.
JDOM already costs significantly less than I can afford to pay. :-)
--
+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo at metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
| The XML Bible, 2nd Edition (Hungry Minds, 2001) |
| http://www.cafeconleche.org/books/bible2/ |
| http://www.amazon.com/exec/obidos/ISBN=0764547607/cafeaulaitA/ |
+----------------------------------+---------------------------------+
| Read Cafe au Lait for Java News: http://www.cafeaulait.org/ |
| Read Cafe con Leche for XML News: http://www.cafeconleche.org/ |
+----------------------------------+---------------------------------+