[jdom-interest] Internal DTD subset verification

Elliotte Rusty Harold elharo at metalab.unc.edu
Wed May 1 08:39:45 PDT 2002


At 11:03 AM -0400 5/1/02, Alex Rosen wrote:
>I see. This is designed for people that don't know that you can't put binary
>data into an XML document, and are using non-compliant parsers that allow
>this. That's at least plausible. This really is the responsibility of the
>parser, though, and maybe it's a little excessive to take on the
>responsibility of finding errors that are missed by both the programmer and
>the parser.

If JDOM were a read-only API, I'd agree with you, but it's not. JDOM 
also creates and writes XML. As I said, I can live with letting the 
parser check constraints when a parser is used to create a node. 
However, in many cases a parser is not used to create a node. I'm not 
willing to forgo the checks then.
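
To make the kind of check I mean concrete, here is a minimal sketch of a character-data check against the Char production of XML 1.0. The names (XmlCharCheck, isXmlChar, checkCharacterData) are made up for illustration. This is not JDOM's actual verification code, just the sort of test a node built without a parser would otherwise never get.

```java
// Hypothetical helper for illustration only; not JDOM's real Verifier.
final class XmlCharCheck {

    /** True if the code point matches the Char production of XML 1.0. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns null if the text is legal, otherwise a reason string. */
    static String checkCharacterData(String text) {
        if (text == null) {
            return "null is not legal XML character data";
        }
        for (int i = 0; i < text.length(); ) {
            // codePointAt joins valid surrogate pairs; a lone surrogate
            // comes back as itself and is rejected by isXmlChar.
            int cp = text.codePointAt(i);
            if (!isXmlChar(cp)) {
                return "0x" + Integer.toHexString(cp)
                    + " at index " + i + " is not a legal XML character";
            }
            i += Character.charCount(cp);
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(checkCharacterData("plain text"));
        System.out.println(checkCharacterData("bell\u0007"));
    }
}
```

A constructor or setter that receives text from somewhere other than a parser could call checkCharacterData and throw an exception when the result is non-null.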

>
>>  For instance, the JDOM character checks you
>>  dislike are substantially more accurate than what at least one major
>>  parser does.
>
>Which parser? It's not one of the Big 2 is it? Is it a parser like MinML
>that explicitly trades off correctness for speed and/or size?
>

It's not MinML. It's GNU JAXP, a pretty significant parser, I think. 
I've seen others in the past, though none as major as this.


>That depends on how you look at it. XML is deliberately draconian in its
>specification of the parser. Does that imply that an object model should be
>draconian to match, or does it mean that the object model can be more
>relaxed, because it knows it can rely on the parser to be the picky one?
>Hand-written XML will always exist, and I think that means you can never
>trust a document to be well-formed - it's the parser that has the
>responsibility of checking.
>
>>  The whole spirit of XML is that it simply does not allow
>>  malformedness at any time.
>
>If that's your goal for JDOM, I think that it's impossible, or at least
>unrealistic. Here are a couple of quick examples:
>
>http://www.w3.org/TR/2000/REC-xml-20001006#wf-entdeclared
>http://www.w3.org/TR/2000/REC-xml-20001006#norecursion
>

We could check both of these once we add a DTD model. I'm not 
suggesting we do that now, but it's worth considering further down 
the road. For now, I suggest that EntityRef be used only for skipped 
entities reported by SAX. I tell readers not to add new ones to their 
trees precisely because of the first constraint you reference.
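
As a rough sketch of what those two checks might look like, assume a purely hypothetical DTD model that is nothing more than a map from general entity names to their replacement text. Nothing like this exists in JDOM today, the class and method names are invented, and the name pattern is simplified to ASCII.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the "DTD model" is just a Map of name -> replacement text.
final class EntityChecks {

    // Simplified, ASCII-only entity reference pattern (an assumption).
    // Character references such as &#65; do not match.
    private static final Pattern REF =
        Pattern.compile("&([A-Za-z_:][A-Za-z0-9_:.\\-]*);");

    private static final Set<String> PREDEFINED =
        new HashSet<>(Arrays.asList("lt", "gt", "amp", "apos", "quot"));

    /** Names of all general entity references found in the text. */
    static List<String> references(String text) {
        List<String> names = new ArrayList<>();
        Matcher m = REF.matcher(text);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    /** "Entity Declared": the name is predefined or present in the model. */
    static boolean isDeclared(String name, Map<String, String> decls) {
        return PREDEFINED.contains(name) || decls.containsKey(name);
    }

    /** "No Recursion": true if the entity refers to itself, directly or indirectly. */
    static boolean isRecursive(String name, Map<String, String> decls) {
        return reaches(name, name, decls, new HashSet<>());
    }

    // Depth-first search over the references in each replacement text.
    private static boolean reaches(String current, String target,
                                   Map<String, String> decls, Set<String> visited) {
        String text = decls.get(current);
        if (text == null || !visited.add(current)) {
            return false; // undeclared, or already explored
        }
        for (String ref : references(text)) {
            if (ref.equals(target) || reaches(ref, target, decls, visited)) {
                return true;
            }
        }
        return false;
    }
}
```

With declarations a = "&b;" and b = "&a;", isRecursive("a", decls) reports the cycle, and isDeclared rejects any name that is neither predefined nor in the map.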

(In fact, I wonder if we shouldn't hide those constructors. Looking 
at it now, I don't immediately see how we could do it. Once again, I 
really wish Java had friend functions. You know, I'm really beginning 
to wonder if maybe we shouldn't just merge org.jdom with 
org.jdom.input and org.jdom.output. It would help a lot, especially 
with turning off redundant checks on input.)
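
Roughly what I have in mind, as a sketch only: if the builder lived in the same package as the node classes, a package-private constructor could skip verification for text a parser has already checked, while the public constructor keeps verifying. TextNode and TrustedBuilder are invented names, not JDOM classes, and the check shown is deliberately simplified to C0 control characters.

```java
// Hypothetical classes; imagine both living in one merged package.
final class TextNode {
    private final String value;

    /** Public constructor: always verifies its argument. */
    public TextNode(String value) {
        this(value, true);
    }

    /** Package-private: only classes in the same package can skip the check. */
    TextNode(String value, boolean verify) {
        if (verify) {
            for (int i = 0; i < value.length(); i++) {
                char c = value.charAt(i);
                // Simplified check: reject C0 controls other than tab, LF, CR.
                if (c < 0x20 && c != '\t' && c != '\n' && c != '\r') {
                    throw new IllegalArgumentException(
                        "Illegal XML character 0x" + Integer.toHexString(c));
                }
            }
        }
        this.value = value;
    }

    String getValue() {
        return value;
    }
}

final class TrustedBuilder {
    /** Called only with text the parser is assumed to have checked already. */
    static TextNode fromParser(String parsedText) {
        return new TextNode(parsedText, false);
    }
}
```

Code outside the package sees only the checking constructor, so the unchecked path stays an implementation detail of the builder.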

>Anyway, in actuality I don't think we're too far from each other. Free
>runtime checks are good, and expensive runtime checks are bad, right? It's
>just a matter of where you draw the line.
>

Yes, but it's not just a question of expense. It's a question of 
expense vs. benefit. I think guaranteed well-formedness is a huge 
benefit, and I want to pay as little for it as possible but no less. 
JDOM already costs significantly less than I can afford to pay. :-)

-- 

+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo at metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
|          The XML Bible, 2nd Edition (Hungry Minds, 2001)           |
|             http://www.cafeconleche.org/books/bible2/              |
|   http://www.amazon.com/exec/obidos/ISBN=0764547607/cafeaulaitA/   |
+----------------------------------+---------------------------------+
|  Read Cafe au Lait for Java News:  http://www.cafeaulait.org/      |
|  Read Cafe con Leche for XML News: http://www.cafeconleche.org/    |
+----------------------------------+---------------------------------+


