[jdom-interest] Internal DTD subset verification

Dennis Sosnoski dms at sosnoski.com
Wed May 8 11:38:53 PDT 2002


Not to get involved in the main point of this discussion, but...

Elliotte Rusty Harold wrote:

> ...Keep in mind that in many scenarios I/O concerns are likely to 
> swamp any issues with verification, and when they don't the speed of 
> the underlying SAX parser is probably the second biggest factor. 

Not even close. The build time for a document model is much larger than 
the parsing time for fast parsers. My current published round of tests, at 
http://www.sosnoski.com/opensrc/xmlbench/results.html, shows JDOM beta 7 
taking about 4 times as long as the SAX2 parse alone for medium to large 
documents. The SAX2 parsers I was working with had high overhead for 
small documents, so for those the total JDOM build time was only about 
twice the SAX2 parse time; that ratio should change once Piccolo is 
included in the next set of tests.
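For anyone who wants to see the parse-versus-build gap locally, here is a 
rough sketch (not the xmlbench code) that times a plain SAX2 parse against 
building a full tree from the same input. It assumes the JDK's bundled JAXP 
parser and uses the standard W3C DOM, since JDOM is not part of the JDK; the 
generated document and the class and method names are hypothetical. A single 
timed run with no JVM warm-up is not a fair benchmark, so treat the numbers 
as indicative only.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class ParseVsBuild {
    // Hypothetical test document: a root holding n <item> children.
    static byte[] sampleDoc(int n) {
        StringBuilder sb = new StringBuilder("<items>");
        for (int i = 0; i < n; i++) {
            sb.append("<item id=\"").append(i).append("\">text ").append(i).append("</item>");
        }
        return sb.append("</items>").toString().getBytes(StandardCharsets.UTF_8);
    }

    // SAX parse alone: the handler only counts elements and builds nothing.
    static int saxCount(byte[] xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        int[] count = {0};
        parser.parse(new ByteArrayInputStream(xml), new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                count[0]++;
            }
        });
        return count[0];
    }

    // Document-model build: parse into a DOM tree, then walk it so that
    // any lazily (deferred) expanded nodes are actually materialized.
    static int domCount(byte[] xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xml));
        return doc.getElementsByTagName("*").getLength();
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = sampleDoc(100_000);
        long t0 = System.nanoTime();
        int saxElements = saxCount(xml);
        long t1 = System.nanoTime();
        int domElements = domCount(xml);
        long t2 = System.nanoTime();
        System.out.printf("SAX only:  %d elements in %d ms%n", saxElements, (t1 - t0) / 1_000_000);
        System.out.printf("DOM build: %d elements in %d ms%n", domElements, (t2 - t1) / 1_000_000);
    }
}
```

Both counts include the root element, so they should agree; only the timings 
are expected to differ.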

>> XML seemed at the time to offer real help in
>> the kinds of applications I have made my career doing. DOM was
>> clunky AND untenably slow.
>
>
> Have you checked out DOM lately? Several implementations have gotten a 
> lot better in the last couple of years. 

That's definitely true. Xerces is faster than JDOM pretty much across 
the board - they've obviously put a lot of work into optimizations.

> ...I've never seen anybody pick JDOM for performance or memory 
> reasons. For one thing, it's not at all clear that JDOM is faster or 
> uses less memory than modern DOMs like Xerces-2. The benchmarks in 
> this area range from abominable to non-existent, and are typically 
> written to prove that the author's pet API is better than the 
> alternatives. 

Gee, thanks! :-) I actually started on my set of tests because I got 
tired of seeing unsubstantiated PR claims about JDOM performance 
compared to other models.

Measured memory usage across a variety of documents shows Xerces roughly 
on a par with JDOM if you turn off the "deferred node expansion" feature 
of Xerces. If you *don't* turn this off (it's on by default), both time 
and memory performance are abysmal for Xerces on small documents.
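Turning the feature off is a single factory setting. A minimal sketch, 
assuming the underlying DOM implementation is Xerces or the JDK's internal 
copy of it, both of which recognize the feature URI 
http://apache.org/xml/features/dom/defer-node-expansion; other parsers may 
reject it with a ParserConfigurationException. NonDeferredDom and newBuilder 
are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class NonDeferredDom {
    // Xerces feature URI controlling deferred node expansion (true by default).
    static final String DEFER_NODE_EXPANSION =
        "http://apache.org/xml/features/dom/defer-node-expansion";

    // Assumes a Xerces-based JAXP implementation; a parser that does not
    // recognize the feature throws ParserConfigurationException here.
    static DocumentBuilder newBuilder(boolean defer) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature(DEFER_NODE_EXPANSION, defer);
        return factory.newDocumentBuilder();
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<order><item qty=\"3\"/></order>".getBytes(StandardCharsets.UTF_8);
        // With deferral off, the full tree is built before parse() returns.
        Document doc = newBuilder(false).parse(new ByteArrayInputStream(xml));
        System.out.println(doc.getDocumentElement().getNodeName());
        // The implementation class name usually shows which mode was used.
        System.out.println(doc.getClass().getName());
    }
}
```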

Aside from this whole dispute over verification and performance, it's 
worth noting that most applications where people currently use 
document models (DOM, JDOM, dom4j, etc.) would be much better served by 
data binding. Data binding allows much smaller memory footprints because 
the data is abstracted from the document and may even be stored in a more 
compact form (an int as opposed to a String, for instance). As data 
binding becomes more prevalent, I think usage of document models will 
fade away except in applications that really need to work with the XML 
document structure (generic document handlers such as editors or 
transformation applications). I think this may be a point in favor of 
Elliotte's view of verification, though I'd personally prefer to see 
verification as an optional feature.
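To make the data binding point concrete, here is a hand-rolled sketch (not 
any particular binding framework) that maps hypothetical 
<item sku="..." qty="..."/> elements directly onto a small Java class during 
a SAX parse. Only the bound objects are kept, and the quantity is held as an 
int rather than as attribute text in a tree. Item, ItemBinder and the element 
and attribute names are all assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical bound type: only the data the application needs, in compact form.
class Item {
    final String sku;
    final int qty; // an int, not the String "3" a document model would hold

    Item(String sku, int qty) {
        this.sku = sku;
        this.qty = qty;
    }
}

class ItemBinder {
    // Binds <item> elements straight to Item objects as SAX events arrive,
    // so no document tree is ever held in memory.
    static List<Item> bind(byte[] xml) throws Exception {
        List<Item> items = new ArrayList<>();
        SAXParserFactory.newInstance().newSAXParser().parse(
            new ByteArrayInputStream(xml),
            new DefaultHandler() {
                @Override
                public void startElement(String uri, String local, String qName, Attributes atts) {
                    // Factory is not namespace-aware by default, so match on qName.
                    if ("item".equals(qName)) {
                        items.add(new Item(atts.getValue("sku"),
                                           Integer.parseInt(atts.getValue("qty"))));
                    }
                }
            });
        return items;
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<order><item sku=\"A1\" qty=\"3\"/><item sku=\"B2\" qty=\"12\"/></order>"
            .getBytes(StandardCharsets.UTF_8);
        int total = 0;
        for (Item item : bind(xml)) {
            total += item.qty;
        }
        System.out.println("total qty = " + total); // total qty = 15
    }
}
```

A real binding framework would generate or configure this mapping rather 
than hand-code it, but the memory argument is the same: the application keeps 
typed fields, not the XML structure.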

  - Dennis