[jdom-interest] Internal DTD subset verification
Dennis Sosnoski
dms at sosnoski.com
Wed May 8 11:38:53 PDT 2002
Not to get involved in the main point of this discussion, but...
Elliotte Rusty Harold wrote:
> ...Keep in mind that in many scenarios I/O concerns are likely to
> swamp any issues with verification, and when they don't the speed of
> the underlying SAX parser is probably the second biggest factor.
Not even close. The build time for a document model is much larger than
the parsing time for fast parsers. My current published test round, at
http://www.sosnoski.com/opensrc/xmlbench/results.html, shows JDOM beta 7
taking about 4 times as long as the SAX2 parse alone for medium to large
documents. The SAX2 parsers I was working with had high overhead for
small documents, so there the total JDOM build time was only about twice
the SAX2 parse time - that should change with Piccolo in the next set of
tests.
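For anyone who wants to see the effect without the full benchmark suite, a rough comparison can be made with nothing but the JDK: time a bare SAX parse against a full document-model build of the same input. This is a minimal sketch, not the xmlbench code. It swaps in the JDK's built-in DOM for JDOM (JDOM needs an external jar), uses a hypothetical synthetic document, and does a single cold run with no JIT warm-up, so the ratio it prints is only indicative and won't match the published figures.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.helpers.DefaultHandler;

// Rough parse-only vs. parse-and-build timing. The JDK's built-in DOM stands in
// for JDOM; a single cold run with no JIT warm-up, so numbers are indicative only.
class ParseVsBuild {

    // Hypothetical synthetic input: an <items> root with n <item> children.
    static byte[] makeDoc(int n) {
        StringBuilder sb = new StringBuilder("<items>");
        for (int i = 0; i < n; i++) {
            sb.append("<item id=\"").append(i).append("\">value ").append(i).append("</item>");
        }
        return sb.append("</items>").toString().getBytes(StandardCharsets.UTF_8);
    }

    // SAX parse with a no-op handler: the cost of the parser alone.
    static long saxParseNanos(byte[] xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            long start = System.nanoTime();
            parser.parse(new ByteArrayInputStream(xml), new DefaultHandler());
            return System.nanoTime() - start;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Parse and build a complete DOM tree: parser cost plus model construction.
    static long domBuildNanos(byte[] xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            long start = System.nanoTime();
            Document doc = builder.parse(new ByteArrayInputStream(xml));
            // Counting <item> elements walks the whole tree, so a deferred DOM has to
            // expand its element nodes inside the timed region rather than later.
            doc.getElementsByTagName("item").getLength();
            return System.nanoTime() - start;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = makeDoc(50_000);
        long sax = saxParseNanos(xml);
        long dom = domBuildNanos(xml);
        System.out.printf("SAX parse: %d ms, DOM build: %d ms, ratio %.1f%n",
                sax / 1_000_000, dom / 1_000_000, (double) dom / sax);
    }
}
```

Running `main` prints both times and their ratio. For anything beyond a sanity check, repeat each measurement many times and discard the first runs.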
>> XML seemed at the time to offer real help in
>> the kinds of applications I have made my career doing. DOM was
>> clunky AND untenably slow.
>
> Have you checked out DOM lately? Several implementations have gotten a
> lot better in the last couple of years.
That's definitely true. Xerces is faster than JDOM pretty much across
the board - they've obviously put a lot of work into optimizations.
> ...I've never seen anybody pick JDOM for performance or memory
> reasons. For one thing, it's not at all clear that JDOM is faster or
> uses less memory than modern DOMs like Xerces-2. The benchmarks in
> this area range from abominable to non-existent, and are typically
> written to prove that the author's pet API is better than the
> alternatives.
Gee, thanks! :-) I actually started on my set of tests because I got
tired of seeing unsubstantiated PR claims about JDOM performance
compared to other models.
Measured memory usage across a variety of documents shows Xerces about
on a par with JDOM if you turn off the "deferred node expansion" feature
of Xerces. If you *don't* turn this off (it's on by default), both time
and memory performance are abysmal for Xerces on small documents.
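Through the JAXP API, turning off deferred node expansion amounts to setting the Xerces feature `http://apache.org/xml/features/dom/defer-node-expansion` to false on the factory. A minimal sketch, assuming a Xerces-based JAXP implementation (the JDK's bundled parser is derived from Xerces). An implementation that doesn't support the feature rejects it with a ParserConfigurationException, and the sketch then keeps the parser's defaults. The class name NonDeferredDom and the sample document are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;

class NonDeferredDom {
    // Xerces feature URI controlling deferred ("lazy") DOM node expansion.
    static final String DEFER_NODE_EXPANSION =
            "http://apache.org/xml/features/dom/defer-node-expansion";

    // Returns a factory with deferred expansion turned off when the underlying
    // parser supports the Xerces feature; otherwise the parser default is kept.
    static DocumentBuilderFactory newFactory() {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        try {
            factory.setFeature(DEFER_NODE_EXPANSION, false);
        } catch (ParserConfigurationException e) {
            // Not Xerces-based, or the feature is unsupported: keep the defaults.
        }
        return factory;
    }

    // Parses an XML string into a fully expanded DOM (where the feature took effect).
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = newFactory().newDocumentBuilder();
            return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<config><entry key=\"a\">1</entry></config>");
        System.out.println(doc.getDocumentElement().getTagName()); // prints "config"
    }
}
```

Deferred mode is designed to return the document sooner and save memory when only part of the tree is ever visited, so whether to turn it off depends on document size and access pattern; measuring both settings on your own documents is the safest guide.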
Aside from this whole dispute over verification and performance, it's
worth noting that most applications where people are currently using
document models (DOM, JDOM, dom4j, etc.) are much better suited for data
binding. Data binding allows much smaller memory footprints because the
data is abstracted from the document and may even be stored in a more
compact form (an int as opposed to a String, for instance). As data
binding becomes more prevalent I think usage of document models will
fade away except in applications that really need to work with the XML
document structure (generic document handlers such as editors or
transformation applications). I think this may be a point in favor of
Elliotte's view of verification, though I'd personally prefer to see
verification as an optional feature.
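A hand-rolled binding shows where the savings come from: SAX events go straight into plain objects, no node tree is retained, and a number is stored as an int instead of a String. This is a minimal sketch with hypothetical element and class names (order, item, sku, LineItem, OrderBinder). It isn't the API of any particular data binding framework, which would typically generate or configure this mapping rather than have you write it by hand.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical bound type: the quantity is held as an int, not as a text node.
class LineItem {
    final String sku;
    final int quantity;

    LineItem(String sku, int quantity) {
        this.sku = sku;
        this.quantity = quantity;
    }
}

// Hand-rolled binding: SAX events populate LineItem objects; no tree is kept.
class OrderBinder extends DefaultHandler {
    private final List<LineItem> items = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private String currentSku;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        text.setLength(0); // collect only the text of the current element
        if (qName.equals("item")) {
            currentSku = attrs.getValue("sku");
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("item")) {
            // Convert once at bind time; the String buffer is reused afterwards.
            items.add(new LineItem(currentSku, Integer.parseInt(text.toString().trim())));
        }
    }

    static List<LineItem> bind(String xml) {
        try {
            OrderBinder binder = new OrderBinder();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), binder);
            return binder.items;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

For example, `OrderBinder.bind("<order><item sku=\"A-1\">3</item></order>")` returns a one-element list whose `quantity` field is the int 3.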
- Dennis