[jdom-interest] Internal DTD subset verification
Elliotte Rusty Harold
elharo at metalab.unc.edu
Wed May 8 18:02:19 PDT 2002
At 11:38 AM -0700 5/8/02, Dennis Sosnoski wrote:
>Not to get involved in the main point of this discussion, but...
>
>Elliotte Rusty Harold wrote:
>
>> ...Keep in mind that in many scenarios I/O concerns are likely to
>>swamp any issues with verification, and when they don't the speed
>>of the underlying SAX parser is probably the second biggest factor.
>
>Not even close. The build time for a document model is much larger
>than the parsing time for fast parsers. My current published test
>round, at http://www.sosnoski.com/opensrc/xmlbench/results.html,
>shows JDOM beta 7 taking about 4 times as long as the SAX2 parse
>alone for medium to large documents. The SAX2 parsers I was working
>with had high overhead for small documents, so there the total JDOM
>build time was only about twice the SAX2 parse time - that should
>change with Piccolo in the next set of tests.
>
Your tests haven't convinced me. There are a lot of problems with
them, but most importantly for this case, please allow me to quote
from your site:
All tests involving I/O use memory buffers to avoid any external
timing variables. Input and output uses streams (specifically
ByteArrayInputStream and ByteArrayOutputStream) to most closely
simulate the normal usage. Some of the models support direct input
from character arrays or Strings with higher performance than stream
input, but using this type of input for testing gives misleading
results; in real world applications, text documents are rarely
resident in memory to be passed directly to parsers. Validation is
turned off in all tests, and the documents used for the test do not
specify DTDs.
In other words, your tests deliberately exclude the cost of I/O,
which makes sense for what you're doing, because I/O would indeed
swamp what you're trying to measure. However, the flip side is that
there's not much point in us optimizing input that's going to be
swamped by I/O in any real-world scenario.
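To make the methodology concrete: the quoted setup times the parse alone by holding the document bytes in memory and feeding them to the parser through a ByteArrayInputStream, so disk and network never enter the measured interval. A minimal sketch of that kind of harness, using only the JDK's JAXP SAX API (not Sosnoski's actual benchmark code, and the document size and run counts here are arbitrary):

```java
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;
import java.io.ByteArrayInputStream;

public class ParseTiming {
    public static void main(String[] args) throws Exception {
        // Build a test document entirely in memory, as the quoted
        // methodology describes, so I/O never enters the timed loop.
        StringBuilder sb = new StringBuilder("<root>");
        for (int i = 0; i < 1000; i++) {
            sb.append("<item n=\"").append(i).append("\">text</item>");
        }
        sb.append("</root>");
        byte[] doc = sb.toString().getBytes("UTF-8");

        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        DefaultHandler handler = new DefaultHandler(); // no-op event sink

        // Warm up the parser once before timing.
        parser.parse(new ByteArrayInputStream(doc), handler);

        int runs = 50;
        long start = System.currentTimeMillis();
        for (int i = 0; i < runs; i++) {
            parser.parse(new ByteArrayInputStream(doc), handler);
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("SAX2 parse: " + (elapsed / (double) runs)
                + " ms per pass (no I/O in the timed loop)");
    }
}
```

A JDOM build time would be measured the same way, substituting SAXBuilder.build() for the raw parse; the difference between the two figures is the model-construction overhead under dispute, and it disappears into the noise once a real stream source replaces the byte array.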
>Measured memory usage across a variety of documents shows Xerces
>about on a par with JDOM if you turn off the "deferred node
>expansion" feature of Xerces. If you *don't* turn this off (it's on
>by default) both time and memory performance is abysmal for Xerces
>on small documents.
How are you actually measuring memory usage? I did not find any
details on your site. Based on the following, it's not obvious to me
that you're getting accurate counts:
Testing the memory usage of the representations works a little
differently, in that the program keeps all the constructed copies of
the document and pauses between relevant tests to encourage garbage
collection. Memory usage per copy of the representation is found by
dividing the total memory used by the number of copies.
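For reference, the usual JVM-level version of that technique samples Runtime heap counters around the allocations and divides by the copy count. A minimal sketch, with a byte array standing in for a built document tree (the real benchmark would hold JDOM Document instances), which also shows why the numbers are hard to trust: System.gc() is only a hint, so the "used memory" samples can include uncollected garbage:

```java
import java.util.ArrayList;
import java.util.List;

public class MemoryPerCopy {
    // Hypothetical stand-in for a built document representation.
    static Object buildCopy() {
        return new byte[64 * 1024];
    }

    static long usedMemory() {
        Runtime rt = Runtime.getRuntime();
        // Encourage (but cannot force) garbage collection before sampling.
        for (int i = 0; i < 5; i++) {
            rt.gc();
            try { Thread.sleep(50); } catch (InterruptedException e) { }
        }
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long before = usedMemory();
        int copies = 20;
        List<Object> held = new ArrayList<Object>(copies);
        for (int i = 0; i < copies; i++) {
            held.add(buildCopy());
        }
        long after = usedMemory();
        // Per-copy figure as the quoted methodology computes it.
        System.out.println("approx bytes per copy: "
                + (after - before) / copies);
        // "held" keeps every copy reachable until here, so none can be
        // collected between the two samples.
        if (held.size() != copies) throw new AssertionError();
    }
}
```

Whether the published numbers used something like this or a more precise tool (heap profiler, instrumented allocator) is exactly the detail the site doesn't state.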
--
+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo at metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
| The XML Bible, 2nd Edition (Hungry Minds, 2001) |
| http://www.cafeconleche.org/books/bible2/ |
| http://www.amazon.com/exec/obidos/ISBN=0764547607/cafeaulaitA/ |
+----------------------------------+---------------------------------+
| Read Cafe au Lait for Java News: http://www.cafeaulait.org/ |
| Read Cafe con Leche for XML News: http://www.cafeconleche.org/ |
+----------------------------------+---------------------------------+