[jdom-interest] Still more Verification
Elliotte Rusty Harold
elharo at metalab.unc.edu
Wed Aug 23 05:00:01 PDT 2000
At 8:28 PM -0700 8/22/00, Jason Hunter wrote:
>I'm curious what people think about this approach. What Elliotte's code
>does is ensure that you absolutely cannot create a non-well-formed XML
>document using JDOM. That's a cool feature!
>
>My concern is that every change to the JDOM document is going to be
>checked char by char by char, resulting in a noticeable performance
>decrease. Elliotte says he saw a 20% slowdown (not sure on what test).
>It's probably really bad for documents that are mostly text.
>
Just to be clear, my initial, real-world tests showed a change that
was down in the noise. I was eventually able to carefully construct
some test cases designed to have worst-case behavior that showed
close to a 20% slowdown, but in most normal cases the cost would be
much less than this.
In particular, in any program where the actual XML construction is a
relatively small fraction of the work, this probably wouldn't be
significant. For instance, in the Fibonacci example I've used in
several talks, almost all the time goes into calculating Fibonacci
numbers. Very little is spent doing anything with JDOM. If you're
working with a database, almost all your time will be spent waiting
for the database to respond; almost none doing anything with JDOM. If
you're reading a file from a network most of your time is waiting for
the network. Very little is actually constructing the document with
JDOM. Even if you're writing a document to disk, it's still true that
more of your time will be spent waiting for the disk than doing JDOM
work.
The only thing I'm not sure of is what happens with the JIT off. I
only tested on JDK 1.3 with the JIT on. This is exactly the sort of
code that JITs extremely well (a simple loop with no I/O that repeats
many times). I did not warm up the JIT before testing.
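For anyone who wants to repeat this kind of measurement, here is a rough sketch of a timing loop with a warm-up pass. It is not the test code referred to above: the class name VerifierBench, its methods, and the sample text are all made up for illustration, and the character ranges are the Char production from the XML 1.0 specification. It uses code-point methods that arrived in Java 5, so it will not compile on JDK 1.3.

```java
public class VerifierBench {
    // XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Checks every code point in the string, one at a time.
    static boolean allLegal(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isXmlChar(cp)) return false;
            i += Character.charCount(cp);
        }
        return true;
    }

    // Runs the check reps times and returns the elapsed nanoseconds.
    static long timeNanos(String text, int reps) {
        long start = System.nanoTime();
        int legal = 0;
        for (int i = 0; i < reps; i++) {
            if (allLegal(text)) legal++;
        }
        long elapsed = System.nanoTime() - start;
        // Using the result keeps the loop from being optimized away entirely.
        if (legal != reps) throw new IllegalStateException("unexpected illegal text");
        return elapsed;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10_000; i++) sb.append("mostly text ");
        String text = sb.toString();

        timeNanos(text, 2_000);                  // warm-up pass, result discarded
        long measured = timeNanos(text, 2_000);  // measured pass
        System.out.println("checked " + text.length() + " chars x 2000 in "
            + measured / 1_000_000 + " ms");
    }
}
```

On a HotSpot JVM, running the same class with `java -Xint VerifierBench` forces interpreted-only execution, which is one way to approximate the "JIT off" case.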
>We could perhaps find a way for SAXBuilder to avoid the slowdown by
>using some special constructor. Problem there is that since builders
>are and should be in a different package than the core (because people
>should have the ability to write their own builders), we're going to
>have to expose those special constructors to the public at large, and
>that eliminates the ability to say you cannot create a non-well-formed
>JDOM document, because with those constructors you can.
>
Actually, you can avoid exposing them. You subclass the relevant classes with non-public
classes in the org.jdom.input package. These subclasses would
override the relevant setData(), setValue(), etc. methods in the
normal classes. We'd need to make sure all the constructors in the
superclasses called setData()/setValue() rather than doing the checks
directly, but that's not hard. In fact, I think that's how I've got
most of them structured already.
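A minimal sketch of that arrangement follows. The names are hypothetical stand-ins, not the real JDOM classes: TextNode plays the role of a checked core class, PreverifiedTextNode plays the role of a non-public subclass that would live in org.jdom.input, and fromParser() stands in for the builder. Everything sits in one file here only so the example is self-contained.

```java
public class VerificationSketch {

    /** Hypothetical stand-in for a core class that verifies its text. */
    public static class TextNode {
        protected String value;

        // The constructor delegates to the setter instead of checking
        // directly, so a subclass that overrides setValue() changes both.
        public TextNode(String value) {
            setValue(value);
        }

        public void setValue(String value) {
            String problem = checkCharacterData(value);
            if (problem != null) throw new IllegalArgumentException(problem);
            this.value = value;
        }

        public String getValue() {
            return value;
        }
    }

    /** Non-public subclass: only the builder can create it. */
    private static final class PreverifiedTextNode extends TextNode {
        PreverifiedTextNode(String value) {
            super(value);
        }

        @Override
        public void setValue(String value) {
            this.value = value; // parser already checked this text
        }
    }

    /** Stand-in for a builder handing over text that came from a parser. */
    public static TextNode fromParser(String alreadyParsedText) {
        return new PreverifiedTextNode(alreadyParsedText);
    }

    /** Returns null if every character is a legal XML 1.0 Char, else a message. */
    static String checkCharacterData(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            boolean legal = cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
            if (!legal) {
                return "0x" + Integer.toHexString(cp) + " is not a legal XML character";
            }
            i += Character.charCount(cp);
        }
        return null;
    }
}
```

Code outside the file can still only get a checked TextNode through the public constructor. One consequence of this particular sketch is that a node created by fromParser() also skips the check on later setValue() calls, so a real design would need to decide whether that is acceptable.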
>Is it worth a 20% performance hit on all element construction to sanity
>check the text content? The answer is probably sometimes yes, sometimes
>no. But how would one differentiate between the two?
>
>We have a similar issue already for checking tag names, PI content, and
>so on. If the content has already passed through a parser like Xerces,
>checking again only wastes CPU cycles. We haven't worried about it for
>things like checking tag names because it's relatively fast, but when
>you have a document that could have large amounts of text, do you really
>want to check every character one at a time against a matrix of legal
>characters?
>
Again, special non-public builder subclasses could easily omit these
checks. I'd prefer not to write them until performance testing proved
they were necessary though. Remember, in most programs less than 10%
of your time will be spent on JDOM at all. Almost all programs have
something much more significant they're actually doing most of the
time. I'm not willing to sacrifice program correctness for a small
amount of performance. If testing shows that this is a real issue,
then I think there are some optimizations we can do to get better
performance, but first I'd like to get it right; then worry about
making it fast.
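To put rough numbers on that, here is a small sketch of the arithmetic. It assumes the worst-case 20% slowdown and the 10% share of run time mentioned above; the class and method names are made up for illustration.

```java
public class OverallSlowdown {
    // If a fraction f of the run time is spent in JDOM and that part gets
    // slower by s, the new total is (1 - f) + f * (1 + s) = 1 + f * s,
    // so the whole program slows down by f * s.
    static double overallSlowdown(double fractionInJdom, double jdomSlowdown) {
        return fractionInJdom * jdomSlowdown;
    }

    public static void main(String[] args) {
        double overall = overallSlowdown(0.10, 0.20);
        System.out.printf("overall slowdown: about %.0f%%%n", overall * 100);
    }
}
```

Under those assumptions the whole program slows down by about 2%, and by less than that whenever JDOM's share of the run time is under 10%.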
+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo at metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
| The XML Bible (IDG Books, 1999) |
| http://metalab.unc.edu/xml/books/bible/ |
| http://www.amazon.com/exec/obidos/ISBN=0764532367/cafeaulaitA/ |
+----------------------------------+---------------------------------+
| Read Cafe au Lait for Java News: http://metalab.unc.edu/javafaq/ |
| Read Cafe con Leche for XML News: http://metalab.unc.edu/xml/ |
+----------------------------------+---------------------------------+