[jdom-interest] Internal DTD subset verification

Jason Hunter jhunter at servlets.com
Fri May 10 15:26:28 PDT 2002


This thread has been fantastic.  It's been long (33 posts) and there are
good points throughout every post.  I'm very proud of the community
we've established here!

I wonder if we set a length record?  I'm sure we set a "Longest Thread
Without Jason Talking" record.  :-)  But I'm back from vacation and need
to dig into this one.

My feeling is that if we require verification across all circumstances
then we're doing a disservice to those users who have application needs
where verification doesn't make sense.  Verification wasn't among the
original JDOM goals.  It was a perk added later, and something we've
bragged about but not a near and dear feature.  As Philip pointed out,
it's not part of the JDOM Mission Statement.  The mission we started
with was to make XML easy, natural, useful, and lightweight in both
memory and CPU requirements.

On the verification implementation, we earlier had an unstable
compromise before b8 where verification was turned on for everything
except text content.  Verification minus text didn't take too much
time in benchmarks, and my attempts to make it switchable cost more
than they saved.  For example, I tried using a thread-local flag to
indicate whether verification should occur, and access to the TL
variable swamped the improvement from avoiding the checks.  The
verification cost was under 10% IIRC, and we had lower-hanging fruit
to claim.  That time included the tree structure checking, by the way,
which is particularly useful because it avoids infinite loops on output
and such.
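
To make the thread-local experiment concrete, here is a rough sketch of
the kind of flag described above.  It is not the code that was actually
benchmarked: VerificationFlag, SketchElement, and looksLikeXmlName are
hypothetical names, the name check is deliberately simplified, and the
sketch assumes Java 8 or later for ThreadLocal.withInitial.

```java
// Hypothetical sketch; none of these names are JDOM API.
final class VerificationFlag {
    // Each thread gets its own copy, initially "verification on".
    private static final ThreadLocal<Boolean> ENABLED =
            ThreadLocal.withInitial(() -> Boolean.TRUE);

    private VerificationFlag() {}

    static boolean isEnabled() { return ENABLED.get(); }

    static void set(boolean enabled) { ENABLED.set(enabled); }
}

final class SketchElement {
    private final String name;

    SketchElement(String name) {
        // The ThreadLocal lookup runs on every construction, even when
        // verification is switched off for the current thread.
        if (VerificationFlag.isEnabled() && !looksLikeXmlName(name)) {
            throw new IllegalArgumentException("Illegal element name: " + name);
        }
        this.name = name;
    }

    String getName() { return name; }

    // Simplified stand-in for a real XML name check (assumption: this is
    // far looser and narrower than the XML 1.0 Name production).
    static boolean looksLikeXmlName(String s) {
        if (s == null || s.isEmpty()) return false;
        char first = s.charAt(0);
        if (!(Character.isLetter(first) || first == '_')) return false;
        for (int i = 1; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!(Character.isLetterOrDigit(c) || c == '_' || c == '-' || c == '.')) {
                return false;
            }
        }
        return true;
    }
}
```

The point of the sketch is that skipping the check still pays for the
per-thread lookup on every node, so the saving has to be larger than
that lookup before the flag is worth having.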

In b8, text verification was added as part of the Text class without my
realizing it.  That caused a slowdown that appears equal to all the
other checks put together.  I'm not terribly happy the Text checking is
in there.  If JDOM is 20% slower than the competition it's going to lose
market share.  The reason the government requires car manufacturers to
follow certain environmental rules (as Rusty points out) is that without
those mandates any manufacturer that skips the expensive enhancement
will dominate the market with lower priced products.  In this case, the
XML spec is the government ruleset, and we do try to conform to that.
Beyond that, adding expensive features that competitors don't provide
and users don't generally clamor for will cause lost market share and
nullify any benefit from doing what's "good for the world".

Consequently, I believe there absolutely should be a way to avoid
verification.  I've wanted a mechanism for 18 months.  The problem is
Java doesn't provide a good mechanism for our situation.  A global flag
isn't quite right because of multithreaded servers.  The best approach
I've come up with is to rely on factories, where you can choose a
verifying factory or a non-verifying factory.  But that alters the
fundamental model for constructing objects in memory.  That's a lot of
pain for a minor performance improvement.
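
As a minimal sketch of the factory idea, and of why it changes how
objects get constructed, something like the following would do.  All of
the names here (NodeFactory, TextNode, VerifyingFactory,
NonVerifyingFactory, XmlChars) are hypothetical and are not JDOM API.
The only part taken from a spec is the character range test, which
follows the XML 1.0 Char production.

```java
// Hypothetical sketch; these types are illustrative, not JDOM API.
interface NodeFactory {
    TextNode text(String value);
}

final class TextNode {
    private final String value;

    // Package-private: callers build nodes through a factory, which is
    // the change to the construction model discussed above.
    TextNode(String value) { this.value = value; }

    String getValue() { return value; }
}

final class VerifyingFactory implements NodeFactory {
    public TextNode text(String value) {
        String problem = XmlChars.firstIllegalChar(value);
        if (problem != null) {
            throw new IllegalArgumentException(problem);
        }
        return new TextNode(value);
    }
}

final class NonVerifyingFactory implements NodeFactory {
    // No per-character scan; the caller vouches for the content.
    public TextNode text(String value) { return new TextNode(value); }
}

final class XmlChars {
    private XmlChars() {}

    // XML 1.0 Char: #x9 | #xA | #xD | [#x20-#xD7FF]
    //             | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns a description of the first illegal character, or null if
    // every code point in the string is a legal XML character.
    static String firstIllegalChar(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isXmlChar(cp)) {
                return "Illegal XML character 0x" + Integer.toHexString(cp)
                        + " at index " + i;
            }
            i += Character.charCount(cp);
        }
        return null;
    }
}
```

With this shape the choice is per factory instance rather than global,
so two threads in the same server can use different settings.  The
cost is that code can no longer just say "new" on a node class, which
is the pain mentioned above.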

Personally, I would like to see the Text verification turned off again. 
Someone can benchmark this, but what I saw last year was that the other
verification was cheap enough, but text verification was a significant
hit.  That's probably why DOM avoids it.  Again, thinking of a free
market, being equivalent to DOM in this aspect poses no risk to
market share and doesn't compromise the founding mission statement.

Meanwhile it's appropriate to look at smarter ways to control
verification.

-jh-


