[jdom-interest] Re: small bug with big impact

Joseph Bowbeer jozart at csi.com
Sat Dec 2 16:45:07 PST 2000


FYI: here's more information about why the bug bit Crimson but not
Xerces.

Consider a document like:

<log>
  <entry id="1" msg="message1" />
  <entry id="2" msg="message2" />
  ...
</log>

Xerces reports the intervening newline and indentation in a single
'characters' event.  This bypasses the code in JDOM that tries to
concatenate contiguous characters.
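To make the difference concrete, here is a minimal, hypothetical sketch (not JDOM's actual code) of merging contiguous 'characters' events. The class `TailMerge` and method `addText` are illustrative names. Plain `new Object()` instances stand in for `<entry>` elements. A single Xerces-style event takes the plain add path, while two Crimson-style events reach the merge path. This version merges by replacing the last list item, so it never has to search the list.

```java
import java.util.ArrayList;
import java.util.List;

public class TailMerge {
    // Hypothetical helper: append one characters() chunk to a content list.
    // If the previous item is already text, join the two in place at the
    // tail (index size-1) instead of searching the list for it.
    static void addText(List<Object> content, String text) {
        int last = content.size() - 1;
        if (last >= 0 && content.get(last) instanceof String) {
            content.set(last, (String) content.get(last) + text);
        } else {
            content.add(text);
        }
    }

    public static void main(String[] args) {
        List<Object> xerces = new ArrayList<>();
        xerces.add(new Object());      // stand-in for an <entry> element
        addText(xerces, "\n  ");       // one event: plain add, merge path skipped

        List<Object> crimson = new ArrayList<>();
        crimson.add(new Object());
        addText(crimson, "\n");        // first event: plain add
        addText(crimson, "  ");        // second event: merged at the tail

        System.out.println(xerces.size() + " " + crimson.size()); // prints "2 2"
        System.out.println(crimson.get(1).equals(xerces.get(1))); // prints "true"
    }
}
```

Both parsers end up with the same content. Only the Crimson-style input exercises the merge code, which is where the cost discussed next comes from.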

Crimson, however, reports the whitespace in two or more events.  On
every event after the first, JDOM scans the content list from the
start, looking for a matching string.  In this case the scan works
correctly, although slowly: no match is found until the end of the
list.  But the list grows with every entry, so the longer the parser
runs, the longer each scan takes, and the total cost grows roughly
quadratically with the number of entries.
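The following is a rough, hypothetical model of that scan (not JDOM's code). It assumes the merge works by removing the previous string by value with `List.remove(Object)` and then appending the joined text. `ScanCostDemo`, `Chunk`, and `mergeByRemove` are illustrative names. `Chunk` wraps the text so its `equals()` can count comparisons. After each entry the whitespace arrives as "\n" followed by "  ". Earlier whitespace items are already "\n  ", so the search for "\n" only matches at the end of the list.

```java
import java.util.ArrayList;
import java.util.List;

public class ScanCostDemo {
    // Hypothetical text item whose equals() counts how often it is invoked.
    // ArrayList.remove(Object) calls the argument's equals() on each element
    // from index 0 until it finds a match.
    static final class Chunk {
        static long comparisons = 0;
        final String text;
        Chunk(String text) { this.text = text; }
        @Override public boolean equals(Object o) {
            comparisons++;
            return o instanceof Chunk && ((Chunk) o).text.equals(text);
        }
        @Override public int hashCode() { return text.hashCode(); }
    }

    // Assumed merge strategy: remove the previous text by value (a linear
    // scan from the start), then append the concatenated text.
    static void mergeByRemove(List<Object> content, Chunk previous, String more) {
        content.remove(previous);
        content.add(new Chunk(previous.text + more));
    }

    // Model a <log> with n entries whose trailing whitespace is split in two events.
    static long comparisonsFor(int n) {
        Chunk.comparisons = 0;
        List<Object> content = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            content.add("entry" + i);             // stand-in for an <entry> element
            Chunk first = new Chunk("\n");
            content.add(first);                   // first characters() event
            mergeByRemove(content, first, "  ");  // second event scans the whole list
        }
        return Chunk.comparisons;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1000, 2000, 4000}) {
            // The i-th merge scans 2i + 2 items, for n * (n + 1) comparisons in total.
            System.out.println(n + " entries -> " + comparisonsFor(n) + " equals() calls");
        }
    }
}
```

Doubling the number of entries roughly quadruples the comparisons: 1,001,000 for 1000 entries and 4,002,000 for 2000. That matches the observation that the parse gets slower the longer it runs.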


----- Original Message -----
From: "Joseph Bowbeer" <jozart at csi.com>
To: <jdom-interest at jdom.org>
Sent: Thursday, November 30, 2000 6:48 PM
Subject: small bug with huge implications


Why is Crimson hurt and not Xerces?  I haven't investigated this
thoroughly, but I did observe a difference in how Crimson and Xerces
report whitespace.  Because of the way Xerces reports whitespace,
List.remove(Object) is never called.  Using Crimson, however, results
in two removals for every entry in the document.  With the broken code,
95% of the time was spent in List.remove(Object), and half of that time
was spent in String.equals(Object).
