[jdom-interest] small bug with huge implications

Joseph Bowbeer jozart at csi.com
Thu Nov 30 18:48:46 PST 2000


Sometimes when you find a bug, you wonder how it ever worked.  This is
one of those times!

The bug that is repeated several times in the source is to use
list.remove(object) instead of list.remove(index).

There's one occurrence of this in Element.addContent that, in addition
to being prone to failure, can cause a huge performance penalty when the
Crimson parser is used.

Here's the original code:

    public Element addContent(String text) {
        if (content == null) {
            content = new LinkedList();
        }

        if (content.size() > 0) {
            Object ob = content.get(content.size() - 1);
            if (ob instanceof String) {
                text = (String)ob + text;
                content.remove(ob); /* BUG */
            }
        }
        content.add(text);
        return this;
    }

Here's the corrected code:

    public Element addContent(String text) {
        if (content == null) {
            content = new LinkedList();
        }

        if (content.size() > 0) {
            int lastIndex = content.size() - 1;
            Object ob = content.get(lastIndex);
            if (ob instanceof String) {
                text = (String)ob + text;
                content.remove(lastIndex); /* FIX */
            }
        }
        content.add(text);
        return this;
    }


Here are some timing results comparing Xerces with Crimson before and
after the fix:

Before:

  Driver class: org.apache.crimson.parser.XMLReaderImpl
  SAXBuilder time: 24.675 secs.

  Driver class: org.apache.xerces.parsers.SAXParser
  SAXBuilder time: 1.862 secs.

After:

  Driver class: org.apache.crimson.parser.XMLReaderImpl
  SAXBuilder time: 0.931 secs.

  Driver class: org.apache.xerces.parsers.SAXParser
  SAXBuilder time: 1.542 secs.

That's a 25x speedup...

The test document, by the way, has 10K lines and looks something like:

  <?xml version="1.0"?>
  <log>
    <entry id="1" event="whatever1" />
    <entry id="2" event="whatever2" />
    ...
  </log>

Why is Crimson hurt and not Xerces?  I haven't investigated this
thoroughly, but I did observe a difference in how Crimson and Xerces
report whitespace.  With the way Xerces reports whitespace,
list.remove(object) is not called.  Using Crimson, however, results in
two removals for every entry in the document.  With the broken code, 95%
of the time was spent in List.remove(Object), and half of that time was
spent in String.equals(Object).

List.remove(Object) is also called in several places in PartialList.
This can lead to bugs if several objects in the backing list are equal
to the Object in question.  When an object is removed from a
PartialList, the object in the corresponding position in the backing
list should be removed -- not the first object that satisfies equals().

--
Joe Bowbeer

PS - The performance analysis was aided by Nathan Meyers' PerfAnal tool:


http://developer.java.sun.com/developer/technicalArticles/Programming/pe
rfanal/






More information about the jdom-interest mailing list