[jdom-interest] Huge slowdown when reading > 15 xml files

Jason Hunter jhunter at acm.org
Thu Dec 6 14:29:05 PST 2001


OptimizeIt shows 72% of the time is spent in StringBuffer's constructor,
10% on StringBuffer's expandCapacity(), and 6% on System.arraycopy(). 
So 88% is spent dealing with strings.  The cause for all this
StringBuffer work appears to be principally the
Element.addContent(String) logic:

    public Element addContent(String text) {
        if (content == null) {
            content = new ArrayList(INITIAL_ARRAY_SIZE);
        }

        int size = content.size();
        if (size > 0) {
            Object ob = content.get(size - 1);
            if (ob instanceof String) {
                text = (String)ob + text;
                content.remove(size - 1);
            }
        }
        content.add(text);
        return this;
    }

I suspect all the entity chars cause the parser to call addContent()
repeatedly, and these repeated string concatenations bog down the
system.
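To see why the repeated concatenation hurts, here's a small sketch
(not JDOM code) that counts the characters copied when text arrives in
many small chunks, as a parser delivers it.  Appending via String '+'
copies everything accumulated so far on each call, while a
StringBuffer appends in place:

```java
public class ConcatCost {
    public static void main(String[] args) {
        int chunks = 20000;
        String piece = "x";

        // Quadratic: each '+' copies the entire accumulated string,
        // mirroring the merge path in addContent(String) above.
        long copies = 0;
        String s = "";
        for (int i = 0; i < chunks; i++) {
            copies += s.length();   // chars recopied by this concatenation
            s = s + piece;
        }
        System.out.println("chars copied via String +: " + copies);

        // Linear: StringBuffer grows in place, amortized O(1) per append.
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < chunks; i++) {
            sb.append(piece);
        }
        System.out.println("final length: " + sb.length());
    }
}
```

For 20,000 one-char chunks the '+' path recopies about 200 million
characters, which matches the StringBuffer constructor and
System.arraycopy() hotspots in the profile.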

The solution is probably to use a StringBuffer during the build
process, converting it to a String only when something other than
addContent(String) is called.
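A minimal sketch of that idea (hypothetical class names, not the
actual JDOM patch): pending text accumulates in a StringBuffer and is
materialized into a single String only when non-text content arrives
or the content list is read.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: buffer adjacent text during the build, flush once.
public class TextBufferingElement {
    private final List content = new ArrayList();
    private StringBuffer textBuffer;   // pending character data, if any

    public TextBufferingElement addContent(String text) {
        if (textBuffer == null) {
            textBuffer = new StringBuffer();
        }
        textBuffer.append(text);       // no recopying of earlier chunks
        return this;
    }

    public TextBufferingElement addContent(Object child) {
        flushText();                   // non-text arrived: materialize
        content.add(child);
        return this;
    }

    // Turn the buffered text into one String entry in the list.
    private void flushText() {
        if (textBuffer != null) {
            content.add(textBuffer.toString());
            textBuffer = null;
        }
    }

    public List getContent() {
        flushText();
        return content;
    }

    public static void main(String[] args) {
        TextBufferingElement e = new TextBufferingElement();
        e.addContent("a").addContent("b").addContent("c");
        System.out.println(e.getContent());
    }
}
```

The three text chunks end up as one list entry, so the O(n^2)
merge-on-every-call cost disappears.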

If anyone wants to tackle this, feel free.  I can't do it right this
second.

-jh-

Mark Bennett wrote:
> 
> Including a sample.  A bit big because of the sample files.  Readme.txt
> included.
> 
> The slowdown is not progressive in terms of program run; it's progressive in
> terms of size.  A 130k file takes a VERY long time to parse; smaller
> files are still pokey.
> 
> Using CDATA did help a bit but introduces some other issues for my code.
> Thanks Philip.
> 
> -----Original Message-----
> From: philip.nelson at omniresources.com
> [mailto:philip.nelson at omniresources.com]
> Sent: Wednesday, October 17, 2001 1:04 PM
> To: mbennett at ideaeng.com; jdom-interest at jdom.org
> Subject: RE: [jdom-interest] Huge slowdown when reading > 15 xml files
> 
> If you could send a sample that reproduces the problem, that would be great.
> Do you input and output the document?  Are you expanding entities?
> 
> > I have a program that reads in xml files from a directory and
> > builds them
> > into jdom documents.
> >
> > As it does .build(filename), the system gets slower and
> > slower.  After about
> > 18 xml files it basically stops working and hangs.
> >
> > One of the elements in the XML files is the encoded content
> > of a small web
> > page.  (lots of escaped &gt;, &lt;, etc., so it's not seen as
> > structured text)
> > The total size of the files is < 4k, even with the XML wrapping.
> >
> > With that field in the file it's very slow.  If I remove just
> > the field with
> > the web content, the xml files rip right through the system
> > with no problem.
> >
> > But the web text isn't that large, and it's not structured  (all HTML
> > entities are escaped) so I don't understand why that would
> > matter.  And slow
> > is one thing, but it eventually grinds to a complete halt.
> >
> > Kind'a stuck, any ideas?
> >
> > Mark
> >
> >
> > _______________________________________________
> > To control your jdom-interest membership:
> > http://lists.denveronline.net/mailman/options/jdom-interest/youraddr at yourhost.com
> 
>   ------------------------------------------------------------------------
>                     Name: SlowParse.zip
>    SlowParse.zip    Type: Zip Compressed Data (application/x-zip-compressed)
>                 Encoding: base64
