[jdom-interest] Huge slowdown when reading > 15 xml files
Jason Hunter
jhunter at acm.org
Thu Dec 6 14:29:05 PST 2001
OptimizeIt shows 72% of the time is spent in StringBuffer's constructor,
10% on StringBuffer's expandCapacity(), and 6% on System.arraycopy().
So 88% is spent dealing with strings. The cause for all this
StringBuffer work appears to be principally the
Element.addContent(String) logic:
    public Element addContent(String text) {
        if (content == null) {
            content = new ArrayList(INITIAL_ARRAY_SIZE);
        }

        int size = content.size();
        if (size > 0) {
            Object ob = content.get(size - 1);
            if (ob instanceof String) {
                text = (String)ob + text;
                content.remove(size - 1);
            }
        }
        content.add(text);
        return this;
    }
I suspect all the entity chars cause the parser to call addContent()
repeatedly, and these repeated string concatenations bog down the
system.
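
To put a rough number on that suspicion, here is a small sketch. It assumes each "previous + next" step builds a fresh String holding all the text accumulated so far, which is how Java string concatenation behaves. The class and method names are made up for illustration and are not part of JDOM.

```java
// Hypothetical helper, not JDOM code: estimates the copying work done by
// the "text = (String)ob + text" pattern when text arrives in many chunks.
class ConcatCost {
    // Characters copied when `chunks` pieces of `chunkLength` chars each
    // are joined one at a time by repeated concatenation.
    static long charsCopiedByRepeatedConcat(int chunks, int chunkLength) {
        long copied = 0;
        long currentLength = 0;
        for (int i = 0; i < chunks; i++) {
            currentLength += chunkLength;
            // Each step allocates a new String containing everything so far.
            copied += currentLength;
        }
        return copied;
    }
}
```

The total grows with the square of the number of chunks, so text that arrives in many small pieces costs far more than its size suggests.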
The solution is probably to use a StringBuffer during the build process
and to convert the StringBuffer to a String only when something other
than addContent(String) is called.
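
A minimal sketch of that idea follows. It is untested against JDOM itself: BufferedTextElement, addChild(), getContent() and flushText() are hypothetical names standing in for Element and its other content methods, not the real API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Element; shows only the buffering idea.
class BufferedTextElement {
    private final List<Object> content = new ArrayList<Object>();
    // Holds the current run of adjacent text; null when no run is open.
    private StringBuffer pendingText;

    // Adjacent text is appended to the buffer instead of being
    // re-concatenated into a new String on every call.
    public BufferedTextElement addContent(String text) {
        if (pendingText == null) {
            pendingText = new StringBuffer();
        }
        pendingText.append(text);
        return this;
    }

    // Any non-text addition closes the open text run first.
    public BufferedTextElement addChild(Object child) {
        flushText();
        content.add(child);
        return this;
    }

    // Reading the content also closes the open text run.
    public List<Object> getContent() {
        flushText();
        return content;
    }

    // Converts the buffered text to a single String entry, once.
    private void flushText() {
        if (pendingText != null) {
            content.add(pendingText.toString());
            pendingText = null;
        }
    }
}
```

Every method that reads or changes the content list would need the same flushText() call, which is the fiddly part of doing this in the real class.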
If anyone wants to tackle this, feel free. I can't do it right this
second.
-jh-
Mark Bennett wrote:
>
> Including a sample. A bit big because of the sample files. Readme.txt
> included.
>
> The slowdown is not progressive in terms of program run; it's progressive in
> terms of size. A 130k file takes a VERY long time to parse; smaller sizes
> are even pokey.
>
> Using CDATA did help a bit but introduces some other issues for my code.
> Thanks Philip.
>
> -----Original Message-----
> From: philip.nelson at omniresources.com
> [mailto:philip.nelson at omniresources.com]
> Sent: Wednesday, October 17, 2001 1:04 PM
> To: mbennett at ideaeng.com; jdom-interest at jdom.org
> Subject: RE: [jdom-interest] Huge slowdown when reading > 15 xml files
>
> If you could send a sample that reproduces the problem, that would be great.
> Do you input and output the document? Are you expanding entities?
>
> > I have a program that reads in xml files from a directory and
> > builds them
> > into jdom documents.
> >
> > As it does .build(filename), the system gets slower and
> > slower. After about
> > 18 xml files it basically stops working and hangs.
> >
> > One of the elements in the XML files is the encoded content
> > of a small web
> > page. (lots of escaped &gt; &lt;, etc, so it's not seen as
> > structured text)
> > The total size of the files is < 4k, even with the XML wrapping.
> >
> > With that field in the file it's very slow. If I remove just
> > the field with
> > the web content, the xml files rip right through the system
> > with no problem.
> >
> > But the web text isn't that large, and it's not structured (all HTML
> > entities are escaped) so I don't understand why that would
> > matter. And slow
> > is one thing, but it eventually grinds to a complete halt.
> >
> > Kind'a stuck, any ideas?
> >
> > Mark
> >
> >
>
> ------------------------------------------------------------------------
> Name: SlowParse.zip
> SlowParse.zip Type: Zip Compressed Data (application/x-zip-compressed)
> Encoding: base64