[jdom-interest] ElementScanner and Memory

Laurent Bihanic laurent.bihanic at atosorigin.com
Tue Nov 14 16:53:40 PST 2006


Hi,

Yes, this looks like a bug.

ElementScanner relies on the EmptyDocument and EmptyDocumentFactory nested 
classes to prevent any document from being built.
So something has gone wrong here!

Comparing the current code (from CVS) and the original code (2002), I think 
the problem may come from FragmentHandler which was imported from JDOMResult 
as a replacement for the original ElementBuilder.

The following statement was imported:
          // Add a dummy root element to the being-built document as XSL
          // transformation can output node lists instead of well-formed
          // documents.
          this.pushElement(new Element("root", null, null));

It makes sense for JDOMResult (the comment explains why) but not here.

I suspect the root element it adds allows SAXHandler to attach the built 
Elements to it, hence causing the memory leak.
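
To illustrate the suspected mechanism, here is a minimal sketch in plain Java. 
It does not use JDOM: the Node class, its addContent/detach methods and 
retainedAfterScan are hypothetical stand-ins, not the actual SAXHandler or 
FragmentHandler code. It only shows that a long-lived dummy root keeps every 
child reachable unless each child is detached after it has been processed.

```java
import java.util.ArrayList;
import java.util.List;

public class DummyRootLeakSketch {

    /** Hypothetical stand-in for an element: a name, a parent and children. */
    static final class Node {
        final String name;
        Node parent;
        final List<Node> children = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }

        /** Attaches the child to this node, so this node now references it. */
        void addContent(Node child) {
            child.parent = this;
            children.add(child);
        }

        /** Removes this node from its parent, if it has one. */
        void detach() {
            if (parent != null) {
                parent.children.remove(this);
                parent = null;
            }
        }
    }

    /**
     * Simulates scanning itemCount items under a dummy root and returns how
     * many items the root still references once the scan is over.
     */
    public static int retainedAfterScan(int itemCount, boolean detachInListener) {
        Node dummyRoot = new Node("root");
        for (int i = 0; i < itemCount; i++) {
            Node item = new Node("item");
            // Assumed builder behaviour: each completed item is attached to the root.
            dummyRoot.addContent(item);
            // A listener would process the item here.
            if (detachInListener) {
                item.detach(); // workaround: release the item once it is handled
            }
        }
        return dummyRoot.children.size();
    }

    public static void main(String[] args) {
        System.out.println("retained without detach: " + retainedAfterScan(1000, false));
        System.out.println("retained with detach:    " + retainedAfterScan(1000, true));
    }
}
```

Without the dummy root there would be no parent to attach the items to, which 
is why removing it is expected to have the same effect as detaching each item 
by hand.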

Could you remove these lines from FragmentHandler's constructor and verify 
that this fixes the problem?

Laurent


Brian Nahas wrote:
> I have a 1.2 GB xml file I need to parse.  Since it's nicely 
> partitioned, I planned on using ElementScanner from the contrib package 
> to only load one item at a time.  Here's an equivalent schema:
> 
> <data>
>     <item>...</item>
>     <item>...</item>
>     <item>...</item>
>     ...
> </data>
> 
> The path I'm using for my listener is "/data/item".
> 
> I assumed any previous items would be released by the parser upon 
> completion.  ElementScanner was very simple to set up to handle this, 
> however I ran into an OutOfMemory error on my first try.  I was a little 
> confused as I thought ElementScanner was specifically designed to 
> prevent this.  Upon investigation, I found that the SAXHandler used by 
> the ElementScanner was holding onto the previous items after I was done 
> with them.  It adds them to the default root element that 
> FragmentHandler creates and nothing removes them after the listeners are 
> called.  This seems to be in direct conflict with this message I found 
> which states that ElementScanner doesn't build a document (this message 
> is fairly old though):
> 
> http://www.servlets.com/archive/servlet/ReadMsg?msgId=350607&listName=jdom-interest 
> 
> I worked around this by explicitly detaching the element in my listener 
> when I was done with it, but since this seems like a common pattern and 
> a subtle trap, I thought I'd ask whether I was missing some setting or 
> using ElementScanner improperly.  There's a namespace declared on the 
> data element, so I don't know if that has something to do with it.
> 
> Thanks,
> -Brian

