[jdom-interest] JDOM parser reuse memory problem

Thu Nov 17 08:01:59 PST 2011

Randall, that depends on a few things....

1. whether I have read the source code correctly (I am looking at it 
now, and I believe I have)
2. whether you are reusing the SAX Parser instance - setReuseParser(true)
3. whether I understand your question right....

Firstly, there are two ways to read your question: does the SAXBuilder 
somehow keep references to the parsed Document; or does the parsed 
Document somehow refer back to the SAXBuilder?

Answering the first mechanism first....

In a normal 'build', the code creates a SAX Parser/XMLReader instance, 
and a SAX ContentHandler to handle the SAX events.
The ContentHandler created is an instance of SAXHandler, and that 
contains references to the Document that is parsed.
When the build is completed, the Document instance is retrieved from the 
SAXHandler, and returned to the caller (you). The SAXHandler and the 
XMLReader are then de-referenced and can be garbage-collected.
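
To make that lifecycle concrete, here is a rough sketch using only the 
JDK's SAX classes. It is not the JDOM source: CollectingHandler and 
NormalBuildSketch are hypothetical names, and a plain String stands in 
for the Document. The point is only that the reader and the handler are 
locals, so nothing but the result is reachable once build() returns.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for JDOM's SAXHandler: it accumulates the parse result.
class CollectingHandler extends DefaultHandler {
    private final StringBuilder names = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Record element names in document order, comma-separated.
        if (names.length() > 0) {
            names.append(',');
        }
        names.append(qName);
    }

    String result() {
        return names.toString();
    }
}

// Hypothetical model of the non-reuse build path (an assumption, not JDOM code).
class NormalBuildSketch {
    static String build(String xml) {
        try {
            // A fresh reader and a fresh handler for every build.
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            CollectingHandler handler = new CollectingHandler();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            // Only the result escapes; reader and handler go out of scope here.
            return handler.result();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("<root><child/></root>"));
    }
}
```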

In this normal case the answer would be 'there is no reference from the 
SAXBuilder to the Document'.

If, however, you configure the SAXBuilder to reuse the SAX 
Parser/XMLReader, then you run into the bug you first alerted us 
to... At the end of the build process the SAXBuilder does not 
de-reference the XMLReader, but keeps it for the next (potential) build. 
Unfortunately, that XMLReader contains a reference to the ContentHandler 
it last used (the SAXHandler), and the SAXHandler has a reference to the 
last Document it handled. In other words, if you re-use the XMLReader, 
then you also keep a chain of references that leads to the Document you 
last parsed.
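
The chain can be seen with the JDK's SAX classes alone. The sketch below 
is an illustration under assumptions, not JDOM code: 
ReusingBuilderSketch, DocHandler, lastRetainedDocument() and 
releaseLastDocument() are all made-up names, and a StringBuilder stands 
in for the Document. It relies only on XMLReader.getContentHandler() 
returning the handler that was last set. Swapping in an empty handler is 
one possible way to cut the chain; I am not claiming that is how JDOM 
will fix it.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical builder that keeps its XMLReader between builds.
class ReusingBuilderSketch {
    // Stand-in for SAXHandler: holds the (pretend) document it produced.
    static class DocHandler extends DefaultHandler {
        final StringBuilder document = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            document.append(ch, start, length);
        }
    }

    private XMLReader reader; // retained across builds, as with parser reuse

    String build(String xml) {
        try {
            if (reader == null) {
                reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            }
            DocHandler handler = new DocHandler();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.document.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // What the retained reader still points at after build() has returned.
    String lastRetainedDocument() {
        if (reader == null) {
            return null;
        }
        ContentHandler h = reader.getContentHandler();
        return (h instanceof DocHandler) ? ((DocHandler) h).document.toString() : null;
    }

    // One possible way to break the chain: replace the handler with an empty one.
    void releaseLastDocument() {
        if (reader != null) {
            reader.setContentHandler(new DefaultHandler());
        }
    }

    public static void main(String[] args) {
        ReusingBuilderSketch builder = new ReusingBuilderSketch();
        builder.build("<doc>hello</doc>");
        System.out.println("still reachable: " + builder.lastRetainedDocument());
        builder.releaseLastDocument();
        System.out.println("after release: " + builder.lastRetainedDocument());
    }
}
```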

The second mechanism ... does a parsed Document refer back to its 
SAXBuilder?

That is easy to answer: no, it does not. There is no reference from the 
Document back to the SAXBuilder, and Elements only reference back as far 
as the parent Document.

In a more generalized answer, the only issue I can see with having a 
pool of SAXBuilders is that, if you reuse parsers, you will 'carry' the 
most recently parsed document from each SAXBuilder until that builder is 
used again.

Again though, I have to ask, is there something you have seen which 
indicates there may be a back-reference to the SAXBuilder?

Rolf

If you are *not* reusing the parser then both the parser and the 
SAXHandler are released after the build; if you are, then the 
SAXBuilder 'remembers' the XMLReader instance.

On 17/11/2011 10:25 AM, Randall Theobald wrote:
> I have a quick question related to pooling SAXBuilders. Can I release the
> SAXBuilder back to the pool immediately after the .build method is called?
> In other words, there's no tie back to the builder from the resulting
> Document or Element objects, right?
>
>     Randall Theobald
>     Performance: WebSphere Business Process Management & Connectivity
>     IBM Software Group, Austin, TX
>     randallt at us.ibm.com
>     512-286-8870 (t/l: 363-8870)
>
> From:	Rolf Lear <jdom at tuis.net>
> To:	Michael Kay <mike at saxonica.com>
> Cc:	jdom-interest at jdom.org
> Date:	11/11/2011 05:31 AM
> Subject:	Re: [jdom-interest] JDOM parser reuse memory problem
> Sent by:	jdom-interest-bounces at jdom.org
>
>
>
> On 11/11/2011 3:33 AM, Michael Kay wrote:
>> On 10/11/2011 18:51, Rolf Lear wrote:
>>> Hi Randall, Michael.
>>>
>>> It's an interesting observation... and I can see the implications. I
>>> would like to take a closer look at it, but that may take a little
>>> while.
>>>
>>> I filed https://github.com/hunterhacker/jdom/issues/52
>>>
>>> 'Off the cuff' I can think of one work-around and a few solutions (in
>>> addition to what Michael has suggested)
>>>
>>> 1. immediately after parsing your real document you then parse a
>>> dummy/small/inmemory document (even invalid - and catch the exception).
>>> 2. Currently when you do not reuse the parser, it goes back to 'first
>>> principles' and queries JAXP, etc. to find a parser instance. Instead it
>>> could 'cache' the parser 'source' after the first time, and then just
>>> create a new instance, instead of doing all the class-based lookups...
>> Ouch. Creating a new parser to parse a small document is a cost that
>> it's nice to avoid, but it isn't going to kill you. Going through the
>> JAXP factory process to get a new ParserFactory is a monstrous cost
>> that can dominate all other processing - and reusing the factory costs
>> nothing.
>>
>> Michael Kay
>> Saxonica
>>
> Not sure what you are saying... are you agreeing that the 'ouch' problem
> is the one it has at the moment, or the suggestion to skip the JAXP
> processing on subsequent non-reuse-parser parses?
>
> I have not yet had a close look at the problem... the potential option
> of not going back to first-principles on subsequent parses may not be
> (easily) possible.... Unless Randall can convince me otherwise, I'm
> going to finish working on some StAX outputter code I am embroiled in,
> and then look at it.
>
> Rolf