[jdom-interest] JDOM and memory
Rolf Lear
jdom at tuis.net
Sun Jan 29 03:44:48 PST 2012
Hi all.
Just to be clear, the 'SlimJDOMFactory' is not a default setting.
By default people will:
SAXBuilder builder = new SAXBuilder();
If you want to have a smaller memory footprint (but also a slower parse)
you can:
SAXBuilder builder = new SAXBuilder(new SlimJDOMFactory());
So, these changes are not affecting anything by default.
What I am hearing is that there is value in an 'InterningJDOMFactory'
which will do a String.intern() on element and attribute names? That should
be easy to arrange... but doing more than just the Element and Attribute
names is likely to cause issues in PermGen (the SlimJDOMFactory can do
'everything', including the XML Text and CDATA sections).
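
As a rough illustration of the two approaches being compared here, the following JDK-only sketch puts a map-backed pool next to String.intern(). The names StringPoolSketch and StringPool are hypothetical and are not the class used inside SlimJDOMFactory; the sketch only shows the idea of handing back one canonical instance per distinct value.

```java
import java.util.HashMap;
import java.util.Map;

class StringPoolSketch {

    /** Hypothetical pool: keeps one canonical String instance per distinct value. */
    static final class StringPool {
        private final Map<String, String> pool = new HashMap<>();

        /** Returns the first instance seen for this value, or the argument if it is new. */
        String reuse(String value) {
            if (value == null) {
                return null;
            }
            String existing = pool.putIfAbsent(value, value);
            return existing != null ? existing : value;
        }

        int size() {
            return pool.size();
        }
    }

    public static void main(String[] args) {
        StringPool pool = new StringPool();

        // new String(...) forces distinct instances, much as a parser would hand over.
        String a = pool.reuse(new String("car"));
        String b = pool.reuse(new String("car"));
        System.out.println("pooled names share an instance: " + (a == b));

        // String.intern() gives the same guarantee, but in a JVM-wide table.
        String c = new String("car").intern();
        String d = new String("car").intern();
        System.out.println("interned names share an instance: " + (c == d));

        System.out.println("distinct values held by the pool: " + pool.size());
    }
}
```

The practical difference is lifetime: the map-backed pool is an ordinary object that becomes garbage along with whatever holds it, while interned strings sit in a table shared by the whole JVM, which is where the PermGen concern above comes from.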
Regardless, I sense some anxiety about the SlimJDOMFactory, but, it is
something the user needs to opt-in for, so it is very 'safe'.
Rolf
On Sun, 29 Jan 2012 11:58:18 +0100, Paul Libbrecht <paul at hoplahup.net>
wrote:
> Rolf,
>
> I do know there are applications (such as those Michael reported about,
> which generate random prefixes) for which any form of pooling is dangerous;
> and you show that there are situations where interning performs worse than
> other pooling methods (I think hashCode might be seen as guilty but that
> can't be changed).
>
> Nonetheless, I believe the design that we had where the element names were
> interned is common: in the server application that was there, the
> ActiveMath learning environment, the element names are everywhere in the
> Java code as well, e.g. for comparison within if statements. So for this
> interning is actually better than pooling overall.
>
> I'm convinced many JDOM users have this approach; using JDOM is cute for
> Java programming, not for XSLT friends that only see the world as pipelines
> translatable into a set of unix xsltproc calls.
>
> I would suggest the following:
> - make this configurable
> - make this subclassable and exploitable
>
> That is to let e.g. SAXBuilder have a method:
>
> public String makePooledName(String)
>
> which would then call the right interning method (String.intern for those
> who want, SlimJDOMFactory's by default?, nothing for those who fear
> retention).
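
One way to picture the hook proposed above is a small strategy object behind a makePooledName(String) method. This is only a sketch under stated assumptions: NameSource, interning(), mapPooling() and none() are hypothetical names, and nothing here is an existing SAXBuilder or JDOMFactory API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

class PooledNameSketch {

    /** Hypothetical builder-like class exposing the proposed hook. */
    static class NameSource {
        private final UnaryOperator<String> strategy;

        NameSource(UnaryOperator<String> strategy) {
            this.strategy = strategy;
        }

        /** The configurable (and overridable) hook: decides how names are shared. */
        public String makePooledName(String name) {
            return strategy.apply(name);
        }
    }

    /** JVM-wide interning, for code that compares names with ==. */
    static UnaryOperator<String> interning() {
        return String::intern;
    }

    /** Map-backed pooling whose lifetime is tied to the returned strategy. */
    static UnaryOperator<String> mapPooling() {
        Map<String, String> pool = new HashMap<>();
        return name -> {
            String existing = pool.putIfAbsent(name, name);
            return existing != null ? existing : name;
        };
    }

    /** No pooling at all, for those who fear retention. */
    static UnaryOperator<String> none() {
        return UnaryOperator.identity();
    }

    public static void main(String[] args) {
        NameSource interned = new NameSource(interning());
        NameSource pooled = new NameSource(mapPooling());
        NameSource plain = new NameSource(none());

        // An interned name is the same instance as the equal string literal.
        System.out.println(interned.makePooledName(new String("title")) == "title");

        // Pooled names are shared with each other.
        String p1 = pooled.makePooledName(new String("title"));
        String p2 = pooled.makePooledName(new String("title"));
        System.out.println(p1 == p2);

        // Without pooling every instance is kept as it arrived.
        String n1 = plain.makePooledName(new String("title"));
        String n2 = plain.makePooledName(new String("title"));
        System.out.println(n1 == n2);
    }
}
```
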
>
> That'd be in SAXBuilder or JDOMFactory? I'm afraid there's no global JDOM
> config object, that'd be the place, e.g. also to be called from new
> Element("name").
>
> paul
>
>
> Le 29 janv. 2012 à 02:41, Rolf Lear a écrit :
>
>> I have now compared the results of string-interning to the String-cache
>> code.
>>
>> The 'raw' code (neither SlimJDOMFactory nor string-interning) is:
>> 2.06MB @ 4.55ms
>> The SlimJDOMFactory is:
>> 1.57MB @ 8ms
>> The string-interning SAX Feature is:
>> 2.06MB @ 6.1ms
>>
>> Not sure how I got essentially zero improvement of memory.... got
>> something wrong..... no... been checking, but I think the difference in
>> using String.intern on element names only is so insignificant that it
>> does not feature as much as 1%..... perhaps all the difference is coming
>> in whitespace....
>>
>> Not worth checking in to it.... I don't believe the String.intern() is
>> the right answer regardless.
>>
>> Rolf
>>
>>
>> On 28/01/2012 1:37 PM, Michael Kay wrote:
>>>
>>>>
>>>>
>>>> Finally, I have in the past had some success with the concept of
>>>> 'reusing' String values. XML Parsers (like SAX, etc.) typically create
>>>> a new String instance for all the variables they pass. For example,
>>>> the Element names, prefixes, etc. are all new instances of String.
>>>> Thus, if you have hundreds of Elements called 'car' in your input XML,
>>>> you will get hundreds of different String Element names with the value
>>>> 'car'. I have built a class that does something similar to
>>>> String.intern() in order to rationalize the hundreds of
>>>> different-but-equals() values that are passed in by the parsers.
>>> Have you measured how your optimization compares with the effect of
>>> setting the http://xml.org/sax/features/string-interning property on the
>>> SAX parser?
>>>
>>> Are you doing the interning in a way that guarantees strings can be
>>> compared using "==", and if so, are you taking advantage of this when
>>> doing the comparisons? The big win comes with XPath searches such as
>>> //x. Does the interning introduce any synchronization? (This is the big
>>> disadvantage with Saxon's NamePool - it speeds up XPath searching
>>> substantially, but the contention in a highly concurrent workload can
>>> become quite significant.)
>>>
>>> Are you pooling the QName as a whole, or the local name, prefix and URI
>>> separately?
>>>
>>> Michael Kay
>>> Saxonica
>>>>
>>>> I have incorporated this 'caching' class in to a new JDOMFactory
>>>> called 'SlimJDOMFactory'. This factory 'normalizes' all String values
>>>> to a single instance of each unique String value. This significantly
>>>> reduces the amount of memory used in the JDOM tree especially if there
>>>> are lots of: similarly named attributes, elements, white-space-padding
>>>> in otherwise empty elements, or between elements. This process is
>>>> significantly slower though...
>>>>
>>>> For example, with the 'hamlet' test case, the 'baseline' memory
>>>> footprint for hamlet in JDOM is 2.27MB in 4.75ms.
>>>> With the SlimJDOMFactory it is: 1.77MB in 8.5ms
>>>> With Lazy AttributeList it is: 2.06MB in 4.55ms
>>>> With both it is: 1.57MB in 8.3ms
>>>>
>>>> I am pushing both of these changes in to github. The AttributeList is
>>>> an easy one to justify. It is fully compatible with prior code, it has
>>>> positive memory and performance impacts.
>>>>
>>>> The SlimJDOMFactory is also justifiable when you consider:
>>>> 1. the user has to decide to use it specifically.
>>>> 2. The memory saving can be very significant.
>>>> 3. Even though the parse time is slower, the GC time savings can be
>>>> significant if the document 'hangs around' for a long time - the
>>>> quicker GC time can add up fast.
>>>> 4. When you have lots of code doing comparisons it is much faster to
>>>> do equals() calls on Strings that are == as well. It saves a hashCode
>>>> calculation as well as a string character scan to prove equals().
>>>>
>>>> Rolf
>>>>
>>>> On 02/01/2012 3:27 PM, Rolf wrote:
>>>>> Hi all.
>>>>>
>>>>> Memory optimization has never been a top priority for JDOM. At the same
>>>>> time, for what it does, JDOM is not a 'terrible' memory user. Still, I
>>>>> have done some analysis, and, I believe I can trim about a quarter to a
>>>>> half of 'JDOM Overhead' memory usage by making two 'simple' changes....
>>>>>
>>>>> The first is to merge the ContentList class in to the Element class (and
>>>>> also in to Document). This will reduce the number of Java objects by
>>>>> about half, and that will save about 32 bytes per Element at a minimum
>>>>> in a 64-bit JRE. Additionally, by lazy-initialization of the Content
>>>>> array, we can save memory on otherwise 'empty' Elements.
>>>>>
>>>>> This can be done by making the Element (and perhaps Document) class
>>>>> implement 'List'. It can all be done in a 'backward compatible' way, but
>>>>> also leads to some interesting possibilities, like:
>>>>>
>>>>> for (Content c : element) {
>>>>> ... do something
>>>>> }
>>>>>
>>>>> (for backward compatibility, Element.getContent() will return 'this').
>>>>>
>>>>>
>>>>> The second change is to make the AttributeList instance in Element a
>>>>> lazy-initialization. This would save memory on all Elements that have no
>>>>> attributes, but would have an impact for people who sub-class the
>>>>> Element class and may expect the attributes field to be non-null.
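
The lazy-initialization idea described above can be sketched with plain JDK classes. Node is a hypothetical stand-in for an element and is not JDOM's Element; the list field is only allocated on the first write, and readers get an empty view instead of null.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class LazyAttributesSketch {

    /** Hypothetical element-like class with a lazily created attribute list. */
    static class Node {
        // Stays null until the first attribute is added, so attribute-free
        // nodes never pay for a list object.
        private List<String> attributes;

        void addAttribute(String attribute) {
            if (attributes == null) {
                attributes = new ArrayList<>(4);
            }
            attributes.add(attribute);
        }

        /** Never returns null, even when nothing has been allocated. */
        List<String> getAttributes() {
            return attributes == null
                    ? Collections.emptyList()
                    : Collections.unmodifiableList(attributes);
        }

        /** Exposed only so the sketch can show when allocation happens. */
        boolean isAllocated() {
            return attributes != null;
        }
    }

    public static void main(String[] args) {
        Node empty = new Node();
        System.out.println(empty.getAttributes().isEmpty()); // no list was created
        System.out.println(empty.isAllocated());

        Node withAttribute = new Node();
        withAttribute.addAttribute("id");
        System.out.println(withAttribute.getAttributes());
        System.out.println(withAttribute.isAllocated());
    }
}
```

The compatibility risk mentioned above shows up in the field itself: a subclass that reads the field directly, rather than going through an accessor, would now have to cope with null.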
>>>>>
>>>>>
>>>>> I am trying to get a feel for how important this sort of optimization
>>>>> may be. If there is interest then I will make some changes, and test the
>>>>> impact. I may make a separate branch in github to test it out....
>>>>>
>>>>> If the above changes are unrealistic then I don't think it makes sense
>>>>> to even try....
>>>>>
>>>>> Rolf
>>>>> _______________________________________________
>>>>> To control your jdom-interest membership:
>>>>> http://www.jdom.org/mailman/options/jdom-interest/youraddr@yourhost.com
>>>>>