[jdom-interest] JDOM and memory

Rolf Lear jdom at tuis.net
Sat Jan 28 08:38:32 PST 2012


Hi All ... An update...

I have played with a number of options, and have not had significant 
success with any.

Merging ContentList into Element has a number of problems:
1. Document and Element end up duplicating a lot of code.
2. It changes the API of Document and Element, because both would then
implement List<Content>.

Document and Element almost always contain content... it is seldom that
you have empty Elements (there is normally some text at least). As a
result, the savings from not allocating a content array are limited.

There can be some saving from not having a separate object as the list,
but it does not amount to much. Given the issues with the API, this
approach does not make sense.
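To make the API concern concrete, here is a minimal sketch of an element that is its own content list, with the child array allocated lazily as the earlier proposal suggested. The names ListElement, Content and TextNode are hypothetical stand-ins, not JDOM classes, and the modCount-based ConcurrentModification checking is inherited from java.util.AbstractList.

```java
import java.util.AbstractList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for JDOM's Content type; not the real JDOM class.
interface Content {}

// Hypothetical text node used only for this sketch.
record TextNode(String value) implements Content {}

// Sketch: an element that IS its own content list. The backing array
// is allocated only when the first child is added.
final class ListElement extends AbstractList<Content> {
    private final String name;
    private Content[] content;  // stays null for childless elements
    private int size;

    ListElement(String name) { this.name = name; }

    String getName() { return name; }

    // Backward-compatible accessor: the element returns itself.
    List<Content> getContent() { return this; }

    @Override public Content get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("Index: " + index + ", Size: " + size);
        }
        return content[index];
    }

    @Override public int size() { return size; }

    @Override public void add(int index, Content c) {
        if (index < 0 || index > size) {
            throw new IndexOutOfBoundsException("Index: " + index + ", Size: " + size);
        }
        if (content == null) {
            content = new Content[4];                       // lazy allocation
        } else if (size == content.length) {
            content = Arrays.copyOf(content, size * 2);     // grow when full
        }
        System.arraycopy(content, index, content, index + 1, size - index);
        content[index] = c;
        size++;
        modCount++;  // lets AbstractList's iterators detect concurrent modification
    }

    @Override public Content remove(int index) {
        Content old = get(index);
        System.arraycopy(content, index + 1, content, index, size - index - 1);
        content[--size] = null;  // clear the stale slot for the GC
        modCount++;
        return old;
    }
}
```

Extending AbstractList also gives the element list-style equals() and hashCode(), so two different elements with equal children compare as equal. That is one concrete way in which implementing List<Content> leaks into the Element API.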

Michael Kay suggested keeping the ContentList independent of the
Element, and creating an instance only when it is requested through
getContent(). The problem with this is that managing
ConcurrentModification becomes very complicated and, as far as I can
tell, essentially impossible if there are multiple different instances of
the ContentList class for any particular Element. Given that almost all
Element instances have content, it is not worth losing the
ConcurrentModification control when, in a typical use case, no memory
is actually saved.

So, neither option for changing the ContentList system is very successful.

On the other hand, it is relatively common to have no Attributes on an
Element. With some careful changes to the Element class (adding a
hasAttributes() method and making the AttributeList field lazily
initialized), in ideal cases we never need to actually create an
AttributeList instance for the Element. This has a significant impact
on the 'hamlet' test, where there are essentially no attributes. It has
no 'negative' impact on memory in the worst case either, and it has a
positive (small but significant) impact on performance.

So, the lazy initialization of AttributeList is a 'win'.
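A minimal sketch of the lazy AttributeList idea, assuming a simplified element class: the hypothetical LazyAttrElement keeps its attribute list null until the first setAttribute() call, and hasAttributes() and getAttributeValue() answer without allocating anything. Unlike JDOM's live attribute list, getAttributes() here returns a read-only view to keep the example short. The null field is also why subclasses that assumed a non-null attributes field could be affected.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical attribute type for this sketch (not JDOM's Attribute class).
record Attr(String name, String value) {}

// Sketch of an element whose attribute list is created only when needed.
final class LazyAttrElement {
    private final String name;
    private List<Attr> attributes;  // stays null for attribute-free elements

    LazyAttrElement(String name) { this.name = name; }

    String getName() { return name; }

    // Cheap check that never allocates the list.
    boolean hasAttributes() {
        return attributes != null && !attributes.isEmpty();
    }

    // Read-only lookup: returns null without allocating anything.
    String getAttributeValue(String attrName) {
        if (attributes == null) {
            return null;
        }
        for (Attr a : attributes) {
            if (a.name().equals(attrName)) {
                return a.value();
            }
        }
        return null;
    }

    // Mutation is the only path that allocates the list.
    void setAttribute(String attrName, String value) {
        if (attributes == null) {
            attributes = new ArrayList<>(2);
        }
        for (int i = 0; i < attributes.size(); i++) {
            if (attributes.get(i).name().equals(attrName)) {
                attributes.set(i, new Attr(attrName, value));  // replace existing
                return;
            }
        }
        attributes.add(new Attr(attrName, value));
    }

    // Read-only view; a shared empty list when no attributes exist.
    List<Attr> getAttributes() {
        return attributes == null ? Collections.<Attr>emptyList()
                                  : Collections.unmodifiableList(attributes);
    }

    // Exposed only so the sketch can show whether the list was allocated.
    boolean attributeListAllocated() { return attributes != null; }
}
```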

Finally, I have in the past had some success with the concept of 
'reusing' String values. XML parsers (SAX parsers and the like) typically
create a new String instance for every value they pass: Element names,
prefixes, etc. are all new instances of String. Thus, if
you have hundreds of Elements called 'car' in your input XML, you will 
get hundreds of different String Element names with the value 'car'. I 
have built a class that does something similar to String.intern() in 
order to rationalize the hundreds of different-but-equals() values that 
are passed in by the parsers.
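A sketch of the kind of String-reuse cache described above, assuming a simple HashMap-backed design. The class name StringCache and the method reuse() are hypothetical, and the actual class in JDOM may be implemented differently. The first instance seen for each value becomes the canonical one, and later equal values are replaced by it.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a per-parse String cache, similar in spirit to String.intern()
// but local to one cache instance (hypothetical; not JDOM's actual class).
final class StringCache {
    private final Map<String, String> canonical = new HashMap<>();

    // Returns the first-seen instance equal to value, so that all equal
    // strings passed through this cache end up sharing one object.
    String reuse(String value) {
        if (value == null) {
            return null;
        }
        String existing = canonical.putIfAbsent(value, value);
        return existing == null ? value : existing;
    }

    // Number of distinct values seen so far.
    int size() { return canonical.size(); }
}
```

Keeping the cache local to one parse, instead of calling String.intern(), means the table can be discarded along with the builder rather than adding every value to the JVM-wide intern pool.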

I have incorporated this 'caching' class into a new JDOMFactory called
'SlimJDOMFactory'. This factory 'normalizes' all String values to a
single instance of each unique String value. This significantly reduces
the amount of memory used in the JDOM tree, especially if there are lots
of similarly named attributes or elements, or white-space padding in
otherwise empty elements or between elements. This process is
significantly slower, though...

For example, with the 'hamlet' test case (memory footprint / parse time):

  Baseline JDOM:                         2.27MB in 4.75ms
  With the SlimJDOMFactory:              1.77MB in 8.5ms
  With Lazy AttributeList:               2.06MB in 4.55ms
  With both:                             1.57MB in 8.3ms

I am pushing both of these changes into GitHub. The lazy AttributeList is
an easy one to justify: it is fully compatible with prior code, and it has
positive memory and performance impacts.

The SlimJDOMFactory is also justifiable when you consider that:
1. The user has to decide to use it explicitly.
2. The memory savings can be very significant.
3. Even though parsing is slower, the GC time savings can be
significant if the document 'hangs around' for a long time; the shorter
GC times can add up fast.
4. When you have lots of code doing comparisons, equals() is much faster
on Strings that are also ==. String.equals() returns immediately for an
identical reference, skipping the character-by-character scan, and
because each String caches its own hash code, sharing instances also
means the hashCode is calculated once rather than once per copy.

Rolf

On 02/01/2012 3:27 PM, Rolf wrote:
> Hi all.
>
> Memory optimization has never been a top priority for JDOM. At the same
> time, for what it does, JDOM is not a 'terrible' memory user. Still, I
> have done some analysis, and, I believe I can trim about a quarter to a
> half of 'JDOM Overhead' memory usage by making two 'simple' changes....
>
> The first is to merge the ContentList class into the Element class (and
> also into Document). This will reduce the number of Java objects by
> about half, and that will save about 32 bytes per Element at a minimum
> in a 64-bit JRE. Additionally, by lazily initializing the Content
> array, we can save memory on otherwise 'empty' Elements.
>
> This can be done by making the Element (and perhaps Document) class
> implement 'List'. It can all be done in a 'backward compatible' way, but
> also leads to some interesting possibilities, like:
>
> for (Content c : element) {
>     // ... do something
> }
>
> (for backward compatibility, Element.getContent() will return 'this').
>
>
> The second change is to make the AttributeList instance in Element
> lazily initialized. This would save memory on all Elements that have no
> attributes, but would have an impact on people who subclass the
> Element class and may expect the attributes field to be non-null.
>
>
> I am trying to get a feel for how important this sort of optimization
> may be. If there is interest then I will make some changes, and test the
> impact. I may make a separate branch in GitHub to test it out....
>
> If the above changes are unrealistic then I don't think it makes sense
> to even try....
>
> Rolf