[jdom-interest] JDOM and memory
Rolf Lear
jdom at tuis.net
Sat Jan 28 08:38:32 PST 2012
Hi All ... An update...
I have played with a number of options, and have not had significant
success with any.
Merging ContentList into Element has a number of problems:
1. Document and Element end up duplicating a lot of code
2. It changes the API of Document and Element, since both would then
implement List<Content>
Document and Element almost always contain content... it is seldom that
you have empty Elements (there is normally some text at least). As a
result, the savings of not having to have a content array are limited.
There can be some saving in not having a separate object as the list,
but it does not amount to much. Given the issues with the API this
approach does not make sense.
Michael Kay suggested keeping the ContentList independent of the
Element, and creating an instance when it was referenced in
getContent(). The problem with this is that the management of
ConcurrentModification becomes very complicated and, as far as I can
tell, essentially impossible if there are multiple different instances of
the ContentList class for any particular Element. Given that almost all
Element instances have content, it is not worth losing the
ConcurrentModification control when a typical use case would not
actually save any memory.
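
To illustrate the ConcurrentModification problem, here is a minimal sketch. It is not JDOM code: SketchElement, ContentView and Cursor are hypothetical names, and the design (each list instance keeping its own modification counter) is an assumption about how an on-demand ContentList might behave. A cursor created from one view cannot see a change made through a second view of the same Element:

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;

// Hypothetical stand-in for an Element that only holds the raw content array.
class SketchElement {
    Object[] items = new Object[4];
    int size;
}

// Hypothetical list view created on demand; each instance has its own counter.
class ContentView {
    private final SketchElement owner;
    private int modCount; // per-view, so other views never update it

    ContentView(SketchElement owner) {
        this.owner = owner;
    }

    void add(Object o) {
        if (owner.size == owner.items.length) {
            owner.items = Arrays.copyOf(owner.items, owner.size * 2);
        }
        owner.items[owner.size++] = o;
        modCount++;
    }

    Cursor cursor() {
        return new Cursor();
    }

    class Cursor {
        private final int expected = modCount; // snapshot of this view's counter
        private int pos;

        boolean hasNext() {
            return pos < owner.size;
        }

        Object next() {
            // Only detects changes made through the view that created this cursor.
            if (expected != modCount) {
                throw new ConcurrentModificationException();
            }
            return owner.items[pos++];
        }
    }
}
```

With two ContentView instances over one SketchElement, an add() through the second view goes unnoticed by a cursor from the first, while an add() through the first view is detected as expected.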
So, neither option for changing the ContentList system is very successful.
On the other hand, it is relatively common to have no Attributes on an
Element. With some careful changes to the Element class (adding a
hasAttributes() method and making the AttributeList variable a 'lazy'
initialised field), in ideal cases we never need to actually create an
AttributeList instance for the Element. This has a
significant impact on the 'hamlet' test, where there are essentially no
attributes. It has no 'negative' impact on memory in the worst case
either, and it has a positive (small but significant) impact on performance.
So, the lazy initialization of AttributeList is a 'win'.
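
As a rough sketch of the lazy-initialisation idea (not the actual JDOM change): LazyElement and isAttributeListAllocated() are hypothetical names, and a plain ArrayList of name/value pairs stands in for AttributeList. The point is only that hasAttributes() and the read path never allocate the list:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Element; not the JDOM class.
class LazyElement {
    // Stays null until the first attribute is set.
    private List<String[]> attributes;

    /** True only if at least one attribute exists. Never allocates the list. */
    public boolean hasAttributes() {
        return attributes != null && !attributes.isEmpty();
    }

    /** Creates the list on first use, then adds or overwrites the attribute. */
    public LazyElement setAttribute(String name, String value) {
        if (attributes == null) {
            attributes = new ArrayList<>(4);
        }
        for (String[] att : attributes) {
            if (att[0].equals(name)) {
                att[1] = value;
                return this;
            }
        }
        attributes.add(new String[] {name, value});
        return this;
    }

    /** Read access must also cope with the list never having been created. */
    public String getAttributeValue(String name) {
        if (attributes == null) {
            return null;
        }
        for (String[] att : attributes) {
            if (att[0].equals(name)) {
                return att[1];
            }
        }
        return null;
    }

    // Only here so the sketch can show whether the list was ever allocated.
    boolean isAttributeListAllocated() {
        return attributes != null;
    }
}
```

Callers (and sub-classes) that used to touch the list directly would check hasAttributes() first, so an Element without attributes never pays for the extra object.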
Finally, I have in the past had some success with the concept of
'reusing' String values. XML Parsers (like SAX, etc.) typically create a
new String instance for all the variables they pass. For example, the
Element names, prefixes, etc. are all new instances of String. Thus, if
you have hundreds of Elements called 'car' in your input XML, you will
get hundreds of different String Element names with the value 'car'. I
have built a class that does something similar to String.intern() in
order to rationalize the hundreds of different-but-equals() values that
are passed in by the parsers.
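
A minimal sketch of that kind of cache, assuming a simple HashMap-backed design. StringCache and reuse() are hypothetical names for illustration, and the class actually pushed to github may work quite differently:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; name and API are assumptions, not the JDOM class.
final class StringCache {
    private final Map<String, String> cache = new HashMap<>();

    /** Returns the first instance seen that equals() the given value. */
    public String reuse(String value) {
        if (value == null) {
            return null;
        }
        // putIfAbsent returns the previously stored instance, or null if new.
        String existing = cache.putIfAbsent(value, value);
        return existing != null ? existing : value;
    }

    /** Number of distinct String values held. */
    public int size() {
        return cache.size();
    }
}
```

Passing every Element name through reuse() means that hundreds of equal-but-different 'car' Strings collapse to one shared instance, and the duplicates become garbage as soon as the parser lets go of them. Unlike String.intern(), the cache lives only as long as the object that holds it.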
I have incorporated this 'caching' class into a new JDOMFactory called
'SlimJDOMFactory'. This factory 'normalizes' all String values to a
single instance of each unique String value. This significantly reduces
the amount of memory used in the JDOM tree, especially if there are lots
of similarly named attributes or elements, or lots of white-space padding
in otherwise empty elements or between elements. This process is
significantly slower though...
For example, with the 'hamlet' test case, the 'baseline' for JDOM is a
memory footprint of 2.27MB, parsed in 4.75ms.
With the SlimJDOMFactory it is: 1.77MB in 8.5ms
With Lazy AttributeList it is: 2.06MB in 4.55ms
With both it is: 1.57MB in 8.3ms
I am pushing both of these changes to github. The AttributeList is an
easy one to justify. It is fully compatible with prior code, and it has
positive memory and performance impacts.
The SlimJDOMFactory is also justifiable when you consider:
1. the user has to decide to use it specifically.
2. The memory saving can be very significant.
3. Even though the parse time is slower, the GC time savings can be
significant if the document 'hangs around' for a long time - the quicker
GC time can add up fast.
4. When you have lots of code doing comparisons it is much faster to do
equals() calls on Strings that are == as well. It saves a hashCode
calculation as well as a string character scan to prove equals().
Rolf
On 02/01/2012 3:27 PM, Rolf wrote:
> Hi all.
>
> Memory optimization has never been a top priority for JDOM. At the same
> time, for what it does, JDOM is not a 'terrible' memory user. Still, I
> have done some analysis, and, I believe I can trim about a quarter to a
> half of 'JDOM Overhead' memory usage by making two 'simple' changes....
>
> The first is to merge the ContentList class in to the Element class (and
> also in to Document). This will reduce the number of Java objects by
> about half, and that will save about 32 bytes per Element at a minimum
> in a 64-bit JRE. Additionally, by lazy-initialization of the Content
> array, we can save memory on otherwise 'empty' Elements.
>
> This can be done by extending the Element (and perhaps Document) class
> to extend 'List'. It can all be done in a 'backward compatible' way, but
> also leads to some interesting possibilities, like:
>
> for (Content c : element) {
> ... do something
> }
>
> (for backward compatibility, Element.getContent() will return 'this').
>
>
> The second change is to make the AttributeList instance in Element a
> lazy-initialization. This would save memory on all Elements that have no
> attributes, but would have an impact for people who sub-class the
> Element class and may expect the attributes field to be non-null.
>
>
> I am trying to get a feel for how important this sort of optimization
> may be. If there is interest then I will make some changes, and test the
> impact. I may make a separate branch in github to test it out....
>
> If the above changes are unrealistic then I don't think it makes sense
> to even try....
>
> Rolf