[jdom-interest] AttributeList performance

Martin Schulz schulz at videotron.ca
Sun Apr 23 09:53:12 PDT 2006


I would like to thank all of you who have replied for the interesting 
discussion.

To summarize the current state of Attribute handling:
1) Attributes are stored in a List and that List can be retrieved and 
manipulated directly.
    The choice of datastructure may have been biased by the sequential 
nature of
    the XML or the sequential parsing.  From the perspective of dynamic 
operations
    on the Attributes a Map representation would be more natural.
2) Performance characteristics are optimal for memory footprint and 
serialization,
    suboptimal for addition, retrieval and removal, in particular when 
the number of
    Attributes exceeds 5 (preallocated size).
3) While Attributes are not bound to appear in any particular order in 
XML, the
    current implementation preserves the order of insertion; this 
property is not officially
    endorsed, but constitutes a 'nice-to-have' characteristic.  This is 
my personal opinion,
    not endorsed during the discussion, but I would expect it to be 
expressed by non-expert
    users of JDOM or readers of the XML nontheless.

Much of the discussion centered around the performance of Maps versus 
arrays and Lists,
in particular when small numbers of Attributes are present.  The actual 
performance would
have to be quantified for a given use case, and there again was much 
discussion about what
constitutes the expected usage profile of JDOM.

Some very supportive comments have been voiced, both from the parser 
perspective, where
fast lookup is vital and typically ensured through HashMaps, and 
similarly from a user level
perspective where the 'vital' data is, again, made avialable in the form 
of HashMaps for faster
lookup.  The latter approach I have tried out myself and found it 
satisfactory from a performance
perspective, but it for sure added some seemingly unnecessary clutter to 
my code.

I have not seen any profiling of construction and in-memory manipulation 
of JDOM objects -
the usual expectation being that these operations are somehow O(1).  
Wrong, of course.

There was even one message expressing the opinion building large JDOM 
structures in memory to
be a 'bad' approach.  There is of course not of a choice there, not 
having a hybrid model, to build
only some of the structure using JDOM.  I would like to turn the 
argument around and propose to remedy the
situation.

The Future of JDOM
I would expect the Attribute operations to perform in O(1), whether 
small or large numbers of
Attributes are concerned.  In particular, read and write from or to XML 
and serialization of JDOM
objects should not suffer.

What is Required
At the end of the day, I would expect AttributeMap to take the place of 
AttributeList.  Arguing
the 'least damage',  I would by default use an AttributeMap which 
scales, versus an AttributeList,
which doesn't.

Thinking along the Map route, I would envision using either 
org.apache.commons.collections.map.LinkedMap
or java.util.LinkedHashMap to be able to preserve the order of the 
attributes.

That of course would break compatibility with Java 1.2 or make jdom 
dependant on additional external jars.
Are there any alternatives?

The second problem, as I see it is the Element API for getAttributes, 
which cannot be made
a live List in this new world.  I would prefer very much a method 
getAttributeMap over the existing getAttributes.

Which brings us to another interesting point, somehow touched upon but 
not quite explicitly
in some messages.

Why not start right away by adding an AttributeKey class, which would 
encapsulate the
name and Namespace and serve as the key for any Map.

That would be very useful for either internally or externally 
implemented Maps of Attributes
(for a given Element).

The crux of the matter.
How to Transition? I would propose to
1) add AttributeKey for the next revision of JDOM, and
2) remove the live List guarantee for the returned List from the 
getAttributes method.

That would at least allow alternate implementations, without being stuck 
with supporting the live list guarantee.

Thanks!
   Martin

Martin Schulz wrote:
>
> we have been running into a performance problem caused by the O(N) 
> performance characteristic
> of the JDOM AttributeList in the indexOf method.  This affects both 
> insertions and lookup of Attributes.
>
> Our application uses hundreds of Elements, each of which may have 
> 20-40 attributes.
> In essence, the application (ab)uses the Element attributes as 
> properties.
>
> The visible effect for us is a noticable slowdown, e.g. a by a second 
> or so.
>
> We now face the choice to
> a) speed up JDOM internally
> b) subclass AttributeList and implement it more efficiently
> c) build a HashMap from the AttributeList externally
> d) not use Element attributes for this purpose
>
> Being naive about XML, my first question is whether the order of
> attributes is either significant or within JDOM guaranteed to be stable
> as part of any API contract, or is it considered a 'nice-to-have'?
>
> My other question to the list is whether this is something
> worth fixing in general.  My use case may sound absurd and exotic,
> but with the widespread use of JDOM I wouldn't be surprised if this
> were a lingering problem for others.
>
> A quick search of the mailing list doesn't indicate that the internal 
> datastructures have
> been publicly scrutinized for performance; on the other hand, 
> everything appears
> just to work well enough.
>
> There might well be an implementation trade-off where the current 
> default (array of 5 Attributes)
> is the better choice compared to adding a datastructure with a bigger 
> footprint, so I'd
> be curious about what others might deem appropriate in this situation.
>
> Thanks for your advice.
>
>   Martin
> _______________________________________________
> To control your jdom-interest membership:
> http://www.jdom.org/mailman/options/jdom-interest/youraddr@yourhost.com
>
>




More information about the jdom-interest mailing list