[jdom-interest] AttributeList performance
Martin Schulz
schulz at videotron.ca
Sun Apr 23 09:53:12 PDT 2006
I would like to thank all of you who have replied for the interesting
discussion.
To summarize the current state of Attribute handling:
1) Attributes are stored in a List and that List can be retrieved and
manipulated directly.
The choice of datastructure may have been biased by the sequential
nature of
the XML or the sequential parsing. From the perspective of dynamic
operations
on the Attributes a Map representation would be more natural.
2) Performance characteristics are optimal for memory footprint and
serialization,
suboptimal for addition, retrieval and removal, in particular when
the number of
Attributes exceeds 5 (preallocated size).
3) While Attributes are not bound to appear in any particular order in
XML, the
current implementation preserves the order of insertion; this
property is not officially
endorsed, but constitutes a 'nice-to-have' characteristic. This is
my personal opinion,
not endorsed during the discussion, but I would expect it to be
expressed by non-expert
users of JDOM or readers of the XML nontheless.
Much of the discussion centered around the performance of Maps versus
arrays and Lists,
in particular when small numbers of Attributes are present. The actual
performance would
have to be quantified for a given use case, and there again was much
discussion about what
constitutes the expected usage profile of JDOM.
Some very supportive comments have been voiced, both from the parser
perspective, where
fast lookup is vital and typically ensured through HashMaps, and
similarly from a user level
perspective where the 'vital' data is, again, made avialable in the form
of HashMaps for faster
lookup. The latter approach I have tried out myself and found it
satisfactory from a performance
perspective, but it for sure added some seemingly unnecessary clutter to
my code.
I have not seen any profiling of construction and in-memory manipulation
of JDOM objects -
the usual expectation being that these operations are somehow O(1).
Wrong, of course.
There was even one message expressing the opinion building large JDOM
structures in memory to
be a 'bad' approach. There is of course not of a choice there, not
having a hybrid model, to build
only some of the structure using JDOM. I would like to turn the
argument around and propose to remedy the
situation.
The Future of JDOM
I would expect the Attribute operations to perform in O(1), whether
small or large numbers of
Attributes are concerned. In particular, read and write from or to XML
and serialization of JDOM
objects should not suffer.
What is Required
At the end of the day, I would expect AttributeMap to take the place of
AttributeList. Arguing
the 'least damage', I would by default use an AttributeMap which
scales, versus an AttributeList,
which doesn't.
Thinking along the Map route, I would envision using either
org.apache.commons.collections.map.LinkedMap
or java.util.LinkedHashMap to be able to preserve the order of the
attributes.
That of course would break compatibility with Java 1.2 or make jdom
dependant on additional external jars.
Are there any alternatives?
The second problem, as I see it is the Element API for getAttributes,
which cannot be made
a live List in this new world. I would prefer very much a method
getAttributeMap over the existing getAttributes.
Which brings us to another interesting point, somehow touched upon but
not quite explicitly
in some messages.
Why not start right away by adding an AttributeKey class, which would
encapsulate the
name and Namespace and serve as the key for any Map.
That would be very useful for either internally or externally
implemented Maps of Attributes
(for a given Element).
The crux of the matter.
How to Transition? I would propose to
1) add AttributeKey for the next revision of JDOM, and
2) remove the live List guarantee for the returned List from the
getAttributes method.
That would at least allow alternate implementations, without being stuck
with supporting the live list guarantee.
Thanks!
Martin
Martin Schulz wrote:
>
> we have been running into a performance problem caused by the O(N)
> performance characteristic
> of the JDOM AttributeList in the indexOf method. This affects both
> insertions and lookup of Attributes.
>
> Our application uses hundreds of Elements, each of which may have
> 20-40 attributes.
> In essence, the application (ab)uses the Element attributes as
> properties.
>
> The visible effect for us is a noticable slowdown, e.g. a by a second
> or so.
>
> We now face the choice to
> a) speed up JDOM internally
> b) subclass AttributeList and implement it more efficiently
> c) build a HashMap from the AttributeList externally
> d) not use Element attributes for this purpose
>
> Being naive about XML, my first question is whether the order of
> attributes is either significant or within JDOM guaranteed to be stable
> as part of any API contract, or is it considered a 'nice-to-have'?
>
> My other question to the list is whether this is something
> worth fixing in general. My use case may sound absurd and exotic,
> but with the widespread use of JDOM I wouldn't be surprised if this
> were a lingering problem for others.
>
> A quick search of the mailing list doesn't indicate that the internal
> datastructures have
> been publicly scrutinized for performance; on the other hand,
> everything appears
> just to work well enough.
>
> There might well be an implementation trade-off where the current
> default (array of 5 Attributes)
> is the better choice compared to adding a datastructure with a bigger
> footprint, so I'd
> be curious about what others might deem appropriate in this situation.
>
> Thanks for your advice.
>
> Martin
> _______________________________________________
> To control your jdom-interest membership:
> http://www.jdom.org/mailman/options/jdom-interest/youraddr@yourhost.com
>
>
More information about the jdom-interest
mailing list