[jdom-interest] Announce: JDOMPlus a flexible XML framework for Java

James Strachan james at metastuff.com
Thu Dec 7 02:51:19 PST 2000


----- Original Message -----
From: "Jason Hunter" <jhunter at collab.net>


> > > With the interface design (the one that doesn't depend on
> > > factories for every object creation) the only advantage
> > > interfaces offer that I can't replicate with base classes is...
> >
> > No! You missed the most vital step. Interfaces come with no
> > instance data baggage
>
> You took the bait!

OK, consider me hooked

> > e.g. today right now the sizes of the Element and Attribute are:
> >
> > Element has 7 instance variables right now
> > Attribute has 3 instance variables right now, maybe 4 soon
> > (parent to handle the sharing / mutation issue).
>
> It's possible to implement Element and Attribute each with just 1
> required instance variable.

Not in JDOM today. Though it is in the patch I've created at
http://www.jdomplus.org
The only option available in JDOM right now is through inheritance. If I
inherit from Element I get 7 instance variables whether I want them or not.
I cannot get rid of them, they are always there.
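
To make the inheritance point concrete, here is a minimal sketch. It assumes a
hypothetical concrete base class with seven instance fields; the class and field
names are invented for illustration and are not JDOM's actual ones. Reflection
is used only to count what a subclass ends up carrying.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical stand-in for a concrete base class with seven instance fields.
// The field names are invented for illustration; they are not JDOM's real fields.
class ConcreteElement {
    protected String name;
    protected Object namespace;
    protected Object parent;
    protected Object document;
    protected Object attributes;
    protected Object content;
    protected Object additionalNamespaces;
}

// A subclass that only needs a text value still carries every inherited field.
class TextElement extends ConcreteElement {
    private String text;

    TextElement(String name, String text) {
        this.name = name;
        this.text = text;
    }

    String getText() {
        return text;
    }
}

class FieldCounter {
    /** Counts non-static, non-synthetic fields declared by a class and all its superclasses. */
    static int instanceFieldCount(Class<?> type) {
        int count = 0;
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (!Modifier.isStatic(f.getModifiers()) && !f.isSynthetic()) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(instanceFieldCount(ConcreteElement.class)); // prints 7
        System.out.println(instanceFieldCount(TextElement.class));     // prints 8
    }
}
```

The subclass cannot shed the seven inherited slots; it can only add to them.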

> Basically use association for other data as
> needed so memory is allocated to hold an attribute list only if one
> exists, for a namespace map only if one exists, and so on.

Ah you're saying it doesn't really matter that much if the 7 instance
variables don't ever point to something?
I'm afraid I don't agree here at all. It's a waste of memory and computation
to force derived classes to inherit unnecessary instance variables. There are
often many, many Element and Attribute instances in RAM at the same time
whenever XML is parsed in JDOM so this is a big deal.

> I too don't
> like 7 instance variables, but I expect we can cut it down.

How?

Let's say that for a given DTD I know that an Element has no child elements
and no attributes but just a text value. (This is actually surprisingly common
in XML - that an element only contains text).

It would be nice if it were technically possible to use a nice small element
implementation which just used one internal instance variable, a String.
JDOM currently disallows this as the lowest common denominator is the
default Element implementation which comes hardwired with 7 instance
variables. Whether they refer to objects or not they are a waste of RAM.
This seems like a good argument to me for interfaces or abstract base
classes.
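
A rough sketch of the kind of small element described above, assuming an
abstract base class that declares no instance data. All type names here
(AbstractElement, TextOnlyElement, TitleElement) are hypothetical and are not
taken from JDOM or from the JDOMPlus patch.

```java
// Hypothetical sketch: these types are illustrative, not the JDOM or JDOMPlus API.

/** The contract for an element; it declares no instance variables at all. */
abstract class AbstractElement {
    abstract String getName();

    abstract String getText();

    // Defaults for a leaf element; richer implementations would override these.
    int getAttributeCount() {
        return 0;
    }

    int getChildCount() {
        return 0;
    }
}

/** An element that holds only text, so a single String field is enough. */
abstract class TextOnlyElement extends AbstractElement {
    private final String text;

    TextOnlyElement(String text) {
        this.text = text;
    }

    @Override
    String getText() {
        return text;
    }
}

/** The name is fixed by the (assumed) DTD, so it needs no per-instance storage. */
final class TitleElement extends TextOnlyElement {
    TitleElement(String text) {
        super(text);
    }

    @Override
    String getName() {
        return "title";
    }
}

class TextOnlyDemo {
    public static void main(String[] args) {
        AbstractElement title = new TitleElement("Announce: JDOMPlus");
        System.out.println("<" + title.getName() + ">" + title.getText()
                + "</" + title.getName() + ">");
    }
}
```

Code written against AbstractElement does not need to know which
implementation it was handed, which is the split being argued for.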

> BTW, that
> one variable would be reusable in the same model by any sort of
> subclass.  So, as I said, that's not a fundamental advantage of
> interfaces.

I'm arguing for a separation between interface and implementation for the
XML tree in JDOM. This is elementary object orientation stuff.

I want to be free to implement the "element" interface using as much or as
little RAM as I see fit. Memory is kinda important with XML as files can
often contain very large numbers of elements and attributes, and parsing them
uses considerable CPU and RAM.

Whether the abstract base class technique is used or interfaces are used I
don't mind that much. You obviously favour the abstract base class approach
which I can understand. I'm not arguing over either of these approaches. I'm
just arguing for a clear split between interface and implementation which is
something the Collections library does very well. This is also quite
elementary object orientated design. Incidentally the Collections library
uses both interfaces and abstract base classes for maximum flexibility.
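
For illustration, here is how that split looks in the Collections library
itself: java.util.List is the interface, java.util.AbstractList is the
abstract base class, and an implementation is free to store as little as it
likes. SingleItemList is a made-up example class, not part of the JDK, and the
sketch uses generics from later Java versions.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;

/** Read-only list of exactly one item; AbstractList supplies iteration, equals() and so on. */
final class SingleItemList<E> extends AbstractList<E> {
    private final E item; // the only instance field this class declares

    SingleItemList(E item) {
        this.item = item;
    }

    @Override
    public E get(int index) {
        if (index != 0) {
            throw new IndexOutOfBoundsException("Index: " + index);
        }
        return item;
    }

    @Override
    public int size() {
        return 1;
    }
}

class CollectionsSplitDemo {
    public static void main(String[] args) {
        List<String> small = new SingleItemList<>("foo");
        List<String> general = new ArrayList<>();
        general.add("foo");

        // Both are Lists, so callers cannot tell them apart through the interface.
        System.out.println(small.equals(general)); // prints true
    }
}
```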

> >     List list = new List()
> >
> > is much easier than
> >
> >     List list = ArrayList();
>
> Following the interface model used by collections, changing from an
> ArrayList to LinkedList takes one line of change.  If JDOM followed suit
> with the same design, changing from FooElement to BarElement would
> require "n" many lines of change.

I don't follow your logic here. It depends on what your code is. I might
have 10,000 lines of code creating Lists or 1. I might have 10,000 lines of
code creating BarElements or 1.

> That's why factories usually come
> into play, to get it back to one line of change at the cost of each
> later line involving a factory call.

Yes, the "Factory" pattern or the "Factory Method" pattern are useful
techniques to allow you to change the implementations you are using without
changing too much code. They are useful but optional techniques though.
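
As a sketch of the factory technique under discussion: FooElement and
BarElement are the placeholder names from the quoted text, while Element (as
an interface), ElementFactory and TreeBuilder are hypothetical names invented
here and not part of JDOM.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these types are part of JDOM or JDOMPlus.
interface Element {
    String getName();
}

final class FooElement implements Element {
    private final String name;

    FooElement(String name) {
        this.name = name;
    }

    @Override
    public String getName() {
        return name;
    }
}

final class BarElement implements Element {
    private final String name;

    BarElement(String name) {
        this.name = name;
    }

    @Override
    public String getName() {
        return name;
    }
}

/** The factory is the only place that names a concrete class. */
interface ElementFactory {
    Element create(String name);
}

final class TreeBuilder {
    private final ElementFactory factory;

    TreeBuilder(ElementFactory factory) {
        this.factory = factory;
    }

    /** Creates one element per name, always going through the factory. */
    List<Element> build(String... names) {
        List<Element> result = new ArrayList<>();
        for (String name : names) {
            result.add(factory.create(name));
        }
        return result;
    }
}

class FactoryDemo {
    public static void main(String[] args) {
        // Switching implementations means changing this one line.
        TreeBuilder builder = new TreeBuilder(FooElement::new);
        for (Element e : builder.build("a", "b")) {
            System.out.println(e.getClass().getSimpleName() + " " + e.getName());
        }
    }
}
```

Code that prefers to write new FooElement("a") directly can still do so; the
factory only matters where the choice of implementation needs to stay in one
place.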

> > Interfaces are useful because the implementation is totally open
> > ended rather than having to use the baggage that you decide upfront.
>
> If reduced instance data is your only argument for an abstract class,
> you're discounting the implementation flexibility we have in concrete
> classes.

Huh? What flexibility do you have because I am only allowed to derive from
an Element class with 7 instance variables?

> > > The price of that interface
> > > architecture is we can no longer declare final methods like
> > > equals().
> >
> > I don't see that as a big loss.
>
> I suspect that's because you haven't fully understood how JDOM is
> designed to work.

???

JDOM is designed to implement an XML document object model in Java. Having a
final equals method is not essential for this goal and losing it is not a
big loss.

> > > Also, we'll have this IElement interface that no one uses because
> > > they say new Element(),
> >
> > Only someone who's doing custom building of an XML tree is affected.
> > Readers, parsers, outputters are not affected at all.
>
> If those readers still say "Document d = builder.getDocument()" then the
> advantage of the generic interface/abstract class above is nullified,
> because they're speaking in terms of concrete classes.

Firstly the code:-

Document d = builder.getDocument()

is not speaking in terms of concrete classes. Document could be abstract.

Secondly if you notice I've never once argued for Document to be abstract.
Document instances are relatively rare so I'm not too concerned with its
memory footprint, number of instances or construction cost.
The rest of the tree classes - Element, Attribute, Entity and the like - are
much, much more frequent. Any XML document over a few hundred K will create
thousands and thousands of elements, attributes and the like but only ever
one Document.

>
> > Incidentally I can not find one line of code in any of the samples
> > in CVS do a "new Element()".
>
> Only because our samples are minimal.  Look at Elliotte's slides for his
> talks and you'll see it used everywhere.

Sure, I just wanted to highlight the fact that the current sample programs
concentrate on reading / parsing rather than custom construction. Parsing
and reading of XML appears much more common than custom building. If we can
optimise the parsing and construction then I think it's a big win for the
majority of the JDOM community.

> > When I turned Element, Attribute et al all into abstract
> > classes, all of the samples compiled and ran correctly without
> > modification.
>
> So you have "Document" as the abstract class.

Erm no. I never said this. Check out the code on http://www.jdomplus.org and
look around. Document is pretty much the same as in the standard JDOM
distribution (apart from the fact that it now supports a Visitor pattern and
XPath too). I'm only talking about the 'tree' classes from the root element
downwards to the leaves.

> Thus someone wanting to
> construct a new document from scratch would have to say new
> FooDocument() and fill it with new FooElement() instances.

Nope.

    Element element = ...;
    Document d = new Document( element );

No change. The only change is a choice of which Element or Attribute
implementation to use. If you don't want a choice, just use JDOM's 'default'
implementations. Some of us want a choice as choosing the correct data
structure can make a big performance difference.

>
> >     // I don't want to use a factory here as its complicated
> >     // I want an easy life which is a good choice
> >
> >     Element element = new DefaultElement( "foo" );
> >
> > like they do with collections
> >
> >     List list = new ArrayList();
> >     List list2 = new LinkedList();
>
> Again, in collections it's a one-line change to switch.  In JDOM it
> would be nearly every line changed to switch.  It's a bit apples and
> oranges.

???

> > The advantages are a plethora of implementation choices for developers
> > without taking a big hit in memory usage. Here's a couple of examples
> >
> > * ... (e.g. 1 String for attributes and PCDATA elements
>
> Possible with concrete classes.

I cannot use the current JDOM implementation to implement an element or
attribute with only 1 instance variable. I get 7 or 3 right now whether I
like it or not. Care to show me how if I'm missing something?

>
> > * allow dual tree implementations
>
> This is the most interesting possibility, but not one due to interfaces
> or abstract classes but rather just an internal split in the API as was
> discussed here.
> However, since not many people appeared interested in
> having the split and it significantly complicates the API, it's a change
> that would be unwise to make at this time.

Because, put that directly, it doesn't seem useful. If I said I could boost performance
of their XML parsing by X% they might be interested, depending on the value
of X. If X = 1 maybe not. If X > 20% then maybe yes.

> > * allow nodes and branches to be reused (caching) for use in multiple
> > documents by developers who know what they are doing
>
> Possible as a side effect of the above.
>
> > > Now, the other interface model (where factories are used for all
> > > object creation) has its own set of problems that people who've
> > > used DOM are familiar with.
> >
> > What like?
>
> It's been discussed enough on this list.

Like I said, factories are optional. If people want to keep referring to the
default JDOM concrete implementation of element and attribute they can. I'm
arguing for more choice not less.


> > Using a factory is totally optional. There's nothing to stop Brett
> > or anyone else using a 'default standard JDOM' implementation of
> > Element and Attribute throughout his entire code base. e.g.
> >
> >     Element element = new DefaultElement( "foo" );
>
> You could say the same for DOM.  But everyone uses the factory.

No one uses a factory today in JDOM. No one. If abstract base classes or
interfaces were introduced would everyone immediately switch over and use
factories instead? I'm not sure about that. They may be already using their
own factories right now or "factory methods".

> > You don't have to use a factory when using List and ArrayList and
> > LinkedList. Its your choice. I'm advocating a similar stance in JDOM.
>
> I think you're a little fixated on lists.  :-)

I think you're a little fixated on concrete base classes ;-)

I could use many other examples from the JDK if you'd rather, talking about
lists is maybe getting a bit dull. I thought since JDOM is mostly an XML
data structure then the existing JDK data structure package was a good basis
to discuss good object orientated design. Here's a few more we could use:-

    Set or Map
    URL
    Window, Button
    JDBC
    JNDI
    JMS

I'd be happy to use another example if you'd rather.

> > I'm not allowed to implement an Element or Attribute without
> > inheriting all the instance data Brett & you decide should be there.
>
> Well, then you're really going to enjoy my proposal to cut mandatory
> instance data down to one reusable variable.

I don't see a way forward with only one concrete Element implementation
class and no separation of interface.

Are you talking about reducing the number of instance variables inside an
object instance or promoting the sharing of object instances? They're quite
different things, as I'm sure you understand.


<James/>


James Strachan
=============
email: james at metastuff.com
web: http://www.metastuff.com



