[jdom-interest] Re: Proposal: JDOM event based processing
Patrick Dowler
Patrick.Dowler at nrc.ca
Tue Nov 7 10:05:21 PST 2000
On Tue, 07 Nov 2000, you wrote:
> From: "Patrick Dowler" <Patrick.Dowler at nrc.ca>
> > Umm, isn't the whole point of JDOM to have an object model rather than an
> > event stream?
>
> Yes. All I'm saying is often you only need part of the document tree to be
> in memory to be able to process it correctly. Think ASCII files - we're used
> to processing a 'buffer' at a time aren't we? We rarely ever make the
> assumption we can fit it all in RAM at once do we? Surely we can do the same
> with JDOM too? The XML document can often be too big to fit all into RAM at
> once.
>
> > If you want events, wouldn't you use SAX directly?
>
> SAX is too low level for non-trivial use. SAX is sub-Element level
> granularity, we have to build the Elements ourselves from SAX events and
> then connect them up by hand into Element subtrees.
>
> JDOM right now is Document level granularity - we get a whole Document or
> nothing if it won't fit in RAM.
>
> This proposal is to allow a 'SAXProcessor' to create document sub trees and
> process those - i.e. at sub-Document (Element tree) level.
>
> > The thing described below could be implemented as an event handler (using
> > SAX) or as a tree-walker (using JDOM). Your SAXProcessor could just as
> > easily be a JDOMWalker, yes?
> >
> > Document doc = ...
>
> You missed out the vital step. The "..." could have just tried to load a 1Gb
> XML file into RAM and barfed because it wouldn't fit. Or for a big XML file
> that would just fit, it would have sat there on high CPU usage for a long
> time until it had read the entire tree into memory.
Fair enough :-)
There was some talk about reworking JDOM to do "lazy" reading to an
extent (buffering the input rather than reading the whole document up
front). That seems to be a necessity for handling large XML files. It would
also be good for any application that makes a single pass through the Document:
- read an XML data file and convert it into a TableModel or
  TreeModel for display via Swing
- pass-through filtering (web apps, for example)
I think a LazyDocument wouldn't be trivial to write since an Element (subtree)
can have arbitrary size... at first glance it appears that the live part of the
document would vary in size. Maybe the caller would have to explicitly
discard fragments when they were done (an Iterator type of thing comes to
mind: next() could discard the last fragment and process the next one).
This may be a good base upon which one could build more powerful tools
that appear to operate on a Document but have a more "stream-processing"
behaviour in practice.
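
A rough sketch of the Iterator idea, in Java. Everything in it is
hypothetical: FragmentIterator and Node are made-up names, Node is only a
stand-in for a JDOM Element, and the JDK's StAX pull parser
(javax.xml.stream) is used in place of JDOM's own builder. It also assumes
the fragments of interest are the direct children of the root element.

```java
import java.io.Reader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical sketch: yields each child subtree of the root, one at a time. */
final class FragmentIterator implements Iterator<FragmentIterator.Node> {

    /** Minimal stand-in for a JDOM Element (not the real JDOM class). */
    static final class Node {
        final String name;
        final StringBuilder text = new StringBuilder();
        final List<Node> children = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    private final XMLStreamReader reader;
    private Node pending;  // the one fragment read ahead but not yet handed out
    private int depth = 0; // 1 = inside the root, 2 = inside a fragment

    FragmentIterator(Reader in) {
        try {
            reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    @Override
    public boolean hasNext() {
        if (pending == null) {
            pending = readNextFragment();
        }
        return pending != null;
    }

    @Override
    public Node next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        Node result = pending;
        pending = null; // the iterator keeps no reference to fragments it has returned
        return result;
    }

    /** Advances the parser to the next child of the root and builds its subtree. */
    private Node readNextFragment() {
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    if (depth == 2) {
                        Node fragment = readSubtree();
                        depth--; // readSubtree consumed the matching end tag
                        return fragment;
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    depth--;
                }
            }
            return null; // end of document: no more fragments
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Recursively builds the subtree for the element the parser is positioned on. */
    private Node readSubtree() throws XMLStreamException {
        Node node = new Node(reader.getLocalName());
        while (reader.hasNext()) {
            switch (reader.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    node.children.add(readSubtree());
                    break;
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                    node.text.append(reader.getText());
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    return node;
                default:
                    break; // comments, processing instructions, etc. are skipped
            }
        }
        throw new XMLStreamException("document ended inside <" + node.name + ">");
    }
}
```

A caller would loop with hasNext()/next() and finish with each fragment
(say, adding one row to a TableModel) before asking for the next one. Once
the caller drops a fragment nothing else refers to it, so the live part of
the document is never larger than the biggest single fragment.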
--
Patrick Dowler
Canadian Astronomy Data Centre