[jdom-interest] XMLOutputter = SAXOutputter + XMLFilter + XMLWriter

Brett McLaughlin brett.mclaughlin at lutris.com
Mon Oct 9 14:14:03 PDT 2000


Joseph Bowbeer wrote:
> 
> I've been thinking about SAX event streams and SAX filters, and how JDOM
> should interface with these.  I've also been thinking about the kind of
> support JDOM should provide for data documents (no mixed content).
> 
> The use case leading to these thoughts was the necessity to write JDOM
> elements onto a SAX event stream.  For example: outputting an unknown
> number of entries (child elements) to a log file.
> 
> After experimenting with JDOM's XMLOutputter and looking at David
> Megginson's XMLWriter and DataWriter, I'd like to suggest the following
> refactoring of XMLOutputter.
> 
>   XMLOutputter = SAXOutputter + XMLFilter + XMLWriter
> 
> 1. SAXOutputter should be the cornerstone of JDOM output.  (When it is
> implemented, SAXOutputter will convert JDOM pieces into SAX events.)
> 
> Why make SAXOutputter the cornerstone?  Because that's where the best
> leverage is.  If we can generate SAX events correctly, we can do
> anything related to output.  The addition (or removal) of indentation
> and newlines can be viewed as a filter acting on the SAX event stream.

I'm not sure of this - generating SAX events from a JDOM Document is
something we need in SAXOutputter, granted. But do we want to hardwire
this into everything? For example, would it not be slower to go from
JDOM Document --> SAX events --> output as opposed to JDOM Document -->
output? I think in most cases, yes. As I admit later on, I think
/allowing/ XMLOutputter to be chained onto a SAXOutputter makes sense, in
the case where we do have more of a flexible pipeline model; but I'm not
sure requiring that makes sense. It seems like an extra step that adds
time to output.
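
To make the "indentation as a filter" idea concrete, here is a rough
sketch using only the SAX2 helper classes. IndentFilter and its format()
helper are hypothetical names, not part of JDOM or SAX, and the toy
serializer inside format() does no escaping and ignores attributes. It
assumes a data document (no mixed content, no existing indentation).

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: injects newline + indentation as characters() events.
class IndentFilter extends XMLFilterImpl {
    private final String indent;
    private int depth = 0;
    // One flag per open element: has it contained a child element yet?
    private final Deque<Boolean> sawChild = new ArrayDeque<>();

    IndentFilter(XMLReader parent, String indent) {
        super(parent);
        this.indent = indent;
    }

    private void newlineAndIndent() throws SAXException {
        StringBuilder sb = new StringBuilder("\n");
        for (int i = 0; i < depth; i++) sb.append(indent);
        char[] ws = sb.toString().toCharArray();
        super.characters(ws, 0, ws.length);
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        if (!sawChild.isEmpty()) {          // not the root: mark parent, start a new line
            sawChild.pop();
            sawChild.push(true);
            newlineAndIndent();
        }
        sawChild.push(false);
        depth++;
        super.startElement(uri, local, qName, atts);
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        depth--;
        if (sawChild.pop()) newlineAndIndent(); // closing tag of a container gets its own line
        super.endElement(uri, local, qName);
    }

    // Toy pipeline: parser -> IndentFilter -> minimal serializer (no escaping, no attributes).
    static String format(String xml, String indent) throws Exception {
        XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        IndentFilter filter = new IndentFilter(parser, indent);
        StringBuilder out = new StringBuilder();
        filter.setContentHandler(new DefaultHandler() {
            @Override public void startElement(String u, String l, String q, Attributes a) {
                out.append('<').append(q).append('>');
            }
            @Override public void endElement(String u, String l, String q) {
                out.append("</").append(q).append('>');
            }
            @Override public void characters(char[] ch, int start, int length) {
                out.append(ch, start, length);
            }
        });
        filter.parse(new InputSource(new StringReader(xml)));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(format("<a><b>hi</b><c>yo</c></a>", "  "));
    }
}
```

Note that this sketch still pays for the extra SAX step discussed above;
it only shows that the formatting logic itself can live in a filter.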

> 
> 2. Take most of what is currently in XMLOutputter and move it into
> something along the lines of Megginson's XMLWriter.
> 
> Note: XMLWriter takes a SAX event stream and writes out XML.  XMLWriter
> should (for example) provide options for customizing the appearance of
> the XML header.  It should not, however, provide options for adding
> newlines or indentation, because all whitespace is potentially
> significant (but see #3 below).
> 
> As with Megginson's version, our XMLWriter should implement XMLFilter.
> Since an XMLFilter is an event source as well as an event sink, an
> XMLWriter can be inserted into the middle of an event stream without
> interrupting the flow, and the XML output can be sluiced out the side.

Now this is more intriguing to me. I'd have to see more examples of
pipelining before the work became worth it, but I definitely see the
potential. I'm curious...
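
As one small example of that kind of pipelining, the sketch below shows a
filter that writes XML "out the side" while passing every event on
unchanged. It is only an illustration in the spirit of Megginson's
XMLWriter, not his code: TeeXmlWriter and copyThrough() are hypothetical
names, and the writer handles only elements, attributes, and text (no XML
declaration, namespaces, comments, or processing instructions).

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: serializes events to a Writer and forwards them downstream.
class TeeXmlWriter extends XMLFilterImpl {
    private final Writer out;

    TeeXmlWriter(XMLReader parent, Writer out) {
        super(parent);
        this.out = out;
    }

    private void write(String s) throws SAXException {
        try {
            out.write(s);
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        write("<" + qName);
        for (int i = 0; i < atts.getLength(); i++) {
            String value = escape(atts.getValue(i)).replace("\"", "&quot;");
            write(" " + atts.getQName(i) + "=\"" + value + "\"");
        }
        write(">");
        super.startElement(uri, local, qName, atts); // keep the stream flowing
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        write("</" + qName + ">");
        super.endElement(uri, local, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        write(escape(new String(ch, start, length)));
        super.characters(ch, start, length);
    }

    // Parses xml, tees it to a string, and hands the same events to 'downstream'.
    static String copyThrough(String xml, ContentHandler downstream) throws Exception {
        XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        StringWriter side = new StringWriter();
        TeeXmlWriter tee = new TeeXmlWriter(parser, side);
        tee.setContentHandler(downstream);
        tee.parse(new InputSource(new StringReader(xml)));
        return side.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<log><entry id=\"1\">a &amp; b</entry></log>";
        System.out.println(copyThrough(xml, new DefaultHandler()));
    }
}
```

The downstream handler could just as well be another filter or a builder;
the writer in the middle does not care what consumes the events.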

> 
> 3. For formatting "data documents", implement a special XMLFilter.  Call
> it DataFormatFilter.
> 
> Note the term "data documents".  In Megginson's terminology, these are
> documents that contain only fielded content (no mixed content).  This
> data format filter inserts the additional newlines and indentation that
> are needed to "pretty-print" the data document.  The indent width,
> indent  character, and line ending should all be customizable.
> 
> Note: The DataFormatFilter is similar to Megginson's DataWriter, except
> it should be implemented as a pure filter rather than as a subclass of
> XMLWriter.  (Filter composition rocks; subclassing XMLWriter is fragile
> and unnecessary in this case.)
> 
> (I plan to implement the DataFormatFilter, and the related
> DataUnformatFilter described later.)
> 
> 4. Finally, XMLOutputter becomes a convenience class that provides the
> same toplevel "output" methods it does now.
> 
> XMLOutputter is responsible for creating the constituent components,
> hooking them up to form an output pipeline, and delegating to them.
> 
> Comments?  XMLFilter is a SAX2 thing, which was released 5/2000.  Does
> this matter?

Not that it's SAX 2, but that it's SAX. I'm not convinced that there is
any advantage in tying all XML output, re: XMLOutputter, to SAX. Right
now, if you don't need to parse XML documents, but just create and
output them, you don't need xerces.jar, or anything other than the very
small jdom.jar. I'm not sure that it makes sense to change that, and
introduce a SAX dependence, which can (almost surprisingly) get very
big.

That said, I don't have any problems with

(1) Making SAXOutputter work much better with SAX and XMLFilters
(2) Enabling XMLOutputter to use something like that for a feed.

In other words, I'm not at all against allowing XMLOutputter to work
with SAX filters and XMLFilter, but I'm not convinced that we want all
output hard-wired to SAX.

> 
> Here are some related ideas for pipelining the input side:
> 
>   XMLReader + XMLFilter + SAXBuilder = JDOM document
> 
> 1. Add an optional DataUnformatFilter to remove newlines and indentation
> from data documents.  This filter reads the SAX event stream from the
> reader/parser and passes it through to the SAXBuilder, removing the
> extra formatting along the way.

This is something we have talked about (allowing up-front stripping of,
for example, whitespace). It would be optional, and I think a good idea.
It's along the lines of what JAXP 1.1 is doing, by the way.
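
A minimal sketch of such an up-front stripping filter follows, assuming a
data document. WhitespaceStripFilter and strip() are hypothetical names.
The simplification is significant: it drops every whitespace-only
characters() event, so it would also drop meaningful whitespace in mixed
content, and a parser is free to split text into several events. A real
DataUnformatFilter would need to track more state.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: removes whitespace-only text events before they reach the builder.
class WhitespaceStripFilter extends XMLFilterImpl {

    WhitespaceStripFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        for (int i = start; i < start + length; i++) {
            if (!Character.isWhitespace(ch[i])) {
                super.characters(ch, start, length); // real content: pass through untouched
                return;
            }
        }
        // Whitespace only: assumed to be formatting, so it is dropped.
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        // Dropped as well; a validating parser reports formatting whitespace here.
    }

    // Toy pipeline: parser -> filter -> minimal serializer standing in for a builder.
    static String strip(String xml) throws Exception {
        XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        WhitespaceStripFilter filter = new WhitespaceStripFilter(parser);
        StringBuilder out = new StringBuilder();
        filter.setContentHandler(new DefaultHandler() {
            @Override public void startElement(String u, String l, String q, Attributes a) {
                out.append('<').append(q).append('>');
            }
            @Override public void endElement(String u, String l, String q) {
                out.append("</").append(q).append('>');
            }
            @Override public void characters(char[] ch, int start, int length) {
                out.append(ch, start, length);
            }
        });
        filter.parse(new InputSource(new StringReader(xml)));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(strip("<a>\n  <b>hi</b>\n</a>"));
    }
}
```

Because the filter is itself an XMLReader, whatever consumes SAX events
(here the toy serializer) can sit behind it without knowing the
stripping happened.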

> 
> 2. For added convenience, the SAXBuilder should implement XMLFilter.

Convince me - I'm not against this, but I'm not sure I see the
advantage. Perhaps for pipelining?

Joe, this is some really excellent work (even though I'm not 100% sure
on it all yet). I really like the idea of building better pipelines - I
think we are going to need things like HTMLOutputter and HTMLBuilder, as
well as some other cool variations. Building a more robust pipeline
makes a lot of sense; however, I'm not yet convinced that hard-wiring it
to SAX makes sense. 

I'm curious - is it SAX you want to use, or the functionality that these
SAX-based components (XMLFilter and XMLWriter) provide? It reads like it
is the functionality; if that is the case, it might make sense to
decouple the filters from SAX.

Let's continue this thread ...

-Brett

> 
> --
> Joe Bowbeer

-- 
Brett McLaughlin, Enhydra Strategist
Lutris Technologies, Inc. 
1200 Pacific Avenue, Suite 300 
Santa Cruz, CA 95060 USA 
http://www.lutris.com
http://www.enhydra.org


