<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META NAME="Generator" CONTENT="MS Exchange Server version 5.5.2650.12">
<TITLE>RE: [jdom-interest] Partial Tree building/instantiation --- XPathFilter</TITLE>
</HEAD>
<BODY>
<BR>
<P><FONT SIZE=2>I have run into this problem as well. I have experimented with one</FONT>
<BR><FONT SIZE=2>approach that works and would like to know what others think of it.</FONT>
</P>
<P><FONT SIZE=2>By the way, I looked at XSLT for a bit, and it appeared to me</FONT>
<BR><FONT SIZE=2>that the XSLT processor wanted to load the whole document</FONT>
<BR><FONT SIZE=2>into memory first. Saxon has a Preview mode to get around this,</FONT>
<BR><FONT SIZE=2>but I didn't research it very far, since I would have had to</FONT>
<BR><FONT SIZE=2>make two passes over everything before it was in JDOM, where I wanted</FONT>
<BR><FONT SIZE=2>it.</FONT>
</P>
<BR>
<P><FONT SIZE=2>My solution is as follows:</FONT>
</P>
<P><FONT SIZE=2>Add a setHandler() method to a copy of org.jdom.input.SAXBuilder</FONT>
<BR><FONT SIZE=2>that takes an element name and a handler object.</FONT>
</P>
<P><FONT SIZE=2> private HashMap handledElements = new HashMap();</FONT>
<BR><FONT SIZE=2> public void setHandler(String elementName, ElementHandler handler) {</FONT>
<BR><FONT SIZE=2> handledElements.put(elementName, handler);</FONT>
<BR><FONT SIZE=2> }</FONT>
</P>
<P><FONT SIZE=2>Modify SAXHandler.endElement() (still inside SAXBuilder) so that</FONT>
<BR><FONT SIZE=2>it calls the registered handler when a registered element closes. When</FONT>
<BR><FONT SIZE=2>the handler returns, the element and its content are removed from the tree.</FONT>
</P>
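<P><FONT SIZE=2>The following is a minimal sketch of the same idea for anyone without the modified SAXBuilder. It uses only the JDK's SAX parser and is not the code described in this post. SimpleElement is a hypothetical stand-in for org.jdom.Element. The ElementHandler signature is a guess, because the post does not show it. buildFromString() parses XML text rather than a file name. The handler is called from endElement(), and the handled element is then detached from its parent so the subtree can be garbage-collected.</FONT>
</P>
<PRE>
```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for org.jdom.Element: a name, its text and its children.
class SimpleElement {
    final String name;
    final StringBuffer text = new StringBuffer();
    final List children = new ArrayList();   // of SimpleElement
    SimpleElement parent;
    SimpleElement(String name) { this.name = name; }
}

// Hypothetical callback interface; the real ElementHandler is not shown in the post.
interface ElementHandler {
    void handle(SimpleElement element);
}

// Builds a tree from SAX events, but hands each registered element to its
// handler when the element ends and then removes it from the tree.
class PruningBuilder extends DefaultHandler {
    private final Map handledElements = new HashMap();  // maps element name to ElementHandler
    private final List stack = new ArrayList();         // currently open elements
    SimpleElement root;

    public void setHandler(String elementName, ElementHandler handler) {
        handledElements.put(elementName, handler);
    }

    // Parses XML text (the build() in the post takes a file name instead).
    public void buildFromString(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), this);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    private SimpleElement top() {
        return (SimpleElement) stack.get(stack.size() - 1);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        SimpleElement e = new SimpleElement(qName);
        if (stack.isEmpty()) {
            root = e;
        } else {
            e.parent = top();
            e.parent.children.add(e);
        }
        stack.add(e);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (!stack.isEmpty()) {
            top().text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        SimpleElement e = (SimpleElement) stack.remove(stack.size() - 1);
        ElementHandler handler = (ElementHandler) handledElements.get(qName);
        if (handler != null) {
            handler.handle(e);
            // Detach the handled subtree so it can be garbage-collected.
            if (e.parent != null) {
                e.parent.children.remove(e);
                e.parent = null;
            }
        }
    }
}
```
</PRE>
<P><FONT SIZE=2>With "phoneCall" registered, the transferBatch root never has more than one phoneCall child at a time: the call currently being read.</FONT>
</P>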
<P><FONT SIZE=2>The end usage ends up looking something like this:</FONT>
</P>
<P> <FONT SIZE=2>...</FONT>
<BR> <FONT SIZE=2>MyElementHandler myhandler = new MyElementHandler();</FONT>
</P>
<P> <FONT SIZE=2>reader.setHandler("transferBatch", myhandler);</FONT>
<BR> <FONT SIZE=2>reader.setHandler("phoneCall", myhandler);</FONT>
<BR> <FONT SIZE=2>try {</FONT>
<BR> <FONT SIZE=2>reader.build(srcXmlFilename);</FONT>
<BR> <FONT SIZE=2>}</FONT>
<BR> <FONT SIZE=2>...</FONT>
</P>
<P><FONT SIZE=2>You can have as many handlers as you want, and anything you handle</FONT>
<BR><FONT SIZE=2>is discarded after its handler is called.</FONT>
</P>
<P><FONT SIZE=2>The XPathFilter proposal would be very nice, since</FONT>
<BR><FONT SIZE=2>the above solution only registers explicit element names.</FONT>
</P>
<P><FONT SIZE=2>I wanted to extend SAXBuilder rather than copy it, but I could not see how to do this.</FONT>
<BR><FONT SIZE=2>I see from other posts that XMLFilter may help. I will look at that,</FONT>
<BR><FONT SIZE=2>and I welcome more advice in this area.</FONT>
</P>
<P><FONT SIZE=2>I can't see how to avoid needing access to the JDOM Element</FONT>
<BR><FONT SIZE=2>stack (named "stack") inside the SAXHandler in SAXBuilder. That's</FONT>
<BR><FONT SIZE=2>how I pass the JDOM Element to the handler. Perhaps a small change</FONT>
<BR><FONT SIZE=2>to SAXBuilder could expose a safe interface to that stack, and then</FONT>
<BR><FONT SIZE=2>a subclass of SAXBuilder plus an XMLFilter might be a clean solution?</FONT>
</P>
<BR>
<P><FONT SIZE=2>Regards,</FONT>
</P>
<P><FONT SIZE=2>Adam Simantel</FONT>
<BR><FONT SIZE=2>Adam.Simantel@merant.com</FONT>
</P>
<BR>
<BR>
<P><FONT SIZE=2>-----Original Message-----</FONT>
<BR><FONT SIZE=2>From: Steven Gould [<A HREF="mailto:steven.gould@cgiusa.com">mailto:steven.gould@cgiusa.com</A>]</FONT>
<BR><FONT SIZE=2>Sent: Monday, April 02, 2001 2:14 PM</FONT>
<BR><FONT SIZE=2>To: jdom-interest@jdom.org</FONT>
<BR><FONT SIZE=2>Subject: Re: [jdom-interest] Partial Tree building/instantiation --- XPathFilter</FONT>
</P>
<BR>
<P><FONT SIZE=2>Jakob,</FONT>
</P>
<P><FONT SIZE=2>Could you use XSLT to break the file up into smaller, more manageable</FONT>
<BR><FONT SIZE=2>documents? Then use JDOM to manipulate/process each of these smaller</FONT>
<BR><FONT SIZE=2>documents.</FONT>
</P>
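<P><FONT SIZE=2>The same split can also be done in a single streaming pass without XSLT. The sketch below is a JDK-only assumption, not something proposed in this thread. It copies each record subtree into its own small org.w3c.dom.Document and hands it to a callback. DOM stands in for JDOM here only so that no extra library is needed. RecordSplitter and RecordHandler are hypothetical names, and namespaces are ignored.</FONT>
</P>
<PRE>
```java
import java.io.StringReader;
import java.util.Stack;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical callback that receives one small document per record.
interface RecordHandler {
    void handle(Document record);
}

// Streams the input once and copies each recordName subtree into its own
// DOM Document; everything outside the records is skipped.
class RecordSplitter extends DefaultHandler {
    private final String recordName;
    private final RecordHandler handler;
    private final DocumentBuilder builder;
    private final Stack open = new Stack();  // open nodes of the current record
    private Document doc;                    // non-null only while inside a record

    RecordSplitter(String recordName, RecordHandler handler) {
        this.recordName = recordName;
        this.handler = handler;
        try {
            this.builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public void parse(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), this);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (doc == null) {
            if (!qName.equals(recordName)) return;  // outside any record: skip
            doc = builder.newDocument();             // start a fresh, small document
            open.push(doc);
        }
        Element e = doc.createElement(qName);
        for (int i = 0, n = atts.getLength(); i != n; i++) {
            e.setAttribute(atts.getQName(i), atts.getValue(i));
        }
        ((Node) open.peek()).appendChild(e);
        open.push(e);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (doc != null) {
            ((Node) open.peek()).appendChild(doc.createTextNode(new String(ch, start, length)));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (doc == null) return;
        open.pop();                    // close this element
        if (open.peek() == doc) {      // the record element itself has closed
            open.pop();
            Document finished = doc;
            doc = null;                // drop our reference; the handler decides what to keep
            handler.handle(finished);
        }
    }
}
```
</PRE>
<P><FONT SIZE=2>Each record arrives as a complete, standalone document whose root is the phoneCall element. Only one record is held in memory at a time unless the handler keeps it.</FONT>
</P>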
<P><FONT SIZE=2>Steve</FONT>
</P>
<P><FONT SIZE=2>---</FONT>
</P>
<P><FONT SIZE=2>Jakob Jenkov wrote:</FONT>
</P>
<P><FONT SIZE=2>&gt; Hi there. I'm currently working on a long, long :-) project in which</FONT>
<BR><FONT SIZE=2>&gt; we parse some quite long files. We have tried converting these</FONT>
<BR><FONT SIZE=2>&gt; files to XML for easier, standard parsing, but each file would then be</FONT>
<BR><FONT SIZE=2>&gt; about 16-30+ MB. I don't even dare think about how much</FONT>
<BR><FONT SIZE=2>&gt; memory such a JDOM tree would take! And the plans for lazy evaluation</FONT>
<BR><FONT SIZE=2>&gt; won't help, since we visit every node in the tree, thus</FONT>
<BR><FONT SIZE=2>&gt; instantiating all the objects anyway. Parsing the files solely with SAX</FONT>
<BR><FONT SIZE=2>&gt; is not developer-friendly enough. What I have in mind is some kind of</FONT>
<BR><FONT SIZE=2>&gt; XPath filter, allowing you to build JDOM trees from subtrees of</FONT>
<BR><FONT SIZE=2>&gt; the data, and to dispose of each tree when you no longer need it.</FONT>
<BR><FONT SIZE=2>&gt;</FONT>
<BR><FONT SIZE=2>&gt; Let me give an example: we parse phone call records in files that</FONT>
<BR><FONT SIZE=2>&gt; sometimes contain thousands and thousands of records. In XML</FONT>
<BR><FONT SIZE=2>&gt; format these files and records would look something like this:</FONT>
<BR><FONT SIZE=2>&gt;</FONT>
<BR><FONT SIZE=2>&gt; &lt;transferBatch&gt;</FONT>
<BR><FONT SIZE=2>&gt; &nbsp;&nbsp;&lt;phoneCall&gt; &lt;details&gt;bla.bla.bla., sub records etc.&lt;/details&gt; &lt;/phoneCall&gt;</FONT>
<BR><FONT SIZE=2>&gt; &nbsp;&nbsp;&lt;phoneCall&gt; &lt;details&gt;bla.bla.bla., sub records etc.&lt;/details&gt; &lt;/phoneCall&gt;</FONT>
<BR><FONT SIZE=2>&gt; &nbsp;&nbsp;&lt;phoneCall&gt; &lt;details&gt;bla.bla.bla., sub records etc.&lt;/details&gt; &lt;/phoneCall&gt;</FONT>
<BR><FONT SIZE=2>&gt; &nbsp;&nbsp;... ...</FONT>
<BR><FONT SIZE=2>&gt; &nbsp;&nbsp;...</FONT>
<BR><FONT SIZE=2>&gt; &lt;/transferBatch&gt;</FONT>
<BR><FONT SIZE=2>&gt;</FONT>
<BR><FONT SIZE=2>&gt; Each &lt;phoneCall&gt; record with all its sub-records can be quite</FONT>
<BR><FONT SIZE=2>&gt; large, and there can be thousands of these &lt;phoneCall&gt; records.</FONT>
<BR><FONT SIZE=2>&gt; I'd like some way to get a JDOM tree for each &lt;phoneCall&gt; record</FONT>
<BR><FONT SIZE=2>&gt; one at a time, and to be able to dispose of that &lt;phoneCall&gt; JDOM</FONT>
<BR><FONT SIZE=2>&gt; tree before moving on to the next. How would I do that? My suggestion</FONT>
<BR><FONT SIZE=2>&gt; would be to insert an XPathFilter that only builds JDOM trees from</FONT>
<BR><FONT SIZE=2>&gt; the records that match the given XPath. In the example above, an</FONT>
<BR><FONT SIZE=2>&gt; XPath of /transferBatch/phoneCall would have done the job.</FONT>
<BR><FONT SIZE=2>&gt; Do my complaints/ideas sound completely out-of-this-world? I think</FONT>
<BR><FONT SIZE=2>&gt; there are many out there who will have the same problem: parsing one</FONT>
<BR><FONT SIZE=2>&gt; subtree at a time, without regard to the others.</FONT>
<BR><FONT SIZE=2>&gt;</FONT>
<BR><FONT SIZE=2>&gt; Regards,</FONT>
<BR><FONT SIZE=2>&gt; Jakob Jenkov</FONT>
<BR><FONT SIZE=2>&gt; jakob@jenkov.com</FONT>
</P>
</BODY>
</HTML>