[jdom-interest] Partial Tree building/instantiation --- XPathFilter

Jakob Jenkov jakob at jenkov.com
Mon Apr 2 13:33:37 PDT 2001


Hi There.

I'm currently working on a long, long :-) project in which we parse some quite long files. We have tried converting these files to XML for easier, standard parsing, but each file then ends up at about 16-30+ MB. I don't even dare think about how much memory such a JDOM tree would take! And the plans for lazy evaluation won't help, since we visit every node in the tree and thus instantiate all the objects anyway. Parsing the files solely with SAX is not developer-friendly enough. What I have in mind is some kind of XPath filter that lets you build JDOM trees from subtrees of the data and dispose of each tree when you no longer need it. Let me give an example:

We parse phone call records in files that can sometimes contain thousands and thousands of records. In XML format, these files and records would look something like this:

<transferBatch>
    <phoneCall>
        <details>bla.bla.bla., sub records etc.</details>
    </phoneCall>
    <phoneCall>
        <details>bla.bla.bla., sub records etc.</details>
    </phoneCall>
    <phoneCall>
        <details>bla.bla.bla., sub records etc.</details>
    </phoneCall>
    ...
    ...
    ...
</transferBatch>



Each <phoneCall> record with all its sub-records can be quite large, and there can be thousands of these <phoneCall> records. I'd like some way to get a JDOM tree for each <phoneCall> record, one at a time, and to be able to dispose of that <phoneCall> JDOM tree before moving on to the next. How would I do that?

My suggestion would be to insert an XPathFilter that builds JDOM trees only from the records that match a given XPath. In the example above, an XPath of    /transferBatch/phoneCall    would do the job.
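
To make the idea concrete, here is a rough sketch of such a filter. It is only an illustration under a few assumptions: it uses the SAX and W3C DOM classes that ship with the JDK as a stand-in for JDOM, it matches records by element name rather than by a real XPath expression, and the names SubtreeFilter and parse are made up for this example (they are not part of JDOM or any other library). Each matching record is built as its own small tree, handed to a callback, and then dropped so it can be garbage collected before the next record is read.

```java
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: builds one small DOM tree per matching record element. */
public class SubtreeFilter extends DefaultHandler {
    private final String recordName;           // element name to match, e.g. "phoneCall"
    private final Consumer<Element> callback;  // receives each finished record tree
    private final DocumentBuilder builder;
    private Document doc;                      // non-null only while inside a record
    private Node current;                      // node that new children are appended to

    public SubtreeFilter(String recordName, Consumer<Element> callback)
            throws ParserConfigurationException {
        this.recordName = recordName;
        this.callback = callback;
        this.builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (doc == null) {
            if (!qName.equals(recordName)) {
                return;                        // outside any record: build nothing
            }
            doc = builder.newDocument();       // start a fresh tree for this record
            current = doc;
        }
        Element e = doc.createElement(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            e.setAttribute(atts.getQName(i), atts.getValue(i));
        }
        current.appendChild(e);
        current = e;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (doc != null) {
            current.appendChild(doc.createTextNode(new String(ch, start, length)));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (doc == null) {
            return;
        }
        current = current.getParentNode();
        if (current == doc) {                  // the record's root element just closed
            callback.accept(doc.getDocumentElement());
            doc = null;                        // drop the tree before the next record
            current = null;
        }
    }

    /** Streams the input and calls back once per matching record. */
    public static void parse(InputSource in, String recordName, Consumer<Element> callback)
            throws Exception {
        SAXParserFactory.newInstance().newSAXParser()
                .parse(in, new SubtreeFilter(recordName, callback));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<transferBatch>"
                + "<phoneCall><details>first</details></phoneCall>"
                + "<phoneCall><details>second</details></phoneCall>"
                + "</transferBatch>";
        parse(new InputSource(new StringReader(xml)), "phoneCall",
                call -> System.out.println(call.getTextContent()));
    }
}
```

With something like this, only one record's tree is in memory at a time, no matter how many records the file holds. A real XPathFilter would of course need to match on the full path (and predicates) instead of just the element name, and would hand out JDOM Elements instead of DOM ones.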

Do my complaints/ideas sound completely out of this world? I think there are many out there who have the same problem: parsing one subtree at a time, without regard to the others.


Regards,
Jakob Jenkov
jakob at jenkov.com




