Re: [jdom-interest] JDOM and very large files

Fri Oct 10 00:20:32 PDT 2003

Hi,

Actually, the file (or input stream) is read sequentially,
but, as you say, the entire document tree is built in memory,
and this is usually much larger than the text representation.

You can use SAX directly and bypass the tree-building step.
This may be a bit awkward if you are unfamiliar with its
callback mechanism, but it should work well for simple
processing.
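
To give an idea of what the callback style looks like, here is a
minimal sketch that counts elements while the file streams past.
The element name "record" and the class name are only placeholders
of mine, since I don't know your document structure; the parser
classes are the standard JAXP/SAX ones.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxRecordCounter {

    /** Receives parser callbacks; keeps only a counter, never a tree. */
    static class CountingHandler extends DefaultHandler {
        private final String elementName;
        long count = 0;

        CountingHandler(String elementName) {
            this.elementName = elementName;
        }

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            // The default factory is not namespace-aware, so compare qName.
            if (qName.equals(elementName)) {
                count++;
            }
        }
    }

    /** Streams the input once and returns how many matching elements it saw. */
    static long countElements(InputStream in, String elementName) throws Exception {
        CountingHandler handler = new CountingHandler(elementName);
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        return handler.count;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to the XML file; "record" is a hypothetical element name.
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(countElements(in, "record"));
        }
    }
}
```

Memory use stays roughly constant however large the file is,
because the handler only holds whatever state you choose to keep.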

Another alternative is the new StAX API (Streaming API for XML).
Here is an introduction, http://www.xml.com/pub/a/2003/09/17/stax.html,
with pointers to a reference implementation.
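
With StAX you pull events from the parser instead of being called
back, which many find easier to follow. A rough sketch, assuming the
javax.xml.stream package is available (it ships with later JDKs; at
the moment you need the reference implementation) and assuming a
text-only element called "name", which is just an example of mine:

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class StaxNameReader {

    /**
     * Pulls events one at a time and hands the text of every matching
     * element to the sink. Nothing else is retained between events.
     */
    static void forEachText(InputStream in, String elementName,
                            Consumer<String> sink) throws Exception {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(in);
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && elementName.equals(reader.getLocalName())) {
                    // getElementText() requires a text-only element and
                    // leaves the reader on the matching end tag.
                    sink.accept(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to the XML file; "name" is a hypothetical element name.
        try (InputStream in = new FileInputStream(args[0])) {
            forEachText(in, "name", System.out::println);
        }
    }
}
```

The sink is where your own per-record processing would go; as long
as it does not accumulate everything, the file size should not matter.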

Also, there are of course lots of Perl implementations
out there, but you'll have to ask someone else about
their usefulness. I once had to maintain a Perl thingy
which manipulated a large XML file; since then Perl and I
have had an agreement not to come within 100 yards of each
other.

/pmn

> -----Original Message-----
> From: jdom-interest-admin at jdom.org 
> [mailto:jdom-interest-admin at jdom.org] On Behalf Of Daryl Handley
> Sent: 10 October 2003 07:08
> To: jdom-interest at jdom.org
> Subject: [jdom-interest] JDOM and very large files
> 
> 
> Hi everyone,
> 
> I am trying to parse a very large file (1.3 GB) using JDOM. I 
> have used JDOM before with smaller files, but never large 
> ones. It seems like the builder attempts to read the whole 
> file into memory and build the document. Is there a way to 
> read only part of the document into memory? I only need to 
> read through the file sequentially; random access to parts of 
> the doc is not important. Does anyone have any suggestions 
> how to do this with JDOM? Java in general? Any other language?
> 
> Otherwise I may have to write my own parser to do it in PERL. 
> The document structure is fairly simple (just huge) so this 
> shouldn't be too hard, but I would prefer to do it with JDOM 
> if possible.