[jdom-interest] java.lang.OutOfMemory with an 10 MB xml file??

Michael Kay mike at saxonica.com
Thu Aug 4 03:02:22 PDT 2005


> In general XPath over large files is an issue since XPath needs to 
> navigate the whole tree.
> I think XSLTproc also has trouble, but I may be mistaken.
> 
> I see two solutions to your problem:
> 
> - use StAX or XPP, which are both supposed to be the ones with least 
> memory impact. I think some support XPath.


This slightly misses the point. Building the tree in the first place is much
more expensive than scanning it (once) using XPath. If you only need to scan
the document once then there are a number of approaches you can use to avoid
building the tree, for example SAX, StAX, and at a higher level, STX.
(Confusingly similar names!)
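
As a rough illustration of the single-pass approach, here is a minimal
sketch using the StAX cursor API. It assumes a JDK that bundles
javax.xml.stream (Java 6 or later; earlier JDKs need a separate StAX
implementation on the classpath). The class name StreamingCount, the
method countElements, and the element name "order" are hypothetical,
chosen only for the example.

```java
import java.io.Reader;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StreamingCount {

    // Counts elements with the given local name in one forward pass.
    // No tree is built: only the current parse event is held in memory.
    static int countElements(Reader in, String localName) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(in);
        int count = 0;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && localName.equals(reader.getLocalName())) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample input; a real run would pass a FileReader
        // or an InputStream for the large document instead.
        String xml = "<orders><order id='1'/><order id='2'/><note/></orders>";
        System.out.println(countElements(new StringReader(xml), "order"));
    }
}
```

This only covers what a simple path such as //order would select; anything
that needs to look backwards or sideways in the document does not fit a
pure forward scan.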

The latest Saxon-SA release, 8.5, has an option for serial processing using
a subset of XPath: see

http://www.saxonica.com/documentation/sourcedocs/serial.html

However, 10 MB is not really big - Saxon can handle that easily in memory on
any modern machine (I've no idea if JDOM can, though). Just allocate enough
heap space. Problems only start at around 100 MB.
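
Heap size is normally set with the standard JVM option -Xmx, for example
java -Xmx512m (the 512m figure is only an illustration, not a
recommendation). A small sketch for checking what limit the JVM actually
got; the class name HeapCheck and the method maxHeapMegabytes are
hypothetical:

```java
public class HeapCheck {

    // Returns the maximum heap the JVM will attempt to use, in megabytes.
    // Runtime.maxMemory() reports Long.MAX_VALUE when there is no limit.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

If the reported figure is well below what the in-memory tree needs, raising
-Xmx is the first thing to try.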

Michael Kay
http://www.saxonica.com/



