[jdom-interest] Memory usage when processing a large file

Dennis Sosnoski dms at sosnoski.com
Mon Oct 8 14:11:53 PDT 2001


That's a really large size for the serialized form relative to the original document.
You can see some comparison figures in an article I did for developerWorks, at
http://www-106.ibm.com/developerworks/xml/library/x-injava/index.html. None of my
documents even came close to a 10x expansion, but I didn't use any with lots of
really short text content. That might be what's making the difference.
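If you want to see where the memory is actually going, a crude way to estimate the in-heap size of a parsed tree is to snapshot heap usage before and after the build. Here's a rough sketch using the W3C DOM that ships with JAXP, so it runs with nothing but the JDK; the same measurement applies to a JDOM Document, and the exact numbers will vary by VM and GC timing, so treat it as an estimate only:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class HeapEstimate {
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        // Encourage a full collection so the reading is reasonably stable.
        for (int i = 0; i < 3; i++) {
            System.gc();
            try { Thread.sleep(50); } catch (InterruptedException e) {}
        }
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) throws Exception {
        // Build a flat document like the "colours" feed: many tiny rows.
        StringBuilder xml = new StringBuilder("<colours>");
        for (int i = 0; i < 10000; i++) {
            xml.append("<row><aID>").append(i)
               .append("</aID><blue>b</blue><green>g</green></row>");
        }
        xml.append("</colours>");
        int rawBytes = xml.length();

        long before = usedHeap();
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(
                        xml.toString().getBytes(StandardCharsets.UTF_8)));
        long after = usedHeap();

        System.out.println("raw bytes:  " + rawBytes);
        System.out.println("heap delta: " + (after - before));
        // Keep a live reference so the tree isn't collected before we measure.
        System.out.println("rows: "
                + doc.getDocumentElement().getChildNodes().getLength());
    }
}
```

With lots of short text content, the per-node object overhead (element, text node, child list) dominates the raw character data, which is exactly the pattern that produces a large expansion factor.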

I'll just echo Bob McWhirter's earlier post and suggest you look at dom4j's
approach instead (http://dom4j.org). dom4j is designed for handling documents too
large to fit in memory and has hooks for piece-at-a-time processing.
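Just to illustrate what I mean by piece-at-a-time processing, here's a rough sketch using plain SAX from the JDK: each "row" is handled as it arrives and then discarded, so heap use stays proportional to one row rather than the whole file. The element names are taken from the DTD quoted below, and only a couple of fields are captured for brevity; if I remember right, dom4j's SAXReader.addHandler/ElementHandler hooks give you the same pattern with full Element objects per row.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class RowStreamer {
    // Rows processed so far; in real use you'd do the database insert per row.
    static int rowsSeen = 0;

    static class RowHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        private String aID, blue;   // fields of the *current* row only

        @Override
        public void startElement(String uri, String local, String qName,
                                 Attributes atts) {
            text.setLength(0);
            if (qName.equals("row")) { aID = blue = null; }
        }

        @Override
        public void characters(char[] ch, int start, int len) {
            text.append(ch, start, len);
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            if (qName.equals("aID"))  aID  = text.toString();
            if (qName.equals("blue")) blue = text.toString();
            if (qName.equals("row")) {
                // Process and discard: only one row is ever held in memory.
                rowsSeen++;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<colours>"
                + "<row><aID>1</aID><blue>azure</blue></row>"
                + "<row><aID>2</aID><blue>navy</blue></row>"
                + "</colours>";
        SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                new RowHandler());
        System.out.println("rows processed: " + rowsSeen);
    }
}
```

A sketch, not tested production code, but it shows why a flat feed like yours is a good fit for streaming: there's no cross-row state, so you never need the whole tree at once.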

  - Dennis

Benjamin Kopic wrote:

> Hi Dennis
>
> Thanks for getting back to me. I have found that the Document object
> returned by SAXBuilder is the "bottleneck": for an XML file of 750K, the
> Document object, when serialised and converted into a byte array, is
> around 7MB (i.e. ~7,000,000 bytes). Also, watching the garbage collector
> (using -verbose:gc), I saw that it cleans up roughly 10 times the size of
> the file when invoked.
>
> The file is fairly flat:
> <!ELEMENT colours (row*)>
>   <!ELEMENT row (aID,blue,green,yellow,red,bID?,cID?,grey?)>
>     <!ELEMENT aID    (#PCDATA)>
>     <!ELEMENT blue   (#PCDATA)>
>     <!ELEMENT green  (#PCDATA)>
>     <!ELEMENT yellow (#PCDATA)>
>     <!ELEMENT red    (#PCDATA)>
>     <!ELEMENT bID    (#PCDATA)>
>     <!ELEMENT cID    (#PCDATA)>
>     <!ELEMENT grey   (#PCDATA)>
>
> It has a lot of "row" elements, but none of them is very big; their text
> (#PCDATA) content is small (a couple of hundred characters at most).
>
> Best regards
>
> Benjamin
>
> > -----Original Message-----
> > From: Dennis Sosnoski [mailto:dms at sosnoski.com]
> > Sent: 08 October 2001 03:35
> > To: Benjamin Kopic
> > Cc: jdom-interest at jdom.org
> > Subject: Re: [jdom-interest] Memory usage when processing a large file
> >
> >
> >
> > Hi Benjamin,
> >
> > I've run some tests using documents up to just over 1MB (nt.xml,
> > the New Testament
> > marked up with element wrappers for the text). The JDOM document
> > took a little over
> > 3MB of the Java heap, though I didn't look at total usage by the
> > JVM (as seen by the
> > system).
> >
> > Have you looked at how your memory usage scales for smaller
> > documents? You might also
> > try pausing the program at various points and see when your
> > memory usage goes
> > offscale. I'd personally suspect the database interface code more
> > than JDOM, though,
> > unless your documents are very unusual (lots of entities that
> > expand to huge amounts
> > of text, for instance).
> >
> >   - Dennis
> >
> > Benjamin Kopic wrote:
> >
> > > Hi
> > >
> > > We have an application that processes a data feed and loads it into a
> > > database. It builds a JDom Document using SAXBuilder and
> > Xerces, and then it
> > > uses Jaxen XPath to retrieve data needed.
> > >
> > > The problem is that when we parse a 7MB feed the memory usage
> > by Java jumps
> > > to 110MB. Has anyone else used to process relatively large data
> > feeds with
> > > JDom?
> > >
> > > Best regards
> > >
> > > Benjamin Kopic
> > > E: ben at kopic.org
> > > W: www.kopic.org
> > > T: +44 (0)20 7794 3090
> > > M: +44 (0)78 0154 7643
> > >
> > > _______________________________________________
> > > To control your jdom-interest membership:
> > > http://lists.denveronline.net/mailman/options/jdom-interest/youraddr@yourhost.com
>