[jdom-interest] Thread questions regarding JDOM SAXBuiler?
David Wall
d.wall at computer.org
Mon Aug 30 20:04:57 PDT 2004
Per,
Thanks for your input. Can you share the numbers you got?
Can anybody explain that behavior? It sounds suspect. Of course, the cost
of creating a SAXBuilder should shrink relative to the parse time as the
XML file gets bigger, but the absolute cost of construction shouldn't change
much unless there's a memory leak in the program. For example, are the
Documents returned from build() being released? Or is it just the garbage
collector entering the picture? I know that modern GCs do well with lots
of small, short-lived objects because that's the most typical scenario
(especially Strings). But it seems odd that the cost of constructing an
object would change just because bigger XML files are passed to build().
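
One way to check this would be to time construction and parsing separately,
so that neither parse cost nor a GC pause gets charged to the constructor.
A minimal sketch, assuming the JDK's own javax.xml.parsers.SAXParser as a
stand-in for SAXBuilder (so it runs without JDOM on the classpath); the
class and method names are made up for illustration:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical harness: SAXParser stands in for JDOM's SAXBuilder.
public class ConstructionVsParse {

    /** Builds a synthetic document with the given number of day elements. */
    static String makeXml(int days) {
        StringBuilder sb = new StringBuilder("<root>");
        for (int i = 0; i < days; i++) {
            sb.append("<day><date>").append(i).append("</date></day>");
        }
        return sb.append("</root>").toString();
    }

    /** Returns {constructionNanos, parseNanos}, each summed over all runs. */
    static long[] measure(String xml, int runs) throws Exception {
        long construct = 0;
        long parse = 0;
        for (int i = 0; i < runs; i++) {
            long t0 = System.nanoTime();
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            long t1 = System.nanoTime();
            parser.parse(new InputSource(new StringReader(xml)),
                    new DefaultHandler());
            long t2 = System.nanoTime();
            construct += t1 - t0; // allocation only
            parse += t2 - t1;     // parsing only
        }
        return new long[] { construct, parse };
    }

    public static void main(String[] args) throws Exception {
        for (int days : new int[] { 100, 1000, 10000 }) {
            String xml = makeXml(days);
            long[] t = measure(xml, 20);
            System.out.println("size=" + xml.length()
                    + "\tconstruct(ns)=" + t[0] + "\tparse(ns)=" + t[1]);
        }
    }
}
```

If the construction column grows with document size, something other than
the constructor itself (GC, most likely) is being measured.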
David
----- Original Message -----
From: "Per Norrman" <per.norrman at austers.se>
To: "David Wall" <d.wall at computer.org>
Cc: <jdom-interest at jdom.org>
Sent: Monday, August 30, 2004 3:58 PM
Subject: Re: [jdom-interest] Thread questions regarding JDOM SAXBuiler?
> Hi,
>
> I had a test program lying around that was fairly easy to
> adapt to make an unscientific measurement of the "cost" of
> allocating new SAXBuilders/XMLReaders vs reusing them. This program
> measures total parse time for five threads parsing 20 identical XML
> sources, i.e. 4 each. Each test run is set up with a different "file"
> size, from ~20 KB up to over 0.5 MB.
>
> I was surprised! As expected, the cost (in speed) of allocating a new
> SAXBuilder/XMLReader shows up when parsing small files, but, rather
> quickly, something else kicks in and reverses that situation, so that
> for sufficiently large "files", you lose speed when reusing the parser.
> Telling exactly, or even approximately, where the lines cross each
> other is most certainly impossible in the general case.
>
> So, go back and read up on the Strategy pattern and make it swappable
> at run time for optimal tuning ;-)
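
A rough sketch of what a run-time swappable strategy could look like. It is
only an illustration of the pattern, not Per's code: it uses the JDK's
javax.xml.parsers.SAXParser in place of SAXBuilder so that it runs without
JDOM, and ParserStrategy, FreshParser and PerThreadParser are hypothetical
names. It also assumes a parser may be reused for sequential parses on the
same thread, which holds for the JDK's bundled implementation.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ParserStrategyDemo {

    /** Hypothetical strategy: how to obtain a parser for one parse. */
    interface ParserStrategy {
        SAXParser parser() throws Exception;
    }

    /** Allocates a fresh parser for every parse. */
    static final class FreshParser implements ParserStrategy {
        public SAXParser parser() throws Exception {
            return SAXParserFactory.newInstance().newSAXParser();
        }
    }

    /** Keeps one parser per thread and hands the same one back each time. */
    static final class PerThreadParser implements ParserStrategy {
        private final ThreadLocal<SAXParser> cache = new ThreadLocal<SAXParser>();

        public SAXParser parser() throws Exception {
            SAXParser p = cache.get();
            if (p == null) {
                p = SAXParserFactory.newInstance().newSAXParser();
                cache.set(p);
            }
            return p;
        }
    }

    /** Active strategy; volatile so a swap is visible to all threads. */
    private volatile ParserStrategy strategy;

    ParserStrategyDemo(ParserStrategy initial) {
        strategy = initial;
    }

    void setStrategy(ParserStrategy next) {
        strategy = next;
    }

    /** Parses the XML and returns the number of elements seen. */
    int countElements(String xml) throws Exception {
        final int[] count = { 0 };
        strategy.parser().parse(new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                            String qName, Attributes atts) {
                        count[0]++;
                    }
                });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><day/><day/></root>";
        ParserStrategyDemo demo = new ParserStrategyDemo(new FreshParser());
        System.out.println(demo.countElements(xml));
        demo.setStrategy(new PerThreadParser()); // swapped at run time
        System.out.println(demo.countElements(xml));
    }
}
```

Which strategy to install could then be decided from measurements on the
documents actually being parsed.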
>
> /pmn
>
>
> David Wall wrote:
> > The docs clearly say that JDOM is not threaded and that thread safety
> > comes from our code. That's fine.
> >
> > But can anybody tell me if the cost of instantiating a SAXBuilder
> > object is considered "expensive" or not? In other words, should I have
> > a pool of SAXBuilder objects if I plan on doing a lot of XML parsing
> > (which is already expensive, though most of our docs are quite small --
> > such as configuration data) or are these objects lightweight enough to
> > just instantiate, use and throw away?
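
For the pooling question, a small fixed-size pool could look roughly like
the sketch below. It pools the JDK's javax.xml.parsers.SAXParser rather than
SAXBuilder, purely so the example runs without JDOM; ParserPool and its
methods are hypothetical names, and returning a parser to the pool after a
failed parse assumes the parser stays usable, which is an assumption about
the JDK's bundled implementation.

```java
import java.io.StringReader;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical fixed-size pool; each parser is used by one thread at a time.
public class ParserPool {
    private final BlockingQueue<SAXParser> idle;

    public ParserPool(int size) throws Exception {
        idle = new ArrayBlockingQueue<SAXParser>(size);
        SAXParserFactory factory = SAXParserFactory.newInstance();
        for (int i = 0; i < size; i++) {
            idle.add(factory.newSAXParser());
        }
    }

    /** Number of parsers currently sitting idle in the pool. */
    public int available() {
        return idle.size();
    }

    /** Borrows a parser, parses, and returns the parser even on failure. */
    public void parse(String xml, DefaultHandler handler) throws Exception {
        SAXParser parser = idle.take(); // blocks while all parsers are busy
        try {
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } finally {
            idle.add(parser); // never exceeds capacity: we only return what we took
        }
    }

    public static void main(String[] args) throws Exception {
        ParserPool pool = new ParserPool(2);
        pool.parse("<config><item/></config>", new DefaultHandler());
        System.out.println("idle parsers: " + pool.available());
    }
}
```

Whether this beats plain allocation is exactly what would need measuring;
for small configuration documents the difference may not matter.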
> >
> > Thanks,
> > David
> >
> > ----- Original Message -----
> > From: "David Wall" <d.wall at computer.org>
> > To: <jdom-interest at jdom.org>
> > Sent: Sunday, August 29, 2004 3:32 PM
> > Subject: [jdom-interest] Thread questions regarding JDOM SAXBuiler?
> >
> >
> >
> >>If I expect to do a lot of parsing with JDOM, it seems that I might
> >>want to create a pool of SAXBuilder objects to avoid the overhead of
> >>loading the parser. Is that true, or is the overhead of creating one
> >>quite small?
> >>
> >>Is a SAXBuilder thread-safe, or should I only be calling the
> >>SAXBuilder.build() method in a single thread at a time for a given
> >>SAXBuilder?
> >>
> >>And I presume it's okay to reuse a SAXBuilder object by having many
> >>different threads build() documents over time, without one XML parse
> >>affecting another (assuming they have the same options, like ignore
> >>whitespace, validation, etc.).
> >>
> >>Is that the case? If I do a lot of parsing (at least once per HTTP
> >>request), should I use a pool of SAXBuilder objects for this purpose,
> >>or is the overhead small enough that I can just create a new
> >>SAXBuilder whenever I want one?
> >>
> >>Thanks,
> >>David
> >>
> >>_______________________________________________
> >>To control your jdom-interest membership:
> >>http://www.jdom.org/mailman/options/jdom-interest/youraddr@yourhost.com
> >
> >
> >
>
>
--------------------------------------------------------------------------------
> package large;
>
> import java.io.StringReader;
> import java.text.DateFormat;
> import java.text.DateFormatSymbols;
> import java.util.Calendar;
> import java.util.Date;
>
> import org.jdom.Comment;
> import org.jdom.Document;
> import org.jdom.Element;
> import org.jdom.input.SAXBuilder;
> import org.jdom.output.XMLOutputter;
> import org.xml.sax.InputSource;
>
> import EDU.oswego.cs.dl.util.concurrent.LinkedQueue;
>
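> /**
>  * Times parsing of identical XML sources by five threads, either reusing
>  * the parser or allocating a new one per parse.
>  *
>  * @author Per Norrman
>  */
> public class ThreadedReader {
>     // true: reuse one SAXBuilder/XMLReader per thread; false: allocate anew
>     private boolean _reuse = true;
>
>     // the generated XML document that every thread parses
>     private String _xml = "";
>
>     // work queue shared by all reader threads
>     LinkedQueue _queue = new LinkedQueue();
>
>     // total parse time in milliseconds, summed over all threads
>     private long _time = 0;
>
>     public ThreadedReader(boolean reuse) {
>         _reuse = reuse;
>     }
>
>     public synchronized void addTime(long elapsed) {
>         _time += elapsed;
>     }
>
>     public synchronized long getTime() {
>         return _time;
>     }
>
>     public synchronized void reset() {
>         _time = 0;
>     }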
>
>     public void process(String start, String end, int count) throws Exception {
>         reset();
>         generate(start, end);
>         // fill work queue
>         for (int i = 1; i <= count; ++i) {
>             _queue.put(new InputSource(new StringReader(_xml)));
>         }
>
>         // create threads
>         Thread[] thread = new Thread[5];
>         for (int i = 0; i < 5; ++i) {
>             thread[i] = new ReaderThread(_reuse);
>             thread[i].start();
>         }
>
>         // make them stop: one non-InputSource sentinel per thread
>         for (int i = 0; i < 5; ++i) {
>             _queue.put(new Object());
>         }
>
>         for (int i = 0; i < 5; ++i) {
>             thread[i].join();
>         }
>
>         // report
>         System.out.println("Reuse=" + _reuse + "\tsize=" + _xml.length()
>                 + "\ttime: " + getTime());
>     }
>
>     public void generate(String startDate, String endDate) throws Exception {
>         // main() passes yyyy-MM-dd dates; DateFormat.SHORT only parses
>         // those in locales whose short format happens to match, so the
>         // pattern is spelled out here.
>         DateFormat df = new java.text.SimpleDateFormat("yyyy-MM-dd");
>         DateFormatSymbols dfs = new DateFormatSymbols();
>         String[] weekDays = dfs.getWeekdays();
>
>         Element root = new Element("root");
>         Document doc = new Document(root);
>         doc.getContent().add(0,
>                 new Comment(" Generated: " + df.format(new Date()) + " "));
>
>         Calendar cal = Calendar.getInstance();
>         Date start = df.parse(startDate);
>         Date end = df.parse(endDate);
>
>         // one <day> element, holding <date> and <dayname>, per calendar day
>         cal.setTime(start);
>         while (cal.getTime().before(end)) {
>             Element date = new Element("day");
>             date.addContent(new Element("date").setText(df
>                     .format(cal.getTime())));
>             root.addContent(date);
>             String weekDay = weekDays[cal.get(Calendar.DAY_OF_WEEK)];
>             Element day = new Element("dayname").setText(weekDay);
>             date.addContent(day);
>             cal.add(Calendar.DATE, 1);
>         }
>
>         XMLOutputter out = new XMLOutputter();
>
>         _xml = out.outputString(doc);
>     }
>
>     public static void test(String start, String end) throws Exception {
>         new ThreadedReader(true).process(start, end, 20);
>         new ThreadedReader(false).process(start, end, 20);
>     }
>
>     public static void main(String[] args) throws Exception {
>         // a wider date range produces a larger document
>         test("2000-01-01", "2001-01-01");
>         test("2000-01-01", "2001-01-01");
>         test("2000-01-01", "2001-12-31");
>         test("1990-01-01", "2004-12-31");
>         test("1970-01-01", "2004-12-31");
>     }
>
>     private class ReaderThread extends Thread {
>         private boolean _reuse = true;
>
>         private SAXBuilder _builder = new SAXBuilder();
>
>         public ReaderThread(boolean reuse) {
>             _reuse = reuse;
>             _builder.setReuseParser(reuse);
>         }
>
>         private void parse(InputSource source) {
>             long elapsed = 0;
>             try {
>                 elapsed = System.currentTimeMillis();
>                 if (_reuse) {
>                     _builder.build(source);
>                 } else {
>                     // allocate a new builder and actually parse with it
>                     SAXBuilder builder = new SAXBuilder();
>                     builder.build(source);
>                 }
>                 elapsed = System.currentTimeMillis() - elapsed;
>                 addTime(elapsed);
>             } catch (Exception e) {
>                 System.out.println(getName() + ": " + e.getMessage());
>             }
>         }
>
>         public void run() {
>             try {
>                 while (true) {
>                     Object thing = _queue.take();
>                     if (thing instanceof InputSource) {
>                         parse((InputSource) thing);
>                     } else {
>                         // anything else is the stop sentinel
>                         break;
>                     }
>                 }
>             } catch (InterruptedException e) {
>                 System.out.println(getName() + ": " + e.getMessage());
>             }
>         }
>     }
>
> }
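
For anyone without the EDU.oswego concurrent library on hand, the
queue-and-sentinel part of the program above can be sketched with the JDK's
java.util.concurrent classes instead. This only illustrates the shutdown
technique and is not Per's code: PoisonPillQueue, STOP and drain are made-up
names, and the parsing is replaced by a simple counter.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class PoisonPillQueue {
    /** Sentinel telling a worker to stop; compared by identity. */
    private static final Object STOP = new Object();

    /** Runs the workers over the items and returns how many items were handled. */
    static int drain(Object[] items, int workers) throws InterruptedException {
        final BlockingQueue<Object> queue = new LinkedBlockingQueue<Object>();
        final AtomicInteger handled = new AtomicInteger();

        for (Object item : items) {
            queue.put(item);
        }
        // One sentinel per worker, queued after all the real work.
        for (int i = 0; i < workers; i++) {
            queue.put(STOP);
        }

        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(new Runnable() {
                public void run() {
                    try {
                        while (queue.take() != STOP) {
                            // the real work (parsing) would go here
                            handled.incrementAndGet();
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return handled.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(drain(new Object[] { "a", "b", "c", "d" }, 2));
    }
}
```

Because the queue is FIFO and every worker consumes exactly one sentinel,
all real items are handled before the last worker exits.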