[jdom-interest] Toward beta 9

Jason Hunter jhunter at acm.org
Thu Apr 10 13:05:26 PDT 2003


Yep, this is something I tried in the past.  One thing I'll point out is
that XML files don't tend to be "relatively random"; they often contain
mostly or exclusively ASCII.  The verifier checks, as you'll note, are
optimized for low-numbered characters (because they start looking at the
low blocks first).  So a more interesting comparison would be verifier
performance on large ASCII files rather than on JAR binary data, which
doesn't look anything like XML.  When I tried the approach against these
types of files, I actually saw a slowdown.  But I'd be happy for you to
see what's happening today.  My tests were a long time ago.
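
For readers following along, the bitmask approach described in Rolf's
quoted message below (one bit per 16-bit char, 8K per test) might look
roughly like this.  It is only a sketch: the class name LetterMaskSketch
and the short range list are assumptions for illustration, and the real
XML 1.0 Letter production has many more ranges than the five shown.

```java
public class LetterMaskSketch {
    // One bit per UTF-16 code unit: 65536 bits = 1024 longs = 8 KB.
    private static final long[] MASK = new long[0x10000 >>> 6];

    // Illustrative subset only (the first few BaseChar ranges of XML 1.0).
    private static final char[][] RANGES = {
        {'A', 'Z'}, {'a', 'z'},
        {'\u00C0', '\u00D6'}, {'\u00D8', '\u00F6'}, {'\u00F8', '\u00FF'}
    };

    static {
        // Pre-processing step: set the bit for every char in every range.
        for (char[] r : RANGES) {
            for (int c = r[0]; c <= r[1]; c++) {
                MASK[c >>> 6] |= 1L << (c & 63);
            }
        }
    }

    // The check itself is one array read, one shift, and one AND,
    // regardless of how high the character's value is.
    public static boolean isLetter(char c) {
        return (MASK[c >>> 6] & (1L << (c & 63))) != 0;
    }

    public static void main(String[] args) {
        System.out.println("'a' is letter: " + isLetter('a'));
        System.out.println("'1' is letter: " + isLetter('1'));
    }
}
```

The lookup cost is the same for every character, so whether it beats
range checks that exit early on ASCII is exactly what a test on large
ASCII files would show.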

One thing I was thinking of trying back then was using a byte for each
character, with each bit within the byte recording whether the char is a
digit, extender, legal character, etc.  I'd also toyed with the idea of
compiling that data directly into the class so it could be loaded w/o
calculation overhead.
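
A rough sketch of the byte-per-character idea, to make it concrete.  The
class name CharFlagsSketch, the flag names, and the mark() helper are all
hypothetical, and the ranges are a small illustrative subset of the XML
1.0 productions; surrogate pairs are ignored here for simplicity.

```java
public class CharFlagsSketch {
    // One bit per test, packed into a single byte per character.
    public static final byte LEGAL = 1;
    public static final byte LETTER = 2;
    public static final byte DIGIT = 4;
    public static final byte EXTENDER = 8;

    // One byte per UTF-16 code unit: 64 KB for all tests together.
    private static final byte[] FLAGS = new byte[0x10000];

    static {
        // Legal XML 1.0 characters within the 16-bit range.
        mark(0x0009, 0x000A, LEGAL);
        mark(0x000D, 0x000D, LEGAL);
        mark(0x0020, 0xD7FF, LEGAL);
        mark(0xE000, 0xFFFD, LEGAL);
        // Partial lists only; the real productions have many more ranges.
        mark('A', 'Z', LETTER);
        mark('a', 'z', LETTER);
        mark('0', '9', DIGIT);
        mark(0x00B7, 0x00B7, EXTENDER);
    }

    // Sets the given flag bit on every character in [from, to].
    private static void mark(int from, int to, byte flag) {
        for (int c = from; c <= to; c++) {
            FLAGS[c] |= flag;
        }
    }

    // A single array read answers any of the tests for a character.
    public static boolean has(char c, byte flag) {
        return (FLAGS[c] & flag) != 0;
    }

    public static void main(String[] args) {
        System.out.println("'7' is digit: " + has('7', DIGIT));
        System.out.println("'7' is letter: " + has('7', LETTER));
    }
}
```

Here the table is still computed in a static initializer; compiling the
data directly into the class would replace that step with a precomputed
literal.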

Rolf, do you want to tinker with these ideas?  Regardless of whether we
establish a way to turn off some of the verifier checks, having the
verifier run faster when it's on is always better, and checking XML text
data is the bulk of the slowness, ISTR.

Also, you didn't attach your code.  :-)

-jh-

> Rolf Lear wrote:
> 
> How serious are people about performance in Verifier?
> 
> Using a relatively random input source (the characters in various
> Jars), I can get a 500% - 1000% performance improvement in Verifier.
> 
> This is relatively simple, and "just as logical" as the existing
> verifier.
> 
> Have a look at the attached code, it is a "new" Verifier, with a main
> method which has a relatively clunky, but effective performance test
> comparison between the existing checks, and the proposed checks.
> 
> On my Linux box I am getting performance improvements from 5.6
> ms/10000 chars to 0.7 ms/10000 chars. I know that the numbers are
> rough, but people with profilers may be able to substantiate them
> better.
> 
> The basic principle is to build a bitmask representing all the valid
> letters/combinations. Each bitmask has 0xffff+1 bits, i.e. is 8K
> (relatively small), and there is one mask for each "test". I have done
> only isXMLLetter and isXMLCombiner. The pre-processing overhead is
> relatively small (on my box I measure 23ms).
> 
> Have a look-see, and tell me if I am barking up the wrong tree. I
> haven't neatened up the code too much, but the principle seems good.
> 
> I have been running:
> 
> ant package
> java -cp build/jdom.jar org.jdom.Verifier 5 lib/*.jar
> 
> and getting results:
> 
> Building lettermask
> Done in 22ms.
> Building combinationmask
> Done in 0ms.
> OLD Iteration lib/ant.jar count 0 took 6.93ms/10000 chars, counted
> 176182 trues in 732481 characters .
> NEW Iteration lib/ant.jar count 0 took 0.76ms/10000 chars, counted
> 176182 trues in 732481 characters .
> OLD Iteration lib/ant.jar count 1 took 5.61ms/10000 chars, counted
> 176182 trues in 732481 characters .
> NEW Iteration lib/ant.jar count 1 took 0.76ms/10000 chars, counted
> 176182 trues in 732481 characters .
> OLD Iteration lib/ant.jar count 2 took 5.66ms/10000 chars, counted
> 176182 trues in 732481 characters .
> NEW Iteration lib/ant.jar count 2 took 0.76ms/10000 chars, counted
> 176182 trues in 732481 characters .
> OLD Iteration lib/ant.jar count 3 took 5.69ms/10000 chars, counted
> 176182 trues in 732481 characters .
> NEW Iteration lib/ant.jar count 3 took 0.76ms/10000 chars, counted
> 176182 trues in 732481 characters .
> OLD Iteration lib/ant.jar count 4 took 5.61ms/10000 chars, counted
> 176182 trues in 732481 characters .
> NEW Iteration lib/ant.jar count 4 took 0.76ms/10000 chars, counted
> 176182 trues in 732481 characters .
> OLD Iteration lib/jaxen-core.jar count 0 took 5.34ms/10000 chars,
> counted 41039 trues in 160965 characters .
> NEW Iteration lib/jaxen-core.jar count 0 took 0.86ms/10000 chars,
> counted 41039 trues in 160965 characters .
> OLD Iteration lib/jaxen-core.jar count 1 took 5.34ms/10000 chars,
> counted 41039 trues in 160965 characters .
> NEW Iteration lib/jaxen-core.jar count 1 took 0.8ms/10000 chars,
> counted 41039 trues in 160965 characters .
> OLD Iteration lib/jaxen-core.jar count 2 took 5.34ms/10000 chars,
> counted 41039 trues in 160965 characters .
> NEW Iteration lib/jaxen-core.jar count 2 took 0.8ms/10000 chars,
> counted 41039 trues in 160965 characters .
> OLD Iteration lib/jaxen-core.jar count 3 took 5.34ms/10000 chars,
> counted 41039 trues in 160965 characters .
> NEW Iteration lib/jaxen-core.jar count 3 took 0.8ms/10000 chars,
> counted 41039 trues in 160965 characters .
> OLD Iteration lib/jaxen-core.jar count 4 took 5.34ms/10000 chars,
> counted 41039 trues in 160965 characters .
> NEW Iteration lib/jaxen-core.jar count 4 took 0.8ms/10000 chars,
> counted 41039 trues in 160965 characters .
> ........
> 
> Rolf
> 
> -----Original Message-----
> From: Elliotte Rusty Harold [mailto:elharo at metalab.unc.edu]
> Sent: Thursday, April 10, 2003 7:54 AM
> To: jdom-interest at jdom.org
> Subject: Re: [jdom-interest] Toward beta 9
> 
> At 10:55 PM -0700 4/9/03, Philip Nelson wrote:
> >Has anybody tried this approach?
> >
> >create a package protected or inner subclass of DefaultJDOMFactory in
> >SAXBuilder.  Then in the factory, for example...
> >
> >     private class NoCheckText extends Text
> >     {
> >        public void noCheck(String text) {
> >           value = text;
> >        }
> >     }
> >     public Text text(String text) {
> >         NoCheckText t = new NoCheckText();
> >         t.noCheck(text);
> >         return (Text) t;
> >     }
> 
> That looks like it might actually work without causing too many
> problems or further complicating the API, though it does depend on
> those protected, do-nothing, no-args constructors that I wish we
> didn't have.
> --
> 
> +-----------------------+------------------------+-------------------+
> | Elliotte Rusty Harold | elharo at metalab.unc.edu | Writer/Programmer |
> +-----------------------+------------------------+-------------------+
> |           Processing XML with Java (Addison-Wesley, 2002)          |
> |              http://www.cafeconleche.org/books/xmljava             |
> | http://www.amazon.com/exec/obidos/ISBN%3D0201771861/cafeaulaitA  |
> +----------------------------------+---------------------------------+
> |  Read Cafe au Lait for Java News:  http://www.cafeaulait.org/      |
> |  Read Cafe con Leche for XML News: http://www.cafeconleche.org/    |
> +----------------------------------+---------------------------------+