<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
Michael has it right: being penny-wise and pound-foolish won't give you
a well-performing system.<br>
<br>
If your XML is really that simple, an XML parser may not even be the
right tool, though SAX would surely do well. Much depends on how
much data transformation is needed. Your firstname, lastname and SSN
values probably contain no escaped characters, so simple string
parsing may do much better: search for "<lastname>"
and then pull the data until you find "</".<br>
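<br>
As a rough illustration of that string search, here is a minimal Java
sketch. The class and method names (TagText, between) are made up for
this example, and it assumes the fixed layout from this thread: no
attributes, no escaped characters, and no nested elements inside the
field.<br>
<br>
```java
// Sketch only: assumes the fixed, entity-free layout shown in this thread.
final class TagText {
    /** Returns the text between <tag> and the next "</", or null if either is missing. */
    static String between(String xml, String tag) {
        String open = "<" + tag + ">";
        int start = xml.indexOf(open);
        if (start < 0) {
            return null; // opening tag not present
        }
        start += open.length();
        int end = xml.indexOf("</", start);
        return end < 0 ? null : xml.substring(start, end);
    }

    public static void main(String[] args) {
        String xml = "<response><employee><firstname>John</firstname>"
                + "<lastname>Smith</lastname><ssn>111-11-1111</ssn></employee></response>";
        System.out.println(between(xml, "lastname"));
        System.out.println(between(xml, "ssn"));
    }
}
```
If the data ever starts to contain escapes such as an ampersand entity,
this shortcut returns the raw escaped text, and a real parser becomes
the safer choice.<br>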
<br>
XML parsers are very general, which is what makes them so useful. But if
your data is this simple, you may find that doing the string parsing
yourself is fastest.<br>
<br>
David<br>
<br>
<br>
<blockquote cite="mid:%3C48E64528.70806@computer.org%3E" type="cite"><br>
Michael Kay wrote:
<blockquote cite="mid:E27BE570CA684C3E99540A47824DCE3D@Sealion"
type="cite">
<div dir="ltr" align="left"><span class="684192808-03102008"><font
color="#0000ff" face="Arial" size="2">The point of my message is that
if it's taking 1 second to get the file over the network and 1 msec to
process the file, then improving the processing speed to 0.9 msec is a
waste of effort. It's like tuning your car's engine and leaving the
handbrake on. You need to understand the overall system performance
(and the extent to which it falls short of the performance
requirements) before you decide which parts of it to tune.</font></span></div>
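<div>To put numbers on the example above, a small Java sketch of the
arithmetic. The class and method names are illustrative only, and the
figures are the ones from the message (1 second of network time, parsing
improved from 1 msec to 0.9 msec). The saving works out to roughly 0.01%
of the end-to-end time.</div>

```java
// Sketch only: computes how much of the end-to-end time a parser speed-up saves.
final class TuningPayoff {
    /** Fraction of total time saved when only the parse stage changes; all times in ms. */
    static double fractionSaved(double networkMs, double parseBeforeMs, double parseAfterMs) {
        double before = networkMs + parseBeforeMs;
        double after = networkMs + parseAfterMs;
        return (before - after) / before;
    }

    public static void main(String[] args) {
        // 1000 ms network, parse time improved from 1.0 ms to 0.9 ms
        double saved = fractionSaved(1000.0, 1.0, 0.9);
        System.out.printf("%.4f%% of end-to-end time saved%n", 100 * saved);
    }
}
```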
<div dir="ltr" align="left"><span class="684192808-03102008"></span> </div>
<div dir="ltr" align="left"><span class="684192808-03102008"><font
color="#0000ff" face="Arial" size="2">Michael Kay</font></span></div>
<div dir="ltr" align="left"><span class="684192808-03102008"><font
color="#0000ff" face="Arial" size="2"><a moz-do-not-send="true"
href="http://www.saxonica.com/">http://www.saxonica.com/</a></font></span></div>
<br>
<blockquote dir="ltr"
style="border-left: 2px solid rgb(0, 0, 255); padding-left: 5px; margin-left: 5px; margin-right: 0px;">
<div class="OutlookMessageHeader" dir="ltr" align="left"
lang="en-us">
<hr tabindex="-1"> <font face="Tahoma" size="2"><b>From:</b>
Praveen Gattu [<a moz-do-not-send="true" class="moz-txt-link-freetext"
href="mailto:pgattu@gmail.com">mailto:pgattu@gmail.com</a>] <br>
<b>Sent:</b> 03 October 2008 01:32<br>
<b>To:</b> Michael Kay<br>
<b>Cc:</b> Paul Libbrecht; <a moz-do-not-send="true"
class="moz-txt-link-abbreviated" href="mailto:jdom-interest@jdom.org">jdom-interest@jdom.org</a><br>
<b>Subject:</b> Re: [jdom-interest] Re: Reading XML with JDOM<br>
</font><br>
</div>
<div dir="ltr">Michael
<div><br>
</div>
<div>The XML is obtained over a web URL. Our networks run at T1
speed and are performing at their best. Network latency itself is not a
problem, but I acknowledge that retrieving the XML file over the URL
is probably the most time-consuming step in my application.
Network latency aside, I want to ensure that whatever APIs/frameworks I
use for parsing the XML are the fastest available. To measure
the performance of the parser, I am using an XML file located in the
file system, which takes network latency out of the picture. Is this a
valid method to measure the parser's performance?</div>
<div><br>
</div>
<div>-- Praveen<br>
<br>
<div class="gmail_quote">On Thu, Oct 2, 2008 at 3:49 PM, Michael
Kay <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:mike@saxonica.com">mike@saxonica.com</a>></span> wrote:<br>
<blockquote class="gmail_quote"
style="border-left: 1px solid rgb(204, 204, 204); margin: 0px 0px 0px 0.8ex; padding-left: 1ex;">
<div>
<div dir="ltr" align="left"><span><font color="#0000ff"
face="Arial" size="2">You're asking about how to read the data
efficiently with JDOM, but I suspect that if you are looking for
performance then you might be looking in the wrong part of your system.</font></span></div>
<div dir="ltr" align="left"><span></span> </div>
<div dir="ltr" align="left"><span><font color="#0000ff"
face="Arial" size="2">XML parser start-up costs can be very high;
initializing the parser for each document could easily turn out to be
the dominant cost in this application. You can get a lot of saving by
reusing parser instances. I don't know what JDOM's initialization costs
for building a document are, but you need to check them too.</font></span></div>
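<div>One way to reuse parser instances in a multi-threaded program is to
keep one parser per worker thread. The sketch below is an assumption-laden
illustration, not JDOM code: it uses the SAX parser bundled with the JDK
(javax.xml.parsers), and the class name ReusedParser and its
countElements method are invented for the example. A SAXParser must not
be shared between threads at the same time, which is why a ThreadLocal is
used here.</div>

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Sketch only: pays the parser start-up cost once per thread, not once per document.
final class ReusedParser {
    private static final ThreadLocal<SAXParser> PARSER = ThreadLocal.withInitial(() -> {
        try {
            return SAXParserFactory.newInstance().newSAXParser();
        } catch (Exception e) {
            throw new IllegalStateException("cannot create SAX parser", e);
        }
    });

    /** Parses one document with this thread's parser and counts its elements. */
    static int countElements(String xml) {
        int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                count[0]++;
            }
        };
        try {
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            PARSER.get().parse(new ByteArrayInputStream(bytes), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("not parseable as XML", e);
        }
        return count[0];
    }
}
```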
<div><span></span> </div>
<div><span><font color="#0000ff" face="Arial" size="2">At any
rate, building a JDOM tree almost certainly takes longer than
extracting the data from the tree once built.</font></span></div>
<div><span></span> </div>
<div><span><font color="#0000ff" face="Arial" size="2">I find
this statement a bit worrying:</font></span></div>
<div class="Ih2E3d">
<div><span></span> </div>
<div><span>>Anyway, since there is probably not much I can
do
with the network latency, I am trying to keep the Java code as skinny
and efficient as possible.<br>
</span></div>
</div>
<div><span><font color="#0000ff" face="Arial" size="2">That
seems
to be an inversion of the way performance engineering should be done.
If network latency is the dominant cost, then effort spent on tuning
your Java code is a total waste of time.</font></span></div>
<div><span></span> </div>
<div><span><font color="#0000ff" face="Arial" size="2">I would
focus your attention on measuring performance, end to end, before you
start tuning anything.</font></span></div>
<div><span></span> </div>
<div><span><font color="#0000ff" face="Arial" size="2">Michael
Kay</font></span></div>
<div><span><font color="#0000ff" face="Arial" size="2"><a
moz-do-not-send="true" href="http://www.saxonica.com/" target="_blank">http://www.saxonica.com/</a></font></span></div>
<div dir="ltr" align="left"><br>
</div>
<blockquote dir="ltr"
style="border-left: 2px solid rgb(0, 0, 255); padding-left: 5px; margin-left: 5px; margin-right: 0px;">
<div dir="ltr" align="left" lang="en-us">
<hr> <font face="Tahoma" size="2"><b>From:</b> <a
moz-do-not-send="true" href="mailto:jdom-interest-bounces@jdom.org"
target="_blank">jdom-interest-bounces@jdom.org</a> [mailto:<a
moz-do-not-send="true" href="mailto:jdom-interest-bounces@jdom.org"
target="_blank">jdom-interest-bounces@jdom.org</a>] <b>On Behalf Of </b>Praveen
Gattu<br>
<b>Sent:</b> 02 October 2008 23:04<br>
<b>To:</b> Paul Libbrecht<br>
<b>Cc:</b> <a moz-do-not-send="true"
href="mailto:jdom-interest@jdom.org" target="_blank">jdom-interest@jdom.org</a><br>
<b>Subject:</b> Re: [jdom-interest] Re: Reading XML with JDOM<br>
</font><br>
</div>
<div>
<div class="Wj3C7c">
<div dir="ltr">Paul,<br>
<br>
Thanks for the response. My XML is really as simple as the one I
posted. The 8,500 documents are retrieved over an HTTP URL, so add
network latency, which pushes the total past a minute unless my XML
parser is extremely fast. Anyway, since there is probably not much I
can do with the network latency, I am trying to keep the Java code as
skinny and efficient as possible.<br>
<br>
Would you be able to provide sample code for the solution you suggested?<br>
<br>
<div class="gmail_quote">On Thu, Oct 2, 2008 at 2:27 PM, Paul
Libbrecht <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:paul@activemath.org" target="_blank">paul@activemath.org</a>></span>
wrote:<br>
<blockquote class="gmail_quote"
style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Praveen,<br>
<br>
In JDOM you would just parse, then take the root, then the employee, then
extract the lastname and ssn.<br>
The firstname is ignored from your programme's point of view, but not from
the parser's: it still gets parsed.<br>
<br>
Where you can save is by changing the XML technology: if your
document is as simple as the one below, SAX gives better performance
guarantees (you really cannot go faster) but is harder to programme
with.<br>
JDOM can also take too much of your CPU if the
document carries loads of other content that you do not need.<br>
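<br>
For the SAX route mentioned above, a minimal Java sketch using the SAX
parser that ships with the JDK. The class name EmployeeSax and its method
are made up for illustration, and the element names are taken from the
sample document in this thread. It creates a new parser on every call to
keep the example short.<br>
<br>
```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Sketch only: collects the text of <lastname> and <ssn>, skipping everything else.
final class EmployeeSax {
    static Map<String, String> lastNameAndSsn(String xml) {
        Map<String, String> out = new HashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            private boolean wanted;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                wanted = qName.equals("lastname") || qName.equals("ssn");
                text.setLength(0);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // The parser may deliver text in several chunks, so append rather than assign.
                if (wanted) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (wanted) {
                    out.put(qName, text.toString());
                }
                wanted = false;
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("not parseable as XML", e);
        }
        return out;
    }
}
```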
<br>
Where JDOM makes a positive difference is in walking more
elaborate XML documents, which are the norm, and in manipulating them.
To my taste, the expressiveness of the library there is unbeatable.<br>
<br>
However, your requirements sound easy: 8,500 such documents per minute?<br>
JDOM can probably handle ten times that, and multithreading is not really
necessary.<br>
<br>
paul
<div>
<div><br>
<br>
On 02-oct.-08, at 20:29, Praveen Gattu wrote:<br>
<br>
</div>
</div>
<blockquote class="gmail_quote"
style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<div>
<div>I have XML as shown below. There is always exactly one
employee node in the XML. So rather than iterating through the nodes, I
want to read the lastname and ssn directly, while ignoring the
firstname. What is the best way to do this in JDOM? My most important
criterion is speed. We will be processing about 8,500 such XML
documents per minute (multi-threaded, of course) and need something
efficient and fast. I appreciate any help you can offer in this regard.<br>
<br>
<response><br>
<employee><br>
<firstname>John</firstname><br>
<lastname>Smith</lastname><br>
<ssn>111-11-1111</ssn><br>
</employee><br>
</response><br>
<br>
-- <br>
Thanks,<br>
Praveen<br>
</div>
</div>
_______________________________________________<br>
To control your jdom-interest membership:<br>
<a moz-do-not-send="true"
href="http://www.jdom.org/mailman/options/jdom-interest/"
target="_blank">http://www.jdom.org/mailman/options/jdom-interest/</a><a
moz-do-not-send="true" href="mailto:youraddr@yourhost.com"
target="_blank">youraddr@yourhost.com</a><br>
</blockquote>
<br>
</blockquote>
</div>
<br>
<br clear="all">
</div>
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</blockquote>
</blockquote>
</blockquote>
<br>
<pre class="moz-signature" cols="72">--
David A. E. Wall
724 17th Avenue
Kirkland, WA 98033-4206
Tel 425.822.8135 </pre>
</body>
</html>