[jdom-interest] Re: Reading XML with JDOM
Praveen Gattu
pgattu at gmail.com
Thu Oct 2 17:32:10 PDT 2008
Michael,
The XML is retrieved over an HTTP URL. Our network links are T1 speed and
performing at their best. Network latency is not a problem in itself, but I
acknowledge that retrieving the XML over HTTP is probably the most
time-consuming step in my application. Network latency aside, I
want to ensure that whatever APIs/frameworks I use for parsing the XML are
the fastest available. To measure the performance of the parser, I
am using an XML file located on the local file system, which takes network
latency out of the picture. Is this a valid method to measure the parser's performance?
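
A minimal sketch of such a measurement, for illustration only. It assumes the document has already been read into a byte array (so neither disk nor network is timed), uses the JDK's built-in JAXP SAX parser rather than JDOM, and runs a warm-up loop so JIT compilation does not distort the numbers. The class name `ParseTimer` and its parameters are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical timing harness: measures parsing only, not I/O. */
final class ParseTimer {

    /** Returns the average time in nanoseconds to parse the document once. */
    static long averageNanos(byte[] xml, int warmup, int runs) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            DefaultHandler handler = new DefaultHandler(); // ignores every event

            // Warm-up runs are not timed; they give the JIT a chance to compile.
            for (int i = 0; i < warmup; i++) {
                parser.parse(new ByteArrayInputStream(xml), handler);
            }

            long start = System.nanoTime();
            for (int i = 0; i < runs; i++) {
                parser.parse(new ByteArrayInputStream(xml), handler);
            }
            return (System.nanoTime() - start) / runs;
        } catch (Exception e) {
            throw new IllegalStateException("timing run failed", e);
        }
    }

    public static void main(String[] args) {
        byte[] doc = ("<response><employee><firstname>John</firstname>"
                + "<lastname>Smith</lastname><ssn>111-11-1111</ssn>"
                + "</employee></response>").getBytes(StandardCharsets.UTF_8);
        System.out.println("average ns per document: " + averageNanos(doc, 1000, 10000));
    }
}
```

A figure obtained this way only describes the parser in isolation; it says nothing about how large that cost is relative to the HTTP fetch.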
-- Praveen
On Thu, Oct 2, 2008 at 3:49 PM, Michael Kay <mike at saxonica.com> wrote:
> You're asking about how to read the data efficiently with JDOM, but I
> suspect that if you are looking for performance then you might be looking in
> the wrong part of your system.
>
> XML parser start-up costs can be very high; initializing the parser for
> each document could easily turn out to be the dominant cost in this
> application. You can get a lot of saving by reusing parser instances. I
> don't know what JDOM's initialization costs for building a document are, but
> you need to check them too.
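
One possible way to reuse parser instances, sketched with the JDK's JAXP `SAXParser` rather than JDOM (JDOM's own initialization costs are left open above). It assumes one parser per worker thread, because `SAXParser` is not thread-safe, and it assumes the JAXP implementation permits sequential reuse of a parser, as the one bundled with the JDK does. The class `ReusableParsers` and its methods are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: creates a SAX parser once per thread and reuses it. */
final class ReusableParsers {

    // The factory lookup and parser construction happen once per thread,
    // on the first call to PARSER.get() from that thread.
    private static final ThreadLocal<SAXParser> PARSER = ThreadLocal.withInitial(() -> {
        try {
            return SAXParserFactory.newInstance().newSAXParser();
        } catch (Exception e) {
            throw new IllegalStateException("cannot create SAX parser", e);
        }
    });

    /** Returns the calling thread's parser instance. */
    static SAXParser current() {
        return PARSER.get();
    }

    /** Parses one document with the calling thread's parser. */
    static void parse(String xml, DefaultHandler handler) {
        try {
            current().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```

Each worker thread in a pool would call `ReusableParsers.parse(...)` for every document it handles, paying the start-up cost only on its first call.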
>
> At any rate, building a JDOM tree almost certainly takes longer than
> extracting the data from the tree once built.
>
> I find this statement a bit worrying:
>
> > Anyway, since there is probably not much I can do with the network
> > latency, I am trying to keep the Java code as skinny and efficient as
> > possible.
>
> That seems to be an inversion of the way performance engineering should be
> done. If network latency is the dominant cost, then effort spent on tuning
> your Java code is a total waste of time.
>
> I would focus your attention on measuring performance, end to end, before
> you start tuning anything.
>
> Michael Kay
> http://www.saxonica.com/
>
> ------------------------------
> *From:* jdom-interest-bounces at jdom.org [mailto:
> jdom-interest-bounces at jdom.org] *On Behalf Of *Praveen Gattu
> *Sent:* 02 October 2008 23:04
> *To:* Paul Libbrecht
> *Cc:* jdom-interest at jdom.org
> *Subject:* Re: [jdom-interest] Re: Reading XML with JDOM
>
> Paul,
>
> Thanks for the response. My XML is really as simple as the one I posted.
> The 8,500 documents are retrieved over an HTTP URL. Adding network latency
> pushes the total past a minute, unless my XML parser is extremely fast.
> Anyway, since there is probably not much I can do with the network latency,
> I am trying to keep the Java code as skinny and efficient as possible.
>
> Would you be able to provide sample code for the solution you suggested?
>
> On Thu, Oct 2, 2008 at 2:27 PM, Paul Libbrecht <paul at activemath.org> wrote:
>
>> Praveen,
>>
>> In JDOM you would just parse, then take the root, then the employee, then
>> extract lastname and ssn.
>> The firstname is ignored from the point of view of your programme, but not
>> from the point of view of parsing: it is still parsed.
>>
>> Where you can save is by changing the XML technology... if your document
>> is as simple as below, then using SAX gives better performance (you
>> really cannot go faster) but is harder to programme with.
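
A minimal sketch of the SAX approach for the `<response>` document quoted further down, using only the JDK's JAXP classes. The class `EmployeeSaxReader` and its `extract` method are hypothetical names, and the sketch assumes the element names `lastname` and `ssn` exactly as in that sample, with no namespaces.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: reads lastname and ssn with plain SAX, no tree is built. */
final class EmployeeSaxReader {

    /** Buffers character data only while inside <lastname> or <ssn>. */
    static final class Handler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        private boolean capturing;
        String lastName;
        String ssn;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // firstname and every other element are parsed but not captured.
            capturing = qName.equals("lastname") || qName.equals("ssn");
            text.setLength(0);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (capturing) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("lastname")) {
                lastName = text.toString();
            } else if (qName.equals("ssn")) {
                ssn = text.toString();
            }
            capturing = false;
        }
    }

    /** Returns {lastname, ssn}; an entry is null if the element is absent. */
    static String[] extract(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            Handler handler = new Handler();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return new String[] { handler.lastName, handler.ssn };
        } catch (Exception e) {
            throw new IllegalStateException("could not parse employee XML", e);
        }
    }
}
```

The handler shows why SAX is harder to programme with: the state that a tree API keeps for you (which element you are in, what text it holds) has to be tracked by hand.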
>> Another place where JDOM can take too much of your CPU is if the document
>> contains a lot of other content.
>>
>> Where JDOM would make a positive difference is at walking more elaborate
>> xml documents, which is the norm, and at manipulating them. The expressivity
>> of the library there is unbeatable to my taste.
>>
>> However, your requirements sound easy: 8,500 such documents per minute?
>> JDOM can probably do ten times that, and multithreading is not really
>> necessary.
>>
>> paul
>>
>>
>> On 02-oct.-08, at 20:29, Praveen Gattu wrote:
>>
>>> I have an XML document as below. There is always "only one" employee node
>>> in the XML. So rather than iterating through the nodes, I want to read the
>>> lastname and ssn directly, while ignoring the firstname. What is the best
>>> way to do this in JDOM? My most important criterion is speed. We will be
>>> processing about 8,500 such XML documents per minute (multi-threaded of
>>> course) and need something efficient and fast. I appreciate any help you
>>> can offer in this regard.
>>>
>>> <response>
>>> <employee>
>>> <firstname>John</firstname>
>>> <lastname>Smith</lastname>
>>> <ssn>111-11-1111</ssn>
>>> </employee>
>>> </response>
>>>
>>> --
>>> Thanks,
>>> Praveen