[jdom-interest] String vs. StringBuffer in Text class

Fri Jun 1 10:14:42 PDT 2001

So, it seems we need to take the time to measure which approach is best.  I
really just want whatever is "best" in terms of speed and memory use.  As we're
finding out, conventional wisdom hasn't gotten us very far.  A test should be
easy to code and run, so if somebody has a minute...

> -----Original Message-----
> From: Alex Rosen [mailto:arosen at silverstream.com]
> Sent: Thursday, May 31, 2001 4:47 PM
> To: jdom-interest at jdom.org
> Subject: [jdom-interest] String vs. StringBuffer in Text class
> 
> 
> My last e-mail made me think about whether using StringBuffer in the Text
> class really buys us anything. The supposed advantage is for reading in
> large documents, or more specifically documents that have elements with
> long text content strings. For these elements, the parser presumably can't
> fit the whole text string in the buffer it passes to characters(), so it
> has to call us more than once, with pieces of the text. Using StringBuffer
> would presumably make that more efficient.
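
To make the multiple-callback case concrete, here is a rough sketch of a SAX
handler that counts how often characters() fires for one element and
accumulates the pieces. ChunkCountingHandler is a made-up name for
illustration, not JDOM's SAXHandler, and how many calls you see depends
entirely on the parser and its buffer size.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler (not JDOM code): shows that one text node may
// arrive in several characters() calls.
class ChunkCountingHandler extends DefaultHandler {
    final StringBuffer text = new StringBuffer();
    int calls = 0;

    @Override
    public void characters(char[] ch, int start, int length) {
        calls++;
        // Copy only the slice the parser handed us; the array is reused.
        text.append(ch, start, length);
    }

    // Parses an XML string with the JDK's default SAX parser.
    static ChunkCountingHandler parse(String xml) {
        ChunkCountingHandler handler = new ChunkCountingHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return handler;
    }

    public static void main(String[] args) {
        // Build one element with 100,000 characters of text content.
        StringBuilder big = new StringBuilder("<a>");
        for (int i = 0; i < 100000; i++) {
            big.append('x');
        }
        big.append("</a>");

        ChunkCountingHandler h = parse(big.toString());
        System.out.println(h.calls + " call(s), " + h.text.length() + " chars");
    }
}
```

Running it against a few parsers would at least tell us how large the text
has to be before the second call ever happens.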
> 
> But - I'm now thinking that using String would almost certainly be better.
> First, remember that even StringBuffer still has to grow its internal char
> array. So the copy-and-append that would happen with String would still
> happen with StringBuffer, just not as often. Also:
> 
> - The StringBuffer(String) constructor creates an internal buffer that's 16
> chars longer than the string passed in, so in the common case where
> characters() is only called once, you've got 32 extra bytes lying around
> per Text object. In the uncommon case, where characters() is called
> repeatedly, you may end up with much more memory wasted (depending on the
> input document), since StringBuffer doubles in size each time it grows.
> 
> - In all cases, using a StringBuffer would mean that we'd have to create a
> new String for every call to getValue() (though this wouldn't normally copy
> the char array, as mentioned in my last message). If we used a String, we
> could just return it directly.
> 
> - The case that StringBuffer is supposedly good for presumably only happens
> when the element's text value is larger than the XML parser's internal
> buffer, right? Xerces uses a 16K buffer, I think? It would certainly be nice
> to be fast for documents with text content in the hundreds of kilobytes, but
> I'd think this is the 10-20% case. And, since StringBuffer still needs to do
> some copying, it wouldn't necessarily be much faster in many cases.
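
The slack described in the first point is easy to observe with capacity().
This is only a sketch: CapacityDemo and slack() are names invented here, and
the 1000-char chunk size is an arbitrary assumption. The 16 spare chars after
StringBuffer(String) are documented behaviour; how far the buffer grows after
each append is left for the program to print rather than stated here.

```java
// Hypothetical demo class: prints how much unused room a StringBuffer
// carries in the single-call and the repeated-call cases.
class CapacityDemo {
    // Spare chars in the buffer beyond its current content.
    static int slack(StringBuffer sb) {
        return sb.capacity() - sb.length();
    }

    public static void main(String[] args) {
        // Common case: characters() called once, buffer built from a String.
        StringBuffer once = new StringBuffer("hello");
        System.out.println("single call: capacity=" + once.capacity()
                + " slack=" + slack(once));

        // Uncommon case: simulate repeated characters() calls by appending
        // fixed-size chunks (1000 chars is an arbitrary choice).
        StringBuffer many = new StringBuffer("hello");
        char[] chunk = new char[1000];
        for (int i = 1; i <= 5; i++) {
            many.append(chunk, 0, chunk.length);
            System.out.println("after append " + i + ": length=" + many.length()
                    + " capacity=" + many.capacity()
                    + " slack=" + slack(many));
        }
    }
}
```

At two bytes per char, the slack in the single-call line is where the 32
extra bytes come from.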
> 
> So, I suspect that using StringBuffer will almost always result in
> significant wasted memory, and will only be faster a small percentage of
> the time. (Of course, testing both ways would be the best way to go...)
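
A first cut at that test might look like the sketch below. Everything in it
is an assumption for illustration: the class and method names are invented,
the 16K chunk size just mirrors the Xerces guess above, and the timing loop
is crude (no JIT warm-up control, and it measures speed only, not memory), so
treat any numbers it prints as a rough indication at best.

```java
// Hypothetical micro-benchmark: accumulate the same chunks with String
// and with StringBuffer and compare average time per accumulation.
class AccumulateBench {
    // String approach: each extra chunk allocates a new String and
    // copies everything accumulated so far.
    static String viaString(char[] chunk, int calls) {
        String s = new String(chunk);
        for (int i = 1; i < calls; i++) {
            s = s.concat(new String(chunk));
        }
        return s;
    }

    // StringBuffer approach: appends in place, growing when full,
    // then converts once at the end.
    static String viaStringBuffer(char[] chunk, int calls) {
        StringBuffer sb = new StringBuffer(new String(chunk));
        for (int i = 1; i < calls; i++) {
            sb.append(chunk, 0, chunk.length);
        }
        return sb.toString();
    }

    // Crude timer: average nanoseconds per accumulation over reps runs.
    static long avgNanos(boolean useBuffer, char[] chunk, int calls, int reps) {
        long sink = 0;
        long start = System.nanoTime();
        for (int r = 0; r < reps; r++) {
            String s = useBuffer ? viaStringBuffer(chunk, calls)
                                 : viaString(chunk, calls);
            sink += s.length(); // use the result so the work is not skipped
        }
        long elapsed = System.nanoTime() - start;
        if (sink < 0) {
            System.out.println(sink);
        }
        return elapsed / reps;
    }

    public static void main(String[] args) {
        char[] chunk = new char[16 * 1024]; // assumed parser buffer size
        for (int calls : new int[] {1, 2, 8, 32}) {
            System.out.println(calls + " call(s): String "
                    + avgNanos(false, chunk, calls, 2000) + " ns, StringBuffer "
                    + avgNanos(true, chunk, calls, 2000) + " ns");
        }
    }
}
```

The calls=1 row is the interesting one, since that is the common case.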
> 
> In either case, we should try to make sure that the character data that we
> receive goes from SAXHandler to the Text class with as little copying as
> possible. That might mean that Text wants to have a constructor and an
> append method that take char arrays.
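
For discussion, a String-backed version with char-array entry points could
look roughly like this. SketchText is a stand-in name, not the real Text
class, and the signatures simply mirror the arguments of characters(); none
of this is existing JDOM API.

```java
// Hypothetical String-backed text node (not JDOM's Text class).
class SketchText {
    private String value;

    // Common case: one characters() call, one copy into the String.
    SketchText(char[] ch, int start, int length) {
        value = new String(ch, start, length);
    }

    // Uncommon case: a later characters() call for the same text node.
    void append(char[] ch, int start, int length) {
        char[] joined = new char[value.length() + length];
        value.getChars(0, value.length(), joined, 0);
        System.arraycopy(ch, start, joined, value.length(), length);
        // new String(char[]) copies once more; there is no public way
        // to hand the array over without that copy.
        value = new String(joined);
    }

    // No allocation here: the String is returned directly.
    String getValue() {
        return value;
    }

    public static void main(String[] args) {
        char[] buf = "xxhelloxx".toCharArray();
        SketchText t = new SketchText(buf, 2, 5);
        t.append(" world".toCharArray(), 0, 6);
        System.out.println(t.getValue());
    }
}
```

The handler would pass its (ch, start, length) arguments straight through, so
the parser's slice is never wrapped in a temporary String first.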
> 
> Alex Rosen
> SilverStream Software
> 