[jdom-interest] Re: SAXBuilder enhancement request /2

Dennis Sosnoski dms at sosnoski.com
Sat Mar 30 02:40:26 PST 2002


From a quick look at the code, this appears to remove only character data 
that consists entirely of whitespace separating elements - it doesn't strip 
leading and trailing whitespace from the character data content of an 
element. It might be good to change the class description to make this 
clear. :-)

It could be modified to strip leading and trailing whitespace with some 
work. Right now it just collects whitespace character data as long as it 
hasn't seen anything that's not whitespace, and once it sees a 
nonwhitespace character it passes everything on directly. Instead it'd 
need to accumulate all the character data once it sees a nonwhitespace 
character (scanning from the start of each sequence, not the end), then 
strip trailing whitespace before it dumps the data to the next step (on 
any non-character-data event).
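
A rough sketch of that accumulate-then-trim approach, written against the 
standard SAX XMLFilterImpl. This is not the DataUnformatFilter code; the 
class name TrimmingFilter and the chunks() helper are made up for 
illustration, and only the four XML whitespace characters are treated as 
whitespace. Like the existing filter, it assumes field-oriented XML: in 
mixed content it would also drop the spaces next to inline elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: trims leading and trailing whitespace from each
// run of character data before passing it downstream.
class TrimmingFilter extends XMLFilterImpl {
    // Character data seen since the last non-character event,
    // with leading whitespace already dropped.
    private final StringBuilder pending = new StringBuilder();

    private static boolean isXmlSpace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        int end = start + length;
        // Nothing buffered yet means we are still in leading whitespace,
        // possibly spread over several characters() calls.
        if (pending.length() == 0) {
            while (start < end && isXmlSpace(ch[start])) {
                start++;
            }
        }
        pending.append(ch, start, end - start);
    }

    // Called on every non-character event: trim the tail and pass it on.
    private void flush() throws SAXException {
        int len = pending.length();
        while (len > 0 && isXmlSpace(pending.charAt(len - 1))) {
            len--;
        }
        if (len > 0) {
            char[] out = new char[len];
            pending.getChars(0, len, out, 0);
            super.characters(out, 0, len);
        }
        pending.setLength(0);
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        flush();
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        flush();
        super.endElement(uri, localName, qName);
    }

    @Override
    public void processingInstruction(String target, String data)
            throws SAXException {
        flush();
        super.processingInstruction(target, data);
    }

    @Override
    public void endDocument() throws SAXException {
        flush();
        super.endDocument();
    }

    // Illustration only: parse a string through the filter and return
    // the character data runs that reach the downstream handler.
    static List<String> chunks(String xml) throws Exception {
        XMLReader parser =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        TrimmingFilter filter = new TrimmingFilter();
        filter.setParent(parser);
        final List<String> seen = new ArrayList<>();
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                seen.add(new String(ch, start, length));
            }
        });
        filter.parse(new InputSource(new StringReader(xml)));
        return seen;
    }
}
```

The copy into a fresh char[] in flush() is exactly the extra buffering 
and array creation that doing the work inside the handler would avoid.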

Performance would be better just doing the whitespace stripping within 
the SAXHandler, though (no copying and extra array creation steps).

  - Dennis

Joseph Bowbeer wrote:

>Btw, a whitespace stripping filter is here:
>
>http://cvs.jdom.org/cgi-bin/viewcvs.cgi/jdom/samples/sax/DataUnformatFilter.java
>
>As the javadoc says:
>
>* This filter removes leading and trailing whitespace from field-oriented
>* XML without mixed content. Note that this class will likely not yield
>* appropriate results for document-oriented XML like XHTML pages
>* which mix character data and elements together.
>
>----- Original Message -----
>
>[...] It could also be done using a filter, as ERH suggests, though this
>might be a little more complicated - for stripping trailing whitespace you'd
>need to make sure you have the entire character data sequence available,
>rather than just a portion. [...]