[jdom-interest] PATCH: Whitespace in Element

guru at stinky.com
Sat Jun 16 08:14:07 PDT 2001


Currently, Element.getTextNormalize only considers the following
characters to be whitespace:
 space, tab, \n, \r

This is narrower than the Unicode and Java definitions of whitespace.
I'm not sure whether it also disagrees with XML's.

In Element.java, line 639 should be changed from
 if (" \t\n\r".indexOf(c[i]) != -1) {

to
 if (Character.isWhitespace(c[i])) {

(which has the further advantage of being more legible).

One common whitespace character that this catches is Form Feed (^L).
It may also be helpful in non-Roman alphabets -- I wouldn't presume to
understand the Arabic rules for word breaks...
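
For anyone who wants to see the difference for themselves, here is a
small standalone sketch that runs both checks over a few sample
characters.  The class and method names are made up for illustration
and are not part of JDOM; only the two conditions are taken from the
change proposed above.

```java
public class WhitespaceCheck {
    // Hypothetical helper wrapping the check currently quoted from Element.java.
    static boolean isOldWhitespace(char c) {
        return " \t\n\r".indexOf(c) != -1;
    }

    // Hypothetical helper wrapping the proposed replacement.
    static boolean isNewWhitespace(char c) {
        return Character.isWhitespace(c);
    }

    public static void main(String[] args) {
        // space, tab, LF, CR, form feed, vertical tab, em space, no-break space
        char[] samples = { ' ', '\t', '\n', '\r', '\f', '\u000B', '\u2003', '\u00A0' };
        for (char c : samples) {
            System.out.printf("U+%04X old=%b new=%b%n",
                    (int) c, isOldWhitespace(c), isNewWhitespace(c));
        }
    }
}
```

Note that Character.isWhitespace deliberately excludes the no-break
space, so the two checks agree on that one.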


