[jdom-interest] PATCH: Whitespace in Element
guru at stinky.com
Sat Jun 16 08:14:07 PDT 2001
Currently, Element.getTextNormalize only considers the following
characters to be whitespace:
space, tab, \n, \r
This is narrower than the Unicode and Java definitions of whitespace.
I'm not sure whether it matches XML's.
In Element.java, line 639 should be changed from
    if (" \t\n\r".indexOf(c[i]) != -1) {
to
    if (Character.isWhitespace(c[i])) {
(which has the further advantage of being more legible).
One common whitespace character that this catches is Form Feed (^L).
It may also be helpful in non-Roman alphabets -- I wouldn't presume to
understand the Arabic rules for word breaks...
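To make the difference concrete, here is a minimal sketch comparing the two checks. The class WhitespaceCheck and its normalize helper are hypothetical names made up for illustration and are not JDOM's actual implementation; only the two conditions themselves come from the patch above. Two caveats to keep in mind: as far as I can tell, the XML 1.0 "S" production lists exactly space, tab, CR and LF, so the existing check may be the XML-conformant one, and Character.isWhitespace deliberately returns false for non-breaking spaces such as U+00A0.

```java
class WhitespaceCheck {
    // The existing check quoted above: only space, tab, LF and CR count.
    static boolean isLegacyWhitespace(char c) {
        return " \t\n\r".indexOf(c) != -1;
    }

    // The proposed replacement: Java's own definition of whitespace.
    static boolean isJavaWhitespace(char c) {
        return Character.isWhitespace(c);
    }

    // Hypothetical helper (not JDOM code): trims the text and collapses
    // each internal run of whitespace to a single space, using whichever
    // definition of whitespace the caller selects.
    static String normalize(String text, boolean useJavaDefinition) {
        StringBuilder out = new StringBuilder();
        boolean pendingSpace = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean ws = useJavaDefinition
                    ? isJavaWhitespace(c)
                    : isLegacyWhitespace(c);
            if (ws) {
                // Remember the gap, but never emit leading whitespace.
                pendingSpace = out.length() > 0;
            } else {
                if (pendingSpace) {
                    out.append(' ');
                }
                pendingSpace = false;
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The sample contains a form feed between "one" and "two".
        String sample = "  one\ftwo \t three\n";
        System.out.println(normalize(sample, false));
        System.out.println(normalize(sample, true));
    }
}
```

With the legacy check the form feed is treated as ordinary text and stays embedded between "one" and "two"; with Character.isWhitespace it is collapsed into a single space like any other separator.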