[jdom-interest] XMLOutputter.isWhiteSpace

David D. Lucas ddlucas at lse.com
Fri Apr 26 11:11:12 PDT 2002


Does the isWhiteSpace method depend on any string-encoding behavior in 
the way it uses the four whitespace markers with indexOf?
The indexOf call is slow.
I can do the same thing with char comparisons, and it yields about a 
10X performance increase.  This adds up with big messages.  I guess the 
question is how 16-bit characters behave when compared against char 
literals like '\t' rather than "\t" Strings.  I don't see any gotchas.  
What does everyone else think?
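
One way to look for gotchas is to compare the two tests over every 
16-bit char value.  The following is only a sketch: the class name 
WhitespaceCheck and the helper mismatches() are made up for this 
illustration and are not part of JDOM.

```java
// Sketch: exhaustively compare the indexOf-based test with the
// char-comparison test. WhitespaceCheck and mismatches() are
// hypothetical names used only for this illustration.
class WhitespaceCheck {
    // Original approach: look the character up in a string of the
    // four whitespace markers.
    static boolean isWhiteSpace(char c) {
        return " \t\n\r".indexOf(c) != -1;
    }

    // Proposed approach: compare against char literals directly.
    static boolean isWhiteSpaceNew(char c) {
        return c == ' ' || c == '\n' || c == '\t' || c == '\r';
    }

    // Counts the 16-bit char values on which the two methods disagree.
    static int mismatches() {
        int count = 0;
        for (int i = Character.MIN_VALUE; i <= Character.MAX_VALUE; i++) {
            char c = (char) i;
            if (isWhiteSpace(c) != isWhiteSpaceNew(c)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + mismatches());
    }
}
```

If the two methods agree on every value, the count is zero, which 
would suggest the char literals and the String lookup are 
interchangeable here.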

If you want to take a look at escapeElement, there is also some 
improvement if we do not append a character on every loop iteration, 
but append chunks of data instead.  This probably applies to 
escapeAttribute as well.

Please let me know if you want me to integrate these changes and send 
you the file, or if someone with CVS access would like to.

Thanks in advance,
Dave



P.S.  Here is the code for comparison. In my experience there are more 
\n than \t lying around, but mileage may vary.


//=========================================================================
   static boolean isWhiteSpace(char c ) {
     if (" \t\n\r".indexOf(c)==-1) {
       return false;
     }
     else {
       return true;
     }
   }

//=========================================================================
   static boolean isWhiteSpaceNew(char c ) {
     if (c==' ' || c=='\n' || c=='\t' || c=='\r' ){
       return true;
     }
     else {
       return false;
     }
   }

//=========================================================================
//=========================================================================
//=========================================================================

// here is the escape code comparison
     static String escapeElementEntities(String str) {
         StringBuffer buffer;
         char ch;
         String entity;

         buffer = null;
         for (int i = 0; i < str.length(); i++) {
             ch = str.charAt(i);
             switch(ch) {
                 case '<' :
                     entity = "&lt;";
                     break;
                 case '>' :
                     entity = "&gt;";
                     break;
                 case '&' :
                     entity = "&amp;";
                     break;
                 default :
                     entity = null;
                     break;
             }
             if (buffer == null) {
                 if (entity != null) {
                     // An entity occurred, so we'll have to use
                     // StringBuffer (allocate room for it plus a few
                     // more entities).
                     buffer = new StringBuffer(str.length() + 20);
                     // Copy previous skipped characters and fall through
                     // to pickup current character
                     buffer.append(str.substring(0, i));
                     buffer.append(entity);
                 }
             }
             else {
                 if (entity == null) {
                     buffer.append(ch);
                 }
                 else {
                     buffer.append(entity);
                 }
             }
         }

         // If there were any entities, return the escaped characters
         // that we put in the StringBuffer. Otherwise, just return
         // the unmodified input string.
         return (buffer == null) ? str : buffer.toString();
     }

     static String escapeElementEntitiesNew(String str) {
         StringBuffer buffer;
         char ch;
         String entity;

         buffer = null;
         int length = str.length();
         char[] chars=str.toCharArray();
         int begin=0;
         int offset=0;
         for (int i = 0; i < length; i++) {
             //ch = str.charAt(i);
             ch = chars[i];
             switch(ch) {
                 case '<' :
                     offset=i;
                     entity = "&lt;";
                     break;
                 case '>' :
                     offset=i;
                     entity = "&gt;";
                     break;
                 case '&' :
                     offset=i;
                     entity = "&amp;";
                     break;
                 default :
                     entity = null;
                     break;
             }
             if (buffer == null) {
                 if (entity != null) {
                     // An entity occurred, so we'll have to use
                     // StringBuffer (allocate room for it plus a few
                     // more entities).
                     buffer = new StringBuffer(length + 20);
                     // Copy previous skipped characters and fall through
                     // to pickup current character
                     buffer.append(str.substring(0, i));
                     buffer.append(entity);
                     begin=i+1;
                     offset=0;
                 }
             }
             else {
                 if (entity != null) {
                     buffer.append(chars,begin,(offset-begin));
                     buffer.append(entity);
                     begin=i+1;
                     offset=0;
                 }
             }
         }

         if (buffer!=null) {
           if (begin < length) {
             buffer.append(chars,begin,(length-begin));
           }
         }
         // If there were any entities, return the escaped characters
         // that we put in the StringBuffer. Otherwise, just return
         // the unmodified input string.
         return (buffer == null) ? str : buffer.toString();
     }
//=========================================================================
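
Before swapping in a chunked version, it may be worth checking that it 
produces the same output as the per-character one.  The sketch below is 
a compact illustration of that check, not the JDOM code itself: 
EscapeCheck, entityFor, escapePerChar, and escapeChunked are 
hypothetical names, and the sample strings are arbitrary.

```java
// Sketch: compare a per-character escaper with a chunk-appending one.
// All names in this class are hypothetical and used only for illustration.
class EscapeCheck {
    // Returns the entity for a character that must be escaped, or null.
    static String entityFor(char ch) {
        switch (ch) {
            case '<': return "&lt;";
            case '>': return "&gt;";
            case '&': return "&amp;";
            default:  return null;
        }
    }

    // Baseline: append one character (or one entity) per loop iteration.
    static String escapePerChar(String str) {
        StringBuffer buffer = new StringBuffer(str.length() + 20);
        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i);
            String entity = entityFor(ch);
            if (entity == null) {
                buffer.append(ch);
            } else {
                buffer.append(entity);
            }
        }
        return buffer.toString();
    }

    // Chunked: copy each run of unescaped characters with a single append.
    static String escapeChunked(String str) {
        char[] chars = str.toCharArray();
        StringBuffer buffer = null;
        int begin = 0; // start of the run that has not been copied yet
        for (int i = 0; i < chars.length; i++) {
            String entity = entityFor(chars[i]);
            if (entity != null) {
                if (buffer == null) {
                    buffer = new StringBuffer(chars.length + 20);
                }
                buffer.append(chars, begin, i - begin);
                buffer.append(entity);
                begin = i + 1;
            }
        }
        if (buffer == null) {
            return str; // nothing needed escaping
        }
        buffer.append(chars, begin, chars.length - begin);
        return buffer.toString();
    }

    public static void main(String[] args) {
        String[] samples = { "", "plain text", "<a>", "a & b", "x<y>&z", "&&&", "tail<" };
        for (String s : samples) {
            if (!escapePerChar(s).equals(escapeChunked(s))) {
                throw new AssertionError("mismatch for: " + s);
            }
        }
        System.out.println("all samples match");
    }
}
```

The chunked variant only tracks where the current unescaped run 
begins, so a separate offset variable is not needed in this sketch.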

-- 

+------------------------------------------------------------+
| David Lucas                        mailto:ddlucas at lse.com  |
| Lucas Software Engineering, Inc.   (740) 964-6248 Voice    |
| Unix,Java,C++,CORBA,XML,EJB        (614) 668-4020 Mobile   |
| Middleware,Frameworks              (888) 866-4728 Fax/Msg  |
+------------------------------------------------------------+
| GPS Location:  40.0150 deg Lat,  -82.6378 deg Long         |
| IMHC: "Jesus Christ is the way, the truth, and the life."  |
| IMHC: "I know where I am; I know where I'm going."    <><  |
+------------------------------------------------------------+

Notes: PGP Key Block=http://www.lse.com/~ddlucas/pgpblock.txt
IMHO="in my humble opinion" IMHC="in my humble conviction"
All trademarks above are those of their respective owners.




