[jdom-interest] A suggested performance improvement
Tom Oke
tomo at elluminate.com
Sun Mar 16 17:18:44 PST 2003
I have noticed that, on large XML files, most of the CPU time
goes into two routines: Verifier.isXMLCharacter and
Verifier.checkCharacterData.
I had initially modified isXMLCharacter to check the most
likely range of data first, to get an early exit, and this cut
about 25% of the CPU time used by the JDOM read on some large files.
However, in the thread doing the JDOM input, 62% of the time
was still in isXMLCharacter and 16% was in checkCharacterData,
which calls isXMLCharacter.
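For illustration, the reordering of isXMLCharacter described above might look roughly like the sketch below. This is not the actual JDOM source: the class name ReorderedVerifier is hypothetical, and the ranges are taken from the XML 1.0 Char production, limited to values that fit in a single Java char (so surrogate halves are rejected).

```java
public final class ReorderedVerifier {
    private ReorderedVerifier() {}

    // Hypothetical sketch: same result as a straightforward range check,
    // but the range that covers almost all real text is tested first.
    public static boolean isXMLCharacter(char c) {
        // Most likely case: ordinary text in 0x20..0xD7FF.
        if (c >= 0x20 && c <= 0xD7FF) return true;

        // The three legal control characters.
        if (c == '\n' || c == '\r' || c == '\t') return true;

        // Remaining legal 16-bit range, 0xE000..0xFFFD.
        if (c >= 0xE000 && c <= 0xFFFD) return true;

        // Other controls, surrogate halves, 0xFFFE and 0xFFFF.
        return false;
    }
}
```

With this ordering, typical document text returns after a single pair of comparisons, although every character still pays for the method call.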
The biggest bang for the buck came from wrapping the
if statement that calls isXMLCharacter in a test for the
most likely good range, so that the method call is skipped
for ordinary text. This is seen below in the two lines:
    char c = text.charAt(i);
    if (!(c > 0x1F && c < 0xD800)) {
This reduced checkCharacterData to 1.32% of the thread's CPU time,
and isXMLCharacter no longer really shows up at all.
Hopefully this is a reasonable change to submit to JDOM?
What follows is the full code for Verifier.checkCharacterData.
    public static final String checkCharacterData(String text) {
        if (text == null) {
            return "A null is not a legal XML value";
        }

        // do check
        for (int i = 0, len = text.length(); i < len; i++) {
            char c = text.charAt(i);
            // Fast path: 0x20..0xD7FF is always legal, so only
            // characters outside that range need the full check.
            if (!(c > 0x1F && c < 0xD800)) {
                if (!isXMLCharacter(c)) {
                    // Likely this character can't be easily displayed
                    // because it's a control so we use its hexadecimal
                    // representation in the reason.
                    return ("0x" + Integer.toHexString(c)
                            + " is not a legal XML character");
                }
            }
        }

        // If we got here, everything is OK
        return null;
    }
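As a sanity check on the range test, the sketch below walks every 16-bit char value and counts any that the guard would wave through but a full check would reject. The class FastPathCheck and its method names are made up for this example, and its isXMLCharacter is a stand-in written from the XML 1.0 Char production rather than a copy of JDOM's.

```java
public final class FastPathCheck {
    private FastPathCheck() {}

    /** Stand-in reference check: XML 1.0 Char, limited to one 16-bit char. */
    public static boolean isXMLCharacter(char c) {
        if (c == '\n' || c == '\r' || c == '\t') return true;
        if (c < 0x20) return false;    // other control characters
        if (c <= 0xD7FF) return true;
        if (c < 0xE000) return false;  // surrogate halves
        return c <= 0xFFFD;            // excludes 0xFFFE and 0xFFFF
    }

    /** The guard from checkCharacterData: true when the full check is skipped. */
    public static boolean inFastRange(char c) {
        return c > 0x1F && c < 0xD800;
    }

    /** Counts chars that the guard accepts but the reference check rejects. */
    public static int countFastPathMismatches() {
        int mismatches = 0;
        for (int i = 0; i <= 0xFFFF; i++) {
            char c = (char) i;
            if (inFastRange(c) && !isXMLCharacter(c)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // prints "mismatches: 0"
        System.out.println("mismatches: " + countFastPathMismatches());
    }
}
```

A count of zero means the guard only ever skips characters that the full check would have accepted anyway, so the set of strings rejected by checkCharacterData is unchanged.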
Tom Oke