[jdom-interest] Improving performance of SAX parser configuration
Scott Emmons
lscotte at gmail.com
Thu May 7 10:44:00 PDT 2009
Greetings jdom-interest,
I've run across an interesting performance issue in the way JDOM
handles Xerces parser configuration even when reuseParser is enabled
in SAXBuilder, and I wanted to run this by the list - not only for
validation, but hopefully something along the lines of this
improvement can get rolled in (yep, I know JDOM is in maintenance
mode).
For a bit of background, the particular case we have involves parsing
lots and lots of little XML document fragments via SAXBuilder.build()
- not terribly efficient, but for what we use JDOM for it's a
pre-existing condition that we're stuck with.
What I found is that more time was spent in configureParser() than in
actually parsing the XML. The reason is that configureParser() attempts
to set properties on the parser which don't exist in Xerces - or at
least not in the version we are using. Each attempt results in a
SAXNotRecognizedException. Exceptions are expensive, plus Xerces does
ResourceBundle lookups each time. While we do set reuseParser, each
execution of build() still reconfigures the underlying parser.
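To make the cost concrete, here is a minimal sketch (not JDOM code) that uses the SAX parser bundled with the JDK and a made-up property URI. The class name, the helper names and the property URI are all hypothetical; the only point is that an unrecognized property is rejected with an exception on every single attempt, even when the same parser instance is reused.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

public class RepeatedPropertyFailure {

    /** Creates an XMLReader from whatever SAX parser the JAXP factory provides. */
    public static XMLReader newReader() throws Exception {
        return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    }

    /**
     * Tries to set the same property on the same reader several times and
     * returns how many of those attempts ended in an exception.
     */
    public static int countFailures(XMLReader reader, String property,
                                    Object value, int attempts) {
        int failures = 0;
        for (int i = 0; i < attempts; i++) {
            try {
                reader.setProperty(property, value);
            } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
                // The parser does not remember the earlier rejection, so
                // the exception is constructed and thrown again each time.
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical URI, chosen so that no parser should recognize it.
        String bogus = "http://example.invalid/properties/not-a-real-property";
        int failures = countFailures(newReader(), bogus, new Object(), 1000);
        System.out.println("failed attempts: " + failures);
    }
}
```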
I know that the contentHandler, and perhaps other options, are not
reusable, and this patch doesn't change the semantics of that. Since the
underlying parser is unlikely to suddenly start supporting an option
it didn't support before, it's possible to remember whether or not the
underlying parser implementation was able to support a property, and
skip attempting to configure it if not. I wired this as a specific
option only used with reuseParser to be safe, but it's possible this
could be done in a more generic manner that would benefit other
codepaths and usages as well (it would simplify my patch somewhat, but I
wanted to be safe since there may be other consequences of this which
I've overlooked).
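The "more generic manner" could look roughly like the sketch below: a small wrapper that remembers which property names the parser has rejected and skips them afterwards. This is only an illustration under my own assumptions, not what the patch does; CachedPropertySetter, trySet and attempts are hypothetical names, and the property URI in main is made up. Note that, like the patch, it also caches on SAXNotSupportedException, which may depend on the value or on parser state, so a stricter version might only cache SAXNotRecognizedException.

```java
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

public class CachedPropertySetter {
    private final XMLReader reader;
    /** Property names this reader has already rejected once. */
    private final Set<String> unsupported = new HashSet<>();
    /** Number of times setProperty() was really called on the reader. */
    private int attempts = 0;

    public CachedPropertySetter(XMLReader reader) {
        this.reader = reader;
    }

    /**
     * Sets the property unless an earlier call showed that the reader
     * rejects it. Returns true only if the property was set by this call.
     */
    public boolean trySet(String property, Object value) {
        if (unsupported.contains(property)) {
            return false; // known failure: skip the call and the exception
        }
        attempts++;
        try {
            reader.setProperty(property, value);
            return true;
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            unsupported.add(property);
            return false;
        }
    }

    public int attempts() {
        return attempts;
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        CachedPropertySetter setter = new CachedPropertySetter(reader);
        // Hypothetical URI, chosen so that no parser should recognize it.
        String bogus = "http://example.invalid/properties/not-a-real-property";
        for (int i = 0; i < 1000; i++) {
            setter.trySet(bogus, new Object());
        }
        System.out.println("real setProperty calls: " + setter.attempts());
    }
}
```

Because the cache is keyed by property name and tied to one reader instance, it only makes sense while that same parser is being reused, which is the reuseParser case.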
Again, I wouldn't expect this to help cases where larger XML is
handled less frequently, but for my case where it's hundreds of XML
fragments per transaction per second, this fix reduces the execution
time of SAXBuilder.build() by about 1/2.
I would love to hear any feedback as well as find out if anyone else
has the same sort of performance improvements I've seen with this
patch in cases where lots of small documents are parsed.
Thanks for your time,
-Scott
===CUT HERE===
diff --git a/src/java/org/jdom/input/SAXBuilder.java b/src/java/org/jdom/input/SAXBuilder.java
index 09fbb00..1627345 100644
--- a/src/java/org/jdom/input/SAXBuilder.java
+++ b/src/java/org/jdom/input/SAXBuilder.java
@@ -134,6 +134,15 @@ public class SAXBuilder {
/** User-specified properties to be set on the SAX parser */
private HashMap properties = new HashMap(5);
+ /** Whether to use fast parser reconfiguration */
+ private boolean fastReconfigure = false;
+
+ /** Whether to try lexical reporting in fast parser reconfiguration */
+ private boolean tryLexicalReportingConfig = true;
+
+ /** Whether to try entity expansion in fast parser reconfiguration */
+ private boolean tryEntityExpandConfig = true;
+
/**
* Whether parser reuse is allowed.
* <p>Default: <code>true</code></p>
@@ -396,6 +405,25 @@ public class SAXBuilder {
}
/**
+ * Specifies whether this builder will do fast reconfiguration of the
+ * underlying SAX parser when reuseParser is true. This improves
+ * performance in cases where SAXBuilders are reused and lots of small
+ * documents are frequently parsed. This avoids attempting to set features
+ * on the SAX parser each time build() is called which result in
+ * SAXNotRecognizedExceptions. This should ONLY be set for builders where
+ * this specific case is an issue. The default value of this setting is
+ * <code>false</code> (no fast reconfiguration). If reuseParser is false,
+ * calling this has no effect.
+ *
+ * @param fastReconfigure Whether to use fast parser reconfiguration.
+ */
+ public void setFastReconfigure(boolean fastReconfigure) {
+ if (this.reuseParser) {
+ this.fastReconfigure = fastReconfigure;
+ }
+ }
+
+ /**
* This sets a feature on the SAX parser. See the SAX documentation for
* </p>
* <p>
@@ -657,42 +685,76 @@ public class SAXBuilder {
parser.setErrorHandler(new BuilderErrorHandler());
}
- // Setup lexical reporting.
- boolean lexicalReporting = false;
- try {
- parser.setProperty("http://xml.org/sax/handlers/LexicalHandler",
- contentHandler);
- lexicalReporting = true;
- } catch (SAXNotSupportedException e) {
- // No lexical reporting available
- } catch (SAXNotRecognizedException e) {
- // No lexical reporting available
- }
+ /* If fastReconfigure is enabled and we failed in the previous attempt
+ * in configuring lexical reporting, then skip this step.
+ */
+ if (tryLexicalReportingConfig) {
+ boolean configured = true;
- // Some parsers use alternate property for lexical handling (grr...)
- if (!lexicalReporting) {
+ // Setup lexical reporting.
+ boolean lexicalReporting = false;
try {
- parser.setProperty(
- "http://xml.org/sax/properties/lexical-handler",
- contentHandler);
+ parser.setProperty("http://xml.org/sax/handlers/LexicalHandler",
+ contentHandler);
lexicalReporting = true;
} catch (SAXNotSupportedException e) {
// No lexical reporting available
+ configured = false;
} catch (SAXNotRecognizedException e) {
// No lexical reporting available
+ configured = false;
+ }
+
+ // Some parsers use alternate property for lexical handling (grr...)
+ if (!lexicalReporting) {
+ try {
+ parser.setProperty(
+ "http://xml.org/sax/properties/lexical-handler",
+ contentHandler);
+ lexicalReporting = true;
+ } catch (SAXNotSupportedException e) {
+ // No lexical reporting available
+ configured = false;
+ } catch (SAXNotRecognizedException e) {
+ // No lexical reporting available
+ configured = false;
+ }
+ }
+
+ /* If unable to configure this property and fastReconfigure is
+ * enabled, then setup to avoid this code path entirely next time.
+ */
+ if (!lexicalReporting && fastReconfigure) {
+ tryLexicalReportingConfig=false;
}
}
- // Try setting the DeclHandler if entity expansion is off
- if (!expand) {
- try {
- parser.setProperty(
- "http://xml.org/sax/properties/declaration-handler",
- contentHandler);
- } catch (SAXNotSupportedException e) {
- // No lexical reporting available
- } catch (SAXNotRecognizedException e) {
- // No lexical reporting available
+ /* If fastReconfigure is enabled and we failed in the previous attempt
+ * in configuring entity expansion, then skip this step.
+ */
+ if (tryEntityExpandConfig) {
+ boolean configured = true;
+
+ // Try setting the DeclHandler if entity expansion is off
+ if (!expand) {
+ try {
+ parser.setProperty(
+ "http://xml.org/sax/properties/declaration-handler",
+ contentHandler);
+ } catch (SAXNotSupportedException e) {
+ // No lexical reporting available
+ configured = false;
+ } catch (SAXNotRecognizedException e) {
+ // No lexical reporting available
+ configured = false;
+ }
+ }
+
+ /* If unable to configure this property and fastReconfigure is
+ * enabled, then setup to avoid this code path entirely next time.
+ */
+ if (!configured && fastReconfigure) {
+ tryEntityExpandConfig=false;
}
}
}