[jdom-interest] Improving performance of SAX parser configuration
Jason Hunter
jhunter at servlets.com
Thu May 7 18:34:15 PDT 2009
Hi Scott,
Thanks for sending in what looks like a really good improvement! I
plan to add this to the codebase for the next release. If anyone has
issues, speak up now.
-jh-
On May 7, 2009, at 10:44 AM, Scott Emmons wrote:
> Greetings jdom-interest,
>
> I've run across an interesting performance issue in the way JDOM
> handles Xerces parser configuration even when reuseParser is enabled
> in SAXBuilder, and I wanted to run this by the list - not only for
> validation, but hopefully something along the lines of this
> improvement can get rolled in (yep, I know JDOM is in maintenance
> mode).
>
> For a bit of background, the particular case we have involves parsing
> lots and lots of little XML document fragments via SAXBuilder.build()
> - not terribly efficient, but for what we use JDOM for it's a
> pre-existing condition that we're stuck with.
>
> What I found is that more time was spent in configureParser() than in
> actually parsing the XML. The reason is that JDOM attempts to set
> properties on the parser that don't exist in Xerces - or at least in the
> version of it we are using. Each such attempt results in a
> SAXNotRecognizedException. Exceptions are expensive, plus Xerces does
> a ResourceBundle lookup for each one. While we do set reuseParser, each
> execution of build() still reconfigures the underlying parser.
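One rough way to observe this cost on a given JVM is to call setProperty() repeatedly on a single reused reader with a name the parser rejects, and time the loop. The sketch below is a hypothetical micro-benchmark (the class RejectedPropertyCost is not part of JDOM). It uses the JDK's bundled SAX parser obtained through JAXP rather than a standalone Xerces jar, and assumes that parser rejects the legacy "http://xml.org/sax/handlers/LexicalHandler" name that JDOM tries first. Absolute timings will vary by JVM and hardware.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical micro-benchmark: the cost of setProperty() calls the parser rejects.
final class RejectedPropertyCost {
    // Legacy name for the lexical handler; assumed to be unrecognized by the
    // JDK's bundled (Xerces-derived) SAX parser.
    static final String LEGACY_LEXICAL = "http://xml.org/sax/handlers/LexicalHandler";

    /** Creates one reader via JAXP, wrapping the checked exceptions. */
    static XMLReader newReader() {
        try {
            return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("could not create a SAX parser", e);
        }
    }

    /** Calls setProperty() repeatedly on the same reader, as a reused parser
     *  would see across many build() calls, and counts the rejections. */
    static int countRejections(XMLReader reader, int attempts) {
        DefaultHandler2 handler = new DefaultHandler2(); // implements LexicalHandler
        int rejected = 0;
        for (int i = 0; i < attempts; i++) {
            try {
                reader.setProperty(LEGACY_LEXICAL, handler);
            } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
                rejected++; // each rejection is a newly thrown exception
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        XMLReader reader = newReader();
        int attempts = 100_000;
        long start = System.nanoTime(); // time only the loop, not parser creation
        int rejected = countRejections(reader, attempts);
        long elapsedMicros = (System.nanoTime() - start) / 1_000;
        System.out.println(rejected + " of " + attempts + " calls rejected in "
                + elapsedMicros + " microseconds");
    }
}
```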
>
> I know that the contentHandler, and perhaps other options, are not
> reusable, and this doesn't change the semantics of that. Since the
> underlying parser is unlikely to suddenly start supporting an option
> it didn't support before, it's possible to remember whether or not the
> underlying parser implementation was able to support a property, and
> skip attempting to configure it if not. I wired this as a specific
> option only used with reuseParser to be safe, but it's possible this
> could be done in a more generic manner that would benefit other
> codepaths and usages as well (it would simplify my patch somewhat, but I
> wanted to be safe since there may be other consequences of this which
> I've overlooked).
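The more generic variant mentioned above could look roughly like a small wrapper that remembers, per reused reader, which property names were rejected, and skips them on later calls. This is a hypothetical helper (RejectedPropertyCache is not a JDOM or SAX class), and "http://example.com/hypothetical-property" is an invented name assumed to be unrecognized. It relies on the same assumption as the text: a parser will not start accepting a property it rejected earlier. The attempts counter exists only to make the skipping observable.

```java
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

/** Hypothetical helper (not part of JDOM or SAX): remembers which property
 *  names one reused reader rejected and skips them on later calls. */
final class RejectedPropertyCache {
    private final XMLReader reader;
    private final Set<String> rejected = new HashSet<>();
    private int attempts = 0; // real setProperty() calls, to make skipping visible

    RejectedPropertyCache(XMLReader reader) {
        this.reader = reader;
    }

    /** Returns true if the property was set, false if it is (or was) rejected. */
    boolean trySetProperty(String name, Object value) {
        if (rejected.contains(name)) {
            return false; // known failure: no call, no exception
        }
        attempts++;
        try {
            reader.setProperty(name, value);
            return true;
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            rejected.add(name); // assume the parser will keep rejecting this name
            return false;
        }
    }

    int attempts() {
        return attempts;
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        RejectedPropertyCache cache = new RejectedPropertyCache(reader);
        // Hypothetical property name, assumed to be unrecognized by the parser.
        String unknown = "http://example.com/hypothetical-property";
        for (int build = 0; build < 1_000; build++) {
            cache.trySetProperty(unknown, "value");
        }
        System.out.println("real setProperty() attempts: " + cache.attempts());
    }
}
```

Like the patch, this treats SAXNotSupportedException the same as SAXNotRecognizedException. Because "not supported" can depend on the value passed or on the parser's current state, a more conservative version might cache only SAXNotRecognizedException.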
>
> Again, I wouldn't expect this to help cases where larger XML is
> handled less frequently, but for my case where it's hundreds of XML
> fragments per transaction per second, this fix reduces the execution
> time of SAXBuilder.build() by about 1/2.
>
> I would love to hear any feedback as well as find out if anyone else
> sees the same sort of performance improvement I've seen with this
> patch in cases where lots of small documents are parsed.
>
> Thanks for your time,
> -Scott
>
> ===CUT HERE===
> diff --git a/src/java/org/jdom/input/SAXBuilder.java b/src/java/org/jdom/input/SAXBuilder.java
> index 09fbb00..1627345 100644
> --- a/src/java/org/jdom/input/SAXBuilder.java
> +++ b/src/java/org/jdom/input/SAXBuilder.java
> @@ -134,6 +134,15 @@ public class SAXBuilder {
> /** User-specified properties to be set on the SAX parser */
> private HashMap properties = new HashMap(5);
>
> + /** Whether to use fast parser reconfiguration */
> + private boolean fastReconfigure = false;
> +
> + /** Whether to try lexical reporting in fast parser reconfiguration */
> + private boolean tryLexicalReportingConfig = true;
> +
> + /** Whether to try entity expansion in fast parser reconfiguration */
> + private boolean tryEntityExpandConfig = true;
> +
> /**
> * Whether parser reuse is allowed.
> * <p>Default: <code>true</code></p>
> @@ -396,6 +405,25 @@ public class SAXBuilder {
> }
>
> /**
> + * Specifies whether this builder will do fast reconfiguration of the
> + * underlying SAX parser when reuseParser is true. This improves
> + * performance in cases where SAXBuilders are reused and lots of small
> + * documents are frequently parsed. It avoids re-attempting, on each
> + * call to build(), to set parser properties that were already rejected
> + * with a SAXNotRecognizedException or SAXNotSupportedException. This
> + * should ONLY be set for builders where this specific case is an issue.
> + * The default value of this setting is <code>false</code> (no fast
> + * reconfiguration). If reuseParser is false, calling this has no effect.
> + *
> + * @param fastReconfigure Whether to enable fast reconfiguration.
> + */
> + public void setFastReconfigure(boolean fastReconfigure) {
> + if (this.reuseParser) {
> + this.fastReconfigure = fastReconfigure;
> + }
> + }
> +
> + /**
> * This sets a feature on the SAX parser. See the SAX documentation for
> * </p>
> * <p>
> @@ -657,42 +685,76 @@ public class SAXBuilder {
> parser.setErrorHandler(new BuilderErrorHandler());
> }
>
> - // Setup lexical reporting.
> - boolean lexicalReporting = false;
> - try {
> - parser.setProperty("http://xml.org/sax/handlers/LexicalHandler",
> - contentHandler);
> - lexicalReporting = true;
> - } catch (SAXNotSupportedException e) {
> - // No lexical reporting available
> - } catch (SAXNotRecognizedException e) {
> - // No lexical reporting available
> - }
> + /* If fastReconfigure is enabled and we failed in the previous attempt
> + * in configuring lexical reporting, then skip this step.
> + */
> + if (tryLexicalReportingConfig) {
> + boolean configured = true;
>
> - // Some parsers use alternate property for lexical handling (grr...)
> - if (!lexicalReporting) {
> + // Setup lexical reporting.
> + boolean lexicalReporting = false;
> try {
> - parser.setProperty(
> - "http://xml.org/sax/properties/lexical-handler",
> - contentHandler);
> + parser.setProperty("http://xml.org/sax/handlers/LexicalHandler",
> + contentHandler);
> lexicalReporting = true;
> } catch (SAXNotSupportedException e) {
> // No lexical reporting available
> + // Not final yet: the alternate property name is tried below
> } catch (SAXNotRecognizedException e) {
> // No lexical reporting available
> + // Not final yet: the alternate property name is tried below
> + }
> +
> + // Some parsers use alternate property for lexical handling (grr...)
> + if (!lexicalReporting) {
> + try {
> + parser.setProperty(
> + "http://xml.org/sax/properties/lexical-handler",
> + contentHandler);
> + lexicalReporting = true;
> + } catch (SAXNotSupportedException e) {
> + // No lexical reporting available
> + configured = false;
> + } catch (SAXNotRecognizedException e) {
> + // No lexical reporting available
> + configured = false;
> + }
> + }
> +
> + /* If unable to configure this property and fastReconfigure is
> + * enabled, then set up to avoid this code path entirely next time.
> + */
> + if (!configured && fastReconfigure) {
> + tryLexicalReportingConfig=false;
> }
> }
>
> - // Try setting the DeclHandler if entity expansion is off
> - if (!expand) {
> - try {
> - parser.setProperty(
> - "http://xml.org/sax/properties/declaration-handler",
> - contentHandler);
> - } catch (SAXNotSupportedException e) {
> - // No lexical reporting available
> - } catch (SAXNotRecognizedException e) {
> - // No lexical reporting available
> + /* If fastReconfigure is enabled and we failed in the previous attempt
> + * in configuring entity expansion, then skip this step.
> + */
> + if (tryEntityExpandConfig) {
> + boolean configured = true;
> +
> + // Try setting the DeclHandler if entity expansion is off
> + if (!expand) {
> + try {
> + parser.setProperty(
> + "http://xml.org/sax/properties/declaration-handler",
> + contentHandler);
> + } catch (SAXNotSupportedException e) {
> + // No declaration handler support available
> + configured = false;
> + } catch (SAXNotRecognizedException e) {
> + // No declaration handler support available
> + configured = false;
> + }
> + }
> +
> + /* If unable to configure this property and fastReconfigure is
> + * enabled, then set up to avoid this code path entirely next time.
> + */
> + if (!configured && fastReconfigure) {
> + tryEntityExpandConfig=false;
> }
> }
> }
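Outside JDOM, the lexical-handler step of this patch can be modelled as a two-name fallback guarded by a skip flag. In the hypothetical sketch below (LexicalHandlerConfigurer is not a JDOM class), the flag is cleared only when neither the legacy name nor the standard SAX2 name is accepted, so a parser that supports just the standard name keeps receiving each new handler; since the content handler is not reusable, a fresh one is registered per build. It reuses the two property URIs from the patch and assumes the JDK's bundled SAX parser rejects the legacy name and accepts the standard one.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;
import org.xml.sax.ext.LexicalHandler;

/** Hypothetical model of the lexical-handler step for one reused reader. */
final class LexicalHandlerConfigurer {
    private static final String[] NAMES = {
        "http://xml.org/sax/handlers/LexicalHandler",   // legacy name, tried first
        "http://xml.org/sax/properties/lexical-handler" // standard SAX2 name
    };
    private final XMLReader reader;
    private boolean tryLexicalReportingConfig = true; // mirrors the patch's flag

    LexicalHandlerConfigurer(XMLReader reader) {
        this.reader = reader;
    }

    /** Registers the handler under the first accepted name; true on success. */
    boolean configure(LexicalHandler handler) {
        if (!tryLexicalReportingConfig) {
            return false; // every name failed before: skip without throwing
        }
        for (String name : NAMES) {
            try {
                reader.setProperty(name, handler);
                return true;
            } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
                // fall through to the next name
            }
        }
        tryLexicalReportingConfig = false; // give up only when every name failed
        return false;
    }

    boolean willTryAgain() {
        return tryLexicalReportingConfig;
    }

    public static void main(String[] args) throws ParserConfigurationException, SAXException {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        LexicalHandlerConfigurer configurer = new LexicalHandlerConfigurer(reader);
        for (int build = 0; build < 3; build++) {
            DefaultHandler2 handler = new DefaultHandler2(); // fresh handler per build
            boolean ok = configurer.configure(handler);
            System.out.println("build " + build + ": configured=" + ok
                    + ", registered=" + (reader.getProperty(NAMES[1]) == handler));
        }
    }
}
```

With the JDK parser, each configure() call in this sketch still pays for one rejected legacy-name attempt; remembering which of the two names succeeded would avoid that cost as well.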