[jdom-interest] Stopping a parse

Alex Rosen arosen at silverstream.com
Mon Jun 18 13:17:07 PDT 2001


I need to be able to identify an XML document, but not process it. Usually, the
document type can be determined by just looking at the beginning of a
document - the DOCTYPE, or the root element type, or the root element
attributes. I'm trying to avoid having to parse the whole document, just to get
the first part of it. I do this by throwing a special RootFoundException after
the root element is found. The catch block around builder.build() checks for
this special exception, and knows that it's not actually a malfunction, but
just a way to stop the parsing. It grabs the JDOM document that had been built
before the exception happened, which contains everything up to the root element
and its attributes, but none of its children. In addition to performance, this
lets me identify XML documents that are not well-formed, as long as they're
well-formed up to the root element.
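
As a rough, JDOM-free illustration of the same trick, here is a minimal sketch that uses only the plain SAX API bundled with the JDK. The names RootSniffer, StopParsing and RootHandler are made up for this example; they are not part of JDOM or of the code in the P.S. It records only the root element's name, not a partial JDOM document, and it assumes the parser hands the handler's SAXException back to the caller of parse().

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class RootSniffer {
    // Hypothetical marker exception: it means "stop now", not "something broke".
    static class StopParsing extends SAXException {
        StopParsing() {
            super("Root element found; parse stopped on purpose.");
        }
    }

    // Remembers the root element's name, then aborts the parse.
    static class RootHandler extends DefaultHandler {
        String rootName;

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes attributes) throws SAXException {
            // The first startElement event is always the root element.
            rootName = qName;
            throw new StopParsing();
        }
    }

    // Returns the root element's name without reading past its start tag.
    static String rootElementName(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        RootHandler handler = new RootHandler();
        try {
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException ex) {
            // No root seen yet: this is a genuine parse error, so pass it on.
            if (handler.rootName == null)
                throw ex;
            // Otherwise the handler stopped the parse deliberately.
        }
        return handler.rootName;
    }
}
```

Calling RootSniffer.rootElementName(xml) on a document whose root start tag is intact should return the root's name even when the rest of the document is malformed, because the parser never gets that far.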

To do this, I had to use my own SAXHandler, which in turn meant modifying
SAXBuilder so that it would let me supply one.

My questions are:

(1) Is this an OK thing to be doing in general? Are there any problems that
might be caused by stopping the parse by throwing an exception?

(2) Is there a cleaner way to do this? Would it be better to use a custom
JDOMFactory instead, since there's already a facility for setting your own? If
so, I'd have to throw a RuntimeException instead, which I'm not wild about.
Also, SAXHandler does enough processing before it calls the JDOMFactory that
I'm worried it wouldn't behave identically.

(3) Should we add the ability to easily set your own SAXHandler? Right now, you
have to copy a ton of code from SAXBuilder.build() to do this, which is not
good.

Alex

P.S. Here's the code so far (modified a little to stand alone better):

(1) I modified SAXBuilder.build() to do:
            contentHandler = getHandler(factory);
instead of:
            contentHandler = new SAXHandler(factory);

(2) And I added a default handler creator:

	protected SAXHandler getHandler(JDOMFactory factory) throws IOException
	{
		return new SAXHandler(factory);
	}

(3) In my class that identifies XML files, this is the parsing code:

		try
		{
			m_document = builder.build(in);
			m_gotEntireDocument = true;
		}
		catch(JDOMException ex)
		{
			m_document = builder.getDocument();
			m_gotEntireDocument = false;
			// If the exception happened before the root element, then m_document will
			// be null, in which case we deal with it as an actual error.
			if (m_document == null)
				throw ex;
		}

(4) And I added these inner classes:

			// Subclass of SAXBuilder that creates our special SAXHandler.
			private class StoppableSAXBuilder extends OurSAXBuilder
			{
				protected SAXHandler getHandler(JDOMFactory factory) throws IOException
				{
					m_handler = new StoppableSAXHandler(factory);
					return m_handler;
				}

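				// If we've at least gotten past the root element's start tag, then we
				// return the document that's been built so far. The null check covers
				// a failure that happens before the handler has been created.
				Document getDocument()
				{
					if (m_handler != null && m_handler.m_gotRootElementEnd)
						return m_handler.getDocument();
					else
						return null;
				}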

				private StoppableSAXHandler		m_handler;
			}

			// Subclass of SAXHandler that stops the build process after the
			// root element has been found.
			private class StoppableSAXHandler extends SAXHandler
			{
				public StoppableSAXHandler(JDOMFactory factory) throws IOException
				{
					super(factory);
				}

				public void startElement(String uri, String localName, String qName,
							Attributes attributes) throws SAXException
				{
					checkRoot();
					// Record the fact that we've gotten to the root element.
					if (!m_gotRootElementStart)
						m_gotRootElementStart = true;
					super.startElement(uri, localName, qName, attributes);
				}

				public void characters(char[] ch, int start, int length)
							throws SAXException
				{
					checkRoot();
					super.characters(ch, start, length);
				}

				public void ignorableWhitespace(char[] ch, int start, int length)
							throws SAXException
				{
					checkRoot();
					super.ignorableWhitespace(ch, start, length);
				}
				public void comment(char[] ch, int start, int length) throws SAXException
				{
					checkRoot();
					super.comment(ch, start, length);
				}
				public void startEntity(String name) throws SAXException
				{
					checkRoot();
					super.startEntity(name);
				}
				public void processingInstruction(String target, String data)
							throws SAXException
				{
					checkRoot();
					super.processingInstruction(target, data);
				}

				private void checkRoot() throws RootFoundException
				{
					// If we've already seen the root's start tag, then this next event
					// means the start tag (and its attributes) has been fully processed.
					// Record that fact, and throw a RootFoundException if we only need
					// the root node.
					if (m_gotRootElementStart && !m_gotRootElementEnd)
					{
						m_gotRootElementEnd = true;
						if (!m_needEntireTree)
							throw new RootFoundException();
					}
				}

				private boolean			m_gotRootElementStart;
				private boolean			m_gotRootElementEnd;
			}

			private static class RootFoundException extends SAXException
			{
				RootFoundException()
				{
					// This should never be visible to the user.
					super("Root element was successfully found. (This is not an error.)");
				}
			}



