[jdom-interest] Re: Substituting a different <!DOCTYPE ...> when parsing an XML file

Jason Hunter jhunter at acm.org
Fri Jun 7 10:27:58 PDT 2002


Your idea is a good one, and it solves a problem I've had too.  

The solution really belongs at the parser stage, because it's the
parsers doing the validation.  As soon as parsers provide this, JDOM
will expose it.  

If you want to be manual about it and write a special input stream that
looks for <!DOCTYPE> and replaces the tags before the parser sees it,
that'd be a fine workaround and would work for any parser.  If you or
someone wants to write it, it'd probably be a popular utility class.
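
For illustration only, here is a minimal sketch of what such a utility
could look like. Nothing in it is part of JDOM: the class name
DocTypeReplacer and its methods replace() and wrap() are made up for
this example. It assumes the document is UTF-8 and small enough to
buffer in memory, and it finds the DOCTYPE with a regular expression
rather than a real XML scanner, so a "<!DOCTYPE" string inside a comment
or CDATA section, or a ']' inside an internal subset, would confuse it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not a JDOM class): swaps the DOCTYPE before parsing. */
final class DocTypeReplacer {

    // <!DOCTYPE ...> with an optional [internal subset].
    // Assumption: no '>' inside the quoted IDs, no ']' inside the subset.
    private static final Pattern DOCTYPE =
        Pattern.compile("<!DOCTYPE[^\\[>]*(\\[[^\\]]*\\])?[^>]*>");

    // The XML declaration, which must be the very first thing if present.
    private static final Pattern XML_DECL =
        Pattern.compile("^<\\?xml[^>]*\\?>");

    /** Returns xml with its DOCTYPE replaced, or with one added if it had none. */
    static String replace(String xml, String rootName, String systemId) {
        String decl = "<!DOCTYPE " + rootName + " SYSTEM \"" + systemId + "\">";

        Matcher existing = DOCTYPE.matcher(xml);
        if (existing.find()) {
            // Case 1: the document has some DOCTYPE, whatever it is; overwrite it.
            return xml.substring(0, existing.start()) + decl
                 + xml.substring(existing.end());
        }

        // Case 2: no DOCTYPE at all; insert one after the XML declaration,
        // or at the very start when there is no declaration either.
        Matcher xmlDecl = XML_DECL.matcher(xml);
        int at = xmlDecl.find() ? xmlDecl.end() : 0;
        return xml.substring(0, at) + decl + xml.substring(at);
    }

    /** Buffers the whole stream (assumed UTF-8) and returns a rewritten copy. */
    static InputStream wrap(InputStream in, String rootName, String systemId)
            throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        String xml = new String(buf.toByteArray(), StandardCharsets.UTF_8);
        String fixed = replace(xml, rootName, systemId);
        return new ByteArrayInputStream(fixed.getBytes(StandardCharsets.UTF_8));
    }
}
```

The result could then be handed to a validating builder, along the
lines of:

    Document doc = new SAXBuilder( true ).build(
        DocTypeReplacer.wrap( new FileInputStream( "countries.xml" ),
            "countries", "http://www.sillyfish.com/countries.dtd" ) );

A production version would want to stream rather than buffer, and to
honor the encoding named in the XML declaration.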

-jh-

Geoff Rimmer wrote:
> 
> "Sean Huo" <sqh at qad.com> writes:
> >
> > [Substituting a different <!DOCTYPE ...> when parsing an XML file]
> >
> > A better solution is to provide an EntityResolver.
> >
> > In your implementation of the EntityResolver, you have complete control
> > over how you want to resolve the DTD reference.
> >
> > Here is a code fragment for parsing an XML document using JDOM.
> >
> > SAXBuilder builder = new SAXBuilder(true);
> > // provide your own version of EntityResolver
> > builder.setEntityResolver(new MyEntityResolver());
> > builder.build( ...);
> 
> As I understand it, using an EntityResolver for replacing a DOCTYPE is
> only possible if you know what DOCTYPE you are looking for.  In other
> words, if you know a document contains a DOCTYPE with a particular
> system ID, you just create an EntityResolver which returns a
> replacement DTD every time it matches this system ID.
> 
> But the problem I was referring to in my original post was for the
> following situations:
> 
> 1. You are reading an XML document which contains a DOCTYPE, but you
>    do *not* know what that DOCTYPE is.  In this case, the
>    EntityResolver does not know what public/system IDs to look for,
>    and so is unable to replace the DOCTYPE.
> 
> 2. You are reading an XML document that does *not* contain a DOCTYPE
>    at all.  In this case, builder.build() will throw an exception
>    because it cannot perform validation if there is no DOCTYPE to
>    validate against.
> 
> This is why I think JDOM should at the very least provide:
> 
>     package org.jdom.input;
> 
>     class DocTypeReplacerInputStream extends FilterInputStream
>     {
>         public DocTypeReplacerInputStream( InputStream is, DocType docType )
>         {
>             ....
>         }
> 
>         public int read() throws IOException
>         {
>             ....
>         }
>     }
> 
> which can be used as follows:
> 
>     DocType docType = new DocType(
>         "countries", "http://www.sillyfish.com/countries.dtd" );
> 
>     Document doc = new SAXBuilder( true ).build(
>         new DocTypeReplacerInputStream(
>             new FileInputStream( "countries.xml" ), docType ) );
> 
> to force validation against a DTD specified by the application.
> 
> In addition to providing this DocTypeReplacerInputStream class, I
> think it would be such a useful thing to have, that the following
> methods should be added to class SAXBuilder:
> 
>     public Document build( InputStream is, DocType docType );
>     public Document build( URL url, DocType docType );
>     public Document build( File file, DocType docType );
> 
> which would behave the same way as their equivalent versions without
> the DocType parameter, except that they would validate against the
> specified DocType rather than the one (if any) in the document.  You
> could then write code like this:
> 
>     DocType docType = new DocType(
>         "countries", "http://www.sillyfish.com/countries.dtd" );
> 
>     Document doc = new SAXBuilder( true )
>         .build( new FileInputStream( "countries.xml" ), docType );
> 
> --
> Geoff Rimmer <> geoff.rimmer at sillyfish.com <> www.sillyfish.com
> www.sillyfish.com/phone - Make savings on your BT and Telewest phone calls
> UPDATED 09/05/2002: 508 destinations, 12 schemes (with contact details)




