[jdom-interest] Verbose XHTML 1.1 Doctype
Stein Erik Berget
seberget at escenic.com
Thu Mar 25 00:07:26 PST 2004
On Wed, 24 Mar 2004 18:47:47 +0000, David Dorward <david at dorward.me.uk>
wrote:
> I have a number of XHTML 1.1 documents, all conforming to the same
> template, which I want to extract some data from and then insert that
> data into different XHTML 1.1 documents.
>
> As a first step I am trying to read in a document and then print it out
> again without any modification. I've run into two issues:
>
> 1. It appears to be downloading the DTD from the w3c website - this
> takes time and bandwidth.
>
> 2. It seems to be expanding the Doctype line (example below).
>
> Is there any way to stop this? I'd like to leave the Doctype alone and
> save time on reading the DTD (I don't care about validation - that is
> handled elsewhere). I couldn't find anything looking at the docs, but I
> suspect this is due to not knowing what to look for.
Been there, done that:
// Path to the catalog.xml file
String[] cat = {"file:///catalog.xml"};
XMLCatalogResolver resolver = new XMLCatalogResolver();
resolver.setPreferPublic(true);
resolver.setCatalogList(cat);
SAXBuilder builder = new SAXBuilder(true);
builder.setProperty(
    "http://apache.org/xml/properties/internal/entity-resolver", resolver);
// Build the document
Document document = builder.build(
    new BufferedInputStream(method.getResponseBodyAsStream()));
You will need the following import as well...
import org.apache.xerces.util.XMLCatalogResolver;
This solution uses the XML catalog support in Xerces. My catalog.xml file
looks like this:
<!DOCTYPE catalog PUBLIC
  "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
  "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog" prefer="public">
<public publicId="-//W3C//DTD XHTML 1.1//EN" uri="xhtml11-flat.dtd" />
</catalog>
You can download xhtml11-flat.dtd from the w3.org site at this URL:
http://www.w3.org/TR/xhtml11/DTD/xhtml11-flat.dtd
By using the 'flat' variant you don't have to add all the other referenced
DTDs and modules.
With something similar to this you still get a validated document, and
parsing stays fast because the DTD is read from a local file instead of
being downloaded.
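If you would rather not depend on the Xerces-specific catalog class, roughly
the same mapping from public ID to local file can be sketched with a plain
SAX EntityResolver. This is only a minimal sketch, not a replacement for a
real catalog: the class name LocalDtdResolver and the file:///dtd/... path
used with it are made up for illustration, and it handles nothing beyond a
direct public ID lookup.

```java
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical helper: maps DTD public IDs to local copies so the parser
// does not fetch them over the network.
final class LocalDtdResolver implements EntityResolver {
    // Key: public ID, value: system ID (URL) of the local copy.
    private final Map<String, String> localCopies;

    LocalDtdResolver(Map<String, String> localCopies) {
        this.localCopies = localCopies;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String local = (publicId == null) ? null : localCopies.get(publicId);
        if (local == null) {
            // Unknown entity: returning null lets the parser fall back to
            // its default behaviour (opening the original system ID).
            return null;
        }
        InputSource source = new InputSource(local);
        source.setPublicId(publicId);
        return source;
    }
}
```

Assuming a map entry from "-//W3C//DTD XHTML 1.1//EN" to the local
xhtml11-flat.dtd, an instance could be passed to
builder.setEntityResolver(...) in place of the setProperty call above.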
--
Stein Erik Berget