[jdom-interest] Verbose XHTML 1.1 Doctype
David Dorward
david at dorward.me.uk
Fri Mar 26 23:31:52 PST 2004
On Thu, 2004-03-25 at 08:07, Stein Erik Berget wrote:
> On Wed, 24 Mar 2004 18:47:47 +0000, David Dorward <david at dorward.me.uk>
> wrote:
> > I have a number of XHTML 1.1 documents, all conforming to the same
> > template, which I want to extract some data from and then insert that
> > data into different XHTML 1.1 documents.
> >
> > As a first step I am trying to read in a document and then print it out
> > again without any modification. I've run into two issues:
> >
> > 1. It appears to be downloading the DTD from the w3c website - this
> > takes time and bandwidth.
Thanks to Mr Berget, this issue is now resolved, and it's lightning fast (thanks!).
> > 2. It seems to be expanding the Doctype line (example below).
This one, unfortunately, is still a problem. Does anybody have a solution?
> > Is there any way to stop this? I'd like to leave the Doctype alone and
> > save time on reading the DTD (I don't care about validation - that is
> > handled elsewhere). I couldn't find anything looking at the docs, but I
> > suspect this is due to not knowing what to look for.
My code now looks like this:
import org.apache.xerces.util.XMLCatalogResolver;
import org.jdom.Document;
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;
import org.jdom.output.XMLOutputter;
import java.io.IOException;

public class Parse {
    public static void main(String[] args) {
        // Catalog mapping the XHTML public identifiers to local copies of the DTD files
        String[] catalogs = {"file:///home/david/prog/cms/java/catalog.xml"};
        String file = "/home/david/prog/cms/dorward.me.uk/about/index.html";

        XMLCatalogResolver resolver = new XMLCatalogResolver();
        resolver.setPreferPublic(true);
        resolver.setCatalogList(catalogs);

        // true = validating builder, so the DTD is read (through the catalog)
        SAXBuilder builder = new SAXBuilder(true);
        // Xerces-specific property: use the catalog resolver for external entities
        builder.setProperty(
            "http://apache.org/xml/properties/internal/entity-resolver",
            resolver);

        XMLOutputter outputter = new XMLOutputter();
        try {
            Document doc = builder.build(file);
            try {
                outputter.output(doc, System.out);
            } catch (IOException e) {
                System.err.println("Could not write output: " + e);
            }
        } catch (JDOMException e) {
            // indicates a well-formedness or other parse error
            System.err.println(file + " is not well formed: " + e.getMessage());
        } catch (IOException e) {
            System.err.println("Could not check " + file + " because " + e.getMessage());
        }
    }
}
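For anyone without Xerces' catalog support, the same idea (answering the parser's request for a DTD with a local copy) can be sketched with a plain SAX EntityResolver from the JDK. This is a minimal sketch and not what the code above uses: the class name LocalDtdResolver is hypothetical, and because the XHTML 1.1 DTD pulls in its module files under their own public identifiers, a real map would need an entry for every module kept locally (which is what the catalog does for you).

```java
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical resolver: answers requests for known public identifiers with
// local copies, in the spirit of XMLCatalogResolver.setPreferPublic(true).
class LocalDtdResolver implements EntityResolver {
    private final Map<String, String> publicIdToUri;

    LocalDtdResolver(Map<String, String> publicIdToUri) {
        this.publicIdToUri = publicIdToUri;
    }

    // An override may declare fewer checked exceptions than the interface does.
    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        // Guard against null: some immutable maps reject null keys
        String localUri = (publicId == null) ? null : publicIdToUri.get(publicId);
        if (localUri == null) {
            return null; // unknown entity: the parser falls back to systemId (network)
        }
        InputSource source = new InputSource(localUri);
        source.setPublicId(publicId);
        return source;
    }
}
```

With JDOM it would be plugged in through the standard SAX route, e.g. builder.setEntityResolver(new LocalDtdResolver(Map.of("-//W3C//DTD XHTML 1.1//EN", "file:///home/david/dtd/xhtml11.dtd"))), where the local path is only an example. Returning null for unknown identifiers keeps the parser's default behaviour of fetching the system ID.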
The input document starts:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"><html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
But the output document:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd" [
<!NOTATION w3c-xml PUBLIC "ISO 8879//NOTATION Extensible Markup Language (XML) 1.0//EN">
<!NOTATION cdata PUBLIC "-//W3C//NOTATION XML 1.0: CDATA//EN">
<!NOTATION fpi PUBLIC "ISO 8879:1986//NOTATION Formal Public Identifier//EN">
<!NOTATION length PUBLIC "-//W3C//NOTATION XHTML Datatype: Length//EN">
<!NOTATION linkTypes PUBLIC "-//W3C//NOTATION XHTML Datatype: LinkTypes//EN">
<!NOTATION mediaDesc PUBLIC "-//W3C//NOTATION XHTML Datatype: MediaDesc//EN">
<!NOTATION multiLength PUBLIC "-//W3C//NOTATION XHTML Datatype: MultiLength//EN">
<!NOTATION number PUBLIC "-//W3C//NOTATION XHTML Datatype: Number//EN">
<!NOTATION pixels PUBLIC "-//W3C//NOTATION XHTML Datatype: Pixels//EN">
<!NOTATION script PUBLIC "-//W3C//NOTATION XHTML Datatype: Script//EN">
<!NOTATION text PUBLIC "-//W3C//NOTATION XHTML Datatype: Text//EN">
<!NOTATION character PUBLIC "-//W3C//NOTATION XHTML Datatype: Character//EN">
<!NOTATION charset PUBLIC "-//W3C//NOTATION XHTML Datatype: Charset//EN">
<!NOTATION charsets PUBLIC "-//W3C//NOTATION XHTML Datatype: Charsets//EN">
<!NOTATION contentType PUBLIC "-//W3C//NOTATION XHTML Datatype: ContentType//EN">
<!NOTATION contentTypes PUBLIC "-//W3C//NOTATION XHTML Datatype: ContentTypes//EN">
<!NOTATION datetime PUBLIC "-//W3C//NOTATION XHTML Datatype: Datetime//EN">
<!NOTATION languageCode PUBLIC "-//W3C//NOTATION XHTML Datatype: LanguageCode//EN">
<!NOTATION uri PUBLIC "-//W3C//NOTATION XHTML Datatype: URI//EN">
<!NOTATION uris PUBLIC "-//W3C//NOTATION XHTML Datatype: URIs//EN">
]>
<?doc type="doctype" role="title" { XHTML 1.1 } ?><html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" version="-//W3C//DTD XHTML 1.1//EN">
<head profile="">
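The input has no internal subset, and the extra version and profile attributes and the <?doc ...?> processing instruction are not in it either, which suggests they all come from the DTD being read. If validation really is handled elsewhere, one possible workaround is not to read the external DTD at all. The sketch below checks that idea with the JDK's own SAX parser rather than JDOM: it switches off the Xerces feature http://apache.org/xml/features/nonvalidating/load-external-dtd, records the DOCTYPE exactly as written through a LexicalHandler, and keeps the root element's attributes. The class DoctypeProbe is hypothetical, and the approach assumes the documents use no entities declared only in the DTD (such as &nbsp;), since those cannot be expanded without it.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical helper: records the DOCTYPE as written and the root element's
// attributes, while the external DTD is never loaded.
class DoctypeProbe extends DefaultHandler2 {
    String name;
    String publicId;
    String systemId;
    Attributes rootAttributes; // null until the root element is seen

    @Override
    public void startDTD(String name, String publicId, String systemId) {
        // Reported for the <!DOCTYPE ...> declaration itself, even if the DTD is not read
        this.name = name;
        this.publicId = publicId;
        this.systemId = systemId;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (rootAttributes == null) {
            // Copy: the parser may reuse the Attributes object after this call
            rootAttributes = new AttributesImpl(atts);
        }
    }

    static DoctypeProbe probe(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // Xerces feature, also recognised by the parser bundled with the JDK:
        // do not fetch or read the external DTD (no effect when validating)
        reader.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        DoctypeProbe handler = new DoctypeProbe();
        reader.setContentHandler(handler);
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler;
    }
}
```

In JDOM terms the equivalent would presumably be a non-validating new SAXBuilder(false) with builder.setFeature(...) for the same feature name. Alternatively, keeping the validating builder, replacing the parsed doctype with doc.setDocType(new DocType("html", publicId, systemId)) before output should drop the NOTATION internal subset, though not the defaulted attributes. Both are untested suggestions.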
--
David Dorward <http://dorward.me.uk/>