[jdom-interest] Is JDOM schema checking when it shouldn't be?
Elliotte Rusty Harold
elharo at metalab.unc.edu
Wed Jun 14 15:38:15 PDT 2000
Here's a weird one I encountered while trying to track down a bug. It's
almost certainly a problem inherited from the xerces.jar JDOM bundles.
Consider this simple well-formed but invalid document:
<test xmlns="http://www.jdom.org/">
</test>
When I tried to parse this with a SAXBuilder, JDOM actually attempted to
connect to http://www.jdom.org/ and parse the document it found there.
Naturally, since that document is HTML and not XML I got errors:
D:\speaking\xmldevcon\jdom\examples>java Validator test.xml
[Error] :1:7: Element type "html" must be declared.
[Error] :2:7: Element type "head" must be declared.
[Error] :3:8: Element type "title" must be declared.
[Error] :4:18: Attribute "http-equiv" must be declared for element type "meta".
[Error] :4:41: Attribute "content" must be declared for element type "meta".
[Error] :4:73: Element type "meta" must be declared.
[Fatal Error] :5:7: The element type "meta" must be terminated by the matching end-tag "</meta>".
test.xml is not valid.
null: null
There's no reason for the parser to try to download the document at a
namespace URI, near as I can figure, unless perhaps it's some weird
Xerces behavior with regard to schemas.
Here's the class that tried to parse the file:
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;

public class Validator {

  public static void main(String[] args) {

    if (args.length == 0) {
      System.out.println("Usage: java Validator URL1 URL2...");
      return;
    }

    // Passing true to the constructor turns on validation
    SAXBuilder builder = new SAXBuilder(true);

    // start parsing...
    for (int i = 0; i < args.length; i++) {
      // command line should offer URIs or file names
      try {
        builder.build(args[i]);
        // If there are no well-formedness or validity errors,
        // then no exception is thrown
        System.out.println(args[i] + " is valid.");
      }
      catch (JDOMException e) { // indicates an error
        System.out.println(args[i] + " is not valid.");
        System.out.println(e.getMessage());
      }
    }
  }
}
I'm still trying to track down the details, but using the default
namespace on the root element seems to be a fruitful source of bugs.
This occurs with Xerces 1.0.3 and with whichever version of Xerces is
distributed with JDOMb4. Upgrading to Xerces 1.1.0 fixes the problem.
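
A rough sketch of one way to check whether a given parser asks for any
external resource while reading the document above. This is not the
Validator class from this message: it assumes a JAXP parser obtained from
SAXParserFactory rather than JDOM's SAXBuilder, and the names
NamespaceProbe and Probe are made up for illustration. It installs an
EntityResolver that records every system ID the parser requests and hands
back an empty entity, so nothing is downloaded.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of JDOM or Xerces.
public class NamespaceProbe {

    /** Records the root element's namespace and every entity the parser requests. */
    public static class Probe extends DefaultHandler {
        public final List<String> requests = new ArrayList<>();
        public String rootNamespace;

        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            requests.add(systemId);
            // Return an empty entity so the parser never opens the real URL.
            return new InputSource(new StringReader(""));
        }

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            // Only the first element seen (the root) is recorded.
            if (rootNamespace == null) {
                rootNamespace = uri;
            }
        }
        // Validity errors go to DefaultHandler.error(), which ignores them;
        // only fatal (well-formedness) errors are thrown.
    }

    public static Probe probe(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(true); // mirrors new SAXBuilder(true)
        XMLReader reader = factory.newSAXParser().getXMLReader();

        Probe probe = new Probe();
        reader.setContentHandler(probe);
        reader.setEntityResolver(probe);
        reader.setErrorHandler(probe);
        reader.parse(new InputSource(new StringReader(xml)));
        return probe;
    }

    public static void main(String[] args) throws Exception {
        Probe p = probe("<test xmlns=\"http://www.jdom.org/\">\n</test>");
        System.out.println("root namespace:    " + p.rootNamespace);
        System.out.println("external requests: " + p.requests);
    }
}
```

If the list of requests is non-empty for this document, the parser is
asking for something it has no reason to need. A parser that opens the
namespace URI directly, without consulting the EntityResolver, would not
be caught by this check.");
if (!"http://www.jdom.org/".equals(p.rootNamespace)) throw new AssertionError("unexpected root namespace: " + p.rootNamespace);
if (!p.requests.isEmpty()) throw new AssertionError("parser requested external resources: " + p.requests);
</test>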
--
+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo at metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
| Java I/O (O'Reilly & Associates, 1999) |
| http://metalab.unc.edu/javafaq/books/javaio/ |
| http://www.amazon.com/exec/obidos/ISBN=1565924851/cafeaulaitA/ |
+----------------------------------+---------------------------------+
| Read Cafe au Lait for Java News: http://metalab.unc.edu/javafaq/ |
| Read Cafe con Leche for XML News: http://metalab.unc.edu/xml/ |
+----------------------------------+---------------------------------+