[jdom-interest] Parsing a MODS-document with validation fails

Thomas Scheffler thomas.scheffler at uni-jena.de
Fri Jul 22 14:08:55 PDT 2011


On 22.07.2011 22:53, Bradley S. Huffman wrote:
> I'm not sure about a patch that makes up a namespace prefix.  From the patch
>
>            nsPrefixCount++;
>            ns = Namespace.getNamespace("ns" + nsPrefixCount, attUri);
>
>
> Seems like a kludge.  My gut says it's something else.

This is how the Oracle Java DocumentBuilder does it. JDOM won't accept
an attribute namespace without a prefix, so one has to be generated,
because the SAX parser delivers the attribute with QName = LocalName.
Before generating anything, I look through the namespaces that are
already declared, so that any prefix already bound to the URI is reused
instead of a new one being built. For the test case I submitted in my
original mail, "xlink" is found correctly, which gives a cleaner result
than the DocumentBuilder solution, which creates "ns0" on every element
whose xlink:type is fixed to "simple". I hope you can follow my
argument.
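
To make the lookup order concrete, here is a minimal sketch of the idea.
It is not the patch itself and does not use JDOM classes: the class
AttributePrefixResolver, its methods, and the plain Map of declared
prefixes are hypothetical stand-ins, and the namespace URIs in main are
only sample values. Skipping generated names that are already taken is
an extra safeguard I assume here; only the "ns" + counter naming comes
from the quoted patch lines.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not JDOM code): chooses a prefix for an attribute
 * whose namespace URI is known but whose QName arrived without a prefix.
 */
final class AttributePrefixResolver {
    // Prefix -> URI bindings seen so far, e.g. collected from startPrefixMapping.
    private final Map<String, String> declared = new LinkedHashMap<>();
    private int nsPrefixCount = 0;

    void declare(String prefix, String uri) {
        declared.put(prefix, uri);
    }

    String prefixFor(String attUri) {
        // 1. Reuse any non-empty prefix that is already bound to this URI.
        for (Map.Entry<String, String> e : declared.entrySet()) {
            if (!e.getKey().isEmpty() && e.getValue().equals(attUri)) {
                return e.getKey();
            }
        }
        // 2. Otherwise generate "ns1", "ns2", ... and skip names already in use
        //    (assumption: the generated prefix must not shadow a declared one).
        String candidate;
        do {
            nsPrefixCount++;
            candidate = "ns" + nsPrefixCount;
        } while (declared.containsKey(candidate));
        declared.put(candidate, attUri);
        return candidate;
    }

    public static void main(String[] args) {
        AttributePrefixResolver r = new AttributePrefixResolver();
        r.declare("", "http://www.loc.gov/mods/v3");          // default namespace, never used for attributes
        r.declare("xlink", "http://www.w3.org/1999/xlink");
        System.out.println(r.prefixFor("http://www.w3.org/1999/xlink")); // prints "xlink"
        System.out.println(r.prefixFor("urn:example:unbound"));          // prints "ns1"
    }
}
```

The default namespace is skipped on purpose in step 1, since an
attribute without a prefix is in no namespace at all.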

regards,

Thomas

> On Fri, Jul 22, 2011 at 3:12 PM, Jason Hunter<jhunter at servlets.com>  wrote:
>> Thanks, Thomas.  I'll integrate it.
>>
>> Anyone else sitting on a bug that could get fixed in 1.1.2?
>>
>> -jh-
>>
>> On Jul 22, 2011, at 12:12 AM, Thomas Scheffler wrote:
>>
>>> On 21.07.2011 10:14, Thomas Scheffler wrote:
>>>> On 21.07.2011 04:18, Bradley S. Huffman wrote:
>>>>> Which version of JDOM?  My first guess is it is something in XMLOutputter.
>>>> This is the latest and greatest 1.1.1. I would not suspect XMLOutputter here as it usually does not have any problems with namespaces. This seems to be a parsing issue.
>>> It is a bug in the SAXHandler class: attributes in a different namespace are detected only by their QName and not by their namespace URI. I attached a patch that fixes this bug.
>>> It would be great if this could be integrated and released soon in a version 1.1.2.
>>>
>>> regards
>>>
>>> Thomas Scheffler
>>>
>>>>> On Wed, Jul 20, 2011 at 8:23 AM, Thomas Scheffler
>>>>> <thomas.scheffler at uni-jena.de>    wrote:
>>>>>> Hi,
>>>>>>
>>>>>> if I parse a valid MODS document with XML Schema validation, JDOM changes
>>>>>> attributes because it does not handle the schema's default values correctly
>>>>>> (it ignores the namespace).
>>>>>>
>>>>>> Here is a short code snippet that demonstrates this:
>>>>>>
>>>>>> SAXBuilder builder = new SAXBuilder(true);
>>>>>> builder.setFeature("http://xml.org/sax/features/namespaces", true);
>>>>>> builder.setFeature("http://xml.org/sax/features/namespace-prefixes", true);
>>>>>> builder.setFeature("http://apache.org/xml/features/validation/schema", true);
>>>>>>
>>>>>> Document document = builder.build(new URL("http://academiccommons.columbia.edu/download/fedora_content/show_pretty/ac:111060/CONTENT/ac111060_description.xml"));
>>>>>> XMLOutputter xout = new XMLOutputter(Format.getPrettyFormat());
>>>>>> xout.output(document, System.out);
>>>>>>
>>>>>> Here is a result fragment:
>>>>>>
>>>>>> <name type="simple">
>>>>>> <namePart type="family">Edwards</namePart>
>>>>>> <namePart type="given">Stephen A.</namePart>
>>>>>> <role>
>>>>>> <roleTerm type="text">author</roleTerm>
>>>>>> </role>
>>>>>> <affiliation>Columbia University. Computer Science</affiliation>
>>>>>> </name>
>>>>>>
>>>>>> If you look at the original document, you can see that the @type of name is
>>>>>> "personal". The "simple" comes from the xlink XML Schema that is included
>>>>>> by the MODS schema. Therefore the result fragment should look like this:
>>>>>>
>>>>>> <name type="personal" xlink:type="simple">
>>>>>> <namePart type="family">Edwards</namePart>
>>>>>> <namePart type="given">Stephen A.</namePart>
>>>>>> <role>
>>>>>> <roleTerm type="text">author</roleTerm>
>>>>>> </role>
>>>>>> <affiliation>Columbia University. Computer Science</affiliation>
>>>>>> </name>
>>>>>>
>>>>>> If I use the DOM from Java, this is handled correctly (though a bit ugly, as
>>>>>> it does not use the namespace prefix already defined).
>>>>>>
>>>>>> Could someone just fix this, please?

