[jdom-interest] Parsing a MODS-document with validation fails
Thomas Scheffler
thomas.scheffler at uni-jena.de
Fri Jul 22 14:08:55 PDT 2011
On 22.07.2011 22:53, Bradley S. Huffman wrote:
> I'm not sure about a patch that makes up a namespace prefix. From the patch
>
> nsPrefixCount++;
> ns = Namespace.getNamespace("ns" + nsPrefixCount, attUri);
>
>
> Seems like a kludge. My gut says it's something else.
This is how the Oracle Java DocumentBuilder does it. JDOM won't accept
a namespace without a prefix on an attribute, so a prefix has to be
made up, because the SAX parser delivers the attribute with
QName=LocalName. Before doing that, I look through the namespaces that
are already declared, so that any prefix already bound to the URI is
used before a new one is built. For the test case I submitted in my
original mail, "xlink" is found correctly, which makes the result nicer
than the DocumentBuilder solution, which creates "ns0" on every element
where xlink:type is fixed to "simple". I hope you can follow my
argument.
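
The lookup-then-generate idea described above can be sketched roughly as follows. This is not the patch itself and does not use the JDOM API: the class PrefixResolver and its methods declare() and prefixFor() are hypothetical names, a plain map stands in for the namespaces in scope, and remembering a generated prefix for later reuse is an assumption of the sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of JDOM: picks a prefix for an attribute
// whose namespace URI is known but whose QName carries no prefix.
class PrefixResolver {
    // prefix -> URI for the declarations currently in scope
    private final Map<String, String> declared = new LinkedHashMap<>();
    private int nsPrefixCount = 0;

    void declare(String prefix, String uri) {
        declared.put(prefix, uri);
    }

    String prefixFor(String attUri) {
        // 1. Prefer a prefix that is already bound to this URI.
        //    The empty prefix (default namespace) is skipped, because
        //    a namespaced attribute needs a real prefix.
        for (Map.Entry<String, String> e : declared.entrySet()) {
            if (!e.getKey().isEmpty() && e.getValue().equals(attUri)) {
                return e.getKey();
            }
        }
        // 2. Otherwise make one up ("ns1", "ns2", ...), avoiding clashes
        //    with prefixes that are already declared.
        String prefix;
        do {
            nsPrefixCount++;
            prefix = "ns" + nsPrefixCount;
        } while (declared.containsKey(prefix));
        // Assumption of this sketch: remember the generated binding so the
        // same URI gets the same prefix next time.
        declared.put(prefix, attUri);
        return prefix;
    }
}
```

With "xlink" declared for http://www.w3.org/1999/xlink, prefixFor() returns "xlink" for that URI; a generated prefix is only used for a URI that has no declared prefix.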
regards,
Thomas
> On Fri, Jul 22, 2011 at 3:12 PM, Jason Hunter<jhunter at servlets.com> wrote:
>> Thanks, Thomas. I'll integrate it.
>>
>> Anyone else sitting on a bug that could get fixed in 1.1.2?
>>
>> -jh-
>>
>> On Jul 22, 2011, at 12:12 AM, Thomas Scheffler wrote:
>>
>>> On 21.07.2011 10:14, Thomas Scheffler wrote:
>>>> On 21.07.2011 04:18, Bradley S. Huffman wrote:
>>>>> Which version of JDOM? My first guess is it is something in XMLOutputter.
>>>> This is the latest and greatest 1.1.1. I would not suspect XMLOutputter here as it usually does not have any problems with namespaces. This seems to be a parsing issue.
>>> It is a bug in the SAXHandler class: an attribute in a different namespace is only detected by its QName and not by its namespace URI. I attached a patch that fixes this bug.
>>> It would be great if this could be integrated and released soon in a version 1.1.2.
>>>
>>> regards
>>>
>>> Thomas Scheffler
>>>
>>>>> On Wed, Jul 20, 2011 at 8:23 AM, Thomas Scheffler
>>>>> <thomas.scheffler at uni-jena.de> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> if I parse a valid MODS document with XML Schema validation, JDOM changes
>>>>>> attributes, because it does not handle default values from the schema
>>>>>> correctly (it ignores their namespace).
>>>>>>
>>>>>> Here is a short code to demonstrate this:
>>>>>>
>>>>>> SAXBuilder builder = new SAXBuilder(true);
>>>>>> builder.setFeature("http://xml.org/sax/features/namespaces", true);
>>>>>> builder.setFeature("http://xml.org/sax/features/namespace-prefixes", true);
>>>>>> builder.setFeature("http://apache.org/xml/features/validation/schema", true);
>>>>>>
>>>>>> Document document = builder.build(new URL("http://academiccommons.columbia.edu/download/fedora_content/show_pretty/ac:111060/CONTENT/ac111060_description.xml"));
>>>>>> XMLOutputter xout = new XMLOutputter(Format.getPrettyFormat());
>>>>>> xout.output(document, System.out);
>>>>>>
>>>>>> Here is a result fragment:
>>>>>>
>>>>>> <name type="simple">
>>>>>> <namePart type="family">Edwards</namePart>
>>>>>> <namePart type="given">Stephen A.</namePart>
>>>>>> <role>
>>>>>> <roleTerm type="text">author</roleTerm>
>>>>>> </role>
>>>>>> <affiliation>Columbia University. Computer Science</affiliation>
>>>>>> </name>
>>>>>>
>>>>>> If you look at the original document, you can see that @type of name is
>>>>>> "personal". The "simple" comes from the xlink XML Schema that is included
>>>>>> by the MODS schema. Therefore the result fragment should look like this:
>>>>>>
>>>>>> <name type="personal" xlink:type="simple">
>>>>>> <namePart type="family">Edwards</namePart>
>>>>>> <namePart type="given">Stephen A.</namePart>
>>>>>> <role>
>>>>>> <roleTerm type="text">author</roleTerm>
>>>>>> </role>
>>>>>> <affiliation>Columbia University. Computer Science</affiliation>
>>>>>> </name>
>>>>>>
>>>>>> If I use DOM from Java this is done correctly (but a bit ugly as it does not
>>>>>> use the namespace prefix already defined).
>>>>>>
>>>>>> Could someone just fix this, please?