[jdom-interest] Parsing a MODS-document with validation fails
Bradley S. Huffman
bshuffman at gmail.com
Sun Jul 24 14:46:42 PDT 2011
And if the prefix "ns1" is already mapped to another URI because the
user decided to use "ns1" as a prefix, what kind of havoc will this
cause?
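
One way to avoid that collision, sketched here under assumptions: reuse a prefix already bound to the attribute's URI, and when generating an "nsN" prefix, skip any name that is already declared in scope. The class name AttributePrefixChooser, the method choose, and the Map-based view of in-scope declarations are hypothetical; this is not the submitted patch and not JDOM API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of JDOM or of the patch under discussion.
// It picks a prefix for an attribute whose namespace URI is known but whose
// QName arrived from the parser without a prefix.
final class AttributePrefixChooser {

    // Counter for generated prefixes, as in the quoted patch fragment.
    private int nsPrefixCount = 0;

    /**
     * @param inScope prefix -> URI declarations visible at the current element
     *                (assumed representation; "" stands for the default namespace)
     * @param attUri  namespace URI of the attribute
     * @return a non-empty prefix that is safe to bind to attUri
     */
    String choose(Map<String, String> inScope, String attUri) {
        // 1. Reuse a declared prefix bound to this URI. The default namespace
        //    is skipped because it never applies to attributes.
        for (Map.Entry<String, String> e : inScope.entrySet()) {
            if (e.getKey().length() > 0 && e.getValue().equals(attUri)) {
                return e.getKey();
            }
        }
        // 2. Otherwise generate "nsN", skipping names the user already declared.
        String candidate;
        do {
            nsPrefixCount++;
            candidate = "ns" + nsPrefixCount;
        } while (inScope.containsKey(candidate));
        return candidate;
    }

    public static void main(String[] args) {
        Map<String, String> inScope = new LinkedHashMap<String, String>();
        inScope.put("", "http://www.loc.gov/mods/v3");
        inScope.put("xlink", "http://www.w3.org/1999/xlink");
        inScope.put("ns1", "urn:example:user-chosen"); // the user took "ns1"

        AttributePrefixChooser chooser = new AttributePrefixChooser();
        System.out.println(chooser.choose(inScope, "http://www.w3.org/1999/xlink"));
        System.out.println(chooser.choose(inScope, "urn:example:undeclared"));
    }
}
```

The sketch only checks the declarations it is handed; whether a later child element redeclares the generated prefix is a separate question it does not answer.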
On Fri, Jul 22, 2011 at 4:08 PM, Thomas Scheffler
<thomas.scheffler at uni-jena.de> wrote:
> On 22.07.2011 22:53, Bradley S. Huffman wrote:
>>
>> I'm not sure about a patch that makes up a namespace prefix. From the
>> patch
>>
>> nsPrefixCount++;
>> ns = Namespace.getNamespace("ns" + nsPrefixCount, attUri);
>>
>>
>> Seems like a kludge. My gut says it's something else.
>
> This is how the Oracle Java DocumentBuilder does it. JDOM won't accept
> a namespace without a prefix, so you have to generate one when the SAX
> parser delivers an attribute whose QName equals its local name. Before
> that, I look in the already declared namespaces, so that any prefix bound
> to the URI is reused before a new one is built. For the test case I
> submitted in my original mail, "xlink" is found correctly, which is nicer
> than the DocumentBuilder solution, which creates "ns0" on every element
> where xlink:type is fixed to "simple". I hope you can follow my argument.
>
> regards,
>
> Thomas
>
>> On Fri, Jul 22, 2011 at 3:12 PM, Jason Hunter<jhunter at servlets.com>
>> wrote:
>>>
>>> Thanks, Thomas. I'll integrate it.
>>>
>>> Anyone else sitting on a bug that could get fixed in 1.1.2?
>>>
>>> -jh-
>>>
>>> On Jul 22, 2011, at 12:12 AM, Thomas Scheffler wrote:
>>>
>>>> On 21.07.2011 10:14, Thomas Scheffler wrote:
>>>>>
>>>>> On 21.07.2011 04:18, Bradley S. Huffman wrote:
>>>>>>
>>>>>> Which version of JDOM? My first guess is it is something in
>>>>>> XMLOutputter.
>>>>>
>>>>> This is the latest and greatest 1.1.1. I would not suspect XMLOutputter
>>>>> here as it usually does not have any problems with namespaces. This seems to
>>>>> be a parsing issue.
>>>>
>>>> It is a bug in the SAXHandler class: attributes in a different
>>>> namespace are detected only by their QName and not by their
>>>> namespace URI. I attached a patch that fixes this bug.
>>>> It would be great if this could be integrated and released soon in a
>>>> version 1.1.2.
>>>>
>>>> regards
>>>>
>>>> Thomas Scheffler
>>>>
>>>>>> On Wed, Jul 20, 2011 at 8:23 AM, Thomas Scheffler
>>>>>> <thomas.scheffler at uni-jena.de> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> if I parse a valid MODS document with XML Schema validation, JDOM
>>>>>>> changes attributes because it does not handle default values from
>>>>>>> the schema correctly (it ignores the namespace).
>>>>>>>
>>>>>>> Here is a short piece of code that demonstrates this:
>>>>>>>
>>>>>>> SAXBuilder builder = new SAXBuilder(true);
>>>>>>> builder.setFeature("http://xml.org/sax/features/namespaces", true);
>>>>>>> builder.setFeature("http://xml.org/sax/features/namespace-prefixes", true);
>>>>>>> builder.setFeature("http://apache.org/xml/features/validation/schema", true);
>>>>>>>
>>>>>>> Document document = builder.build(new URL("http://academiccommons.columbia.edu/download/fedora_content/show_pretty/ac:111060/CONTENT/ac111060_description.xml"));
>>>>>>> XMLOutputter xout = new XMLOutputter(Format.getPrettyFormat());
>>>>>>> xout.output(document, System.out);
>>>>>>>
>>>>>>> Here is a result fragment:
>>>>>>>
>>>>>>> <name type="simple">
>>>>>>>   <namePart type="family">Edwards</namePart>
>>>>>>>   <namePart type="given">Stephen A.</namePart>
>>>>>>>   <role>
>>>>>>>     <roleTerm type="text">author</roleTerm>
>>>>>>>   </role>
>>>>>>>   <affiliation>Columbia University. Computer Science</affiliation>
>>>>>>> </name>
>>>>>>>
>>>>>>> If you look at the original document, you can see that @type of
>>>>>>> name is "personal". The "simple" comes from the xlink XML Schema
>>>>>>> that is included by the MODS schema. Therefore the result fragment
>>>>>>> should look like this:
>>>>>>>
>>>>>>> <name type="personal" xlink:type="simple">
>>>>>>>   <namePart type="family">Edwards</namePart>
>>>>>>>   <namePart type="given">Stephen A.</namePart>
>>>>>>>   <role>
>>>>>>>     <roleTerm type="text">author</roleTerm>
>>>>>>>   </role>
>>>>>>>   <affiliation>Columbia University. Computer Science</affiliation>
>>>>>>> </name>
>>>>>>>
>>>>>>> If I use DOM from Java this is done correctly (but a bit ugly as it
>>>>>>> does not
>>>>>>> use the namespace prefix already defined).
>>>>>>>
>>>>>>> Could someone just fix this, please?
>
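
The condition described in the thread (an attribute that carries a namespace URI while its QName equals its local name) can be illustrated with plain SAX types. This is a minimal sketch, assuming the parser reports the schema-defaulted xlink:type attribute with the XLink URI but without a prefix in the QName; the class AttributeNamespaceCheck and the method lacksPrefix are hypothetical names and are not taken from JDOM's SAXHandler.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical illustration only; not JDOM's SAXHandler code.
final class AttributeNamespaceCheck {

    /** True if attribute i is in a namespace but its QName carries no prefix. */
    static boolean lacksPrefix(Attributes atts, int i) {
        String uri = atts.getURI(i);
        String qName = atts.getQName(i);
        boolean inNamespace = uri != null && uri.length() > 0;
        boolean hasPrefix = qName != null && qName.indexOf(':') > 0;
        return inNamespace && !hasPrefix;
    }

    public static void main(String[] args) {
        // Hand-built attribute list standing in for what a validating
        // parser might pass to startElement() for the <name> element.
        AttributesImpl atts = new AttributesImpl();
        atts.addAttribute("", "type", "type", "CDATA", "personal");
        atts.addAttribute("http://www.w3.org/1999/xlink", "type", "type",
                "CDATA", "simple");

        for (int i = 0; i < atts.getLength(); i++) {
            System.out.println(atts.getQName(i) + "=" + atts.getValue(i)
                    + " uri='" + atts.getURI(i) + "'"
                    + " lacksPrefix=" + lacksPrefix(atts, i));
        }
    }
}
```

Both attributes share the QName "type", so a handler that keys on QName alone cannot tell them apart; comparing the namespace URI distinguishes them.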