[jdom-interest] JDOM Issue #5 - DTD-aware Attribute output
Rolf Lear
jdom at tuis.net
Fri Mar 23 06:21:41 PDT 2012
Hi Paul.
If you were wondering why no-one on the list has commented, it may be
because you never sent it to the list, just to me ... ;-), so I have
CC'd the list for you...
Anyway, I have been looking into things, and I think the problem is
that you have missed a detail in the way the data is processed.
Using your example document:
http://svn.activemath.org/LeAM-calculus/LeAM_calculus/oqmath/contin.oqmath
This document (apart from being 'big') refers to a single DTD, which,
in the case of this document, only really defaults one attribute:
'scheme' on the 'competency' element (which defaults to "PISA").
Now, as far as I know, there are only two ways in which the content of
the DTD gets used:
If you are doing no DTD validation, the DTD will still be accessed to
resolve entity references. But that is the *only* thing that will be
pulled from the DTD.
If you do validation, then the entire DTD is read, and the validation is
done, and any attributes defaulted in the DTD will be created in the XML
'Model'.
So, it is my understanding that it is impossible to have 'all the
defaulted attributes' without also having done the full DTD Validation.
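To make the validating case concrete, here is a minimal sketch using only the JDK's built-in SAX parser (not JDOM). The tiny document, its internal DTD subset and the report() helper are hypothetical stand-ins that mimic the one default in contin.oqmath ('scheme' on 'competency', defaulting to "PISA"). SAX2's org.xml.sax.ext.Attributes2.isSpecified(int) tells you, per attribute, whether it was in the source or was filled in from the DTD; the JDK's bundled parser supplies Attributes2, but other parsers may not.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.ext.Attributes2;
import org.xml.sax.helpers.DefaultHandler;

public class DefaultedAttrs {
    // Hypothetical sample: the internal DTD subset defaults 'scheme' to "PISA".
    static final String XML =
        "<!DOCTYPE doc [\n"
        + "  <!ELEMENT doc (competency*)>\n"
        + "  <!ELEMENT competency EMPTY>\n"
        + "  <!ATTLIST competency id CDATA #IMPLIED scheme CDATA \"PISA\">\n"
        + "]>\n"
        + "<doc><competency id=\"c1\"/><competency id=\"c2\" scheme=\"other\"/></doc>";

    // Parses with DTD validation on and lists every attribute SAX reports,
    // tagged "specified" (present in the source) or "defaulted" (from the DTD).
    public static List<String> report() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        SAXParser parser = factory.newSAXParser();
        List<String> out = new ArrayList<>();
        parser.parse(new InputSource(new StringReader(XML)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                for (int i = 0; i < atts.getLength(); i++) {
                    // Attributes2 is optional in SAX2; treat everything as specified if absent.
                    boolean specified = !(atts instanceof Attributes2)
                        || ((Attributes2) atts).isSpecified(i);
                    out.add(qName + "@" + atts.getQName(i) + "=" + atts.getValue(i)
                        + (specified ? " specified" : " defaulted"));
                }
            }

            // DefaultHandler silently ignores validity errors; rethrow so an invalid file aborts.
            @Override
            public void error(SAXParseException e) throws SAXParseException {
                throw e;
            }
        });
        return out;
    }

    public static void main(String[] args) throws Exception {
        report().forEach(System.out::println);
    }
}
```

The rethrowing error() handler is there so that a validity error stops the parse, which is the 'validation fails' situation described here.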
As it happens, I often use the tool 'xmllint' (available on most Unix
systems, including Linux) to check my understanding, and I may be wrong
on this point: xmllint has a --dtdattr argument that appears to load the
defaulted attributes without doing a full validation...
Anyway, the point is that, with JDOM and standard SAX parsing, the
only time you could have had 'all the defaulted attrs' was when you were
doing full validation anyway... and that full validation fails.
So, if you do not do validating, you will not get the 'scheme'
attributes, and you will not output the scheme attributes (you do not
have them to output...).
If you do validate, then you have the scheme attributes, and you can
now choose to omit them from the output with the new Format setting.
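The output side can be sketched the same way. Again this is plain JDK SAX, not JDOM's XMLOutputter, and the sample document and the echo(...) helper are hypothetical: it re-serializes elements and drops any attribute whose isSpecified flag is false, which is roughly what setSpecifiedAttributesOnly(true) is meant to achieve. It only has something to drop because the parse is validating.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.ext.Attributes2;
import org.xml.sax.helpers.DefaultHandler;

public class SpecifiedOnlyEcho {
    // Hypothetical sample: the internal DTD subset defaults 'scheme' to "PISA".
    static final String XML =
        "<!DOCTYPE doc [\n"
        + "  <!ELEMENT doc (competency*)>\n"
        + "  <!ELEMENT competency EMPTY>\n"
        + "  <!ATTLIST competency id CDATA #IMPLIED scheme CDATA \"PISA\">\n"
        + "]>\n"
        + "<doc><competency id=\"c1\"/><competency id=\"c2\" scheme=\"other\"/></doc>";

    // Minimal escaping for text and attribute values (enough for this sketch).
    static String esc(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    // Re-serializes elements and text; if specifiedOnly is true,
    // attributes that were filled in from the DTD are left out.
    public static String echo(boolean specifiedOnly) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        StringBuilder sb = new StringBuilder();
        factory.newSAXParser().parse(new InputSource(new StringReader(XML)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                sb.append('<').append(qName);
                for (int i = 0; i < atts.getLength(); i++) {
                    boolean specified = !(atts instanceof Attributes2)
                        || ((Attributes2) atts).isSpecified(i);
                    if (specifiedOnly && !specified) {
                        continue; // came from the DTD default, not from the source text
                    }
                    sb.append(' ').append(atts.getQName(i))
                      .append("=\"").append(esc(atts.getValue(i))).append('"');
                }
                sb.append('>');
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                sb.append("</").append(qName).append('>');
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                sb.append(esc(new String(ch, start, length)));
            }

            // Treat validity errors as fatal, like a strict validating build.
            @Override
            public void error(SAXParseException e) throws SAXParseException {
                throw e;
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(echo(true));
        System.out.println(echo(false));
    }
}
```

It is deliberately simplistic: empty elements come out as start/end pairs, and comments, PIs and the DOCTYPE are not copied.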
Your particular problem is confusing to me, and there must be something
I am missing... I can't figure out why you think you are getting all
the defaulted attributes when you are clearly not validating...
So, that is my first issue, and I think it means that you are confused
too ;-)
The second issue, with the namespace declarations, is also confusing to
me. In your example document, every single namespace declaration is
essential... not a single one is 'redundant'.
Is it possible that it is just a bad example?
Anyway, in the worst case, I have a hack that would probably make you
happy, but it makes me cringe... I would rather understand your
problem properly before I suggest it.
Thanks
Rolf
On 22/03/2012 4:27 PM, Paul Libbrecht wrote:
>
> Hello list,
>
> Rolf has been so kind to show me how JDOM issue #5 can be run.
>
> So I ran the following snippet:
>
> SAXBuilder builder = new SAXBuilder(XMLReaders.DTDVALIDATING);
> Document doc = builder.build(new URL(args[0]));
> Format speconly = Format.getRawFormat();
> speconly.setSpecifiedAttributesOnly(true);
> XMLOutputter xout = new XMLOutputter(speconly);
> xout.output(doc, System.out);
>
> which allows me to parse a document into JDOM, make modifications (typically refactorings), then output it with almost no difference.
>
> The big advantage is that attributes that were not there... are simply not injected from the DTD.
> This is huge in XML editing traditions that use implied values a lot.
>
> There are two BUTs:
>
> 1) This currently fails if the validation fails, and this is a big problem for me.
> An example file would be the following:
> http://svn.activemath.org/LeAM-calculus/LeAM_calculus/oqmath/contin.oqmath
> which references a DTD nearby. This is a manually edited file.
>
> Removing the validation sadly seems to disable the passing of attribute-presence info.
> Rolf, is there a way to have the attribute-presence info passed without validation stopping the parse?
>
>
> 2) Namespace declarations, which are a kind of attribute, still resurface. Ideally they would be omitted if they were not present in the source. Doable?
>
> Rolf's approach is better than mine: mine simply checked whether the DTD provided the attribute and, if so, suppressed it on output, whereas in Rolf's approach an attribute is output if... it was there, simply!
>
> Thanks for comments.
>
> paul