Output validation (Re: [jdom-interest] & problems)

Tatu Saloranta cowtowncoder at yahoo.com
Mon Sep 12 13:33:39 PDT 2005


--- Paul Libbrecht <paul at activemath.org> wrote:

> 
...
> DTD-awareness would definitely help some output.
> Our DTD is a mix of several DTDs and contains
> several implicit 
> attributes such as namespace declarations... I have
> proposed a patch to 
> XMLOutputter which uses a DTD-parser loaded DTD in
> order to avoid the 
> addition of these implicit attributes... our
> re-output sources are 
> suddenly readable again!

This should definitely be possible with DTD-awareness.
Another similar thing is indentation: to indent
safely, one has to know whether content is mixed or
not (only data-oriented, i.e. non-mixed, content can
be indented safely).

StAX events actually do have (optional) knowledge of
whether an attribute was the result of a default value
or not, and I think SAX attributes may have it too (I
might be wrong here though). But that only helps if
you pass through existing elements and attributes. On
the other hand, when creating new ones, you would just
leave defaulted values out.
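
To make the "was this attribute defaulted?" check
concrete: SAX2 exposes it through the optional
org.xml.sax.ext.Attributes2 extension (isSpecified),
and StAX through XMLStreamReader.isAttributeSpecified().
Below is a minimal sketch of the SAX variant. The class
name, method name and sample document are made up for
illustration, and it assumes the parser in use hands out
Attributes2 instances, which SAX does not guarantee.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.ext.Attributes2;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; not part of JDOM, SAX or StAX.
public class DefaultedAttributeSketch {

    // The internal DTD subset declares a default for "kind";
    // only "id" is actually written in the document.
    static final String XML =
          "<!DOCTYPE doc [\n"
        + "  <!ATTLIST doc kind CDATA \"note\">\n"
        + "]>\n"
        + "<doc id=\"1\"/>";

    /** Maps each attribute qName to true if it was written explicitly. */
    static Map<String, Boolean> specifiedFlags(String xml) throws Exception {
        final Map<String, Boolean> flags = new LinkedHashMap<>();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.newSAXParser().parse(
            new InputSource(new StringReader(xml)),
            new DefaultHandler() {
                @Override
                public void startElement(String uri, String local,
                                         String qName, Attributes atts) {
                    // Attributes2 is optional; skip if unsupported.
                    if (!(atts instanceof Attributes2)) {
                        return;
                    }
                    Attributes2 a2 = (Attributes2) atts;
                    for (int i = 0; i < atts.getLength(); i++) {
                        flags.put(atts.getQName(i), a2.isSpecified(i));
                    }
                }
            });
        return flags;
    }

    public static void main(String[] args) throws Exception {
        // An outputter could drop every attribute whose flag is false.
        System.out.println(specifiedFlags(XML));
    }
}
```

An outputter that passes content through could use such
flags to skip attributes that only exist because the DTD
supplied them.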

> But I would more call this usage of DTD a "notation"
> usage instead of a 
> validation...

True; but actual validation could (and perhaps should)
also be done, so that illegal calls throw exceptions.
In case of JDOM, this would only happen when the whole
document was serialized, not when modifying the tree
prior to output.
For SAX events, it'd be fairly easy, since many
validators already use events for validation. So it'd
just be a matter of connecting output events to a
validator.
Similarly for StAX. It's trickier for random access
modes (DOM/JDOM tree level), since there are
intermediate invalid states; but not so for streaming
operations, since additions are always ordered, and
you always know when an element gets closed (i.e. its
content can be fully validated up to that point).
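
As a rough illustration of connecting output events to
a validator: the JDK's javax.xml.validation package
provides ValidatorHandler, a SAX ContentHandler that
validates the events it receives. Note the swap: the
JDK only ships W3C XML Schema support there, so this
sketch uses a small XSD rather than a DTD. The class
name, the emit() helper and the schema are all invented
for the example. It relies on the documented default
behaviour that, with no ErrorHandler set, validation
errors are thrown as exceptions.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.ValidatorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical example class; not an existing JDOM facility.
public class ValidatingOutputSketch {

    // Illustrative schema: <order> must contain one or more <item>.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='order'><xs:complexType><xs:sequence>"
        + "<xs:element name='item' type='xs:string'"
        + " maxOccurs='unbounded'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    static ValidatorHandler newValidator() throws SAXException {
        Schema schema = SchemaFactory
            .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(XSD)));
        return schema.newValidatorHandler();
    }

    /**
     * Emits <order><childName>x</childName></order> as SAX events.
     * Returns null if the validator accepted every call, otherwise
     * the message of the exception raised by the offending call.
     */
    static String emit(String childName) {
        try {
            ValidatorHandler out = newValidator();
            AttributesImpl none = new AttributesImpl();
            out.startDocument();
            out.startElement("", "order", "order", none);
            out.startElement("", childName, childName, none);
            out.characters("x".toCharArray(), 0, 1);
            out.endElement("", childName, childName);
            out.endElement("", "order", "order");
            out.endDocument();
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("item:  " + emit("item"));
        System.out.println("bogus: " + emit("bogus"));
    }
}
```

In a real serializer one would presumably chain the
handlers with ValidatorHandler.setContentHandler(), so
that validated events continue on to the code that
actually writes the XML.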

> 
> I wouldn't know how to validate the output of
> XMLOutputter except 
> reparse it!!

I think the easiest way is to do it at a lower level,
in the SAX/StAX code that serializes content (if one
is used).
In a way, this is about the only thing that input
(parsing) and output (serialization) have in common
in streaming mode. That is, the code would be fairly
similar.

Having said all of the above, it's still some amount
of work to tweak the low-level streaming parser... but
I think I'll keep this idea on my todo list after
all. ;-)

-+ Tatu +-



