[jdom-interest] ID/IDREF
Murray Altheim
Murray.Altheim at eng.sun.com
Mon Sep 25 13:08:13 PDT 2000
bob wrote:
>
> Yah, I figured something like that would work.
>
> My only current sticking point is then having to parse
> the DTD/schema to get the information as which attributes
> are indeed ID attributes. Any hints? ;)
>
> I really don't want to write a DTD parser, and XML-schema
> is too much in flux for me to bother with at the moment. ;)
> Especially since both are theoretically parsed already by
> Xerces (or whichever XML parser is in use to generate the
> JDOM, if any.)
For me, the tough part of writing even a partial DTD parser was
getting the conditional sections right, especially nested ones. You
need to parse parameter entity declarations to catch the condsect
keywords, too. You can't merely search through for ATTLISTs, since
you don't know which are active without correctly processing
parameter entities and conditional sections. *But* if you handle PEs,
conditional sections and ATTLISTs, you've done all you need to grab
the ID information. ATTLISTs on their own are pretty simple. You could
get the code to do the PEs and condsects from an existing parser, say
the Sun or Xerces parser. It's in there.
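If the parser in use exposes the SAX2 DeclHandler extension, the ID
information can be taken from the parser's own DTD pass rather than from
a hand-written one. What follows is only a minimal sketch, assuming a
JAXP parser that supports the SAX2 declaration-handler property. The
class IdAttributeCollector and its collect() method are hypothetical
names invented for illustration; they are not part of JDOM, Xerces or
the Sun parser. The sketch keeps only the first declaration it sees for
each element/attribute pair, so it does not depend on whether the parser
also reports later duplicates.

```java
import java.io.StringReader;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical helper (not part of JDOM): records which attributes of
// which elements are declared with type ID in the DTD.
class IdAttributeCollector extends DefaultHandler2 {
    // element name -> names of its ID-typed attributes
    private final Map<String, Set<String>> idAttrs = new LinkedHashMap<>();
    // "element attribute" pairs already declared; the first declaration wins
    private final Set<String> seen = new HashSet<>();

    @Override
    public void attributeDecl(String eName, String aName, String type,
                              String mode, String value) {
        // Ignore any later declaration of the same attribute.
        if (!seen.add(eName + " " + aName)) {
            return;
        }
        if ("ID".equals(type)) {
            idAttrs.computeIfAbsent(eName, k -> new LinkedHashSet<>()).add(aName);
        }
    }

    // Parses the document (and its DTD) and returns the collected ID attributes.
    static Map<String, Set<String>> collect(String xml) throws Exception {
        IdAttributeCollector handler = new IdAttributeCollector();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // Standard SAX2 extension property for DTD declaration events.
        reader.setProperty("http://xml.org/sax/properties/declaration-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.idAttrs;
    }
}
```

Calling IdAttributeCollector.collect(xml) on a document whose DTD
declares 'id' as an ID attribute of 'foo' should yield a map from "foo"
to a set containing "id". The parameter entity and conditional section
handling is left entirely to the parser.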
> Anyhow, id() will probably be amongst the last things I
> implement. (I'm working on following[-sibling] and
> preceding[-sibling] axes currently, which will make
> it almost completely done.)
Once I got the basic DTD parser written, I actually enjoyed the
challenge of the XPath stuff.
> As far as duplicates, I don't think it's exceptional if there
> isn't uniqueness. Doesn't The Standard simply say first-one-wins?
No. In a valid XML document, all ID values in the document instance
*must* be unique. In XML that is only well-formed you have no IDs
whatsoever, since attribute types come from the DTD. If you mean
duplicate attribute specifications in the DTD, then yes, first one
wins:
    <!ATTLIST foo
        id    CDATA    #IMPLIED
    >
    <!ATTLIST foo
        id    ID       #IMPLIED
    >
The type of 'id' above would be CDATA, but the below is a validation
error:
    <!DOCTYPE blah [
    <!ELEMENT blah ( foo )*>
    <!ELEMENT foo EMPTY >
    <!ATTLIST foo
        id    ID    #IMPLIED
    >
    ]>
    <blah>
        <foo id="burger"/>
        <foo id="burger"/>
    </blah>
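The duplicate-ID case can be checked with a validating parse. This is a
minimal sketch using the JAXP DocumentBuilderFactory rather than JDOM
itself; the class DuplicateIdCheck and its validationErrors() method are
hypothetical names used only for illustration. It collects recoverable
validity errors instead of stopping at the first one, and the exact
wording of the messages depends on the parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical helper: parses with DTD validation switched on and
// returns the messages of any validity errors the parser reports.
class DuplicateIdCheck {
    static List<String> validationErrors(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // validate against the DTD in the DOCTYPE
        DocumentBuilder builder = factory.newDocumentBuilder();

        List<String> errors = new ArrayList<>();
        builder.setErrorHandler(new ErrorHandler() {
            @Override
            public void warning(SAXParseException e) {
                // Warnings are not validity errors; ignore them here.
            }

            @Override
            public void error(SAXParseException e) {
                // Validity errors (such as a repeated ID value) arrive here.
                errors.add(e.getMessage());
            }

            @Override
            public void fatalError(SAXParseException e) throws SAXParseException {
                // Well-formedness errors are fatal; let them propagate.
                throw e;
            }
        });

        builder.parse(new InputSource(new StringReader(xml)));
        return errors;
    }
}
```

Passing the document above to DuplicateIdCheck.validationErrors() should
return at least one error for the repeated "burger" value, while the
same document with two distinct ID values should return an empty list.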
> btw, Murray, is your XPath implementation built on JDOM?
> Regardless, is it available as source for perusal?
The XML parser project I wrote goes back several years and has been
used internally for various projects I've been involved with, mostly
DTD analysis (such as comparing SGML and XML DTDs, e.g., HTML 4 and XHTML).
Since this wasn't written for an outside audience, isn't bulletproof,
isn't i18n-compatible (it only needs to parse DTDs in US ASCII), and
there's no way I could support it, it's unavailable at this time. I
wrote in support for about 3/4 of an earlier XPath draft, and it
involves a bunch of SAX API extensions I needed to truly analyse,
alter, document and re-constitute DTDs from the parse. If I'd had
time I probably could have provided some input into SAX2 based on
this experience, but like so many things I simply haven't had the
cycles.
Murray
...........................................................................
Murray Altheim, SGML/XML Grease Monkey <mailto:altheim@eng.sun.com>
XML Technology Center
Sun Microsystems, 1601 Willow Rd., MS UMPK17-102, Menlo Park, CA 94025
In the evening
The rice leaves in the garden
Rustle in the autumn wind
That blows through my reed hut. -- Minamoto no Tsunenobu