[jdom-interest] Entity resolving - design problem
Todd O'Bryan
toddobryan at mac.com
Thu Oct 23 18:17:13 PDT 2003
On Thursday, October 23, 2003, at 07:01 AM, Robert J Munro wrote:
> Todd O'Bryan wrote:
>
>> There is, in fact, a way to do this. You can subclass a Reader and
>> intercept the character stream on the way into the Parser. If you get
>> an ampersand followed by one of the entities you don't want to
>> expand, you pass it on as &amp;entity; instead. If not, you just pass
>> the characters on unchanged.
>>
>> When you write the file back through the Writer you'll have to be
>> sure that you intercept again and change &amp;entity; back to
>> &entity; on the way out.
>>
>> All in all, it's about twenty lines of code overriding read() and
>> write() in subclasses of Reader and Writer.
>>
>> Email me if you need more specifics,
>> Todd
>
> That sounds like a horrendously bad idea. It goes completely against
> the whole principle of JDOM (i.e. that you deal with the data, not
> with the XML).
Until XML can do a round-trip with entities, this will continue to be a
problem. I was dealing with XML documents created by a client that
included entities which were nowhere defined. Yes, I realize undefined
entities lead to malformed XML (not just invalid), but the funny thing
is, the client was not terribly open to the idea that they should have
to fix up their bad XML before I would process it. And I could not
afford to wait and see which new undefined entity would crash my
program in a new batch of data they hadn't sent me yet. Got a less
horrendously bad idea now?
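
For anyone who wants to see the shape of the Reader/Writer trick, here is a
minimal sketch. It is not the code from the original project: the class
names, the isNameChar() helper, and the idea of passing in a fixed set of
entity names to protect are all assumptions made for illustration. It also
assumes the outputter escapes a literal & as &amp;, and it cannot tell a
protected &entity; apart from an &amp;entity; that was already in the input.

```java
import java.io.*;
import java.util.Set;

/** Inbound half: rewrites &name; as &amp;name; for each name in the protected set. */
class EntityEscapingReader extends Reader {
    private final PushbackReader src;
    private final Set<String> names;   // hypothetical: entity names to shield from the parser
    private final StringBuilder pending = new StringBuilder();  // chars queued for the caller
    private int pos = 0;

    EntityEscapingReader(Reader in, Set<String> names) {
        this.src = new PushbackReader(in, 1);
        this.names = names;
    }

    static boolean isNameChar(int c) {  // rough approximation of an XML name character
        return Character.isLetterOrDigit(c) || c == '_' || c == '-' || c == '.' || c == ':';
    }

    private int next() throws IOException {
        if (pos < pending.length()) return pending.charAt(pos++);
        int c = src.read();
        if (c != '&') return c;
        StringBuilder name = new StringBuilder();
        int end;
        while ((end = src.read()) != -1 && isNameChar(end)) name.append((char) end);
        pending.setLength(0);
        pos = 0;
        if (end == ';' && names.contains(name.toString())) pending.append("amp;");
        pending.append(name);
        if (end == ';') pending.append(';');
        else if (end != -1) src.unread(end);  // not a reference; process this char next time
        return '&';
    }

    @Override public int read(char[] buf, int off, int len) throws IOException {
        int n = 0;
        for (int c; n < len && (c = next()) != -1; n++) buf[off + n] = (char) c;
        return (n == 0 && len > 0) ? -1 : n;
    }

    @Override public void close() throws IOException { src.close(); }
}

/** Outbound half: turns &amp;name; back into &name; for each name in the protected set. */
class EntityRestoringWriter extends Writer {
    private static final String ESC = "&amp;";
    private final Writer out;
    private final Set<String> names;
    private final StringBuilder held = new StringBuilder();  // possible start of "&amp;name;"

    EntityRestoringWriter(Writer out, Set<String> names) {
        this.out = out;
        this.names = names;
    }

    private void accept(char c) throws IOException {
        if (held.length() == 0 && c != '&') { out.write(c); return; }
        held.append(c);
        if (held.length() <= ESC.length()) {
            if (ESC.startsWith(held.toString())) return;  // still a prefix of "&amp;"
        } else if (c == ';') {
            String name = held.substring(ESC.length(), held.length() - 1);
            out.write(names.contains(name) ? "&" + name + ";" : held.toString());
            held.setLength(0);
            return;
        } else if (EntityEscapingReader.isNameChar(c)) {
            return;  // still inside the name
        }
        held.setLength(held.length() - 1);  // no match: release the held text...
        out.write(held.toString());
        held.setLength(0);
        accept(c);                          // ...and look at c afresh
    }

    @Override public void write(char[] buf, int off, int len) throws IOException {
        for (int i = off; i < off + len; i++) accept(buf[i]);
    }

    @Override public void flush() throws IOException { out.flush(); }

    @Override public void close() throws IOException {
        out.write(held.toString());  // release any unfinished candidate
        out.close();
    }
}

class RoundTripDemo {
    static String escape(String xml, Set<String> names) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new EntityEscapingReader(new StringReader(xml), names)) {
            for (int c; (c = r.read()) != -1; ) sb.append((char) c);
        }
        return sb.toString();
    }

    static String restore(String xml, Set<String> names) throws IOException {
        StringWriter sw = new StringWriter();
        try (Writer w = new EntityRestoringWriter(sw, names)) { w.write(xml); }
        return sw.toString();
    }
}
```

In real use the Reader would wrap whatever you hand to the parser and the
Writer would wrap whatever the outputter writes to; the two static helpers
are only there so the escaping and restoring can be tried on plain strings.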
>
> I think the best solution in this case is to use an extra attribute in
> your own namespace (something like <img my:file="name.jpg" />) to say
> what the image filename is without a directory while it is XML, then
> generate the real src attribute with a URL later.
You're probably right. When you're defining the format, a hack like the
one above is not the best choice. It is, however, doable. And if the
things people have marked up as entities are really data and not just
text shortcuts, then you have to deal with them.
A good example of this would be something like &date; which presumably
prints out the current date. If you resolve that on your parse, fiddle
with it and then want to re-write the original document with your
changes, you're screwed. The fact that "October 23, 2003" was once
"&date;" is just lost information. Fine if XML were only intended to go
one way, but it's not.
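
To make the loss concrete, here is a small sketch. It uses the JDK's
built-in DOM parser as a stand-in for whatever parser sits under JDOM, and
the document, the class name, and the entity declaration are made up for
the example. Once the text is in the tree, nothing in it records that it
came from an entity reference.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DateEntityDemo {
    /** Parses the XML with default settings and returns the root element's text. */
    static String parsedText(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical document: &date; is declared in the internal DTD subset.
        String xml = "<!DOCTYPE note [<!ENTITY date \"October 23, 2003\">]>"
                   + "<note>Written on &date;.</note>";
        // The text comes back with the replacement text in place of &date;.
        System.out.println(parsedText(xml));
    }
}
```

Writing that tree back out gives you the literal date, so the next person
to open the file never sees &date; at all.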
In the spec, they made it possible to do things with entities that are
just a really bad idea, and some of the documentation even suggested
doing these things. Then people do them, and tie themselves in knots,
and get annoyed.
>
> Javascript sections could be fixed by defining an image directory in
> .js files on each location, then changing:
> document.blah.src="/path/another.gif"
> to
> document.blah.src= imagedirectory + "another.gif"
>
> The solution I would use, however, is to put the images in the same
> location on both servers, either relative to the root of the server,
> or relative to the documents that reference them. If both those
> options really are impossible, then I'd put the images on a public
> server, and have them both point to them with absolute URLs.
>
Umm, how would you do this if you don't have access rights to the same
directory structures on the two servers? And wouldn't it be a
horrendously bad idea to make someone viewing a file on a local server
wait while the images are fetched from another server just so you don't
have to deal with resolving different file prefixes?
Todd