[jdom-interest] Deprecating some XMLOutputter constructors

guru at stinky.com
Tue Jun 19 15:15:47 PDT 2001


On Tue, Jun 19, 2001 at 05:25:41PM -0400, Alex Rosen wrote:
> (1) Is this only a problem if you have mixed content?

Not really.  The real problem is that XML parsers do unexpected things
with text and whitespace.  One of the things XML Schema was going to
do was define which part of an element's content is significant, but I
lost patience with the whole XML spec circus long ago, so I'm not even
clear on what "ignorable whitespace" is, let alone what XML Schema
does or doesn't do.

If you SAX parse

<hello>
  my honey
</hello>

then the value of getRootElement().getText() is "\n  my honey\n" (both
newlines and the two-space indent are kept), which when pretty-printed
with indent + newlines turns into something less pretty due to the
unexpected surrounding whitespace.
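
To see the raw text a parser hands over, here is a minimal sketch. It
uses the JDK's built-in SAX parser rather than JDOM's SAXBuilder, so
the class name SaxTextDemo and the helper parseText are made up for
illustration; the character data reported should be the same text that
getText() would return for this document.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of JDOM.
final class SaxTextDemo {

    /** Returns all character data the SAX parser reports for the document. */
    static String parseText(String xml) {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // Whitespace inside an element arrives here like any other text.
                text.append(ch, start, length);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String text = parseText("<hello>\n  my honey\n</hello>");
        // Show the newlines as visible escapes.
        System.out.println(text.replace("\n", "\\n")); // prints: \n  my honey\n
    }
}
```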

(I think "ignorable whitespace" only applies to space between element
tags, not within text blocks, so even if your parser ignores ignorable
whitespace, it won't help.  However, the solution probably is on the
parser end -- make a SAX filter that strips whitespace correctly for
your app, then don't use setTextNormalize on output.)
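
A rough sketch of such a filter, built on the standard
org.xml.sax.helpers.XMLFilterImpl. The class name TrimmingFilter and
the events() helper are invented for this example, and "trim every text
run, drop the ones that end up empty" is just one possible policy, so
treat it as a starting point rather than the right answer for your app.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: buffers text between tags and forwards it trimmed.
final class TrimmingFilter extends XMLFilterImpl {
    private final StringBuilder pending = new StringBuilder();

    TrimmingFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Hold the text back until the next tag so the whole run can be trimmed.
        pending.append(ch, start, length);
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        // Dropped on purpose.
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        flush();
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        flush();
        super.endElement(uri, localName, qName);
    }

    /** Forwards the buffered text, trimmed; whitespace-only runs vanish. */
    private void flush() throws SAXException {
        String trimmed = pending.toString().trim();
        pending.setLength(0);
        if (!trimmed.isEmpty()) {
            super.characters(trimmed.toCharArray(), 0, trimmed.length());
        }
    }

    /** Demo helper: renders the filtered event stream as a string. */
    static String events(String xml) {
        StringBuilder out = new StringBuilder();
        try {
            XMLReader parser =
                    SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            TrimmingFilter filter = new TrimmingFilter(parser);
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String u, String l, String q, Attributes a) {
                    out.append('<').append(q).append('>');
                }
                @Override
                public void endElement(String u, String l, String q) {
                    out.append("</").append(q).append('>');
                }
                @Override
                public void characters(char[] ch, int start, int length) {
                    out.append('[').append(ch, start, length).append(']');
                }
            });
            filter.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(events(
                "<hello>\n  my honey\n  <hello>my baby</hello>\n</hello>"));
        // prints: <hello>[my honey]<hello>[my baby]</hello></hello>
    }
}
```

Since the filter is itself an XMLReader, it can in principle sit
between the parser and whatever builds your document, but how to wire
it into a particular JDOM version is not shown here.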

> (2) I guess I'm surprised that we try to do anything to character data
> that's not just whitespace.  

We don't, unless told to via setTextNormalize(true). 
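
For a feel of what normalizing does to a text block, here is a plain
Java approximation. It assumes normalization means "trim the ends and
collapse each run of XML whitespace to one space"; the class and method
names are made up, and this is not the actual XMLOutputter code.

```java
// Hypothetical helper approximating text normalization; not JDOM's code.
final class NormalizeSketch {

    /** Collapses runs of space, tab, CR and LF to one space, then trims. */
    static String normalize(String text) {
        return text.replaceAll("[ \\t\\r\\n]+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println("[" + normalize("\n  my honey\n") + "]");
        // prints: [my honey]
    }
}
```

Note that this also changes whitespace in the middle of the text, which
is why it stays off unless you ask for it.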

> It would be nice if a pretty-printed document was logically unchanged if you
> don't have mixed content, or if you ignored all ignorable whitespace.

Yeah, it would also be nice if XML were a clear, easy-to-understand,
well-specified document format.

 - Alex

> > Unfortunately, then when you pretty-print text that's already pretty,
> > it becomes less so.
> >
> > If your input is a file containing
> >
> > <hello>
> >   my honey
> >   <hello>my baby</hello>
> > </hello>
> >
> > Then XMLOutputter("  ", true) (no normalization) will give you
> >
> > <?xml version="1.0" encoding="UTF-8"?>
> > <hello>
> >
> >   my honey
> >
> >   <hello>my baby</hello>
> >
> >
> > </hello>

-- 
Alex Chaffee                       mailto:alex at jguru.com
jGuru - Java News and FAQs         http://www.jguru.com/alex/
Creator of Gamelan                 http://www.gamelan.com/
Founder of Purple Technology       http://www.purpletech.com/
Curator of Stinky Art Collective   http://www.stinky.com/


