[jdom-interest] BUG: XMLOutputter inserts extra empty lines

james todd james.todd at Sun.COM
Thu Dec 6 15:41:02 PST 2001


i'm *hugely* negligent in tracking jdom internals, suggestions and the
like but i do have this thought on output formatting ... would it be
viable to create a "filter" structure whereby one can provide a list of
adapters that are singularly focussed?

that way, a series of single-purpose formatters could exist, with a
default "formatting list" built from them that results in the expected
behaviour, *but* if one wants to re-arrange the formatters, drop some
and/or add others, this scheme might prove to be a bit more
flexible/forgiving.

this would very likely be more expensive as it would require a traversal
for each filter ... but this is the only downside i see at the moment.
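
a rough sketch of what i mean, working on plain text content only. every
name here (FormatterChain, TRIM, COLLAPSE, DROP_BLANK, defaults) is made up
for illustration and is not existing jdom api:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical names throughout; none of this is existing JDOM API.
class FormatterChain {

    // Each filter does exactly one job on a piece of text content.

    // Remove leading and trailing whitespace.
    static final UnaryOperator<String> TRIM = s -> s.trim();

    // Collapse every run of whitespace to a single space.
    static final UnaryOperator<String> COLLAPSE = s -> s.replaceAll("\\s+", " ");

    // Treat whitespace-only content as empty; leave other text untouched.
    static final UnaryOperator<String> DROP_BLANK = s -> s.trim().isEmpty() ? "" : s;

    private final List<UnaryOperator<String>> filters;

    FormatterChain(List<UnaryOperator<String>> filters) {
        this.filters = new ArrayList<>(filters);
    }

    // An assumed default "formatting list"; callers may build their own.
    static FormatterChain defaults() {
        return new FormatterChain(Arrays.asList(DROP_BLANK, TRIM, COLLAPSE));
    }

    // One pass per filter, applied in list order.
    String apply(String text) {
        String result = text;
        for (UnaryOperator<String> filter : filters) {
            result = filter.apply(result);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("[" + defaults().apply("  Some   randomly spaced  text  ") + "]");
    }
}
```

re-arranging or dropping formatters is then just a different list, e.g.
new FormatterChain(Arrays.asList(FormatterChain.TRIM)) trims without touching
interior whitespace. a real version would have to walk the document once per
filter, which is the cost mentioned above.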

- james

Jason Hunter wrote:
> 
> I'm starting to think that instead of having separate flags we just have
> different modes with options:
> 
> RAW: prints the document as it is in memory, default
> 
> PRETTY: prints the document "pretty" (whitespace altered)
>   Option to change indent string (tab, spaces)
>   Option to change line width (72, 80, 0 for no wrap)
> 
> COMPRESSED: prints the document with whitespace removed
> 
> We might even implement it with subclasses of XMLOutputter to break out
> the code and allow others to subclass if they want special tweaking.  Or
> we might allow pluggable code units, which is more flexible than
> subclassing.  outputter.setWhitespaceLogic(xxx).
> 
> Thoughts?
> 
> -jh-
> 
> "Bradley S. Huffman" wrote:
> >
> > > This just goes to prove the adage that all whitespace handling in XML is a
> > > pain.
> >
> > Yes it is!
> >
> > This post goes to the question of what newlines really imply.  It
> > sounds easy at first.  We just use newlines, normalize, and indent to take:
> >
> >   <payroll>
> >   <employee><firstname>     Brad</firstname><lastname>Huffman</lastname>
> >   </employee>
> >   <employee><firstname>John     </firstname>
> >   <lastname>Doe</lastname> </employee>
> >   </payroll>
> >
> > To make it aesthetically pleasing as in:
> >
> >   <payroll>
> >       <employee>
> >           <firstname>Brad</firstname>
> >           <lastname>Huffman</lastname>
> >       </employee>
> >       <employee>
> >           <firstname>John</firstname>
> >           <lastname>Doe</lastname>
> >       </employee>
> >   </payroll>
> >
> > But there are 4 situations where we can have text content: between
> > <start><start>, <start></end>, </end><start>, and </end></end> tags.
> > For the most common case of short text content between a start and end tag
> > a single line is what we want because it looks best.
> >
> >           <firstname>Brad</firstname>
> >           <lastname>Huffman</lastname>
> >
> > But then in cases like:
> >
> >   </employee>
> >                     Some       randomly spaced  text    <employee>
> >
> > With newlines ON, how should this be printed?  As is? With
> > leading/trailing whitespace trimmed? With leading/trailing whitespace
> > trimmed and text aligned with </employee> or <employee>? Something else?
> > It all depends on how newlines is defined.
> >
> > With newlines ON and normalize ON, currently leading/trailing
> > whitespace is semantically insignificant and we are free to add/remove
> > it to produce the desired alignment (hmmm, that's not quite true).  But
> > what if normalization is OFF?  Should leading/trailing whitespace be
> > insignificant in all cases, in some cases? It gets confusing quickly!
> >
> > Right now it seems text content between tags is insignificant
> > ONLY if it is empty or all whitespace (when newlines is ON).  Which
> > means the example above will be printed "as is" with newlines ON and
> > normalization OFF, which is kind of ugly for a pretty-print mode.  Even
> > with normalization ON the last tag <employee> will be printed on the same
> > line as the text while its corresponding end tag (assuming non-empty content)
> > is aligned with the previous end tag, again ugly IMHO.
> >
> > Hmmm, the above paragraph isn't really true either. Try setting newlines
> > to true, indent to "xxxx" (so you can see where indentation is added), and
> > normalize to false. You'll get a line separator and indentation after text
> > that is empty, and before and after text that is all whitespace. Very weird
> > behavior.
> >
> > After careful thought, I propose undeprecating textTrim and defining the
> > following modes for XMLOutputter. Basically using the premise that turning
> > newlines ON means we care more about how it looks than the semantic meaning
> > of whitespace.  For the most part everything stays the same as what we have
> > now (or would expect to have) except for the cases with text between
> > <tag><tag>, </tag><tag>, or </tag></tag> when newlines and
> > trimming/normalizing is on.
> >
> >      Default:
> >           No content is added to or removed from an element's content.
> >
> >      textTrim:
> >           Leading/trailing whitespace is insignificant and can be removed.
> >           With newlines ON, whitespace might be added back to fit alignment
> >           needs.
> >
> >      textNormalize:
> >           Same as textTrim, but interior whitespace is compressed to
> >           a single space.
> >
> >      newlines (textTrim and textNormalize OFF):
> >           Empty content or "whitespace ONLY" content between tags is
> >           insignificant. Text content that contains one or more non-whitespace
> >           characters is left untouched and no leading/trailing whitespace
> >           is added/removed.
> >
> >      newlines (textTrim or textNormalize ON):
> >          Case of <tag>text</tag>:
> >               Start tag, text, and end tag are printed on single line with
> >               trimming/normalization of text.
> >
> >          Case of <tag>text<tag>, </tag>text<tag>, </tag>text</tag>:
> >               Start tag, text, and end tag are aligned. Text is trimmed/
> >               normalized before alignment.
> >
> > Some other possible modes might be:
> >
> >     canonical:
> >          See http://www.w3.org/TR/xml-c14n.  Even though I think it would
> >          be better to have a converter to transform the Document itself,
> >          XMLOutputter is already close to outputting in canonical form, so
> >          it might be worth it to have both.
> >
> >     line wrap or text wrap?
> >          wrap a line after so many chars, or maybe just wrap text.
> >          Might help with some HTML/XHTML, or might this functionality be
> >          better left to something like HTML Tidy?
> >
> >     alignText:
> >         Treat all text content like tags and align them. Example
> >         <name>Bradley S. Huffman</name> could become:
> >
> >              <name>
> >                  Bradley S. Huffman
> >              </name>
> >
> > And the possibilities go on and on. Feedback?
> >
> > Brad
> > _______________________________________________
> > To control your jdom-interest membership:
> > http://lists.denveronline.net/mailman/options/jdom-interest/youraddr@yourhost.com


