[jdom-interest] BUG: XMLOutputter inserts extra empty lines

Jason Hunter jhunter at acm.org
Wed Dec 5 18:52:19 PST 2001


I'm starting to think that instead of having separate flags we should just
have different modes with options:

RAW: prints the document as it is in memory, default

PRETTY: prints the document "pretty" (whitespace altered)
  Option to change indent string (tab, spaces)
  Option to change line width (72, 80, 0 for no wrap)

COMPRESSED: prints the document with whitespace removed

We might even implement it with subclasses of XMLOutputter to break out
the code and allow others to subclass if they want special tweaking.  Or
we might allow pluggable code units, which is more flexible than
subclassing.  outputter.setWhitespaceLogic(xxx).
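
To make the pluggable idea concrete, here is a minimal sketch in plain Java.
Nothing in it is existing JDOM API: WhitespaceLogic, ToyOutputter, and the
RAW/COMPRESSED constants are hypothetical names, and COMPRESSED is only one
possible reading of "whitespace removed".  PRETTY is left out because it also
needs the indent and line-width options.

```java
import java.util.function.UnaryOperator;

public class WhitespaceModes {
    // Hypothetical pluggable unit: maps an element's text to the text that is written.
    public interface WhitespaceLogic extends UnaryOperator<String> {}

    // RAW: text is written exactly as it is held in memory.
    public static final WhitespaceLogic RAW = text -> text;

    // COMPRESSED (assumed meaning): surrounding whitespace dropped,
    // interior whitespace runs collapsed to a single space.
    public static final WhitespaceLogic COMPRESSED =
            text -> text.trim().replaceAll("\\s+", " ");

    // Toy stand-in for an outputter; it only knows how to print <name>text</name>.
    public static class ToyOutputter {
        private WhitespaceLogic logic = RAW; // RAW is the proposed default

        public void setWhitespaceLogic(WhitespaceLogic logic) {
            this.logic = logic;
        }

        public String element(String name, String text) {
            return "<" + name + ">" + logic.apply(text) + "</" + name + ">";
        }
    }
}
```

With this shape a caller swaps behavior without subclassing the outputter,
and anyone can pass their own lambda for special tweaking.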

Thoughts?

-jh-

"Bradley S. Huffman" wrote:
> 
> > This just goes to prove the adage that all whitespace handling in XML is a
> > pain.
> 
> Yes it is!
> 
> This post goes to the question of what newlines really imply.  It
> sounds easy at first.  We just use newlines, normalize, and indent to take:
> 
>   <payroll>
>   <employee><firstname>     Brad</firstname><lastname>Huffman</lastname>
>   </employee>
>   <employee><firstname>John     </firstname>
>   <lastname>Doe</lastname> </employee>
>   </payroll>
> 
> To make it aesthetically pleasing as in:
> 
>   <payroll>
>       <employee>
>           <firstname>Brad</firstname>
>           <lastname>Huffman</lastname>
>       </employee>
>       <employee>
>           <firstname>John</firstname>
>           <lastname>Doe</lastname>
>       </employee>
>   </payroll>
> 
> But there are 4 situations where we can have text content: between
> <start><start>, <start></end>, </end><start>, and </end></end> tags.
> For the most common case of short text content between a start and end tag
> a single line is what we want because it looks best.
> 
>           <firstname>Brad</firstname>
>           <lastname>Huffman</lastname>
> 
> But then in cases like:
> 
>   </employee>
>                     Some       randomly spaced  text    <employee>
> 
> With newlines ON, how should this be printed?  As is? With
> leading/trailing whitespace trimmed? With leading/trailing whitespace
> trimmed and text aligned with </employee> or <employee>? Something else?
> It all depends on how newlines is defined.
> 
> With newlines ON and normalize ON, currently leading/trailing
> whitespace is semantically insignificant and we are free to add/remove
> it to produce the desired alignment (hmmm, that's not quite true).  But
> what if normalization is OFF?  Should leading/trailing whitespace be
> insignificant in all cases, in some cases? It gets confusing quickly!
> 
> Right now it seems text content between tags is insignificant
> ONLY if it is empty or all whitespace (when newlines is ON).  Which
> means the example above will be printed "as is" with newlines ON and
> normalization OFF, which is kind of ugly for a pretty-print mode.  Even
> with normalization ON the last tag <employee> will be printed on the same
> line as the text while its corresponding end tag (assuming non-empty content)
> is aligned with the previous end tag, again ugly IMHO.
> 
> Hmmm, the above paragraph isn't really true either. Try setting newlines
> to true, indent to "xxxx" (so you can see where indentation is added), and
> normalize to false. You'll get a line separator and indentation after text
> that is empty, and before and after text that is all whitespace. Very weird
> behavior.
> 
> After careful thought, I propose undeprecating textTrim and defining the
> following modes for XMLOutputter. Basically using the premise that turning
> newlines ON means we care more about how it looks than the semantic meaning
> of whitespace.  For the most part everything stays the same as what we have
> now (or would expect to have) except for the cases with text between
> <tag><tag>, </tag><tag>, or </tag></tag> when newlines and
> trimming/normalizing are on.
> 
>      Default:
>           No content is added to or removed from an element's content.
> 
>      textTrim:
>           Leading/trailing whitespace is insignificant and can be removed.
>           With newlines ON, whitespace might be added back to fit alignment
>           needs.
> 
>      textNormalize:
>           Same as textTrim, but interior whitespace is compressed to
>           a single space.
> 
>      newlines (textTrim and textNormalize OFF):
>           Empty content or "whitespace ONLY" content between tags is
>           insignificant. Text content that contains one or more non-whitespace
>           characters is left untouched and no leading/trailing whitespace
>           is added/removed.
> 
>      newlines (textTrim or textNormalize ON):
>          Case of <tag>text</tag>:
>               Start tag, text, and end tag are printed on single line with
>               trimming/normalization of text.
> 
>          Case of <tag>text<tag>, </tag>text<tag>, </tag>text</tag>:
>               Start tag, text, and end tag are aligned. Text is trimmed/
>               normalized before alignment.
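
The text-level rules in the list above can be sketched in a few lines of plain
Java.  This is an illustration under assumptions, not XMLOutputter code: the
class and method names are hypothetical, and "whitespace" here means whatever
String.trim() and the regex \s treat as whitespace, which is not exactly the
XML definition.

```java
public class TextModes {
    // textTrim: leading/trailing whitespace removed, interior whitespace kept.
    public static String textTrim(String text) {
        return text.trim();
    }

    // textNormalize: same as textTrim, but interior whitespace runs
    // are compressed to a single space.
    public static String textNormalize(String text) {
        return text.trim().replaceAll("\\s+", " ");
    }

    // newlines ON with textTrim and textNormalize OFF: only empty or
    // whitespace-only content is insignificant; anything else is left untouched.
    public static boolean isInsignificant(String text) {
        return text.trim().isEmpty();
    }
}
```

For the input "  Some   randomly spaced  text  ", textTrim keeps the interior
runs while textNormalize yields "Some randomly spaced text"; isInsignificant
is false for it, so the newlines-only mode would print it unchanged.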
> 
> Some other possible modes might be:
> 
>     canonical:
>          See http://www.w3.org/TR/xml-c14n.  I think it would be better
>          to have a converter that transforms the Document itself, but since
>          XMLOutputter is already close to outputting canonical form, it
>          might be worth it to have both.
> 
>     line wrap or text wrap?
>          Wrap a line after so many chars, or maybe just wrap text.
>          Might help with some HTML/XHTML, or might this functionality be
>          better left to something like HTML Tidy?
> 
>     alignText:
>         Treat all text content like tags and align them. Example
>         <name>Bradley S. Huffman</name> could become:
> 
>              <name>
>                  Bradley S. Huffman
>              </name>
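
A minimal sketch of what alignText could produce for a single element, in
plain Java.  The AlignText class, its method, and the indent/level parameters
are hypothetical and only illustrate the layout shown above; trimming the text
before aligning it is an assumption carried over from the textTrim discussion.

```java
public class AlignText {
    // Puts the start tag, the text, and the end tag on separate lines,
    // with the text indented one level deeper than the tags.
    public static String alignText(String name, String text, String indent, int level) {
        StringBuilder outer = new StringBuilder();
        for (int i = 0; i < level; i++) {
            outer.append(indent);
        }
        String inner = outer + indent;
        return outer + "<" + name + ">\n"
             + inner + text.trim() + "\n"
             + outer + "</" + name + ">";
    }
}
```

Calling alignText("name", "Bradley S. Huffman", "    ", 1) gives the three
aligned lines, each tag indented by one level and the text by two.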
> 
> And the possibilities go on and on. Feedback?
> 
> Brad