[jdom-interest] Fwd: Formatting differences after migrating to JDOM2
Robert Krüger
krueger at lesspain.de
Mon Oct 7 00:40:12 PDT 2013
On Sun, Oct 6, 2013 at 10:05 PM, Rolf <jdom at tuis.net> wrote:
> Hi Robert.
>
> OK. I have spent some time going through things, and, admittedly, this is
> confusing, and working through the combinations/permutations for formatting
> is liable to end in a headache.
>
> So, I think I have resolved that there are a number of issues at hand in
> your case:
> 1. JDOM2 is doing different things than JDOM1
> 2. JDOM1 is probably doing the wrong thing in this case
> 3. JDOM2 is also probably doing the wrong thing, but, in fairness, changing
> the 'TextMode' of a PrettyPrint format is a 'dangerous' thing .... not by
> design, but because of the actual implementation and choices the formatter
> makes with the pretty format.
> 4. If whitespace is significant for certain members of an XML document then
> you should not be relying on the whim of JDOM to make things right, but you
> should be using the xml:space="preserve" mechanism that is designed for this
> purpose.
>
> So, here are a few 'answers'.
>
> Answer 0:
> =====================================================
> The output you are getting from JDOM 1.x is broken. If you have a 'preserve'
> text mode then there should be no whitespace between any elements
> (indenting/newlines) because that is not 'preserved' space (it's 'invented'
> whitespace).
Yes, after thinking about it that was more or less the answer I expected.
>
> The JDOM output you currently get relies on a bug in JDOM 1.x.
>
> Answer 1:
> =====================================================
> The "right" thing for you to do is to add the xml:space="preserve" to the
> sub2 elements:
>
>
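> import org.jdom2.Attribute;
> import org.jdom2.Document;
> import org.jdom2.Element;
> import org.jdom2.Namespace;
> import org.jdom2.output.Format;
> import org.jdom2.output.XMLOutputter;
>
> public class XmlSpacePreserveExample {
>     public static void main(String[] argv) throws Exception {
>         Document document = new Document();
>         // Template attribute; clone it per element, since a JDOM Attribute
>         // can have only one parent.
>         Attribute cloneme = new Attribute("space", "preserve",
>                 Namespace.XML_NAMESPACE);
>
>         Element root = new Element("root");
>         document.addContent(root);
>         Element sub1 = new Element("sub1");
>         root.addContent(sub1);
>         sub1.addContent(new Element("sub2").setText("Some text")
>                 .setAttribute(cloneme.clone()));
>         sub1.addContent(new Element("sub2")
>                 .setText(" text with left and right whitespace ")
>                 .setAttribute(cloneme.clone()));
>         Format fmt = Format.getPrettyFormat();
>         XMLOutputter xout = new XMLOutputter(fmt);
>         xout.output(document, System.out);
>     }
> }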
>
> Gives the output:
>
> <root>
>   <sub1>
>     <sub2 xml:space="preserve">Some text</sub2>
>     <sub2 xml:space="preserve"> text with left and right whitespace </sub2>
>   </sub1>
> </root>
>
> Answer 2:
> =====================================================
> The "OK" thing for you to do is to use TextMode.TRIM_FULL_WHITE instead of
> TextMode.PRESERVE. The default TextMode for PrettyPrint is TextMode.TRIM,
> which removes whitespace from either end of the text, whereas
> TRIM_FULL_WHITE removes whitespace only when the text consists entirely of
> whitespace, and does nothing if there are any non-whitespace characters. I
> want you to be aware that other tools (JDOM, xmllint) have the right to mess
> with the whitespace ( http://www.w3.org/TR/REC-xml/#sec-white-space ). It is
> only by convention that the following will work in JDOM (I recommend
> preserving whitespace correctly with xml:space="preserve"):
>
>
> import org.jdom2.Document;
> import org.jdom2.Element;
> import org.jdom2.output.Format;
> import org.jdom2.output.XMLOutputter;
>
> public class TrimFullWhiteExample {
>     public static void main(String[] argv) throws Exception {
>         Document document = new Document();
>         Element root = new Element("root");
>         document.addContent(root);
>         Element sub1 = new Element("sub1");
>         root.addContent(sub1);
>         sub1.addContent(new Element("sub2").setText("Some text"));
>         sub1.addContent(new Element("sub2")
>                 .setText(" text with left and right whitespace "));
>         // Leave text untouched unless it consists only of whitespace
>         Format fmt = Format.getPrettyFormat();
>         fmt.setTextMode(Format.TextMode.TRIM_FULL_WHITE);
>         XMLOutputter xout = new XMLOutputter(fmt);
>         xout.output(document, System.out);
>     }
> }
>
> Gives the output:
>
>
> <root>
>   <sub1>
>     <sub2>Some text</sub2>
>     <sub2> text with left and right whitespace </sub2>
>   </sub1>
> </root>
>
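To make the difference between the three text modes discussed in Answer 2
concrete, here is a minimal, JDK-only sketch. It is not JDOM's implementation:
the class TextModeSketch, its Mode enum, and the apply and xmlTrim helpers are
hypothetical names that only mirror the behaviour described in this message
(TRIM strips both ends, TRIM_FULL_WHITE drops text only when it is entirely
whitespace, PRESERVE leaves text alone). It also assumes that "whitespace"
means the four XML whitespace characters.

```java
class TextModeSketch {
    // Hypothetical stand-in for the three JDOM text modes named in the message.
    enum Mode { PRESERVE, TRIM, TRIM_FULL_WHITE }

    // XML whitespace: space, tab, line feed, carriage return.
    static boolean isXmlWhitespace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    }

    // Remove XML whitespace from both ends of the text.
    static String xmlTrim(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && isXmlWhitespace(s.charAt(start))) {
            start++;
        }
        while (end > start && isXmlWhitespace(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(start, end);
    }

    // Apply the described text-mode behaviour to one piece of text.
    static String apply(Mode mode, String text) {
        switch (mode) {
            case TRIM:
                return xmlTrim(text);
            case TRIM_FULL_WHITE:
                // Drop the text only if nothing but whitespace is in it.
                return xmlTrim(text).isEmpty() ? "" : text;
            default:
                return text; // PRESERVE: leave the text as it is
        }
    }

    public static void main(String[] args) {
        String padded = " text with left and right whitespace ";
        for (Mode mode : Mode.values()) {
            // Brackets make the surviving leading/trailing spaces visible.
            System.out.println(mode + ": [" + apply(mode, padded) + "]");
            System.out.println(mode + " (blank input): [" + apply(mode, " \n ") + "]");
        }
    }
}
```

With the padded string, only TRIM changes the text; with the all-whitespace
string, both TRIM and TRIM_FULL_WHITE reduce it to nothing, which is why
TRIM_FULL_WHITE still lets a pretty printer discard indentation-only text.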
> Answer 3:
> =====================================================
> JDOM 2.x uses a different (faster, and more flexible) algorithm for output
> handling. This algorithm has two major triggers: the TextMode and the
> Indent. PrettyPrint sets the TextMode to TRIM and the Indent to two spaces
> ("  "). The TRIM mode tells JDOM it can mess with whitespace in Text. The
> Indent tells JDOM it can mess with the formatting of the XML structure
> (setting it to null tells JDOM not to mess with any indenting).
> You have been changing the TextMode to PRESERVE, and, as I think about that,
> JDOM should never mess with the indenting when the mode is PRESERVE. JDOM
> has code to make sure that it manages the INDENT and the TextMode correctly
> when they need to change internally, but you are basically setting an
> invalid situation by setting INDENT and PRESERVE at the same time. JDOM
> should handle that better.
>
> But the right behaviour is that, when you set PRESERVE, JDOM2 should output
> the following:
>
> <root><sub1><sub2>Some text</sub2><sub2> text with left and right whitespace </sub2></sub1></root>
>
> So, I think there's a bug in JDOM2: given the input you have
> (Format.getPrettyFormat().setTextMode(TextMode.PRESERVE)), it should be
> outputting the above (which is not what you want).
>
> Answer 4:
> =====================================================
> You can use the Raw format, and output the spaces yourself by adding your
> own indenting and newlines.
>
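Answer 4 can be illustrated with a small sketch. Assumption: it uses the JDK's
built-in W3C DOM and Transformer instead of JDOM, so that it runs without
extra dependencies; ManualIndentSketch, textElement and render are
hypothetical names. In JDOM 2.x the corresponding pieces would be
Format.getRawFormat() for the outputter and Element.addContent(String) for the
hand-written newline and indent text.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class ManualIndentSketch {
    // Create <name>text</name> with the text stored exactly as given.
    static Element textElement(Document doc, String name, String text) {
        Element element = doc.createElement(name);
        element.appendChild(doc.createTextNode(text));
        return element;
    }

    // Build the example tree with hand-written indentation and serialize it
    // without letting the serializer add or remove any whitespace.
    static String render() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("root");
            doc.appendChild(root);
            Element sub1 = doc.createElement("sub1");

            // The indentation is ordinary text content that we add ourselves.
            root.appendChild(doc.createTextNode("\n  "));
            root.appendChild(sub1);
            root.appendChild(doc.createTextNode("\n"));

            sub1.appendChild(doc.createTextNode("\n    "));
            sub1.appendChild(textElement(doc, "sub2", "Some text"));
            sub1.appendChild(doc.createTextNode("\n    "));
            sub1.appendChild(textElement(doc, "sub2",
                    " text with left and right whitespace "));
            sub1.appendChild(doc.createTextNode("\n  "));

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            transformer.setOutputProperty(OutputKeys.INDENT, "no"); // "raw" output
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render());
    }
}
```

Because the serializer is told not to indent, the only whitespace in the
result is what the code added, so the leading and trailing spaces inside the
second sub2 element stay intact.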
Thanks a lot for your in-depth answer! It helps a lot.
Robert