[jdom-interest] End-of-line sequence.

Rolf jdom at tuis.net
Mon Nov 14 17:29:00 PST 2011


Hi all.

JDOM has been merrily using "\r\n" as the end-of-line sequence in the 
XMLOutputter since 'forever'. The XML Spec indicates that all end-of-line 
sequences should be normalized to a single '\n': 
http://www.w3.org/TR/REC-xml/#sec-line-ends
The wording is such that XML parsers should clear out any extra '\r' 
characters if there are any, so it is not as if the current output is 
completely broken.
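
To illustrate the parser-side normalization, here is a minimal sketch 
using the JDK's built-in DOM parser rather than JDOM. The class name 
LineEndDemo and the sample strings are made up for the example; it only 
assumes the parser follows the line-end rule in the spec linked above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class LineEndDemo {

    /** Parses the XML text and returns the text content of its root element. */
    static String rootText(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            // Hypothetical error handling, just to keep the sketch self-contained.
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        String crlf = "<root>line one\r\nline two</root>";
        String lf = "<root>line one\nline two</root>";

        // A conforming parser should hand back the same text for both inputs.
        System.out.println("same text: " + rootText(crlf).equals(rootText(lf)));
        // indexOf returns -1 when no '\r' survives parsing.
        System.out.println("index of \\r: " + rootText(crlf).indexOf('\r'));
    }
}
```

So a reader of JDOM's output should see the same content either way; the 
difference only shows up when the raw bytes are compared.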

But I think it makes sense to follow the spec, and to avoid producing 
XML that differs from what other systems produce.

I propose changing the line separator to follow the spec, but this has a 
very large impact on anyone who expects JDOM to use a particular line 
terminator, even though they shouldn't...

I have filed https://github.com/hunterhacker/jdom/issues/53

The original decision was made by Elliotte: 
http://markmail.org/message/gv7m3xjgrkomrfe7   (it's worth noting that 
it was changed from the 'platform default' to the constant '\r\n' to 
create some consistency too).

vvv quote vvv

The one open question in this version is what to use for a line 
separator. Right now I'm using \r\n since that's most cross-platform 
compatible and friendliest to various network protocols. However, \n 
alone might be slightly friendlier to XML parsers. Another possibility 
is to ask for System.getProperty("line.separator"). However, I'm loathe 
to make the output platform dependent. What do people think?

^^^ quote ^^^

Also, the commit introducing this has interesting comments: 
https://github.com/hunterhacker/jdom/commit/958fb22a4c7088b82f0d48a933bdf4e5c6806151#L0R173

Two issues I see:
1. "\r\n" was chosen for 'Network protocol' friendliness... is this 
still a valid argument?

2. is it OK to change the standard format of all the XML that JDOM 
produces? (So far I have been really careful, for the most part, to 
ensure that no whitespace is changed, including indents and EOL/EOF.)

I see changing the default EOL as being an easy decision, especially 
since users can still change it back easily on their Format instance.

advantages:
1. Most XML tools do not use "\r" values - better compatibility?
2. XML output will be slightly smaller - ;-)
3. XML produced by 'other' outputters (currently the StAX outputters) 
can be compared directly with XMLOutputter for testing/compatibility

disadvantages:
1. people may have 'baselines' that contain \r\n terminators, which will 
then be different from JDOM's default output.
2. there may be some (obscure) protocols that require \r\n terminators 
and users of JDOM2 will have to override the EOL to be '\r\n' for those.
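
For the 'baselines' concern, one possible workaround on the test side is 
to compare output with line terminators folded together. This is only a 
sketch: the class EolCompare and its method names are hypothetical 
helpers, not part of JDOM.

```java
public class EolCompare {

    /** Folds "\r\n" and any lone "\r" into "\n" before comparison. */
    static String normalizeEol(String text) {
        // Replace the two-character sequence first so it becomes one '\n'.
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    /** True when the two strings differ at most in their line terminators. */
    static boolean sameIgnoringEol(String expected, String actual) {
        return normalizeEol(expected).equals(normalizeEol(actual));
    }

    public static void main(String[] args) {
        String baseline = "<root>\r\n  <child />\r\n</root>\r\n";
        String current = "<root>\n  <child />\n</root>\n";

        System.out.println("raw equal: " + baseline.equals(current));
        System.out.println("equal ignoring EOL: " + sameIgnoringEol(baseline, current));
    }
}
```

A comparison like this lets an old "\r\n" baseline keep passing, though 
it also hides any EOL difference that a test actually cares about.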

Anyone have comments/suggestions?

Rolf

