[jdom-interest] CDATA content is not preserved

Jason Hunter jhunter at xquery.com
Wed Nov 24 16:39:08 PST 2004


 From the XML 1.0 spec:

--
2.11 End-of-Line Handling

To simplify the tasks of applications, the XML processor MUST behave as 
if it normalized all line breaks in external parsed entities (including 
the document entity) on input, before parsing, by translating both the 
two-character sequence #xD #xA and any #xD that is not followed by #xA 
to a single #xA character.
--
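
As an illustration only, the rule quoted above can be sketched in plain Java. The class and method names here are made up for the example and are not part of JDOM or of any parser; a real XML processor applies this step internally before parsing.

```java
// Hypothetical helper, for illustration only; not part of JDOM.
class LineBreakNormalizer {
    /** Applies the XML 1.0 section 2.11 rule: CR LF and a lone CR both become LF. */
    public static String normalize(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\r') {
                out.append('\n');
                // Skip the LF of a CR LF pair so the pair yields a single LF.
                if (i + 1 < input.length() && input.charAt(i + 1) == '\n') {
                    i++;
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String windowsText = "first line\r\nsecond line";
        // The two-character Windows separator collapses to one character.
        System.out.println(windowsText.length() - normalize(windowsText).length());
    }
}
```

This is why a string written with a Windows line separator comes back one character shorter after parsing.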

Now, we did decide to have XMLOutputter write &#xD; to preserve any 
carriage returns set via setText().  That code's in the 
escapeElementEntities() method.  But of course you can't write &#xD; in 
a CDATA section because it would be treated literally.
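
A minimal sketch of the difference, using the DOM parser that ships with the JDK rather than JDOM's SAXBuilder (an assumption made so the example needs no extra jars; the class name and the <msg> element are invented for the example):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; uses the JDK's built-in parser, not JDOM.
class CarriageReturnRoundTrip {
    /** Parses an XML string and returns the text content of its root element. */
    public static String rootText(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // Character reference in ordinary text: expanded after line-break
        // normalization, so the carriage return survives.
        String escaped = rootText("<msg>one&#xD;\ntwo</msg>");
        // Literal CR LF inside CDATA: normalized to a single LF on input.
        String cdata = rootText("<msg><![CDATA[one\r\ntwo]]></msg>");
        // "&#xD;" inside CDATA is not a reference; it stays as five characters.
        String literal = rootText("<msg><![CDATA[one&#xD;\ntwo]]></msg>");

        System.out.println("escaped keeps CR: " + escaped.contains("\r"));
        System.out.println("cdata keeps CR:   " + cdata.contains("\r"));
        System.out.println("literal text:     " + literal.contains("&#xD;"));
    }
}
```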

I'm not sure there's a way to preserve \r\n in a CDATA section in XML.
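
One possible workaround, offered as an untested-against-JDOM sketch: a single CDATA section cannot carry the carriage return, but the section can be closed, a &#xD; reference written as ordinary text, and a new section opened. The helper below builds such markup by hand and checks it with the JDK's DOM parser. All names are hypothetical, and it assumes the text never contains "]]>".

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical helper; assumes the input text does not contain "]]>".
class SplitCdataWriter {
    /** Serializes text as CDATA sections joined by &#xD; wherever a CR occurs. */
    public static String toCdataPreservingCr(String text) {
        StringBuilder out = new StringBuilder();
        int start = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\r') {
                // Close the section before the CR and emit the CR as a reference.
                out.append("<![CDATA[").append(text, start, i).append("]]>");
                out.append("&#xD;");
                start = i + 1;
            }
        }
        out.append("<![CDATA[").append(text, start, text.length()).append("]]>");
        return out.toString();
    }

    /** Parses an XML string and returns the text content of its root element. */
    public static String rootText(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String original = "I am your 2.\r\n(second) test message";
        String xml = "<msg>" + toCdataPreservingCr(original) + "</msg>";
        System.out.println(rootText(xml).equals(original));
    }
}
```

The cost is that the content is no longer one CDATA node when read back, so code that expects a single CDATA child would need to read the element's combined text instead.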

-jh-

Andreas Schaefer wrote:

> Hi Geeks
> 
> I just stumbled over a problem with CDATA that does not preserve the
> content of the given text when read from a file. I used the XMLOutputter
> to write to a file and then the SAXBuilder to read from the file. All
> the values are embedded into a CDATA content tag and then added to an
> element.
> 
> This is the text I try to write on a Windows system:
> 
>             "testSingleRecordAndDropEol(), I am your 2." +
> mLineSeparator + "(testSingleRecordAndDropEol()) test message"
> 
> where the line separator is taken from the system properties.
> 
> This is the content of the CDATA section (read from the file), which is
> correct with respect to the text above. Each character of the string is
> shown in quotes, followed in parentheses by the number of the character.
> Please pay attention to the third line (I added a '>' and '<' around it):
> 
> 't' (29) 'e' (14) 's' (28) 't' (29) 'S' (28) 'i' (18) 'n' (23) 'g' (16)
> 'l' (21) 'e' (14) 'R' (27) 'e' (14) 'c' (12) 'o' (24) 'r' (27) 'd' (13)
> 'A' (10) 'n' (23) 'd' (13) 'D' (13) 'r' (27) 'o' (24) 'p' (25) 'E' (14)
> 'o' (24) 'l' (21) '(' (-1) ')' (-1) ',' (-1) ' ' (-1) 'I' (18) ' ' (-1)
> 'a' (10) 'm' (22) ' ' (-1) 'y' (34) 'o' (24) 'u' (30) 'r' (27) ' ' (-1)
> '2' (2) '.' (-1) '
> 
>>' (-1) '<
> 
> ' (-1) '(' (-1) 't' (29) 'e' (14) 's' (28) 't' (29) 'S' (28) 'i' (18)
> 'n' (23) 'g' (16) 'l' (21) 'e' (14) 'R' (27) 'e' (14) 'c' (12) 'o' (24)
> 'r' (27) 'd' (13) 'A' (10) 'n' (23) 'd' (13) 'D' (13) 'r' (27) 'o' (24)
> 'p' (25) 'E' (14) 'o' (24) 'l' (21) '(' (-1) ')' (-1) ')' (-1) ' ' (-1)
> 't' (29) 'e' (14) 's' (28) 't' (29) ' ' (-1) 'm' (22) 'e' (14) 's' (28)
> 's' (28) 'a' (10) 'g' (16) 'e' (14)
> 
> This is what I get back from the CDATA element after the file is read by
> the SAXBuilder:
> 
> 't' (29) 'e' (14) 's' (28) 't' (29) 'S' (28) 'i' (18) 'n' (23) 'g' (16)
> 'l' (21) 'e' (14) 'R' (27) 'e' (14) 'c' (12) 'o' (24) 'r' (27) 'd' (13)
> 'A' (10) 'n' (23) 'd' (13) 'D' (13) 'r' (27) 'o' (24) 'p' (25) 'E' (14)
> 'o' (24) 'l' (21) '(' (-1) ')' (-1) ',' (-1) ' ' (-1) 'I' (18) ' ' (-1)
> 'a' (10) 'm' (22) ' ' (-1) 'y' (34) 'o' (24) 'u' (30) 'r' (27) ' ' (-1)
> '2' (2) '.' (-1) '
> ' (-1) '(' (-1) 't' (29) 'e' (14) 's' (28) 't' (29) 'S' (28) 'i' (18)
> 'n' (23) 'g' (16) 'l' (21) 'e' (14) 'R' (27) 'e' (14) 'c' (12) 'o' (24)
> 'r' (27) 'd' (13) 'A' (10) 'n' (23) 'd' (13) 'D' (13) 'r' (27) 'o' (24)
> 'p' (25) 'E' (14) 'o' (24) 'l' (21) '(' (-1) ')' (-1) ')' (-1) ' ' (-1)
> 't' (29) 'e' (14) 's' (28) 't' (29) ' ' (-1) 'm' (22) 'e' (14) 's' (28)
> 's' (28) 'a' (10) 'g' (16) 'e' (14)
> 
> As you can see, the third line is missing when the text is read back by
> the SAXBuilder. I guess the CDATA element swallows either the line feed
> or the carriage return. Because I am using an XML file to transfer test
> data back from a server, I need to preserve the content exactly and
> cannot afford to lose a character.
> 
> Or am I mistaken in thinking that CDATA preserves the content given
> to it?
> 
> Have a nice Turkey Day (Gobble-Gobble)
> Andreas Schaefer
> Senior Software Engineer
> 
> Upcoming Maven Presentation @ LA-JUG 12/7/04
> 
> 
> 