[jdom-interest] encoding problem converting XML document to string

Bradley S. Huffman hip at cs.okstate.edu
Wed Nov 5 17:11:15 PST 2003


"William Krick" writes:

> I have a method that converts an XML document to a string...
> 
>   public static String docToString(Document doc) {
>     // build a string from an XML document
>     XMLOutputter outputter = new XMLOutputter("",false);

The above line tells XMLOutputter not to pretty-print (an empty indent
string, and false for the newlines argument), so any whitespace already in
the document is written out as-is. Hmmm, the old B7 javadocs for
setNewlines seem misleading.

> The resulting string looks like this...
> 
> <?xml version="1.0" encoding="UTF-8"?><NYPA>\u000a  <NUMVEHICLES>1   etc...
> 
> Why are my CRLF being replaced with "\u000a" and two spaces?

That's done by the XML parser, per section 2.11 of the XML spec,
"End-of-Line Handling". Basically every CR/LF pair, and any CR on its own,
is normalized to a single LF (\u000a) before your application sees the text.
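
You can see the end-of-line normalization without JDOM at all. Here is a
minimal sketch using the parser that ships with the JDK; the class name
EolNormalizationDemo and the helper rootText are made up for this example,
and the sample document is only modelled on the output quoted above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EolNormalizationDemo {

    // Hypothetical helper: parse the XML text and return the concatenated
    // text content of the root element, exactly as the parser reports it.
    static String rootText(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // The input uses CRLF line endings and two spaces of indentation.
        String xml = "<NYPA>\r\n  <NUMVEHICLES>1</NUMVEHICLES>\r\n</NYPA>";
        String text = rootText(xml);

        // The parser has already turned each CRLF into a single LF.
        System.out.println("contains CR: " + text.contains("\r"));
        System.out.println("contains LF: " + text.contains("\n"));
    }
}
```

The indentation survives parsing untouched; only the line endings change.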
 
> Why is there any whitespace at all in the output?

See the comment above about the constructor. What do you want your output
to look like? If whitespace is unimportant in your app, and your version
of JDOM is B8 or later, try

     XMLOutputter outputter = new XMLOutputter();
     outputter.setTextNormalize(true);
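
To give a rough idea of what text normalization does to each piece of text,
here is a plain-Java sketch. It is not JDOM's code; the class
TextNormalizeSketch and its normalize method are hypothetical, and it
assumes normalization means "strip leading and trailing XML whitespace, then
collapse each internal run of whitespace to one space".

```java
public class TextNormalizeSketch {

    // Hypothetical stand-in for text normalization: XML whitespace is
    // space, tab, CR and LF. Trim it at both ends, then collapse runs.
    static String normalize(String text) {
        String trimmed = text.replaceAll("^[ \\t\\r\\n]+|[ \\t\\r\\n]+$", "");
        return trimmed.replaceAll("[ \\t\\r\\n]+", " ");
    }

    public static void main(String[] args) {
        // Indentation-only text between elements collapses to nothing.
        System.out.println("[" + normalize("\n  ") + "]");
        // Internal runs of whitespace become a single space.
        System.out.println("[" + normalize("  one \n   two ") + "]");
    }
}
```

Under that assumption the "\u000a" plus two spaces between your elements
normalizes to an empty string, so it would no longer show up in the output.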

Brad


