[jdom-interest] XMLOutputter and utf-8
    Chris Curvey 
    ccurvey at gmail.com
       
    Fri May 20 06:39:52 PDT 2005
    
    
  
Thanks to Jason & Paul for their responses. I tried Jason's suggestion on
my example, and it works great. (I realize this question is increasingly
off-topic; please forgive me.)
In my real-world problem, I'm not writing to System.out, I'm writing to an 
output stream returned from an HttpsURLConnection. So I tried this:
Document doc = getXML();
XMLOutputter out = new XMLOutputter();
out.setEncoding("UTF-8");
String renderedDoc = out.outputString(doc);
// Construct the request headers
setupHeaders(theConnection, renderedDoc.length());
// Send the request
OutputStream output = theConnection.getOutputStream();
out.output(doc, output);
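One plausible culprit, offered as an editorial guess that the thread does not confirm: renderedDoc.length() counts Java chars, not bytes. If setupHeaders uses that number as the Content-Length, every É (two bytes in UTF-8) makes the advertised length one byte short of what out.output(...) actually writes. The minimal stdlib sketch below encodes the rendered string once and reuses the byte count. The class Utf8Body and its methods are hypothetical, it uses java.nio.charset.StandardCharsets (Java 7 or later), and it assumes the string came from an outputter whose XML declaration says UTF-8, so encoding it with UTF-8 keeps the declaration and the bytes in agreement.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of JDOM): encode the rendered XML once,
// so the advertised length and the bytes actually sent always agree.
public class Utf8Body {

    // Assumes renderedDoc declares encoding="UTF-8" in its XML declaration.
    public static byte[] encode(String renderedDoc) {
        return renderedDoc.getBytes(StandardCharsets.UTF_8);
    }

    // Writes exactly the bytes whose length was used for Content-Length.
    public static void send(byte[] body, OutputStream output) throws IOException {
        output.write(body);
        output.flush();
    }

    public static void main(String[] args) throws IOException {
        String renderedDoc = "<foobar>CLOISONN\u00C9</foobar>";
        byte[] body = encode(renderedDoc);

        System.out.println("chars: " + renderedDoc.length()); // chars: 26
        System.out.println("bytes: " + body.length);          // bytes: 27

        // Stand-in for theConnection.getOutputStream().
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        send(body, sink);
        System.out.println("written: " + sink.size());        // written: 27
    }
}
```

In the code above, that would mean something like byte[] body = Utf8Body.encode(out.outputString(doc)); then setupHeaders(theConnection, body.length); then Utf8Body.send(body, theConnection.getOutputStream()). The document is rendered only once, so the header and the body cannot drift apart.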
I don't have access to the server on the other end of that connection, and 
the connection is encrypted, so I can't just put in a proxy server to 
capture the stream to see what's really being sent.
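Since a proxy is not an option, one way to see exactly what is sent is to capture it on the client side, before TLS encrypts it. Below is a minimal sketch using only java.io. TeeOutputStream is a hypothetical name, not a JDOM or JDK class. It forwards every byte to the wrapped stream and keeps a copy that can be dumped as hex.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical debugging aid: forwards every byte to the real stream
// and keeps an in-memory copy for inspection.
public class TeeOutputStream extends FilterOutputStream {
    private final ByteArrayOutputStream copy = new ByteArrayOutputStream();

    public TeeOutputStream(OutputStream target) {
        super(target);
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        copy.write(b);
    }

    // FilterOutputStream.write(byte[]) delegates here, so arrays are covered too.
    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len);
        copy.write(b, off, len);
    }

    public byte[] captured() {
        return copy.toByteArray();
    }

    // Hex dump so multi-byte sequences such as C3 89 are easy to spot.
    public String capturedHex() {
        StringBuilder sb = new StringBuilder();
        for (byte b : captured()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }
}
```

It would wrap the connection's stream, for example TeeOutputStream tee = new TeeOutputStream(theConnection.getOutputStream()); out.output(doc, tee); after which tee.capturedHex() and tee.captured().length can be logged and compared with the length passed to setupHeaders. The copy is held in memory, so this is meant for debugging small documents only.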
One more data point, which may or may not be important: I have to use the
Beta-7 version of JDOM, because it's distributed as part of my app server,
and putting JDOM 1.0 earlier in the classpath causes the app server to
choke.
Many, many thanks for any help.
-Chris
On 5/20/05, Jason Hunter <jhunter at xquery.com> wrote:
> 
> You're not actually outputting the file to a byte stream. You're
> outputting it to a String, then printing the string using
> System.out.println(). System.out is a PrintStream and per the
> PrintStream Javadocs, "All characters printed by a PrintStream are
> converted into bytes using the platform's default character encoding."
> 
> Try this: out.output(doc, System.out);
> 
> That way JDOM gets to control the bytes being output.
> 
> -jh-
> 
> Chris Curvey wrote:
> 
> > Hi all,
> >
> > I'm having a little trouble figuring out utf-8 encoding with JDom. The
> > output from this sample program is returning a single hex value, \xc9
> > for an E-acute, but according to this page
> > http://www.fileformat.info/info/unicode/char/00c9/index.htm, the UTF-8
> > encoding for E-acute should be a hex pair \xc3 and \x89. (\xc9 appears
> > to be the right value for UTF-16.)
> >
> > Any idea what I'm doing wrong? Or am I just misinterpreting something?
> >
> > import org.jdom.Document;
> > import org.jdom.Element;
> > import org.jdom.output.XMLOutputter;
> > import org.jdom.output.Format;
> >
> > class JdomTest
> > {
> >     public static void main (String[] argv)
> >     {
> >         Document doc = new Document();
> >         Element element = new Element("foobar");
> >         element.setText("CLOISONNÉ");
> >         doc.addContent(element);
> >
> >         Format format = Format.getPrettyFormat();
> >         format.setEncoding("UTF-8");
> >         XMLOutputter out = new XMLOutputter(format);
> >         System.out.println(out.outputString(doc));
> >     }
> > }
> >
>
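To make the byte values in this thread concrete: a single byte 0xC9 is how ISO-8859-1 (and windows-1252) encode É. UTF-16 needs two bytes (00 C9 in big-endian order), and UTF-8 uses C3 89. The original \xc9 output is therefore consistent with Jason's diagnosis that System.out encoded the string with a Latin-1-style platform default rather than UTF-8. The stdlib sketch below prints each encoding. It also shows that a PrintStream constructed with an explicit "UTF-8" encoding emits C3 89. The class and method names are illustrative, and StandardCharsets requires Java 7 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EAcuteBytes {

    // Formats bytes as space-separated uppercase hex, e.g. "C3 89".
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Returns the bytes a PrintStream built with the given encoding writes for text.
    public static byte[] printed(String text, String encoding)
            throws UnsupportedEncodingException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        PrintStream ps = new PrintStream(buf, true, encoding);
        ps.print(text);
        ps.flush();
        return buf.toByteArray();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String eAcute = "\u00C9"; // LATIN CAPITAL LETTER E WITH ACUTE

        System.out.println("ISO-8859-1: " + hex(eAcute.getBytes(StandardCharsets.ISO_8859_1))); // C9
        System.out.println("UTF-16BE:   " + hex(eAcute.getBytes(StandardCharsets.UTF_16BE)));   // 00 C9
        System.out.println("UTF-8:      " + hex(eAcute.getBytes(StandardCharsets.UTF_8)));      // C3 89
        System.out.println("PrintStream(UTF-8): " + hex(printed(eAcute, "UTF-8")));           // C3 89

        // The platform default charset; this line's output depends on the machine.
        System.out.println("default charset: " + Charset.defaultCharset());
    }
}
```

Applied to the original JdomTest, either Jason's out.output(doc, System.out) or printing through new PrintStream(System.out, true, "UTF-8") should make the bytes match the declared UTF-8 encoding. The first option lets JDOM produce the bytes itself, which is the point of Jason's suggestion.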