[jdom-interest] Todo list [eg]
Rosen, Alex
arosen at silverstream.com
Wed May 2 22:48:14 PDT 2001
Here's the patch.
I'm wondering about the warning:
* <p> Warning: using your own Writer may cause the outputter's
* preferred character encoding to be ignored. If you use
* encodings other than UTF-8, we recommend using the method that
* takes an OutputStream instead. </p>
It's misleading, because (1) the outputter's "preferred character encoding"
(the one from the constructor) is not completely ignored - it's still the one
used for the XML declaration, and (2) if you just make sure you use the same
encoding for the XMLOutputter constructor and the OutputStreamWriter
constructor, then it's fine to use this method. The part about UTF-8 also seems
confused. How about:
Warning: Outputting to a Writer will use the Writer's encoding as the character
set of the document. However, the encoding name that's written to the XML
declaration <?xml version="1.0" encoding="foo"?> comes from the XMLOutputter
(which defaults to "UTF-8", but can also be set by XMLOutputter's constructor
or the setEncoding() method). You should ensure that these encodings match each
other.
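
To illustrate why the two encodings need to match, here is a minimal sketch that
uses only JDK classes. It does not call XMLOutputter; the class and method names
(EncodingMatchSketch, write) are made up for this example, and the method just
imitates writing a declaration plus some content through a caller-supplied
Writer encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

public class EncodingMatchSketch {
    // Hypothetical stand-in for the outputter: writes a declaration that names
    // declaredEncoding, then the body, through a Writer that actually encodes
    // characters using writerEncoding.
    public static byte[] write(String declaredEncoding, String writerEncoding, String body)
            throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Writer out = new OutputStreamWriter(bytes, writerEncoding);
        out.write("<?xml version=\"1.0\" encoding=\"" + declaredEncoding + "\"?>");
        out.write(body);
        out.flush();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String body = "<a>\u00e9</a>"; // one non-ASCII character (e with acute accent)

        // Declaration and Writer agree: the accented character is one byte.
        byte[] matched = write("ISO-8859-1", "ISO-8859-1", body);

        // Declaration says ISO-8859-1 but the Writer encodes as UTF-8: the
        // accented character becomes two bytes, so a parser that trusts the
        // declaration will misread it.
        byte[] mismatched = write("ISO-8859-1", "UTF-8", body);

        System.out.println(matched.length + " bytes vs " + mismatched.length + " bytes");
    }
}
```

The declaration text is identical in both cases; only the bytes produced for the
non-ASCII character differ, which is exactly the situation the warning should
describe.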
--Alex
P.S. I threw in a bonus fix, to always print out a newline after the DOCTYPE
declaration - point #2 in
http://lists.denveronline.net/lists/jdom-interest/2001-April/005518.html. I'm
fairly sure this is the right thing to do, but feel free to take it out if you
disagree.
> -----Original Message-----
> From: Jason Hunter [mailto:jhunter at acm.org]
> Sent: Tuesday, May 01, 2001 1:42 AM
> To: Rosen, Alex
> Cc: jdom-interest at jdom.org
> Subject: Re: [jdom-interest] Todo list [eg]
>
>
> "Rosen, Alex" wrote:
> >
> > > Older JVMs don't accept UTF-8 as a name. It's a case where
> > > Java doesn't recognize the XML-style name.
> >
> > Doing some quick testing with JDK 1.1, it looks like it doesn't accept
> > "UTF-8", only "UTF8". It doesn't support UTF-16, no matter how it's
> > written. (Neither does JDK 1.2, but 1.3 does.) JDK 1.1 supports the
> > standard names for the other major encodings just fine.
> >
> > In order to support UTF-8 on JDK 1.1, and to ensure that all XML
> > documents are output with valid encoding values, I propose that the
> > special-casing logic be moved from printDeclaration() to makeWriter().
> > I.e., remove the code in printDeclaration() to accept "UTF8", and
> > instead change makeWriter() to translate "UTF-8" to "UTF8". And
> > initialize the "encoding" member variable to "UTF-8". Then, we can state
> > that the interface to XMLOutputter can use only XML-standard encoding
> > names. (With the existing logic, that wouldn't be true for JDK 1.1.)
> >
> > What do you think?
>
> Sounds good. Send in a patch? Please update the Javadocs along with
> the code to explain your logic. :-)
>
> -jh-
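
For reference, the translation proposed in the quoted message could be sketched
as one small helper that both makeWriter() variants delegate to. This is only an
illustration, not the code in the attached patch: the class name
EncodingNameSketch and the method toJavaName are hypothetical.

```java
import java.io.BufferedOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UnsupportedEncodingException;
import java.io.Writer;

public class EncodingNameSketch {
    // Hypothetical helper: map an XML-style encoding name to a name that
    // every JDK accepts. Only "UTF-8" needs translating here; all other
    // names are passed through unchanged.
    public static String toJavaName(String xmlName) {
        return "UTF-8".equals(xmlName) ? "UTF8" : xmlName;
    }

    // Sketch of a makeWriter() that takes the XML-style name from the caller
    // and translates it just before handing it to OutputStreamWriter.
    public static Writer makeWriter(OutputStream out, String xmlName)
            throws UnsupportedEncodingException {
        return new OutputStreamWriter(new BufferedOutputStream(out), toJavaName(xmlName));
    }
}
```

Keeping the translation in one place means the XML declaration can always print
the XML-style name, while the Java-specific spelling never leaks out of the
writer setup.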
-------------- next part --------------
Index: XMLOutputter.java
===================================================================
RCS file: /home/cvspublic/jdom/src/java/org/jdom/output/XMLOutputter.java,v
retrieving revision 1.46
diff -u -r1.46 XMLOutputter.java
--- XMLOutputter.java 2001/04/27 18:21:21 1.46
+++ XMLOutputter.java 2001/05/03 05:07:36
@@ -107,7 +107,7 @@
* <code>Element</code>, to either a <code>Writer</code> or an
* <code>OutputStream</code>. Warning: using your own
* <code>Writer</code> may cause the outputter's preferred character
- * encoding to be ignored. If you use encodings other than UTF8, we
+ * encoding to be ignored. If you use encodings other than UTF-8, we
* recommend using the method that takes an OutputStream instead.
* </p>
*
@@ -139,7 +139,7 @@
private boolean suppressDeclaration = false;
/** The encoding format */
- private String encoding = "UTF8";
+ private String encoding = "UTF-8";
/** Whether or not to output the encoding in the XML declaration
* - default is <code>false</code> */
@@ -214,7 +214,8 @@
* of spaces
* @param newlines <code>true</code> indicates new lines should be
* printed, else new lines are ignored (compacted).
- * @param encoding set encoding format.
+ * @param encoding set encoding format. Use XML-style names like
+ * "UTF-8" or "ISO-8859-1" or "US-ASCII".
*/
public XMLOutputter(String indent, boolean newlines, String encoding) {
this.indent = indent;
@@ -277,7 +278,8 @@
}
/**
- * @param encoding encoding format
+ * @param encoding encoding format. Use XML-style names like
+ * "UTF-8" or "ISO-8859-1" or "US-ASCII".
**/
public void setEncoding(String encoding) {
this.encoding = encoding;
@@ -289,6 +291,8 @@
* (<code><?xml version="1.0" encoding="UTF-8"?></code>)
* includes the encoding of the document. It is common to suppress
* this in uses such as WML and other wireless device protocols.
+ * According to the XML spec, the encoding may only be omitted
+ * when using either UTF-8 or UTF-16.
* </p>
*
* @param omitEncoding <code>boolean</code> indicating whether or not
@@ -425,8 +429,14 @@
*/
protected Writer makeWriter(OutputStream out)
throws java.io.UnsupportedEncodingException {
+ // "UTF-8" is not recognized by JDK 1.1, so we'll translate into "UTF8",
+ // which works with all JDKs.
+ String javaCompatibleEncoding = this.encoding;
+ if ("UTF-8".equals(javaCompatibleEncoding))
+ javaCompatibleEncoding = "UTF8";
+
Writer writer = new OutputStreamWriter
- (new BufferedOutputStream(out), this.encoding);
+ (new BufferedOutputStream(out), javaCompatibleEncoding);
return writer;
}
@@ -435,8 +445,14 @@
*/
protected Writer makeWriter(OutputStream out, String encoding)
throws java.io.UnsupportedEncodingException {
+ // "UTF-8" is not recognized by JDK 1.1, so we'll translate into "UTF8",
+ // which works with all JDKs.
+ String javaCompatibleEncoding = encoding;
+ if ("UTF-8".equals(javaCompatibleEncoding))
+ javaCompatibleEncoding = "UTF8";
+
Writer writer = new OutputStreamWriter
- (new BufferedOutputStream(out), encoding);
+ (new BufferedOutputStream(out), javaCompatibleEncoding);
return writer;
}
@@ -464,7 +480,7 @@
*
* <p> Warning: using your own Writer may cause the outputter's
* preferred character encoding to be ignored. If you use
- * encodings other than UTF8, we recommend using the method that
+ * encodings other than UTF-8, we recommend using the method that
* takes an OutputStream instead. </p>
*
* @param doc <code>Document</code> to format.
@@ -482,7 +498,8 @@
if (doc.getDocType() != null) {
printDocType(doc.getDocType(), out);
- maybePrintln(out);
+ // Print new line after DOCTYPE always - same reason as above.
+ out.write(lineSeparator);
}
// Print out root element, as well as any root level
@@ -860,20 +877,11 @@
// Only print of declaration is not suppressed
if (!suppressDeclaration) {
// Assume 1.0 version
- if (encoding.equals("UTF8")) {
- out.write("<?xml version=\"1.0\"");
- if (!omitEncoding) {
- out.write(" encoding=\"UTF-8\"");
- }
- out.write("?>");
- }
- else {
- out.write("<?xml version=\"1.0\"");
- if (!omitEncoding) {
- out.write(" encoding=\"" + encoding + "\"");
- }
- out.write("?>");
+ out.write("<?xml version=\"1.0\"");
+ if (!omitEncoding) {
+ out.write(" encoding=\"" + encoding + "\"");
}
+ out.write("?>");
}
}