[jdom-interest] B9-rc1: inputstreams, or readers: Invalid encoding name "KSC5601"

Jason Hunter jhunter at acm.org
Fri Apr 18 11:09:29 PDT 2003


    /**
     * <p>
     * This builds a document from the supplied
     *   Reader.  It's the programmer's responsibility to make sure
     *   the reader matches the encoding of the file.  It's always safer
     *   to use an InputStream rather than a Reader, if it's available.
     * </p>
     *
     * @param characterStream <code>Reader</code> to read from.
     * @return <code>Document</code> - resultant Document object.
     * @throws JDOMException when errors occur in parsing.
     * @throws IOException when an I/O error prevents a document
     *         from being fully parsed.
     */
    public Document build(Reader characterStream)
        throws JDOMException, IOException {
        return build(new InputSource(characterStream));
    }
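
To make the warning in that comment concrete, here is a minimal sketch of
what goes wrong when a Reader's charset doesn't match the file. It uses the
JDK's built-in JAXP parser rather than SAXBuilder so that it runs without
any extra jars; the class name, the sample document and the deliberately
wrong UTF-8 Reader are all made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ReaderVsStreamDemo {

    // Hypothetical sample: declares ISO-8859-1 and holds one non-ASCII character.
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><word>caf\u00e9</word>";

    // The bytes as they would sit on disk, encoded the way the declaration says.
    static byte[] fileBytes() {
        return XML.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Parse an InputSource and return the text of the root element.
    static String rootText(InputSource source) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(source);
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Byte stream: the parser reads the encoding declaration and decodes itself.
    static String viaStream() {
        return rootText(new InputSource(new ByteArrayInputStream(fileBytes())));
    }

    // Character stream: the caller has already chosen a charset (wrongly, here),
    // so the parser only ever sees the already-decoded characters.
    static String viaMismatchedReader() {
        Reader reader = new InputStreamReader(
            new ByteArrayInputStream(fileBytes()), StandardCharsets.UTF_8);
        return rootText(new InputSource(reader));
    }

    public static void main(String[] args) {
        System.out.println("stream matches original: "
            + "caf\u00e9".equals(viaStream()));
        System.out.println("reader matches original: "
            + "caf\u00e9".equals(viaMismatchedReader()));
    }
}
```

The byte-stream parse should give back the original text, while the
mismatched Reader hands the parser an already-damaged character. The parser
has no way to notice, which is why the InputStream is the safer choice.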

-jh-

Alex Rosen wrote:
> 
> When you use an InputStream, the parser can read the encoding name from
> the XML file and set up its own Reader with the right encoding.
> 
> When you use a Reader, it's your responsibility to set it up. Which
> would mean in this case that you'd need to read the encoding name out of
> the file yourself, instead of letting the parser do it for you. So, if
> at all possible, you should use an InputStream not a Reader. (I could've
> sworn that the JavaDoc mentioned this but I don't see it.)
> 
> Alex
> 
> >>> Rolf Lear <rlear at algorithmics.com> 4/17/2003 9:32:33 AM >>>
> My point is that the same data passes SAXBuilder IF it is processed as a
> Reader, but fails as an InputStream.
> 
> The encoding is processed "just fine" when the data is processed as a
> Reader InputSource, but fails as an InputStream.
> 
> As I say, I am unsure of where this is a bug, or even IF this is a bug,
> but it certainly is suspicious.
> 
> Attached is the Zipped XMLDocument which fails "well-formedness" as a
> ByteStream, but passes as a Reader.
> 
> Here is my test code:
> 
> ==============================
> import java.io.FileInputStream;
> import java.io.FileReader;
> 
> import org.jdom.input.SAXBuilder;
> 
> public class MainParse {
> 
>     public static void main(String[] args) {
>         try {
>             new SAXBuilder().build(new FileInputStream(args[0]));
>             System.out.println("PASSED: Processed file as an input stream.");
>         } catch (Exception e) {
>             System.out.println("FAILED: Processed file as an input stream.");
>             e.printStackTrace();
>         }
>         try {
>             new SAXBuilder().build(new FileReader(args[0]));
>             System.out.println("PASSED: Processed file as a Reader.");
>         } catch (Exception e) {
>             System.out.println("FAILED: Processed file as a Reader.");
>             e.printStackTrace();
>         }
>     }
> }
> ==================================
> 
> and this is my output from the command:
> java -cp .:/lib/jaxen-jdom.jar:./lib/jdom.jar:./lib/xerces.jar MainParse mydoc_raw.xml
> 
> FAILED: Processed file as an input stream.
> org.jdom.input.JDOMParseException: Error on line 1: Invalid encoding name "KSC5601".
>         at org.jdom.input.SAXBuilder.build(SAXBuilder.java:381)
>         at org.jdom.input.SAXBuilder.build(SAXBuilder.java:684)
>         at MainParse.main(MainParse.java:23)
> Caused by: org.xml.sax.SAXParseException: Invalid encoding name "KSC5601".
>         at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>         at org.jdom.input.SAXBuilder.build(SAXBuilder.java:370)
>         ... 2 more
> Caused by: org.xml.sax.SAXParseException: Invalid encoding name "KSC5601".
>         at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>         at org.jdom.input.SAXBuilder.build(SAXBuilder.java:370)
>         at org.jdom.input.SAXBuilder.build(SAXBuilder.java:684)
>         at MainParse.main(MainParse.java:23)
> Caused by: org.xml.sax.SAXParseException: Invalid encoding name "KSC5601".
>         at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>         at org.jdom.input.SAXBuilder.build(SAXBuilder.java:370)
>         at org.jdom.input.SAXBuilder.build(SAXBuilder.java:684)
>         at MainParse.main(MainParse.java:23)
> PASSED: Processed file as a Reader.
> 
> Rolf
> 
> -----Original Message-----
> From: Jason Hunter [mailto:jhunter at acm.org]
> Sent: Wednesday, April 16, 2003 6:48 PM
> To: Rolf Lear
> Cc: Jdom-Interest (E-mail)
> Subject: Re: [jdom-interest] B9-rc1: inputstreams, or readers: Invalid
> encoding name "KSC5601"
> 
> It may be that the encoding name isn't known to XML but may be known to
> Java.  There's a Xerces feature to tell it to respect Java names for
> encodings.  Try that.
> 
> -jh-
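
For reference, the Xerces feature in question is
http://apache.org/xml/features/allow-java-encodings. The sketch below only
shows how such a feature is switched on, using the JDK's bundled JAXP parser
(itself derived from Xerces) so it runs without extra jars. With JDOM the
same URI would presumably be passed to SAXBuilder.setFeature instead. The
class name and sample document are made up, and whether "KSC5601" then
resolves still depends on the charsets the JVM actually ships with.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class AllowJavaEncodingsDemo {

    // Xerces feature that lets the encoding declaration use Java charset names.
    static final String ALLOW_JAVA_ENCODINGS =
        "http://apache.org/xml/features/allow-java-encodings";

    // Hypothetical sample document, plain ASCII so it parses on any JVM.
    static byte[] sample() {
        return "<?xml version=\"1.0\" encoding=\"US-ASCII\"?><doc/>"
            .getBytes(StandardCharsets.US_ASCII);
    }

    // Parse the bytes with the feature switched on; return the root element name.
    static String rootName(byte[] xml) {
        final String[] root = new String[1];
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Throws if the underlying parser does not recognize the feature.
            factory.setFeature(ALLOW_JAVA_ENCODINGS, true);
            SAXParser parser = factory.newSAXParser();
            parser.parse(new ByteArrayInputStream(xml), new DefaultHandler() {
                @Override
                public void startElement(String uri, String local,
                                         String qName, Attributes atts) {
                    if (root[0] == null) {
                        root[0] = qName; // remember only the first element seen
                    }
                }
            });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return root[0];
    }

    public static void main(String[] args) {
        System.out.println(rootName(sample()));
    }
}
```

The feature has to be set before the parse starts, and a parser that is not
Xerces-based may reject the URI outright, so treat this as a starting point
rather than a guaranteed fix.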
> 
> > Rolf Lear wrote:
> >
> > I have been trying to find/fix performance issues in JDOM, and was
> > playing around with the Verifier.
> >
> > To test the effect of changes to the Verifier, I first load an XML
> > document into memory, then parse it using SAXBuilder.build.
> >
> > To test weird XML, I found this:
> >
> > http://ropas.kaist.ac.kr/viewcvs/viewcvs.cgi/*checkout*/n/nXml/testdata/document/mydoc_raw.xml?rev=HEAD&content-type=text/xml
> >
> > which is partially Korean.
> >
> > First, remove the Doctype declaration in the document.
> >
> > My program does the following (See the code at the end).
> >
> > It loads the file up as an array of bytes.
> > It loads the file up as an array of chars.
> >
> > It parses each through SAXBuilder.build using an InputStream on the
> > bytes, and a Reader on the chars:
> > InputSource source = new InputSource(new ByteArrayInputStream(bytedata));
> > and
> > InputSource source = new InputSource(new CharArrayReader(chardata));
> >
> > Now, parsing the Reader passes, and the InputStream fails with:
> > Invalid encoding name "KSC5601" (in Xerces).
> >
> > org.jdom.input.JDOMParseException: Error on line 1: Invalid encoding name "KSC5601".
> >         at org.jdom.input.SAXBuilder.build(SAXBuilder.java:381)
> >         at MainTest.main(MainTest.java:77)
> > Caused by: org.xml.sax.SAXParseException: Invalid encoding name "KSC5601".
> >         at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
> >         at org.jdom.input.SAXBuilder.build(SAXBuilder.java:370)
> >         ... 1 more
> > Caused by: org.xml.sax.SAXParseException: Invalid encoding name "KSC5601".
> >         at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
> >         at org.jdom.input.SAXBuilder.build(SAXBuilder.java:370)
> >         at MainTest.main(MainTest.java:77)
> > Caused by: org.xml.sax.SAXParseException: Invalid encoding name "KSC5601".
> >         at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
> >         at org.jdom.input.SAXBuilder.build(SAXBuilder.java:370)
> >         at MainTest.main(MainTest.java:77)
> >
> > Now I am the first to admit that my Unicode/charset knowledge is
> > really flaky, so any suggestions as to whether this is a bug in my
> > code, JDOM, or Xerces are welcome.
> >
> > Rolf
> >
> > ======================================================
> > /*package default.*/
> > import java.io.ByteArrayInputStream;
> > import java.io.CharArrayReader;
> > import java.io.File;
> > import java.io.FileInputStream;
> > import java.io.FileReader;
> > import java.io.IOException;
> >
> > import org.jdom.JDOMException;
> > import org.jdom.input.SAXBuilder;
> > import org.xml.sax.InputSource;
> >
> > public class MainTest {
> >
> >     private static byte[] loadedFileBytes(String filename) throws IOException {
> >         File file = new File(filename);
> >         byte[] buffer = new byte[(int)file.length()];
> >         FileInputStream fis = new FileInputStream(file);
> >         int got = 0;
> >         int size = buffer.length;
> >         for (got = 0; got < size; ) {
> >             int read = fis.read(buffer, got, size - got);
> >             if (read >= 0) {
> >                 got += read;
> >             } else {
> >                 throw new IOException ("do not expect end of file before "
> >                     + size + " bytes, but got it at " + got + " bytes.");
> >
> >             }
> >         }
> >         if (fis.read() != -1) {
> >             throw new IOException ("Thought we read to end of file, "
> >                 + "but there is still more.....");
> >         }
> >         return buffer;
> >     }
> >
> >     private static char[] loadedFileChars(String filename) throws IOException {
> >         File file = new File(filename);
> >         FileReader fr = new FileReader(file);
> >         StringBuffer sb = new StringBuffer();
> >         int read = 0;
> >         char[] buffer = new char[1024*4];
> >         while ((read = fr.read(buffer)) >= 0) {
> >             sb.append(buffer, 0, read);
> >         }
> >         return sb.toString().toCharArray();
> >     }
> >
> >     public static void main(String[] args)
> >             throws ClassNotFoundException, IOException {
> >         long start = System.currentTimeMillis();
> >         Class.forName("org.jdom.Verifier").getDeclaredMethods();
> >         long load = System.currentTimeMillis() - start;
> >         System.out.println("Loaded Verifier Class: " + load + "ms.");
> >         int iterations = Integer.parseInt(args[0]);
> >         SAXBuilder builder = new SAXBuilder(false);
> >         for (int i = 1; i < args.length; i++) {
> >             start = System.currentTimeMillis();
> >             byte[] bytedata = loadedFileBytes(args[i]);
> >             char[] chardata = loadedFileChars(args[i]);
> >             load = System.currentTimeMillis() - start;
> >             System.out.println("Loaded Data in File '" + args[i] + "' in "
> >                 + load + "ms. " + (bytedata.length / 1024) + "KB. "
> >                 + (chardata.length / 1024) + " KChars About to SAXBuild");
> >
> >
> >             try {
> >                 for (int j = 0; j < iterations; j++) {
> >                     InputSource source =
> >                         new InputSource(new ByteArrayInputStream(bytedata));
> >                     start = System.currentTimeMillis();
> >                     builder.build(source);
> >                     load = System.currentTimeMillis() - start;
> >                     System.out.println("SAXBuilder built document '" + args[i]
> >                         + "' (BYTES) iteration " + j + " in " + load + "ms.");
> >
> >                 }
> >             } catch (JDOMException e) {
> >                 e.printStackTrace();
> >             } catch (IOException ioe) {
> >                 ioe.printStackTrace();
> >             }
> >             try {
> >                 for (int j = 0; j < iterations; j++) {
> >                     InputSource source =
> >                         new InputSource(new CharArrayReader(chardata));
> >                     start = System.currentTimeMillis();
> >                     builder.build(source);
> >                     load = System.currentTimeMillis() - start;
> >                     System.out.println("SAXBuilder built document '" + args[i]
> >                         + "' (CHARS) iteration " + j + " in " + load + "ms.");
> >
> >                 }
> >             } catch (JDOMException e) {
> >                 e.printStackTrace();
> >             } catch (IOException ioe) {
> >                 ioe.printStackTrace();
> >             }
> >         }
> >     }
> > }
> >
> > ======================================================
> 