[jdom-interest] B9-rc1: inputstreams, or readers: Invalid encoding name "KSC5601"

Jason Hunter jhunter at acm.org
Wed Apr 16 15:48:17 PDT 2003


It may be that the encoding name isn't one the XML parser recognizes
but is one Java knows.  Xerces has a feature that tells it to accept
Java names for encodings:
http://apache.org/xml/features/allow-java-encodings
Try turning that on.
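
For reference, here is a minimal sketch of switching that feature on. It
uses plain JAXP/SAX rather than JDOM so that it runs with nothing but a
JDK on the classpath. The class and method names are made up for
illustration, and whether the parser you end up with honors this
Xerces-specific feature URI is an assumption you should check against
your own setup. With JDOM, the equivalent call would presumably be
SAXBuilder.setFeature(name, true), assuming your build provides it.

```java
import java.io.ByteArrayInputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of JDOM or Xerces.
class AllowJavaEncodingsSketch {

    // Xerces-specific feature URI; other parsers may not recognize it.
    static final String ALLOW_JAVA_ENCODINGS =
            "http://apache.org/xml/features/allow-java-encodings";

    /** Creates a SAX reader and asks it to accept Java encoding names. */
    static XMLReader newLenientReader() {
        try {
            XMLReader reader =
                    SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setFeature(ALLOW_JAVA_ENCODINGS, true);
            return reader;
        } catch (Exception e) {
            // Thrown, for example, when the parser does not know the feature.
            throw new IllegalStateException(
                    "Could not enable " + ALLOW_JAVA_ENCODINGS, e);
        }
    }

    /** Parses raw bytes and returns the root element's qualified name. */
    static String rootElementName(byte[] xml) {
        final String[] root = new String[1];
        XMLReader reader = newLenientReader();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (root[0] == null) {
                    root[0] = qName; // remember only the first element seen
                }
            }
        });
        try {
            // Byte stream: the parser decodes using the declared encoding.
            reader.parse(new InputSource(new ByteArrayInputStream(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Parse failed", e);
        }
        return root[0];
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: AllowJavaEncodingsSketch <file.xml>");
            return;
        }
        byte[] xml = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("Root element: " + rootElementName(xml));
    }
}
```

Run it against the test file; if the error goes away, the same feature
set on the builder should get the byte-stream case through JDOM too.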

-jh-

> Rolf Lear wrote:
> 
> I have been trying to find/fix performance issues in JDOM, and was
> playing around with the Verifier.
> 
> To test the effect of changes to the Verifier, I first load an XML
> document into memory, then parse it using SAXBuilder.build.
> 
> To test weird XML, I found this:
> http://ropas.kaist.ac.kr/viewcvs/viewcvs.cgi/*checkout*/n/nXml/testdata/document/mydoc_raw.xml?rev=HEAD&content-type=text/xml
> 
> which is partially Korean.
> 
> First, remove the Doctype declaration in the document.
> 
> My program does the following (See the code at the end).
> 
> It loads the file up as an array of bytes.
> It loads the file up as an array of chars.
> 
> It parses each through SAXBuilder.build using an InputStream on the
> bytes, and a Reader on the chars:
>     InputSource source = new InputSource(new ByteArrayInputStream(bytedata));
> and
>     InputSource source = new InputSource(new CharArrayReader(chardata));
> 
> Now, parsing the Reader passes, and the InputStream fails with:
> Invalid encoding name "KSC5601" (in Xerces).
> 
> org.jdom.input.JDOMParseException: Error on line 1: Invalid encoding name "KSC5601".
>         at org.jdom.input.SAXBuilder.build(SAXBuilder.java:381)
>         at MainTest.main(MainTest.java:77)
> Caused by: org.xml.sax.SAXParseException: Invalid encoding name "KSC5601".
>         at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>         at org.jdom.input.SAXBuilder.build(SAXBuilder.java:370)
>         ... 1 more
> Caused by: org.xml.sax.SAXParseException: Invalid encoding name "KSC5601".
>         at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>         at org.jdom.input.SAXBuilder.build(SAXBuilder.java:370)
>         at MainTest.main(MainTest.java:77)
> Caused by: org.xml.sax.SAXParseException: Invalid encoding name "KSC5601".
>         at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>         at org.jdom.input.SAXBuilder.build(SAXBuilder.java:370)
>         at MainTest.main(MainTest.java:77)
> 
> Now I am the first to admit that my Unicode/charset knowledge is
> really flaky, so any suggestions as to whether this is a bug in my
> code, JDOM, or Xerces are welcome.
> 
> Rolf
> 
> ======================================================
> /*package default.*/
> import java.io.ByteArrayInputStream;
> import java.io.CharArrayReader;
> import java.io.File;
> import java.io.FileInputStream;
> import java.io.FileReader;
> import java.io.IOException;
> 
> import org.jdom.JDOMException;
> import org.jdom.input.SAXBuilder;
> import org.xml.sax.InputSource;
> 
> public class MainTest {
> 
>     // Reads the whole file into a byte array, exactly as stored on disk.
>     private static byte[] loadedFileBytes(String filename)
>             throws IOException {
>         File file = new File(filename);
>         byte[] buffer = new byte[(int) file.length()];
>         FileInputStream fis = new FileInputStream(file);
>         try {
>             int size = buffer.length;
>             for (int got = 0; got < size; ) {
>                 int read = fis.read(buffer, got, size - got);
>                 if (read < 0) {
>                     throw new IOException("do not expect end of file before "
>                             + size + " bytes, but got it at " + got + " bytes.");
>                 }
>                 got += read;
>             }
>             if (fis.read() != -1) {
>                 throw new IOException("Thought we read to end of file, "
>                         + "but there is still more.....");
>             }
>         } finally {
>             fis.close(); // release the file handle even on failure
>         }
>         return buffer;
>     }
> 
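>     // Reads the whole file into a char array. FileReader decodes with the
>     // platform default encoding, not the encoding named in the XML
>     // declaration.
>     private static char[] loadedFileChars(String filename)
>             throws IOException {
>         File file = new File(filename);
>         FileReader fr = new FileReader(file);
>         StringBuffer sb = new StringBuffer();
>         try {
>             int read = 0;
>             char[] buffer = new char[1024 * 4];
>             while ((read = fr.read(buffer)) >= 0) {
>                 sb.append(buffer, 0, read);
>             }
>         } finally {
>             fr.close(); // release the file handle even on failure
>         }
>         return sb.toString().toCharArray();
>     }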
> 
>     public static void main(String[] args)
>             throws ClassNotFoundException, IOException {
>         // Force the Verifier class to load so its cost is timed separately.
>         long start = System.currentTimeMillis();
>         Class.forName("org.jdom.Verifier").getDeclaredMethods();
>         long load = System.currentTimeMillis() - start;
>         System.out.println("Loaded Verifier Class: " + load + "ms.");
>         // args[0] is the iteration count; the remaining args are file names.
>         int iterations = Integer.parseInt(args[0]);
>         SAXBuilder builder = new SAXBuilder(false);
>         for (int i = 1; i < args.length; i++) {
>             start = System.currentTimeMillis();
>             byte[] bytedata = loadedFileBytes(args[i]);
>             char[] chardata = loadedFileChars(args[i]);
>             load = System.currentTimeMillis() - start;
>             System.out.println("Loaded Data in File '" + args[i] + "' in "
>                     + load + "ms. " + (bytedata.length / 1024) + "KB. "
>                     + (chardata.length / 1024) + " KChars About to SAXBuild");
>
>             // Byte path: the parser must decode using the declared encoding.
>             try {
>                 for (int j = 0; j < iterations; j++) {
>                     InputSource source = new InputSource(
>                             new ByteArrayInputStream(bytedata));
>                     start = System.currentTimeMillis();
>                     builder.build(source);
>                     load = System.currentTimeMillis() - start;
>                     System.out.println("SAXBuilder built document '" + args[i]
>                             + "' (BYTES) iteration " + j + " in " + load + "ms.");
>                 }
>             } catch (JDOMException e) {
>                 e.printStackTrace();
>             } catch (IOException ioe) {
>                 ioe.printStackTrace();
>             }
>
>             // Char path: the data is already decoded before the parser sees it.
>             try {
>                 for (int j = 0; j < iterations; j++) {
>                     InputSource source = new InputSource(
>                             new CharArrayReader(chardata));
>                     start = System.currentTimeMillis();
>                     builder.build(source);
>                     load = System.currentTimeMillis() - start;
>                     System.out.println("SAXBuilder built document '" + args[i]
>                             + "' (CHARS) iteration " + j + " in " + load + "ms.");
>                 }
>             } catch (JDOMException e) {
>                 e.printStackTrace();
>             } catch (IOException ioe) {
>                 ioe.printStackTrace();
>             }
>         }
>     }
> }
> ===================================================================================


