[jdom-interest] B9-rc1: inputstreams, or readers: Invalid encoding name "KSC5601"
Jason Hunter
jhunter at acm.org
Wed Apr 16 15:48:17 PDT 2003
It may be that the encoding name isn't known to XML but may be known to
Java. There's a Xerces feature to tell it to respect Java names for
encodings. Try that.
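
For reference, a minimal sketch of what that could look like. It uses the
JDK's bundled JAXP parser rather than JDOM so that it is self-contained; the
class name EncodingFeatureSketch and the tiny test document are made up for
illustration. The feature URI is the Xerces "allow-java-encodings" feature.
With JDOM the analogous call should be
builder.setFeature(ALLOW_JAVA_ENCODINGS, true) on the SAXBuilder, assuming the
underlying parser is Xerces.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class EncodingFeatureSketch {
    // Xerces-specific feature URI; a non-Xerces parser may reject it.
    static final String ALLOW_JAVA_ENCODINGS =
            "http://apache.org/xml/features/allow-java-encodings";

    // Hypothetical ASCII-only document declaring the Java-style encoding name.
    static final String XML =
            "<?xml version=\"1.0\" encoding=\"KSC5601\"?><doc/>";

    /** Parses the source and returns the name of the root element. */
    static String rootName(InputSource source, boolean allowJavaNames)
            throws Exception {
        XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        if (allowJavaNames) {
            reader.setFeature(ALLOW_JAVA_ENCODINGS, true);
        }
        final String[] root = new String[1];
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName,
                                     Attributes atts) {
                if (root[0] == null) {
                    root[0] = qName;
                }
            }
        });
        // DefaultHandler rethrows fatal errors as SAXParseException.
        reader.setErrorHandler(new DefaultHandler());
        reader.parse(source);
        return root[0];
    }

    public static void main(String[] args) throws Exception {
        // Character stream: already decoded, so the declaration is not used.
        System.out.println("chars: "
                + rootName(new InputSource(new StringReader(XML)), false));

        // Byte stream: the parser has to resolve the declared encoding itself.
        byte[] bytes = XML.getBytes(StandardCharsets.US_ASCII);
        try {
            System.out.println("bytes, feature on: " + rootName(
                    new InputSource(new ByteArrayInputStream(bytes)), true));
        } catch (Exception e) {
            System.out.println("bytes, feature on, failed: " + e.getMessage());
        }
    }
}
```

A parser given a character stream reads it directly and disregards the
encoding declaration, which would explain why the Reader case passes. Whether
the byte-stream case succeeds with the feature on depends on the JVM actually
providing a charset under that name.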
-jh-
> Rolf Lear wrote:
>
> I have been trying to find/fix performance issues in JDOM, and was
> playing around with the Verifier.
>
> To test the effect of changes to the Verifier, I first load an XML
> Document into memory, then parse it using SAXBuilder.build.
>
> To test weird XML, I found this:
> http://ropas.kaist.ac.kr/viewcvs/viewcvs.cgi/*checkout*/n/nXml/testdata/document/mydoc_raw.xml?rev=HEAD&content-type=text/xml
>
> which is partially Korean.
>
> First, remove the Doctype declaration in the document.
>
> My program does the following (see the code at the end).
>
> It loads the file up as an array of bytes.
> It loads the file up as an array of chars.
>
> It parses each through SAXBuilder.build using an InputStream on the
> bytes, and a Reader on the chars:
> InputSource source = new InputSource(new ByteArrayInputStream(bytedata));
> and
> InputSource source = new InputSource(new CharArrayReader(chardata));
>
> Now, parsing the Reader passes, and the InputStream fails with:
> Invalid encoding name "KSC5601" (in Xerces).
>
> org.jdom.input.JDOMParseException: Error on line 1: Invalid encoding name "KSC5601".
>     at org.jdom.input.SAXBuilder.build(SAXBuilder.java:381)
>     at MainTest.main(MainTest.java:77)
> Caused by: org.xml.sax.SAXParseException: Invalid encoding name "KSC5601".
>     at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>     at org.jdom.input.SAXBuilder.build(SAXBuilder.java:370)
>     ... 1 more
> Caused by: org.xml.sax.SAXParseException: Invalid encoding name "KSC5601".
>     at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>     at org.jdom.input.SAXBuilder.build(SAXBuilder.java:370)
>     at MainTest.main(MainTest.java:77)
> Caused by: org.xml.sax.SAXParseException: Invalid encoding name "KSC5601".
>     at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>     at org.jdom.input.SAXBuilder.build(SAXBuilder.java:370)
>     at MainTest.main(MainTest.java:77)
>
> Now I am the first to admit that my Unicode/charset knowledge is
> really flaky, so any suggestions as to whether this is a bug in my
> code, JDOM, or Xerces are welcome.
>
> Rolf
>
> ======================================================
> /* package: default */
> import java.io.ByteArrayInputStream;
> import java.io.CharArrayReader;
> import java.io.File;
> import java.io.FileInputStream;
> import java.io.FileReader;
> import java.io.IOException;
>
> import org.jdom.JDOMException;
> import org.jdom.input.SAXBuilder;
> import org.xml.sax.InputSource;
>
> public class MainTest {
>
>     // Reads the whole file as raw bytes; no decoding takes place.
>     private static byte[] loadedFileBytes(String filename) throws IOException {
>         File file = new File(filename);
>         byte[] buffer = new byte[(int) file.length()];
>         FileInputStream fis = new FileInputStream(file);
>         try {
>             int got = 0;
>             int size = buffer.length;
>             while (got < size) {
>                 int read = fis.read(buffer, got, size - got);
>                 if (read >= 0) {
>                     got += read;
>                 } else {
>                     throw new IOException("do not expect end of file before "
>                             + size + " bytes, but got it at " + got + " bytes.");
>                 }
>             }
>             if (fis.read() != -1) {
>                 throw new IOException(
>                         "Thought we read to end of file, but there is still more.....");
>             }
>             return buffer;
>         } finally {
>             fis.close();
>         }
>     }
>
>     // Reads the whole file as chars. FileReader decodes the bytes with the
>     // platform default encoding, not the one declared in the XML.
>     private static char[] loadedFileChars(String filename) throws IOException {
>         File file = new File(filename);
>         FileReader fr = new FileReader(file);
>         try {
>             StringBuffer sb = new StringBuffer();
>             int read = 0;
>             char[] buffer = new char[1024 * 4];
>             while ((read = fr.read(buffer)) >= 0) {
>                 sb.append(buffer, 0, read);
>             }
>             return sb.toString().toCharArray();
>         } finally {
>             fr.close();
>         }
>     }
>
>     // Usage: MainTest <iterations> <file> [<file> ...]
>     public static void main(String[] args) throws ClassNotFoundException, IOException {
>         // Load the Verifier class up front so its cost stays out of the parse timings.
>         long start = System.currentTimeMillis();
>         Class.forName("org.jdom.Verifier").getDeclaredMethods();
>         long load = System.currentTimeMillis() - start;
>         System.out.println("Loaded Verifier Class: " + load + "ms.");
>         int iterations = Integer.parseInt(args[0]);
>         SAXBuilder builder = new SAXBuilder(false);
>         for (int i = 1; i < args.length; i++) {
>             start = System.currentTimeMillis();
>             byte[] bytedata = loadedFileBytes(args[i]);
>             char[] chardata = loadedFileChars(args[i]);
>             load = System.currentTimeMillis() - start;
>             System.out.println("Loaded Data in File '" + args[i] + "' in " + load
>                     + "ms. " + (bytedata.length / 1024) + "KB. "
>                     + (chardata.length / 1024) + " KChars About to SAXBuild");
>
>             // Parse from the byte stream; the parser must resolve the declared encoding.
>             try {
>                 for (int j = 0; j < iterations; j++) {
>                     InputSource source =
>                             new InputSource(new ByteArrayInputStream(bytedata));
>                     start = System.currentTimeMillis();
>                     builder.build(source);
>                     load = System.currentTimeMillis() - start;
>                     System.out.println("SAXBuilder built document '" + args[i]
>                             + "' (BYTES) iteration " + j + " in " + load + "ms.");
>                 }
>             } catch (JDOMException e) {
>                 e.printStackTrace();
>             } catch (IOException ioe) {
>                 ioe.printStackTrace();
>             }
>
>             // Parse from the character stream; the data is already decoded.
>             try {
>                 for (int j = 0; j < iterations; j++) {
>                     InputSource source =
>                             new InputSource(new CharArrayReader(chardata));
>                     start = System.currentTimeMillis();
>                     builder.build(source);
>                     load = System.currentTimeMillis() - start;
>                     System.out.println("SAXBuilder built document '" + args[i]
>                             + "' (CHARS) iteration " + j + " in " + load + "ms.");
>                 }
>             } catch (JDOMException e) {
>                 e.printStackTrace();
>             } catch (IOException ioe) {
>                 ioe.printStackTrace();
>             }
>         }
>     }
> }
> ===================================================================================