[jdom-interest] B9-rc1: inputstreams, or readers: Invalid encoding name "KSC5601"
Rolf Lear
rlear at algorithmics.com
Wed Apr 16 08:43:59 PDT 2003
I have been trying to find and fix performance issues in JDOM, and was
playing around with the Verifier.
To test the effect of changes to the Verifier, I first load an XML document
into memory, then parse it using SAXBuilder.build.
To test weird XML, I found this:
http://ropas.kaist.ac.kr/viewcvs/viewcvs.cgi/*checkout*/n/nXml/testdata/document/mydoc_raw.xml?rev=HEAD&content-type=text/xml
which is partially Korean.
First, remove the DOCTYPE declaration from the document.
My program does the following (see the code at the end).
It loads the file as an array of bytes.
It loads the file as an array of chars.
It parses each through SAXBuilder.build, using an InputStream on the bytes
and a Reader on the chars:
InputSource source = new InputSource(new ByteArrayInputStream(bytedata));
and
InputSource source = new InputSource(new CharArrayReader(chardata));
Parsing from the Reader succeeds, but parsing from the InputStream fails
with: Invalid encoding name "KSC5601" (in Xerces).
org.jdom.input.JDOMParseException: Error on line 1: Invalid encoding name "KSC5601".
    at org.jdom.input.SAXBuilder.build(SAXBuilder.java:381)
    at MainTest.main(MainTest.java:77)
Caused by: org.xml.sax.SAXParseException: Invalid encoding name "KSC5601".
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at org.jdom.input.SAXBuilder.build(SAXBuilder.java:370)
    ... 1 more
Caused by: org.xml.sax.SAXParseException: Invalid encoding name "KSC5601".
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at org.jdom.input.SAXBuilder.build(SAXBuilder.java:370)
    at MainTest.main(MainTest.java:77)
Caused by: org.xml.sax.SAXParseException: Invalid encoding name "KSC5601".
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at org.jdom.input.SAXBuilder.build(SAXBuilder.java:370)
    at MainTest.main(MainTest.java:77)
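One way to narrow down an error like the one above is to ask the JVM whether it recognizes the encoding name at all. The sketch below is only an illustration: the class name CharsetNameCheck and the helper canonicalNameOrNull are made up for this example, and it assumes a JVM with java.nio.charset (Java 1.4 or later). It only reports what the JVM knows; the parser may apply its own rules to names found in the XML declaration, so the result is one data point rather than a diagnosis.

```java
import java.nio.charset.Charset;

final class CharsetNameCheck {

    // Hypothetical helper: returns the JVM's canonical name for the given
    // charset name or alias, or null if the JVM does not support it.
    static String canonicalNameOrNull(String name) {
        try {
            return Charset.isSupported(name) ? Charset.forName(name).name() : null;
        } catch (IllegalArgumentException e) {
            // isSupported rejects syntactically illegal names with an
            // IllegalCharsetNameException, a subclass of this exception.
            return null;
        }
    }

    public static void main(String[] args) {
        // "KSC5601" is the name from the error message; the other two are
        // included only for comparison.
        String[] names = { "KSC5601", "EUC-KR", "UTF-8" };
        for (int i = 0; i < names.length; i++) {
            System.out.println(names[i] + " -> " + canonicalNameOrNull(names[i]));
        }
    }
}
```

The output depends on which charsets the running JVM ships with, so it may differ between installations.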
Now I am the first to admit that my Unicode/charset knowledge is really
flaky, so any suggestions as to whether this is a bug in my code, JDOM, or
Xerces are welcome.
Rolf
======================================================
/*package default.*/
import java.io.ByteArrayInputStream;
import java.io.CharArrayReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileReader;
import java.io.IOException;
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;
import org.xml.sax.InputSource;
public class MainTest {

    // Reads the whole file into a byte array, exactly as stored on disk.
    private static byte[] loadedFileBytes(String filename) throws IOException {
        File file = new File(filename);
        byte[] buffer = new byte[(int) file.length()];
        FileInputStream fis = new FileInputStream(file);
        try {
            int got = 0;
            int size = buffer.length;
            while (got < size) {
                int read = fis.read(buffer, got, size - got);
                if (read < 0) {
                    throw new IOException("do not expect end of file before "
                            + size + " bytes, but got it at " + got + " bytes.");
                }
                got += read;
            }
            if (fis.read() != -1) {
                throw new IOException(
                        "Thought we read to end of file, but there is still more.....");
            }
            return buffer;
        } finally {
            fis.close();
        }
    }
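
    // Reads the whole file into a char array. Note that FileReader decodes
    // the bytes with the default charset, not the encoding named in the
    // XML declaration.
    private static char[] loadedFileChars(String filename) throws IOException {
        File file = new File(filename);
        FileReader fr = new FileReader(file);
        try {
            StringBuffer sb = new StringBuffer();
            int read = 0;
            char[] buffer = new char[1024 * 4];
            while ((read = fr.read(buffer)) >= 0) {
                sb.append(buffer, 0, read);
            }
            return sb.toString().toCharArray();
        } finally {
            fr.close();
        }
    }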

    public static void main(String[] args) throws ClassNotFoundException, IOException {
        // Load the Verifier class up front so that class loading is not
        // counted in the parse timings below.
        long start = System.currentTimeMillis();
        Class.forName("org.jdom.Verifier").getDeclaredMethods();
        long load = System.currentTimeMillis() - start;
        System.out.println("Loaded Verifier Class: " + load + "ms.");

        // args[0] is the iteration count; the remaining arguments are file names.
        int iterations = Integer.parseInt(args[0]);
        SAXBuilder builder = new SAXBuilder(false);
        for (int i = 1; i < args.length; i++) {
            start = System.currentTimeMillis();
            byte[] bytedata = loadedFileBytes(args[i]);
            char[] chardata = loadedFileChars(args[i]);
            load = System.currentTimeMillis() - start;
            System.out.println("Loaded Data in File '" + args[i] + "' in " + load
                    + "ms. " + (bytedata.length / 1024) + "KB. "
                    + (chardata.length / 1024) + " KChars About to SAXBuild");

            // Byte stream: the parser must work out the encoding itself.
            try {
                for (int j = 0; j < iterations; j++) {
                    InputSource source = new InputSource(new ByteArrayInputStream(bytedata));
                    start = System.currentTimeMillis();
                    builder.build(source);
                    load = System.currentTimeMillis() - start;
                    System.out.println("SAXBuilder built document '" + args[i]
                            + "' (BYTES) iteration " + j + " in " + load + "ms.");
                }
            } catch (JDOMException e) {
                e.printStackTrace();
            } catch (IOException ioe) {
                ioe.printStackTrace();
            }

            // Character stream: the data has already been decoded to chars.
            try {
                for (int j = 0; j < iterations; j++) {
                    InputSource source = new InputSource(new CharArrayReader(chardata));
                    start = System.currentTimeMillis();
                    builder.build(source);
                    load = System.currentTimeMillis() - start;
                    System.out.println("SAXBuilder built document '" + args[i]
                            + "' (CHARS) iteration " + j + " in " + load + "ms.");
                }
            } catch (JDOMException e) {
                e.printStackTrace();
            } catch (IOException ioe) {
                ioe.printStackTrace();
            }
        }
    }
}
===================================================================================
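The SAX InputSource documentation states that when a character stream is supplied, the parser reads it directly and disregards any encoding declaration in the document. That would be consistent with the Reader case not hitting the encoding error, although the chars produced by FileReader are decoded with the default charset and so may not be the intended text. A minimal sketch of doing the decoding deliberately follows. It uses the JDK's own SAX parser instead of JDOM so that it stands alone; the class name ExplicitCharsetSketch and the helpers sourceFor and textOf are hypothetical, and the sample document is a tiny Latin-1 string, not the Korean test file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class ExplicitCharsetSketch {

    // Hypothetical helper: decode the bytes with a caller-chosen Java charset
    // name and give the parser a character stream instead of a byte stream.
    static InputSource sourceFor(byte[] data, String javaCharsetName) throws IOException {
        Reader reader = new InputStreamReader(new ByteArrayInputStream(data), javaCharsetName);
        return new InputSource(reader);
    }

    // Parses the document and returns all of its character data, so the
    // result of the decoding can be inspected.
    static String textOf(byte[] data, String javaCharsetName) throws Exception {
        final StringBuffer text = new StringBuffer();
        DefaultHandler handler = new DefaultHandler() {
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(sourceFor(data, javaCharsetName), handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Sample data only: a small ISO-8859-1 document with one accented character.
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a>caf\u00e9</a>";
        byte[] data = xml.getBytes("ISO-8859-1");
        System.out.println(textOf(data, "ISO-8859-1"));
    }
}
```

In the test program above, the equivalent would be passing something like sourceFor(bytedata, "EUC-KR") to builder.build. Whether "EUC-KR" is the right Java charset name for this particular file is an assumption that would need checking against the data.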