[jdom-interest] B9-rc1: inputstreams, or readers: Invalid encoding name "KSC5601"
Rolf Lear
rlear at algorithmics.com
Mon Apr 21 06:40:23 PDT 2003
It is amazing how complicated things become when you say exactly the
opposite of what you mean.
It seems that I have confused things horribly.
Using a file INPUTSTREAM it FAILS,
Using a file READER, it PASSES.
Rolf
-----Original Message-----
From: Jason Hunter [mailto:jhunter at acm.org]
Sent: Friday, April 18, 2003 2:09 PM
To: Alex Rosen
Cc: rlear at algorithmics.com; jdom-interest at jdom.org
Subject: Re: [jdom-interest] B9-rc1: inputstreams, or readers: Invalid encoding name "KSC5601"
/**
 * <p>
 * This builds a document from the supplied
 * Reader. It's the programmer's responsibility to make sure
 * the reader matches the encoding of the file. It's always safer
 * to use an InputStream rather than a Reader, if it's available.
 * </p>
 *
 * @param characterStream <code>Reader</code> to read from.
 * @return <code>Document</code> - resultant Document object.
 * @throws JDOMException when errors occur in parsing.
 * @throws IOException when an I/O error prevents a document
 *         from being fully parsed.
 */
public Document build(Reader characterStream)
        throws JDOMException, IOException {
    return build(new InputSource(characterStream));
}
-jh-
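
To illustrate why the Reader has to match the encoding of the file, here is a minimal sketch using only the JDK. The class name ReaderCharsetDemo and the method decode are hypothetical names made up for this example; they are not part of JDOM. It decodes the same bytes through a Reader twice, once with the charset the bytes were written in and once with a different one.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration; not part of JDOM.
final class ReaderCharsetDemo {

    // Decodes the bytes through a Reader that uses the given charset.
    static String decode(byte[] bytes, Charset charset) {
        StringBuilder out = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            char[] buffer = new char[256];
            int read;
            while ((read = reader.read(buffer)) >= 0) {
                out.append(buffer, 0, read);
            }
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked to keep the sketch short.
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "\u00e9" is e-acute; UTF-8 stores it as two bytes.
        String original = "<doc>\u00e9</doc>";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        // Matching charset: the text survives the round trip.
        System.out.println(original.equals(decode(utf8, StandardCharsets.UTF_8)));
        // Mismatched charset: each of the two bytes becomes its own character.
        System.out.println(original.equals(decode(utf8, StandardCharsets.ISO_8859_1)));
    }
}
```

A parser that is handed the second Reader only ever sees the already-garbled characters, which is why an InputStream is the safer choice when one is available.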
Alex Rosen wrote:
>
> When you use an InputStream, the parser can read the encoding name from
> the XML file and set up its own Reader with the right encoding.
>
> When you use a Reader, it's your responsibility to set it up. Which
> would mean in this case that you'd need to read the encoding name out of
> the file yourself, instead of letting the parser do it for you. So, if
> at all possible, you should use an InputStream not a Reader. (I could've
> sworn that the JavaDoc mentioned this but I don't see it.)
>
> Alex
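
As a rough sketch of what "reading the encoding name out of the file yourself" could look like, the hypothetical helper below (DeclaredEncoding.sniff, a name invented for this example) pulls the encoding name out of the XML declaration. It assumes the file has no byte order mark and uses an ASCII-compatible encoding, so the declaration can be read byte-for-byte as ISO-8859-1. A real parser's detection is more thorough than this.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of JDOM or Xerces.
final class DeclaredEncoding {

    // Matches encoding="..." or encoding='...' inside a leading XML declaration.
    private static final Pattern ENCODING = Pattern.compile(
            "^<\\?xml[^>]*?\\sencoding\\s*=\\s*([\"'])([A-Za-z][A-Za-z0-9._-]*)\\1");

    // Returns the declared encoding name, or null if none is declared.
    // Assumption: no BOM and an ASCII-compatible encoding.
    static String sniff(byte[] head) {
        // The declaration sits at the very start, so a short prefix is enough.
        int length = Math.min(head.length, 200);
        String prefix = new String(head, 0, length, StandardCharsets.ISO_8859_1);
        Matcher m = ENCODING.matcher(prefix);
        return m.find() ? m.group(2) : null;
    }
}
```

The returned name could then be passed to `new InputStreamReader(stream, name)`, which throws UnsupportedEncodingException if Java does not know the name. Handing the parser the InputStream directly avoids all of this.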
>
> >>> Rolf Lear <rlear at algorithmics.com> 4/17/2003 9:32:33 AM >>>
> My point is that the data passes XML SAXBuilder IF it is processed as
> an Input Stream, but fails as a Reader.
>
> The encoding is processed "just fine" when the data is processed as a
> Reader InputSource, but fails as an InputStream.
>
> As I say, I am unsure of where this is a bug, or even IF this is a bug,
> but it certainly is suspicious.
>
> Attached is the Zipped XMLDocument which fails "well-formedness" as a
> ByteStream, but passes as a Reader.
>
> Here is my test code:
>
> ==============================
> import java.io.FileInputStream;
> import java.io.FileReader;
>
> import org.jdom.input.SAXBuilder;
>
> public class MainParse {
>
>     public static void main(String[] args) {
>         try {
>             new SAXBuilder().build(new FileInputStream(args[0]));
>             System.out.println("PASSED: Processed file as an input stream.");
>         } catch (Exception e) {
>             System.out.println("FAILED: Processed file as an input stream.");
>             e.printStackTrace();
>         }
>         try {
>             new SAXBuilder().build(new FileReader(args[0]));
>             System.out.println("PASSED: Processed file as a Reader.");
>         } catch (Exception e) {
>             System.out.println("FAILED: Processed file as a Reader.");
>             e.printStackTrace();
>         }
>     }
> }
> ==================================
>
> and this is my output from the command:
> java -cp .:/lib/jaxen-jdom.jar:./lib/jdom.jar:./lib/xerces.jar MainParse mydoc_raw.xml
>
> FAILED: Processed file as an input stream.
> org.jdom.input.JDOMParseException: Error on line 1: Invalid encoding name "KSC5601".
>         at org.jdom.input.SAXBuilder.build(SAXBuilder.java:381)
>         at org.jdom.input.SAXBuilder.build(SAXBuilder.java:684)
>         at MainParse.main(MainParse.java:23)
> Caused by: org.xml.sax.SAXParseException: Invalid encoding name "KSC5601".
>         at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>         at org.jdom.input.SAXBuilder.build(SAXBuilder.java:370)
>         ... 2 more
> Caused by: org.xml.sax.SAXParseException: Invalid encoding name "KSC5601".
>         at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>         at org.jdom.input.SAXBuilder.build(SAXBuilder.java:370)
>         at org.jdom.input.SAXBuilder.build(SAXBuilder.java:684)
>         at MainParse.main(MainParse.java:23)
> Caused by: org.xml.sax.SAXParseException: Invalid encoding name "KSC5601".
>         at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>         at org.jdom.input.SAXBuilder.build(SAXBuilder.java:370)
>         at org.jdom.input.SAXBuilder.build(SAXBuilder.java:684)
>         at MainParse.main(MainParse.java:23)
> PASSED: Processed file as a Reader.
>
> Rolf
>
> -----Original Message-----
> From: Jason Hunter [mailto:jhunter at acm.org]
> Sent: Wednesday, April 16, 2003 6:48 PM
> To: Rolf Lear
> Cc: Jdom-Interest (E-mail)
> Subject: Re: [jdom-interest] B9-rc1: inputstreams, or readers: Invalid
> encoding name "KSC5601"
>
> It may be that the encoding name isn't known to XML but may be known
> to Java. There's a Xerces feature to tell it to respect Java names for
> encodings. Try that.
>
> -jh-
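
A minimal sketch of switching that feature on, using the JAXP parser bundled with the JDK rather than JDOM so that it runs without extra jars. The feature URI is the Xerces "allow-java-encodings" feature. The class and method names (JavaEncodingNames, tryAllowJavaEncodings, rootName) are hypothetical, and whether a given parser recognizes the feature is an assumption, so the code reports rejection instead of failing. With JDOM, the same URI would presumably be passed to SAXBuilder.setFeature, assuming the JDOM version in use provides that method.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration; not part of JDOM.
final class JavaEncodingNames {

    // Xerces feature URI; parsers not derived from Xerces may not recognize it.
    static final String ALLOW_JAVA_ENCODINGS =
            "http://apache.org/xml/features/allow-java-encodings";

    // Tries to enable the feature; returns false if the parser rejects it.
    static boolean tryAllowJavaEncodings(SAXParserFactory factory) {
        try {
            factory.setFeature(ALLOW_JAVA_ENCODINGS, true);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    // Parses the bytes as an InputStream, so the parser picks the encoding
    // itself, and returns the name of the root element.
    static String rootName(SAXParserFactory factory, byte[] xml) {
        final String[] root = new String[1];
        try {
            SAXParser parser = factory.newSAXParser();
            parser.parse(new ByteArrayInputStream(xml), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    if (root[0] == null) {
                        root[0] = qName;
                    }
                }
            });
        } catch (Exception e) {
            // Rethrown unchecked to keep the sketch short.
            throw new IllegalStateException("parse failed", e);
        }
        return root[0];
    }
}
```

Whether a document declared as "KSC5601" then parses still depends on the Java runtime knowing that name as a charset alias.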
>
> > Rolf Lear wrote:
> >
> > I have been trying to find/fix performance issues in JDOM, and was
> > playing around with the Verifier.
> >
> > To test the effect of changes to the Verifier, I first load an XML
> > Document into memory, then parse it using SAXBuilder.build.
> >
> > To test weird XML, I found this:
> >
> > http://ropas.kaist.ac.kr/viewcvs/viewcvs.cgi/*checkout*/n/nXml/testdata/document/mydoc_raw.xml?rev=HEAD&content-type=text/xml
> >
> > which is partially Korean.
> >
> > First, remove the Doctype declaration in the document.
> >
> > My program does the following (See the code at the end).
> >
> > It loads the file up as an array of bytes.
> > It loads the file up as an array of chars.
> >
> > It parses each through SAXBuilder.build using an InputStream on the
> > bytes, and a Reader on the chars:
> >     InputSource source = new InputSource(new ByteArrayInputStream(bytedata));
> > and
> >     InputSource source = new InputSource(new CharArrayReader(chardata));
> >
> > Now, parsing the Reader passes, and the InputStream fails with:
> > Invalid encoding name "KSC5601" (in Xerces).
> >
> > org.jdom.input.JDOMParseException: Error on line 1: Invalid encoding name "KSC5601".
> >         at org.jdom.input.SAXBuilder.build(SAXBuilder.java:381)
> >         at MainTest.main(MainTest.java:77)
> > Caused by: org.xml.sax.SAXParseException: Invalid encoding name "KSC5601".
> >         at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
> >         at org.jdom.input.SAXBuilder.build(SAXBuilder.java:370)
> >         ... 1 more
> > Caused by: org.xml.sax.SAXParseException: Invalid encoding name "KSC5601".
> >         at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
> >         at org.jdom.input.SAXBuilder.build(SAXBuilder.java:370)
> >         at MainTest.main(MainTest.java:77)
> > Caused by: org.xml.sax.SAXParseException: Invalid encoding name "KSC5601".
> >         at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
> >         at org.jdom.input.SAXBuilder.build(SAXBuilder.java:370)
> >         at MainTest.main(MainTest.java:77)
> >
> > Now I am the first to admit that my Unicode/charset knowledge is
> > really flaky, so any suggestions as to whether this is a bug in my
> > code, JDOM, or Xerces are welcome.
> >
> > Rolf
> >
> > ======================================================
> > /*package default.*/
> > import java.io.ByteArrayInputStream;
> > import java.io.CharArrayReader;
> > import java.io.File;
> > import java.io.FileInputStream;
> > import java.io.FileReader;
> > import java.io.IOException;
> >
> > import org.jdom.JDOMException;
> > import org.jdom.input.SAXBuilder;
> > import org.xml.sax.InputSource;
> >
> > public class MainTest {
> >
> >     // Reads the whole file into a byte array, untouched by any decoding.
> >     private static byte[] loadedFileBytes(String filename) throws IOException {
> >         File file = new File(filename);
> >         byte[] buffer = new byte[(int)file.length()];
> >         FileInputStream fis = new FileInputStream(file);
> >         try {
> >             int got = 0;
> >             int size = buffer.length;
> >             while (got < size) {
> >                 int read = fis.read(buffer, got, size - got);
> >                 if (read >= 0) {
> >                     got += read;
> >                 } else {
> >                     throw new IOException("do not expect end of file before " + size + " bytes, but got it at " + got + " bytes.");
> >                 }
> >             }
> >             if (fis.read() != -1) {
> >                 throw new IOException("Thought we read to end of file, but there is still more.....");
> >             }
> >             return buffer;
> >         } finally {
> >             fis.close();
> >         }
> >     }
> >
> >     // Reads the whole file into a char array. FileReader decodes with the
> >     // platform default charset, not the encoding declared in the XML.
> >     private static char[] loadedFileChars(String filename) throws IOException {
> >         File file = new File(filename);
> >         FileReader fr = new FileReader(file);
> >         try {
> >             StringBuffer sb = new StringBuffer();
> >             int read = 0;
> >             char[] buffer = new char[1024*4];
> >             while ((read = fr.read(buffer)) >= 0) {
> >                 sb.append(buffer, 0, read);
> >             }
> >             return sb.toString().toCharArray();
> >         } finally {
> >             fr.close();
> >         }
> >     }
> >
> >     public static void main(String[] args) throws ClassNotFoundException, IOException {
> >         long start = System.currentTimeMillis();
> >         Class.forName("org.jdom.Verifier").getDeclaredMethods();
> >         long load = System.currentTimeMillis() - start;
> >         System.out.println("Loaded Verifier Class: " + load + "ms.");
> >         // args[0] is the iteration count; the remaining args are file names.
> >         int iterations = Integer.parseInt(args[0]);
> >         SAXBuilder builder = new SAXBuilder(false);
> >         for (int i = 1; i < args.length; i++) {
> >             start = System.currentTimeMillis();
> >             byte[] bytedata = loadedFileBytes(args[i]);
> >             char[] chardata = loadedFileChars(args[i]);
> >             load = System.currentTimeMillis() - start;
> >             System.out.println("Loaded Data in File '" + args[i] + "' in " + load + "ms. "
> >                     + (bytedata.length / 1024) + "KB. "
> >                     + (chardata.length / 1024) + " KChars About to SAXBuild");
> >
> >             // Parse from the raw bytes: the parser reads the encoding declaration.
> >             try {
> >                 for (int j = 0; j < iterations; j++) {
> >                     InputSource source = new InputSource(new ByteArrayInputStream(bytedata));
> >                     start = System.currentTimeMillis();
> >                     builder.build(source);
> >                     load = System.currentTimeMillis() - start;
> >                     System.out.println("SAXBuilder built document '" + args[i] + "' (BYTES) iteration " + j + " in " + load + "ms.");
> >                 }
> >             } catch (JDOMException e) {
> >                 e.printStackTrace();
> >             } catch (IOException ioe) {
> >                 ioe.printStackTrace();
> >             }
> >             // Parse from the already-decoded chars.
> >             try {
> >                 for (int j = 0; j < iterations; j++) {
> >                     InputSource source = new InputSource(new CharArrayReader(chardata));
> >                     start = System.currentTimeMillis();
> >                     builder.build(source);
> >                     load = System.currentTimeMillis() - start;
> >                     System.out.println("SAXBuilder built document '" + args[i] + "' (CHARS) iteration " + j + " in " + load + "ms.");
> >                 }
> >             } catch (JDOMException e) {
> >                 e.printStackTrace();
> >             } catch (IOException ioe) {
> >                 ioe.printStackTrace();
> >             }
> >         }
> >     }
> > }
> >
> > ======================================================