[jdom-interest] SAXBuilder: How to handle non-UTF-8 characters? (JDOMParseException)
Matthias Klein
matthias at cmklein.de
Wed Nov 10 20:01:35 PST 2004
I have several files containing the results of an ItemSearchRequest at
Amazon.com.
Most files are 100-200kB XML files, which are, according to the XML
declaration, UTF-8 encoded.
I read the file from Amazon's REST interface (the target is the URL that returns the XML file):
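// InputStreamReader without a charset argument decodes the bytes
// with the platform's default encoding, not necessarily UTF-8
Reader reader = new InputStreamReader(url.openStream());
BufferedReader bufferedreader = new BufferedReader(reader);
StringBuffer sb = new StringBuffer();
int c;
// read char by char until end of stream or a NUL character
while (((c = bufferedreader.read()) != -1) && (c != 0)) {
    sb.append((char) c);
}
bufferedreader.close();
String result = sb.toString();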
The string "result" will then be written into a RandomAccessFile.
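A likely culprit is the decode/re-encode round trip: InputStreamReader without a charset turns the bytes into chars using the platform default encoding, and RandomAccessFile has no method that writes a String in a chosen charset. A minimal sketch of one way around both, assuming the response body is already the UTF-8 document you want on disk, is to copy the bytes straight to the file without ever going through a Reader. The class RawSaver and its methods are hypothetical names; isWellFormed uses the JDK's built-in JAXP parser only as a quick sanity check, and the sketch needs Java 7 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import javax.xml.parsers.DocumentBuilderFactory;

class RawSaver {
    // Copies the response body to disk byte for byte. No Reader is involved,
    // so nothing is decoded with the platform default or re-encoded.
    static void saveRaw(URL url, Path target) throws IOException {
        try (InputStream in = url.openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    // Sanity check with the JDK's built-in parser. Given a File, the parser
    // reads raw bytes and determines the encoding from the document itself.
    static boolean isWellFormed(Path file) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(file.toFile());
            return true;
        } catch (Exception e) {
            return false;
        }
    }
}
```

With the file saved this way, a builder.build(file) call reads the original bytes and can honour the encoding in the XML declaration. JDOM's SAXBuilder also has a build(URL) overload, so the intermediate file may not be needed at all.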
Yet when I try to build a JDOM Document from the file using
Document doc = builder.build(file);
I keep getting a JDOMParseException for some of the files. The reason: the file apparently contains characters that are not valid UTF-8.
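One plausible reason the error hits only some files: the damage depends on which characters appear. The sketch below assumes the platform default was windows-1252 (common on Western European Windows installations) and that the string was written with RandomAccessFile.writeBytes, which discards the high eight bits of each char. The post does not say which write method was used, so treat this as one scenario, not a diagnosis. Under those assumptions, an e-acute (UTF-8 bytes C3 A9) happens to survive because it decodes to U+00C3 and U+00A9, whose low bytes are the original bytes. A typographic apostrophe U+2019 (UTF-8 bytes E2 80 99) decodes to U+00E2, U+20AC and U+2122 and is written out as E2 AC 22, which is not valid UTF-8. MangleDemo and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MangleDemo {
    // Simulates the pipeline under two assumptions: the bytes are decoded with
    // the given default charset (as InputStreamReader does when no charset is
    // passed), and each char is then written with RandomAccessFile.writeBytes,
    // which keeps only the low eight bits.
    static byte[] mangle(byte[] utf8, Charset assumedDefault) {
        String decoded = new String(utf8, assumedDefault);
        byte[] out = new byte[decoded.length()];
        for (int i = 0; i < decoded.length(); i++) {
            out[i] = (byte) decoded.charAt(i); // high eight bits discarded
        }
        return out;
    }

    // Strict check: newDecoder() reports malformed input instead of replacing it.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        // U+00E9 (e acute) and U+2019 (right single quotation mark)
        for (String text : new String[] {"Caf\u00e9", "It\u2019s"}) {
            byte[] onDisk = mangle(text.getBytes(StandardCharsets.UTF_8), cp1252);
            System.out.println(text + " still valid UTF-8: " + isValidUtf8(onDisk));
        }
    }
}
```

Running main should report true for the first string and false for the second, which would match files that parse fine until a product description happens to contain a curly quote or similar character.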
Question: How can I get the SAXBuilder to ignore those characters? And does anybody know why those characters show up at all, and only every once in a while?
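As for ignoring the bad bytes: if the files cannot be regenerated, one option is to do the decoding yourself with a lenient UTF-8 decoder and hand the resulting Reader to the builder, since JDOM's SAXBuilder has a build(Reader) overload. The decoder in this sketch substitutes U+FFFD for malformed sequences instead of throwing, so the parser never sees invalid input. This hides the damage rather than repairing it: the affected characters are lost. LenientUtf8 and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class LenientUtf8 {
    // Returns a Reader that decodes UTF-8 but replaces malformed byte
    // sequences with U+FFFD instead of reporting an error.
    static Reader open(InputStream in) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        return new InputStreamReader(in, decoder);
    }

    // Reads everything from the Reader into a String (used for the demo).
    static String readAll(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = r.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // "It", then E2 AC 22 (a three-byte lead followed by only one valid
        // continuation byte), then "s"
        byte[] bad = {0x49, 0x74, (byte) 0xE2, (byte) 0xAC, 0x22, 0x73};
        String text = readAll(open(new ByteArrayInputStream(bad)));
        System.out.println("contains U+FFFD: " + (text.indexOf('\uFFFD') >= 0));
    }
}
```

With JDOM this would be used as, for example, Document doc = builder.build(LenientUtf8.open(new FileInputStream(file))); the encoding declaration in the file is then irrelevant, because the builder receives characters rather than bytes.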
Below is the first part of the exception I mentioned.
Thanks
Matt
org.jdom.input.JDOMParseException: Error on line 1 of document file:/d:/JavaCode/result.xml: Character conversion error: "Malformed UTF-8 char -- is an XML encoding declaration missing?" (line number may be too low)
at org.jdom.input.SAXBuilder.build(SAXBuilder.java:465)
at org.jdom.input.SAXBuilder.build(SAXBuilder.java:810)
at org.jdom.input.SAXBuilder.build(SAXBuilder.java:789)
at AmazonConnector.doItemSearch(AmazonConnector.java:98)
at MidTermProjectMain.main(MidTermProjectMain.java:41)
Caused by: org.xml.sax.SAXParseException: Character conversion error: "Malformed UTF-8 char -- is an XML encoding declaration missing?" (line number may be too low)
at org.apache.crimson.parser.InputEntity.fatal(InputEntity.java:1100)
at org.apache.crimson.parser.InputEntity.fillbuf(InputEntity.java:1072)
at org.apache.crimson.parser.InputEntity.isEOF(InputEntity.java:262)
at org.apache.crimson.parser.InputEntity.parsedContent(InputEntity.java:472)
at org.apache.crimson.parser.Parser2.content(Parser2.java:1871)
at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1552)
at org.apache.crimson.parser.Parser2.content(Parser2.java:1824)
at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1552)
at org.apache.crimson.parser.Parser2.content(Parser2.java:1824)
at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1552)
at org.apache.crimson.parser.Parser2.content(Parser2.java:1824)
at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1552)
at org.apache.crimson.parser.Parser2.content(Parser2.java:1824)
at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1552)
at org.apache.crimson.parser.Parser2.content(Parser2.java:1824)
at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1552)
at org.apache.crimson.parser.Parser2.parseInternal(Parser2.java:534)
at org.apache.crimson.parser.Parser2.parse(Parser2.java:318)
at org.apache.crimson.parser.XMLReaderImpl.parse(XMLReaderImpl.java:442)
at org.jdom.input.SAXBuilder.build(SAXBuilder.java:453)
... 4 more