[jdom-interest] SAXBuilder: How to handle non UTF-8 characters? (JDOMParseException)

Jason Hunter jhunter at xquery.com
Thu Nov 11 00:05:44 PST 2004


You're reading from url.openStream() through an InputStreamReader 
without passing the ISR a charset, so it decodes the bytes with your 
system's default charset instead of UTF-8.  That's corrupting the data.  
That alone isn't enough to cause the error, but you're probably doing 
something similar when you write the string out to the file.
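
A minimal sketch of the point above, assuming the response really is 
UTF-8 as its XML declaration says.  The class and method names 
(CharsetRoundTrip, readAll, encode) are made up for illustration, and 
ISO-8859-1 is only a stand-in for whatever the platform default happens 
to be.  It passes the charset explicitly on both the reading and the 
writing side, then shows what a mismatch on the writing side does to 
the bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CharsetRoundTrip {

    // Decode the whole stream with an explicitly named charset
    // instead of relying on the platform default.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Encode the text with an explicitly named charset.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Stand-in for the bytes url.openStream() would deliver.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><t>M\u00fcller</t>";
        byte[] wire = xml.getBytes(StandardCharsets.UTF_8);

        // Same charset on both sides: the bytes survive unchanged.
        String text = readAll(new ByteArrayInputStream(wire), StandardCharsets.UTF_8);
        byte[] sameCharset = encode(text, StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(wire, sameCharset));   // true

        // Different charset when writing: the u-umlaut becomes the single
        // byte 0xFC, which is not a valid UTF-8 sequence, although the
        // XML declaration still claims UTF-8.
        byte[] otherCharset = encode(text, StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.equals(wire, otherCharset));  // false
    }
}
```

If the text never needs to be inspected as a String, copying the raw 
bytes from the stream to the file avoids the decode/encode step 
altogether.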

-jh-

Matthias Klein wrote:

> I have several files containing the results of an ItemSearchRequest at
> Amazon.com. 
> Most files are 100-200kB XML files, which are, according to the XML
> declaration, UTF-8 encoded.
> 
> I read the file from Amazon's REST interface (target is the url which
> responds with the XML file):
> 
>     Reader reader = new InputStreamReader(url.openStream());
>     BufferedReader bufferedreader = new BufferedReader(reader);
>     StringBuffer sb = new StringBuffer();
>     while (((c = bufferedreader.read()) != -1) && (c != 0)) {
>         sb.append((char)c);
>     }
>     result = sb.toString();
> 
> The string "result" will then be written into a RandomAccessFile.
> 
> Yet when I try to build a JDOM Document from the file using
> 
>    Document doc = builder.build(file);
> 
> I keep getting a JDOMParseException for some of the files. Reason: The file
> apparently contains non UTF-8 characters.
> 
> Question: How can I get the SAXBuilder to ignore those characters? Does
> anybody know the reason why those characters even appear every once in a
> while?
> 
> Below is the first part of the exception I mentioned.
> 
> Thanks
> 
> Matt
> 
> 
> org.jdom.input.JDOMParseException: Error on line 1 of document
> file:/d:/JavaCode/result.xml: Character conversion error: "Malformed UTF-8
> char -- is an XML encoding declaration missing?" (line number may be too
> low)
>         at org.jdom.input.SAXBuilder.build(SAXBuilder.java:465)
>         at org.jdom.input.SAXBuilder.build(SAXBuilder.java:810)
>         at org.jdom.input.SAXBuilder.build(SAXBuilder.java:789)
>         at AmazonConnector.doItemSearch(AmazonConnector.java:98)
>         at MidTermProjectMain.main(MidTermProjectMain.java:41)
> Caused by: org.xml.sax.SAXParseException: Character conversion error:
> "Malformed UTF-8 char -- is an XML encoding declaration missing?"
> (line number may be too low)
>         at org.apache.crimson.parser.InputEntity.fatal(InputEntity.java:1100)
>         at org.apache.crimson.parser.InputEntity.fillbuf(InputEntity.java:1072)
>         at org.apache.crimson.parser.InputEntity.isEOF(InputEntity.java:262)
>         at org.apache.crimson.parser.InputEntity.parsedContent(InputEntity.java:472)
>         at org.apache.crimson.parser.Parser2.content(Parser2.java:1871)
>         at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1552)
>         at org.apache.crimson.parser.Parser2.content(Parser2.java:1824)
>         at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1552)
>         at org.apache.crimson.parser.Parser2.content(Parser2.java:1824)
>         at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1552)
>         at org.apache.crimson.parser.Parser2.content(Parser2.java:1824)
>         at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1552)
>         at org.apache.crimson.parser.Parser2.content(Parser2.java:1824)
>         at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1552)
>         at org.apache.crimson.parser.Parser2.content(Parser2.java:1824)
>         at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1552)
>         at org.apache.crimson.parser.Parser2.parseInternal(Parser2.java:534)
>         at org.apache.crimson.parser.Parser2.parse(Parser2.java:318)
>         at org.apache.crimson.parser.XMLReaderImpl.parse(XMLReaderImpl.java:442)
>         at org.jdom.input.SAXBuilder.build(SAXBuilder.java:453)
>         ... 4 more