[jdom-interest] non-ascii characters in xml document

John L. Webber - Jentro AG John.Webber at jentro.com
Thu Nov 29 23:32:46 PST 2001


Dave,

This solution is fairly inelegant and may seem like overkill, but it
works well (as long as we're talking about attribute values or text
content): try Base64-encoding the "suspect" strings before inserting
them, and simply decode them when you need to use the text. The encoded
form is plain ASCII, so the parser never sees an unconvertible byte. We
use that method frequently for handling things like encrypted passwords
in files, and I've even sent rather large (7000+ lines) files completely
Base64-encoded. The performance loss is small, as long as the operations
are not too frequent.
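
A minimal sketch of the encode/decode round trip, assuming Java 8 or
later for java.util.Base64 (older JVMs would need a third-party codec).
The class and method names are made up for illustration and are not part
of JDOM; the encoded string would be what you hand to something like
Element.setText(), and the decode step would be applied to what
Element.getText() returns.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: the class and method names are illustrative only.
final class Base64Text {

    // Encode text so that only ASCII letters, digits, '+', '/' and '='
    // end up in the XML file.
    static String encode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(utf8);
    }

    // Reverse the encoding after reading the value back from the document.
    static String decode(String base64) {
        byte[] utf8 = Base64.getDecoder().decode(base64);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Curly single and double quotes, written as Unicode escapes.
        String original = "\u2018curly\u2019 and \u201Ccurly\u201D quotes";
        String safe = encode(original);   // store this in the element or attribute
        String restored = decode(safe);   // recover the text when it is needed
        System.out.println(safe);
        System.out.println(original.equals(restored));
    }
}
```

The obvious cost is that the stored values are no longer human-readable
in the XML file, so this fits best where nobody edits the file by hand.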

Regards,

John


Dave Neuendorf wrote:
> 
> To look at a simpler test case, I commented out my code that saves xml in gzip format,
> and just used straight UTF-8 xml to and from a file. The "curly" single and double
> quote characters give me exceptions like this:
> 
>      [java] org.jdom.JDOMException: Error on line 1 of document
> file:/C:/Development/Projects/HierarchicalPIM/default.xml: Character
> conversion error: "Unconvertible UTF-8 character beginning with
> 0x92" (line number may be too low).
>      [java]     at org.jdom.input.SAXBuilder.build(SAXBuilder.java:296)
> 
> It sees the single and double quote chars as 0x92 and 0x93, respectively. Maybe these
> characters aren't Unicode. Could they be Windows-specific character codes, since the
> text is being pasted from a Windows application into a Java app?
> 
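
On the quoted question: 0x92 and 0x93 are the Windows-1252 codes for the
right single quote (U+2019) and the left double quote (U+201C), and a
lone 0x92 is not a valid start of a UTF-8 sequence, which would match
the parser error. A small sketch to check this, assuming the JVM ships
the "windows-1252" charset (standard JDKs do); the class and method
names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: the class and method names are illustrative only.
final class QuoteBytes {

    // Interpret raw bytes as Windows-1252 text.
    static String decodeCp1252(byte[] bytes) {
        return new String(bytes, Charset.forName("windows-1252"));
    }

    // Strict UTF-8 check: report malformed input instead of replacing it.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] pasted = { (byte) 0x92, (byte) 0x93 };
        // Print the code points that the two bytes map to in Windows-1252.
        for (char c : decodeCp1252(pasted).toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
        System.out.println(isValidUtf8(pasted));
    }
}
```

If that is what is happening, the bytes were written to the file without
being converted to UTF-8 first, so the file does not match the encoding
the parser assumes.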


-- 
---------------------------------------------------------
 Jentro AG
 John L. Webber, Software Development
---------------------------------------------------------
 Peter-Henlein-Strasse 28, 85540 Haar/Munich, Germany
 Tel. +49 89 462 385 0     mailto:John.Webber at jentro.com 
 Fax  +49 89 462 385 29    http://www.jentro.com
---------------------------------------------------------
 {we get anything internet-ready}
---------------------------------------------------------


