[jdom-interest] non-ascii characters in xml document
Ian Lea
ian.lea at blackwell.co.uk
Fri Nov 30 01:28:46 PST 2001
You might also like to look at the JavaWorld article
"Java Tip 117: Transfer binary data in an XML document"
at http://www.javaworld.com/javaworld/javatips/jw-javatip117.html
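Both this tip and the Base64 suggestion quoted below work the same way: before the data goes into the document, turn it into plain ASCII text that no parser can misread. The following is a minimal sketch of that round trip for a string. It assumes Java 8 or later for `java.util.Base64`, which did not exist when this thread was written. The class name `Base64Text` is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: wraps "suspect" text in Base64 before it goes into
// an XML attribute or text node, and unwraps it when it is read back.
public class Base64Text {

    // Encode the string's UTF-8 bytes; the result contains only ASCII
    // letters, digits, '+', '/' and '=' padding.
    static String encode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(utf8);
    }

    // Reverse of encode(): Base64 text back to the original string.
    static String decode(String encoded) {
        byte[] utf8 = Base64.getDecoder().decode(encoded);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Curly quotes: the characters that caused trouble in this thread.
        String original = "It\u2019s \u201Cquoted\u201D";
        String encoded = encode(original);
        System.out.println(encoded);
        System.out.println(decode(encoded).equals(original)); // prints true
    }
}
```

The trade-off is that the stored value is no longer human-readable. Base64 output is also about a third larger than the bytes it encodes.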
--
Ian.
ian.lea at blackwell.co.uk
"John L. Webber - Jentro AG" wrote:
>
> Dave,
>
> This solution is inelegant and may seem like overkill, but it
> works well (as long as we're talking about attribute values or
> text content): try Base64-encoding the "suspect" strings before
> inserting them, and simply decode them when you need to use the text. We
> use that method frequently for handling things like encrypted passwords
> in files, and I've even sent rather large (7000+ lines) files completely
> Base64-encoded. The performance loss is small, as long as the operations
> are not too frequent.
>
> Regards,
>
> John
>
> Dave Neuendorf wrote:
> >
> > To look at a simpler test case, I commented out my code that saves xml in gzip format,
> > and just used straight UTF-8 xml to and from a file. The "curly" single and double
> > quote characters give me exceptions like this:
> >
> > [java] org.jdom.JDOMException: Error on line 1 of document
> > file:/C:/Development/Projects/HierarchicalPIM/default.xml: Character
> > conversion error: "Unconvertible UTF-8 character beginning with
> > 0x92" (line number may be too low).
> > [java] at org.jdom.input.SAXBuilder.build(SAXBuilder.java:296)
> >
> > It sees the single and double quote chars as 0x92 and 0x93, respectively. Maybe these
> > characters aren't Unicode. Could they be Windows-specific character codes, since the
> > text is being pasted from a Windows application into a Java app?
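Dave's guess is very likely right. In the Windows-1252 code page (Cp1252), byte 0x92 is the right single quotation mark, U+2019, and 0x93 is the left double quotation mark, U+201C. In UTF-8, a byte in the range 0x80 to 0xBF can only continue a multi-byte sequence and can never start one. A parser that is told the file is UTF-8 therefore rejects these bytes. The sketch below decodes the same two bytes both ways and uses only the standard `java.nio.charset` API. The class name `Cp1252Check` is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows why bytes 0x92/0x93 break a UTF-8 parser
// and what they mean in the Windows-1252 code page.
public class Cp1252Check {

    // Strict UTF-8 decoding: report malformed input instead of
    // substituting U+FFFD, much as the XML parser refuses the file.
    static String decodeStrictUtf8(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    // Interpret the same bytes as Windows-1252.
    static String decodeCp1252(byte[] bytes) {
        return new String(bytes, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        byte[] raw = { (byte) 0x92, (byte) 0x93 };

        try {
            decodeStrictUtf8(raw);
            System.out.println("decoded as UTF-8 (unexpected)");
        } catch (CharacterCodingException e) {
            System.out.println("not valid UTF-8: " + e);
        }

        String text = decodeCp1252(raw);
        for (char c : text.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c); // prints U+2019, then U+201C
        }

        // Encoded correctly as UTF-8, each quote becomes three bytes.
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            System.out.printf("%02X ", b & 0xFF); // E2 80 99 E2 80 9C
        }
        System.out.println();
    }
}
```

The thread does not confirm what caused the problem here. One common way to end up with such bytes is to write the document through a `Writer` that uses the platform default encoding while the XML declaration says UTF-8. On Western-European Windows, that default encoding is Cp1252. Opening the output as `new OutputStreamWriter(out, "UTF-8")` keeps the bytes and the declaration in agreement.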