[jdom-interest] Why does setText("&nbsp;") not do what I want?

Edelson, Justin Justin.Edelson at mtvn.com
Tue Oct 24 08:18:14 PDT 2006


>I pass the characters with the Windows tool charmap.exe that 
>produces unicode. It works fine with some unicode characters 
>(for example Ç) but not with all. Alpha & # 945 ; or Beta 
>& # 946 ; are not working, these are transferred to & amp ; # 945 ;

I don't really understand how you're using charmap here, but if JDOM is outputting "&amp;#945;" (I assume the extra spaces were for clarity), then you must be inputting "&#945;". So I don't see how this worked for any characters.

To be clear, "&#945;" is not a Unicode character. It's a string containing the XML-escaped representation of a Unicode character. When the FAQ says "pass regular Unicode characters" it means characters that are not escaped. To put it in a form that email won't garble: pass "A", not "&#65;".

If you have static Strings, you may be able to use Unicode characters directly (i.e. not escaped) in your source code or resource bundles, but this requires that the source files are Unicode files. In practice, most source files are ASCII and use the Java Unicode escape sequence. In your code below, you could have something like this:
String ContentElement = "Hello \u03B1"; // 3B1 is hex for 945
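
To make the difference concrete, here is a minimal sketch using only the standard library. The class and constant names are made up for illustration and are not part of JDOM. It just shows that the Java escape yields a single character, while the XML-escaped form is an ordinary six-character ASCII string that starts with an ampersand, which is why JDOM escapes that ampersand on output.

```java
public class UnicodeVsEscaped {
    // A real Unicode character (Greek small alpha), written with a Java
    // Unicode escape so the source file can stay plain ASCII.
    static final String REAL = "\u03B1";

    // The XML-escaped representation: six ordinary ASCII characters.
    static final String ESCAPED = "&#945;";

    public static void main(String[] args) {
        // One char whose code point is 945 (hex 3B1).
        System.out.println(REAL.length() + " char, code point " + REAL.codePointAt(0));
        // Six chars, the first of which is '&'.
        System.out.println(ESCAPED.length() + " chars, first is '" + ESCAPED.charAt(0) + "'");
    }
}
```

Passing REAL to setText() gives you the alpha. Passing ESCAPED gives you the literal text of the escape sequence.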

>So the 2nd solution is to parse text data, in my case these 
>are strings.
>But how can I parse text data which is not in a XML format?
>
> ...
>
>In a Java servlet I get strings like ContentElement from a 
>html form which can contain unicode characters like '& # 945 
>;' Then I build an XML document with these strings using 
>method setText. The method transforms '& # 945 ;' to '& amp ; # 945 ;'

So, as I said above, "&#945;" isn't a Unicode character. For cases like this, where you're accepting end-user input that may contain XML-escaped characters, I'd suggest using the StringEscapeUtils class in Jakarta Commons Lang [http://jakarta.apache.org/commons/lang/api-release/org/apache/commons/lang/StringEscapeUtils.html]. Note that this class provides both unescapeXml() and unescapeHtml() methods. The unescapeHtml() method is especially useful if you're getting strings with "&copy;" or other HTML entities in them.
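
If pulling in Commons Lang isn't an option, the same idea can be sketched with the standard library alone. What follows is a rough illustration and not the StringEscapeUtils implementation: the class name NumericRefDecoder and its decode() method are hypothetical, and it only handles numeric character references (decimal and hex). Named entities such as "&copy;" are left untouched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumericRefDecoder {
    // Matches decimal (&#945;) and hexadecimal (&#x3B1;) numeric character references.
    private static final Pattern REF =
            Pattern.compile("&#(?:[xX]([0-9A-Fa-f]{1,6})|([0-9]{1,7}));");

    // Replaces each numeric character reference with the character it names.
    static String decode(String input) {
        Matcher m = REF.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the text between the previous match and this one unchanged.
            out.append(input, last, m.start());
            int codePoint = (m.group(1) != null)
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2), 10);
            if (Character.isValidCodePoint(codePoint)) {
                out.appendCodePoint(codePoint);
            } else {
                out.append(m.group()); // not a valid code point: keep the original text
            }
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // Unescape the form input first, then hand the result to setText().
        System.out.println(decode("Hello &#945;"));
    }
}
```

In the servlet you would unescape the form value before it reaches JDOM, for example setText(NumericRefDecoder.decode(ContentElement)), or with Commons Lang, setText(StringEscapeUtils.unescapeXml(ContentElement)).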

Hope this helps.

Justin

>-----Original Message-----
>From: jdom-interest-bounces at jdom.org 
>[mailto:jdom-interest-bounces at jdom.org] On Behalf Of John Monaco
>Sent: Tuesday, October 24, 2006 9:43 AM
>To: jdom-interest at jdom.org
>Subject: [jdom-interest] Why does setText("&nbsp;") not do what I want?
>
>Hi,
>
>I read "Why does setText("&nbsp;") not do what I want?" in FAQ 
>but I do not understand the 2 solutions.
>
>>The solution is to pass regular Unicode characters to the setText()
>>method or, if you have text data that you want to be 
>interpreted as XML, 
>>pass it through an XML parser before it goes into JDOM. This 
>is what the
>>SAXBuilder and DOMBuilder classes do.
>
>
>I pass the characters with the Windows tool charmap.exe that 
>produces unicode. It works fine with some unicode characters 
>(for example Ç) but not with all. Alpha & # 945 ; or Beta 
>& # 946 ; are not working, these are transferred to & amp ; # 945 ;
>
>
>So the 2nd solution is to parse text data, in my case these 
>are strings.
>But how can I parse text data which is not in a XML format?
>
>
>This is what I'm doing:
>
>Element root = new Element("Rootelement");   
>Document document = new Document(root);
>root.addContent(new Element("FirstElement").setText(ContentElement));
>
>
>In a Java servlet I get strings like ContentElement from a 
>html form which can contain unicode characters like '& # 945 
>;' Then I build an XML document with these strings using 
>method setText. The method transforms '& # 945 ;' to '& amp ; # 945 ;'
>
>
>Are there any other solutions?
>
>Thanks in advance,
>John


