[jdom-interest] XML escaping and unescaping

Jason Hunter jhunter at xquery.com
Fri Nov 19 16:58:24 PST 2004


When you call elt.getText() you get the decoded (semantic) form.  Think 
of JDOM as representing the XML infoset, and the &quot; or CDATA 
representation as just one way to encode the XML data when it's written 
as a stream of bytes.  If you call elt.setText("This \"is\" a test"), the 
outputter will write the equivalent of what you have below.
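To make the decoding concrete, here's a sketch that uses the JDK's built-in DOM parser (javax.xml.parsers) instead of JDOM. That substitution is an assumption so it runs without extra jars. JDOM's builders sit on a parser that expands entities the same way. DecodeDemo and fieldText are made-up names for illustration. The &quot; form and the CDATA form both come back as the same plain string.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo: the JDK's DOM parser stands in for JDOM (assumption: no extra jars).
public class DecodeDemo {
    // Parses the XML and returns the text of the first <field> element.
    static String fieldText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("field").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The entity-encoded and CDATA-encoded forms decode to the same string.
        System.out.println(fieldText(
                "<data><field>This &quot;is&quot; a test.</field></data>"));
        System.out.println(fieldText(
                "<data><field><![CDATA[This \"is\" a test.]]></field></data>"));
        // both lines print: This "is" a test.
    }
}
```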

In other words, it's not part of the standard class libs because it's 
almost never needed by normal programmers.  JDOM (via the parser) handles 
the input and JDOM (via XMLOutputter) handles the output.  You just deal 
with plain old strings and don't have to care which chars are special and 
which aren't.
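A sketch of that division of labour follows. It uses the JDK's DOM and javax.xml.transform in place of JDOM's builder and XMLOutputter. That substitution is an assumption so the example needs no extra jars. RoundTripDemo, toXml and fromXml are hypothetical names. The caller only ever handles the plain string. The serializer adds the escapes and the parser removes them.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo: JDK DOM + Transformer stand in for JDOM + XMLOutputter.
public class RoundTripDemo {
    // Stores a plain string as the text of <field>; the serializer does the escaping.
    static String toXml(String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element field = doc.createElement("field");
            field.setTextContent(text); // no manual escaping here
            doc.appendChild(field);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads the root element's text back; the parser does the unescaping.
    static String fromXml(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String original = "4+3<4+4 & \"quoted\"";
        String xml = toXml(original);
        System.out.println(xml);                           // escaped form, contains &lt; and &amp;
        System.out.println(fromXml(xml).equals(original)); // prints: true
    }
}
```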

-jh-

d.wall at computer.org wrote:

> Thanks.  I'll take a look at your escapers and compare.  It's a bit 
> amazing that such functionality isn't just part of the standard class 
> libraries by now.
> 
> As for coming back in, an XML parser won't decode a string for you, will 
> it?  I mean, if my XML looks like:
> 
> <data>
> <field>This &quot;is&quot; a test.</field>
> </data>
> 
> I would expect that getting the data->field text value would return:
>      This &quot;is&quot; a test.
> 
> Are you saying some XML parsers will return instead:
>      This "is" a test.
> 
> My impression is that such an encoded element would return the String 
> still encoded.
> 
> David
> 
> 
> Jason Hunter wrote:
> 
>> XMLOutputter has escapeElementEntities() and escapeAttributeEntities() 
>> that do what you want and have a pluggable EscapeStrategy to handle 
>> characters outside the selected output encoding.  We don't have code 
>> to do the reverse as we rely on XML parsers for that.
>>
>> -jh-
>>
>> d.wall at computer.org wrote:
>>
>>> Does JDOM come with any utility routines that will take a String and 
>>> make it XML safe?  And also a routine that takes an XML safe encoding 
>>> and converts it back to a regular String?
>>>
>>> i.e.
>>>
>>> String -> XML Safe string -> String
>>>
>>> "This" -> "This"  -> "This"  (no change needed)
>>> "4+3<4+4" -> "4+3&lt;4+4" -> "4+3<4+4"
>>>
>>> I only ask because I have some basic routines that do this, but they 
>>> only map the following:
>>>
>>>  >   &gt;
>>>  <   &lt;
>>>  &   &amp;
>>>  '   &apos;
>>>  "   &quot;
>>>
>>> They currently don't deal with escaped character codes like &#039;.  It 
>>> seems that putting data into XML and getting it back out is so common 
>>> that there must be a general routine for it, rather than having to 
>>> rely on my own implementation.
>>>
>>> Thanks,
>>> David
>>>
>>> _______________________________________________
>>> To control your jdom-interest membership:
>>> http://www.jdom.org/mailman/options/jdom-interest/youraddr@yourhost.com
>>>
>>
> 

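If you really do need a standalone routine outside a parser and outputter, as in the original question, here's a minimal hand-rolled sketch. The XmlText class and its escape/unescape methods are hypothetical and not part of JDOM. It covers the five characters from David's table. On the way back it also handles decimal and hex numeric references such as &#039;. It doesn't handle entities declared in a DTD, which is one more reason to let a real parser do the decoding.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of JDOM: a hand-rolled escaper/unescaper.
public class XmlText {
    // Escapes the five characters from the table in a single pass over the input,
    // so an '&' produced by one replacement is never escaped again.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '&':  sb.append("&amp;");  break;
                case '\'': sb.append("&apos;"); break;
                case '"':  sb.append("&quot;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // The five predefined entities plus decimal (&#039;) and hex (&#x27;) references.
    // Anything else (e.g. a DTD-declared entity like &nbsp;) is left untouched.
    private static final Pattern REF =
            Pattern.compile("&(lt|gt|amp|apos|quot|#[0-9]+|#x[0-9a-fA-F]+);");

    // Decodes references in one left-to-right pass; an out-of-range numeric
    // reference makes parseInt or Character.toChars throw.
    static String unescape(String s) {
        Matcher m = REF.matcher(s);
        StringBuffer sb = new StringBuffer(s.length());
        while (m.find()) {
            String ref = m.group(1);
            String text;
            switch (ref) {
                case "lt":   text = "<";  break;
                case "gt":   text = ">";  break;
                case "amp":  text = "&";  break;
                case "apos": text = "'";  break;
                case "quot": text = "\""; break;
                default:
                    int codePoint = ref.startsWith("#x")
                            ? Integer.parseInt(ref.substring(2), 16)
                            : Integer.parseInt(ref.substring(1));
                    text = new String(Character.toChars(codePoint));
            }
            m.appendReplacement(sb, Matcher.quoteReplacement(text));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("4+3<4+4"));      // prints: 4+3&lt;4+4
        System.out.println(unescape("4+3&lt;4+4")); // prints: 4+3<4+4
        System.out.println(unescape("it&#039;s"));  // prints: it's
    }
}
```

Because unescape makes a single left-to-right pass, "&amp;lt;" decodes to "&lt;" rather than "<", which matches what a parser does with the same text.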