[jdom-interest] Outputting Entity reference for non US-ASCII characters

Alex Rosen arosen at novell.com
Thu Oct 16 07:14:24 PDT 2003


If I remember correctly, JDOM now (in the current CVS tree, not in beta
9) automatically escapes any characters that the output encoding can't
represent. So if you just tell it to output in US-ASCII, it'll turn all
chars > 127 into numeric character references (e.g. &#xE9;), not named
entities.

(Note that there were some API changes after beta 9 so you'll have to
do a little work to use the latest code.)
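
To illustrate the idea of escaping whatever the output encoding can't
represent, here is a minimal sketch that uses only the standard Java
library. It is not JDOM's code: the class name EncodingEscaper and the
method escapeUnencodable are made up for this example, and it assumes
that asking java.nio's CharsetEncoder whether a character is encodable
is an acceptable test.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper, not part of JDOM.
public class EncodingEscaper {

    /**
     * Replaces every code point that the named charset cannot encode
     * with a hexadecimal numeric character reference (&#xNNN;).
     * Markup characters such as & and < are NOT escaped here.
     */
    public static String escapeUnencodable(String text, String charsetName) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            int len = Character.charCount(cp); // 2 for surrogate pairs
            if (encoder.canEncode(text.subSequence(i, i + len))) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#x")
                   .append(Integer.toHexString(cp).toUpperCase())
                   .append(';');
            }
            i += len;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // \u0107 is c-acute, \u00e9 is e-acute
        System.out.println(escapeUnencodable("Kopi\u0107 caf\u00e9", "US-ASCII"));
    }
}
```

With "UTF-8" as the charset name the same input comes back unchanged,
since every character is encodable.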

Alex

>>> Benjamin Kopic <benjamin.kopic at panContext.com> 10/16/2003 4:37:39 AM >>>
Hi

I need to write some sort of entity handling routine that converts all
of the non-US-ASCII characters to their SGML entity references. There
was some discussion on this subject way back, but I am not sure what
came out of it. All of the documents I need to produce have to comply
with the following restriction:
http://www.ncbi.nlm.nih.gov/entrez/query/static/entities.html

What would be the best way:

a) write an EntityRef for each one of these and then let JDOM's
XMLOutputter do the conversion (I assume it does it)

b) write my own String conversion utility that converts the chars
outside the 7-bit range (above 127) to their entity ref value.
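
Option (b) could be sketched roughly as below, using only the standard
Java library. The class name NamedEntityEscaper and the three table
entries are illustrative assumptions: the names are the usual ISO
Latin 1 ones and are not copied from the NCBI list, which would have
to be loaded in full for real use. Characters with no table entry fall
back to a decimal numeric character reference.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for option (b); not part of JDOM.
public class NamedEntityEscaper {

    // Illustrative subset only; a real table would hold the full list.
    private static final Map<Integer, String> NAMES = new HashMap<>();
    static {
        NAMES.put(0x00E9, "eacute"); // e with acute
        NAMES.put(0x00FC, "uuml");   // u with diaeresis
        NAMES.put(0x00F1, "ntilde"); // n with tilde
    }

    /** Converts every code point above 127 to an entity or numeric reference. */
    public static String toEntities(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (cp < 128) {
                out.append((char) cp);                      // plain US-ASCII
            } else if (NAMES.containsKey(cp)) {
                out.append('&').append(NAMES.get(cp)).append(';');
            } else {
                out.append("&#").append(cp).append(';');    // decimal fallback
            }
        }
        return out.toString();
    }
}
```

Note that an XML parser will only resolve named references such as
&eacute; if the document's DTD declares them; the numeric fallback
needs no declaration.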

Actually, what I really would like to know is whether JDOM would
convert a Unicode String to an XML String that is valid for a
particular encoding (e.g. US-ASCII) simply by registering an EntityRef
for each of the characters outside the range of the given encoding?

Best regards

Benjamin
-- 
benjamin kopic
m: +44 (0)780 154 7643
t: +44 (0)20 7794 3090
e: benjamin.kopic at panContext.com 
w: http://www.panContext.com/