[jdom-interest] Fwd: Simple xhtml/entity resolver?

Oliver Ruebenacker curoli at gmail.com
Thu Mar 29 07:51:47 PDT 2012


    Hello,

  (forwarding this to the list, as I accidentally sent it only to Rolf)

 I think there is a misunderstanding. I don't want to output as XML. I
want to render the XHTML as text like a very primitive browser would
display it.

 I'm building a String by traversing the tree with
Element.getContent(). For example, a © can be encoded in XML as
"&copy;". Presumably, the Element tree would contain an EntityRef with
name "copy". But what if an XML document contains "&#169;" or
"&#x00A9;"? What would the EntityRef object look like?
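
 A note on the numeric case, as a minimal sketch: references such as
"&#169;" and "&#xA9;" are character references, not entity references,
and a conforming XML parser expands them while parsing. A tree built
from parsed input should therefore hold the character as ordinary text,
with no EntityRef for it. The sketch below shows the expansion with the
JDK's built-in DOM parser rather than JDOM, so that it runs without
extra jars. The names CharRefDemo and textOf are made up for
illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of JDOM.
final class CharRefDemo {

    // Parses a small XML string and returns the text content of the
    // root element. Numeric character references in the input have
    // already been expanded by the parser at this point.
    static String textOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            // Wrap checked parser exceptions to keep the sketch short.
            throw new IllegalArgumentException("could not parse: " + xml, e);
        }
    }
}
```

 Calling CharRefDemo.textOf("<p>&#169; &#xA9;</p>") yields the
copyright sign twice, separated by a space, for the decimal and the
hexadecimal form alike.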

  Thanks!

     Take care
     Oliver

On Thu, Mar 29, 2012 at 9:46 AM, Rolf Lear <jdom at tuis.net> wrote:
>
> Hi Oliver.
>
> If you already have the XHTML content as JDOM Elements, then you should be
> able to (just) do:
>
> XMLOutputter xout = new XMLOutputter();
> String fragment = xout.outputString(element);
>
> If you want to change the format of the output (indenting, etc.), you can
> add a 'Format' to the XMLOutputter with:
>
> XMLOutputter xout = new XMLOutputter(Format.getPrettyFormat());
> String fragment = xout.outputString(element);
>
>
> I think you may be chasing a red herring with the Entity References.
>
> The EntityRef code is a 'CYA' implementation, but, in reality, the
> SystemID and PublicID are never going to be needed in regular usage.
>
> The only place I know of where you get entity references is if you
> tell your input parser not to expand entity references when parsing;
> in JDOM you will then end up with an EntityRef instead of its
> 'underlying' text.
>
> Rolf
>
>
> On Thu, 29 Mar 2012 09:23:36 -0400, Oliver Ruebenacker <curoli at gmail.com>
> wrote:
>> Hello,
>>
>>   I need a simple way to convert some XHTML fragments, provided as a
>> JDOM Element, into plain text. I am willing to ignore most HTML tags
>> and consider only the most commonly used predefined entities.
>>
>>   In JDOM, an entity reference has a name, a public id and a system
>> id. I think I know what the name means, for named entities. But what
>> about numeric entities, how do I get the code point? And what are
>> public id and system id?
>>
>>   Thanks!
>>
>>      Take care
>>      Oliver
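
 For the named case in the original question, one possible approach is
a small lookup table from entity name to replacement text. This is only
a sketch: it assumes the tree actually contains EntityRef nodes (for
example because expansion was turned off with
SAXBuilder.setExpandEntities(false)) and that the name passed in comes
from EntityRef.getName(). EntityText and resolve are hypothetical
names, and the table covers a handful of entities, not the full XHTML
set.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class, not part of JDOM.
final class EntityText {

    // A few commonly used entity names and their replacement text.
    private static final Map<String, String> NAMED = new HashMap<>();
    static {
        NAMED.put("amp", "&");
        NAMED.put("lt", "<");
        NAMED.put("gt", ">");
        NAMED.put("quot", "\"");
        NAMED.put("apos", "'");
        NAMED.put("nbsp", "\u00A0");
        NAMED.put("copy", "\u00A9");
        NAMED.put("reg", "\u00AE");
    }

    // Returns the replacement text for a known entity name. Unknown
    // names are written back in reference form so nothing is silently
    // dropped from the plain-text output.
    static String resolve(String name) {
        String text = NAMED.get(name);
        return text != null ? text : "&" + name + ";";
    }
}
```

 While walking Element.getContent(), each EntityRef encountered could
then be appended as EntityText.resolve(ref.getName()), with Text nodes
appended unchanged.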



--
Oliver Ruebenacker, Computational Cell Biologist
Virtual Cell (http://vcell.org)
SBPAX: Turning Bio Knowledge into Math Models (http://www.sbpax.org)
http://www.oliver.curiousworld.org