[jdom-interest] Simple xhtml/entity resolver?
Olivier Jaquemet
olivier.jaquemet at jalios.com
Thu Mar 29 08:47:33 PDT 2012
Hi Oliver,
JDOM is a great tool for parsing XML, but for XHTML fragments (which may
not be fully XHTML compliant) and especially for text extraction, I
would strongly suggest jsoup:
http://jsoup.org/
String text = org.jsoup.Jsoup.parse(html).text();
Whatever your HTML looks like, it will work like a charm (even if it is
an ugly WYSIWYG copy-paste from Word or an ugly HTML export from some
website).
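
On the numeric-entity part of the question quoted below: a numeric
character reference such as &#233; (decimal) or &#xE9; (hexadecimal)
carries its code point directly in its digits, so there is no name to
look up. jsoup decodes these for you. Purely as an illustration of what
that decoding involves, here is a minimal sketch using only the Java
standard library. The class name NumericEntityDecoder and its decode
method are hypothetical, not part of JDOM or jsoup, and the sketch
handles numeric references only (named entities such as &amp; are left
untouched).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of JDOM or jsoup.
final class NumericEntityDecoder {

    // Group 1: hex digits of &#x...; references. Group 2: decimal digits of &#...; references.
    private static final Pattern NUMERIC_REF =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));");

    // Replaces each numeric character reference with the character it denotes.
    static String decode(String text) {
        Matcher m = NUMERIC_REF.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement;
            try {
                int codePoint = (m.group(1) != null)
                        ? Integer.parseInt(m.group(1), 16)
                        : Integer.parseInt(m.group(2), 10);
                // Leave the reference as-is if it is outside the Unicode range.
                replacement = Character.isValidCodePoint(codePoint)
                        ? new String(Character.toChars(codePoint))
                        : m.group();
            } catch (NumberFormatException e) {
                // Too many digits to fit in an int: keep the original text.
                replacement = m.group();
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Calling NumericEntityDecoder.decode("caf&#233;") returns the string
"café". Character.toChars is used rather than a cast to char so that
code points above U+FFFF come out as a proper surrogate pair.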
Olivier
On 29/03/2012 15:23, Oliver Ruebenacker wrote:
> Hello,
>
> I need a simple way to convert some XHTML fragments, provided as a
> JDOM Element, into plain text. I am willing to ignore most HTML tags
> and consider only the most commonly used predefined entities.
>
> In JDOM, an entity reference has a name, a public id and a system
> id. I think I know what the name means, for named entities. But what
> about numeric entities, how do I get the code point? And what are
> public id and system id?
>
> Thanks!
>
> Take care
> Oliver
>
--
Olivier Jaquemet <olivier.jaquemet at jalios.com>
Ingénieur R&D Jalios S.A. - http://www.jalios.com/
@OlivierJaquemet +33970461480