[jdom-interest] JDOM and text outside tags
Stein Erik Berget
seberget at escenic.com
Wed Oct 22 23:30:37 PDT 2003
On Wed, 22 Oct 2003 18:06:09 +0900, Jacques-Albert De Blasio
<jacquesalbert.deblasio at toshiba.co.jp> wrote:
> Hi all,
>
> I have a problem with JDOM and I am sure that one of you JDOM gurus could
> help me out :)
>
> In a program I'm writing, I first fetch HTML pages on the web, tidy them
> with NekoHTML (JTidy was not sufficient as it could not parse Japanese
> HTML pages) and then transform the DOM output by NekoHTML into JDOM
> documents.
>
> My problem is the following: in a given page, I have tags such as
>
> <TD>
> <SMALL>
> <IMG src = "..." /> some_text <BR />
> <IMG src =" ..." /> some_other_text <BR />
> </SMALL>
> </TD>
>
> How can I fetch the "some_text" and "some_other_text" ?
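If you already have the <SMALL> tag as a JDOM Element, this is quite easy:
you get the text from that element, using code looking something like
this:
smallElement.getText();
smallElement.getTextNormalize();
getText() returns the text held directly under the element as it is,
whitespace included. getTextNormalize() trims the ends and collapses
internal runs of whitespace to a single space. Either way, "some_text" and
"some_other_text" come back joined in one string.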
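If you need the two fragments separately rather than as one string, one option is to walk the children of <SMALL> and keep only the text nodes. The sketch below shows that idea with the JDK's built-in org.w3c.dom API (the kind of DOM that NekoHTML hands you) so that it runs without extra jars. It is not JDOM code. With JDOM the analogous loop would go over smallElement.getContent() and keep the items that are Text instances. The class name, method names and the src values are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of JDOM or NekoHTML.
public class DirectTextSketch {

    // Collects each non-blank text node that is a direct child of parent.
    public static List<String> directTextRuns(Node parent) {
        List<String> runs = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                String text = child.getNodeValue().trim();
                if (!text.isEmpty()) {
                    runs.add(text);
                }
            }
        }
        return runs;
    }

    // Parses a well-formed XML string with the JDK's default DOM parser.
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Placeholder src values; the real page has its own.
        String xml = "<TD><SMALL>"
                + "<IMG src=\"a.png\"/> some_text <BR/>"
                + "<IMG src=\"b.png\"/> some_other_text <BR/>"
                + "</SMALL></TD>";
        Node small = parse(xml).getElementsByTagName("SMALL").item(0);
        for (String run : directTextRuns(small)) {
            System.out.println(run);
        }
    }
}
```

Each text run between the <IMG> and <BR> tags ends up as its own list entry, so you can pair a piece of text with the image that precedes it if you track the position while looping.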
Good luck, and have a nice day!
--
Stein Erik Berget
Research & Development
Escenic AS + 47 23 27 34 40 (switchboard)
Sommerogt 13-15 + 47 23 27 34 01 (fax)
Box 2393 Solli http://www.escenic.com/
N-0201 OSLO