[jdom-interest] JDOM extract sentence correctly

perez msmr at netcabo.pt
Sun Feb 24 01:05:26 PST 2008


I have a question about using JDOM to parse an XML document. The outcome is
not what I expect.

I wrote the following program to parse an XML document. I treat the body
element as the root of the content to process.

import org.jdom.*;
import org.jdom.input.SAXBuilder;

import java.io.File;
import java.io.IOException;
import java.util.*;

public class Ex04 {

    public static void main(String[] args) {

        String filename = "Test.xml";

        SAXBuilder b = new SAXBuilder();

        try {
            Document doc = b.build(new File(filename));
            Element root = doc.getRootElement();

            // start the extraction at the <body> child of the root element
            Element body = root.getChild("body");
            bodyExtract(body);
        }
        // indicates a well-formedness error
        catch (JDOMException e) {
            // report the file that was actually parsed (args may be empty)
            System.out.println(filename + " is not well-formed.");
            System.out.println(e.getMessage());
        }
        catch (IOException e) {
            System.out.println(e);
        }
    }

    public static void bodyExtract(Element current) {

        // getText() returns only the text directly inside this element,
        // not the text of its child elements
        String text = current.getText();
        System.out.println(text);

        // then visit each child element in turn
        List children = current.getChildren();

        Iterator iterator = children.iterator();
        while (iterator.hasNext()) {
            Element child = (Element) iterator.next();
            bodyExtract(child);
        }
    }
}

#######################################################################

Part of the original Test.xml file is:
...
<body>
The   http://www.linux.org/ Linux  is an open-source operating system,
created by   http://technorati.com/tag/linus-torvals Linus Torvalds  in the
80’s. 
...

The output of the program above is:

The is an open-source operating system, created by in the 80’s. 
Linux
Linus Torvalds

I want to analyze the sentences semantically, so I need the output to look
something like this:

The Linux is an open-source operating system, created by Linus Torvalds
in the 80’s. 

How can I solve this problem?

Thanks for your help.

MP 
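
For illustration, the desired output corresponds to reading the mixed content (text interleaved with child elements) in document order, instead of printing each element's direct text separately. The following is only a minimal sketch of that idea, under several assumptions: it uses the JDK's built-in org.w3c.dom API rather than JDOM so that it runs without extra jars; the class and method names (MixedContentSketch, collectText, sentenceOf, parse) are hypothetical; and the sample markup, including the doc root and the a elements, is a guess at what Test.xml contains, since the archive shows only the stripped text. In JDOM 1.x, Element.getValue() is documented to return the concatenated text of all descendants, so it may give a similar result with less code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class MixedContentSketch {

    // Appends the text of every descendant of 'node' in document order.
    static void collectText(Node node, StringBuilder out) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                out.append(child.getNodeValue());
            } else if (type == Node.ELEMENT_NODE) {
                collectText(child, out); // descend into inline elements
            }
        }
    }

    // Returns the full text of the element with runs of whitespace collapsed.
    static String sentenceOf(Node element) {
        StringBuilder sb = new StringBuilder();
        collectText(element, sb);
        return sb.toString().trim().replaceAll("\\s+", " ");
    }

    // Parses an XML string; checked exceptions are wrapped for brevity.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed markup: the real element names in Test.xml are not known.
        String xml = "<doc><body>\n"
                + "The <a href=\"http://www.linux.org/\">Linux</a> is an open-source operating system,\n"
                + "created by <a href=\"http://technorati.com/tag/linus-torvals\">Linus Torvalds</a> in the\n"
                + "80's.\n"
                + "</body></doc>";
        Node body = parse(xml).getElementsByTagName("body").item(0);
        // prints: The Linux is an open-source operating system, created by Linus Torvalds in the 80's.
        System.out.println(sentenceOf(body));
    }
}
```

The traversal keeps text nodes and child elements in their original order, which is what keeps "Linux" and "Linus Torvalds" inside the sentence. The JDK's Node.getTextContent() performs the same concatenation in a single call; the explicit recursion is shown only to make the ordering visible.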

-- 
View this message in context: http://www.nabble.com/JDOM-extract-sentence-correctly-tp15662095p15662095.html
Sent from the JDOM - General mailing list archive at Nabble.com.



