[jdom-interest] JDOM extract sentence correctly
perez
msmr at netcabo.pt
Sun Feb 24 01:05:26 PST 2008
I have a question about using JDOM to parse an XML document. The output is
not what I expect.
I wrote the following program to parse an XML document. I have assumed that the
root of the document is the element body.
import org.jdom.*;
import org.jdom.input.SAXBuilder;
import org.jdom.output.*;
import java.io.File;
import java.io.IOException;
import java.util.*;

public class Ex04 {
    public static void main(String[] args) {
        String filename = "Test.xml";
        SAXBuilder b = new SAXBuilder();
        try {
            Document doc = b.build(new File(filename));
            Element root = doc.getRootElement();
            Element body = root.getChild("body");
            bodyExtract(body);
        }
        // indicates a well-formedness error
        catch (JDOMException e) {
            // report the file that was actually parsed (args may be empty)
            System.out.println(filename + " is not well-formed.");
            System.out.println(e.getMessage());
        }
        catch (IOException e) {
            System.out.println(e);
        }
    }

    // Prints the text directly inside current, then recurses into each child element
    public static void bodyExtract(Element current) {
        String aaa = current.getText();
        System.out.println(aaa);
        List children = current.getChildren();
        Iterator iterator = children.iterator();
        while (iterator.hasNext()) {
            Element child = (Element) iterator.next();
            bodyExtract(child);
        }
    }
}
#######################################################################
Part of the original Test.xml file is:
...
<body>
The http://www.linux.org/ Linux is an open-source operating system,
created by http://technorati.com/tag/linus-torvals Linus Torvalds in the
80’s.
...
The output of the program above is:
The is an open-source operating system, created by in the 80’s.
Linux
Linus Torvalds
I want to analyze the sentences semantically, so I need the output to be
something like this:
The Linux is an open-source operating system, created by Linus Torvalds
in the 80’s.
How can I solve this problem?
Thanks for your help
MP
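A possible direction, offered only as a sketch: the split output is what one would expect when text is read per element, because the link text belongs to child elements rather than to body itself. The example below uses the JDK's built-in DOM parser instead of JDOM so that it runs without extra libraries; Node.getTextContent() returns the text of all descendants in document order. In JDOM 1.x, Element.getValue() is documented to do the analogous job, though that is not tested here. The class name MixedContentText, the helper sentence(), and the <a href="..."> markup in the sample are assumptions made for illustration, since the archive does not show the original tags.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class MixedContentText {

    // Hypothetical helper: returns all text inside the root element,
    // including the text of nested elements, in document order.
    public static String sentence(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element root = doc.getDocumentElement();
        // getTextContent() concatenates the text of every descendant node
        String raw = root.getTextContent();
        // Collapse line breaks and runs of whitespace into single spaces
        return raw.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) throws Exception {
        // Assumed markup: the element name "a" and its href attributes are guesses
        String xml = "<body>The <a href=\"http://www.linux.org/\">Linux</a> is an "
                + "open-source operating system, created by "
                + "<a href=\"http://technorati.com/tag/linus-torvals\">Linus Torvalds</a>"
                + " in the 80's.</body>";
        System.out.println(sentence(xml));
    }
}
```

With the assumed markup, main prints the sentence on one line with the link text in place. With JDOM, the corresponding change to the program above would presumably be to call body.getValue() once instead of recursing with getText().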
--
View this message in context: http://www.nabble.com/JDOM-extract-sentence-correctly-tp15662095p15662095.html
Sent from the JDOM - General mailing list archive at Nabble.com.