[jdom-interest] Problem getting my XML in
Per Norrman
per.norrman at austers.se
Wed Aug 18 16:18:05 PDT 2004
Hi,
You have a few problems, but first a piece of general Java advice: it pays
off to print exception stack traces.
1) The hostname in your URL in the attached program is wrong.
If you do a stack trace here
> Document doc = null;
> try {
> doc = builder.build(urlObj);
> }
> catch(Exception ex) {
ex.printStackTrace();
> return "Error on making xml returned SAXable" +
> ex.getMessage();
> }
you'll see that the exception is java.net.UnknownHostException.
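To illustrate why the stack trace matters here, below is a minimal sketch (the class name StackTraceDemo, its helper methods, and the hostname are made up for illustration). For an UnknownHostException, getMessage() carries only the hostname, so the message built in your catch block does not say what kind of failure occurred, while toString() and printStackTrace() include the exception class.

```java
import java.net.UnknownHostException;

public class StackTraceDemo {
    // Mirrors what the original catch block returns: only the message text.
    static String messageOnly(Exception ex) {
        return "Error on making xml returned SAXable" + ex.getMessage();
    }

    // toString() prefixes the message with the exception class name.
    static String withType(Exception ex) {
        return ex.toString();
    }

    public static void main(String[] args) {
        // Constructed by hand so that no network access is needed.
        Exception ex = new UnknownHostException("no.such.host.example");
        System.out.println(messageOnly(ex));
        System.out.println(withType(ex));
        // The full trace also shows where the exception was raised.
        ex.printStackTrace();
    }
}
```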
2) But fixing that reveals another, more serious, problem. This service
(I wouldn't call it a web service, btw) does *not* return XML. It returns
HTML with the XML document escaped within a <pre> element!
Do a "view page source" on the full URL and you'll see for yourself. I have
no idea why they do it like that--the point is kind of lost.
Of course, your program fails miserably at this point:
>
> String geneTrackGeneId =
> doc.getRootElement().getChild("Entrezgene_track-info").getChild("Gene-track").getChild("Gene-track_geneid").getTextTrim();;
>
since the root element is the only element in the document: the first
getChild() returns null, and the next call in the chain throws a
NullPointerException.
Now, perhaps the guys at NCBI provide a method for obtaining *real* XML -- then
use that. But if they don't, you can always build a new document from the text
of the <pre> element. Kind of awkward and not quite robust, but it works. A
sample program is attached.
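The re-parsing idea can also be sketched without JDOM, using only the JDK's built-in DOM parser. This is not the attached program: the class name PreUnwrap, the method geneId, and the inline wrapper string are made up for illustration, and the real response may well carry more markup than this stand-in does.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class PreUnwrap {
    // Parse a string as XML with the JDK's DOM parser.
    static Document parse(DocumentBuilder db, String text) throws Exception {
        return db.parse(new InputSource(new StringReader(text)));
    }

    // Parse the wrapper, take the text of its root <pre> element
    // (the parser unescapes &lt; and &gt;), then parse that text again.
    // Assumes the element exists; item(0) is null otherwise.
    static String geneId(String wrapper) throws Exception {
        DocumentBuilder db =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document bogus = parse(db, wrapper);
        String xml = bogus.getDocumentElement().getTextContent();
        Document doc = parse(db, xml);
        return doc.getElementsByTagName("Gene-track_geneid")
                  .item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        // Made-up stand-in for the service response.
        String wrapper = "<pre>&lt;Entrezgene&gt;&lt;Entrezgene_track-info&gt;"
            + "&lt;Gene-track&gt;&lt;Gene-track_geneid&gt;4537"
            + "&lt;/Gene-track_geneid&gt;&lt;/Gene-track&gt;"
            + "&lt;/Entrezgene_track-info&gt;&lt;/Entrezgene&gt;</pre>";
        System.out.println(geneId(wrapper));
    }
}
```

The same two-step pattern (parse the wrapper, read the root element's text, parse that text) is what makes the approach fragile: it breaks as soon as the service changes the wrapping markup.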
/pmn
PS. That was one hell of a noisy XML document. I wonder what the
markup to actual data ratio is. Is there a term for this? DS.
-------------- next part --------------
package gene;
import java.io.IOException;
import java.io.StringReader;
import org.jdom.Document;
import org.jdom.Element;
import org.jdom.input.SAXBuilder;
import org.jdom.xpath.XPath;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
/**
 * @author Per Norrman
 *
 */
public class GetGene {
    String _urlPrefix = "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=text&db=gene&dopt=xml&uid=";
    // Minimal stand-in DTD, returned in place of the real Entrezgene DTD.
    String _docType = "<!ELEMENT Entrezgene ANY>";
    String _entrezPublicID = "-//NCBI//NCBI Entrezgene/EN";

    public String getGene(String uid) {
        try {
            SAXBuilder builder = new SAXBuilder();
            // Answer requests for the Entrezgene public ID with the
            // stand-in DTD; anything else gets the default handling.
            builder.setEntityResolver(new EntityResolver() {
                public InputSource resolveEntity(String publicId,
                        String systemId) throws SAXException, IOException
                {
                    if (publicId != null && publicId.equals(_entrezPublicID)) {
                        return new InputSource(new StringReader(_docType));
                    }
                    return null;
                }
            });
            String url = _urlPrefix + uid;
            System.out.println("URL=" + url);
            System.out.println("load");
            // First pass: the response is a <pre> element wrapping escaped XML.
            Document bogus = builder.build(url);
            String xml = bogus.getRootElement().getText();
            // Second pass: parse the unescaped text as the real document.
            Document doc = builder.build(new StringReader(xml));
            XPath xpath = XPath
                    .newInstance("/Entrezgene/Entrezgene_track-info/Gene-track/Gene-track_geneid");
            Element node = (Element) xpath.selectSingleNode(doc);
            if (node != null) {
                return node.getText();
            } else {
                throw new RuntimeException("Could not find stuff");
            }
        } catch (Exception e) {
            e.printStackTrace();
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(new GetGene().getGene("4537"));
    }
}