[jdom-interest] Resolving entities when no DTD is assigned (no
DOCTYPE declaration) in XML
Vish D.
vishpool at gmail.com
Wed Aug 31 15:10:42 PDT 2005
Hello all,
I am having trouble working out how to resolve entities when an XML file
has no DOCTYPE declaration (no DTD attached to it) but contains entities
that are not predefined in XML (such as '&nbsp;').
I need to do this without changing the XML file itself (that is, without
adding a DOCTYPE declaration to it).
This is what I am doing:
SAXBuilder builder = new SAXBuilder();
....
fulltextXML = builder.build(new FileInputStream(filename));
-- fails with an exception ---
C:\HTMLs\00063185_200_1_67\00063185_200_1_67_Document.xml is not
well-formed.
org.jdom.input.JDOMParseException: Error on line 5: The entity "nbsp" was
referenced, but not declared.
Error on line 5: The entity "nbsp" was referenced, but not declared.
Is there a way to resolve such entities, without having to declare the
DOCTYPE in the XML file?
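
One direction that might work is to leave the file on disk alone and add the
declarations to an in-memory copy just before parsing. The sketch below is only
an illustration under assumptions: it uses the JDK's own DOM parser rather than
JDOM so that it runs on its own, the class name EntityWrapSketch and the helper
withDoctype are hypothetical, and the entity list (nbsp, copy) is a guess at
what such files contain. It also assumes the text has no byte-order mark before
the XML declaration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class EntityWrapSketch {
    // Assumed internal subset: declare each entity the files are expected to use.
    static final String DOCTYPE =
        "<!DOCTYPE object_document [\n"
      + "  <!ENTITY nbsp \"&#160;\">\n"
      + "  <!ENTITY copy \"&#169;\">\n"
      + "]>\n";

    // Returns a copy of the XML text with the DOCTYPE placed after the XML
    // declaration (or at the very start if there is no declaration).
    public static String withDoctype(String xml) {
        int start = xml.startsWith("<?xml") ? xml.indexOf("?>") + 2 : 0;
        return xml.substring(0, start) + "\n" + DOCTYPE + xml.substring(start);
    }

    // Parses XML text with the default (non-validating) JDK parser.
    public static Document parse(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return db.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                   + "<object_document><fulltext>a&nbsp;b</fulltext></object_document>";
        try {
            parse(xml); // no DOCTYPE, so the nbsp reference is undeclared
        } catch (SAXParseException e) {
            System.out.println("Without DOCTYPE: " + e.getMessage());
        }
        Document doc = parse(withDoctype(xml)); // same text plus declarations
        String text = doc.getDocumentElement().getTextContent();
        System.out.println("Resolved to U+00A0: " + text.equals("a\u00A0b"));
    }
}
```

With JDOM, the same idea would presumably be to read the file into a String and
pass new StringReader(withDoctype(text)) to builder.build(...) instead of the
FileInputStream; I have not verified that against every parser configuration.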
Thanks in advance!
Vish
Sample XML file:
--------------
<?xml version="1.0" encoding="UTF-8"?>
<object_document>
<art_title> Muscular Alteration of Gill Geometry in vitro: Implications for
Bivalve Pumping Processes -- Medler and Silverman 200 (1): 77 -- The
Biological Bulletin</art_title>
<converted_from type='HTML'>BiolBull V 200 I 1 P 77 Fulltext 00063185.htm
</converted_from>
<fulltext> Biol. Bull. 200: 77-86. (February 2001)© 2001 Marine
Biological LaboratoryMuscular Alteration of Gill Geometry in vitro:
Implications for Bivalve Pumping ProcessesScott Medler* and Harold
SilvermanLouisiana State University, Baton Rouge, Louisiana 70803* Author to
whom correspondence should be addressed. Current address: Department of
Biology, Colorado State University, Ft. Collins, CO 80523. E-mail:
Skmedler{at}aol.com<!-- var u = "Skmedler", d = "aol.com";
document.getElementById("em0").innerHTML = "" + u + "@" + d + ""//-->
Received 23 March 2000; accepted 19 October 2000.
</fulltext>
<jrnl_title>BiolBull</jrnl_title>
<issn>00063185</issn>
<volume>200</volume>
<issue>1</issue>
<fpage>77</fpage>
</object_document>