[jdom-interest] newbie question: parsing in xhtml containing mathml
Morten Andersen
mortena at mip.sdu.dk
Tue Jul 6 04:08:05 PDT 2004
Well, the task is pretty simple, but I can't get anything working.
I want to parse an xhtml document containing mathml, with all the
entities such as alpha and beta defined. The document should then be
transformed using xslt into another xml document.
The test-xhtml document is shown below:
----
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN"
"http://www.w3.org/TR/MathML2/dtd/xhtml-math11-f.dtd" [<!ENTITY mathml
"http://www.w3.org/1998/Math/MathML">]>
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<math xmlns="http://www.w3.org/1998/Math/MathML">
<mrow>
<mi>&zeta;</mi>
</mrow>
</math>
</body>
</html>
----
Here is what I've tried:
Parsing the document using a SAXBuilder with the default settings:
-----
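import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStreamReader;
import org.jdom.input.SAXBuilder;

// file (a java.io.File) and doc (an org.jdom.Document) are declared elsewhere
SAXBuilder builder = new SAXBuilder();
FileInputStream stream = null;
if (file.exists()) {
    try {
        stream = new FileInputStream(file);
        // the reader decodes the bytes with the platform default charset
        InputStreamReader reader = new InputStreamReader(stream);
        builder.setValidation(false);
        try {
            doc = builder.build(reader);
        } catch (Exception e) {
            e.printStackTrace();
        }
    } catch (FileNotFoundException e) {
        // the file disappeared between exists() and opening it
        e.printStackTrace();
    }
}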
---
This results in this error:
"org.jdom.IllegalTargetException: The target "IS10744:arch" is not legal
for JDOM/XML Processing Instructions: Processing instruction targets cannot
contain colons."
Then I tried to trick the SAXBuilder into not using the DTDs by
setting its entityResolver to an entityResolver that doesn't do anything.
---
SAXBuilder builder = new SAXBuilder();
builder.setEntityResolver(new NoOpEntityResolver());
---
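For reference, a minimal sketch of what such a do-nothing resolver could look like. The class name EmptyEntityResolver is hypothetical (the NoOpEntityResolver used above is not shown), and only the standard org.xml.sax interfaces are assumed. Note that with the DTD skipped, entities that are declared only in the DTD, such as &zeta;, are no longer defined for the parser.

```java
import java.io.StringReader;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical stand-in for the NoOpEntityResolver mentioned above.
final class EmptyEntityResolver implements EntityResolver {
    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        // Hand the parser an empty document for every external entity.
        // Returning null instead would tell the parser to open the
        // system identifier itself, i.e. fetch the DTD after all.
        return new InputSource(new StringReader(""));
    }
}
```

It would be installed the same way as above, with builder.setEntityResolver(new EmptyEntityResolver()).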
This results in some output to System.err:
"[Fatal Error] :1:66: White spaces are required between publicId
and systemId."
But the transformation seems to occur.
I tried writing the parsed document to a file. This file doesn't contain
the entity: &zeta;
---
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN"
"http://www.w3.org/TR/MathML2/dtd/xhtml-math11-f.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<body bgcolor="white">
Hello world
<math xmlns="http://www.w3.org/1998/Math/MathML">
<mrow>
<mi>?</mi>
</mrow>
</math>
</body>
</html>
---
That could be due to an encoding mistake somewhere.
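One way such a "?" can arise, offered as a guess rather than a diagnosis of the code above: the character zeta (U+03B6) does not exist in ISO-8859-1, so encoding it to that charset substitutes a question mark. The class and method names in this small sketch are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingLossSketch {
    // Encode the text to bytes in the given charset, then decode it again.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String zeta = "\u03B6"; // Greek small letter zeta
        // ISO-8859-1 cannot represent zeta, so getBytes() replaces it.
        System.out.println(roundTrip(zeta, StandardCharsets.ISO_8859_1));
        // UTF-8 can represent it, so the round trip is lossless.
        System.out.println(roundTrip(zeta, StandardCharsets.UTF_8).equals(zeta));
    }
}
```

If this is what happens, writing the output as UTF-8 (or emitting a numeric character reference) would keep the character.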
So as you can tell, I've been struggling with this issue for quite some time
and getting nowhere. Is it really that difficult to parse an xhtml document
and transform it using xslt?
How can I transform an xhtml document containing mathml into another xml
document using xslt?
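To make the goal concrete, here is a rough sketch of the kind of transformation meant, using only the JAXP classes that ship with the JDK rather than JDOM. Everything in it is an assumption for illustration: the stylesheet, the <symbols>/<symbol> output vocabulary and the class name are made up, and the input uses the numeric reference &#x3B6; in place of &zeta; so that no DTD is needed.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class MathmlTransformSketch {
    // Hypothetical stylesheet: lists every MathML <mi> as a <symbol>.
    // The MathML namespace must be bound to a prefix to match m:mi.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:m='http://www.w3.org/1998/Math/MathML'"
      + " exclude-result-prefixes='m'>"
      + "<xsl:output method='xml' encoding='UTF-8' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'>"
      + "<symbols><xsl:for-each select='//m:mi'>"
      + "<symbol><xsl:value-of select='.'/></symbol>"
      + "</xsl:for-each></symbols>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Cut-down version of the test document, without DOCTYPE.
    static final String INPUT =
        "<html xmlns='http://www.w3.org/1999/xhtml'><body>"
      + "<math xmlns='http://www.w3.org/1998/Math/MathML'>"
      + "<mrow><mi>&#x3B6;</mi></mrow>"
      + "</math></body></html>";

    // Apply the stylesheet to the given XML text and return the result.
    static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)),
                        new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(INPUT));
    }
}
```

With JDOM, a parsed Document could presumably be fed to the same Transformer through org.jdom.transform.JDOMSource, with org.jdom.transform.JDOMResult collecting the output.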
Regards
Morten Andersen
Master of applied mathematics and computer science
Associate professor
The Maersk Institute of Production technology at Southern Danish University
www.mip.sdu.dk
Campusvej 55
DK-5230 Odense M
Denmark
+45 65 50 36 54
+45 61 71 11 03
Jabber id: hat at jabber.dk