[jdom-interest] Parsing Microsoft Word Documents
Per Norrman
per.norrman at austers.se
Sat Dec 18 03:26:33 PST 2004
Hugo Garcia wrote:
> Hi
>
> I am trying to parse a Microsoft Word document with the SAXBuilder but
> I get an error that attributes must be quoted. When I look at the
> document I see that indeed some attributes, especially in various meta
> tags are not quoted. I wonder if anyone has run into this problem and
> if so if you have a work around or solution.
>
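Then it's not XML, but probably HTML produced by saving a Word document
in HTML format. Unquoted attribute values are allowed in HTML but not in
XML, so an XML parser rejects the file. You can always try using the
TagSoup parser, which accepts this kind of markup: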
http://mercury.ccil.org/~cowan/XML/tagsoup/
Then just create the SAXBuilder with the TagSoup class name as its SAX driver:
new SAXBuilder("org.ccil.cowan.tagsoup.Parser");
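If you want to confirm that the file really is malformed XML, and not a problem with SAXBuilder itself, the JDK's built-in SAX parser is enough to check. This is a minimal sketch: the class name `WellFormedCheck` and the sample markup are made up for illustration, with the second sample mimicking an unquoted attribute in a `meta` tag.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether a string is well-formed XML,
// using only the JDK's default (strict) SAX parser.
public class WellFormedCheck {

    /** Returns null if the text is well-formed XML, otherwise the parse error. */
    static String checkWellFormed(String text) {
        try {
            XMLReader reader = SAXParserFactory.newInstance()
                    .newSAXParser()
                    .getXMLReader();
            // DefaultHandler.fatalError rethrows the exception, and setting
            // an explicit handler stops the parser printing to stderr.
            reader.setErrorHandler(new DefaultHandler());
            reader.parse(new InputSource(new StringReader(text)));
            return null;
        } catch (SAXParseException e) {
            // Malformed input: report where the parser gave up.
            return "line " + e.getLineNumber() + ", column "
                    + e.getColumnNumber() + ": " + e.getMessage();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("XML parser failed unexpectedly", e);
        }
    }

    public static void main(String[] args) {
        // Quoted attribute values: well-formed XML.
        String quoted = "<html><head><meta name=\"Generator\" content=\"Word\"/></head></html>";
        // Unquoted attribute value, as HTML permits: not XML.
        String unquoted = "<html><head><meta name=Generator content=\"Word\"></head></html>";
        System.out.println("quoted:   " + checkWellFormed(quoted));
        System.out.println("unquoted: " + checkWellFormed(unquoted));
    }
}
```

The first sample prints `null`; the second prints the parser's complaint about the missing quote. If a real Word export fails this check the same way, a lenient HTML parser such as TagSoup is the right tool rather than a strict XML one.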
/pmn