[jdom-interest] Facing problem reading comments data, need help
Jason Hunter
jhunter at servlets.com
Fri Jul 6 15:33:12 PDT 2007
It's a malformed XML document. You may want to run it through Tidy to
make it well formed. If you want JDOM to parse malformed documents, you
can use the unchecked factory class, UncheckedJDOMFactory (easier than
what Paul was saying, just pass it to the builder). It builds without
any sanity checking. You could also subclass that factory to just ignore
this one particular thing. Of course, the best action is to pass JDOM
legal XML.
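
A minimal sketch of the "pass it to the builder" approach, for
illustration only. It assumes JDOM 1.x package names (org.jdom; JDOM 2
moved these classes to org.jdom2) and takes the factory mentioned above
to be UncheckedJDOMFactory. The class name LenientBuild and its two
helper methods are hypothetical. The second helper follows the idea from
the original question of reading the comments in and filtering them out
afterwards.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.jdom.Comment;
import org.jdom.Document;
import org.jdom.JDOMException;
import org.jdom.UncheckedJDOMFactory;
import org.jdom.filter.ContentFilter;
import org.jdom.input.SAXBuilder;

// Hypothetical helper class; the names here are not part of JDOM.
public class LenientBuild {

    // Build a Document with the unchecked factory, so JDOM skips its
    // own Verifier checks (including the comment-data check).
    public static Document parseUnchecked(String xml)
            throws JDOMException, IOException {
        SAXBuilder builder = new SAXBuilder();
        builder.setFactory(new UncheckedJDOMFactory());
        return builder.build(new StringReader(xml));
    }

    // Detach every Comment in the tree and return how many were removed.
    // Comments are collected first so the tree is not modified while
    // the descendant iterator is still walking it.
    public static int removeComments(Document doc) {
        List comments = new ArrayList();
        Iterator it = doc.getDescendants(new ContentFilter(ContentFilter.COMMENT));
        while (it.hasNext()) {
            comments.add(it.next());
        }
        for (Iterator c = comments.iterator(); c.hasNext();) {
            ((Comment) c.next()).detach();
        }
        return comments.size();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><!--- banner --><p>text</p></body></html>";
        Document doc = parseUnchecked(page);
        int removed = removeComments(doc);
        System.out.println("removed " + removed + " comment(s)");
    }
}
```

Note that the unchecked factory turns off all of JDOM's checks, not only
the one on comments, so the resulting tree is only as sane as the input.
The underlying SAX parser still has to accept the document, which is why
running real-world HTML through Tidy first remains the safer route.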
-jh-
Robin Kwek wrote:
> Hi fellow members,
>
> I'm working on a program to analyze the structural similarity of web
> pages. My parser works with JDOM and has been able to read HTML files
> and convert them into their respective DOM tree structures.
>
> But some web pages use "<!---", and JDOM complains that the data is
> not legal for a JDOM comment ("Comment data cannot start with a
> hyphen"), throwing an IllegalDataException.
>
> I actually do not want comments to be read in at all, as I'm primarily
> concerned with the structure of the web page. I tried searching
> through the SAX features and properties, but I can't find a way to
> prevent the parser or JDOM from reading in comments.
>
> So I'm posting this to ask whether anyone knows a way to do this.
> Another approach I'm considering is to turn off the verifier so that
> the illegal comments can be read in, and then filter them out later,
> but I can't seem to find the method that turns it off. Does anyone
> know where it is in the Javadoc?
>
> Thanks in advance.