[jdom-interest] Feature Request
Dennis Sosnoski
dms at sosnoski.com
Thu Feb 19 19:32:33 PST 2004
I'd suggest using TagSoup instead
(http://www.ccil.org/~cowan/XML/tagsoup). It implements its own SAX2
parser for HTML, so it doesn't interfere with anything else in your system.
The only downside I've noticed is that the logic it uses to turn HTML
into XHTML can go berserk on some real-world HTML, such as
<script> and <style> elements within the <body> (it properly tries to
force them into a <head> element, so you end up with multiple <head>s
and <body>s). I've figured out how to patch it easily to get around some
of these issues, so let me know if you run into problems.
- Dennis
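
For readers following the thread quoted below, here is a minimal sketch of the idea under discussion: construct the XMLReader yourself and hand it to the code that needs it, so that no lookup elsewhere in the system decides which parser gets used. It uses only JDK classes. ReaderBackedBuilder and newPlainXmlReader are hypothetical names invented for this illustration; they are not part of JDOM, TagSoup, or NekoHTML, and the reader here comes from the JDK's JAXP factory purely as a stand-in for whatever explicitly constructed parser you prefer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class InjectedReaderDemo {

    // Hypothetical stand-in for a builder that accepts a caller-supplied
    // XMLReader instead of creating one internally.
    static final class ReaderBackedBuilder {
        private final XMLReader reader;

        ReaderBackedBuilder(XMLReader reader) {
            this.reader = reader;
        }

        // Parses the input with the injected reader and returns the
        // qualified element names in document order.
        List<String> elementNames(InputSource in) throws Exception {
            final List<String> names = new ArrayList<>();
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    names.add(qName);
                }
            });
            reader.parse(in);
            return names;
        }
    }

    // Assumption for the sketch: a plain, namespace-aware reader from the
    // JDK's JAXP factory. Any other XMLReader could be passed in instead.
    static XMLReader newPlainXmlReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newSAXParser().getXMLReader();
    }

    public static void main(String[] args) throws Exception {
        ReaderBackedBuilder builder = new ReaderBackedBuilder(newPlainXmlReader());
        List<String> names = builder.elementNames(
                new InputSource(new StringReader("<a><b/><c/></a>")));
        System.out.println(names);
    }
}
```

The design point is only that the builder never chooses a parser on its own; whoever constructs it decides, which is the behaviour requested of SAXBuilder in the messages below.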
Chris B. wrote:
>Jeremy.Prellwitz at siras.com wrote:
>
>>It is not NekoHTML that I'm worried about.
>
>I'm worried about it because I suspect I will have to do some major work
>on either NekoHTML or JTidy for a project I'm working on, and I want to
>understand the situation as clearly as possible, because if that happens
>I *may* have an opportunity to fix Neko properly.
>
>>It is parsing regular XML documents in the same webapp.
>
>According to the Neko web site....
>" The Xerces2 implementation dynamically instantiates the default parser
>configuration to construct parser objects via the Jar service facility.
>The Jar file |nekohtmlXni.jar| contains a |META-INF/services| file that
>is read by Xerces2 implementation for this purpose."
>
>If I understand this correctly, if you don't use nekohtmlXni.jar, then
>you won't have the problem?
>
>>Basically, NekoHTML interferes with the
>>creation of Xerces parsers. When I create a SAXBuilder object, it
>>creates a parser that uses the HTML configuration set up by NekoHTML.
>>If I could create my own Xerces parser, instantiate it with the
>>specific standard configuration class that it needs, and then pass it into
>>the constructor of the SAXBuilder object, then I wouldn't have to worry about
>>the SAXBuilder object creating a parser on its own that uses the HTML
>>configuration set up by NekoHTML.
>>
>>
>>-jeremy
>>
>> "Chris B." <chris at tech.com.au> wrote on 02/19/2004 05:55 PM:
>> To: Jeremy.Prellwitz at siras.com
>> cc: jdom-interest at jdom.org
>> Subject: Re: [jdom-interest] Feature Request
>>
>>As much as I think it's a good idea, how would it help you directly,
>>since NekoHTML doesn't seem to conform to XMLReader? (Which seems to be
>>its problem).
>>
>>Jeremy.Prellwitz at siras.com wrote:
>>
>>>This is what I was trying to describe, just without stating it as
>>>specifically or concisely as you just did. I wouldn't have brought up my own
>>>little issue if I didn't think that passing in your own XMLReader instance
>>>could be useful to others. It seems like a simple enough change to
>>>the SAXBuilder.java class, and coincidentally, it would smooth out my code
>>>a little bit. :-)
>>>
>>>-jeremy
>>>
>>>>It seems to me that supplying your own XMLReader is a sensible enough
>>>>activity that it deserves a proper method or constructor in SAXBuilder
>>>>to pass it in.
>>>
>>> "Chris B." <chris at tech.com.au> wrote on 02/19/2004 05:00 PM:
>>> To: Jason Hunter <jhunter at xquery.com>
>>> cc: Jeremy.Prellwitz at siras.com, jdom-interest at jdom.org
>>> Subject: Re: [jdom-interest] Feature Request
>>>
>>>Jason Hunter wrote:
>>>
>>>>Sounds like nekohtml is being a Bad Citizen, but I think you can do
>>>>exactly what you want by subclassing SAXBuilder and overriding
>>>>createParser().
>>>
>>>It seems to me that supplying your own XMLReader is a sensible enough
>>>activity that it deserves a proper method or constructor in SAXBuilder
>>>to pass it in.
>>>
>>>_______________________________________________
>>>To control your jdom-interest membership:
>>>http://lists.denveronline.net/mailman/options/jdom-interest/youraddr@yourhost.com
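
Jason Hunter's suggestion quoted above, subclassing SAXBuilder and overriding createParser(), is an instance of the factory-method pattern. The sketch below shows that pattern with JDK classes only. SketchBuilder and PinnedReaderBuilder are hypothetical names made up for this illustration and only mimic the shape of a builder with an overridable createParser(); they are not JDOM classes, and the real method's signature may differ.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class OverrideCreateParserDemo {

    // Hypothetical base class: it obtains its parser through an
    // overridable factory method, as the thread says SAXBuilder does.
    static class SketchBuilder {
        protected XMLReader createParser() throws Exception {
            return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        }

        // Counts start tags using whatever reader createParser() returns.
        int countElements(String xml) throws Exception {
            final int[] count = {0};
            XMLReader reader = createParser();
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    count[0]++;
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
            return count[0];
        }
    }

    // Hypothetical subclass: it takes over parser creation so that the
    // configuration is decided here rather than by the base class.
    static class PinnedReaderBuilder extends SketchBuilder {
        int created = 0; // how many times the override was invoked

        @Override
        protected XMLReader createParser() throws Exception {
            created++;
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setValidating(false);
            return factory.newSAXParser().getXMLReader();
        }
    }

    public static void main(String[] args) throws Exception {
        PinnedReaderBuilder builder = new PinnedReaderBuilder();
        int elements = builder.countElements("<r><x/><y/></r>");
        System.out.println(elements + " elements, override called "
                + builder.created + " time(s)");
    }
}
```

Compared with passing a reader into a constructor, this approach needs no change to the base class, at the cost of one subclass per parser configuration.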