[jdom-interest] Parsing HTML elements
Rolf Lear
jdom at tuis.net
Tue Nov 20 15:08:49 PST 2012
Hi Paul.
In the mail below I suggested using a parsing proxy. The term I meant to
use is a 'Filter'. See this article here:
http://www.ibm.com/developerworks/xml/library/x-tipsaxfilter/
You can do some magic with
http://www.jdom.org/docs/apidocs/org/jdom2/input/SAXBuilder.html#setXMLFilter(org.xml.sax.XMLFilter)
For example, your filter could extend
http://docs.oracle.com/javase/6/docs/api/org/xml/sax/helpers/XMLFilterImpl.html
and then override the method
http://docs.oracle.com/javase/6/docs/api/org/xml/sax/helpers/XMLFilterImpl.html#startElement(java.lang.String,%20java.lang.String,%20java.lang.String,%20org.xml.sax.Attributes)
to set the 'attrs' URIs correctly, and then call super.startElement(....).
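A minimal sketch of such a filter, using only the SAX classes that ship with the JDK. The class name AttributeNamespaceFixer, its declare() method, and the idea of a caller-supplied prefix-to-URI table are my own assumptions for illustration; adjust them to whatever your documents actually need.

```java
import java.util.HashMap;
import java.util.Map;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

/**
 * Hypothetical filter: fills in a namespace URI for prefixed attributes
 * that reach us without one, then passes the event on unchanged otherwise.
 */
class AttributeNamespaceFixer extends XMLFilterImpl {

    // prefix -> URI to assume when the stream did not declare the prefix
    // (supplied by the caller; nothing here is discovered from the document)
    private final Map<String, String> fallbackUris = new HashMap<String, String>();

    /** Registers the URI to use for an undeclared prefix. */
    AttributeNamespaceFixer declare(String prefix, String uri) {
        fallbackUris.put(prefix, uri);
        return this;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attrs) throws SAXException {
        // Work on a mutable copy; the incoming Attributes may be read-only.
        AttributesImpl fixed = new AttributesImpl(attrs);
        for (int i = 0; i < fixed.getLength(); i++) {
            String attQName = fixed.getQName(i);
            String attUri = fixed.getURI(i);
            int colon = attQName.indexOf(':');
            boolean missingUri = attUri == null || attUri.isEmpty();
            if (colon > 0 && missingUri) {
                String fallback = fallbackUris.get(attQName.substring(0, colon));
                if (fallback != null) {
                    fixed.setURI(i, fallback);
                    fixed.setLocalName(i, attQName.substring(colon + 1));
                }
            }
        }
        // Hand the repaired attributes to the next handler in the chain.
        super.startElement(uri, localName, qName, fixed);
    }
}
```

You would then hand an instance to the builder, something like builder.setXMLFilter(new AttributeNamespaceFixer().declare("my", "http://example.org/my")), where the prefix and URI are placeholders. Attributes whose prefix is not in the table pass through untouched, so JDOM will still complain about those.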
Rolf
On 20/11/2012 12:14 PM, Rolf Lear wrote:
>
> Hmmm not using the default API.
>
> JDOM expects the getURI() method to have a value if there is a prefix
> for the attribute. This is reasonable... ;)
>
> This indicates the SAX stream is broken. JDOM should be throwing
> "Namespace URIs must be non-null and non-empty Strings".
>
> If you cannot fix the SAX stream code, you can maybe write a proxy class
> that fixes the URIs as the events pass through.
>
> Rolf
>
>
> Paul Libbrecht <paul at hoplahup.net> wrote:
>
> Hello JDOM experts,
>
> I'm hitting a wall here and I am not sure who is responsible.
> Just like the previous series of posts, I am trying to parse an HTML
> document.
> In this case I use the CyberNeko HTML parser
> http://nekohtml.sourceforge.net/ which creates a SAX stream hence is
> easily convertible to a JDOM document.
>
> Now, my big issue is that the document I have (which I cannot easily
> change right now) contains undeclared namespace-prefixed attribute-names!
>
> Do I have a way to predefine the namespace somewhere?
>
> thanks in advance
>
> Paul
> _______________________________________________
> To control your jdom-interest membership:
> http://www.jdom.org/mailman/options/jdom-interest/youraddr@yourhost.com
>