[jdom-interest] Parsing HTML elements
Paul Libbrecht
paul at hoplahup.net
Wed Nov 21 12:59:44 PST 2012
Thanks Rolf,
that'd indeed be the right thing; I had not thought of it.
For now, I've implemented a replacement of the raw data... that is simpler.
I certainly agree JDOM should refuse to do anything with undeclared prefixes.
I had tried to add namespace declarations within the factory, but they were not taken into account.
Thanks.
Paul
On 21 Nov 2012, at 00:08, Rolf Lear wrote:
> Hi Paul.
>
> In the mail below I suggested using a parsing proxy. The term I meant to use is a 'Filter'. See this article here:
>
> http://www.ibm.com/developerworks/xml/library/x-tipsaxfilter/
>
> You can do some magic with http://www.jdom.org/docs/apidocs/org/jdom2/input/SAXBuilder.html#setXMLFilter(org.xml.sax.XMLFilter)
>
> For example, your filter could extend http://docs.oracle.com/javase/6/docs/api/org/xml/sax/helpers/XMLFilterImpl.html
>
> and then override the method http://docs.oracle.com/javase/6/docs/api/org/xml/sax/helpers/XMLFilterImpl.html#startElement(java.lang.String,%20java.lang.String,%20java.lang.String,%20org.xml.sax.Attributes)
>
> to set the 'attrs' URIs correctly, and then call super.startElement(....).
>
> Rolf
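Rolf's suggestion can be sketched as follows, using only the JDK's SAX classes. This is a minimal sketch, assuming the broken SAX stream reports a prefixed attribute qName such as `foo:bar` with an empty namespace URI. The class name `PrefixFixingFilter` and the caller-supplied prefix-to-URI table are hypothetical, not part of JDOM or NekoHTML.

```java
import java.util.HashMap;
import java.util.Map;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

/**
 * Hypothetical filter: repairs attributes whose qName has a prefix
 * (e.g. "foo:bar") but whose namespace URI is empty, using a
 * caller-supplied prefix -> URI table (an assumption of this sketch).
 */
class PrefixFixingFilter extends XMLFilterImpl {
    private final Map<String, String> prefixToUri;

    PrefixFixingFilter(Map<String, String> prefixToUri) {
        this.prefixToUri = new HashMap<>(prefixToUri);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        // Copy the attributes so their URI and local name can be modified.
        AttributesImpl fixed = new AttributesImpl(atts);
        for (int i = 0; i < fixed.getLength(); i++) {
            String attQName = fixed.getQName(i);
            String attUri = fixed.getURI(i);
            int colon = attQName.indexOf(':');
            // Skip unprefixed attributes and those that already carry a URI.
            if (colon <= 0 || (attUri != null && !attUri.isEmpty())) {
                continue;
            }
            String nsUri = prefixToUri.get(attQName.substring(0, colon));
            if (nsUri != null) {
                fixed.setURI(i, nsUri);
                fixed.setLocalName(i, attQName.substring(colon + 1));
            }
        }
        // Pass the repaired event on to the next handler (e.g. JDOM's SAX handler).
        super.startElement(uri, localName, qName, fixed);
    }
}
```

To use it with JDOM 2, one would presumably pass an instance to `SAXBuilder.setXMLFilter(...)` (the method linked above) before calling `build(...)`. Attributes whose prefix is not in the table are passed through unchanged, so JDOM would still reject them, which keeps undeclared prefixes from being silently accepted.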
>
> On 20/11/2012 12:14 PM, Rolf Lear wrote:
>>
>> Hmmm, not using the default API.
>>
>> JDOM expects the getURI() method to have a value if there is a prefix
>> for the attribute. This is reasonable... ;)
>>
>> This indicates the sax stream is broken. JDOM should be throwing
>> "Namespace URIs must be non-null and non-empty Strings".
>>
>> If you cannot fix the SAX stream code, you could perhaps write a proxy class
>> that fixes the URIs as the events pass through.
>>
>> Rolf
>>
>>
>> Paul Libbrecht <paul at hoplahup.net> wrote:
>>
>> Hello JDOM experts,
>>
>> I'm hitting a wall here and I am not sure who is responsible.
>> Just like in the previous series of posts, I am trying to parse an HTML
>> document.
>> In this case I use the CyberNeko HTML parser
>> http://nekohtml.sourceforge.net/, which produces a SAX stream and hence is
>> easily convertible to a JDOM document.
>>
>> Now, my big issue is that the document I have (which I cannot easily
>> change right now) contains undeclared namespace-prefixed attribute names!
>>
>> Do I have a way to predefine the namespace somewhere?
>>
>> thanks in advance
>>
>> Paul
>> _______________________________________________
>> To control your jdom-interest membership:
>> http://www.jdom.org/mailman/options/jdom-interest/youraddr@yourhost.com
>>
>