[jdom-interest] Content missing after conversion from W3C Element to JDOM2 Element

Larsen larsen007 at web.de
Thu Nov 8 01:20:52 PST 2012


Hi Rolf,

first of all, thanks for your extensive help!


> The Java API documentation is a mess in this area.... JDK 1.5 package  
> information indicates that the org.w3c.dom API supports DOM Level 2:
> http://docs.oracle.com/javase/1.5.0/docs/api/org/w3c/dom/package-summary.html

That's nice to hear. I was already wondering whether my English is too bad
or whether the Javadoc is just so crudely written that I can't understand it.


> What would be useful is if you could determine the library that you are  
> using. Since you have already 'hacked' the code, why don't you  
> temporarily add the line: System.out.println(text.getClass()); to the  
> method. This will tell you the concrete implementation of DOM that's  
> broken.

It's "org.w3c.tidy.DOMTextImpl". I use JTidy to turn HTML code that I obtain
from a customer's database into Java objects.
So, should I file a bug against JTidy?
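
For anyone who wants to repeat the check outside the patched code, here is a minimal sketch of the same diagnostic. It uses the JDK's built-in parser instead of JTidy so that it runs without extra libraries; the class name DomImplProbe and the helper firstTextNodeClass are made up for this illustration and are not part of JDOM or JTidy. The printed class name depends on which DOM implementation produced the document.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical helper for illustration only.
class DomImplProbe {

    // Walks the tree depth-first and returns the concrete class name of the
    // first text node it finds, or null if the tree contains no text node.
    static String firstTextNodeClass(Node node) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            return node.getClass().getName();
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            String found = firstTextNodeClass(child);
            if (found != null) {
                return found;
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: a tiny well-formed sample stands in for the real input.
        byte[] xml = "<p>hello</p>".getBytes("UTF-8");
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml));
        // Prints the implementation class behind the org.w3c.dom.Text interface.
        System.out.println(firstTextNodeClass(doc));
    }
}
```

Passing the Document returned by tidy.parseDOM(...) to the same helper should report the JTidy class instead.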


My code in that area in case it helps:

     private org.w3c.dom.Document getDocFromTidy(String html) {

         Tidy tidy = new Tidy();
         tidy.setShowWarnings(false);
         tidy.setQuiet(true);
         tidy.setXHTML(true);
         tidy.setDocType("omit");

         // convert text representation to Document
         InputStream bais = new ByteArrayInputStream(html.getBytes());

         try {
             // parse first, then close the stream once JTidy has read it
             return tidy.parseDOM(bais, null);
         } finally {
             try {
                 bais.close();
             } catch (IOException e) {
                 log.error("Exception on closing the InputStream", e);
             }
         }
     }



Lars

