[jdom-interest] removing pcdata from jdom-Elements

Sat Mar 12 08:37:50 PST 2005

Hi,

Document.getDescendants returns an iterator that uses
other iterators internally, so I think you'll get a
ConcurrentModificationException as soon as you add or
remove content while iterating, which is what your approach does.
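
The failure mode is easy to reproduce with plain java.util collections. The sketch below is not JDOM code: it assumes JDOM's content lists are fail-fast in the same way an ArrayList is, and the class and method names (FailFastDemo, modifyWhileIterating, rebuild) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class; stands in for "content of an element".
class FailFastDemo {

    // Returns true if adding to the list while iterating over it
    // triggers a ConcurrentModificationException.
    static boolean modifyWhileIterating() {
        List<String> content = new ArrayList<>(Arrays.asList("someone", "said:", "and"));
        try {
            for (Iterator<String> i = content.iterator(); i.hasNext();) {
                String token = i.next();
                // Structural change behind the iterator's back.
                content.add("<w>" + token + "</w>");
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Safe variant: read from the old list, write into a new one,
    // and only swap the new list in once the iteration is over.
    static List<String> rebuild(List<String> content) {
        List<String> rebuilt = new ArrayList<>();
        for (String token : content) {
            rebuilt.add("<w>" + token + "</w>");
        }
        return rebuilt;
    }
}
```

The second method is the same idea as rebuilding an element's content: nothing is modified until the loop has finished.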

An approach that works is to 'manually' traverse the tree
and 'rebuild' the content of each element. Something like
this:

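     import java.io.StringReader;
     import java.util.ArrayList;
     import java.util.Iterator;
     import java.util.List;
     import java.util.StringTokenizer;

     import org.jdom.Document;
     import org.jdom.Element;
     import org.jdom.Text;
     import org.jdom.input.SAXBuilder;
     import org.jdom.output.XMLOutputter;

     public class WrapWords {

         // Split a Text node on whitespace and wrap each token in a <w> element.
         private static List makeList(Text text) {
             List l = new ArrayList();
             StringTokenizer st = new StringTokenizer(text.getText());
             while (st.hasMoreTokens()) {
                 Element w = new Element("w");
                 w.setText(st.nextToken());
                 l.add(w);
             }
             return l;
         }

         // Detach all content of the element, rebuild it in a separate list,
         // then set the new list as the element's content.
         private static void process(Element element) {
             List content = new ArrayList();
             for (Iterator i = element.removeContent().iterator(); i.hasNext();) {
                 Object o = i.next();
                 System.out.println(o); // debug trace of each content item
                 if (o instanceof Element) {
                     // Recurse first, then keep the element in place.
                     Element e = (Element) o;
                     process(e);
                     content.add(e);
                 } else if (o instanceof Text) {
                     // Replace the text with its <w> elements.
                     content.addAll(makeList((Text) o));
                 } else {
                     // Comments, processing instructions etc. are kept as-is.
                     content.add(o);
                 }
             }
             element.setContent(content);
         }

         public static void main(String[] args) throws Exception {
             String xml = "<s>someone said: <q>this sucks bigtime</q> and i agreed</s>";
             Document doc = new SAXBuilder().build(new StringReader(xml));
             process(doc.getRootElement());
             new XMLOutputter().output(doc, System.out);
         }
     }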
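
If you want to try the rebuild idea without JDOM on the classpath, it can also be sketched with the JDK's built-in org.w3c.dom API. This is a different library swapped in for illustration, not the JDOM code from this mail; the names WrapWordsDom and wrapWords are made up, and CDATA sections are treated like ordinary text as an assumption.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.StringTokenizer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class using the JDK DOM API instead of JDOM.
class WrapWordsDom {

    // Replace every text node below 'element' with one <w> element per token.
    static void process(Element element) {
        Document doc = element.getOwnerDocument();
        Node child = element.getFirstChild();
        while (child != null) {
            // Remember the next sibling before the current child is touched.
            Node next = child.getNextSibling();
            short type = child.getNodeType();
            if (type == Node.ELEMENT_NODE) {
                process((Element) child);
            } else if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                StringTokenizer st = new StringTokenizer(child.getNodeValue());
                while (st.hasMoreTokens()) {
                    Element w = doc.createElement("w");
                    w.setTextContent(st.nextToken());
                    element.insertBefore(w, child);
                }
                element.removeChild(child);
            }
            child = next;
        }
    }

    // Parse the XML string, wrap all words, and serialize the result.
    static String wrapWords(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            process(doc.getDocumentElement());
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The loop walks siblings by hand rather than through a live NodeList, so inserting the new <w> elements and removing the text node does not disturb the traversal.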

Kai Wörner wrote:
> Hi all,
> 
> I want to do this to a XML-Document:
> 
> (before:)
> <s>someone said: <q>this sucks bigtime</q> and i agreed</s>
> 
> (after:)
> <s><w>someone</w><w>said:</w><q><w>this</w><w>sucks</w><w>bigtime</w></q><w>and</w><w>i</w><w>agreed</w></s>
> 
> I thought I'd get all Elements via
> 
> Iterator myI = doc.getDescendants(new ElementFilter());
> 
> iterate through them, look for PCDATA via Element.getText, chop it with a
> StringTokenizer, add the tokens as new <w> elements to the current Element
> and get rid of the PCDATA itself. But how do I do this? Is there something
> like Element.removeContent(new onlyThePCDATAContentSparingElementsFilter())?
> 
> Thanks
> 
> Kai