[jdom-interest] String length shorten after .getChild().getText() is being used.

Jacques wong jacques_wong at hotmail.com
Wed Apr 2 06:29:44 PDT 2008


Hi,
 
Tatu made a good point that my question was not asked very clearly. I'm quite new to both JDOM and Java, so please forgive me if something here is hard to understand.
 
My XML is very simple. I declared the encoding as UTF-8, because XMLOutputter can't display the text correctly when I change the declared encoding to "big5".
 
<?xml version="1.0" encoding="UTF-8"?>
<?dsd href="zurich.dsd"?>
<DB>
  <Record>
    <ThxRegTxt>Dollar Money Market 基金</ThxRegTxt>
    <NxtRegTxt>Japanese Yen Money Market</NxtRegTxt>
    <InnerReg>
      <beg loop="2">SIZE=-2&gt;</beg>
      <end loop="3">&lt;/TD&gt;</end>
    </InnerReg>
  </Record>
</DB>
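The encoding declaration only tells the parser how to decode the file's bytes, so it has to match the encoding the file was actually saved in; once parsed, the text is an ordinary Java String either way. Below is a minimal sketch using the JDK's built-in DOM parser (javax.xml.parsers) rather than JDOM, so it runs without extra jars. The class and method names are hypothetical, and it assumes the JRE includes the Big5 charset.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class DeclarationDemo {
    // Builds a one-record document whose XML declaration names `encoding`,
    // then encodes the whole text with that same charset (a matching pair).
    public static byte[] encodeRecord(String text, String encoding) {
        String xml = "<?xml version=\"1.0\" encoding=\"" + encoding + "\"?>"
                + "<DB><Record><ThxRegTxt>" + text + "</ThxRegTxt></Record></DB>";
        return xml.getBytes(Charset.forName(encoding));
    }

    // Hands raw bytes to the parser, which decodes them according to the
    // declaration, and returns the already-decoded Java String.
    public static String parseThxRegTxt(byte[] xmlBytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            return doc.getElementsByTagName("ThxRegTxt").item(0).getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        // U+57FA U+91D1 are the two Chinese characters from the sample.
        String text = "Dollar Money Market \u57FA\u91D1";
        String fromUtf8 = parseThxRegTxt(encodeRecord(text, "UTF-8"));
        String fromBig5 = parseThxRegTxt(encodeRecord(text, "Big5"));
        // Different bytes on disk, identical Strings after parsing.
        System.out.println(fromUtf8.equals(fromBig5)); // true
    }
}
```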
 
This is the code I use to display my XML on the console; it works without any problem.
 
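   import java.io.File;
   import java.io.IOException;
   import org.jdom.Document;
   import org.jdom.JDOMException;
   import org.jdom.input.SAXBuilder;
   import org.jdom.output.Format;
   import org.jdom.output.XMLOutputter;

   try {
       Document docXML = new SAXBuilder().build(new File(xmlPath));
       XMLOutputter outputter = new XMLOutputter(Format.getPrettyFormat());
       // getFormat() returns a copy, so it has to be set back after editing
       Format format = outputter.getFormat();
       format.setEncoding("big5");
       outputter.setFormat(format);
       outputter.output(docXML, System.out);
   } catch (JDOMException e) {
       // build() also throws JDOMException for malformed XML
       e.printStackTrace();
   } catch (IOException e) {
       e.printStackTrace();
   }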
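Why the XMLOutputter output looks right: with format.setEncoding("big5"), the Strings in the Document are encoded to Big5 bytes as they are written (as far as I know, JDOM 1.x does this by wrapping the OutputStream in a Writer for that encoding). The encoding belongs to the output step only, and characters the target charset cannot represent are silently replaced. A plain-JDK sketch of that behaviour, with hypothetical class and method names, assuming the Big5 charset is available:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

public class WriterEncodingDemo {
    // Encodes `text` through a Writer using `charsetName`, then decodes the
    // bytes with the same charset to show what a reader of that output sees.
    public static String roundTrip(String text, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, cs)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), cs);
    }

    public static void main(String[] args) {
        String fund = "\u57FA\u91D1"; // the two characters from ThxRegTxt
        System.out.println(roundTrip(fund, "Big5").equals(fund)); // true
        // ISO-8859-1 has no mapping for these characters, so the writer
        // substitutes '?' for each one.
        System.out.println(roundTrip(fund, "ISO-8859-1")); // ??
    }
}
```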
 
 
Then I used JDOM to load the records into a Vector.
 
        Vector xmlRecVector = new Vector();
        Document docXML = new SAXBuilder().build(new File(xmlPath)); // xmlPath is the path of the XML
        Element rootElementList = docXML.getRootElement();
        List recDBList = rootElementList.getChildren("Record");
        Iterator i = recDBList.iterator();
        int idxOfList = 0;

        while (i.hasNext()) {

              Element recElement = (Element) i.next();
              idxOfList = recDBList.indexOf(recElement);
              DbXmlHandlerBean recDBObj = new DbXmlHandlerBean(); // DbXmlHandlerBean is an external data type.
              recDBObj.setRecIndex(idxOfList);
              String s = recElement.getChild("ThxRegTxt").getText();

              System.out.println(s + " : " + s.length() + "\n"); // I used this to count the number of characters stored.

              // Store into my object; this works fine, you can ignore these lines.
              recDBObj.setThxRegTxt(s); // read back later via getThxRegTxt()
              recDBObj.setNxtRegTxt(recElement.getChild("NxtRegTxt").getText());
              recDBObj.setInnerRegBeg(recElement.getChild("InnerReg").getChild("beg").getText());
              recDBObj.setInnerBegLoop(Integer.parseInt(recElement.getChild("InnerReg").getChild("beg").getAttributeValue("loop")));
              recDBObj.setInnerRegEnd(recElement.getChild("InnerReg").getChild("end").getText());
              recDBObj.setInnerEndLoop(Integer.parseInt(recElement.getChild("InnerReg").getChild("end").getAttributeValue("loop")));

              // Put the stored XML object into the Vector
              xmlRecVector.add(recDBObj);

        } // end of while loop
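On the "shorter" String: String.length() counts UTF-16 chars, not bytes, so a correctly decoded value that contains Big5 characters is shorter than its byte size in the file. A minimal plain-JDK sketch using the ThxRegTxt text from the sample (the class name is hypothetical; the Big5 count assumes the JRE includes the Big5 charset):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LengthDemo {
    // Number of bytes the text occupies when saved with `charset`.
    public static int byteCount(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // 20 ASCII characters followed by two Chinese characters
        String s = "Dollar Money Market \u57FA\u91D1";
        System.out.println(s.length());                               // 22 chars
        System.out.println(byteCount(s, StandardCharsets.UTF_8));     // 26 bytes
        System.out.println(byteCount(s, Charset.forName("Big5")));    // 24 bytes
    }
}
```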
 
 
After storing the objects, I print the whole Vector again.
 
   DbXmlHandlerBean recDBObj = new DbXmlHandlerBean();
   System.out.println("Print Stored Record...");
   for (int i = 0; i < recVector.size(); i++) {
       recDBObj = (DbXmlHandlerBean) recVector.elementAt(i);
       System.out.println("Record: " + recDBObj.getRecIndex());
       System.out.println("Thx: " + recDBObj.getThxRegTxt() + "  Nxt: " + recDBObj.getNxtRegTxt());
       System.out.println("InnerBeg: " + recDBObj.getInnerRegBeg() + " loop: " + recDBObj.getInnerBegLoop());
       System.out.println("InnerEnd: " + recDBObj.getInnerRegEnd() + " loop: " + recDBObj.getInnerEndLoop() + "\n");
   }
 
This time, however, the stored Big5 text is not displayed correctly; only the English text prints fine.
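One likely explanation, in line with Tatu's second point: XMLOutputter writes bytes that are already encoded as Big5 (because of format.setEncoding("big5")), while System.out.println encodes Strings with the JVM's default charset, which may not be Big5. A minimal sketch, assuming the console itself displays Big5: wrap System.out in a PrintStream with an explicit encoding and print the stored values through it. The class and method names are hypothetical; the PrintStream(OutputStream, boolean, String) constructor is standard JDK.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class ConsoleEncodingDemo {
    // Wraps an OutputStream (for example System.out) in a PrintStream that
    // encodes text with `encoding` instead of the JVM's default charset.
    public static PrintStream printStreamFor(OutputStream out, String encoding) {
        try {
            return new PrintStream(out, true, encoding); // autoflush on println
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported encoding: " + encoding, e);
        }
    }

    // Prints `text` through such a PrintStream into memory and returns the
    // bytes, i.e. exactly what a console would receive.
    public static byte[] printedBytes(String text, String encoding) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream ps = printStreamFor(buffer, encoding);
        ps.print(text);
        ps.flush();
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // Assumption: the console displays Big5, as it does for XMLOutputter.
        PrintStream big5Out = printStreamFor(System.out, "Big5");
        big5Out.println("Thx: Dollar Money Market \u57FA\u91D1");
        // Do not close big5Out: that would also close System.out.
    }
}
```

In the record-printing loop above, the corresponding change would be to call println on such a Big5 PrintStream instead of System.out.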
 
 
Since XMLOutputter can display the Big5 text, and both approaches build the Document with SAXBuilder, I expected the result to be the same, so I think I have missed something. I don't know how Xerces works together with JDOM under JDK 1.5. I hope someone experienced can help me solve this problem. Thanks.
 
Regards,
Jacques.



> Date: Tue, 1 Apr 2008 09:52:11 -0700
> From: cowtowncoder at yahoo.com
> Subject: Re: [jdom-interest] String length shorten after .getChild().getText() is being used.
> To: jdom-interest at jdom.org
>
> --- Jacques wong <jacques_wong at hotmail.com> wrote:
>
> > Hi,
> > I'm using JDOM v1.1. Basically, I can use most of
> > the function of the JDOM, but I found some stranges
> > when I use Element.getChild().getText(); I've an XML
> > that contain some big5 characters (externally
> > created XML file), both using XMLOutputter for
> > outputting screen and XML file have no affection on
> > the big5 codeset displays. However, when I tried to
> > query each text one by one by using
> > Element.getChild().getText(), the String returned
> > always is shorter than the original in XMLfile, and
> > the Big5 characters are displayed incorrectly. I
> > tried to use the conversion. String s = new
>
> Shorter as measured by... ? Number of characters in
> it? Since JDOM is not a parser, encoding/decoding
> issues are dealt with by the underlying parser;
> default being Xerces when using JDK 1.5+.
>
> I doubt JDOM has anything to do with the problem. By
> the time it gets data from parser, it's all in java
> chars/Strings, decoded from input (byte stream
> usually) as necessary.
> But without a sample document it is impossible to know
> what exactly goes wrong.
>
> The most common error is that the encoding declaration
> in the xml document is wrong, and contents are encoded
> using some other encoding.
> Second common problem is developers printing out text
> to console, and console being unable to display it
> properly.
>
> > String(recElement.getChild("ThxRegTxt").getText().getBytes("UTF-8"),"big5");
> > but seems it's not displaying correctly also. My
>
> No kidding, that's about worst piece of code anyone
> could write. I wish compilers would refuse to compile
> it. :-p
> If it worked as expected, your input data was broken,
> and you were just lucky that 2 wrongs made right.
>
> -+ Tatu +-
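Tatu's "2 wrongs made right" remark can be illustrated with the JDK charset API. In this minimal sketch (hypothetical class and method names), the getBytes("UTF-8") / new String(..., "big5") conversion from the first post garbles text that was already decoded correctly, while one assumed first mistake, Big5 bytes decoded as ISO-8859-1, is undone by the opposite conversion. ISO-8859-1 is chosen only because it maps every byte to exactly one char; it is an assumption for illustration, not a claim about what happened to the original data.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TwoWrongsDemo {
    static final Charset BIG5 = Charset.forName("Big5");

    // The conversion from the first post: re-encode as UTF-8, decode as Big5.
    public static String utf8ThenBig5(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), BIG5);
    }

    // Assumed wrong #1: Big5 bytes decoded as ISO-8859-1, e.g. because the
    // parser was told the wrong encoding. Yields one char per byte.
    public static String misdecodeBig5AsLatin1(String s) {
        return new String(s.getBytes(BIG5), StandardCharsets.ISO_8859_1);
    }

    // Wrong #2 undoes wrong #1: ISO-8859-1 turns each char back into its byte.
    public static String undoLatin1Misdecode(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), BIG5);
    }

    public static void main(String[] args) {
        String fund = "\u57FA\u91D1";
        System.out.println(utf8ThenBig5(fund).equals(fund)); // false
        String garbled = misdecodeBig5AsLatin1(fund);
        System.out.println(garbled.length());                       // 4
        System.out.println(undoLatin1Misdecode(garbled).equals(fund)); // true
    }
}
```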