<html>
<head>
<style>
.hmmessage P
{
margin:0px;
padding:0px
}
body.hmmessage
{
FONT-SIZE: 9pt;
FONT-FAMILY:Tahoma
}
</style>
</head>
<body class='hmmessage'>Hi,<BR>
<BR>
Tatu made a good point: my question was not asked very clearly. I'm quite new to both JDOM and Java, so please forgive me if anything here is hard to understand.<BR>
<BR>
My XML is very simple. I declared its encoding as UTF-8, because XMLOutputter cannot display the text correctly when I change the declared encoding to "big5".<BR>
<BR>
<?xml version="1.0" encoding="UTF-8"?><BR><?dsd href="zurich.dsd"?><BR><DB><BR>
<Record><BR> <ThxRegTxt>Dollar Money Market 基金</ThxRegTxt><BR> <NxtRegTxt>Japanese Yen Money Market</NxtRegTxt><BR> <InnerReg><BR> <beg loop="2">SIZE=-2&gt;</beg><BR> <end loop="3">&lt;/TD&gt;</end><BR> </InnerReg><BR> </Record> <BR></DB><BR>
<BR>
This is the code I used to display my XML on the console; it works without any problem.<BR>
<BR>
try {<BR>
<BR>
Document docXML = new SAXBuilder().build(new File(xmlPath));<BR> <BR> XMLOutputter outputter = new XMLOutputter(Format.getPrettyFormat());<BR> Format format = outputter.getFormat();<BR> format.setEncoding("big5");<BR> outputter.setFormat(format);<BR> outputter.output(docXML, System.out);<BR><BR> } catch (IOException e) {<BR> e.printStackTrace();<BR> }<BR>
<BR>
<BR>
Then I tried to use JDOM to load the records into a Vector.<BR>
<BR>
Vector xmlRecVector = null;<BR> xmlRecVector = new Vector();<BR> <BR> Document docXML = new SAXBuilder().build(new File(xmlPath)); // xmlPath is the path of the XML<BR>
Element rootElementList = docXML.getRootElement();<BR>
List recDBList = rootElementList.getChildren("Record");<BR> <BR> Iterator i = recDBList.iterator();<BR> int idxOfList = 0;<BR>
<BR>
while (i.hasNext()) {<BR>
<BR>
Element recElement = (Element) i.next();<BR> idxOfList = recDBList.indexOf(recElement);<BR> <BR> DbXmlHandlerBean recDBObj = new DbXmlHandlerBean(); //DbXmlHandlerBean is an external data type.<BR> <BR> recDBObj.setRecIndex(idxOfList);<BR>
String s = recElement.getChild("ThxRegTxt").getText();<BR>
<BR>
            System.out.println(s+" : " + s.length() + "\n");  // I used this to count the number of characters stored.<BR>
<BR>
      // Store into my object; this part works fine, so you can ignore this code.<BR>
recDBObj.setNxtRegTxt(recElement.getChild("NxtRegTxt").getText());<BR> recDBObj.setInnerRegBeg(recElement.getChild("InnerReg").getChild("beg").getText());<BR> recDBObj.setInnerBegLoop(Integer.parseInt(recElement.getChild("InnerReg").getChild("beg").getAttributeValue("loop"))); <BR> recDBObj.setInnerRegEnd(recElement.getChild("InnerReg").getChild("end").getText());<BR> recDBObj.setInnerEndLoop(Integer.parseInt(recElement.getChild("InnerReg").getChild("end").getAttributeValue("loop")));<BR>
<BR>
// Put store XML object into Vector<BR> xmlRecVector.add(recDBObj);<BR>
<BR>
} // end of while loop<BR>
<BR>
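A note on the count printed in the loop above: String.length() returns the number of Java chars (UTF-16 code units), not the number of bytes in the file, so the same text gives different numbers depending on what is counted. Below is a minimal sketch of that difference. The class LengthCheck and its helper methods are hypothetical names used only for illustration; they are not part of the program above or of JDOM.<BR>
<BR>

```java
import java.nio.charset.Charset;

public class LengthCheck {

    // Number of Java chars (UTF-16 code units) in the text.
    static int charCount(String s) {
        return s.length();
    }

    // Number of bytes the text occupies when encoded with the named charset.
    static int byteCount(String s, String encoding) {
        return Charset.forName(encoding).encode(s).remaining();
    }

    public static void main(String[] args) {
        // Same text as ThxRegTxt; \u57FA\u91D1 are the two Chinese characters,
        // written as escapes so the source file encoding does not matter.
        String s = "Dollar Money Market \u57FA\u91D1";

        System.out.println(charCount(s));          // 22
        System.out.println(byteCount(s, "UTF-8")); // 26
        // Requires the Big5 charset to be available in the JVM.
        System.out.println(byteCount(s, "Big5"));
    }
}
```

If the length printed in the loop equals the char count here, that would suggest the parser decoded the text correctly and the difference lies in how the text is displayed.<BR>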
<BR>
After storing the objects, I print the whole Vector again.<BR>
<BR>
DbXmlHandlerBean recDBObj = new DbXmlHandlerBean();<BR> System.out.println("Print Stored Record...");<BR> <BR> for (int i=0; i<recVector.size(); i++) { <BR> recDBObj = (DbXmlHandlerBean) recVector.elementAt(i); <BR>
System.out.println("Record: " + recDBObj.getRecIndex());<BR> System.out.println("Thx: " + recDBObj.getThxRegTxt() + " Nxt: " + recDBObj.getNxtRegTxt());<BR> System.out.println("InnerBeg: " + recDBObj.getInnerRegBeg() + " loop: " + recDBObj.getInnerBegLoop());<BR> System.out.println("InnerEnd: " + recDBObj.getInnerRegEnd() + " loop: " + recDBObj.getInnerEndLoop() + "\n"); <BR> }<BR>
<BR>
This time, however, the stored text is not displayed with the correct Big5 characters; English-only text displays fine.<BR>
<BR>
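One possible explanation, which is an assumption and not something confirmed in this thread: XMLOutputter writes bytes in the encoding set on its Format, whereas System.out.println encodes Strings with the platform default encoding, which may not be what the console expects. Below is a minimal sketch that prints through a PrintStream with an explicit encoding. The class ConsoleEncoding and the helper printedBytes are hypothetical names, and "Big5" is assumed to be the encoding the console displays.<BR>
<BR>

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class ConsoleEncoding {

    // Returns the bytes a PrintStream with the given encoding writes for the text.
    static byte[] printedBytes(String text, String encoding) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            PrintStream out = new PrintStream(buffer, true, encoding);
            out.print(text);
            out.flush();
            return buffer.toByteArray();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported encoding: " + encoding);
        }
    }

    public static void main(String[] args) throws Exception {
        // Same text as ThxRegTxt, with the Chinese characters as escapes.
        String text = "Dollar Money Market \u57FA\u91D1";

        // Wrap System.out so that println encodes the String as Big5
        // instead of the platform default (assumes a Big5 console).
        PrintStream big5Out = new PrintStream(System.out, true, "Big5");
        big5Out.println(text);

        // The same String yields different bytes per encoding.
        System.out.println(printedBytes(text, "UTF-8").length); // 26
    }
}
```

The String itself is unchanged in both cases; only the bytes sent to the console differ.<BR>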
<BR>
I believe that if XMLOutputter can display the Big5 text, then reading the text from the same Document built by SAXBuilder should give the same result, so I think I have missed something.<BR>I don't know how Xerces works with JDOM under JDK 1.5. I hope someone with more experience can help me solve this problem. Thanks.
<BR>
Regards,<BR>
Jacques.<BR><BR><BR>
<HR id=stopSpelling>
<BR>
> Date: Tue, 1 Apr 2008 09:52:11 -0700<BR>> From: cowtowncoder@yahoo.com<BR>> Subject: Re: [jdom-interest] String length shorten after .getChild().getText() is being used.<BR>> To: jdom-interest@jdom.org<BR>> <BR>> <BR>> --- Jacques wong <jacques_wong@hotmail.com> wrote:<BR>> <BR>> > Hi,<BR>> > I'm using JDOM v1.1. Basically, I can use most of<BR>> > the function of the JDOM, but I found some stranges<BR>> > when I use Element.getChild().getText(); I've an XML<BR>> > that contain some big5 characters (externally<BR>> > created XML file), both using XMLOutputter for<BR>> > outputting screen and XML file have no affection on<BR>> > the big5 codeset displays. However, when I tried to<BR>> > query each text one by one by using<BR>> > Element.getChild().getText(), the String returned<BR>> > always is shorter than the original in XMLfile, and<BR>> > the Big5 characters are displayed incorrectly. I<BR>> > tried to use the conversion. String s = new<BR>> <BR>> Shorter as measured by... ? Number of characters in<BR>> it? Since JDOM is not a parser, encoding/decoding<BR>> issues are dealt with by the underlying parser;<BR>> default being Xerces when using JDK 1.5+.<BR>> <BR>> I doubt JDOM has anything to do with the problem. By<BR>> the time it gets data from parser, it's all in java<BR>> chars/Strings, decoded from input (byte stream<BR>> usually) as necessary.<BR>> But without a sample document it is impossible to know<BR>> what exactly goes wrong.<BR>> <BR>> The most common error is that the encoding declaration<BR>> in the xml document is wrong, and contents are encoded<BR>> using some other encoding.<BR>> Second common problem is developers printing out text<BR>> to console, and console being unable to display it<BR>> properly.<BR>> <BR>> ><BR>> String(recElement.getChild("ThxRegTxt").getText().getBytes("UTF-8"),"big5");<BR>> > but seems
it's not displaying correctly also. My<BR>> <BR>> No kidding, that's about worst piece of code anyone<BR>> could write. I wish compilers would refuse to compile<BR>> it. :-p<BR>> If it worked as expected, your input data was broken,<BR>> and you were just lucky that 2 wrongs made right.<BR>> <BR>> -+ Tatu +-<BR></body>
</html>