[jdom-interest] [xml-dev] Cannot close an XML file used for parsing
Jack Bush
netbeansfan at yahoo.com.au
Wed Oct 29 05:45:59 PDT 2008
Hi Everyone,
I have added the additional I/O statements to the finally clauses as follows, but the problem persists:
readData()
// read the data (HTML) from the webpage and save it in HTML format
try {
….
}
catch { …. }
finally {
System.out.flush();
isInHtml.close();
disInHtml.close();
fosOutHtml.flush();
fosOutHtml.getFD().sync();
fosOutHtml.close();
}
// convert the html webpage format to xml format
try {
….
}
catch { …. }
finally {
System.out.flush();
fwOutXml.flush();
fwOutXml.close();
pwOutXml.flush();
pwOutXml.close();
}
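Here is a minimal sketch of the XML-writing step that uses one try/finally per resource. It is written in Java 6 style, with no try-with-resources, to match the JDK 1.6 setup described in this thread. The class and method names are hypothetical.

Because pwOutXml wraps fwOutXml, closing the PrintWriter is enough, since it also closes the FileWriter underneath. In the second finally clause above, fwOutXml is closed before pwOutXml.flush() is called. With a PrintWriter built directly on a Writer, that later flush fails silently: PrintWriter never throws IOException and only sets an internal error flag. That is harmless here, but it can hide real write errors. checkError() is how the flag can be read.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

// Hypothetical helper: writes already-converted XML text to a file.
final class XmlFileWriter {

    private XmlFileWriter() {}

    static void writeXml(String xml, File target) throws IOException {
        // Opened before the try, so the finally block never sees an unopened writer.
        PrintWriter pw = new PrintWriter(new FileWriter(target));
        try {
            pw.print(xml);
        } finally {
            // Closing the outermost wrapper flushes and closes the FileWriter too.
            pw.close();
        }
        // PrintWriter swallows IOExceptions (including from close());
        // checkError() reports whether any of them happened.
        if (pw.checkError()) {
            throw new IOException("Failed to write " + target);
        }
    }
}
```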
Below is a short listing of the new XML file:
<?xml version="1.0" encoding="iso-8859-1" ?>
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <meta name="keywords" content="California, cities, towns, villages, list, zipcodes, postal codes, united states, ca" />
    <meta name="description" content="Cities, towns and suburbs in California, United States(CA) starting with A" />
    <title>Cities and Towns in California starting with A – ABC Company</title>
    <link rel="stylesheet" href="http://www.abc.com/style.css" type="text/css" media="screen" />
  </head>
  <body>
    <a name="top" />
    <div id="container">
      <div id="header">
        <div id="postmark" />
        <a href="http://www.abc.com/" class="imglink">
          <img id="logoimg" src="http://www.abc.com/images/zipcodes.gif" width="192" height="33" alt="Zipcodes America Logo" />
        </a>
        <hr />
      </div>
      <div id="nav">
        <ul>
          <li>
            <a href="http://www.abc.com/" title="Home Page">Home</a>
          </li>
          <li>
            <strong>Search</strong>
            (zipcode or suburb)
            <div class="hide">
              <form method="post" action="http://www.abc.com/search" /> // line 23
            </div>
            <input type="text" name="q" class="searchbox" alt="Search query" />
            <br />
            <input type="submit" value="find!" class="searchbutton" alt="Perform search" />
            <div class="hide" />
          </li>
…
What I find interesting is that the above XML file can be parsed without any problem by the same parseData() method when it is called from another class. As a result, I have come to the following conclusions so far:
( i ) There is some file locking that is preventing saxBuilder from parsing the XML file at that point.
( ii ) light_html2xml does not appear to have converted the original HTML to XML correctly, yet somehow its output is handled differently by the same parser depending on which class calls it.
( iii ) I would like to use another conversion tool, such as TagSoup, in place of light_html2xml to determine where this issue is coming from. Would anyone be able to help me with a few lines of conversion code using TagSoup, since I am not familiar with this tool?
( iv ) light_html2xml is good because it strips out all namespaces, DTDs, entity resolvers, etc. and returns only what I need. JTidy does the conversion correctly but includes namespaces, DTDs and entity resolvers, which makes parsing difficult.
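Before swapping in a different converter as item ( iii ) proposes, it may help to confirm which file is actually malformed. The minimal sketch below checks well-formedness with the JDK's built-in SAX parser, not the Xerces/JDOM combination used in the thread. It reports the line number of the first fatal error. The class and method names are hypothetical.

Running it on both ABC.xml and whatever path parseData() actually receives would show whether the line 23 error comes from the converted file at all.

```java
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical diagnostic: returns null if the input is well-formed,
// otherwise a "line N: message" description of the first fatal error.
final class WellFormedCheck {

    private WellFormedCheck() {}

    static String check(InputSource source) throws IOException {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(false);
            factory.setValidating(false);
            // DefaultHandler ignores all content; its fatalError() rethrows the exception.
            factory.newSAXParser().parse(source, new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ": " + e.getMessage();
        } catch (SAXException e) {
            return e.getMessage();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    static String check(File file) throws IOException {
        return check(new InputSource(file.toURI().toString()));
    }

    static String checkString(String xml) throws IOException {
        return check(new InputSource(new StringReader(xml)));
    }
}
```

For example, WellFormedCheck.check(new File("C:\\Temp\\ABC.xml")) would return null for a well-formed file.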
Many thanks again,
Jack
________________________________
From: Sheila M. Morrissey <Sheila.Morrissey at portico.org>
To: Jack Bush <netbeansfan at yahoo.com.au>
Sent: Wednesday, 29 October, 2008 12:52:06 AM
Subject: RE: [xml-dev] Cannot close an XML file used for parsing
Jack – did you try fosOutHtml.getFD().sync() after the flush?
Regards
Sheila
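FileDescriptor.sync() asks the operating system to push its buffered data for a file out to the storage device. That matters for surviving a crash or power failure. In general, it is not what makes the data visible to a reader in the same program. Bytes written through a FileOutputStream go straight to the operating system and can be read once write() returns.

Here is a minimal sketch of the flush, sync, close order suggested above, with hypothetical names:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical helper: writes bytes to a file and forces them to disk before closing.
final class SyncedWrite {

    private SyncedWrite() {}

    static void writeAndSync(File target, byte[] data) throws IOException {
        FileOutputStream fos = new FileOutputStream(target);
        try {
            fos.write(data);
            fos.flush();        // a no-op for a bare FileOutputStream, kept to show the order
            fos.getFD().sync(); // may throw SyncFailedException (an IOException)
        } finally {
            fos.close();
        }
    }
}
```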
________________________________
From: Jack Bush [mailto:netbeansfan at yahoo.com.au]
Sent: Tuesday, October 28, 2008 8:41 AM
To: Robert Koberg
Cc: xml-dev at lists.xml.org
Subject: Re: [xml-dev] Cannot close an XML file used for parsing
Hi Robert,
Thanks for responding to this post.
I have added your suggestion, but the issue still persists. I still believe it is caused by the new XML file not being closed properly.
There is no problem with the light_html2xml method, which has worked in the past.
Any more suggestions to try?
Thanks,
Jack
________________________________
From: Robert Koberg <rob at koberg.com>
To: Jack Bush <netbeansfan at yahoo.com.au>
Cc: xml-dev at lists.xml.org
Sent: Tuesday, 28 October, 2008 9:42:21 AM
Subject: Re: [xml-dev] Cannot close an XML file used for parsing
close the stream or reader in a finally block to avoid leaving it open
if an error occurs.
try{
}catch(....){
}finally {
}
On Oct 27, 2008, at 6:03 PM, Jack Bush wrote:
> Hi All,
>
> I appear to be having difficulty closing (and possibly flushing first) an
> XML file that is subsequently parsed without success. The
> error generated is:
>
> org.jdom.input.JDOMParseException: Error on line 23: The element
> type "form" must be terminated by the matching end-tag "</form>".
>
> Below are code snippets from readData(), which retrieves (HTML) data
> from a website, saves it to a file, then converts it to XML format
> before returning the new filename:
> public String readData() {
>
> try {
> URL url = new URL("http://www.abc.com");
> URLConnection connection = url.openConnection();
> InputStream isInHtml = url.openStream(); // throws an IOException
> disInHtml = new DataInputStream(new BufferedInputStream(isInHtml));
> System.out.flush();
> FileOutputStream fosOutHtml = null;
> fosOutHtml = new FileOutputStream("C:\\Temp\\ABC.html");
> int oneChar, count=0;
> while ((oneChar=disInHtml.read()) != -1)
> fosOutHtml.write(oneChar);
> isInHtml.close();
> disInHtml.close();
> fosOutHtml.flush(); // optional
> fosOutHtml.close();
> .....
> }
>
> try {
> File fileInHtml = new File("C:\\Temp\\ABC.html");
> FileReader frInHtml = new FileReader(fileInHtml);
> BufferedReader brInHtml = new BufferedReader(frInHtml);
> String string = "";
> while (brInHtml.ready())
> string += brInHtml.readLine() + "\n";
> fwOutXml = new FileWriter("C:\\Temp\\ABC.xml");
> pwOutXml = new PrintWriter(fwOutXml);
> light_html2xml html2xml = new light_html2xml();
> pwOutXml.print(html2xml.Html2Xml(string));
> System.out.flush(); // optional
> fwOutXml.flush(); // optional
> fwOutXml.close();
> pwOutXml.flush(); // optional
> pwOutXml.close();
> return fileInHtml.getAbsolutePath();
> ....
> }
> }
>
> // parseData reads the XML file using the name returned by readData()
> public void parseData(String XMLFilename)
> {
> try
> {
> FileReader frInXml = new FileReader(XMLFilename);
> BufferedReader brInXml = new BufferedReader(frInXml);
> SAXBuilder saxBuilder = new SAXBuilder("org.apache.xerces.parsers.SAXParser"); // JDOMParseException generated.
> ....
> }
> This code worked when it was all in a single method, but I
> have since restructured it into a number of methods.
>
> This issue has arisen in the past, and I was able to fix it by
> closing the XML file before reading it again. However, I don't
> have a solution for it this time round.
>
> I am running JDK 1.6.0_10, Netbeans 6.1, JDOM 1.1 on Windows XP
> platform.
>
> Any assistance would be appreciated.
>
> Many thanks,
>
> Jack
>
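One detail in the quoted readData() is worth checking. It ends with return fileInHtml.getAbsolutePath(), which is the path of ABC.html, not ABC.xml. If parseData() is given that path, it parses the unconverted HTML. If line 23 of that HTML is a form start tag with no matching end tag, the parser would report the same "must be terminated by the matching end-tag" error. That would also fit the observation that the XML file parses fine when another class passes in its name. Returning the path of the file written through fwOutXml is a cheap thing to try.

Separately, BufferedReader.ready() only reports whether a read would block, so it is not a contract for end of file. Building the string with += in a loop also copies it on every line. The minimal sketch below covers the reading half with a readLine() loop. The class name is hypothetical.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

// Hypothetical helper: reads a whole text file, normalising line breaks to "\n".
final class HtmlFileReader {

    private HtmlFileReader() {}

    static String readAll(File file) throws IOException {
        BufferedReader reader = new BufferedReader(new FileReader(file));
        try {
            StringBuilder sb = new StringBuilder();
            String line;
            // readLine() returns null only at end of file.
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
            return sb.toString();
        } finally {
            reader.close();
        }
    }
}
```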