[jdom-interest] MalformedURLException
Michael Kay
mike at saxonica.com
Fri Jan 5 01:53:47 PST 2007
URLs do not allow non-ASCII characters; they must be escaped using the %xx
convention. The Javadoc for java.net.URL is worth reading. In particular:
The URL class does not itself encode or decode any URL components according
to the escaping mechanism defined in RFC2396. It is the responsibility of
the caller to encode any fields, which need to be escaped prior to calling
URL, and also to decode any escaped fields, that are returned from URL.
Furthermore, because URL has no knowledge of URL escaping, it does not
recognise equivalence between the encoded or decoded form of the same URL.
For example, the two URLs:
http://foo.com/hello world/ and http://foo.com/hello%20world
would be considered not equal to each other.
Note, the URI class
<http://java.sun.com/j2se/1.5.0/docs/api/java/net/URI.html> does perform
escaping of its component fields in certain circumstances. The recommended
way to manage the encoding and decoding of URLs is to use URI, and to
convert between these two classes using toURI()
<http://java.sun.com/j2se/1.5.0/docs/api/java/net/URL.html#toURI%28%29>
and URI.toURL()
<http://java.sun.com/j2se/1.5.0/docs/api/java/net/URI.html#toURL%28%29>.
The URLEncoder
<http://java.sun.com/j2se/1.5.0/docs/api/java/net/URLEncoder.html> and
URLDecoder
<http://java.sun.com/j2se/1.5.0/docs/api/java/net/URLDecoder.html> classes
can also be used, but only for HTML form encoding, which is not the same as
the encoding scheme defined in RFC2396.
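
To make the distinction in the quoted Javadoc concrete, here is a minimal sketch of the three behaviours. The class name EscapingDemo and its helper methods are made up for illustration; only the java.net calls are real API.

```java
import java.io.UnsupportedEncodingException;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.net.URLEncoder;

// Hypothetical demo class; the helper names are not from any library.
public class EscapingDemo {

    // java.net.URL keeps the path exactly as supplied; nothing is escaped.
    static String pathSeenByUrl(String spec) throws MalformedURLException {
        return new URL(spec).getPath();
    }

    // The multi-argument URI constructors quote illegal characters such as
    // spaces, and toASCIIString() also escapes non-ASCII characters as
    // UTF-8 %xx sequences.
    static String escapedFileUri(String path) throws URISyntaxException {
        return new URI("file", null, path, null).toASCIIString();
    }

    // URLEncoder performs HTML form encoding: a space becomes '+', not %20.
    static String formEncoded(String text) throws UnsupportedEncodingException {
        return URLEncoder.encode(text, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(pathSeenByUrl("http://foo.com/hello world/"));
        System.out.println(escapedFileUri("/home/huma/ab\u00e4\u00df\u00f6/model.xml"));
        System.out.println(formEncoded("hello world"));
    }
}
```

The point of the comparison is that only the URI route produces RFC2396-style escaping suitable for a system identifier; URLEncoder output is meant for form data and query values.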
In your case you seem to be using the URL class solely in order to set the
SystemID property on a source, starting from a File object. For that I would
use the File.toURI() method, assuming you don't need to run on anything
earlier than JDK 1.4.
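
A rough sketch of that suggestion follows. The class name SystemIdDemo and the helper sourceFor are hypothetical, and using the document's own URI as the system ID (rather than the parent directory, as in the original code) is my assumption about what is wanted, so that a relative reference such as model.dtd can be resolved against it.

```java
import java.io.File;
import org.xml.sax.InputSource;

// Hypothetical helper class; not part of JDOM or SAX.
public class SystemIdDemo {

    // Builds an InputSource whose system ID is an escaped file: URI.
    // File.toURI() produces a hierarchical file: URI, and toASCIIString()
    // escapes any non-ASCII characters as UTF-8 %xx sequences.
    static InputSource sourceFor(File file) {
        String systemId = file.toURI().toASCIIString();
        return new InputSource(systemId);
    }

    public static void main(String[] args) {
        File file = new File("/home/huma/ab\u00e4\u00df\u00f6/model.xml");
        InputSource is = sourceFor(file);
        // The exact prefix depends on the platform (drive letters on Windows).
        System.out.println(is.getSystemId());
    }
}
```

The resulting InputSource can be passed to the parser as before; if you still want to supply your own stream, call setSystemId with the same string on the stream-based InputSource instead.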
Michael Kay
http://www.saxonica.com/
_____
From: jdom-interest-bounces at jdom.org [mailto:jdom-interest-bounces at jdom.org]
On Behalf Of Huma
Sent: 05 January 2007 08:20
To: jdom-interest at jdom.org
Subject: [jdom-interest] MalformedURLException
I have a model.xml file which I have placed in a directory /home/huma/abäßö.
But when I parse this file I get a MalformedURLException. Strangely, if the
same file is moved to the location /home/huma, I don't get any exception. To
summarize, if the XML file is in a directory whose name contains non-ASCII
characters, I get a MalformedURLException. Is there a workaround? Can
someone help me with this?
Here is the code snippet:
final String DEF_ENC = "UTF-8";
String str = new String("ab\u00e4\u00df\u00f6");
String newStr = null;
try {
    newStr = new String(str.getBytes(DEF_ENC), DEF_ENC);
    str = "/home/huma/" + newStr + "/model.xml";
    String fileName = null;
    fileName = new String(str.getBytes(DEF_ENC), DEF_ENC);
    File file = new File(fileName);
    File parent = file.getParentFile();
    InputStream iStream = null;
    iStream = new FileInputStream(file);
    InputSource is = new InputSource(iStream);
    if (parent != null) {
        is.setSystemId(parent.toURL().toString());
    }
    SAXBuilder saxbuilder = new SAXBuilder(true);
    Document doc = saxbuilder.build(is);
    Element root = doc.getRootElement();
    String version = root.getAttributeValue("schemaVersion");
    System.out.println("version: " + version);
} catch (FileNotFoundException e1) {
    // TODO Auto-generated catch block
    e1.printStackTrace();
} catch (UnsupportedEncodingException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
} catch (MalformedURLException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
} catch (JDOMException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
}
This is the exception I am getting:
java.net.MalformedURLException: no protocol: model.dtd
    at java.net.URL.<init>(URL.java:537)
    at java.net.URL.<init>(URL.java:434)
    at java.net.URL.<init>(URL.java:383)
    at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
    at org.apache.xerces.impl.XMLEntityManager.startEntity(Unknown Source)
    at org.apache.xerces.impl.XMLEntityManager.startDTDEntity(Unknown Source)
    at org.apache.xerces.impl.XMLDTDScannerImpl.setInputSource(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentScannerImpl$DTDDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    at org.jdom.input.SAXBuilder.build(SAXBuilder.java:453)
    at ModelTest.main(ModelTest.java:47)