[jdom-interest] &#160; not getting converted to &nbsp;

Per Norrman pernorrman at telia.com
Fri Apr 23 08:43:20 PDT 2004


Hi,

you're not doing anything wrong. Note that &nbsp; is an entity that
is declared somewhere in the html dtd (here, 
http://www.w3.org/TR/html4/sgml/entities.html, for html 4.01). Most 
(all?) browsers can handle html character entities even if the
html file doesn't explicitly refer to the dtd.

But you are generating XML, and XML does not recognize &nbsp;. XML 1.0
defines five predefined entities: lt, gt, amp, apos and quot.
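A quick way to see this is with the JDK's own JAXP parser (not JDOM). The five predefined entities and a numeric character reference parse without any DTD, while an undeclared &nbsp; makes the document not well-formed. PredefinedEntities and textOf are illustrative names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class PredefinedEntities {
    // Parses xml and returns the root element's text content,
    // or null if the parser rejects the document as not well-formed.
    static String textOf(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Throw instead of printing "[Fatal Error]" to stderr.
            db.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            return db.parse(new InputSource(new StringReader(xml)))
                     .getDocumentElement().getTextContent();
        } catch (Exception e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // The five predefined entities need no declaration.
        System.out.println(textOf("<p>&lt;&gt;&amp;&apos;&quot;</p>")); // <>&'"
        // A numeric character reference needs no declaration either.
        System.out.println(textOf("<p>a&#160;b</p>").equals("a\u00A0b")); // true
        // &nbsp; is not predefined, so without a DTD the parse fails.
        System.out.println(textOf("<p>a&nbsp;b</p>")); // null
    }
}
```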

You can extend XMLOutputter and override escapeElementEntities to 
produce &nbsp;, but then you must also declare this entity.
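A rough sketch of the escaping such an override might perform, under stated assumptions: this is standalone JDK code, not a real XMLOutputter subclass, and NbspOutput and escapeWithNbsp are hypothetical names. The wiring into JDOM is left out. It also covers the second half of the advice: the &nbsp; references only parse once the entity is declared, here in an internal DTD subset.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class NbspOutput {
    // Hypothetical helper: roughly what an overridden
    // escapeElementEntities could return for element text.
    // Escapes the markup characters and writes U+00A0 as &nbsp;.
    static String escapeWithNbsp(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '\u00A0': sb.append("&nbsp;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Internal DTD subset declaring nbsp; without it the output is not well-formed XML.
    static final String DOCTYPE = "<!DOCTYPE body [<!ENTITY nbsp \"&#160;\">]>";

    public static void main(String[] args) throws Exception {
        String original = "This\u00A0is\u00A0some\u00A0text";
        String xml = DOCTYPE + "<body>" + escapeWithNbsp(original) + "</body>";
        System.out.println(xml);
        // Re-parse to check that the declared entity expands back to U+00A0.
        String text = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement().getTextContent();
        System.out.println(text.equals(original)); // true
    }
}
```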

/pmn

Robert Taylor wrote:
> Thanks for the reply Jason,
> 
> I must still be doing something wrong.
> 
> Here is the relevant snippet of my code:
> 
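> import org.jdom.Document;
> import org.jdom.input.SAXBuilder;
> import org.jdom.output.XMLOutputter;
> import org.jdom.transform.XSLTransformer;
> 
> SAXBuilder builder = new SAXBuilder();
> Document doc = builder.build(docname);
> 
> XSLTransformer transformer = new XSLTransformer(sheetname);
> Document doc2 = transformer.transform(doc);
> XMLOutputter outp = new XMLOutputter();
> outp.output(doc2, System.out);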
> 
> XML:
> <?xml version="1.0"?>
> <data>123456</data>
> 
> XSL:
> <?xml version="1.0" ?>
> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
>   <xsl:output method="html" indent="yes" encoding="ASCII"/>
> 
> <xsl:template match="/">
> <html>
> <head></head>
> <body>
> <xsl:apply-templates/>
> </body>
> </html>
> </xsl:template>
> 
> <xsl:template match="data">
> This&#160;&#160;&#160;is&#160;&#160;&#160;
> some&#160;&#160;&#160;text&#160;&#160;&#160; <xsl:value-of select="."/>
> </xsl:template>
> 
> </xsl:stylesheet>
> 
> JDOM output in the Windows DOS console with encoding="ASCII":
> <?xml version="1.0" encoding="UTF-8"?>
> <html><head /><body>
> This?á?á?áis?á?á?ásome?á?á?átext?á?á?á 123456</body></html>
> 
> Xalan output in Windows DOS console with encoding="ASCII"
> <html>
> <head>
> <META http-equiv="Content-Type" content="text/html; charset=ASCII">
> </head>
> <body>
> This&nbsp;&nbsp;&nbsp;is&nbsp;&nbsp;&nbsp;some&nbsp;&nbsp;&nbsp;text&nbsp;&nbsp;&nbsp; 123456</body>
> </html>
> 
> If I change the code such that I manually set the Format of the XMLOutputter
> (XMLOutputter seems to ignore any formatting information in the XSL document):
> Format format = Format.getPrettyFormat();
> format.setEncoding("ASCII");
> XMLOutputter outp = new XMLOutputter(format);
> 
> JDOM output:
> <html>
>   <head />
>   <body>This&#xa0;&#xa0;&#xa0;is&#xa0;&#xa0;&#xa0;some&#xa0;&#xa0;&#xa0;text&#xa0;&#xa0;&#xa0; 123456</body>
> </html>
> 
> 
> So the question is, how do I set up the JDOM XMLOutputter to
> write the &#160; characters so that, when I view the output in the Windows DOS
> console, they appear as &nbsp; entity references (like Xalan does)?
> 
> 
> robert
> 
> 
> 
> 
>>-----Original Message-----
>>From: jdom-interest-admin at jdom.org
>>[mailto:jdom-interest-admin at jdom.org]On Behalf Of Jason Hunter
>>Sent: Thursday, April 22, 2004 5:18 PM
>>To: Robert Taylor
>>Cc: jdom-interest at jdom.org
>>Subject: Re: [jdom-interest] &#160; not getting converted to &nbsp;
>>
>>
>>The output you see contains the direct UTF-8 character for a
>>non-breaking space.  It shows up like a funny character because the
>>environment in which you're viewing the file probably isn't UTF-8 aware.
>>  Semantically, though, the files are identical.  The JDOM one uses one
>>char where the others use six.  If you want ASCII encoding, set the
>>outputter to use ASCII.  It'll then automatically encode chars that
>>can't be represented within ASCII.  You can also just set an escape
>>strategy on the outputter directly if you want UTF-8 but want to encode
>>characters that wouldn't ordinarily need to be encoded.
>>
>>-jh-
>>
>>Robert Taylor wrote:
>>
>>
>>>Greetings, I'm using JDOMBeta10 and am trying to transform an XML document into an HTML document.
>>>I've chosen Xalan-Java v2.6.0 for transformation and have set the system property
>>>javax.xml.transform.TransformerFactory with org.apache.xalan.processor.TransformerFactoryImpl as
>>>discussed here:
>>>
>>>http://www.jdom.org/docs/apidocs/org/jdom/transform/XSLTransformer.html
>>>
>>>based on this documentation:
>>>
>>>http://www.dpawson.co.uk/xsl/sect2/nbsp.html#d6353e246
>>>
>>>it appears that there is an encoding issue.
>>>
>>>I can use the same xml document and style sheet with "pure" Xalan classes
>>>and the document is transformed as expected.
>>>
>>>XML:
>>><?xml version="1.0"?>
>>><data>123456</data>
>>>
>>>XSL:
>>>
>>><?xml version="1.0" ?>
>>><xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
>>>  <xsl:output method="html" indent="yes"/>
>>>
>>><xsl:template match="/">
>>><html>
>>><head></head>
>>><body>
>>><xsl:apply-templates/>
>>></body>
>>></html>
>>></xsl:template>
>>>
>>><xsl:template match="data">
>>>This&#160;is&#160;some&#160;text&#160; <xsl:value-of select="."/>
>>></xsl:template>
>>>
>>></xsl:stylesheet>
>>>
>>>
>>>JDOM output:
>>><?xml version="1.0" encoding="UTF-8"?>
>>><html><head /><body>
>>>This is some text  123456</body></html>
>>>
>>>Xalan output:
>>><html>
>>><head>
>>><META http-equiv="Content-Type" content="text/html; charset=UTF-8">
>>></head>
>>><body>
>>>This&nbsp;is&nbsp;some&nbsp;text&nbsp; 123456</body>
>>></html>
>>>
>>>Any ideas?
>>>
>>>robert
>>>
>>>_______________________________________________
>>>To control your jdom-interest membership:
>>>http://lists.denveronline.net/mailman/options/jdom-interest/youraddr@yourhost.com
>>>
>>
> 
> 
> 
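Jason's suggestion in the thread, setting the outputter's encoding to ASCII so that characters outside ASCII are written as character references, is what produced the &#xa0; references in Robert's pretty-format run. The idea can be illustrated with plain JDK classes. This is not JDOM's code: EncodingEscaper and escapeFor are hypothetical names, and the CharsetEncoder.canEncode check is an assumed stand-in for however the outputter decides what to escape.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class EncodingEscaper {
    // Escapes markup characters and replaces any code point the target
    // charset cannot represent with a hexadecimal character reference.
    static String escapeFor(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            String s = new String(Character.toChars(cp));
            if (cp == '&') sb.append("&amp;");
            else if (cp == '<') sb.append("&lt;");
            else if (cp == '>') sb.append("&gt;");
            else if (encoder.canEncode(s)) sb.append(s);
            else sb.append("&#x").append(Integer.toHexString(cp)).append(';');
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "This\u00A0is some text";
        System.out.println(escapeFor(text, StandardCharsets.US_ASCII)); // This&#xa0;is some text
        // With UTF-8 the no-break space is kept as a raw character.
        System.out.println(escapeFor(text, StandardCharsets.UTF_8).equals(text)); // true
        // That raw character takes two bytes in UTF-8 (C2 A0).
        System.out.println("\u00A0".getBytes(StandardCharsets.UTF_8).length); // 2
    }
}
```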


