[jdom-interest] Turning off entity expansion
Paul Chapman
chapman at zemsys.com
Wed Sep 4 00:15:24 PDT 2002
OK, so JDOM has helpfully converted a character that XML reserves for
markup (the &, like < and >) into &amp; for you.
This is normally what you would want, so I doubt it can be turned off.
JDOM does not know that &#169; is already encoded for XML, so it tries
to do it for you.
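The behaviour described above can be reproduced without JDOM. The sketch below is a stand-in that uses the JDK's built-in DOM and `javax.xml.transform` serializer, assuming it escapes text content the same way JDOM's `XMLOutputter` does (the class and method names are made up for illustration). Text set through the API is treated as plain character data, so the serializer must escape every `&` to keep the output well-formed:

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class EscapeDemo {
    // Hypothetical helper: serializes a single <p> element whose text is exactly `text`.
    static String serializeParagraph(String text) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element p = doc.createElement("p");
        // setTextContent stores the string as-is; "&#169;" is six ordinary
        // characters here, not a character reference.
        p.setTextContent(text);
        doc.appendChild(p);

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Pre-escaped text: each '&' is escaped again, giving &amp;quot; and &amp;#169;
        System.out.println(serializeParagraph("&quot; &#169; blah blah"));
        // The real character: nothing to escape, UTF-8 (the default) can hold it
        System.out.println(serializeParagraph("\u00A9 blah blah"));
    }
}
```

The first call prints `<p>&amp;quot; &amp;#169; blah blah</p>`, the same doubling seen in the quoted test.html below.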
This comes back to your original comment:
> >When I look at the output, my Unicode reference has been
> >changed into the actual character, which I do not want; I want
> >this line to be output verbatim.
So, why is the actual character not acceptable? I am not saying you
are right or wrong to want the original form; I am trying to find out
why the translated character is not acceptable to you. The copyright
symbol appears quite happily in my browser when I use it, like this: ©
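If a reference form such as `&#169;` really is required (for example, an ASCII-only consumer), one common approach is to keep the real character in the tree and let the serializer pick the form from the output encoding. This is a sketch using the JDK's `javax.xml.transform` API rather than JDOM; it has not been checked against JDOM's `XMLOutputter`, and `AsciiOutputDemo`/`serializeAscii` are hypothetical names:

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class AsciiOutputDemo {
    // Serializes <p>text</p> with US-ASCII as the declared output encoding.
    static String serializeAscii(String text) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element p = doc.createElement("p");
        // Store the real character (e.g. U+00A9), not a pre-escaped reference.
        p.setTextContent(text);
        doc.appendChild(p);

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        // US-ASCII cannot represent U+00A9, so the serializer has to write
        // it as a numeric character reference instead of the raw character.
        t.setOutputProperty(OutputKeys.ENCODING, "US-ASCII");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serializeAscii("\u00A9 blah blah"));
    }
}
```

The exact spelling of the reference (decimal or hexadecimal) is up to the serializer; either form denotes the same character.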
-Paul.
ion wrote:
> Here is an example, consider the following simple program:
>
> import java.io.*;
> import org.jdom.*;
> import org.jdom.input.*;
> import org.jdom.output.*;
>
> public class test {
>     public static void main(String args[]) {
>         Document doc = new Document(new Element("html"));
>         DocType docType = new DocType("html",
>             "-//W3C//DTD XHTML 1.0 Transitional//EN",
>             "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd");
>         doc.setDocType(docType);
>         Element root = doc.getRootElement();
>         Element head = new Element("head");
>         head.addContent(new Element("title").setText("Blah"));
>         root.addContent(head);
>         Element body = new Element("body");
>         body.addContent(new Element("p").setText("&quot; &#169; blah blah"));
>         root.addContent(body);
>         String newItem = args[0];
>         XMLOutputter outputter = new XMLOutputter(" ", true);
>         outputter.setTextNormalize(false);
>         try {
>             Writer out = new FileWriter(newItem + ".html");
>             outputter.output(doc, out);
>             out.close(); // flush and release the file
>         } catch (Exception e) {
>             System.err.println(e.getMessage());
>         }
>     }
> }
>
> (I apologise for the crapness of it; I created it quickly.)
> It is the sort of program that could be used to output templates
> for an HTML page or, more realistically, to include input from
> another XML file. Executing it like this:
>
> java test test
>
> produces the output file test.html:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
> <html>
> <head>
> <title>Blah</title>
> </head>
> <body>
> <p>&amp;quot; &amp;#169; blah blah</p>
> </body>
> </html>
>
> >JDOM at all. They are expanded by the parser before JDOM ever "sees"
> >them. So to keep your original character entities intact, you would
> >have to address this (in some way that I can't answer) by tweaking
> >the parser you use.
>
> Ok, so it was my parser expanding the character entities but...
>
> Ampersands have been expanded to "&amp;". Why is this?
>
> --SNIP--
>
>>>That's overstating it a bit, no? He's asking for a particular one of two
>>>forms that are completely equivalent in XML's eyes, right?
>>>
> --SNIP--
>
> This is a very good point: if they ARE equivalent, then there should
> be an option to output either form.
>
> --SNIP--
>
>>>misunderstanding of XML. But there are certainly reasonable cases where
>>>something else might care, and you might want to have control over this
>>>(irrespective of this particular case).
>>>
> --SNIP--
>
> Definitely. But it seems as though the only case is the ampersand. Is
> this right?
>
> How can I output an ampersand verbatim?
>
> Regards
>
> Empty
>
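The quoted discussion makes two claims: character references are expanded by the parser before JDOM sees them, and `&#169;` and a literal © are equivalent in XML's eyes. A minimal sketch with the JDK's built-in DOM parser (a stand-in for whatever parser JDOM's builder is configured with; `ParserExpansionDemo` and `rootText` are hypothetical names) shows both, and also shows that `&amp;` is how a literal `&` reaches the text:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class ParserExpansionDemo {
    // Parses a small XML string and returns the text content of its root element.
    static String rootText(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement()
                .getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Both forms parse to the same single character U+00A9 followed by " blah".
        String fromReference = rootText("<p>&#169; blah</p>");
        String fromCharacter = rootText("<p>\u00A9 blah</p>");
        System.out.println(fromReference.equals(fromCharacter)); // prints true

        // &amp; parses to a plain '&', so this text is the six characters "&#169;".
        System.out.println(rootText("<p>&amp;#169;</p>"));
    }
}
```

By the time the tree exists, the original spelling is gone; only the character remains, which is why an outputter can choose how to write it but cannot recover which form the input used.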
--
Paul Chapman
Email: chapman at zemsys.com
Mobile: +61 418 340 935