[jdom-interest] Turning off entity expansion
Paul Chapman
chapman at zemsys.com
Wed Sep 4 00:45:46 PDT 2002
Firstly the Text object is trying to be helpful by allowing you
to insert _any_ text content without it being misinterpreted as
XML directives. Thus if you wanted to output:
<test> x < y </test>
It helpfully makes it valid XML:
<test> x &lt; y </test>
Ditto for the & and the > characters.
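To make the escaping concrete, here is a minimal sketch of the kind of substitution an outputter performs on text content. It is not JDOM's actual code; the class EscapeSketch and the method escapeText are hypothetical names made up for illustration, and only the three characters mentioned above are handled.

```java
class EscapeSketch {
    // Hypothetical helper (not part of JDOM): escapes the three characters
    // discussed above so the text cannot be mistaken for markup.
    static String escapeText(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: <test> x &lt; y </test>
        System.out.println("<test>" + escapeText(" x < y ") + "</test>");
        // A reference that was already written out by hand gets its '&'
        // escaped again. Prints: &amp;#169;
        System.out.println(escapeText("&#169;"));
    }
}
```

The second call shows why a hand-written reference ends up double-escaped: the text is treated as plain characters, so its leading & is escaped like any other.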
The question that now arises is: why do you want to dump predefined XML
content into your output? Where is it coming from? Could you perhaps
build this into JDOM too?
For example: you could build JDOM from that text and then insert your
mini-JDOM tree into your overall structure.
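A rough sketch of that idea follows. So that it runs with nothing but the JDK, it uses the built-in org.w3c.dom API rather than JDOM; with JDOM the analogous steps would be to parse the fragment with a SAXBuilder and add its root element to the parent with addContent. The class name FragmentGraftSketch, the method graft and the sample fragment are all assumptions for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class FragmentGraftSketch {
    // Parses a well-formed XML fragment and grafts it under a <body> element,
    // then serializes the combined tree to a string.
    static String graft(String fragmentXml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();

            // The overall structure we are building in memory.
            Document doc = builder.newDocument();
            Element body = doc.createElement("body");
            doc.appendChild(body);

            // Build a small tree from the pre-existing XML text.
            Document fragment =
                builder.parse(new InputSource(new StringReader(fragmentXml)));

            // Copy the fragment's root into our document and attach it.
            Node imported = doc.importNode(fragment.getDocumentElement(), true);
            body.appendChild(imported);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // The inline <b> element survives as markup, while the escaped
        // '<' in the text stays escaped.
        System.out.println(graft("<p>x &lt; y <b>bold</b></p>"));
    }
}
```

Because the fragment is parsed rather than pasted in as text, its elements arrive as real nodes, so nothing needs to be inserted verbatim and the serializer can still guarantee well-formed output.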
Is this any help?
-Paul.
ion wrote:
>>OK, so JDom has helpfully converted a character (the &) that could
>>be confused with an XML reserved character (<, >...) into &amp; for you.
>>This is normally what you would want, so I doubt it can be turned off.
>>
>>JDOM does not know that &#169; is already encoded for XML, so it tries
>>to do it for you.
>>
>>This comes back to your original comment:
>>
>> > >When I look at the output my Unicode reference has been
>> > >changed into the actual character, which I do not want, I want
>> > >this line to be output verbatim.
>>
>>So, why is the actual character not acceptable? I am not saying you
>>are right or wrong to want the original character, I am trying to
>>ascertain the reason why the translated character is not acceptable
>>to you. The copyright symbol appears quite happily in my browser
>>when I use it. Like this: ©
>>
>
> certain xml validators complained, but i guess this is no biggie, I will
> use it like that from now on, but more importantly I have now discovered
> the source of my problem, not being able to output '&' as '&' and not
> "&amp;", '<' as '<' instead of "&lt;" and '>' as '>' instead of "&gt;".
>
> Maybe I am just ignorant of the correct way to
> do this, how would one go about inserting inline elements?
>
> Would it not be easier to be able to allow the verbatim insertion of text
> sections so that one could more efficiently include sections of XML that
> one knows to be valid?
>
> I would have thought that perhaps the Text object could have provided
> this functionality?
>
> Regards
>
> Empty
>
>>-Paul.
>>
>>ion wrote:
>>
>>
>>>Here is an example, consider the following simple program:
>>>
>>>import java.io.*; import org.jdom.*;
>>>import org.jdom.input.*; import org.jdom.output.*;
>>>public class test {
>>> public static void main(String args[]) {
>>> Document doc = new Document(new Element("html"));
>>> DocType docType = new DocType("html",
>>> "-//W3C//DTD XHTML 1.0 Transitional//EN",
>>> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd");
>>> doc.setDocType(docType);
>>> Element root = doc.getRootElement();
>>> Element head = new Element("head");
>>> head.addContent(new Element("title").setText("Blah"));
>>> root.addContent(head);
>>> Element body = new Element("body");
>>> body.addContent(new Element("p").setText("&quot; &#169; blah blah"));
>>> root.addContent(body);
>>> String newItem = args[0];
>>> XMLOutputter outputter = new XMLOutputter(" ", true);
>>> outputter.setTextNormalize(false);
>>> try {
>>> outputter.output(doc, new FileWriter((newItem+".html")));
>>> } catch(Exception e) { System.err.println(e.getMessage());}
>>> }
>>>}
>>>
>>>(I apologise for the crapness of it, I quickly created it)
>>>Which is some program, that could perhaps be used to output
>>>templates for some html page, or more realistically include input
>>>from some other XML file. Executing this like this:
>>>
>>>java test test
>>>
>>>produces the output file test.html:
>>>
>>><?xml version="1.0" encoding="UTF-8"?>
>>><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
>>>"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
>>><html>
>>> <head>
>>> <title>Blah</title>
>>> </head>
>>> <body>
>>> <p>&amp;quot; &amp;#169; blah blah</p>
>>> </body>
>>></html>
>>>
>>>>JDOM at all. They are expanded by the parser before JDOM ever "sees"
>>>>them. So to keep your original character entities intact, you would
>>>>have to address this (in some way that I can't answer) by tweaking
>>>>the parser you use.
>>>>
>>>>
>>>Ok, so it was my parser expanding the character entities but...
>>>
>>>Ampersands have been expanded to "&amp;", why is this?
>>>
>>>--SNIP--
>>>
>>>
>>>>>That's overstating it a bit, no? He's asking for a particular one of
>>>>>two forms that are completely equivalent in XML's eyes, right?
>>>>>
>>>>>
>>>--SNIP--
>>>
>>>This is a very good point, if they ARE equivalent then there should be
>>>the option to output either form.
>>>
>>>--SNIP--
>>>
>>>
>>>>>misunderstanding of XML. But there are certainly reasonable cases where
>>>>>something else might care, and you might want to have control over this
>>>>>(irrespective of this particular case).
>>>>>
>>>>>
>>>--SNIP--
>>>
>>>Definitely. But it seems as though the only case is the ampersand. Is
>>>this right?
>>>
>>>How can I output an ampersand verbatim?
>>>
>>>Regards
>>>
>>>Empty
>>>
>>>
>>
>>--
>>
>>Paul Chapman
>>
>>Email: chapman at zemsys.com
>>Mobile: +61 418 340 935
>>
>>
>
>
--
Paul Chapman
Email: chapman at zemsys.com
Mobile: +61 418 340 935