From jdom at tuis.net  Sun Jan  1 16:57:33 2012
From: jdom at tuis.net (Rolf)
Date: Sun, 01 Jan 2012 19:57:33 -0500
Subject: [jdom-interest] JDOM 2 Alpha release
Message-ID: <4F0100FD.4020107@tuis.net>

Hi all and Happy New Year!

I have just uploaded a new JDOM2 'package' jdom-2.x-2012.01.01.19.15 
which I am designating as the JDOM2 "Alpha" Release.

Find it here: https://github.com/hunterhacker/jdom/downloads

The JDOM2 pages (JavaDoc API, code coverage, and unit-test results) have 
been updated to match as well. See the 'entry' page here:

https://github.com/hunterhacker/jdom/wiki/JDOM-2.0#wiki-links

There are eleven 'issues' currently outstanding. None of them are bugs 
in the core functionality of JDOM. In other words, this JDOM2 Alpha 
release has no known bugs.

Please help with this and ensure we discover any gremlins sooner rather 
than later.

It is my expectation that for regular users there will be very few 
'interface' changes between now and JDOM2 final release. There may be 
some 'transparent' extensions to the API, and there may/will be changes 
to the 'sub-classing' API, so if you have custom sub-classes of JDOM 
code then you will probably want to pay special attention.

If you *do* have sub-classes of JDOM code, now is a very important time 
to test JDOM2 to see if your code will break, and how JDOM2 can best be 
adapted/fixed to continue to support your custom requirements.

To create some form of 'deadline' for JDOM2 I intend to (provisionally):

- 2 Feb
     Groundhog Day!
     All current issues resolved - submit any issues you encounter to
     the mailing list.
     Deadline for new feature requests/enhancements - mail the list if
     you have any.

- 14 Feb
     'Valentine' *BETA* Release - may shift depending on any large
     enhancements/requests.

- 29 Feb
     'Leap Day' Second *BETA* - all class/method signatures 'locked';
     bug fixing only.

- 9 Apr
     'Easter' JDOM2 Release


So, please get playing with JDOM2. If you don't provide feedback in this 
time period, there's a good chance there will not be an opportunity later 
to get in that 'sweet' feature you want.


Please Note
===========

I believe this release is 'stable' in the sense that the code is fully 
functional. I believe that while there may be bugs, the code is 
generally in good condition, and it can be trusted to do 'the right 
thing' with nearly as much confidence as JDOM 1.1.2.

This is an alpha release though, and the expectation is that there will 
be some issues with the code, and I fully expect there to be small 
changes to some method/interface calls as the need arises.


Happy Coding!

Rolf

From jdom at tuis.net  Mon Jan  2 12:27:40 2012
From: jdom at tuis.net (Rolf)
Date: Mon, 02 Jan 2012 15:27:40 -0500
Subject: [jdom-interest] JDOM and memory
Message-ID: <4F02133C.5010704@tuis.net>

Hi all.

Memory optimization has never been a top priority for JDOM. At the same 
time, for what it does, JDOM is not a 'terrible' memory user. Still, I 
have done some analysis, and, I believe I can trim about a quarter to a 
half of 'JDOM Overhead' memory usage by making two 'simple' changes....

The first is to merge the ContentList class into the Element class (and 
also into Document). This will reduce the number of Java objects by 
about half, and that will save about 32 bytes per Element at a minimum 
in a 64-bit JRE. Additionally, by lazy-initialization of the Content 
array, we can save memory on otherwise 'empty' Elements.

This can be done by having the Element (and perhaps Document) class 
implement 'List'. It can all be done in a 'backward compatible' way, but 
it also leads to some interesting possibilities, like:

   for (Content c : element) {
      ... do something
   }

(for backward compatibility, Element.getContent() will return 'this').


The second change is to make the AttributeList instance in Element a 
lazy-initialization. This would save memory on all Elements that have no 
attributes, but would have an impact for people who sub-class the 
Element class and may expect the attributes field to be non-null.
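
A minimal sketch of the two changes described above, using a stand-in class rather than JDOM's real Element. The names LazyElementSketch, addContent, setAttribute and getContentSize are illustrative assumptions, not the JDOM2 implementation. Content is held directly by the element, both backing collections are allocated only on first use, and the element is Iterable so it can drive a for-each loop:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in for an element; NOT the real org.jdom2.Element.
final class LazyElementSketch implements Iterable<Object> {
    private final String name;
    private List<Object> content;            // stays null until the first child is added
    private Map<String, String> attributes;  // stays null until the first attribute is set

    LazyElementSketch(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    LazyElementSketch addContent(Object child) {
        if (content == null) {
            content = new ArrayList<Object>();
        }
        content.add(child);
        return this;
    }

    LazyElementSketch setAttribute(String key, String value) {
        if (attributes == null) {
            attributes = new LinkedHashMap<String, String>();
        }
        attributes.put(key, value);
        return this;
    }

    // Accessors hide the null fields, so callers never see the lazy state.
    String getAttributeValue(String key) {
        return attributes == null ? null : attributes.get(key);
    }

    boolean hasAttributes() {
        return attributes != null && !attributes.isEmpty();
    }

    int getContentSize() {
        return content == null ? 0 : content.size();
    }

    @Override
    public Iterator<Object> iterator() {
        return content == null
                ? Collections.<Object>emptyList().iterator()
                : content.iterator();
    }

    public static void main(String[] args) {
        LazyElementSketch root = new LazyElementSketch("root")
                .addContent("text")
                .addContent(new LazyElementSketch("child"));
        for (Object c : root) {
            System.out.println(c instanceof LazyElementSketch
                    ? "element " + ((LazyElementSketch) c).getName()
                    : "text " + c);
        }
    }
}
```

The sub-classing concern shows up here as well: the fields are only safe to leave null because every read goes through an accessor. A subclass that touches the attributes field directly would need a null check.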


I am trying to get a feel for how important this sort of optimization 
may be. If there is interest then I will make some changes, and test the 
impact. I may make a separate branch in github to test it out....

If the above changes are unrealistic then I don't think it makes sense 
to even try....

Rolf

From jdom at tuis.net  Tue Jan  3 05:22:39 2012
From: jdom at tuis.net (Rolf Lear)
Date: Tue, 03 Jan 2012 08:22:39 -0500
Subject: [jdom-interest] Maven build
Message-ID: <95d7ddad8eb3087da3ba87cb56c5a785@tuis.net>


Hi all.

I am going to start playing with the concept of loading the 'snapshot'
builds up on to maven central as 'SNAPSHOT' type builds. This is to ensure
I get some 'practice' before the final JDOM2.0 release.

If you are currently using maven to load your JDOM 1.x jars you should
ensure that you set your maven version dependencies correctly so that you
do not start pulling any JDOM 2.x jars. My understanding is that if I label
the versions as SNAPSHOT then they should be ignored by you, but, for
everyone's peace of mind, in your 'real' development environments you
should restrict your dependencies to version 1.1.2 only.

I expect to start 'playing' with this in the next week or so.

Rolf

From mike at saxonica.com  Wed Jan  4 15:05:12 2012
From: mike at saxonica.com (Michael Kay)
Date: Wed, 04 Jan 2012 23:05:12 +0000
Subject: [jdom-interest] XML Schema classification help
In-Reply-To: <CABhr9SsG4fJK7oFQRX8K36CHRoLkfApLc8JdY-h32neYHoYXRw@mail.gmail.com>
References: <CABhr9SsG4fJK7oFQRX8K36CHRoLkfApLc8JdY-h32neYHoYXRw@mail.gmail.com>
Message-ID: <4F04DB28.20409@saxonica.com>

On 04/01/2012 19:11, cliff palmer wrote:
> I need to examine XML documents contained in multiple columns in a 
> database table with over a million rows and identify each of the 
> different structures used for the XML data, producing a count of the 
> number of instances that use each structure.
>
> I thought of using the SAXParser then creating a list of the XML 
> headers in the order used and storing each unique list and 
> accumulating a count based on matching an already encountered list 
> object, but I am hoping there is a less cumbersome approach.
>
> I would appreciate any and all suggestions.
>
You've chosen an odd place to ask the question, since there's nothing 
specific in JDOM that will help you.

The key thing you need to do is to define what are the rules for your 
taxonomy. Presumably it's something more complex than categorizing 
documents by the name of their root element, or the namespaces they use. 
But presumably a document with four paragraphs and two images and one 
with five paragraphs and no images go in the same bucket. So what are 
the rules?
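
To make the question concrete, here is a sketch of one possible rule, chosen purely for illustration: two documents share a structure when they contain the same set of distinct element paths, ignoring order and repetition. The class StructureCounterSketch and its methods are hypothetical; only the JDK's SAX parser is used, since nothing JDOM-specific is needed.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class StructureCounterSketch {
    // signature -> number of documents seen with that signature
    private final Map<String, Integer> counts = new TreeMap<String, Integer>();

    // Assumed rule: the signature is the sorted set of distinct element paths.
    static String signature(String xml) {
        final TreeSet<String> paths = new TreeSet<String>();
        final Deque<String> open = new ArrayDeque<String>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                String path = (open.isEmpty() ? "" : open.peek()) + "/" + qName;
                open.push(path);
                paths.add(path);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                open.pop();
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse document", e);
        }
        StringBuilder sb = new StringBuilder();
        for (String path : paths) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(path);
        }
        return sb.toString();
    }

    void add(String xml) {
        String key = signature(xml);
        Integer seen = counts.get(key);
        counts.put(key, seen == null ? 1 : seen + 1);
    }

    Map<String, Integer> counts() {
        return counts;
    }

    public static void main(String[] args) {
        StructureCounterSketch counter = new StructureCounterSketch();
        counter.add("<doc><p/><p/><img/></doc>");
        counter.add("<doc><img/><p/></doc>");
        counter.add("<doc><p/></doc>");
        for (Map.Entry<String, Integer> e : counter.counts().entrySet()) {
            System.out.println(e.getValue() + " x " + e.getKey());
        }
    }
}
```

Under this particular rule the number of paragraphs does not matter, but a document with no images at all still lands in a different bucket from one that has images. That is exactly the kind of decision the rules have to settle before any code is written.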

Michael Kay
Saxonica

From mike at saxonica.com  Wed Jan  4 16:13:15 2012
From: mike at saxonica.com (Michael Kay)
Date: Thu, 05 Jan 2012 00:13:15 +0000
Subject: [jdom-interest] XML Schema classification help
In-Reply-To: <4F04E4C2.7020708@tuis.net>
References: <CABhr9SsG4fJK7oFQRX8K36CHRoLkfApLc8JdY-h32neYHoYXRw@mail.gmail.com>
	<4F04AE51.2060104@tuis.net>
	<CABhr9SvABYbapTjaRMpQQVVv6Yowkb0XyBmvWCiesFu7aFBVzg@mail.gmail.com>
	<4F04E4C2.7020708@tuis.net>
Message-ID: <4F04EB1B.1050809@saxonica.com>


>
> Unfortunately (for you), this is not something that I think there is 
> an easy, or preexisting solution for (nothing comes to mind).
>
Well, there are a number of tools that generate a schema from an 
instance (including my own venerable DTDGenerator) but it's far from 
clear that two instances belong in the same bucket if and only if such a 
tool imputes the same schema for both instances.

Michael Kay
Saxonica

From palmercliff at gmail.com  Tue Jan 10 10:46:07 2012
From: palmercliff at gmail.com (cliff palmer)
Date: Tue, 10 Jan 2012 13:46:07 -0500
Subject: [jdom-interest] Finding XPath location for an Element
Message-ID: <CABhr9SuMPPJmewf16o8NEXiBD+Wi_QxS9tch+EJ1FPRFknoTuA@mail.gmail.com>

I'd like to be able to find the XPath search or the node hierarchy for
an Element.  For example, if the Element is <d> in:
<a>
    <b>
           <c>
                 <d> </d>
           </c>
     </b>
</a>

I'd like to have either the XPath search argument ("/a//b//c//d") or
the list of nodes in the element's parents ("a b c d").

Is there a method that returns this?

Cliff

From mj-lists at expertsystems.se  Tue Jan 10 11:01:37 2012
From: mj-lists at expertsystems.se (Mattias Jiderhamn)
Date: Tue, 10 Jan 2012 20:01:37 +0100
Subject: [jdom-interest]  Finding XPath location for an Element
Message-ID: <4F0C8B11.9060301@expertsystems.se>

while(node != null) {
   ... // Build XPath or list bottom up
   node = node.getParent();
}

</Mattias>



From jdom at tuis.net  Tue Jan 10 11:07:06 2012
From: jdom at tuis.net (Rolf Lear)
Date: Tue, 10 Jan 2012 14:07:06 -0500
Subject: [jdom-interest] Finding XPath location for an Element
In-Reply-To: <CABhr9SuMPPJmewf16o8NEXiBD+Wi_QxS9tch+EJ1FPRFknoTuA@mail.gmail.com>
References: <CABhr9SuMPPJmewf16o8NEXiBD+Wi_QxS9tch+EJ1FPRFknoTuA@mail.gmail.com>
Message-ID: <6fd1973aec9fe8c2294c7727eba3b221@tuis.net>


Hi Cliff

No method 'native' to JDOM, but the code is simple (and you can 'season'
to taste...):

String xpath = "";
Element p = element;
while (p != null) {
  xpath = "/" + p.getName() + xpath;
  p = p.getParentElement();
}
System.out.println(xpath);

But the problem is that this path will match *all* 'd' Elements that have
the same ancestry, not just the one you started from.

Rolf


From paul at hoplahup.net  Tue Jan 10 11:37:32 2012
From: paul at hoplahup.net (Paul Libbrecht)
Date: Tue, 10 Jan 2012 20:37:32 +0100
Subject: [jdom-interest] Finding XPath location for an Element
In-Reply-To: <6fd1973aec9fe8c2294c7727eba3b221@tuis.net>
References: <CABhr9SuMPPJmewf16o8NEXiBD+Wi_QxS9tch+EJ1FPRFknoTuA@mail.gmail.com>
	<6fd1973aec9fe8c2294c7727eba3b221@tuis.net>
Message-ID: <8671B1F0-694B-4686-9E02-99F275EA7279@hoplahup.net>

Isn't the trick to compute the index, as in b[1]? (XPath positions are 1-based.)
JDOM's lists support indexOf, so elementA.getChildren("b").indexOf(elementB) + 1 is perfect for this.
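
A sketch of the indexed variant, written against a tiny stand-in Node class so that it runs without JDOM on the classpath. IndexedPathSketch, Node and pathOf are hypothetical names. The position is counted among same-named siblings and starts at 1, because XPath positions are 1-based:

```java
import java.util.ArrayList;
import java.util.List;

final class IndexedPathSketch {
    // Minimal stand-in for an element: a name, a parent, and child elements.
    static final class Node {
        final String name;
        Node parent;
        final List<Node> children = new ArrayList<Node>();

        Node(String name) {
            this.name = name;
        }

        // Attaches the child and returns this node, so calls can be chained.
        Node add(Node child) {
            child.parent = this;
            children.add(child);
            return this;
        }
    }

    // Walks from the node up to the root, prepending name[position] each step.
    static String pathOf(Node node) {
        StringBuilder path = new StringBuilder();
        for (Node n = node; n != null; n = n.parent) {
            int position = 1;
            if (n.parent != null) {
                // Count the preceding siblings that share this node's name.
                for (Node sibling : n.parent.children) {
                    if (sibling == n) {
                        break;
                    }
                    if (sibling.name.equals(n.name)) {
                        position++;
                    }
                }
            }
            path.insert(0, "/" + n.name + "[" + position + "]");
        }
        return path.toString();
    }

    public static void main(String[] args) {
        Node a = new Node("a");
        Node b1 = new Node("b");
        Node b2 = new Node("b");
        Node c = new Node("c");
        Node d = new Node("d");
        a.add(b1).add(b2);
        b2.add(c);
        c.add(d);
        System.out.println(pathOf(d));
    }
}
```

With JDOM itself, the inner loop corresponds to looking the element up in its parent's getChildren(name) list and adding 1 to the index. Namespaces are ignored in this sketch.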

paul




From jdom at tuis.net  Tue Jan 10 12:14:05 2012
From: jdom at tuis.net (Rolf Lear)
Date: Tue, 10 Jan 2012 15:14:05 -0500
Subject: [jdom-interest] Finding XPath location for an Element
In-Reply-To: <8671B1F0-694B-4686-9E02-99F275EA7279@hoplahup.net>
References: <CABhr9SuMPPJmewf16o8NEXiBD+Wi_QxS9tch+EJ1FPRFknoTuA@mail.gmail.com>
	<6fd1973aec9fe8c2294c7727eba3b221@tuis.net>
	<8671B1F0-694B-4686-9E02-99F275EA7279@hoplahup.net>
Message-ID: <1f91a4bb672be2fc94376a9296c4e7d2@tuis.net>


Yes, you could do that and get an exact path to a particular element....

I had not thought it through as far as you, but the indexing is reasonable
(so is 'season to taste')....

But then you should probably also take it further again and ensure that
the namespace management is correct too....

but how would you set up the XPath references/links for an XPath query
with namespaces ... easily ... and in a 'general' way?

Rolf



From jdom at tuis.net  Mon Jan 16 16:20:53 2012
From: jdom at tuis.net (Rolf Lear)
Date: Mon, 16 Jan 2012 19:20:53 -0500
Subject: [jdom-interest] JDOM and memory
In-Reply-To: <4F149866.50606@xerox.com>
References: <4F02133C.5010704@tuis.net> <4F149866.50606@xerox.com>
Message-ID: <4F14BEE5.6080501@tuis.net>

Hi Leigh

I am not sure whether your comments/suggestions relate specifically to 
the memory improvement of JDOM2 (the subject line) or are general 
improvements. Reading them, they seem to be less about memory and more 
about general performance/convenience. That's fine if so; I just want to 
make sure I am not missing something...

Just to summarize your mail very briefly, you are addressing three 
areas: getChild*(), XPath, and Exceptions


getChild...()
=============

The getChild(...), getChildren(...) and getContent(Filter) methods all 
derive from the same concept: create a FilterList on the underlying 
ContentList, and scan it for all matching content (or just the first 
match, for getChild(...)).

JDOM2 has already overridden the 'inefficient' iterator (and 
listIterator) methods to provide a more efficient iterator, which is a 
significant improvement in performance over JDOM 1.x. See 
http://hunterhacker.github.com/jdom/jdom2/performance.html and scroll 
down about half the page to 'First major performance cycle', then compare 
the results table to the one below it.

These improvements do *not* override the isEmpty() call though, and that 
should absolutely be overridden too. By default it compares size() == 0, 
and that would require a full scan of the underlying content, but 
iterator.hasNext() in JDOM2 only does a 'lazy' scan.

So, enter issue #57: override isEmpty() on FilterList. Since 
ContentList has a fast size() method, there is no need to change 
ContentList.isEmpty(). I am trying to think of any other methods that 
would be slow. There is no way to avoid a full scan for FilterList.size().
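
The idea can be sketched with a stand-in filtered view over a plain java.util.List. FilteredViewSketch is a hypothetical class, not JDOM's FilterList, and it filters by class only. The inherited isEmpty() would call size() and scan everything, while the override stops at the first match:

```java
import java.util.AbstractList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Read-only view of the items in 'base' that are instances of 'type'.
final class FilteredViewSketch<T> extends AbstractList<T> {
    private final List<?> base;
    private final Class<T> type;

    FilteredViewSketch(List<?> base, Class<T> type) {
        this.base = base;
        this.type = type;
    }

    // Lazy iterator: only scans as far as the next matching item.
    @Override
    public Iterator<T> iterator() {
        final Iterator<?> it = base.iterator();
        return new Iterator<T>() {
            private T pending;
            private boolean ready;

            public boolean hasNext() {
                while (!ready && it.hasNext()) {
                    Object o = it.next();
                    if (type.isInstance(o)) {
                        pending = type.cast(o);
                        ready = true;
                    }
                }
                return ready;
            }

            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                ready = false;
                return pending;
            }

            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }

    // No shortcut here: counting the matches needs a full scan.
    @Override
    public int size() {
        int n = 0;
        for (Object o : base) {
            if (type.isInstance(o)) {
                n++;
            }
        }
        return n;
    }

    @Override
    public T get(int index) {
        int i = 0;
        for (Object o : base) {
            if (type.isInstance(o) && i++ == index) {
                return type.cast(o);
            }
        }
        throw new IndexOutOfBoundsException("Index: " + index);
    }

    // The override: stop at the first match instead of calling size().
    @Override
    public boolean isEmpty() {
        return !iterator().hasNext();
    }

    public static void main(String[] args) {
        List<Object> base = Arrays.<Object>asList("a", 1, "b", 2);
        System.out.println(new FilteredViewSketch<Integer>(base, Integer.class).isEmpty());
        System.out.println(new FilteredViewSketch<Double>(base, Double.class).isEmpty());
    }
}
```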

So, in summary on the getChildren code... you should already be seeing 
improved performance on the getChildren() method calls with more 
efficient iterators, and soon the isEmpty will be even faster too.

If/when the ContentList 'moves' in to Element to save memory, these 
improvements will be preserved.


XPath
=====

In regards to the XPath I took notes from the XOM project which has the 
'query()' method on all nodes... so for example you can:

element.query(myxpath);

I had a hard look at it and it makes some sense to do something similar. 
Especially now in JDOM2 where XPath supports more than just Element and 
Document 'context' items.

The issue is that full XPath support requires both Namespace and 
'Variable' contexts (XOM does address the Namespace context). This would 
be hard to implement on a simple 'query' method. Additionally, XPaths 
are intended to be 'compiled' and 'reused'. The XOM 'query' 
implementation does not support the reuse of the XPath. The simple query 
method would have to be limited, but would still cover (plucking a number 
out of thin air) 95% of XPath use in JDOM, I am sure.

So, the current XPath implementation in JDOM2 is able to do the full 
gamut of operation, but loses some convenience because you need to 
access it outside of the Element/Content.

I certainly feel that making XPath more accessible to JDOM content would 
be 'friendly', but I worry that it will breed performance problems if it 
is too easy... At the time I worked on the JDOM2 XPath code I looked into 
what it would take to extend the functionality into the 'Content' area 
of JDOM (like XOM), but found there were more issues than can be 
resolved by a person working alone with limited XPath experience (me). I 
figured I would come back to it. Perhaps now is the time.

Still, taking your JDOMUtil examples:

 > JDOMUtil.selectElementChildren(element, xpath)
 > JDOMUtil.selectElement(element, xpath)
 > JDOMUtil.selectAttribute(element, xpath)
 > JDOMUtil.ref(Element element, String xpath, String defaultValue)

In JDOM2, these same concepts can be 'easily' obtained with:

Filters.element().filter(XPath.selectNodes(element, xpath));
... not sure what the selectElement() would do, but you get the idea.
Filters.attribute().filter(XPath.selectNodes(element, xpath));
... well, the 'defaultValue' would take a tweak....


Exceptions
==========

Interesting observation. I can see the benefit of a JDOM 'Runtime' 
exception in addition to JDOMException. There are a few places where it 
could be useful to indicate a programmatic issue that does not need to 
be explicitly thrown/caught. XPath library is a good example.

I'll think some more on that... see if I can see a problem with 
introducing JDOMRuntimeException...... and see what other places it 
would possibly make sense.


So, thanks for the comments. If there's anything I missed, 
misunderstood, or needs attention, please don't hesitate!

Rolf

On 16/01/2012 4:36 PM, Leigh L Klotz Jr wrote:
> I'm currently evaluating the alpha of JDOM2. Most of the problems I've
> found with JDOM and Java 6 have been fixed in a utility class I have
> called JDOMUtil. A good deal of the methods in there are handling
> generic types.
>
> As for the question below, I don't often have the use case of for()
> iterating over, element.getContent(), but I do often iterate over the
> following:
> element.getChildren()
> element.getChildren(name)
> element.getChildren().isEmpty() as a surrogate for element.hasChildren()
>
> You could have Element.getContent() return a List implementation of your
> own, and make the Iterable.iterator() method in it (which is what for()
> calls) be efficient. That might also make
> element.getChildren().iterator().hasNext() efficient, or you could
> implement isEmpty directly.
>
> For JDOMUtil, I often use these:
> JDOMUtil.selectElementChildren(element, xpath)
> JDOMUtil.selectElement(element, xpath)
> JDOMUtil.selectAttribute(element, xpath)
> JDOMUtil.ref(Element element, String xpath, String defaultValue)
>
> The JDOMUtil.ref(Element element, String xpath, String defaultValue)
> method returns either the leaf-node value of the XPath expression, or
> the defaultValue if the nodeset is empty.
>
> I've also wrapped every one of the JDOMUtil XPath calls with something
> that throws a RuntimeException wrapper for JDOMException, and I let pass
> JDOMException and IOException only on serialization and parsing
> utilities. I believe that checked exceptions for XPath errors are a
> detraction from the simplicity of JDOM. XPath exceptions are always
> internal programming errors, and it is the rare case where they can be
> corrected at the point of invocation. Parsing and IO exceptions can come
> from external system interaction and can reasonably be expected to be
> correctable in point source code.
>
> Leigh.
>


From thomas.scheffler at uni-jena.de  Tue Jan 17 00:10:00 2012
From: thomas.scheffler at uni-jena.de (Thomas Scheffler)
Date: Tue, 17 Jan 2012 09:10:00 +0100
Subject: [jdom-interest] suggested JDOM2 improvements
Message-ID: <4F152CD8.5030508@uni-jena.de>

Hi,

First, I want to thank everyone working on JDOM2. While going over the 
Javadocs I noticed some issues and got some ideas I want to share.

When an XPath instance is created, it should be unmodifiable, e.g.

remove setNamespace() methods and use

XPathFactory.newInstance().compile(String xpath, Namespace... namespaces)

What is left then is variables, since XPath instances should be 
thread-safe as well. One way to achieve this would be to create an 
XPathVariable class and use var-args on selectNodes:

xPath.selectNodes(NamespaceAware context, XPathVariable... variables)

Then you can improve the XPathFactory by using a weak cache that always 
returns the same instance. This would not only allow sharing an XPath 
instance across multiple threads but also decrease memory consumption.

----

What I would also take into consideration is allowing generics in XPath, e.g.

XPath<Element> test=XPath.newInstance("/foo/bar", SOME_ELEMENT_HINT);
XPath<Attribute> test2=XPath.newInstance("/foo/@bar", SOME_ATTRIBUTE_HINT);

Or if you do not want this, you can return <? extends NamespaceAware> by 
default.

----

One other thing I noticed is the practice of making JDOMConstants an 
interface. An interface usually implies something like

if (o instanceof JDOMConstants) {
	((JDOMConstants) o).doSomething();
}

It would be "better" code to make JDOMConstants a final class with a 
private constructor and use "import static JDOMConstants.*" where you 
need it. That would avoid statements such as "Element _is a_ 
JDOMConstants".
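
A small sketch contrasting the two styles. Every name here (ConstantsSketch, LegacyConstants, Holder, EXAMPLE_FEATURE) and the constant's value are made up for illustration; the real members of JDOMConstants are not shown in this thread.

```java
final class ConstantsSketch {
    // Hypothetical constant, standing in for whatever JDOMConstants defines.
    static final String EXAMPLE_FEATURE = "http://example.org/feature";

    // Private constructor: nothing can instantiate or extend this holder.
    private ConstantsSketch() {
        throw new AssertionError("no instances");
    }

    // The older style: constants exposed through an interface.
    interface LegacyConstants {
        String EXAMPLE_FEATURE = "http://example.org/feature";
    }

    // Implementing the interface only to reach its constants leaks into the type.
    static final class Holder implements LegacyConstants {
        String feature() {
            return EXAMPLE_FEATURE;
        }
    }

    public static void main(String[] args) {
        // The side effect being objected to: Holder "is a" LegacyConstants.
        System.out.println(new Holder() instanceof LegacyConstants);
        // With the final class, callers qualify the name (or, from a named
        // package, pull it in with "import static").
        System.out.println(ConstantsSketch.EXAMPLE_FEATURE);
    }
}
```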

----

And before starting another mail, please count my vote on moving 
ContentList into Element.

I'm really looking forward to JDOM2 release.

regards Thomas

-- 
Thomas Scheffler
Friedrich-Schiller-Universität Jena
Thüringer Universitäts- und Landesbibliothek
Bibliotheksplatz 2
07743 Jena
Phone: ++49 3641 940027
FAX:   ++49 3641 940022

From jdom at tuis.net  Tue Jan 17 05:42:59 2012
From: jdom at tuis.net (Rolf Lear)
Date: Tue, 17 Jan 2012 08:42:59 -0500
Subject: [jdom-interest] suggested JDOM2 improvements
In-Reply-To: <4F152CD8.5030508@uni-jena.de>
References: <4F152CD8.5030508@uni-jena.de>
Message-ID: <b368618ea79598ee225d75c7d6e82e1d@tuis.net>


Hi Thomas.

Interesting Feedback. A lot to respond to... not complaining ;-)


First, the easy thing: XPath and thread-safety.... it will never happen.
There's just too much to 'require', not the least of which would be that
all of JDOM would need to be thread-safe. For example, someone modifying an
Element's content in one thread while that same Element is being queried
(XPath) in another. Some things would make sense to be Thread-safe
(Namespace class is...), but in the case of XPath it would just never
happen. Additionally, our default XPath 'engine' Jaxen makes no claims
about being thread-safe.

If thread-safety is not an 'intrinsic' property of XPath, then there is no
real sense in making it 'immutable' if it removes 'convenience'.

In the 'simple' XPath case (no extra namespaces, no Variables) XPath is
still a 1-liner, which is hard to beat. In a complicated case I see more
complexity trying to 'massage' your Namespaces and Variables in to some new
type structures (the varargs) than the existing concept of adding
Namespaces and setting variables.

Also, keeping backward compatibility is a strong consideration.

In reality I think I would like to see code examples of what you think it
'should' look like to get a better idea, but at the moment I am not
convinced that it's actually broken enough to require fixing.


Right, what about the generics and XPath return types....? Well, this is a
complicated one, and I thought about it hard. The problem boils down to the
fact that XPath expressions can return Boolean, Double, and String in
addition to whatever JDOM nodes are selected. There is no common 'base' to
selectNodes results other than 'Object'. Really!

This means that XPath has to be *able* to return List<?> (but not
necessarily always). There is no option.

This problem is what inspired a lot of the Filters class, because the
List<?> return type is not convenient, yet it can be coerced into something
that *is* convenient. The Filter instances do full type (and other)
checking on the values in the List: they not only re-cast the generic type
of the result, but also 'silently' remove any content that cannot be
coerced.
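
The coercion can be sketched without JDOM as a generic method that keeps only the values of the requested class. CoercingFilterSketch and its filter method are hypothetical stand-ins, not the JDOM2 Filter API, and they check the type only:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CoercingFilterSketch {
    // Returns the values that are instances of 'type'; the rest are dropped silently.
    static <T> List<T> filter(List<?> raw, Class<T> type) {
        List<T> out = new ArrayList<T>();
        for (Object o : raw) {
            if (type.isInstance(o)) {
                out.add(type.cast(o));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // An XPath result can mix node objects with String, Double and Boolean.
        List<?> raw = Arrays.<Object>asList("text", 3.0, Boolean.TRUE, "more");
        List<String> strings = filter(raw, String.class);
        System.out.println(strings);
    }
}
```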

Thus my intention was that people would do things like:

XPathFactory xpfactory = XPathFactory.newInstance();
XPath xpath = xpfactory.compile("//*");
List<Element> nodes =
Filters.element().filter(xpath.selectNodes(document));

I can see that this model could be modified somewhat to put the filter in
at the XPath compile time to become something like:

XPathFactory xpfactory = XPathFactory.newInstance();
XPath<Element> xpath = xpfactory.compile("//*", Filters.element());
List<Element> nodes = xpath.selectNodes(document);

I think that is a valuable modification, and it is nice because the
compile(String) would return XPath<?>, and the compile(String, Filter<E>)
would return XPath<E>. This would all still be backward compatible with
JDOM 1.x.

Filed issue #58


JDOMConstants. Hmmm, I think that shows my 'age'. It is a throwback to
when import-static was not available. Old habits and so on. Point taken ...
;-). It should be an easy change. Filed issue #59.


Finally, the ContentList in Element. I am getting to it.... doing some
tidy-up first. Javadoc mostly. This will be a relatively big change, and
impacts all 'custom' JDOM implementations. It is not a certainty yet for
JDOM2.


Thanks for the feedback. Appreciate it!

Rolf



From jdom at tuis.net  Tue Jan 17 19:31:10 2012
From: jdom at tuis.net (Rolf Lear)
Date: Tue, 17 Jan 2012 22:31:10 -0500
Subject: [jdom-interest] JDOM2 and Runtime Exceptions
Message-ID: <4F163CFE.4030209@tuis.net>

Hi all.

Recent discussions have highlighted the area of how JDOM handles some 
exceptions. In particular the context was XPath expressions. JDOM 
specifies (and 'always' has specified) that XPath throws JDOMException 
in the event of a failure on XPath. This has been 'questioned' from the 
perspective that this would not be the fault of JDOM if the XPath 
expression failed to compile, or evaluate.

Exceptions that are outside the control of the programmer, like 
IOException, should be thrown and caught, but an illegal XPath is more 
of a bug/programming error than an Exception, and hence should be 
treated more like a NullPointerException, IllegalArgumentException, 
IndexOutOfBoundsException, etc.

Certainly it is 'ugly' to have to try/catch even the simplest XPath 
expressions:

List<?> nodes = null;
try {
   nodes = XPath.selectNodes(document, "//tag");
} catch (JDOMException e) {
   // handle it somehow
   ...
}
// do something with nodes.

This would all be much simpler if the code throws a RuntimeException 
instead:

List<?> nodes = XPath.selectNodes(document, "//tag");


So, having used XPath as one example, I can then extrapolate the issue 
in to other general areas (sticking with concepts that are 'old' - in 
JDOM as well as JDOM2 - JDOM2 has additional areas of concern):
1. SAXOutputter throws JDOMException on all its calls because it traps 
SAXException from the output target: 
http://jdom.org/docs/apidocs/org/jdom/output/SAXOutputter.html#output%28org.jdom.Document%29
2. DOMOutputter throws JDOMException to wrap 
ParserConfigurationException from Java's DocumentBuilder.
3. XSLTransform throws a subclass of JDOMException.

Interestingly, XMLOutputter throws IOException, but not JDOMException.


Taking the issue to an abstract level, there are a number of places 
where JDOM throws the checked exception JDOMException, and that 
exception requires cumbersome handling in situations where unchecked 
exceptions would (potentially) be a better choice.


There are a number of issues at stake here though:

1. In JDOM the JDOMException is specified ( 
http://jdom.org/docs/apidocs/org/jdom/JDOMException.html ) as being the 
'top level Exception JDOM classes can throw'. But that's already *not* 
true. We have had all sorts of runtime exceptions thrown from various 
classes like 'Element' which throws IllegalNameException from its 
constructor... So, should JDOMException be redefined to be JDOM-specific 
problems only?

2. Where is the 'line'? Should SAXOutputter throw SAXException instead 
of JDOMException (like XMLOutputter throws IOException not 
JDOMException)? Should SAXOutputter throw some new RuntimeException 
instead? How could the 'system' be described so that this inconsistency 
of exceptions is better controlled?

3. It creates a major backward-compatibility issue to remove the 'throws 
JDOMException' from methods. Existing code that does:

try {
   nodes = XPath.selectNodes(document, "//tag");
} catch (JDOMException jde) {
   // handle it somehow
   ...
}

Fails to compile with:

     [javac] 
....\src\java\org\jdom2\test\cases\xpath\AbstractTestXPath.java:595: 
exception org.jdom2.JDOMException is never thrown in body of 
corresponding try statement
     [javac] 		} catch (JDOMException jde) {
     [javac] 		  ^
     [javac] 1 error


I have been playing with the code anyway, and I like the looks of the 
results of replacing 'strategic' JDOMExceptions with a runtime 
Exception. For example, I created a new unchecked JDOMRuntimeException 
class. From this class I created two subclasses: XPathCompileException 
and XPathEvaluationException. I made all the code 'work' nicely with 
these exceptions and the code looks very clean.
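
As a rough illustration of the hierarchy described above, a minimal sketch 
could look like the following. Only the three exception class names come 
from this mail; the constructors, the describe() helper and the demo class 
are assumptions made for the example and are not the actual JDOM2 code.

```java
// Illustrative only: the three exception names come from the mail above; the
// constructors and the demo class are assumptions, not actual JDOM2 code.
final class RuntimeHierarchyDemo {
    /** Classifies an exception by its position in the sketched hierarchy. */
    static String describe(RuntimeException e) {
        if (e instanceof XPathCompileException) return "compile";
        if (e instanceof XPathEvaluationException) return "evaluate";
        if (e instanceof JDOMRuntimeException) return "jdom";
        return "other";
    }

    public static void main(String[] args) {
        // The compiler does not require this try/catch: the exceptions are unchecked.
        try {
            throw new XPathCompileException("bad expression: //tag[", null);
        } catch (JDOMRuntimeException e) {
            System.out.println(describe(e) + ": " + e.getMessage());
        }
    }
}

/** Unchecked base class for the sketch. */
class JDOMRuntimeException extends RuntimeException {
    JDOMRuntimeException(String message, Throwable cause) {
        super(message, cause);
    }
}

/** Raised when an expression cannot be parsed. */
class XPathCompileException extends JDOMRuntimeException {
    XPathCompileException(String message, Throwable cause) {
        super(message, cause);
    }
}

/** Raised when a parsed expression fails against a document. */
class XPathEvaluationException extends JDOMRuntimeException {
    XPathEvaluationException(String message, Throwable cause) {
        super(message, cause);
    }
}
```

A caller that wants to handle both cases can catch the base class, and a 
caller that does not care can omit the try/catch entirely.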

Backward compatibility is 'screwed' though, but somewhat mitigated by 
the fact that 'old' code can be modified from:

    ...
} catch (JDOMException jde) {
    ...


to

    ...
} catch (JDOMRuntimeException jde) {
    ...

Alternatively, depending on the actual exception handling, the try/catch 
can be completely removed and handling can be cascaded up to a higher 
point....


Apart from renaming all the packages to org.jdom2, this would be the 
most significant migration problem for any users of JDOM/JDOM2. 
Documenting it as a migration issue should be relatively easy, but the 
fix would not be a pure search/replace: the exceptions would have to 
be identified and fixed individually.

Admittedly in a tool like Eclipse, it is quite easy to put 'Runtime' in 
your copy/paste buffer, and go from one compile problem to the next 
simply looking for the 'unreachable code' problem and adding the 
'Runtime' to the middle of 'JDOMException'.


Sorry for the long mail, but this is a 'feature' which could make JDOM2 
much easier to work with, but would certainly make a migration from JDOM 
more complicated.


Would love some thoughts on this....

Rolf

From mike at saxonica.com  Wed Jan 18 01:12:23 2012
From: mike at saxonica.com (Michael Kay)
Date: Wed, 18 Jan 2012 09:12:23 +0000
Subject: [jdom-interest] JDOM2 and Runtime Exceptions
In-Reply-To: <4F163CFE.4030209@tuis.net>
References: <4F163CFE.4030209@tuis.net>
Message-ID: <4F168CF7.2000706@saxonica.com>

On 18/01/2012 03:31, Rolf Lear wrote:
> Hi all.
>
> Recent discussions have highlighted the area of how JDOM handles some 
> exceptions. In particular the context was XPath expressions. JDOM 
> specifies (and 'always' has specified) that XPath throws JDOMException 
> in the event of a failure on XPath. This has been 'questioned' from 
> the perspective that this would not be the fault of JDOM if the XPath 
> expression failed to compile, or evaluate.
If A calls B, and B calls C, and C fails, I think it's very much an open 
question whether B should attempt to translate/interpret any errors 
coming from C before passing them back to A. To some extent it depends 
on the level of transparency - if it's obvious to A that the request 
will involve a call on C, then perhaps passing back C's exception 
unchanged is reasonable. But if B wants to encapsulate C, and have 
flexibility to choose different service suppliers (e.g. to call D 
instead of calling C), then it's tough on A to pass back an exception 
from a component it didn't know was involved. Might JDOM ever choose to 
invoke a different XPath provider, or to include its own XPath engine? 
For example, one that supports XPath 2.0? In that case, exposing 
third-party exceptions would be an embarrassment.
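
To make the A/B/C argument concrete, here is a small sketch of the 
"translate" option. The JDK's javax.xml.xpath engine plays the role of C; 
QueryFacade (B) and QueryException are hypothetical names invented for the 
example, and TranslationDemo plays the caller A. It is one possible shape, 
not a statement of how JDOM does or should do it.

```java
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

// Sketch only: QueryFacade and QueryException are hypothetical names.
final class TranslationDemo {
    /** Plays the caller A: reports which exception type it ends up seeing. */
    static String whatCallerSees(String expression) {
        try {
            QueryFacade.check(expression);
            return "none";
        } catch (QueryException e) {
            return e.getClass().getSimpleName()
                    + " caused by " + e.getCause().getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(whatCallerSees("//tag"));
        System.out.println(whatCallerSees("//tag["));
    }
}

/** Exception owned by the facade B, so A never depends on C's types. */
class QueryException extends Exception {
    QueryException(String message, Throwable cause) {
        super(message, cause);
    }
}

/** Plays B: hides which engine (C) does the real work. */
final class QueryFacade {
    static void check(String expression) throws QueryException {
        try {
            XPathFactory.newInstance().newXPath().compile(expression);
        } catch (XPathExpressionException e) {
            // Translate C's exception, keeping it as the cause for diagnostics.
            throw new QueryException("Cannot compile: " + expression, e);
        }
    }
}
```

Swapping the engine inside QueryFacade would leave the caller's catch 
clause untouched, which is the flexibility argued for above.
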
>
>
> Exceptions that are outside the control of the programmer, like 
> IOException, should be thrown and caught, but an illegal XPath is more 
> of a bug/programming error than an Exception, and hence should be 
> treated more like a NullPointerException, IllegalArgumentException, 
> IndexOutOfBoundsException, etc.
Again this is an open question. URISyntaxException is very similar to a 
compile-time XPath exception in this regard, and that is a checked 
exception (and yes, it can be a pain). On the other hand 
PatternSyntaxException is unchecked. There's no logical reason to make 
them different.
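
The inconsistency is easy to see side by side using only the JDK. In the 
sketch below the class and method names are made up for the example; the 
two exception types and the constructors that throw them are standard 
library API.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class CheckedVersusUnchecked {
    /** URISyntaxException is checked: this try/catch (or a throws clause) is mandatory. */
    static boolean isValidUri(String text) {
        try {
            new URI(text);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    /** PatternSyntaxException is unchecked: this try/catch is optional. */
    static boolean isValidRegex(String text) {
        try {
            Pattern.compile(text);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidUri("http://jdom.org/a b"));
        System.out.println(isValidRegex("[a-z"));
    }
}
```

Both inputs are simply malformed strings supplied by the caller, yet only 
one of the two forces the caller to write a handler.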

I'm one of those who believes that the discipline and extra effort 
caused by having to think about exceptions makes for better engineered 
and more robust programs. I hate C# from this perspective; you never 
know whether you have tested the exception handling code in your 
application adequately. Similarly StAX is a mess from the exception 
handling point of view - SAX, where every method can throw SAXException, 
is much easier to work with.

>
>
> Would love some thoughts on this....
>
>
I don't think you'll please everyone here, but even without the 
compatibility implications, I'm not convinced that moving to unchecked 
exceptions would be an improvement.


From noel at peralex.com  Wed Jan 18 03:38:44 2012
From: noel at peralex.com (Noel Grandin)
Date: Wed, 18 Jan 2012 13:38:44 +0200
Subject: [jdom-interest] JDOM2 and Runtime Exceptions
In-Reply-To: <4F163CFE.4030209@tuis.net>
References: <4F163CFE.4030209@tuis.net>
Message-ID: <4F16AF44.6090400@peralex.com>

I agree that programming errors should throw something that extends 
RuntimeException.

If you're going to make a change like that, JDOM2 is the right time to 
do it :-)

Regards, Noel Grandin


Disclaimer: http://www.peralex.com/disclaimer.html


From jdom at tuis.net  Wed Jan 18 17:03:54 2012
From: jdom at tuis.net (Rolf Lear)
Date: Wed, 18 Jan 2012 20:03:54 -0500
Subject: [jdom-interest] JDOM2 and Runtime Exceptions
In-Reply-To: <4F163CFE.4030209@tuis.net>
References: <4F163CFE.4030209@tuis.net>
Message-ID: <4F176BFA.9060505@tuis.net>

Hi all.

This issue has been nagging at me. I finally pulled out my copy of 
'Effective Java'.

Quoting some sections (Item 58):

 > Use checked exceptions for conditions from which the caller can 
reasonably be expected to recover. Each checked exception ... is 
therefore a potent indication to the API user that the associated 
condition is a possible outcome. [this] presents a mandate [for the API 
user] to recover from the condition.

 > If a program throws an unchecked exception ... it is generally the 
case that recovery is impossible and continued execution would do more 
harm than good. Use runtime exceptions to indicate programming errors. 
The great majority of runtime exceptions indicate precondition 
violations. Precondition violation is simply a failure by the caller to 
adhere to the contract established by the API specification.


Putting the logic together like the above makes sense. It makes sense 
that a 'null' XPath expression is a 'precondition violation', and hence 
a NullPointerException, and it also makes sense that an invalid XPath 
expression is something that the caller can reasonably be expected to 
recover from, and should be checked - even if it is inconvenient 
sometimes...

Thus, I think I have it settled in my mind that changing to an unchecked 
exception is wrong (even if the code looks a lot prettier).

I think I may still differentiate between an XPath 'compile' exception, 
and an XPath 'evaluation' Exception instead of using a blanket 
JDOMException. Psychologically that makes it an 'XPath' problem, not a 
JDOM problem.

Rolf




From jdom at tuis.net  Thu Jan 19 12:41:41 2012
From: jdom at tuis.net (Rolf Lear)
Date: Thu, 19 Jan 2012 15:41:41 -0500
Subject: [jdom-interest] JDOM2 and Runtime Exceptions
In-Reply-To: <4F18709E.3020502@xerox.com>
References: <4F163CFE.4030209@tuis.net> <4F168CF7.2000706@saxonica.com>
	<4F18709E.3020502@xerox.com>
Message-ID: <69c4dc0f0b038d49a45074da43984dd1@tuis.net>


Hi Leigh, all

Despite my earlier mail referencing 'Effective Java', I went further in to
the book, and it then contradicts itself in "item 59" which claims "Avoid
unnecessary use of checked exceptions". It quite clearly contradicts the
"item 58".... so even Bloch is not able to clearly define a 'rule' for
checked exceptions.

This process has been an exercise of frustration. It is quite clear that
there is no clear 'right' way of doing things. There is no clear
'precedent' on how it should be done either.

Should XPath be like regex with a compile() and match() process, neither
of which throw checked exceptions? The 'similarity' between XPath and Regex
is quite convincing...

Despite my earlier claim that xpath exceptions can be 'recovered from
easily by the caller', I am not actually convinced. How do you 'recover'
from a bad expression? How do you recover from an expression that does
arithmetic with a value that is non-numeric? The argument for having
checked exceptions is very unclear, and the convenience of unchecked
exceptions is substantial.

In a 'fresh' world, if I were writing the JDOM/XPath API from scratch, I
think it would be very reasonable to throw an XPathSyntaxException for bad
XPaths just like java.util.regex.Pattern throws PatternSyntaxException.
Similarly XPathEvaluationException for issues encountered in the document.

But backward compatibility is a big issue too.

I think a big part of the API problem is because it is so closely tied to
Jaxen. Jaxen throws checked exceptions too. I am not saying that checked
exceptions are wrong, but nor are they right.

On the train this morning I played again with the JDOM/XPath API. I think
I have a working solution, and I think I am more comfortable with it. It
took a while to come to, but Java already has a well defined process for
it... ;-) Deprecation.

The thrown exceptions of a method are part of its public API. I don't
like the JDOMException thrown from XPath methods, so I am going to
deprecate them.... JDOM 1.x users will get compile warnings, not errors.
That's the compatibility problem solved.

Then, I break down the XPath in to a 'compile' and 'evaluate' step, and
make them throw unchecked exceptions that make sense for the particular
issue.

The new methods will be called 'XPath<T> compile(...)' instead of
newInstance(...), and I think I will call the new execution methods  
List<T> matchAll(context)   and   T matchFirst(context)  .
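
A rough sketch of that shape follows. It is not the JDOM2 code: the JDK's
javax.xml.xpath and org.w3c.dom stand in for Jaxen and JDOM so that the
example is self-contained, the class name SketchXPath is invented, the
generic type parameter is left out, and the choice of
IllegalArgumentException and IllegalStateException is an assumption. Only
the method names compile, newInstance, matchAll and matchFirst come from
the text above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch only: javax.xml.xpath and org.w3c.dom stand in for Jaxen and JDOM.
final class SketchXPath {
    private final XPathExpression compiled;

    private SketchXPath(XPathExpression compiled) {
        this.compiled = compiled;
    }

    /** Old entry point: signature, including the checked exception, is unchanged. */
    @Deprecated
    static SketchXPath newInstance(String expression) throws XPathExpressionException {
        return new SketchXPath(XPathFactory.newInstance().newXPath().compile(expression));
    }

    /** New entry point: a malformed expression surfaces as an unchecked exception. */
    static SketchXPath compile(String expression) {
        try {
            return newInstance(expression);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    /** Returns every matching node; evaluation failures are unchecked. */
    List<Node> matchAll(Object context) {
        try {
            NodeList found = (NodeList) compiled.evaluate(context, XPathConstants.NODESET);
            List<Node> result = new ArrayList<>();
            for (int i = 0; i < found.getLength(); i++) {
                result.add(found.item(i));
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    /** Returns the first matching node, or null when nothing matches. */
    Node matchFirst(Object context) {
        List<Node> all = matchAll(context);
        return all.isEmpty() ? null : all.get(0);
    }

    /** Helper for the demo: parses an XML string into a DOM document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<root><tag>a</tag><tag>b</tag></root>");
        SketchXPath tags = compile("//tag");
        System.out.println(tags.matchAll(doc).size());
        System.out.println(tags.matchFirst(doc).getTextContent());
    }
}
```

Code in other classes that still calls the deprecated newInstance keeps
compiling with a deprecation warning, while new code gets the one-liner
without a try/catch.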

I have looked in to XPath 2.0, and by being smart with the API, and 'nice'
with the option of applying a Filter directly to the XPath, I think it is
reasonable to have the best of both worlds. With the changes as I have them
now I think plugging in a different XPath2.0 back-end should be easy when
one is available, and it will 'just work'. XPath 2.0 clearly differentiates
between 'static analysis' portion of the XPath, and the 'dynamic
evaluation' stage.

Since this is such a grey area I think someone needs to just 'decide', and
I think I will do just that.... Deprecate the old methods, keep their
signatures unchanged (including exceptions), and implement a new, clean,
unchecked, and generified set of methods.

I like the idea of XPath being similar in 'feel' to RegEx.

Time for me to get on the train again, and spend an hour playing with what
feels right.

Rolf


On Thu, 19 Jan 2012 11:35:58 -0800, Leigh L Klotz Jr
<leigh.klotz at xerox.com> wrote:
> On 01/18/2012 01:12 AM, Michael Kay wrote:
>> I don't think you'll please everyone here, but even without the 
>> compatibility implications, I'm not convinced that moving to unchecked 
>> exceptions would be an improvement.
>>
> 
> We use JDOM in our hand-written code because it is a convenient, expressive 
> API, giving much of the compactness and other benefits we see from XPath
> itself and other higher-level XML interfaces such as XQuery.
> 
> However, we haven't found the JDOM1 XPath Java interface to be 
> convenient or expressive, because of the verbosity and the checked 
> exceptions, which in our case are all programming errors of one sort or 
> another.  (We don't let end users type in XPath expressions.)  Instead, 
> we use a static JDOMUtil wrapper class with methods such as 
> selectElement, selectElements, selectAttributes, selectContent, and ref 
> (leaf-node value).
> 
> So for us, the JDOM XPath API is an implementation of a way to run XPath 
> expressions over JDOM objects, and not a convenient, expressive API that
> we use to hand write Java code.
> 
> JDOM2 with the filters may offer an expressive API that would let us do 
> away with the profusion of select* utility methods, but with checked 
> exceptions it still won't be convenient, and we still won't use it
> directly.
> 
> Leaving in the checked exceptions means less migration headache for 
> other users, and since we're not going to use it directly, it doesn't 
> matter much.  Another reason we may shift away from JDOM XPath API is 
> that we're disenchanted with Jaxen as well and are hoping to find a fast
> (at runtime) way to use Saxon on JDOM from hand-written Java code.  That
> probably won't use the JDOM XPath API at all.
> 
> Leigh.

From leigh.klotz at xerox.com  Thu Jan 19 14:06:17 2012
From: leigh.klotz at xerox.com (Leigh L Klotz Jr)
Date: Thu, 19 Jan 2012 14:06:17 -0800
Subject: [jdom-interest] JDOM2 and Runtime Exceptions
In-Reply-To: <69c4dc0f0b038d49a45074da43984dd1@tuis.net>
References: <4F163CFE.4030209@tuis.net> <4F168CF7.2000706@saxonica.com>
	<4F18709E.3020502@xerox.com>
	<69c4dc0f0b038d49a45074da43984dd1@tuis.net>
Message-ID: <4F1893D9.5050206@xerox.com>

Given what I decided about our usage of org.jdom.xpath packages being 
isolated, the issue of exception checking isn't a big one for me, but 
sadly that's because we can't much use it anyway.

If you're interested in doing refactoring, making it easier to use a 
different XPath implementation would be my suggested goal.

Leigh.


From laurent.bihanic at atos.net  Fri Jan 20 01:26:48 2012
From: laurent.bihanic at atos.net (BIHANIC Laurent)
Date: Fri, 20 Jan 2012 09:26:48 +0000
Subject: [jdom-interest] suggested JDOM2 improvements
In-Reply-To: <b368618ea79598ee225d75c7d6e82e1d@tuis.net>
References: <4F152CD8.5030508@uni-jena.de>
	<b368618ea79598ee225d75c7d6e82e1d@tuis.net>
Message-ID: <4F193347.7040208@atos.net>

Hi Rolf,

On 17/01/12 14:42, Rolf Lear wrote:
> In the 'simple' XPath case (no extra namespaces, no Variables) XPath is
> still a 1-liner, which is hard to beat. In a complicated case I see more
> complexity trying to 'massage' your Namespaces and Variables in to some new
> type structures (the varargs) than the existing concept of adding
> Namespaces and setting variables.
>
> Also, keeping backward compatibility is a strong consideration.
>
> In reality I think I would like to see code examples of what you think it
> 'should' look like to get a better idea, but at the moment I am not
> convinced that it's actually broken enough to require fixing.

Well, as 99% of our XML use namespaces, using JDOM XPath is not a 1-liner.
And as the XPath API throws non-runtime exceptions, pre-compiling XPath
expressions (as we do for regex) requires using a class initializer to map
JDOMException to runtime exceptions.
The only case where we can't compile XPath expressions is when we want to use
variables. Which defeats the whole purpose of compiling XPath! Or we have to
use thread-local compiled XPaths.
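
The class-initializer point can be illustrated with JDK types alone. In
this sketch javax.xml.xpath stands in for the JDOM XPath API (both report
a bad expression with a checked exception); the class and field names are
invented for the example.

```java
import java.util.regex.Pattern;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

// Sketch only: javax.xml.xpath stands in for the JDOM XPath API here.
final class PrecompiledConstants {
    // Unchecked failure mode: a pre-compiled constant is a one-liner.
    static final Pattern WORD = Pattern.compile("\\w+");

    // Checked failure mode: the same idea needs a static initializer whose
    // only job is to map the checked exception to an unchecked error.
    static final XPathExpression TAGS;
    static {
        try {
            TAGS = XPathFactory.newInstance().newXPath().compile("//tag");
        } catch (XPathExpressionException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(WORD.matcher("jdom").matches());
        System.out.println(TAGS != null);
    }
}
```

The expression is a literal in both cases, so a failure is a programming
error either way; only the amount of ceremony differs.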

So, I think it would be great to split the XPath API in two parts.

One for constructing compiled XPath expressions, including the namespaces,
using either a constructor/factory method with varargs, e.g. compile(String
expr, Namespace... namespace), or a builder/DSL. The result being an immutable
thread-safe XPath object. This part would only throw runtime exceptions,
IllegalArgumentException seeming sufficient.

A second for evaluating compiled XPaths on documents, taking optional variable
bindings as argument and throwing regular exceptions, e.g. find(context,
Map<String,Object> bindings)
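
A minimal sketch of such a two-part API, under several assumptions: the
JDK's javax.xml.xpath and org.w3c.dom stand in for Jaxen and JDOM, the
class name TwoPartXPath is invented, namespaces are passed as prefix/URI
string pairs instead of Namespace objects, and find() returns the string
value of the result. To stay free of shared mutable state this toy version
re-parses the expression on every find() call, which a real implementation
would want to avoid.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Sketch only: immutable after construction, with no state shared between calls.
final class TwoPartXPath {
    private final String expression;
    private final Map<String, String> namespaces; // prefix -> URI, never modified

    private TwoPartXPath(String expression, Map<String, String> namespaces) {
        this.expression = expression;
        this.namespaces = namespaces;
    }

    /** Part one: construction. A bad expression is an IllegalArgumentException. */
    static TwoPartXPath compile(String expression, String... prefixUriPairs) {
        if (prefixUriPairs.length % 2 != 0) {
            throw new IllegalArgumentException("Expected prefix/URI pairs");
        }
        Map<String, String> ns = new LinkedHashMap<>();
        for (int i = 0; i < prefixUriPairs.length; i += 2) {
            ns.put(prefixUriPairs[i], prefixUriPairs[i + 1]);
        }
        TwoPartXPath result = new TwoPartXPath(expression, Collections.unmodifiableMap(ns));
        try {
            // Syntax and prefix check only; the compiled form is discarded.
            result.engine(Collections.<String, Object>emptyMap()).compile(expression);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
        return result;
    }

    /** Part two: evaluation with per-call variable bindings; checked exception. */
    String find(Object context, Map<String, Object> bindings) throws XPathExpressionException {
        return engine(bindings).evaluate(expression, context);
    }

    /** Builds a fresh engine carrying the fixed namespaces and the given bindings. */
    private XPath engine(final Map<String, Object> bindings) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                String uri = namespaces.get(prefix);
                return uri != null ? uri : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) {
                return null; // reverse lookup is not needed for evaluation
            }
            public Iterator<String> getPrefixes(String uri) {
                return Collections.emptyIterator();
            }
        });
        xpath.setXPathVariableResolver(name -> bindings.get(name.getLocalPart()));
        return xpath;
    }

    /** Demo: selects the item whose id equals the bound variable. */
    static String demo() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(
                    "<a:root xmlns:a='urn:demo'><a:item id='1'>one</a:item>"
                    + "<a:item id='2'>two</a:item></a:root>")));
            TwoPartXPath byId = compile("//p:item[@id = $wanted]", "p", "urn:demo");
            return byId.find(doc, Collections.<String, Object>singletonMap("wanted", "2"));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the bindings travel with each find() call, the same TwoPartXPath
object can be stored in a static field and used with different variable
values without resorting to thread-locals.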

If we go this way, we should leave the existing XPath class unchanged and
deprecate it and create a new separate class.

Regards,

Laurent


From mike at saxonica.com  Fri Jan 20 02:45:19 2012
From: mike at saxonica.com (Michael Kay)
Date: Fri, 20 Jan 2012 10:45:19 +0000
Subject: [jdom-interest] suggested JDOM2 improvements
In-Reply-To: <4F193347.7040208@atos.net>
References: <4F152CD8.5030508@uni-jena.de>
	<b368618ea79598ee225d75c7d6e82e1d@tuis.net>
	<4F193347.7040208@atos.net>
Message-ID: <4F1945BF.30201@saxonica.com>


 >The only case where we can't compile XPath expressions is when we want 
to use variables. Which defeats the whole purpose of compiling XPath!

Absolutely!

 >Or we have to use thread-local compiled XPaths. So, I think it would 
be great to split the XPath API in two parts.

That's definitely the way to go if you're making changes to this area. If 
you're not familiar with it, do take a look at the s9api design in Saxon:

http://www.saxonica.com/documentation/javadoc/net/sf/saxon/s9api/XPathCompiler.html

That involves three classes:

XPathCompiler contains the static context (variable and namespace 
declarations)

XPathExecutable is the thread-safe compiled and reusable XPath expression

XPathEvaluator contains the dynamic context (variable values, context item)

You can eliminate the XPathEvaluator by having a more complex evaluate() 
method on the XPathExecutable, e.g. one that supplies the variable 
values as a Map; but this doesn't reduce the overall number of objects 
involved, it just replaces the XPathEvaluator object with a Map object.

The other big design problem with an XPath API is the types used for 
variable values and for the evaluation result. With the JAXP API I get 
an enormous amount of support hassle caused by the lack of type safety 
in the way JAXP does this. In s9api I decided, despite the complexity, 
to introduce classes XdmValue, XdmItem, XdmAtomicValue etc to make the 
whole thing type-safe, and I don't regret the decision. (I also have 
XdmNode which abstracts over DOM, JDOM, XOM etc nodes.)

If you're designing a new XPath API in 2012 then I think it's essential 
to think about how it will support XPath 2.0.

Michael Kay
Saxonica


From noel at peralex.com  Fri Jan 20 04:57:11 2012
From: noel at peralex.com (Noel Grandin)
Date: Fri, 20 Jan 2012 14:57:11 +0200
Subject: [jdom-interest] JDOM2 and Runtime Exceptions
In-Reply-To: <69c4dc0f0b038d49a45074da43984dd1@tuis.net>
References: <4F163CFE.4030209@tuis.net> <4F168CF7.2000706@saxonica.com>
	<4F18709E.3020502@xerox.com>
	<69c4dc0f0b038d49a45074da43984dd1@tuis.net>
Message-ID: <4F1964A7.6020001@peralex.com>


You are correct, how exactly to use exceptions is still a matter of 
taste and debate.

That being said, I prefer programmatic problems to be unchecked.

And I don't think backwards compatibility w.r.t. exceptions is such a 
big deal - JDOM2 already requires quite a few changes.
Changing my catch block and throws clauses is not a big deal, and it's 
not the kind of change that would subtly corrupt my code either.

On 2012-01-19 22:41, Rolf Lear wrote:
> Hi Leigh, all
>
> Despite my earlier mail referencing 'Effective Java', I went further in to
> the book, and it then contradicts itself in "item 59" which claims "Avoid
> unnecessary use of checked exceptions". It quite clearly contradicts the
> "item 58".... so even Bloch is not able to clearly define a 'rule' for
> checked exceptions.
>
> This process has been an exercise of frustration. It is quite clear that
> there is no clear 'right' way of doing things. There is no clear
> 'precedent' on how it should be done either.
>

Disclaimer: http://www.peralex.com/disclaimer.html


From jdom at tuis.net  Fri Jan 20 05:56:56 2012
From: jdom at tuis.net (Rolf Lear)
Date: Fri, 20 Jan 2012 08:56:56 -0500
Subject: [jdom-interest] suggested JDOM2 improvements
In-Reply-To: <4F1945BF.30201@saxonica.com>
References: <4F152CD8.5030508@uni-jena.de>
	<b368618ea79598ee225d75c7d6e82e1d@tuis.net>
	<4F193347.7040208@atos.net> <4F1945BF.30201@saxonica.com>
Message-ID: <779b646e68bc8d6f49267a572345c616@tuis.net>


I have looked at the Saxon API, as well as the native Java API. I have
also looked in to XPath2.0.

Mostly my 'experience' with XPath is through the current JDOM API. There
are things I like, and things I dislike, and things I have had to relearn
because the JDOM/XPath API has skewed my experience.

I think I am settling on the following model:

1. deprecate the current XPath entirely. Keep it fully backward compatible
with JDOM 1.x
2. new JDOM2 XPathFactory concept which can have different implementation
back-ends (Jaxen, Saxon, whatever).
3. XPathFactories are thread-safe and reusable in any threads.
4. have a single 'default' XPathFactory instance obtainable with
XPathFactory.instance(). The default back-end instance() can be changed
with a system property.
5. the default 'default' back-end will continue to be Jaxen
6. Other back-ends can be used at will by calling the
XPathFactory.newInstance(String) method (or some direct constructor on the
Factory if it exposes one).
7. At the other end of the system will be an interface XPathCompiled<T>.
This will be immutable, but not thread-safe. Similar concept/behaviour to
javax.xml.xpath.XPathExpression.
8. XPathCompiled<T> will not have the 'special' valueOf, numberValue,
booleanValue that org.jdom.xpath.XPath has. These methods are extensions to
the basic XPath concept and make support for other types impossible (like
XPath 2.0).
9. Instead, XPathCompiled<T> has a generic type which will match the
result values from the expression. The generic type is set by the JDOM
Filter.
10. XPathCompiled<T> can return the full list of results, or alternatively
just the first result. The results will be type-cast to the type of the
specified Filter.
11. The compiling and running methods for the new API will throw unchecked
exceptions (like the javax.xml.xpath.* API).

That will be the base model.

Using this model I expect a base (comprehensive) factory method:

public <E> XPathCompiled<E> compile(String xpath, Filter<E> filter,
Map<String,Object> variables, List<Namespace> namespaces);

In addition there will be variations on the compile method that cater for
simplified conditions, like the basic no-namespace, no-variable, no-filter:

public XPathCompiled<Object> compile(String xpath);


The XPathCompiled<T> class will have:

public List<T> evaluate(Object context);
public T evaluateFirst(Object context);

The evaluateFirst method is a convenience method that will be defined to
return the first value in the evaluate() results, or null if the result is
empty. Implementations can choose to have some short-circuit logic if
possible.
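
A rough sketch of that evaluateFirst contract, using a stand-in interface. CompiledSketch and EvaluateFirstDemo are hypothetical names and not the proposed JDOM2 types; the default method assumes Java 8 or later.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the proposed interface; not the JDOM2 API.
interface CompiledSketch<T> {

    // Full list of results for the given context.
    List<T> evaluate(Object context);

    // Convenience: the first result, or null when nothing was selected.
    // A real back-end could override this to stop after the first match.
    default T evaluateFirst(Object context) {
        List<T> all = evaluate(context);
        return all.isEmpty() ? null : all.get(0);
    }
}

class EvaluateFirstDemo {
    // Toy "expression" that ignores the context and returns fixed values.
    static CompiledSketch<String> fixed(String... values) {
        return context -> Arrays.asList(values);
    }
}
```

Here EvaluateFirstDemo.fixed("a", "b").evaluateFirst(null) yields "a", and an expression with no results yields null.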

To make life easier it is helpful to have an intermediate class that can
manage the variable and namespace contexts for you. Thus a helper class
XPathBuilder<T> will support managing these (getters/setters for variables,
namespaces). It will also have a compile() method to create an
XPathCompiled<T> using the state of the XPathBuilder at compile time.
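
The "state of the builder at compile time" idea could look something like the sketch below. BuilderSketch and CompiledSnapshot are hypothetical names, and only namespaces are modelled; the point is simply that compile() copies the builder state so later changes do not leak into an already compiled expression.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical names throughout; this is not the JDOM2 XPathBuilder.
class BuilderSketch {
    private final Map<String, String> namespaces = new HashMap<String, String>();
    private String xpath;

    BuilderSketch setXPath(String xpath) {
        this.xpath = xpath;
        return this;
    }

    BuilderSketch addNamespace(String prefix, String uri) {
        namespaces.put(prefix, uri);
        return this;
    }

    // Copy the current state so the compiled object is independent
    // of anything done to the builder afterwards.
    CompiledSnapshot compile() {
        return new CompiledSnapshot(xpath, new HashMap<String, String>(namespaces));
    }
}

// Immutable record of what the builder held when compile() was called.
final class CompiledSnapshot {
    final String xpath;
    final Map<String, String> namespaces;

    CompiledSnapshot(String xpath, Map<String, String> namespaces) {
        this.xpath = xpath;
        this.namespaces = Collections.unmodifiableMap(namespaces);
    }
}
```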

Since this new API will impose a 'Filter' on top of the XPath results
there may/will be times when debugging problems will be a challenge.. for
example: Am I missing element X because it was not selected by the XPath or
because it was eliminated by the filter? To answer that sort of question
there needs to be an XPathResult<T> object which contains the pre and post
filtered results (as well as other useful debugging information).

Thus, XPathCompiled<T> will also have:
public XPathResult<T> evaluateResult(Object context);


Examples of the way I see it working are:

//the following two are identical:
String name = XPathFactory.instance().compile("//name/text()",
Filters.string()).evaluateFirst(document);
String name = XPathFactory.instance().evaluateFirst(document,
"//name/text()", Filters.string());

// just select the current node.
Object val = XPathFactory.instance().evaluateFirst(context, "node()");

// create a builder and use it to compile an XPath.
XPathBuilder<Element> builder = new XPathBuilder<Element>(Filters.element());
builder.setXPath("//ns:*");
builder.addNamespace("ns", "http://example.com/mynamespace");
XPathCompiled<Element> xpath = builder.compile(XPathFactory.instance());
List<Element> mine = xpath.evaluate(mydocument);

// Get a diagnostic
XPathResult<Element> result = XPathFactory.instance().compile("//@*",
Filters.element()).evaluateResult(context);
if (!result.filtered().isEmpty()) {
   List<Object> filtered = result.filtered();
   System.out.println("The following results were selected by the XPath "
         + "but removed by the Filter: " + filtered.toString());
   List<Element> survived = result.result();
   System.out.println("The following results were selected by the XPath "
         + "and kept by the Filter: " + survived.toString());
}


This is all taking longer than I expected. I think I will have to put a
'proof of concept' out there, and extend the ALPHA release phase.....


Rolf


In essence this API shifts the 'onus' on ensuring the return value is of
the appropriate type to the 'user'. They know the XPath query, they should
know the return type.

From what I can tell, this model should be compatible with any back-end,
including XPath 2.0. It does not impose any XPath-specific logic modifiers.
If you want a 'number' back from your XPath then you need to use the XPath
number() function to get one. If you want the XPath result cast as a String
using the XPath string-conversion logic, then you should wrap your XPath
query in the XPath string() function. This same logic follows through to
XPath2.0
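
As a small illustration of putting the conversion in the expression itself, the sketch below uses the JDK's javax.xml.xpath API (not the proposed JDOM2 API). ConversionDemo, its method names and the sample element names are assumptions made up for this example.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Illustrative only: the XPath functions string() and number() do the
// conversion, so the caller just receives the already-converted value.
class ConversionDemo {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    private static Object eval(String expr, Document doc, QName type) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc, type);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // string() applies XPath's string-conversion rules to the first match.
    static String nameAsString(Document doc) {
        return (String) eval("string(//name)", doc, XPathConstants.STRING);
    }

    // number() applies XPath's numeric conversion to the first match.
    static double priceAsNumber(Document doc) {
        return (Double) eval("number(//price)", doc, XPathConstants.NUMBER);
    }
}
```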


On Fri, 20 Jan 2012 10:45:19 +0000, Michael Kay <mike at saxonica.com> wrote:
>>The only case where we can't compile XPath expressions is when we want 
> to use variables. Which defeats the whole purpose of compiling XPath!
> 
> Absolutely!
> 
>  >Or we have to use thread-local compiled XPaths. So, I think it would 
> be great to split the XPath API in two parts.
> 
> That's definitely the way to go if you're making changes to this area. If
> you're not familiar with it, do take a look at the s9api design in Saxon:
> 
> http://www.saxonica.com/documentation/javadoc/net/sf/saxon/s9api/XPathCompiler.html
> 
> That involves three classes:
> 
> XPathCompiler contains the static context (variable and namespace 
> declarations)
> 
> XPathExecutable is the thread-safe compiled and reusable XPath expression
> 
> XPathEvaluator contains the dynamic context (variable values, context item)
> 
> You can eliminate the XPathEvaluator by having a more complex evaluate()
> method on the XPathExecutable, e.g. one that supplies the variable 
> values as a Map; but this doesn't reduce the overall number of objects 
> involved, it just replaces the XPathEvaluator object with a Map object.
> 
> The other big design problem with an XPath API is the types used for 
> variable values and for the evaluation result. With the JAXP API I get 
> an enormous amount of support hassle caused by the lack of type safety 
> in the way JAXP does this. In s9api I decided, despite the complexity, 
> to introduce classes XdmValue, XdmItem, XdmAtomicValue etc to make the 
> whole thing type-safe, and I don't regret the decision. (I also have 
> XdmNode which abstracts over DOM, JDOM, XOM etc nodes.)
> 
> If you're designing a new XPath API in 2012 then I think it's essential 
> to think about how it will support XPath 2.0.
> 
> Michael Kay
> Saxonica
> 
> _______________________________________________
> To control your jdom-interest membership:
> http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com

From mike at saxonica.com  Fri Jan 20 06:31:07 2012
From: mike at saxonica.com (Michael Kay)
Date: Fri, 20 Jan 2012 14:31:07 +0000
Subject: [jdom-interest] suggested JDOM2 improvements
In-Reply-To: <779b646e68bc8d6f49267a572345c616@tuis.net>
References: <4F152CD8.5030508@uni-jena.de>
	<b368618ea79598ee225d75c7d6e82e1d@tuis.net>
	<4F193347.7040208@atos.net> <4F1945BF.30201@saxonica.com>
	<779b646e68bc8d6f49267a572345c616@tuis.net>
Message-ID: <4F197AAB.4050202@saxonica.com>

 >public XPathCompiled<Object> compile(String xpath);

I started introducing generics for this in Saxon 9.4 and the experience 
wasn't wholly positive; it left a lot of cases where there were warnings 
that needed to be ignored. That may be because I found generics to be 
deeper and more bewildering than I expected.

It's not at all clear to me how your types such as 
XPathCompiled<Element> are supposed to work. Do they rely excessively on 
the ability of the XPath engine to do static type analysis of the 
supplied expression?

Michael Kay
Saxonica


From jdom at tuis.net  Fri Jan 20 06:50:53 2012
From: jdom at tuis.net (Rolf Lear)
Date: Fri, 20 Jan 2012 09:50:53 -0500
Subject: [jdom-interest] suggested JDOM2 improvements
In-Reply-To: <4F197AAB.4050202@saxonica.com>
References: <4F152CD8.5030508@uni-jena.de>
	<b368618ea79598ee225d75c7d6e82e1d@tuis.net>
	<4F193347.7040208@atos.net> <4F1945BF.30201@saxonica.com>
	<779b646e68bc8d6f49267a572345c616@tuis.net>
	<4F197AAB.4050202@saxonica.com>
Message-ID: <530e8a5bc8b2312c0a9ff17c8b303ed9@tuis.net>


No, no static type analysis.

JDOM has 'always' had the 'Filter' concept. You could, for example, do:

List comments = element.getContent(new
ContentFilter(ContentFilter.COMMENT));

In order to make the above 'generic' in JDOM2, the getContent() has to
return an appropriate type for whatever the Filter returns. I 'extended'
the Filter class to have a generic return type. Thus, it is now possible
to:

List<Comment> comments = element.getContent(Filters.comment());

The Filter implementations all follow the rules:
1. if the content to be filtered does not match the filter, then the
content is discarded.
2. if the content matches the filter, then it is explicitly cast to the
generic type of the filter.

What this means is that you are guaranteed that the generic type of the
Filter results is accurate, and it is impossible to 'force' Filter results
to have badly-loaded result lists.

Filter instances can do more than just type-checking on the input data,
but can also do anything else to filter the content, like checking for
particular names, etc.

With the XPath library, I intend to apply the same Filter concept to the
XPath results.

Since the user knows the XPath expression, they will also know the
anticipated return type. If they want to select Elements then they can
apply an Element filter. If they want to select 'everything' then
they can use a 'passthrough' filter which 'does no filtering' (but as a
result can only 'cast' to Object).

Essentially the Filter concept is a way to coerce unknown data in to a
user defined type while ensuring the results will never generate
class-cast, and providing an opportunity to discard what you do not want.
It is ideal for XPath results.

The 'user' creates their own filter
http://hunterhacker.github.com/jdom/jdom2/apidocs/org/jdom2/filter/Filter.html
, or reuses one of the 'common' filters accessible in the 'Filters' class
http://hunterhacker.github.com/jdom/jdom2/apidocs/org/jdom2/filter/Filters.html

Most Filter implementations take a Class instance (matching the generic
type of the Filter) as a constructor argument, and any values that match
the filter are cast using the Class.cast() method.
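
A stripped-down sketch of that match-then-cast rule follows. ClassFilterSketch is a hypothetical class written for this illustration; the real org.jdom2.filter.Filter interface has more methods and the built-in filters differ in detail.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a class-based filter.
class ClassFilterSketch<T> {
    private final Class<T> type;

    ClassFilterSketch(Class<T> type) {
        this.type = type;
    }

    // Rule 2: a matching value is cast with Class.cast(), which is checked
    // at run time. Rule 1: anything else is reported as "no match" (null).
    T matchOrNull(Object content) {
        return type.isInstance(content) ? type.cast(content) : null;
    }

    // Keeps only matching items; non-matching items are silently dropped,
    // so the returned list can never contain a value of the wrong type.
    List<T> filterAll(List<?> content) {
        List<T> kept = new ArrayList<T>();
        for (Object item : content) {
            T match = matchOrNull(item);
            if (match != null) {
                kept.add(match);
            }
        }
        return kept;
    }
}
```

Filtering a mixed list of strings and numbers with a String filter returns only the strings, in their original order.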

Rolf


On Fri, 20 Jan 2012 14:31:07 +0000, Michael Kay <mike at saxonica.com> wrote:
>>public XPathCompiled<Object> compile(String xpath);
> 
> I started introducing generics for this in Saxon 9.4 and the experience 
> wasn't wholly positive; it left a lot of cases where there were warnings
> that needed to be ignored. That may be because I found generics to be
> deeper and more bewildering than I expected.
> 
> It's not at all clear to me how your types such as 
> XPathCompiled<Element> are supposed to work. Do they rely excessively on
> the ability of the XPath engine to do static type analysis of the
> supplied expression?
> 
> Michael Kay
> Saxonica

From mike at saxonica.com  Fri Jan 20 06:57:02 2012
From: mike at saxonica.com (Michael Kay)
Date: Fri, 20 Jan 2012 14:57:02 +0000
Subject: [jdom-interest] suggested JDOM2 improvements
In-Reply-To: <530e8a5bc8b2312c0a9ff17c8b303ed9@tuis.net>
References: <4F152CD8.5030508@uni-jena.de>
	<b368618ea79598ee225d75c7d6e82e1d@tuis.net>
	<4F193347.7040208@atos.net> <4F1945BF.30201@saxonica.com>
	<779b646e68bc8d6f49267a572345c616@tuis.net>
	<4F197AAB.4050202@saxonica.com>
	<530e8a5bc8b2312c0a9ff17c8b303ed9@tuis.net>
Message-ID: <4F1980BE.4020702@saxonica.com>

Thanks for the explanation.

I wonder, though, if discarding data of the wrong type is better than 
throwing a ClassCastException? It's very easy in XPath, for example, to 
ask for a text node when you thought you were asking for a string. 
Expressions that return nothing are the hardest thing to debug as it is.

Michael Kay
Saxonica

On 20/01/2012 14:50, Rolf Lear wrote:
> No, no static type analysis.
>
> JDOM has 'always' had the 'Filter' concept. You could, for example, do:
>
> List comments = element.getContent(new
> ContentFilter(ContentFilter.COMMENT));
>
> In order to make the above 'generic' in JDOM2, the getContent() has to
> return an appropriate type for whatever the Filter returns. I 'extended'
> the Filter class to have a generic return type. Thus, it is now possible
> to:
>
> List<Comment>  comments = element.getContent(Filters.comment());
>
> The Filter implementations all follow the rules:
> 1. if the content to be filtered does not match the filter, then the
> content is discarded.
> 2. if the content matches the filter, then it is explicitly cast to the
> generic type of the filter.
>
> What this means is that you are guaranteed that the generic type of the
> Filter results is accurate, and it is impossible to 'force' Filter results
> to have badly-loaded result lists.
>
> Filter instances can do more than just type-checking on the input data,
> but can also do anything else to filter the content, like checking for
> particular names, etc.
>
> With the XPath library, I intend to apply the same Filter concept to the
> XPath results.
>
> Since the user knows the XPath expression, they will also know the
> anticipated return type. If they want to select Elements then they can
> apply an Element filter. If they want to select 'everything' then
> they can use a 'passthrough' filter which 'does no filtering' (but as a
> result can only 'cast' to Object).
>
> Essentially the Filter concept is a way to coerce unknown data in to a
> user defined type while ensuring the results will never generate
> class-cast, and providing an opportunity to discard what you do not want.
> It is ideal for XPath results.
>
> The 'user' creates their own filter
> http://hunterhacker.github.com/jdom/jdom2/apidocs/org/jdom2/filter/Filter.html
> , or reuses one of the 'common' filters accessible in the 'Filters' class
> http://hunterhacker.github.com/jdom/jdom2/apidocs/org/jdom2/filter/Filters.html
>
> Most Filter implementations take a Class instance (matching the generic
> type of the Filter) as a constructor argument, and any values that match
> the filter are cast using the Class.cast() method.
>
> Rolf
>
>
> On Fri, 20 Jan 2012 14:31:07 +0000, Michael Kay<mike at saxonica.com>  wrote:
>>> public XPathCompiled<Object>  compile(String xpath);
>> I started introducing generics for this in Saxon 9.4 and the experience
>> wasn't wholly positive; it left a lot of cases where there were warnings
>> that needed to be ignored. That may be because I found generics to be
>> deeper and more bewildering than I expected.
>>
>> It's not at all clear to me how your types such as
>> XPathCompiled<Element>  are supposed to work. Do they rely excessively on
>> the ability of the XPath engine to do static type analysis of the
>> supplied expression?
>>
>> Michael Kay
>> Saxonica


From jdom at tuis.net  Fri Jan 20 07:16:16 2012
From: jdom at tuis.net (Rolf Lear)
Date: Fri, 20 Jan 2012 10:16:16 -0500
Subject: [jdom-interest] suggested JDOM2 improvements
In-Reply-To: <4F1980BE.4020702@saxonica.com>
References: <4F152CD8.5030508@uni-jena.de>
	<b368618ea79598ee225d75c7d6e82e1d@tuis.net>
	<4F193347.7040208@atos.net> <4F1945BF.30201@saxonica.com>
	<779b646e68bc8d6f49267a572345c616@tuis.net>
	<4F197AAB.4050202@saxonica.com>
	<530e8a5bc8b2312c0a9ff17c8b303ed9@tuis.net>
	<4F1980BE.4020702@saxonica.com>
Message-ID: <2252a7c03b6c5fe5a1829818f8b86c8f@tuis.net>


I agree with the debug issue. That is exactly why in the model I intend to
provide I will make it possible to return the 'XPathResult<T>' and not just
a List<T>.
The XPathResult<T> will allow you to inspect the base XPath results in a
List<Object> as well as the filter results in the List<T> format.
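
One possible shape for such a result object is sketched here. ResultSketch is a hypothetical stand-in, and filtering by a plain Class is an assumption made to keep the example self-contained; the accessor names result() and filtered() follow the usage shown earlier in this thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the proposed XPathResult<T>.
class ResultSketch<T> {
    private final List<Object> raw = new ArrayList<Object>();
    private final List<T> kept = new ArrayList<T>();
    private final List<Object> removed = new ArrayList<Object>();

    // Split the raw XPath results into kept and removed items by type.
    ResultSketch(List<?> rawResults, Class<T> type) {
        for (Object item : rawResults) {
            raw.add(item);
            if (type.isInstance(item)) {
                kept.add(type.cast(item));
            } else {
                removed.add(item);
            }
        }
    }

    // Everything the XPath selected, before filtering.
    List<Object> raw() { return Collections.unmodifiableList(raw); }

    // What survived the filter.
    List<T> result() { return Collections.unmodifiableList(kept); }

    // What the XPath selected but the filter removed.
    List<Object> filtered() { return Collections.unmodifiableList(removed); }

    @Override
    public String toString() {
        return "raw=" + raw + " kept=" + kept + " removed=" + removed;
    }
}
```

An empty result() together with a non-empty filtered() tells the user that the XPath did select something and the filter threw it away.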

XPath has always been a vulnerable area for type-casting. Nothing has
stopped the user from coding inappropriate casts for XPath results. With
JDOM2 the user will have the option of trading class-cast-exceptions for
missing result conditions. If the user is anxious to keep the
class-cast-exception option then they can choose to use unfiltered XPath
results.

In general, a user writing an XPath expression has to know ahead of time
what the return types will be (including XPath 2.0 with its plethora of
atomic types). Using the Filter concept allows the user to anticipate the
type of his/her choice, and not have to statically build the type in to the
API.

Mitigating the debug issue with an XPathResult<T> that has useful methods
for interrogating intermediate results (and a useful toString()) is a good
compromise, I think.

As long as people understand that the XPath results are 'filtered' a
second time then everything should be fine.

Remember that the users can always elect to have unfiltered results too,
but then they have to live with List<Object> results.

Rolf

On Fri, 20 Jan 2012 14:57:02 +0000, Michael Kay <mike at saxonica.com> wrote:
> Thanks for the explanation.
> 
> I wonder, though, if discarding data of the wrong type is better than 
> throwing a ClassCastException? It's very easy in XPath, for example, to 
> ask for a text node when you thought you were asking for a string. 
> Expressions that return nothing are the hardest thing to debug as it is.
> 
> Michael Kay
> Saxonica
> 
> On 20/01/2012 14:50, Rolf Lear wrote:
>> No, no static type analysis.
>>
>> JDOM has 'always' had the 'Filter' concept. You could, for example, do:
>>
>> List comments = element.getContent(new
>> ContentFilter(ContentFilter.COMMENT));
>>
>> In order to make the above 'generic' in JDOM2, the getContent() has to
>> return an appropriate type for whatever the Filter returns. I 'extended'
>> the Filter class to have a generic return type. Thus, it is now possible
>> to:
>>
>> List<Comment>  comments = element.getContent(Filters.comment());
>>
>> The Filter implementations all follow the rules:
>> 1. if the content to be filtered does not match the filter, then the
>> content is discarded.
>> 2. if the content matches the filter, then it is explicitly cast to the
>> generic type of the filter.
>>
>> What this means is that you are guaranteed that the generic type of the
>> Filter results is accurate, and it is impossible to 'force' Filter results
>> to have badly-loaded result lists.
>>
>> Filter instances can do more than just type-checking on the input data,
>> but can also do anything else to filter the content, like checking for
>> particular names, etc.
>>
>> With the XPath library, I intend to apply the same Filter concept to the
>> XPath results.
>>
>> Since the user knows the XPath expression, they will also know the
>> anticipated return type. If they want to select Elements then they can
>> apply an Element filter. If they want to select 'everything' then
>> they can use a 'passthrough' filter which 'does no filtering' (but as a
>> result can only 'cast' to Object).
>>
>> Essentially the Filter concept is a way to coerce unknown data in to a
>> user defined type while ensuring the results will never generate
>> class-cast, and providing an opportunity to discard what you do not want.
>> It is ideal for XPath results.
>>
>> The 'user' creates their own filter
>> http://hunterhacker.github.com/jdom/jdom2/apidocs/org/jdom2/filter/Filter.html
>> , or reuses one of the 'common' filters accessible in the 'Filters' class
>> http://hunterhacker.github.com/jdom/jdom2/apidocs/org/jdom2/filter/Filters.html
>>
>> Most Filter implementations take a Class instance (matching the generic
>> type of the Filter) as a constructor argument, and any values that match
>> the filter are cast using the Class.cast() method.
>>
>> Rolf
>>
>>
>> On Fri, 20 Jan 2012 14:31:07 +0000, Michael Kay<mike at saxonica.com> wrote:
>>>> public XPathCompiled<Object>  compile(String xpath);
>>> I started introducing generics for this in Saxon 9.4 and the experience
>>> wasn't wholly positive; it left a lot of cases where there were warnings
>>> that needed to be ignored. That may be because I found generics to be
>>> deeper and more bewildering than I expected.
>>>
>>> It's not at all clear to me how your types such as
>>> XPathCompiled<Element>  are supposed to work. Do they rely excessively on
>>> the ability of the XPath engine to do static type analysis of the
>>> supplied expression?
>>>
>>> Michael Kay
>>> Saxonica

From jdom at tuis.net  Fri Jan 20 08:13:41 2012
From: jdom at tuis.net (Rolf Lear)
Date: Fri, 20 Jan 2012 11:13:41 -0500
Subject: [jdom-interest] suggested JDOM2 improvements
In-Reply-To: <CAHzJPErobaSzcZmnLVEEVJ=kF5=fWJoWFRSktEjcK-fLGM1-rg@mail.gmail.com>
References: <4F152CD8.5030508@uni-jena.de>
	<b368618ea79598ee225d75c7d6e82e1d@tuis.net>
	<4F193347.7040208@atos.net> <4F1945BF.30201@saxonica.com>
	<779b646e68bc8d6f49267a572345c616@tuis.net>
	<4F197AAB.4050202@saxonica.com>
	<530e8a5bc8b2312c0a9ff17c8b303ed9@tuis.net>
	<4F1980BE.4020702@saxonica.com>
	<2252a7c03b6c5fe5a1829818f8b86c8f@tuis.net>
	<CAHzJPErobaSzcZmnLVEEVJ=kF5=fWJoWFRSktEjcK-fLGM1-rg@mail.gmail.com>
Message-ID: <f57b48aa42e680037669bed937ddd243@tuis.net>


No, I have not considered that.

It is important for JDOM2 to get the API right. I do not want to be
deprecating anything after 2.0

I am targeting a second alpha release for Groundhog Day (Feb 2nd). I am
expecting to have a memory-efficiency improvement and any other API changes
in for that release (currently only XPath has concerns). Additionally there
are a few issues I am working on for that release:

https://github.com/hunterhacker/jdom/issues

I intend to clear out all the issues except the serialization (which is a
major pain). All the others will either be rejected, or resolved.

Assuming no other issues I anticipate keeping to the schedule:
http://markmail.org/message/dqxabjn56vt3dbik

It is pretty tight already, and the quality of the release is strongly
dependent on how much feedback there is....

... really, I would love for people to get more involved.

If anyone has contributions or wish-list items for JDOM they should speak
up.

With this XPath API change I think I will push out an intermediate ALPHA
sometime this weekend with the new XPath API in as a 'trial' for people to
play with and criticise.

I will perhaps put out another intermediate ALPHA with the
memory-efficiency changes sometime after that... but I have not yet started
working on that properly... soon.

So, expect a somewhat quick turnaround in the next two weeks for ALPHA_XP,
ALPHA_MEM and ALPHA_GH   (XPath, Memory, and Ground-Hog Day) respectively.

If anything else comes up for development before then I will probably have
to slip the timetable somehow.

That's if it is just me working on it.

Rolf


On Fri, 20 Jan 2012 07:42:51 -0800, Joe Bowbeer <joe.bowbeer at gmail.com>
wrote:
> Any thought to releasing JDOM2 with the existing functionality and
> targeting XPath redesign for JDOM2.1?
> 

From jdom at tuis.net  Fri Jan 20 12:28:04 2012
From: jdom at tuis.net (Rolf Lear)
Date: Fri, 20 Jan 2012 15:28:04 -0500
Subject: [jdom-interest] suggested JDOM2 improvements
In-Reply-To: <4F19C1B6.6060406@xerox.com>
References: <4F152CD8.5030508@uni-jena.de><b368618ea79598ee225d75c7d6e82e1d@tuis.net><4F193347.7040208@atos.net>
	<4F1945BF.30201@saxonica.com>
	<779b646e68bc8d6f49267a572345c616@tuis.net>
	<4F19C1B6.6060406@xerox.com>
Message-ID: <e631efb52dbb6ed044e3862c2b979991@tuis.net>


XPath and JDOM have always been very loosely coupled. For years (and still
now) there is no need to have direct support for XPath in JDOM. Saxon does
OK without using any of the JDOM/XPath code for example.

What the XPath code in JDOM does (or should do) is to provide a convenient
interface for the functionality. The native javax.xml.xpath.* entry point
is not useful because the JDOM classes do not conform to the same NODESET
type model.

So, given the alternatives:
1. shoehorn XPath support on top of the javax.xml.xpath.* model
2. continue with JDOM1 XPath model
3. build a new 'better' model
4. remove XPath support from JDOM and let it be a 3rd-party add-on.

I think 3 is the best.

But, there are problems with having the support: if you claim support, it
has to actually work when needed. This in turn means that you have to have
some sort of starting point. Jaxen is the only viable alternative (at the
moment) that I know of simply because its licensing is permissive enough
and it has the right history with JDOM (sorry Michael, Saxon does not make
sense for a ship-it-with-JDOM library).

So, the plan is to make a working default system, and then provide the
mechanisms to customize it.

But, you should not change the 'global' default implementation from within
Java code. This is because JDOM is often used in multiple places of code:
for example, Eclipse has JDOM built in. Hypothetically, let's say the Eclipse
'Git' plugin changes the 'default' XPath backend to some new XPath2.0
custom value, then suddenly the 'CVS' plugin is no longer getting the
results it wants.

Setting a JVM-wide System property is a compromise already. It has real
problems because people think they can race the static initializer to
change the System property before it is used the first time.... It is
accessed only once on the first time the XPathFactory is created.
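
The "read once" behaviour can be shown with a few lines of plain Java. This is only a sketch: DefaultBackendSketch and the property name example.xpath.backend are made up for the illustration and are not the JDOM2 class or property.

```java
// Illustrative only: a default that is fixed when the class is initialized.
class DefaultBackendSketch {

    // Hypothetical property name, used only for this illustration.
    static final String PROPERTY = "example.xpath.backend";

    // Read exactly once, when the class is first initialized. Later calls
    // to System.setProperty() are never seen by this field.
    private static final String BACKEND =
            System.getProperty(PROPERTY, "jaxen");

    static String backend() {
        return BACKEND;
    }
}
```

Once anything has called backend(), setting the property has no effect, so code that tries to set it "just in time" only works if it wins the race against every other user of the class in the same JVM.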

On the other hand, because XPathFactory instances are specified to be
thread-safe, there is nothing stopping you from doing:

public static final XPathFactory XPATH =
XPathFactory.newInstance("com.example.xpath20.XPathFactory");

Then in your code you can freely use:

XPathCompiled<Object> xp = XPATH.compile("//*");


You have in fact been exploiting one of the major flaws in the JDOM 1.x
XPath library: that there is no way to have multiple concurrent XPath
libraries active at the same time. When you do:

    XPath.setXPathClass(JaxenXPath.class);

you are changing the global JDOM XPath library for all JDOM users in the
same JVM. This is not an OK thing to do from a JDOM API perspective.

Bottom line is that there is no good way to allow the 'world' to change
the default XPathFactory from inside a running JVM.

Allowing the world to create a custom instance is a good compromise, and
allowing the global default instance to be changed from the command-line is
also a decent compromise.

The best practice would be for you to get your own instance of your own
factory, then use that instance from wherever you need it.


So, if you can think of a better way to allow all JDOM users (in any
potential JVM use-case) to get the XPathFactory of their choice, please say so.

Based on my limited understanding of your environment, it would seem to me
that you could have a single method on your JDOMUtil class like:

// needs: import java.util.concurrent.atomic.AtomicReference;
private static final AtomicReference<XPathFactory> myfactory =
    new AtomicReference<XPathFactory>();

public static final XPathFactory instance() {
    XPathFactory ret = myfactory.get();
    if (ret == null) {
      // first caller creates the factory; a thread that loses the race
      // discards its own instance and uses the winner's
      ret = XPathFactory.newInstance("my.custom.factory.ClassName");
      if (myfactory.compareAndSet(null, ret)) {
        return ret;
      }
      return myfactory.get();
    }
    return ret;
}

That way you have a single location to access your particular factory. You
never have to worry about the System properties. You can change the factory
at your leisure, and 'everything just works'.

If your use case is more complicated than that, there is nothing stopping
you from having complete control of your factory simply by not using the
newInstance(String) method at all. There is nothing stopping you from
doing:

public static final XPathFactory myfactory = new
MyFactoryImplementation();


Oh, it is hard to keep things straight in my head between what code I have
on my laptop, and what's in the alpha release, so I'll just talk from the
perspective of what's on my laptop now, and what will be in the next Alpha
release.


Rolf


On Fri, 20 Jan 2012 11:34:14 -0800, Leigh L Klotz Jr
<leigh.klotz at xerox.com> wrote:
> On 01/20/2012 05:56 AM, Rolf Lear wrote:
>> 2. new JDOM2 XPathFactory concept which can have different 
>> implementation back-ends (Jaxen, Saxon, whatever).
> +1
>>
>> 3. XPathFactories are thread-safe and reusable in any threads.
>>
> +1
>>
>> 4. have a single 'default' XPathFactory instance obtainable with
>> XPathFactory.instance(). The default back-end instance() can be changed
>> with a system property.
>>
> This is causing me trouble at the moment.  I have to override the 
> XPathFactory, to provide common function definitions and to avoid 
> performance problems that Java classlibrary and JAXP cause.  In JDOM1 I 
> do this in a static class:
> public class JDOMUtil {
>    static {
>      try {
>        XPath.setXPathClass(JaxenXPath.class);
>      } catch (JDOMException e) {
>        throw new RuntimeException(e);
>      }
>    }
> }
> 
> I can be assured that it works, and though I'm not sure under what 
> conditions it throws a checked exception, if it does throw one, it's a 
> system startup failure to be debugged by a system engineer.
> 
> With JDOM 2 alpha I have to do this
> 
> // replaced with
> // -Dorg.jdom2.xpath.XPathFactory=com.example.jaxen.JaxenXPath
>    static {
>      if (!(JaxenXPath.class.getName().equals(
>              System.getProperty(JDOMConstants.JDOM2_PROPERTY_XPATH_FACTORY)))) {
>        throw new RuntimeException(String.format(
>            "JDOM not set up properly with -D%s=%s",
>            JDOMConstants.JDOM2_PROPERTY_XPATH_FACTORY,
>            JaxenXPath.class.getName()));
>      }
>    }
> 
> Now I've got JDOM2 dependencies off in a faraway place of Java CLI, 
> where they can easily get lost.
>> 6. Other back-ends can be used at will by calling the
>> XPathFactory.newInstance(String) method (or some direct constructor on 
>> the
>> Factory if it exposes one). 
> This doesn't help me fix the above problem, because all of the 
> ThreadLocal cache logic and pretty entrypoints into the XPath class 
> itself are hardwired to use the System-property defined constructor.  So
> they might as well not be there.
> 
>> 5. the default 'default' back-end will continue to be Jaxen
>>
> 
> Personally, I'd prefer it if you broke this requirement up into a few 
> parts and made it easy to have a Jaxen backend.
> For example, you might say that there's no XPath support without also 
> loading jdom2.jar and jdom2-jaxen.jar.
> Right now, with Jaxen having JDOM1 support built in, and then JDOM2 
> having Jaxen support built in, it causes a bit of circular confusion 
> trying to get things to work.
> 
> If we could configure JDOM to use Saxon and have it get good performance
> without unnecessary recalculations, we'd not even load Jaxen at all.
> 
> Leigh.

From leigh.klotz at xerox.com  Fri Jan 20 14:58:43 2012
From: leigh.klotz at xerox.com (Leigh L Klotz Jr)
Date: Fri, 20 Jan 2012 14:58:43 -0800
Subject: [jdom-interest] suggested JDOM2 improvements
In-Reply-To: <e631efb52dbb6ed044e3862c2b979991@tuis.net>
References: <4F152CD8.5030508@uni-jena.de><b368618ea79598ee225d75c7d6e82e1d@tuis.net><4F193347.7040208@atos.net>
	<4F1945BF.30201@saxonica.com>
	<779b646e68bc8d6f49267a572345c616@tuis.net>
	<4F19C1B6.6060406@xerox.com>
	<e631efb52dbb6ed044e3862c2b979991@tuis.net>
Message-ID: <4F19F1A3.2030900@xerox.com>

On 01/20/2012 12:28 PM, Rolf Lear wrote:
>
> On the other hand, because XPathFactory instances are specified to be
> thread-safe, there is nothing stopping you from doing:
>
> public static final XPathFactory XPATH =
> XPathFactory.newInstance("com.example.xpath20.XPathFactory");
>
> Then in your code you can freely use:
>
> XPathCompiled<Object> xp = XPATH.compile("//*");
>
> ...
>
> The best practice would be for you to get your own instance of your own
> factory, then use that instance from wherever you need it.
>
>

I'd like to use a custom factory as you describe above, but right now, 
that makes all public methods on org.jdom2.xpath.XPath useless, because 
they use a static threadlocal factory which can only be the result of 
XPathFactory.newInstance(), which is the DEFAULTFACTORY from 
XPathFactory, which is settable only by the System property:

public abstract class XPath {

     private static final ThreadLocal<XPathFactory> localfactory =
             new ThreadLocal<XPathFactory>();

     public static List<?> selectNodes(final Object context,
             final String path) throws JDOMException {
         return newInstance(path).selectNodes(context);
     }

     public static final XPath newInstance(final String path)
             throws JDOMException {
         XPathFactory fac = localfactory.get();
         if (fac == null) {
             fac = XPathFactory.newInstance();
             localfactory.set(fac);
         }
         return fac.compile(path);
     }
}

The reason I use a custom factory is to work around a performance 
problem with Jaxen: 
org.jaxen.saxpath.helpers.XPathReaderFactory.createReader() does an 
expensive synchronized System.getProperty() that causes concurrency 
bottlenecks, and it's done frequently, and there's no way to configure 
Jaxen or JDOM to use a specific implementation class rather than consult 
System.getProperty every time.
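
The usual workaround for a repeated System.getProperty() call on a hot 
path is to resolve the property once and reuse the result. The sketch 
below only illustrates that idea; it is not Jaxen or JDOM code, and the 
class name ReaderConfig, the property key and the default class name are 
all made up for the example.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; not Jaxen or JDOM code.
final class ReaderConfig {
    // Hypothetical property key and default, invented for this sketch.
    static final String KEY = "example.reader.class";
    static final String DEFAULT_READER = "com.example.DefaultReader";

    // Counts how often the system property lookup actually runs.
    static final AtomicInteger LOOKUPS = new AtomicInteger();

    // Initialization-on-demand holder: the JVM runs this initializer
    // once, on first use, and publishes the value safely to all threads.
    private static final class Holder {
        static final String READER_CLASS = lookup();
    }

    private static String lookup() {
        LOOKUPS.incrementAndGet();
        return System.getProperty(KEY, DEFAULT_READER);
    }

    // Hot-path accessor: no System.getProperty() call after the first use.
    static String readerClassName() {
        return Holder.READER_CLASS;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            readerClassName();
        }
        System.out.println(readerClassName() + " resolved with "
                + LOOKUPS.get() + " property lookup(s)");
    }
}
```

The trade-off is that a change to the property after the first use is 
ignored, which is usually acceptable for start-up configuration.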

To fix this, I have to split apart a whole stack of factory code from 
JDOM and Jaxen, just in order to get at the createReader() method.

Another reason to use a custom XPath factory would be to use the JDOM 
API for XPath to get the work done with Saxon.

So, to summarize, my complaint is that if I want to use a custom XPath 
factory for whatever reason (and I've given two above), I cannot use any 
of the XPath public static methods.
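
To make the shape of the problem concrete, here is a rough sketch of 
static convenience methods that go through a factory instance chosen in 
code rather than through a System-property default. The types 
PathFactory, CompiledPath, PrefixFactory and Paths are hypothetical 
stand-ins invented for this example; they are not the JDOM2 API, and the 
"expression" is just a string prefix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical facade; the application owns the factory instance.
final class Paths {
    // Chosen once, in code, and shared. This assumes the factory is
    // thread-safe, as the quoted advice above says factories should be.
    static final PathFactory FACTORY = new PrefixFactory();

    // Static convenience method that uses the chosen factory.
    static List<String> select(List<String> context, String expression) {
        return FACTORY.compile(expression).evaluate(context);
    }

    public static void main(String[] args) {
        System.out.println(select(Arrays.asList("foo", "bar", "food"), "foo"));
    }
}

// Hypothetical stand-ins for an XPath back-end.
interface PathFactory {
    CompiledPath compile(String expression);
}

interface CompiledPath {
    List<String> evaluate(List<String> context);
}

// A trivial back-end: "compiles" an expression into a prefix filter.
final class PrefixFactory implements PathFactory {
    @Override
    public CompiledPath compile(String expression) {
        return context -> {
            List<String> hits = new ArrayList<>();
            for (String item : context) {
                if (item.startsWith(expression)) {
                    hits.add(item);
                }
            }
            return hits;
        };
    }
}
```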

Leigh.

From jdom at tuis.net  Fri Jan 20 15:19:33 2012
From: jdom at tuis.net (Rolf Lear)
Date: Fri, 20 Jan 2012 18:19:33 -0500
Subject: [jdom-interest] suggested JDOM2 improvements
In-Reply-To: <4F19F1A3.2030900@xerox.com>
References: <4F152CD8.5030508@uni-jena.de><b368618ea79598ee225d75c7d6e82e1d@tuis.net><4F193347.7040208@atos.net>
	<4F1945BF.30201@saxonica.com>
	<779b646e68bc8d6f49267a572345c616@tuis.net>
	<4F19C1B6.6060406@xerox.com>
	<e631efb52dbb6ed044e3862c2b979991@tuis.net>
	<4F19F1A3.2030900@xerox.com>
Message-ID: <4F19F685.70703@tuis.net>

Hi Leigh

I think we are both missing something here.

In JDOM2 I'm convinced that XPath is deprecated... so, while it is still 
in the ALPHA at the moment it will have a viable replacement by the next 
ALPHA.

We'll make sure the replacement is 'good' for custom/other XPath backend 
implementations.

Give me a day to polish up a proposed replacement. I think you are 
missing the tricks of the XPathFactory code in the current ALPHA 
release, but there's not much point in fighting it when it is going to 
change anyway.

Rolf

From leigh.klotz at xerox.com  Fri Jan 20 16:44:12 2012
From: leigh.klotz at xerox.com (Leigh L Klotz Jr)
Date: Fri, 20 Jan 2012 16:44:12 -0800
Subject: [jdom-interest] suggested JDOM2 improvements
In-Reply-To: <4F19F685.70703@tuis.net>
References: <4F152CD8.5030508@uni-jena.de><b368618ea79598ee225d75c7d6e82e1d@tuis.net><4F193347.7040208@atos.net>
	<4F1945BF.30201@saxonica.com>
	<779b646e68bc8d6f49267a572345c616@tuis.net>
	<4F19C1B6.6060406@xerox.com>
	<e631efb52dbb6ed044e3862c2b979991@tuis.net>
	<4F19F1A3.2030900@xerox.com> <4F19F685.70703@tuis.net>
Message-ID: <4F1A0A5C.7060409@xerox.com>

No problem, I understand now.
BTW I've decided my desire to replace the XPath factory to work around a 
Jaxen bug is in fact a problem with Jaxen instead; it's just that I have 
no belief I can get the Jaxen bug fixed ever.

Since you're re-working the XPath class I'll hold off on any more 
uninformed comments...

Leigh.


On 01/20/2012 03:19 PM, Rolf Lear wrote:
>
> Hi Leigh
>
> I think we are both missing something here.
>
> In JDOM2 I'm convinced that XPath is deprecated... so, while it is still
> in the ALPHA at the moment it will have a viable replacement by the next
> ALPHA.
>
> We'll make sure the replacement is 'good' for custom/other XPath backend
> implementations.
>
> Give me a day to polish up a proposed replacement. I think you are
> missing the tricks of the XPathFactory code in the current ALPHA
> release, but there's not much point in fighting it when it is going to
> change anyway.
>
> Rolf
>


From jdom at tuis.net  Sun Jan 22 18:13:59 2012
From: jdom at tuis.net (Rolf Lear)
Date: Sun, 22 Jan 2012 21:13:59 -0500
Subject: [jdom-interest] JDOM ALPHA - Second Alpha Released
Message-ID: <4F1CC267.6060006@tuis.net>

Hi all.

I have just pushed a second ALPHA release up to github. This release 
contains a new XPath API for JDOM. Please see the page 
https://github.com/hunterhacker/jdom/wiki/JDOM2-Feature:-XPath-Upgrade

If you are new to the alpha releases please see the wiki pages here:
https://github.com/hunterhacker/jdom/wiki/JDOM2-Features

For those who have played with the first alpha already, the highlights 
of this second alpha release are:

1. new XPath API. The first alpha release had a first-attempt at 
improving the XPath API. That attempt was 'reverted' completely. It has 
been replaced with a second attempt. This second attempt deprecates the 
JDOM 1.x class 'XPath', and introduces a number of new API classes. 
Please see 
https://github.com/hunterhacker/jdom/wiki/JDOM2-Feature:-XPath-Upgrade

2. the entire 'backend' of the *Outputter code has been refactored. This 
change should be transparent to everyone *unless* you override/customize 
some outputters (typically XMLOutputter). If you have a customised 
XMLOutputter then your code is basically going to need a big refactor. 
The changes to the Outputter implementations are not yet documented on 
the wiki page, but, essentially, the formatting code and the 'target' 
code have been completely separated. The XMLOutputter no longer has any 
code that deals with the 'look&feel' of the output.


I expect this alpha release to generate a fair amount of discussion 
regarding the XPath API changes. Please take it for a test drive and 
make your opinions known.

I expect to be putting out yet another ALPHA drop in the next week, 
probably with code related to memory efficiency.

Thanks all.

Rolf

From jdom at tuis.net  Mon Jan 23 08:28:37 2012
From: jdom at tuis.net (Rolf Lear)
Date: Mon, 23 Jan 2012 11:28:37 -0500
Subject: [jdom-interest] JDOM and memory
In-Reply-To: <4F02133C.5010704@tuis.net>
References: <4F02133C.5010704@tuis.net>
Message-ID: <a8a0cebf065eb28730e6ef7151ba5a7e@tuis.net>


Hi all.

I have started on this memory optimization, and it is still in early
stages. There is one API issue though:

The Element API has the two methods:

addContent(Content node)
addContent(Collection<? extends Content> newContent)

If I make Element implement List<Content> (which is what this
memory-change will do), then the above two methods become ambiguous because
Element will be both Content and List<Content>.

The logical thing to do would be to deprecate addContent(Collection) since
the List.addAll(Collection ...) is the obvious substitute.

In the interim people migrating from JDOM 1.x will have compile errors,
and will have to change all addContent() calls where the content is an
Element to either add(element) or addAll(element), to add the element or
its content respectively. The addAll(element) form would make no sense,
because it would guarantee an exception: you cannot add an Element's
content to some other element without first detaching it.

The bottom line is that all the addContent* methods are equivalent to the
regular List.add* methods.... and there is no ambiguity in those, it is
either add(Content) or addAll(Collection...)
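
The ambiguity can be reproduced in miniature. In the sketch below, 
Content, ListElement and Parent are hypothetical stand-ins (Content is an 
interface here only to keep the example short); they are not the JDOM 
classes. The unqualified call is left commented out because the compiler 
rejects it, and the two casts show the only ways to pick an overload.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

final class AmbiguityDemo {
    static List<String> run() {
        Parent parent = new Parent();
        ListElement element = new ListElement();

        // parent.addContent(element);
        // The line above does not compile: both overloads apply and
        // neither parameter type is more specific than the other.

        parent.addContent((Content) element);
        parent.addContent((Collection<? extends Content>) element);
        return parent.calls;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}

// Hypothetical miniature types for illustration; not the JDOM classes.
interface Content { }

// An element that is both Content and a List<Content>.
final class ListElement extends AbstractList<Content> implements Content {
    private final List<Content> children = new ArrayList<>();

    @Override public Content get(int index) { return children.get(index); }
    @Override public int size() { return children.size(); }
}

final class Parent {
    // Records which overload was chosen for each call.
    final List<String> calls = new ArrayList<>();

    void addContent(Content node) { calls.add("single"); }

    void addContent(Collection<? extends Content> nodes) { calls.add("collection"); }
}
```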

So far the results look promising. I have a baseline memory footprint that
I am aiming to improve on, and when I have results it will be easier to
discuss whether the changes would be worth the improvements.

But, for now, it would seem impossible to merge ContentList in to Element
without some compatibility problems...

Rolf


On Mon, 02 Jan 2012 15:27:40 -0500, Rolf <jdom at tuis.net> wrote:
> Hi all.
> 
> Memory optimization has never been a top priority for JDOM. At the same 
> time, for what it does, JDOM is not a 'terrible' memory user. Still, I 
> have done some analysis, and, I believe I can trim about a quarter to a 
> half of 'JDOM Overhead' memory usage by making two 'simple' changes....
> 
> The first is to merge the ContentList class in to the Element class (and
> also in to Document). ....

From mike at saxonica.com  Mon Jan 23 08:59:35 2012
From: mike at saxonica.com (Michael Kay)
Date: Mon, 23 Jan 2012 16:59:35 +0000
Subject: [jdom-interest] JDOM and memory
In-Reply-To: <a8a0cebf065eb28730e6ef7151ba5a7e@tuis.net>
References: <4F02133C.5010704@tuis.net>
	<a8a0cebf065eb28730e6ef7151ba5a7e@tuis.net>
Message-ID: <4F1D91F7.5070404@saxonica.com>

On 23/01/2012 16:28, Rolf Lear wrote:
> Hi all.
>
> I have started on this memory optimization, and it is still in early
> stages. There is one API issue though:
>
> The Element API has the two methods:
>
> addContent(Content node)
> addContent(Collection<? extends Content>  newContent)
>
> if I make Element implement List<Content>  (which is what this
> memory-change will do), then the above two methods become ambiguous because
> Element will be both Content and List<Content>
And that suggests to me that it is a bad idea.

The class hierarchy should reflect "is-a" relationships, it shouldn't be 
designed to tweak performance. It's not true that an Element and its 
contents are the same thing, therefore it's wrong to treat them as being 
the same object. It will only lead to confusion.

You can achieve the memory saving by having Element.getChildren() create 
the returned List object dynamically (it doesn't need to copy any data 
to achieve this).
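
A minimal sketch of that idea, assuming the element keeps its children 
in a plain array: each getChildren() call builds a small view object 
that reads straight from the element's own storage. MiniElement is a 
made-up class for illustration, not the JDOM implementation, and the 
view is read-only to keep the example short.

```java
import java.util.AbstractList;
import java.util.Arrays;
import java.util.List;

// Hypothetical miniature element; not the JDOM implementation.
final class MiniElement {
    private String[] children = new String[4];
    private int size;

    void addChild(String child) {
        if (size == children.length) {
            children = Arrays.copyOf(children, size * 2);
        }
        children[size++] = child;
    }

    // Creates the List on demand. No child data is copied; the view
    // reads the element's current array and size on every access.
    List<String> getChildren() {
        return new AbstractList<String>() {
            @Override
            public String get(int index) {
                if (index < 0 || index >= MiniElement.this.size) {
                    throw new IndexOutOfBoundsException("Index: " + index);
                }
                return MiniElement.this.children[index];
            }

            @Override
            public int size() {
                return MiniElement.this.size;
            }
        };
    }

    public static void main(String[] args) {
        MiniElement element = new MiniElement();
        List<String> view = element.getChildren();
        element.addChild("a");
        element.addChild("b");
        // The view was created before the children were added.
        System.out.println(view);
    }
}
```

Because the view holds no state of its own, several views of the same 
element can be live at once and all see the same children.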

 > The logical thing to do would be to deprecate addContent(Collection)

I don't think that solves the problem. There will be cases where 
existing code fixes up to the wrong method, and ends up adding the 
children of an element to a new parent rather than adding the element 
itself.

Michael Kay
Saxonica

From jdom at tuis.net  Mon Jan 23 09:04:48 2012
From: jdom at tuis.net (Rolf Lear)
Date: Mon, 23 Jan 2012 12:04:48 -0500
Subject: [jdom-interest] JDOM and memory
In-Reply-To: <4F1D91F7.5070404@saxonica.com>
References: <4F02133C.5010704@tuis.net>
	<a8a0cebf065eb28730e6ef7151ba5a7e@tuis.net>
	<4F1D91F7.5070404@saxonica.com>
Message-ID: <3e7472a6ab9c791fd291a7b579a42f53@tuis.net>


Heh... you are right.

Element should not be List<Content>, and the getContent() method can
create a dynamic implementation as needed. That's the solution... Element
already has rules about synchronization so multiple 'active' dynamic
instances should not be a problem....

Thanks. I will play with that concept.

Rolf

On Mon, 23 Jan 2012 16:59:35 +0000, Michael Kay <mike at saxonica.com> wrote:
> On 23/01/2012 16:28, Rolf Lear wrote:
>> Hi all.
>>
>> I have started on this memory optimization, and it is still in early
>> stages. There is one API issue though:
>>
>> The Element API has the two methods:
>>
>> addContent(Content node)
>> addContent(Collection<? extends Content>  newContent)
>>
>> if I make Element implement List<Content>  (which is what this
>> memory-change will do), then the above two methods become ambiguous
>> because Element will be both Content and List<Content>
> And that suggests to me that it is a bad idea.
> 
> The class hierarchy should reflect "is-a" relationships, it shouldn't be
> designed to tweak performance. It's not true that an Element and its 
> contents are the same thing, therefore it's wrong to treat them as being
> the same object. It will only lead to confusion.
> 
> You can achieve the memory saving by having Element.getChildren() create
> the returned List object dynamically (it doesn't need to copy any data 
> to achieve this).
> 
>  > The logical thing to do would be to deprecate addContent(Collection)
> 
> I don't think that solves the problem. There will be cases where 
> existing code fixes up to the wrong method, and ends up adding the 
> children of an element to a new parent rather than adding the element 
> itself.
> 
> Michael Kay
> Saxonica

From jdom at tuis.net  Mon Jan 23 12:15:44 2012
From: jdom at tuis.net (Rolf Lear)
Date: Mon, 23 Jan 2012 15:15:44 -0500
Subject: [jdom-interest] JDOM and memory
In-Reply-To: <CAHzJPEr0Sk3JTPFMVVBAwwOJ989xOZ-GR95QCc8jWTtLXw1MyQ@mail.gmail.com>
References: <4F02133C.5010704@tuis.net>
	<a8a0cebf065eb28730e6ef7151ba5a7e@tuis.net>
	<4F1D91F7.5070404@saxonica.com>
	<3e7472a6ab9c791fd291a7b579a42f53@tuis.net>
	<CAHzJPEr0Sk3JTPFMVVBAwwOJ989xOZ-GR95QCc8jWTtLXw1MyQ@mail.gmail.com>
Message-ID: <8da72d1c7408e42283b49f498eb498d7@tuis.net>


Yes, it is useful.

XOM has nice features, and a lot of that comes from being able to look
back on things with hindsight.... which is a big advantage.

JDOM is not XOM though, and JDOM carries a legacy which is both an
advantage and a disadvantage.

In the past (before I took on JDOM2) I have looked in to XOM, and, having
been a JDOM user I could not personally justify the 'cost' of learning yet
another API that accomplished the same function as JDOM.... it would
require a new jar to deploy, new learning, etc. I can see that someone new
to Java/XML would find XOM appealing... but is it really as good as it
claims? It is hard to tell. What XOM claims to be fluff, others claim to be
useful. What is interesting is reading the list of 'design principles' that
I mostly agree with.... but also some that I don't. 

Now that I know more about JDOM it is interesting to realize that before I
decided to commit myself to JDOM2, one of the considerations I had was
'should I use some other library, or should I make JDOM2 better?' I
investigated XOM then, and decided it was not 'nice', and that I 'prefer'
JDOM. I don't think I am alone in that logical thinking. (I also looked in
to dom4j, DOM, etc.). I did not just 'decide' to do JDOM2, it really is my
belief that all the other libraries are 'behind the curve' when it comes to
usability in the Java5+ world. 

Fundamentally though, I have to change in JDOM what makes sense to change
while taking in to consideration the legacy of JDOM. I think the new
generics application in JDOM2 is very successful from a usability point of
view, and also very compatible with legacy code. It is a 'win'. I don't
think I can agree with Elliotte's comments about the generics
implementation in the Collections API being so broken that it is not worth
using in the XOM API. Not having a List API is one of the big reasons I
didn't try XOM in the past.

So, in the context of this particular mail thread, I strongly believe that
JDOM is doing the right thing by using the List API, and that the
ContentList is a good concept. It is just memory hungry, and I have made
one failed attempt to make it better. I think the next version will be
right. But removing the List API entirely is a 'bad idea'.

In general I am very happy to borrow good concepts from places, for
example, I specifically looked at how XOM does the XPath implementation,
and was surprised at how minimalist it is. So minimalist that it does not
support the full specification... it simply does not support variables... 

Anyway, I think on the whole that JDOM has a reasonable balance between
being comprehensive and being usable. The API on the whole is well
mannered, and in the JDOM2 work I have done I have changed very little of
the API (other than the XPath stuff). It is all functionally compatible
with 1.x, and while there are small deviations at the technical level they
are all 'replacements' (e.g. using enums instead of int-constants).

Regardless, XOM provides a good comparison of functionality.... and a good
measure of what's right and wrong - at least for the aspects that I have
'checked out'.

I am rambling. If you have any particular concepts in XOM (or any other
library) that you like you should point them out!

Rolf


On Mon, 23 Jan 2012 09:23:36 -0800, Joe Bowbeer <joe.bowbeer at gmail.com>
wrote:
> It may be useful to compare and contrast with XOM?
> 
> http://xom.nu/designprinciples.xhtml#d0e389
> 
> On Mon, Jan 23, 2012 at 9:04 AM, Rolf Lear wrote:
> 
>>
>> Heh... you are right.
>>
>> Element should not be List<Content>, and the getContent() method can
>> create a dynamic implementation as needed. That's the solution...
>> Element already has rules about synchronization so multiple 'active'
>> dynamic instances should not be a problem....
>>
>> Thanks. I will play with that concept.
>>
>> Rolf
>>


From leigh.klotz at xerox.com  Tue Jan 24 11:26:41 2012
From: leigh.klotz at xerox.com (Leigh L Klotz Jr)
Date: Tue, 24 Jan 2012 11:26:41 -0800
Subject: [jdom-interest] JDOM 1.1.2 / Saxon 9.4.0.1: namespace xmlns=""
 could not be added as a namespace
Message-ID: <4F1F05F1.9030107@xerox.com>

Has anyone encountered this? It doesn't happen with JDOM 1.1.1, but it 
does happen with JDOM 1.1.2.

Vanilla XSLT transform:

<?xml version="1.0"?>
<xsl:transform version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="*|@*|text()">
<xsl:copy-of select="." />
</xsl:template>
</xsl:transform>

Document with default namespace change and any attribute on the element: 
FAILS:

<?xml version="1.0" encoding="UTF-8"?>
<description>
<foo xmlns="http://example.com/foo">
<bar x="...">...</bar>
</foo>
</description>

Document with default namespace change and no attribute on the element: 
WORKS:
<?xml version="1.0" encoding="UTF-8"?>
<description>
<foo xmlns="http://example.com/foo">
<bar >...</bar>
</foo>
</description>

Here's the error:

org.jdom.IllegalAddException: The namespace xmlns="" could not be added 
as a namespace to "bar": The namespace prefix "" collides with the 
element namespace prefix
     at org.jdom.Element.addNamespaceDeclaration(Element.java:363)
     at org.jdom.input.SAXHandler.transferNamespaces(SAXHandler.java:714)
     at org.jdom.input.SAXHandler.startElement(SAXHandler.java:563)
     at net.sf.saxon.event.ContentHandlerProxy.startContent(ContentHandlerProxy.java:366)
     at net.sf.saxon.event.NamespaceReducer.startContent(NamespaceReducer.java:192)
     at net.sf.saxon.event.ComplexContentOutputter.startContent(ComplexContentOutputter.java:583)
     at net.sf.saxon.tree.tiny.TinyElementImpl.copy(TinyElementImpl.java:350)
     at net.sf.saxon.expr.instruct.CopyOf.processLeavingTail(CopyOf.java:510)
     at net.sf.saxon.expr.instruct.Template.applyLeavingTail(Template.java:212)
     at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:1032)
     at net.sf.saxon.trans.TextOnlyCopyRuleSet.process(TextOnlyCopyRuleSet.java:58)
     at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:1020)
     at net.sf.saxon.Controller.transformDocument(Controller.java:1957)
     at net.sf.saxon.Controller.transform(Controller.java:1803)
     at net.sf.saxon.s9api.XsltTransformer.transform(XsltTransformer.java:430)


I'm using this code fragment to tell Saxon9 to serialize to JDOM:

   import net.sf.saxon.s9api.SAXDestination;
   import org.jdom.input.SAXHandler;
   import org.jdom.transform.JDOMSource;
   import net.sf.saxon.s9api.Destination;

   SAXHandler saxHandler = new SAXHandler();
   Destination saxDestination = new SAXDestination(saxHandler);
   xsltTransformer.setSource(new JDOMSource(document));
   xsltTransformer.setDestination(saxDestination);
   xsltTransformer.transform();

If this isn't a JDOM bug, then I guess it must be a Saxon one.

Leigh.


From jdom at tuis.net  Tue Jan 24 12:30:57 2012
From: jdom at tuis.net (Rolf Lear)
Date: Tue, 24 Jan 2012 15:30:57 -0500
Subject: [jdom-interest] JDOM 1.1.2 / Saxon 9.4.0.1: namespace xmlns=""
 could not be added as a namespace
In-Reply-To: <4F1F05F1.9030107@xerox.com>
References: <4F1F05F1.9030107@xerox.com>
Message-ID: <268d84b06cc20c3cbbe72dbec85d1672@tuis.net>


Hi Leigh.

I am at my office so I can't debug this issue right now... and
additionally I have not played with Saxon XSLT code.

But, inspecting the JDOM 1.1.2 code it is 'clear' that the Saxon code
triggered the following SAX 'events':


...
// maybe some other startPrefixMapping(..., ...);
// indicate that the "" prefix is linked to the "" URI
startPrefixMapping("", "");
startElement("http://example.com/foo", "bar", "bar", attributes);
...


This is a broken chain of SAX events.... it is indicating that the ""
prefix maps to "" (xmlns=""), but then loads the element in the foo
namespace xmlns="http://example.com/foo"

In the particular examples you cite there should be exactly one
startPrefixMapping("", "") call per document and it should happen before
the 'document' start element (or will it be zero calls for "","" since it
is assumed... I forget).

When the new element processes the 'additional' namespace xmlns="" it
finds that the element itself has the "" prefix, but it is mapped to a
different URI. Hence the exception.
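
The collision described above can be detected with a small diagnostic 
handler built only on the JDK's org.xml.sax classes. This is a sketch 
for illustration: PrefixCheckHandler is a made-up class, not part of 
JDOM or Saxon, and it checks only the element's own prefix against the 
mappings declared immediately before it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical diagnostic handler; not part of JDOM or Saxon.
final class PrefixCheckHandler extends DefaultHandler {
    // Mappings declared since the previous startElement call.
    private final Map<String, String> pending = new HashMap<>();
    // Human-readable descriptions of every mismatch found.
    final List<String> problems = new ArrayList<>();

    @Override
    public void startPrefixMapping(String prefix, String uri) {
        pending.put(prefix, uri);
    }

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes atts) {
        int colon = qName.indexOf(':');
        String prefix = colon < 0 ? "" : qName.substring(0, colon);
        String declared = pending.get(prefix);
        // A mapping just declared for the element's own prefix must
        // agree with the namespace URI the element is reported in.
        if (declared != null && !declared.equals(uri)) {
            problems.add("element '" + qName + "' is in namespace '" + uri
                    + "' but prefix '" + prefix + "' was just mapped to '"
                    + declared + "'");
        }
        pending.clear();
    }

    public static void main(String[] args) {
        PrefixCheckHandler handler = new PrefixCheckHandler();
        // The event sequence from the analysis above.
        handler.startPrefixMapping("", "");
        handler.startElement("http://example.com/foo", "bar", "bar",
                new AttributesImpl());
        System.out.println(handler.problems);
    }
}
```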

Now, as to why this is different in 1.1.2 vs. 1.1.1 I am not sure.... and
that in itself is suspicious....

If you have the code in hand you can more easily debug the issue...
(easier than me right now...).

I can load it up in a few hours time and inspect it too. I suspect that
the issue is a Saxon one, but then why the difference between 1.1.1 and
1.1.2 ... I am not sure.

Rolf


On Tue, 24 Jan 2012 11:26:41 -0800, Leigh L Klotz Jr
<leigh.klotz at xerox.com> wrote:
> Has anyone encountered this? It doesn't happen with JDOM 1.1.1, but it 
> does happen with JDOM 1.1.2.
> 
> Vanilla XSLT transform:
> 
> <?xml version="1.0"?>
> <xsl:transform version="1.0" 
> xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
> <xsl:template match="*|@*|text()">
> <xsl:copy-of select="." />
> </xsl:template>
> </xsl:transform>
> 
> Document with default namespace change and any attribute on the element:
> FAILS:
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <description>
> <foo xmlns="http://example.com/foo">
> <bar x="...">...</bar>
> </foo>
> </description>
> 
> Document with default namespace change and no attribute on the element: 
> WORKS:
> <?xml version="1.0" encoding="UTF-8"?>
> <description>
> <foo xmlns="http://example.com/foo">
> <bar >...</bar>
> </foo>
> </description>
> 
> Here's the error:
> 
> org.jdom.IllegalAddException: The namespace xmlns="" could not be added 
> as a namespace to "bar": The namespace prefix "" collides with the 
> element namespace prefix
>      at org.jdom.Element.addNamespaceDeclaration(Element.java:363)
>      at org.jdom.input.SAXHandler.transferNamespaces(SAXHandler.java:714)
>      at org.jdom.input.SAXHandler.startElement(SAXHandler.java:563)
>      at net.sf.saxon.event.ContentHandlerProxy.startContent(ContentHandlerProxy.java:366)
>      at net.sf.saxon.event.NamespaceReducer.startContent(NamespaceReducer.java:192)
>      at net.sf.saxon.event.ComplexContentOutputter.startContent(ComplexContentOutputter.java:583)
>      at net.sf.saxon.tree.tiny.TinyElementImpl.copy(TinyElementImpl.java:350)
>      at net.sf.saxon.expr.instruct.CopyOf.processLeavingTail(CopyOf.java:510)
>      at net.sf.saxon.expr.instruct.Template.applyLeavingTail(Template.java:212)
>      at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:1032)
>      at net.sf.saxon.trans.TextOnlyCopyRuleSet.process(TextOnlyCopyRuleSet.java:58)
>      at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:1020)
>      at net.sf.saxon.Controller.transformDocument(Controller.java:1957)
>      at net.sf.saxon.Controller.transform(Controller.java:1803)
>      at net.sf.saxon.s9api.XsltTransformer.transform(XsltTransformer.java:430)
> 
> 
> I'm using this code fragment to tell Saxon9 to serialize to JDOM:
> 
>    import net.sf.saxon.s9api.SAXDestination;
>    import org.jdom.input.SAXHandler;
>    import net.sf.saxon.s9api.Destination;
> 
>    SAXHandler saxHandler = new SAXHandler();
>    Destination saxDestination = new SAXDestination(saxHandler);
>    xsltTransformer.setSource(new JDOMSource(document));
>    xsltTransformer.setDestination(saxDestination);
>    xsltTransformer.transform();
> 
> If this isn't a JDOM bug, then I guess it must be a Saxon one.
> 
> Leigh.
> 

From leigh.klotz at xerox.com  Tue Jan 24 13:13:19 2012
From: leigh.klotz at xerox.com (Leigh L Klotz Jr)
Date: Tue, 24 Jan 2012 13:13:19 -0800
Subject: [jdom-interest] JDOM 1.1.2 / Saxon 9.4.0.1: namespace xmlns=""
 could not be added as a namespace
In-Reply-To: <268d84b06cc20c3cbbe72dbec85d1672@tuis.net>
References: <4F1F05F1.9030107@xerox.com>
	<268d84b06cc20c3cbbe72dbec85d1672@tuis.net>
Message-ID: <4F1F1EEF.2060603@xerox.com>

Thanks, Rolf.  This is more than enough analysis on your part.  I 
appreciate it.
Leigh.

On 01/24/2012 12:30 PM, Rolf Lear wrote:
>
> Hi Leigh.
>
> I am at my office so I can't debug this issue right now... and
> additionally I have not played with Saxon XSLT code.
>
> but, inspecting the JDOM 1.1.2 code it is 'clear' that the Saxon code
> triggered the following Sax 'events':
>
>
> ...
> // maybe some other startPrefixMapping(..., ...);
> startPrefixMapping("", "");  // indicate that the "" prefix is linked to
> the "" URI
> startElement("http://example.com/foo", "bar", "bar", attributes);
> ...
>
>
> This is a broken chain of SAX events.... it is indicating that the ""
> prefix maps to "" (xmlns=""), but then loads the element in the foo
> namespace xmlns="http://example.com/foo"
>
> In the particular examples you cite there should be exactly one
> startPrefixMapping("", "") call per document and it should happen before
> the 'document' start element (or will it be zero calls for "","" since it
> is assumed... I forget).
>
> when the new element processes the 'additional' namespace xmlns="" it
> finds that the element itself has the "" prefix, but it is mapped to a
> different URI. Hence the exception.
>
> Now, as to why this is different in 1.1.2 vs. 1.1.1 I am not sure.... and
> that in itself is suspicious....
>
> If you have the code in hand you can more easily debug the issue...
> (easier than me right now...).
>
> I can load it up in a few hours time and inspect it too. I suspect that
> the issue is a Saxon one, but then why the difference between 1.1.1 and
> 1.1.2 ... I am not sure.
>
> Rolf
>
>
>
> On Tue, 24 Jan 2012 11:26:41 -0800, Leigh L Klotz Jr
> <leigh.klotz at xerox.com> wrote:
> > Has anyone encountered this? It doesn't happen with JDOM 1.1.1, but it
> > does happen with JDOM 1.1.2.
> >
> > Vanilla XSLT transform:
> >
> > <?xml version="1.0"?>
> > <xsl:transform version="1.0"
> > xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
> > <xsl:template match="*|@*|text()">
> > <xsl:copy-of select="." />
> > </xsl:template>
> > </xsl:transform>
> >
> > Document with default namespace change and any attribute on the element:
> > FAILS:
> >
> > <?xml version="1.0" encoding="UTF-8"?>
> > <description>
> > <foo xmlns="http://example.com/foo">
> > <bar x="...">...</bar>
> > </foo>
> > </description>
> >
> > Document with default namespace change and no attribute on the element:
> > WORKS:
> > <?xml version="1.0" encoding="UTF-8"?>
> > <description>
> > <foo xmlns="http://example.com/foo">
> > <bar >...</bar>
> > </foo>
> > </description>
> >
> > Here's the error:
> >
> > org.jdom.IllegalAddException: The namespace xmlns="" could not be added
> > as a namespace to "bar": The namespace prefix "" collides with the
> > element namespace prefix
> >      at org.jdom.Element.addNamespaceDeclaration(Element.java:363)
> >      at org.jdom.input.SAXHandler.transferNamespaces(SAXHandler.java:714)
> >      at org.jdom.input.SAXHandler.startElement(SAXHandler.java:563)
> >      at net.sf.saxon.event.ContentHandlerProxy.startContent(ContentHandlerProxy.java:366)
> >      at net.sf.saxon.event.NamespaceReducer.startContent(NamespaceReducer.java:192)
> >      at net.sf.saxon.event.ComplexContentOutputter.startContent(ComplexContentOutputter.java:583)
> >      at net.sf.saxon.tree.tiny.TinyElementImpl.copy(TinyElementImpl.java:350)
> >      at net.sf.saxon.expr.instruct.CopyOf.processLeavingTail(CopyOf.java:510)
> >      at net.sf.saxon.expr.instruct.Template.applyLeavingTail(Template.java:212)
> >      at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:1032)
> >      at net.sf.saxon.trans.TextOnlyCopyRuleSet.process(TextOnlyCopyRuleSet.java:58)
> >      at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:1020)
> >      at net.sf.saxon.Controller.transformDocument(Controller.java:1957)
> >      at net.sf.saxon.Controller.transform(Controller.java:1803)
> >      at net.sf.saxon.s9api.XsltTransformer.transform(XsltTransformer.java:430)
> >
> >
> > I'm using this code fragment to tell Saxon9 to serialize to JDOM:
> >
> >    import net.sf.saxon.s9api.SAXDestination;
> >    import org.jdom.input.SAXHandler;
> >    import net.sf.saxon.s9api.Destination;
> >
> >    SAXHandler saxHandler = new SAXHandler();
> >    Destination saxDestination = new SAXDestination(saxHandler);
> >    xsltTransformer.setSource(new JDOMSource(document));
> >    xsltTransformer.setDestination(saxDestination);
> >    xsltTransformer.transform();
> >
> > If this isn't a JDOM bug, then I guess it must be a Saxon one.
> >
> > Leigh.
> >
>


From jdom at tuis.net  Tue Jan 24 15:34:56 2012
From: jdom at tuis.net (Rolf Lear)
Date: Tue, 24 Jan 2012 18:34:56 -0500
Subject: [jdom-interest] JDOM 1.1.2 / Saxon 9.4.0.1: namespace xmlns=""
 could not be added as a namespace
In-Reply-To: <4F1F1EEF.2060603@xerox.com>
References: <4F1F05F1.9030107@xerox.com>
	<268d84b06cc20c3cbbe72dbec85d1672@tuis.net>
	<4F1F1EEF.2060603@xerox.com>
Message-ID: <4F1F4020.2050006@tuis.net>

Hi Leigh.

I have tracked down the issue. It comes from this change I made here:

https://github.com/hunterhacker/jdom/commit/f026e89780b3259fa049fd223ceaacfee16fce65

So, the Saxon code is getting the event fired from the JDOMSource,
which in turn is breaking the Saxon side of things: garbage in, garbage
out (GIGO).

In essence I traded one bug for another.

The original bug was that namespaces used by Attributes were being 
'missed' in the SAX Event stream, but now that they are checked, we need 
to ensure that the no-namespace namespace is excluded.
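
As a rough illustration of the rule described above (not the actual JDOM
source), here is a minimal sketch. The class PrefixMappingSketch, the Ns
type and the mappingsFor() method are all hypothetical names, and
namespace scoping across parent elements is left out. It collects the
prefix mappings an element needs, checks the attribute namespaces too,
and skips the no-namespace namespace of unprefixed attributes:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the outputter logic; not the JDOM source.
class PrefixMappingSketch {

    /** A (prefix, uri) pair. Both empty means the no-namespace namespace. */
    static final class Ns {
        final String prefix;
        final String uri;

        Ns(String prefix, String uri) {
            this.prefix = prefix;
            this.uri = uri;
        }

        boolean isNoNamespace() {
            return prefix.isEmpty() && uri.isEmpty();
        }
    }

    /**
     * Collects the mappings an element needs, as "prefix=uri" strings:
     * the element's own namespace first, then each attribute namespace,
     * skipping the no-namespace namespace of unprefixed attributes.
     */
    static List<String> mappingsFor(Ns elementNs, List<Ns> attributeNamespaces) {
        Map<String, String> byPrefix = new LinkedHashMap<>();
        byPrefix.put(elementNs.prefix, elementNs.uri);
        for (Ns attNs : attributeNamespaces) {
            if (attNs.isNoNamespace()) {
                continue; // would otherwise announce xmlns="" on this element
            }
            byPrefix.putIfAbsent(attNs.prefix, attNs.uri);
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> e : byPrefix.entrySet()) {
            result.add(e.getKey() + "=" + e.getValue());
        }
        return result;
    }
}
```

For the failing <bar x="..."> element, the unprefixed attribute x is in
the no-namespace namespace, so only the element's own default namespace
mapping would be reported.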

It is an easy fix, but a slower process to get JDOM 1.1.3 out.

Rolf

On 24/01/2012 4:13 PM, Leigh L Klotz Jr wrote:
> Thanks, Rolf.  This is more than enough analysis on your part.  I
> appreciate it.
> Leigh.
>
> On 01/24/2012 12:30 PM, Rolf Lear wrote:
>>
>> Hi Leigh.
>>
>> I am at my office so I can't debug this issue right now... and
>> additionally I have not played with Saxon XSLT code.
>>
>> but, inspecting the JDOM 1.1.2 code it is 'clear' that the Saxon code
>> triggered the following Sax 'events':
>>
>>
>> ...
>> // maybe some other startPrefixMapping(..., ...);
>> startPrefixMapping("", "");  // indicate that the "" prefix is linked to
>> the "" URI
>> startElement("http://example.com/foo", "bar", "bar", attributes);
>> ...
>>
>>
>> This is a broken chain of SAX events.... it is indicating that the ""
>> prefix maps to "" (xmlns=""), but then loads the element in the foo
>> namespace xmlns="http://example.com/foo"
>>
>> In the particular examples you cite there should be exactly one
>> startPrefixMapping("", "") call per document and it should happen before
>> the 'document' start element (or will it be zero calls for "","" since it
>> is assumed... I forget).
>>
>> when the new element processes the 'additional' namespace xmlns="" it
>> finds that the element itself has the "" prefix, but it is mapped to a
>> different URI. Hence the exception.
>>
>> Now, as to why this is different in 1.1.2 vs. 1.1.1 I am not sure.... and
>> that in itself is suspicious....
>>
>> If you have the code in hand you can more easily debug the issue...
>> (easier than me right now...).
>>
>> I can load it up in a few hours time and inspect it too. I suspect that
>> the issue is a Saxon one, but then why the difference between 1.1.1 and
>> 1.1.2 ... I am not sure.
>>
>> Rolf
>>
>>
>>
>> On Tue, 24 Jan 2012 11:26:41 -0800, Leigh L Klotz Jr
>> <leigh.klotz at xerox.com> wrote:
>> > Has anyone encountered this? It doesn't happen with JDOM 1.1.1, but it
>> > does happen with JDOM 1.1.2.
>> >
>> > Vanilla XSLT transform:
>> >
>> > <?xml version="1.0"?>
>> > <xsl:transform version="1.0"
>> > xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
>> > <xsl:template match="*|@*|text()">
>> > <xsl:copy-of select="." />
>> > </xsl:template>
>> > </xsl:transform>
>> >
>> > Document with default namespace change and any attribute on the
>> element:
>>
>> > FAILS:
>> >
>> > <?xml version="1.0" encoding="UTF-8"?>
>> > <description>
>> > <foo xmlns="http://example.com/foo">
>> > <bar x="...">...</bar>
>> > </foo>
>> > </description>
>> >
>> > Document with default namespace change and no attribute on the element:
>> > WORKS:
>> > <?xml version="1.0" encoding="UTF-8"?>
>> > <description>
>> > <foo xmlns="http://example.com/foo">
>> > <bar >...</bar>
>> > </foo>
>> > </description>
>> >
>> > Here's the error:
>> >
>> > org.jdom.IllegalAddException: The namespace xmlns="" could not be added
>> > as a namespace to "bar": The namespace prefix "" collides with the
>> > element namespace prefix
>> >      at org.jdom.Element.addNamespaceDeclaration(Element.java:363)
>> >      at
>> org.jdom.input.SAXHandler.transferNamespaces(SAXHandler.java:714)
>> >      at org.jdom.input.SAXHandler.startElement(SAXHandler.java:563)
>> >      at
>> >
>> net.sf.saxon.event.ContentHandlerProxy.startContent(ContentHandlerProxy.java:366)
>>
>> >      at
>> >
>> net.sf.saxon.event.NamespaceReducer.startContent(NamespaceReducer.java:192)
>>
>> >      at
>> >
>> net.sf.saxon.event.ComplexContentOutputter.startContent(ComplexContentOutputter.java:583)
>>
>> >      at
>> > net.sf.saxon.tree.tiny.TinyElementImpl.copy(TinyElementImpl.java:350)
>> >      at
>> > net.sf.saxon.expr.instruct.CopyOf.processLeavingTail(CopyOf.java:510)
>> >      at
>> > net.sf.saxon.expr.instruct.Template.applyLeavingTail(Template.java:212)
>> >      at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:1032)
>> >      at
>> >
>> net.sf.saxon.trans.TextOnlyCopyRuleSet.process(TextOnlyCopyRuleSet.java:58)
>>
>> >      at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:1020)
>> >      at net.sf.saxon.Controller.transformDocument(Controller.java:1957)
>> >      at net.sf.saxon.Controller.transform(Controller.java:1803)
>> >      at
>> > net.sf.saxon.s9api.XsltTransformer.transform(XsltTransformer.java:430)
>> >
>> >
>> > I'm using this code fragment to tell Saxon9 to serialize to JDOM:
>> >
>> >    import net.sf.saxon.s9api.SAXDestination;
>> >    import org.jdom.input.SAXHandler;
>> >    import net.sf.saxon.s9api.Destination;
>> >
>> >    SAXHandler saxHandler = new SAXHandler();
>> >    Destination saxDestination = new SAXDestination(saxHandler);
>> >    xsltTransformer.setSource(new JDOMSource(document));
>> >    xsltTransformer.setDestination(saxDestination);
>> >    xsltTransformer.transform();
>> >
>> > If this isn't a JDOM bug, then I guess it must be a Saxon one.
>> >
>> > Leigh.
>> >
>> > _______________________________________________
>> > To control your jdom-interest membership:
>> > http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com
>>
>


From leigh.klotz at xerox.com  Tue Jan 24 15:52:06 2012
From: leigh.klotz at xerox.com (Leigh L Klotz Jr)
Date: Tue, 24 Jan 2012 15:52:06 -0800
Subject: [jdom-interest] JDOM 1.1.2 / Saxon 9.4.0.1: namespace xmlns=""
 could not be added as a namespace
In-Reply-To: <4F1F4020.2050006@tuis.net>
References: <4F1F05F1.9030107@xerox.com>
	<268d84b06cc20c3cbbe72dbec85d1672@tuis.net>
	<4F1F1EEF.2060603@xerox.com> <4F1F4020.2050006@tuis.net>
Message-ID: <4F1F4426.1050608@xerox.com>

Thanks!  I'll stick with 1.1.1 if this isn't easily fixed.

Leigh.

On 01/24/2012 03:34 PM, Rolf Lear wrote:
>
> Hi Leigh.
>
> I have tracked down the issue. It comes from this change I made here:
>
> https://github.com/hunterhacker/jdom/commit/f026e89780b3259fa049fd223ceaacfee16fce65 
>
>
> So, The Saxon code is getting the event fired from the JDOMSource....
> ... which in turn is breaking the Saxon side of things ... gigo.
>
> In essence I traded one bug for another.
>
> The original bug was that namespaces used by Attributes were being
> 'missed' in the SAX Event stream, but now that they are checked, we need
> to ensure that the no-namespace namespace is excluded.
>
> It is an easy fix, but a slower process to get JDOM 1.1.3 out.
>
> Rolf
>
> On 24/01/2012 4:13 PM, Leigh L Klotz Jr wrote:
> > Thanks, Rolf.  This is more than enough analysis on your part.  I
> > appreciate it.
> > Leigh.
> >
> > On 01/24/2012 12:30 PM, Rolf Lear wrote:
> >>
> >> Hi Leigh.
> >>
> >> I am at my office so I can't debug this issue right now... and
> >> additionally I have not played with Saxon XSLT code.
> >>
> >> but, inspecting the JDOM 1.1.2 code it is 'clear' that the Saxon code
> >> triggered the following Sax 'events':
> >>
> >>
> >> ...
> >> // maybe some other startPrefixMapping(..., ...);
> >> startPrefixMapping("", "");  // indicate that the "" prefix is 
> linked to
> >> the "" URI
> >> startElement("http://example.com/foo", "bar", "bar", attributes);
> >> ...
> >>
> >>
> >> This is a broken chain of SAX events.... it is indicating that the ""
> >> prefix maps to "" (xmlns=""), but then loads the element in the foo
> >> namespace xmlns="http://example.com/foo"
> >>
> >> In the particular examples you cite there should be exactly one
> >> startPrefixMapping("", "") call per document and it should happen 
> before
> >> the 'document' start element (or will it be zero calls for "","" 
> since it
> >> is assumed... I forget).
> >>
> >> when the new element processes the 'additional' namespace xmlns="" it
> >> finds that the element itself has the "" prefix, but it is mapped to a
> >> different URI. Hence the exception.
> >>
> >> Now, as to why this is different in 1.1.2 vs. 1.1.1 I am not 
> sure.... and
> >> that in itself is suspicious....
> >>
> >> If you have the code in hand you can more easily debug the issue...
> >> (easier than me right now...).
> >>
> >> I can load it up in a few hours time and inspect it too. I suspect 
> that
> >> the issue is a Saxon one, but then why the difference between 1.1.1 
> and
> >> 1.1.2 ... I am not sure.
> >>
> >> Rolf
> >>
> >>
> >>
> >> On Tue, 24 Jan 2012 11:26:41 -0800, Leigh L Klotz Jr
> >> <leigh.klotz at xerox.com> wrote:
> >> > Has anyone encountered this? It doesn't happen with JDOM 1.1.1, 
> but it
> >> > does happen with JDOM 1.1.2.
> >> >
> >> > Vanilla XSLT transform:
> >> >
> >> > <?xml version="1.0"?>
> >> > <xsl:transform version="1.0"
> >> > xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
> >> > <xsl:template match="*|@*|text()">
> >> > <xsl:copy-of select="." />
> >> > </xsl:template>
> >> > </xsl:transform>
> >> >
> >> > Document with default namespace change and any attribute on the
> >> element:
> >>
> >> > FAILS:
> >> >
> >> > <?xml version="1.0" encoding="UTF-8"?>
> >> > <description>
> >> > <foo xmlns="http://example.com/foo">
> >> > <bar x="...">...</bar>
> >> > </foo>
> >> > </description>
> >> >
> >> > Document with default namespace change and no attribute on the 
> element:
> >> > WORKS:
> >> > <?xml version="1.0" encoding="UTF-8"?>
> >> > <description>
> >> > <foo xmlns="http://example.com/foo">
> >> > <bar >...</bar>
> >> > </foo>
> >> > </description>
> >> >
> >> > Here's the error:
> >> >
> >> > org.jdom.IllegalAddException: The namespace xmlns="" could not be 
> added
> >> > as a namespace to "bar": The namespace prefix "" collides with the
> >> > element namespace prefix
> >> >      at org.jdom.Element.addNamespaceDeclaration(Element.java:363)
> >> >      at
> >> org.jdom.input.SAXHandler.transferNamespaces(SAXHandler.java:714)
> >> >      at org.jdom.input.SAXHandler.startElement(SAXHandler.java:563)
> >> >      at
> >> >
> >> 
> net.sf.saxon.event.ContentHandlerProxy.startContent(ContentHandlerProxy.java:366)
> >>
> >> >      at
> >> >
> >> 
> net.sf.saxon.event.NamespaceReducer.startContent(NamespaceReducer.java:192)
> >>
> >> >      at
> >> >
> >> 
> net.sf.saxon.event.ComplexContentOutputter.startContent(ComplexContentOutputter.java:583)
> >>
> >> >      at
> >> > 
> net.sf.saxon.tree.tiny.TinyElementImpl.copy(TinyElementImpl.java:350)
> >> >      at
> >> > 
> net.sf.saxon.expr.instruct.CopyOf.processLeavingTail(CopyOf.java:510)
> >> >      at
> >> > 
> net.sf.saxon.expr.instruct.Template.applyLeavingTail(Template.java:212)
> >> >      at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:1032)
> >> >      at
> >> >
> >> 
> net.sf.saxon.trans.TextOnlyCopyRuleSet.process(TextOnlyCopyRuleSet.java:58)
> >>
> >> >      at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:1020)
> >> >      at 
> net.sf.saxon.Controller.transformDocument(Controller.java:1957)
> >> >      at net.sf.saxon.Controller.transform(Controller.java:1803)
> >> >      at
> >> > 
> net.sf.saxon.s9api.XsltTransformer.transform(XsltTransformer.java:430)
> >> >
> >> >
> >> > I'm using this code fragment to tell Saxon9 to serialize to JDOM:
> >> >
> >> >    import net.sf.saxon.s9api.SAXDestination;
> >> >    import org.jdom.input.SAXHandler;
> >> >    import net.sf.saxon.s9api.Destination;
> >> >
> >> >    SAXHandler saxHandler = new SAXHandler();
> >> >    Destination saxDestination = new SAXDestination(saxHandler);
> >> >    xsltTransformer.setSource(new JDOMSource(document));
> >> >    xsltTransformer.setDestination(saxDestination);
> >> >    xsltTransformer.transform();
> >> >
> >> > If this isn't a JDOM bug, then I guess it must be a Saxon one.
> >> >
> >> > Leigh.
> >> >
> >> > _______________________________________________
> >> > To control your jdom-interest membership:
> >> > 
> http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com
> >>
> >
>

From jdom at tuis.net  Wed Jan 25 06:42:27 2012
From: jdom at tuis.net (Rolf Lear)
Date: Wed, 25 Jan 2012 09:42:27 -0500
Subject: [jdom-interest] JDOM 1.x release schedule
Message-ID: <98cae462f08b0dcd7161b15737854dfc@tuis.net>


Hi All.

Given the bug fix pending in the JDOM 1.1.x stream I believe a new release
of JDOM 1.1.x is required.

On the other hand I do not want to be releasing 1.1.x versions for every
issue that arises.

I think a compromise schedule is viable, and it goes something like this:

1. build the current JDOM 1.x stream with the current bug fix in it and
post it on the github download page. Call it
JDOM.1.1.x.hotfix.2012.01.25.zip
2. if any additional bug fixes are needed another hotfix package will be
built.
3. at some fixed point in time we schedule a formal 1.1.3 release that
contains all the fixes.
4. if any bug comes up that is considered to be 'critical' an
earlier-than-schedule release could be made.

In this case, I think 1st March 2012 is a good candidate date... 5 weeks
from now.

Later today I will build the current JDOM 1.x code base as
JDOM.1.1.x.hotfix.2012.01.25 and I will post it to github.
If any other issues arise I will create hotfix updates to address them.
On March 1st I will rebuild the JDOM code again as 1.1.3 and do the formal
release process to www.jdom.org as well as maven-central.

Does this sound like a viable process?

Rolf

From olivier.jaquemet at jalios.com  Wed Jan 25 06:57:44 2012
From: olivier.jaquemet at jalios.com (Olivier Jaquemet)
Date: Wed, 25 Jan 2012 15:57:44 +0100
Subject: [jdom-interest] JDOM 1.x release schedule
In-Reply-To: <98cae462f08b0dcd7161b15737854dfc@tuis.net>
References: <98cae462f08b0dcd7161b15737854dfc@tuis.net>
Message-ID: <4F201868.50002@jalios.com>

Hi Rolf,

This process sounds good to me.
It does provide a valid and official build for people needing quick fixes.
But other users looking for a more "long term support" release are thus 
not required to update too often.

Olivier

On 25/01/2012 15:42, Rolf Lear wrote:
> Hi All.
>
> Given the bug fix pending in the JDOM 1.1.x stream I believe a new release
> of the JDOM 1.1.x is required.
>
> On the other hand I do not want to be releasing 1.1.x versions for every
> issue that arises.
>
> I think a compromise schedule is viable, and it goes something like this:
>
> 1. build the current JDOM 1.x stream with the current bug fix in it and
> post it on the github download page. Call it
> JDOM.1.1.x.hotfix.2012.01.25.zip
> 2. if any additional bug fixes are needed another hotfix package will be
> built.
> 3. at some fixed point in time we schedule a formal 1.1.3 release that
> contains all the fixes.
> 4. if any bug comes up that is considered to be 'critical' an
> earlier-than-schedule release could be made.
>
> In this case, I think 1st March 2012 is a good candidate date... 5 weeks
> from now.
>
> Later today I will build the current JDOM 1.x code base as
> JDOM.1.1.x.hotfix.2012.01.25 and I will post it to github.
> If any other issues arise I will create hotfix updates to address them.
> On March 1st I will rebuild the JDOM code again as 1.1.3 and do the formal
> release process to www.jdom.org as well as maven-central.
>
> Does this sound like a viable process?
>
> Rolf
> _______________________________________________
> To control your jdom-interest membership:
> http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com
>

-- 
Olivier Jaquemet<olivier.jaquemet at jalios.com>
Ingénieur R&D Jalios S.A. - http://www.jalios.com/
@OlivierJaquemet +33970461480


From jdom at tuis.net  Sat Jan 28 07:24:20 2012
From: jdom at tuis.net (Rolf Lear)
Date: Sat, 28 Jan 2012 10:24:20 -0500
Subject: [jdom-interest] JDOM 1.x release schedule
In-Reply-To: <98cae462f08b0dcd7161b15737854dfc@tuis.net>
References: <98cae462f08b0dcd7161b15737854dfc@tuis.net>
Message-ID: <4F241324.8090406@tuis.net>

Hi all.

I believe 1.1.2 is a more reliable version of JDOM than 1.1.1. 
Unfortunately there is already one known new issue in 1.1.2 that affects 
people using the SAXOutputter (which is used in XML Transformations).

This issue will be resolved in 1.1.3.

Until 1.1.3 is released though there is a 'hotfix' for this issue here:
https://github.com/hunterhacker/jdom/downloads
Download the jdom-1.1.2.hf1.zip file. The direct link is:
https://github.com/downloads/hunterhacker/jdom/jdom-1.1.2.hf1.zip

This zip file is in the same format that you would normally download 
from www.jdom.org.

If any other issues come up with 1.1.2 they will be fixed and released 
as a second hotfix package.

All issues found and fixed before 1 March 2012 will be accumulated and 
released as a 1.1.3 on that date.

If you are currently running with 1.1.2 please continue to do so. If you 
run into any issues please report them here on this list, and check the 
open and recently fixed issues on github that were found in 1.1.2: 
https://github.com/hunterhacker/jdom/issues?labels=found+in+1.1.2

Despite recent evidence to the contrary, I do believe that 1.1.2 is more 
stable than 1.1.1.

If you have an issue in 1.1.2 and it has been resolved in the issues 
list above, then please use the most recent 1.1.2 hotfix Jar from the 
downloads page.

Thanks.

Rolf


On 25/01/2012 9:42 AM, Rolf Lear wrote:
>
> Hi All.
>
> Given the bug fix pending in the JDOM 1.1.x stream I believe a new release
> of the JDOM 1.1.x is required.
>
> On the other hand I do not want to be releasing 1.1.x versions for every
> issue that arises.
>


From jdom at tuis.net  Sat Jan 28 08:38:32 2012
From: jdom at tuis.net (Rolf Lear)
Date: Sat, 28 Jan 2012 11:38:32 -0500
Subject: [jdom-interest] JDOM and memory
In-Reply-To: <4F02133C.5010704@tuis.net>
References: <4F02133C.5010704@tuis.net>
Message-ID: <4F242488.4000708@tuis.net>

Hi All ... An update...

I have played with a number of options, and have not had significant 
success with any.

Merging Content-list in to Element has a number of problems:
1. Document and Element end up duplicating a lot of code
2. It changes the API of Document and Element by making them implement 
List<Content>

Document and Element almost always contain content... it is seldom that 
you have empty Elements (there is normally some text at least). As a 
result, the savings of not having to have a content array are limited.

There can be some saving in not having a separate object as the list, 
but it does not amount to much. Given the issues with the API this 
approach does not make sense.

Michael Kay suggested keeping the ContentList independent of the 
Element, and creating an instance when it was referenced in 
getContent(). The problem with this is that the management of 
ConcurrentModification becomes very complicated, and, as far as I can 
tell, essentially impossible if there are multiple different instances of 
the ContentList class for any particular Element. Given that almost all 
Element instances have content, it is not worth the effort to lose the 
ConcurrentModification control, and not actually save any memory in a 
typical use case.

So, neither option for changing the ContentList system is very successful.

On the other hand, it is relatively common to have no Attributes on an 
Element. With some careful changes to the Element class (adding a 
hasAttributes() method and making the AttributeList variable a 'lazy' 
initialised field), in ideal cases we never need to actually create an 
AttributeList instance for the Element. This has a significant impact on 
the 'hamlet' test, where there are essentially no attributes. It has no 
'negative' impact on memory in the worst case either, and it has a 
positive (small but significant) impact on performance.

So, the lazy initialization of AttributeList is a 'win'.
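
To make the idea concrete, here is a minimal sketch of a lazily
initialised attribute list. It is not the JDOM Element source: the class
LazyElement and the isAttributeListAllocated() helper are hypothetical,
and attributes are reduced to name/value string pairs. Only
hasAttributes() is a name taken from the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature of an element class; not the JDOM Element source.
class LazyElement {
    private final String name;

    // Stays null until the first attribute is set, so attribute-free
    // elements never pay for a list object.
    private List<String[]> attributes;

    LazyElement(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    /** Cheap check that never allocates the list. */
    boolean hasAttributes() {
        return attributes != null && !attributes.isEmpty();
    }

    /** Creates the list on first use, then adds or overwrites the value. */
    void setAttribute(String attName, String value) {
        if (attributes == null) {
            attributes = new ArrayList<>(2);
        }
        for (String[] pair : attributes) {
            if (pair[0].equals(attName)) {
                pair[1] = value;
                return;
            }
        }
        attributes.add(new String[] { attName, value });
    }

    /** Read access must also cope with the list never having been created. */
    String getAttributeValue(String attName) {
        if (attributes == null) {
            return null;
        }
        for (String[] pair : attributes) {
            if (pair[0].equals(attName)) {
                return pair[1];
            }
        }
        return null;
    }

    /** For illustration only: reports whether the backing list exists. */
    boolean isAttributeListAllocated() {
        return attributes != null;
    }
}
```

Callers that only read can test hasAttributes() first and skip the
attribute handling entirely, which is where the saving comes from on
documents like 'hamlet'.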

Finally, I have in the past had some success with the concept of 
'reusing' String values. XML Parsers (like SAX, etc.) typically create a 
new String instance for all the variables they pass. For example, the 
Element names, prefixes, etc. are all new instances of String. Thus, if 
you have hundreds of Elements called 'car' in your input XML, you will 
get hundreds of different String Element names with the value 'car'. I 
have built a class that does something similar to String.intern() in 
order to rationalize the hundreds of different-but-equals() values that 
are passed in by the parsers.

I have incorporated this 'caching' class in to a new JDOMFactory called 
'SlimJDOMFactory'. This factory 'normalizes' all String values to a 
single instance of each unique String value. This significantly reduces 
the amount of memory used in the JDOM tree especially if there are lots 
of: similarly named attributes, elements, white-space-padding in 
otherwise empty elements, or between elements. This process is 
significantly slower though...

For example, with the 'hamlet' test case, the 'baseline' memory 
footprint for hamlet in JDOM is 2.27MB in 4.75ms.
With the SlimJDOMFactory it is: 1.77MB in 8.5ms
With Lazy AttributeList it is: 2.06MB in 4.55ms
With both together it is: 1.57MB in 8.3ms

I am pushing both of these changes in to github. The AttributeList is an 
easy one to justify. It is fully compatible with prior code, and it has 
positive memory and performance impacts.

The SlimJDOMFactory is also justifiable when you consider:
1. the user has to decide to use it specifically.
2. The memory saving can be very significant.
3. Even though the parse time is slower, the GC time savings can be 
significant if the document 'hangs around' for a long time - the quicker 
GC time can add up fast.
4. When you have lots of code doing comparisons it is much faster to do 
equals() calls on Strings that are == as well. It saves a hashCode 
calculation as well as a string character scan to prove equals().

Rolf

On 02/01/2012 3:27 PM, Rolf wrote:
> Hi all.
>
> Memory optimization has never been a top priority for JDOM. At the same
> time, for what it does, JDOM is not a 'terrible' memory user. Still, I
> have done some analysis, and, I believe I can trim about a quarter to a
> half of 'JDOM Overhead' memory usage by making two 'simple' changes....
>
> The first is to merge the ContentList class in to the Element class (and
> also in to Document). This will reduce the number of Java objects by
> about half, and that will save about 32 bytes per Element at a minimum
> in a 64-bit JRE. Additionally, by lazy-initialization of the Content
> array, we can save memory on otherwise 'empty' Elements.
>
> This can be done by extending the Element (and perhaps Document) class
> to extend 'List'. It can all be done in a 'backward compatible' way, but
> also leads to some interesting possibilities, like:
>
> for (Content c : element) {
> ... do something
> }
>
> (for backward compatibility, Element.getContent() will return 'this').
>
>
> The second change is to make the AttributeList instance in Element a
> lazy-initialization. This would save memory on all Elements that have no
> attributes, but would have an impact for people who sub-class the
> Element class and may expect the attributes field to be non-null.
>
>
> I am trying to get a feel for how important this sort of optimization
> may be. If there is interest then I will make some changes, and test the
> impact. I may make a separate branch in github to test it out....
>
> If the above changes are unrealistic then I don't think it makes sense
> to even try....
>
> Rolf
> _______________________________________________
> To control your jdom-interest membership:
> http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com
>


From mike at saxonica.com  Sat Jan 28 10:37:43 2012
From: mike at saxonica.com (Michael Kay)
Date: Sat, 28 Jan 2012 18:37:43 +0000
Subject: [jdom-interest] JDOM and memory
In-Reply-To: <4F242488.4000708@tuis.net>
References: <4F02133C.5010704@tuis.net> <4F242488.4000708@tuis.net>
Message-ID: <4F244077.9050901@saxonica.com>


>
>
> Finally, I have in the past had some success with the concept of 
> 'reusing' String values. XML Parsers (like SAX, etc.) typically create 
> a new String instance for all the variables they pass. For example, 
> the Element names, prefixes, etc. are all new instances of String. 
> Thus, if you have hundreds of Elements called 'car' in your input XML, 
> you will get hundreds of different String Element names with the value 
> 'car'. I have built a class that does something similar to 
> String.intern() in order to rationalize the hundreds of 
> different-but-equals() values that are passed in by the parsers.
Have you measured how your optimization compares with the effect of 
setting the http://xml.org/sax/features/string-interning property on the 
SAX parser?

Are you doing the interning in a way that guarantees strings can be 
compared using "==", and if so, are you taking advantage of this when 
doing the comparisons? The big win comes with XPath searches such as 
//x. Does the interning introduce any synchronization? (This is the big 
disadvantage with Saxon's NamePool - it speeds up XPath searching 
substantially, but the contention in a highly concurrent workload can 
become quite significant.)

Are you pooling the QName as a whole, or the local name, prefix and URI 
separately?

Michael Kay
Saxonica
>
> I have incorporated this 'caching' class in to a new JDOMFactory 
> called 'SlimJDOMFactory'. This factory 'normalizes' all String values 
> to a single instance of each unique String value. This significantly 
> reduces the amount of memory used in the JDOM tree especially if there 
> are lots of: similarly named attributes, elements, white-space-padding 
> in otherwise empty elements, or between elements. This process is 
> significantly slower through...
>
> For example, with the 'hamlet' test case, the 'baseline' memory 
> footprint for hamlet in JDOM is 2.27MB in 4.75ms.
> With the SlimJDOMFactory it is: 1.77MB in 8.5ms
> With Lazy AttributeList it is: 2.06MB in 4.55ms
> With the both it is 1.57MB in 8.3ms
>
> I am pushing both of these changes in to github. The AttributeList is 
> an easy one to justify. It is fully compatible with prior code, it has 
> positive memory and perfomance impacts.
>
> The SlimJDOMFactory is also justifiable when you consider:
> 1. the user has to decide to use it specifically.
> 2. The memory saving can be very significant.
> 3. Even though the parse time is slower, the GC time savings can be 
> significant if the document 'hangs around' for a long time - the 
> quicker GC time can add up fast.
> 4. When you have lots of code doing comparisons it is much faster to 
> do equals() calls on Strings that are == as well. It saves a hashCode 
> calculation as well as a string character scan to prove equals().
>
> Rolf
>
> On 02/01/2012 3:27 PM, Rolf wrote:
>> Hi all.
>>
>> Memory optimization has never been a top priority for JDOM. At the same
>> time, for what it does, JDOM is not a 'terrible' memory user. Still, I
>> have done some analysis, and, I believe I can trim about a quarter to a
>> half of 'JDOM Overhead' memory usage by making two 'simple' changes....
>>
>> The first is to merge the ContentList class in to the Element class (and
>> also in to Document). This will reduce the number of Java objects by
>> about half, and that will save about 32 bytes per Element at a minimum
>> in a 64-bit JRE. Additionally, by lazy-initialization of the Content
>> array, we can save memory on otherwise 'empty' Elements.
>>
>> This can be done by extending the Element (and perhaps Document) class
>> to extend 'List'. It can all be done in a 'backward compatible' way, but
>> also leads to some interesting possibilities, like:
>>
>> for (Content c : element) {
>> ... do something
>> }
>>
>> (for backward compatibility, Element.getContent() will return 'this').
>>
>>
>> The second change is to make the AttributeList instance in Element a
>> lazy-initialization. This would save memory on all Elements that have no
>> attributes, but would have an impact for people who sub-class the
>> Element class and may expect the attributes field to be non-null.
>>
>>
>> I am trying to get a feel for how important this sort of optimization
>> may be. If there is interest then I will make some changes, and test the
>> impact. I may make a separate branch in github to test it out....
>>
>> If the above changes are unrealistic then I don't think it makes sense
>> to even try....
>>
>> Rolf
>> _______________________________________________
>> To control your jdom-interest membership:
>> http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com
>>
>
> _______________________________________________
> To control your jdom-interest membership:
> http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com
>


From jdom at tuis.net  Sat Jan 28 11:42:02 2012
From: jdom at tuis.net (Rolf Lear)
Date: Sat, 28 Jan 2012 14:42:02 -0500
Subject: [jdom-interest] JDOM and memory
In-Reply-To: <4F244077.9050901@saxonica.com>
References: <4F02133C.5010704@tuis.net> <4F242488.4000708@tuis.net>
	<4F244077.9050901@saxonica.com>
Message-ID: <4F244F8A.5020709@tuis.net>

On 28/01/2012 1:37 PM, Michael Kay wrote:
>
>>
>>
>> Finally, I have in the past had some success with the concept of
>> 'reusing' String values. XML Parsers (like SAX, etc.) typically create
>> a new String instance for all the variables they pass. For example,
>> the Element names, prefixes, etc. are all new instances of String.
>> Thus, if you have hundreds of Elements called 'car' in your input XML,
>> you will get hundreds of different String Element names with the value
>> 'car'. I have built a class that does something similar to
>> String.intern() in order to rationalize the hundreds of
>> different-but-equals() values that are passed in by the parsers.
> Have you measured how your optimization compares with the effect of
> setting the http://xml.org/sax/features/string-interning property on the
> SAX parser?
>
> Are you doing the interning in a way that guarantees strings can be
> compared using "==", and if so, are you taking advantage of this when
> doing the comparisons? .The big win comes with XPath searches such as
> //x. Does the interning introduce any synchronization? (This is the big
> disadvantage with Saxon's NamePool - it speeds up XPath searching
> substantially, but the contention in a highly concurrent workload can
> become quite significant.)
>
> Are you pooling the QName as a whole, or the local name, prefix and URI
> separately?
>
> Michael Kay
> Saxonica

Hi Michael,

In answer to your questions...

No, I have not compared against the string-interning property. I was not 
aware of that. But, reading the documentation, it says: All element 
names, prefixes, attribute names, Namespace URIs, and local names are 
internalized using java.lang.String.intern.

This is *not* a good thing. String.intern() uses PermGen space to intern 
the value (as if the value is a String constant in the code). PermGen 
space is typically limited to a hundred or so megabytes. I have, in the 
past, run in to significant issues where you get OutOfMemory issues when 
String.intern is used liberally.... and changing -Xmx makes no 
difference... very confusing the first time you run in to that....

So, I have not compared against the string-interning setting of the SAX 
parser. And I would not recommend that people use that unless they know 
what they are doing, and what sort of data they have.
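
For anyone who wants to measure it anyway, here is a minimal sketch of
requesting that setting on a plain JAXP/SAX parser. In SAX2 the URI is a
feature flag, so it goes through XMLReader.setFeature(). The class name
StringInterningProbe and its methods are made up for this example, and
whether a given parser accepts the request is parser-dependent, so the
sketch reports the outcome rather than assuming it. With JDOM the same
URI should be passable through SAXBuilder.setFeature().

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

// Hypothetical helper for experimenting with the SAX2 string-interning feature.
class StringInterningProbe {
    static final String FEATURE = "http://xml.org/sax/features/string-interning";

    /** Creates an XMLReader from the default JAXP parser on the classpath. */
    static XMLReader newReader() {
        try {
            return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        } catch (Exception e) {
            // Wrapped so callers do not need to declare checked exceptions.
            throw new IllegalStateException("No usable SAX parser", e);
        }
    }

    /**
     * Asks the parser to intern names and URIs.
     * Returns true only if the parser accepted the request.
     */
    static boolean requestInterning(XMLReader reader) {
        try {
            reader.setFeature(FEATURE, true);
            return reader.getFeature(FEATURE);
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            return false; // this parser does not offer the feature
        }
    }
}
```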

The mechanism I do use is based on previous experience with this sort of 
problem, and it works by using a memory-efficient hash table to store 
unique instances of String. Subsequent lookups in the hash table 
return the previously stored String value, if any. Because the 
hash table is not global, and because it is not linked in 
to any core Java structures, you cannot guarantee == based comparisons, 
but, in many cases, String.equals() returns immediately because you 
are in fact comparing identical instances and the first line of 
String.equals() does the == comparison.
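
To illustrate the idea only (this is not the StringBin code, which is 
described above as a memory-efficient hash table of its own), a 
single-threaded cache of this kind can be sketched with a plain HashMap. 
SimpleStringCache and its method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical single-threaded string cache; NOT the JDOM StringBin implementation. */
final class SimpleStringCache {

    // Maps each distinct value to the first instance seen with that value.
    private final Map<String, String> cache = new HashMap<String, String>();

    /** Returns the previously stored instance that equals() the argument, if any. */
    String reuse(String value) {
        if (value == null) {
            return null;
        }
        String cached = cache.get(value);
        if (cached == null) {
            cache.put(value, value);
            return value;
        }
        return cached;
    }

    /** Drops all cached instances so they can be garbage collected. */
    void clear() {
        cache.clear();
    }

    int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        SimpleStringCache cache = new SimpleStringCache();
        String first = cache.reuse(new String("car"));
        String second = cache.reuse(new String("car"));
        // Same cache: the second lookup returns the first instance.
        System.out.println(first == second); // true

        // A different cache knows nothing about 'first', so == is not guaranteed.
        String other = new SimpleStringCache().reuse(new String("car"));
        System.out.println(first == other);      // false
        System.out.println(first.equals(other)); // true
    }
}
```

Because the cache is an ordinary heap object, it goes away with whatever 
owns it, which is the main difference from String.intern().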

My method does not use any synchronization, and I expect each JDOM 
builder to have its own cache, possibly for the duration of a single 
parse only. It makes a difference on small-scale items only. I have in 
the past built a thread-safe and 'global' type cache using similar 
principles, and it is a good concept, but it would be overkill here. 
With JDOM in particular you do not want large memory structures hanging 
around... and limiting this cache to a single builder is about the right 
sort of compromise. Further, because I have implemented it in a new 
JDOMFactory implementation, it is easy for the JDOM user to manage how 
long it lives, and they can call SlimJDOMFactory.clearCache() to 
remove any previously cached String instances. In other words, the JDOM 
user can use it as much or as little as they want (but not concurrently).

In my testing the Jaxen-based XPath expressions are in fact faster with 
the 'cached' string values ... about 1ms faster on a 30ms process... not 
very significant (not significant enough to be purely attributable to 
that ...).

So, it is a single-threaded cache that reuses previously cached values. 
It can be applied to a single, or consecutive processes, and the cache 
itself is available outside the SlimJDOMFactory if people want to borrow 
that code in their own way.

In my experience, the benefit of this sort of caching is most obvious in 
a GC-monitored environment, where the GC times can be substantially 
shortened.... but it is not easily measured.

Rolf

From jdom at tuis.net  Sat Jan 28 14:02:32 2012
From: jdom at tuis.net (Rolf Lear)
Date: Sat, 28 Jan 2012 17:02:32 -0500
Subject: [jdom-interest] JDOM and memory
In-Reply-To: <45B3B70B-6BB9-4D18-A3D9-5B5844948B9D@hoplahup.net>
References: <4F02133C.5010704@tuis.net> <4F242488.4000708@tuis.net>
	<4F244077.9050901@saxonica.com> <4F244F8A.5020709@tuis.net>
	<45B3B70B-6BB9-4D18-A3D9-5B5844948B9D@hoplahup.net>
Message-ID: <4F247078.2050102@tuis.net>


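// Exhausts PermGen by holding references to a growing number of interned Strings.
public class OOM {
	public static void main(String[] args) {
		int i = 0;
		// Keep a strong reference to every interned String so that it cannot be collected.
		String[] strings = new String[10000000];
		try {
			while (true) {
				i++;
				strings[i] = ("Number " + i).intern();
				// Report progress every 100,000 strings.
				if (0 == (i % 100000)) {
					System.out.println(strings[i]);
				}
			}
		} catch (Throwable t) {
			// Reached on OutOfMemoryError (or if the array fills up first).
			System.out.println("Last was " + i);
		}
	}
}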


.....
Number 700000
Number 800000
Number 900000
Exception in thread "RMI TCP Connection(idle)" 
java.lang.OutOfMemoryError: PermGen space
Last was 984460


I had to store the result in the 'strings' array... I learned something 
... Java 6 does GC in the perm-gen space.... I watched it clearing out 
the values in the JVisualVM monitor.... but keeping a reference to the 
intern'd string causes OOM as expected.

In many places 1,000,000 strings is not a lot....

Rolf


On 28/01/2012 4:17 PM, Paul Libbrecht wrote:
> Interesting,
>
> the very first thing I did when writing OmdocJdom, a library with
> subclasses for each element type, is to use string-interning. I do not
> believe you can reach Out-Of-Memory by having such a diversity in
> element names, prefixes, etc... unless you are building a kind of super
> generic editor or modifier. 100Mb of strings is quite a lot (far more
> than all DTDs I've been touching thus far in my life I believe). We
> never ran into OOM for this (but with Lucene we did).
>
> paul
>
>
> On 28 Jan 2012 at 20:42, Rolf Lear wrote:
>
>> no, I have not compared against string-interning property. I was not
>> aware of that. But, reading the documentation, it says: All element
>> names, prefixes, attribute names, Namespace URIs, and local names are
>> internalized using java.lang.String.intern.
>>
>> This is *not* a good thing. String.intern() uses PermGen space to
>> intern the value (as if the value is a String constant in the code).
>> PermGen space is typically limited to a hundred or so megabytes. I
>> have, in the past, run in to significant issues where you get
>> OutOfMemory issues when String.intern is used liberally.... and
>> changing -Xmx makes no difference... very confusing the first time you
>> run in to that....
>>
>> So, I have not compared, to string-intern of the SAX parser. And I
>> would not recommend that people use that unless they know what they
>> are doing, and what sort of data they have.
>


From mike at saxonica.com  Sat Jan 28 14:31:20 2012
From: mike at saxonica.com (Michael Kay)
Date: Sat, 28 Jan 2012 22:31:20 +0000
Subject: [jdom-interest] JDOM and memory
In-Reply-To: <4F247078.2050102@tuis.net>
References: <4F02133C.5010704@tuis.net> <4F242488.4000708@tuis.net>
	<4F244077.9050901@saxonica.com> <4F244F8A.5020709@tuis.net>
	<45B3B70B-6BB9-4D18-A3D9-5B5844948B9D@hoplahup.net>
	<4F247078.2050102@tuis.net>
Message-ID: <4F247738.9080207@saxonica.com>


> In many places 1,000,000 strings is not a lot....
>
The Saxon NamePool is optimized for much lower numbers than this: it's 
rare to have more than a couple of thousand element and attribute names. 
The only time I've seen large numbers reached is with pathological 
applications that generate random namespace prefixes.

Michael Kay
Saxonica

From jdom at tuis.net  Sat Jan 28 14:56:47 2012
From: jdom at tuis.net (Rolf Lear)
Date: Sat, 28 Jan 2012 17:56:47 -0500
Subject: [jdom-interest] JDOM and memory
In-Reply-To: <df032ece-6d14-4b24-bdd5-6556156f893d@email.android.com>
References: <4F02133C.5010704@tuis.net> <4F242488.4000708@tuis.net>
	<4F244077.9050901@saxonica.com> <4F244F8A.5020709@tuis.net>
	<45B3B70B-6BB9-4D18-A3D9-5B5844948B9D@hoplahup.net>
	<4F247078.2050102@tuis.net>
	<df032ece-6d14-4b24-bdd5-6556156f893d@email.android.com>
Message-ID: <4F247D2F.2000803@tuis.net>

(I did a reply, not reply all, so it did not go to the list).

I disagree.... it is element names, attribute names, and in the case of 
SlimJDOMFactory, it is the XML Text content (whitespace padding between 
elements is ripe for reuse). If you put JDOM in something like a Tomcat 
server with long-running applications, the PermGen space dies pretty 
fast.... especially with the way that Tomcat has multiple classloaders, etc.

I consider it to be bad practice for a library to make routine use of 
the PermGen space.

I modified the example slightly... added timing to it... and then I 
compared it to the StringBin tool I built.... ;-)

Here are the two code examples:

public class OOM {
	public static void main(String[] args) {
		int i = 0;
		String[] strings = new String[10000000];
		long time = System.currentTimeMillis();
		try {
			while (true) {
				i++;
				strings[i] = ("Number " + i).intern();
				if (0 == (i % 100000)) {
					System.out.printf("%s at %.4f/ms\n", strings[i], (1.0 * i) / 
(System.currentTimeMillis() - time));
				}
			}
		} catch (Error t) {
			System.out.println("Last was " + i);
			throw t;
		}
	}
}


and second example:

public class OOMSB {
	public static void main(String[] args) {
		int i = 0;
		String[] strings = new String[10000000];
		StringBin sb = new StringBin();
		long time = System.currentTimeMillis();
		try {
			while (true) {
				i++;
				strings[i] = sb.reuse("Number " + i);
				if (0 == (i % 100000)) {
					System.out.printf("%s at %.4f/ms\n", strings[i], (1.0 * i) / 
(System.currentTimeMillis() - time));
				}
			}
		} catch (Error t) {
			System.out.println("Last was " + i);
			throw t;
		}
	}
}


The String.intern() fails at:
Number 500000 at 99.1080/ms
Number 600000 at 79.0306/ms
Number 700000 at 65.1405/ms
Number 800000 at 54.7608/ms
Number 900000 at 46.9851/ms
Number 1000000 at 40.9920/ms
Last was 1043637
Exception in thread "main" java.lang.OutOfMemoryError: PermGen space
	at java.lang.String.intern(Native Method)
	at net.tuis.debug.OOM.main(OOM.java:12)


The StringBin version fails at.....

Number 9500000 at 693.2788/ms
Number 9600000 at 697.7758/ms
Number 9700000 at 701.9829/ms
Number 9800000 at 706.4081/ms
Number 9900000 at 596.7810/ms
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 
10000000
	at net.tuis.debug.OOMSB.main(OOMSB.java:15)


Another reason to not use String.intern..... it is slow... ;-)

Rolf


On 28/01/2012 5:08 PM, Paul Libbrecht wrote:
>
>
>
> Rolf Lear <jdom at tuis.net> wrote:
>
>> ... Java 6 does GC in the perm-gen space.... I watched it clearing out
>> the values in the JVisualVM monitor.... but keeping a reference to the
>> intern'd string causes OOM as expected.
>
> very cute example, thanks for that!
>
>> In many places 1,000,000 strings is not a lot....
>
> I agree, but not in element names!
>
> paul
>


From jdom at tuis.net  Sat Jan 28 16:46:52 2012
From: jdom at tuis.net (Rolf Lear)
Date: Sat, 28 Jan 2012 19:46:52 -0500
Subject: [jdom-interest] JDOM and memory
In-Reply-To: <CAHzJPEqHSF0CQXLL6QFhNrF+JG_wQHfgMa-sMK=tK7kLkHddGA@mail.gmail.com>
References: <4F02133C.5010704@tuis.net> <4F242488.4000708@tuis.net>
	<CAHzJPEqHSF0CQXLL6QFhNrF+JG_wQHfgMa-sMK=tK7kLkHddGA@mail.gmail.com>
Message-ID: <4F2496FC.30307@tuis.net>

Hi Joe.

Thanks for that. I have run into the problem before with the backing 
array not being the same as the actual String content. In the StringBin 
code I specifically account for that: 
https://github.com/hunterhacker/jdom/blob/master/core/src/java/org/jdom2/util/StringBin.java#L371

In essence, it ensures the String is as compact as possible.
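
As a rough illustration of what 'compact' means here (an assumption-laden 
sketch, not the code at the link above), copying the characters into a 
fresh String guarantees that the result is not backed by a larger shared 
array. CompactStrings and compact() are hypothetical names. On JVMs where 
substring() already copies, the extra copy is redundant but harmless.

```java
/** Hypothetical helper; not taken from StringBin. */
final class CompactStrings {

    private CompactStrings() {
        // static helper only
    }

    /**
     * Returns a String equal to the argument whose backing storage holds
     * only its own characters, so no larger source array is retained.
     */
    static String compact(String s) {
        if (s == null) {
            return null;
        }
        // toCharArray() copies exactly s.length() chars; new String(char[]) copies again.
        return new String(s.toCharArray());
    }

    public static void main(String[] args) {
        String document = "HelloWorld";
        // On older JVMs this substring could share the 10-char array of 'document'.
        String name = document.substring(0, 5);
        String compacted = compact(name);
        System.out.println(compacted.equals(name)); // true
        System.out.println(compacted == name);      // false, it is a fresh copy
    }
}
```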

Rolf

On 28/01/2012 7:10 PM, Joe Bowbeer wrote:
> A per-document string pool is a feature of binary xml formats.
>
> A potential problem with per-factory string pooling is the possibility
> of retaining large character arrays.  Android's String class description
> explains the problem:
>
>     This class is implemented using a char[]. The length of the array
>     may exceed the length of the string. For example, the string "Hello"
>     may be backed by the array |['H', 'e', 'l', 'l', 'o', 'W', 'o', 'r',
>     'l', 'd']| with offset 0 and length 5.
>     Multiple strings can share the same char[] because strings are
>     immutable. The |substring(int)
>     <http://developer.android.com/reference/java/lang/String.html#substring(int)>| method
>     *always* returns a string that shares the backing array of its
>     source string. Generally this is an optimization: fewer character
>     arrays need to be allocated, and less copying is necessary. But this
>     can also lead to unwanted heap retention. Taking a short substring
>     of long string means that the long shared char[] won't be garbage
>     until both strings are garbage. This typically happens when parsing
>     small substrings out of a large input. To avoid this where
>     necessary, call |new String(longString.substring(...))|. The string
>     copy constructor always ensures that the backing array is no larger
>     than necessary.
>
>
> ...from http://developer.android.com/reference/java/lang/String.html
>
> If xml parsers create new strings, is it to avoid retaining the entire
> source document?
>
> I suggest choosing a name for the Slim factory that is more descriptive
> of what it does, as "slim" may depend on taste and application.
>
> Joe
>
> On Sat, Jan 28, 2012 at 8:38 AM, Rolf Lear wrote:
>
>     Hi All ... An update...
>
>     I have played with a number of options, and have not had significant
>     success with any.
>
>     Merging Content-list in to Element has a number of problems:
>     1. Document and Element end up duplicating a lot of code
>     2. It changes the API of Document and Element with it implementing
>     List<Content>
>
>     Document and Element almost always contain content... it is seldom
>     that you have empty Elements (there is normally some text at least).
>     As a result, the savings of not having to have a content array are
>     limited.
>
>     There can be some saving in not having a separate object as the
>     list, but it does not amount to much. Given the issues with the API
>     this approach does not make sense.
>
>     Michael Kay suggested keeping the ContentList independent of the
>     Element, and creating an instance when it was referenced in
>     getContent(). The problem with this is that the management of
>     ConcurrentModification becomes very complicated, and, as far as I
>     can tell, essentially impossible if there are multiple different
>     instances of the ContentList class for any particular Element. Given
>     that almost all Element instances have content, it is not worth the
>     effort to lose the ConcurrentModification control, and not actually
>     save any memory in a typical use case.
>
>     So, neither option for changing the ContentList system is very
>     successful.
>
>     On the other hand, it is relatively common to have no Attributes on
>     an Element, and with some careful changes to the Element class
>     (adding a hasAttributes() method and making the AttributeList
>     variable a lazily initialised field), in ideal cases we never
>     need to actually create an AttributeList instance for the Element.
>     This has a significant impact on the 'hamlet' test, where there are
>     essentially no attributes. It has no 'negative' impact on memory in
>     the worst case either, and it has positive (small but significant)
>     impact on performance.
>
>     So, the lazy initialization of AttributeList is a 'win'.
>
>     Finally, I have in the past had some success with the concept of
>     'reusing' String values. XML Parsers (like SAX, etc.) typically
>     create a new String instance for all the variables they pass. For
>     example, the Element names, prefixes, etc. are all new instances of
>     String. Thus, if you have hundreds of Elements called 'car' in your
>     input XML, you will get hundreds of different String Element names
>     with the value 'car'. I have built a class that does something
>     similar to String.intern() in order to rationalize the hundreds of
>     different-but-equals() values that are passed in by the parsers.
>
>     I have incorporated this 'caching' class in to a new JDOMFactory
>     called 'SlimJDOMFactory'. This factory 'normalizes' all String
>     values to a single instance of each unique String value. This
>     significantly reduces the amount of memory used in the JDOM tree
>     especially if there are lots of: similarly named attributes,
>     elements, white-space-padding in otherwise empty elements, or
>     between elements. This process is significantly slower though...
>
>     For example, with the 'hamlet' test case, the 'baseline' memory
>     footprint for hamlet in JDOM is 2.27MB in 4.75ms.
>     With the SlimJDOMFactory it is: 1.77MB in 8.5ms
>     With Lazy AttributeList it is: 2.06MB in 4.55ms
>     With both it is: 1.57MB in 8.3ms
>
>     I am pushing both of these changes in to github. The AttributeList
>     is an easy one to justify. It is fully compatible with prior code,
>     it has positive memory and performance impacts.
>
>     The SlimJDOMFactory is also justifiable when you consider:
>     1. the user has to decide to use it specifically.
>     2. The memory saving can be very significant.
>     3. Even though the parse time is slower, the GC time savings can be
>     significant if the document 'hangs around' for a long time - the
>     quicker GC time can add up fast.
>     4. When you have lots of code doing comparisons it is much faster to
>     do equals() calls on Strings that are == as well. It saves a
>     hashCode calculation as well as a string character scan to prove
>     equals().
>
>     Rolf
>
>
>     On 02/01/2012 3:27 PM, Rolf wrote:
>
>         Hi all.
>
>         Memory optimization has never been a top priority for JDOM. At
>         the same
>         time, for what it does, JDOM is not a 'terrible' memory user.
>         Still, I
>         have done some analysis, and, I believe I can trim about a
>         quarter to a
>         half of 'JDOM Overhead' memory usage by making two 'simple'
>         changes....
>
>         The first is to merge the ContentList class in to the Element
>         class (and
>         also in to Document). This will reduce the number of Java objects by
>         about half, and that will save about 32 bytes per Element at a
>         minimum
>         in a 64-bit JRE. Additionally, by lazy-initialization of the Content
>         array, we can save memory on otherwise 'empty' Elements.
>
>         This can be done by extending the Element (and perhaps Document)
>         class
>         to extend 'List'. It can all be done in a 'backward compatible'
>         way, but
>         also leads to some interesting possibilities, like:
>
>         for (Content c : element) {
>         ... do something
>         }
>
>         (for backward compatibility, Element.getContent() will return
>         'this').
>
>
>         The second change is to make the AttributeList instance in Element a
>         lazy-initialization. This would save memory on all Elements that
>         have no
>         attributes, but would have an impact for people who sub-class the
>         Element class and may expect the attributes field to be non-null.
>
>
>         I am trying to get a feel for how important this sort of
>         optimization
>         may be. If there is interest then I will make some changes, and
>         test the
>         impact. I may make a separate branch in github to test it out....
>
>         If the above changes are unrealistic then I don't think it makes
>         sense
>         to even try....
>
>         Rolf
>
>
>
> _______________________________________________
> To control your jdom-interest membership:
> http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com


From jdom at tuis.net  Sat Jan 28 16:49:18 2012
From: jdom at tuis.net (Rolf Lear)
Date: Sat, 28 Jan 2012 19:49:18 -0500
Subject: [jdom-interest] JDOM and memory
In-Reply-To: <4F247738.9080207@saxonica.com>
References: <4F02133C.5010704@tuis.net> <4F242488.4000708@tuis.net>
	<4F244077.9050901@saxonica.com> <4F244F8A.5020709@tuis.net>
	<45B3B70B-6BB9-4D18-A3D9-5B5844948B9D@hoplahup.net>
	<4F247078.2050102@tuis.net> <4F247738.9080207@saxonica.com>
Message-ID: <4F24978E.4070201@tuis.net>


On 28/01/2012 5:31 PM, Michael Kay wrote:
>
>> In many places 1,000,000 strings is not a lot....
>>
> The Saxon NamePool is optimized for much lower numbers than this: it's
> rare to have more than a couple of thousand element and attribute names.
> The only time I've seen large numbers reached is with pathological
> applications that generate random namespace prefixes.
>
> Michael Kay
> Saxonica
>

I addressed this in a mail that I inadvertently sent to Paul only, not 
to the list. I have corrected that now.

The issue is not so much the content of one document, but the content of 
all data in a JVM. Tomcat is a prime example. Because it uses a separate 
Classloader for each installed application, it has multiple 
copies of classes in the perm-gen. The PermGen space is limited to start 
with.... then, if these applications are doing JDOM processing, you 
are in trouble if JDOM uses the PermGen space for 'scratch' data.

PermGen is a non-obvious component of Java. Novices do not know of it, 
do not understand its purpose, and do not know how to debug it. By way 
of example, I ran into it using intern() and it took me days to figure 
out where the memory was going.... (years ago). Perhaps that is why I am 
so sensitive to it. Similarly, do a search for 'Tomcat PermGen' and you 
quickly understand how precious PermGen space is; it is not to be 
squandered on something that is easy to replace on the heap.

Rolf

From jdom at tuis.net  Sat Jan 28 17:41:07 2012
From: jdom at tuis.net (Rolf Lear)
Date: Sat, 28 Jan 2012 20:41:07 -0500
Subject: [jdom-interest] JDOM and memory
In-Reply-To: <4F244077.9050901@saxonica.com>
References: <4F02133C.5010704@tuis.net> <4F242488.4000708@tuis.net>
	<4F244077.9050901@saxonica.com>
Message-ID: <4F24A3B3.8080803@tuis.net>

I have now compared the results of string-interning to the String-cache 
code.

The 'raw' code (neither SlimJDOMFactory nor string-interning) is:
2.06MB @ 4.55ms
The SlimJDOMFactory is:
1.57MB @ 8ms
The string-interning SAX Feature is:
2.06MB @ 6.1ms

Not sure how I got essentially zero improvement of memory.... got 
something wrong..... no... been checking, but I think the difference in 
using String.intern() on element names only is so insignificant that it 
does not feature as much as 1%.....  perhaps all the difference is 
coming in whitespace....

Not worth checking into it.... I don't believe String.intern() is 
the right answer regardless.

Rolf


On 28/01/2012 1:37 PM, Michael Kay wrote:
>
>>
>>
>> Finally, I have in the past had some success with the concept of
>> 'reusing' String values. XML Parsers (like SAX, etc.) typically create
>> a new String instance for all the variables they pass. For example,
>> the Element names, prefixes, etc. are all new instances of String.
>> Thus, if you have hundreds of Elements called 'car' in your input XML,
>> you will get hundreds of different String Element names with the value
>> 'car'. I have built a class that does something similar to
>> String.intern() in order to rationalize the hundreds of
>> different-but-equals() values that are passed in by the parsers.
> Have you measured how your optimization compares with the effect of
> setting the http://xml.org/sax/features/string-interning property on the
> SAX parser?
>
> Are you doing the interning in a way that guarantees strings can be
> compared using "==", and if so, are you taking advantage of this when
> doing the comparisons? The big win comes with XPath searches such as
> //x. Does the interning introduce any synchronization? (This is the big
> disadvantage with Saxon's NamePool - it speeds up XPath searching
> substantially, but the contention in a highly concurrent workload can
> become quite significant.)
>
> Are you pooling the QName as a whole, or the local name, prefix and URI
> separately?
>
> Michael Kay
> Saxonica


From paul at hoplahup.net  Sun Jan 29 02:58:18 2012
From: paul at hoplahup.net (Paul Libbrecht)
Date: Sun, 29 Jan 2012 11:58:18 +0100
Subject: [jdom-interest] JDOM and memory
In-Reply-To: <4F24A3B3.8080803@tuis.net>
References: <4F02133C.5010704@tuis.net> <4F242488.4000708@tuis.net>
	<4F244077.9050901@saxonica.com> <4F24A3B3.8080803@tuis.net>
Message-ID: <CDF1FE5D-3C28-4C83-A35E-AB8A38B227AF@hoplahup.net>

Rolf,

I do know there are applications (such as the ones Michael reported about, which generate random prefixes) for which any form of pooling is dangerous; and you show that there are situations where interning performs worse than other pooling methods (I think hashCode might be seen as guilty but that can't be changed).

Nonetheless, I believe the design that we had, where the element names were interned, is common: in the server application in question, the ActiveMath learning environment, the element names are everywhere in the Java code as well, e.g. for comparison within if statements. So for this, interning is actually better than pooling overall. 

I'm convinced many JDOM users have this approach; using JDOM is cute for Java programming, not for XSLT friends that only see the world as pipelines translatable into a set of unix xsltproc calls.

I would suggest the following:
- make this configurable
- make this subclassable and exploitable

That is to let e.g. SAXBuilder have a method:

    public String makePooledName(String)

which would then call the right pooling method (String.intern() for those who want it, SlimJDOMFactory's by default?, nothing for those who fear retention).

Would that be in SAXBuilder or JDOMFactory? I'm afraid there's no global JDOM config object; that would be the place, e.g. so that it could also be called from new Element("name").
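
To make the suggestion concrete, here is a rough sketch of what such a pluggable hook could look like. Everything in it is hypothetical: JDOM has no NamePooler interface, and the names NamePooler, NamePoolers, interning(), perInstance() and none() are invented for this illustration. Only makePooledName(String) comes from the proposal above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical hook; nothing like this exists in JDOM. */
interface NamePooler {
    String makePooledName(String name);
}

/** Three hypothetical strategies matching the three options listed above. */
final class NamePoolers {

    private NamePoolers() {
        // static factory methods only
    }

    /** JVM-wide pooling through String.intern(); results can be compared with ==. */
    static NamePooler interning() {
        return new NamePooler() {
            public String makePooledName(String name) {
                return name.intern();
            }
        };
    }

    /** Pooling that lives only as long as this pooler; single-threaded use only. */
    static NamePooler perInstance() {
        final Map<String, String> pool = new HashMap<String, String>();
        return new NamePooler() {
            public String makePooledName(String name) {
                String pooled = pool.get(name);
                if (pooled == null) {
                    pool.put(name, name);
                    pooled = name;
                }
                return pooled;
            }
        };
    }

    /** No pooling at all, for those who fear retention. */
    static NamePooler none() {
        return new NamePooler() {
            public String makePooledName(String name) {
                return name;
            }
        };
    }

    public static void main(String[] args) {
        // With interning, a parsed name is identical to the literal used in Java code.
        String parsed = interning().makePooledName(new String("car"));
        System.out.println(parsed == "car"); // true
    }
}
```

With the interning strategy, application code that compares element names against Java literals can use ==; with the other two it has to stay with equals().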

paul


Le 29 janv. 2012 ? 02:41, Rolf Lear a ?crit :

> I have now compared the results of string-interning to the String-cache code.
> 
> The 'raw' code (neither SlimJDOMFactory nor string-interning) is:
> 2.06MB @ 4.55ms
> The SlimJDOMFactory is:
> 1.57MB @ 8ms
> The string-interning SAX Feature is:
> 2.06MB @ 6.1ms
> 
> Not sure how I got essentially zero improvement of memory.... got something wrong..... no... been checking, but I think the difference in using String.intern() on element names only is so insignificant that it does not feature as much as 1%.....  perhaps all the difference is coming in whitespace....
> 
> Not worth checking into it.... I don't believe String.intern() is the right answer regardless.
> 
> Rolf
> 
> 
> On 28/01/2012 1:37 PM, Michael Kay wrote:
>> 
>>> 
>>> 
>>> Finally, I have in the past had some success with the concept of
>>> 'reusing' String values. XML Parsers (like SAX, etc.) typically create
>>> a new String instance for all the variables they pass. For example,
>>> the Element names, prefixes, etc. are all new instances of String.
>>> Thus, if you have hundreds of Elements called 'car' in your input XML,
>>> you will get hundreds of different String Element names with the value
>>> 'car'. I have built a class that does something similar to
>>> String.intern() in order to rationalize the hundreds of
>>> different-but-equals() values that are passed in by the parsers.
>> Have you measured how your optimization compares with the effect of
>> setting the http://xml.org/sax/features/string-interning property on the
>> SAX parser?
>> 
>> Are you doing the interning in a way that guarantees strings can be
>> compared using "==", and if so, are you taking advantage of this when
>> doing the comparisons? .The big win comes with XPath searches such as
>> //x. Does the interning introduce any synchronization? (This is the big
>> disadvantage with Saxon's NamePool - it speeds up XPath searching
>> substantially, but the contention in a highly concurrent workload can
>> become quite significant.)
>> 
>> Are you pooling the QName as a whole, or the local name, prefix and URI
>> separately?
>> 
>> Michael Kay
>> Saxonica
>>> 
>>> I have incorporated this 'caching' class in to a new JDOMFactory
>>> called 'SlimJDOMFactory'. This factory 'normalizes' all String values
>>> to a single instance of each unique String value. This significantly
>>> reduces the amount of memory used in the JDOM tree especially if there
>>> are lots of: similarly named attributes, elements, white-space-padding
>>> in otherwise empty elements, or between elements. This process is
>>> significantly slower through...
>>> 
>>> For example, with the 'hamlet' test case, the 'baseline' memory
>>> footprint for hamlet in JDOM is 2.27MB in 4.75ms.
>>> With the SlimJDOMFactory it is: 1.77MB in 8.5ms
>>> With Lazy AttributeList it is: 2.06MB in 4.55ms
>>> With the both it is 1.57MB in 8.3ms
>>> 
>>> I am pushing both of these changes in to github. The AttributeList is
>>> an easy one to justify. It is fully compatible with prior code, it has
>>> positive memory and perfomance impacts.
>>> 
>>> The SlimJDOMFactory is also justifiable when you consider:
>>> 1. the user has to decide to use it specifically.
>>> 2. The memory saving can be very significant.
>>> 3. Even though the parse time is slower, the GC time savings can be
>>> significant if the document 'hangs around' for a long time - the
>>> quicker GC time can add up fast.
>>> 4. When you have lots of code doing comparisons it is much faster to
>>> do equals() calls on Strings that are == as well. It saves a hashCode
>>> calculation as well as a string character scan to prove equals().
>>> 
>>> Rolf
>>> 
>>> On 02/01/2012 3:27 PM, Rolf wrote:
>>>> Hi all.
>>>> 
>>>> Memory optimization has never been a top priority for JDOM. At the same
>>>> time, for what it does, JDOM is not a 'terrible' memory user. Still, I
>>>> have done some analysis, and, I believe I can trim about a quarter to a
>>>> half of 'JDOM Overhead' memory usage by making two 'simple' changes....
>>>> 
>>>> The first is to merge the ContentList class in to the Element class (and
>>>> also in to Document). This will reduce the number of Java objects by
>>>> about half, and that will save about 32 bytes per Element at a minimum
>>>> in a 64-bit JRE. Additionally, by lazy-initialization of the Content
>>>> array, we can save memory on otherwise 'empty' Elements.
>>>> 
>>>> This can be done by extending the Element (and perhaps Document) class
>>>> to extend 'List'. It can all be done in a 'backward compatible' way, but
>>>> also leads to some interesting possibilities, like:
>>>> 
>>>> for (Content c : element) {
>>>> ... do something
>>>> }
>>>> 
>>>> (for backward compatibility, Element.getContent() will return 'this').
>>>> 
>>>> 
>>>> The second change is to make the AttributeList instance in Element a
>>>> lazy-initialization. This would save memory on all Elements that have no
>>>> attributes, but would have an impact for people who sub-class the
>>>> Element class and may expect the attributes field to be non-null.
>>>> 
>>>> 
>>>> I am trying to get a feel for how important this sort of optimization
>>>> may be. If there is interest then I will make some changes, and test the
>>>> impact. I may make a separate branch in github to test it out....
>>>> 
>>>> If the above changes are unrealistic then I don't think it makes sense
>>>> to even try....
>>>> 
>>>> Rolf
>>>> _______________________________________________
>>>> To control your jdom-interest membership:
>>>> http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com
>>>> 


From jdom at tuis.net  Sun Jan 29 03:44:48 2012
From: jdom at tuis.net (Rolf Lear)
Date: Sun, 29 Jan 2012 06:44:48 -0500
Subject: [jdom-interest] JDOM and memory
In-Reply-To: <CDF1FE5D-3C28-4C83-A35E-AB8A38B227AF@hoplahup.net>
References: <4F02133C.5010704@tuis.net> <4F242488.4000708@tuis.net>
	<4F244077.9050901@saxonica.com> <4F24A3B3.8080803@tuis.net>
	<CDF1FE5D-3C28-4C83-A35E-AB8A38B227AF@hoplahup.net>
Message-ID: <c38188de8d93cbda6bfa67333cf16746@tuis.net>


Hi all.

Just to be clear, the 'SlimJDOMFactory' is not a default setting.

By default people will use:

SAXBuilder builder = new SAXBuilder();

If you want to have a smaller memory footprint (but also a slower parse)
you can:

SAXBuilder builder = new SAXBuilder(new SlimJDOMFactory());

So, these changes are not affecting anything by default.
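
For anyone wondering what 'normalizing' the Strings amounts to, here is a
minimal sketch of the idea. It is not the actual SlimJDOMFactory code: the
class name StringPool and the method reuse() are made up for illustration,
and it assumes a plain HashMap is good enough as the cache.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-document String cache; NOT the real JDOM class. */
final class StringPool {
    // Maps each distinct value to the first instance seen with that value.
    private final Map<String, String> seen = new HashMap<>();

    /** Returns the first instance seen so far that equals() the argument. */
    String reuse(String value) {
        if (value == null) {
            return null;
        }
        String cached = seen.putIfAbsent(value, value);
        return cached != null ? cached : value;
    }

    /** Number of distinct String values held by this pool. */
    int size() {
        return seen.size();
    }
}
```

The cache in this sketch lives only as long as the object that owns it, so
once the pool is discarded it no longer keeps any Strings alive.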

What I am hearing is that there is value in an 'InterningJDOMFactory'
which will do a String.intern() on element and attribute names? That should
be easy to arrange... but doing more than just the Element and Attribute
names is likely to cause issues in PermGen (the SlimJDOMFactory can do
'everything', including the XML Text and CDATA sections).
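
To make the trade-off concrete, here is a small sketch of what interning
only the names would buy. The class InternDemo and its methods are
hypothetical (there is no 'InterningJDOMFactory' yet); only String.intern()
itself is a real API. Once a name has been interned it can be compared
against a literal with ==, because string literals are interned by the JVM.

```java
/** Hypothetical helper showing interned-name comparison; not JDOM code. */
final class InternDemo {

    /** Interns a name, as an 'InterningJDOMFactory' might do for names only. */
    static String internName(String name) {
        return name == null ? null : name.intern();
    }

    /**
     * Identity comparison against a literal. This is only reliable when the
     * argument has been interned; an un-interned String with the same
     * characters is a different object and will not match.
     */
    static boolean isCar(String internedName) {
        return internedName == "car";
    }
}
```

A name that comes straight from the parser as a fresh String will fail the
== test until it has gone through internName(), which is why this only
helps code that knows every name in the tree was interned.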

Regardless, I sense some anxiety about the SlimJDOMFactory, but it is
something the user needs to opt in to, so it is very 'safe'.
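
The opt-in idea, together with the makePooledName(String) hook suggested in
the quoted message below, could be sketched roughly as follows. Everything
here is an assumption for illustration: NamePooling and its three methods
do not exist in JDOM, and where such a hook would live (SAXBuilder or
JDOMFactory) is still an open question.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical pluggable name-pooling strategies; none of this is JDOM API. */
final class NamePooling {

    /** No pooling: every name is kept exactly as the parser supplied it. */
    static UnaryOperator<String> none() {
        return s -> s;
    }

    /** JVM-wide pooling through String.intern(); names must be non-null. */
    static UnaryOperator<String> interning() {
        return String::intern;
    }

    /** Pooling in a private map that can be discarded with its owner. */
    static UnaryOperator<String> cached() {
        Map<String, String> seen = new HashMap<>();
        return s -> {
            String previous = seen.putIfAbsent(s, s);
            return previous != null ? previous : s;
        };
    }
}
```

The caller picks one strategy up front and applies it to every name, so the
default (no pooling) stays unchanged for anyone who does not ask for it.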

Rolf  


On Sun, 29 Jan 2012 11:58:18 +0100, Paul Libbrecht <paul at hoplahup.net>
wrote:
> Rolf,
> 
> I do know there are applications (such as the ones Michael reported
> about, which generate random prefixes) for which any form of pooling is
> dangerous; and you show that there are situations where interning
> performs worse than other pooling methods (I think hashCode might be
> seen as guilty but that can't be changed).
> 
> Nonetheless, I believe the design that we had, where the element names
> were interned, is common: in the server application that was there, the
> ActiveMath learning environment, the element names are everywhere in the
> Java code as well, e.g. for comparison within if statements. So for this,
> interning is actually better than pooling overall.
> 
> I'm convinced many JDOM users have this approach; using JDOM is cute for
> Java programming, not for XSLT friends that only see the world as
> pipelines translatable into a set of unix xsltproc calls.
> 
> I would suggest the following:
> - make this configurable
> - make this subclassable and exploitable
> 
> That is to let e.g. SAXBuilder have a method:
> 
>     public String makePooledName(String)
> 
> which would then call the right interning method (String.intern for
> those who want it, SlimJDOMFactory's by default?, nothing for those who
> fear retention).
> 
> Would that be in SAXBuilder or JDOMFactory? I'm afraid there's no global
> JDOM config object; that would be the place, e.g. also to be called from
> new Element("name").
> 
> paul
> 
> 
> On 29 Jan 2012 at 02:41, Rolf Lear wrote:
> 
>> I have now compared the results of string-interning to the String-cache
>> code.
>> 
>> The 'raw' code (neither SlimJDOMFactory nor string-interning) is:
>> 2.06MB @ 4.55ms
>> The SlimJDOMFactory is:
>> 1.57MB @ 8ms
>> The string-interning SAX Feature is:
>> 2.06MB @ 6.1ms
>> 
>> Not sure how I got essentially zero improvement in memory.... got
>> something wrong..... no... been checking, but I think the difference in
>> using String.intern on element names only is so insignificant that it
>> does not feature as much as 1%.....  perhaps all the difference is
>> coming in whitespace....
>> 
>> Not worth checking in to it.... I don't believe the String.intern() is
>> the right answer regardless.
>> 
>> Rolf
>> 
>> 
>> On 28/01/2012 1:37 PM, Michael Kay wrote:
>>> 
>>>> 
>>>> 
>>>> Finally, I have in the past had some success with the concept of
>>>> 'reusing' String values. XML Parsers (like SAX, etc.) typically
>>>> create a new String instance for all the variables they pass. For
>>>> example, the Element names, prefixes, etc. are all new instances of
>>>> String. Thus, if you have hundreds of Elements called 'car' in your
>>>> input XML, you will get hundreds of different String Element names
>>>> with the value 'car'. I have built a class that does something similar
>>>> to String.intern() in order to rationalize the hundreds of
>>>> different-but-equals() values that are passed in by the parsers.
>>> Have you measured how your optimization compares with the effect of
>>> setting the http://xml.org/sax/features/string-interning property on
>>> the SAX parser?
>>> 
>>> Are you doing the interning in a way that guarantees strings can be
>>> compared using "==", and if so, are you taking advantage of this when
>>> doing the comparisons? The big win comes with XPath searches such as
>>> //x. Does the interning introduce any synchronization? (This is the
>>> big disadvantage with Saxon's NamePool - it speeds up XPath searching
>>> substantially, but the contention in a highly concurrent workload can
>>> become quite significant.)
>>> 
>>> Are you pooling the QName as a whole, or the local name, prefix and
>>> URI separately?
>>> 
>>> Michael Kay
>>> Saxonica
>>>> 
>>>> I have incorporated this 'caching' class in to a new JDOMFactory
>>>> called 'SlimJDOMFactory'. This factory 'normalizes' all String values
>>>> to a single instance of each unique String value. This significantly
>>>> reduces the amount of memory used in the JDOM tree especially if
>>>> there are lots of: similarly named attributes, elements,
>>>> white-space-padding in otherwise empty elements, or between elements.
>>>> This process is significantly slower though...
>>>> 
>>>> For example, with the 'hamlet' test case, the 'baseline' memory
>>>> footprint for hamlet in JDOM is 2.27MB in 4.75ms.
>>>> With the SlimJDOMFactory it is: 1.77MB in 8.5ms
>>>> With Lazy AttributeList it is: 2.06MB in 4.55ms
>>>> With both it is: 1.57MB in 8.3ms
>>>> 
>>>> I am pushing both of these changes in to github. The AttributeList is
>>>> an easy one to justify. It is fully compatible with prior code, it
>>>> has positive memory and performance impacts.
>>>> 
>>>> The SlimJDOMFactory is also justifiable when you consider:
>>>> 1. the user has to decide to use it specifically.
>>>> 2. The memory saving can be very significant.
>>>> 3. Even though the parse time is slower, the GC time savings can be
>>>> significant if the document 'hangs around' for a long time - the
>>>> quicker GC time can add up fast.
>>>> 4. When you have lots of code doing comparisons it is much faster to
>>>> do equals() calls on Strings that are == as well. It saves a hashCode
>>>> calculation as well as a string character scan to prove equals().
>>>> 
>>>> Rolf
>>>> 
>>>> On 02/01/2012 3:27 PM, Rolf wrote:
>>>>> Hi all.
>>>>> 
>>>>> Memory optimization has never been a top priority for JDOM. At the
>>>>> same time, for what it does, JDOM is not a 'terrible' memory user.
>>>>> Still, I have done some analysis, and, I believe I can trim about a
>>>>> quarter to a half of 'JDOM Overhead' memory usage by making two
>>>>> 'simple' changes....
>>>>> 
>>>>> The first is to merge the ContentList class in to the Element class
>>>>> (and also in to Document). This will reduce the number of Java
>>>>> objects by about half, and that will save about 32 bytes per Element
>>>>> at a minimum in a 64-bit JRE. Additionally, by lazy-initialization of
>>>>> the Content array, we can save memory on otherwise 'empty' Elements.
>>>>> 
>>>>> This can be done by extending the Element (and perhaps Document)
>>>>> class to extend 'List'. It can all be done in a 'backward compatible'
>>>>> way, but also leads to some interesting possibilities, like:
>>>>> 
>>>>> for (Content c : element) {
>>>>> ... do something
>>>>> }
>>>>> 
>>>>> (for backward compatibility, Element.getContent() will return
>>>>> 'this').
>>>>> 
>>>>> 
>>>>> The second change is to make the AttributeList instance in Element a
>>>>> lazy-initialization. This would save memory on all Elements that
>>>>> have no attributes, but would have an impact for people who sub-class
>>>>> the Element class and may expect the attributes field to be non-null.
>>>>> 
>>>>> 
>>>>> I am trying to get a feel for how important this sort of
>>>>> optimization may be. If there is interest then I will make some
>>>>> changes, and test the impact. I may make a separate branch in github
>>>>> to test it out....
>>>>> 
>>>>> If the above changes are unrealistic then I don't think it makes
>>>>> sense to even try....
>>>>> 
>>>>> Rolf