From jdom at tuis.net Sun Jan 1 16:57:33 2012 From: jdom at tuis.net (Rolf) Date: Sun, 01 Jan 2012 19:57:33 -0500 Subject: [jdom-interest] JDOM 2 Alpha release Message-ID: <4F0100FD.4020107@tuis.net> Hi all and Happy New Year! I have just uploaded a new JDOM2 'package' jdom-2.x-2012.01.01.19.15 which I am designating as the JDOM2 "Alpha" Release. Find it here: https://github.com/hunterhacker/jdom/downloads The JDOM2 pages have been updated to match the JavaDoc API, Code coverage, and Unit-test results as well. See the 'entry' page here: https://github.com/hunterhacker/jdom/wiki/JDOM-2.0#wiki-links There are eleven 'issues' currently outstanding. None of them are bugs in the core functionality of JDOM. In other words, this JDOM2 Alpha release has no known bugs. Please help with this and ensure we discover any gremlins sooner rather than later. It is my expectation that for regular users there will be very few 'interface' changes between now and JDOM2 final release. There may be some 'transparent' extensions to the API, and there may/will be changes to the 'sub-classing' API, so if you have custom sub-classes of JDOM code then you will probably want to pay special attention. If you *do* have sub-classes of JDOM code now is a very important time to test JDOM2 to see if your code will break, and how JDOM2 can best be adapted/fixed to continue to support your custom requirements. To create some form of 'deadline' for JDOM2 I intend to (provisionally): - 2 Feb GroundHog Day! all current issues resolved - submit any issues to the mailing list if you encounter any. Deadline for new feature requests/enhancements - mail the list if you have any. - 14 Feb 'Valentine' *BETA* Release on 14th February - may shift depending on any large enhancements/requests. 
- 29 Feb 'Leap Day' Second *BETA* - All class/method signatures 'locked' Bug Fixing only - 9 Apr 'Easter' JDOM2 Release So, please get playing with JDOM2, if you don't provide feedback in this time period there's a good chance there will not be an opportunity later to get that 'sweet' feature in that you want. Please Note =========== I believe this release is 'stable' in the sense that the code is fully functional. I believe that while there may be bugs, the code is generally in good condition, and it can be trusted to do 'the right thing' with nearly as much confidence as JDOM 1.1.2. This is an alpha release though, and the expectation is that there will be some issues with the code, and I fully expect there to be small changes to some method/interface calls as the need arises. Happy Coding! Rolf From jdom at tuis.net Mon Jan 2 12:27:40 2012 From: jdom at tuis.net (Rolf) Date: Mon, 02 Jan 2012 15:27:40 -0500 Subject: [jdom-interest] JDOM and memory Message-ID: <4F02133C.5010704@tuis.net> Hi all. Memory optimization has never been a top priority for JDOM. At the same time, for what it does, JDOM is not a 'terrible' memory user. Still, I have done some analysis, and, I believe I can trim about a quarter to a half of 'JDOM Overhead' memory usage by making two 'simple' changes.... The first is to merge the ContentList class in to the Element class (and also in to Document). This will reduce the number of Java objects by about half, and that will save about 32 bytes per Element at a minimum in a 64-bit JRE. Additionally, by lazy-initialization of the Content array, we can save memory on otherwise 'empty' Elements. This can be done by extending the Element (and perhaps Document) class to extend 'List'. It can all be done in a 'backward compatible' way, but also leads to some interesting possibilities, like: for (Content c : element) { ... do something } (for backward compatibility, Element.getContent() will return 'this'). 
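As a rough illustration of the first change (a sketch only, not JDOM code): the hypothetical `LazyNode` class below acts as its own content list and allocates its backing array only when the first child is added. The class name, the initial capacity of 4, and the doubling growth policy are all assumptions made for the example.

```java
import java.util.AbstractList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an Element that is its own content list.
// Not JDOM code: the name, capacity and growth policy are assumptions.
class LazyNode extends AbstractList<Object> {
    private final String name;
    private Object[] content; // stays null until the first child is added
    private int size;

    LazyNode(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    // Backward-compatible accessor: the node itself is the list.
    List<Object> getContent() {
        return this;
    }

    // True once the backing array has been allocated.
    boolean hasAllocatedContent() {
        return content != null;
    }

    @Override
    public Object get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("Index: " + index);
        }
        return content[index];
    }

    @Override
    public int size() {
        return size;
    }

    @Override
    public void add(int index, Object child) {
        if (index < 0 || index > size) {
            throw new IndexOutOfBoundsException("Index: " + index);
        }
        if (content == null) {
            content = new Object[4]; // lazy allocation on first use
        } else if (size == content.length) {
            content = Arrays.copyOf(content, size * 2);
        }
        // Shift the tail right by one slot to make room for the new child.
        System.arraycopy(content, index, content, index + 1, size - index);
        content[index] = child;
        size++;
        modCount++; // lets AbstractList iterators detect concurrent changes
    }
}
```

With this shape, `for (Object c : node) { ... }` works directly on the node, and a node with no children never pays for an array.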
The second change is to make the AttributeList instance in Element a lazy-initialization. This would save memory on all Elements that have no attributes, but would have an impact for people who sub-class the Element class and may expect the attributes field to be non-null. I am trying to get a feel for how important this sort of optimization may be. If there is interest then I will make some changes, and test the impact. I may make a separate branch in github to test it out.... If the above changes are unrealistic then I don't think it makes sense to even try.... Rolf From jdom at tuis.net Tue Jan 3 05:22:39 2012 From: jdom at tuis.net (Rolf Lear) Date: Tue, 03 Jan 2012 08:22:39 -0500 Subject: [jdom-interest] Maven build Message-ID: <95d7ddad8eb3087da3ba87cb56c5a785@tuis.net> Hi all. I am going to start playing with the concept of loading the 'snapshot' builds up on to maven central as 'SNAPSHOT' type builds. This is to ensure I get some 'practice' before the final JDOM2.0 release. If you are currently using maven to load your JDOM 1.x jars you should ensure that you set your maven version dependencies correctly so that you do not start pulling any JDOM 2.x jars. My understanding is that if I label the versions as SNAPSHOT then they should be ignored by you, but, for everyone's peace of mind, in your 'real' development environments you should restrict your dependencies to version 1.1.2 only I expect to start 'playing' with this in the next week or so. 
Rolf From mike at saxonica.com Wed Jan 4 15:05:12 2012 From: mike at saxonica.com (Michael Kay) Date: Wed, 04 Jan 2012 23:05:12 +0000 Subject: [jdom-interest] XML Schema classification help In-Reply-To: References: Message-ID: <4F04DB28.20409@saxonica.com> On 04/01/2012 19:11, cliff palmer wrote: > I need to examine XML documents contained in multiple columns in a > database table with over a million rows and identify each of the > different structures used for the XML data, producing a count if the > number of instances that use each structure. > > I thought of using the SAXParser then creating a list of the XML > headers in the order used and storing each unique list and > accumulating a count based on matching an already encountered list > object, but I am hoping there is a less cumbersome approach. > > I would appreciate any and all suggestions. > You've chosen an odd place to ask the question, since there's nothing specific in JDOM that will help you. The key thing you need to do is to define what are the rules for your taxonomy. Presumably it's something more complex than categorizing documents by the name of their root element, or the namespaces they use. But presumably a document with four paragraphs and two images and one with five paragraphs and no images go in the same bucket. So what are the rules? Michael Kay Saxonica From mike at saxonica.com Wed Jan 4 16:13:15 2012 From: mike at saxonica.com (Michael Kay) Date: Thu, 05 Jan 2012 00:13:15 +0000 Subject: [jdom-interest] XML Schema classification help In-Reply-To: <4F04E4C2.7020708@tuis.net> References: <4F04AE51.2060104@tuis.net> <4F04E4C2.7020708@tuis.net> Message-ID: <4F04EB1B.1050809@saxonica.com> > > Unfortunately (for you), this is not something that I think there is > an easy, or preexisting solution for (nothing comes to mind). 
> Well, there are a number of tools that generate a schema from an instance (including my own venerable DTDGenerator) but it's far from clear that two instances belong in the same bucket if and only if such a tool imputes the same schema for both instances. Michael Kay Saxonica From palmercliff at gmail.com Tue Jan 10 10:46:07 2012 From: palmercliff at gmail.com (cliff palmer) Date: Tue, 10 Jan 2012 13:46:07 -0500 Subject: [jdom-interest] Finding XPath location for an Element Message-ID: I'd like to be able to find the XPath search or the node hierarchy for an Element. For example, if the Element is in: I'd like to have either the XPath search argument ("/a//b//c//d) or the list of nodes in the elements parents ("a b c d"). Is there a method that returns this? Cliff From mj-lists at expertsystems.se Tue Jan 10 11:01:37 2012 From: mj-lists at expertsystems.se (Mattias Jiderhamn) Date: Tue, 10 Jan 2012 20:01:37 +0100 Subject: [jdom-interest] Finding XPath location for an Element Message-ID: <4F0C8B11.9060301@expertsystems.se> while(node != null) { ... // Build XPath or list bottom up node = node.getParent(); } ----- Original Message ----- Subject: [jdom-interest] Finding XPath location for an Element Date: Tue, 10 Jan 2012 13:46:07 -0500 From: cliff palmer I'd like to be able to find the XPath search or the node hierarchy for an Element. For example, if the Element is in: I'd like to have either the XPath search argument ("/a//b//c//d) or the list of nodes in the elements parents ("a b c d"). Is there a method that returns this? 
Cliff
_______________________________________________
To control your jdom-interest membership:
http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com
--

From jdom at tuis.net Tue Jan 10 11:07:06 2012
From: jdom at tuis.net (Rolf Lear)
Date: Tue, 10 Jan 2012 14:07:06 -0500
Subject: [jdom-interest] Finding XPath location for an Element
In-Reply-To:
References:
Message-ID: <6fd1973aec9fe8c2294c7727eba3b221@tuis.net>

Hi Cliff

No method 'native' to JDOM, but the code is simple (and you can 'season' to taste...):

String xpath = "";
Element p = element;
while (p != null) {
    xpath = "/" + p.getName() + xpath;
    p = p.getParentElement();
}
System.out.println(xpath);

But, the problem is that this will get *all* 'd' Elements that have an ancestry with the same XPath

Rolf

On Tue, 10 Jan 2012 13:46:07 -0500, cliff palmer wrote:
> I'd like to be able to find the XPath search or the node hierarchy for
> an Element. For example, if the Element is in:
>
>
>
>
>
>
>
>
> I'd like to have either the XPath search argument ("/a//b//c//d) or
> the list of nodes in the elements parents ("a b c d").
>
> Is there a method that returns this?
>
> Cliff
> _______________________________________________
> To control your jdom-interest membership:
> http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com

From paul at hoplahup.net Tue Jan 10 11:37:32 2012
From: paul at hoplahup.net (Paul Libbrecht)
Date: Tue, 10 Jan 2012 20:37:32 +0100
Subject: [jdom-interest] Finding XPath location for an Element
In-Reply-To: <6fd1973aec9fe8c2294c7727eba3b221@tuis.net>
References: <6fd1973aec9fe8c2294c7727eba3b221@tuis.net>
Message-ID: <8671B1F0-694B-4686-9E02-99F275EA7279@hoplahup.net>

Isn't the trick to compute the index as in b[0] ??
Lists of jdom using indexOf, elementA.getChildren('b').indexOf(elementB) are perfect for this.

paul

Le 10 janv. 2012 à
20:07, Rolf Lear a écrit :

>
> Hi Cliff
>
> No method 'native' to JDOM, but the code is simple (and you can 'season'
> to taste...):
>
> String xpath = "";
> Element p = element;
> while (p != null) {
>     xpath = "/" + p.getName() + xpath;
>     p = p.getParentElement();
> }
> System.out.println(xpath);
>
> But, the problem is that this will get *all* 'd' Elements that have an
> ancestry with the same XPath
>
> Rolf
>
> On Tue, 10 Jan 2012 13:46:07 -0500, cliff palmer
> wrote:
>> I'd like to be able to find the XPath search or the node hierarchy for
>> an Element. For example, if the Element is in:
>>
>>
>>
>>
>>
>>
>>
>>
>> I'd like to have either the XPath search argument ("/a//b//c//d) or
>> the list of nodes in the elements parents ("a b c d").
>>
>> Is there a method that returns this?
>>
>> Cliff
>> _______________________________________________
>> To control your jdom-interest membership:
>> http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com
> _______________________________________________
> To control your jdom-interest membership:
> http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com

From jdom at tuis.net Tue Jan 10 12:14:05 2012
From: jdom at tuis.net (Rolf Lear)
Date: Tue, 10 Jan 2012 15:14:05 -0500
Subject: [jdom-interest] Finding XPath location for an Element
In-Reply-To: <8671B1F0-694B-4686-9E02-99F275EA7279@hoplahup.net>
References: <6fd1973aec9fe8c2294c7727eba3b221@tuis.net> <8671B1F0-694B-4686-9E02-99F275EA7279@hoplahup.net>
Message-ID: <1f91a4bb672be2fc94376a9296c4e7d2@tuis.net>

Yes, you could do that and get an exact path to a particular element.... I had not thought it through as far as you, but the indexing is reasonable (so is 'season to taste').... But then you should probably also take it further again and ensure that the namespace management is correct too.... but how would you set up the XPath references/links for an XPath query with namespaces ... easily ... and in a 'general' way?
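
For illustration only, the indexed-path idea discussed in this thread could look roughly like the sketch below. It uses a hypothetical `PathNode` class as a stand-in for a JDOM Element so that it runs without any library; `PathNode`, `addChild` and `IndexedPath` are invented names. Two things to note: XPath positions start at 1 (so the first 'b' is b[1], not b[0]), and the namespace question raised above is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal tree node standing in for a JDOM Element.
class PathNode {
    final String name;
    PathNode parent;
    final List<PathNode> children = new ArrayList<>();

    PathNode(String name) {
        this.name = name;
    }

    // Creates a child with the given name, links it to this node, and returns it.
    PathNode addChild(String childName) {
        PathNode child = new PathNode(childName);
        child.parent = this;
        children.add(child);
        return child;
    }
}

class IndexedPath {
    // Walks from the node up to the root, prepending one indexed step per level.
    static String of(PathNode node) {
        StringBuilder path = new StringBuilder();
        for (PathNode n = node; n != null; n = n.parent) {
            int position = 1; // XPath positions are 1-based
            if (n.parent != null) {
                // Count earlier siblings that share this node's name.
                for (PathNode sibling : n.parent.children) {
                    if (sibling == n) {
                        break;
                    }
                    if (sibling.name.equals(n.name)) {
                        position++;
                    }
                }
            }
            path.insert(0, "/" + n.name + "[" + position + "]");
        }
        return path.toString();
    }
}
```

With real JDOM elements the same walk would presumably use getParentElement() and getChildren(name), and any namespace prefixes would still have to be declared on the XPath separately.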
Rolf

On Tue, 10 Jan 2012 20:37:32 +0100, Paul Libbrecht wrote:
> Isn't the trick to compute the index as in b[0] ??
> Lists of jdom using indexOf, elementA.getChildren('b').indexOf(elementB)
> are perfect for this.
>
> paul
>
>
> Le 10 janv. 2012 à 20:07, Rolf Lear a écrit :
>
>>
>> Hi Cliff
>>
>> No method 'native' to JDOM, but the code is simple (and you can 'season'
>> to taste...):
>>
>> String xpath = "";
>> Element p = element;
>> while (p != null) {
>>     xpath = "/" + p.getName() + xpath;
>>     p = p.getParentElement();
>> }
>> System.out.println(xpath);
>>
>> But, the problem is that this will get *all* 'd' Elements that have an
>> ancestry with the same XPath
>>
>> Rolf
>>
>> On Tue, 10 Jan 2012 13:46:07 -0500, cliff palmer
>> wrote:
>>> I'd like to be able to find the XPath search or the node hierarchy for
>>> an Element. For example, if the Element is in:
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> I'd like to have either the XPath search argument ("/a//b//c//d) or
>>> the list of nodes in the elements parents ("a b c d").
>>>
>>> Is there a method that returns this?
>>>
>>> Cliff
>>> _______________________________________________
>>> To control your jdom-interest membership:
>>> http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com
>> _______________________________________________
>> To control your jdom-interest membership:
>> http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com

From jdom at tuis.net Mon Jan 16 16:20:53 2012
From: jdom at tuis.net (Rolf Lear)
Date: Mon, 16 Jan 2012 19:20:53 -0500
Subject: [jdom-interest] JDOM and memory
In-Reply-To: <4F149866.50606@xerox.com>
References: <4F02133C.5010704@tuis.net> <4F149866.50606@xerox.com>
Message-ID: <4F14BEE5.6080501@tuis.net>

Hi Leigh

I am uncertain if I am missing something in whether your comments/suggestions are specifically related to memory improvement of JDOM2 (the subject line), or just general improvements.
Reading your comments they seem to be unrelated to memory specifically, but more general performance/convenience. That's fine if it is, I just want to make sure I am not missing something... Just to summarize your mail very briefly, you are addressing three areas: getChild*(), XPath, and Exceptions getChild...() ============= As for the getChild(...), getChildren(...) and getContent(Filter) methods. They all derive from the same concept ... create a FilterList on the underlying ContentList, and scan it for all available (or the first available for getChild(...) ) matching content. JDOM2 already has overridden the 'inefficient' iterator (and listIterator) methods to provide a more efficient iterator (a significant improvement in performance over JDOM 1.x see http://hunterhacker.github.com/jdom/jdom2/performance.html and scroll down about half the page to 'First major performance cycle', compare the results table to the one below.... ) These improvements do *not* override the isEmpty() call though, and that should absolutely be overridden too. By default it compares size() == 0, and that would require a full scan of the underlying content, but iterator.hasNext() in JDOM2 only does a 'lazy' scan. So, introduce issue #57, override isEmpty() on FilterList. Since ContentList has a fast size() method then there is no need to change ContentList.isEmpty(). I am trying to think of any other methods that would be slow? There is no way to avoid a full scan for FilterList.size() So, in summary on the getChildren code... you should already be seeing improved performance on the getChildren() method calls with more efficient iterators, and soon the isEmpty will be even faster too. If/when the ContentList 'moves' in to Element to save memory, these improvements will be preserved. XPath ===== In regards to the XPath I took notes from the XOM project which has the 'query()' method on all nodes... 
so for example you can: element.query(myxpath); I had a hard look at it and it makes some sense to do something similar. Especially now in JDOM2 where XPath supports more than just Element and Document 'context' items. The issue is that full XPath support requires both Namespace and 'Variable' contexts (XOM does address the Namespace context). This would be hard to implement on a simple 'query' method. Additionally, XPaths are intended to be 'compiled' and 'reused'. The XOM 'query' implementation does not support the reuse of the XPath. The simple query method would have to be limited, but would still cover (sucks out of thin air) 95% of XPath use in JDOM I am sure. So, the current XPath implementation in JDOM2 is able to do the full gamut of operation, but loses some convenience because you need to access it outside of the Element/Content. I certainly feel that making XPath more accessible to JDOM content would be 'friendly', but I worry that it will breed performance problems if it is too easy... At the time I worked the JDOM2 XPath code I looked in to what it would take to extend the functionality in to the 'Content' area of JDOM (like XOM), but found there were more issues than can be resolved by a person working alone with limited XPath experience (me). I figured I would come back to it. Perhaps now is the time. Still, taking your JDOMUtil examples: > JDOMUtil.selectElementChildren(element, xpath) > JDOMUtil.selectElement(element, xpath) > JDOMUtil.selectAttribute(element, xpath) > JDOMUtil.ref(Element element, String xpath, String defaultValue) In JDOM2, these same concepts can be 'easily' obtained with: Filters.element().filter(XPath.selectNodes(element, xpath)); ... not sure what the selectElement() would do, but you get the idea. Filters.attribute().filter(XPath.selectNodes(element, xpath)); ... well, the 'defaultValue' would take a tweak.... Exceptions ========== Interesting observation. 
I can see the benefit of a JDOM 'Runtime' exception in addition to JDOMException. There are a few places where it could be useful to indicate a programmatic issue that does not need to be explicitly thrown/caught. XPath library is a good example. I'll think some more on that... see if I can see a problem with introducing JDOMRuntimeException...... and see what other places it would possibly make sense. So, thanks for the comments. If there's anything I missed, misunderstood, or needs attention, please don't hesitate! Rolf On 16/01/2012 4:36 PM, Leigh L Klotz Jr wrote: > I'm currently evaluating the alpha of JDOM2. Most of the problems I've > found with JDOM and Java 6 have been fixed in a utility class I have > called JDOMUtil. A good deal of the methods in there are handling > generic types, > > As for the question below, I don't often have the use case of for() > iterating over, element.getContent(), but I do often iterate over the > following: > element.getChildren() > element.getChildren(name) > element.getChildren().isEmpty() as a surrogate for element.hasChildren() > > You could have Element.getContent() return a List implementation of your > own, and make the Iterable.iterate() method in it (which is what for() > calls) be efficient. That might also make element.getChildren.hasNext be > efficient, or you could implement isEmpty directly. > > For JDOMUtil, I often use these: > JDOMUtil.selectElementChildren(element, xpath) > JDOMUtil.selectElement(element, xpath) > JDOMUtil.selectAttribute(element, xpath) > JDOMUtil.ref(Element element, String xpath, String defaultValue) > > The JDOMUtil.ref(Element element, String xpath, String defaultValue) > method returns either the leaf-node value of the XPath expression, or > the defaultValue if the nodeset is empty. 
> > I've also wrapped every one of the JDOMUtil XPath calls with something > that throws a RuntimeException wrapper for JDOMException, and I let pass > JDOMException and IOException only on serialization and parsing > utilities. I believe that checked exceptions for XPath errors are a > detraction from the simplicity of JDOM. XPath exceptions are always > internal programming errors, and it is the rare case where they can be > corrected at the point of invocation. Parsing and IO exceptions can come > from external system interaction and can reasonably be expected to be > correctable in point source code. > > Leigh. > From thomas.scheffler at uni-jena.de Tue Jan 17 00:10:00 2012 From: thomas.scheffler at uni-jena.de (Thomas Scheffler) Date: Tue, 17 Jan 2012 09:10:00 +0100 Subject: [jdom-interest] suggested JDOM2 improvements Message-ID: <4F152CD8.5030508@uni-jena.de> Hi, first I want to thank all on working on JDOM2. While going over the Javadocs I noticed some issues and got some ideas I want to share. When creating a XPath instance, the instance should be unmodifiable, e.g. remove setNamespace() methods and use XPathFactory.newInstance().compile(String xpath, Namespace... namespaces) One thing that is left then is variables and XPath instances should be threadsafe then. One way to achieve this would be to create a XPathVariable class and use var-args on selectNodes: xPath.selectNodes(NamespaceAware context, XPathVariable... variables) Then you can improve the XPathFactory on using a weak cache that always returns the same instance. This would not only allow to share a XPath instance across multiple threads but also decrease memory consumption. ---- What I would take into consideration is allow generics in XPath, e.g. XPath test=XPath.newInstance("/foo/bar", SOME_ELEMENT_HINT); XPath test2=XPath.newInstance("/foo/@bar", SOME_ATTRIBUTE_HINT); Or if you do not want this, you can return by default. 
----

One other thing I noticed is the practice of making JDOMConstants an interface. Usually interface means something like

if (o instanceof JDOMConstants) {
    ((JDOMConstants) o).doSomething();
}

It would be "better" code to make JDOMConstants a final class with private constructor and use "import static JDOMConstants.*" where you need it. That would not result in such statements: "Element _is a_ JDOMConstants".

----

And before starting another mail, please count my vote on moving ContentList into Element.

I'm really looking forward to JDOM2 release.

regards

Thomas

--
Thomas Scheffler
Friedrich-Schiller-Universität Jena
Thüringer Universitäts- und Landesbibliothek
Bibliotheksplatz 2
07743 Jena
Phone: ++49 3641 940027
FAX: ++49 3641 940022

From jdom at tuis.net Tue Jan 17 05:42:59 2012
From: jdom at tuis.net (Rolf Lear)
Date: Tue, 17 Jan 2012 08:42:59 -0500
Subject: [jdom-interest] suggested JDOM2 improvements
In-Reply-To: <4F152CD8.5030508@uni-jena.de>
References: <4F152CD8.5030508@uni-jena.de>
Message-ID:

Hi Thomas.

Interesting Feedback. A lot to respond to... not complaining ;-)

First, the easy thing: XPath and thread-safety.... it will never happen. There's just too much to 'require', not the least of which would be that all of JDOM would need to be thread-safe. For example, someone modifying an Element's content in one thread while that same Element is being queried (XPath) in another. Some things would make sense to be Thread-safe (Namespace class is...), but in the case of XPath it would just never happen. Additionally, our default XPath 'engine' Jaxen makes no claims about being thread-safe.

If thread-safety is not an 'intrinsic' property of XPath, then there is no real sense in making it 'immutable' if it removes 'convenience'. In the 'simple' XPath case (no extra namespaces, no Variables) XPath is still a 1-liner, which is hard to beat.
In a complicated case I see more complexity trying to 'massage' your Namespaces and Variables in to some new type structures (the varargs) than the existing concept of adding Namespaces and setting variables. Also, keeping backward compatibility is a strong consideration.

In reality I think I would like to see code examples of what you think it 'should' look like to get a better idea, but at the moment I am not convinced that it's actually broken enough to require fixing.

Right, what about the generics and XPath return types....? Well, this is a complicated one, and I thought about it hard. The problem boils down to the fact that XPath expressions can return Boolean, Double, and String in addition to whatever JDOM nodes are selected. There is no common 'base' to selectNodes results other than 'Object'. Really! This means that XPath has to be *able* to return List<Object> (but not necessarily always). There is no option.

This problem is what inspired a lot of the Filters class, because the List<Object> return type is not convenient, yet it can be coerced into something that *is* convenient. The Filter instances do full type (and other) checking on the values in the List; they not only re-cast the generic type of the result, but also 'silently' remove any content that cannot be coerced. Thus my intention was that people would do things like:

XPathFactory xpfactory = XPathFactory.newInstance();
XPath xpath = xpfactory.compile("//*");
List<Element> nodes = Filters.element().filter(xpath.selectNodes(document));

I can see that this model could be modified somewhat to put the filter in at the XPath compile time to become something like:

XPathFactory xpfactory = XPathFactory.newInstance();
XPath<Element> xpath = xpfactory.compile("//*", Filters.element());
List<Element> nodes = xpath.selectNodes(document);

I think that is a valuable modification, and it is nice because the compile(String) would return XPath<Object>, and the compile(String, Filter<T>) would return XPath<T>.
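
To make the coercion idea concrete, here is a small self-contained sketch. It is not the JDOM Filters implementation: `TypeFilter` is a hypothetical class that only checks the runtime type, whereas the real filters are described above as doing 'other' checking too.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical type-based filter: keeps only the items that are instances
// of the requested class and returns them as a correctly typed list.
class TypeFilter<T> {
    private final Class<T> type;

    TypeFilter(Class<T> type) {
        this.type = type;
    }

    // Items that cannot be coerced to T are silently dropped.
    List<T> filter(List<?> raw) {
        List<T> result = new ArrayList<>();
        for (Object item : raw) {
            if (type.isInstance(item)) {
                result.add(type.cast(item));
            }
        }
        return result;
    }
}
```

Given a raw result list holding a mix of Strings, Doubles and Booleans, `new TypeFilter<String>(String.class).filter(raw)` would return only the Strings, in their original order, typed as List<String>.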
This would all still be backward compatible with JDOM 1.x Filed issue #58 JDOMConstants. Hmmm, I think that shows my 'age'. It is a kick-back to when import-static was not available. Old habits and so on. Point taken ... ;-). It should be an easy change. Filed Issue #59 Finally, the ContentList in Element. I am getting to it.... doing some tidy-up first. Javadoc mostly. This will be a relatively big change, and impacts all 'custom' JDOM implementations. It is not a certainty yet for JDOM2. Thanks for the feedback. Appreciate it! Rolf On Tue, 17 Jan 2012 09:10:00 +0100, Thomas Scheffler wrote: > Hi, > > first I want to thank all on working on JDOM2. While going over the > Javadocs I noticed some issues and got some ideas I want to share. > > When creating a XPath instance, the instance should be unmodifiable, e.g. > > remove setNamespace() methods and use > > XPathFactory.newInstance().compile(String xpath, Namespace... namespaces) > > One thing that is left then is variables and XPath instances should be > threadsafe then. One way to achieve this would be to create a > XPathVariable class and use var-args on selectNodes: > > xPath.selectNodes(NamespaceAware context, XPathVariable... variables) > > Then you can improve the XPathFactory on using a weak cache that always > returns the same instance. This would not only allow to share a XPath > instance across multiple threads but also decrease memory consumption. > > ---- > > What I would take into consideration is allow generics in XPath, e.g. > > XPath test=XPath.newInstance("/foo/bar", SOME_ELEMENT_HINT); > XPath test2=XPath.newInstance("/foo/@bar", SOME_ATTRIBUTE_HINT); > > Or if you do not want this, you can return by > default. > > ---- > > One other thing I noticed is the practice of making JDOMConstants an > interface. 
Usually interface means something like > > if (o instanceof (JDOMConstants)){ > ((JDOMConstance)o).doSomething(); > } > > It would be "better" code to make JDOMConstants a final class with > private constructor and use "import static JDOMConstants.*" where you > need it. That would not result in such statements: "Element _is a_ > JDOMConstants". > > ---- > > And before starting another mail, please count my vote on moving > ContentList into Element. > > I'm really looking forward to JDOM2 release. > > regards Thomas From jdom at tuis.net Tue Jan 17 19:31:10 2012 From: jdom at tuis.net (Rolf Lear) Date: Tue, 17 Jan 2012 22:31:10 -0500 Subject: [jdom-interest] JDOM2 and Runtime Exceptions Message-ID: <4F163CFE.4030209@tuis.net> Hi all. Recent discussions have highlighted the area of how JDOM handles some exceptions. In particular the context was XPath expressions. JDOM specifies (and 'always' has specified) that XPath throws JDOMException in the event of a failure on XPath. This has been 'questioned' from the perspective that this would not be the fault of JDOM if the XPath expression failed to compile, or evaluate. Exceptions that are outside the control of the programmer, like IOException, should be thrown and caught, but an illegal XPath is more of a bug/programming error than an Exception, and hence should be treated more like a NullPointerException, IllegalArgumentException, IndexOutOfBoundsException, etc. Certainly it is 'ugly' to have to try/catch even the simplest XPath expressions: List nodes = null; try { nodes = XPath.selectNodes(document, "//tag"); } catch (JDOMException e) { // handle it somehow ... } // do something with nodes. 
This would all be much simpler if the code throws a RuntimeException instead:

List nodes = XPath.selectNodes(document, "//tag");

So, having used XPath as one example, I can then extrapolate the issue in to other general areas (sticking with concepts that are 'old' - in JDOM as well as JDOM2 - JDOM2 has additional areas of concern):

1. SAXOutputter throws JDOMException on all its calls because it traps SAXException from the output target: http://jdom.org/docs/apidocs/org/jdom/output/SAXOutputter.html#output%28org.jdom.Document%29
2. DOMOutputter throws JDOMException to wrap ParserConfigurationException from Java's DocumentBuilder.
3. XSLTransformer throws a subclass of JDOMException.

Interestingly, XMLOutputter throws IOException, but not JDOMException.

Taking the issue to an abstract level, there are a number of places where JDOM throws the checked exception JDOMException, and that exception requires cumbersome handling in situations where unchecked exceptions would (potentially) be a better choice.

There are a number of issues at stake here though:

1. In JDOM the JDOMException is specified ( http://jdom.org/docs/apidocs/org/jdom/JDOMException.html ) as being the 'top level Exception JDOM classes can throw'. But that's already *not* true. We have had all sorts of runtime exceptions thrown from various classes like 'Element' which throws IllegalNameException from its constructor... So, should JDOMException be redefined to be JDOM-specific problems only?
2. Where is the 'line'? Should SAXOutputter throw SAXException instead of JDOMException (like XMLOutputter throws IOException not JDOMException)? Should SAXOutputter throw some new RuntimeException instead? How could the 'system' be described so that this inconsistency of exceptions is better controlled?
3. It creates a major backward-compatibility issue to remove the 'throws JDOMException' from methods.
Existing code that does: try { nodes = XPath.selectNodes(document, "//tag"); } catch (JDOMException jde) { // handle it somehow ... } Fails to compile with: [javac] ....\src\java\org\jdom2\test\cases\xpath\AbstractTestXPath.java:595: exception org.jdom2.JDOMException is never thrown in body of corresponding try statement [javac] } catch (JDOMException jde) { [javac] ^ [javac] 1 error I have been playing with the code anyway, and I like the looks of the results of replacing 'strategic' JDOMExceptions with a runtime Exception. For example, I created a new unchecked JDOMRuntimeException class. From this class I created two subclasses: XPathCompileException and XPathEvaluationException. I made all the code 'work' nicely with these exceptions and the code looks very clean. Backward compatibility is 'screwed' though, but somewhat mitigated by the fact that 'old' code can be modified from: ... } catch (JDOMException jde) { ... to ... } catch (JDOMRuntimeException jde) { ... Alternatively, depending on the actual exception handling, the try/catch can be completely removed and handling can be cascaded up to a higher point.... Apart from renaming all the packages to org.jdom2, this would be the most significant migration problem for any users of JDOM/JDOM2. Documenting it as a migration issue should be relatively easy, but the fix would not be a pure search/replace, but the exceptions would have to be identified and fixed individually. Admittedly in a tool like eclipse, it is quite easy to put 'Runtime' in your copy/paste buffer, and go from one compile problem to the next simply looking for the 'unreachable code' problem and adding the 'Runtime' to the middle of 'JDOMException'. Sorry for the long mail, but this is a 'feature' which could make JDOM2 much easier to work with, but would certainly make a migration from JDOM more complicated. Would love some thoughts on this.... 
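
As a sketch of what such an unchecked hierarchy might look like: the three exception class names below are taken from this message, but their constructors and the `XPathSketch.compile` stand-in (which only rejects unbalanced square brackets) are assumptions made for illustration and are not the actual JDOM2 code.

```java
// Unchecked base class: callers may catch it but are not forced to.
class JDOMRuntimeException extends RuntimeException {
    JDOMRuntimeException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Raised when an expression cannot be compiled.
class XPathCompileException extends JDOMRuntimeException {
    XPathCompileException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Raised when a compiled expression fails during evaluation.
class XPathEvaluationException extends JDOMRuntimeException {
    XPathEvaluationException(String message, Throwable cause) {
        super(message, cause);
    }
}

class XPathSketch {
    // Hypothetical stand-in for compiling an expression: it only checks
    // that square brackets are balanced, then returns the expression.
    static String compile(String expression) {
        int depth = 0;
        for (char ch : expression.toCharArray()) {
            if (ch == '[') {
                depth++;
            } else if (ch == ']') {
                depth--;
            }
            if (depth < 0) {
                break; // a closing bracket appeared before its opener
            }
        }
        if (depth != 0) {
            throw new XPathCompileException("Unbalanced brackets in: " + expression, null);
        }
        return expression;
    }
}
```

A caller can then write `String xp = XPathSketch.compile("//tag");` with no try/catch at all, or catch JDOMRuntimeException at whatever higher level suits the application.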
Rolf From mike at saxonica.com Wed Jan 18 01:12:23 2012 From: mike at saxonica.com (Michael Kay) Date: Wed, 18 Jan 2012 09:12:23 +0000 Subject: [jdom-interest] JDOM2 and Runtime Exceptions In-Reply-To: <4F163CFE.4030209@tuis.net> References: <4F163CFE.4030209@tuis.net> Message-ID: <4F168CF7.2000706@saxonica.com> On 18/01/2012 03:31, Rolf Lear wrote: > Hi all. > > Recent discussions have highlighted the area of how JDOM handles some > exceptions. In particular the context was XPath expressions. JDOM > specifies (and 'always' has specified) that XPath throws JDOMException > in the event of a failure on XPath. This has been 'questioned' from > the perspective that this would not be the fault of JDOM if the XPath > expression failed to compile, or evaluate. If A calls B, and B calls C, and C fails, I think it's very much an open question whether B should attempt to translate/interpret any errors coming from C before passing them back to A. To some extent it depends on the level of transparency - if it's obvious to A that the request will involve a call on C, then perhaps passing back C's exception unchanged is reasonable. But if B wants to encapsulate C, and have flexibility to choose different service suppliers (e.g. to call D instead of calling C), then it's tough on A to pass back an exception from a component it didn't know was involved. Might JDOM ever choose to invoke a different XPath provider, or to include its own XPath engine? For example, one that supports XPath 2.0? In that case, exposing third-party exceptions would be an embarrassment. > > > Exceptions that are outside the control of the programmer, like > IOException, should be thrown and caught, but an illegal XPath is more > of a bug/programming error than an Exception, and hence should be > treated more like a NullPointerException, IllegalArgumentException, > IndexOutOfBoundsException, etc. Again this is an open question. 
URISyntaxException is very similar to a compile-time XPath exception in this regard, and that is a checked exception (and yes, it can be a pain). On the other hand PatternSyntaxException is unchecked. There's no logical reason to make them different. I'm one of those who believes that the discipline and extra effort caused by having to think about exceptions makes for better engineered and more robust programs. I hate C# from this perspective; you never know whether you have tested the exception handling code in your application adequately. Similarly StAX is a mess from the exception handling point of view - Sax, where every method can throw SAXException, is much easier to work with. > > > Would love some thoughts on this.... > > I don't think you'll please everyone here, but even without the compatibility implications, I'm not convinced that moving to unchecked exceptions would be an improvement. From noel at peralex.com Wed Jan 18 03:38:44 2012 From: noel at peralex.com (Noel Grandin) Date: Wed, 18 Jan 2012 13:38:44 +0200 Subject: [jdom-interest] JDOM2 and Runtime Exceptions In-Reply-To: <4F163CFE.4030209@tuis.net> References: <4F163CFE.4030209@tuis.net> Message-ID: <4F16AF44.6090400@peralex.com> I agree that programming errors should throw something that extends RuntimeException. If you're going to make a change like that, JDOM2 is the right time to do it :-) Regards, Noel Grandin On 2012-01-18 05:31, Rolf Lear wrote: > Hi all. > > Recent discussions have highlighted the area of how JDOM handles some > exceptions. In particular the context was XPath expressions. JDOM > specifies (and 'always' has specified) that XPath throws JDOMException > in the event of a failure on XPath. This has been 'questioned' from > the perspective that this would not be the fault of JDOM if the XPath > expression failed to compile, or evaluate. 
> > Exceptions that are outside the control of the programmer, like > IOException, should be thrown and caught, but an illegal XPath is more > of a bug/programming error than an Exception, and hence should be > treated more like a NullPointerException, IllegalArgumentException, > IndexOutOfBoundsException, etc. > > Certainly it is 'ugly' to have to try/catch even the simplest XPath > expressions: > > List nodes = null; > try { > nodes = XPath.selectNodes(document, "//tag"); > } catch (JDOMException e) { > // handle it somehow > ... > } > // do something with nodes. > > This would all be much simpler if the code throws a RuntimeException > instead: > > List nodes = XPath.selectNodes(document, "//tag"); > > > > So, having used XPath as one example, I can then extrapolate the issue > in to other general areas (sticking with concepts that are 'old' - in > JDOM as well as JDOM2 - JDOM2 has additional areas of concern): > 1. SAXOutputter throws JDOMExcepion on all it's calls because it traps > SAXException from the output target: > http://jdom.org/docs/apidocs/org/jdom/output/SAXOutputter.html#output%28org.jdom.Document%29 > 2. DOMOutputter throws JDOMException to wrap > ParserConfigurationException from Java's DocumentBuilder. > 3. XSLTransform throws a subclass of JDOMException. > > Interestingly, XMLOutputter throws IOException, but not JDOMException. > > > Taking the issue to an abstract level, there are a number of places > where JDOM throws the checked exception JDOMException, and that > exception requires cumbersome handling in situations where unchecked > exceptions would (potentially) be a better choice. > > > There are a number issues at stake here though: > > 1. In JDOM the JDOMException is specified ( > http://jdom.org/docs/apidocs/org/jdom/JDOMException.html ) as being > the 'top level Exception JDOM classes can throw'. But that's already > *not* true. 
We have had all sorts of runtime exceptions thrown from > various classes like 'Element' which throws IlleglNameException from > it's constructor... So, should JDOMException be redefined to be > JDOM-specific problems only? > > 2. Where is the 'line'? Should SAXOutputter throw SAXException instead > of JDOMException (like XMLOutputter throws IOException not > JDOMException)? Should SAXOutputter throw some new RuntimeException > instead? How could the 'system' be described so that this > inconsistency of exceptions is better controlled? > > 3. It creates a major backward-compatibility issue to remove the > 'throws JDOMException' from methods. Existing code that does: > > try { > nodes = XPath.selectNodes(document, "//tag"); > } catch (JDOMException jde) { > // handle it somehow > ... > } > > Fails to compile with: > > [javac] > ....\src\java\org\jdom2\test\cases\xpath\AbstractTestXPath.java:595: > exception org.jdom2.JDOMException is never thrown in body of > corresponding try statement > [javac] } catch (JDOMException jde) { > [javac] ^ > [javac] 1 error > > > > > I have been playing with the code anyway, and I like the looks of the > results of replacing 'strategic' JDOMExceptions with a runtime > Exception. For example, I created a new unchecked JDOMRuntimeException > class. From this class I created two subclasses: XPathCompileException > and XPathEvaluationException. I made all the code 'work' nicely with > these exceptions and the code looks very clean. > > Backward compatibility is 'screwed' though, but somewhat mitigated by > the fact that 'old' code can be modified from: > > ... > } catch (JDOMException jde) { > ... > > > to > > ... > } catch (JDOMRuntimeException jde) { > ... > > Alternatively, depending on the actual exception handling, the > try/catch can be completely removed and handling can be cascaded up to > a higher point.... 
> > > Apart from renaming all the packages to org.jdom2, this would be the > most significant migration problem for any users of JDOM/JDOM2. > Documenting it as a migration issue should be relatively easy, but the > fix would not be a pure search/replace, but the exceptions would have > to be identified and fixed individually. > > Admittedly in a tool like eclipse, it is quite easy to put 'Runtime' > in your copy/paste buffer, and go from one compile problem to the next > simply looking for the 'unreachable code' problem and adding the > 'Runtime' to the middle of 'JDOMException'. > > > > Sorry for the long mail, but this is a 'feature' which could make > JDOM2 much easier to work with, but would certainly make a migration > from JDOM more complicated. > > > Would love some thoughts on this.... > > Rolf > _______________________________________________ > To control your jdom-interest membership: > http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com > Disclaimer: http://www.peralex.com/disclaimer.html From jdom at tuis.net Wed Jan 18 17:03:54 2012 From: jdom at tuis.net (Rolf Lear) Date: Wed, 18 Jan 2012 20:03:54 -0500 Subject: [jdom-interest] JDOM2 and Runtime Exceptions In-Reply-To: <4F163CFE.4030209@tuis.net> References: <4F163CFE.4030209@tuis.net> Message-ID: <4F176BFA.9060505@tuis.net> Hi all. This issue has been nagging at me. I finally pulled out my copy of 'Effective Java'. Quoting some sections (Item 58): > Use checked exceptions for conditions from which the caller can reasonably be expected to recover. Each checked exception ... is therefore a potent indication to the API user that the associated condition is a possible outcome. [this] presents a mandate [for the API user] to recover from the condition. > If a program throws an unchecked exception ... it is generally the case that recovery is impossible and continued execution would do more harm than good. Use runtime exceptions to indicate programming errors. 
The great majority of runtime exceptions indicate precondition violations. Precondition violation is simply a failure by the caller to adhere to the contract established by the API specification. Putting the logic together like the above makes sense. It makes sense that a 'null' XPath expression is a 'precondition violation', and hence a NullPointerException, and it also makes sense that an invalid XPath expression is something that the caller can reasonably be expected to recover from, and should be checked - even if it is inconvenient sometimes... Thus, I think I have it settled in my mind that changing to an unchecked exception is wrong (even if the code looks a lot prettier). I think I may still differentiate between an XPath 'compile' exception, and an XPath 'evaluation' Exception instead of using a blanket JDOMException. Psychologically that makes it an 'XPath' problem, not a JDOM problem. Rolf On 17/01/2012 10:31 PM, Rolf Lear wrote: > Hi all. > > Recent discussions have highlighted the area of how JDOM handles some > exceptions. In particular the context was XPath expressions. JDOM > specifies (and 'always' has specified) that XPath throws JDOMException > in the event of a failure on XPath. This has been 'questioned' from > the perspective that this would not be the fault of JDOM if the XPath > expression failed to compile, or evaluate. > > Exceptions that are outside the control of the programmer, like > IOException, should be thrown and caught, but an illegal XPath is more > of a bug/programming error than an Exception, and hence should be > treated more like a NullPointerException, IllegalArgumentException, > IndexOutOfBoundsException, etc. > > Certainly it is 'ugly' to have to try/catch even the simplest XPath > expressions: > > List nodes = null; > try { > nodes = XPath.selectNodes(document, "//tag"); > } catch (JDOMException e) { > // handle it somehow > ... > } > // do something with nodes. 
> > This would all be much simpler if the code throws a RuntimeException > instead: > > List nodes = XPath.selectNodes(document, "//tag"); > > > > So, having used XPath as one example, I can then extrapolate the issue > in to other general areas (sticking with concepts that are 'old' - in > JDOM as well as JDOM2 - JDOM2 has additional areas of concern): > 1. SAXOutputter throws JDOMExcepion on all it's calls because it traps > SAXException from the output target: > http://jdom.org/docs/apidocs/org/jdom/output/SAXOutputter.html#output%28org.jdom.Document%29 > 2. DOMOutputter throws JDOMException to wrap > ParserConfigurationException from Java's DocumentBuilder. > 3. XSLTransform throws a subclass of JDOMException. > > Interestingly, XMLOutputter throws IOException, but not JDOMException. > > > Taking the issue to an abstract level, there are a number of places > where JDOM throws the checked exception JDOMException, and that > exception requires cumbersome handling in situations where unchecked > exceptions would (potentially) be a better choice. > > > There are a number issues at stake here though: > > 1. In JDOM the JDOMException is specified ( > http://jdom.org/docs/apidocs/org/jdom/JDOMException.html ) as being > the 'top level Exception JDOM classes can throw'. But that's already > *not* true. We have had all sorts of runtime exceptions thrown from > various classes like 'Element' which throws IlleglNameException from > it's constructor... So, should JDOMException be redefined to be > JDOM-specific problems only? > > 2. Where is the 'line'? Should SAXOutputter throw SAXException instead > of JDOMException (like XMLOutputter throws IOException not > JDOMException)? Should SAXOutputter throw some new RuntimeException > instead? How could the 'system' be described so that this > inconsistency of exceptions is better controlled? > > 3. It creates a major backward-compatibility issue to remove the > 'throws JDOMException' from methods. 
Existing code that does: > > try { > nodes = XPath.selectNodes(document, "//tag"); > } catch (JDOMException jde) { > // handle it somehow > ... > } > > Fails to compile with: > > [javac] > ....\src\java\org\jdom2\test\cases\xpath\AbstractTestXPath.java:595: > exception org.jdom2.JDOMException is never thrown in body of > corresponding try statement > [javac] } catch (JDOMException jde) { > [javac] ^ > [javac] 1 error > > > > > I have been playing with the code anyway, and I like the looks of the > results of replacing 'strategic' JDOMExceptions with a runtime > Exception. For example, I created a new unchecked JDOMRuntimeException > class. From this class I created two subclasses: XPathCompileException > and XPathEvaluationException. I made all the code 'work' nicely with > these exceptions and the code looks very clean. > > Backward compatibility is 'screwed' though, but somewhat mitigated by > the fact that 'old' code can be modified from: > > ... > } catch (JDOMException jde) { > ... > > > to > > ... > } catch (JDOMRuntimeException jde) { > ... > > Alternatively, depending on the actual exception handling, the > try/catch can be completely removed and handling can be cascaded up to > a higher point.... > > > Apart from renaming all the packages to org.jdom2, this would be the > most significant migration problem for any users of JDOM/JDOM2. > Documenting it as a migration issue should be relatively easy, but the > fix would not be a pure search/replace, but the exceptions would have > to be identified and fixed individually. > > Admittedly in a tool like eclipse, it is quite easy to put 'Runtime' > in your copy/paste buffer, and go from one compile problem to the next > simply looking for the 'unreachable code' problem and adding the > 'Runtime' to the middle of 'JDOMException'. > > > > Sorry for the long mail, but this is a 'feature' which could make > JDOM2 much easier to work with, but would certainly make a migration > from JDOM more complicated. 
>
>
> Would love some thoughts on this....
>
> Rolf
> _______________________________________________
> To control your jdom-interest membership:
> http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com
>

From jdom at tuis.net Thu Jan 19 12:41:41 2012
From: jdom at tuis.net (Rolf Lear)
Date: Thu, 19 Jan 2012 15:41:41 -0500
Subject: [jdom-interest] JDOM2 and Runtime Exceptions
In-Reply-To: <4F18709E.3020502@xerox.com>
References: <4F163CFE.4030209@tuis.net> <4F168CF7.2000706@saxonica.com> <4F18709E.3020502@xerox.com>
Message-ID: <69c4dc0f0b038d49a45074da43984dd1@tuis.net>

Hi Leigh, all

Despite my earlier mail referencing 'Effective Java', I went further in to the book, and it then contradicts itself in "item 59" which claims "Avoid unnecessary use of checked exceptions". It quite clearly contradicts "item 58".... so even Bloch is not able to clearly define a 'rule' for checked exceptions.

This process has been an exercise in frustration. It is quite clear that there is no clear 'right' way of doing things. There is no clear 'precedent' on how it should be done either. Should XPath be like regex with a compile() and match() process, neither of which throws checked exceptions? The 'similarity' between XPath and Regex is quite convincing...

Despite my earlier claim that XPath exceptions can be 'recovered from easily by the caller', I am not actually convinced. How do you 'recover' from a bad expression? How do you recover from an expression that does arithmetic with a value that is non-numeric? The argument for having checked exceptions is very unclear, and the convenience of unchecked exceptions is substantial.

In a 'fresh' world, if I were writing the JDOM/XPath API from scratch, I think it would be very reasonable to throw an XPathSyntaxException for bad XPaths just like java.util.regex.Pattern throws PatternSyntaxException. Similarly XPathEvaluationException for issues encountered in the document.
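For reference, the regex precedent can be seen with nothing but the JDK. The sketch below contrasts java.util.regex.Pattern (unchecked PatternSyntaxException) with java.net.URI (checked URISyntaxException), the pair raised earlier in this thread. The class and method names are made up for the illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SyntaxExceptionPrecedents {
    // PatternSyntaxException is unchecked: this try/catch is optional.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // URISyntaxException is checked: the compiler insists on catch or throws.
    static boolean isValidUri(String text) {
        try {
            new URI(text);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidRegex("[a-z]+"));
        System.out.println(isValidRegex("[a-z"));               // unclosed character class
        System.out.println(isValidUri("http://example.com/a"));
        System.out.println(isValidUri("http://example.com/a b")); // space is not legal here
    }
}
```

Both report a syntax error in a caller-supplied string, yet the JDK treats them differently, which is why neither can be cited as the one 'right' precedent.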
But backward compatibility is a big issue too.

I think a big part of the API problem is because it is so closely tied to Jaxen. Jaxen throws checked exceptions too. I am not saying that checked exceptions are wrong, but nor are they right.

On the train this morning I played again with the JDOM/XPath API. I think I have a working solution, and I think I am more comfortable with it. It took a while to come to, but Java already has a well defined process for it... ;-) Deprecation.

The thrown exceptions of a method are part of its public API. I don't like the JDOMException thrown from XPath methods, so I am going to deprecate them.... JDOM 1.x users will get compile warnings, not errors. That's the compatibility problem solved.

Then, I break down the XPath in to a 'compile' and 'evaluate' step, and make them throw unchecked exceptions that make sense for the particular issue. The new methods will be called 'XPath compile(...)' instead of newInstance(...), and I think I will call the new execution methods List matchAll(context) and T matchFirst(context).

I have looked in to XPath 2.0, and by being smart with the API, and 'nice' with the option of applying a Filter directly to the XPath, I think it is reasonable to have the best of both worlds. With the changes as I have them now I think plugging in a different XPath 2.0 back-end should be easy when one is available, and it will 'just work'. XPath 2.0 clearly differentiates between the 'static analysis' portion of the XPath, and the 'dynamic evaluation' stage.

Since this is such a grey area I think someone needs to just 'decide', and I think I will do just that.... Deprecate the old methods, keep their signatures unchanged (including exceptions), and implement a new, clean, unchecked, and generified set of methods. I like the idea of XPath being similar in 'feel' to RegEx.

Time for me to get on the train again, and spend an hour playing with what feels right.
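A rough sketch of the deprecation approach, assuming a toy class rather than the real org.jdom.xpath.XPath. The names XPathSketch, LegacyXPathException (standing in for the checked JDOMException) and XPathSyntaxException are hypothetical, and the 'compilation' is just a blank-string check.

```java
import java.util.Objects;

// Hypothetical checked exception standing in for JDOMException.
class LegacyXPathException extends Exception {
    LegacyXPathException(String message, Throwable cause) { super(message, cause); }
}

// Hypothetical unchecked exception for the new entry point.
class XPathSyntaxException extends RuntimeException {
    XPathSyntaxException(String message) { super(message); }
}

class XPathSketch {
    private final String expression;

    private XPathSketch(String expression) { this.expression = expression; }

    /**
     * Old entry point: signature and checked exception are unchanged,
     * so existing callers compile with a deprecation warning only.
     *
     * @deprecated use {@link #compile(String)} instead.
     */
    @Deprecated
    static XPathSketch newInstance(String expression) throws LegacyXPathException {
        try {
            return compile(expression);
        } catch (XPathSyntaxException e) {
            // Translate back to the checked type the old contract promised.
            throw new LegacyXPathException(e.getMessage(), e);
        }
    }

    /** New entry point: failures surface as unchecked exceptions. */
    static XPathSketch compile(String expression) {
        Objects.requireNonNull(expression, "expression");
        if (expression.trim().isEmpty()) {
            throw new XPathSyntaxException("Empty XPath expression");
        }
        return new XPathSketch(expression);
    }

    String expression() { return expression; }

    public static void main(String[] args) throws LegacyXPathException {
        System.out.println(newInstance("//tag").expression()); // old style still works
        System.out.println(compile("//tag").expression());     // new style, no throws clause needed
    }
}
```

The old method delegates to the new one, so there is only one implementation to maintain while both contracts stay intact.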
Rolf

On Thu, 19 Jan 2012 11:35:58 -0800, Leigh L Klotz Jr wrote:
> On 01/18/2012 01:12 AM, Michael Kay wrote:
>> I don't think you'll please everyone here, but even without the
>> compatibility implications, I'm not convinced that moving to unchecked
>> exceptions would be an improvement.
>>
>
> We use JDOM in our hand-written code because it is a convenient, expressive
> API, giving much of the compactness and other benefits we see from XPath
> itself and other higher-level XML interfaces such as XQuery.
>
> However, we haven't found the JDOM1 XPath Java interface to be
> convenient or expressive, because of the verbosity and the checked
> exceptions, which in our case are all programming errors of one sort or
> another. (We don't let end users type in XPath expressions.) Instead,
> we use a static JDOMUtil wrapper class with methods such as
> selectElement, selectElements, selectAttributes, selectContent, and ref
> (leaf-node value).
>
> So for us, the JDOM XPath API is an implementation of a way to run XPath
> expressions over JDOM objects, and not a convenient, expressive API that
> we use to hand write Java code.
>
> JDOM2 with the filters may offer an expressive API that would let us do
> away with the profusion of select* utility methods, but with checked
> exceptions it still won't be convenient, and we still won't use it
> directly.
>
> Leaving in the checked exceptions means less migration headache for
> other users, and since we're not going to use it directly, it doesn't
> matter much. Another reason we may shift away from JDOM XPath API is
> that we're disenchanted with Jaxen as well and are hoping to find a fast
> (at runtime) way to use Saxon on JDOM from hand-written Java code. That
> probably won't use the JDOM XPath API at all.
>
> Leigh.
From leigh.klotz at xerox.com Thu Jan 19 14:06:17 2012
From: leigh.klotz at xerox.com (Leigh L Klotz Jr)
Date: Thu, 19 Jan 2012 14:06:17 -0800
Subject: [jdom-interest] JDOM2 and Runtime Exceptions
In-Reply-To: <69c4dc0f0b038d49a45074da43984dd1@tuis.net>
References: <4F163CFE.4030209@tuis.net> <4F168CF7.2000706@saxonica.com> <4F18709E.3020502@xerox.com> <69c4dc0f0b038d49a45074da43984dd1@tuis.net>
Message-ID: <4F1893D9.5050206@xerox.com>

Given what I decided about our usage of org.jdom.xpath packages being isolated, the issue of exception checking isn't a big one for me, but sadly that's because we can't much use it anyway.

If you're interested in doing refactoring, making it easier to use a different XPath implementation would be my suggested goal.

Leigh.

From laurent.bihanic at atos.net Fri Jan 20 01:26:48 2012
From: laurent.bihanic at atos.net (BIHANIC Laurent)
Date: Fri, 20 Jan 2012 09:26:48 +0000
Subject: [jdom-interest] suggested JDOM2 improvements
In-Reply-To:
References: <4F152CD8.5030508@uni-jena.de>
Message-ID: <4F193347.7040208@atos.net>

Hi Rolf,

On 17/01/12 14:42, Rolf Lear wrote:
> In the 'simple' XPath case (no extra namespaces, no Variables) XPath is
> still a 1-liner, which is hard to beat. In a complicated case I see more
> complexity trying to 'massage' your Namespaces and Variables in to some new
> type structures (the varargs) than the existing concept of adding
> Namespaces and setting variables.
>
> Also, keeping backward compatibility is a strong consideration.
>
> In reality I think I would like to see code examples of what you think it
> 'should' look like to get a better idea, but at the moment I am not
> convinced that it's actually broken enough to require fixing.

Well, as 99% of our XML uses namespaces, using JDOM XPath is not a 1-liner. And as the XPath API throws non-runtime exceptions, pre-compiling XPath expressions (as we do for regex) requires using a class initializer to map JDOMException to runtime exceptions.
The only case where we can't compile XPath expressions is when we want to use variables. Which defeats the whole purpose of compiling XPath! Or we have to use thread-local compiled XPaths.

So, I think it would be great to split the XPath API in two parts. One for constructing compiled XPath expressions, including the namespaces, using either a constructor/factory method with varargs, e.g. compile(String expr, Namespace... namespace), or a builder/DSL. The result being an immutable thread-safe XPath object. This part would only throw runtime exceptions, IllegalArgumentException seeming sufficient. A second for evaluating compiled XPaths on documents, taking optional variable bindings as argument and throwing regular exceptions, e.g. find(context, Map bindings)

If we go this way, we should leave the existing XPath class unchanged and deprecate it and create a new separate class.

Regards,
Laurent
From mike at saxonica.com Fri Jan 20 02:45:19 2012
From: mike at saxonica.com (Michael Kay)
Date: Fri, 20 Jan 2012 10:45:19 +0000
Subject: [jdom-interest] suggested JDOM2 improvements
In-Reply-To: <4F193347.7040208@atos.net>
References: <4F152CD8.5030508@uni-jena.de> <4F193347.7040208@atos.net>
Message-ID: <4F1945BF.30201@saxonica.com>

>The only case where we can't compile XPath expressions is when we want to use variables. Which defeats the whole purpose of compiling XPath!

Absolutely!

>Or we have to use thread-local compiled XPaths. So, I think it would be great to split the XPath API in two parts.

That's definitely the way to go if you're making changes to this area. If you're not familiar with it, do take a look at the s9api design in Saxon:

http://www.saxonica.com/documentation/javadoc/net/sf/saxon/s9api/XPathCompiler.html

That involves three classes:

XPathCompiler contains the static context (variable and namespace declarations)

XPathExecutable is the thread-safe compiled and reusable XPath expression

XPathEvaluator contains the dynamic context (variable values, context item)

You can eliminate the XPathEvaluator by having a more complex evaluate() method on the XPathExecutable, e.g. one that supplies the variable values as a Map; but this doesn't reduce the overall number of objects involved, it just replaces the XPathEvaluator object with a Map object.

The other big design problem with an XPath API is the types used for variable values and for the evaluation result. With the JAXP API I get an enormous amount of support hassle caused by the lack of type safety in the way JAXP does this.
In s9api I decided, despite the complexity, to introduce classes XdmValue, XdmItem, XdmAtomicValue etc to make the whole thing type-safe, and I don't regret the decision. (I also have XdmNode which abstracts over DOM, JDOM, XOM etc nodes.) If you're designing a new XPath API in 2012 then I think it's essential to think about how it will support XPath 2.0. Michael Kay Saxonica From noel at peralex.com Fri Jan 20 04:57:11 2012 From: noel at peralex.com (Noel Grandin) Date: Fri, 20 Jan 2012 14:57:11 +0200 Subject: [jdom-interest] JDOM2 and Runtime Exceptions In-Reply-To: <69c4dc0f0b038d49a45074da43984dd1@tuis.net> References: <4F163CFE.4030209@tuis.net> <4F168CF7.2000706@saxonica.com> <4F18709E.3020502@xerox.com> <69c4dc0f0b038d49a45074da43984dd1@tuis.net> Message-ID: <4F1964A7.6020001@peralex.com> You are correct, how exactly to use exceptions is still a matter of taste and debate. That being said, I prefer programmatic problems to be unchecked. And I don't think backwards compatibility w.r.t. exceptions is such a big deal - JDOM2 already requires quite a few changes. Changing my catch block and throws clauses is not a big deal, and it's not the kind of change that would subtly corrupt my code either. On 2012-01-19 22:41, Rolf Lear wrote: > Hi Leigh, all > > Despite my earlier mail referencing 'Effective Java', I went further in to > the book, and it then contradicts itself in "item 59" which claims "Avoid > unnecessary use of checked exceptions". It quite clearly contradicts the > "item 58".... so even Bloch is not able to clearly define a 'rule' for > checked exceptions. > > This process has been an exercise of frustration. It is quite clear that > there is no clear 'right' way of doing things. There is no clear > 'precedent' on how it should be done either. 
> Disclaimer: http://www.peralex.com/disclaimer.html From jdom at tuis.net Fri Jan 20 05:56:56 2012 From: jdom at tuis.net (Rolf Lear) Date: Fri, 20 Jan 2012 08:56:56 -0500 Subject: [jdom-interest] suggested JDOM2 improvements In-Reply-To: <4F1945BF.30201@saxonica.com> References: <4F152CD8.5030508@uni-jena.de> <4F193347.7040208@atos.net> <4F1945BF.30201@saxonica.com> Message-ID: <779b646e68bc8d6f49267a572345c616@tuis.net> I have looked at the Saxon API, as well as the native Java API. I have also looked in to XPath2.0. Mostly my 'experience' with XPath is through the current JDOM API. There are things I like, and things I dislike, and things I have had to relearn because the JDOM/XPath API has skewed my experience. I think I am settling on the following model: 1. deprecate the current XPath entirely. Keep it fully backward compatible with JDOM 1.x 2. new JDOM2 XPathFactory concept which can have different implementation back-ends (Jaxen, Saxon, whatever). 3. XPathFactories are thread-safe and reusable in any threads. 4. have a single 'default' XPathFactory instance obtainable with XPathFactory.instance(). The default back-end instance() can be changed with a system property. 5. the default 'default' back-end will continue to be Jaxen 6. Other back-ends can be used at will by calling the XPathFactory.newInstance(String) method (or some direct constructor on the Factory if it exposes one). 6. At the other end of the system will be an interface XPathCompiled. This will be immutable, but not thread-safe. Similar concept/behaviour to javax.xml.xpath.XPathExpression. 7. XPathCompiled will not have the 'special' valueOf, numberValue, booleanValue that org.jdom.xpath.XPath has. These methods are extensions to the basic XPath concept and make support for other types impossible (like XPath 2.0). 8. Instead, XPathCompiled has a generic type which will match the result values from the expression. The generic type is set by the JDOM Filter. 9. 
XPathCompiled can return the full list of results, or alternatively just the first result. The results will be type-cast to the specified Filter. 10. The compiling and running methods for the new API will throw unchecked exceptions (like the javax.xml.xpath.* API). That will be the base model. Using this model I expect a base (comprehensive) factory method: public XPathCompiled compile(String xpath, Filter filter, Map variables, List namespaces); In addition there will be variations on the compile method that cater for simplified conditions, like the basic no-namespace, no-variable, no-filter: public XPathCompiled compile(String xpath); The XPathCompiled class will have: public List evaluate(Object context); public T evaluateFirst(Object context); The evaluateFirst method is a convenience method that will be defined to return the first value in the evaluate() results, or null if the result is empty. Implementations can choose to have some short-circuit logic if possible. To make life easier it is helpful to have an intermediate class that can manage the variable and namespace contexts for you. Thus a helper class XPathBuilder will support managing these (getters/setters for variables, namespaces). It will also have a compile() method to create an XPathCompiled using the state of the XPathBuilder at compile time. Since this new API will impose a 'Filter' on top of the XPath results there may/will be times when debugging problems will be a challenge.. for example: Am I missing element X because it was not selected by the XPath or because it was eliminated by the filter? To answer that sort of question there needs to be an XPathResult object which contains the pre and post filtered results (as well as other useful debugging information). 
Thus, XPathCompiled will also have: public XPathResult evaluateResult(Object context); Examples of the way I see it working are: //the following two are identical: String name = XPathFactory.instance().compile("//name/text()", Filters.string()).evaluateFirst(document); String name = XPathFactory.instance().evaluateFirst(document, "//name/text()", Filters.string()); // just select the current node. Object val = XPathFactory.instance().evaluateFirst(context, "node()"); // create a builder and use it to compile an XPath. XPathBuilder builder = new XPathBuilder(Filters.element()); builder.setXPath("//ns:*"); builder.addNamespace("ns", "http://example.com/mynamespace"); XPathCompiled xpath = builder.compile(XPathFactory.instance()); List mine = xpath.evaluate(mydocument); // Get a diagnostic XPathResult result = XPathFactory.instance().compile("//@*", Filters.element()).evaluateResult(context); if (!result.filtered().isEmpty()) { List filtered = result.filtered(); System.out.println("The following results were selected by the XPath but removed by the Filter: " + filtered.toString()); List survived = result.result(); System.out.println("The following results were selected by the XPath and kept by the Filter: " + survived.toString()); } This is all taking longer than I expected. I think I will have to put a 'proof of concept' out there, and extend the ALPHA release phase..... Rolf In essence this API shifts the 'onus' of ensuring the return value is of the appropriate type to the 'user'. They know the XPath query, they should know the return type. From what I can tell, this model should be compatible with any back-end, including XPath 2.0. It does not impose any XPath-specific logic modifiers. If you want a 'number' back from your XPath then you need to use the XPath number() function to get one. If you want the XPath result cast as a String using the XPath string-conversion logic, then you should wrap your XPath query in the XPath string() function.
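The point about wrapping a query in string() or number() can be illustrated with the JDK's own javax.xml.xpath API. This is only a sketch: it uses JAXP and a W3C DOM document rather than the proposed JDOM2 classes, and the class name XPathConversionSketch, its helper methods, and the sample XML are made up for the illustration.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class; JAXP only, not the JDOM2 XPath API discussed in the thread.
public class XPathConversionSketch {

    /** Parses a small XML string into a DOM Document (JDK classes only). */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Evaluates an expression and asks JAXP for the given result type. */
    private static Object eval(Document doc, String expr, QName type) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc, type);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Plain node selection: the caller gets a Node back, not a String. */
    static Node firstNode(Document doc, String expr) {
        return (Node) eval(doc, expr, XPathConstants.NODE);
    }

    /** Wraps the query in string() so the XPath engine does the string conversion. */
    static String asString(Document doc, String expr) {
        return (String) eval(doc, "string(" + expr + ")", XPathConstants.STRING);
    }

    /** Wraps the query in number() so the XPath engine does the numeric conversion. */
    static double asNumber(Document doc, String expr) {
        return (Double) eval(doc, "number(" + expr + ")", XPathConstants.NUMBER);
    }

    public static void main(String[] args) {
        Document doc = parse("<root><name>jdom</name><count>3</count></root>");
        System.out.println(firstNode(doc, "//name/text()").getNodeType() == Node.TEXT_NODE);
        System.out.println(asString(doc, "//name/text()"));
        System.out.println(asNumber(doc, "//count"));
    }
}
```

The same expression yields a text node when selected directly and a String only once it is wrapped in string(), which is the distinction the paragraph above relies on.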
This same logic follows through to XPath2.0 On Fri, 20 Jan 2012 10:45:19 +0000, Michael Kay wrote: >>The only case where we can't compile XPath expressions is when we want > to use variables. Which defeats the whole purpose of compiling XPath! > > Absolutely! > > >Or we have to use thread-local compiled XPaths. So, I think it would > be great to split the XPath API in two parts. > > That' definitely the way to go if you're making changes to this area. If > you're not familiar with it, do take a look at the s9api design in Saxon: > > http://www.saxonica.com/documentation/javadoc/net/sf/saxon/s9api/XPathCompiler.html > > That involves three classes: > > XPathCompiler contains the static context (variable and namespace > declarations) > > XPathExecutable is the thread-safe compiled and reusable XPath expression > > XPathEvaluator contains the dynamic context (variable values, context item) > > You can eliminate the XPathEvaluator by having a more complex evaluate() > method on the XPathExecutable, e.g. one that supplies the variable > values as a Map; but this doesn't reduce the overall number of objects > involved, it just replaces the XPathEvaluator object with a Map object. > > The other big design problem with an XPath API is the types used for > variable values and for the evaluation result. With the JAXP API I get > an enormous amount of support hassle caused by the lack of type safety > in the way JAXP does this. In s9api I decided, despite the complexity, > to introduce classes XdmValue, XdmItem, XdmAtomicValue etc to make the > whole thing type-safe, and I don't regret the decision. (I also have > XdmNode which abstracts over DOM, JDOM, XOM etc nodes.) > > If you're designing a new XPath API in 2012 then I think it's essential > to think about how it will support XPath 2.0. 
> > Michael Kay > Saxonica > > _______________________________________________ > To control your jdom-interest membership: > http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com From mike at saxonica.com Fri Jan 20 06:31:07 2012 From: mike at saxonica.com (Michael Kay) Date: Fri, 20 Jan 2012 14:31:07 +0000 Subject: [jdom-interest] suggested JDOM2 improvements In-Reply-To: <779b646e68bc8d6f49267a572345c616@tuis.net> References: <4F152CD8.5030508@uni-jena.de> <4F193347.7040208@atos.net> <4F1945BF.30201@saxonica.com> <779b646e68bc8d6f49267a572345c616@tuis.net> Message-ID: <4F197AAB.4050202@saxonica.com> >public XPathCompiled compile(String xpath); I started introducing generics for this in Saxon 9.4 and the experience wasn't wholly positive; it left a lot of cases where there were warnings that needed to be ignored. That may be because I found generics to be deeper and more bewildering than I expected. It's not at all clear to me how your types such as XPathCompiled are supposed to work. Do they rely excessively on the ability of the XPath engine to do static type analysis of the supplied expression? Michael Kay Saxonica From jdom at tuis.net Fri Jan 20 06:50:53 2012 From: jdom at tuis.net (Rolf Lear) Date: Fri, 20 Jan 2012 09:50:53 -0500 Subject: [jdom-interest] suggested JDOM2 improvements In-Reply-To: <4F197AAB.4050202@saxonica.com> References: <4F152CD8.5030508@uni-jena.de> <4F193347.7040208@atos.net> <4F1945BF.30201@saxonica.com> <779b646e68bc8d6f49267a572345c616@tuis.net> <4F197AAB.4050202@saxonica.com> Message-ID: <530e8a5bc8b2312c0a9ff17c8b303ed9@tuis.net> No, no static type analysis. JDOM has 'always' had the 'Filter' concept. You could, for example, do: List comments = element.getContent(new ContentFilter(ContentFilter.COMMENT)); In order to make the above 'generic' in JDOM2, the getContent() has to return an appropriate type for whatever the Filter returns. I 'extended' the Filter class to have a generic return type. 
Thus, it is now possible to: List<Comment> comments = element.getContent(Filters.comment()); The Filter implementations all follow the rules: 1. if the content to be filtered does not match the filter, then the content is discarded. 2. if the content matches the filter, then it is explicitly cast to the generic type of the filter. What this means is that you are guaranteed that the generic type of the Filter results is accurate, and it is impossible to 'force' Filter results to have badly-loaded result lists. Filter instances can do more than just type-check the input data; they can also do anything else to filter the content, like checking for particular names, etc. With the XPath library, I intend to apply the same Filter concept to the XPath results. Since the user knows the XPath expression, they will also know the anticipated return type. If they want to select Elements then they can apply an Element filter. If they want to select 'everything' then they can use a 'passthrough' filter which 'does no filtering' (but as a result can only 'cast' to Object). Essentially the Filter concept is a way to coerce unknown data in to a user defined type while ensuring the results will never generate a ClassCastException, and providing an opportunity to discard what you do not want. It is ideal for XPath results. The 'user' creates their own filter http://hunterhacker.github.com/jdom/jdom2/apidocs/org/jdom2/filter/Filter.html , or reuses one of the 'common' filters accessible in the 'Filters' class http://hunterhacker.github.com/jdom/jdom2/apidocs/org/jdom2/filter/Filters.html Most Filter implementations take a Class instance (matching the generic type of the Filter) as a constructor argument, and any values that match the filter are cast using the Class.cast() method.
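The two Filter rules above (discard what does not match, cast what does with Class.cast()) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the real org.jdom2.filter code: ClassFilter, filterAll and FilterSketch are hypothetical names, and plain Java objects stand in for JDOM content.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; it does not use or reproduce the JDOM2 Filter API.
public class FilterSketch {

    /** Hypothetical stand-in for a class-based filter with a generic result type. */
    static final class ClassFilter<T> {
        private final Class<T> type;

        ClassFilter(Class<T> type) {
            this.type = type;
        }

        /** Rule 2: a match is cast with Class.cast(). Rule 1: anything else yields null. */
        T filter(Object candidate) {
            return type.isInstance(candidate) ? type.cast(candidate) : null;
        }

        /** Applies the filter to mixed content and returns only the matches, already typed. */
        List<T> filterAll(List<?> content) {
            List<T> kept = new ArrayList<T>();
            for (Object o : content) {
                T match = filter(o);
                if (match != null) {
                    kept.add(match);
                }
            }
            return kept;
        }
    }

    public static void main(String[] args) {
        List<Object> mixed = Arrays.<Object>asList("text", 42, "more text", 3.5);
        // No cast is needed at the call site, and no ClassCastException can occur.
        List<String> strings = new ClassFilter<String>(String.class).filterAll(mixed);
        System.out.println(strings);
    }
}
```

Because the cast only happens after the isInstance() check, a mismatch silently drops the value instead of throwing, which is the trade-off discussed in the replies that follow.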
Rolf On Fri, 20 Jan 2012 14:31:07 +0000, Michael Kay wrote: >>public XPathCompiled compile(String xpath); > > I started introducing generics for this in Saxon 9.4 and the experience > wasn't wholly positive; it left a lot of cases where there were warnings > that needed to be ignored. That may be because I found generics to be > deeper and more bewildering than I expected. > > It's not at all clear to me how your types such as > XPathCompiled are supposed to work. Do they rely excessively on > the ability of the XPath engine to do static type analysis of the > supplied expression? > > Michael Kay > Saxonica From mike at saxonica.com Fri Jan 20 06:57:02 2012 From: mike at saxonica.com (Michael Kay) Date: Fri, 20 Jan 2012 14:57:02 +0000 Subject: [jdom-interest] suggested JDOM2 improvements In-Reply-To: <530e8a5bc8b2312c0a9ff17c8b303ed9@tuis.net> References: <4F152CD8.5030508@uni-jena.de> <4F193347.7040208@atos.net> <4F1945BF.30201@saxonica.com> <779b646e68bc8d6f49267a572345c616@tuis.net> <4F197AAB.4050202@saxonica.com> <530e8a5bc8b2312c0a9ff17c8b303ed9@tuis.net> Message-ID: <4F1980BE.4020702@saxonica.com> Thanks for the explanation. I wonder, though, if discarding data of the wrong type is better than throwing a ClassCastException? It's very easy in XPath, for example, to ask for a text node when you thought you were asking for a string. Expressions that return nothing are the hardest thing to debug as it is. Michael Kay Saxonica On 20/01/2012 14:50, Rolf Lear wrote: > No, no static type analysis. > > JDOM has 'always' had the 'Filter' concept. You could, for example, do: > > List comments = element.getContent(new > ContentFilter(ContentFilter.COMMENT)); > > In order to make the above 'generic' in JDOM2, the getContent() has to > return an appropriate type for whatever the Filter returns. I 'extended' > the Filter class to have a generic return type. 
Thus, it is now possible > to: > > List comments = element.getContent(Filters.comment()); > > The Filter implementations all follow the rules: > 1. if the content to be filtered does not match the filter, then the > content is discareded. > 2. if the content matches the filter, then it is explicitly cast to the > generic type of the filter. > > What this means is that you are guaranteed that the generic type of the > Filter results is accurate, and it is impossible to 'force' Filter results > to have badly-loaded result lists. > > Filter instances can do more than just type-checking on the input data, > but can also do anything else to filter the content, like checking for > particular names, etc. > > With the XPath library, I intend to apply the same Filter concept to the > XPath results. > > Since the user knows the XPath expression, they will also know the > anticipated return type. If they want to select Elements then they can > apply an Element filter. If they want to select 'everything' then > they can use a 'passthough' filter which 'does no filtering' (but as a > result can only 'cast' to Object). > > Essentially the Filter concept is a way to coerce unknown data in to a > user defined type while ensuring the results will never generate > class-cast, and providing an opportunity to discard what you do not want. > It is ideal for XPath results. > > The 'user' creates their own filter > http://hunterhacker.github.com/jdom/jdom2/apidocs/org/jdom2/filter/Filter.html > , or reuses one of the 'common' filters accessible in the 'Filters' class > http://hunterhacker.github.com/jdom/jdom2/apidocs/org/jdom2/filter/Filters.html > > Most Filter implementations take a Class instance (matching the generic > type of the Filter) as a constructor argument, and any values that match > the filter are cast using the Class.cast() method. 
> > Rolf > > > On Fri, 20 Jan 2012 14:31:07 +0000, Michael Kay wrote: >>> public XPathCompiled compile(String xpath); >> I started introducing generics for this in Saxon 9.4 and the experience >> wasn't wholly positive; it left a lot of cases where there were warnings >> that needed to be ignored. That may be because I found generics to be >> deeper and more bewildering than I expected. >> >> It's not at all clear to me how your types such as >> XPathCompiled are supposed to work. Do they rely excessively on >> the ability of the XPath engine to do static type analysis of the >> supplied expression? >> >> Michael Kay >> Saxonica From jdom at tuis.net Fri Jan 20 07:16:16 2012 From: jdom at tuis.net (Rolf Lear) Date: Fri, 20 Jan 2012 10:16:16 -0500 Subject: [jdom-interest] suggested JDOM2 improvements In-Reply-To: <4F1980BE.4020702@saxonica.com> References: <4F152CD8.5030508@uni-jena.de> <4F193347.7040208@atos.net> <4F1945BF.30201@saxonica.com> <779b646e68bc8d6f49267a572345c616@tuis.net> <4F197AAB.4050202@saxonica.com> <530e8a5bc8b2312c0a9ff17c8b303ed9@tuis.net> <4F1980BE.4020702@saxonica.com> Message-ID: <2252a7c03b6c5fe5a1829818f8b86c8f@tuis.net> I agree with the debug issue. That is exactly why in the model I intend to provide I will make it possible to return the 'XPathResult' and not just a List. The XPathResult will allow you to inspect the base XPath results in a List as well as the filter results in the List format. XPath has always been a vulnerable area for type-casting. Nothing has stopped the user from coding inappropriate casts for XPath results. With JDOM2 the user will have the option of trading class-cast-exceptions for missing result conditions. If the user is anxious to keep the class-cast-exception option then they can choose to use unfiltered XPath results. In general, a user writing an XPath expression has to know ahead of time what the return types will be (including XPath 2.0 with it's plethora of atomic types). 
Using the Filter concept allows the user to anticipate the type of his/her choice, and not have to statically build the type in to the API. Mitigating the debug issue with a XPathResult with useful methods interrogating intermediate results (and a useful toString()) is a good compromise, I think. As long as people understand that the XPath results are 'filtered' a second time then everything should be fine. Remember that the users can always elect to have unfiltered results too, but then they have to live with List results. Rolf On Fri, 20 Jan 2012 14:57:02 +0000, Michael Kay wrote: > Thanks for the explanation. > > I wonder, though, if discarding data of the wrong type is better than > throwing a ClassCastException? It's very easy in XPath, for example, to > ask for a text node when you thought you were asking for a string. > Expressions that return nothing are the hardest thing to debug as it is. > > Michael Kay > Saxonica > > On 20/01/2012 14:50, Rolf Lear wrote: >> No, no static type analysis. >> >> JDOM has 'always' had the 'Filter' concept. You could, for example, do: >> >> List comments = element.getContent(new >> ContentFilter(ContentFilter.COMMENT)); >> >> In order to make the above 'generic' in JDOM2, the getContent() has to >> return an appropriate type for whatever the Filter returns. I 'extended' >> the Filter class to have a generic return type. Thus, it is now possible >> to: >> >> List comments = element.getContent(Filters.comment()); >> >> The Filter implementations all follow the rules: >> 1. if the content to be filtered does not match the filter, then the >> content is discareded. >> 2. if the content matches the filter, then it is explicitly cast to the >> generic type of the filter. >> >> What this means is that you are guaranteed that the generic type of the >> Filter results is accurate, and it is impossible to 'force' Filter >> results >> to have badly-loaded result lists. 
>> >> Filter instances can do more than just type-checking on the input data, >> but can also do anything else to filter the content, like checking for >> particular names, etc. >> >> With the XPath library, I intend to apply the same Filter concept to the >> XPath results. >> >> Since the user knows the XPath expression, they will also know the >> anticipated return type. If they want to select Elements then they can >> apply an Element filter. If they want to select 'everything' then >> they can use a 'passthough' filter which 'does no filtering' (but as a >> result can only 'cast' to Object). >> >> Essentially the Filter concept is a way to coerce unknown data in to a >> user defined type while ensuring the results will never generate >> class-cast, and providing an opportunity to discard what you do not want. >> It is ideal for XPath results. >> >> The 'user' creates their own filter >> http://hunterhacker.github.com/jdom/jdom2/apidocs/org/jdom2/filter/Filter.html >> , or reuses one of the 'common' filters accessible in the 'Filters' class >> http://hunterhacker.github.com/jdom/jdom2/apidocs/org/jdom2/filter/Filters.html >> >> Most Filter implementations take a Class instance (matching the generic >> type of the Filter) as a constructor argument, and any values that match >> the filter are cast using the Class.cast() method. >> >> Rolf >> >> >> On Fri, 20 Jan 2012 14:31:07 +0000, Michael Kay >> wrote: >>>> public XPathCompiled compile(String xpath); >>> I started introducing generics for this in Saxon 9.4 and the experience >>> wasn't wholly positive; it left a lot of cases where there were warnings >>> that needed to be ignored. That may be because I found generics to be >>> deeper and more bewildering than I expected. >>> >>> It's not at all clear to me how your types such as >>> XPathCompiled are supposed to work. Do they rely excessively >>> on >>> the ability of the XPath engine to do static type analysis of the >>> supplied expression? 
>>> >>> Michael Kay >>> Saxonica From jdom at tuis.net Fri Jan 20 08:13:41 2012 From: jdom at tuis.net (Rolf Lear) Date: Fri, 20 Jan 2012 11:13:41 -0500 Subject: [jdom-interest] suggested JDOM2 improvements In-Reply-To: References: <4F152CD8.5030508@uni-jena.de> <4F193347.7040208@atos.net> <4F1945BF.30201@saxonica.com> <779b646e68bc8d6f49267a572345c616@tuis.net> <4F197AAB.4050202@saxonica.com> <530e8a5bc8b2312c0a9ff17c8b303ed9@tuis.net> <4F1980BE.4020702@saxonica.com> <2252a7c03b6c5fe5a1829818f8b86c8f@tuis.net> Message-ID: No, I have not considered that. It is important for JDOM2 to get the API right. I do not want to be deprecating anything after 2.0 I am targeting a second alpha release for Groundhog Day (feb 2nd). I am expecting to have a memory-efficiency improvement and any other API changes in for that release (currently only XPath has concerns). Additionally there are a few issues I am working on for that release: https://github.com/hunterhacker/jdom/issues I intend to clear out all the issues except the serialization (which is a major pain). All the others will either be rejected, or resolved. Assuming no other issues I anticipate keeping to the schedule: http://markmail.org/message/dqxabjn56vt3dbik It is pretty tight already, and the quality of the release is strongly dependent on how much feedback there is.... ... really, I would love for people to get more involved. If anyone has contributions or wish-list items for JDOM they should speak up. With this XPath API change I think I will push out an intermediate ALPHA sometime this weekend with the new XPath API in as a 'trial' for people to play with and criticise. I will perhaps out out another intermediate ALPHA with the memory-efficiency changes sometime after that... but I have not yet started working on that properly... soon. So, expect a somewhat quick turnaround in the next two weeks for ALPHA_XP, ALPHA_MEM and ALPHA_GH (XPath, Memory, and Ground-Hog Day) respectively. 
If anything else comes up for development before then I will probably have to slip the timetable somehow. That's if it is just me working on it. Rolf On Fri, 20 Jan 2012 07:42:51 -0800, Joe Bowbeer wrote: > Any thought to releasing JDOM2 with the existing functionality and > targeting XPath redesign for JDOM2.1? > From jdom at tuis.net Fri Jan 20 12:28:04 2012 From: jdom at tuis.net (Rolf Lear) Date: Fri, 20 Jan 2012 15:28:04 -0500 Subject: [jdom-interest] suggested JDOM2 improvements In-Reply-To: <4F19C1B6.6060406@xerox.com> References: <4F152CD8.5030508@uni-jena.de><4F193347.7040208@atos.net> <4F1945BF.30201@saxonica.com> <779b646e68bc8d6f49267a572345c616@tuis.net> <4F19C1B6.6060406@xerox.com> Message-ID: XPath and JDOM have always been very loosely coupled. For years (and still now) there has been no need to have direct support for XPath in JDOM. Saxon does OK without using any of the JDOM/XPath code for example. What the XPath code in JDOM does (or should do) is to provide a convenient interface for the functionality. The native javax.xml.xpath.* entry point is not useful because the JDOM classes do not conform to the same NODESET type model. So, given the alternatives: 1. shoehorn XPath support on top of the javax.xml.xpath.* model 2. continue with JDOM1 XPath model 3. build new 'better' model 4. remove XPath support from JDOM and let it be a 3rd-party add-on. I think 3 is the best. But, there are problems with having the support: if you claim support, it has to actually work when needed. This in turn means that you have to have some sort of starting point. Jaxen is the only viable alternative (at the moment) that I know of simply because its licensing is permissive enough and it has the right history with JDOM (sorry Michael, Saxon does not make sense for a ship-it-with-JDOM library). So, the plan is to make a working default system, and then provide the mechanisms to customize it. But, you should not change the 'global' default implementation from within Java code.
This is because JDOM is often used by multiple pieces of code in the same JVM: for example, Eclipse has JDOM built in. Hypothetically, let's say the Eclipse 'Git' plugin changes the 'default' XPath backend to some new XPath2.0 custom value, then suddenly the 'CVS' plugin is no longer getting the results it wants. Setting a JVM-wide System property is a compromise already. It has real problems because people think they can race the static initializer to change the System property before it is used the first time.... It is accessed only once, the first time the XPathFactory is created. On the other hand, because XPathFactory instances are specified to be thread-safe, there is nothing stopping you from doing: public static final XPathFactory XPATH = XPathFactory.newInstance("com.example.xpath20.XPathFactory"); Then in your code you can freely use: XPathCompiled xp = XPATH.compile("//*"); You have in fact been exploiting one of the major flaws in the JDOM 1.x XPath library: that there is no way to have multiple concurrent XPath libraries active at the same time. When you do: XPath.setXPathClass(JaxenXPath.class); you are changing the global JDOM XPath library for all JDOM users in the same JVM. This is not an OK thing to do from a JDOM API perspective. Bottom line is that there is no good way to allow the 'world' to change the default XPathFactory from inside a running JVM. Allowing the world to create a custom instance is a good compromise, and allowing the global default instance to be changed from the command-line is also a decent compromise. The best practice would be for you to get your own instance of your own factory, then use that instance from wherever you need it. So, if you can think of a better way to allow all JDOM users (in any potential JVM use-case) to get the JDOMFactory of their choice, please say so.
Based on my limited understanding of your environment, it would seem to me that you could have a single method on your JDOMUtil class like: private static final AtomicReference<XPathFactory> myfactory = new AtomicReference<XPathFactory>(); public static final XPathFactory instance() { XPathFactory ret = myfactory.get(); if (ret == null) { ret = XPathFactory.newInstance("my.custom.factory.ClassName"); if (myfactory.compareAndSet(null, ret)) { return ret; } return myfactory.get(); } return ret; } That way you have a single location to access your particular factory. You never have to worry about the System properties. You can change the factory at your leisure, and 'everything just works'. If your use case is more complicated than that, there is nothing stopping you from having complete control of your factory simply by not using the newInstance(String) method at all. There is nothing stopping you from doing: public static final XPathFactory myfactory = new MyFactoryImplementation(); Oh, it is hard to keep things straight in my head between what code I have on my laptop, and what's in the alpha release, so I'll just talk from the perspective of what's on my laptop now, and what will be in the next Alpha release. Rolf On Fri, 20 Jan 2012 11:34:14 -0800, Leigh L Klotz Jr wrote: > On 01/20/2012 05:56 AM, Rolf Lear wrote: >> 2. new JDOM2 XPathFactory concept which can have different >> implementationback-ends (Jaxen, Saxon, whatever). > +1 >> >> 3. XPathFactories are thread-safe and reusable in any threads. >> > +1 >> >> 4. have a single 'default' XPathFactory instance obtainable with >> XPathFactory.instance(). The default back-end instance() can be changed >> with a system property. >> > This is causing me trouble at the moment. I have to override the > XPathFactory, to provide common function definitions and to avoid > performance problems that Java classlibrary and JAXP cause.
In JDOM1 I > do this in a static class: > public class JDOMUtil { > static { > try { > XPath.setXPathClass(JaxenXPath.class); > } catch (JDOMException e) { > throw new RuntimeException(e); > } > } > > I can be assured that it works, and though I'm not sure under what > conditions it throws a checked exception, if it does throw one, it's a > system startup failure to be debugged by a system engineer. > > With JDOM 2 alpha I have to do this > > // replaced with > -Dorg.jdom2.xpath.XPathFactory=com.example.jaxen.JaxenXPath > static { > if > (!(JaxenXPath.class.getName().equals(System.getProperty(JDOMConstants.JDOM2_PROPERTY_XPATH_FACTORY)))) > > { > throw new RuntimeException(String.format("JDOM Not set up > property with -D%=%", JDOMConstants.JDOM2_PROPERTY_XPATH_FACTORY, > > JaxenXPath.class.getName())); > } > } > > Now I've got JDOM2 dependencies off in a faraway place of Java CLI, > where they can easily get lost. >> 6. Other back-ends can be used at will by calling the >> XPathFactory.newInstance(String) method (or some direct constructor on >> the >> Factory if it exposes one). > This doesn't help me fix the above problem, because all of the > ThreadLocal cache logic and pretty entrypoints into the XPath class > itself are hardwired to use the System-property defined constructor. So > they might as well not be there. > >> 5. the default 'default' back-end will continue to be Jaxen >> > > Personally, I'd prefer it if you broke this requirement up into a few > parts and made it easy to have a Jaxen backend. > For example, you might say that there's no XPath support without also > loading jdom2.jar and jdom2-jaxen.jar. > Right now, with Jaxen having JDOM1 support built in, and then JDOM2 > having Jaxen support built in, it causes a bit of circular confusion > trying to get things to work. > > If we could configure JDOM to use Saxon and have it get good performance > without unnecessary recalculations, we'd not even load Jaxen all. > > Leigh. 
From leigh.klotz at xerox.com Fri Jan 20 14:58:43 2012 From: leigh.klotz at xerox.com (Leigh L Klotz Jr) Date: Fri, 20 Jan 2012 14:58:43 -0800 Subject: [jdom-interest] suggested JDOM2 improvements In-Reply-To: References: <4F152CD8.5030508@uni-jena.de><4F193347.7040208@atos.net> <4F1945BF.30201@saxonica.com> <779b646e68bc8d6f49267a572345c616@tuis.net> <4F19C1B6.6060406@xerox.com> Message-ID: <4F19F1A3.2030900@xerox.com> On 01/20/2012 12:28 PM, Rolf Lear wrote: > > On the other hand, because XPathFactory instances are specified to be > thread-safe, there is nothing stopping you from doing: > > public static final XPathFactory XPATH = > XPathFactory.newInstance("com.example.xpath20.XPathFactory"); > > Then in your code you can freely use: > > XPathCompiled xp = XPATH.compile("//*"); > > ... > > The best practice would be for you to get your own instance of your own > factory, then use that instance from wherever you need it. > > I'd like to use a custom factory as you describe above, but right now, that makes all public methods on org.jdom2.xpath.XPath useless, because they use a static threadlocal factory which can only be the result of XPathFactory.newInstance(), which is the DEFAULTFACTORY from XPathFactory, which is settable only by the System property: public abstract class XPath { private static final ThreadLocal localfactory = new ThreadLocal(); public static List selectNodes(final Object context, final String path) throws JDOMException { return newInstance(path).selectNodes(context); } public static final XPath newInstance(final String path) throws JDOMException { XPathFactory fac = localfactory.get(); if (fac == null) { fac = XPathFactory.newInstance(); localfactory.set(fac); } return fac.compile(path); } } The reason I use a custom factory is to work around a performance problem with Jaxen: org.jaxen.saxpath.helpers.XPathReaderFactory.createReader() does an expensive synchronized System.getProperty() that causes concurrency bottlenecks, and it's done 
frequently, and there's no way to configure Jaxen or JDOM to use a specific implementation class rather than consult System.getProperty every time. To fix this, I have to split apart a whole stack of factory code from JDOM and Jaxen, just in order to get at the createReader() method. Another reason to use a custom XPath factory would be to use the JDOM API for XPath to get the work done with Saxon. So, to summarize, my complaint is that if I want to use a custom XPath factory for whatever reason (and I've given two above), I cannot use any of the XPath public static methods. Leigh. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jdom at tuis.net Fri Jan 20 15:19:33 2012 From: jdom at tuis.net (Rolf Lear) Date: Fri, 20 Jan 2012 18:19:33 -0500 Subject: [jdom-interest] suggested JDOM2 improvements In-Reply-To: <4F19F1A3.2030900@xerox.com> References: <4F152CD8.5030508@uni-jena.de><4F193347.7040208@atos.net> <4F1945BF.30201@saxonica.com> <779b646e68bc8d6f49267a572345c616@tuis.net> <4F19C1B6.6060406@xerox.com> <4F19F1A3.2030900@xerox.com> Message-ID: <4F19F685.70703@tuis.net> Hi Leigh I think we are both missing something here. In JDOM2 I'm convinced that XPath is deprecated... so, while it is still in the ALPHA at the moment it will have a viable replacement by the next ALPHA. We'll make sure the replacement is 'good' for custom/other XPath backend implementations. Give me a day to polish up a proposed replacement. I think you are missing the tricks of the XPathFactory code in the current ALPHA release, but there's not much point in fighting it when it is going to change anyway. 
Rolf From leigh.klotz at xerox.com Fri Jan 20 16:44:12 2012 From: leigh.klotz at xerox.com (Leigh L Klotz Jr) Date: Fri, 20 Jan 2012 16:44:12 -0800 Subject: [jdom-interest] suggested JDOM2 improvements In-Reply-To: <4F19F685.70703@tuis.net> References: <4F152CD8.5030508@uni-jena.de><4F193347.7040208@atos.net> <4F1945BF.30201@saxonica.com> <779b646e68bc8d6f49267a572345c616@tuis.net> <4F19C1B6.6060406@xerox.com> <4F19F1A3.2030900@xerox.com> <4F19F685.70703@tuis.net> Message-ID: <4F1A0A5C.7060409@xerox.com> No problem, I understand now. BTW I've decided my desire to replace the XPath factory to work around a Jaxen bug is in fact a problem with Jaxen instead; it's just that I have no belief I can get the Jaxen bug fixed ever. Since you're re-working the XPath class I'll hold off on any more uninformed comments... Leigh. On 01/20/2012 03:19 PM, Rolf Lear wrote: > > Hi Leigh > > I think we are both missing something here. > > In JDOM2 I'm convinced that XPath is deprecated... so, while it is still > in the ALPHA at the moment it will have a viable replacement by the next > ALPHA. > > We'll make sure the replacement is 'good' for custom/other XPath backend > implementations. > > Give me a day to polish up a proposed replacement. I think you are > missing the tricks of the XPathFactory code in the current ALPHA > release, but there's not much point in fighting it when it is going to > change anyway. > > Rolf > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jdom at tuis.net Sun Jan 22 18:13:59 2012 From: jdom at tuis.net (Rolf Lear) Date: Sun, 22 Jan 2012 21:13:59 -0500 Subject: [jdom-interest] JDOM ALPHA - Second Alpha Released Message-ID: <4F1CC267.6060006@tuis.net> Hi all. I have just pushed a second ALPHA release up to github. This release contains a new XPath API for JDOM. 
Please see the page https://github.com/hunterhacker/jdom/wiki/JDOM2-Feature:-XPath-Upgrade If you are new to the alpha releases please see the wiki pages here: https://github.com/hunterhacker/jdom/wiki/JDOM2-Features For those who have played with the first alpha already, the highlights of this second alpha release are: 1. new XPath API. The first alpha release had a first-attempt at improving the XPath API. That attempt was 'reverted' completely. It has been replaced with a second attempt. This second attempt deprecates the JDOM 1.x class 'XPath', and introduces a number of new API classes. Please see https://github.com/hunterhacker/jdom/wiki/JDOM2-Feature:-XPath-Upgrade 2. the entire 'backend' of the *Outputter code has been refactored. This change should be transparent to everyone *unless* you override/customize some outputters (typically XMLOutputter). If you have a customised XMLOutputter then your code is basically going to need a big refactor. The changes to the Outputter implementations are not yet documented on the wiki page, but, essentially, the formatting code and the 'target' code have been completely separated. The XMLOutputter no longer has any code that deals with the 'look&feel' of the output. I expect this alpha release to generate a fair amount of discussion regarding the XPath API changes. Please take it for a test drive and make your opinions known. I expect to be putting out yet another ALPHA drop in the next week, probably with code related to memory efficiency. Thanks all. Rolf From jdom at tuis.net Mon Jan 23 08:28:37 2012 From: jdom at tuis.net (Rolf Lear) Date: Mon, 23 Jan 2012 11:28:37 -0500 Subject: [jdom-interest] JDOM and memory In-Reply-To: <4F02133C.5010704@tuis.net> References: <4F02133C.5010704@tuis.net> Message-ID: Hi all. I have started on this memory optimization, and it is still in early stages.
There is one API issue though: The Element API has the two methods: addContent(Content node) addContent(Collection newContent) If I make Element implement List (which is what this memory-change will do), then the above two methods become ambiguous because Element will be both Content and List. The logical thing to do would be to deprecate addContent(Collection) since the List.addAll(Collection ...) is the obvious substitute. In the interim people migrating from JDOM 1.x will have compile errors, and will have to either: 1. choose to change all addContent() calls where the content is Element to either add(element), or addAll(element) to add the element or its content respectively... - which would make no sense because that would guarantee an exception because you cannot add an Element's content to some other element without first detaching it. The bottom line is that all the addContent* methods are equivalent to the regular List.add* methods.... and there is no ambiguity in those, it is either add(Content) or addAll(Collection...) So far the results look promising. I have a baseline memory footprint that I am aiming to improve on, and when I have results it will be easier to discuss whether the changes would be worth the improvements. But, for now, it would seem impossible to merge ContentList in to Element without some compatibility problems... Rolf On Mon, 02 Jan 2012 15:27:40 -0500, Rolf wrote: > Hi all. > > Memory optimization has never been a top priority for JDOM. At the same > time, for what it does, JDOM is not a 'terrible' memory user. Still, I > have done some analysis, and, I believe I can trim about a quarter to a > half of 'JDOM Overhead' memory usage by making two 'simple' changes.... > > The first is to merge the ContentList class in to the Element class (and > also in to Document). ....
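The addContent ambiguity described in the message above can be sketched in a few lines of plain Java. This is only an illustration, not JDOM code: `Node`, `ListElement` and the static `addContent` overloads are hypothetical stand-ins, and `Node` is modelled as an interface here (JDOM's Content is a class) purely so that one type can be both "content" and a List.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class AmbiguityDemo {
    /** Hypothetical stand-in for JDOM's Content type (simplified to an interface). */
    public interface Node {
        String name();
    }

    /** Hypothetical element that is BOTH a Node and a List of its child nodes. */
    public static class ListElement extends AbstractList<Node> implements Node {
        private final String name;
        private final List<Node> kids = new ArrayList<>();

        public ListElement(String name) { this.name = name; }

        @Override public String name() { return name; }
        @Override public Node get(int index) { return kids.get(index); }
        @Override public int size() { return kids.size(); }
        @Override public void add(int index, Node node) { kids.add(index, node); }
    }

    /** Mirrors addContent(Content): add the node itself. */
    public static String addContent(Node node) {
        return "single:" + node.name();
    }

    /** Mirrors addContent(Collection): add every member of the collection. */
    public static String addContent(Collection<? extends Node> nodes) {
        return "collection:" + nodes.size();
    }

    public static void main(String[] args) {
        ListElement child = new ListElement("child");
        child.add(new ListElement("grandchild"));

        // addContent(child);  // would not compile: both overloads apply and
        //                     // neither parameter type is more specific.

        // A cast is needed to pick one meaning or the other.
        System.out.println(addContent((Node) child));
        System.out.println(addContent((Collection<? extends Node>) child));
    }
}
```

The two casts select very different behaviour (add the element itself versus add its children), so existing callers would need to disambiguate by hand once a single type satisfies both parameter types.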
From mike at saxonica.com Mon Jan 23 08:59:35 2012 From: mike at saxonica.com (Michael Kay) Date: Mon, 23 Jan 2012 16:59:35 +0000 Subject: [jdom-interest] JDOM and memory In-Reply-To: References: <4F02133C.5010704@tuis.net> Message-ID: <4F1D91F7.5070404@saxonica.com> On 23/01/2012 16:28, Rolf Lear wrote: > Hi all. > > I have started on this memory optimization, and it is still in early > stages. There is one API issue though: > > The Element API has the two methods: > > addContent(Content node) > addContent(Collection newContent) > > if I make Element implement List (which is what this > memory-change will do), then the above two methods become ambiguous because > Element will be both Content and List And that suggests to me that it is a bad idea. The class hierarchy should reflect "is-a" relationships, it shouldn't be designed to tweak performance. It's not true that an Element and its contents are the same thing, therefore it's wrong to treat them as being the same object. It will only lead to confusion. You can achieve the memory saving by having Element.getChildren() create the returned List object dynamically (it doesn't need to copy any data to achieve this). > The logical thing to do would be to deprecate addContent(Collection) I don't think that solves the problem. There will be cases where existing code fixes up to the wrong method, and ends up adding the children of an element to a new parent rather than adding the element itself. Michael Kay Saxonica From jdom at tuis.net Mon Jan 23 09:04:48 2012 From: jdom at tuis.net (Rolf Lear) Date: Mon, 23 Jan 2012 12:04:48 -0500 Subject: [jdom-interest] JDOM and memory In-Reply-To: <4F1D91F7.5070404@saxonica.com> References: <4F02133C.5010704@tuis.net> <4F1D91F7.5070404@saxonica.com> Message-ID: <3e7472a6ab9c791fd291a7b579a42f53@tuis.net> Heh... you are right. Element should not be List, and the getContent() method can create a dynamic implementation as needed. That's the solution... 
Element already has rules about synchronization so multiple 'active' dynamic instances should not be a problem.... Thanks. I will play with that concept. Rolf On Mon, 23 Jan 2012 16:59:35 +0000, Michael Kay wrote: > On 23/01/2012 16:28, Rolf Lear wrote: >> Hi all. >> >> I have started on this memory optimization, and it is still in early >> stages. There is one API issue though: >> >> The Element API has the two methods: >> >> addContent(Content node) >> addContent(Collection newContent) >> >> if I make Element implement List (which is what this >> memory-change will do), then the above two methods become ambiguous >> because >> Element will be both Content and List > And that suggests to me that it is a bad idea. > > The class hierarchy should reflect "is-a" relationships, it shouldn't be > designed to tweak performance. It's not true that an Element and its > contents are the same thing, therefore it's wrong to treat them as being > the same object. It will only lead to confusion. > > You can achieve the memory saving by having Element.getChildren() create > the returned List object dynamically (it doesn't need to copy any data > to achieve this). > > > The logical thing to do would be to deprecate addContent(Collection) > > I don't think that solves the problem. There will be cases where > existing code fixes up to the wrong method, and ends up adding the > children of an element to a new parent rather than adding the element > itself. > > Michael Kay > Saxonica From jdom at tuis.net Mon Jan 23 12:15:44 2012 From: jdom at tuis.net (Rolf Lear) Date: Mon, 23 Jan 2012 15:15:44 -0500 Subject: [jdom-interest] JDOM and memory In-Reply-To: References: <4F02133C.5010704@tuis.net> <4F1D91F7.5070404@saxonica.com> <3e7472a6ab9c791fd291a7b579a42f53@tuis.net> Message-ID: <8da72d1c7408e42283b49f498eb498d7@tuis.net> Yes, it is useful. XOM has nice features, and a lot of that comes from being able to look back on things with hindsight.... which is a big advantage. 
JDOM is not XOM though, and JDOM carries a legacy which is both an advantage and a disadvantage. In the past (before I took on JDOM2) I have looked in to XOM, and, having been a JDOM user I could not personally justify the 'cost' of learning yet another API that accomplished the same function as JDOM.... it would require a new jar to deploy, new learning, etc. I can see that someone new to Java/XML would find XOM appealing... but is it really as good as it claims? It is hard to tell. What XOM claims to be fluff, others claim to be useful. What is interesting is reading the list of 'design principles' that I mostly agree with.... but also some that I don't. Now that I know more about JDOM it is interesting to realize that before I decided to commit myself to JDOM2, one of the considerations I had was 'should I use some other library, or should I make JDOM2 better?' I investigated XOM then, and decided it was not 'nice', and that I 'prefer' JDOM. I don't think I am alone in that logical thinking. (I also looked in to dom4j, DOM, etc.). I did not just 'decide' to do JDOM2, it really is my belief that all the other libraries are 'behind the curve' when it comes to usability in the Java5+ world. Fundamentally though, I have to change in JDOM what makes sense to change while taking in to consideration the legacy of JDOM. I think the new generics application in JDOM2 is very successful from a usability point of view, and also very compatible with legacy code. It is a 'win'. I don't think I can agree with Elliotte's comments about the generics implementation in the Collections API being so broken that it is not worth using in the XOM API. Not having a List API is one of the big reasons I didn't try XOM in the past. So, in the context of this particular mail thread, I strongly believe that JDOM is doing the right thing by using the List API, and that the ContentList is a good concept. It is just memory hungry, and I have made one failed attempt to make it better.
I think the next version will be right. But removing the List API entirely is a 'bad idea'. In general I am very happy to borrow good concepts from places, for example, I specifically looked at how XOM does the XPath implementation, and was surprised at how minimalist it is. So minimalist that it does not support the full specification... it simply does not support variables... Anyway, I think on the whole that JDOM has a reasonable balance between being comprehensive and being usable. The API on the whole is well mannered, and in the JDOM2 work I have done I have changed very little of the API (other than the XPath stuff). It is all functionally compatible with 1.x, and while there are small deviations at the technical level they are all 'replacements' (e.g. using enums instead of int-constants). Regardless, XOM provides a good comparison of functionality.... and a good measure of what's right and wrong - at least for the aspects that I have 'checked out'. I am rambling. If you have any particular concepts in XOM (or any other library) that you like you should point them out! Rolf On Mon, 23 Jan 2012 09:23:36 -0800, Joe Bowbeer wrote: > It may be useful to compare and contrast with XOM? > > http://xom.nu/designprinciples.xhtml#d0e389 > > On Mon, Jan 23, 2012 at 9:04 AM, Rolf Lear wrote: > >> >> Heh... you are right. >> >> Element should not be List, and the getContent() method can >> create a dynamic implementation as needed. That's the solution... Element >> already has rules about synchronization so multiple 'active' dynamic >> instances should not be a problem.... >> >> Thanks. I will play with that concept. >> >> Rolf >> From leigh.klotz at xerox.com Tue Jan 24 11:26:41 2012 From: leigh.klotz at xerox.com (Leigh L Klotz Jr) Date: Tue, 24 Jan 2012 11:26:41 -0800 Subject: [jdom-interest] JDOM 1.1.2 / Saxon 9.4.0.1: namespace xmlns="" could not be added as a namespace Message-ID: <4F1F05F1.9030107@xerox.com> Has anyone encountered this?
It doesn't happen with JDOM 1.1.1, but it does happen with JDOM 1.1.2. Vanilla XSLT transform: Document with default namespace change and any attribute on the element: FAILS: ... Document with default namespace change and no attribute on the element: WORKS: ... Here's the error: org.jdom.IllegalAddException: The namespace xmlns="" could not be added as a namespace to "bar": The namespace prefix "" collides with the element namespace prefix at org.jdom.Element.addNamespaceDeclaration(Element.java:363) at org.jdom.input.SAXHandler.transferNamespaces(SAXHandler.java:714) at org.jdom.input.SAXHandler.startElement(SAXHandler.java:563) at net.sf.saxon.event.ContentHandlerProxy.startContent(ContentHandlerProxy.java:366) at net.sf.saxon.event.NamespaceReducer.startContent(NamespaceReducer.java:192) at net.sf.saxon.event.ComplexContentOutputter.startContent(ComplexContentOutputter.java:583) at net.sf.saxon.tree.tiny.TinyElementImpl.copy(TinyElementImpl.java:350) at net.sf.saxon.expr.instruct.CopyOf.processLeavingTail(CopyOf.java:510) at net.sf.saxon.expr.instruct.Template.applyLeavingTail(Template.java:212) at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:1032) at net.sf.saxon.trans.TextOnlyCopyRuleSet.process(TextOnlyCopyRuleSet.java:58) at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:1020) at net.sf.saxon.Controller.transformDocument(Controller.java:1957) at net.sf.saxon.Controller.transform(Controller.java:1803) at net.sf.saxon.s9api.XsltTransformer.transform(XsltTransformer.java:430) I'm using this code fragment to tell Saxon9 to serialize to JDOM: import net.sf.saxon.s9api.SAXDestination; import org.jdom.input.SAXHandler; import net.sf.saxon.s9api.Destination; SAXHandler saxHandler = new SAXHandler(); Destination saxDestination = new SAXDestination(saxHandler); xsltTransformer.setSource(new JDOMSource(document)); xsltTransformer.setDestination(saxDestination); xsltTransformer.transform(); If this isn't a JDOM bug, then I guess it must be a Saxon one. Leigh. 
From jdom at tuis.net Tue Jan 24 12:30:57 2012 From: jdom at tuis.net (Rolf Lear) Date: Tue, 24 Jan 2012 15:30:57 -0500 Subject: [jdom-interest] JDOM 1.1.2 / Saxon 9.4.0.1: namespace xmlns="" could not be added as a namespace In-Reply-To: <4F1F05F1.9030107@xerox.com> References: <4F1F05F1.9030107@xerox.com> Message-ID: <268d84b06cc20c3cbbe72dbec85d1672@tuis.net> Hi Leigh. I am at my office so I can't debug this issue right now... and additionally I have not played with Saxon XSLT code. But, inspecting the JDOM 1.1.2 code it is 'clear' that the Saxon code triggered the following SAX 'events': ... // maybe some other startPrefixMapping(..., ...); startPrefixMapping("", ""); // indicate that the "" prefix is linked to the "" URI startElement("http://example.com/foo", "bar", "bar", attributes); ... This is a broken chain of SAX events.... it is indicating that the "" prefix maps to "" (xmlns=""), but then loads the element in the foo namespace xmlns="http://example.com/foo" In the particular examples you cite there should be exactly one startPrefixMapping("", "") call per document and it should happen before the 'document' start element (or will it be zero calls for "","" since it is assumed... I forget). When the new element processes the 'additional' namespace xmlns="" it finds that the element itself has the "" prefix, but it is mapped to a different URI. Hence the exception. Now, as to why this is different in 1.1.2 vs. 1.1.1 I am not sure.... and that in itself is suspicious.... If you have the code in hand you can more easily debug the issue... (easier than me right now...). I can load it up in a few hours time and inspect it too. I suspect that the issue is a Saxon one, but then why the difference between 1.1.1 and 1.1.2 ... I am not sure. Rolf On Tue, 24 Jan 2012 11:26:41 -0800, Leigh L Klotz Jr wrote: > Has anyone encountered this? It doesn't happen with JDOM 1.1.1, but it > does happen with JDOM 1.1.2.
> > Vanilla XSLT transform: > > > xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> > > > > > > Document with default namespace change and any attribute on the element: > FAILS: > > > > > ... > > > > Document with default namespace change and no attribute on the element: > WORKS: > > > > ... > > > > Here's the error: > > org.jdom.IllegalAddException: The namespace xmlns="" could not be added > as a namespace to "bar": The namespace prefix "" collides with the > element namespace prefix > at org.jdom.Element.addNamespaceDeclaration(Element.java:363) > at org.jdom.input.SAXHandler.transferNamespaces(SAXHandler.java:714) > at org.jdom.input.SAXHandler.startElement(SAXHandler.java:563) > at > net.sf.saxon.event.ContentHandlerProxy.startContent(ContentHandlerProxy.java:366) > at > net.sf.saxon.event.NamespaceReducer.startContent(NamespaceReducer.java:192) > at > net.sf.saxon.event.ComplexContentOutputter.startContent(ComplexContentOutputter.java:583) > at > net.sf.saxon.tree.tiny.TinyElementImpl.copy(TinyElementImpl.java:350) > at > net.sf.saxon.expr.instruct.CopyOf.processLeavingTail(CopyOf.java:510) > at > net.sf.saxon.expr.instruct.Template.applyLeavingTail(Template.java:212) > at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:1032) > at > net.sf.saxon.trans.TextOnlyCopyRuleSet.process(TextOnlyCopyRuleSet.java:58) > at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:1020) > at net.sf.saxon.Controller.transformDocument(Controller.java:1957) > at net.sf.saxon.Controller.transform(Controller.java:1803) > at > net.sf.saxon.s9api.XsltTransformer.transform(XsltTransformer.java:430) > > > I'm using this code fragment to tell Saxon9 to serialize to JDOM: > > import net.sf.saxon.s9api.SAXDestination; > import org.jdom.input.SAXHandler; > import net.sf.saxon.s9api.Destination; > > SAXHandler saxHandler = new SAXHandler(); > Destination saxDestination = new SAXDestination(saxHandler); > xsltTransformer.setSource(new JDOMSource(document)); > 
xsltTransformer.setDestination(saxDestination); > xsltTransformer.transform(); > > If this isn't a JDOM bug, then I guess it must be a Saxon one. > > Leigh. > > _______________________________________________ > To control your jdom-interest membership: > http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com From leigh.klotz at xerox.com Tue Jan 24 13:13:19 2012 From: leigh.klotz at xerox.com (Leigh L Klotz Jr) Date: Tue, 24 Jan 2012 13:13:19 -0800 Subject: [jdom-interest] JDOM 1.1.2 / Saxon 9.4.0.1: namespace xmlns="" could not be added as a namespace In-Reply-To: <268d84b06cc20c3cbbe72dbec85d1672@tuis.net> References: <4F1F05F1.9030107@xerox.com> <268d84b06cc20c3cbbe72dbec85d1672@tuis.net> Message-ID: <4F1F1EEF.2060603@xerox.com> Thanks, Rolf. This is more than enough analysis on your part. I appreciate it. Leigh. On 01/24/2012 12:30 PM, Rolf Lear wrote: > > Hi Leigh. > > I am at my office so I can't debug this issue right now... and > additionally I have not played with Saxon XSLT code. > > but, inspecting the JDOM 1.1.2 code it is 'clear' that the Saxon code > triggered the following Sax 'events': > > > ... > // maybe some other startPrefixMapping(..., ...); > startPrefixMapping("", ""); // indicate that the "" prefix is linked to > the "" URI > startElement("http://example.com/foo", "bar", "bar", attributes); > ... > > > This is a broken chain of SAX events.... it is indicating that the "" > prefix maps to "" (xmlns=""), but then loads the element in the foo > namespace xmlns="http://example.com/foo" > > In the particular examples you cite there should be exactly one > startPrefixMapping("", "") call per document and it should happen before > the 'document' start element (or will it be zero calls for "","" since it > is assumed... I forget). > > when the new element processes the 'additional' namespace xmlns="" it > finds that the element itself has the "" prefix, but it is mapped to a > different URI. Hence the exception. 
> > Now, as to why this is different in 1.1.2 vs. 1.1.1 I am not sure.... and > that in itself is suspicious.... > > If you have the code in hand you can more easily debug the issue... > (easier than me right now...). > > I can load it up in a few hours time and inspect it too. I suspect that > the issue is a Saxon one, but then why the difference between 1.1.1 and > 1.1.2 ... I am not sure. > > Rolf > > > > On Tue, 24 Jan 2012 11:26:41 -0800, Leigh L Klotz Jr > wrote: > > Has anyone encountered this? It doesn't happen with JDOM 1.1.1, but it > > does happen with JDOM 1.1.2. > > > > Vanilla XSLT transform: > > > > > > > xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> > > > > > > > > > > > > Document with default namespace change and any attribute on the > element: > > > FAILS: > > > > > > > > > > ... > > > > > > > > Document with default namespace change and no attribute on the element: > > WORKS: > > > > > > > > ... > > > > > > > > Here's the error: > > > > org.jdom.IllegalAddException: The namespace xmlns="" could not be added > > as a namespace to "bar": The namespace prefix "" collides with the > > element namespace prefix > > at org.jdom.Element.addNamespaceDeclaration(Element.java:363) > > at > org.jdom.input.SAXHandler.transferNamespaces(SAXHandler.java:714) > > at org.jdom.input.SAXHandler.startElement(SAXHandler.java:563) > > at > > > net.sf.saxon.event.ContentHandlerProxy.startContent(ContentHandlerProxy.java:366) > > > at > > > net.sf.saxon.event.NamespaceReducer.startContent(NamespaceReducer.java:192) > > > at > > > net.sf.saxon.event.ComplexContentOutputter.startContent(ComplexContentOutputter.java:583) > > > at > > net.sf.saxon.tree.tiny.TinyElementImpl.copy(TinyElementImpl.java:350) > > at > > net.sf.saxon.expr.instruct.CopyOf.processLeavingTail(CopyOf.java:510) > > at > > net.sf.saxon.expr.instruct.Template.applyLeavingTail(Template.java:212) > > at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:1032) > > at > > > 
net.sf.saxon.trans.TextOnlyCopyRuleSet.process(TextOnlyCopyRuleSet.java:58) > > > at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:1020) > > at net.sf.saxon.Controller.transformDocument(Controller.java:1957) > > at net.sf.saxon.Controller.transform(Controller.java:1803) > > at > > net.sf.saxon.s9api.XsltTransformer.transform(XsltTransformer.java:430) > > > > > > I'm using this code fragment to tell Saxon9 to serialize to JDOM: > > > > import net.sf.saxon.s9api.SAXDestination; > > import org.jdom.input.SAXHandler; > > import net.sf.saxon.s9api.Destination; > > > > SAXHandler saxHandler = new SAXHandler(); > > Destination saxDestination = new SAXDestination(saxHandler); > > xsltTransformer.setSource(new JDOMSource(document)); > > xsltTransformer.setDestination(saxDestination); > > xsltTransformer.transform(); > > > > If this isn't a JDOM bug, then I guess it must be a Saxon one. > > > > Leigh. > > > > _______________________________________________ > > To control your jdom-interest membership: > > http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jdom at tuis.net Tue Jan 24 15:34:56 2012 From: jdom at tuis.net (Rolf Lear) Date: Tue, 24 Jan 2012 18:34:56 -0500 Subject: [jdom-interest] JDOM 1.1.2 / Saxon 9.4.0.1: namespace xmlns="" could not be added as a namespace In-Reply-To: <4F1F1EEF.2060603@xerox.com> References: <4F1F05F1.9030107@xerox.com> <268d84b06cc20c3cbbe72dbec85d1672@tuis.net> <4F1F1EEF.2060603@xerox.com> Message-ID: <4F1F4020.2050006@tuis.net> Hi Leigh. I have tracked down the issue. It comes from this change I made here: https://github.com/hunterhacker/jdom/commit/f026e89780b3259fa049fd223ceaacfee16fce65 So, The Saxon code is getting the event fired from the JDOMSource.... ... which in turn is breaking the Saxon side of things ... gigo. In essence I traded one bug for another. 
The original bug was that namespaces used by Attributes were being 'missed' in the SAX Event stream, but now that they are checked, we need to ensure that the no-namespace namespace is excluded. It is an easy fix, but a slower process to get JDOM 1.1.3 out. Rolf On 24/01/2012 4:13 PM, Leigh L Klotz Jr wrote: > Thanks, Rolf. This is more than enough analysis on your part. I > appreciate it. > Leigh. > > On 01/24/2012 12:30 PM, Rolf Lear wrote: >> >> Hi Leigh. >> >> I am at my office so I can't debug this issue right now... and >> additionally I have not played with Saxon XSLT code. >> >> but, inspecting the JDOM 1.1.2 code it is 'clear' that the Saxon code >> triggered the following Sax 'events': >> >> >> ... >> // maybe some other startPrefixMapping(..., ...); >> startPrefixMapping("", ""); // indicate that the "" prefix is linked to >> the "" URI >> startElement("http://example.com/foo", "bar", "bar", attributes); >> ... >> >> >> This is a broken chain of SAX events.... it is indicating that the "" >> prefix maps to "" (xmlns=""), but then loads the element in the foo >> namespace xmlns="http://example.com/foo" >> >> In the particular examples you cite there should be exactly one >> startPrefixMapping("", "") call per document and it should happen before >> the 'document' start element (or will it be zero calls for "","" since it >> is assumed... I forget). >> >> when the new element processes the 'additional' namespace xmlns="" it >> finds that the element itself has the "" prefix, but it is mapped to a >> different URI. Hence the exception. >> >> Now, as to why this is different in 1.1.2 vs. 1.1.1 I am not sure.... and >> that in itself is suspicious.... >> >> If you have the code in hand you can more easily debug the issue... >> (easier than me right now...). >> >> I can load it up in a few hours time and inspect it too. I suspect that >> the issue is a Saxon one, but then why the difference between 1.1.1 and >> 1.1.2 ... I am not sure. 
>> >> Rolf >> >> >> >> On Tue, 24 Jan 2012 11:26:41 -0800, Leigh L Klotz Jr >> wrote: >> > Has anyone encountered this? It doesn't happen with JDOM 1.1.1, but it >> > does happen with JDOM 1.1.2. >> > >> > Vanilla XSLT transform: >> > >> > >> > > > xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> >> > >> > >> > >> > >> > >> > Document with default namespace change and any attribute on the >> element: >> >> > FAILS: >> > >> > >> > >> > >> > ... >> > >> > >> > >> > Document with default namespace change and no attribute on the element: >> > WORKS: >> > >> > >> > >> > ... >> > >> > >> > >> > Here's the error: >> > >> > org.jdom.IllegalAddException: The namespace xmlns="" could not be added >> > as a namespace to "bar": The namespace prefix "" collides with the >> > element namespace prefix >> > at org.jdom.Element.addNamespaceDeclaration(Element.java:363) >> > at >> org.jdom.input.SAXHandler.transferNamespaces(SAXHandler.java:714) >> > at org.jdom.input.SAXHandler.startElement(SAXHandler.java:563) >> > at >> > >> net.sf.saxon.event.ContentHandlerProxy.startContent(ContentHandlerProxy.java:366) >> >> > at >> > >> net.sf.saxon.event.NamespaceReducer.startContent(NamespaceReducer.java:192) >> >> > at >> > >> net.sf.saxon.event.ComplexContentOutputter.startContent(ComplexContentOutputter.java:583) >> >> > at >> > net.sf.saxon.tree.tiny.TinyElementImpl.copy(TinyElementImpl.java:350) >> > at >> > net.sf.saxon.expr.instruct.CopyOf.processLeavingTail(CopyOf.java:510) >> > at >> > net.sf.saxon.expr.instruct.Template.applyLeavingTail(Template.java:212) >> > at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:1032) >> > at >> > >> net.sf.saxon.trans.TextOnlyCopyRuleSet.process(TextOnlyCopyRuleSet.java:58) >> >> > at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:1020) >> > at net.sf.saxon.Controller.transformDocument(Controller.java:1957) >> > at net.sf.saxon.Controller.transform(Controller.java:1803) >> > at >> > 
net.sf.saxon.s9api.XsltTransformer.transform(XsltTransformer.java:430) >> > >> > >> > I'm using this code fragment to tell Saxon9 to serialize to JDOM: >> > >> > import net.sf.saxon.s9api.SAXDestination; >> > import org.jdom.input.SAXHandler; >> > import net.sf.saxon.s9api.Destination; >> > >> > SAXHandler saxHandler = new SAXHandler(); >> > Destination saxDestination = new SAXDestination(saxHandler); >> > xsltTransformer.setSource(new JDOMSource(document)); >> > xsltTransformer.setDestination(saxDestination); >> > xsltTransformer.transform(); >> > >> > If this isn't a JDOM bug, then I guess it must be a Saxon one. >> > >> > Leigh. >> > >> > _______________________________________________ >> > To control your jdom-interest membership: >> > http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com >> > From leigh.klotz at xerox.com Tue Jan 24 15:52:06 2012 From: leigh.klotz at xerox.com (Leigh L Klotz Jr) Date: Tue, 24 Jan 2012 15:52:06 -0800 Subject: [jdom-interest] JDOM 1.1.2 / Saxon 9.4.0.1: namespace xmlns="" could not be added as a namespace In-Reply-To: <4F1F4020.2050006@tuis.net> References: <4F1F05F1.9030107@xerox.com> <268d84b06cc20c3cbbe72dbec85d1672@tuis.net> <4F1F1EEF.2060603@xerox.com> <4F1F4020.2050006@tuis.net> Message-ID: <4F1F4426.1050608@xerox.com> Thanks! I'll stick with 1.1.1 if this isn't easily fixed. Leigh. On 01/24/2012 03:34 PM, Rolf Lear wrote: > > Hi Leigh. > > I have tracked down the issue. It comes from this change I made here: > > https://github.com/hunterhacker/jdom/commit/f026e89780b3259fa049fd223ceaacfee16fce65 > > > So, The Saxon code is getting the event fired from the JDOMSource.... > ... which in turn is breaking the Saxon side of things ... gigo. > > In essence I traded one bug for another. > > The original bug was that namespaces used by Attributes were being > 'missed' in the SAX Event stream, but now that they are checked, we need > to ensure that the no-namespace namespace is excluded. 
> > It is an easy fix, but a slower process to get JDOM 1.1.3 out. > > Rolf > > On 24/01/2012 4:13 PM, Leigh L Klotz Jr wrote: > > Thanks, Rolf. This is more than enough analysis on your part. I > > appreciate it. > > Leigh. > > > > On 01/24/2012 12:30 PM, Rolf Lear wrote: > >> > >> Hi Leigh. > >> > >> I am at my office so I can't debug this issue right now... and > >> additionally I have not played with Saxon XSLT code. > >> > >> but, inspecting the JDOM 1.1.2 code it is 'clear' that the Saxon code > >> triggered the following Sax 'events': > >> > >> > >> ... > >> // maybe some other startPrefixMapping(..., ...); > >> startPrefixMapping("", ""); // indicate that the "" prefix is > linked to > >> the "" URI > >> startElement("http://example.com/foo", "bar", "bar", attributes); > >> ... > >> > >> > >> This is a broken chain of SAX events.... it is indicating that the "" > >> prefix maps to "" (xmlns=""), but then loads the element in the foo > >> namespace xmlns="http://example.com/foo" > >> > >> In the particular examples you cite there should be exactly one > >> startPrefixMapping("", "") call per document and it should happen > before > >> the 'document' start element (or will it be zero calls for "","" > since it > >> is assumed... I forget). > >> > >> when the new element processes the 'additional' namespace xmlns="" it > >> finds that the element itself has the "" prefix, but it is mapped to a > >> different URI. Hence the exception. > >> > >> Now, as to why this is different in 1.1.2 vs. 1.1.1 I am not > sure.... and > >> that in itself is suspicious.... > >> > >> If you have the code in hand you can more easily debug the issue... > >> (easier than me right now...). > >> > >> I can load it up in a few hours time and inspect it too. I suspect > that > >> the issue is a Saxon one, but then why the difference between 1.1.1 > and > >> 1.1.2 ... I am not sure. 
> >> > >> Rolf > >> > >> > >> > >> On Tue, 24 Jan 2012 11:26:41 -0800, Leigh L Klotz Jr > >> wrote: > >> > Has anyone encountered this? It doesn't happen with JDOM 1.1.1, > but it > >> > does happen with JDOM 1.1.2. > >> > > >> > Vanilla XSLT transform: > >> > > >> > > >> > >> > xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> > >> > > >> > > >> > > >> > > >> > > >> > Document with default namespace change and any attribute on the > >> element: > >> > >> > FAILS: > >> > > >> > > >> > > >> > > >> > ... > >> > > >> > > >> > > >> > Document with default namespace change and no attribute on the > element: > >> > WORKS: > >> > > >> > > >> > > >> > ... > >> > > >> > > >> > > >> > Here's the error: > >> > > >> > org.jdom.IllegalAddException: The namespace xmlns="" could not be > added > >> > as a namespace to "bar": The namespace prefix "" collides with the > >> > element namespace prefix > >> > at org.jdom.Element.addNamespaceDeclaration(Element.java:363) > >> > at > >> org.jdom.input.SAXHandler.transferNamespaces(SAXHandler.java:714) > >> > at org.jdom.input.SAXHandler.startElement(SAXHandler.java:563) > >> > at > >> > > >> > net.sf.saxon.event.ContentHandlerProxy.startContent(ContentHandlerProxy.java:366) > >> > >> > at > >> > > >> > net.sf.saxon.event.NamespaceReducer.startContent(NamespaceReducer.java:192) > >> > >> > at > >> > > >> > net.sf.saxon.event.ComplexContentOutputter.startContent(ComplexContentOutputter.java:583) > >> > >> > at > >> > > net.sf.saxon.tree.tiny.TinyElementImpl.copy(TinyElementImpl.java:350) > >> > at > >> > > net.sf.saxon.expr.instruct.CopyOf.processLeavingTail(CopyOf.java:510) > >> > at > >> > > net.sf.saxon.expr.instruct.Template.applyLeavingTail(Template.java:212) > >> > at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:1032) > >> > at > >> > > >> > net.sf.saxon.trans.TextOnlyCopyRuleSet.process(TextOnlyCopyRuleSet.java:58) > >> > >> > at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:1020) > >> > at > 
net.sf.saxon.Controller.transformDocument(Controller.java:1957) > >> > at net.sf.saxon.Controller.transform(Controller.java:1803) > >> > at > >> > > net.sf.saxon.s9api.XsltTransformer.transform(XsltTransformer.java:430) > >> > > >> > > >> > I'm using this code fragment to tell Saxon9 to serialize to JDOM: > >> > > >> > import net.sf.saxon.s9api.SAXDestination; > >> > import org.jdom.input.SAXHandler; > >> > import net.sf.saxon.s9api.Destination; > >> > > >> > SAXHandler saxHandler = new SAXHandler(); > >> > Destination saxDestination = new SAXDestination(saxHandler); > >> > xsltTransformer.setSource(new JDOMSource(document)); > >> > xsltTransformer.setDestination(saxDestination); > >> > xsltTransformer.transform(); > >> > > >> > If this isn't a JDOM bug, then I guess it must be a Saxon one. > >> > > >> > Leigh. > >> > > >> > _______________________________________________ > >> > To control your jdom-interest membership: > >> > > http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com > >> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jdom at tuis.net Wed Jan 25 06:42:27 2012 From: jdom at tuis.net (Rolf Lear) Date: Wed, 25 Jan 2012 09:42:27 -0500 Subject: [jdom-interest] JDOM 1.x release schedule Message-ID: <98cae462f08b0dcd7161b15737854dfc@tuis.net> Hi All. Given the bug fix pending in the JDOM 1.1.x stream I believe a new release of the JDOM 1.1.x is required. On the other hand I do not want to be releasing 1.1.x versions for every issue that arises. I think a compromise schedule is viable, and it goes something like this: 1. build the current JDOM 1.x stream with the current bug fix in it and post it on the github download page. Call it JDOM.1.1.x.hotfix.2012.01.25.zip 2. if any additional bug fixes are needed another hotfix package will be built. 3. at some fixed point in time we schedule a formal 1.1.3 release that contains all the fixes. 4. 
if any bug comes up that is considered to be 'critical' an earlier-than-schedule release could be made. In this case, I think 1st March 2012 is a good candidate date... 5 weeks from now. Later today I will build the current JDOM 1.x code base as JDOM.1.1.x.hotfix.2012.01.25 and I will post it to github. If any other issues arise I will create hotfix updates to address them. On March 1st I will rebuild the JDOM code again as 1.1.3 and do the formal release process to www.jdom.org as well as maven-central. Does this sound like a viable process? Rolf From olivier.jaquemet at jalios.com Wed Jan 25 06:57:44 2012 From: olivier.jaquemet at jalios.com (Olivier Jaquemet) Date: Wed, 25 Jan 2012 15:57:44 +0100 Subject: [jdom-interest] JDOM 1.x release schedule In-Reply-To: <98cae462f08b0dcd7161b15737854dfc@tuis.net> References: <98cae462f08b0dcd7161b15737854dfc@tuis.net> Message-ID: <4F201868.50002@jalios.com> Hi Rolf, This process sounds good to me. It does provide a valid and official build for people needing quick fixes. But other users looking for a more "long term support" release are thus not required to update too often. Olivier On 25/01/2012 15:42, Rolf Lear wrote: > Hi All. > > Given the bug fix pending in the JDOM 1.1.x stream I believe a new release > of the JDOM 1.1.x is required. > > On the other hand I do not want to be releasing 1.1.x versions for every > issue that arises. > > I think a compromise schedule is viable, and it goes something like this: > > 1. build the current JDOM 1.x stream with the current bug fix in it and > post it on the github download page. Call it > JDOM.1.1.x.hotfix.2012.01.25.zip > 2. if any additional bug fixes are needed another hotfix package will be > built. > 3. at some fixed point in time we schedule a formal 1.1.3 release that > contains all the fixes. > 4. if any bug comes up that is considered to be 'critical' an > earlier-than-schedule release could be made. > > In this case, I think 1st March 2012 is a good candidate date... 
5 weeks
> from now.
>
> Later today I will build the current JDOM 1.x code base as
> JDOM.1.1.x.hotfix.2012.01.25 and I will post it to github.
> If any other issues arise I will create hotfix updates to address them.
> On March 1st I will rebuild the JDOM code again as 1.1.3 and do the formal
> release process to www.jdom.org as well as maven-central.
>
> Does this sound like a viable process?
>
> Rolf
>

-- 
Olivier Jaquemet
Ingénieur R&D Jalios S.A. - http://www.jalios.com/
@OlivierJaquemet
+33970461480

From jdom at tuis.net Sat Jan 28 07:24:20 2012
From: jdom at tuis.net (Rolf Lear)
Date: Sat, 28 Jan 2012 10:24:20 -0500
Subject: [jdom-interest] JDOM 1.x release schedule
In-Reply-To: <98cae462f08b0dcd7161b15737854dfc@tuis.net>
References: <98cae462f08b0dcd7161b15737854dfc@tuis.net>
Message-ID: <4F241324.8090406@tuis.net>

Hi all.

I believe 1.1.2 is a more reliable version of JDOM than 1.1.1.

Unfortunately there is already one known new issue in 1.1.2, affecting
people who use the SAXOutputter (which is used in XML Transformations).

This issue will be resolved in 1.1.3. Until 1.1.3 is released, though,
there is a 'hotfix' for this issue here:
https://github.com/hunterhacker/jdom/downloads

Download the jdom-1.1.2.hf1.zip file. The direct link is:
https://github.com/downloads/hunterhacker/jdom/jdom-1.1.2.hf1.zip

This zip file is in the same format that you would normally download
from www.jdom.org.

If any other issues come up with 1.1.2 they will be fixed and released
as a second hotfix package. All issues found and fixed before 1 March
2012 will be accumulated and released as 1.1.3 on that date.

If you are currently running with 1.1.2 please continue to do so.
If you run in to any issues please report them here on this list, and
check the open and recently fixed issues on github that were found in
1.1.2:
https://github.com/hunterhacker/jdom/issues?labels=found+in+1.1.2

Despite recent evidence to the contrary, I do believe that 1.1.2 is more
stable than 1.1.1.

If you have an issue in 1.1.2 and it has been resolved in the issues
list above, then please use the most recent 1.1.2 hotfix Jar from the
downloads page.

Thanks.

Rolf

On 25/01/2012 9:42 AM, Rolf Lear wrote:
>
> Hi All.
>
> Given the bug fix pending in the JDOM 1.1.x stream I believe a new release
> of the JDOM 1.1.x is required.
>
> On the other hand I do not want to be releasing 1.1.x versions for every
> issue that arises.
>

From jdom at tuis.net Sat Jan 28 08:38:32 2012
From: jdom at tuis.net (Rolf Lear)
Date: Sat, 28 Jan 2012 11:38:32 -0500
Subject: [jdom-interest] JDOM and memory
In-Reply-To: <4F02133C.5010704@tuis.net>
References: <4F02133C.5010704@tuis.net>
Message-ID: <4F242488.4000708@tuis.net>

Hi All ... An update...

I have played with a number of options, and have not had significant
success with any.

Merging Content-list in to Element has a number of problems:
1. Document and Element end up duplicating a lot of code
2. It changes the API of Document and Element with it implementing List

Document and Element almost always contain content... it is seldom that
you have empty Elements (there is normally some text at least). As a
result, the savings of not having to have a content array are limited.

There can be some saving in not having a separate object as the list,
but it does not amount to much. Given the issues with the API this
approach does not make sense.

Michael Kay suggested keeping the ContentList independent of the
Element, and creating an instance when it was referenced in
getContent().
The problem with this is that the management of ConcurrentModification
becomes very complicated, and, as far as I can tell, essentially
impossible if there are multiple different instances of the ContentList
class for any particular Element. Given that almost all Element
instances have content, it is not worth the effort to lose the
ConcurrentModification control, and not actually save any memory in a
typical use case.

So, neither option for changing the ContentList system is very
successful.

On the other hand, it is relatively common to have no Attributes on an
Element. With some careful changes to the Element class (adding a
hasAttributes() method and making the AttributeList variable a 'lazy'
initialised field), in ideal cases we never need to actually create an
AttributeList instance for the Element. This has a significant impact on
the 'hamlet' test, where there are essentially no attributes. It has no
'negative' impact on memory in the worst case either, and it has a
positive (small but significant) impact on performance.

So, the lazy initialization of AttributeList is a 'win'.

Finally, I have in the past had some success with the concept of
'reusing' String values. XML Parsers (like SAX, etc.) typically create a
new String instance for all the variables they pass. For example, the
Element names, prefixes, etc. are all new instances of String. Thus, if
you have hundreds of Elements called 'car' in your input XML, you will
get hundreds of different String Element names with the value 'car'. I
have built a class that does something similar to String.intern() in
order to rationalize the hundreds of different-but-equals() values that
are passed in by the parsers.

I have incorporated this 'caching' class in to a new JDOMFactory called
'SlimJDOMFactory'. This factory 'normalizes' all String values to a
single instance of each unique String value.
This significantly reduces the amount of memory used in the JDOM tree,
especially if there are lots of: similarly named attributes, elements,
white-space-padding in otherwise empty elements, or between elements.
This process is significantly slower though...

For example, with the 'hamlet' test case, the 'baseline' memory
footprint for hamlet in JDOM is 2.27MB in 4.75ms.
With the SlimJDOMFactory it is: 1.77MB in 8.5ms
With Lazy AttributeList it is: 2.06MB in 4.55ms
With both it is: 1.57MB in 8.3ms

I am pushing both of these changes in to github. The AttributeList is an
easy one to justify. It is fully compatible with prior code, and it has
positive memory and performance impacts.

The SlimJDOMFactory is also justifiable when you consider:
1. the user has to decide to use it specifically.
2. The memory saving can be very significant.
3. Even though the parse time is slower, the GC time savings can be
significant if the document 'hangs around' for a long time - the
quicker GC time can add up fast.
4. When you have lots of code doing comparisons it is much faster to do
equals() calls on Strings that are == as well. It saves a hashCode
calculation as well as a string character scan to prove equals().

Rolf

On 02/01/2012 3:27 PM, Rolf wrote:
> Hi all.
>
> Memory optimization has never been a top priority for JDOM. At the same
> time, for what it does, JDOM is not a 'terrible' memory user. Still, I
> have done some analysis, and, I believe I can trim about a quarter to a
> half of 'JDOM Overhead' memory usage by making two 'simple' changes....
>
> The first is to merge the ContentList class in to the Element class (and
> also in to Document). This will reduce the number of Java objects by
> about half, and that will save about 32 bytes per Element at a minimum
> in a 64-bit JRE. Additionally, by lazy-initialization of the Content
> array, we can save memory on otherwise 'empty' Elements.
> > This can be done by extending the Element (and perhaps Document) class > to extend 'List'. It can all be done in a 'backward compatible' way, but > also leads to some interesting possibilities, like: > > for (Content c : element) { > ... do something > } > > (for backward compatibility, Element.getContent() will return 'this'). > > > The second change is to make the AttributeList instance in Element a > lazy-initialization. This would save memory on all Elements that have no > attributes, but would have an impact for people who sub-class the > Element class and may expect the attributes field to be non-null. > > > I am trying to get a feel for how important this sort of optimization > may be. If there is interest then I will make some changes, and test the > impact. I may make a separate branch in github to test it out.... > > If the above changes are unrealistic then I don't think it makes sense > to even try.... > > Rolf > _______________________________________________ > To control your jdom-interest membership: > http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com > From mike at saxonica.com Sat Jan 28 10:37:43 2012 From: mike at saxonica.com (Michael Kay) Date: Sat, 28 Jan 2012 18:37:43 +0000 Subject: [jdom-interest] JDOM and memory In-Reply-To: <4F242488.4000708@tuis.net> References: <4F02133C.5010704@tuis.net> <4F242488.4000708@tuis.net> Message-ID: <4F244077.9050901@saxonica.com> > > > Finally, I have in the past had some success with the concept of > 'reusing' String values. XML Parsers (like SAX, etc.) typically create > a new String instance for all the variables they pass. For example, > the Element names, prefixes, etc. are all new instances of String. > Thus, if you have hundreds of Elements called 'car' in your input XML, > you will get hundreds of different String Element names with the value > 'car'. 
I have built a class that does something similar to
> String.intern() in order to rationalize the hundreds of
> different-but-equals() values that are passed in by the parsers.

Have you measured how your optimization compares with the effect of
setting the http://xml.org/sax/features/string-interning property on the
SAX parser?

Are you doing the interning in a way that guarantees strings can be
compared using "==", and if so, are you taking advantage of this when
doing the comparisons? The big win comes with XPath searches such as
//x. Does the interning introduce any synchronization? (This is the big
disadvantage with Saxon's NamePool - it speeds up XPath searching
substantially, but the contention in a highly concurrent workload can
become quite significant.)

Are you pooling the QName as a whole, or the local name, prefix and URI
separately?

Michael Kay
Saxonica
>
> I have incorporated this 'caching' class in to a new JDOMFactory
> called 'SlimJDOMFactory'. This factory 'normalizes' all String values
> to a single instance of each unique String value. This significantly
> reduces the amount of memory used in the JDOM tree especially if there
> are lots of: similarly named attributes, elements, white-space-padding
> in otherwise empty elements, or between elements. This process is
> significantly slower through...
>
> For example, with the 'hamlet' test case, the 'baseline' memory
> footprint for hamlet in JDOM is 2.27MB in 4.75ms.
> With the SlimJDOMFactory it is: 1.77MB in 8.5ms
> With Lazy AttributeList it is: 2.06MB in 4.55ms
> With the both it is 1.57MB in 8.3ms
>
> I am pushing both of these changes in to github. The AttributeList is
> an easy one to justify. It is fully compatible with prior code, it has
> positive memory and perfomance impacts.
>
> The SlimJDOMFactory is also justifiable when you consider:
> 1. the user has to decide to use it specifically.
> 2. The memory saving can be very significant.
> 3.
Even though the parse time is slower, the GC time savings can be > significant if the document 'hangs around' for a long time - the > quicker GC time can add up fast. > 4. When you have lots of code doing comparisons it is much faster to > do equals() calls on Strings that are == as well. It saves a hashCode > calculation as well as a string character scan to prove equals(). > > Rolf > > On 02/01/2012 3:27 PM, Rolf wrote: >> Hi all. >> >> Memory optimization has never been a top priority for JDOM. At the same >> time, for what it does, JDOM is not a 'terrible' memory user. Still, I >> have done some analysis, and, I believe I can trim about a quarter to a >> half of 'JDOM Overhead' memory usage by making two 'simple' changes.... >> >> The first is to merge the ContentList class in to the Element class (and >> also in to Document). This will reduce the number of Java objects by >> about half, and that will save about 32 bytes per Element at a minimum >> in a 64-bit JRE. Additionally, by lazy-initialization of the Content >> array, we can save memory on otherwise 'empty' Elements. >> >> This can be done by extending the Element (and perhaps Document) class >> to extend 'List'. It can all be done in a 'backward compatible' way, but >> also leads to some interesting possibilities, like: >> >> for (Content c : element) { >> ... do something >> } >> >> (for backward compatibility, Element.getContent() will return 'this'). >> >> >> The second change is to make the AttributeList instance in Element a >> lazy-initialization. This would save memory on all Elements that have no >> attributes, but would have an impact for people who sub-class the >> Element class and may expect the attributes field to be non-null. >> >> >> I am trying to get a feel for how important this sort of optimization >> may be. If there is interest then I will make some changes, and test the >> impact. I may make a separate branch in github to test it out.... 
>> >> If the above changes are unrealistic then I don't think it makes sense >> to even try.... >> >> Rolf >> _______________________________________________ >> To control your jdom-interest membership: >> http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com >> > > _______________________________________________ > To control your jdom-interest membership: > http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com > From jdom at tuis.net Sat Jan 28 11:42:02 2012 From: jdom at tuis.net (Rolf Lear) Date: Sat, 28 Jan 2012 14:42:02 -0500 Subject: [jdom-interest] JDOM and memory In-Reply-To: <4F244077.9050901@saxonica.com> References: <4F02133C.5010704@tuis.net> <4F242488.4000708@tuis.net> <4F244077.9050901@saxonica.com> Message-ID: <4F244F8A.5020709@tuis.net> On 28/01/2012 1:37 PM, Michael Kay wrote: > >> >> >> Finally, I have in the past had some success with the concept of >> 'reusing' String values. XML Parsers (like SAX, etc.) typically create >> a new String instance for all the variables they pass. For example, >> the Element names, prefixes, etc. are all new instances of String. >> Thus, if you have hundreds of Elements called 'car' in your input XML, >> you will get hundreds of different String Element names with the value >> 'car'. I have built a class that does something similar to >> String.intern() in order to rationalize the hundreds of >> different-but-equals() values that are passed in by the parsers. > Have you measured how your optimization compares with the effect of > setting the http://xml.org/sax/features/string-interning property on the > SAX parser? > > Are you doing the interning in a way that guarantees strings can be > compared using "==", and if so, are you taking advantage of this when > doing the comparisons? .The big win comes with XPath searches such as > //x. Does the interning introduce any synchronization? 
(This is the big
> disadvantage with Saxon's NamePool - it speeds up XPath searching
> substantially, but the contention in a highly concurrent workload can
> become quite significant.)
>
> Are you pooling the QName as a whole, or the local name, prefix and URI
> separately?
>
> Michael Kay
> Saxonica

Hi Michael,

In answer to your questions...

no, I have not compared against the string-interning property. I was not
aware of that. But, reading the documentation, it says: All element
names, prefixes, attribute names, Namespace URIs, and local names are
internalized using java.lang.String.intern.

This is *not* a good thing. String.intern() uses PermGen space to intern
the value (as if the value is a String constant in the code). PermGen
space is typically limited to a hundred or so megabytes. I have, in the
past, run in to significant issues where you get OutOfMemory issues when
String.intern is used liberally.... and changing -Xmx makes no
difference... very confusing the first time you run in to that....

So, I have not compared against the string-interning of the SAX parser.
And I would not recommend that people use that unless they know what
they are doing, and what sort of data they have.

The mechanism I do use is based on previous experience with this sort of
problem, and it works by using a memory-efficient hash-table to store
unique instances of String. Subsequent lookups in to the hash table
return the previously stored string value, if any. Because the
hash-table is not a global hash table, and because it is not linked in
to any core Java structures, you cannot guarantee == based comparisons,
but, in many cases, the String.equals() returns immediately because you
are in fact comparing identical instances and the first line of
String.equals() does the == comparison.

My method does not use any synchronization, and I expect each JDOM
builder to have its own cache, possibly for the duration of a single
parse only. It makes a difference on small-scale items only.
I have in the past built a thread-safe and 'global' type cache using
similar principles, and it is a good concept, but it would be overkill
here. With JDOM in particular you do not want large memory structures
hanging around... and limiting this cache to a single builder is about
the right sort of compromise. Further, because I have implemented it in
a new JDOMFactory implementation, it is easy for the JDOM user to manage
how long it lives for, and they can call the
SlimJDOMFactory.clearCache() to remove any previously cached String
instances. In other words, the JDOM user can use it as much or as little
as they want (but not concurrently).

In my testing the Jaxen-based XPath expressions are in fact faster with
the 'cached' string values ... about 1ms faster on a 30ms process... not
very significant (not significant enough to be purely attributable to
that ...).

So, it is a single-threaded cache that reuses previously cached values.
It can be applied to a single process or to consecutive processes, and
the cache itself is available outside the SlimJDOMFactory if people want
to borrow that code in their own way.

In my experience, the benefit of this sort of caching is most obvious in
a GC-monitored environment where the GC times can be substantially
shortened.... but not easily measured.
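
The cache described above can be sketched roughly as follows. This is
only an illustration of the idea (one canonical instance per distinct
value, no synchronization, clearable by its owner). It is not the
StringBin or SlimJDOMFactory code: the class name StringReuseSketch and
its methods are made up for this example, and it uses a plain
java.util.HashMap where the message describes a more memory-efficient
hash table.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical illustration only: NOT the JDOM StringBin implementation.
 * A per-instance, unsynchronized cache that hands back one canonical
 * String instance for each distinct value it has seen.
 */
final class StringReuseSketch {

    // Maps each distinct value to the first instance seen with that value.
    private final Map<String, String> cache = new HashMap<String, String>();

    /** Returns the cached instance equal to value, caching value if it is new. */
    public String reuse(String value) {
        if (value == null) {
            return null;
        }
        String cached = cache.get(value);
        if (cached == null) {
            cache.put(value, value);
            cached = value;
        }
        return cached;
    }

    /** Drops every cached instance, for example between parses. */
    public void clear() {
        cache.clear();
    }

    /** Number of distinct values currently cached. */
    public int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        StringReuseSketch bin = new StringReuseSketch();
        String a = new String("car");
        String b = new String("car");
        System.out.println(a == b);                       // prints false
        System.out.println(bin.reuse(a) == bin.reuse(b)); // prints true
    }
}
```

Because the cache belongs to one instance rather than to the JVM, the
strings stay on the ordinary heap and are released by calling clear() or
by dropping the instance. Two separate instances may hand back different
String objects for equal values, which is why == comparisons cannot be
guaranteed.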
Rolf

From jdom at tuis.net Sat Jan 28 14:02:32 2012
From: jdom at tuis.net (Rolf Lear)
Date: Sat, 28 Jan 2012 17:02:32 -0500
Subject: [jdom-interest] JDOM and memory
In-Reply-To: <45B3B70B-6BB9-4D18-A3D9-5B5844948B9D@hoplahup.net>
References: <4F02133C.5010704@tuis.net> <4F242488.4000708@tuis.net>
 <4F244077.9050901@saxonica.com> <4F244F8A.5020709@tuis.net>
 <45B3B70B-6BB9-4D18-A3D9-5B5844948B9D@hoplahup.net>
Message-ID: <4F247078.2050102@tuis.net>

public class OOM {
    public static void main(String[] args) {
        int i = 0;
        // Holding a reference to every interned String stops the
        // PermGen GC from reclaiming them.
        String[] strings = new String[10000000];
        try {
            while (true) {
                i++;
                strings[i] = ("Number " + i).intern();
                // Report progress every 100,000 strings.
                if (0 == (i % 100000)) {
                    System.out.println(strings[i]);
                }
            }
        } catch (Throwable t) {
            System.out.println("Last was " + i);
        }
    }
}

.....
Number 700000
Number 800000
Number 900000
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: PermGen space
Last was 984460

I had to store the result in the 'strings' array... I learned something
... Java 6 does GC in the perm-gen space.... I watched it clearing out
the values in the JVisualVM monitor.... but keeping a reference to the
intern'd string causes OOM as expected.

In many places 1,000,000 strings is not a lot....

Rolf

On 28/01/2012 4:17 PM, Paul Libbrecht wrote:
> Interesting,
>
> the very first thing I did when writing OmdocJdom, a library with
> subclasses for each element type, is to use string-interning. I do not
> believe you can reach Out-Of-Memory by having such a diversity in
> element names, prefixes, etc... unless you are building a kind of super
> generic editor or modifier. 100Mb of strings is quite a lot (far more
> than all DTDs I've been touching thus far in my life I believe). We
> never ran into OOM for this (but with Lucene we did).
>
> paul
>
>
> On 28 Jan 2012 at 20:42, Rolf Lear wrote:
>
>> no, I have not compared against string-interning property. I was not
>> aware of that.
But, reading the documentation, it says: All element >> names, prefixes, attribute names, Namespace URIs, and local names are >> internalized using java.lang.String.intern. >> >> This is *not* a good thing. String.intern() uses PermGen space to >> intern the value (as if the value is a String constant in the code). >> PermGen space is typically limited to a hundred or so megabytes. I >> have, in the past, run in to significant issues where you get >> OutOfMemory issues when String.intern is used liberally.... and >> changing -Xmx makes no difference... very confusing the first time you >> run in to that.... >> >> So, I have not compared, to string-intern of the SAX parser. And I >> would not recommend that people use that unless they know what they >> are doing, and what sort of data they have. > From mike at saxonica.com Sat Jan 28 14:31:20 2012 From: mike at saxonica.com (Michael Kay) Date: Sat, 28 Jan 2012 22:31:20 +0000 Subject: [jdom-interest] JDOM and memory In-Reply-To: <4F247078.2050102@tuis.net> References: <4F02133C.5010704@tuis.net> <4F242488.4000708@tuis.net> <4F244077.9050901@saxonica.com> <4F244F8A.5020709@tuis.net> <45B3B70B-6BB9-4D18-A3D9-5B5844948B9D@hoplahup.net> <4F247078.2050102@tuis.net> Message-ID: <4F247738.9080207@saxonica.com> > In many places 1,000,000 strings is not a lot.... > The Saxon NamePool is optimized for much lower numbers than this: it's rare to have more than a couple of thousand element and attribute names. The only time I've seen large numbers reached is with pathological applications that generate random namespace prefixes. 
Michael Kay
Saxonica

From jdom at tuis.net Sat Jan 28 14:56:47 2012
From: jdom at tuis.net (Rolf Lear)
Date: Sat, 28 Jan 2012 17:56:47 -0500
Subject: [jdom-interest] JDOM and memory
In-Reply-To:
References: <4F02133C.5010704@tuis.net> <4F242488.4000708@tuis.net>
 <4F244077.9050901@saxonica.com> <4F244F8A.5020709@tuis.net>
 <45B3B70B-6BB9-4D18-A3D9-5B5844948B9D@hoplahup.net>
 <4F247078.2050102@tuis.net>
Message-ID: <4F247D2F.2000803@tuis.net>

(I did a reply, not reply all, so it did not go to the list).

I disagree.... it is element names, attribute names, and in the case of
SlimJDOMFactory, it is the XML Text content (whitespace padding between
elements is ripe for reuse).

If you put JDOM in something like a Tomcat server with long-running
applications, the PermGen space dies pretty fast.... especially with the
way that Tomcat has multiple classloaders, etc.

I consider it to be bad practice for a library to make routine use of
the PermGen space.

I modified the example slightly... added timing to it... and then I
compared it to the StringBin tool I built....
;-)

Here are the two code examples:

public class OOM {
    public static void main(String[] args) {
        int i = 0;
        String[] strings = new String[10000000];
        long time = System.currentTimeMillis();
        try {
            while (true) {
                i++;
                strings[i] = ("Number " + i).intern();
                // Every 100,000 strings, print the running rate
                // in strings per millisecond.
                if (0 == (i % 100000)) {
                    System.out.printf("%s at %.4f/ms\n", strings[i],
                            (1.0 * i) / (System.currentTimeMillis() - time));
                }
            }
        } catch (Error t) {
            System.out.println("Last was " + i);
            throw t;
        }
    }
}

and second example:

import org.jdom2.util.StringBin;

public class OOMSB {
    public static void main(String[] args) {
        int i = 0;
        String[] strings = new String[10000000];
        StringBin sb = new StringBin();
        long time = System.currentTimeMillis();
        try {
            while (true) {
                i++;
                // Same loop as above, but using StringBin.reuse()
                // in place of String.intern().
                strings[i] = sb.reuse("Number " + i);
                if (0 == (i % 100000)) {
                    System.out.printf("%s at %.4f/ms\n", strings[i],
                            (1.0 * i) / (System.currentTimeMillis() - time));
                }
            }
        } catch (Error t) {
            System.out.println("Last was " + i);
            throw t;
        }
    }
}

The String.intern() fails at:

Number 500000 at 99.1080/ms
Number 600000 at 79.0306/ms
Number 700000 at 65.1405/ms
Number 800000 at 54.7608/ms
Number 900000 at 46.9851/ms
Number 1000000 at 40.9920/ms
Last was 1043637
Exception in thread "main" java.lang.OutOfMemoryError: PermGen space
    at java.lang.String.intern(Native Method)
    at net.tuis.debug.OOM.main(OOM.java:12)

The StringBin version fails at.....

Number 9500000 at 693.2788/ms
Number 9600000 at 697.7758/ms
Number 9700000 at 701.9829/ms
Number 9800000 at 706.4081/ms
Number 9900000 at 596.7810/ms
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 10000000
    at net.tuis.debug.OOMSB.main(OOMSB.java:15)

Another reason to not use String.intern..... it is slow... ;-)

Rolf

On 28/01/2012 5:08 PM, Paul Libbrecht wrote:
>
> Rolf Lear wrote:
>
>> ... Java 6 does GC in the perm-gen space.... I watched it clearing out
>> the values in the JVisualVM monitor.... but keeping a reference to the
>> intern'd string causes OOM as expected.
>
> very cute example, thanks for that!
>
>> In many places 1,000,000 strings is not a lot....
>
> I agree, but not in element names!
>
> paul
>

From jdom at tuis.net Sat Jan 28 16:46:52 2012
From: jdom at tuis.net (Rolf Lear)
Date: Sat, 28 Jan 2012 19:46:52 -0500
Subject: [jdom-interest] JDOM and memory
In-Reply-To:
References: <4F02133C.5010704@tuis.net> <4F242488.4000708@tuis.net>
Message-ID: <4F2496FC.30307@tuis.net>

Hi Joe.

Thanks for that. I have run in to the problem before with the backing
array not being the same as the actual String content. In the StringBin
code I specifically account for that:

https://github.com/hunterhacker/jdom/blob/master/core/src/java/org/jdom2/util/StringBin.java#L371

In essence, it ensures the String is as compact as possible.

Rolf

On 28/01/2012 7:10 PM, Joe Bowbeer wrote:
> A per-document string pool is a feature of binary xml formats.
>
> A potential problem with per-factory string pooling is the possibility
> of retaining large character arrays. Android's String class description
> explains the problem:
>
> This class is implemented using a char[]. The length of the array
> may exceed the length of the string. For example, the string "Hello"
> may be backed by the array ['H', 'e', 'l', 'l', 'o', 'W', 'o', 'r',
> 'l', 'd'] with offset 0 and length 5.
> Multiple strings can share the same char[] because strings are
> immutable. The substring(int) method
> *always* returns a string that shares the backing array of its
> source string. Generally this is an optimization: fewer character
> arrays need to be allocated, and less copying is necessary. But this
> can also lead to unwanted heap retention. Taking a short substring
> of long string means that the long shared char[] won't be garbage
> until both strings are garbage. This typically happens when parsing
> small substrings out of a large input. To avoid this where
> necessary, call new String(longString.subString(...)).
The string > copy constructor always ensures that the backing array is no larger > than necessary. > > > ...from http://developer.android.com/reference/java/lang/String.html > > If xml parsers create new strings, is it to avoid retaining the entire > source document? > > I suggest choosing a name for the Slim factory that is more descriptive > of what it does, as "slim" may depend on taste and application. > > Joe > > On Sat, Jan 28, 2012 at 8:38 AM, Rolf Lear wrote: > > Hi All ... An update... > > I have played with a number of options, and have not had significant > success with any. > > Merging Content-list in to Element has a number of problems: > 1. Document and Element end up duplicating a lot of code > 2. It changes the API of Document and Element with it implementing > List > > Document and Element almost always contain content... it is seldom > that you have empty Elements (there is normally some text at least). > As a result, the savings of not having to have a content array are > limited. > > There can be some saving in not having a separate object as the > list, but it does not amount to much. Given the issues with the API > this approach does not make sense. > > Michael Kay suggested keeping the ContentList independent of the > Element, and creating an instance when it was referenced in > getContent(). The problem with this is that the management of > ConcurrentModification becomes very complicated, and, as far as I > can tell, essentially impossible if there are multiple differet > instances of the ContentList class for any particular Element. Given > that almost all Element instances have content, it is not worth the > effort to lose the ConcurrentModification control, and not actually > save any memory in a typical use case. > > So, neither option for changing the ContentList system is very > successful. 
>
> On the other hand, it is relatively common to have no Attributes on
> an Element, and some careful changes to the Element class (adding a
> hasAttributes() method and making the AttributeList variable a
> 'lazy' initialised field) mean that in ideal cases we never
> need to actually create an AttributeList instance for the Element.
> This has a significant impact on the 'hamlet' test, where there are
> essentially no attributes. It has no 'negative' impact on memory in
> the worst case either, and it has positive (small but significant)
> impact on performance.
>
> So, the lazy initialization of AttributeList is a 'win'.
>
> Finally, I have in the past had some success with the concept of
> 'reusing' String values. XML Parsers (like SAX, etc.) typically
> create a new String instance for all the variables they pass. For
> example, the Element names, prefixes, etc. are all new instances of
> String. Thus, if you have hundreds of Elements called 'car' in your
> input XML, you will get hundreds of different String Element names
> with the value 'car'. I have built a class that does something
> similar to String.intern() in order to rationalize the hundreds of
> different-but-equals() values that are passed in by the parsers.
>
> I have incorporated this 'caching' class in to a new JDOMFactory
> called 'SlimJDOMFactory'. This factory 'normalizes' all String
> values to a single instance of each unique String value. This
> significantly reduces the amount of memory used in the JDOM tree
> especially if there are lots of: similarly named attributes,
> elements, white-space-padding in otherwise empty elements, or
> between elements. This process is significantly slower though...
>
> For example, with the 'hamlet' test case, the 'baseline' memory
> footprint for hamlet in JDOM is 2.27MB in 4.75ms.
> With the SlimJDOMFactory it is: 1.77MB in 8.5ms
> With Lazy AttributeList it is: 2.06MB in 4.55ms
> With both it is: 1.57MB in 8.3ms
>
> I am pushing both of these changes in to github. The AttributeList
> is an easy one to justify. It is fully compatible with prior code,
> it has positive memory and performance impacts.
>
> The SlimJDOMFactory is also justifiable when you consider:
> 1. the user has to decide to use it specifically.
> 2. The memory saving can be very significant.
> 3. Even though the parse time is slower, the GC time savings can be
> significant if the document 'hangs around' for a long time - the
> quicker GC time can add up fast.
> 4. When you have lots of code doing comparisons it is much faster to
> do equals() calls on Strings that are == as well. It saves a
> hashCode calculation as well as a string character scan to prove
> equals().
>
> Rolf
>
>
> On 02/01/2012 3:27 PM, Rolf wrote:
>
> Hi all.
>
> Memory optimization has never been a top priority for JDOM. At the
> same time, for what it does, JDOM is not a 'terrible' memory user.
> Still, I have done some analysis, and, I believe I can trim about a
> quarter to a half of 'JDOM Overhead' memory usage by making two
> 'simple' changes....
>
> The first is to merge the ContentList class in to the Element class
> (and also in to Document). This will reduce the number of Java
> objects by about half, and that will save about 32 bytes per Element
> at a minimum in a 64-bit JRE. Additionally, by lazy-initialization of
> the Content array, we can save memory on otherwise 'empty' Elements.
>
> This can be done by extending the Element (and perhaps Document)
> class to extend 'List'. It can all be done in a 'backward compatible'
> way, but also leads to some interesting possibilities, like:
>
> for (Content c : element) {
> ... do something
> }
>
> (for backward compatibility, Element.getContent() will return
> 'this').
>
>
> The second change is to make the AttributeList instance in Element a
> lazy-initialization. This would save memory on all Elements that
> have no attributes, but would have an impact for people who sub-class
> the Element class and may expect the attributes field to be non-null.
>
>
> I am trying to get a feel for how important this sort of
> optimization may be. If there is interest then I will make some
> changes, and test the impact. I may make a separate branch in github
> to test it out....
>
> If the above changes are unrealistic then I don't think it makes
> sense to even try....
>
> Rolf
>
>
>
> _______________________________________________
> To control your jdom-interest membership:
> http://www.jdom.org/mailman/options/jdom-interest/youraddr at yourhost.com

From jdom at tuis.net Sat Jan 28 16:49:18 2012
From: jdom at tuis.net (Rolf Lear)
Date: Sat, 28 Jan 2012 19:49:18 -0500
Subject: [jdom-interest] JDOM and memory
In-Reply-To: <4F247738.9080207@saxonica.com>
References: <4F02133C.5010704@tuis.net> <4F242488.4000708@tuis.net> <4F244077.9050901@saxonica.com> <4F244F8A.5020709@tuis.net> <45B3B70B-6BB9-4D18-A3D9-5B5844948B9D@hoplahup.net> <4F247078.2050102@tuis.net> <4F247738.9080207@saxonica.com>
Message-ID: <4F24978E.4070201@tuis.net>

On 28/01/2012 5:31 PM, Michael Kay wrote:
>
>> In many places 1,000,000 strings is not a lot....
>>
> The Saxon NamePool is optimized for much lower numbers than this: it's
> rare to have more than a couple of thousand element and attribute names.
> The only time I've seen large numbers reached is with pathological
> applications that generate random namespace prefixes.
>
> Michael Kay
> Saxonica
>

I addressed this in a mail that I inadvertently sent only to Paul and
not to the list. I have corrected that now.

The issue is not so much the content of one document, but the content of
all data in a JVM. Tomcat is a prime example.
Because it uses a separate Classloader for each installed application, it
has many multiples of copies of classes in the PermGen. The PermGen
space is limited to start with.... then, if these applications are doing
JDOM processing, you are in trouble if JDOM uses the PermGen space
for 'scratch' data.

PermGen is a non-obvious component of Java. Novices do not know of it,
do not understand its purpose, and do not know how to debug it. By way
of example, I ran in to it using intern() and it took me days to figure
out where the memory was going.... (years ago). Perhaps that is why I am
so sensitive to it.

Similarly, do a search for 'Tomcat PermGen' and you quickly understand
how precious PermGen space is; it is not to be squandered on something
that is easy to replace on the heap.

Rolf

From jdom at tuis.net Sat Jan 28 17:41:07 2012
From: jdom at tuis.net (Rolf Lear)
Date: Sat, 28 Jan 2012 20:41:07 -0500
Subject: [jdom-interest] JDOM and memory
In-Reply-To: <4F244077.9050901@saxonica.com>
References: <4F02133C.5010704@tuis.net> <4F242488.4000708@tuis.net> <4F244077.9050901@saxonica.com>
Message-ID: <4F24A3B3.8080803@tuis.net>

I have now compared the results of string-interning to the String-cache
code.

The 'raw' code (neither SlimJDOMFactory nor string-interning) is:
2.06MB @ 4.55ms
The SlimJDOMFactory is:
1.57MB @ 8ms
The string-interning SAX Feature is:
2.06MB @ 6.1ms

Not sure how I got essentially zero improvement of memory.... got
something wrong..... no... been checking, but I think the difference in
using String.intern on element names only is so insignificant that it
does not feature as much as 1%..... perhaps all the difference is coming
in whitespace....

Not worth checking in to it.... I don't believe the String.intern() is
the right answer regardless.

Rolf

On 28/01/2012 1:37 PM, Michael Kay wrote:
>
>>
>>
>> Finally, I have in the past had some success with the concept of
>> 'reusing' String values. XML Parsers (like SAX, etc.)
>> typically create a new String instance for all the variables they
>> pass. For example, the Element names, prefixes, etc. are all new
>> instances of String. Thus, if you have hundreds of Elements called
>> 'car' in your input XML, you will get hundreds of different String
>> Element names with the value 'car'. I have built a class that does
>> something similar to String.intern() in order to rationalize the
>> hundreds of different-but-equals() values that are passed in by the
>> parsers.
> Have you measured how your optimization compares with the effect of
> setting the http://xml.org/sax/features/string-interning property on the
> SAX parser?
>
> Are you doing the interning in a way that guarantees strings can be
> compared using "==", and if so, are you taking advantage of this when
> doing the comparisons? The big win comes with XPath searches such as
> //x. Does the interning introduce any synchronization? (This is the big
> disadvantage with Saxon's NamePool - it speeds up XPath searching
> substantially, but the contention in a highly concurrent workload can
> become quite significant.)
>
> Are you pooling the QName as a whole, or the local name, prefix and URI
> separately?
>
> Michael Kay
> Saxonica

From paul at hoplahup.net Sun Jan 29 02:58:18 2012
From: paul at hoplahup.net (Paul Libbrecht)
Date: Sun, 29 Jan 2012 11:58:18 +0100
Subject: [jdom-interest] JDOM and memory
In-Reply-To: <4F24A3B3.8080803@tuis.net>
References: <4F02133C.5010704@tuis.net> <4F242488.4000708@tuis.net> <4F244077.9050901@saxonica.com> <4F24A3B3.8080803@tuis.net>
Message-ID:

Rolf,

I do know there are applications (such as the ones Michael reported,
which generate random prefixes) for which any form of pooling is
dangerous; and you show that there are situations where interning
performs worse than other pooling methods (I think hashCode might be
seen as guilty, but that can't be changed).

Nonetheless, I believe the design we had, where the element names were
interned, is common: in our server application, the ActiveMath learning
environment, the element names are everywhere in the Java code as well,
e.g. for comparison within if statements. So for this use, interning is
actually better than pooling overall.

I'm convinced many JDOM users have this approach; using JDOM is cute for
Java programming, not for XSLT friends that only see the world as
pipelines translatable into a set of unix xsltproc calls.

I would suggest the following:
- make this configurable
- make this subclassable and exploitable

That is, let e.g. SAXBuilder have a method:

    public String makePooledName(String)

which would then call the right interning method (String.intern for
those who want it, SlimJDOMFactory's by default?, nothing for those who
fear retention).

Would that be in SAXBuilder or JDOMFactory? I'm afraid there's no global
JDOM config object; that would be the place, e.g. so that it can also be
called from new Element("name").

paul

From jdom at tuis.net Sun Jan 29 03:44:48 2012
From: jdom at tuis.net (Rolf Lear)
Date: Sun, 29 Jan 2012 06:44:48 -0500
Subject: [jdom-interest] JDOM and memory
In-Reply-To:
References: <4F02133C.5010704@tuis.net> <4F242488.4000708@tuis.net> <4F244077.9050901@saxonica.com> <4F24A3B3.8080803@tuis.net>
Message-ID:

Hi all.

Just to be clear, the 'SlimJDOMFactory' is not a default setting.
By default people will:

    SAXBuilder builder = new SAXBuilder();

If you want to have a smaller memory footprint (but also a slower parse)
you can:

    SAXBuilder builder = new SAXBuilder(new SlimJDOMFactory());

So, these changes are not affecting anything by default.

What I am hearing is that there is value in an 'InterningJDOMFactory'
which will do a String.intern() on element and attribute names? That
should be easy to arrange... but doing more than just the Element and
Attribute names is likely to cause issues in PermGen (the
SlimJDOMFactory can do 'everything', including the XML Text and CDATA
sections)...

Regardless, I sense some anxiety about the SlimJDOMFactory, but it is
something the user needs to opt in to, so it is very 'safe'.

Rolf
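
For readers following the thread, the string reuse that SlimJDOMFactory
is described as doing can be sketched as a small canonicalizing pool kept
on the ordinary heap. This is a minimal illustration only and not the
actual org.jdom2.util.StringBin code: the class name StringPoolSketch and
its methods reuse() and size() are hypothetical, and a plain HashMap
stands in for whatever structure StringBin really uses. It shows two
ideas from the discussion: equal Strings collapse to one shared instance,
and the pooled copy is made with new String(value) so that it does not
pin a larger backing array on JVMs where substring() shares storage.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of a per-factory string pool.
 * NOT the real org.jdom2.util.StringBin; names and structure are assumptions.
 */
final class StringPoolSketch {

    // Maps each distinct value to its single canonical instance.
    private final Map<String, String> pool = new HashMap<String, String>();

    /** Returns the canonical instance that equals(value); null stays null. */
    String reuse(String value) {
        if (value == null) {
            return null;
        }
        String cached = pool.get(value);
        if (cached == null) {
            // Copy before pooling so the cached value owns a compact array
            // instead of sharing a (possibly much larger) parser buffer.
            cached = new String(value);
            pool.put(cached, cached);
        }
        return cached;
    }

    /** Number of distinct values currently held. */
    int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        StringPoolSketch names = new StringPoolSketch();
        // Two different-but-equals() instances, as a parser might deliver.
        String first = names.reuse(new String("car"));
        String second = names.reuse(new String("car"));
        System.out.println(first == second); // prints "true"
        System.out.println(names.size());    // prints "1"
    }
}
```

Unlike String.intern(), a pool like this is an ordinary heap object, so
it can be collected together with whatever holds it, which is the PermGen
concern raised earlier in the thread. As written it is not thread-safe.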