[From nobody Fri Aug 6 17:04:25 2004
Delivered-To: jhunter@collab.net
Return-Path: xerces-j-dev-return-1183-jhunter=acm.org@xml.apache.org
Received: (qmail 96125 invoked from network); 8 Jul 2000 05:32:02 -0000
Received: from mail.acm.org (199.222.69.4) by laswell.collab.net with SMTP; 8 Jul 2000 05:32:02 -0000
Received: from locus.apache.org (locus.apache.org [63.211.145.10]) by mail.acm.org (8.9.3/8.9.3) with SMTP id BAA21772 for <jhunter@acm.org>; Sat, 8 Jul 2000 01:30:12 -0400
Received: (qmail 81536 invoked by uid 500); 8 Jul 2000 05:31:22 -0000
Mailing-List: contact xerces-j-dev-help@xml.apache.org; run by ezmlm
Precedence: bulk
X-No-Archive: yes
Reply-To: xerces-j-dev@xml.apache.org
list-help: <mailto:xerces-j-dev-help@xml.apache.org>
list-unsubscribe: <mailto:xerces-j-dev-unsubscribe@xml.apache.org>
list-post: <mailto:xerces-j-dev@xml.apache.org>
Delivered-To: mailing list xerces-j-dev@xml.apache.org
Received: (qmail 81515 invoked from network); 8 Jul 2000 05:31:22 -0000
User-Agent: Microsoft-Outlook-Express-Macintosh-Edition/5.02.2022
Date: Fri, 07 Jul 2000 22:31:16 -0700
Subject: [spinnaker] Announce
From: James Duncan Davidson <james.davidson@eng.sun.com>
To: <xerces-j-dev@xml.apache.org>, <general@xml.apache.org>
Message-ID: <B58C0AB3.7073%james.davidson@eng.sun.com>
Mime-version: 1.0
Content-type: text/plain; charset="US-ASCII"
Content-transfer-encoding: 7bit
X-Mozilla-Status2: 00000000

It's been a while since Xerces was launched onto the world. And more recently we received Crimson to compare it to. From experience and this comparison, we've found a few things to be evident.

* Xerces is performant on JDK 1.1 VMs. Very much so. Admirably so in fact.

* Crimson isn't so optimized, yet it runs about as fast as Xerces does on modern VMs such as HotSpot. The HotSpot team told us that heavily optimized code for 1.1 would not benefit under HotSpot. We have the proof now. In fact, there's cases where it seems that Xerces slows down.

* Xerces has a large memory consumption. And a large Jar size. This probably wasn't an original design goal, but there is a category of users that we've talked to that have an issue with this.

* Use of Xerces is widespread. Obviously people want a good, high quality parser from a free source.

* Xerces is a great product. It stands well in the marketplace.

* However, because Xerces was heavily pre-optimized, it's extremely complex to understand and delve into. I think that this is best reflected in that most of the bits that go into Xerces come from IBM Cupertino.

* In our analysis of the Xerces code base, we can't use it for future inclusion in the JDK. The pre-optimization is a killer. The code complexity is a killer. And the memory consumption is a problem.

These are not unknown problems. Ted L. and I talked about the current Xerces source base at length at ApacheCon (as we were working out details for getting Crimson donated). Ted put forward the opinion that it might be best to do a massive refactoring based on the lessons learned from both parsers. To essentially build, from the ground up, a new parser that has a heritage in both existing parsers.

I've come to the conclusion that I agree with him. After quite a bit of discussion, the rest of the XML team at Sun, the people who are responsible for the parser that will ship in the core of future JDKs, agree as well.

It is important to stress that we want to ship an Apache-based parser in the JDK for all the reasons that you'd expect. Apache code tends to be good code. The Apache process is one that we believe in.

So, in the best of Apache traditions, we're gonna do something about it. I'm creating a tree in the xml-contrib area in which to do a lot of code work to explore how such a new parser could come to be. It's called Spinnaker.

This is the Spinnaker project description based on the README that will get checked in:

-=-------------------------------------------------------------------------

Spinnaker is an attempt to create a next generation Apache XML Parser based on all the lessons learned from the current versions of Xerces and Crimson.

GOALS:

* Simple to read, maintainable code. Above all, this is the primary goal for any openly developed project as without the ability to read the code, it's impossible for people to contribute and get involved.

* Smallest possible size. This means small distribution size (JAR file) and small memory footprint.

* Modular. It should be possible to build a parser as a set of Jar files so that a smaller parser can be assembled which fits the need of a particular implementation. For example, in TV sets do you really need validation?

* Cleanly Optimized. This means optimized in a way that is compatible with modern virtual machines such as HotSpot. Optimizations that work well with JDK 1.1 style VMs can actually impact performance under more modern VMs. Optimizations that interfere with readability, modularity, or size will be shunned.

* Collaboratively Developed. This means that we want *lots* of people from diverse backgrounds to participate in this barn raising.

PLAN OF RECORD:

In order to bootstrap what will essentially be a full refactoring of what an XML parser is (based on our two existing ones), the following is a list of possible checkpoints to hit.

* First, factor out utility classes from both the Xerces and Crimson source bases. There is a lot of good work on things like the Xerces decoders which are faster than the JDK's. This is actually the start of an Apache-wide common utility set (something that I'd like to see in the future as AUC -- Apache Utility Classes). We've talked about this before in other Apache projects, and there's a lot of good code that we can start it off with here.

* Determine what the modular API looks like.
  What are the various pieces that can be factored out? How can we get to a point where it's easy to package a parser that doesn't include DOM or a particular validator? There's some work started on a branch, but it hasn't been touched in a month or so. This might serve as a starting place.

* Refactor out a base parser. Once we see how those APIs should look (or at least get a start, they don't have to be perfect :) we start at the bottom and look at the code of the existing parsers to come up with a basic non-validating parser that can rip through XML.

* Set SAX on top of this base parser. Of course.

* Look at pluggable validation.

* Factor in tree-based producers. We'd like to see DOM and JDOM up front.

* Stability. By this point, we should have something that is starting to work well. Stability will be a driving goal then.

It should be said up front that this won't happen overnight. It will be a while before any fruit starts to grow.

-=-------------------------------------------------------------------------

So, to close a few thoughts...

Q. Isn't this a slam on the Xerces guys?
A. Nope. This is a natural thing that happens when people get an itch to scratch in the Apache organization. It should be pointed out that Apache Webserver 2.0 started out as a thought project, and that the next version of Tomcat may very well be Catalina which was a similar refactoring of the current Tomcat source base.

Q. When will this be ready?
A. Damn if I know. Not anytime immediately to be sure. There's a bit of work to be done.

Q. Where's the repository gonna be?
A. $CVSROOT/xml-contrib/spinnaker

Q. When's the code going to go in?
A. Well, the initial little itty bit that I've done so far to set up a directory structure and identify a few utility classes is going to be put in in just a few minutes' time after this email goes out. I'll be working on more pieces throughout the weekend that will beef things up.

Q. Is this Xerces 2.0?
A. No. Not Yet. And maybe Never.
It would take the acceptance of the developer community to be so. For the time being, it's just a code base where some of us are going to hang out and work. It should be said that software Darwinism could strike and this code base could go absolutely nowhere. Or, as I hope, this is going to take off and really work out.

Q. Can I help?
A. Duh....

Oh and by the way, to help keep discussion separate, please use [spinnaker] in your subject lines. This has been a help on the Tomcat lists.

That's all for now... Let the code start flowing. ;)

.duncan

---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xerces-j-dev-help@xml.apache.org
]