In defense of Apache

Apache is an open source giant, but does it live up to its community ideals?

In the early part of the last decade I donated a piece of software to Apache because I was afraid that Microsoft would sue me. Nearly a decade later, Microsoft would become a major contributor to that software. I started POI with the hopes of landing local business during the dot bomb, which was a much worse recession than the latest one for those in technology. As it turned out, my first customer was in South Africa.

My vision for POI was to grow beyond a spreadsheet-file-format API, cover the full gamut of Microsoft's Office file formats, and plug this into a reporting engine that I would develop later. Unfortunately for me, other people further along on reporting engines were able to plug POI into them and make them more successful. That was a big success for Apache, but something of a miss for me. I'd made the hard part happen but missed out on the primary benefit.

Apache for individuals

I remember Marc Fleury at JBoss used to characterize Apache as being a bunch of guys who stood around waiting for IBM to "take them." I don't think it was quite that simple. At one point, IBM approached us and wanted to use POI in some part of WebSphere. I was happy to have Big Blue, but I wasn't willing to meet the company's timelines for free. IBM decided to do something with OpenOffice instead.

While others may have benefited more from POI than I did personally, I did make a few hundred thousand dollars off it. That was the smallest benefit I received. Because of POI, I was able to get a (not very good) job to wait out the rest of the recession. My co-founder went to SAS even earlier. Because I'd already started an Apache project, I was able to talk Marc Fleury into hiring me at JBoss, and I made a lot more after the acquisition.

My time at Apache wasn't rosy. I was in my early 20s. My brain cells were firing thousands of times per millisecond in hundreds of different directions, my teenage hormones had not completely abated, and all this was mixed with an Irish temper. Communicating with somewhat bureaucratic people by email is an incredibly anger-inducing and frustrating way to spend your time. I had plenty to be frustrated about, but I handled it rather poorly.

Community, meritocracy, and seniority at Apache

Apache is about community on the surface, but more closely resembles a meritocracy. Unfortunately, you build merit quickly and it wanes slowly, so seniority seems to be an unspoken component of merit. This is natural -- who am I next to Brian Behlendorf (one of the founders of Apache, named here merely as an example)? Why should anyone care if I disagree with him? I mean, virtually everyone in the organization knew who Brian was, and virtually everyone knew I had a big virtual mouth.

When actions are public, the issues tend to be decided on their merit. When matters are private (Apache has over time moved a lot more to private lists), decisions are sometimes made that don't make much sense if you believe Apache stands for what it purports to.

Apache works, mostly

Much of the time the Apache system works. You have interested people who start a project, get some code working, then propose it to Apache. One of these meritorious members shepherds it into the organization and helps build a community of developers. The "committers" on the project do their own stunts -- the bulk of the marketing and evangelizing.

The Apache name carries a lot of weight and is helpful. There was a brief time when another project had temporarily surpassed POI in capabilities on Excel. This could have been fatal. While POI's scope was broader and included Word and other file formats -- arguably why we fell behind -- Excel was always the most important. The Apache name kept us afloat until we surpassed the other project. That project is mostly dead now, and POI continues to be the default for reading and writing Office files from Java.

If you don't care about making a profit and want to attract contributors and users, Apache can be helpful. But Apache is also a big weight on any project. The Apache system for making decisions takes a lot of time, and it encourages the kinds of fights that probably don't need to happen. Projects need leaders, but Apache robs leaders of the semiautocratic power sometimes helpful to keep projects on track. Instead, the leader must become more of a community organizer. Some software developers are good at becoming community organizers, but most ... not so much.

This is not to say that any open source project leader, even outside Apache, can truly be autocratic. Open source allows people to "vote with their feet" -- to leave a project and start their own. Apache's system doesn't make it any less likely they will do so; it just makes it harder for leaders to herd cats.

Board action tends to fail to produce community

The worst Apache projects are those where the board takes action, often in private, and simply announces its decision, sometimes embargoing members from talking about it. Generally these projects were donated by a single company, consisting mainly of that company's developers and possibly allies or business partners. These projects tend to eventually fail.

There are successes as well. Generally, where IBM or others have used Apache to establish a standard, these projects have succeeded. Ironically, I don't think they've usually succeeded from a "community" standpoint, only at establishing a standard. You can see successes in the myriad XML and Web services libraries.

The clearest example of failure is Apache Harmony. When IBM pulled out, the project folded. There are earlier examples. Apache Beehive was entirely a BEA project. It really ended up being not much more than a dump and run. Geronimo, for example, was reportedly taken directly under board control because of its lack of diversity. Geronimo also has a declining number of releases per year, and if you look at the LinkedIn profiles of the people involved, you'll realize that participation is directly tied to employment at IBM.

OpenOffice is another project beset by all signs of failure, including the fact that Apache has seldom undertaken desktop applications. But any understanding of OpenOffice requires understanding Oracle and open source.

Oracle and open source don't really fit together. For starters, Oracle closed most of Sun's open source projects. Oracle then proceeded to tick off most of the contributors to the remaining projects, especially since many were former Sun employees who quit because they didn't want to work for Oracle (but still, ironically, wanted to work on the projects) or were laid off by Oracle. All of the ex-Sun types and the rare but stereotypical open source hippie types -- as well as the people being paid by someone else -- had one major desire: to work on the project and meet the project's overall objective.

Oracle famously decided that Hudson would migrate to its infrastructure no matter what the developers preferred. The developers decided to continue to develop the project as Jenkins on GitHub and Google Groups regardless of what Oracle thought. Oracle then donated Hudson to Eclipse. Although both projects are active, Jenkins is far more active, with far more contributors and broader industry support.

In an almost parallel story, the developers of OpenOffice were already dissatisfied with Sun making arbitrary decisions that affected them despite the creation of a governance board they were a part of. They split off into LibreOffice. Oracle fired most of its paid OpenOffice developers, closed OpenOffice.org, and donated the trademark to Apache. The project was forked as Apache OpenOffice, and IBM is now doing most of the core development. Both projects are active.

According to Ohloh, the Libre fork has fewer lines of code (possibly in a good way), more commits, and a larger diversity of contributors. The Apache OpenOffice fork has more lines of code from fewer contributors and declining diversity. The larger codebase may be in part due to IBM's donation of its Lotus Symphony code. Nearly all of the very active Apache OpenOffice developers work for IBM directly or indirectly. Like Apache Harmony, this isn't a "community" project that would survive IBM changing gears. According to Apache's "community" rhetoric, Apache OpenOffice shouldn't even exist.

I should note that at my company, our operations people use OpenOffice because of a compatibility bug with Google Docs. Most of the operations people are on Windows (due to scanner feed support on Linux, believe it or not). Because of a horrible bug in Apache OpenOffice, backspacing takes an eternity, which is truly painful to watch. Our developers are all on LibreOffice in the rare instances we need an installed desktop suite. Mostly, we use Google Docs. Who wants to email file attachments back and forth? As a developer, I'm an open source guy; as a businessman I'm all cloud through and through.

Hadoop: A donated success

Hadoop is a different story. It's creating an entire new industry. Hadoop is everything an Apache project should be: a community of rival companies, an increasing activity level, and an increasing number of committers. You see rivals Hortonworks and Cloudera. You also see Yahoo and other industry titans (or former titans in this case) coming together to develop this new class of software.

Like the heady days of the Apache Web Server, the opportunity is big enough for Hortonworks and Cloudera to both be successful without exclusive access to the trademark. The knowledge and skills required are hefty enough that no new barriers to entry need to exist to make a profit. This is Apache at its very finest. It will be messy and there will be kerfuffles, but how else and where else could this happen? Where else could Hadoop be both open source and inaugurate the next stage of the InterWebs? In some ways Hadoop is in fact the successor to the Apache Web Server -- or maybe the realization of what it started.

I think Apache is a very fine place to develop frameworks, establish standards, and even create new industries: HTTP, XML, Web services, big data. I think Apache is a fine place to develop frameworks that cross an industry and are needed in multiple products or projects, such as Struts and POI. Apache often suits the needs of large companies trying to develop a competitive advantage over rivals; for example, the Java ecosystem was seeded at Apache.

Apache is a great place to start your career if you have a lot of time on your hands, such as those who graduate during recessions that affect developer employment. It's also a great way to take your career to the next level if you have the skills to do it and pick the right project -- I guarantee you'll up your pay rate by contributing to Hadoop.

I don't think Apache always lives up to its purported ideals, and I don't think it can rescue a project from corporate abandonment (Beehive). I don't think the all-IBM projects are generally successful at creating communities or even projects that survive when IBM switches gears. You can replace IBM with any company -- but at Apache, it's usually IBM. I think these kinds of "business strategy" forks don't achieve much and generally hurt users.

Retrospective

I could never regret my time at Apache. I owe it my career to some degree. It isn't how I would choose to develop software again, because my interests and my role in the world have changed. That said, I think the long-term health of the organization requires it get back to its ideals, open up its private lists, and let sunshine disinfect the interests. My poorly articulated reasons for leaving a long time ago stemmed from my inability to effect that change.

I have a lot of respect for many of the people on the Apache board, but it's probably time for new leadership and a new perspective on what makes a successful project -- and when it should really, truly be allowed out of incubation and how to ensure private interests don't cloud judgement regarding that. The world needs an Apache Software Foundation.

This article, "In defense of Apache," was originally published at InfoWorld.com. Keep up on the latest developments in application development, and read more of Andrew Oliver's Strategic Developer blog at InfoWorld.com. For the latest business technology news, follow InfoWorld.com on Twitter.

