The PostgreSQL database project has recently released Version 8.0, which was received with quite some fanfare, mostly due to its first-ever Windows port. Mad Penguin talked with Josh Berkus, one of the core team members, to find out how 8.0 has fared since its official release on January 17, 2005.
MP: So today is Monday, March the... seventh, thank you, Josh. We're interviewing Josh Berkus, the marketing lead of the PostgreSQL project. What other hats do you wear, Josh?
JB: Well, actually my title is Core Team member; I happen to be sort of the de facto PR lead as well. But that role evolved after I was elected to the Core Team.
MP: Let's talk real briefly about the highlights of PostgreSQL 8.0, the version that was recently released.
JB: Well, the big new feature that's getting the most press is, of course, our native Windows port. This is the first version of PostgreSQL that will run natively on Windows with something comparable to the performance that you can expect on Linux or BSD. That's what's gotten us all the press, and that's what's gotten us probably somewhere around 100,000 new users. Other than that, there are a number of other features that fill holes our big enterprise users wanted filled, including: point-in-time recovery, otherwise known as continuous backup; nested transactions, or "savepoints" in SQL standard terminology; tablespaces, which are a way of making use of more disks; and some memory and I/O improvements, which were intended primarily to help large multi-processor machines.
MP: We were talking earlier about this version representing a "third hump" for PostgreSQL. How does this version represent a major milestone for PostgreSQL?
JB: About the three humps: I've been involved with a number of open source projects, and as I see it, there are three humps to get over before you are a "big project." Number one is when you go from being one developer to being multiple developers. Number two is when you pick up momentum among open source users, when people you never met before start jumping on mailing lists and saying, "Hey, I've used the software, is there anything I can do to help you out?" That gets you a certain distance. Then, as you grow even further in that direction, you reach the third hump, where rather than just individuals, companies start saying, "Hey, you've got a cool project, we use the software, we want to contribute, we want to be publicly involved with your project, and it's good for our PR, too."
That's sort of the third hump, and we [PostgreSQL] have seen that happen in the last year. Over the last year we have seen not just SRA, PostgreSQL Inc., Command Prompt, Credativ, TDMSoft, and a few other companies who have supported us for years, but we've also picked up Pervasive, a proprietary database vendor that's going open source; Fujitsu, the $43 billion Japanese mega-corporation; and a few others who haven't gone public.
MP: Let's talk a bit about Fujitsu. They made a rather stunning statement of commitment. Can you tell us a bit about it?
JB: Basically, the head of their open source business applications division, Mr. Takayuki Nakazawa, said, "We are committed to helping make PostgreSQL the leading database management system." Of course, we at the PostgreSQL project really appreciated that vote of confidence from a large, multi-billion-dollar corporation, and not just because Fujitsu has been contributing code and features, which they have: tablespaces and nested transactions, for example, among others.
In addition, one of the battles you always face as open source software growing in the commercial market is legitimacy. Basically, no business that's adopting new software, particularly something as vital as an enterprise data center, wants to be a maverick. They all want the confidence that someone else is using it, and the bigger the corporation endorsing it, the better. So if you have endorsements from companies like Fujitsu and Pervasive, in addition to the start-ups, that really gives IT managers the confidence they need to go to their bosses and say, "Hey, I found something to use for our project: PostgreSQL."
MP: What do you think was the big draw for Fujitsu for PostgreSQL 8.0?
JB: People in the US are not really familiar with this, but in Japan, Fujitsu is a major software vendor, if not THE leading software vendor, yet Fujitsu does not have a big database offering of its own. They do, however, have a lot of very substantial database tools, so incorporating an open source database into their repertoire, something they can attach their database tools to, rather than licensing something from another commercial company, was a natural move for them. When it comes right down to it, it is a way for them to compete with IBM and Microsoft.
MP: Let's talk about the general open source community's reaction to PostgreSQL 8.0. How long has 8.0 been out, and what are the download numbers looking like?
JB: PostgreSQL 8.0 came out officially on January 17, but it was in beta for six months before that. Fixing all of the bugs on the Windows platform really took us an inordinate amount of time. But since it came out, we've had about 200,000 downloads from our primary FTP and BitTorrent sites. That's not the only place you can get PostgreSQL; those downloads may represent about half of all our recent new users. And of course, we get the majority of our users through the major Linux distributions, for which we don't have numbers and where we will keep seeing adoption over the next two years.
So, overall, there has been a huge reaction, and one of the things I am encouraged by is that the Windows port has resulted in over 100,000 downloads and, potentially, as many new users: people who weren't able to use PostgreSQL before because they didn't have access to experts on Linux, FreeBSD, Solaris, or other Unix-like operating systems. And that's going to continue to help grow our community.
MP: What's this launch going to mean for PostgreSQL's star in the constellation of the overall open source “sky”?
JB: I don't think anyone questions that we're somewhere in the realm of the "big-name" projects. Certainly both the press and the companies affiliating with the PostgreSQL project seem to think so. In terms of ratings, your class A projects are pretty much Linux and Apache. (laughing) Then you have your second tier, which includes a whole host of applications. And we're either in the second tier or the third tier. Certainly, we have users in the six figures; we have a couple hundred code contributors; and at this point, any time I talk to major corporations about databases, it turns out that somewhere in some department, if not corporation-wide, they are already using PostgreSQL.
MP: What are the major draws that will attract developers to the project?
JB: A lot of the draws are things we've had for a long time. For example, PostgreSQL is a completely community-owned project. We're not corporate; we don't even have a foundation that governs the code. We have a foundation for fundraising, but it does not govern code contributions. So that openness is not only part of our design, but part of our culture. The idea is that PostgreSQL is there for you to hack. If you need something that most users don't need, or don't use, but that's special for your project or your business goal, you can go ahead and hack it. That, combined with the BSD license, lets you go ahead and hack it and then commercialize it. You can even release it under a commercial license. It's there to be completely, 100% free, and that's the main thing that attracts people to PostgreSQL.
And of course, there are other features that complement the license. For example, a clear coding style and a very accessible code base make it easy to hack. We have a pluggable architecture that makes it easy to write your own extensions to the database.
Part of our general architecture is designed to attract developers from all [programming] languages. Unlike any other database system that I know of, PostgreSQL supports 11 or 12 different languages for writing stored procedures. So whatever your chosen programming language is, Perl, PHP, Java, whatever, you can probably write stored procedures in it. That opens up the world of database programming to developers who otherwise might not approach it.
For current users, we have added stuff to keep them on board. For example, Christopher Kings-Lynne completely overhauled the backup and restore stuff to eliminate a lot of the annoying issues with that. The PL/Perl server-side scripting language that allows you to write Perl scripts inside the database has been vastly enhanced, which should be a big attraction for Perl users.
Other features in this release intended for existing users and DBAs include vastly expanded logging options and filling in a few holes in our SQL support for database design in terms of managing database permissions and database object characteristics.
MP: Where is PostgreSQL heading over the next 12 to 24 months?
JB: We have been releasing a new version about once per year for the last several years, and I don't see any reason for that pattern to change. It's a good compromise between how often the developers would like to release, which is about once every six months, and how often our users would like us to release, which is actually more like once every 18 months.
Because we're a completely community-organized project, if you want to develop something, and have the resources to develop it, you just jump on the hackers' list, and say “hey, this is something that I want to develop, and this is how I want to do it.” People will criticize your ideas, and suggest changes to fit into PostgreSQL in general. And then you do it! You don't have to fit some pre-determined marketing goal.
On the other hand, this also means that version 8.1 doesn't have a title. There's no specific goal for the release. One example of what people are working on right now is SQL-standard-compliant stored procedures. We have procedures now, but they're not compliant with the standard syntax. We are also working on bitmap indexes, which is a big feature for our Oracle users out there; vastly improved performance for other forms of advanced indexing; two-phase commit, which is another big thing for distributed application users; and migrating autovacuum (an automated maintenance utility) into the back end, so that it's no longer a separate process.
One other question I would like to answer is about replication, because I get asked about it all the time: unlike in some other database systems, in PostgreSQL replication is an add-in. It's a separate application. That isn't an accident. It's done on purpose.
There are several reasons for that. One is that replication is actually not a single feature. It is a set of four or five different related implementations, which serve four or five different needs. As a result, we don't want to bundle one particular kind of replication with the main database, because that's not suitable for all users. Our leading replication project, in terms of popularity, is something called Slony-I, led by Jan Wieck, who is also on the Core Team. It has been quite popular, and it is one of the leading master-slave high-availability replication systems of any kind. Jan is currently working on Slony-II, which will be synchronous multi-master replication for database server clusters. Based on the pace of his past work, I would anticipate that it will be available in about a year or so. But don't look for that information in the main release notes for PostgreSQL, because it will always be a separate, parallel project.
MP: You were the marketing lead for OpenOffice.org, which is a huge cross-platform project. Now PostgreSQL is a cross-platform project, too. What did you learn from the OpenOffice.org project that will be applicable to the PostgreSQL project?
JB: I think that the thing that lots of open source projects have learned, and that those that haven't should learn, is the simple fact that millions of people use Windows, and millions of people use only Windows. If you don't have a port to that platform, you have denied them access to your project.
That's a tough thing for lots of us. I personally do not have any Windows machines. This is an all-Linux office. But that doesn't mean I don't recognize that not only are there individuals who only have a Windows machine to use, but there are also companies that have standardized, say, on a Windows 2000 server environment and don't have a heterogeneous environment available. And there are a lot of developers who do their work on a notebook running Windows XP, even though their final applications will run on a Linux or Solaris server. So if you really want your open source project to grow, take off, and reach millions of people, and if it's appropriate for what your project does, then you need a Windows port of your project so that those people can download it and try it out on the hardware and operating system they already have, regardless of what they may use it on later.
MP: Is there anything else you would like to add?
JB: The main thing is that PostgreSQL is a community project. We always welcome new people. If you're interested in PostgreSQL, and you downloaded it, tried it, and liked it except for one thing, then jump on a mailing list and talk to people about it, because whatever that one thing is, fixing it might be closer than you think.