As we approach the imminent release of the “stable” version of OpenOffice.org 2.0 (OOo 2.0), it is becoming increasingly apparent that OOo 2.0 and its commercial big brother, StarOffice 8.0, are going to shake up the desktop software industry. As this Mad Penguin™ interview with Gary Edwards shows, the “clean” XML standard adopted by the OASIS OpenDocument Technical Committee (TC) and the State of Massachusetts is at the heart of massive changes now under way in the multi-billion dollar office productivity software industry.
Gary is a member of the OpenDocument TC and the principal of OpenStack Business Systems, a company providing XML-based SOA (service-oriented architecture) solutions, and so has first-hand knowledge of the potential that truly open XML holds for uniting the disparate computer systems that are still out there running mission-critical applications. By contrast, Gary explains, Microsoft's Word ML will only interoperate with its own locked stack, requiring customers to become complete Microsoft shops if they hope to achieve the same level of fluid information flow available through truly open SOAs.
No one knows for certain when OOo 2.0 stable will be released, but Mad Penguin's bet is that the stable 2.0 release will come before any recently purchased cartons of milk expire in your refrigerator, in which case, it will be well-timed to coincide with two other events focusing attention on the OpenDocument. The release will probably follow shortly on the heels of the 2005/10/10 founding of the OpenDocument Fellowship, an international organization dedicated to promoting the OpenDocument format. This is important because CIOs will need a neutral information source to cite in countering Microsoft FUD, such as this interview with Microsoft Senior Vice President Steven Sinofsky. Equally important, the release could fall right smack into the middle of the annual conference of the National Association of State Chief Information Officers (NASCIO), which will be held in San Diego, California, from October 16 – 19, 2005. Peter Quinn, the Massachusetts CIO, is sure to be mobbed by fellow State CIOs and reporters alike, because his State's high-profile adoption of the clean XML OpenDocument, as opposed to the Word ML format offered by Microsoft, could trigger a ripple effect throughout the conference. CIOs from all over the US will be focusing their attention on the OpenDocument and discussing amongst themselves the feasibility of duplicating Massachusetts' lead in their own states.
This article is the second of three articles which Mad Penguin™ is offering in celebration of the stable OOo 2.0 release. The first article was Slashdotted here.
Mad Penguin: Why don't you introduce yourself and your company to the nice folks out there in Mad Penguin™ land?
Gary Edwards: I'm Gary Edwards, and my company is OpenStack Business Systems. I'm also a founding member of the OASIS OpenDocument Technical Committee (TC), representing the OpenOffice.org community.
MP: What does OpenStack Business Systems do?
GE: We provide service-oriented architecture based consulting and development services. We are a boutique open source shop, implementing solutions mostly based on Open Standards and Open XML technologies.
MP: What does SOA mean?
GE: It means you can finally connect legacy information systems to everything else, and do so with an efficiency and resulting flow of information that is beyond your wildest dreams. When you write new business applications, you're able to write them against this new horizontal visibility of not just your information resources and transaction process, but including valuable services from trading partners, customers, and other web based information services (like Google and eBay). SOA itself is just a collection of best “Open Internet” practices shaped into an easy to follow blueprint. It's important to understand that the methods and protocols used in creating an SOA solution for connecting disparate information systems are always Open Internet based. So they always involve Open XML technologies. And more often than not, remaining compliant with Open Standards is the best way to improve the participation ratios of an SOA and achieve the broadest horizon of information visibility.
MP: In other words, for the newbies out there, it's a way of getting different computer systems to talk to each other?
GE: Yeah, but this is way beyond the promise of client/server. How do we get disparate systems to digitally connect, exchange, and interact the way we need them to? At OpenStack we have one rule of thumb, “first, get everything into XML, and then get it back again”. If you can't write XML connectors or work with XML web services, you can't take that first SOA step. The next step of course is setting up an XML universal transformation layer, and an XML Hub that you can create portals, application services, and rich web applications from. The XML hub synchronizes workflows, transaction processing flows, and information flows to the disparate back end (black box) legacy systems, using the universal transformation layer as the connectivity buffer.
The last great architecture before the Internet came along was client/server based information systems. The tradition of information systems was that of departmentalization, and client/server architectures were the final champion of boundary based business process – transaction process systems. The model for this is simple enough. Each department in a corporation traditionally had its own information system. They would have business applications written for the business processes that they were involved in. These systems didn't talk to each other. But since they were dedicated systems, there really wasn't a need to pursue something deemed impossible. And priced accordingly. If you were the management of a corporation, there was no way of digitally getting all the information into one place. Or digitally passing information from one system to others. There was no way of running reports across the disparate systems, unless of course you standardized on a single vendor platform where at least you could use SQL (Structured Query Language) to run against your databases. Unless your bank account was bottomless, you had to wait for the quarterly reports to come out before you found out what was going on.
When you hook the systems together, you also take out the barriers to the information flow. In freeing the information flow, you enable management to re-engineer the various business processes without having to rip out and replace the legacy systems. The re-engineering occurs at a higher level. Decision making and workflow routing are implemented in new applications not based on the vendor limits of the back end systems, but based entirely on the needs of a changing business. The applications and the back end databases and transaction processing centers are still doing work the way they always have. It's just that we're able to move the information into an XML file format which is useful to all the other information systems through that XML transformation process.
MP: What is it about XML that allows this to happen? Why is it so magical?
GE: Well, first of all, XML is readable by both humans and machines. Plus XML is extensible, so that it can be used to make adaptations to almost anything out there. Since it's readable by humans, people can come in and figure out what was done. What is this system doing that I need to understand? Then there's the real magic; the transformational qualities of XML.
Legacy systems usually provide information that's locked into an application-bound binary file format. Either the keepers of these systems provide you with a description of the inherent schema defining the structure of that information, or you work it out with the vendor. Much of the time though these information structures have been painstakingly reverse engineered so that the files can be worked with. This is why writing XML connectors is still an art. Once you transform that information into XML, it becomes a common layer within a business that any other system can grab and then transform back into its own business processing systems. You only need write your connections once. After that the information flow from that legacy system can be re-purposed endlessly. Once in the universal transformation layer, the information is 100% fluid and interoperable.
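Editor's note: the following is a minimal sketch of what a two-way XML connector might look like, not the connector code discussed in this interview. The 16-byte record layout, the element names, and the function names `binary_to_xml` and `xml_to_binary` are all hypothetical, chosen only to show the "into XML, and back again" round trip for a fixed-width binary record.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical legacy layout (an assumption for illustration):
# 4-byte unsigned id, 8-byte NUL-padded ASCII code, 4-byte unsigned
# amount in cents, all little-endian with no padding.
RECORD_LAYOUT = struct.Struct("<I8sI")


def binary_to_xml(raw: bytes) -> str:
    """Decode one fixed-width binary record into a small XML document."""
    record_id, code, cents = RECORD_LAYOUT.unpack(raw)
    root = ET.Element("record", id=str(record_id))
    ET.SubElement(root, "code").text = code.rstrip(b"\x00").decode("ascii")
    ET.SubElement(root, "amount", unit="cents").text = str(cents)
    return ET.tostring(root, encoding="unicode")


def xml_to_binary(xml_text: str) -> bytes:
    """Encode the XML document back into the legacy binary layout."""
    root = ET.fromstring(xml_text)
    return RECORD_LAYOUT.pack(
        int(root.get("id")),
        root.findtext("code").encode("ascii"),  # struct re-pads to 8 bytes
        int(root.findtext("amount")),
    )


# Round trip: legacy bytes -> XML -> legacy bytes.
raw_record = RECORD_LAYOUT.pack(42, b"INV", 1999)
xml_record = binary_to_xml(raw_record)
assert xml_to_binary(xml_record) == raw_record
```

Once the record is in XML form, any other system can read it without knowing the binary layout. Only the connector itself has to encode that knowledge.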
The other thing about SOA is that although you can continue to get incredible value out of your legacy systems, you really don't ever want to bring in another system where the information is locked to the application and platform bound software. The file formats of the future should always be openly structured and separate from applications and platforms. This means that a digital file format should be intelligent, containing all the information about itself that other applications, or future applications, would need to know for that information to be usefully rendered and re-purposed. The only way a file format can truly be application independent is to have that metadata remain with the file, completely cutting the application cord. I mention this because it goes to the heart of perhaps the most important distinction between OpenDocument XML and MS XML.
MP: Give us an example, say from your wife's real estate business, or from your own business.
GE: Well, there are some legal limits to just how much I can disclose about the Comcast SOA project. So let's play it safe and talk real estate. I saw some problems recently with MS XML that really disclose everything you need to know about where Microsoft wants to take you. It's not pretty. The background for this is that the transaction process in the real estate industry is very complex, time sensitive, and involves many players with many disparate back end information systems.
Every real estate transaction involves somewhere upwards of 12 industry professions, each with their own information system. You've got two real estate agents, two brokerages, inspectors, mortgage providers, appraisers, title companies, insurance companies, county recorders' offices, law offices, and the list goes on. The way they conduct a transaction today is the way they've always done it; with a massive paper exchange at the closing table where everyone signs off on each document. The transaction process is done in paper, because that's the only way that they can exchange information independent of each computer system. Later on the documents are input by hand, and the various systems updated.
With XML, they will be able to conduct this transaction process electronically using a common XML layer that all the back end systems can read and contribute back to. It's a two-way street with XML. You send it out in XML, and you transform it back again after the information has made its way through the transaction process that is shared with all of the other individuals. The common layer is where the actual “transaction” would take place.
The OpenStack Comcast sub-contract was interesting in that Comcast is a company that had built itself out of rapid acquisitions. During the dot com frenzy, they bought up local cable companies. At the end of that buying frenzy, a period which lasted a little over five years, they had disparate black boxes all over the country. These inventory management and billing systems were all doing what they were designed to do, but nothing more. What we had to do was develop a system where a virtual representation of this information could be aggregated in as close to real time as we could get. From there an advanced sales and management interface could be built, as well as a yield management system.
Our approach was to set up an XML stream to each of those black boxes that would, on a 24 hour dump cycle, synchronize with a global representation we called the “XML Hub”. The Hub was a LAMP stack running Tomcat and XUL based web applications. [Editor's note: see the graphic below for a visual representation of this system].
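Editor's note: as a rough illustration of the hub idea described above, and not the actual Comcast implementation, the sketch below merges periodic XML dumps from several source systems into one aggregate document. The element names (`hub`, `system`, `dump`, `account`), the system names, and the `merge_dump` helper are all assumptions made up for this example.

```python
import xml.etree.ElementTree as ET


def merge_dump(hub: ET.Element, source: str, dump_xml: str) -> None:
    """Replace the hub's copy of one source system with its latest dump."""
    # Drop whatever the previous dump cycle stored for this source.
    # (Assumes source names contain no quote characters.)
    for stale in hub.findall(f"system[@name='{source}']"):
        hub.remove(stale)
    # Attach the records from the new dump under a fresh <system> element.
    system = ET.SubElement(hub, "system", name=source)
    system.extend(list(ET.fromstring(dump_xml)))


hub = ET.Element("hub")
merge_dump(hub, "billing-east", "<dump><account id='1' balance='20'/></dump>")
merge_dump(hub, "billing-west", "<dump><account id='7' balance='5'/></dump>")
# A later cycle for the same source overwrites its earlier snapshot.
merge_dump(hub, "billing-east", "<dump><account id='1' balance='35'/></dump>")

east = hub.find("system[@name='billing-east']/account")
assert east.get("balance") == "35"
```

Keeping one subtree per source system means a fresh dump can replace the old snapshot wholesale, so the hub never has to reconcile individual records between cycles.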
Each one of those black boxes had a different vendor, and even different versions from the same vendors. That kind of disparity would kill any client-server approach. For us all that mattered was if we could write a two-way XML connector to it.
On the desktop, Comcast had a mash of mostly Windows 98 and Win2K, with some WinXP. When you're working with SOA though, it's important that all the pieces, including the desktops, can work with XML and connect through XML to the Hub. Critical to the Comcast sales force were PowerPoint presentations and Excel spreadsheets and forms (yeah, forms in Excel).
With OpenOffice.org and Mozilla on those desktops, we could get underneath the presentations and spreadsheets, and start binding information fields to web services and XML feeds from our Hub. We even did some nifty Jabber routing connecting all instances of certain spreadsheets together at the field level, so that changes would be reflected across the sales staff. Although we tried to get as much of the business process directed through our Tomcat portal as we could, all of the presentation stuff and some of the spreadsheet work just couldn't be fit into the browser space. So we automated the information feeds to those resources using XML web services and Jabber messaging. Some things have to go beyond the browser and into the larger realm of the desktop productivity environment.
MP: So Microsoft has recently claimed that the Open Document format would not work with their “legacy” documents, but your Comcast example certainly disproves that claim in a big fat hurry, doesn't it?
GE: When Microsoft talks about “legacy,” they're usually talking about the legacy of Microsoft Office 2000 and MS Office XP 2003. The truth is that MS Office has had a long history, but over the past 25 years we've seen many word processing, spreadsheet, and presentation systems besides Microsoft Office.
When the Open Document Technical Committee talks about legacy systems, we're talking about at least 30 years of legacy information systems that cross an incredible spectrum of information and file format types. Boeing is an excellent example, and ODF TC member Doug Alberg was a most important driver in the first 18 months of ODF TC work, a period I always refer to as the “universal transformation layer” period because interoperability with legacy information systems was our primary concern. So during that period the legacy needs of large publishing and content management systems like Stellent, Documentum, and Arbortext drove the specification work. It really had very little to do with the ideals of an application independent desktop productivity file format.
Enterprise publishing systems have to deal with 50 years of legacy data. Microsoft's consolidation is very young by comparison, having only to deal with the transition from MS Office 2000 to MS Office XP 2003.
Boeing is a great case in point. Doug Alberg had to make sense of so many different CAD systems, drawing systems, report systems and federally-mandated filing systems that there was no way of dealing with the problem other than to come up with a common XML transformation layer. Many of these different information systems were long ago orphaned or outright abandoned by their vendors. Even though they're no longer supported, they're still on line, doing exactly what they were designed to do. Much of the world is like this, especially in governments. Which is why SOA is all the rage. You have older systems still on line, still doing what they were meant to do for some business process that's important. That legacy data needs to be brought into the information flow, where it's available to global Open Internet systems. XML can do that, and do it easily.
The first 18 months of the Open Document project were to perfect the Open Document XML as a transformation layer, where all of these legacy systems could be connected to the transformation layer. Once it's in the common transformation layer, then you can pick and choose which publishing and content management system you would want. You have much more choice. Indeed, many of these next gen systems are excellent for certain solutions, but lacking in other areas. Interestingly, Boeing used each one of the OpenDocument enterprise publication and management systems; Stellent, Documentum, and Arbortext. Imagine having to figure out how to connect all three of those systems to all of the legacy information structures that were still producing their intended functions. With a common transformation layer, you only need write your connectivity once.
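Editor's note: the "write your connectivity once" point can be made concrete with a little arithmetic. Connecting every legacy system directly to every publishing system needs one connector per pair, while a common transformation layer needs only one connector per system. The three target systems come from the interview; the figure of 20 legacy systems is purely an assumed number for illustration.

```python
def point_to_point(legacy: int, targets: int) -> int:
    """Connectors needed when every legacy system talks to every target."""
    return legacy * targets


def via_common_layer(legacy: int, targets: int) -> int:
    """Connectors needed when each system connects once to a shared layer."""
    return legacy + targets


LEGACY_SYSTEMS = 20  # assumed count, not a figure from the interview
TARGET_SYSTEMS = 3   # Stellent, Documentum, Arbortext

print(point_to_point(LEGACY_SYSTEMS, TARGET_SYSTEMS))    # 60
print(via_common_layer(LEGACY_SYSTEMS, TARGET_SYSTEMS))  # 23
```

The gap widens as either side grows, which is the practical argument for routing everything through one shared layer.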
So when Microsoft talks about legacy, they're talking about a drop in the bucket compared to what Open Document can do.