As the industry absorbs the implications of the Google v. Oracle ruling, which essentially put proprietary APIs on the same footing as standardized ones, understanding the differences between standardized and proprietary APIs will become more important. The differences between closed and open source software have been well-plumbed; the differences between proprietary and standardized APIs, less so.
I’m a veteran of two API competitions that shed light on the tensions between the philosophies of proprietary and standardized APIs and the way they intersect with market realities: OpenGL/Direct3D and CUDA/OpenCL[1]. I’ve always been on the proprietary side of these competitions: Direct3D is proprietary to Microsoft Windows, and CUDA is proprietary to NVIDIA.
This blog entry will discuss the OpenGL/Direct3D history from my (admittedly incurably biased) standpoint as the former development lead for Direct3D.
The story does not begin with Direct3D, but with OpenGL: In 1994, when I was hired into Microsoft, my first project was to port the flagship Softimage application from Silicon Graphics (SGI) workstations to Windows NT, a feat made possible because both platforms could run OpenGL. Both Microsoft and SGI had been founding members of the OpenGL Architecture Review Board (ARB). OpenGL treated API design as a commons: hardware suppliers collaborated on a standard, recognizing the value of a standardized API for developers to write their applications against, then competed for market share on price, performance, and feature set. Since SGI had achieved early market success selling workstations specifically designed to run graphical applications, its decision to work with its competitors to develop OpenGL was, essentially, altruistic[2].
At Microsoft, we were surprised at SGI’s decision. SGI had a market-leading position in the graphics workstation business, and applications written for its workstations used a proprietary API called IRIS GL. IRIS GL was not particularly well designed, but IRIS GL applications were heavily dependent on the API. From our porting work on Softimage, we knew it would have been prohibitively expensive to rewrite the application against a different API; in fact, our porting strategy involved emulating IRIS GL on top of OpenGL and native Windows APIs, so that the Softimage application believed, for all intents and purposes, that it was running on an SGI workstation[3].
Windows itself was a set of proprietary APIs that Microsoft was evolving quickly to address new markets like workstations and servers. Microsoft became a founding member of the OpenGL ARB in 1992, the year before its new Windows NT operating system first shipped, because it had an eye on the graphics workstation market. Although proprietary to Microsoft, Windows NT had the advantage of running on workstations from many different vendors. Unlike traditional workstation vendors, who built complete systems from hardware to operating system software to APIs, Microsoft continued the model it had perfected in the PC market, partnering with hardware vendors to build computers and licensing Windows to run on them. As a result, Microsoft was able to offer a more diverse set of products[4], from inexpensive, lower-performance workstations to high-end workstations with multiple CPUs, with the added benefit that the different hardware vendors also were competing on price. When Microsoft bought Softimage, the software cost $30,000 and ran on a $50,000 workstation. But when Softimage for Windows NT launched, Intergraph bundled the software and its high-end, dual-processor workstation hardware together for $25,000. SGI was no match for the PC workstation industry; its market capitalization collapsed from $7B in 1995 to $120M in 2005, when its stock was delisted[5].
For Microsoft, in the mid-1990s, OpenGL presented both an opportunity in the workstation business and a challenge in the gaming industry. The video card market for Windows computers was in rapid flux, as companies like S3, ATI, Trident, and others competed to deliver higher-performance Windows graphics cards that implemented operations like line- and rectangle-drawing. Furthermore, dozens of startups (such as 3Dfx and Rendition) were building hardware to address not only that market, but also 3D rendering – but unlike the workstation vendors who had co-founded the OpenGL ARB in 1992, these companies were focusing on the PC gaming business.
Gaming on Windows needed special attention. A program manager at Microsoft named Eric Engstrom identified games (such as DOOM, id Software’s then-biggest commercial success) as the principal reason that Windows customers still needed to run DOS applications. Since continuing to support DOS presented an ongoing engineering and support burden to Microsoft, Eric secured significant funding to develop a family of proprietary Windows APIs for gaming called DirectX. DirectInput would control devices like mice and keyboards (which, for games, look very different than they do for office productivity applications); DirectSound would control the audio hardware; DirectDraw would let developers control display hardware; and Direct3D would work with DirectDraw to enable 3D rendering[6].
Microsoft’s proprietary Direct3D, which only worked on Windows, therefore competed with Microsoft’s standardized OpenGL, which was portable to platforms other than Windows. Since the DirectX team reported through Windows 95 Vice President Brad Silverberg and the OpenGL team reported through Windows NT Vice President Jim Allchin[7], this set the stage for a political battle within Microsoft that would strain relationships, undermine Microsoft’s public messaging, and change the course of many developers’ careers, including mine. As the inevitable restructuring of Windows 95 assets into the Windows NT organization unfolded, Microsoft’s OpenGL team began openly lobbying our common management structure to discontinue Direct3D development and adopt OpenGL as the standard API. The Direct3D team, for which I was the development lead, was vulnerable to this line of argument, not only because we were wracked by reorganization issues (Eric Engstrom, anticipating that there would never be a home for him in the Windows NT organization, went to work on a browser-related project called Chrome and took most of the DirectX team with him), but because we were busy rehabilitating our API to be more user-friendly. It did not help that prominent figures in the game development community, like John Carmack and Chris Hecker, joined in public calls for Microsoft to drop Direct3D and instead use OpenGL.
Over the course of the next few months, this debate with my own Microsoft colleagues prompted a deep interrogation of the tradeoffs between proprietary and standardized APIs. OpenGL was a more palatable API, having emerged from a collaborative design effort involving multiple companies; but it seemed to me that Direct3D’s design deficiencies could simply be fixed in future revisions. Furthermore, the recently completed Softimage port had required intensive use of OpenGL, so I had an excellent understanding of my competition.
Most of the differences were superficial. OpenGL was chattier (i.e., required more API calls to accomplish the same operation) and more stateful (i.e., kept more state in its context, from which API methods could make inferences). OpenGL was more strongly typed, meaning the elements of the state vector for a drawing context had explicit types and methods that operated on those types. Direct3D used the Component Object Model (COM), while OpenGL used a more conventional handle-based API. These superficial differences were a simple by-product of the environments in which the APIs were developed.
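To make the “chatty, stateful” distinction concrete, here is a minimal sketch of legacy OpenGL 1.x immediate-mode drawing (the function name is mine, but the calls are the fixed-function API of the era): every attribute is set by its own call, and state such as the current color and matrix mode lives in the context between calls. Direct3D of the same period gathered vertex data into structures and issued far fewer calls through COM interfaces.

```cpp
// Minimal sketch of legacy OpenGL 1.x immediate-mode drawing.
// The function name is illustrative; the GL calls are the real
// fixed-function API of the era being described.
#include <GL/gl.h>

void drawShadedTriangle()
{
    glEnable(GL_DEPTH_TEST);           // state persists until explicitly disabled
    glMatrixMode(GL_MODELVIEW);        // later matrix calls act on this stack
    glLoadIdentity();

    glBegin(GL_TRIANGLES);             // many small calls per primitive
        glColor3f(1.0f, 0.0f, 0.0f);   // "current color" lives in the context
        glVertex3f(-1.0f, -1.0f, 0.0f);
        glColor3f(0.0f, 1.0f, 0.0f);
        glVertex3f( 1.0f, -1.0f, 0.0f);
        glColor3f(0.0f, 0.0f, 1.0f);
        glVertex3f( 0.0f,  1.0f, 0.0f);
    glEnd();
}
```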
But the APIs also included more substantive differences underpinned by real software engineering and business tradeoffs, like the way OpenGL required each hardware developer to implement the entire API[8] and emulate any functionality missing from the hardware. Direct3D, in contrast, allowed partial hardware support and did not require emulation of missing functionality[9]. OpenGL also surfaced fewer implementation details to its clients, on the theory that hardware vendors could optimize better if developers delegated as many tasks as possible to the implementation – an explicit rejection of the “Direct” in “DirectX.” Finally, OpenGL “extensions” allowed any OpenGL implementor to define new interfaces; in turn, developers could query for these extensions and use them when available[10].
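From the application’s point of view, the extensions mechanism worked roughly like this (a hedged sketch: the helper function is mine, and the S3TC texture-compression extension is named purely as an example):

```cpp
// Sketch of run-time extension detection in OpenGL 1.x.
#include <GL/gl.h>
#include <cstring>

bool hasExtension(const char* name)
{
    // OpenGL 1.x reports its extensions as one space-separated string.
    // Production code should match whole tokens; strstr conveys the idea.
    const char* extensions =
        reinterpret_cast<const char*>(glGetString(GL_EXTENSIONS));
    return extensions != nullptr && std::strstr(extensions, name) != nullptr;
}

void chooseTexturePath()
{
    if (hasExtension("GL_EXT_texture_compression_s3tc")) {
        // Use the extension's entry points, typically fetched with
        // wglGetProcAddress() on Windows, to upload compressed textures.
    } else {
        // Fall back to uncompressed textures supported by core OpenGL.
    }
}
```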
Many of these differences were informed by the differing objectives of the sponsoring organizations. Hardware vendors, who collaborated on OpenGL’s design, wanted the maximum degree of control over the implementation so they could differentiate their products through software as well as hardware. Microsoft just wanted to build a stable platform with as many hardware vendors as possible.
My Direct3D code base had structural advantages from a software engineering standpoint. A key difference between Direct3D and OpenGL was the amount of common code executed by their client applications. Because each OpenGL implementor was responsible for the entire API, the opportunity was lost to consolidate engineering and test resources on code that every implementation needs anyway, like memory allocators. In contrast, Direct3D defined an internal interface called the Hardware Abstraction Layer (HAL) that hardware vendors could code against. For Microsoft, the goal was to enable any hardware vendor to enter the Direct3D market by writing only the code for their specific hardware, not an entire OpenGL implementation. In turn, a great deal of code written by Microsoft could be interposed between the application and the HAL, resulting in less variability across hardware from different vendors[11].
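A hypothetical sketch of the layering idea (this is not the actual Direct3D driver interface, just an illustration of its shape): the hardware vendor supplies a small table of device-specific entry points, and the shared runtime above it performs the validation, state tracking, and resource management that every implementation would otherwise have to duplicate.

```cpp
// Hypothetical HAL-style layering: names and signatures are illustrative.
#include <cstdint>

struct HalDeviceFuncs {                 // implemented by the hardware vendor
    int (*createSurface)(uint32_t width, uint32_t height, uint32_t format);
    int (*drawTriangles)(const void* vertices, uint32_t count);
    int (*flush)();
};

// This code lives in the shared runtime, so every application sees the same
// behavior regardless of which vendor's HAL is loaded underneath it.
int runtimeDrawTriangles(const HalDeviceFuncs* hal,
                         const void* vertices, uint32_t count)
{
    if (hal == nullptr || vertices == nullptr || count == 0)
        return -1;                      // common parameter validation
    // ... common state management, resource tracking, etc. would go here ...
    return hal->drawTriangles(vertices, count);
}
```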
Also, with the HAL, Microsoft could ship new API features without the hardware vendors doing any additional work – in fact, that is exactly what we did with the first release of the DrawPrimitive API. When we first shipped DrawPrimitive in time for Christmas 1997 games, it could run on the tens of millions of PCs that had already been sold into the marketplace.
The amount of competition, and the pace of innovation in this space, cannot be overstated. The capabilities of graphics chips were expanding so quickly that, as API designers, we had trouble keeping pace. Between 1997 and 2001, Microsoft released four major versions of Direct3D as transistor budgets exploded from 130,000 (for the best-selling S3 ViRGE chipset, c. 1997) to some 57 million (for NVIDIA’s GeForce 3 chip, c. 2001)[12]. Direct3D 5.0 introduced the DrawPrimitive API; Direct3D 6.0 introduced support for multiple textures, optimized geometry processing (written by Intel and AMD for their new instruction sets SSE and 3DNow!, respectively), automatic texture management, stencil buffers, texture compression, and numerous other features; Direct3D 7.0 introduced geometry acceleration; and Direct3D 8.0 introduced vertex and pixel shaders. Over the four-year span, the hardware had transitioned from simple 2.5-D triangle rasterizers attached to VRAM, to full-blown parallel processors. I did not, and do not, believe that the OpenGL ARB’s deliberative, slow-paced process could keep up with the pace of innovation we were seeing in the PC graphics space.
There were attempts to reconcile the two warring camps. In 1998, Jay Torborg began the Fahrenheit project, partnering with SGI and HP to create yet another graphics API[13]. The stated objective for Fahrenheit was to deliver replacement APIs for both the low-level “immediate mode” Direct3D and OpenGL APIs and the higher-level scene graph API. But the deal took months to negotiate – valuable time that the DirectX team spent working on their code – and suffered a setback because Microsoft’s Legal Department had not been included in the negotiations. Once the terms had been finalized and Jay brought the agreement to Microsoft Legal, he was told it could not be approved because it was anticompetitive. The agreement he’d hammered out with SGI held that Microsoft would only target game applications with Direct3D and leave workstation applications to OpenGL, an anticompetitive practice known in antitrust law as “dividing the market.” At the time, the United States v. Microsoft antitrust litigation was in full swing, so the Legal Department was exceptionally sensitive to the prospect of inviting further antitrust liability.
Over those four years (1998-2001), industry consolidation reduced the number of players to a handful. By the mid-aughts, NVIDIA and ATI had become a de facto duopoly in the discrete graphics chip market (chips designed to be installed on graphics cards that plug into the bus), and, after AMD bought ATI in 2006, Intel and AMD quickly became a de facto duopoly in the integrated graphics chip market (with graphics functionality on the same integrated circuit as the CPU). The hardware innovations also greatly reduced the amount of common code that Microsoft could interpose between the application and the driver. In 1998, it made sense for Microsoft’s runtime to execute the geometry pipeline on the three variants of the x86 instruction set then available (x87, 3DNow!, SSE); by 2001, that functionality had been subsumed into hardware. With the addition of vertex and pixel shaders, Microsoft could still deliver some value by applying compiler optimizations to the shaders that developers had written, but only the hardware vendors could translate the resulting intermediate code into instructions for their graphics chips. So as the market consolidated, the relative sizes of the engineering investments by Microsoft and the graphics chip vendors shifted in favor of the latter.
One of the side effects of this market consolidation was that the barrier to entry for an OpenGL implementation stopped being prohibitive for the market participants. None of the players were tiny startups anymore; all of them could afford to develop formidable OpenGL implementations in addition to the Direct3D driver development they were already doing for Windows. And, as it happens, the natural market divide between workstation and gaming applications – the one that Jay got in trouble with Microsoft Legal for trying to codify in the Fahrenheit partnership – was deep enough that both APIs thrived in their respective domains. Game developers tended to have more nimble code bases and did not mind that DirectX only ran on Windows; developers of workstation applications valued OpenGL’s portability to non-Windows platforms and did not necessarily require that the API formally support the latest hardware innovations. OpenGL hardware developers could use the extensions mechanism to add features, including standardized versions of DirectX features, before the ARB formalized support in a new version of the API. OpenGL application developers could accommodate the differences between their target vendors’ implementations.
Established in the early 2000s, the détente has lasted to the present day, with Direct3D the preferred API for games on Windows (and the Xbox, once it became available) and OpenGL the preferred API for workstation applications. Direct3D remains a proprietary API, wholly owned and managed by Microsoft for Microsoft platforms, and OpenGL remains a standardized API, periodically revised by its ARB, with an extensions mechanism, and implemented in its entirety by each vendor. One recent development of interest is that in 2018, Apple transitioned from standardized APIs (OpenGL and OpenCL for graphics and GPU computing, respectively) to its proprietary Metal API.
I personally had moved on to work on GPU computing technologies even before the OpenGL-Direct3D détente had become well-established. By the early 2000s, it was clear to some industry observers that GPUs were going to be an important computational resource. The transistor counts in NVIDIA’s GPUs had crossed over Intel’s transistor counts for CPUs in 1998 at 8M (the NVIDIA RIVA TNT and Intel Pentium II, respectively) and never looked back[14]. A whole class of applications called GPGPU (“general-purpose computing on GPUs”) had been developed that involved writing OpenGL or Direct3D code that performed parallel computing tasks instead of rendering 3D scenes. The limitations of the earliest GPU hardware actually prompted some enterprising GPGPU developers to devise ways to perform integer computations with floating-point hardware! After a stint in Microsoft Research, I left Microsoft for NVIDIA in 2002, having concluded that it would not be possible to build a GPU computing platform at Microsoft[15]. Finally, in early 2005, NVIDIA began work in earnest on CUDA.
[1] We could just as easily say CUDA/OpenACC, but for purposes of this discussion, we’ll refer only to one of the competitors to CUDA.
[2] While at Microsoft, my supervisor Jay Torborg, manager of the Talisman project and previously the co-founder of graphics startup Raster Technologies, claimed that he was the one to persuade Kurt Akeley, the SGI co-founder, to support OpenGL.
[3] It’s interesting to compare and contrast this porting strategy and the implementation strategy adopted by Google for the Java SE runtime that is the subject of the Google v. Oracle case. We modified Softimage to translate IRIS GL calls into OpenGL calls and, when necessary, native Windows API calls, to implement the IRIS GL calls. Google’s approach was designed to ensure that Java applications would not need to be modified so extensively in order to run on their new platform.
[4] In fact, Windows NT originally ran on different CPU architectures! Our Softimage port may have been the only commercial application shipped by Microsoft that ran on all three CPUs supported by Windows NT: Intel, MIPS and DEC Alpha. During the 18 months or so it took to port the application, Intel leapfrogged MIPS to go from the value play (lowest price, lowest performance) to the less expensive, second-highest-performance play. Alpha support was cancelled before Intel’s CPUs could overtake them in performance.
[5] Notably, Netscape co-founder Jim Clark, who was instrumental in getting the Justice Department to pick up the antitrust investigation into Microsoft that had been abandoned by the FTC, also was a co-founder of Silicon Graphics. As someone who had at least two lunches eaten by Microsoft, it would be an understatement to say that he had an axe to grind.
[6] The prefix “Direct” stems from Eric’s observation that game developers all wanted “direct control” over the hardware. They believed performance was the main predictor of market success, and they did not trust Microsoft or anyone else to get between them and the hardware. In fact, it was difficult to convince game developers to allow hardware to perform 3D rendering tasks on behalf of their applications.
[7] Believe it or not, the organizational divisions were deep enough that Windows NT had little to no code in common with Windows 95. One could argue Microsoft built a clean-room version of their own operating system! In fairness, when Dave Cutler and his team originally came to Microsoft in 1987, they intended to build a microkernel operating system that was agnostic in its support for Windows, OS/2, or UNIX. It was after a few years of development that Microsoft publicly split with IBM and put all its chips in the Windows NT box. See e.g. https://www.nytimes.com/1991/07/27/business/microsoft-widens-its-split-with-ibm-over-software.html
[8] The term of art for this API architecture is “installable client driver” (ICD). The host operating system provides only a minimal interface to its windowing system, where the pixels rendered by the OpenGL implementation would appear. Everything relating to OpenGL – not just hardware-specific code, but also potentially general-purpose code like memory allocators and math library functions – must be implemented in the ICD.
[9] Microsoft never would have allowed hardware developers to write their own software rasterizers – we would’ve written the emulation code ourselves. In fact, a stillborn OpenGL driver model from Microsoft called the Mini Client Driver (MCD) was the OpenGL team’s attempt to interpose a HAL between the API and its drivers, but it achieved limited adoption because drivers that used MCD were so much slower than ICDs.
[10] Since Direct3D was being revised so frequently, we considered an API revision to be the extensions mechanism. A recurring theme of “OpenGL v. Direct3D” conversations of the day was for OpenGL advocates to say that OpenGL had a feature if it was available on any OpenGL implementation, via the extensions mechanism; Direct3D advocates would counter that a feature didn’t count unless it was officially available in the flagship API.
[11] A sad truth about API implementation that may be the subject of a future blog: applications depend on the behavior of an API implementation, not its adherence to the API specification. Too often, that means fixing bugs breaks backward compatibility. It has taken decades for the software industry to learn how to build APIs that evolve seamlessly, while not breaking incumbent applications.
[12] While at Microsoft Research, I built an early GPU computing application that used Direct3D to compute Hough transforms for line detection. That was when I realized GPU computing wasn’t going to happen at Microsoft – I’d have to go somewhere with both hardware and software expertise to do that. Somewhere like NVIDIA!
[13] It’s only now, writing this, that I’m connecting the dots: Jay also claimed to be the one who convinced Kurt Akeley to support OpenGL in the early 1990s.
[14] By 2001, Intel’s Pentium 4 “Willamette” processor had 42M transistors, but NVIDIA’s GeForce3 had about 60M. In 2013, Intel’s first Haswell processors had about 1.4B transistors to 3.5B transistors for NVIDIA’s Kepler chip. And that was not even NVIDIA’s “win” chip for the Kepler architecture! The GK110 processor (2014) had 8B transistors.
[15] Microsoft did not yet have the requisite hardware design expertise. With the hardware team behind the Xbox, that may have changed now.