Using a GPU for computational workloads is not a new concept. The first work in this area dates back to academic research in 2003, but it took the advent of unified shaders in the DX10 generation for GPU computing to become a plausible future. Around that time, Nvidia and ATI began releasing proprietary compute APIs for their graphics processors, and a number of companies were working on tools to leverage GPUs and other alternative architectures. The landscape back then was incredibly fragmented, and almost every option required a proprietary solution in software, hardware, or both. Some of the engineers at Apple looked at the situation and decided that GPU computing had potential, but they wanted a standard API that would let them write code once and run it on many different hardware platforms. It was clear that Microsoft would eventually create one for Windows (ultimately DirectCompute), but what about Linux and OS X? Thus an internal project was born that would eventually become OpenCL.
The goals for OpenCL are deceptively simple: a cross-platform API and ecosystem that lets parallel applications take advantage of heterogeneous computing resources. The name also makes the intent clear: OpenCL is the compute analogue of OpenGL and is intended to fill a similar role. While GPUs were explicitly targeted, a number of other devices have considerable potential but lack a suitable programming model, including IBM’s Cell processor and various FPGAs. Multi-core CPUs are also candidates for OpenCL, especially given the difficulty inherent in parallel programming models, with the added benefit of integration with other devices.
OpenCL takes a broad and inclusive approach to parallelism, in both software and hardware. The initial incarnations focus on data-parallel programming models, partially because of the existing work in the area. However, task-level parallelism is certainly anticipated and on the roadmap. In fact, one of the most interesting areas will be the interplay between the two.
The cross-platform aspect ensures that applications will be portable between different hardware platforms, from a functionality and correctness standpoint. Performance will naturally vary across platforms and vendors, and improve over time as hardware evolves to exploit ever more parallelism. This means that OpenCL embraces multiple cores and vectorization as equally valid approaches and enables software to readily exploit both.
OpenCL is a C-like language, but with a number of restrictions that improve parallel execution (e.g. no recursion and limited use of pointers). For most implementations, the compiler back-end is based on LLVM, an open-source project out of UIUC. LLVM was a natural choice: it is used extensively within Apple, it has a more permissive license than the GNU suite, and many of the key contributors are employed by Apple.
More links on OpenCL: Parallel Programming Tutorial Series - Part 9 - OpenCL