by Peter Mandl and Udeepta Bordoloi, AMD
Today’s graphics processors are highly programmable, massively parallel compute engines. In this role, they are commonly called general-purpose graphics processing units, or GPGPUs. You can program them with the open, standards-based OpenCL framework, which lets you distribute compute tasks across CPUs, GPUs, and DSPs to optimise a system’s overall performance.
What makes GPGPU computing so enticing is the extreme floating-point performance available in cost-effective GPUs. AMD’s top-of-the-line GPU, found in the AMD Radeon™ HD 6970 desktop graphics card, delivers 2.7 single-precision TFLOPs (theoretical peak) at a retail cost of about $369. By leveraging economies of scale in the PC industry, GPUs continue to deliver higher performance per watt and per dollar every year. To benefit from these performance gains, embedded systems need to meet three further challenges: lower power consumption, open standards, and parallel algorithms.
Low energy consumption
In addition to excellent GFLOPs per dollar, GPGPU computing also delivers high performance per watt. Although the AMD Radeon HD 6970 card has a thermal design power (TDP) of 250W, many server-based applications can accommodate that power budget and achieve more than 10GFLOPs/W. Embedded applications, on the other hand, typically have more modest TDP thresholds because they face size, weight and power (SWaP) constraints. Portable ultrasound machines benefit from small size, yet demand high-performance compute capabilities for real-time imaging. In telecom infrastructure, GPGPU offers new compute capabilities within limited power budgets. Many defence and aerospace applications (sonar, radar and video surveillance, for example) require high-performance compute capabilities delivered in embedded form factors. To meet the growing demand for embedded GPGPU, the AMD Radeon E6760 embedded GPU delivers 16.5GFLOPs/W at a TDP of about 35W. With such modest power consumption, the AMD Radeon E6760 GPU is suitable for all common slot-based, rack-mount embedded systems, such as PICMG 1.x, CompactPCI, VME, VPX, MicroTCA or AdvancedTCA.
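The efficiency figures quoted above follow from simple arithmetic on the stated peak throughput, TDP and price. The short Python sketch below reproduces that arithmetic; the function and variable names are illustrative only. The E6760 peak throughput is not stated directly in the text, so the sketch derives it from the quoted 16.5GFLOPs/W and 35W figures, and the result should be read as an implied value, not a specification.

```python
# Back-of-the-envelope efficiency arithmetic using only figures quoted in the text.
# All names below are hypothetical helpers chosen for this illustration.

def gflops_per_watt(peak_gflops: float, tdp_watts: float) -> float:
    """Theoretical peak throughput divided by thermal design power."""
    return peak_gflops / tdp_watts


def gflops_per_dollar(peak_gflops: float, price_usd: float) -> float:
    """Theoretical peak throughput divided by retail price."""
    return peak_gflops / price_usd


def implied_peak_gflops(efficiency_gflops_per_watt: float, tdp_watts: float) -> float:
    """Peak throughput implied by an efficiency figure and a TDP."""
    return efficiency_gflops_per_watt * tdp_watts


# AMD Radeon HD 6970: 2.7 TFLOPs single precision, 250 W TDP, about $369.
HD6970_PEAK_GFLOPS = 2700.0
HD6970_TDP_WATTS = 250.0
HD6970_PRICE_USD = 369.0

# AMD Radeon E6760: 16.5 GFLOPs/W at a TDP of about 35 W.
E6760_GFLOPS_PER_WATT = 16.5
E6760_TDP_WATTS = 35.0

hd6970_efficiency = gflops_per_watt(HD6970_PEAK_GFLOPS, HD6970_TDP_WATTS)
hd6970_value = gflops_per_dollar(HD6970_PEAK_GFLOPS, HD6970_PRICE_USD)
e6760_implied_peak = implied_peak_gflops(E6760_GFLOPS_PER_WATT, E6760_TDP_WATTS)

print(f"HD 6970: {hd6970_efficiency:.1f} GFLOPs/W, {hd6970_value:.1f} GFLOPs per dollar")
print(f"E6760 implied peak: {e6760_implied_peak:.1f} GFLOPs")
print(f"E6760 vs HD 6970 efficiency ratio: {E6760_GFLOPS_PER_WATT / hd6970_efficiency:.2f}x")
```

On these numbers the desktop card works out to about 10.8GFLOPs/W and roughly 7.3GFLOPs per dollar, while the embedded part's quoted efficiency implies a theoretical peak of roughly 577GFLOPs, or about one and a half times the desktop card's performance per watt at around one seventh of the power.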