By Dave Strenski and Brian Durwood
In configuring next-generation large-scale parallel processing arrays, some teams are relying on “heterogeneous processing.” That is basically a fifty-cent phrase for a microprocessor paired with one or more on-board co-processors for high-speed on-node processing, most typically a GPU, FPGA, Cell, and/or DSP. While the debate continues about the right ratio of microprocessors to co-processors, most teams agree that the basic plumbing of memory management can be the real bottleneck. Today the only real solution is to have the microprocessor and co-processors share memory on the node, and to interconnect many nodes with GigE, InfiniBand, or a custom interconnect, so that the nodes form a distributed-memory layout.
Enter the unintended consequence of scaling. Amdahl’s law says that the speedup from adding processors is capped by the part of the job that cannot be parallelized, and in practice every added processor also brings more overhead. Basically, the Nth guy you add to build a brick wall begins to slow things down because all the bricklayers are reaching for bricks off the same pile and getting in each other’s way. Add another N bricklayers and it just gets worse. So the idea is to complement the original processor (the first bricklayer) with a co-processor that makes that bricklayer more efficient (faster), independent of any other bricklayer. Imagine a machine that hands the bricklayer a pre-cemented brick, so all they need to do is place it. Or, there is always the old analogy:
“I know how to make 4 horses pull a cart - I don’t know how to make 1024 chickens do it.”
–Enrico Clementi
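
To put rough numbers on the bricklayer problem, here is a minimal sketch of the textbook form of Amdahl’s law, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the job that can run in parallel and n is the number of workers. The function name `amdahl_speedup` and the value p = 0.95 are assumptions chosen purely for illustration; they do not come from any measured system.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Speedup predicted by Amdahl's law for a fixed-size job.

    parallel_fraction: share of the work that can be spread across workers (0..1).
    workers: number of processors (bricklayers) working on the job.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part never shrinks; only the parallel part is divided up.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    p = 0.95  # assumed parallel fraction, for illustration only
    for n in (1, 4, 64, 1024):
        print(f"{n:5d} workers -> speedup {amdahl_speedup(p, n):6.2f}x")
```

With p = 0.95 the speedup can never exceed 1 / (1 - 0.95) = 20, however many workers are added. The formula also ignores contention for the shared brick pile, so a real machine can do worse than this bound, which is the case for making each worker faster rather than simply adding more of them.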


