Rob Farber writes about CUDA programming for Dr. Dobb’s. In Part 4 of this article series on CUDA, he discussed how the execution model and the kernel launch execution configuration affect the number of registers and the amount of local multiprocessor resources such as shared memory. In this installment, he writes about memory performance and the use of shared memory in reverseArray_multiblock_fast.cu.
Full Story
Part 4
Part 3
Part 2
Part 1


