Here is a white paper from TotalView Technologies that describes the challenges of race conditions and other difficult bugs in parallel programming. It introduces a product called ReplayEngine.
Abstract
The hardest step in fixing software bugs in a parallel programming environment is working backward from a software failure to the original program error. Conventional debugging techniques allow users to control program execution only in the forward direction, so they must work against the grain and apply time-consuming methods to identify the problem. Reverse debugging technologies have the potential to greatly reduce the time required to identify and solve many of the most difficult bugs by adding the ability to replay parallel program execution.
This white paper explains the challenges presented by parallel debugging and the value of a reverse debugging capability. Learn about a unique new product that enables the developer not just to examine the current state of the program, but also to follow its logic backward in execution time from the point of failure. This approach achieves significant productivity gains.


