By Ryan Paul
The USPTO awarded search giant Google a software method patent that covers the principle of distributed MapReduce, a strategy for parallel processing that is used by the search giant. If Google chooses to aggressively enforce the patent, it could have significant implications for some open source software projects that use the technique, including the Apache Foundation’s popular Hadoop software framework.
“Map” and “reduce” are functional programming primitives that have been used in software development for decades. A “map” operation allows you to apply a function to every item in a sequence, returning a sequence of equal size with the processed values. A “reduce” operation, also called “fold,” accumulates the contents of a sequence into a single return value by performing a function that combines each item in the sequence with the return value of the previous iteration.
Google’s MapReduce framework is roughly based on those concepts. A series of data elements is processed in a map operation, then combined at the end with a reduce operation to produce the finished output. The advantage of partitioning a workload this way is that it’s extremely conducive to parallelization. Each discrete unit of data in the series can be processed individually and combined at the end, making it possible to spread the workload across multiple processors or computers. It’s a fairly elegant approach to scalable concurrency, one that offers efficiency regardless of whether your environment is a single multicore processor or a massive grid in a data center.


