In late 2004, Google surprised the world of computing with the release of the paper "MapReduce: Simplified Data Processing on Large Clusters". That paper ushered in a new model for data processing across clusters of machines, one that is both simple to understand and remarkably flexible. Once you adopt a MapReduce way of thinking, dozens of previously difficult or long-running tasks suddenly start to seem approachable, provided you have sufficient hardware.
Introduction

This document describes how Map and Reduce operations are carried out in Hadoop. If you are not familiar with the Google MapReduce programming model, you should get acquainted with it first.
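
For readers new to the model, the following is a minimal single-process sketch of its three phases (map, group by key, reduce), using word counting as the example. It is plain Python and does not use the Hadoop API; the function names (map_words, shuffle, reduce_counts, word_count) are hypothetical and chosen only for illustration. A real cluster would run the map and reduce calls on many machines in parallel.

```python
from collections import defaultdict


def map_words(document):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group intermediate values by key, the job a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word, counts):
    """Reduce step: collapse all values for one key into a single result."""
    return word, sum(counts)


def word_count(documents):
    """Run the three phases in sequence on a list of input records."""
    intermediate = []
    for doc in documents:
        intermediate.extend(map_words(doc))
    grouped = shuffle(intermediate)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


# Sample input invented for this illustration.
result = word_count(["the quick brown fox", "the lazy dog", "The end"])
print(result)
```

Because each map call sees only one record and each reduce call sees only one key, the work can be split across machines without the functions needing to know about each other.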