Map Reduce

The idea of Map Reduce is very simple. It is a method to split up a complicated problem into smaller work packets that can be solved on a distributed system.

There are two main parts to Map Reduce, the Mapping function and the Reducing function.

The Mapping function is used to split up your dataset into manageable chunks. It is performed by the master node in the system. The Reducing function will be run on each of your worker nodes. It takes a mapped set of data, processes it, and returns the results to the master node. The master will then collate all of the reduced values and return the final results.

There are some further complexities to Map Reduce implementations, namely how the data is actually sent to a worker, how the results are returned and how the scheduling is managed. This is basically what a system like Hadoop will manage for you, so you can concentrate on the details of your Mapping and Reducing.

The canonical example is to produce a count of words in a document. The input to your mapping function is a string containing the text of the document. This will be split into words by the mapper, and a list of key value pairs identifying each word will be sent to the worker nodes. The reduce function will simply count the values provided, and return the results.

"A simple string with a repeated word"

Map function

function map(string mapInput)
  foreach(var word in input.ToLower().Split(" "))

function reduce(string[] reduceInput)

This is rough psuedo-code showing what you’d need to implement, it will not work exactly as written, you will need to customise for your actual implementation.

We’d expect from this to see the following results:

a 2
repeated 1
simple 1
string 1
with 1
word 1

From this simple example, you should be able to see how we can expand to cope with much more complicated and interesting problems.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s