As an autodidact in some areas, I frequently run across things I probably should understand and know about but don’t. Today in an interview with Google I was asked about MapReduce and said, “Err, I couldn’t really explain that,” which is sad because a couple of years ago I was having dinner with a Google engineer in Mountain View who was raving about MapReduce, and I didn’t quite understand what it was then either. So I thought I’d start my new objective of explaining complicated topics that the brain trust at Google (or other places) works on in terms that everyone can understand. Here we go.
The first thing to understand is that there are multiple MapReduces. There is a patented version by Google that is *the* MapReduce, and a bunch of other implementations, the most widely known of which is probably Hadoop. What does it do? Simply speaking, it is a way to process a massive amount of data. Rather than processing it on a single machine, you process it in parallel across many. The reason it is called MapReduce is that the first step is to split the data up among the different machines and have each one apply the same function to its piece (map), and the second step is to combine the results from all of these processes into an answer (reduce). The names of these functions come out of functional programming.
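To make the two steps concrete, here is a minimal single-machine sketch in Python of the classic word-count example. It is not Google’s or Hadoop’s API; the function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) and the sample text are made up for illustration, and the grouping step in the middle is something real frameworks handle for you between map and reduce.

```python
from collections import defaultdict


def map_words(chunk):
    """Map step: turn one chunk of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group values by key so all the counts for a word end up together."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(word, counts):
    """Reduce step: combine all the counts for one word into a total."""
    return word, sum(counts)


def word_count(chunks):
    mapped = []
    # Each chunk is mapped independently of the others, which is why
    # this loop could be spread across many machines.
    for chunk in chunks:
        mapped.extend(map_words(chunk))
    grouped = shuffle(mapped)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


# Hypothetical input: pretend each string lives on a different machine.
chunks = ["the cat sat", "the dog sat", "the end"]
counts = word_count(chunks)
print(counts)
```

The point of the sketch is that `map_words` never needs to see more than its own chunk, and `reduce_counts` never needs to see more than one word’s values, so both steps can be farmed out in parallel.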
Obviously this is a core part of Google’s proprietary tech, since they process massive amounts of data to build their search index. It is usually used with unstructured data. Of course, this may not matter to you much depending on what you are doing, since you almost always want structured data rather than unstructured data when you can get it. That said, if you’ve got a bunch of machines, an enormous dataset, and something you want to find out from the dataset, MapReduce may be for you!
If you are already in “The Cloud” or promoting “The Cloud” you may not need this post (but may want to skip to the end for the comments on how to explain “The Cloud” to those not yet in the stratosphere).

What is “The Cloud”? Most people have heard of it. Most people seem to be somewhat confused as to what it is. Is it simply the internet touched up with some marketing fluff? Some grand new innovation encompassing technology, too new and cool to fully explain? Or simply a way to cut down on costs by putting all of your precious data on other people’s servers?
I think with this, as with other terms that are frequently used for marketing purposes, it is helpful to examine the history and evolution of the term. In this case, my presentation will be somewhat speculative since I’m not certain about all of the historical aspects, nor do the early inventors in the corporate world seem likely to want to help document it. That said, let’s begin.
“The Cloud” includes services such as Gmail. However, as far as I can tell, the term only began to be used when companies (like one of my favorites, Appirio) were replacing internal servers (e.g. Exchange) with “Cloud” services like Gmail. What do I take from this? “The Cloud” did not exist when individuals chose to store much of their information on servers that they did not control. “The Cloud” came into existence when companies, often with large numbers of internal servers dedicated to various purposes, began trusting other companies enough to put their data on servers operated by these other companies.
Of course, this falls into various categories, commonly described as Platform as a Service, Infrastructure as a Service, and Software as a Service. In a certain sense, each is fairly straightforward, although certain categories overlap with other services that existed before “The Cloud.” For example, “Software as a Service” is not really any different from a web application. In fact, after a fair bit of time the only difference I’ve figured out is that “Software as a Service” emphasizes the possibility of integration with a data and infrastructural layer that exists across distinct software instances.
Contenders in this area which have entries at multiple layers are Google, Salesforce, Amazon, Microsoft, Apple, and (to a lesser extent) Engine Yard. It is also worth mentioning the many companies (and non-profits, universities, etc.) which maintain expensive internal servers and are still waiting to move over to the “Cloud” due to concerns about reliability, being locked in, etc.
How to explain this shift to less computer-literate friends or co-workers? Maybe the best way is “The Cloud is companies putting information on the internet.” If they have a bit more time, perhaps show them a video like this one (sometimes a bit of “fluff” is necessary when you have clouds floating around).
