Welcome to Distributed Data Processing using MapReduce
This course builds on some very exciting developments in cloud computing and data centers, initiated by Google and followed by many others such as Yahoo, Amazon, AOL, Baidu, Joost, Mylife, and Facebook. The course is about processing terabytes of data on large clusters. But not only that: few courses in the Computer Science master’s programme will be so “core computer science”. We will discuss new file systems (GFS and Hadoop FS), new programming paradigms (MapReduce), new programming languages and query languages (Sawzall, Pig Latin), and new database paradigms (BigTable, Cassandra and Dynamo), and of course many web search and data mining applications that made Google one of today’s leading IT companies.
We hope to see you at our lectures on Fridays, in the 3rd and 4th hour.
Robin Aly, Maarten Fokkinga, and Djoerd Hiemstra.