The MapReduce, Pig Latin and Cloud Computing assignments are graded. The final grades can be found in Blackboard's Grade Center. Please join the course evaluation session on 21 February in Hal B 2C from 12.30 to 13.30 (free lunch included).
Jimmy Lin will give a keynote lecture at the SIKS/BigGrid Big Data tutorial that precedes the DBDBD on 30 November and 1 December 2011. Dr. Lin, who holds a PhD from MIT, is an associate professor in the iSchool at the University of Maryland. He also has appointments in the Institute for Advanced Computer Studies (UMIACS) and the Department of Computer Science at Maryland. Lin works at the intersection of natural language processing (NLP) and information retrieval (IR), with a recent emphasis on scalable algorithm design and large-data issues. He directs the recently formed Cloud Computing Center, an interdisciplinary group that explores the many aspects of cloud computing as it impacts technology, people, and society. He is also a member of both the Computational Linguistics and Information Processing Lab (CLIP) and the Human-Computer Interaction Lab (HCIL). Lin worked at Cloudera, which aims to bring Hadoop MapReduce to the enterprise, and is currently spending a sabbatical at Twitter.
Solutions for Assignment 4 (Sawzall) and for Assignment 5 (HBase Schema) are now on Blackboard.
The solutions to Assignment 3 are now online in the Course Material section on Blackboard. You need these solutions for Assignment 4, which is due next Friday, 10 December.
Next Monday, 6 December, from 14.30 to 15.15 in ZI-3126, there is a short meeting to discuss the solutions for Assignments 2 and 3. The solutions, which are helpful for Assignment 4, will also be put on Blackboard.
Next Tuesday, 7 December: the Hadoop Hackathon!
The grades for Assignment 2 are now on Blackboard's Grade Center. A correct solution for Assignment 2, which is needed for Assignment 3, can be found under “Course Materials” on Blackboard.
The grades for Assignment 1 are now on Blackboard's Grade Center. Please send me an email as soon as possible if you cannot find your grades, if you cannot find an explanation of your grade (including a per-question result), or if you did not submit solutions at all for Assignment 1 but still want to participate in the course. The deadline for Assignment 2 is next Friday, 26 November.
The crash course Functional Programming, intended to enable you to describe the word count program in a functional language, will be given by Maarten Fokkinga in room Zilverling, West 1, on Friday 19 November, 13:45-15:30. We'll use the programming language Amanda (one executable running under Windows), but any other functional programming language, such as Haskell, may be used for the homework as well. A download for Amanda is provided with the material for Assignment 2.
On 7 December, SARA (the Dutch National High Performance Computing and e-Science Support Center) organizes a day-long hackathon to kick off a Proof-of-Concept Hadoop service and to give participants the opportunity to experiment with Hadoop with the support of experienced users. People who are interested can work with Hadoop on a case of their choice, or simply play with datasets like Wikipedia, the ENRON dataset, White House visitor records, Genome data or others.
Welcome to Distributed Data Processing using MapReduce
This course covers some very exciting developments in cloud computing and data centers, initiated by Google and followed by many others such as Yahoo, Amazon, AOL, Baidu, Joost, Mylife, and Facebook. The course is about processing terabytes of data on large clusters. But not only that: few courses in the Computer Science master's will be so “core computer science”. We will discuss new file systems (GFS and Hadoop FS), new programming paradigms (MapReduce), new programming languages and query languages (Sawzall, Pig Latin), new database paradigms (BigTable, Cassandra and Dynamo), and of course many web search and data mining applications that made Google one of today's leading IT companies.
We hope to see you at our lectures on Fridays in the 3rd and 4th hour.
Robin Aly, Maarten Fokkinga, and Djoerd Hiemstra.