MapReduce book by Lin and Dyer

Data-Intensive Text Processing with MapReduce

An interesting book by Jimmy Lin and Chris Dyer is forthcoming. In it, they show how MapReduce can be used to solve large-scale text processing problems, including examples that use Expectation Maximization training.

This book is about MapReduce algorithm design, particularly for text processing applications. Although our presentation most closely follows implementations in the Hadoop open-source implementation of MapReduce, this book is explicitly not about Hadoop programming. We don't, for example, discuss APIs, driver programs for composing jobs, command-line invocations for running jobs, etc.

See pre-prints of the book.