I present three well-known probabilistic models of information retrieval in tutorial style: the binary independence probabilistic model, the language modeling approach, and Google’s PageRank. Although all three models are based on probability theory, they are very different in nature. Each model seems well suited to solving certain information retrieval problems, but less useful for solving others. In essence, each model solves part of a bigger puzzle, and a unified view of these models might be a first step towards an Information Retrieval Theory of Everything.
The paper is published in the newsletter of the NVTI, the “Nederlandse Vereniging voor Theoretische Informatica” (Dutch Association for Theoretical Computer Science). A more extensive overview of information retrieval theory, covering eight models, is given in: Djoerd Hiemstra, Information Retrieval Models. In: Ayse Goker and John Davies (eds.), Information Retrieval: Searching in the 21st Century, Wiley, 2009.