I will give several lectures on information retrieval modeling at the Russian Summer School in Information Retrieval, which will be held September 11-16, 2009 in Petrozavodsk, Russia. The school is aimed mainly at graduate and post-graduate students, young scientists, and professionals who have experience in developing information retrieval applications. It will host approximately 100 participants.
Information Retrieval Modeling
There is no such thing as a dominating model or theory of information retrieval, unlike, for instance, the area of databases, where the relational model is the dominating database model. In information retrieval, some models work for some applications, whereas others work for other applications. For instance, vector space models are well-suited for similarity search and relevance feedback in many (also non-textual) situations if a good weighting function is available; the probabilistic retrieval model or naive Bayes model might be a good choice if examples of relevant and nonrelevant documents are available; Google's PageRank model is often used in situations that need modeling of more or less static relations between documents; region models have been designed to search in structured text; and language models are helpful in situations that require models of language similarity or document priors. In this tutorial, I carefully describe all these models by explaining the consequences of their modeling assumptions. I address approaches based on statistical language models in great depth. After the course, students will be able to choose a model of information retrieval that is adequate for a new situation, and to apply that model in practice.
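To give a flavour of the language modeling approach mentioned above, here is a minimal sketch of query-likelihood ranking with a unigram model and linear interpolation smoothing. It is not taken from the lecture material: the function names, the smoothing weight `lam`, and the toy documents are assumptions made purely for illustration.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, collection_counts, collection_len, lam=0.5):
    """Return log P(query | doc) under a smoothed unigram language model.

    Each term probability is a mixture of the document model and the
    collection model; lam is an assumed, untuned mixing weight.
    """
    doc_counts = Counter(doc)
    doc_len = len(doc)
    log_p = 0.0
    for term in query:
        p_doc = doc_counts[term] / doc_len if doc_len else 0.0
        p_col = collection_counts[term] / collection_len if collection_len else 0.0
        p = lam * p_doc + (1.0 - lam) * p_col
        if p == 0.0:
            # The term occurs nowhere in the collection.
            return float("-inf")
        log_p += math.log(p)
    return log_p


def rank(query, docs, lam=0.5):
    """Rank documents (a dict of id -> token list) by query log-likelihood."""
    collection_counts = Counter(t for tokens in docs.values() for t in tokens)
    collection_len = sum(collection_counts.values())
    scored = [
        (doc_id, query_log_likelihood(query, tokens, collection_counts, collection_len, lam))
        for doc_id, tokens in docs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy collection, already tokenized.
    docs = {
        "d1": ["language", "models", "for", "retrieval"],
        "d2": ["vector", "space", "models"],
        "d3": ["link", "analysis"],
    }
    for doc_id, score in rank(["language", "models"], docs):
        print(doc_id, round(score, 3))
```

Smoothing with the collection model keeps a document from receiving zero probability just because it lacks one query term, which is one of the modeling assumptions whose consequences matter in practice.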
More information at RuSSIR 2009.