Welcome to Foundations of Information Retrieval

Welcome to the course Foundations of Information Retrieval, a new 5 credit course that is based on the first part of last year’s 10 credit course Information Retrieval. We will introduce some exciting new things in the course: This year’s practical assignments are motivated by use cases of the Text Retrieval Conference’s Genomics track. We will use Elasticsearch, one of today’s most used and most popular open source, scalable search systems. The practical assignments use Jupyter notebooks. We hope to see you at the first lecture on Wednesday 5 September at 10:45h.

Check out the Canvas syllabus

Welcome to Information Retrieval

Welcome to the course Information Retrieval. We will introduce some exciting new things in the course: This year's practical assignments are motivated by use cases of MyDataFactory, a company specialized in product data. The course uses the book “Introduction to Information Retrieval” by Christopher Manning, Prabhakar Raghavan and Hinrich Schütze. Have a look at the schedule on Blackboard under “Course Information” for an overview of the first quarter of the course. In the second quarter, students will research a specific topic in depth. We hope to see you at the first lecture on Wednesday 2 September at 13.45h in RA4334.

Theo Huibers, Dolf Trieschnigg and Djoerd Hiemstra.

More info at: http://blackboard.utwente.nl (access restricted)

Guest lecture by Arjen de Vries

How search logs can help improve future searches

In the European project Vitalas, we had the opportunity to analyze the search log data from a commercial picture portal of a European news agency, which offers access to photographic images to professional users. I will discuss how these logs can be used in various ways to improve image search: to expand the image representation, to suggest alternative queries, to adapt the search results to user context, and to automatically build concept detectors for content-based image retrieval. I will also present recent work on using the semantic information that has become publicly available in the form of linked data to improve the search log analysis. The results show that bringing in linked data gives insights beyond the more common term-based analysis, since queries related in the most frequent ways do not usually share terms. I conclude with a discussion of the implications of our findings for improving log analysis, image collection management, and search engine design.

The guest lecture takes place on 20 October 2010 at 13.45 h. in ZI-2126.

Guest lecture by Thijs Westerveld

Automatically Analyzing Word of Mouth

Thijs Westerveld from Teezir B.V., Utrecht, will give a guest lecture on 6 October 2010 in ZI-2126. Teezir uses advanced search technology to aggregate views and opinions found on review sites, in discussion groups or blogs. This way, we create statistics and interpretations about what people are saying. Querying this data allows decision makers to slice and dice the content, and learn what people say, either at the very aggregated level: “what is the share of positive versus negative views about our new product?”, or at the very detailed level: “which sources reflect this negative sentiment, and what exactly are people saying?”

In this talk I will demonstrate Teezir’s Opinion Analysis dashboards and discuss the underlying technology. For collecting content from web sites we developed advanced crawling technology that automatically identifies relevant news, blog and forum pages and extracts the relevant content and metadata. The collected content is then further analyzed to identify the main sentiments before everything is indexed to be disclosed in the online dashboards. Various sentiment analysis variants that have proven successful in an academic setting have been evaluated on our live collections. I will demonstrate that success on academic test collections does not necessarily imply the practical usefulness of a sentiment analysis algorithm.

See also: Who rules?

Guest lecture by Pavel Serdyukov

Pavel Serdyukov from TU Delft will give a guest lecture for the course Information Retrieval

When: Wednesday, October 21, 2009
Where: HO-B1212
Title: Faceted and Expert Search in the Enterprise


Enterprise Search problems have recently received a considerable amount of attention from academia, mainly due to the increasing demand for industrial solutions supporting various search tasks in intranets. In this lecture I will give the research perspective on two core aspects of search in the Enterprise: Faceted and Expert search. I will demonstrate typical search scenarios, visualization approaches and ranking techniques. In the first part, I will give an overview of the ways to support faceted search in typical cases, from easiest to hardest: with structured document metadata available, with unstructured document metadata available, and with no document metadata available. In the second part, I will talk about the latest developments in expert finding, namely, language model and graph-based methods. I will also show ways to acquire expertise evidence outside of the Enterprise.

Guest lecture by Thijs Westerveld

Thijs Westerveld from Teezir will give a guest lecture for the course Information Retrieval

When: Wednesday, October 14, 2009
Where: HO-B1212
Title: Automatically Analyzing Word of Mouth And Focused Crawling

Teezir is a young and innovative technology company that develops and deploys comprehensive search solutions. Teezir lets companies take advantage of large and diverse amounts of documents or texts, using breakthrough search technology. Teezir's search platform provides functionality for the entire process of disclosing data: from gathering content, analyzing documents and building indexes for efficient access, to effective querying and ranking of information. Teezir's framework is based on full-text retrieval techniques.

Information Retrieval Models Tutorial

Many applications that handle information on the internet would be completely inadequate without the support of information retrieval technology. How would we find information on the world wide web if there were no web search engines? How would we manage our email without spam filtering? Much of the development of information retrieval technology, such as web search engines and spam filters, requires a combination of experimentation and theory. Experimentation and rigorous empirical testing are needed to keep up with increasing volumes of web pages and emails. Furthermore, experimentation and constant adaptation of technology are needed in practice to counteract the effects of people who deliberately try to manipulate the technology, such as email spammers. However, if experimentation is not guided by theory, engineering becomes trial and error. New problems and challenges for information retrieval come up constantly. They cannot possibly be solved by trial and error alone. So, what is the theory of information retrieval? There is not one convincing answer to this question. There are many theories, here called formal models, and each model is helpful for the development of some information retrieval tools, but not so helpful for the development of others. In order to understand information retrieval, it is essential to learn about these retrieval models. In this chapter, some of the most important retrieval models are gathered and explained in a tutorial style.

The tutorial will be published in Ayse Goker and John Davies (eds.), Information Retrieval: Searching in the 21st Century, Wiley, 2009.

[download draft]

[download exercise solutions]