Welcome to the course Information Retrieval. We will introduce some exciting new things in the course: this year's practical assignments are motivated by use cases of MyDataFactory, a company specializing in product data. The course uses the book “Introduction to Information Retrieval” by Christopher Manning, Prabhakar Raghavan and Hinrich Schütze. Have a look at the schedule on Blackboard under “Course Information” for an overview of the first quarter of the course. In the second quarter, students will research a specific topic in depth. We hope to see you at the first lecture on Wednesday 2 September at 13.45 h. in RA4334.
Theo Huibers, Dolf Trieschnigg and Djoerd Hiemstra.
More info at: http://blackboard.utwente.nl (access restricted)
How search logs can help improve future searches
In the European project Vitalas, we had the opportunity to analyze the search log data from a commercial picture portal of a European news agency, which offers professional users access to photographic images. I will discuss how these logs can be used in various ways to improve image search: to expand the image representation, to suggest alternative queries, to adapt the search results to user context, and to automatically build concept detectors for content-based image retrieval. I also present recent work on using the semantic information that has become publicly available in the form of linked data to improve the search log analysis. The results show that bringing in linked data gives insights beyond the more common term-based analysis, since queries related in the most frequent ways do not usually share terms. I conclude with a discussion of the implications of our findings for improving log analysis, image collection management, and search engine design.
The guest lecture takes place on 20 October 2010 at 13.45 h. in ZI-2126.
Automatically Analyzing Word of Mouth
Thijs Westerveld from Teezir B.V., Utrecht, will give a guest lecture on 6 October 2010 in ZI-2126. Teezir uses advanced search technology to aggregate views and opinions found on review sites, in discussion groups or blogs. This way, we create statistics and interpretations about what people are saying. Querying this data allows decision makers to slice and dice the content, and learn what people say, either at the very aggregated level: “what is the share of positive versus negative views about our new product?”, or at the very detailed level: “which sources reflect this negative sentiment, and what exactly are people saying?”
In this talk I will demonstrate Teezir's Opinion Analysis dashboards and discuss the underlying technology. For collecting content from web sites we developed advanced crawling technology that automatically identifies relevant news, blog and forum pages and extracts the relevant content and metadata. The collected content is then further analyzed to identify the main sentiments before everything is indexed to be disclosed in the online dashboards. Various sentiment analysis variants that have proven successful in an academic setting have been evaluated on our live collections. I will demonstrate that success on academic test collections does not necessarily imply the practical usefulness of a sentiment analysis algorithm.
See also: Who rules?
All following Information Retrieval lectures will be held in room ZI-2126. The lecture of 22 September is canceled to give you the opportunity to visit the Interactief Symposium Predict 2010. See you on 29 September, or at Predict 2010!
More information on Blackboard.
Pavel Serdyukov from TU Delft will give a guest lecture for the course Information Retrieval
When: Wednesday, October 21, 2009
Title: Faceted and Expert Search in the Enterprise
Enterprise Search problems recently received a considerable amount of attention from academia, mainly due to the increasing demand for industrial solutions supporting various search tasks in intranets. In this lecture I will give the research perspective on two core aspects of search in the Enterprise: Faceted and Expert search. I will demonstrate typical search scenarios, visualization approaches and ranking techniques. In the first part, I will give an overview of the ways to support faceted search in typical cases, from easiest to hardest: with the availability of structured or unstructured document metadata and with no document metadata available. In the second part, I will talk about the latest developments in expert finding, namely, language model and graph-based methods. I will also show the ways to acquire expertise evidence outside of the Enterprise.
Thijs Westerveld from Teezir will give a guest lecture for the course Information Retrieval
When: Wednesday, October 14, 2009
Title: Automatically Analyzing Word of Mouth And Focused Crawling
Teezir is a young and innovative technology company that develops and deploys comprehensive search solutions. Teezir lets companies take advantage of large and diverse amounts of documents or texts, using breakthrough search technology. Teezir's search platform provides functionality for the entire process of disclosing data: from gathering content, analyzing documents and building indexes for efficient access, to effective querying and ranking of information. Teezir's framework is based on full-text retrieval techniques.
The handout for the practical part of the course Information Retrieval has been added under Course Materials on Blackboard. Additionally, you will find two useful handouts there that help you to write your report and to insert citations in it.
The deadline to form pairs for the Information Retrieval course project is 30 September. Please send names and email addresses to the course staff. Groups will be numbered and listed (under Email) on Blackboard.
Many applications that handle information on the internet would be completely inadequate without the support of information retrieval technology. How would we find information on the world wide web if there were no web search engines? How would we manage our email without spam filtering? Much of the development of information retrieval technology, such as web search engines and spam filters, requires a combination of experimentation and theory. Experimentation and rigorous empirical testing are needed to keep up with increasing volumes of web pages and emails. Furthermore, experimentation and constant adaptation of technology are needed in practice to counteract the effects of people who deliberately try to manipulate the technology, such as email spammers. However, if experimentation is not guided by theory, engineering becomes trial and error. New problems and challenges for information retrieval come up constantly. They cannot possibly be solved by trial and error alone. So, what is the theory of information retrieval? There is not one convincing answer to this question. There are many theories, here called formal models, and each model is helpful for the development of some information retrieval tools, but not so helpful for the development of others. In order to understand information retrieval, it is essential to learn about these retrieval models. In this chapter, some of the most important retrieval models are gathered and explained in a tutorial style.
The tutorial will be published in Ayse Goker and John Davies (eds.), Information Retrieval: Searching in the 21st Century, Wiley, 2009.
[download exercise solutions]
As of today, the Blackboard site of the Information Retrieval course will gradually be filled with information. You can now register for the course. The deadline for self-enrollment is not known at the time of writing, so don't wait until the last moment!