2008 – Page 6 – Djoerd Hiemstra

Dutch-Belgian IR workshop: please smile

The participants of the Dutch-Belgian Information Retrieval Workshop in Maastricht smile for a team photo, with organiser Ed Hoenkamp on the left in the middle, and the Netherlands' most famous IR researcher Keith van Rijsbergen on the right in the back.

[More info]

An Image Retrieval Interface for Children

by Sander Bockting, Matthijs Ooms, Djoerd Hiemstra, Paul van der Vet, and Theo Huibers

Studies on information retrieval for children are not yet common. As young children possess a limited vocabulary and limited intellectual power, they may experience more difficulty in fulfilling their information need than adults. This paper presents an image retrieval user interface that is specifically designed for children. The interface uses relevance feedback and has been evaluated by letting children perform different search tasks. The tasks were performed using two interfaces: a more traditional interface, acting as a control interface, and the relevance feedback interface. One of the remarkable results of this study is that children did not favor relevance feedback controls over traditional navigational controls.

[download pdf]

Date of Kick-off XML & Databases 2 changed

The kick-off meeting of the course XML & Databases 2 has been moved from Tuesday 15 April to Thursday 17 April (8:30 – 10:15) in LA 3522. In this course you will do a small piece of research concerning XML databases by contributing to one of the extensions of MonetDB/XQuery that are currently being developed at the Database Group of the University of Twente:

Full text search (PF/Tijah)
Probabilistic XML
Geographic XML (MonetDB/GIS)

Hope to see you on Thursday 17 April.

Jun Wang defends Ph.D. thesis on Collaborative Filtering

Relevance Models for Collaborative Filtering

by Jun Wang

Collaborative filtering is the common technique of predicting the interests of a user by collecting preference information from many users. Although it is generally regarded as a key information retrieval technique, its relation to the existing information retrieval theory is unclear. This thesis shows how the development of collaborative filtering can gain many benefits from information retrieval theories and models. It brings the notion of relevance into collaborative filtering and develops several relevance models for collaborative filtering. Besides dealing with user profiles that are obtained by explicitly asking users to rate information items, the relevance models can also cope with the situations where user profiles are implicitly supplied by observing user interactions with a system. Experimental results complement the theoretical insights with improved recommendation accuracy for both item relevance ranking and user rating prediction. Furthermore, the approaches are more than just analogy: our derivations of the unified relevance model show that popular user-based and item-based approaches represent only a partial view of the problem, whereas a unified view that brings these partial views together gives better insights into their relative importance and how retrieval can benefit from their combination.
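To make the user-based and item-based views mentioned above concrete, here is a minimal sketch. It is not the unified relevance model from the thesis: the toy `ratings` data, the cosine similarity, and the linear mix in `combined` (with its weight `lam`) are all assumptions chosen for illustration.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating}. Not taken from the thesis.
ratings = {
    "ann": {"a": 5, "b": 3, "c": 4},
    "bob": {"a": 4, "b": 2, "c": 5, "d": 4},
    "cat": {"a": 1, "b": 5, "d": 2},
}

def cosine(x, y):
    """Cosine similarity of two sparse rating vectors (dicts)."""
    shared = set(x) & set(y)
    if not shared:
        return 0.0
    dot = sum(x[k] * y[k] for k in shared)
    norm_x = sqrt(sum(v * v for v in x.values()))
    norm_y = sqrt(sum(v * v for v in y.values()))
    return dot / (norm_x * norm_y)

def weighted_mean(pairs):
    """Mean of ratings weighted by similarity; None if there is no evidence."""
    pairs = list(pairs)
    total = sum(w for w, _ in pairs)
    return sum(w * r for w, r in pairs) / total if total > 0 else None

def user_based(user, item):
    """Predict from other users' ratings of this item, weighted by user similarity."""
    return weighted_mean(
        (cosine(ratings[user], r), r[item])
        for u, r in ratings.items()
        if u != user and item in r
    )

def item_column(item):
    """All ratings given to one item, as user -> rating."""
    return {u: r[item] for u, r in ratings.items() if item in r}

def item_based(user, item):
    """Predict from this user's ratings of other items, weighted by item similarity."""
    target = item_column(item)
    return weighted_mean(
        (cosine(target, item_column(i)), r)
        for i, r in ratings[user].items()
        if i != item
    )

def combined(user, item, lam=0.5):
    """Assumed linear mix of both partial views; lam is a hypothetical weight."""
    ub, ib = user_based(user, item), item_based(user, item)
    if ub is None:
        return ib
    if ib is None:
        return ub
    return lam * ub + (1 - lam) * ib

print(user_based("ann", "d"), item_based("ann", "d"), combined("ann", "d"))
```

Each predictor uses only one slice of the rating matrix (a column or a row), which is the sense in which either approach alone gives a partial view.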

[download pdf]

Pavel Serdyukov wins ECIR best student paper award

Great news: Yesterday, Pavel Serdyukov won the best student paper award at the European Conference on Information Retrieval (ECIR) in Glasgow for his paper “Modeling documents as mixtures of persons for expert finding”. The award includes a check for $1200 sponsored by Yahoo.

[download pdf]

ECIR tutorial slides on-line

I enjoyed giving the advanced language modeling tutorial at the European Conference on Information Retrieval (ECIR). The slides are now available for download below.

[download pdf]

Modeling documents as mixtures of persons

by Pavel Serdyukov and Djoerd Hiemstra

In this paper we address the problem of searching for knowledgeable persons within the enterprise, known as the expert finding (or expert search) task. We present a probabilistic algorithm using the assumption that terms in documents are produced by people who are mentioned in them. We represent documents retrieved for a query as mixtures of candidate experts' language models. Two methods for extracting personal language models are proposed, as well as a way of combining them with other evidence of expertise. Experiments conducted with the TREC Enterprise collection demonstrate the superiority of our approach in comparison with the best one among existing solutions.

[download pdf]

XML & DB Home work series 2 available

The second series of home work exercises is now available on TeleTOP from the Roster and Archive pages. You are allowed to do the home work in pairs. Hand in solutions using the on-line submission system from the TeleTOP Roster. Mention both names, which home work series, and the assignment numbers. The amount of work is roughly equal to the work for the written exam, so you should be able to finish the assignments in about 3 hours (provided that you studied all research papers in advance). Similar questions will be asked at the written exam.

Home work series 2 deadline: 28 March 2008. Read more on TeleTOP.

Ranked XML Querying Seminar

The goal of the Dagstuhl seminar on Ranked XML Querying is to bring together researchers and practitioners from the database (DB), the information retrieval (IR) and the web/applications communities, and create an environment where the distinct communities collaboratively work on understanding the similarities and differences between their various approaches for querying XML data with heterogeneous structure and content, and benefit from each other's experiences.

The workshop was attended by 27 people from three different research communities: database systems (DB), information retrieval (IR), and Web. The seminar title was interpreted in an IR-style “andish” sense (it covered also subsets of {Ranking, XML, Querying}, with larger sets being favored) rather than the DB-style strictly conjunctive manner. So in essence, the seminar really addressed the integration of DB and IR technologies with Web 2.0 being an important target area.

[download report]

Structured Text Retrieval Models

by Djoerd Hiemstra and Ricardo Baeza-Yates

Structured text retrieval models provide a formal definition or mathematical framework for querying semi-structured textual databases. A textual database contains both content and structure. The content is the text itself, and the structure divides the database into separate textual parts and relates those textual parts by some criterion. Often, textual databases can be represented as marked up text, for instance as XML, where the XML elements define the structure on the text content. Retrieval models for textual databases should comprise three parts: 1) a model of the text, 2) a model of the structure, and 3) a query language: The model of the text defines a tokenization into words or other semantic units, as well as stop words, stemming, synonyms, etc. The model of the structure defines parts of the text, typically a contiguous portion of the text called element, region, or segment, which is defined on top of the text model's word tokens. The query language typically defines a number of operators on content and structure such as set operators and operators like “containing” and “contained-by” to model relations between content and structure, as well as relations between the structural elements themselves. Using such a query language, the (expert) user can for instance formulate requests like “I want a paragraph discussing formal models near to a table discussing the differences between databases and information retrieval”. Here, “formal models” and “differences between databases and information retrieval” should match the content that needs to be retrieved from the database, whereas “paragraph” and “table” refer to structural constraints on the units to retrieve. The features, structuring power, and the expressiveness of the query languages of several models for structured text retrieval are discussed in this entry.
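The “containing” and “contained-by” operators can be sketched over regions of word tokens. This is a minimal illustration, not one of the models discussed in the entry: representing a region as an inclusive `(start, end)` pair of token offsets, and the helper names `containing`, `contained_by`, and `term_regions`, are assumptions made for this example.

```python
from typing import List, Tuple

# A region is a contiguous span of word tokens: inclusive (start, end) offsets.
Region = Tuple[int, int]

def contains(outer: Region, inner: Region) -> bool:
    """True if inner lies entirely within outer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def containing(a: List[Region], b: List[Region]) -> List[Region]:
    """Regions of a that contain at least one region of b."""
    return [r for r in a if any(contains(r, s) for s in b)]

def contained_by(a: List[Region], b: List[Region]) -> List[Region]:
    """Regions of a that lie inside at least one region of b."""
    return [r for r in a if any(contains(s, r) for s in b)]

def term_regions(tokens: List[str], term: str) -> List[Region]:
    """Text model: each occurrence of a term is a one-token region."""
    return [(i, i) for i, tok in enumerate(tokens) if tok == term]

# Hypothetical document: eight tokens, split into two structural regions.
tokens = "formal models of text differ from database tables".split()
paragraphs = [(0, 3), (4, 7)]

hits = term_regions(tokens, "models")
print(containing(paragraphs, hits))    # paragraphs that mention "models"
print(contained_by(hits, paragraphs))  # occurrences of "models" inside a paragraph
```

Because both operators take and return lists of regions, they can be nested, which is how a request that mixes content and structural constraints could be composed in this simplified setting.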

This entry will soon be published in the Encyclopedia of Database Systems by Springer. The Encyclopedia, under the editorial guidance of Ling Liu and M. Tamer Özsu, will be a multiple volume, comprehensive, and authoritative reference on databases, data management, and database systems. Since it will be available in both print and online formats, researchers, students, and practitioners will benefit from advanced search functionality and convenient interlinking possibilities with related online content. The Encyclopedia's online version will be accessible on the SpringerLink platform. Click here for more information about the Encyclopedia of Database Systems.

[draft]