A Survey of Pre-Retrieval Query Performance Predictors

by Claudia Hauff, Djoerd Hiemstra, and Franciska de Jong

The focus of research on query performance prediction is to predict the effectiveness of a query given a search system and a collection of documents. If the performance of queries can be estimated in advance of, or during, the retrieval stage, specific measures can be taken to improve the overall performance of the system. In particular, pre-retrieval predictors predict the query performance before the retrieval step and are thus independent of the ranked list of results; such predictors base their predictions solely on the query terms, collection statistics and possibly external sources such as WordNet. In this paper, 22 pre-retrieval predictors are categorized and assessed on three different TREC test collections.
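
To illustrate what a predictor based only on query terms and collection statistics can look like, here is a minimal sketch of an average inverse document frequency score, a commonly used specificity-style pre-retrieval predictor. The function name, the toy collection and the add-one smoothing are assumptions made for illustration; they are not necessarily the formulation used in the paper.

```python
import math
from collections import Counter


def average_idf(query_terms, documents):
    """Average inverse document frequency of the query terms.

    `documents` is a list of token lists. No retrieval run is needed:
    only the query and collection-wide document frequencies are used.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term once per document

    idfs = []
    for term in query_terms:
        df = doc_freq.get(term, 0)
        # Add-one smoothing (an assumption) avoids division by zero
        # for query terms that do not occur in the collection.
        idfs.append(math.log((n_docs + 1) / (df + 1)))
    return sum(idfs) / len(idfs) if idfs else 0.0


# Hypothetical toy collection, for illustration only.
toy_collection = [
    ["query", "performance", "prediction"],
    ["web", "search", "engine"],
    ["search", "query", "log"],
    ["expert", "search"],
]

specific = average_idf(["performance", "prediction"], toy_collection)
general = average_idf(["search"], toy_collection)
```

In this toy setting the query made of rare terms gets a higher score than the query made of a frequent term, which is the intuition behind specificity-based predictors: rarer query terms are assumed to discriminate better between documents.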

[download pdf]

OpenSearch: share your search results

OpenSearch is a collection of simple XML formats for sharing search results, originally developed by A9, a company founded by Amazon.com. A9 acts as a search mediator: you pick your favorite search engines, and A9 sends your queries to these engines, aggregates the results, and done, you have your own personal view of the web!

Many search engines provide some kind of OpenSearch or RSS-like search these days; for instance, here's an ego search on Yahoo. But OpenSearch is just as useful on a much smaller scale, for instance for searching these pages for information on SIKS (the Dutch School for Information and Knowledge Systems).
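
As a rough sketch of what a mediator has to do with such a result feed, the snippet below reads an OpenSearch-style RSS response with Python's standard library. The sample feed, its URLs and the function name are made up for illustration; the element names come from the OpenSearch 1.1 response elements, and a real engine's feed may differ.

```python
import xml.etree.ElementTree as ET

# Namespace of the OpenSearch 1.1 response elements.
OPENSEARCH_NS = "http://a9.com/-/spec/opensearch/1.1/"

# Hypothetical response; a real feed would be fetched from a search engine.
SAMPLE_RESPONSE = """<rss version="2.0"
     xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <channel>
    <title>Example search: SIKS</title>
    <opensearch:totalResults>2</opensearch:totalResults>
    <opensearch:startIndex>1</opensearch:startIndex>
    <opensearch:itemsPerPage>10</opensearch:itemsPerPage>
    <item>
      <title>First hypothetical result</title>
      <link>http://example.org/first</link>
    </item>
    <item>
      <title>Second hypothetical result</title>
      <link>http://example.org/second</link>
    </item>
  </channel>
</rss>"""


def parse_opensearch_rss(xml_text):
    """Return (total_results, [(title, link), ...]) from an RSS result feed."""
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    total = int(
        channel.findtext(f"{{{OPENSEARCH_NS}}}totalResults", default="0")
    )
    results = [
        (item.findtext("title"), item.findtext("link"))
        for item in channel.findall("item")
    ]
    return total, results


total, results = parse_opensearch_rss(SAMPLE_RESPONSE)
```

Because every engine returns the same simple structure, a mediator can apply one parser like this to each feed and then merge the resulting lists.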

University of Twente at the TREC 2008 Enterprise Track

Using the Global Web as an expertise evidence source

by Pavel Serdyukov, Robin Aly, Djoerd Hiemstra

This is the fourth (and the last) year of the TREC Enterprise Track and the second year the University of Twente submitted runs for the expert finding task. In the methods that were used to produce these runs, we mostly rely on the predictive potential of those expertise evidence sources that are publicly available on the Global Web, but not hosted at the website of the organization under study (CSIRO). This paper describes the follow-up studies complementary to our recent research that demonstrated how taking the web factor seriously significantly improves the performance of expert finding in the enterprise.

The paper will be presented at the 17th Text Retrieval Conference (TREC), November 19-21, at the United States National Institute of Standards and Technology in Gaithersburg, USA.

[download draft paper] [More info]

Guest lecture by Maurice van Keulen and project deadline

Maurice van Keulen will give the final lecture on On-Line Analytical Processing & Data Warehouses on Wednesday 22 October, 1/2h. in HO-B1228. Maurice is one of the Database Group members, and teaches the follow-up course Data warehousing & data mining (232020).

Two days later, on Friday 24 October, is the deadline for sending in the report (subproject A) or the software install package (subproject B). The report or software should be accompanied by a prepared presentation (including e.g. a PowerPoint slide show) of about half an hour. At the written exam on 5 November, I will announce the lucky winners who will be asked to give their presentation to all of us on Friday 7 November, 5/6h.

More on TeleTOP

Search for the Future

Information Retrieval is the discipline that studies computer-based search tools. Many applications that handle information on the internet would be completely inadequate without the support of information retrieval technology. How would we manage our email without spam filtering? How would we find information on the world wide web if there were no web search engines? The rise of web search engines has been one of the major success stories in computer science of the last decade: Internet and search companies like Google and Yahoo are now among the world's most influential information technology companies.

Today, search technology is provided and developed by major search providers like Google and Yahoo, and by small specialized companies with specialized staff. But as search technology matures, it will have to be available to non-expert application developers as well. A major obstacle to achieving this is the lack of theories and high-level abstractions of search systems and the lack of declarative query languages. Another obstacle is the lack of methods to handle non-textual data, such as images, audio and video. Several projects of the Database Group of the University of Twente try to solve these problems for application areas such as Entity Search, Expert Search, Video Search, and Distributed Search. The models and approaches that are developed in these projects are evaluated on large-scale, realistic testbeds, and implemented in the group's open source search system PF/Tijah, a search system that combines keyword queries with structured queries on XML databases. The research contributes to several courses in the university's graduate programs, for instance Information Retrieval, XML & Databases 1, and XML & Databases 2.

Dr. Kawashima says: Search the Web

Scientists have found that searching the Internet triggers key centers in the brain that control decision-making and complex reasoning. The findings demonstrate that Web search activity may help stimulate and possibly improve brain function. According to Dr. Gary Small, director of UCLA's Memory and Aging Research Center: “Our most striking finding was that Internet searching appears to engage a greater extent of neural circuitry that is not activated during reading, but only in those with prior Internet experience.” Researchers found that during Web searching, volunteers with prior Internet searching experience registered a twofold increase in brain activation compared with those with little Internet experience.

More at UCLA

Keynote speech by Gerhard Weikum at DIR 2009

Prof. Gerhard Weikum (MPII, Saarbruecken, Germany) has agreed to give a keynote speech at the Dutch-Belgian Information Retrieval Workshop which takes place on 2 and 3 February 2009 at the University of Twente.

Gerhard Weikum is Research Director at the Max-Planck Institute for Informatics (MPII) in Saarbruecken, Germany, where he leads the department on databases and information systems. Prof. Weikum is an ACM Fellow and a renowned expert in the field of databases. He received the VLDB 10-Year Achievement Award in 2002. Since then, he has focused on several information retrieval problems such as peer-to-peer search, search efficiency, and database and search integration, resulting in, for instance, 6 full papers at the last SIGIR conferences.

Paper submission deadline: 14 November 2008

Guest lecture: Henke Pons of Arcadis


Hands on GIS by ARCADIS

Who: Henke Pons (Arcadis)
When: Friday 17 October, 8.30 h.
Where: HO-B1228

Henke Pons from ARCADIS Nederland will give a guest lecture, Hands on GIS by ARCADIS. Henke Pons is Project Leader Geographic Information Systems at ARCADIS Spatial Information in Apeldoorn. He will talk about several special applications of GIS at ARCADIS.

More information on TeleTOP

Distributed Search and Keyword Auctions

After the burst of the dot-com bubble in the autumn of 2001, the World Wide Web has gone through some remarkable changes in its organizational structure. Consumers of data and content are increasingly taking the role of producers of data and content, thereby threatening traditional publishers. A well-known example is the Wikipedia encyclopedia, which is written entirely by its (non-professional) users on a voluntary basis, while still rivaling a traditional publisher like Britannica on-line in both size and quality. Similarly, in SourceForge, communities of open source software developers collaboratively create new software, thereby rivaling software vendors like Microsoft; blogging turned the internet consumers of news into news providers; Kazaa and related peer-to-peer platforms like BitTorrent and E-mule automatically turned anyone who downloads a file into a contributor of files; Flickr turned users into contributors of visual content, but also into indexers of that content by social tagging, etc. Communities of users operate by trusting each other as co-developers and contributors, without the need for strict rules. There is, however, one major internet application for which communities only play a minor role. One of the web's most important applications, if not the most important application, is search. Internet search is almost exclusively run by three companies that dominate the search market: Google, Yahoo, and Microsoft. In contrast to traditional centralized search, where a centralized body like Google or Yahoo is in full control, a community-run search engine would consist of many small search engines that collaboratively provide the search service. This report motivates the need for large-scale distributed approaches to information retrieval, and proposes solutions based on keyword auctions.

[download pdf]