Inaugural lecture on 1 March

On 1 March 2024 at 15:45, I will give my inaugural lecture: “Zoekmachines: Samen en duurzaam vooruit” (“Search engines: Moving forward together and sustainably”; the lecture is in Dutch). Everyone is invited. Please register at: https://www.ru.nl/rede/hiemstra

In the lecture, I will share an ancient wisdom about working together; I will discuss my plan to teach students of all backgrounds their shared history; and I will reveal my dream to provide unrestricted access to all human information by working together. The lecture will contain cars, iPhone chargers, the Space Shuttle, and references to exciting recent research.

Invitation to the inaugural lecture

ECIR 2019 proceedings online

by Leif Azzopardi, Benno Stein, Norbert Fuhr, Philipp Mayr, Claudia Hauff, and Djoerd Hiemstra

The 41st European Conference on Information Retrieval (ECIR) was held in Cologne, Germany, during April 14–18, 2019, and brought together hundreds of researchers from Europe and abroad. The conference was organized by GESIS–Leibniz Institute for the Social Sciences and the University of Duisburg-Essen, in cooperation with the British Computer Society’s Information Retrieval Specialist Group (BCS-IRSG). These proceedings contain the papers, presentations, workshops, and tutorials given during the conference. This year the ECIR 2019 program boasted a variety of novel work from contributors from all around the world and provided new platforms for promoting information retrieval (IR) activities from the CLEF Initiative. In total, 365 submissions from 50 different countries were fielded across the tracks.
The final program included 39 full papers (23% acceptance rate), 44 short papers (29% acceptance rate), eight demonstration papers (67% acceptance rate), nine reproducibility full papers (75% acceptance rate), and eight invited CLEF papers. All submissions were peer reviewed by at least three international Program Committee members to ensure that only submissions of the highest quality were included in the final program. As part of the reviewing process we also provided more detailed review forms and guidelines to help reviewers identify common errors in IR experimentation as a way to help ensure consistency and quality across the reviews.
The accepted papers cover the state of the art in IR: evaluation, deep learning, dialogue and conversational approaches, diversity, knowledge graphs, recommender systems, retrieval methods, user behavior, topic modelling, etc., and also include novel application areas beyond traditional text and Web documents such as the processing and retrieval of narrative histories, images, jobs, biodiversity, medical text, and math. The program boasted a high proportion of papers with students as first authors, as well as papers from a variety of universities, research institutes, and commercial organizations.
In addition to the papers, the program also included two keynotes, four tutorials, four workshops, a doctoral consortium, and an industry day. The first keynote, “On Entities and Evaluation”, was presented by this year’s BCS IRSG Karen Sparck Jones Award winner, Prof. Krisztian Balog, and the second keynote, “On Ranking People”, was presented by Prof. Markus Strohmaier. The tutorials covered a range of topics from conducting lab-based experiments and statistical analysis to categorization and deep learning, while the workshops brought together participants to discuss algorithm selection (AMIR), narrative extraction (Text2Story), bibliometrics (BIR), as well as social media personalization and search (SoMePeAS). As part of this year’s ECIR we also introduced a new CLEF session to enable CLEF organizers to report on and promote their upcoming tracks. In sum, this added to the success and diversity of ECIR and helped build bridges between communities.
The success of ECIR 2019 would not have been possible without all the help from the team of volunteers and reviewers. We wish to thank all our track chairs for coordinating the different tracks along with the teams of meta-reviewers and reviewers who helped ensure the high quality of the program. We also wish to thank the demo chairs: Christina Lioma and Dagmar Kern; student mentorship chairs: Ahmet Aker and Laura Dietz; doctoral consortium chairs: Ahmet Aker, Dimitar Dimitrov and Zeljko Carevic; workshop chairs: Diane Kelly and Andreas Rauber; tutorial chairs: Guillaume Cabanac and Suzan Verberne; industry chair: Udo Kruschwitz; publicity chair: Ingo Frommholz; and sponsorship chairs: Jochen L. Leidner and Karam Abdulahhad. We would like to thank our webmaster, Sascha Schüller, and our local chair, Nina Dietzel, along with all the student volunteers who helped to create an excellent online and offline experience for participants and attendees.

Published as: Advances in Information Retrieval. Proceedings of the 41st European Conference on Information Retrieval Research (ECIR), Lecture Notes in Computer Science, volumes 11437 and 11438, Springer, 2019
[Part I] [Part II]

Goodbye everybody at U. Twente

(written for CS teaching mailing no. 16 of 11 July)

As of 1 July, I will leave the U. Twente after almost 30 years (first as student, then as PhD student, finally as staff member) for a new challenge at the Radboud University in Nijmegen. I am proud to announce that I will join Radboud University’s faculty of science as professor of Federated Search.

I was privileged to teach in a world that has changed a lot since I became an assistant professor (in 2001). Today, university-level courses are no longer taught only to the privileged few at universities in developed countries. They are now freely available to anyone online via platforms like Coursera, edX and FutureLearn, and on social media such as YouTube. Over the last 18 years, I tried to stimulate students to find additional study material online. In return, I tried to contribute to the online study material by publishing my teaching material for students to use and for colleagues to share (my Canvas courses are still entirely publicly available) and by using novel social media like UT Mastodon (https://mastodon.utwente.nl).

In my years at the UT, I enjoyed promoting critical thinking by letting students actively put theory to practice, instead of letting students passively absorb knowledge. I particularly enjoyed developing the MSc course Managing Big Data with Maarten Fokkinga and Robin Aly (later perfected by Doina Bucur) where students analysed terabytes of data on a large Hadoop cluster. I enjoyed developing the BSc module Data & Information with Klaas Sikkel, Maurice van Keulen and Luís Ferreira Pires, where we let students work in agile teams, including daily stand-ups, sprint review meetings, and sprint backlogs. I also very much liked running the MSc course Information Retrieval with Paul van der Vet, Theo Huibers and Dolf Trieschnigg, where students used open source search engines and actively contributed to our research. Some of that work was published, and in such cases, students presented their work at international workshops or conferences.

Saying goodbye to Twente is harder than I expected. But remember, Nijmegen is close by: Feel free to contact me. As for PhD students, I intend to continue to be an active contributor to the courses of the Dutch research school SIKS: I hope to see you there.

Goodbye everybody!

MTCB: A Multi-Tenant Customizable database Benchmark

by Wim van der Zijden, Djoerd Hiemstra, and Maurice van Keulen

We argue that there is a need for Multi-Tenant Customizable OLTP systems. Such systems need a Multi-Tenant Customizable Database (MTC-DB) as a backing. To stimulate the development of such databases, we propose the benchmark MTCB. Benchmarks for OLTP exist and multi-tenant benchmarks exist, but no MTC-DB benchmark exists that accounts for customizability. We formulate seven requirements for the benchmark: realistic, unambiguous, comparable, correct, scalable, simple and independent. The benchmark focuses on performance aspects and produces nine metrics: Aulbach compliance, size on disk, tenants created, types created, attributes created, transaction data type instances created per minute, transaction data type instances loaded by ID per minute, conjunctive searches per minute and disjunctive searches per minute. We present a specification and an example implementation in Java 8, which can be accessed from the following public repository. In the same repository, a naive implementation can be found of an MTC-DB where each tenant has its own schema. We believe that this benchmark is a valuable contribution to the community of MTC-DB developers, because it provides objective comparability as well as a precise definition of the concept of MTC-DB.
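
To make the operations behind these metrics concrete, here is a minimal in-memory sketch in Python. It is not the Java 8 implementation from the repository: the class `NaiveMtcDb` and all method names are hypothetical, and it assumes that a conjunctive search requires all criteria to match while a disjunctive search requires at least one.

```python
from collections import defaultdict


class NaiveMtcDb:
    """Toy store in which every tenant has its own private schema (hypothetical API)."""

    def __init__(self):
        # tenant -> type name -> set of attribute names
        self.schemas = defaultdict(dict)
        # tenant -> type name -> list of instances (plain dicts)
        self.instances = defaultdict(lambda: defaultdict(list))

    def create_type(self, tenant, type_name, attributes):
        """Let a tenant define its own data type with custom attributes."""
        self.schemas[tenant][type_name] = set(attributes)

    def create_instance(self, tenant, type_name, values):
        """Store one instance and return its ID (its position in the list)."""
        unknown = set(values) - self.schemas[tenant][type_name]
        if unknown:
            raise ValueError(f"unknown attributes: {sorted(unknown)}")
        rows = self.instances[tenant][type_name]
        rows.append(dict(values))
        return len(rows) - 1

    def load_by_id(self, tenant, type_name, instance_id):
        return self.instances[tenant][type_name][instance_id]

    def search(self, tenant, type_name, criteria, conjunctive=True):
        """Match all criteria (conjunctive) or at least one (disjunctive)."""
        combine = all if conjunctive else any
        return [
            row
            for row in self.instances[tenant][type_name]
            if combine(row.get(key) == value for key, value in criteria.items())
        ]


db = NaiveMtcDb()
db.create_type("acme", "invoice", ["customer", "status"])
first = db.create_instance("acme", "invoice", {"customer": "X", "status": "paid"})
db.create_instance("acme", "invoice", {"customer": "Y", "status": "open"})

criteria = {"customer": "X", "status": "open"}
both = db.search("acme", "invoice", criteria, conjunctive=True)
either = db.search("acme", "invoice", criteria, conjunctive=False)
```

A benchmark run would time many such calls and report how many complete per minute; the sketch only shows what is being counted.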

The Multi-Tenant Customizable database Benchmark will be presented at the 9th International Conference on Information Management and Engineering (ICIME 2017) on 9-11 October 2017 in Barcelona, Spain.

[download pdf]

Term Extraction paper in Computing Reviews’ Best of 2016

The paper “Evaluation and analysis of term scoring methods for term extraction”, written with Suzan Verberne, Maya Sappelli and Wessel Kraaij, has been selected as a Notable Article in ACM Computing Reviews' 2016 Best of Computing. Computing Reviews is published by the Association for Computing Machinery (ACM) and the editor-in-chief is Carol Hutchins (New York University).

In the paper, we evaluate five term scoring methods for automatic term extraction on four different types of text collections. We show that extracting relevant terms using unsupervised term scoring methods is possible in diverse use cases, and that the methods are applicable in more contexts than their original design purpose.

[download pdf]

Vincent van Donselaar graduates on database synchronization

Low latency asynchronous database synchronization and data transformation using the replication log

by Vincent van Donselaar

Analytics firm Distimo offers a web-based product that allows mobile app developers to track the performance of their apps across all major app stores. The Distimo backend system uses web scraping techniques to retrieve the market data, which is stored in the backend master database: the data warehouse (DWH). A batch-oriented program periodically synchronizes relevant data to the frontend database that feeds the customer-facing web interface.
The synchronization program poses limitations due to its batch-oriented design. The relevant metadata that must be calculated before and after each batch results in overhead and increased latency. The goal of this research is to streamline the synchronization process by moving to a continuous, replication-like solution, combined with principles seen in the field of data warehousing. The binary transaction log of the master database is used to feed the synchronization program that is also responsible for implicit data transformations like aggregation and metadata generation. In contrast to traditional homogeneous database replication, this design allows synchronization across heterogeneous database schemas. The prototype demonstrates that a composition of replication and data warehousing techniques can offer an adequate solution for robust and low latency data synchronization software.
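
The idea of feeding a synchronization program from the transaction log while transforming the data can be sketched as follows. This is an illustration only, not the design from the thesis: the event format (`op`, `before`, `after`), the column names and the function names are all assumptions, and a real binary log would first need a parser. The sketch keeps an aggregated frontend table (downloads per app per day, summed over stores) up to date one log event at a time, instead of recomputing it in batches.

```python
from collections import defaultdict


def apply_event(frontend, event):
    """Apply one (hypothetical) log event to the aggregated frontend table."""
    op = event["op"]
    if op == "insert":
        row = event["after"]
        frontend[(row["app"], row["day"])] += row["downloads"]
    elif op == "update":
        # Undo the old row's contribution, then add the new one.
        before, after = event["before"], event["after"]
        frontend[(before["app"], before["day"])] -= before["downloads"]
        frontend[(after["app"], after["day"])] += after["downloads"]
    elif op == "delete":
        row = event["before"]
        frontend[(row["app"], row["day"])] -= row["downloads"]
    else:
        raise ValueError(f"unsupported operation: {op}")


def synchronize(log, frontend=None):
    """Consume log events in order and return the updated frontend table."""
    frontend = defaultdict(int) if frontend is None else frontend
    for event in log:
        apply_event(frontend, event)
    return frontend


ios = {"app": "A", "store": "ios", "day": "2014-01-01", "downloads": 10}
ios_fixed = dict(ios, downloads=12)
android = {"app": "A", "store": "android", "day": "2014-01-01", "downloads": 5}

log = [
    {"op": "insert", "after": ios},
    {"op": "insert", "after": android},
    {"op": "update", "before": ios, "after": ios_fixed},
    {"op": "delete", "before": android},
]
frontend = synchronize(log)
```

Because the frontend table has a different shape than the source rows (it drops the store column and sums the downloads), the example also shows how log-driven synchronization can work across different schemas.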

[download pdf]

Roeland Kegel graduates on developing a personal information security assistant

Development and Validation of a Personal Information Security Assistant Architecture

by Roeland Kegel

This thesis presents and validates the first iteration of the design process of a Personal Information Security Assistant (PISA). The PISA aims to protect the information and devices of an end-user, offering advice and education in order to improve the security and awareness of its users. The PISA is a security solution that takes a user-centric approach, aiming to educate as well as protect, to motivate as well as secure. This thesis first presents the method by which stakeholders are elicited and classified, together with its application. Requirements are then elicited from these stakeholders. Four architectural alternatives for the PISA are then proposed. Finally, these alternatives are validated by a traceability analysis, a prototype implementation of one specific alternative, and feedback from a focus group of experts. In summary, this thesis presents stakeholders, goals, requirements and proposed architectures for the PISA and contains a validation of the latter.

[download pdf]

Celebrating Stephen Robertson’s Retirement

by Djoerd Hiemstra, John Tait, Andrew MacFarlane, and Nick Belkin

Stephen Robertson was named fellow of the Association for Computing Machinery (ACM) last week. Robertson retired from the Microsoft Research Lab in Cambridge this year after a long career as one of the most influential, well-liked and eminent researchers in Information Retrieval throughout the world. His successful career was celebrated in the latest BCS IRSG Informer. Stephen Robertson continues to be active in Information Retrieval in his retirement at University College London.

[download pdf]

In memory of Joost van Honschoten

Today would have been the 41st birthday of Joost van Honschoten, who passed away almost two years ago. Joost was a talented young researcher, holding grants from STW and NWO, working as a professor at the Transducers Science and Technology Group of the University of Twente. Joost and I published several “papers” together around 1983, not as researchers, but as comic book writers when we were about 11 and 12 years old. One of them, “Honne & Ponnie en de Jacht op Ruige Robbie”, can be downloaded from the link below. The comic gives an idea of the friendship, creativity and humour that we shared.

Honne en Ponnie en de Jacht op Ruige Robbie
HonneEnPonnieDeel1.pdf