Open Search Symposium 2022

10-12 October 2022 at CERN

The Open Search Symposium series (#OSSYM) provides a forum to discuss and advance the ideas and concepts of Open Internet search in Europe. This year’s #OSSYM2022 takes place at CERN and online from 10-12 October 2022. The programme is great: on Monday there is a keynote from Tomáš “Word2Vec” Mikolov, and on Tuesday a track on alternative search engines with Raphael Auphan (the CEO of Qwant), Isabel Claus (founder of a B-to-B search engine), and Joseph Cullhead (of a Swedish nonprofit organization with a low-budget search engine). Wednesday has a panel discussion about the ethics of search.

[Register now via CERN]

Open Web Search project kicked off

Today, we kick off our new EU project. In the project, we develop a new architecture for search engines in which many parts of the system will be decentralized. The key idea is to separate index construction from the search engines themselves: the most expensive step, creating index shards, can be carried out on large clusters, while the search engine itself can be operated locally.
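The separation of index construction from search can be illustrated with a small toy example. This is only a sketch under our own assumptions, not the project's actual architecture or file format: the names `build_shards` and `LocalSearchEngine`, the JSON shard files, and the partitioning by document id are all hypothetical choices made for illustration.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def build_shards(docs, num_shards, out_dir):
    """Offline step (hypothetical): build inverted index shards, write them to disk."""
    shards = [defaultdict(list) for _ in range(num_shards)]
    for doc_id, text in sorted(docs.items()):
        shard = shards[doc_id % num_shards]  # assumption: partition by document id
        for term in sorted(set(text.lower().split())):
            shard[term].append(doc_id)
    paths = []
    for i, shard in enumerate(shards):
        path = Path(out_dir) / f"shard-{i}.json"
        path.write_text(json.dumps(shard))
        paths.append(path)
    return paths


class LocalSearchEngine:
    """Online step (hypothetical): load pre-computed shards and search locally."""

    def __init__(self, shard_paths):
        self.shards = [json.loads(Path(p).read_text()) for p in shard_paths]

    def search(self, query):
        """Return ids of documents that contain every query term."""
        terms = query.lower().split()
        if not terms:
            return []
        hits = set()
        for shard in self.shards:
            # Each document lives in exactly one shard, so a per-shard AND is enough.
            postings = [set(shard.get(term, [])) for term in terms]
            hits |= set.intersection(*postings)
        return sorted(hits)


def demo():
    docs = {
        0: "open web search in Europe",
        1: "decentralized search engines",
        2: "index shards built on large clusters",
        3: "open search index",
    }
    with tempfile.TemporaryDirectory() as tmp:
        paths = build_shards(docs, num_shards=2, out_dir=tmp)  # expensive, offline
        engine = LocalSearchEngine(paths)                      # cheap, local
        return engine.search("open search")
```

In this sketch, `build_shards` stands in for the expensive cluster job, and `LocalSearchEngine` only needs the shard files, so it could run on a laptop without ever crawling or indexing anything itself.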

We also envision an Open-Web-Search Engine Hub, where companies and individuals can share their specifications of search engines and pre-computed, regularly updated search indices. We think of this as a search engine mash-up that would enable a new future of human-centric search without privacy concerns.

More information at:

“Mo” Together or Alone?

Investigating the Role of Fundraisers’ Networks in Online Peer-to-Peer Fundraising

by Anna Priante, Michel Ehrenhard, Tijs van den Broek, Ariana Need, and Djoerd Hiemstra

In online peer-to-peer fundraising, individual fundraisers, acting on behalf of nonprofit organizations, mobilize their social networks using social media to request donations. Whereas existing studies focus on networks of donors to explain success, we examine the role of the networks of fundraisers and their effect on fundraising outcomes. By drawing on social capital and network theories, we investigate how social capital derived from social media networks and fundraising groups explains individual fundraising success. Using the Movember health campaign on Twitter as an empirical context, we find that fundraising success is associated with a moderate level of centrality in social media networks and moderate group network size. In addition, we find that fundraisers interact only marginally on social media but prefer to connect with each other outside these platforms and engage in group fundraising. Our article contributes to research on fundraising and social networks and provides recommendations for practice.

Published in Nonprofit and Voluntary Sector Quarterly 51(5)

Exposure Gerrymandering

Search Engine Manipulation Flying under Fairness’ Radar

by Tim de Jonge

Modern society increasingly relies on Information Retrieval (IR) systems to answer various information needs. Since this impacts society in many ways, there has been a great deal of work to ensure the fairness of these systems, and to prevent societal harms. The Search Engine Manipulation Effect (SEME) is one such societal harm: voters could be influenced by means of these systems by showing biased search results. This paper introduces the notion of Exposure Gerrymandering, to illustrate how nefarious actors could create a system that appears unbiased to common fairness assessments, while substantially influencing the election at hand.

Presented on 20 July at Future Directions in Information Access (FDIA 2022) in Lisbon, Portugal.

[download pdf]

IR Journal Special Issue on ECIR 2021

by Djoerd Hiemstra and Marie-Francine Moens

The 43rd European Conference on Information Retrieval, ECIR 2021, was supposed to take place as an in-person conference in Lucca, Italy. Due to the COVID-19 pandemic, ECIR 2021 was held entirely online from March 28 to April 1, 2021. The conference programme contained full paper presentations, poster presentations, system demonstrations, eight tutorials, five workshops, an industry event, a doctoral consortium, a reproducibility track, a panel on open access publishing and several online social events.

For this special issue, we asked the authors of the eight ECIR 2021 full papers with the best reviewing scores to submit an extended version of their paper. This led to five papers that are published in this special issue of the Information Retrieval Journal. The extended papers contain at least 30% new content. Examples of extensions are enhancements that improve the techniques described in the ECIR 2021 paper, as well as tests on additional datasets that reveal behaviors that differ from the originally published claims and that provide further insights into the methods being described. Among the papers in this special issue are extensions of two papers that received an award at ECIR 2021.

Published in Information Retrieval Journal.

[download pdf]

Felipe Moraes Gomes defends PhD thesis on Collaborative Search

Examining the Effectiveness of Collaborative Search Engines

by Felipe Moraes Gomes

Although searching is often seen as a solitary activity, searching in collaboration with others is deemed useful or necessary in many complex situations such as: travel planning; online shopping; looking for health related information; planning birthday parties; working on a group project; or finding a house to buy. Researchers have found that complex search tasks can be executed more effectively and efficiently, achieve higher material coverage, and enable higher knowledge gains in an explicit collaborative setting than if conducted in isolation. However, even though researchers have carefully designed several Collaborative Search (CSE) user studies, there is still conflicting evidence or a lack of evidence on the effectiveness of CSE systems. Thus, in this thesis, we focus on examining the effectiveness of CSE systems in two parts.

In the first part, we shed light on the effectiveness of CSE to support two group configurations, namely group sizes and users’ roles. Past collaborative search studies have had a strong focus on groups of two or three collaborators, which naturally limits the number of experimental conditions, a number that could otherwise increase quickly. Therefore, there is a lack of evidence suggesting the extent to which a CSE system can support group sizes beyond these commonly investigated group sizes. Thus, in Chapter 3, we study CSE system effectiveness with group size as the primary independent variable. Here, we vary group sizes from two to six collaborators, with six as our upper bound due to limitations on our available resources.

In Chapter 4, we focus on roles in CSE. Roles can determine how a group splits up the search task and determine each group member’s function (e.g., one group member is responsible for finding documents and reading and evaluating them, with a further member responsible for in-depth reading and evaluating of the aforementioned documents). In particular, when the CSE system assigns a role to each group member, researchers have hypothesised that a group may reduce the time spent communicating and coordinating the task, and make the search process more efficient and successful than groups without role assignment. However, past user studies have provided contradictory evidence as to the utility of assigned roles in CSE. Thus, in Chapter 4, we provide more evidence to settle the question of the effectiveness of CSE systems when used by groups with pre-assigned roles versus groups without pre-assigned roles.

In the second part of this thesis, we hold our group configurations constant: group sizes are limited to at most three people, and all group members receive the same role. We then turn to a different perspective and focus on examining the effectiveness in two contexts: Search as Learning (SAL) and collaborative online shopping. Search activities for human learning involve multiple iterations that require cognitive processing and interpretation, often requiring the searcher to spend time scanning/viewing, comparing, and evaluating information. However, web search engines are not built to support users in the search tasks often required in learning situations. When people use search as a learning activity, it can be an individual activity or a collaborative activity (e.g., group projects). Hence, in Chapter 5, we tackle the challenge of identifying the impact of web search engines on the (single-search or collaborative search) users’ ability to learn compared to learning acquired via high-quality learning materials as a baseline.

In Chapter 6, we look at a further context: collaborative online shopping. In collaborative online shopping, a group of people come together to make a decision to purchase a product that meets the various group members’ requirements and opinions. While shopping together, search is an important part of the task, as the group searches for products in a catalogue that is available on an e-commerce website. One important aspect of collaborative shopping is supporting awareness and sharing of knowledge, as it can enable a sense of co-presence, which helps groups make a decision that satisfies each group member’s requirements and wishes. As search is a significant part of a collaborative online shopping experience, CSE systems are suitable for executing such tasks. However, there is insufficient evidence of how well CSE systems can support a group of users in searching for online products together and making a group decision. Hence, in Chapter 6, we explore the effects of increased awareness and sharing of knowledge (co-presence) using a CSE system in collaborative shopping on the group decision-making process.

[more info]

PhD vacancy for software correctness

We’re hiring a PhD Candidate for Software Correctness. The work will target the correctness of high-level programming languages that are “only” strings in your host language, such as SQL and regular expressions.
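To illustrate why such embedded languages are hard to get right, here is a minimal sketch in Python, using only the standard `sqlite3` and `re` modules. The host language sees SQL and regular expressions as plain strings, so mistakes in them surface only when the string is handed to the database or the regex compiler. The helper names `check_sql` and `check_regex` and the `students` table are our own hypothetical choices, not part of the vacancy or the planned research.

```python
import re
import sqlite3


def check_sql(conn, statement):
    """Return None if SQLite can compile the statement, else the error message."""
    try:
        # EXPLAIN makes SQLite compile the statement without running the query itself.
        conn.execute("EXPLAIN " + statement)
        return None
    except sqlite3.Error as exc:
        return str(exc)


def check_regex(pattern):
    """Return None if the pattern compiles, else the error message."""
    try:
        re.compile(pattern)
        return None
    except re.error as exc:
        return str(exc)


def make_example_db():
    """Create a tiny in-memory database (hypothetical schema for illustration)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE students (name TEXT, grade INTEGER)")
    return conn


if __name__ == "__main__":
    conn = make_example_db()
    # All of these are valid Python string literals; only some are valid SQL or regex.
    for statement in ["SELECT name FROM students",
                      "SELEC name FROM students",
                      "SELECT nme FROM students"]:
        print(repr(statement), "->", check_sql(conn, statement))
    for pattern in ["[a-z]+", "[a-z"]:
        print(repr(pattern), "->", check_regex(pattern))
```

Note that a check like this only catches statements that fail to compile. A query that compiles but returns the wrong rows passes unnoticed, which is one reason a single yes/no notion of correctness is limited.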

Software has shaped almost every aspect of our modern lives. Ensuring that software is correct is both a major scientific challenge and an enterprise with enormous social relevance. Would you like to examine possibilities to introduce a theory for correctness levels for software? Then you have a part to play as a PhD Candidate.

The correctness of software is of major importance in computer science. Unfortunately, the significance of software correctness is not always clear. Furthermore, the automatic checking of software correctness is difficult. This leads to problems during system development projects and during the grading of software exercises.

This PhD candidate position is intended for four years. You understand the importance of correct software and know how to work with several meanings of software correctness. Your goal is to introduce a generic and formal theory of software correctness levels, in which partially correct/incorrect software can be handled in a flexible way.

You will put the generic theory into practice by experimenting with automatic grading of software exercises in the context of our courses. One application will be the automatic grading of SQL statements. You will be supervised by Patrick van Bommel and Djoerd Hiemstra.

Profile:

  • You are an enthusiastic and motivated researcher.
  • You should have a Master’s degree in computer science, or a Master’s degree in mathematics and a demonstrable interest in computer science.

[Apply On-line]

(Deadline: 6 March 2022)

Dutch-Belgian Information Retrieval Workshop 2021½

The program for DIR 2021½ is out. DIR 2021½ will run on four consecutive Fridays as online Search Engines Amsterdam meetups. Register now!

Session 1, 4 February 2022

  • Keynote 1 by Maria Maistro (Uni. of Copenhagen): How can we measure reproducibility of IR experiments?

Session 2, 11 February 2022

  • Ali Vardasbi (University of Amsterdam): Mixture-Based Correction for Position and Trust Bias in Counterfactual Learning to Rank
  • Sepideh Mesbah (Randstad Groep): Using RobBERT and eXtreme Multi-Label Classification to Extract Implicit and Explicit Skills From Dutch Job Descriptions
  • Hideaki Joko (Radboud University): Conversational Entity Linking: Problem Definition and Datasets
  • Liesbeth Allein (KU Leuven): Time-aware evidence ranking for fact-checking
  • Mozhdeh Ariannezhad (University of Amsterdam): Understanding Multi-channel Customer Behavior in Retail

Session 3, 18 February 2022

  • Garett Allen (TU Delft): Supercalifragilisticexpialidocious: Why Using the “Right” Readability Formula in Children’s Web Search Matters
  • Carsten Schnober (WizeNoze): Neural Information Retrieval for Educational Resources
  • Olivier Jeunen (Amazon): Embarrassingly shallow auto-encoders for dynamic collaborative filtering
  • Zhe Roger (TU Delft): Leave No User Behind: Towards Improving the Utility of Recommender Systems for Non-mainstream Users
  • Harrie Oosterhuis (Radboud University): Computationally Efficient Optimization of Plackett-Luce Ranking Models for Relevance and Fairness

Session 4, 25 February 2022

  • Keynote 2 by Gabriella Kazai (Microsoft Research): IR Evaluation – An Industry Perspective

Web Analytics & Privacy workshop

On Thursday 23 December, the NoGA team organizes the first Web Analytics and Privacy workshop. The morning features a demonstration of the open source analytics system Matomo, and the afternoon two excellent guest speakers: Frederik Zuiderveen Borgesius and Güneş Acar.

Frederik Zuiderveen Borgesius will talk about behavioural targeting, privacy, and the law, discussing the troubled relationship between contemporary advertising technology (adtech) systems, in particular systems of real-time bidding (RTB, also known as programmatic advertising) underpinning much behavioural targeting on the web and through mobile applications.

Güneş Acar will talk about browser fingerprinting and personal data exfiltration on the web, discussing the results of a study into data exfiltration by third-party scripts directly embedded on web pages. Specifically, Güneş will discuss three attacks: misuse of browsers’ internal login managers, social data exfiltration, and whole-DOM exfiltration.

More information at: