by Djoerd Hiemstra
I discuss fairness in Information Retrieval (IR) through the eyes of Cooper and Robertson’s probability ranking principle. I argue that unfair rankings may arise from blindly applying the principle without checking whether its preconditions are met. Following this argument, unfair rankings originate from the application of learning-to-rank approaches in cases where they should not be applied according to the probability ranking principle. I use two examples to show that fairer rankings may also be more relevant than rankings that are based on the probability ranking principle.
Published in ACM SIGIR Forum 56(2), 2022
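For readers who do not know the probability ranking principle mentioned in the abstract above: it prescribes ranking documents by decreasing estimated probability of relevance. The snippet below is a minimal sketch of that rule only, not code from the paper. The function name `rank_by_prp` and the probability estimates are made up for illustration.

```python
from typing import Dict, List


def rank_by_prp(p_relevance: Dict[str, float]) -> List[str]:
    """Order document ids by decreasing estimated probability of relevance."""
    # Ties are broken by document id so the output is deterministic.
    return sorted(p_relevance, key=lambda doc: (-p_relevance[doc], doc))


# Hypothetical estimates, e.g. produced by some learning-to-rank model.
estimates = {"d1": 0.2, "d2": 0.9, "d3": 0.5}
ranking = rank_by_prp(estimates)
print(ranking)
```

The sketch takes the estimates at face value. Whether ranking on them is appropriate depends on the preconditions of the principle, which is the point of the paper.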
We are proud to announce that Hannes Mühleisen will give a guest lecture on Tuesday 13 December at 13:30h in LIN-2 for the course Information Modelling and Databases. Hannes Mühleisen is the creator of DuckDB and co-founder and CEO of DuckDB Labs. He is also a senior researcher of the Database Architectures group at the Centrum Wiskunde & Informatica (CWI) in Amsterdam. Students of the course use DuckDB to practice their SQL skills.
Analytical Query Processing and the DuckDB System
by Hannes Mühleisen
DBMSs have historically been created to support transactional (OLTP) workloads. However, a second use case, analytical data analysis (OLAP), quickly appeared. These workloads are characterised by complex, relatively long-running queries that process significant portions of the stored dataset, for example aggregations over entire tables or joins between several large tables. It is rather impossible for an OLTP-focused DBMS to perform well in OLAP scenarios, which is why specialised systems have been developed. In this lecture, I will introduce analytical query processing, give an overview of the state of the art in research and industry, and describe our own analytical DBMS, DuckDB.
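To make the difference between the two workload types in the abstract concrete, here is a small sketch that contrasts a transactional lookup with an analytical query. It is an illustration under assumptions and not material from the lecture: the tables `customers` and `orders` and their contents are invented, and the code uses Python's built-in `sqlite3` module only so that it runs without extra dependencies, not DuckDB itself.

```python
import sqlite3

# In-memory database with two small, made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'NL'), (2, 'NL'), (3, 'DE');
INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 20.0), (4, 3, 7.5);
""")

# OLTP-style query: fetch a single row by its key.
oltp_row = conn.execute(
    "SELECT amount FROM orders WHERE id = ?", (3,)
).fetchone()

# OLAP-style query: join two tables and aggregate over all rows.
olap_rows = conn.execute("""
SELECT c.country, COUNT(*) AS n_orders, SUM(o.amount) AS revenue
FROM orders AS o
JOIN customers AS c ON c.id = o.customer_id
GROUP BY c.country
ORDER BY c.country
""").fetchall()

print(oltp_row)
print(olap_rows)
```

On four rows both queries are instant. The difference only matters at scale, where the second query has to read every row of both tables while the first touches one.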
10-12 October 2022 at CERN
The Open Search Symposium series (#OSSYM) provides a forum to discuss and advance the ideas and concepts of Open Internet search in Europe. This year’s #OSSYM2022 takes place at CERN and online from 10-12 October 2022. The programme is great: on Monday there is a keynote by Tomáš “Word2Vec” Mikolov, and on Tuesday a track on alternative search engines with Raphael Auphan (the CEO of Qwant), Isabel Claus (founder of the B-to-B engine thinkers.ai), and Joseph Cullhead (alexandria.org, a Swedish nonprofit organization with a low-budget search engine). Wednesday has a panel discussion about the ethics of search.
[Register now via CERN]
Today, we kick off our new EU project OpenWebSearch.eu. In the project, we develop a new architecture for search engines where many parts of the system will be decentralized. The key idea is to separate index construction from the search engines themselves, where the most expensive step to create index shards can be carried out on large clusters while the search engine itself can be operated locally.
We also envision an Open-Web-Search Engine Hub, where companies and individuals can share their specifications of search engines and pre-computed, regularly updated search indices. We think of this as a search engine mash-up that would enable a new future of human-centric search without privacy concerns.
More information at: https://openwebsearch.eu/partners/radboud-university/
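The separation of index construction from searching described above can be sketched in a few lines. This is a toy illustration under my own assumptions, not the OpenWebSearch.eu architecture: the functions `build_shards` and `search`, the round-robin assignment of documents to shards, and the example documents are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set

# A shard maps each term to the ids of the documents that contain it.
Shard = Dict[str, Set[str]]


def build_shards(docs: Dict[str, str], n_shards: int) -> List[Shard]:
    """Expensive step: build inverted-index shards (imagine a large cluster)."""
    shards = [defaultdict(set) for _ in range(n_shards)]
    # Each document goes to exactly one shard, assigned round-robin.
    for i, (doc_id, text) in enumerate(sorted(docs.items())):
        for term in text.lower().split():
            shards[i % n_shards][term].add(doc_id)
    return [dict(shard) for shard in shards]


def search(shards: List[Shard], query: str) -> List[str]:
    """Cheap step: answer a conjunctive query from pre-computed shards."""
    terms = query.lower().split()
    hits: Set[str] = set()
    for shard in shards:
        postings = [shard.get(term, set()) for term in terms]
        if postings:
            # A document matches if it contains every query term.
            hits |= set.intersection(*postings)
    return sorted(hits)


docs = {
    "a": "open web search",
    "b": "open source database",
    "c": "web index shards",
}
shards = build_shards(docs, n_shards=2)  # done once, elsewhere
print(search(shards, "open"))            # done locally, per query
print(search(shards, "web search"))
```

The search function never sees the raw documents, only the shards, so the two steps can run on different machines and be operated by different parties.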
Investigating the Role of Fundraisers’ Networks in Online Peer-to-Peer Fundraising
by Anna Priante, Michel Ehrenhard, Tijs van den Broek, Ariana Need, and Djoerd Hiemstra
In online peer-to-peer fundraising, individual fundraisers, acting on behalf of nonprofit organizations, mobilize their social networks using social media to request donations. Whereas existing studies focus on networks of donors to explain success, we examine the role of the networks of fundraisers and their effect on fundraising outcomes. By drawing on social capital and network theories, we investigate how social capital derived from social media networks and fundraising groups explains individual fundraising success. Using the Movember health campaign on Twitter as an empirical context, we find that fundraising success is associated with a moderate level of centrality in social media networks and moderate group network size. In addition, we find that fundraisers interact only marginally on social media but prefer to connect with each other outside these platforms and engage in group fundraising. Our article contributes to research on fundraising and social networks and provides recommendations for practice.
Published in Nonprofit and Voluntary Sector Quarterly 51(5)
Search Engine Manipulation Flying under Fairness’ Radar
by Tim de Jonge
Modern society increasingly relies on Information Retrieval (IR) systems to answer various information needs. Since this impacts society in many ways, there has been a great deal of work to ensure the fairness of these systems, and to prevent societal harms. The Search Engine Manipulation Effect (SEME) is one such societal harm: voters could be influenced by means of these systems by showing biased search results. This paper introduces the notion of Exposure Gerrymandering, to illustrate how nefarious actors could create a system that appears unbiased to common fairness assessments, while substantially influencing the election at hand.
Presented on 20 July at Future Directions in Information Access (FDIA 2022) in Lisbon, Portugal.
by Djoerd Hiemstra and Marie-Francine Moens
The 43rd European Conference on Information Retrieval, ECIR 2021, was supposed to take place as an in-person conference in Lucca, Italy. Due to the COVID-19 pandemic, ECIR 2021 was held entirely online from March 28 to April 1, 2021. The conference programme contained full paper presentations, poster presentations, system demonstrations, eight tutorials, five workshops, an industry event, a doctoral consortium, a reproducibility track, a panel on open access publishing and several online social events.
For this special issue, we asked the authors of eight of the ECIR 2021 full papers that had the best reviewing scores to submit an extended version of their paper. This led to five papers that are published in this special issue of the Information Retrieval Journal. The extended papers contain at least 30% new content. Examples of extensions are enhancements that improve the techniques described in the ECIR 2021 paper, as well as tests on additional datasets that reveal behaviors that differ from the originally published claims and that provide further insights into the methods being described. Among the papers in this special issue are extensions of two papers that received an award at ECIR 2021.
Published in Information Retrieval Journal.
The Dutch Data Protection Authority may ban the use of Google Analytics later this year. The vast majority of popular websites in the Netherlands use Google Analytics for visitor statistics. How should you then measure the visits to your website?
Read more at the Privacy Company Blog (in Dutch).
Welcome to the Data Science group, Tim de Jonge! Tim will be working on Fairness and Non-discrimination in Machine Learning for Information Retrieval in cooperation with iHub.