UNFair: Search Engine Manipulation, Undetectable by Amortized Inequity

by Tim de Jonge and Djoerd Hiemstra

Modern society increasingly relies on Information Retrieval systems to answer various information needs. Since this impacts society in many ways, there has been a great deal of work to ensure the fairness of these systems, and to prevent societal harms. There is a prevalent risk of failing to model the entire system, where nefarious actors can produce harm outside the scope of fairness metrics. We demonstrate the practical possibility of this risk through UNFair, a ranking system that achieves performance and measured fairness competitive with current state-of-the-art, while simultaneously being manipulative in setup. UNFair demonstrates how adhering to a fairness metric, Amortized Equity, can be insufficient to prevent Search Engine Manipulation. This possibility of manipulation bypassing a fairness metric discourages imposing a fairness metric ahead of time, and motivates instead a more holistic approach to fairness assessments.

To be presented at the ACM Conference on Fairness, Accountability, and Transparency (FAccT 2023) on 12-15 June in Chicago, USA.

[download pdf]

Cross-Market Product-Related Question Answering

by Negin Ghasemi, Mohammad Aliannejadi, Hamed Bonab, Evangelos Kanoulas, Arjen de Vries, James Allan, and Djoerd Hiemstra

Online shops such as Amazon, eBay, and Etsy continue to expand their presence in multiple countries, creating new resource-scarce marketplaces with thousands of items. We consider a marketplace to be resource-scarce when only limited user-generated data is available about the products (e.g., ratings, reviews, and product-related questions). In such a marketplace, an information retrieval system is less likely to help users find answers to their questions about the products. As a result, questions posted online may go unanswered for extended periods. This study investigates the impact of using available data in a resource-rich marketplace to answer new questions in a resource-scarce marketplace, a new problem we call cross-market question answering. To study this problem’s potential impact, we collect and annotate a new dataset, XMarket-QA, from Amazon’s UK (resource-scarce) and US (resource-rich) local marketplaces. We conduct a data analysis to understand the scope of the cross-market question-answering task. This analysis shows a temporal gap of almost one year between the first question answered in the UK marketplace and the US marketplace. Also, it shows that the first question about a product is posted in the UK marketplace only when 28 questions, on average, have already been answered about the same product in the US marketplace. Human annotations demonstrate that, on average, 65% of the questions in the UK marketplace can be answered within the US marketplace, supporting the concept of cross-market question answering. Inspired by these findings, we develop a new method, CMJim, which utilizes product similarities across marketplaces in the training phase for retrieving answers from the resource-rich marketplace that can be used to answer a question in the resource-scarce marketplace. Our evaluations show CMJim’s significant improvement compared to competitive baselines.

To be presented at the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2023) on July 23-27 in Taipei, Taiwan.

[download pdf]

#OSSYM2023 at CERN

The Open Search Symposium #OSSYM2023 brings together the Open Internet Search community in Europe for the fifth time this year. The interactive conference provides a forum to discuss and further develop the ideas and concepts of open internet search. Participants include researchers, data centres, libraries, policy makers, legal and ethical experts, and society.

#OSSYM2023 takes place at CERN, Geneva, Switzerland on 4-6 October 2023. The Call for Papers ends 31 May 2023

More info at: https://opensearchfoundation.org/5th-international-open-search-symposium-ossym2023/

Open Web Search project kicked off

Today, we kick-off our new EU project OpenWebSearch.eu. In the project, we develop a new architecture for search engines where many parts of the system will be decentralized. The key idea is to separate index construction from the search engines themselves, where the most expensive step to create index shards can be carried out on large clusters while the search engine itself can be operated locally.

We also envision an Open-Web-Search Engine Hub, where companies and individuals can share their specifications of search engines and pre-computed, regularly updated search indices. We think of this as a search engine mash-up, that would enable a new future of human-centric search without privacy concerns.

More information at: https://openwebsearch.eu/partners/radboud-university/

Open Search Symposium 2022

10-12 October 202 at CERN

The Open Search Symposium series (#OSSYM) provides a forum to discuss and advance the ideas and concepts of Open Internet search in Europe. This year’s #OSSYM2022 takes place at CERN and online from 10-12 October 2022. The programme is great with for instance on Monday a keynote from Tomáš “Word2Vec” Mikolov, on Tuesday a track with alternative search engines including Raphael Auphan (the CEO of Qwant), Isabel Claus (founder of the B-to-B engine thinkers.ai), and Joseph Cullhead (alexandria.org, a Swedish nonprofit organization with a low budget search engine). Wednesday has a panel discussion about the ethics of search.

[Register now via CERN]