Analyzing the Diversity and Performance of BERT in Unified Mobile Search
by Negin Ghasemi, Mohammad Aliannejadi, and Djoerd Hiemstra
A unified mobile search framework aims to identify the mobile apps that can satisfy a user’s information need and route the user’s query to them. Previous work has shown that resource descriptions for mobile apps are sparse as they rely on the app’s previous queries. This problem puts certain apps in dominance and pushes resource-scarce apps out of the top ranks. In this case, we need a ranker that goes beyond simple lexical matching. Therefore, our goal is to study the extent of a BERT-based ranker’s ability to improve the quality and diversity of app selection. To this end, we compare the results of the BERT-based ranker with other information retrieval models, focusing on how diverse the selected apps are. Our analysis shows that the BERT-based ranker selects more diverse apps while improving the quality of baseline results by selecting relevant apps such as Facebook and Contacts for more personal queries and decreasing the bias towards dominant resources such as the Google Search app.
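The abstract does not say which diversity measures the paper uses. As a minimal sketch of what "more diverse app selection" could mean in numbers, the Python below compares two rankers by the entropy of their top-1 app selections and by the share of the most-selected app. The function names and the example selections are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def selection_entropy(selected_apps):
    """Shannon entropy (in bits) of the distribution of selected apps.

    Higher values mean the selections are spread over more apps.
    """
    counts = Counter(selected_apps)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def dominant_share(selected_apps):
    """Fraction of queries that were routed to the single most-selected app."""
    counts = Counter(selected_apps)
    return max(counts.values()) / len(selected_apps)


# Hypothetical top-1 selections for the same eight queries (illustration only).
lexical = ["Google Search"] * 6 + ["Facebook", "Contacts"]
bert_like = (["Google Search"] * 2 + ["Facebook"] * 2
             + ["Contacts"] * 2 + ["Maps", "Mail"])

for name, run in (("lexical", lexical), ("bert_like", bert_like)):
    print(name, round(selection_entropy(run), 3), dominant_share(run))
```

A lower dominant share together with a higher entropy would point to less bias towards a single resource. Neither number says anything about relevance, so they only make sense next to a quality metric.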
Slow, content-based, federated, explainable, and fair
Access to information on the world wide web is dominated by monopolists (Google and Facebook) that decide most of the information we see. Their business models are based on “surveillance capitalism”, that is, profiting from getting to know as much as possible about the individuals who use the platforms. The information about individuals is used to maximize their engagement, thereby maximizing the number of targeted advertisements shown to these individuals. Google’s and Facebook’s financial success has influenced many other online businesses as well as a substantial part of the academic research agenda in machine learning and information retrieval, which increasingly focuses on training on huge datasets, literally building on the success of Google and Facebook by using their pre-trained models (e.g. BERT and ELMo). Large pre-trained models and algorithms that maximize engagement come with many societal problems: They have been shown to discriminate against minority groups, to manipulate elections, to radicalize users, and even to enable genocide. Looking forward to 2021-2027, we aim to research the following technical alternatives that do not exhibit these problems: 1) slow, content-based learning that maximizes user satisfaction instead of fast, click-based learning that maximizes user engagement; 2) federated information access and search instead of centralized access and search; 3) explainable, fair approaches instead of black-box, biased approaches.
by Djoerd Hiemstra
Query autocompletions help users of search engines to speed up their searches by recommending completions of partially typed queries in a drop-down box. These recommended query autocompletions are usually based on large logs of queries that were previously entered by the search engine’s users. Therefore, misinformation entered (either accidentally or purposely to manipulate the search engine) might end up in the search engine’s recommendations, potentially harming organizations, individuals, and groups of people. This paper proposes an alternative approach for generating query autocompletions by extracting anchor texts from a large web crawl, without the need to use query logs. Our evaluation shows that even though query log autocompletions perform better for shorter queries, anchor text autocompletions outperform query log autocompletions for queries of 2 words or more.
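To make the idea concrete, here is a minimal sketch of prefix-based autocompletion over anchor texts. It assumes the anchor texts have already been extracted from a crawl and that how often an anchor text occurs is a usable popularity signal. The function names, the normalisation and the ranking rule are illustrative assumptions, not the method evaluated in the paper.

```python
from collections import Counter


def build_completion_index(anchor_texts):
    """Count normalised anchor texts (lowercased, whitespace collapsed)."""
    return Counter(
        " ".join(text.lower().split()) for text in anchor_texts if text.strip()
    )


def autocomplete(index, prefix, k=5):
    """Return up to k anchor texts starting with prefix, most frequent first."""
    prefix = prefix.lower().lstrip()
    matches = [(text, n) for text, n in index.items() if text.startswith(prefix)]
    # Sort by descending frequency; break ties alphabetically for stable output.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [text for text, _ in matches[:k]]


# Hypothetical anchor texts, as if taken from <a> elements in a web crawl.
anchors = [
    "Radboud University",
    "radboud  university",
    "Radboud Recharge",
    "Open Search Foundation",
]
index = build_completion_index(anchors)
print(autocomplete(index, "radb"))
```

The linear scan over all anchor texts keeps the sketch short. At crawl scale, a trie or a sorted array with binary search would be the more usual choice.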
To be presented at the 2nd International Symposium on Open Search Technology (OSSYM 2020), 12-14 October 2020, CERN, Geneva, Switzerland.
Welcome to the Data Science group, Negin Ghasemi! Negin will work on Transfer Learning for Federated Search.
We are looking for a PhD candidate to join the Data Science group at Radboud University for an exciting new project on transfer learning for language modelling with an application to federated search. Transfer learning learns general-purpose language models from huge datasets, such as web crawls, and then trains the models further on smaller datasets for a specific task. Transfer learning in NLP has successfully used pre-trained word embeddings for several tasks. Although the success of word embeddings on search tasks has been limited, pre-trained general-purpose language representations such as BERT and ELMo have recently been successful on several search tasks, including question answering tasks and conversational search tasks. Resource descriptions in federated search consist of samples of the full data that are sparser than full resource representations. This raises the question of how to infer vocabulary that is missing from the sampled data. A promising approach comes from transfer learning from pre-trained language representations. An open question is how to effectively and efficiently apply those pre-trained representations and how to adapt them to the domain of federated search. In this project, you will use pre-trained language models, and further train those models for a (federated) search task. You will evaluate the quality of those models as part of international evaluation conferences like the Text Retrieval Conference (TREC) and the Conference and Labs of the Evaluation Forum (CLEF).
If you search for something on the internet, you use Google. The fact that this allows the company to learn quite a bit about you is bothering more and more people. Can it be done differently?
Read more on Radboud Recharge.
Recommending Users: Whom to Follow on Federated Social Networks
by Jan Trienes, Andrés Torres Cano, and Djoerd Hiemstra
To foster an active and engaged community, social networks employ recommendation algorithms that filter large amounts of content and provide a user with personalized views of the network. Popular social networks such as Facebook and Twitter generate follow recommendations by listing profiles a user may be interested to connect with. Federated social networks aim to resolve issues associated with the popular social networks (such as large-scale user surveillance and the misuse of user data to manipulate elections) by decentralizing authority and promoting privacy. Due to their recent emergence, recommender systems do not yet exist for federated social networks. To make these networks more attractive and promote community building, we investigate how recommendation algorithms can be applied to decentralized social networks. We present an offline and online evaluation of two recommendation strategies: a collaborative filtering recommender based on BM25 and a topology-based recommender using personalized PageRank. Our experiments on a large unbiased sample of the federated social network Mastodon show that collaborative filtering approaches outperform a topology-based approach, whereas both approaches significantly outperform a random recommender. A subsequent live user experiment on Mastodon using balanced interleaving shows that the collaborative filtering recommender performs on par with the topology-based recommender.
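For readers unfamiliar with the topology-based strategy, the sketch below shows personalized PageRank by power iteration on a small follow graph. The damping factor, the number of iterations, the handling of users who follow nobody, and the example graph are all assumptions made for illustration; the abstract does not give the configuration used in the paper.

```python
def personalized_pagerank(follows, source, damping=0.85, iterations=50):
    """Personalized PageRank scores with all teleport mass going to `source`.

    follows: dict mapping a user to the list of users they follow.
    """
    nodes = {source} | set(follows) | {v for out in follows.values() for v in out}
    scores = {n: 0.0 for n in nodes}
    scores[source] = 1.0
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        for user in nodes:
            out = follows.get(user, ())
            if out:
                share = damping * scores[user] / len(out)
                for followed in out:
                    nxt[followed] += share
            else:
                # Assumption: users who follow nobody hand their mass back to source.
                nxt[source] += damping * scores[user]
        # Teleport step: the scores sum to 1, so this adds the remaining mass.
        nxt[source] += 1.0 - damping
        scores = nxt
    return scores


def recommend(follows, source, k=3):
    """Rank users that `source` does not follow yet by their PageRank score."""
    scores = personalized_pagerank(follows, source)
    known = set(follows.get(source, ())) | {source}
    candidates = [n for n in scores if n not in known]
    return sorted(candidates, key=lambda n: (-scores[n], n))[:k]


# Hypothetical follow graph (who follows whom).
follows = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": [],
    "erin": ["alice"],
}
print(recommend(follows, "alice"))
```

Because every random walk restarts at the source user, accounts that are reachable through several of the user's existing follows end up ranked highest.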
This paper will be presented at the 17th Dutch-Belgian Information Retrieval workshop in Leiden on 23 November 2018.
After about a year of running the UT search engine, there is now a new search engine that uses Searsia. The search engine, called Dr. Sheet Music, is a federated search engine for sheet music. Give it a try at http://drsheetmusic.com.
by Dong Nguyen, Thomas Demeester, Dolf Trieschnigg, and Djoerd Hiemstra
A publicly available dataset for federated search reflecting a real web environment has long been absent, making it difficult for researchers to test the validity of their federated search algorithms for the web setting. We present several experiments and analyses on resource selection on the web using a recently released test collection containing the results from more than a hundred real search engines, ranging from large general web search engines such as Google, Bing and Yahoo to small domain-specific engines.
First, we experiment with estimating the size of uncooperative search engines on the web using query-based sampling and propose a new method using the ClueWeb09 dataset. We find the size estimates to be highly effective in resource selection. Second, we show that an optimized federated search system based on smaller web search engines can be an alternative to a system using large web search engines. Third, we provide an empirical comparison of several popular resource selection methods and find that these methods are not readily suitable for resource selection on the web. Challenges include the sparse resource descriptions and extremely skewed sizes of collections.
Predicting relevance based on assessor disagreement: analysis and practical applications for search evaluation
by Thomas Demeester, Robin Aly, Djoerd Hiemstra, Dong Nguyen, and Chris Develder
Evaluation of search engines relies on assessments of search results for selected test queries, from which we would ideally like to draw conclusions in terms of relevance of the results for general (e.g., future, unknown) users. In practice, however, most evaluation scenarios only allow us to conclusively determine the relevance towards the particular assessor that provided the judgments. A factor that cannot be ignored when extending conclusions made from assessors towards users is the possible disagreement on relevance, assuming that a single gold truth label does not exist. This paper presents and analyzes the predicted relevance model (PRM), which allows predicting a particular result’s relevance for a random user, based on an observed assessment and knowledge of the average disagreement between assessors. With the PRM, existing evaluation metrics designed to measure binary assessor relevance can be transformed into more robust and effectively graded measures that evaluate relevance towards a random user. It also leads to a principled way of quantifying multiple graded or categorical relevance levels for use as gains in established graded relevance measures, such as normalized discounted cumulative gain, which nowadays often use heuristic and data-independent gain values. Given a set of test topics with graded relevance judgments, the PRM allows evaluating systems on different scenarios, such as their capability of retrieving top results, or how well they are able to filter out non-relevant ones. Its use in actual evaluation scenarios is illustrated on several information retrieval test collections.
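One way to read the core idea is sketched below: from results that were judged by two assessors, estimate how likely a second person is to find a result relevant given the label the first person assigned, then use those probabilities as gains in discounted cumulative gain. This is a simplified illustration under our own assumptions (label names, the double-judged pairs, and the plain frequency estimate are hypothetical); the estimator defined in the paper may differ.

```python
import math
from collections import defaultdict


def predicted_relevance(label_pairs, relevant_levels):
    """Estimate P(another assessor finds the result relevant | observed label).

    label_pairs: (label_a, label_b) tuples for results judged by two assessors.
    relevant_levels: the labels that count as "relevant" for the random user.
    """
    seen = defaultdict(int)
    relevant = defaultdict(int)
    for a, b in label_pairs:
        # Use each pair in both directions: either assessor can be the observed one.
        for observed, other in ((a, b), (b, a)):
            seen[observed] += 1
            if other in relevant_levels:
                relevant[observed] += 1
    return {label: relevant[label] / seen[label] for label in seen}


def dcg(gains):
    """Discounted cumulative gain of a ranked list of gain values."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


# Hypothetical double judgments with three levels: "non", "rel", "high".
pairs = [
    ("high", "high"), ("high", "rel"), ("rel", "rel"), ("rel", "non"),
    ("non", "non"), ("non", "non"), ("non", "rel"),
]
gain = predicted_relevance(pairs, relevant_levels={"rel", "high"})

# Score a ranking judged by a single assessor with the data-driven gains.
ranking_labels = ["high", "non", "rel"]
print(dcg([gain[label] for label in ranking_labels]))
```

In this reading, a result labelled non-relevant by one assessor still receives a small gain when assessors often disagree, which is how disagreement enters the metric instead of a heuristic gain table.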
To be published in Information Retrieval Journal by Springer.