Expert group formation using facility location analysis

by Mahmood Neshati, Hamid Beigy, and Djoerd Hiemstra

In this paper, we propose an optimization framework to retrieve an optimal group of experts to perform a multi-aspect task. Since a diverse set of skills is needed to perform a multi-aspect task, the group of assigned experts should be able to collectively cover all the required skills. We consider three types of multi-aspect expert group formation problems and propose a unified framework to solve them accurately and efficiently. The first problem is concerned with finding the top k experts for a given task when the required skills of the task are only implicitly described. In the second problem, the required skills of the tasks are explicitly described using keywords, but each expert has a limited capacity and can therefore be assigned to only a limited number of tasks. Finally, the third problem is the combination of the first and the second problems. Our proposed optimization framework is based on Facility Location Analysis, a well-known branch of Operations Research. In our experiments, we compare the accuracy and efficiency of the proposed framework with state-of-the-art approaches to group formation. The experimental results show the effectiveness of our proposed methods in comparison with these approaches.
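To illustrate why covering skills collectively differs from picking the individually best experts, here is a minimal sketch of a facility-location-style objective with a greedy selection heuristic. It is not the optimization method of the paper: the greedy strategy, the function names, and the expert names and scores are all assumptions made up for this example.

```python
from typing import Dict, List

def group_score(group: List[str],
                scores: Dict[str, Dict[str, float]],
                skills: List[str]) -> float:
    """Facility-location objective: each skill is served by the best
    expert in the group, and the per-skill values are summed."""
    if not group:
        return 0.0
    return sum(max(scores[e].get(s, 0.0) for e in group) for s in skills)

def greedy_group(scores: Dict[str, Dict[str, float]],
                 skills: List[str],
                 k: int) -> List[str]:
    """Greedy heuristic (an assumption, not the paper's solver):
    repeatedly add the expert that increases the objective the most."""
    group: List[str] = []
    candidates = sorted(scores)  # sorted for deterministic tie-breaking
    for _ in range(min(k, len(candidates))):
        best = max((e for e in candidates if e not in group),
                   key=lambda e: group_score(group + [e], scores, skills))
        group.append(best)
    return group

# Hypothetical expertise scores per expert and skill.
scores = {
    "alice": {"databases": 0.9, "retrieval": 0.8},
    "bob": {"databases": 0.8, "retrieval": 0.9},
    "carol": {"networks": 0.7},
}
skills = ["databases", "retrieval", "networks"]
print(greedy_group(scores, skills, 2))
```

In this toy data, alice and bob are the two strongest individuals, but they overlap almost completely, so the second pick goes to carol, who covers the skill nobody else has.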

Published in Information Processing & Management 50(2), March 2014, Pages 361–383

[download pdf]

University of Twente at the TREC 2008 Enterprise Track

Using the Global Web as an expertise evidence source

by Pavel Serdyukov, Robin Aly, Djoerd Hiemstra

This is the fourth (and the last) year of the TREC Enterprise Track and the second year the University of Twente submitted runs for the expert finding task. In the methods that were used to produce these runs, we mostly rely on the predictive potential of those expertise evidence sources that are publicly available on the Global Web, but not hosted at the website of the organization under study (CSIRO). This paper describes the follow-up studies complementary to our recent research that demonstrated how taking the web factor seriously significantly improves the performance of expert finding in the enterprise.

The paper will be presented at the 17th Text Retrieval Conference (TREC), November 19-21, at the United States National Institute of Standards and Technology in Gaithersburg, USA.

[download draft paper] [More info]

Multi-step Relevance Propagation for Expert Finding

by Pavel Serdyukov, Henning Rode, and Djoerd Hiemstra

Figure: A fragment of the real expertise graph with links between documents (white nodes) and candidate experts (black nodes) for query 'sustainable ecosystems'.

An expert finding system allows a user to type a simple text query and retrieve names and contact information of individuals that possess the expertise expressed in the query. This paper proposes a novel approach to expert finding in large enterprises or intranets by modeling candidate experts (persons), web documents and various relations among them with so-called expertise graphs. As distinct from the state-of-the-art approaches estimating personal expertise through one-step propagation of relevance probability from documents to the related candidates, our methods are based on the principle of multi-step relevance propagation in topic-specific expertise graphs. We model the process of expert finding by probabilistic random walks of three kinds: finite, infinite and absorbing. Experiments on TREC Enterprise Track data originating from two large organizations show that our methods using multi-step relevance propagation improve over the baseline one-step propagation based method in almost all cases.
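The difference between one-step and multi-step propagation can be sketched with a finite random walk on a small bipartite graph of documents and candidates. This is only an illustration under stated assumptions: transitions are uniform over links, the walk starts from the document relevance probabilities, and the document names, candidate names and scores are invented; the paper's actual transition probabilities may differ.

```python
from collections import defaultdict
from typing import Dict, List

def step(mass: Dict[str, float], links: Dict[str, List[str]]) -> Dict[str, float]:
    """Move probability mass one step, splitting it uniformly over the
    outgoing links of each node. Mass on nodes without links is dropped."""
    nxt: Dict[str, float] = defaultdict(float)
    for node, p in mass.items():
        neighbours = links.get(node, [])
        for n in neighbours:
            nxt[n] += p / len(neighbours)
    return dict(nxt)

def finite_walk(doc_relevance: Dict[str, float],
                doc_to_cand: Dict[str, List[str]],
                steps: int) -> Dict[str, float]:
    """Propagate relevance for an odd number of steps so that the walk
    ends on candidates. steps=1 is the one-step baseline."""
    if steps % 2 == 0:
        raise ValueError("use an odd number of steps to end on candidates")
    cand_to_doc: Dict[str, List[str]] = defaultdict(list)
    for doc, cands in doc_to_cand.items():
        for cand in cands:
            cand_to_doc[cand].append(doc)
    mass = dict(doc_relevance)
    for i in range(steps):
        # Even-numbered steps go document -> candidate, odd ones go back.
        mass = step(mass, doc_to_cand if i % 2 == 0 else cand_to_doc)
    return mass

# Hypothetical retrieval scores and document-candidate links.
doc_relevance = {"d1": 0.5, "d2": 0.3, "d3": 0.2}
doc_to_cand = {"d1": ["ann"], "d2": ["ann", "ben"], "d3": ["ben"]}

one_step = finite_walk(doc_relevance, doc_to_cand, 1)
three_steps = finite_walk(doc_relevance, doc_to_cand, 3)
print(one_step, three_steps)
```

With three steps, part of ann's score flows back through the shared document d2 and reaches ben, which is the kind of indirect evidence a one-step method cannot use.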

The paper will be presented at the ACM Conference on Information and Knowledge Management CIKM 2008 in Napa Valley, USA

[download pdf]

Being Omnipresent to be Almighty

The Importance of the Global Web Evidence for Organizational Expert Finding

by Pavel Serdyukov and Djoerd Hiemstra

Modern expert finding algorithms are developed under the assumption that all possible expertise evidence for a person is concentrated in the company that currently employs the person. Evidence that can be acquired outside of the enterprise is traditionally overlooked. At the same time, the Web is full of personal information that is sufficiently detailed to judge a person's skills and knowledge. In this work, we review various sources of expertise evidence outside of an organization and experiment with rankings built on the data acquired from six different sources, accessible through APIs of two major web search engines. We show that these rankings and their combinations are often more realistic and of higher quality than rankings built on organizational data only.

The paper will be presented at the Future Challenges in Expertise Retrieval fCHER workshop in Singapore

[download pdf]

Modeling documents as mixtures of persons

by Pavel Serdyukov and Djoerd Hiemstra

In this paper we address the problem of searching for knowledgeable persons within the enterprise, known as the expert finding (or expert search) task. We present a probabilistic algorithm using the assumption that terms in documents are produced by the people who are mentioned in them. We represent documents retrieved for a query as mixtures of candidate experts' language models. Two methods for extracting personal language models are proposed, as well as a way of combining them with other evidence of expertise. Experiments conducted with the TREC Enterprise collection demonstrate the superiority of our approach in comparison with the best among the existing solutions.

[download pdf]

Relevance propagation for expert search

by Pavel Serdyukov, Henning Rode, and Djoerd Hiemstra

This paper describes several approaches which we used for the expert search task of the TREC 2007 Enterprise track. We studied several methods of relevance propagation from documents to related candidate experts. Instead of the one-step propagation from documents to directly related candidates that many systems used in previous years, we do not limit the relevance flow and disseminate it further through mutual document-candidate connections. We model relevance propagation using random walk principles or, in formal terms, discrete Markov processes. We experiment with infinite and finite numbers of propagation steps. We also demonstrate how additional information, namely hyperlinks among documents, the organizational structure of the enterprise and relevance feedback, may be utilized by the presented techniques.
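As a rough illustration of propagation with an unlimited number of steps, the sketch below iterates a Markov chain over document-candidate links until the scores stop changing. To make the walk converge it jumps back to the initial document distribution with a fixed probability; that restart mechanism, its value of 0.15, the uniform transitions, and all node names and scores are assumptions of this example, not details taken from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def infinite_walk(start: Dict[str, float],
                  edges: List[Tuple[str, str]],
                  restart: float = 0.15,
                  tol: float = 1e-12,
                  max_iter: int = 10000) -> Dict[str, float]:
    """Power iteration for a random walk over undirected
    document-candidate edges with restart to the start distribution."""
    neighbours: Dict[str, List[str]] = defaultdict(list)
    for doc, cand in edges:
        neighbours[doc].append(cand)
        neighbours[cand].append(doc)
    mass = dict(start)
    for _ in range(max_iter):
        nxt: Dict[str, float] = defaultdict(float)
        # With probability `restart`, jump back to the initial documents.
        for node, p in start.items():
            nxt[node] += restart * p
        # Otherwise follow a uniformly chosen link.
        for node, p in mass.items():
            for n in neighbours[node]:
                nxt[n] += (1 - restart) * p / len(neighbours[node])
        change = sum(abs(nxt[n] - mass.get(n, 0.0)) for n in nxt)
        mass = dict(nxt)
        if change < tol:
            break
    return mass

# Hypothetical retrieval scores and document-candidate links.
start = {"d1": 0.5, "d2": 0.3, "d3": 0.2}
edges = [("d1", "ann"), ("d2", "ann"), ("d2", "ben"), ("d3", "ben")]

scores = infinite_walk(start, edges)
ranking = sorted(["ann", "ben"], key=lambda c: scores[c], reverse=True)
print(ranking)
```

Candidates are ranked by the share of the converged probability mass they hold; documents keep the rest, so candidate scores do not sum to one on their own.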

[download pdf]

Tutorial: Advanced language modeling approaches

(Case study: Expert search)

I will give a tutorial at the 30th European Conference on Information Retrieval (ECIR). The tutorial gives a clear and detailed overview of advanced language modeling approaches and tools, including the use of document priors, translation models, relevance models, parsimonious models and expectation maximization training. Expert search will be used as a case study to explain the consequences of modeling assumptions.

[download pdf]

See the ECIR tutorials and workshops page

Entity Ranking on Graphs: Studies on Expert Finding

by Henning Rode, Pavel Serdyukov, Djoerd Hiemstra, and Hugo Zaragoza

Today's web search engines try to offer services for finding various kinds of information in addition to simple web pages, like showing locations or answering simple fact queries. Understanding the association of named entities and documents is one of the key steps towards such semantic search tasks. This paper addresses the ranking of entities and models it in a graph-based relevance propagation framework. In particular, we study the problem of expert finding as an example of an entity ranking task. Entity containment graphs are introduced that represent the relationship between text fragments on the one hand and their contained entities on the other hand. The paper shows how these graphs can be used to propagate relevance information from the pre-ranked text fragments to their entities. We use this propagation framework to model existing approaches to expert finding based on the entity's indegree and extend them by recursive relevance propagation based on a probabilistic random walk over the entity containment graphs. Experiments on the TREC expert search task compare the retrieval performance of the different graph and propagation models.

[download pdf]

Who is the center of the SIGIR universe?

Last year, in Seattle, Jon Kleinberg gave a keynote speech about social networks, incentives and search. In social networks, it is not about what you do, how much you do, or where you're from; it is about who you know. To celebrate SIGIR's 30th anniversary, we analysed all SIGIR proceedings and built a social network in which SIGIR authors are the nodes and an edge connects two authors if they co-authored a SIGIR paper. The most central author in the network is the one with the shortest average distance to all other authors, where the distance is 1 if two authors wrote a SIGIR paper together, 2 if the first author wrote a paper with someone who wrote a paper with the second author, and so on. It turns out that the center of the SIGIR universe is Wensi Xi from Google. Congrats Wensi! More info on: Search demo (See the Oracle of Xi)
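For the curious, the centrality measure described above can be sketched in a few lines: build the co-authorship graph and use breadth-first search to get the average distance from each author to everyone else. The paper list and author names below are made up, and the sketch assumes the network is connected (it averages only over authors that can be reached).

```python
from collections import defaultdict, deque
from typing import Dict, List, Set

def build_graph(papers: List[List[str]]) -> Dict[str, Set[str]]:
    """Authors are nodes; two authors are linked if they share a paper."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for authors in papers:
        for a in authors:
            graph[a]  # make sure single-author papers still add a node
            for b in authors:
                if a != b:
                    graph[a].add(b)
    return graph

def average_distance(graph: Dict[str, Set[str]], source: str) -> float:
    """Average shortest-path length from source to all reachable authors."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    others = [d for n, d in dist.items() if n != source]
    return sum(others) / len(others) if others else float("inf")

def most_central(graph: Dict[str, Set[str]]) -> str:
    """The author with the smallest average distance to the others."""
    return min(sorted(graph), key=lambda a: average_distance(graph, a))

# Hypothetical author lists, one per paper.
papers = [["A", "B"], ["B", "C"], ["C", "D"], ["B", "E"]]
graph = build_graph(papers)
print(most_central(graph), average_distance(graph, "B"))
```

In this toy network, B co-authored with three of the four other authors and is two steps away from the last one, which makes B the center.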