Pavel Serdyukov defends PhD thesis on Expert Search

by Pavel Serdyukov

The automatic search for knowledgeable people within an organization is a key function that makes modern enterprise search systems commercially successful and socially in demand. A number of effective approaches to expert finding have recently been proposed in academic publications. Although most of them use reasonably defined measures of personal expertise, they often rely on rather unrealistic and sometimes oversimplified principles. In this thesis, we explore several ways to go beyond the assumptions used in state-of-the-art research on expert finding and propose several novel solutions for this and related tasks. First, we describe measures of expertise that do not assume independent occurrence of terms and persons in a document, which makes them perform better than measures based on the independence of all entities in a document. One of these measures makes persons central to the process of term generation in a document. Another assumes that the position of a person’s mention in a document, relative to the positions of query terms, indicates the relation of the person to the document’s relevant content. Second, we find ways to go beyond direct expertise evidence, that is, evidence concentrated within the document space of the person’s current employer and only within those organizational documents that mention the person. We successfully utilize the predictive potential of additional indirect expertise evidence that is publicly available on the Web and in organizational documents implicitly related to a person. Finally, besides the expert finding methods we propose, we also demonstrate solutions for tasks from related domains. In one case, we use several algorithms of multi-step relevance propagation to search for typed entities in Wikipedia. In another case, we suggest generic methods for placing photos uploaded to Flickr on the world map using language models of locations built entirely on the annotations provided by users, with a few task-specific extensions.
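
To make the position-based idea from the abstract more concrete, here is a minimal sketch in Python. It is not the measure defined in the thesis: the function name `proximity_score`, the `1 / (1 + distance)` decay, the `window` cutoff, and the treatment of a person as a single token are all assumptions chosen for illustration. The only point it shows is that a person mentioned close to the query terms receives more credit than one mentioned far away.

```python
def proximity_score(tokens, query_terms, person, window=10):
    """Toy proximity-based expertise score (illustrative assumption only).

    Sums a distance-decayed weight over every pair of
    (person mention, query term occurrence) in one tokenized document.
    """
    lowered = [t.lower() for t in tokens]
    wanted = {t.lower() for t in query_terms}
    name = person.lower()

    # Token positions of the person's mentions and of the query terms.
    person_positions = [i for i, t in enumerate(lowered) if t == name]
    term_positions = [i for i, t in enumerate(lowered) if t in wanted]

    score = 0.0
    for p in person_positions:
        for q in term_positions:
            distance = abs(p - q)
            # Hypothetical decay: closer pairs contribute more,
            # pairs farther apart than `window` tokens contribute nothing.
            if distance <= window:
                score += 1.0 / (1 + distance)
    return score
```

Called on a tokenized document with the query terms and a candidate's name, the function returns a non-negative number; summing it over the documents retrieved for a query would give one possible ranking of candidates. A person who is never mentioned in the document scores zero.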

[download pdf]