Context Based Personalized Ranking in Academic Search
by Alexandru Serban
A common criticism of search engines is that they return the same results to every user who sends the same query, even when those users have distinct information needs. Personalized search is considered a solution because search results are re-evaluated based on user preferences or activity. Instead of relying on the unrealistic assumption that people will precisely specify their intent when searching, the user profile is exploited to re-rank the results. This thesis focuses on two problems related to academic information retrieval systems. The first part is dedicated to data sets for search engine evaluation. Test collections consist of documents; a set of information needs, also called topics; queries, which represent the data structure sent to the information retrieval tool; and relevance judgements for the top documents retrieved from the collection. Relevance judgements are difficult to gather because the process involves manual work. We propose an automatic method to generate queries from the content of a scientific article and to evaluate the relevance of the results. A test collection is generated, but its power to discriminate between relevant and non-relevant results is limited. In the second part of the thesis, the performance of Scopus is improved through personalization. We focus on the academic background of researchers who interact with Scopus, since information about their academic profile is already available. Two methods for personalized search are investigated.
First, the connections between academic entities, expressed as a graph structure, are used to evaluate how relevant a result is to the user. We use SimRank, a similarity measure for entities based on their relationships with other entities. Second, the semantic structure of documents is exploited to evaluate how meaningful a document is for the user. A topic model is trained to reflect the user’s interests in research areas and to estimate how relevant the search results are to those interests.
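To make the SimRank idea concrete, the following is a minimal sketch of the standard naive iterative formulation, in which two nodes are similar when the nodes pointing to them are similar. It is not the implementation used in the thesis: the function name `simrank`, the decay factor of 0.8, the iteration count, and the toy author-to-paper graph are all assumptions chosen for illustration.

```python
from itertools import product


def simrank(in_neighbors, decay=0.8, iterations=10):
    """Naive iterative SimRank.

    in_neighbors maps every node to the list of nodes that point to it.
    decay and iterations are illustrative defaults, not values from the thesis.
    """
    nodes = list(in_neighbors)
    # Start from the identity: a node is fully similar only to itself.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            in_a, in_b = in_neighbors[a], in_neighbors[b]
            if not in_a or not in_b:
                # A node without in-neighbours has no evidence of similarity.
                new_sim[(a, b)] = 0.0
                continue
            # Average similarity over all pairs of in-neighbours, damped by decay.
            total = sum(sim[(i, j)] for i in in_a for j in in_b)
            new_sim[(a, b)] = decay * total / (len(in_a) * len(in_b))
        sim = new_sim
    return sim


# Hypothetical toy graph: authors point to the papers they wrote.
toy_graph = {
    "author_1": [],
    "author_2": [],
    "paper_a": ["author_1", "author_2"],
    "paper_b": ["author_1"],
}
scores = simrank(toy_graph)
print(scores[("paper_a", "paper_b")])
```

In this toy graph the two papers share one of their authors, so they receive a non-zero similarity, while the two authors, which have no incoming edges, do not. The naive formulation compares every pair of nodes in every iteration, so a graph of realistic size would need a more scalable variant.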
Finally, both methods are merged with the initial Scopus ranking. The results of a user study show a consistent performance increase for the first 10 results.
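The abstract does not state how the personalized scores are merged with the initial ranking. One common option, shown here purely as an assumption, is a weighted linear combination of a position-based score with the two personalization scores. The function name `merge_rankings`, the weights, and the example scores are hypothetical.

```python
def merge_rankings(initial_order, graph_scores, topic_scores,
                   weights=(0.5, 0.25, 0.25)):
    """Re-rank documents by a weighted sum of three signals.

    initial_order: document ids in the order returned by the search engine.
    graph_scores, topic_scores: dicts of personalization scores in [0, 1];
    missing documents are treated as 0. The weights are illustrative only.
    """
    n = len(initial_order)
    w_rank, w_graph, w_topic = weights
    combined = {}
    for position, doc in enumerate(initial_order):
        # Turn the original position into a score: the top result gets 1.0.
        rank_score = 1.0 - position / n
        combined[doc] = (
            w_rank * rank_score
            + w_graph * graph_scores.get(doc, 0.0)
            + w_topic * topic_scores.get(doc, 0.0)
        )
    return sorted(initial_order, key=lambda doc: combined[doc], reverse=True)


# Hypothetical scores for three results.
initial_order = ["d1", "d2", "d3"]
graph_scores = {"d1": 0.1, "d3": 0.9}
topic_scores = {"d2": 0.2, "d3": 0.8}
reranked = merge_rankings(initial_order, graph_scores, topic_scores)
print(reranked)
```

With these made-up numbers the third result moves to the top because both personalization signals favour it strongly, while the original order of the other two is preserved.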