Optimizing Ranking Systems Online as Bandits
by Chang Li
People use interactive systems, such as search engines, as the main tool to obtain information. To satisfy the information needs, such systems usually provide a list of items that are selected out of a large candidate set and then sorted in the decreasing order of their usefulness. The result lists are generated by a ranking algorithm, called ranker, which takes the request of user and candidate items as the input and decides the order of candidate items. The quality of these systems depends on the underlying rankers.
There are two main approaches to optimize the ranker in an interactive system: using data annotated by humans or using the interactive user feedback. The first approach has been widely studied in history, also called offline learning to rank, and is the industry standard. However, the annotated data may not well represent information needs of users and are not timely. Thus, the first approaches may lead to suboptimal rankers. The second approach optimizes rankers by using interactive feedback. This thesis considers the second approach, learning from the interactive feedback. The reasons are two-fold:
- Everyday, millions of users interact with the interactive systems and generate a huge number of interactions, from which we can extract the information needs of users.
- Learning from the interactive data have more potentials to assist in designing the online algorithms.
Specifically, this thesis considers the task of learning from the user click feedback. The main contribution of this thesis is proposing a safe online learning to re-rank algorithm, named BubbleRank, which addresses one main disadvantage of online learning, i.e., the safety issue, by combining the advantages of both offline and online learning to rank algorithms. The thesis also proposes three other online algorithms, each of which solves unique online ranker optimization problems. All the proposed algorithms are theoretically sound and empirically effective.