by Gijs Hendriksen, Djoerd Hiemstra, and Arjen de Vries
Selective search assumes that a document collection can be partitioned into topical index shards in such a way that individual search requests can be satisfied using only a few shards. Previous work has primarily considered the retrieval effectiveness of selective search architectures in an early-precision setting. In this work, we instead consider selective search as the first stage in a multi-stage pipeline, and therefore focus on obtaining high recall. We reproduce the most important algorithms from the selective search literature, and show that they can match the recall level of exhaustive search while reducing the required resources by 50%. We compare the different types of resource selection algorithms, and conclude that the more straightforward strategies, which select shards at a low cost, actually outperform the more involved algorithms in terms of reliably obtaining high recall with fewer shards.
To be presented at the 16th Conference and Labs of the Evaluation Forum (CLEF), in September 2025 in Madrid.