Challenges of index exchange for search engine interoperability

by Djoerd Hiemstra, Gijs Hendriksen, Chris Kamphuis, and Arjen de Vries

We discuss tokenization challenges that arise when sharing inverted file indexes to support interoperability between search engines. In particular: how should queries be tokenized so that the query tokens are consistent with the tokens in the shared index? We discuss various solutions and present preliminary experimental results that show when the problem occurs and how it can be mitigated by standardizing on a simple, generic tokenizer for all shared indexes.
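
As a rough illustration of the mismatch described above, the following minimal sketch builds a toy inverted index with one tokenizer and queries it with another. The two tokenizers, the toy documents, and the function names are assumptions made for this example. They are not taken from the paper and do not represent the authors' experimental setup.

```python
import re


def index_tokenizer(text):
    # Hypothetical "simple, generic" tokenizer used to build the shared index:
    # lowercase, then keep maximal runs of ASCII letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def engine_tokenizer(text):
    # Hypothetical native tokenizer of the engine that imports the index:
    # lowercase and split on whitespace only, so hyphenated words stay whole.
    return text.lower().split()


def build_index(docs, tokenize):
    # Map each token to the set of document ids that contain it.
    index = {}
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index.setdefault(token, set()).add(doc_id)
    return index


def search(index, query, tokenize):
    # Conjunctive (AND) retrieval: a document must contain every query token.
    postings = [index.get(token, set()) for token in tokenize(query)]
    return set.intersection(*postings) if postings else set()


docs = {
    1: "State-of-the-art search engines",
    2: "Open search at CERN",
}
shared_index = build_index(docs, index_tokenizer)

# The importing engine tokenizes the query its own way and finds nothing,
# because the token "state-of-the-art" never occurs in the shared index.
mismatched = search(shared_index, "state-of-the-art", engine_tokenizer)

# Using the same tokenizer for queries as for the index restores the match.
consistent = search(shared_index, "state-of-the-art", index_tokenizer)

print(mismatched, consistent)
```

In this toy setting the mismatch disappears once both sides use the same tokenizer, which is the intuition behind standardizing on one generic tokenizer for all shared indexes.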

To be presented at the 5th International Open Search Symposium #OSSYM2023 at CERN, Geneva, Switzerland, on 4-6 October 2023.
