We built a small tool for running information retrieval experiments in DuckDB and appropriately named it Zoekeend (Dutch for “search duck”). Zoekeend will be presented at DuckCon #6 in Amsterdam on 31 January 2025.
I will present several reproduced experiments, including ranking with (small) language models, importing indexes in the Common Index File Format (CIFF), and the CIFF tokenizer based on the tokenizers of large language models, all elegantly defined as SQL queries. I will also present ongoing work on new types of indexes for search engines, such as the score-fitted index, the constant-length index, and the term-grouped index. All of these would be extremely cumbersome to implement in existing search engines like Lucene, but can be defined easily as SQL queries in DuckDB. Zoekeend is designed to greatly simplify information retrieval experimentation. Zoekeend is open source and available from: https://gitlab.science.ru.nl/informagus/zoekeend/
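
As a rough illustration of what "ranking defined as a SQL query" can look like, here is a minimal sketch of BM25 scoring over a toy inverted index. It is not Zoekeend's actual code or schema: the table names (docs, postings, qterms), the helper functions, and the parameter values k1 = 1.2 and b = 0.75 are all assumptions made for this example. To keep it runnable without extra installs, it uses SQLite from the Python standard library rather than DuckDB; the query itself uses only common SQL constructs.

```python
import math
import sqlite3


def build_index(conn, documents):
    """Build a toy inverted index from {docid: text}. Hypothetical schema, not Zoekeend's."""
    # Register ln() explicitly, because not every SQLite build ships math functions.
    conn.create_function("ln", 1, math.log)
    conn.executescript("""
        CREATE TABLE docs (docid INTEGER PRIMARY KEY, len INTEGER);
        CREATE TABLE postings (term TEXT, docid INTEGER, tf INTEGER);
    """)
    for docid, text in documents.items():
        tokens = text.lower().split()  # naive whitespace tokenizer
        conn.execute("INSERT INTO docs VALUES (?, ?)", (docid, len(tokens)))
        for term in set(tokens):
            conn.execute("INSERT INTO postings VALUES (?, ?, ?)",
                         (term, docid, tokens.count(term)))


# BM25 as one SQL query: collection statistics and document frequencies
# are computed in CTEs, then joined against the postings of the query terms.
BM25_SQL = """
WITH params AS (SELECT 1.2 AS k1, 0.75 AS b),
     stats  AS (SELECT COUNT(*) AS n, AVG(len) AS avgdl FROM docs),
     dict   AS (SELECT term, COUNT(*) AS df FROM postings GROUP BY term)
SELECT p.docid,
       SUM(ln(1.0 + (s.n - dict.df + 0.5) / (dict.df + 0.5))
           * p.tf * (pr.k1 + 1.0)
           / (p.tf + pr.k1 * (1.0 - pr.b + pr.b * d.len / s.avgdl))) AS score
FROM postings p
JOIN qterms q ON q.term = p.term
JOIN dict     ON dict.term = p.term
JOIN docs d   ON d.docid = p.docid
CROSS JOIN stats s
CROSS JOIN params pr
GROUP BY p.docid
ORDER BY score DESC, p.docid
"""


def search(conn, query):
    """Return (docid, score) pairs for a whitespace-separated query, best first."""
    conn.execute("DROP TABLE IF EXISTS qterms")
    conn.execute("CREATE TEMP TABLE qterms (term TEXT)")
    conn.executemany("INSERT INTO qterms VALUES (?)",
                     [(t,) for t in set(query.lower().split())])
    return conn.execute(BM25_SQL).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_index(conn, {
        1: "the quick duck searches the pond",
        2: "a duck is a duck",
        3: "search engines rank documents",
    })
    for docid, score in search(conn, "duck"):
        print(docid, round(score, 3))
```

Changing the ranking function in this sketch means editing only the SELECT expression, which is the kind of flexibility the abstract refers to; alternative index layouts would similarly be expressed as different table definitions.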