Fausto de Lang graduates on tokenization for information retrieval
An empirical study of the effect of vocabulary size for various tokenization strategies in passage retrieval performance.
by Fausto de Lang
Many interactions between the fields of lexical retrieval and large language models remain underexplored. In particular, there is little research into the use of advanced language model tokenizers in combination with classical …