Improving biomedical information retrieval with pseudo and explicit relevance feedback
by Bas Niesink
The HERO project aims to increase the quality of supervised exercise during cancer treatment by making use of a clinical decision support system. In this research, concept-based information retrieval techniques to find relevant medical publications for such a system were developed and tested. These techniques were designed to search multiple document collections, without the need to store copies of the collections.
The influence of pseudo and explicit relevance feedback using the Rocchio algorithm were explored. The underlying retrieval models that were tested are TFIDF and BM25.
The tests were conducted using the TREC Clinical Decision Support datasets for the 2014 and 2015 editions. The TREC CDS relevance judgements were used to simulate explicit feedback. The NLM Medical Text Indexer was used to extract MeSH terms from the TREC CDS topics, to be able to conduct concept-based queries. Furthermore, the difference in performance when using inverse document frequencies calculated on the entire PMC dataset, and on a collection of several thousand intermediate search results were measured.
The results show that both pseudo and explicit relevance feedback have a strong positive influence on the inferred NDCG. Additionally, the performance difference when using IDF values calculated on a very small document collection is limited.