Fausto de Lang graduates on tokenization for information retrieval

An empirical study of the effect of vocabulary size for various tokenization strategies in passage retrieval performance. by Fausto de Lang Many interactions between the fields of lexical retrieval and large language models still remain underexplored; in particular, there is little research into the use of advanced language model tokenizers in combination with classical … Continue reading “Fausto de Lang graduates on tokenization for information retrieval”

Maurice Verbrugge graduates on the BERT Ranking Paradigm

The BERT Ranking Paradigm: Training Strategies Evaluated by Maurice Verbrugge This thesis researches the most recent paradigm in information retrieval, which applies the neural language representation model BERT to rank relevant passages out of a corpus. The research focuses on a re-ranker scheme that uses BM25 to pre-rank the corpus followed by BERT-based ranking, exploring … Continue reading “Maurice Verbrugge graduates on the BERT Ranking Paradigm”

Casper van Aarle graduates on Federated Regression Analysis

Federated Regression Analysis on Personal Data Stores: Improving the Personal Health Train by Casper van Aarle Due to regulations and increased privacy awareness, patients may be reluctant to share data with any institution. The Personal Health Train is an initiative to connect different data institutions for data analysis while maintaining full authority over their data. … Continue reading “Casper van Aarle graduates on Federated Regression Analysis”

Fien Ockers graduates on medication annotation using weak supervision

Medication annotation in medical reports using weak supervision by Fien Ockers By detecting textual references to medication in the daily reports written in different healthcare institutions, the resulting medication information can be used for research purposes like detecting commonly occurring adverse events or executing a comparative study into the effectiveness of different treatments. In this project, … Continue reading “Fien Ockers graduates on medication annotation using weak supervision”

Ismail Güçlü graduates on programmatically generating annotations for clinical data

Programmatically generating annotations for de-identification of clinical data by Ismail Güçlü Clinical records may contain protected health information (PHI), which is privacy-sensitive information. It is important to annotate and replace PHI in unstructured medical records before the data can be shared for other research purposes. Machine learning models are quick to implement and can … Continue reading “Ismail Güçlü graduates on programmatically generating annotations for clinical data”

Ties de Kock graduates on visualization recommendation

Visualization recommendation in a natural setting by Ties de Kock Data visualization is often the first step in data analysis. However, creating visualizations is hard: it depends on both knowledge about the data and design knowledge. While more and more data is becoming available, appropriate visualizations are needed to explore this data and extract information. … Continue reading “Ties de Kock graduates on visualization recommendation”

Somto Enendu graduates cum laude on labelling document images

Predicting Semantic Labels of Text Regions in Heterogeneous Document Images by Somtochukwu Enendu This MSc thesis describes the use of sequence labeling methods in predicting the semantic labels of extracted text regions of heterogeneous electronic documents, by utilizing features related to each semantic label. In this study, we construct a novel dataset consisting of real … Continue reading “Somto Enendu graduates cum laude on labelling document images”

Jan Trienes graduates cum laude on de-identification of Dutch medical records

Comparing Rule-based, Feature-based and Deep Neural Methods for De-identification of Dutch Medical Records by Jan Trienes Unstructured information in electronic health records provides an invaluable resource for medical research. To protect the confidentiality of patients and to conform to privacy regulations, de-identification methods automatically remove personally identifying information from these medical records. However, due to … Continue reading “Jan Trienes graduates cum laude on de-identification of Dutch medical records”

Wim Florijn graduates on Semantically Grouping Search Query Data

Information Retrieval by Semantically Grouping Search Query Data by Wim Florijn Query data analysis is a time-consuming task. Currently, a method exists where word (combinations) in queries are labelled by using an information collection consisting of regular expressions. Because the information collection does not contain regular expressions from never-before-seen domains, the method heavily relies … Continue reading “Wim Florijn graduates on Semantically Grouping Search Query Data”

Marieke Graef graduates cum laude on the Analysis of HPV discussions on Twitter

Responses to HPV Vaccination Campaigns in The Netherlands: an analysis of discussions on Twitter by Marieke Graef Even though the human papillomavirus (HPV) vaccine is an effective and safe instrument to decrease HPV infections and cases of several types of cancer, the Dutch HPV vaccination rate has been suboptimal from the start and has even … Continue reading “Marieke Graef graduates cum laude on the Analysis of HPV discussions on Twitter”