Medication annotation in medical reports using weak
by Fien Ockers
Detecting textual references to medication in the daily reports written in different healthcare institutions yields medication information that can be used for research purposes, such as detecting commonly occurring adverse events or conducting comparative studies of the effectiveness of different treatments. In this project, four models, a CRF model and three BERT-based models, are applied to this medication detection task. They are trained not only on a small manually annotated train set but also on two extended train sets created with two weak supervision systems, Snorkel and Skweak. The CRF model and RobBERT are found to be the best-performing models, and performance is consistently higher for models trained on the manually annotated train set than for models trained on the extended train sets. However, performance on the extended train sets does not fall far behind, which shows the potential of using a weak supervision system. Future research could focus either on further training a BERT-based tokenizer and model on the medical domain, or on expanding the labelling functions used in the weak supervision systems to improve recall or to generalize to other medication-related entities such as dosages or modes of administration.
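To make the idea of labelling functions concrete, the following is a minimal sketch in plain Python and not the implementation used in this project. Each labelling function votes on a token or abstains, and the votes are combined by simple majority. The lexicon, the suffix pattern, the stop-word list and all function names are hypothetical choices made for illustration. Snorkel and Skweak do not rely on a plain majority vote; they fit a statistical model to combine the labelling function outputs.

```python
import re
from collections import Counter

# Label values; ABSTAIN means a labelling function has no opinion.
ABSTAIN, OTHER, MEDICATION = -1, 0, 1

# Hypothetical mini-lexicon; a real system would use a full drug database.
DRUG_LEXICON = {"paracetamol", "ibuprofen", "metformin"}
# Hypothetical list of frequent non-medication words.
COMMON_WORDS = {"the", "patient", "received", "and", "of"}


def lf_lexicon(token):
    """Vote MEDICATION if the token appears in the drug lexicon."""
    return MEDICATION if token.lower() in DRUG_LEXICON else ABSTAIN


def lf_suffix(token):
    """Vote MEDICATION if the token ends in an assumed drug-name suffix."""
    return MEDICATION if re.search(r"(olol|pril|statin)$", token.lower()) else ABSTAIN


def lf_common_word(token):
    """Vote OTHER for frequent words that are never medication names."""
    return OTHER if token.lower() in COMMON_WORDS else ABSTAIN


LABELLING_FUNCTIONS = [lf_lexicon, lf_suffix, lf_common_word]


def weak_label(token):
    """Combine the non-abstaining votes by majority; abstain if there are none."""
    votes = [lf(token) for lf in LABELLING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    # On a tie, the label that was voted first wins.
    return Counter(votes).most_common(1)[0][0]


def annotate(sentence):
    """Return (token, weak label) pairs for every word in the sentence."""
    return [(tok, weak_label(tok)) for tok in re.findall(r"\w+", sentence)]
```

Calling `annotate("Patient received metoprolol and paracetamol")` labels "metoprolol" through the suffix rule and "paracetamol" through the lexicon, while a token that no function recognizes stays at `ABSTAIN`. Tokens left unlabelled in this way illustrate why adding labelling functions is a natural route to higher recall.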