Journal Citation Statistics for Library Collections using Document Reference Extraction Techniques
by Steven Verkuil
Journal subscriptions are a considerable expense for universities, yet it is not always clear how much these subscriptions contribute to ongoing research. This thesis presents a multistage process for evaluating which journals are actively referenced in publications. Our software tool for journal citation reports, CiteRep, is designed to aid decision-making by providing statistics on the number of times a journal is referenced in a document set. Citation reports are generated automatically from online repositories containing PDF documents. The process of extracting citations and identifying journals is designed to be user-friendly and easy to maintain. CiteRep allows generated reports to be filtered by year, faculty, and study, providing detailed insight into journal usage for specific user groups. Our software tool achieves an overall weighted precision and recall of 66.2% when identifying journals in a fresh set of PDF documents. While leaving some room for improvement, CiteRep outperforms the two most popular citation parsing libraries, ParsCit and FreeCite, with respect to journal identification accuracy. CiteRep should be considered for creating journal citation reports from document repositories.
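To make the idea of a filtered journal citation report concrete, the following is a minimal Python sketch and not CiteRep's actual implementation. The `Document` class, the `journal_report` function, and the sample data are all hypothetical, and the sketch assumes that journals have already been identified in each document's reference list and that every reference counts once.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical record for one PDF document after journal identification."""
    year: int
    faculty: str
    study: str
    # One entry per reference that was matched to a journal (assumption).
    journals: list = field(default_factory=list)


def journal_report(documents, year=None, faculty=None, study=None):
    """Count journal references, optionally filtered by year, faculty, and study."""
    counts = Counter()
    for doc in documents:
        if year is not None and doc.year != year:
            continue
        if faculty is not None and doc.faculty != faculty:
            continue
        if study is not None and doc.study != study:
            continue
        counts.update(doc.journals)
    return counts


# Illustrative data only; the names and values are invented.
sample_docs = [
    Document(2013, "Science", "Computer Science", ["Journal A", "Journal B", "Journal A"]),
    Document(2014, "Science", "Physics", ["Journal A"]),
    Document(2014, "Arts", "History", ["Journal C"]),
]

full_report = journal_report(sample_docs)
science_2014 = journal_report(sample_docs, year=2014, faculty="Science")
```

Here `full_report` covers the whole document set, while `science_2014` restricts the count to one faculty and year, which mirrors the kind of filtering by user group described above.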
Clone CiteRep on GitHub.