Medical literature monitoring for adverse drug reactions and special situations is an important part of the pharmacovigilance process and a regulatory requirement for every medicinal product available on the European market. It is, however, a time-consuming effort that requires specialist domain knowledge, and only a small fraction of the articles reviewed eventually become valid individual case safety reports (ICSRs).

We present an approach that applies machine learning models to reliably filter out irrelevant articles ahead of manual screening, based on information available in the article title and abstract. A benchmark model is trained on a labeled dataset produced for this study by asking annotators which articles include mentions of suspected adverse events. This choice of label helps overcome the incomplete nature of article titles and abstracts, and produces a drug-agnostic dataset that requires less annotation effort.

Using historical data from the EMA’s own literature screening activities and a benchmark deep learning classification model, we achieve significant savings in the volume of articles to be screened, even when setting low target levels for false negatives (i.e. high recall). For example, at a target recall of 95%, the volume of abstracts to screen falls by 44% on average per month (ranging from 40% to 49%). These results suggest our approach is a promising use of machine learning to pragmatically reduce manual workloads in medical literature monitoring.
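The trade-off between target recall and screening savings can be illustrated with a minimal sketch. It assumes the classifier outputs a relevance score per article and that a labeled validation set is available. The function names, scores, and labels below are hypothetical toy values, not the study's data or its actual thresholding procedure.

```python
import math


def threshold_for_recall(scores, labels, target_recall):
    """Return the highest score threshold that keeps at least
    target_recall of the positive articles (label 1).

    Articles with score >= threshold go to manual screening.
    """
    positives = sorted(
        (s for s, y in zip(scores, labels) if y == 1), reverse=True
    )
    if not positives:
        raise ValueError("need at least one positive example")
    # Number of positives that must be kept to reach the target recall.
    needed = max(math.ceil(target_recall * len(positives)), 1)
    return positives[needed - 1]


def screening_savings(scores, threshold):
    """Fraction of articles scoring below the threshold,
    i.e. filtered out before manual screening."""
    skipped = sum(1 for s in scores if s < threshold)
    return skipped / len(scores)


# Hypothetical validation data: model scores and annotator labels
# (1 = suspected adverse event mention, 0 = irrelevant).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

for target in (0.75, 1.0):
    t = threshold_for_recall(scores, labels, target)
    print(target, t, screening_savings(scores, t))
```

On this toy set, lowering the recall target from 100% to 75% raises the threshold and filters out more articles. In practice the threshold would be chosen on held-out data and then applied to new, unlabeled articles, so the recall achieved on those articles can differ from the target.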

Reducing screening workload in medical literature monitoring with machine learning

Regulatory Science Forum
