Natural language processing for structuring clinical text data on depression using UK-CRIS

Abstract

Background: Routinely collected electronic health records from secondary care offer unprecedented possibilities for medical science research, but they also present difficulties. One key issue is that medical information is recorded as free-form text, so clinicians must spend time manually extracting the salient information. Natural language processing (NLP) methods can be used to extract clinically relevant information automatically.

Objective: Our aim is to use NLP to capture real-world data on individuals with depression from Clinical Record Interactive Search (CRIS) clinical text, to foster the use of electronic healthcare data in mental health research.

Methods: We used a combination of methods to extract salient information from electronic health records. First, clinical experts defined the information of interest and subsequently built the training and testing corpora for statistical models. Second, we built and fine-tuned the statistical models using active learning procedures.

Findings: Results show a high degree of accuracy in the extraction of drug-related information. In contrast, accuracy is much lower for auxiliary variables. In combination with state-of-the-art active learning paradigms, the performance of the model increases considerably.

Conclusions: This study illustrates the feasibility of using NLP models and proposes a research pipeline for accurately extracting information from electronic health records.

Clinical implications: Real-world, individual patient data are an invaluable source of information, which can be used to better personalise treatment.

Andrey Kormilitzin
Senior Researcher

My research is centred around translating advances in mathematics, statistical machine learning and deep learning to address challenges involved in learning, inference and ethical decision making using complex biomedical and health data.

Andrea Cipriani
Professor of Psychiatry

My main research interest is evidence-based mental health and precision psychiatry. My research focuses on the evaluation of pharmacological, psychological and psychosocial interventions, mainly for major depression, bipolar disorder and schizophrenia.

Alejo J Nevado-Holgado
Associate Professor

I am an Associate Professor in the Department of Psychiatry and the Big Data Institute, and part of Dementia Research Oxford. I am very glad to supervise the AI team in the TNDR, formed by 10 excellent machine learners and bioinformaticians. Our focus is on the applications of machine learning and bioinformatics to mental health care. At the Big Data Institute, we collaborate on the application of machine learning to genomics and target discovery. I am also a consultant to a number of AI companies.
