ReCAP: Feasibility and Accuracy of Extracting Cancer Stage Information From Narrative Electronic Health Record Data [CARE DELIVERY]

QUESTION ASKED: Cancer stage, one of the most important prognostic factors for cancer-specific survival, is often documented in narrative form in electronic health records (EHRs), and as such is difficult to abstract by registrars and other secondary data users, including clinicians participating in quality reporting activities. Can cancer stage be accurately extracted by natural language processing (NLP) of the text from EHRs?

SUMMARY ANSWER: In a combined dataset of N = 2,323 patients with lung cancer (training set: n = 1,103; validation set: n = 1,220), we analyzed 751,880 documents and discovered at least one stage statement for 98.6% of patients (median of 24 documents with stage statements per patient). Despite a high degree of discordance in patient records (83.6% of patients had conflicting stage statements in their EHR; Fig 2), algorithmically derived stage accuracy was very high in the validation set, κ = 0.906 (95% CI, 0.873 to 0.939), as compared with the gold standard of tumor registrar–derived stage.

METHODS: We developed an NLP algorithm to extract stage statements from machine-readable EHR documents, including automated rules to choose the most likely stage when discordance was present in the EHR. The algorithm was developed on a training set of patients with lung cancer and independently validated on a test set of patients with lung cancer who were seen at our institution.

BIAS, CONFOUNDING FACTOR(S), DRAWBACKS: An exact stage (eg, stage I, stage I...
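The METHODS description (extract stage statements from each document, then apply rules to pick the most likely stage when documents disagree) can be illustrated with a minimal sketch. The abstract does not give the authors' patterns or resolution rules, so everything here is an assumption made for illustration: the regular expression `STAGE_RE`, the helper names `extract_stages` and `resolve_stage`, and the choice of a simple most-frequent-statement rule are hypothetical and are not the published algorithm.

```python
import re
from collections import Counter

# Hypothetical pattern (not from the paper): matches statements such as
# "stage IIIA", "Stage 4", or "stage ib". Longer numerals are listed first
# so that "III" is not matched as "I".
STAGE_RE = re.compile(r"\bstage\s+(IV|III|II|I|[1-4])([ABC])?\b", re.IGNORECASE)

ARABIC_TO_ROMAN = {"1": "I", "2": "II", "3": "III", "4": "IV"}


def extract_stages(text):
    """Return every stage statement in one document, normalized (e.g. 'IIIA')."""
    stages = []
    for match in STAGE_RE.finditer(text):
        numeral = match.group(1).upper()
        numeral = ARABIC_TO_ROMAN.get(numeral, numeral)
        substage = (match.group(2) or "").upper()
        stages.append(numeral + substage)
    return stages


def resolve_stage(documents):
    """Pick one stage per patient from possibly conflicting documents.

    Assumed rule: the most frequently stated stage wins. Ties go to the
    stage that was encountered first. Returns None if no statement is found.
    """
    counts = Counter()
    for doc in documents:
        counts.update(extract_stages(doc))
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Invented example notes for one patient, with conflicting statements.
notes = [
    "Impression: stage IIIA adenocarcinoma of the lung.",
    "Clinically stage 3a; an earlier note said stage IIB.",
    "Follow-up visit, no staging discussed.",
]
patient_stage = resolve_stage(notes)
```

A frequency-based rule is only one plausible way to handle discordance; a real system might also weight statements by document type, author, or date.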
Source: Journal of Oncology Practice - Category: Cancer & Oncology - Tags: Electronic health records, Methodology, Prognostic Studies, Incidence trends, Diagnosis & Staging, CARE DELIVERY - Source Type: research