MIDS Capstone Project Summer 2024

MedyCode Assistant

Imagine a world where medical coding is fast, accurate, and effortless—MedyCode makes this a reality.

Medical coding translates medical diagnoses, procedures, and services into alphanumeric codes for billing, data tracking, and research. This is typically an administrative and financial task that doesn't require a medical degree. However, due to a 30% shortage of coders, healthcare providers often handle this task themselves, adding hours of late-night work to their already high-stress, long-hour jobs. This added administrative burden increases physicians' stress, contributing to frustration, dissatisfaction, and burnout that can undermine both their job performance and their personal well-being. 

Our mixed-methods team, which includes experts from the medical industry, is committed to revolutionizing the medical coding process through automation. We aim to reduce errors, boost efficiency, and, most importantly, alleviate the significant administrative burden on healthcare providers. 

Data Source & Data Science Approach 

MedyCode is an NLP-based application powered by large language model (LLM) and Retrieval-Augmented Generation (RAG) techniques. It processes clinical notes and returns the corresponding ICD-10 codes. The app features a user-friendly clinical note input tool on the front end, while the backend leverages data from MIMIC-IV, Named Entity Recognition (NER) from John Snow Labs, and a Neo4j graph to structure relationships between medical entities. 
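The graph-building step can be sketched as follows: NER output (entity text plus a label) is turned into Cypher MERGE statements that link each note to its extracted entities. This is a minimal illustration, not the project's actual implementation; the node labels, the `MENTIONS` relationship, and the function name are assumptions.

```python
# Illustrative sketch: convert NER output into Cypher statements for a
# Neo4j knowledge graph. Labels and relationship names are hypothetical.

def entities_to_cypher(note_id, entities):
    """Build MERGE statements linking a clinical note to its entities.

    `entities` is a list of (text, label) pairs, e.g.
    ("atrial fibrillation", "Diagnosis"), as produced by an NER pipeline.
    """
    statements = [f"MERGE (n:Note {{id: '{note_id}'}})"]
    for text, label in entities:
        statements.append(
            f"MERGE (e:{label} {{name: '{text}'}}) "
            f"WITH e MATCH (n:Note {{id: '{note_id}'}}) "
            f"MERGE (n)-[:MENTIONS]->(e)"
        )
    return statements

stmts = entities_to_cypher(
    "note-001",
    [("atrial fibrillation", "Diagnosis"), ("warfarin", "Drug")],
)
```

In practice, each statement would be executed through the Neo4j driver's session; MERGE (rather than CREATE) keeps repeated entities deduplicated across notes.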

Baseline Model

  • Data Input: Medical professionals input clinical notes.
  • Local LLM Processing: Local LLM interprets the notes.
  • Prompt Generation: LLM creates a prompt with essential details.
  • Further LLM Processing: Another local LLM maps the prompt to ICD-10 codes.
  • ICD-10 Code Prediction: The final LLM output yields the predicted ICD-10 codes.
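The steps above can be sketched as a two-stage prompt chain. This is a hedged illustration only: the prompt wording and the `run_llm` callable (standing in for a local Mistral or Llama call) are assumptions, not the project's actual prompts.

```python
# Sketch of the two-stage baseline, with the local LLM calls stubbed out.
# Prompt wording and the `run_llm` helper are illustrative.

def build_extraction_prompt(clinical_note: str) -> str:
    """Stage 1: ask the LLM to pull out the billable details."""
    return (
        "Extract the diagnoses, procedures, and services from the "
        f"following clinical note:\n\n{clinical_note}"
    )

def build_coding_prompt(extracted_details: str) -> str:
    """Stage 2: ask a second LLM pass to map details to ICD-10 codes."""
    return (
        "Assign the most specific ICD-10 code to each of the following "
        f"findings:\n\n{extracted_details}"
    )

def predict_icd10(clinical_note: str, run_llm) -> str:
    """End-to-end baseline: note -> extracted details -> ICD-10 codes."""
    details = run_llm(build_extraction_prompt(clinical_note))
    return run_llm(build_coding_prompt(details))
```

Separating the two stages lets each prompt stay focused, and makes it easy to swap LLMs or insert a retrieval step between them.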

RAG Model

  • Retrieval: Searches a knowledge graph built from the medical entities for relevant information.
  • Query Augmentation: Enriches the prompt with additional context from the knowledge graph for better predictions.
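The augmentation step can be illustrated as folding retrieved facts into the coding prompt as extra context. A minimal sketch, assuming the facts arrive as plain strings from the graph query; the function name and formatting are illustrative.

```python
# Sketch of query augmentation: knowledge-graph facts are prepended to the
# baseline coding prompt as context. Illustrative, not the actual prompts.

def augment_prompt(base_prompt: str, retrieved_facts: list[str]) -> str:
    """Prepend knowledge-graph context to the baseline coding prompt."""
    if not retrieved_facts:
        return base_prompt
    context = "\n".join(f"- {fact}" for fact in retrieved_facts)
    return (
        "Use the following context from the medical knowledge graph:\n"
        f"{context}\n\n{base_prompt}"
    )
```

When retrieval returns nothing, the pipeline falls back to the plain baseline prompt, so the RAG path degrades gracefully.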

Evaluation 

We conducted a series of experiments both with and without the RAG pipeline. We tested two LLMs, Mistral-7B Instruct and Llama-3 8B Instruct, alongside various embedding models and prompts at different stages of our pipeline. Our RAG architecture achieves up to four times the accuracy in ICD-10 code assignment compared to non-RAG methods, significantly outperforming traditional manual coding. Validation shows a substantial reduction in coding errors and enhanced consistency. 
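A metric of the kind used to compare pipeline variants is exact-match accuracy over the assigned code sets. This is a generic sketch of such a metric, not the project's exact evaluation code.

```python
# Exact-match accuracy: a note counts as correct only if its predicted
# ICD-10 code set exactly matches the gold code set. Illustrative only.

def code_accuracy(predicted, gold):
    """Fraction of notes whose predicted codes exactly match the gold codes."""
    matches = sum(set(p) == set(g) for p, g in zip(predicted, gold))
    return matches / len(gold) if gold else 0.0
```

Set comparison makes the metric order-insensitive, which matters because the LLM may list codes in any order.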

Key Learnings & Impact

Our work will particularly benefit under-resourced healthcare entities, such as federally qualified health centers, rural clinics, and overburdened healthcare providers who often handle coding tasks themselves. A doctor can simply enter patient notes, and our app handles the rest, ensuring precision and efficiency in medical coding like never before.

Our project builds on very recent technology: this area of research is actively evolving, and the first papers describing a similar approach appeared earlier this year. Many organizations are dedicating significant time, infrastructure, and funding to develop AI medical coding solutions. For the purposes of our project, we limited our scope to diseases of the circulatory system; however, as we build a more robust knowledge graph, we expect our accuracy to increase. For future work, we aim to scale to all ICD-10 codes and seamlessly integrate our product into existing systems to ease our clients' workload.

Acknowledgments

A special thank you to Todd Holloway, Fred Nugen, and Mark Butler for their support and guidance on our project. We would also like to express our gratitude to our content advisors and user testers: Dr. Yasmin Azar, Dr. Glubok Gonzalez, and Dr. Judson Brewer. 

John Snow Labs:
Kocaman, V., & Talby, D. (2021). Spark NLP: Natural Language Understanding at Scale. Software Impacts, 8, 100058. ISSN 2665-9638. https://doi.org/10.1016/j.simpa.2021.100058

MIMIC-IV Dataset: 
Johnson, A., Bulgarelli, L., Pollard, T., Horng, S., Celi, L. A., and Mark, R., "MIMIC-IV (version 2.2)," PhysioNet, 2023.

Last updated: August 6, 2024