MedCoderAI
Problem & Motivation
Our project addresses inaccurate hospital billing under the Diagnosis-Related Group (DRG) system, the standardized reimbursement method used by Medicare. DRGs categorize hospital admissions by clinical factors and tie each category to a fixed payment. Inaccurate coding, whether upcoding or downcoding, can cause serious financial losses for hospitals, expose them to fraud charges, and harm patients and community health resources. Although the DRG system was designed to control costs and standardize payments, assigning DRGs accurately and efficiently is difficult because abstracting medical records is complex. The task typically falls to experienced coders, who process only a few DRGs per day, which delays hospital reimbursement. Advances in Natural Language Processing (NLP) and large language models offer a promising alternative. By applying these technologies, the project aims to streamline DRG classification, improve coding efficiency and accuracy, and ultimately optimize the revenue cycle for the benefit of the healthcare system.
Data Source & Data Science Approach
We used the MIMIC-IV dataset (https://physionet.org/content/mimiciv/3.0/) as our training data source [1]. Our approach combines an ensemble of machine-learning algorithms with natural language processing techniques.
Previous approaches
The most recent published study, DRG-LLaMA from the Mayo Clinic, used Low-Rank Adaptation (LoRA) to fine-tune the LLaMA-7B model on the same data source, achieving a top-1 prediction accuracy of 52% and a macro-average F1 score of 0.327 [2].
Our approach
We fine-tuned two models: one to predict the Major Diagnostic Category (MDC) and one to predict the DRG. The first model uses Snowflake Cortex to predict the MDC from the ICD-10 description of the principal diagnosis. The predicted MDC is then concatenated with a summary of the "brief hospital course" section of the discharge summary, generated with Snowflake Cortex’s summarization function. This combined input is used to fine-tune the ClinicalBERT model with a Mixture of Experts neural network to classify DRGs.
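The two ideas in this pipeline, joining the MDC with the summarized hospital course and combining several experts through a gate, can be illustrated with a minimal, dependency-free sketch. This is not the project's actual implementation: the function names, the " [SEP] " separator, the ordering of the two inputs, and the dense softmax gating are all assumptions chosen for illustration. In a real model the gate scores and expert logits would come from layers on top of the ClinicalBERT encoder.

```python
import math


def build_drg_input(mdc_label, course_summary, sep=" [SEP] "):
    """Join the predicted MDC and the summarized hospital course into one text.

    The separator and the MDC-first ordering are assumptions for illustration.
    """
    return f"{mdc_label.strip()}{sep}{course_summary.strip()}"


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_of_experts(gate_scores, expert_logits):
    """Dense mixture: weight each expert's class logits by its gate weight.

    gate_scores   -- one raw score per expert (hypothetical gate output)
    expert_logits -- one list of per-class logits per expert
    """
    weights = softmax(gate_scores)
    n_classes = len(expert_logits[0])
    return [
        sum(w * logits[c] for w, logits in zip(weights, expert_logits))
        for c in range(n_classes)
    ]
```

For example, build_drg_input("MDC 05", "Admitted with chest pain ...") returns a single string that a tokenizer could consume, and mixture_of_experts([0.0, 0.0], [[1.0, 3.0], [3.0, 1.0]]) averages the two experts equally because their gate scores are tied.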
Evaluation
Our primary evaluation metrics are accuracy and F1 score.
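As a rough illustration of these metrics, the sketch below computes top-k accuracy and a macro-averaged F1 score in plain Python. The function names are hypothetical, and the macro averaging is an assumption made to match how the prior work [2] reports F1; the project may have used a different averaging scheme or a library implementation.

```python
def top_k_accuracy(y_true, ranked_preds, k=1):
    """Fraction of cases whose true label is among the k highest-ranked predictions.

    ranked_preds holds, per case, the candidate labels sorted from most to
    least likely.
    """
    hits = sum(1 for truth, ranked in zip(y_true, ranked_preds) if truth in ranked[:k])
    return hits / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in truth or predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # A class with no true positives, false positives, or false negatives scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Top-1 accuracy is the special case k=1, and the top-1 label of each ranked list is what would be passed to macro_f1 as y_pred.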
Key Learnings & Impact
Our model significantly outperformed DRG-LLaMA [2] in top-1 prediction. Our top-3 predictions achieved an accuracy of 88%, setting a new state of the art (SOTA) for DRG assignment on the MIMIC-IV dataset.
Acknowledgement
We express our deep gratitude to our instructors, Dr. Fred Nugent, Dr. Daniel Cummings, and Dr. Korin Reid, for their guidance. We also thank our classmates and our TA for their support throughout the project.
References
[1] Johnson A, Bulgarelli L, Pollard T, Gow B, Moody B, Horng S, Celi LA, Mark R. MIMIC-IV (version 3.0). PhysioNet. 2024. doi: 10.13026/hxp0-hg59.
[2] Wang H, Gao C, Dantona C, Hull B, Sun J. DRG-LLaMA: tuning LLaMA model to predict diagnosis-related group for hospitalized patients. NPJ Digit Med. 2024 Jan 22;7(1):16. doi: 10.1038/s41746-023-00989-3. PMID: 38253711; PMCID: PMC10803802.