Student Project


Problem & Motivation

There are over 135,000 medical billing coders in the US [1] who transcribe medical notes into insurance billing codes. It is an extremely complex task that requires reading vast amounts of documentation and then mapping it to one of 7000 PDCs... which results in an estimated 2.5 million medical coding errors each year.


Data Source & Data Science Approach

Data sourced from

Previous approaches:
Truncate discharge notes

Our approach:
Summarize the discharge notes with an LLM to avoid the data loss caused by truncation.
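
The difference between the two approaches can be illustrated with a minimal sketch. This is not the project's actual pipeline: the function names, the word-based length limit, the chunk size, and `toy_summarizer` (a hypothetical stand-in for a real LLM call) are all assumptions made for illustration.

```python
from typing import Callable


def truncate_note(note: str, max_words: int) -> str:
    """Baseline: keep only the first max_words words; everything after is lost."""
    return " ".join(note.split()[:max_words])


def summarize_note(note: str, summarize: Callable[[str], str],
                   max_words: int, chunk_words: int = 50) -> str:
    """Summarize the note chunk by chunk so every part can contribute."""
    words = note.split()
    chunks = [" ".join(words[i:i + chunk_words])
              for i in range(0, len(words), chunk_words)]
    summary = " ".join(summarize(chunk) for chunk in chunks)
    # Truncate only as a last resort, if the summary is still too long.
    return truncate_note(summary, max_words)


def toy_summarizer(text: str) -> str:
    """Hypothetical stand-in for an LLM: keeps the first 5 words of a chunk."""
    return " ".join(text.split()[:5])


# Made-up note: 100 filler words followed by the clinically relevant ending.
note = " ".join(["routine"] * 100 + ["final", "diagnosis", "atrial", "fibrillation"])
print(truncate_note(note, 20))
print(summarize_note(note, toy_summarizer, 20))
```

In this toy example the diagnosis sits at the end of the note, so truncation drops it, while the summarized version still contains it. In practice `toy_summarizer` would be replaced by a call to whichever LLM is being used.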



Precision and Recall / True Positive Rate and Sensitivity
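
Recall, true positive rate, and sensitivity are three names for the same quantity, TP / (TP + FN), while precision is TP / (TP + FP). As a rough sketch of how these could be computed for code prediction, the function below micro-averages over notes, assuming each note can carry several billing codes. The function name and the example codes are placeholders, not results from the project.

```python
from typing import List, Set, Tuple


def micro_precision_recall(true_codes: List[Set[str]],
                           pred_codes: List[Set[str]]) -> Tuple[float, float]:
    """Micro-averaged precision and recall over per-note sets of codes."""
    tp = fp = fn = 0
    for truth, pred in zip(true_codes, pred_codes):
        tp += len(truth & pred)   # codes predicted and actually present
        fp += len(pred - truth)   # codes predicted but not present
        fn += len(truth - pred)   # codes present but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Placeholder labels for two notes (illustrative only).
truth = [{"A", "B"}, {"C"}]
pred = [{"A"}, {"C", "D"}]
precision, recall = micro_precision_recall(truth, pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```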



Key Learnings & Impact

New approaches and tools can have a huge impact!

Here, that means using an LLM to summarize the notes and using the encodings instead of the text.


Many thanks to our amazing professors Danielle Cummings & Fred Nugen, whose insight, advice, and guidance helped turn our project into a reality.





Last updated: July 21, 2024