MIDS Capstone Project Fall 2024

ICUAlert

Problem & Motivation

ICU doctors are at the frontlines of critical care, making life-saving decisions every day. However, the sheer volume and complexity of these decisions present significant challenges.

On average, ICU doctors manage 11.8 patients daily [1]—a caseload that demands constant attention and rapid prioritization of care. This spans all aspects of patient care, from initial evaluations to ongoing monitoring and collaboration with the care team. ICU doctors need to evaluate patient data across up to 5 modalities [2]—ranging from imaging studies like chest X-rays to lab results, time-series vital signs, clinical notes, and more. Synthesizing this diverse data to form a cohesive picture of the patient's condition is not only time-consuming but cognitively taxing, especially under pressure.

With such high decision volumes and complexity, there is a pressing need for tools that help ICU doctors identify and prioritize the most critical patients.

Our Solution

We’ve created an inference service that uses clinical notes, lab results, and/or chest X-rays (that’s right, you don’t need all of them) to produce a mortality prediction. For our minimum viable product, we’ve limited the scope to patients with pleural effusion [3], a condition where fluid builds up between the chest wall and the lungs. Currently, a user fills out a patient data form and submits it to our system, which returns a mortality risk prediction. Physicians can use this prediction to inform their interventions, or simply as a gut check in their decision-making process.
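For illustration, a request to this service might look like the sketch below; the endpoint path, field names, and values are placeholders rather than our production API:

```python
import requests

# Hypothetical request to an ICUAlert-style prediction endpoint.
payload = {
    "clinical_note": "58-year-old admitted with dyspnea; moderate right pleural effusion.",
    "lab_results": {"wbc": 13.2, "creatinine": 1.8, "lactate": 2.4},
    "cxr_image_base64": None,  # any modality may be omitted
}

response = requests.post("http://localhost:8000/predict", json=payload, timeout=30)
print(response.json())  # e.g. {"mortality_risk": 0.27}
```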

Data Source & Data Science Approach

Data Source

For our data, we used the MIMIC-IV database, a de-identified patient database created through a collaboration between MIT and the Beth Israel Deaconess Medical Center. This database contains over 200k patient records, covering both emergency room and ICU visits. Additionally, we leveraged the corresponding MIMIC-CXR and MIMIC-IV-Note databases, which contain nearly 400k X-ray images and more than 2 million radiology notes, respectively.

To build our model, we included patients who had all relevant data modalities: lab results, a mortality outcome, X-ray images, and clinical notes. We focused on non-malignant pleural effusion as our condition of interest, narrowing the dataset to 3,400 patients. These patients were then split into training, validation, and test groups for model development and evaluation.
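For illustration, the cohort-selection step can be sketched roughly as follows; the file names and column names are hypothetical stand-ins for the MIMIC-IV tables we actually queried:

```python
import pandas as pd

# Hypothetical per-modality tables keyed by patient (subject_id).
labs = pd.read_csv("labs.csv")
notes = pd.read_csv("notes.csv")
cxr = pd.read_csv("cxr_metadata.csv")
outcomes = pd.read_csv("outcomes.csv")   # includes mortality label and diagnosis flags

# Keep only patients with a non-malignant pleural effusion diagnosis ...
cohort = outcomes[(outcomes.pleural_effusion == 1) & (outcomes.malignancy == 0)]

# ... and require that all three data modalities are present.
for table in (labs, notes, cxr):
    cohort = cohort[cohort.subject_id.isin(table.subject_id)]

print(f"Cohort size: {len(cohort)} patients")
```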

Modeling Approach

To develop our multimodal fusion model, we designed individual pipelines for each data modality: CXR images, clinical notes, and lab results. These pipelines leverage pretrained models, such as CheXNet for CXR images and ClinicalBERT for clinical notes, to generate high-quality embeddings. Lab results are passed to the model as structured numerical data.
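As a rough sketch of the clinical notes pipeline, a fixed-length note embedding can be pulled from a ClinicalBERT checkpoint as shown below; the checkpoint name is one publicly available variant and may differ from the exact model we used:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Publicly available ClinicalBERT variant (assumed for illustration).
tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")

note = "Moderate right-sided pleural effusion, increased from prior study."
inputs = tokenizer(note, truncation=True, max_length=512, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Use the [CLS] token representation as the note embedding.
note_embedding = outputs.last_hidden_state[:, 0, :]   # shape: (1, 768)
```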

These streams are combined in an early-fusion deep learning framework, allowing the model to learn the interactions between modalities and make a more contextualized prediction about the patient’s health status.  For example, it can connect findings from an abnormal chest X-ray with elevated lab values to generate more accurate predictions.
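A minimal sketch of such an early-fusion classifier is shown below; the embedding dimensions and layer sizes are illustrative, not our exact architecture:

```python
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    """Concatenates per-modality embeddings and classifies the fused vector."""

    def __init__(self, cxr_dim=1024, note_dim=768, lab_dim=20, hidden=256):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(cxr_dim + note_dim + lab_dim, hidden),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden, 1),   # logit for in-ICU mortality
        )

    def forward(self, cxr_emb, note_emb, lab_feats):
        fused = torch.cat([cxr_emb, note_emb, lab_feats], dim=-1)
        return self.classifier(fused)
```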

Additionally, the model is robust to missing data.  When only two of the three modalities are available, the model dynamically adjusts by passing embeddings to an appropriate dedicated classification head, performing well even with partial information.  
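One way to sketch this behavior is to keep a dedicated head per combination of available modalities and route to it at prediction time; the dimensions below are again placeholders:

```python
import torch
import torch.nn as nn

heads = nn.ModuleDict({
    "cxr+note+lab": nn.Linear(1024 + 768 + 20, 1),
    "note+lab":     nn.Linear(768 + 20, 1),
    "cxr+note":     nn.Linear(1024 + 768, 1),
    "cxr+lab":      nn.Linear(1024 + 20, 1),
})

def predict_risk(embeddings: dict) -> torch.Tensor:
    """`embeddings` maps modality names ('cxr', 'note', 'lab') to tensors."""
    present = [k for k in ("cxr", "note", "lab") if k in embeddings]
    fused = torch.cat([embeddings[k] for k in present], dim=-1)
    return torch.sigmoid(heads["+".join(present)](fused))

# Example: only clinical notes and lab results are available for this patient.
risk = predict_risk({"note": torch.randn(1, 768), "lab": torch.randn(1, 20)})
```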

By combining insights from multiple types of patient data, this model provides a practical tool to enhance clinical decision-making.

Data Pipeline/System Architecture

Our overall system architecture looks complicated, but can largely be broken down into three sections.

  1. EDA/Data Hosting: Because of the size and complexity of our dataset, and because we needed to interact with three different modalities from MIMIC-IV, we wrote helper scripts to store only the data we required. We also created notebooks to address class imbalance via upsampling and to split our data into train/test/validation sets, so that every modality’s training notebook reads from a single source of truth and stays consistent across training and evaluation (a minimal split sketch follows this list).
  2. Model Training for Modalities: We trained three separate modality models (CXR, lab results, and clinical notes). Each training process writes out a model file (a TensorFlow SavedModel for CXR and labs, a PyTorch model for clinical notes) along with embeddings. The embeddings are used to train our fusion model, which is in turn written out as a PyTorch model. The final versions of all models are pushed to AWS S3.
  3. Hosting: We host our demo application on AWS. Our models are served from a FastAPI application that is containerized and pushed to ECR via CI/CD integration with AWS CodeBuild. We run the application container on an arm64-based AWS Graviton instance, mounting our models from S3 at runtime. Decoupling the application and model code lets us update models without rebuilding the application, and vice versa (a simplified serving sketch follows this list).
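For the split described in step 1, a minimal sketch (with hypothetical file names, column names, and ratios) might look like the following; the key idea is that every modality notebook loads the same persisted ID lists:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Illustrative split script; file and column names are placeholders.
cohort = pd.read_csv("cohort.csv")   # one row per patient, with a mortality label

# First carve off a 30% holdout, stratified on the mortality label ...
train_ids, holdout_ids = train_test_split(
    cohort.subject_id, test_size=0.3, stratify=cohort.mortality, random_state=42
)
# ... then split the holdout evenly into validation and test sets.
holdout_labels = cohort.set_index("subject_id").loc[holdout_ids, "mortality"]
val_ids, test_ids = train_test_split(
    holdout_ids, test_size=0.5, stratify=holdout_labels, random_state=42
)

# Persist once; the CXR, clinical notes, and lab pipelines all read these files,
# which keeps their training and evaluation splits consistent.
for name, ids in [("train", train_ids), ("val", val_ids), ("test", test_ids)]:
    ids.to_csv(f"{name}_subject_ids.csv", index=False)
```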
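For the hosting setup in step 3, a simplified FastAPI sketch is shown below; the model path, request fields, and the `to_embeddings` helper are hypothetical stand-ins rather than our actual deployment code:

```python
from typing import Optional

import torch
from fastapi import FastAPI
from pydantic import BaseModel

MODEL_PATH = "/mnt/models/fusion_model.pt"   # model artifacts mounted from S3 at runtime

class PatientData(BaseModel):
    clinical_note: Optional[str] = None
    lab_results: Optional[dict] = None
    cxr_image_base64: Optional[str] = None

app = FastAPI()
model = None

@app.on_event("startup")
def load_model() -> None:
    # Loading from the S3 mount at startup (rather than baking weights into the
    # image) is what lets us swap models without rebuilding the container.
    global model
    model = torch.load(MODEL_PATH, map_location="cpu")
    model.eval()

@app.post("/predict")
def predict(patient: PatientData) -> dict:
    features = to_embeddings(patient)   # hypothetical per-modality embedding step
    with torch.no_grad():
        risk = torch.sigmoid(model(features)).item()
    return {"mortality_risk": risk}
```

Because the container only reads model artifacts from the S3 mount, publishing a new model file and restarting the service is enough to roll out an update.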

Evaluation

We evaluated fusion model performance against each of the three individual modality models, with each single-modality model acting as a baseline.

As seen in the metrics, the fusion model performed better overall than any single-modality model, matching our initial hypothesis that incorporating more types of patient data leads to better predictive results.

Our main evaluation metric was the Area Under the Receiver Operating Characteristic curve (AUROC). We chose this metric because the goal of the model is to distinguish patients who died in the ICU from those who survived. In the end, our fusion model achieved an AUROC of 0.79: given a randomly chosen pair of patients, one who died and one who survived, the model ranks the higher-risk patient correctly about 79% of the time. This outperformed every single-modality model.
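For reference, the metric can be computed from held-out predictions as in the sketch below; the labels and scores shown are dummy values, not our actual results:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1])                    # 1 = in-ICU mortality
y_score = np.array([0.1, 0.4, 0.8, 0.2, 0.6, 0.9, 0.3, 0.5])   # model risk scores

print(f"AUROC: {roc_auc_score(y_true, y_score):.2f}")
```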

As part of our evaluation, we wanted to identify whether the fusion model was using all three modalities equally or favoring any specific modality.

We compared the ROC curves of the fusion model when given only two modalities at a time. This gives users context on how well the model performs when not all data is available for a patient.

Here, the results indicate that for the fusion model, clinical notes + lab results was the best combination, which matches the individual model performances, followed by CXR + clinical notes, and then CXR + lab results. In terms of predictive power, clinical notes contribute the most, followed by lab results, then CXR images.

The t-SNE (t-distributed Stochastic Neighbor Embedding) visualization in the image above represents a 2D projection of high-dimensional fused multi-modal embeddings.

First, the points mostly group into clusters, indicating that the model has learned to embed similar data points closer together in the reduced-dimensional space and that the fused embeddings preserve meaningful relationships between the modalities.

Second, we can see overlap between red (mortality = 1) and green (mortality = 0) points across the board, suggesting that some embeddings remain similar despite differences in label.
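The projection itself can be generated with a sketch like the following, where the fused embeddings and labels are random placeholders standing in for the real model outputs:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

fused_embeddings = np.random.rand(500, 128)    # (patients, fused embedding dim)
labels = np.random.randint(0, 2, size=500)     # 1 = mortality, 0 = survival

# Project the high-dimensional fused embeddings down to 2D.
projection = TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(fused_embeddings)

plt.scatter(projection[labels == 0, 0], projection[labels == 0, 1], c="green", s=8, label="mortality = 0")
plt.scatter(projection[labels == 1, 0], projection[labels == 1, 1], c="red", s=8, label="mortality = 1")
plt.legend()
plt.title("t-SNE projection of fused multimodal embeddings")
plt.show()
```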

Key Learnings & Impact

ICUAlert delivers impact by empowering ICU doctors with a decision support tool that reduces response times, enabling faster and more informed interventions. Patients benefit from enhanced care, particularly for hard-to-detect medical conditions, as the system helps identify critical patterns across diverse data modalities. By facilitating early and accurate decision-making, ICUAlert also supports improved outcomes and potentially reduces ICU patient readmission rates, contributing to more efficient operations.

A key learning from the project is that effectively combining multiple data modalities into a fusion model, each with unique challenges, requires careful data alignment, preprocessing, and model integration.

We aim to use this project as a proof-of-concept of the capabilities of a multimodal system in mortality predictions, and hope to partner with hospitals to train new systems tailored to their needs.

Acknowledgement

We thank our Capstone Advisors, Dr. Daniel Aranki and Dr. Fred Nugen, for sharing their domain expertise with us, and for providing invaluable insights throughout the development of our multimodal system. We would also like to thank our Capstone section (009) for giving us feedback and asking thought-provoking questions throughout this four-month journey. Finally, we would like to acknowledge those who contributed to the MIMIC-IV dataset—researchers, healthcare workers, and especially the patients—without whom our model development would not have been possible.

Sources

  1. https://link.springer.com/article/10.1007/s00134-023-07066-z
  2. https://www.nature.com/articles/s41597-022-01899-x
  3. https://my.clevelandclinic.org/health/diseases/17373-pleural-effusion

 

Last updated: December 12, 2024