Jul 29, 2024

Award-Winning Data Science Project Continues to Revolutionize Legionnaires’ Disease Investigation and Prevention

A Hal R. Varian Award-winning capstone project from 2021 has been published in the July 2024 issue of The Lancet Digital Health, an internationally trusted source of clinical, public health, and global health knowledge.

Master of Information and Data Science alums Karen Wong, Thaddeus Segura, Gunnar Mein, and Jia Lu (MIDS ’21) were the Spring 2021 Hal R. Varian Capstone Award winners. Their project, “TowerScout,” aimed to train a deep learning model to identify cooling towers from aerial imagery, which the team hoped would help public health teams efficiently investigate Legionnaires’ disease outbreaks.

Legionnaires’ disease, a severe pneumonia with a fatality rate of 8%-12%, is usually transmitted through water or mist containing Legionella bacteria. Freshwater environments such as lakes often naturally have low levels of Legionella, but the bacteria can multiply to dangerous levels once they enter building water systems. As a result, recent outbreaks of Legionnaires’ disease have been traced to cooling towers atop apartment buildings. When these towers are inadequately maintained, their fans can spread contaminated mist across surrounding areas and sicken entire communities.

The concept of “TowerScout” was first developed by Wong, then a medical officer with the Centers for Disease Control and Prevention (CDC) who had worked on several disease outbreaks, together with CDC epidemiologist Chris Edens. The two noticed that most U.S. cities did not have datasets of cooling tower locations, which made it difficult to track down the sources of Legionnaires’ disease outbreaks and drastically slowed the CDC’s ability to respond. Instead, investigators had to manually scan aerial imagery to identify cooling towers, which was labor-intensive and time-consuming.

TowerScout remedies this issue with its two-stage model, which is trained on both manually annotated satellite imagery and synthetically generated images of cooling towers. The first stage detects objects within an image tile and labels each one as having a low, intermediate, or high chance of being a cooling tower. Only objects labeled intermediate move on to the second stage, which runs further checks on the image before making a final classification. Compared with manual identification by trained CDC investigators, the model was shown to be 600 times faster while maintaining high accuracy.
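For readers curious how such a two-stage triage might look in code, the following is a minimal Python sketch and not TowerScout’s actual implementation. The thresholds, the Detection type, and the second_stage callback are all hypothetical names and values chosen for illustration. The article does not state what happens to low and high detections, so the sketch assumes high ones are accepted and low ones are rejected without further review.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical cut-offs; the article does not give the real values.
LOW_MAX = 0.2   # scores below this count as "low"
HIGH_MIN = 0.8  # scores at or above this count as "high"


@dataclass
class Detection:
    """One object found by the first stage in an image tile (illustrative type)."""
    box: Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in tile pixels
    score: float                    # first-stage confidence, 0.0 to 1.0


def triage(score: float) -> str:
    """Bucket a first-stage score into low, intermediate, or high."""
    if score < LOW_MAX:
        return "low"
    if score >= HIGH_MIN:
        return "high"
    return "intermediate"


def classify_tile(
    detections: List[Detection],
    second_stage: Callable[[Detection], bool],
) -> List[Tuple[Detection, str, bool]]:
    """Return (detection, band, is_cooling_tower) for each detection.

    Assumption: "high" is accepted and "low" is rejected outright;
    only "intermediate" detections are sent to the second stage.
    """
    results = []
    for det in detections:
        band = triage(det.score)
        if band == "high":
            is_tower = True
        elif band == "low":
            is_tower = False
        else:
            is_tower = second_stage(det)  # extra checks only where needed
        results.append((det, band, is_tower))
    return results
```

The point of a design like this is that the more expensive second stage runs only on ambiguous detections, which is one plausible way a pipeline can stay fast without giving up accuracy on the hard cases.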

Since winning the Hal R. Varian Award, TowerScout has assisted the CDC with at least 24 Legionnaires’ disease outbreak investigations across 12 states and has been used to facilitate the creation of cooling tower registries, which are considered best practice to prevent and respond to outbreaks of the disease. Multiple jurisdictions such as Utah and Los Angeles are also adopting this technology to detect cooling towers in highly urbanized areas.

“We researched the goal of TowerScout thoroughly before we wrote a single line of code. Every time the CDC uses TowerScout for a Legionnaires’ disease investigation, it underscores how working closely with domain experts can take a data science project from a promising idea to something that has real-world impact,” said Wong.

“From the start, it has been important to our team to make TowerScout available to those who need it. We made our code open-source, presented at conferences and workshops for frontline public health workers, and met with local public health jurisdictions to help them implement TowerScout on their own. Publishing in The Lancet Digital Health as an Open Access article means this work can better reach the global public health community.”

The team also thanked advisors Fred Nugen and Alberto Todeschini for their assistance and mentorship throughout this project.


Top Master of Information and Data Science projects are awarded the Hal R. Varian MIDS Capstone Award, named in honor of professor emeritus Hal R. Varian, the founding dean of the Berkeley School of Information and currently the chief economist of Google.

Last updated: November 8, 2024