Jan 24, 2025

Master’s Project Develops Forecasting Tool to Optimize Data Center Water Usage

5th Year Master of Information and Data Science (MIDS) alums Marlon Fu, Austin Ho, Nora Povejsil, Suhas Prasad, and Derek Yao are the winners of the 5th Year MIDS Capstone Award for their project, HydroScale. The project aims to improve how data centers use water by providing granular, 72-hour-ahead forecasts of water usage effectiveness (WUE) across the United States. HydroScale combines weather, energy, and operational data to estimate where water use will be most efficient, with the goal of optimizing water management at data centers.

We interviewed the team to learn more:

What inspired your project?

Marlon: The growing demand for AI has never been clearer, as weekly headlines about the latest models, hardware, and investments show. Before our capstone semester, I attended the Asian American Pioneer Medal (AAPM) Symposium at Stanford University and heard several professors, including Fei-Fei Li, Arun Majumdar, and Raj Reddy, discuss their top concerns for AI in the near future. One recurring theme was AI sustainability: the share of resources consumed by data centers is expected to more than double by the end of the decade to support the growing AI industry.

Around the same time, I completed an internship at a renewable energy company that introduced me to some of the industry’s current solutions to this problem. One is the power purchase agreement, a contract that lets customers procure energy from renewable sources. As I eventually learned, these contracts are increasingly common between renewable energy providers and data centers in particular. This experience led me to consider other ways to make data centers more sustainable, and how data could help answer some of those questions.


What was the timeline or process like from concept to final project?

Nora: Our initial pitch, centered on AI sustainability, focused on using on-site renewable energy to power data centers. To refine that idea, we spent the early stages of the project researching the current literature on data center sustainability. While reading everything from white papers and news articles to corporate sustainability pages for companies like Meta and Google, we noticed that water usage kept coming up alongside energy concerns. This really jumped out at both Marlon and me, especially because I had worked on a water sustainability initiative before and had originally pitched a water-related capstone project. Because data center water sustainability seemed more opaque and less thoroughly researched, we decided as a team that there was a great opportunity to pivot from energy to water.

With our new focus on water, we began interviewing researchers and stakeholders in the field. Key voices included private data center operators and a team of researchers at UC Riverside, whose paper “Making AI Less ‘Thirsty’: Uncovering and Addressing the Secret Water Footprint of AI Models” we drew on heavily. Our stakeholder meetings taught us that data center operations are somewhat secretive, so both data availability and sustainability research are limited. We also came to understand some of the existing knowledge gaps around water efficiency. Data center operators wanted to know about long-term water availability across geographies to plan future construction projects, how to schedule computing jobs in a geospatially aware way to minimize water usage, and how much water they use indirectly through electricity consumption (sometimes called “embedded water”). This shaped the scope of our project. Our goal became to give data center operators a forecast of water sustainability, using a metric called “water usage effectiveness” (WUE), which measures liters of water consumed per kilowatt-hour of energy, and to shed light on the geospatial and temporal dynamics that affect data center water consumption.

From there, it was off to the races. We expanded on the UC Riverside team’s methodology to build five years of hourly WUE data for more than 1,000 points across the U.S., developed clustering algorithms to reduce the number of forecasting models we needed, and tried multiple modeling approaches, including foundation models, traditional statistical models, and neural networks, to forecast both on-site data center WUE and indirect, or “off-site,” WUE. More details on our technical process are available on our project website.
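
The team’s exact clustering features and forecasting models are documented on their project website and are not reproduced here. As a minimal sketch of why clustering reduces the number of models to train, the Python below assumes that each location is summarized by its average daily WUE profile (in liters per kWh). It groups locations with plain k-means and then fits one forecast per cluster instead of one per location. For simplicity it substitutes a seasonal-average baseline for the SARIMA, neural network, and foundation models the team actually used. The function names and the synthetic WUE values are hypothetical.

```python
import math

def synthetic_site(base, amplitude, days=7):
    """Hypothetical hourly WUE series (L/kWh) with a daily cycle; not real data."""
    return [base + amplitude * math.sin(2 * math.pi * (h % 24) / 24)
            for h in range(24 * days)]

def daily_profile(series, period=24):
    """Average WUE at each hour of the day, used as the site's clustering feature."""
    sums, counts = [0.0] * period, [0] * period
    for i, value in enumerate(series):
        sums[i % period] += value
        counts[i % period] += 1
    return [s / c for s, c in zip(sums, counts)]

def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-point seeding."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from every seed chosen so far.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid...
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # ...then move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def seasonal_forecast(profile, horizon=72):
    """Seasonal-average baseline: repeat the cluster's daily profile for 72 hours."""
    return [profile[h % len(profile)] for h in range(horizon)]

# Three hypothetical high-WUE sites and three hypothetical low-WUE sites.
sites = ([synthetic_site(b, 0.8) for b in (2.0, 2.1, 1.9)]
         + [synthetic_site(b, 0.2) for b in (0.5, 0.6, 0.4)])
profiles = [daily_profile(s) for s in sites]
labels, centroids = kmeans(profiles, k=2)

# One forecast per cluster; each site reuses its cluster's forecast.
forecasts = {j: seasonal_forecast(c) for j, c in enumerate(centroids)}
site_forecasts = [forecasts[lab] for lab in labels]
print(f"{len(sites)} sites -> {len(forecasts)} cluster-level models")
```

In this toy run, six synthetic sites collapse into two cluster-level models. At the scale described above, the same idea means fitting one model per cluster rather than one for each of the 1,000-plus locations.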

We all feel lucky to have learned so much during this project. Some of the biggest technical learning moments came from designing a data architecture, learning how to store results in an AWS S3 bucket, and integrating everything into our final interactive web tool. We also came to understand AI water sustainability, a problem space none of us had prior experience with, which was incredibly rewarding. For our final presentation, our hope was, and continues to be, to communicate the details of our very complex project as clearly as possible so that others can come to understand some of the environmental dimensions of the AI boom and how data science can help address critical sustainability concerns.

How did you work as a team?

Derek: Many of us had collaborated on past projects, which gave us a strong understanding of each other’s work styles, interests, and strengths. From the very beginning, we focused on aligning on our capstone goals. We knew our work would go through many ups and downs, and we wanted to make sure our team chemistry was strong enough to push through and achieve the goals we set.

As expected, it wasn’t always smooth sailing. We had to pivot many times, but because we were comfortable enough with each other to share honest opinions about the project’s direction and our concerns, we came back stronger each time. A key lesson was how important consistent, clear communication is, especially for creating a space where everyone can ideate and deliberate together.

Organization was another key to our success. Nora was a great project manager and made sure we met our progress goals. Marlon was invaluable as our resident subject matter expert, shaping the bigger vision and direction of the project, especially after his deep dive into the literature at the start. Each of us handled different aspects of the project, and together we worked to unify those components. Without any one member of this team, we likely wouldn’t be where we are today.

How did your I School curriculum help prepare you for this project?

Suhas: We see the MIDS curriculum’s core goals as 1) instilling the fundamentals of tackling a problem from a data-informed perspective and 2) learning how to keep up with the rapidly evolving landscape of statistical, computational, and machine learning methods.

On problem-solving, we learned how to interview stakeholders effectively, design research questions and methodologies, and evaluate our results. On the technical side, each member of our team brought knowledge from a wide range of elective courses, from time series modeling and data visualization to GenAI and natural language processing. With our different backgrounds and areas of study, we were able to collaborate effectively and iteratively and learn from each other over the course of the capstone project.

MIDS 271: Statistical Methods for Discrete Response, Time Series, and Panel Data was especially useful to me for this project. It showed us the value of scalable statistical models (e.g., SARIMA, GARCH) applied at the cluster level, compared with the computationally expensive alternatives of training a thousand individual recurrent neural networks (RNNs) and invoking foundation models at the city level. At times we would chuckle at the irony of using excess computing power to forecast its own environmental impact!

As a team, we all recognize the value of the MIDS curriculum’s project-based approach. Having each completed multiple end-to-end projects in our courses, we were confident in our ability to hit the ground running and pick up new knowledge and tools when needed.

Do you have any future plans for the project?

Austin: To expand HydroScale’s potential impact on data center operators, we plan to add support for real-time inference on our platform. By using live-streamed data from partnered data centers or public weather stations, we hope to create even more actionable insights to help optimize water usage for years to come. Our team would also like to refine our on-site simulations to accommodate the different operating specifications of individual data centers. As we implement these major updates, we will also pursue opportunities with accelerators such as Berkeley SkyDeck to help HydroScale reach its full potential.

Additionally, we are exploring a potential collaboration with CentralAxis, a startup focused on designing and managing the next generation of data centers. Through this partnership, we hope to help future data centers use water more effectively and operate more sustainably.

How could this project make an impact, or, who will it serve?

Derek: Water usage effectiveness (WUE) varies widely across locations and over time, which makes prediction that is both highly accurate and computationally efficient a significant challenge. Our work addresses this challenge by developing more computationally efficient methods for the large-scale forecasts needed to analyze water sustainability in data centers.

One practical application of our project is sustainable data center planning. Building a data center involves significant fixed costs, and site selection depends heavily on the availability of both power and water. Accurate projections of WUE over 15 to 30 years could provide valuable insight into the sustainability of water resources in prospective locations. Such forecasts must also address ethical imperatives, including equitable water access and which communities would be most affected. By integrating these dimensions, decision-making tools can promote sustainable growth that balances industrial needs with broader societal responsibilities.

Anything else you’d like to share?

Thank you to Shaolei Ren, Pengfei Li, and their team at the University of California, Riverside, whose input, methodology, and data we used to design our own processes.
Thank you to the industry experts whose interviews and direct feedback helped us deepen our understanding of the data center space. These individuals included Ivy Chan and her team at Equinix (Ivan Benitez, Arjan Westerof, Michael Gramani) as well as Jim Gao at Phaidra.ai.
We also deeply thank our capstone instructors, Joyce Shen and Morgan Ames, for their continued guidance and mentorship throughout this project.