Amber Zheng, Yiqin Zhou, Maureen (Marty) Fromuth, Mahmoud Ghanem, and Brendan Lundquist are the five recent Master of Information and Data Science (MIDS) alums behind QuillQuery, a capstone project that uses artificial intelligence to manage and retrieve email content from a cluttered, disorganized inbox.
The project was awarded the Hal R. Varian MIDS Capstone Award for Fall 2024.
We interviewed the team to learn more.
What inspired your project?
Marty: Many of us struggled at work with high email volumes, trying to find that one email with the information we needed. It was even harder because we also had a separate account, with a totally different email provider, for our master's program. Balancing the workloads of work and school is challenging, and with some of our teammates growing their families, gaining back time by making email more efficient seemed like a straightforward way to help us do it all and do it well!
What was the timeline or process like from concept to final project?
Marty: In total, we went from concept to final project in ten weeks. We had a fairly strong concept for our project and its key features from the beginning, which allowed us to jump in and quickly begin developing the front end and the basic components of the retrieval-augmented generation (RAG) pipeline in the back end. During this development period, we also tested various elements and parameters of the RAG pipeline to ensure the highest quality of responses. We had a basic prototype running in about seven weeks and a more refined, better-performing version within nine weeks. We spent about a week beta testing, specifically on our web interface, and made small tweaks during the final week of development.
How did you work as a team? How did you work together as members of an online degree program?
Mahmoud: Our team worked incredibly well, despite being in different locations and balancing personal and professional commitments. Weekly video meetings kept us aligned, and we relied on collaborative tools like Slack and shared Drive folders and notebooks to stay connected. One of the most rewarding aspects was seeing how our diverse backgrounds naturally led us to specialize in different parts of the project, from building the RAG pipeline and working with Amazon Web Services (AWS) to designing the user interface and managing the project. The online nature of the program prepared us to collaborate efficiently in a remote setting, and our motivation for the project kept us working hard until the last minute.
How did your I School curriculum help prepare you for this project?
Marty: Two of our team members had taken DATASCI 267: Generative AI, and another had completed DATASCI 266: Natural Language Processing, both offered by UC Berkeley's MIDS program. These courses gave us a strong framework for understanding the basic components of a RAG pipeline, the evaluation metrics we could use to test parameters, and the challenges associated with large language models. Similarly, thanks to core courses such as Fundamentals of Data Engineering, we had experience working with AWS and building ETL pipelines.
Mahmoud: I learned how to evaluate and fine-tune models, which I applied to QuillQuery, while our machine learning coursework taught us about ETL pipelines and embeddings. The I School's interdisciplinary approach also encouraged me to think critically about user needs and ethics when building AI solutions.
Do you have any future plans for the project?
Marty: We are all extremely excited about the results of this project and how well we were able to address search needs for such a unique data type (i.e., emails). That said, we have several other ideas we'd like to build into this prototype going forward.
First, we'd very much like to expand the types of data a user can search for within an email. For example, many emails have attachments, and many today contain images with information embedded in the email body itself. Currently, our prototype searches only text, using a text-based embedding model. However, integrating and testing a multimodal embedding model, along with adding preprocessing steps for email attachments, may allow us to extend search to these additional types of email content.
Second, we want to find ways to improve our models, so we would like to integrate a feedback feature (e.g., thumbs up vs. thumbs down) on both the summary response and the retrieved sources.
Third, and most importantly, we want to package this tool as a browser plug-in with access to other email providers, such as Yahoo and Outlook, so users can search across all of their providers without having to adopt yet another email application.
Ultimately, we are excited to keep iterating on this prototype, with the goal of entering it into competitions and possibly offering it to email users everywhere!
How could this project make an impact, and who will it serve?
Brendan: We felt that this was something that could ultimately be put into the hands of anyone who is juggling many emails and constantly searching for something in their inbox. One interesting idea shared with us during our final project presentation was the potential to apply such a tool in the legal domain, where defendants and plaintiffs could use it during the discovery phase of legal proceedings!
Mahmoud: QuillQuery can benefit anyone overwhelmed by a cluttered inbox, like professionals who are also part-time students like us. It quickly locates critical messages, organizes academic correspondence, and streamlines inbox management. By handling the repetitive, tedious work, QuillQuery frees you to be more creative or spend more time with family, which is exactly what we need from AI.