MIDS Capstone Project Fall 2024

QuillQuery: Revolutionizing Email Search for Smarter Summaries

Problem & Motivation

The average business user handled around 126 emails per day in 2019, a number that has only grown with the rise of remote work. Today, professionals spend nearly 28% of their workday managing email, and many cope by either clearing out their inbox entirely or leaving everything in it. This creates a strong need for more productive email management. Despite the critical role email plays in professional communication, platforms like Gmail and Outlook still rely heavily on rules-based and keyword search, making it tedious and inefficient to find relevant information.

In a typical business setting, emails flow internally and externally, across numerous threads and from multiple contacts, making it easy for important information to get lost. Our target users need fast and reliable access to critical information in their inboxes to make timely business decisions. QuillQuery aims to bridge this gap, providing a smarter, more efficient way to manage and retrieve essential email content.

Our Mission

We're reimagining email search, giving users the power of the pen in how they sift through their digital correspondence. By harnessing cutting-edge artificial intelligence approaches, we let users pick their favorite language wizard to conjure up lightning-fast results across all their inboxes. From Gmail gardens to Outlook oases, we empower users to craft queries that unlock the secrets of their inbox with consistent ease and elegance.

Architecture 

Data Sources & Data Science Approach

QuillQuery uses Retrieval-Augmented Generation (RAG) to enable seamless, flexible search with the user’s preferred large language model (LLM) across platforms like Gmail and Outlook.
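Because the LLM is swappable, the pipeline has to stay independent of any single provider. The snippet below is a minimal sketch of that idea, not QuillQuery's actual code: the interface and function names are illustrative assumptions. Retrieved email chunks are folded into a grounded prompt and handed to whichever backend the user selected.

```python
from typing import Protocol


class LLMBackend(Protocol):
    """Any chat-completion provider the user prefers; the name is illustrative."""

    def complete(self, prompt: str) -> str:
        ...


def answer_query(query: str, retrieved_chunks: list[str], llm: LLMBackend) -> str:
    """Build a prompt grounded in retrieved email chunks and ask the chosen LLM."""
    context = "\n\n".join(retrieved_chunks)
    prompt = (
        "Answer the question using only the email excerpts below.\n\n"
        f"Emails:\n{context}\n\nQuestion: {query}"
    )
    return llm.complete(prompt)
```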

We develop and tune our RAG pipeline on the Enron dataset, a rich collection of over 500,000 real-world emails from roughly 150 users, improving the pipeline's ability to retrieve and generate accurate, contextually relevant results across diverse email scenarios.

Before indexing, we apply character-based chunking that splits emails at paragraph, line-break, or word boundaries to retain context. For shorter emails, we merge the subject line with the body text, which improves summaries and the model’s accuracy in retrieving and generating relevant responses.
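The sketch below illustrates that preprocessing step. It is a hedged example rather than QuillQuery's exact code: the splitter shown is LangChain's RecursiveCharacterTextSplitter, and the chunk sizes and the 200-character "short email" threshold are assumptions for illustration.

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Split on paragraphs first, then line breaks, then words, so local context stays intact.
splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n", " "],
    chunk_size=1000,    # illustrative chunk size, in characters
    chunk_overlap=100,  # small overlap so ideas aren't cut mid-thought
)


def chunk_email(subject: str, body: str, short_email_threshold: int = 200) -> list[str]:
    """Chunk one email; prepend the subject to short bodies so summaries keep context."""
    if len(body) < short_email_threshold:
        body = f"Subject: {subject}\n{body}"
    return splitter.split_text(body)
```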

Model Evaluation 

We assessed our model through a comprehensive evaluation of multiple pipeline configurations, totaling 192 runs. Each pipeline combined a standard embedding layer, a cosine similarity retriever, a shared prompt, and cross-encoder reranking. Generated responses were compared to gold-standard answers for 10 user-specific questions, tested on emails from a selected Enron employee. After narrowing down to four optimal parameter combinations, we conducted additional evaluations using questions and emails from a second Enron employee to finalize the selection.

To measure performance, we used a combination of state-of-the-art evaluation metrics and human review. The automated score was a weighted average of BLEU (5%), RAGAS faithfulness (15%), RAGAS answer relevancy (40%), and RAGAS answer correctness (40%). Each automated metric was supplemented with human review of generated answers to ensure quality and accuracy.
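Concretely, each generated answer receives a single composite score from those weights. The snippet below is a minimal sketch of the weighting only; the function name is ours, and the metric values in the example are made up (in practice they would come from the BLEU and RAGAS evaluators).

```python
# Weights from the evaluation design: BLEU 5%, RAGAS faithfulness 15%,
# RAGAS answer relevancy 40%, RAGAS answer correctness 40%.
WEIGHTS = {
    "bleu": 0.05,
    "ragas_faithfulness": 0.15,
    "ragas_answer_relevancy": 0.40,
    "ragas_answer_correctness": 0.40,
}


def composite_score(metrics: dict[str, float]) -> float:
    """Weighted average of the four automated metrics, each scaled to [0, 1]."""
    return sum(WEIGHTS[name] * metrics[name] for name in WEIGHTS)


# Example with made-up metric values for one generated answer:
print(composite_score({
    "bleu": 0.31,
    "ragas_faithfulness": 0.88,
    "ragas_answer_relevancy": 0.92,
    "ragas_answer_correctness": 0.85,
}))  # ≈ 0.86
```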

Key Learnings & Impact 

This project demonstrated that tuning retrieval parameters such as top_k, context length, and sensitivity is crucial for optimizing response accuracy and relevance. Metadata-based querying, such as filtering by sender or date, enhanced the model’s performance on complex queries, while combining automated metrics with human review ensured responses were both accurate and user-friendly.
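To make the metadata idea concrete, the sketch below shows what metadata-aware retrieval with a tunable top_k can look like in principle. It is an illustration only: the chunk structure, field names, and plain-NumPy cosine similarity are assumptions, not QuillQuery's internals.

```python
import numpy as np


def retrieve(query_vec: np.ndarray,
             chunks: list[dict],  # each: {"text", "embedding", "sender", "date"}
             top_k: int = 5,
             sender: str | None = None) -> list[dict]:
    """Optionally pre-filter chunks by metadata, then rank by cosine similarity."""
    if sender is not None:
        chunks = [c for c in chunks if c["sender"] == sender]

    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["embedding"]), reverse=True)
    return ranked[:top_k]
```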

The pipeline’s flexibility also makes it adaptable to various platforms, with potential for future integration into web and mobile email applications. User-focused features like attachment search and sensitivity settings boosted usability, paving the way for broader adoption. Overall, this work can greatly improve productivity for users managing large volumes of information and contributes to NLP research in retrieval and summarization for context-rich queries.

Future Work

  • Towards Seamless Integration: Our MVP is currently available as a web-based application. To start the enhanced email query experience, users log in and grant email access, with options to load emails via “Drag & Drop” or “Log Into an Account.” Looking ahead, we aim to provide an integrated in-app experience through a plug-in compatible with any web-based or smartphone email app.
  • Sensitivity Adjustment: Introduce a “sensitivity” setting for users, allowing them to increase the top_k parameter and add more context as needed. Higher sensitivity levels will provide deeper retrieval and more context for sensitive queries (see the sketch after this list).
  • Enhanced Retriever Performance with Metadata: Improve query handling by enabling the retriever to incorporate metadata fields (e.g., sender, date, subject) for more targeted results. This will allow users to run specific queries like, “Summarize the emails Jeff sent me last Monday on Project Lemonade,” improving the accuracy and relevance of responses.
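
The sensitivity control could take a shape like the following. This is purely a hedged sketch: the level names and parameter values are hypothetical, not planned product settings.

```python
# Hypothetical mapping from a user-facing sensitivity level to retrieval parameters:
# higher sensitivity retrieves more chunks and allows a longer context window.
SENSITIVITY_PRESETS = {
    "low":    {"top_k": 3,  "max_context_chars": 2_000},
    "medium": {"top_k": 5,  "max_context_chars": 4_000},
    "high":   {"top_k": 10, "max_context_chars": 8_000},
}


def retrieval_params(sensitivity: str) -> dict:
    """Look up retrieval settings for the chosen sensitivity level (defaults to medium)."""
    return SENSITIVITY_PRESETS.get(sensitivity, SENSITIVITY_PRESETS["medium"])
```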

Acknowledgements

Last updated: November 14, 2024