MIDS Capstone Project Summer 2024

DIVITIAE.AI: Empowering Your Investment Research with Artificial Intelligence

Team members

Problem & Motivation

Although numerous financial intelligence platforms exist, such as AlphaSense, Google Finance, and Apple Stocks, many retail investors still waste time navigating disparate sources to piece together financial insights. Meanwhile, sophisticated platforms such as the Bloomberg Terminal are unaffordable for most retail investors.

Our market research also indicates that 85% of stock investors seek to improve their investment decisions through better sources of information*.

Through our customer interviews, we have identified two significant customer needs: an efficient investment research process and a trusted source of information**.

*Based on an online survey conducted with 553 participants from July 3 to July 4, 2024.
**Based on interviews conducted with 17 potential target customers in June 2024.

Solution

DIVITIAE.AI addresses these needs by integrating diverse quantitative and qualitative data and leveraging state-of-the-art artificial intelligence techniques to generate actionable insights.

Our platform offers users key financial metrics and ratios, enabling a comprehensive evaluation of stocks. We calculate fundamental and intrinsic values based on net present value (NPV), and we assess market sentiment by analyzing a company's media presence to gauge hype. DIVITIAE.AI also delivers concise summaries of all relevant news and earnings calls, offering insight into the objectives and challenges facing management teams.
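To illustrate what an NPV-based intrinsic value calculation involves, here is a minimal discounted-cash-flow (DCF) sketch. It assumes a simple two-stage model: explicit free-cash-flow forecasts followed by a Gordon-growth terminal value. The function name, its inputs, and the example numbers are hypothetical illustrations, not DIVITIAE.AI's actual valuation model.

```python
def intrinsic_value_per_share(
    cash_flows: list[float],
    discount_rate: float,
    terminal_growth: float,
    shares_outstanding: float,
) -> float:
    """Hypothetical two-stage DCF: NPV of forecast free cash flows plus a
    discounted Gordon-growth terminal value, divided by shares outstanding."""
    if not cash_flows:
        raise ValueError("need at least one forecast cash flow")
    if discount_rate <= terminal_growth:
        raise ValueError("discount_rate must exceed terminal_growth")
    # Stage 1: discount each forecast year t = 1..n back to today.
    pv_forecast = sum(
        cf / (1 + discount_rate) ** t for t, cf in enumerate(cash_flows, start=1)
    )
    # Stage 2: value of a growing perpetuity at year n, discounted back n years.
    n = len(cash_flows)
    terminal_value = cash_flows[-1] * (1 + terminal_growth) / (discount_rate - terminal_growth)
    pv_terminal = terminal_value / (1 + discount_rate) ** n
    return (pv_forecast + pv_terminal) / shares_outstanding


# Dummy inputs for illustration only.
value = intrinsic_value_per_share([100.0, 110.0, 121.0], discount_rate=0.10,
                                  terminal_growth=0.02, shares_outstanding=50.0)
# value ≈ 28.64 (exactly 315/11)
```

The result is highly sensitive to the discount rate and terminal growth assumptions, which is one reason a tool would typically show the inputs alongside the computed value.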

By utilizing DIVITIAE.AI, investors can make informed decisions effortlessly, relying on trustworthy data and analysis to enhance their investment experience and support their long-term financial growth.

Data Source & Data Science Approach

DIVITIAE.AI employs a robust architecture to seamlessly integrate bleeding-edge AI technology and diverse data sources.

DIVITIAE.AI’s Data Processing Platform handles both quantitative and qualitative data. Within the platform, an orchestrator artifact defines the underlying infrastructure and automates job execution. All jobs use DIVITIAE ETL, our custom software package that streamlines common functions. Once processed, the data is stored in our data lake, ready to be served through the DIVITIAE.AI application.
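The sketch below illustrates the general pattern described above: an orchestrator runs jobs, each job reuses shared helper functions, and the results land in a data lake. Everything in it (`Job`, `run_jobs`, `normalize_ticker`, `write_partition`, the dict-based data lake, and the partition-key layout) is a hypothetical illustration; the real orchestrator and the DIVITIAE ETL package's API are not described in this summary.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable

# Hypothetical in-memory stand-in for the data lake: partition key -> records.
DataLake = dict[str, list[dict]]


def normalize_ticker(record: dict) -> dict:
    """Shared transform: store every ticker upper-cased and trimmed."""
    return {**record, "ticker": record["ticker"].strip().upper()}


def write_partition(lake: DataLake, dataset: str, run_date: date, records: list[dict]) -> str:
    """Shared loader: append records under a dataset/date partition key."""
    key = f"{dataset}/dt={run_date.isoformat()}"
    lake.setdefault(key, []).extend(records)
    return key


@dataclass
class Job:
    name: str
    dataset: str
    extract: Callable[[], list[dict]]  # source-specific fetch (an API call in practice)


def run_jobs(jobs: list[Job], lake: DataLake, run_date: date) -> list[str]:
    """Orchestrator stand-in: extract per job, then reuse the shared transform and loader."""
    keys = []
    for job in jobs:
        records = [normalize_ticker(r) for r in job.extract()]
        keys.append(write_partition(lake, job.dataset, run_date, records))
    return keys


lake: DataLake = {}
# Dummy record standing in for an API response.
jobs = [Job("daily_prices", "market_prices", lambda: [{"ticker": " aapl ", "close": 100.0}])]
keys = run_jobs(jobs, lake, date(2024, 7, 1))
# keys == ["market_prices/dt=2024-07-01"]
# lake["market_prices/dt=2024-07-01"] == [{"ticker": "AAPL", "close": 100.0}]
```

Keeping source-specific logic in each job's `extract` step while sharing transforms and loaders is one way a common ETL package can keep many jobs consistent.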

We use four different datasets and APIs.

For quantitative data processing, we use Polygon.io and SEC API. Polygon.io is a financial data provider that specializes in real-time and historical market data for stocks and cryptocurrencies; DIVITIAE.AI uses it to pull stock market prices. SEC API provides access to a wealth of financial and regulatory data on publicly traded companies; DIVITIAE.AI extracts information from annual and quarterly reports to compute intrinsic values.
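To show how the two quantitative sources complement each other, the sketch below combines a share price (the kind of value pulled from Polygon.io) with figures extracted from a filing (the kind of value pulled via SEC API) to compute a few standard ratios. It does not call either API; the `Fundamentals` fields, the `key_ratios` function, and the numbers are hypothetical, and the exact metrics DIVITIAE.AI displays are not listed in this summary.

```python
from dataclasses import dataclass


@dataclass
class Fundamentals:
    """Hypothetical subset of figures extracted from an annual or quarterly filing."""
    net_income: float          # trailing twelve-month net income
    shares_outstanding: float
    total_equity: float
    total_liabilities: float


def key_ratios(price: float, f: Fundamentals) -> dict[str, float]:
    """Combine a market price with filing figures into a few standard ratios."""
    eps = f.net_income / f.shares_outstanding
    book_value_per_share = f.total_equity / f.shares_outstanding
    return {
        "eps": eps,
        # P/E is not meaningful when earnings are zero or negative.
        "pe_ratio": price / eps if eps > 0 else float("nan"),
        "pb_ratio": price / book_value_per_share,
        # Here: total liabilities / equity; some definitions use total debt only.
        "debt_to_equity": f.total_liabilities / f.total_equity,
    }


# Dummy inputs for illustration only.
ratios = key_ratios(150.0, Fundamentals(1_000.0, 100.0, 2_000.0, 3_000.0))
# {'eps': 10.0, 'pe_ratio': 15.0, 'pb_ratio': 7.5, 'debt_to_equity': 1.5}
```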

For the generative AI features, we use the NewsCatcher API and the Finnhub API. NewsCatcher API aggregates news articles from various sources across the web and provides a streamlined way to access up-to-date news content; DIVITIAE.AI uses it for news summaries. Finnhub is an API provider that offers financial data and market information to developers, investors, and businesses; DIVITIAE.AI uses it to collect earnings call data, including transcripts.


For summary generation, DIVITIAE.AI uses a Retrieval-Augmented Generation (RAG) pipeline, which enhances large language models (LLMs) by grounding them in external knowledge sources. RAG lets us draw on up-to-date news and earnings call data, which we condense into concise summaries.
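Before retrieval can work, long sources such as earnings call transcripts are usually split into smaller passages. A common approach, assumed here rather than taken from the project, is to use fixed-size word windows with overlap, so that text near a boundary appears in both neighboring chunks. `chunk_words` and its default sizes are hypothetical.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_size words, each sharing `overlap` words
    with the previous window, for indexing in a retrieval store."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):  # this window reached the end
            break
    return chunks


transcript = " ".join(f"w{i}" for i in range(10))  # dummy 10-word "transcript"
print(chunk_words(transcript, chunk_size=4, overlap=2))
# ['w0 w1 w2 w3', 'w2 w3 w4 w5', 'w4 w5 w6 w7', 'w6 w7 w8 w9']
```

Production pipelines often chunk by tokens or sentences instead of whitespace-separated words; the overlap trade-off (more context per chunk versus more redundant storage) is the same.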

The user's query is transformed into a prompt used to retrieve relevant data from our proprietary data store. This prompt is then combined with the retrieved context and LLM instructions to form a prompt template, which serves as the input to the final LLM that generates the summary.
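The query, retrieval, and template steps described above can be sketched without external services. In this hypothetical example, a bag-of-words cosine similarity stands in for the embedding-based vector search a production RAG system would typically use, and the template wording and passages are invented. The final LLM call is left out because the model and API DIVITIAE.AI uses are not specified here.

```python
import math
import re
from collections import Counter


def _bow(text: str) -> Counter:
    """Lower-cased bag-of-words vector (a stand-in for an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the query."""
    q = _bow(query)
    return sorted(passages, key=lambda p: _cosine(q, _bow(p)), reverse=True)[:k]


# Hypothetical template: instructions + retrieved context + user question.
PROMPT_TEMPLATE = """You are a financial research assistant.
Answer the question by summarizing only the context below.

Context:
{context}

Question: {question}
Summary:"""


def build_prompt(question: str, passages: list[str], k: int = 2) -> str:
    """Combine the query, retrieved context, and instructions into one LLM input."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages, k))
    return PROMPT_TEMPLATE.format(context=context, question=question)


passages = [  # dummy passages for illustration
    "Revenue grew 12% year over year.",
    "Management discussed supply chain challenges in Asia.",
    "The annual shareholder meeting is in May.",
]
question = "What supply chain challenges did management mention?"
print(build_prompt(question, passages, k=1))
```

The string returned by `build_prompt` is what would be sent to the summarizing LLM; swapping `_bow` for a real embedding model changes retrieval quality but not the overall structure.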

Evaluation

Given the many components of the RAG pipeline and potential future developments, we adopted a holistic evaluation approach and tested 44 modeling combinations in total. The evaluation pipeline uses 47 metrics grouped into five categories: count metrics, semantic similarity metrics, n-gram metrics, RAGAS metrics, and inference time.
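A minimal sketch of how three of the five metric categories could be computed for one model combination on one example: count metrics (summary length and compression ratio), a simplified ROUGE-1 n-gram score, and wall-clock inference time. The ROUGE-1 here uses whitespace tokenization without stemming, so its scores will differ from standard ROUGE packages. Semantic similarity and RAGAS metrics require embedding or LLM models and are omitted. `evaluate`, `rouge1_f1`, and the dummy summarizer are hypothetical; the team's actual 47 metrics are not listed in this summary.

```python
import time
from collections import Counter
from typing import Callable


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1: unigram-overlap F1 between a candidate and a reference."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # matches clipped to the smaller count
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def evaluate(summarize: Callable[[str], str], source: str, reference: str) -> dict[str, float]:
    """Score one summarizer on one example across three metric categories."""
    start = time.perf_counter()
    summary = summarize(source)
    elapsed = time.perf_counter() - start
    n_summary = len(summary.split())
    return {
        "summary_words": float(n_summary),                                    # count
        "compression_ratio": n_summary / max(len(source.split()), 1),         # count
        "rouge1_f1": rouge1_f1(summary, reference),                           # n-gram
        "inference_seconds": elapsed,                                         # inference time
    }


def first_three_words(text: str) -> str:  # dummy "model" for illustration
    return " ".join(text.split()[:3])


scores = evaluate(first_three_words,
                  "revenue rose sharply while margins narrowed this quarter",
                  "revenue rose sharply")
# scores["summary_words"] == 3.0, scores["compression_ratio"] == 0.375, scores["rouge1_f1"] == 1.0
```

Running `evaluate` for every combination on a shared test set and averaging each metric gives a table that can be compared across pipeline variants.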

Key Learnings & Impact

Our team overcame several key challenges, drawing on valuable input from Capstone instructors and customers and on iterative design. Along the way, we gained many learning opportunities, including:

Technology Infrastructure - Working with new platforms and ensuring component compatibility
AI Engineering - Mastering new RAG and LLM technologies by testing 44 model combinations against 47 evaluation metrics
Problem Space & Customer Development - Narrowing the target audience and communicating value propositions, differentiators, and technical solutions based on survey and interview feedback
Intellectual Property Law - Managing risk in an evolving legal landscape for AI-generated content by including the sources of summarized data and monitoring legal decisions

We have already received positive feedback from potential customers:

"Loved the simplicity. Does not have to do any math on ratios or calculations. Really impressed by AI-generated summaries."
"The news and their earnings calls summarization feature isn’t available on any other platform."

We will continue improving the solution through an iterative feedback and development process. Integrating with leading brokerages or charting platforms (e.g., Fidelity with 45.1M retail investors and Robinhood with 24M customers) could expand our potential user base to 25-45M users without competing with those platforms.

Future Work

We plan to improve our application in the following areas:

Scale & Coverage of Intelligence - Expanding the selection of companies and providing more detailed financial information for each
Cohort Analysis - Developing and deploying comparison metrics and visualization components in the user interface
User Education - Adding an educational component that fosters financial literacy
Accessibility - Ensuring multi-platform compatibility to broaden access

Acknowledgements

We would like to express our gratitude to:

NewsCatcher Inc. and Finnhub for generously providing us with free access to their data.
Interview Participants for sharing valuable insights and suggestions with us.
Instructors Joyce Shen and Zona Kostic for their consistent and instrumental guidance throughout the project.
NLP instructors Mark Butler and Natalie Ahn for their extensive hands-on advice on NLP development.
All Capstone classmates for their constructive feedback and inspiring ideas.

Course

Data Science 210. Capstone, Summer 2024

Class Project Gallery

More Information

DIVITIAE.AI website

DIVITIAE.AI application

finalpresentation_divitiae.pdf

DIVITIAE.AI

Video

Last updated: August 6, 2024