MIDS Capstone Project Fall 2024

FreeFolio: Your Portfolio, Your Way

Problem & Motivation

Investing can be overwhelming for novice investors who lack financial expertise or access to affordable advice. Current solutions, such as human advisors or robo-advisors, are costly and often fail to provide personalized guidance aligned with users' values and goals. This leaves many retail investors without accessible, tailored investment tools.

FreeFolio addresses this gap by offering a free, AI-powered platform that simplifies portfolio creation through natural language inputs. By removing cost barriers and providing personalized, user-friendly tools, FreeFolio empowers novice and DIY investors to make informed financial decisions, democratizing access to investment opportunities.

Data Source & Data Science Approach

FreeFolio integrates structured and unstructured financial data through a ReAct-agent system to deliver personalized portfolio recommendations and investment insights. The pipeline architecture is designed for scalability and accuracy in portfolio management.

The platform relies on diverse financial data sources:

  1. Structured Data:
    • Market data is sourced via the YFinance API and stored in a SQL database. This includes metrics such as market capitalization, sector performance, and key financial ratios.
    • User-specific portfolio data is maintained dynamically in memory, enabling real-time modifications and insights.
  2. Unstructured Data:
    • Regulatory filings (e.g., 10-K and 10-Q documents) are retrieved from the EDGAR database. These documents are processed into semantic embeddings using Amazon SageMaker, and the embeddings are stored in a persistent VectorDB for quick retrieval.

These data sources ensure that FreeFolio provides both granular numerical analysis and contextual financial insights.
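As a rough illustration of the in-memory portfolio state described above, the sketch below keeps holdings in a plain dictionary and derives weights on demand. The `Portfolio` class, its method names, and the tickers are hypothetical and are not taken from the FreeFolio codebase.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Portfolio:
    """Hypothetical in-memory portfolio: ticker -> relative allocation."""
    allocations: Dict[str, float] = field(default_factory=dict)

    def set_allocation(self, ticker: str, amount: float) -> None:
        """Add a holding or overwrite its allocation."""
        if amount <= 0:
            raise ValueError("amount must be positive; use remove() to drop a holding")
        self.allocations[ticker.upper()] = amount

    def remove(self, ticker: str) -> None:
        """Drop a holding; unknown tickers are ignored."""
        self.allocations.pop(ticker.upper(), None)

    def weights(self) -> Dict[str, float]:
        """Allocations rescaled so that the weights sum to 1."""
        total = sum(self.allocations.values())
        return {t: a / total for t, a in self.allocations.items()} if total else {}


portfolio = Portfolio()
portfolio.set_allocation("AAA", 60)   # placeholder tickers, not real symbols
portfolio.set_allocation("BBB", 40)
weights_before = portfolio.weights()
portfolio.remove("AAA")
weights_after = portfolio.weights()
print(weights_before, weights_after)
```

Deriving weights from stored allocations, rather than storing weights directly, means every edit leaves the portfolio in a consistent state without a separate rebalancing step.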

Data Collection, Processing, and Integration

  1. Unstructured Data Pipeline (for RAG Agent):
    • Data Collection: Financial filings are scraped from the EDGAR database, compressed on EC2 storage, and uploaded to an S3 object store.
    • Preprocessing: The documents are chunked for efficient indexing.
    • Embedding Creation: SageMaker generates embeddings for semantic search, stored in a VectorDB for fast context retrieval.
    • Integration: The Retrieval-Augmented Generation (RAG) agent queries this pipeline for relevant textual information.

Figure 1: Unstructured Financial Filings Pipeline for RAG Agent

Figure 1 illustrates the unstructured data pipeline, highlighting the transformation from raw filings to query-ready embeddings.
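The chunk, embed, and retrieve steps of this pipeline can be sketched in a few lines. This is only an illustration under loud assumptions: a bag-of-words counter stands in for the SageMaker embedding model, a Python list stands in for the VectorDB, and the chunk sizes, function names, and sample text are all invented for the example.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into overlapping word windows (sizes are illustrative)."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text):
    """Toy stand-in for an embedding model: a term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, index, top_k=1):
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


# Made-up filing text, not an excerpt from a real 10-K.
filing = (
    "Risk factors: our revenue depends heavily on a small number of customers. "
    "Liquidity: we ended the year with cash and equivalents sufficient for operations. "
    "Legal proceedings: we are not party to any material litigation."
)
chunks = chunk_text(filing, chunk_size=12, overlap=0)
index = [(c, embed(c)) for c in chunks]  # plays the role of the VectorDB
top = retrieve("how much cash does the company have", index)
print(top[0])
```

A learned embedding model would match on meaning rather than shared words, but the retrieval step has the same shape: embed the query, rank stored chunks by similarity, and hand the best ones to the agent as context.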
 

  2. Structured Data Pipeline (for SQL Agent):
    • Data Collection: YFinance API.
    • Data Storage: Data is saved as flat files on EC2 and subsequently stored in an SQL database.
    • Integration: The SQL agent retrieves structured metrics for portfolio analysis.

Figure 2: Structured Data Pipeline for SQL Agent

Figure 2 presents the structured data pipeline, showing the streamlined processing and integration into the SQL database.
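A minimal sketch of the flat-file-to-SQL step, assuming a CSV layout and an in-memory SQLite database in place of the project's actual storage. The table name, columns, tickers, and figures are hypothetical placeholders rather than real market data or the FreeFolio schema.

```python
import csv
import io
import sqlite3

# Hypothetical flat file; tickers and numbers are made-up placeholders.
FLAT_FILE = """ticker,sector,market_cap,pe_ratio
AAA,Technology,500.0,30.0
BBB,Technology,300.0,20.0
CCC,Utilities,100.0,15.0
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fundamentals ("
    "ticker TEXT PRIMARY KEY, sector TEXT, market_cap REAL, pe_ratio REAL)"
)
rows = [
    (r["ticker"], r["sector"], float(r["market_cap"]), float(r["pe_ratio"]))
    for r in csv.DictReader(io.StringIO(FLAT_FILE))
]
conn.executemany("INSERT INTO fundamentals VALUES (?, ?, ?, ?)", rows)


def sector_summary(connection, sector):
    """The kind of parameterized query a SQL agent might issue for one sector."""
    return connection.execute(
        "SELECT COUNT(*), SUM(market_cap), AVG(pe_ratio) "
        "FROM fundamentals WHERE sector = ?",
        (sector,),
    ).fetchone()


count, total_cap, avg_pe = sector_summary(conn, "Technology")
print(count, total_cap, avg_pe)
```

Binding the sector as a query parameter, instead of formatting it into the SQL string, matters when the query text originates from a language model or from user input.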

Inference Pipeline

The inference pipeline orchestrates the interaction between data sources, tools, agents, and the user interface. This ensures seamless integration of multi-modal outputs:

  • SQL Agent: Handles structured data queries for portfolio metrics.
  • RAG Agent: Retrieves insights from unstructured textual data.
  • YFinance Plotting Tools: Visualize data such as historical stock returns and analyst recommendations.
  • Portfolio Update Tools: Allow users to adjust portfolio weights or components dynamically.

The orchestrator agent mediates these components, ensuring correct tool selection and accurate, user-friendly outputs.

Figure 3: Inference Pipeline Architecture

Figure 3 illustrates the inference pipeline, detailing the flow from data sources to the user interface.
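The orchestrator's tool-selection role can be illustrated with a small dispatch table. In the sketch below, keyword rules stand in for the language model's reasoning step, and each tool is a stub that returns a label; the tool names, keywords, and functions are assumptions made for illustration, not the project's implementation.

```python
from typing import Callable, Dict, Tuple


def sql_tool(question: str) -> str:
    return "[sql] would query structured metrics"


def rag_tool(question: str) -> str:
    return "[rag] would search filing embeddings"


def plot_tool(question: str) -> str:
    return "[plot] would draw a chart"


def portfolio_tool(question: str) -> str:
    return "[portfolio] would update holdings"


TOOLS: Dict[str, Callable[[str], str]] = {
    "sql": sql_tool,
    "rag": rag_tool,
    "plot": plot_tool,
    "portfolio": portfolio_tool,
}

# Keyword rules are a crude stand-in for the LLM's choice; order sets priority.
ROUTES = [
    (("plot", "chart", "graph"), "plot"),
    (("add", "remove", "weight"), "portfolio"),
    (("10-k", "10-q", "filing", "risk"), "rag"),
    (("market cap", "ratio", "sector"), "sql"),
]


def choose_tool(question: str) -> str:
    """Pick the first tool whose keywords appear in the question."""
    q = question.lower()
    for keywords, name in ROUTES:
        if any(k in q for k in keywords):
            return name
    return "none"


def orchestrate(question: str) -> Tuple[str, str]:
    """Return the chosen tool name together with that tool's output."""
    name = choose_tool(question)
    if name == "none":
        return name, "Could you rephrase the question?"
    return name, TOOLS[name](question)


for q in ("Plot historical returns for AAA", "Which sector has the highest market cap?"):
    print(orchestrate(q))
```

Returning the tool name alongside the output keeps each decision inspectable, which is what a tool-selection evaluation needs to score.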

Evaluation

FreeFolio employs a comprehensive evaluation framework tailored to the unique challenges of an AI-driven multi-agent system. The evaluation focuses on ensuring accuracy, relevance, and user satisfaction across structured and unstructured data interactions. Key components of the evaluation include:

  1. Evaluation Criteria:
    • ReAct Agents:
      • Tool Selection: Determining whether the correct tool is invoked for a given task.
      • Tool Argument Choice: Evaluating whether the agent provides the appropriate arguments for the selected tool.
      • Text Output: Assessing the clarity, relevance, and accuracy of the response, scored on a 1-5 scale.
    • SQL Agent:
      • Validates the construction of queries and the extraction of correct answers from structured data.
    • RAG Agent:
      • Ensures the relevance of retrieved unstructured data to the query.
      • Evaluates whether the extracted context and summary are accurate and meaningful.
  2. Evaluation Challenges:
    • State Dependence: User interactions, such as chat history and portfolio state, influence responses, complicating isolated tests.
    • NLP Complexity: Ambiguity and multiple valid outputs require nuanced evaluation.
    • Multi-Modal Outputs: Different response types, such as visualizations or textual explanations, add layers to assessment.
  3. Manual Evaluation Process:
    • A curated question bank tests the system’s functionality in sequence, ensuring stateful interactions.
    • Evaluation metrics are applied iteratively, recording performance and identifying areas for improvement.
    • Specific attention is given to multi-tool queries to ensure seamless orchestration between agents.
  4. Interactive Multi-Modal ReAct Agent Evaluation:
    • Portfolio Bot: Tested with actions such as adding, removing, or adjusting portfolio weights, as well as querying portfolio characteristics.
    • YFinance Bot: Evaluated for accuracy and visualization capabilities, such as plotting historical returns, dividends, and earnings.
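The three ReAct criteria listed above (tool selection, tool argument choice, and the 1-5 text score) could be tallied over a question bank roughly as follows. The record fields, the aggregation rules, and the sample entries are assumptions for illustration; they are not the project's actual question bank or results.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class EvalRecord:
    question: str
    expected_tool: str
    actual_tool: str
    expected_args: Dict[str, str]
    actual_args: Dict[str, str]
    text_score: int  # manual rating on the 1-5 scale

    @property
    def tool_ok(self) -> bool:
        return self.expected_tool == self.actual_tool


def summarize(records: List[EvalRecord]) -> Dict[str, float]:
    """Aggregate the three criteria over a list of manually graded records."""
    if not records:
        raise ValueError("no records to summarize")
    for r in records:
        if not 1 <= r.text_score <= 5:
            raise ValueError(f"text_score out of range for: {r.question}")
    # Arguments are only judged when the right tool was chosen in the first place.
    with_right_tool = [r for r in records if r.tool_ok]
    arg_hits = sum(r.expected_args == r.actual_args for r in with_right_tool)
    return {
        "tool_selection_accuracy": len(with_right_tool) / len(records),
        "argument_accuracy": arg_hits / len(with_right_tool) if with_right_tool else 0.0,
        "mean_text_score": sum(r.text_score for r in records) / len(records),
    }


# Invented records for illustration only.
records = [
    EvalRecord("Add AAA at 10%", "portfolio", "portfolio",
               {"ticker": "AAA", "weight": "0.10"}, {"ticker": "AAA", "weight": "0.10"}, 5),
    EvalRecord("Plot AAA dividends", "plot", "plot",
               {"ticker": "AAA", "series": "dividends"}, {"ticker": "AAA", "series": "earnings"}, 3),
    EvalRecord("What risks does the AAA 10-K mention?", "rag", "sql", {}, {}, 2),
    EvalRecord("Remove AAA", "portfolio", "portfolio",
               {"ticker": "AAA"}, {"ticker": "AAA"}, 4),
]
summary = summarize(records)
print(summary)
```

Because responses depend on chat history and portfolio state, records like these would need to be collected by running the questions in their fixed order rather than in isolation.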

FreeFolio's evaluation framework is designed to verify that its AI agents deliver reliable, accurate, and user-centric financial guidance. Applying it iteratively improves the system's ability to meet the complex needs of novice investors.

Key Learnings & Impact

FreeFolio represents a significant step toward democratizing investment access. By eliminating advisor fees, users can save roughly 0.15% to 1% of their portfolio value annually, which puts personalized portfolio management within reach of millions of retail investors. Moreover, the tool empowers users to make informed financial decisions, enhancing their confidence and reducing barriers to entry in the investment space.
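To give a sense of scale for the 0.15% to 1% fee range, the sketch below compounds a lump sum with and without an annual percentage-of-assets fee. The starting balance, return, and horizon are illustrative assumptions, not figures from the project, and the fee model (deducted once at the end of each year) is a simplification.

```python
def final_value(initial, annual_return, annual_fee, years):
    """Grow a lump sum, deducting a percentage-of-assets fee at each year end."""
    value = initial
    for _ in range(years):
        value *= (1 + annual_return) * (1 - annual_fee)
    return value


def fee_cost(initial, annual_return, annual_fee, years):
    """Difference in ending balance between a fee-free and a fee-paying account."""
    no_fee = final_value(initial, annual_return, 0.0, years)
    with_fee = final_value(initial, annual_return, annual_fee, years)
    return no_fee - with_fee


# Illustrative assumptions only: $10,000 invested, 6% annual return, 30 years.
for fee in (0.0015, 0.01):
    cost = fee_cost(10_000, 0.06, fee, 30)
    print(f"annual fee {fee:.2%}: ending balance lower by ${cost:,.0f}")
```

Because the fee is charged on a growing balance every year, its cumulative cost grows faster than the headline percentage suggests.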

The project demonstrates the power of AI to simplify complex financial processes. Key technical achievements include:

  • Integration of a ReAct workflow to handle diverse data sources and allow natural language interaction.
  • Natural language data visualization that simplifies decision-making for novice investors by creating dynamic visualizations of their proposed portfolios.
  • Development of an AI-driven chatbot that goes beyond static Q&A to provide actionable investment strategies.

FreeFolio’s impact extends beyond individual users, contributing to the broader mission of financial inclusivity and education. With planned expansions into mobile platforms and enhanced functionalities, the project is poised to redefine how people engage with investing.

Acknowledgements

We sincerely thank our Capstone instructors, Zona Kostic and Todd Holloway, for their unwavering support and insightful feedback throughout the semester. Their clear guidance helped bring the project to its current level of success.

Additional Credit

Data and Code Sources

  • Yahoo! Finance
  • SEC.gov website
  • LlamaIndex User Example Guides

Technology Stack

  • Streamlit (for interactive web application)
  • Python (for scripting and data processing)
  • AWS Services (for hosting and cloud-based solutions)
Last updated: December 11, 2024