Brooke, the AI Broker
Problem & Motivation
Buying a home is not only one of the largest financial decisions a person will make in their life but can also be one of the most overwhelming and stressful. In fact
40% of Americans said that buying a house is the most stressful event in modern life
As a result of increased cost of living and rising housing prices, the median age for first-time home buyers has reached a record high of 38 years old. Furthermore, new real estate rules coming in Summer 2025 will only increase the financial burden placed on many home buyers by requiring buyers to pay their agents directly. For a median-priced home in the United States, this comes out to an additional $12.8K- $15.3K in additional costs that buyers may no longer be able to overlook. However, paying an agent is often not the ultimate solve.
Home buyers say the purchasing process is particularly difficult due to:
- Having unhelpful agent (24%)
- Struggling with negotiations (28%)
- Exceeding their budgets (40%)
Our project aims to empower users to take control of their homebuying experience by providing them with expert level guidance and support at their fingertips. BROOKE provides all the benefits of a traditional broker, without the cost and with ease of access. By integrating AI into real estate, we can help alleviate some of the financial burden of the homebuying process, allowing more individuals to buy their dream home.
Data Source
To power Brooke we made use of three key datasets: Zillow’s real estate queries, Redfin’s housing market data, tax brackets and tax rates from taxfoundation.org, mortgage rates from Freddie Mac MBS, and homeowners insurance costs from Policy Genius. The data from Zillow consists of over 20,000 synthetically generated questions and legally compliant responses covering a range of real estate specific topics. The Redfin data consists of historical home listing prices, home sale prices, active listings, and number of days on the market.
Data Science Approach
Brooke is designed to provide a holistic view of the real estate market while providing personal support and relieving the stress that comes with buying a home through three key features : 1) Chat with Brooke 2) Budget Calculator 3) Market Trends.
Chat with Brooke
To power conversation with Brooke, we opted to use the open source model DeepSeek-R1 Distill LLaMa 8B because of its high level reasoning skills as well as its fast and efficient processing. This ensures high quality responses without the wait time associated with a live broker.
In order to ensure our chatbot generates both helpful as well as legally compliant responses aligned with the Fair Housing Act and the Equal Credit Opportunity Act, we fine tuned the model using three key datasets:
Zillow’s dataset of real estate focused queries and responses generated using GPT-40
Our proprietary dataset of compliance-safe responses generated using Claude (Anthropic)
Redfin property data
To optimize performance, we fine tuned the model using Unsloth which allows for efficient memory, using 70% less memory and running 2x faster while maintaining high reliable model performance.
Budget Calculator
- Loaded, and cleaned regional level insurance and tax data. Structured the data for further analysis
- Established classification process based on property type to interact with user inputs
- Aggregated data by region to calculate housing cost ranges
- Consulted with financial expert to ensure accuracy
Market Trends
Evaluation
To ensure that chatting with Brooke delivers accurate and compliant results, we conducted an evaluation process comparing Brooke to other open source models such as:
- LLaMA3
- Mistral Small
- DeepSeek-R1(8B Distilled)
We benchmarked all of the models against Claude 3 Sonnet. We opted to use Claude 3 Sonnet because of its reasoning abilities while also producing safe and compliant answers at a high speed. We tested against 2,000 real estate focused queries, half of which were focused on safety/compliance and half that prioritized usefulness of the model. We chose to measure performance by using the BLEURT Score because it is excellent at focusing on the content of the responses rather than prioritizing exact word matches as well as its effectiveness for evaluating longer conversations with Brooke. Our model outperformed all 3 models that we compared it to across both safety and usefulness indicating that Brooke provides both highly accurate and legally compliant responses.
Key Learnings & Impact
At the start of the semester, we struggled to find data that would enable us to bring our vision to life while also differentiating ourselves from existing real estate platforms.
Acknowledgements
We would like to extend our gratitude to Danielle Cummings and Todd Holloway for their guidance and support throughout the past semester as we brought our vision to life. You have provided insights that challenged us to improve our product in ways we on our own would not have considered. We would also like to thank Sandra Dasaad,---, and —-, our domain experts who helped determine the vision and viability of this project by sharing domain expertise and background to how Brooke could be best utilized in the market. Lastly, we would like to thank—, our financial expert who ensured our budget calculations are sufficient and up to par.