MarketEdge: A Time on Market Prediction Tool
Problem & Motivation
The real estate market is highly dynamic, with property values and market conditions constantly changing due to various factors such as local economic trends, interest rates, neighborhood developments, and broader market sentiment. Homeowners, investors, and real estate professionals often struggle to determine the best time to sell a property and accurately estimate how long it will take to sell once listed. This lack of clarity can result in missed opportunities, prolonged market durations, and financial losses due to suboptimal selling conditions. Our project aims to solve these challenges by providing an intelligent, data-driven tool that evaluates the optimal timing for property sales and predicts market duration.
Data Source & Data Science Approach
We draw on three data sources, Zillow, Realtor.com, and Massachusetts housing data, to explore the relationship between listing price and days on market (DOM). We start with the Massachusetts housing data, which has broad zip code coverage, and use it to train a tuned XGBoost model that predicts an initial DOM from a set of engineered features. We then use the Zillow and Realtor.com datasets to adjust and refine this prediction in two ways:
- We adjust for inflation
- We also derive a multiplier, or coefficient, from the Zillow and Realtor.com datasets to estimate how changes in listing price affect days on market
Finally, we use this coefficient to generate a range of paired values for listing price and days on market, which users explore through sliders. A user can enter a listing price to get a predicted DOM, or adjust either slider to see how a different price or time frame might affect their property. The sliders operate within a fixed range, for instance $100,000 above or below the initial listing price, with roughly 10 preset values.
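The pairing of slider values can be illustrated with a minimal sketch. It assumes the coefficient acts linearly, as a change in DOM (in days) per dollar of listing price; the text does not state the functional form. The function names, the 11-step grid (chosen so the initial price sits at the center of the range), and the example numbers are all hypothetical.

```python
def price_dom_pairs(base_price, base_dom, coefficient, spread=100_000, steps=11):
    """Return (price, predicted_dom) pairs centered on base_price.

    Assumption: `coefficient` is the change in DOM (days) per $1 change in
    listing price, applied linearly around the model's initial prediction.
    """
    if steps < 2:
        raise ValueError("steps must be at least 2")
    step_size = 2 * spread / (steps - 1)
    pairs = []
    for i in range(steps):
        price = base_price - spread + i * step_size
        dom = base_dom + coefficient * (price - base_price)
        # DOM cannot be negative, so clamp at zero.
        pairs.append((round(price), max(0.0, round(dom, 1))))
    return pairs


def closest_price_for_dom(pairs, target_dom):
    """Reverse lookup for the DOM slider: the preset price whose DOM is nearest."""
    price, _ = min(pairs, key=lambda pair: abs(pair[1] - target_dom))
    return price
```

With a precomputed list like this, moving either slider becomes a lookup over the preset pairs rather than a fresh model call, which is one plausible reason to limit the range to a small number of values.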
Evaluation
We use Mean Absolute Error (MAE) as the benchmark metric for evaluating our XGBoost model on the validation and test sets. We conducted an ablation study and used grid search to tune hyperparameters, including n_estimators, learning_rate, max_depth, and subsample. The optimized model achieved an MAE of 26 days for DOM. Because our dataset has a long-tailed distribution, we also tried a log transformation to reduce the skew. This adjustment led to worse performance, however, so handling the long tail remains an open problem for future work.
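The tuning loop described above can be sketched in a few lines. This is an illustration of grid search scored by MAE, not our actual training code: `fit_predict` is a hypothetical callback that would train a model with the given hyperparameters and return predictions for the validation set. The second helper assumes the log transformation is applied to the DOM target, which is one possible reading, and shows that predictions then need to be converted back to days before MAE values are compared.

```python
import math
from itertools import product


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted DOM, in days."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def grid_search(param_grid, fit_predict, y_val):
    """Try every hyperparameter combination and keep the one with the lowest MAE.

    `fit_predict(params)` is a hypothetical callback: it should train a model
    (for example an XGBoost regressor) with `params` and return predictions
    for the validation rows that correspond to `y_val`.
    """
    names = list(param_grid)
    best_params, best_mae = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        mae = mean_absolute_error(y_val, fit_predict(params))
        if mae < best_mae:
            best_params, best_mae = params, mae
    return best_params, best_mae


def mae_from_log_predictions(y_true_days, log_predictions):
    """MAE in days for a model trained on log1p(DOM) (assumed target transform)."""
    predictions_days = [math.expm1(p) for p in log_predictions]
    return mean_absolute_error(y_true_days, predictions_days)
```

Scoring both the raw-target and log-target variants in days keeps the comparison on the same scale as the reported 26-day MAE.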
Key Learnings & Impact
At the start, we struggled to settle on an approach because of the limited quality and quantity of our data. By combining our product's focused scope, domain expertise, and data science techniques, we developed a solution to key challenges in today's real estate industry. Despite being a cornerstone of the global economy for thousands of years, the real estate sector has been slow to adopt advances in AI and digitization. Much of this resistance stems from the industry's reliance on an inherently volatile market and the rigid protocols governing housing processes. Through this project, our team aims to rethink traditional practices in the real estate industry and foster a more efficient and equitable landscape for all stakeholders.
Acknowledgements
We thank our advisors, Zona Kostic and Morgan Ames, our domain experts, and our colleagues in the real estate technology and investor sectors for their invaluable support and input on this project. Their collaboration has been crucial in advancing our research and achieving our goals.