HomePro: Understanding the Cost of Your Future Home
Buying your first home can be daunting, especially in a market where it’s hard to tell whether a house is worth its high price tag. With HomePro, we try to demystify the value of homes on the market.
Through conversations with professionals in the space and a survey of new homebuyers, we found that 85% of the people we surveyed named utility usage and environmental risks as crucial factors in their purchase decisions. Competitors in the space are not investing in these factors, so we aim to fill that gap and address these concerns for new homebuyers. Our proof-of-concept product for the San Francisco Bay Area does this in two ways: it compares a house’s expected price against its listed price, and it gives visibility into the real cost of running that house over time.
Our machine learning approach is two-fold. The first part uses time-series data on past utility costs and climate to predict the expected cost of utilities over the near future. The second part uses aggregated metrics and statistics about recent historical listings, such as amenities, school districts, and utilities, to score the value of currently listed houses against others in the same neighborhood.
For the utilities time-series prediction, we combined publicly available electricity and gas usage data from PG&E with monthly weather statistics from NOAA’s weather stations to predict expected utility costs over the next 5 years. After extensive testing, we chose a light gradient boosting model, which gave us a high-accuracy model at the zipcode level: it uses the monthly average temperature for a zipcode to estimate the average utility cost for a house in that zipcode. We then used this model to project the next 5 years of utility costs. In evaluating the predictions, we found that, given the higher variation of temperatures within seasons, the model projects year-over-year increases in utility costs in most districts.
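The data preparation described above can be sketched as a join on zipcode and month, followed by building the list of future months to predict. This is a minimal illustration, not our actual pipeline: the record layouts, field names (`avg_cost`, `avg_temp_f`), and sample values are hypothetical, and the real PG&E and NOAA files have their own schemas. The gradient boosting model itself is left out; the joined rows are what such a model would be trained on.

```python
# Hypothetical sample records; values and field names are made up for illustration.
usage_rows = [
    {"zipcode": "94110", "year": 2022, "month": 1, "avg_cost": 120.0},
    {"zipcode": "94110", "year": 2022, "month": 7, "avg_cost": 95.0},
    {"zipcode": "94501", "year": 2022, "month": 1, "avg_cost": 110.0},
]
weather_rows = [
    {"zipcode": "94110", "year": 2022, "month": 1, "avg_temp_f": 51.0},
    {"zipcode": "94110", "year": 2022, "month": 7, "avg_temp_f": 60.0},
]


def build_training_rows(usage, weather):
    """Join utility usage with weather on (zipcode, year, month).

    Usage rows without a matching weather record are dropped.
    """
    temps = {
        (w["zipcode"], w["year"], w["month"]): w["avg_temp_f"] for w in weather
    }
    rows = []
    for u in usage:
        key = (u["zipcode"], u["year"], u["month"])
        if key in temps:
            rows.append(
                {
                    "zipcode": u["zipcode"],
                    "year": u["year"],
                    "month": u["month"],
                    "avg_temp_f": temps[key],  # model input
                    "avg_cost": u["avg_cost"],  # prediction target
                }
            )
    return rows


def future_months(last_year, last_month, years=5):
    """Return the (year, month) pairs that follow the last observed month."""
    months = []
    year, month = last_year, last_month
    for _ in range(12 * years):
        month += 1
        if month > 12:
            year, month = year + 1, 1
        months.append((year, month))
    return months


training_rows = build_training_rows(usage_rows, weather_rows)
horizon = future_months(2022, 12)
```

Each training row pairs one month’s average temperature with that month’s average cost for a zipcode, and `horizon` lists the 60 months a fitted model would then be asked to predict.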
For price prediction, we took publicly available housing listings for California and joined them with enriched Zillow data for those houses, which gave us richer features for predicting the expected price of a house with a random forest regression model. We then compared that expected price against the actual listing price to evaluate whether the listing is overpriced or underpriced relative to history. With the limited features we were able to use, the Bay Area does appear to be experiencing some inflation in value compared with recent history. However, the variability in prices over time suggests that further feature enrichment and analysis are needed before drawing strong conclusions.
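The comparison between expected and listed price can be sketched as a relative gap with a tolerance band. This is only an illustration: the function names, the 5% tolerance, and the sample prices are assumptions made for this example, not the values or thresholds used in our product, and the expected price is taken as given rather than produced by a model here.

```python
def price_gap(listed_price, expected_price):
    """Relative gap between listing and expected price.

    Positive values mean the listing is above the expected price.
    """
    if expected_price <= 0:
        raise ValueError("expected_price must be positive")
    return (listed_price - expected_price) / expected_price


def label_listing(listed_price, expected_price, tolerance=0.05):
    """Label a listing; the 5% tolerance is an assumed, illustrative threshold."""
    gap = price_gap(listed_price, expected_price)
    if gap > tolerance:
        return "overpriced"
    if gap < -tolerance:
        return "underpriced"
    return "in line"


# Hypothetical listings: (listed price, model's expected price)
examples = [(1_200_000, 1_000_000), (900_000, 1_000_000), (1_020_000, 1_000_000)]
labels = [label_listing(listed, expected) for listed, expected in examples]
```

Using a relative gap rather than a dollar difference keeps the label comparable across neighborhoods with very different price levels.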
Finally, we pulled our modeling together into a proof-of-concept website, similar in style to competitors’ sites, to give a sense of how our findings could be integrated into a product for customers.
We’d like to thank Emily Allman Gump for her advice at the inception of this project on the market and customer needs from a real estate perspective.