MIDS Capstone Project Fall 2023

College Matchmaker

Problem & Motivation

This project applies data science to the complexity of college selection. It grew out of a gap in how students and counselors approach college decisions, shaped by changing trends and limited awareness of the university data that is available. Insights from domain experts, particularly a high school guidance counselor, deepened that motivation by highlighting the challenges in making college recommendations and the need for comprehensive support systems. The main goal is to provide a tool that helps students move past analysis paralysis and choose a university that fits their goals and preferences.

Data Source & Data Science Approach

The project draws on an extensive dataset, including data from the Integrated Postsecondary Education Data System (IPEDS), to provide a holistic view of colleges. It uses data science methods such as clustering and k-NN, supported by data transformation, classification, and human insights into the college selection process. These methods help model student preferences in college selection. One of the main challenges we faced was acquiring complete data, because the methodologies for classifying universities change across the historical data. Our team tackled this by applying a VAE transformation, imputing specific fields, and using a surrogate model to extract important features for explainability.
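To make the imputation and k-NN steps concrete, here is a minimal sketch using only the Python standard library. It is not the team's implementation: the college names, the three features, the choice of median imputation, and the `recommend` helper are all assumptions made for illustration, and the VAE transformation is left out.

```python
from math import sqrt
from statistics import mean, median, pstdev

# Hypothetical records: (name, [admit_rate, net_price_usd, enrollment]).
# None marks a missing value, as can happen when reporting rules change.
COLLEGES = [
    ("College A", [0.20, 30000.0, 5000.0]),
    ("College B", [0.65, 15000.0, 30000.0]),
    ("College C", [0.25, None, 6000.0]),
    ("College D", [0.70, 14000.0, None]),
]


def impute_median(rows):
    """Replace None in each column with the median of that column's observed values."""
    columns = list(zip(*rows))
    medians = [median(v for v in col if v is not None) for col in columns]
    return [[m if v is None else v for v, m in zip(row, medians)] for row in rows]


def standardize(rows, query):
    """Z-score each column using the college rows and scale the query the same way."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stdevs = [pstdev(col) or 1.0 for col in columns]  # guard against zero spread

    def scale(row):
        return [(v - m) / s for v, m, s in zip(row, means, stdevs)]

    return [scale(row) for row in rows], scale(query)


def recommend(preferences, k=2):
    """Return the k colleges closest to the student's preferred feature values."""
    names = [name for name, _ in COLLEGES]
    rows = impute_median([features for _, features in COLLEGES])
    scaled, target = standardize(rows, preferences)
    distances = [sqrt(sum((a - b) ** 2 for a, b in zip(row, target))) for row in scaled]
    ranked = sorted(zip(distances, names))
    return [name for _, name in ranked[:k]]


# A student who prefers a selective, smaller, higher-priced school.
print(recommend([0.22, 28000.0, 5500.0]))
```

Standardizing before computing distances keeps a large-valued feature such as enrollment from dominating a small-valued one such as admit rate.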

Evaluation

The tool's efficacy is gauged by its ability to provide nuanced, tailored recommendations and by how well it automates data retrieval, simplifying the college selection process for students and counselors. Its value lies in broadening college exploration and delivering comprehensive insights. We conducted a usability study to evaluate responses from students. After multiple iterations of the product, we settled on an approach that provides both a recommendation and, using SHAP (Shapley values), insight into why the recommendation was made, allowing users to pinpoint the key factors that influence it.
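As an illustration of how Shapley values attribute a recommendation score to individual factors, the sketch below computes exact Shapley values for a small model. It is only a toy under stated assumptions: the linear `surrogate` function, its weights, and the three features are hypothetical, and the project itself may well have relied on a library such as SHAP instead of hand-written code.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, instance, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(instance)

    def value(coalition):
        mixed = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    contributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Weight of every coalition of this size that excludes feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                phi += weight * (value(members | {i}) - value(members))
        contributions.append(phi)
    return contributions


def surrogate(x):
    """Hypothetical surrogate score over [selectivity, net price, campus size]."""
    return 2.0 * x[0] - 1.0 * x[1] + 0.5 * x[2]


student_match = [1.0, 2.0, 4.0]   # hypothetical feature values for one school
average_school = [0.0, 0.0, 0.0]  # hypothetical baseline

for label, phi in zip(["selectivity", "net price", "campus size"],
                      shapley_values(surrogate, student_match, average_school)):
    print(f"{label}: {phi:+.2f}")
```

The contributions sum to the difference between the score for this school and the score for the baseline, which is what lets a user read them as "how much each factor moved the recommendation". Exact enumeration grows exponentially with the number of features, so it is only practical for small examples like this one.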

Key Learnings & Impact

The project underscores the significance of data-driven decision-making in education. By integrating feedback from admission professionals and aggregating diverse data sources, it offers a novel approach to college selection that targets the informational side of the process. The tool aims to bridge the information gap and support informed decision-making. Through usability testing, we found that some students valued the tool for introducing them to new schools, while others felt their recommendations were tailored so closely that the recommended schools were ones they already knew well. We would like to address this by integrating more nuanced data.

Acknowledgements

This initiative is a collaborative effort by a team dedicated to transforming education through data science. The project benefits from the insights of admissions professionals and students, and is driven by the vision of improving the college selection experience for students and counselors.

More Information

Last updated: December 12, 2023