MIDS Capstone Project Summer 2024

Olivia: AI Travel Assistant

Problem & Motivation

When looking for a place to stay on your next vacation, it can be overwhelming to find a rental property that satisfies everyone's needs. That's why we created Olivia, an AI travel assistant you can chat with to easily find your next rental property. You can ask it to stay within your price range, require a specific rating, stay close to a landmark, or look for specific amenities like a pool.

Data Source & Data Science Approach

For our data we used the Inside Airbnb USA dataset, an open dataset retrieved from Kaggle: https://www.kaggle.com/datasets/konradb/inside-airbnb-usa

For our approach, we first reduced the dataset to focus on four cities: San Francisco, New York, Washington D.C., and Los Angeles.
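As a rough sketch, reducing the dataset this way is a simple pandas filter. The file name and the city column below are assumptions about the Kaggle export, not our exact schema.

```python
import pandas as pd

# Load the Inside Airbnb USA listings (file and column names are
# assumptions about the Kaggle export, not the exact schema we used).
listings = pd.read_csv("inside_airbnb_usa_listings.csv")

# Keep only the four cities the project focuses on.
CITIES = {"San Francisco", "New York", "Washington D.C.", "Los Angeles"}
listings = listings[listings["city"].isin(CITIES)].reset_index(drop=True)

print(f"{len(listings)} listings remain after filtering to {len(CITIES)} cities")
```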

From there we decided to create a chatbot people could interact with. In our case, we decided to try using a pretrained Large Language Model (LLM) that we could fine-tune with our data.

We ended up with two usable models: Olivia powered by LLaMa and Olivia powered by Claude Sonnet.
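As an illustration of the Claude-powered variant, a request can go through the Anthropic Messages API roughly as follows. The system prompt, listing snippet, and user question are made-up examples, not our production prompts.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Illustrative listing context; in the real system this would come from the
# filtered dataset based on the user's query, not be hard-coded.
listing_context = (
    "Listing 123: Sunny 2-bed apartment in San Francisco, $180/night, "
    "rating 4.8, amenities: pool, Wi-Fi, kitchen."
)

response = client.messages.create(
    model="claude-3-sonnet-20240229",
    max_tokens=512,
    system="You are Olivia, a travel assistant that recommends rental "
           "properties using only the listing data provided.",
    messages=[{
        "role": "user",
        "content": f"Listings:\n{listing_context}\n\n"
                   "Find me a place in San Francisco under $200 with a pool.",
    }],
)
print(response.content[0].text)
```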

Evaluation

To evaluate our models, we put together a set of test prompts and ran each one against both models.

  • Basic facts: simply asking outright for the cheapest or most expensive listing. Both models performed poorly here and failed to fetch the correct listings, but we are okay with that given that this question is not really part of our use case.
  • Basic requirements: we asked the models to give us listings matching requirements that can be read from the tabular data (like price range and number of beds). Both models did a great job.
  • Complex requirements: these involve particular amenities, such as a pool or having a view. Both models did great here as well.
  • Reviews lookup: we asked the models to tell us about reviews containing a specific word. Both models were able to do the job, with the one difference that Claude needs to be given the listing, while LLaMa handles this as part of the Q&A task.
  • Summarization: we asked both models to summarize the reviews of a particular listing. Both did an acceptable job.
  • Consistency: we tested this by asking the models the same questions multiple times, phrased in slightly different ways, to see if they would give the same answer. Both models passed.
  • Finally, we measured latency and pricing: Claude was faster but pricier, while LLaMa 3 was slower but cheaper.
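For the latency comparison, the measurement amounts to timing repeated calls per prompt. The sketch below shows the idea; ask_model is a hypothetical stand-in for either model's chat function, and no real pricing or latency figures are baked in.

```python
import statistics
import time

def benchmark(ask_model, prompts, runs=3):
    """Time repeated calls to a model wrapper and report mean latency.

    ask_model is a placeholder for whichever chat function is under test
    (e.g. the Claude or LLaMa wrapper); it takes a prompt, returns text.
    """
    latencies = []
    for prompt in prompts:
        for _ in range(runs):
            start = time.perf_counter()
            ask_model(prompt)
            latencies.append(time.perf_counter() - start)
    return statistics.mean(latencies), statistics.stdev(latencies)

# Example usage with a stub model so the script runs standalone.
mean_s, stdev_s = benchmark(lambda p: "stub answer", ["cheapest listing?"], runs=5)
print(f"mean latency: {mean_s:.4f}s (±{stdev_s:.4f}s)")
```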

Key Learnings & Impact

We faced the following challenges when trying out foundational LLMs:
- Infrastructure requirements: building a foundation model from scratch is expensive, requires enormous resources, and training can take months.
- Data preparation: most of the time you will need to create vector stores for your custom data, which takes significant effort, both from the scientist and in terms of computing power (a minimal sketch follows this list).
- Front-end development: most of the time the output needs to be incorporated into an existing piece of software or a standalone application so that users can interact with the models.
- Bias: keep in mind that these models are trained on large portions of the internet; if you are only fine-tuning, that bias will carry over to the fine-tuned model.
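On the vector-store point, the sketch below shows the general pattern we mean: embed review text once, then answer lookups by cosine similarity. The embedding model, review strings, and search function here are illustrative assumptions, not our exact pipeline.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Small, commonly used embedding model; our actual choice may have differed.
model = SentenceTransformer("all-MiniLM-L6-v2")

# Illustrative review texts; in practice these come from the listings dataset.
reviews = [
    "Great place, the pool was amazing and very clean.",
    "Noisy street at night, but the host was responsive.",
    "Walking distance to the Golden Gate Bridge viewpoint.",
]
review_vectors = model.encode(reviews, normalize_embeddings=True)

def search(query, k=2):
    """Return the k reviews most similar to the query (cosine similarity)."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = review_vectors @ q  # normalized vectors: dot product = cosine
    return [(reviews[i], float(scores[i])) for i in np.argsort(scores)[::-1][:k]]

for text, score in search("listings with a pool"):
    print(f"{score:.2f}  {text}")
```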

Last updated: August 8, 2024