MIDS Capstone Project Fall 2024

RecycleRight

Team members

Problem

Recycling confusion is a major reason why many items are improperly disposed of and end up in landfills. This often happens because recycling instructions can be unclear, and recycling guidelines vary between different areas. As a result, people may find the process complicated and may not know how to recycle correctly.

Motivation

The market for wholesale recycled materials in the U.S. is valued at over $100 billion, according to IBIS research, yet only 30% of recyclable materials are successfully processed. Despite this low recycling rate, 92% of Americans believe that recycling is important, highlighting a strong public awareness of its environmental significance (report). This disconnect between belief and action underscores the need for more effective recycling systems and education to improve recycling rates.

Survey Findings

These findings are reflected in what we observed from our user survey results:

A large majority of people recycle, but most are not looking up local information themselves:
- 97% of users recycle
- 75% of users have not seen their local recycling guide
Average confidence in deciding what to recycle: 3/5
Level of interest from users surveyed in proposed app: 4.4/6
- 75% of users were interested in our app
- 50% were interested or very interested

“Recycling Quiz” Results

Not recyclable items:
- 35% of users believe that paper cups are recyclable
- 44% of users believe that shredded paper is recyclable
Recyclable items:
- 78% of users believe that a detergent jug is not recyclable
- 66% of users believe that hot beverage sleeves are not recyclable

Solution

Our proposed solution is an app that enables users to upload images of disposable items along with their locale details. In return, the app will provide clear, accurate, and easy-to-understand recycling instructions tailored to each item, helping users dispose of materials properly and efficiently.

Data Sources

For this project, we used a combination of image and text data. The image data was sourced from two datasets: a Kaggle image dataset of recyclable and household waste items, and Portland State’s Recycling dataset of images of recyclable items (KaggleDataset, PortlandDataset). These datasets provided a range of images for item and bin classification. For text data, we used two recycling guides: the Cook County Recycling guide and the Palm Desert Curbside Recycling guide (CookCounty, PalmDesert). Additionally, we created our own test dataset, consisting of 84 images taken in four different light conditions, to evaluate the model’s performance in a variety of real-world situations. We also created our own test dataset of 86 images taken in multiple lighting conditions and with a low-quality lens, to accommodate all types of devices that users may have.

Evaluation

Our model pipeline has three stages: image classification, bin classification, and recycling instruction generation. The image classification stage was evaluated on classification accuracy. The bin classification stage was also evaluated on accuracy, here measuring how often the model assigns an item to the correct recycling bin. Finally, for the recycling instruction generation stage, we used cosine similarity (based on a BERT model) and BLEU score to evaluate how closely the generated instructions matched the correct recycling guidelines.
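To make the three metrics concrete, here is a minimal sketch in plain Python. It is not the project's evaluation code: the function names are hypothetical, `cosine_similarity` expects vectors that have already been produced by some embedding model (the project used a BERT model, which is not reproduced here), and `simple_bleu` is a simplified, unsmoothed sentence-level variant rather than a reference BLEU implementation.

```python
import math
from collections import Counter


def accuracy(predicted, expected):
    """Fraction of predictions that match the expected labels."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def simple_bleu(candidate, reference, max_n=2):
    """Simplified BLEU: clipped n-gram precision for n = 1..max_n,
    combined by geometric mean and scaled by a brevity penalty.
    No smoothing, so any n-gram order with zero overlap gives 0.0."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or overlap == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Penalize candidates that are shorter than the reference.
    brevity = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(sum(log_precisions) / max_n)
```

For example, `accuracy(["paper", "glass"], ["paper", "plastic"])` returns `0.5`, and `simple_bleu` returns `1.0` when the generated instruction is identical to the reference text.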

Architecture

We used a Retrieval-Augmented Generation (RAG) architecture to improve the quality of our model's results and to reduce the amount of recycling-guide text that has to fit in the context window.
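The retrieval step of a RAG setup can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: `embed` is a toy bag-of-words stand-in for a real embedding model, the `GUIDE` passages are invented examples rather than text from the Cook County or Palm Desert guides, and `build_prompt` only shows how the retrieved excerpts could be placed in front of the question before calling a language model.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding'; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, passages, k=2):
    """Return the k guide passages most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(passages, key=lambda p: cosine(query_vec, embed(p)), reverse=True)
    return ranked[:k]


def build_prompt(item, locale, passages, k=2):
    """Build a prompt containing only the retrieved excerpts, not the whole guide."""
    context = "\n".join(f"- {p}" for p in retrieve(item, passages, k))
    return (
        f"Local recycling guide excerpts for {locale}:\n{context}\n\n"
        f"Using only the excerpts above, explain how to dispose of: {item}"
    )


# Invented example passages (hypothetical, not quoted from any real guide).
GUIDE = [
    "Plastic jugs and bottles: empty, rinse, and place in the recycling bin.",
    "Shredded paper: do not place in the recycling bin.",
    "Glass bottles and jars: rinse and place in the recycling bin.",
]

if __name__ == "__main__":
    print(build_prompt("plastic detergent jug", "Cook County", GUIDE, k=1))
```

Because only the top `k` passages are included, the prompt length stays roughly constant no matter how long the full guide is.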

Key Learnings and Impact

The project faced several challenges in developing a comprehensive recycling application. One major issue was the "End-to-End Challenge": no single model was able to handle the entire pipeline effectively. To address this, we implemented a multi-stage model approach. Another challenge was the bin classification task, where the model often relied on incorrect information from pre-training and ran out of context window space when processing longer recycling guides. To overcome this, we used embedding-based retrieval to shorten the context passed to the model, which improved the accuracy of recycling instructions.
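The multi-stage approach can be pictured as three independent components chained together, each of which can be swapped or evaluated on its own. The sketch below is a hypothetical skeleton: `run_pipeline`, `PipelineResult`, and the stub stage functions are illustrative names, and the stubs return fixed values in place of real models.

```python
from dataclasses import dataclass


@dataclass
class PipelineResult:
    item_label: str
    bin_label: str
    instructions: str


def run_pipeline(image, locale, classify_image, classify_bin, generate_instructions):
    """Run the three stages in order, feeding each stage's output to the next."""
    item_label = classify_image(image)                       # stage 1: image classification
    bin_label = classify_bin(item_label, locale)             # stage 2: bin classification
    instructions = generate_instructions(item_label, bin_label, locale)  # stage 3
    return PipelineResult(item_label, bin_label, instructions)


# Stub stages standing in for real models (hypothetical fixed outputs).
def stub_classify_image(image):
    return "detergent jug"


def stub_classify_bin(item_label, locale):
    return "recycling"


def stub_generate_instructions(item_label, bin_label, locale):
    return f"Place the {item_label} in the {bin_label} bin ({locale})."


if __name__ == "__main__":
    result = run_pipeline(
        b"", "Cook County",
        stub_classify_image, stub_classify_bin, stub_generate_instructions,
    )
    print(result)
```

Passing the stages in as functions keeps each one replaceable, which matches the idea that no single model has to handle the whole task.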

Another challenge arose in the text generation phase, where models struggled to handle negative cases (e.g., items not listed in recycling guides). This was addressed by implementing a threshold value in the embedding retrieval process, ensuring that only relevant items passed through. Key learnings from the project included the effectiveness of the multi-stage model in addressing the unique requirements of recycling applications, as well as the impact of embedding-based retrieval in improving accuracy and robustness. Additionally, the threshold value helped the model remain robust against random or non-recyclable items, enhancing the overall system's reliability and performance.
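A similarity threshold of this kind might look like the sketch below. The function name, the three-dimensional example vectors, and the `0.5` cutoff are all assumptions made for illustration; the project's actual threshold value and embedding model are not stated here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve_with_threshold(query_vec, guide_entries, threshold=0.5):
    """Return the best-matching guide text, or None if nothing is similar enough.

    guide_entries is a list of (text, embedding_vector) pairs.
    Returning None signals the negative case: the item is not covered by the guide.
    """
    best_text, best_score = None, -1.0
    for text, vec in guide_entries:
        score = cosine(query_vec, vec)
        if score > best_score:
            best_text, best_score = text, score
    if best_score < threshold:
        return None
    return best_text


# Made-up embeddings for two guide entries (hypothetical values).
ENTRIES = [
    ("Plastic jugs: rinse and recycle.", [0.9, 0.1, 0.0]),
    ("Glass jars: rinse and recycle.", [0.1, 0.9, 0.0]),
]
```

With these made-up vectors, a query of `[1.0, 0.0, 0.0]` returns the plastic jug entry, while `[0.0, 0.0, 1.0]` is orthogonal to both entries and returns `None`, so the downstream generator can respond that the item is not listed instead of guessing.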

Acknowledgements

We would like to thank Ramesh and Korin for helping us improve our approach and presentation skills along the way. We would also like to thank Mark and Shraddha for their assistance in model improvement and testing.

Course

Data Science 210. Capstone, Fall 2024

Class Project Gallery

More Information

GitHub

Website

Presentation Slide Deck

Last updated: December 10, 2024