MIDS Capstone Project Summer 2024

VisualAIze: Accessible Spatial Awareness Using AI

Problem & Motivation

An estimated 2.2 billion people worldwide have vision impairment or blindness. Navigating large, crowded indoor spaces, such as airports, can be challenging for these individuals and often requires heavy reliance on assistance from others or memorization of layouts. Enhancing accessibility for visually impaired individuals improves their quality of life and promotes inclusivity and diversity in society. Accessible environments and technologies empower individuals with visual impairments to participate fully in social, educational, and professional activities. 

The VisualAIze team aims to address these challenges by leveraging large language models (LLMs) and multimodal datasets to generate real-time descriptions of images taken at various airport locations. By providing detailed descriptions of a user's surroundings, we hope to help visually impaired individuals quickly understand their environment so they can navigate airports more independently and confidently. The VisualAIze project contributes to the broader goal of creating a more inclusive society in which individuals of differing abilities feel empowered to explore the world with confidence.

Data Source & Data Science Approach

VisualAIze used the Indoor Scene Recognition dataset from MIT, which contains indoor images of airports, including signs, walkways, and other objects a traveler would encounter. Images were normalized for size and brightness, then labeled for object detection using Roboflow. Once objects were associated with an image, the list of detected objects was passed to the LLM to generate a description, which is then read aloud to the user. Users evaluate the effectiveness of each description, and the system may prompt the user for additional images to provide more detail. A sketch of this pipeline is shown below.
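As a rough illustration of this flow (not the team's production code), the sketch below normalizes an image with Pillow, assumes an object-detection step has already produced a list of labels (standing in for Roboflow output), asks an LLM for a scene description via the OpenAI chat API, and reads the result aloud with pyttsx3. The model name, prompt wording, normalization constants, and helper functions are assumptions made for illustration only.

```python
import pyttsx3
from openai import OpenAI
from PIL import Image, ImageEnhance

TARGET_SIZE = (640, 640)     # assumed normalization target
BRIGHTNESS_FACTOR = 1.2      # assumed brightness adjustment

client = OpenAI()            # reads OPENAI_API_KEY from the environment


def normalize(path: str) -> Image.Image:
    """Resize and brighten an image before it is sent to object detection."""
    img = Image.open(path).convert("RGB").resize(TARGET_SIZE)
    return ImageEnhance.Brightness(img).enhance(BRIGHTNESS_FACTOR)


def describe(objects: list[str], location: str = "an airport") -> str:
    """Turn a detected-object list into a short, spoken-style scene description."""
    prompt = (
        f"A blind traveler is standing in {location}. "
        f"The camera detects these objects: {', '.join(objects)}. "
        "In two or three sentences, describe the surroundings clearly enough "
        "to help the traveler orient themselves."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


def speak(text: str) -> None:
    """Read the generated description aloud to the user."""
    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()


if __name__ == "__main__":
    image = normalize("gate_area.jpg")                    # hypothetical input image
    detected = ["departure sign", "escalator", "bench"]   # stand-in for detector output on `image`
    speak(describe(detected))
```

In a real deployment the detected-object list would come from the Roboflow-trained detector run on the normalized image, and the user's feedback loop would decide whether to request additional images.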

Evaluation

Generated image descriptions are compared against gold-standard descriptions across different LLMs. ROUGE-L and BERTScore were used to evaluate the BLIP baseline alongside a grading rubric. Generated descriptions are also evaluated for accuracy of object detection, sign reading, and the general placement of objects in the image.
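For the automatic-metric part of this evaluation, a minimal sketch is shown below, assuming generated and gold descriptions are available as paired lists of strings. It uses the rouge-score and bert-score packages, which are one common way to compute these metrics and are not necessarily the team's exact tooling; the example sentences are hypothetical.

```python
from bert_score import score as bert_score
from rouge_score import rouge_scorer

# Hypothetical pairs; real data would be the generated vs. gold descriptions.
generated = ["A wide corridor with a departures sign overhead and an escalator to the left."]
gold = ["You are in a corridor; a departures sign hangs ahead and an escalator is on your left."]

# ROUGE-L: longest-common-subsequence overlap between candidate and reference.
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
rouge_f1 = [
    scorer.score(ref, cand)["rougeL"].fmeasure
    for ref, cand in zip(gold, generated)
]

# BERTScore: semantic similarity from contextual token embeddings.
_, _, bert_f1 = bert_score(generated, gold, lang="en")

print(f"Mean ROUGE-L F1:   {sum(rouge_f1) / len(rouge_f1):.3f}")
print(f"Mean BERTScore F1: {bert_f1.mean().item():.3f}")
```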

Key Learnings & Impact

This project has been an impactful learning experience: the team applied cutting-edge AI and LLM methods in support of all members of society exploring the world. Throughout development, and in collaboration with users, the team found that individual accommodation is needed to support independent, confident travel.

Acknowledgments

We are grateful for the guidance and encouragement of our Capstone Advisors, Joyce Shen and Kira Wetzel, who supported and advised us on our Capstone Project for the Master of Information and Data Science (MIDS) program at the School of Information, University of California, Berkeley.

More Information

Last updated: July 25, 2024