SceneScout: Towards AI-Driven Access to Street Level Imagery for Blind Users

Abstract

People who are blind or have low vision (BLV) may hesitate to travel independently in unfamiliar environments due to uncertainty about the physical landscape. While most tools focus on in-situ navigation assistance, those supporting pre-travel assistance typically provide only landmark information and turn-by-turn instructions, lacking detailed visual context. Street level imagery, which contains rich visual information and has the potential to reveal environmental details, remains inaccessible to BLV people. In this work, we present SceneScout, a multimodal large language model (MLLM)-driven prototype that enables accessible interactions with street level imagery. SceneScout supports two modes: (1) Route Preview, enabling users to familiarize themselves with visual details along a route, and (2) Virtual Exploration, enabling user-driven movement within street level imagery. Our user study demonstrates that SceneScout helps BLV users uncover visual information otherwise unavailable through existing means. An initial analysis of AI-generated descriptions suggests that the majority are accurate and describe stable visual elements even in older imagery, though occasional subtle and plausible errors make them difficult to verify without sight. We discuss future opportunities and challenges of street level imagery-based navigation experiences.

Authors
Gaurav Jain
Columbia University, New York, New York, United States
Leah Findlater
Apple, Cupertino, California, United States
Cole Gleason
Apple Inc., Seattle, Washington, United States

Conference: CHI 2026

ACM CHI Conference on Human Factors in Computing Systems

Session: Sound, Music, and Dance Accessibility

P1 - Room 120
7 presentations
2026-04-15, 20:15–21:45