Blind and low-vision (BLV) people rely on GPS-based systems for outdoor navigation. GPS's inaccuracy, however, causes them to veer off track, run into obstacles, and struggle to reach precise destinations. While prior work has made precise navigation possible indoors via hardware installations, enabling this outdoors remains a challenge. Interestingly, many outdoor environments are already instrumented with hardware such as street cameras. In this work, we explore the idea of repurposing existing street cameras for outdoor navigation. Our community-driven approach considers both technical and sociotechnical concerns through engagements with various stakeholders: BLV users, residents, business owners, and Community Board leadership. The resulting system, StreetNav, processes a camera's video feed using computer vision and gives BLV pedestrians real-time navigation assistance. Our evaluations show that StreetNav guides users more precisely than GPS, but its technical performance is sensitive to environmental occlusions and distance from the camera. We discuss future implications for deploying such systems at scale.
https://doi.org/10.1145/3654777.3676333
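To make the camera-to-guidance step concrete, here is a minimal sketch of how a fixed street camera's view could be related to a pedestrian's position on the ground and to a planned route. This is not StreetNav's published implementation; the homography matrix, route, coordinates, and deviation threshold below are invented purely for illustration, and a real system would add pedestrian detection, obstacle tracking, and non-visual feedback.

```python
# Illustrative sketch (not the authors' code): project a detected pedestrian's
# image position to ground-plane coordinates with a pre-calibrated homography,
# then check how far the user has drifted from a planned route.
import numpy as np

# Hypothetical calibration: maps image pixels to ground-plane meters.
H = np.array([[0.02, 0.0, -5.0],
              [0.0, 0.03, -8.0],
              [0.0, 0.0, 1.0]])

def image_to_ground(point_px):
    """Project an image point (e.g., the detected feet of a pedestrian) to ground coordinates."""
    p = np.array([point_px[0], point_px[1], 1.0])
    g = H @ p
    return g[:2] / g[2]

def deviation_from_route(position_m, route_m):
    """Distance in meters from the user's position to the nearest sampled point on the route."""
    return float(np.min(np.linalg.norm(route_m - position_m, axis=1)))

# Example: a straight 20 m route sampled every meter, and one detected foot point.
route = np.stack([np.linspace(0, 20, 21), np.zeros(21)], axis=1)
user = image_to_ground((640.0, 480.0))
if deviation_from_route(user, route) > 1.0:   # threshold is illustrative
    print("Veering off path: issue a corrective audio cue")
```

A projection step like this also makes the abstract's finding intuitive: occlusions and greater distance from the camera degrade the detected position, and therefore the guidance.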
Automated live visual descriptions can aid blind people in understanding their surroundings with autonomy and independence. However, providing descriptions that are rich, contextual, and just-in-time has been a long-standing challenge in accessibility. In this work, we develop WorldScribe, a system that generates automated live real-world visual descriptions that are customizable and adaptive to users' contexts: (i) WorldScribe's descriptions are tailored to users' intents and prioritized based on semantic relevance. (ii) WorldScribe is adaptive to visual contexts, e.g., providing consecutively succinct descriptions for dynamic scenes, while presenting longer and detailed ones for stable settings. (iii) WorldScribe is adaptive to sound contexts, e.g., increasing volume in noisy environments, or pausing when conversations start. Powered by a suite of vision, language, and sound recognition models, WorldScribe introduces a description generation pipeline that balances the tradeoff between description richness and latency to support real-time use. The design of WorldScribe is informed by prior work on providing visual descriptions and a formative study with blind participants. Our user study and subsequent pipeline evaluation show that WorldScribe can provide real-time and fairly accurate visual descriptions to facilitate environment understanding that is adaptive and customized to users' contexts. Finally, we discuss the implications and further steps toward making live visual descriptions more context-aware and humanized.
https://doi.org/10.1145/3654777.3676375
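The sketch below illustrates the kind of context-adaptation policy the WorldScribe abstract describes: succinct descriptions when the scene is changing quickly, longer ones when it is stable, and speech output that pauses for conversations or raises volume in noise. The data fields, thresholds, and policy are assumptions for illustration, not the paper's actual pipeline.

```python
# Illustrative adaptation policy (assumed, not WorldScribe's implementation).
from dataclasses import dataclass

@dataclass
class Context:
    scene_change: float   # 0 (static) .. 1 (rapidly changing), e.g., from frame differencing
    noise_db: float       # ambient loudness estimate
    conversation: bool    # whether speech other than the system's TTS is detected

def choose_description_mode(ctx: Context) -> str:
    """Succinct labels for dynamic scenes, longer detailed descriptions when the view is stable."""
    if ctx.scene_change > 0.5:
        return "succinct"   # e.g., object names only, from a fast low-latency model
    return "detailed"       # e.g., a full-sentence caption from a slower, richer model

def choose_speech_output(ctx: Context) -> dict:
    """Pause for conversations, raise volume in noisy settings."""
    if ctx.conversation:
        return {"paused": True, "volume": 0.0}
    volume = 1.0 if ctx.noise_db > 70 else 0.7
    return {"paused": False, "volume": volume}

print(choose_description_mode(Context(scene_change=0.8, noise_db=55, conversation=False)))
print(choose_speech_output(Context(scene_change=0.1, noise_db=75, conversation=False)))
```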
Cooking is a central activity of daily living, supporting independence as well as mental and physical health. However, prior work has highlighted key barriers for people with low vision (LV) to cook, particularly around safely interacting with tools, such as sharp knives or hot pans. Drawing on recent advancements in computer vision (CV), we present CookAR, a head-mounted AR system with real-time object affordance augmentations to support safe and efficient interactions with kitchen tools. To design and implement CookAR, we collected and annotated the first egocentric dataset of kitchen tool affordances, fine-tuned an affordance segmentation model, and developed an AR system with a stereo camera to generate visual augmentations. To validate CookAR, we conducted a technical evaluation of our fine-tuned model as well as a qualitative lab study with 10 LV participants to identify suitable augmentation designs. Our technical evaluation demonstrates that our model outperforms the baseline on our tool affordance dataset, while our user study indicates a preference for affordance augmentations over traditional whole-object augmentations.
https://doi.org/10.1145/3654777.3676449
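To make the augmentation step concrete, the sketch below overlays color-coded affordance masks (e.g., a graspable handle versus a hazardous blade) onto a frame. In CookAR the masks would come from the fine-tuned affordance segmentation model; here they are synthetic, and the class names, colors, and blending factor are assumptions rather than the system's actual rendering.

```python
# Illustrative affordance-highlight overlay (assumed, not CookAR's code).
import numpy as np

AFFORDANCE_COLORS = {
    "graspable": np.array([0, 255, 0], dtype=np.float32),   # green highlight
    "hazardous": np.array([255, 0, 0], dtype=np.float32),   # red highlight
}

def overlay_affordances(frame, masks, alpha=0.5):
    """Blend each boolean affordance mask into the RGB frame with its class color."""
    out = frame.astype(np.float32)
    for name, mask in masks.items():
        color = AFFORDANCE_COLORS[name]
        out[mask] = (1 - alpha) * out[mask] + alpha * color
    return out.clip(0, 255).astype(np.uint8)

# Example with a synthetic 4x4 frame and two masks.
frame = np.full((4, 4, 3), 128, dtype=np.uint8)
masks = {"graspable": np.zeros((4, 4), bool), "hazardous": np.zeros((4, 4), bool)}
masks["graspable"][2:, :2] = True   # stand-in for a detected handle region
masks["hazardous"][:2, 2:] = True   # stand-in for a detected blade region
print(overlay_affordances(frame, masks)[0, 3])  # a highlighted "blade" pixel
```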
Blind and low vision (BLV) developers create websites to share knowledge and showcase their work. A well-designed website can engage audiences and deliver information effectively, yet it remains challenging for BLV developers to review their web designs. We conducted interviews with BLV developers (N=9) and analyzed 20 websites created by BLV developers. BLV developers created highly accessible websites but wanted to assess the usability of their websites for sighted users and follow the design standards of other websites. They also encountered challenges using screen readers to identify illegible text, misaligned elements, and inharmonious colors. We present DesignChecker, a browser extension that helps BLV developers improve their web designs. With DesignChecker, users can assess their current design by comparing it to visual design guidelines, a reference website of their choice, or a set of similar websites. DesignChecker also identifies the specific HTML elements that violate design guidelines and suggests CSS changes for improvements. Our user study participants (N=8) recognized more visual design errors with DesignChecker than with their typical workflow and expressed enthusiasm about using DesignChecker in the future.
https://doi.org/10.1145/3654777.3676369
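One representative guideline check a tool in this space could run is color contrast. The sketch below implements the standard WCAG 2 contrast-ratio formula and emits a suggestion when a text/background pair fails the AA threshold for body text; DesignChecker's actual checks, thresholds, and CSS suggestions may differ, and the function names and message wording here are illustrative.

```python
# Illustrative WCAG 2 contrast check (the formula is standard; the rest is assumed).
def relative_luminance(rgb):
    """WCAG relative luminance of an sRGB color given as 0-255 channels."""
    def channel(c):
        c = c / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (channel(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg):
    """Contrast ratio between two colors, always >= 1."""
    l1, l2 = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)

def check_contrast(fg, bg, min_ratio=4.5):
    """Return a suggestion string if the pair fails WCAG AA for body text, else None."""
    ratio = contrast_ratio(fg, bg)
    if ratio < min_ratio:
        return (f"Contrast {ratio:.2f}:1 is below {min_ratio}:1; "
                "darken the text or lighten the background.")
    return None

print(check_contrast((119, 119, 119), (255, 255, 255)))  # light gray on white narrowly fails AA
```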