Supporting Accessibility of Text, Image and Video B

Conference Name
CHI 2024
Caption Royale: Exploring the Design Space of Affective Captions from the Perspective of Deaf and Hard-of-Hearing Individuals
Abstract

Affective captions employ visual typographic modulations to convey a speaker's emotions, improving speech accessibility for Deaf and Hard-of-Hearing (DHH) individuals. However, the most effective visual modulations for expressing emotions remain uncertain. Bridging this gap, we ran three studies with 39 DHH participants, exploring the design space of affective captions, which include parameters like text color, boldness, size, and so on. Study 1 assessed preferences for nine of these styles, each conveying either valence or arousal separately. Study 2 combined Study 1's top-performing styles and measured preferences for captions depicting both valence and arousal simultaneously. Participants outlined readability, minimal distraction, intuitiveness, and emotional clarity as key factors behind their choices. In Study 3, these factors and an emotion-recognition task were used to compare how Study 2's winning styles performed versus a non-styled baseline. Based on our findings, we present the two best-performing styles as design recommendations for applications employing affective captions.
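
For readers unfamiliar with affective captioning, the sketch below shows, in Python, one way the general idea could be wired up: a speaker's valence and arousal are mapped to typographic parameters such as color, size, and weight. The specific mappings and value ranges are illustrative assumptions, not the nine styles evaluated in the paper.

```python
# Hypothetical affective-caption styler: maps valence and arousal (both
# roughly in [-1, 1]) to typographic parameters. Mappings are illustrative
# assumptions, not the styles studied in the paper.
from dataclasses import dataclass


@dataclass
class CaptionStyle:
    color: str         # hex color applied to the caption text
    font_size_px: int
    bold: bool


def style_for_emotion(valence: float, arousal: float) -> CaptionStyle:
    """Map valence (negative..positive) to a cool..warm color and arousal
    (calm..excited) to size and weight. Purely illustrative mappings."""
    v = max(-1.0, min(1.0, valence))
    a = max(0.0, min(1.0, arousal))
    red = int(round(127.5 + 127.5 * v))   # -1 -> blue-ish, +1 -> orange-ish
    blue = 255 - red
    color = f"#{red:02x}80{blue:02x}"
    font_size = int(16 + 8 * a)           # higher arousal -> larger text
    return CaptionStyle(color=color, font_size_px=font_size, bold=a > 0.5)


print(style_for_emotion(valence=0.8, arousal=0.9))   # excited, positive speech
print(style_for_emotion(valence=-0.6, arousal=0.2))  # subdued, negative speech
```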

Award
Honorable Mention
Authors
Caluã de Lacerda Pataca
Rochester Institute of Technology, Rochester, New York, United States
Saad Hassan
Tulane University, New Orleans, Louisiana, United States
Nathan Tinker
Rochester Institute of Technology, Rochester, New York, United States
Roshan L. Peiris
Rochester Institute of Technology, Rochester, New York, United States
Matt Huenerfauth
Rochester Institute of Technology, Rochester, New York, United States
Paper URL

doi.org/10.1145/3613904.3642258

Video
SPICA: Interactive Video Content Exploration through Augmented Audio Descriptions for Blind or Low-Vision Viewers
Abstract

Blind or Low-Vision (BLV) users often rely on audio descriptions (AD) to access video content. However, conventional static ADs can leave out detailed information in videos, impose a high mental load, neglect the diverse needs and preferences of BLV users, and lack immersion. To tackle these challenges, we introduce SPICA, an AI-powered system that enables BLV users to interactively explore video content. Informed by prior empirical studies on BLV video consumption, SPICA offers novel interactive mechanisms for supporting temporal navigation of frame captions and spatial exploration of objects within key frames. Leveraging an audio-visual machine learning pipeline, SPICA augments existing ADs by adding interactivity, spatial sound effects, and individual object descriptions without requiring additional human annotation. Through a user study with 14 BLV participants, we evaluated the usability and usefulness of SPICA and explored user behaviors, preferences, and mental models when interacting with augmented ADs.
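
As a rough illustration of one element the abstract mentions, the Python sketch below pans a short audio cue left or right according to an object's horizontal position in a video frame, a minimal form of spatial sound feedback. It is an assumption-laden stand-in, not SPICA's actual audio-visual pipeline; object detection and description generation are left out.

```python
# Illustrative only: constant-power stereo panning keyed to an object's
# horizontal position in the frame. Not the authors' pipeline.
import numpy as np


def pan_stereo(mono: np.ndarray, x_center: float, frame_width: float) -> np.ndarray:
    """Objects near the left edge of the frame play mostly in the left
    channel, objects near the right edge in the right channel."""
    position = float(np.clip(x_center / frame_width, 0.0, 1.0))  # 0 = left, 1 = right
    angle = position * np.pi / 2
    left_gain, right_gain = np.cos(angle), np.sin(angle)
    return np.stack([mono * left_gain, mono * right_gain], axis=1)


# Example: a 0.2 s, 440 Hz cue for an object detected around x = 400 in a 1920 px frame.
sample_rate = 44_100
t = np.linspace(0.0, 0.2, int(0.2 * sample_rate), endpoint=False)
cue = 0.3 * np.sin(2 * np.pi * 440 * t)
stereo_cue = pan_stereo(cue, x_center=400, frame_width=1920)
print(stereo_cue.shape)  # (8820, 2) -- ready to hand to an audio playback library
```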

Authors
Zheng Ning
University of Notre Dame, Notre Dame, Indiana, United States
Brianna L. Wimer
University of Notre Dame, South Bend, Indiana, United States
Kaiwen Jiang
Beijing Jiaotong University, Beijing, China
Keyi Chen
University of California San Diego, San Diego, California, United States
Jerrick Ban
University of Notre Dame, Notre Dame, Indiana, United States
Yapeng Tian
University of Texas at Dallas, Richardson, Texas, United States
Yuhang Zhao
University of Wisconsin-Madison, Madison, Wisconsin, United States
Toby Jia-Jun Li
University of Notre Dame, Notre Dame, Indiana, United States
Paper URL

doi.org/10.1145/3613904.3642632

Video
An AI-Resilient Text Rendering Technique for Reading and Skimming Documents
Abstract

Readers find text difficult to consume for many reasons. Summarization can address some of these difficulties, but introduce others, such as omitting, misrepresenting, or hallucinating information, which can be hard for a reader to notice. One approach to addressing this problem is to instead modify how the original text is rendered to make important information more salient. We introduce Grammar-Preserving Text Saliency Modulation (GP-TSM), a text rendering method with a novel means of identifying what to de-emphasize. Specifically, GP-TSM uses a recursive sentence compression method to identify successive levels of detail beyond the core meaning of a passage, which are de-emphasized by rendering words in successively lighter but still legible gray text. In a lab study (n=18), participants preferred GP-TSM over pre-existing word-level text rendering methods and were able to answer GRE reading comprehension questions more efficiently.
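
The rendering step described above lends itself to a small sketch: given per-word levels of detail (hand-written here as stand-ins for the recursive sentence compression model), words are emitted as HTML spans in successively lighter, still-legible shades of gray. The exact gray values are illustrative assumptions rather than GP-TSM's published parameters.

```python
# Sketch of the graded-gray rendering idea; level assignments and gray
# values are assumptions, not GP-TSM's actual model or parameters.
from html import escape


def render_graded_gray(words: list[tuple[str, int]], max_level: int = 3) -> str:
    """Render level-0 words in black and deeper levels in progressively
    lighter gray, capped so de-emphasized text stays legible."""
    spans = []
    for word, level in words:
        level = min(level, max_level)
        gray = int(160 * level / max_level)  # 0 (black) .. 160 (light but legible)
        color = f"#{gray:02x}{gray:02x}{gray:02x}"
        spans.append(f'<span style="color:{color}">{escape(word)}</span>')
    return " ".join(spans)


# Levels here are hand-labeled; in the paper they come from recursive sentence compression.
sentence = [("Readers", 0), ("often", 2), ("find", 0), ("dense", 1),
            ("text", 0), ("difficult", 0), ("to", 0), ("consume", 0)]
print(render_graded_gray(sentence))
```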

Authors
Ziwei Gu
Harvard University, Cambridge, Massachusetts, United States
Ian Arawjo
Harvard University, Cambridge, Massachusetts, United States
Kenneth Li
Harvard University, Cambridge, Massachusetts, United States
Jonathan K. Kummerfeld
The University of Sydney, Sydney, NSW, Australia
Elena L. Glassman
Harvard University, Allston, Massachusetts, United States
Paper URL

doi.org/10.1145/3613904.3642699

Video
Investigating Use Cases of AI-Powered Scene Description Applications for Blind and Low Vision People
Abstract

“Scene description” applications that describe visual content in a photo are useful daily tools for blind and low vision (BLV) people. Researchers have studied their use, but they have only explored those that leverage remote sighted assistants; little is known about applications that use AI to generate their descriptions. Thus, to investigate their use cases, we conducted a two-week diary study where 16 BLV participants used an AI-powered scene description application we designed. Through their diary entries and follow-up interviews, users shared their information goals and assessments of the visual descriptions they received. We analyzed the entries and found frequent use cases, such as identifying visual features of known objects, and surprising ones, such as avoiding contact with dangerous objects. We also found users scored the descriptions relatively low on average, 2.7 out of 5 (SD=1.5) for satisfaction and 2.4 out of 4 (SD=1.2) for trust, showing that descriptions still need significant improvements to deliver satisfying and trustworthy experiences. We discuss future opportunities for AI as it becomes a more powerful accessibility tool for BLV users.

Authors
Ricardo E. Gonzalez Penuela
Cornell Tech, Cornell University, New York, New York, United States
Jazmin Collins
Cornell University, Ithaca, New York, United States
Cynthia L. Bennett
Google, New York, New York, United States
Shiri Azenkot
Cornell Tech, New York, New York, United States
Paper URL

doi.org/10.1145/3613904.3642211

Video
From Provenance to Aberrations: Image Creator and Screen Reader User Perspectives on Alt Text for AI-Generated Images
Abstract

AI-generated images are proliferating as a new visual medium. However, state-of-the-art image generation models do not output alternative (alt) text with their images, rendering them largely inaccessible to screen reader users (SRUs). Moreover, less is known about what information would be most desirable to SRUs in this new medium. To address this, we invited AI image creators and SRUs to evaluate alt text prepared from various sources and write their own alt text for AI images. Our mixed-methods analysis makes three contributions. First, we highlight creators’ perspectives on alt text, as creators are well-positioned to write descriptions of their images. Second, we illustrate SRUs’ alt text needs particular to the emerging medium of AI images. Finally, we discuss the promises and pitfalls of utilizing text prompts written as input for AI models in alt text generation, and areas where broader digital accessibility guidelines could expand to account for AI images.

Authors
Maitraye Das
Northeastern University, Boston, Massachusetts, United States
Alexander J. Fiannaca
Google, Seattle, Washington, United States
Meredith Ringel Morris
Google DeepMind, Seattle, Washington, United States
Shaun K.. Kane
Google Research, Boulder, Colorado, United States
Cynthia L. Bennett
Google, New York, New York, United States
Paper URL

doi.org/10.1145/3613904.3642325

Video