GenAssist: Making Image Generation Accessible

Abstract

Blind and low vision (BLV) creators use images to communicate with sighted audiences. However, creating or retrieving images is challenging for BLV creators, as it is difficult to use authoring tools or assess image search results. Thus, creators limit the types of images they create or recruit sighted collaborators. While text-to-image generation models let creators generate high-fidelity images from a text description (i.e., a prompt), it is difficult to assess the content and quality of the generated images. We present GenAssist, a system that makes text-to-image generation accessible. Using our interface, creators can verify whether generated image candidates follow the prompt, access additional image details not specified in the prompt, and skim a summary of similarities and differences between image candidates. To power the interface, GenAssist uses a large language model to generate visual questions, vision-language models to extract answers, and a large language model to summarize the results. Our study with 12 BLV creators demonstrated that GenAssist enables and simplifies the process of image selection and generation, making visual authoring more accessible to all.
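The three-stage pipeline described above (an LLM generates visual questions, vision-language models answer them per image, and an LLM summarizes the results) can be illustrated with a minimal sketch. The function names, model interfaces, and prompt wording below are hypothetical stand-ins for illustration only, not GenAssist's actual implementation or APIs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Image:
    """Placeholder for one generated image candidate (hypothetical)."""
    path: str

# Assumed model interfaces: `llm` maps a text prompt to text, and `vqa`
# answers a free-form question about a single image. These are stand-ins,
# not the specific models used in the paper.
LLM = Callable[[str], str]
VQA = Callable[[Image, str], str]

def describe_candidates(prompt: str, candidates: List[Image],
                        llm: LLM, vqa: VQA) -> str:
    """Sketch of the question-generation, answering, and summarization steps."""
    # 1) A large language model proposes verification questions from the
    #    text-to-image prompt.
    questions = [q for q in llm(
        "List short questions for checking whether an image follows this "
        f"prompt and for surfacing details the prompt leaves out:\n{prompt}"
    ).splitlines() if q.strip()]

    # 2) A vision-language model answers every question for each candidate.
    answers: Dict[str, Dict[str, str]] = {
        img.path: {q: vqa(img, q) for q in questions}
        for img in candidates
    }

    # 3) A second LLM pass condenses the answers into prompt adherence,
    #    extra details, and similarities/differences across candidates.
    return llm(
        "Summarize prompt adherence, unmentioned details, and similarities "
        f"and differences between these image candidates:\n{answers}"
    )
```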

Award
Best Paper
Authors
Mina Huh
University of Texas, Austin, Austin, Texas, United States
Yi-Hao Peng
Carnegie Mellon University, Pittsburgh, Pennsylvania, United States
Amy Pavel
University of Texas, Austin, Austin, Texas, United States
Paper URL

https://doi.org/10.1145/3586183.3606735

Conference: UIST 2023

ACM Symposium on User Interface Software and Technology

Session: Inclusive Interactions: Accessibility Techniques and Systems

Gold Room
6 presentations
2023-10-31, 18:00–19:20