Vistoria: A Multimodal System to Support Fictional Story Writing through Instrumental Image-Text Co-Editing

Humans think visually: we remember in images, dream in pictures, and use visual metaphors to communicate. Yet most creative writing tools remain text-centric, limiting how writers plan and translate ideas. We present Vistoria, a system for synchronized image-text co-editing in fictional story writing. A formative Wizard-of-Oz co-design study with 10 story writers revealed how sketches, images, and text serve as essential elements for ideation and organization. Drawing on theories of Instrumental Interaction, Vistoria introduces instrumental operations (Lasso, Collage, Perspective Shift, and Filter) that enable seamless narrative exploration across modalities. A controlled study with 12 participants shows that co-editing enhances expressiveness, immersion, and collaboration, opening space for writers to follow divergent story directions and craft more vivid, detailed narratives. While multimodality increased cognitive demand, participants reported stronger senses of ownership and agency. These findings demonstrate how multimodal co-editing expands creative potential by balancing abstraction and concreteness in narrative development.

City University of Hong Kong, Hong Kong SAR, China

Harvard University, Cambridge, Massachusetts, United States

Tongji University, Shanghai, China

University of Notre Dame, Notre Dame, Indiana, United States

The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, Guangdong, China

City University of Hong Kong, Hong Kong, Hong Kong

University of Notre Dame, Notre Dame, Indiana, United States

ACM CHI Conference on Human Factors in Computing Systems

P1 - Room 115

7 presentations

Start: 2026-04-15 18:00:00

End: 2026-04-15 19:30:00


Conference: CHI 2026

Session: Designing Creative GenAI Tools