SketchFlex: Facilitating Spatial-Semantic Coherence in Text-to-Image Generation with Region-Based Sketches

Abstract

Text-to-image models can generate visually appealing images from text descriptions. Efforts have been devoted to improving model control through prompt tuning and spatial conditioning. However, our formative study highlights the challenges non-expert users face in crafting appropriate prompts and specifying fine-grained spatial conditions (e.g., depth or canny references) to generate semantically cohesive images, especially when multiple objects are involved. In response, we introduce SketchFlex, an interactive system designed to improve the flexibility of spatially conditioned image generation using rough region sketches. The system automatically infers rational prompt descriptions within a semantic space enriched by crowd-sourced object attributes and relationships. Additionally, SketchFlex refines users' rough sketches into canny-based shape anchors, ensuring generation quality and alignment with user intentions. Experimental results demonstrate that SketchFlex achieves more cohesive image generation than end-to-end models, while significantly reducing cognitive load and better matching user intentions compared to a region-based generation baseline.
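To illustrate the canny-based spatial conditioning the abstract refers to, the minimal sketch below shows one plausible way to turn a rough sketch into a canny edge map and use it as a shape anchor with an off-the-shelf ControlNet pipeline. This is an illustration only, not the authors' SketchFlex implementation; the model checkpoints, helper function, file names, and parameters are assumptions.

```python
# Hypothetical sketch of canny-conditioned generation (NOT the SketchFlex code):
# convert a rough region sketch into a canny edge map, then use it as a
# spatial condition via a ControlNet-augmented Stable Diffusion pipeline.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

def sketch_to_canny(sketch: Image.Image, low: int = 100, high: int = 200) -> Image.Image:
    """Turn a rough grayscale sketch into a 3-channel canny edge map ("shape anchor")."""
    edges = cv2.Canny(np.array(sketch.convert("L")), low, high)
    return Image.fromarray(np.stack([edges] * 3, axis=-1))

# Assumed public checkpoints; swap in whatever base model / ControlNet you use.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

anchor = sketch_to_canny(Image.open("rough_region_sketch.png"))  # hypothetical input file
image = pipe(
    prompt="a red apple on a wooden table, soft lighting",  # stands in for an inferred, enriched prompt
    image=anchor,                                           # canny edge map as the spatial condition
    num_inference_steps=30,
).images[0]
image.save("generated.png")
```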

Award
Honorable Mention
Authors
Haichuan Lin
The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, Guangdong, China
Yilin Ye
The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, Guangdong, China
Jiazhi Xia
Central South University, Changsha, China
Wei Zeng
The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, Guangdong, China
DOI

10.1145/3706598.3713801

Paper URL

https://dl.acm.org/doi/10.1145/3706598.3713801

Conference: CHI 2025

The ACM CHI Conference on Human Factors in Computing Systems (https://chi2025.acm.org/)

Session: Image and AI

Room: G303
7 presentations
2025-04-28 23:10:00 – 2025-04-29 00:40:00