Precise editing of text-to-image model outputs remains challenging. Slider-based editing is a recent approach wherein the image’s semantic attributes are manipulated via sliders. However, it has significant user-centric issues. First, slider variations are often inconsistent across the sliding range. Second, the optimal slider range is unpredictable, with default values often being too large or small depending on the prompt and attribute. Third, manipulating one attribute can unintentionally alter others due to the complex entanglement of latent spaces. We introduce AdaptiveSliders, a tool that addresses these challenges by adapting to the specific attributes and prompts, generating consistent slider variations and optimal bounds while minimizing unintended changes. AdaptiveSliders also suggests initial attributes and generates initial images more aligned with prompt semantics. Through three validation studies and one end-to-end user study, we demonstrate that AdaptiveSliders significantly improves user control and experience, enabling semantic slider-based editing aligned with user needs and expectations.
https://dl.acm.org/doi/10.1145/3706598.3714292
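For readers unfamiliar with the slider-based editing paradigm the abstract critiques, the following minimal Python sketch emulates a single semantic slider by linearly interpolating between two CLIP prompt embeddings in a Stable Diffusion pipeline (Hugging Face diffusers) under a fixed seed. The checkpoint, prompts, and interpolation scheme are illustrative assumptions; this is not the AdaptiveSliders method.

```python
# Hedged sketch: slider-style attribute editing by linear interpolation
# between two prompt embeddings (NOT the AdaptiveSliders method).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def embed(prompt: str) -> torch.Tensor:
    """Encode a prompt with the pipeline's CLIP text encoder."""
    tokens = pipe.tokenizer(
        prompt, padding="max_length",
        max_length=pipe.tokenizer.model_max_length,
        truncation=True, return_tensors="pt",
    ).input_ids.to(pipe.device)
    with torch.no_grad():
        return pipe.text_encoder(tokens)[0]

neutral = embed("a portrait photo of a person")            # slider = 0
attribute = embed("a portrait photo of a smiling person")  # slider = 1

for slider in [0.0, 0.25, 0.5, 0.75, 1.0]:
    # Same seed at every step so only the attribute strength changes.
    generator = torch.Generator(device="cuda").manual_seed(42)
    prompt_embeds = (1 - slider) * neutral + slider * attribute
    image = pipe(prompt_embeds=prompt_embeds, generator=generator,
                 num_inference_steps=30).images[0]
    image.save(f"slider_{slider:.2f}.png")
```

This naive interpolation illustrates the very issues the paper targets: the effective range of the scalar and the consistency of changes across it depend heavily on the prompt and attribute.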
The emergence of AI Image Generators (AIGs) has transformed image creation, making it more accessible to generate customized images from simple text prompts. While HCI research has explored the applications of text-to-image generation, the role of AIGs in the visual content creation workflow remains relatively underexplored. To address this, we conducted in-depth interviews with 26 end users who had experience across 14 different AIGs and investigated users' adoption and perceptions of AIGs and AI's role throughout the entire workflow. Key factors examined include user goals, initial vision, tool integration, and decision-making. Our results indicated that functional goals often drive cross-tool integration to achieve desired outcomes, whereas in recreationally motivated use cases, the use of AIGs shapes the social implications of image sharing. We concluded with four distinct use cases, each highlighting how AIGs are integrated at different stages of the creative process based on varying user goals and visions.
https://dl.acm.org/doi/10.1145/3706598.3713227
Existing approaches to color-concept association typically rely on query-based image referencing and color extraction from the retrieved references. However, these approaches are effective only for common concepts and are vulnerable to unstable image referencing and varying image conditions. Our formative study with designers underscores the need for primary-accent color compositions and context-dependent colors (e.g., 'clear' vs. 'polluted' sky) in design. In response, we introduce a generative approach for mining semantically resonant colors that leverages images generated by text-to-image models. Our insight is that contemporary text-to-image models can reproduce visual patterns learned from large-scale real-world data. The framework comprises three stages: concept instancing produces generative samples using diffusion models, text-guided image segmentation identifies concept-relevant regions within the image, and color association extracts primary colors accompanied by accent colors. Quantitative comparisons with expert designs validate our approach's effectiveness, and we demonstrate its applicability through cases in various design scenarios and a gallery.
https://dl.acm.org/doi/10.1145/3706598.3713418
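The three-stage framework can be pictured with the hedged Python sketch below: a diffusion model instances the concept, CLIPSeg performs text-guided segmentation, and k-means clustering extracts a primary color plus accent colors. The specific models (Stable Diffusion v1.5, CLIPSeg), the mask threshold, and k = 5 are assumptions for illustration, not the paper's exact implementation.

```python
# Hedged sketch of a generate -> segment -> extract-colors pipeline,
# loosely mirroring the three stages described in the abstract.
import numpy as np
import torch
from diffusers import StableDiffusionPipeline
from sklearn.cluster import KMeans
from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation

# Stage 1: concept instancing -- sample an image for the concept.
concept, region = "polluted sky over a city", "sky"
sd = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
image = sd(f"a photo of {concept}", num_inference_steps=30).images[0]

# Stage 2: text-guided segmentation -- keep only the concept-relevant region.
processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")
segmenter = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined")
inputs = processor(text=[region], images=[image], return_tensors="pt")
with torch.no_grad():
    logits = segmenter(**inputs).logits
mask = torch.sigmoid(logits).squeeze().numpy() > 0.4   # (352, 352) boolean mask

# Stage 3: color association -- primary color plus accent colors.
pixels = np.asarray(image.resize(mask.shape[::-1]))[mask]  # RGB pixels in the region
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(pixels)
order = np.argsort(-np.bincount(kmeans.labels_))           # largest cluster first
palette = kmeans.cluster_centers_[order].astype(int)
print("primary:", palette[0], "accents:", palette[1:].tolist())
```

Sampling several generations per concept and aggregating their palettes would better approximate the paper's idea of mining stable, semantically resonant colors rather than trusting a single image.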
Text-to-image models can generate visually appealing images from text descriptions. Efforts have been devoted to improving model control with prompt tuning and spatial conditioning. However, our formative study highlights the challenges non-expert users face in crafting appropriate prompts and specifying fine-grained spatial conditions (e.g., depth or canny references) to generate semantically cohesive images, especially when multiple objects are involved. In response, we introduce SketchFlex, an interactive system designed to improve the flexibility of spatially conditioned image generation using rough region sketches. The system automatically infers user prompts with rational descriptions within a semantic space enriched by crowd-sourced object attributes and relationships. Additionally, SketchFlex refines users' rough sketches into canny-based shape anchors, ensuring generation quality and alignment with user intentions. Experimental results demonstrate that SketchFlex achieves more cohesive image generations than end-to-end models, while significantly reducing cognitive load and better matching user intentions compared to a region-based generation baseline.
https://dl.acm.org/doi/10.1145/3706598.3713801
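As a rough analogue of turning a drawing into a "canny-based shape anchor" for spatially conditioned generation, the sketch below runs OpenCV's Canny detector over a rough sketch and feeds the result to a ControlNet-conditioned Stable Diffusion pipeline. The file name, prompt, checkpoints, and conditioning scale are assumptions; SketchFlex's prompt inference and region handling are not reproduced here.

```python
# Hedged sketch: canny-conditioned generation with ControlNet -- a minimal
# analogue of a "canny-based shape anchor", not the SketchFlex system.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Assumed input: a rough user sketch saved as "rough_sketch.png".
sketch = cv2.imread("rough_sketch.png")
gray = cv2.cvtColor(sketch, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 100, 200)                         # shape anchor
anchor = Image.fromarray(np.stack([edges] * 3, axis=-1))  # 3-channel condition image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# The prompt stands in for SketchFlex's automatically inferred description.
image = pipe(
    "a cozy reading corner with an armchair and a floor lamp",
    image=anchor,
    num_inference_steps=30,
    controlnet_conditioning_scale=0.8,  # how strictly to follow the anchor
).images[0]
image.save("conditioned_result.png")
```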
Cameras are increasingly augmented with computational processing, producing images that blur the line between documenting reality and creative expression. The rise of text-to-image models has redefined the concept of imagery, sparking ethical and philosophical debates. This paper presents the findings of a qualitative study that employed a provocative prototype 'camera', the A(I)Cam, to engage creative practitioners directly in these discussions. Developed using a Research-through-Design (RtD) approach, the tangible prototype generates and instantly prints AI-created images. A(I)Cam facilitated reflection among creative practitioners (N=15) on their experiences with AI-driven tools and the broader implications for their future practices. We examine the shifts in perspective that emerged from engaging with this embodied form of generative AI (genAI), challenging traditional text-based interaction paradigms, and inviting new modes of creative exploration and reflection. In addition, we offer insights from the RtD project, highlighting the integration of genAI tools into the industrial design process.
https://dl.acm.org/doi/10.1145/3706598.3713529
Visual blends combine elements from two distinct visual concepts into a single, integrated image, with the goal of conveying ideas through imaginative and often thought-provoking visuals. Communicating abstract concepts through visual blends poses a series of conceptual and technical challenges. To address these challenges, we introduce Creative Blends, an AI-assisted design system that leverages metaphors to visually symbolize abstract concepts by blending disparate objects. Our method harnesses commonsense knowledge bases and large language models to align designers’ conceptual intent with expressive concrete objects. Additionally, we employ generative text-to-image techniques to blend visual elements through their overlapping attributes. A user study (N=24) demonstrated that our approach reduces participants’ cognitive load, fosters creativity, and enhances the metaphorical richness of visual blend ideation. We explore the potential of our method to expand visual blends to include multiple object blending and discuss the insights gained from designing with generative AI.
https://dl.acm.org/doi/10.1145/3706598.3713683
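A hedged Python sketch of the two-step idea, using an LLM to map an abstract concept to concrete objects and a shared attribute, then requesting a blended image, is shown below. The model names, prompts, and response format are assumptions, and the commonsense knowledge base component of Creative Blends is omitted; this is not the paper's system.

```python
# Hedged sketch: LLM-assisted metaphor mapping followed by a blended image
# request (illustrative only; not the Creative Blends implementation).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
abstract_concept = "burnout"

# Step 1: map the abstract concept to concrete objects and a shared attribute.
chat = client.chat.completions.create(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": (
            f"List two concrete, visually distinct objects that metaphorically "
            f"represent '{abstract_concept}', and one visual attribute they share. "
            "Answer exactly as: object1; object2; shared attribute."
        ),
    }],
)
# Naive parse of the reply; a real system would validate the response format.
obj1, obj2, shared = [s.strip() for s in chat.choices[0].message.content.split(";")[:3]]

# Step 2: blend the two objects through their overlapping attribute.
result = client.images.generate(
    model="dall-e-3",
    prompt=(
        f"A single integrated image that blends a {obj1} and a {obj2} "
        f"through their shared {shared}, as a visual metaphor for {abstract_concept}."
    ),
    size="1024x1024",
    n=1,
)
print(result.data[0].url)
```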
Image-generative AI provides new opportunities to transform personal data into alternative visual forms. In this paper, we illustrate the potential of AI-generated images in facilitating meaningful engagement with personal data. In a formative autobiographical design study, we explored the design and use of AI-generated images derived from personal data. Informed by this study, we designed a web-based application as a probe that represents personal data through generative images, using OpenAI's GPT-4 model and DALL-E 3. We then conducted a 21-day diary study and interviews using the probe with 16 participants to investigate users' in-depth experiences with AI-generated images in everyday life. Our findings reveal new qualities of experience in users' engagement with data, highlighting how participants constructed personal meaning from their data through imagination and speculation on AI-generated images. We conclude by discussing the potential and concerns of leveraging image-generative AI for personal data meaning-making.
https://dl.acm.org/doi/10.1145/3706598.3713722
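To make the probe's pipeline concrete, here is a hedged Python sketch that turns a small, invented personal-data entry into a scene description with GPT-4 and renders it with DALL-E 3 via the OpenAI API, mirroring the models named in the abstract. The data fields and prompts are illustrative assumptions, not the study's probe.

```python
# Hedged sketch: personal data -> GPT-4 scene description -> DALL-E 3 image.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Illustrative personal-data entry (e.g., from a diary or fitness tracker).
entry = {"date": "2024-05-12", "steps": 11240, "mood": "calm", "weather": "light rain"}

description = client.chat.completions.create(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": (
            "Write one evocative sentence describing an imaginary scene that "
            f"reflects this day, without showing any numbers or text: {entry}"
        ),
    }],
).choices[0].message.content

image = client.images.generate(
    model="dall-e-3", prompt=description, size="1024x1024", n=1
)
print(description)
print(image.data[0].url)
```

The indirection through a generated scene description, rather than plotting the data, is what leaves room for the imagination and speculation the findings describe.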