136. Optimization with/for AI

Immediately after the previous session

7

30 minutes

FontCraft: Multimodal Font Design Using Interactive Bayesian Optimization

OptiSub: Optimizing Video Subtitle Presentation for Varied Display and Font Sizes via Speech Pause-Driven Chunking

Artificial Intimacy: Exploring Normativity and Personalization through Fine-tuning LLM Chatbots

Characterizing Photorealism and Artifacts in Diffusion Model-Generated Images

Continual Human-in-the-Loop Optimization

Coordination Mechanisms in AI Development: Practitioner Experiences on Integrating UX Activities

Exploring Reduced Feature Sets for American Sign Language Dictionaries

This study session has ended. Thank you for participating.

Link: https://dl.acm.org/doi/10.1145/3706598.3713863

Creating new fonts requires a lot of human effort and professional typographic knowledge. Despite the rapid advancements of automatic font generation models, existing methods require users to prepare pre-designed characters with target styles using font-editing software, which poses a problem for non-expert users. To address this limitation, we propose FontCraft, a system that enables font generation without relying on pre-designed characters. Our approach integrates the exploration of a font-style latent space with human-in-the-loop preferential Bayesian optimization and multimodal references, facilitating efficient exploration and enhancing user control. Moreover, FontCraft allows users to revisit previous designs, retracting their earlier choices in the preferential Bayesian optimization process. Once users finish editing the style of a selected character, they can propagate it to the remaining characters and further refine them as needed. The system then generates a complete outline font in OpenType format. We evaluated the effectiveness of FontCraft through a user study comparing it to a baseline interface. Results from both quantitative and qualitative evaluations demonstrate that FontCraft enables non-expert users to design fonts efficiently.

Link: https://dl.acm.org/doi/10.1145/3706598.3714199

Viewers desire to watch video content with subtitles in various font sizes according to their viewing environment and personal preferences. Unfortunately, because a chunk of the subtitle—a segment of the text corpus displayed on the screen at once—is typically constructed based on one specific font size, text truncation or awkward line breaks can occur when different font sizes are utilized. While existing methods address this problem by reconstructing subtitle chunks based on maximum character counts, they overlook synchronization of the display time with the content, often causing misaligned text. We introduce OptiSub, a fully automated method that optimizes subtitle segmentation to fit any user-specified font size while ensuring synchronization with the content. Our method leverages the timing of speech pauses within the video for synchronization. Experimental results, including a user study comparing OptiSub with previous methods, demonstrate its effectiveness and practicality across diverse font sizes and input videos.
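
The idea of re-chunking subtitles at speech pauses under a character budget can be illustrated with a small sketch. This is not OptiSub's algorithm, which the abstract does not specify; it is a simple greedy heuristic written for illustration. The `Word` type, the `chunk_subtitles` function, the `max_chars` budget (standing in for what fits on screen at a given font size), and the `min_pause` threshold are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # word onset in seconds (assumed to come from a forced aligner)
    end: float    # word offset in seconds


def chunk_subtitles(words, max_chars, min_pause=0.3):
    """Greedy sketch: fill each chunk up to max_chars, then pull the break
    back to the latest speech pause of at least min_pause seconds, if any."""
    chunks = []
    i, n = 0, len(words)
    while i < n:
        # Find the furthest end index j such that words[i:j] fits the budget.
        j, length = i, 0
        while j < n:
            extra = len(words[j].text) + (1 if j > i else 0)  # +1 for a space
            if length + extra > max_chars:
                break
            length += extra
            j += 1
        if j == i:
            j = i + 1  # a single over-long word is emitted on its own
        if j < n:
            # Prefer the latest break point that coincides with a pause.
            for k in range(j, i, -1):
                if words[k].start - words[k - 1].end >= min_pause:
                    j = k
                    break
        group = words[i:j]
        text = " ".join(w.text for w in group)
        chunks.append((text, group[0].start, group[-1].end))
        i = j
    return chunks


# Hypothetical word timings; the 0.5 s gap after "everyone" is the only pause.
words = [
    Word("Hello", 0.0, 0.4), Word("everyone", 0.45, 0.9),
    Word("welcome", 1.4, 1.8), Word("back", 1.85, 2.1),
    Word("to", 2.15, 2.25), Word("the", 2.3, 2.4), Word("show", 2.45, 2.8),
]
small_font = chunk_subtitles(words, max_chars=30)
large_font = chunk_subtitles(words, max_chars=20)
```

With the larger budget the break falls at the pause even though more words would fit, so each chunk's display interval follows the speech. With the smaller budget the second phrase has no internal pause, so the sketch falls back to a plain length-based break.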

Link: https://dl.acm.org/doi/10.1145/3706598.3713728

Fine-tuning Large Language Models (LLMs) is one response to the critique of LLMs being biased, erasing diversity, and raising ethical concerns. The Artificial Intimacy project employs artistic methods, taking personalization of chatbots to an extreme by fine-tuning LLMs on individual social media data. We find that regular GPT-3 chatbots attempt to circumvent value-laden content through flagging prompts and producing generic non-answers with variable success. While the transactional nature of such output allowed participants to make sense of responses with less personification, fine-tuned models presented value-laden, normative, and familiar personalities, resulting in strong personification as a way of making sense of the interactions. This mimicry of emotional connection resulted in a sense of artificial intimacy creating expectations for reciprocity and consideration that the models cannot express by design. As the commercialization of interactions with chatbots continues, we discuss the ethics of such emotional manipulation and its implications for personalization of LLMs.

Link: https://dl.acm.org/doi/10.1145/3706598.3713962

Diffusion model-generated images can appear indistinguishable from authentic photographs, but these images often contain artifacts and implausibilities that reveal their AI-generated provenance. Given the challenge to public trust in media posed by photorealistic AI-generated images, we conducted a large-scale experiment measuring human detection accuracy on 450 diffusion model-generated images and 149 real images. Based on collecting 749,828 observations and 34,675 comments from 50,444 participants, we find that scene complexity of an image, artifact types within an image, display time of an image, and human curation of AI-generated images all play significant roles in how accurately people distinguish real from AI-generated images. Additionally, we propose a taxonomy characterizing artifacts often appearing in images generated by diffusion models. Our empirical observations and taxonomy offer nuanced insights into the capabilities and limitations of diffusion models to generate photorealistic images in 2024.

Link: https://dl.acm.org/doi/10.1145/3706598.3713603

Optimal input settings vary across users due to differences in motor abilities and personal preferences, which are typically addressed by manual tuning or calibration. Although human-in-the-loop optimization has the potential to identify optimal settings during use, it is rarely applied due to its long optimization process. A more efficient approach would continually leverage data from previous users to accelerate optimization, exploiting shared traits while adapting to individual characteristics. We introduce the concept of Continual Human-in-the-Loop Optimization and a Bayesian optimization-based method that leverages a Bayesian-neural-network surrogate model to capture population-level characteristics while adapting to new users. We propose a generative replay strategy to mitigate catastrophic forgetting. We demonstrate our method by optimizing virtual reality keyboard parameters for text entry using direct touch, showing reduced adaptation times with a growing user base. Our method opens the door for next-generation personalized input systems that improve with accumulated experience.

Link: https://dl.acm.org/doi/10.1145/3706598.3713200

Software development relies on collaboration and alignment between a variety of roles, including software developers and user experience designers. The increasing focus on artificial intelligence in today's development projects has given rise to new challenges in this collaboration. We extend previous work on the process of designing human-AI systems by analysing collaborative practices between UX designers and AI developers through Mintzberg's theory on coordination mechanisms. We conducted 15 in-depth interviews with UX designers and AI developers currently working on AI projects. We contribute by identifying how coordination mechanisms impact the UX design process when developing AI systems, inter-team (a)symmetries in power relations, and a growing need for tools and cross-disciplinary knowledge to support these collaborative efforts. In particular, we outline the risks of coordinating AI development work through the standardisation of output and skills in separately organised UX and AI development teams.

Link: https://dl.acm.org/doi/10.1145/3706598.3714118

There is currently no easy way to look up signs in sign language. Feature-based dictionaries help overcome this challenge by enabling users to look up a sign by inputting descriptive visual features, such as handshape and movement. However, feature-based dictionaries are typically cumbersome, including large numbers of complex features that the user must sort through. In this work, we explore simplifying the set of features used in feature-based American Sign Language (ASL) dictionaries. We present two studies: 1) a simulation study focused on lookup accuracy for various reduced feature sets, and 2) a user study focused on understanding human preferences between feature sets. Our results suggest that it is possible to dramatically reduce the number of features needed to search for signs without significantly impacting the accuracy of search results, and that smaller feature sets can improve the user experience in some cases.
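
To make the notion of a simulation study over reduced feature sets concrete, here is a minimal sketch. It is not the paper's simulation, and the lexicon is not real ASL data: the sign identifiers, feature names, and feature values are placeholders invented for illustration, as are the function names. It assumes a noise-free user who enters the target sign's true value for every feature in the chosen set, and it counts a lookup as successful when at most `k` signs match the query.

```python
# Hypothetical toy lexicon: placeholder sign IDs and feature values only.
LEXICON = {
    "sign_a": {"handshape": "flat", "movement": "circular", "location": "chest", "two_handed": False},
    "sign_b": {"handshape": "flat", "movement": "tap", "location": "chin", "two_handed": False},
    "sign_c": {"handshape": "fist", "movement": "circular", "location": "chest", "two_handed": False},
    "sign_d": {"handshape": "fist", "movement": "tap", "location": "chin", "two_handed": True},
    "sign_e": {"handshape": "index", "movement": "tap", "location": "chest", "two_handed": False},
    "sign_f": {"handshape": "index", "movement": "circular", "location": "chin", "two_handed": True},
}


def matching_signs(lexicon, query):
    """Return the signs whose entries agree with the query on every given feature."""
    return [
        name for name, feats in lexicon.items()
        if all(feats[f] == value for f, value in query.items())
    ]


def top_k_accuracy(lexicon, feature_set, k):
    """Fraction of signs retrievable within k results using only feature_set.

    Ties are counted pessimistically: the target is assumed to be listed
    last among all signs that match the query equally well.
    """
    hits = 0
    for feats in lexicon.values():
        query = {f: feats[f] for f in feature_set}
        if len(matching_signs(lexicon, query)) <= k:
            hits += 1
    return hits / len(lexicon)


full = ("handshape", "movement", "location", "two_handed")
for subset in (full, ("handshape", "movement"), ("handshape",), ("location", "two_handed")):
    print(subset, top_k_accuracy(LEXICON, subset, k=1))
```

In this toy data, two features already identify every sign as well as all four do, while other pairs do not, which is the kind of trade-off such a simulation is meant to surface. A fuller version would also need to model input errors and a realistic lexicon.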
