Large Language Models

https://doi.org/10.1145/3613904.3642628

On-device machine learning (ML) moves computation from the cloud to personal devices, protecting user privacy and enabling intelligent user experiences. However, fitting models on devices with limited resources presents a major technical challenge: practitioners need to optimize models and balance hardware metrics such as model size, latency, and power. To help practitioners create efficient ML models, we designed and developed Talaria: a model visualization and optimization system. Talaria enables practitioners to compile models to hardware, interactively visualize model statistics, and simulate optimizations to test the impact on inference metrics. Since its internal deployment two years ago, we have evaluated Talaria using three methodologies: (1) a log analysis highlighting its growth of 800+ practitioners submitting 3,600+ models; (2) a usability survey with 26 users assessing the utility of 20 Talaria features; and (3) a qualitative interview with the 7 most active users about their experience using Talaria.

Apple, Seattle, Washington, United States

Apple, Beijing, China

Apple, Cupertino, California, United States

Independent Researcher, Walldorf, Germany

Apple, Pittsburgh, Pennsylvania, United States

Apple, Seattle, Washington, United States

Apple, Cupertino, California, United States

Apple Inc, Seattle, Washington, United States

https://doi.org/10.1145/3613904.3642400

Thanks to their generative capabilities, large language models (LLMs) have become an invaluable tool for creative processes. These models have the capacity to produce hundreds and thousands of visual and textual outputs, offering abundant inspiration for creative endeavors. But are we harnessing their full potential? We argue that current interaction paradigms fall short, guiding users towards rapid convergence on a limited set of ideas, rather than empowering them to explore the vast latent design space in generative models. To address this limitation, we propose a framework that facilitates the structured generation of a design space in which users can seamlessly explore, evaluate, and synthesize a multitude of responses. We demonstrate the feasibility and usefulness of this framework through the design and development of an interactive system, Luminate, and a user study with 14 professional writers. Our work advances how we interact with LLMs for creative tasks, introducing a way to harness the creative potential of LLMs.

University of California, San Diego, San Diego, California, United States

University of Notre Dame, Notre Dame, Indiana, United States

University of California San Diego, La Jolla, California, United States

University of Notre Dame, Notre Dame, Indiana, United States

University of California, San Diego, San Diego, California, United States

https://doi.org/10.1145/3613904.3642032

While fitness trackers generate and present quantitative data, past research suggests that users often conceptualise their wellbeing in qualitative terms. This discrepancy between numeric data and personal wellbeing perception may limit the effectiveness of personal informatics tools in encouraging meaningful engagement with one’s wellbeing. In this work, we aim to bridge the gap between raw numeric metrics and users’ qualitative perceptions of wellbeing. In an online survey with n = 273 participants, we used step data from fitness trackers and compared three presentation formats: standard charts, qualitative descriptions generated by a large language model (LLM), and a combination of both. Our findings reveal that users experienced more reflection, focused attention and reward when presented with the generated qualitative data compared to the standard charts alone. Our work demonstrates how automatically generated data descriptions can effectively complement numeric fitness data, fostering a richer, more reflective engagement with personal wellbeing information.

Osnabrück University, Osnabrück, Germany

ENSEIRB-MATMECA Bordeaux, Bordeaux, France

Chalmers University of Technology, Gothenburg, Sweden

University of Oslo, Oslo, Norway

Chalmers University of Technology, Gothenburg, Sweden

https://doi.org/10.1145/3613904.3641904

Large Language Models (LLMs) are notorious for blending fact with fiction and generating non-factual content, known as hallucinations. To address this challenge, we propose an interactive system that helps users gain insight into the reliability of the generated text. Our approach is based on the idea that the self-consistency of multiple samples generated by the same LLM relates to its confidence in individual claims in the generated texts. Using this idea, we design RELIC, an interactive system that enables users to investigate and verify semantic-level variations in multiple long-form responses. This allows users to recognize potentially inaccurate information in the generated text and make necessary corrections. From a user study with ten participants, we demonstrate that our approach helps users better verify the reliability of the generated text. We further summarize the design implications and lessons learned from this research for future studies of reliable human-LLM interactions.

ETH Zürich, Zürich, Switzerland

ETH Zürich, Zürich, Switzerland

Stanford University, Stanford, California, United States

ETH Zürich, Zürich, Switzerland

IBM Research AI, Cambridge, Massachusetts, United States

ETH Zürich, Zürich, Switzerland
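
The RELIC abstract above rests on the idea that agreement across several responses sampled from the same LLM relates to the model's confidence in an individual claim. The following is a minimal sketch of that idea only, not RELIC's actual method: the function names, the hand-written sample responses, and the exact-token-containment check (a crude stand-in for whatever semantic comparison a real system would use) are all assumptions made for illustration.

```python
import re


def tokens(text):
    """Return the set of lowercase alphanumeric word tokens in a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def supports(sample, claim):
    """Toy support check (assumption): a sample supports a claim when
    every token of the claim also appears in the sample."""
    claim_tokens = tokens(claim)
    if not claim_tokens:
        return False
    return claim_tokens <= tokens(sample)


def consistency_score(claim, samples):
    """Fraction of sampled responses that support the claim (0.0 to 1.0)."""
    if not samples:
        return 0.0
    return sum(supports(s, claim) for s in samples) / len(samples)


# Hand-written stand-ins for multiple responses sampled from one LLM.
samples = [
    "Marie Curie was born in Warsaw in 1867.",
    "Marie Curie was born in Warsaw, Poland, in 1867.",
    "Marie Curie was born in Paris in 1867.",
]

warsaw = consistency_score("Curie was born in Warsaw", samples)
paris = consistency_score("Curie was born in Paris", samples)
print(f"Warsaw claim: {warsaw:.2f}")
print(f"Paris claim:  {paris:.2f}")
```

In this sketch a claim supported by most samples receives a higher score than one that appears in a single sample, which is the kind of signal a user could use to decide which statements to verify first. Token containment will miss paraphrases and cannot detect contradictions, so a practical system would need a stronger semantic comparison.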