ChainBuddy: An AI-assisted Agent System for Generating LLM Pipelines

Abstract

As large language models (LLMs) advance, their potential applications have grown significantly. However, it remains difficult to evaluate LLM behavior on user-defined tasks and craft effective pipelines to do so. Many users struggle with where to start, often referred to as the "blank page problem." ChainBuddy, an AI workflow generation assistant built into the ChainForge platform, aims to tackle this issue. From a single prompt or chat, ChainBuddy generates a starter evaluative LLM pipeline in ChainForge aligned to the user's requirements. ChainBuddy offers a straightforward and user-friendly way to plan and evaluate LLM behavior and make the process less daunting and more accessible across a wide range of possible tasks and use cases. We report a within-subjects user study comparing ChainBuddy to the baseline interface. We find that when using AI assistance, participants reported a less demanding workload, felt more confident, and produced higher quality pipelines evaluating LLM behavior. However, we also uncover a mismatch between subjective and objective ratings of performance: participants rated their successfulness similarly across conditions, while independent experts rated participant workflows significantly higher with AI assistance. Drawing connections to the Dunning–Kruger effect, we discuss implications for the future design of workflow generation assistants regarding the risk of over-reliance.

Authors
Jingyue Zhang
Université de Montréal, Montréal, Quebec, Canada
Ian Arawjo
Université de Montréal, Montréal, Quebec, Canada
DOI

10.1145/3706598.3714085

Paper URL

https://dl.acm.org/doi/10.1145/3706598.3714085

Conference: CHI 2025

The ACM CHI Conference on Human Factors in Computing Systems (https://chi2025.acm.org/)

Session: Delving into LLMs

G303
7 presentations
2025-04-29 20:10:00
2025-04-29 21:40:00