ChainBuddy: An AI-assisted Agent System for Generating LLM Pipelines

As large language models (LLMs) advance, their potential applications have grown significantly. However, it remains difficult to evaluate LLM behavior on user-defined tasks and to craft effective pipelines for doing so. Many users struggle with where to start, a difficulty often referred to as the "blank page problem." ChainBuddy, an AI workflow generation assistant built into the ChainForge platform, aims to tackle this issue. From a single prompt or chat, ChainBuddy generates a starter evaluative LLM pipeline in ChainForge aligned to the user's requirements. ChainBuddy offers a straightforward and user-friendly way to plan and evaluate LLM behavior, making the process less daunting and more accessible across a wide range of possible tasks and use cases. We report a within-subjects user study comparing ChainBuddy to the baseline interface. We find that when using AI assistance, participants reported a less demanding workload, felt more confident, and produced higher-quality pipelines evaluating LLM behavior. However, we also uncover a mismatch between subjective and objective ratings of performance: participants rated their own success similarly across conditions, while independent experts rated participant workflows significantly higher with AI assistance. Drawing connections to the Dunning–Kruger effect, we discuss implications for the future design of workflow generation assistants regarding the risk of over-reliance.

Université de Montréal, Montréal, Quebec, Canada

10.1145/3706598.3714085

https://dl.acm.org/doi/10.1145/3706598.3714085

The ACM CHI Conference on Human Factors in Computing Systems (https://chi2025.acm.org/)

G303

7 presentations

Start time: 2025-04-29 20:10:00

End time: 2025-04-29 21:40:00
