As multi-agent Large Language Models (LLMs) gain traction, designers must consider how to surface their internal reasoning in ways that foster appropriate trust. We present a design-led, qualitative, comparative structured observation study exploring how users interpret and evaluate transparency in multi-agent LLMs. Participants interacted with five interface variants, each instantiating a different combination of transparency-related design dimensions, across two task types: information-seeking and logical reasoning. We surface participants’ mental models, the cues they interpret as signals of transparency and trustworthiness, and how they weigh the costs and benefits of increased process visibility. Transparency needs were dynamic and context-sensitive, with the ideal "Goldilocks" (i.e., "just right") level of transparency shaped jointly by task demands, interface affordances, and user characteristics such as task expertise and dispositional trust in AI. We highlight tensions between process visibility, information sufficiency, and cognitive effort, and synthesise these insights into design considerations for aligning transparency with user needs in future multi-agent LLM interfaces.