LLM-based agents improve upon standalone LLMs, which are optimized for immediate intent satisfaction, by allowing the pursuit of more extended objectives, such as helping users over the long term. To do so, LLM-based agents need to reason before responding. For complex tasks such as personalized coaching, this reasoning can be informed by adding relevant information at key moments, shifting it in the desired direction. However, pursuing objectives beyond interaction quality may compromise that very quality. Moreover, as the depth and informativeness of reasoning increase, so does the number of tokens required, leading to higher latency and cost. This study investigates how an LLM-based coaching agent can adjust its reasoning depth using a discrepancy mechanism that signals how much reasoning effort to allocate based on how well the objective is being met. Our discrepancy-based mechanism constrains reasoning to better align with alternative objectives, reducing cost roughly tenfold while minimally impacting interaction quality.
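The abstract does not specify how the discrepancy mechanism is implemented; purely as an illustration, one way such a controller could work is sketched below in Python. The names `discrepancy`, `ReasoningBudget`, and the linear token allocation are assumptions for this sketch, not details taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class ReasoningBudget:
    """Bounds on the number of reasoning tokens the agent may spend per turn."""
    min_tokens: int = 64
    max_tokens: int = 1024


def discrepancy(objective_target: float, objective_estimate: float) -> float:
    """Gap between how well the long-term objective should be met and how well
    it currently appears to be met, clipped to the range [0, 1]."""
    return max(0.0, min(1.0, objective_target - objective_estimate))


def reasoning_tokens(d: float, budget: ReasoningBudget) -> int:
    """Allocate more reasoning tokens when the discrepancy is large, and fall
    back to a cheap, shallow pass when the objective appears on track."""
    return int(budget.min_tokens + d * (budget.max_tokens - budget.min_tokens))


# Example: the coaching objective is mostly on track, so reasoning stays shallow.
budget = ReasoningBudget()
d = discrepancy(objective_target=0.9, objective_estimate=0.8)
print(reasoning_tokens(d, budget))  # roughly 160 tokens instead of the full 1024
```

Under this reading, deep reasoning is reserved for turns where the agent's objective is visibly off track, which is consistent with the reported cost reduction, though the paper's actual signal and allocation rule may differ.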
https://dl.acm.org/doi/10.1145/3706598.3713606
The ACM CHI Conference on Human Factors in Computing Systems (https://chi2025.acm.org/)