Users become frustrated when they cannot tell when to speak while conversing with LLM-based agents. Technical delays disrupt the natural rhythm of conversational turn-taking, yet little is known about how these delays affect the back-and-forth flow of interaction. To address this gap, we analyzed human-agent conversations in social VR to measure timing differences. Using conversation analysis techniques, we tracked timing metrics such as response latencies (how long a speaker takes to respond) and repair attempts (how agents handle interruptions). We found that agents respond significantly more slowly than humans, with a median response latency of 4.1 seconds compared to 1.2 seconds for humans. We identified a "conversational timing drift": agents struggle with start-up latency, i.e., taking too long to start speaking, and with wind-down latency, i.e., failing to stop speaking quickly when a user interrupts them. This is the first study to empirically quantify human-agent conversational latencies within VR. We offer design suggestions to help future agents manage conversational timing better, making conversation more natural and improving user experience.
ACM CHI Conference on Human Factors in Computing Systems