Spatialized Audio and Hybrid Video Conferencing: Where Should Voices be Positioned for People in the Room and Remote Headset Users?


Hybrid video calls include attendees in a conference room with loudspeakers and remote attendees using headsets, each with different options for rendering sound spatially. Two studies explored the listener experience with spatial audio in video calls. One study examined the in-room experience using loudspeakers, comparing among spatialization algorithms spreading voices out horizontally. A second study compared varying degrees of horizontal separation of binaurally rendered voices for a remote participant using a headset. In-room participants preferred the widest spatialization over monophonic, stereo, and stereo-binary audio in metrics related to intelligibility and helpfulness. Remote participants preferred different widths of the audio stage depending on the number of voices. In both studies, rendering sound spatially increased performance in speech stream identification. Results indicate spatial audio benefits for in-room and remote attendees in video calls, although the in-room attendees accepted a wider audio stage than remote users.

Jeremy Hyrkas
Microsoft Research, Redmond, Washington, United States
Andrew D. Wilson
Microsoft Research, Redmond, Washington, United States
John Tang
Microsoft Research, Redmond, Washington, United States
Hannes Gamper
Microsoft Research, Redmond, Washington, United States
Hong Sodoma
Microsoft, Redmond, Washington, United States
Lev Tankelevitch
Microsoft Research, Cambridge, United Kingdom
Kori Inkpen
Microsoft, Redmond, Washington, United States
Shreya Chappidi
Microsoft Research, Redmond, Washington, United States
Brennan Jones
Microsoft Research, Redmond, Washington, United States


Conference: CHI 2023

The ACM CHI Conference on Human Factors in Computing Systems

Session: Video Sharing

Hall A
6 presentations
2023-04-25 18:00:00
2023-04-25 19:30:00