Speeding up Inference with User Simulators through Policy Modulation

Abstract

The simulation of user behavior with deep reinforcement learning agents has shown recent success. However, the inverse problem, that is, inferring the free parameters of the simulator from observed user behaviors, remains challenging. This is because optimizing a new action policy for the simulated agent, which is required whenever the model parameters change, is computationally impractical. In this study, we introduce a network modulation technique that obtains a generalized policy that immediately adapts to the given model parameters. We further demonstrate that the proposed technique improves the efficiency of user simulator-based inference by eliminating the need to obtain an action policy for novel model parameters. We validated our approach using the latest user simulator for point-and-click behavior. As a result, we succeeded in inferring the user's cognitive parameters and intrinsic reward settings with less than 1/1000 of the computational resources required by existing methods.
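To give a concrete sense of the idea, the sketch below shows one common way to condition a single policy network on simulator parameters: feature-wise scale-and-shift (FiLM-style) modulation of a hidden layer. This is an illustrative assumption, not the paper's exact architecture; all function and variable names here are hypothetical. The point it demonstrates is that, once trained, the same network produces parameter-dependent action preferences with no retraining, which is what removes the per-parameter policy optimization from the inference loop.

```python
import numpy as np

def mlp_layer(x, W, b):
    return np.tanh(x @ W + b)

def modulated_policy(state, sim_params, w):
    """Action logits for a state, conditioned on simulator parameters
    via feature-wise scale/shift (FiLM-style) modulation (a sketch)."""
    # Modulation branch: map the simulator's free parameters to
    # per-feature scales (gamma) and shifts (beta).
    h_p = mlp_layer(sim_params, w["Wp"], w["bp"])
    gamma = h_p @ w["Wg"] + w["bg"]
    beta = h_p @ w["Wb"] + w["bb"]

    # Policy trunk: an ordinary hidden layer over the observation,
    # then modulated feature-wise, so one trained network covers
    # the whole range of model parameters.
    h = mlp_layer(state, w["W1"], w["b1"])
    h = gamma * h + beta
    return h @ w["W2"] + w["b2"]

def init_weights(rng, state_dim, param_dim, hidden, n_actions):
    r = rng.standard_normal
    return {
        "Wp": r((param_dim, hidden)) * 0.1, "bp": np.zeros(hidden),
        "Wg": r((hidden, hidden)) * 0.1,    "bg": np.ones(hidden),
        "Wb": r((hidden, hidden)) * 0.1,    "bb": np.zeros(hidden),
        "W1": r((state_dim, hidden)) * 0.1, "b1": np.zeros(hidden),
        "W2": r((hidden, n_actions)) * 0.1, "b2": np.zeros(n_actions),
    }

rng = np.random.default_rng(0)
w = init_weights(rng, state_dim=4, param_dim=3, hidden=16, n_actions=2)
state = rng.standard_normal(4)

# Same state, two different simulator-parameter settings: the shared
# policy yields different action logits without any re-optimization.
logits_a = modulated_policy(state, np.array([0.1, 0.5, 0.9]), w)
logits_b = modulated_policy(state, np.array([0.9, 0.1, 0.2]), w)
```

During inference, a likelihood-free method can then propose candidate parameter values and evaluate the simulated behavior immediately, because querying `modulated_policy` with new parameters is a forward pass rather than a reinforcement-learning run.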

Authors
Hee-Seung Moon
Yonsei University, Incheon, Korea, Republic of
Seungwon Do
ETRI, Daejeon, Korea, Republic of
Wonjae Kim
NAVER AI Lab, Seongnam-si, Gyeonggi-do, Korea, Republic of
Jiwon Seo
Yonsei University, Incheon, Korea, Republic of
Minsuk Chang
NAVER AI Lab, Seongnam, Korea, Republic of
Byungjoo Lee
Yonsei University, Seoul, Korea, Republic of
Paper URL

https://dl.acm.org/doi/abs/10.1145/3491102.3502023

Video

Conference: CHI 2022

The ACM CHI Conference on Human Factors in Computing Systems (https://chi2022.acm.org/)

Session: Predictive Modelling and Simulating Users

5 presentations
2022-05-03 01:15:00
2022-05-03 02:30:00