Asymmetric actor-critic with approximate information state


Reinforcement learning (RL) for partially observable Markov decision processes (POMDPs) is challenging because decisions must be based on the entire history of observations and actions. In many scenarios, however, state information is available during the training phase. We are interested in exploiting this state information during training to efficiently learn a history-based policy using RL. Specifically, we consider actor-critic algorithms in which the actor uses only the history while the critic uses both history and state. Such algorithms are called asymmetric actor-critic, to highlight that the actor and critic have asymmetric information. Motivated by the recent success of representation losses in RL for POMDPs (Subramanian et al., 2022), we derive similar theoretical results for the asymmetric actor-critic case and evaluate the effectiveness of adding such auxiliary losses in experiments. In particular, we learn a history representation, called an approximate information state (AIS), and bound the performance loss when acting using the AIS.

13 pages
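To illustrate the asymmetry described in the abstract, the following is a minimal toy sketch (not the paper's actual algorithm): a tabular actor-critic on a hypothetical two-state POMDP where the critic is indexed by the true state, available during training, while the actor only conditions on a noisy observation (a degenerate one-step "history"; the paper instead learns a full history representation, the AIS). All names, the environment, and the hyperparameters are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

N_S, N_A, N_O = 2, 2, 2   # states, actions, observations
OBS_ACC = 0.9             # probability the observation equals the state

def step(s, a):
    r = 1.0 if a == s else 0.0       # reward for matching the true state
    s2 = int(rng.integers(N_S))      # i.i.d. transitions, for simplicity
    return r, s2

def observe(s):
    return s if rng.random() < OBS_ACC else 1 - s

# Asymmetric information: the critic is tabular over the TRUE state,
# while the actor only ever sees the observation.
Q = np.zeros((N_S, N_A))      # critic Q(s, a)
theta = np.zeros((N_O, N_A))  # actor logits, conditioned on o only

def policy(o):
    z = np.exp(theta[o] - theta[o].max())
    return z / z.sum()

alpha_c, alpha_a, gamma = 0.1, 0.05, 0.9
s = int(rng.integers(N_S))
for _ in range(30000):
    o = observe(s)
    p = policy(o)
    a = int(rng.choice(N_A, p=p))
    r, s2 = step(s, a)
    o2 = observe(s2)
    # Critic: TD(0) update using the true next state (training-time info).
    td = r + gamma * (policy(o2) @ Q[s2]) - Q[s, a]
    Q[s, a] += alpha_c * td
    # Actor: policy-gradient step with a state-based advantage.
    adv = Q[s, a] - p @ Q[s]
    grad = -p
    grad[a] += 1.0
    theta[o] += alpha_a * adv * grad
    s = s2

# With an informative observation, the actor should learn to match it.
```

The state-based critic gives lower-variance advantage estimates than a history-only critic could; the actor remains deployable without state access, which is the point of the asymmetric setup.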

G2357.pdf (1000 KB)