Recurrent Natural Policy Gradient for POMDPs
Semih Cayci – RWTH Aachen University, Germany
Hybrid seminar at McGill University or on Zoom.
In this talk, we introduce a natural policy gradient method that leverages recurrent neural networks (RNNs) to address the challenges in reinforcement learning for partially observable Markov decision processes (POMDPs) that stem from non-Markovian dynamics. Our method follows an actor-critic design, incorporating RNNs into multi-step temporal difference learning and into a natural policy gradient update to enable efficient learning in POMDPs.
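To make the ingredients named in the abstract concrete, the sketch below is a minimal, illustrative rendition in PyTorch and not the speaker's implementation: a GRU-based actor and critic over observation histories, a multi-step TD target for the critic, and a natural gradient step that preconditions the score-function gradient by a damped, rank-one empirical Fisher matrix. All names, dimensions, and hyperparameters (`RNNActor`, `HIDDEN`, `damping`, the learning rates) are assumptions for illustration; a practical method would estimate the Fisher from many samples and solve the linear system iteratively, e.g. with conjugate gradient, rather than forming it explicitly.

```python
# Illustrative sketch only (assumed architecture and hyperparameters),
# not the method analyzed in the talk.
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, HIDDEN = 4, 2, 8

class RNNActor(nn.Module):
    """Policy network: maps an observation history to action logits."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(OBS_DIM, HIDDEN, batch_first=True)
        self.head = nn.Linear(HIDDEN, ACT_DIM)

    def forward(self, obs_seq):            # obs_seq: (batch, time, OBS_DIM)
        h, _ = self.rnn(obs_seq)           # hidden state summarizes the history
        return self.head(h[:, -1])         # action logits at the last step

class RNNCritic(nn.Module):
    """Value network: maps an observation history to a scalar value."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(OBS_DIM, HIDDEN, batch_first=True)
        self.head = nn.Linear(HIDDEN, 1)

    def forward(self, obs_seq):
        h, _ = self.rnn(obs_seq)
        return self.head(h[:, -1]).squeeze(-1)

actor, critic = RNNActor(), RNNCritic()
critic_opt = torch.optim.SGD(critic.parameters(), lr=1e-2)

def multi_step_td_target(rewards, bootstrap_value, gamma=0.99):
    """n-step TD target: discounted reward sum plus a bootstrapped tail."""
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
    return g

def natural_pg_step(obs_seq, action, advantage, lr=0.1, damping=1e-2):
    """One natural PG step: precondition the score-function gradient by a
    damped rank-one empirical Fisher matrix (feasible only at toy scale)."""
    logp = torch.log_softmax(actor(obs_seq), dim=-1)[0, action]
    params = list(actor.parameters())
    score = torch.autograd.grad(logp, params)
    s = torch.cat([g.reshape(-1) for g in score])
    fisher = torch.outer(s, s) + damping * torch.eye(s.numel())
    direction = torch.linalg.solve(fisher, advantage * s)
    with torch.no_grad():                  # apply the preconditioned update
        i = 0
        for p in params:
            n = p.numel()
            p.add_(lr * direction[i:i + n].view_as(p))
            i += n

# Toy usage on random data (n = 3 reward steps):
hist_t = torch.randn(1, 5, OBS_DIM)        # observation history up to time t
hist_tn = torch.randn(1, 8, OBS_DIM)       # history up to time t + n
rewards = [0.1, 0.0, 1.0]
target = multi_step_td_target(rewards, critic(hist_tn).detach())
value = critic(hist_t)
critic_opt.zero_grad()
((value - target) ** 2).mean().backward()  # multi-step TD critic loss
critic_opt.step()
natural_pg_step(hist_t, action=0, advantage=(target - value).detach().item())
```

The RNN hidden state stands in for the belief state that exact POMDP methods would track: because single observations are not Markovian, both actor and critic condition on the full observation history through the recurrence.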
We present a rigorous theoretical analysis in the kernel regime, providing finite-time and finite-width guarantees for both critic learning and policy optimization. Our results include explicit bounds on the required network widths and on the sample complexity, highlighting the potential of RNNs to address challenges in reinforcement learning under partial observability. Additionally, we discuss the limitations of this approach in the presence of long-term dependencies, outlining critical challenges and open problems. This talk will provide insights into the interplay between memory, network architecture, and learning efficiency in POMDPs.
Bio: Semih Cayci is a tenure-track Assistant Professor in the Department of Mathematics at RWTH Aachen University, Germany. Previously, he was an NSF TRIPODS Postdoctoral Fellow at the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign. His research focuses on the theoretical and algorithmic foundations of reinforcement learning, deep learning theory, and optimization.

Location
CIM
Pavillon McConnell
McGill University
Montréal QC H3A 0E9
Canada