Decision Awareness in Reinforcement Learning
Pierre-Luc Bacon – Université de Montréal, Canada
Decision awareness is the learning principle according to which the components of a learning system ought to be optimized directly for the global performance criterion: producing optimal decisions. This end-to-end perspective has recently led to significant advances in model-based reinforcement learning by addressing the compounding errors that plague alternative approaches. In this talk, I will present some of our recent work on this topic: (1) learning control-oriented transition models by implicit differentiation, and (2) learning neural ordinary differential equations end-to-end for nonlinear trajectory optimization. Along the way, we will also discuss some of the computational challenges associated with those methods and our attempts at scaling up performance, specifically by using an efficient factorization of the Jacobians in forward-mode automatic differentiation and through novel constrained optimizers inspired by adversarial learning.
Biography: Pierre-Luc Bacon is an assistant professor at the University of Montreal in the Department of Computer Science and Operations Research. He is also a core member of Mila and IVADO and a Facebook CIFAR chair holder. He leads a research group of 15 students working on the challenge posed by the curse of the horizon in reinforcement learning and optimal control.