In recent years, deep learning and deep reinforcement learning have been remarkably successful, outperforming classical methods in complex applications such as natural language processing, image processing and Atari games. Despite this progress, important challenges remain to be addressed, including the reliability, scalability, explainability and robustness of AI algorithms, to name only a few.
To address some of these shortcomings, it is critical to start thinking about a new set of problems called deep planning, where the objective is to study deep structured models parametrically (irrespective of the underlying data sets). Deep planning is, however, a difficult problem in general, which is why it has not received much attention so far. To this end, we introduce a novel class of tractable deep planning algorithms for sequential decision making in large-scale decentralized multi-agent systems, called deep structured teams (DSTs).
In DSTs, agents collaborate to minimize a common cost function while being coupled in dynamics, cost function and information through a set of linear regressions of the states and actions of all agents. The salient feature of DSTs is that their computational complexity remains manageable as the number of agents grows. In this talk, we present theoretical results involving Markov chain models, linear quadratic regulators, Kalman filters, reinforcement learning, constrained optimization, global convergence of policy gradient methods, mean-field approximation, minimax optimization and risk-sensitive cost functions. In addition, we illustrate the efficacy of the theoretical results with a few toy examples, including (a) cyber-physical attacks in swarm robotics, (b) resource allocation in communication networks, (c) integration of renewable energy in power grids, and (d) control of pandemics. If time permits, we extend DSTs to non-cooperative games, where agents play a nonzero-sum game.
Subscribe to e-mail notifications for GERAD's efficient machine learning seminars.