When a transition probability matrix is represented as the product of two stochastic matrices, one can swap the factors of the multiplication to obtain another transition matrix that retains some fundamental characteristics of the original. Since the derived matrix can be much smaller than its precursor, this property can be exploited in the context of solving Markov decision processes (MDPs). I will describe how we can use this property in order to provide approximate solutions for MDPs much faster than by using classical methods. For example, an approximate policy iteration algorithm based on stochastic factorization has linear dependence on the number of states in the model. I will briefly also discuss learning algorithm based on this trick, and its relationship to other types of matrix factorization, which we are beginning to uncover. This is joint work with Andre M.S. Baretto and Joelle Pineau.
Bienvenue à tous.