Stability, Approximation, and Robustness of Optimal Policies in POMDPs
Yunus Emre Demirci – Queen's University, Canada

Hybrid seminar at McGill University or on Zoom.
In this talk I will focus on discounted and average cost criteria for partially observable Markov decision processes. I will begin with the average cost setting and describe conditions under which a stationary optimal policy exists, together with a robustness result showing that the optimal policy is stable with respect to errors in the prior distribution. The analysis relies on a set of assumptions ensuring that the nonlinear filter exhibits a contraction property on general state and observation spaces. Under these conditions, the vanishing discount approach yields a solution to the average cost optimality equation and provides explicit bounds that quantify the influence of initial distribution errors on long-run performance.
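For orientation, the average cost optimality equation on the belief space has the following standard schematic form (the notation is mine and the precise assumptions are part of the talk): z is a belief state, u an action, c the stage cost, eta the belief (filter) transition kernel, rho the optimal average cost, and h a relative value function.

```latex
% Schematic average cost optimality equation (ACOE) on the belief MDP.
% rho is the optimal long-run average cost and h a relative value function;
% a measurable selector attaining the minimum gives a stationary optimal policy.
\[
  \rho + h(z) \;=\; \min_{u \in \mathbb{U}}
    \left[ c(z,u) + \int_{\mathcal{Z}} h(z')\,\eta(\mathrm{d}z' \mid z,u) \right],
  \qquad z \in \mathcal{Z}.
\]
```

In the vanishing discount argument, the pair (rho, h) is obtained as a limit of suitably centered discounted value functions as the discount factor tends to one.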
I will then turn to approximation and learning. Finite window controllers achieve near-optimal average cost, and I will present refined error bounds that describe how the performance gap decreases as the window length increases. I will also discuss how Q-learning can be used to obtain near-optimal policies for both discounted and average cost criteria through either finite window updates or quantization of the belief process, which leads to implementable approximations of optimal stationary policies.
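To make the finite window idea concrete, here is a minimal, self-contained sketch (my own illustrative toy example, not code from the talk): tabular Q-learning is run with the last N action-observation pairs, plus the current observation, playing the role of the state. The toy POMDP, window length, and learning schedules below are all hypothetical placeholders.

```python
# Minimal illustrative sketch of finite-window Q-learning for a POMDP.
# Everything below (the toy model, window length N, learning and exploration
# schedules) is a hypothetical placeholder, not code or parameters from the talk.

import itertools
from collections import defaultdict

import numpy as np

rng = np.random.default_rng(0)

# A toy finite POMDP: 3 hidden states, 2 actions, 2 observations.
n_x, n_u, n_y = 3, 2, 2
T = rng.dirichlet(np.ones(n_x), size=(n_x, n_u))  # T[x, u] = next-state distribution
O = rng.dirichlet(np.ones(n_y), size=n_x)         # O[x]   = observation distribution
C = rng.uniform(0.0, 1.0, size=(n_x, n_u))        # stage costs c(x, u)

N = 2          # window length: last N (action, observation) pairs plus current obs
beta = 0.9     # discount factor (discounted-cost version of the sketch)
episodes, horizon = 200, 500

def step(x, u):
    """Sample the next hidden state, observation, and incurred cost."""
    x_next = rng.choice(n_x, p=T[x, u])
    y_next = rng.choice(n_y, p=O[x_next])
    return x_next, y_next, C[x, u]

Q = defaultdict(lambda: np.zeros(n_u))       # Q-values indexed by the finite window
visits = defaultdict(lambda: np.zeros(n_u))  # visit counts for step-size schedule

for ep in range(episodes):
    x = int(rng.integers(n_x))
    y = int(rng.choice(n_y, p=O[x]))
    window = (y,)                            # flat tuple of recent obs/actions
    eps = 1.0 / (1.0 + ep)                   # decaying epsilon-greedy exploration
    for _ in range(horizon):
        if rng.random() < eps:
            u = int(rng.integers(n_u))
        else:
            u = int(np.argmin(Q[window]))
        x, y, cost = step(x, u)
        # Slide the window: keep at most the last N (action, obs) pairs + new obs.
        new_window = (window + (u, int(y)))[-(2 * N + 1):]
        visits[window][u] += 1
        alpha = 1.0 / visits[window][u]      # step size ~ 1 / (visit count)
        target = cost + beta * np.min(Q[new_window])
        Q[window][u] += alpha * (target - Q[window][u])
        window = new_window

# The greedy policy on window states is an implementable finite-memory controller.
policy = {w: int(np.argmin(q)) for w, q in Q.items()}
print(f"learned Q-values on {len(Q)} window states; a few entries:")
for w, q in itertools.islice(Q.items(), 3):
    print(w, np.round(q, 3), "-> action", policy[w])
```

Here the window state stands in for the belief; the alternative mentioned above, quantizing the belief process itself, would instead index the Q-table by a quantized posterior.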
Finally, I will discuss robustness to model misspecification. For small perturbations in the transition or observation kernels, I will present explicit non-asymptotic bounds on the resulting error in the filter kernel and on the performance of the induced policies. In particular, the optimal policy computed under an incorrect model remains near-optimal for the true model, with an error that can be quantitatively bounded.
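As a schematic illustration of the type of statement meant here (the form below is generic; the exact metrics and constants are part of the talk), let (T, Q) denote the true transition and observation kernels, (T~, Q~) the misspecified ones, and gamma~* the policy that is optimal for the misspecified model:

```latex
% Generic shape of a non-asymptotic robustness bound (illustrative only):
% the performance loss of the mismatched policy under the true model is
% controlled by the size of the kernel perturbations, for some constant K.
\[
  J\big(\mu, \widetilde{\gamma}^{\,*}\big) - J^{*}(\mu)
  \;\le\; K \left( \big\| \mathcal{T} - \widetilde{\mathcal{T}} \big\|
                 + \big\| Q - \widetilde{Q} \big\| \right).
\]
```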
Biography: Yunus Emre Demirci is a PhD candidate in the Department of Mathematics and Statistics at Queen’s University, supervised by Serdar Yüksel. His research focuses on filter stability, robustness, and approximation of optimal policies for partially observable Markov decision processes under model uncertainty, and on ergodicity of nonlinear filtering in the uncontrolled setting. He previously completed an MSc in Mathematics, with earlier work on graph theory and Markov chains, and holds a BSc in Computer Engineering from Boğaziçi University.
Location
CIM
McConnell Building
Université McGill
Montréal QC H3A 0E9
Canada