Dounia Lakhmiri – Polytechnique Montréal, Canada
Hybrid seminar on Zoom and in the GERAD seminar room.
We consider the problem of training a deep neural network with non-smooth regularization to retrieve a sparse and efficient sub-structure. Our regularizer is only assumed to be lower semi-continuous and prox-bounded. We combine an adaptive quadratic regularization approach with proximal stochastic gradient principles to derive a new solver, called SR2. Our experiments on network instances trained on CIFAR-10 and CIFAR-100 with L1 and L0 regularization show that SR2 achieves higher sparsity than other proximal methods, such as ProxGEN and ProxSGD, while retaining satisfactory accuracy.
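To make the ingredients named in the abstract concrete, here is a minimal sketch of the proximal operators for L1 and L0 regularization, followed by one proximal gradient step with a quadratic regularization parameter. This is not the SR2 solver itself: the function names, the parameter `sigma` (assumed to act as an inverse step size), and the weight `lam` are illustrative assumptions, and the rule for adapting `sigma` between iterations is left out.

```python
import math


def prox_l1(v, threshold):
    """Soft-thresholding: proximal operator of threshold * ||x||_1."""
    out = []
    for vi in v:
        mag = abs(vi) - threshold
        # Shrink each entry toward zero; entries within the threshold become 0.
        out.append(math.copysign(mag, vi) if mag > 0.0 else 0.0)
    return out


def prox_l0(v, threshold):
    """Hard-thresholding: proximal operator of threshold * ||x||_0."""
    # Keeping v_i costs `threshold`; zeroing it costs v_i**2 / 2.
    cutoff = math.sqrt(2.0 * threshold)
    return [vi if abs(vi) > cutoff else 0.0 for vi in v]


def proximal_step(x, grad, sigma, lam, prox):
    """One proximal gradient step (illustrative, not the SR2 update).

    Minimizes grad^T s + (sigma / 2) * ||s||^2 + lam * h(x + s) over s,
    which amounts to applying the prox of (lam / sigma) * h to x - grad / sigma.
    """
    shifted = [xi - gi / sigma for xi, gi in zip(x, grad)]
    return prox(shifted, lam / sigma)


if __name__ == "__main__":
    weights = [2.0, 0.5]
    stochastic_grad = [1.0, 1.0]  # hypothetical mini-batch gradient
    print(proximal_step(weights, stochastic_grad, sigma=2.0, lam=1.0, prox=prox_l1))
```

In this sketch a larger `sigma` gives a shorter step and a smaller threshold, so fewer weights are set to zero in a single update. Both operators produce exact zeros, which is how proximal methods yield sparse weights.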
Campus de l'Université de Montréal
2920, chemin de la Tour
Montréal Québec H3T 1J4 Canada