Dounia Lakhmiri – Polytechnique Montréal, Canada
Hybrid seminar on Zoom and in the GERAD seminar room.
We consider the problem of training a deep neural network with non-smooth regularization to retrieve a sparse and efficient sub-structure. Our regularizer is only assumed to be lower semi-continuous and prox-bounded. We combine an adaptive quadratic regularization approach with proximal stochastic gradient principles to derive a new solver, called SR2. Our experiments on network instances trained on CIFAR-10 and CIFAR-100 with L1 and L0 regularization show that SR2 achieves higher sparsity than other proximal methods such as ProxGEN and ProxSGD with satisfactory accuracy.
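As a rough illustration of the proximal principle mentioned in the abstract, the sketch below shows the standard closed-form proximal operators for L1 (soft-thresholding) and L0 (hard-thresholding) regularization, and one generic proximal gradient step. It is not the SR2 algorithm: the step size here is fixed rather than adapted, and the function names (`prox_l1`, `prox_l0`, `prox_gradient_step`) and parameters (`step`, `lam`) are hypothetical, chosen only for this example.

```python
import math


def prox_l1(x, step, lam):
    """Soft-thresholding: prox of step * lam * ||.||_1, applied componentwise."""
    t = step * lam
    out = []
    for v in x:
        if v > t:
            out.append(v - t)
        elif v < -t:
            out.append(v + t)
        else:
            out.append(0.0)  # small entries are set exactly to zero
    return out


def prox_l0(x, step, lam):
    """Hard-thresholding: prox of step * lam * ||.||_0, applied componentwise."""
    # Keeping v costs step * lam; zeroing it costs 0.5 * v**2.
    threshold = math.sqrt(2.0 * step * lam)
    return [v if abs(v) > threshold else 0.0 for v in x]


def prox_gradient_step(w, grad, step, lam, prox):
    """One generic proximal gradient step with a fixed step size (assumption)."""
    shifted = [wi - step * gi for wi, gi in zip(w, grad)]
    return prox(shifted, step, lam)


# Illustrative weights and a (stochastic) gradient estimate, both made up.
weights = [1.0, -0.2, 0.05]
gradient = [0.0, 0.0, 0.0]
sparse_l1 = prox_gradient_step(weights, gradient, 0.1, 1.0, prox_l1)
sparse_l0 = prox_gradient_step(weights, gradient, 0.1, 1.0, prox_l0)
```

Both operators set small entries exactly to zero, which is how proximal methods produce sparse weights. The L0 operator leaves the surviving entries unchanged, while the L1 operator shrinks them toward zero.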
Campus de l'Université de Montréal
2920, chemin de la Tour
Montréal Québec H3T 1J4 Canada