Improving neural network optimizer convergence speed is a long-standing priority.
Recently, attention has turned to quasi-Newton optimization methods, which have fewer hyperparameters than first-order gradient methods and show improved convergence in deterministic optimization.
We introduce PLSR1, a limited-memory partitioned quasi-Newton optimizer designed to optimize a partially separable loss function, i.e., a sum of element loss functions, each of much smaller dimension than the total problem.
PLSR1 aggregates limited-memory quasi-Newton approximations of individual element loss Hessians to better approximate the overall loss Hessian.
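The aggregation idea can be sketched as follows: for a partially separable loss f(x) = Σᵢ fᵢ(Uᵢx), where Uᵢ selects element i's variables, the Hessian decomposes as Σᵢ Uᵢᵀ∇²fᵢUᵢ, so each small element Hessian can be approximated independently and summed back into the full matrix. A minimal NumPy sketch, using the standard SR1 update formula; the function names and the dense aggregation helper are illustrative assumptions, not the paper's actual (limited-memory) implementation:

```python
import numpy as np

def sr1_update(B, s, y, tol=1e-8):
    """One SR1 quasi-Newton update of an element Hessian approximation B,
    given a step s and gradient difference y for that element."""
    r = y - B @ s
    denom = r @ s
    # Standard SR1 safeguard: skip the update when the denominator is tiny.
    if abs(denom) < tol * np.linalg.norm(r) * np.linalg.norm(s):
        return B
    return B + np.outer(r, r) / denom

def aggregate_hessian(element_Bs, element_vars, n):
    """Assemble the full Hessian approximation B = sum_i U_i^T B_i U_i,
    where element_vars[i] lists the variable indices of element i.
    (Dense for illustration; a real partitioned method stores only the
    small element blocks.)"""
    B = np.zeros((n, n))
    for Bi, idx in zip(element_Bs, element_vars):
        B[np.ix_(idx, idx)] += Bi
    return B
```

After an update, each element approximation satisfies the secant equation Bᵢs = y on its own small subspace, which is what lets the aggregated matrix track the overall loss Hessian.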
To keep storage affordable, the element function dimensions must be small compared to the total dimension. We therefore adapt standard neural network architectures by incorporating separable layers, creating a partitioned architecture (PSNet).

The numerical results compare the performance of several optimizers training the same partially separable loss function on LeNet and PSNet architectures of similar size and effectiveness. The graphs show each optimizer's accuracy over epochs on both the MNIST and CIFAR10 datasets. PLSR1 and an adaptive Nesterov variant show training convergence comparable to Adam's and outperform LBFGS and SGD.
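One way to realize the separable layers mentioned above is a block-diagonal linear map, so that each output block (and hence each downstream element loss) depends only on a small group of parameters. This is an illustrative assumption about PSNet's structure, not its published definition:

```python
import numpy as np

class SeparableLayer:
    """Hypothetical block-diagonal linear layer: the input is split into
    disjoint blocks, each mapped by its own small weight matrix, keeping
    every element loss function low-dimensional."""
    def __init__(self, in_blocks, out_blocks, rng=None):
        rng = rng or np.random.default_rng(0)
        self.in_blocks = in_blocks
        self.weights = [rng.standard_normal((o, i)) * 0.1
                        for i, o in zip(in_blocks, out_blocks)]

    def forward(self, x):
        outs, start = [], 0
        for W, size in zip(self.weights, self.in_blocks):
            outs.append(W @ x[start:start + size])
            start += size
        return np.concatenate(outs)
```

Because the blocks are disjoint, perturbing a variable in one input block leaves every other output block unchanged, which is exactly the coupling structure a partitioned quasi-Newton method exploits.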
Published in September 2023, 8 pages
G2341.pdf (470 KB)