Pruning for efficient hardware implementations of deep neural networks

Boukli Hacene, Ghouthi; Gripon, Vincent; Arzel, Matthieu; Farrugia, Nicolas; Bengio, Yoshua

Convolutional Neural Networks (CNNs) are state-of-the-art in numerous computer vision tasks such as object classification and detection. However, the large number of parameters they contain leads to high computational complexity and strongly limits their usability on budget-constrained mobile devices. In this paper, we propose a combination of a pruning technique and a quantization scheme that reduces the complexity and memory footprint of the convolutional layers of CNNs by replacing the complex convolution operation with a low-cost multiplexer. We perform experiments on CIFAR10, CIFAR100, and SVHN and show that the proposed method achieves almost state-of-the-art accuracy while drastically reducing the computational and memory footprint. We also propose an efficient hardware architecture to accelerate inference, which works as a pipeline and accommodates multiple layers working at the same time. In contrast with most existing approaches, which use external memory or software-defined memory controllers, our work is based on algorithmic optimization and a full-hardware design.

Published April 2020, 7 pages

Document

G2023-EIW02.pdf (290 KB)

GERAD

G-2020-23-EIW02
