Groupe d’études et de recherche en analyse des décisions


Uncertainty transfer with knowledge distillation


Knowledge distillation trains a student network, usually of low capacity, to mimic the representation space and the performance of a pre-trained teacher network, which is often large, cumbersome, and of very high capacity. Starting from the observation that a student can learn about the teacher’s ability to provide predictions, we examine the idea of transferring uncertainty from the teacher to the student network. We show that, through distillation, the distilled network not only mimics the teacher’s performance but also captures the original network’s uncertainty behavior. We provide experiments on the MNIST dataset that validate our hypothesis.
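As a rough illustration of the idea, the sketch below computes a distillation loss between teacher and student outputs and compares their predictive uncertainty. The abstract does not specify the loss or the uncertainty measure used in the report, so everything here is an assumption: the loss is the commonly used KL divergence between temperature-softened softmax outputs, uncertainty is measured by predictive entropy, and the function names, temperature, and logits are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Assumption: this is one common distillation objective, not necessarily
    the one used in the report.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def predictive_entropy(logits):
    """Entropy (in nats) of the softmax prediction; higher means less certain.

    Assumption: entropy is used here as a simple stand-in for 'uncertainty'.
    """
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical logits for a single 3-class input (illustrative values only).
teacher_logits = [2.0, 1.0, 0.1]
matched_student = [2.0, 1.0, 0.1]        # reproduces the teacher exactly
overconfident_student = [8.0, 1.0, 0.1]  # same top class, sharper output
```

Under these assumptions, a student that reproduces the teacher's softened outputs incurs zero loss and has the same predictive entropy, whereas a student that predicts the same class with a sharper distribution is penalized even though its accuracy on this input is unchanged. That is one way to read the claim that distillation can carry over uncertainty behavior and not only performance.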

9 pages