Incremental learning is essential for effectively training adaptive neural networks on a continuous stream of new data, but it suffers from a major problem: catastrophic forgetting.
In this work, we systematically investigate how the width and depth of neural networks affect catastrophic forgetting in incremental learning. In our experiments, we train multilayer perceptrons (MLPs) and convolutional neural networks (CNNs) on the publicly available CIFAR-100 and pMNIST datasets, tracking classification accuracy across tasks as well as the average forgetting measured once training concludes. We explain the results by analyzing the orthogonality and density of task gradients, and additionally show how far the model parameters drift from the optimum of the first task.
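For reference, a common definition of average forgetting after training on $T$ tasks (a standard formulation from the continual learning literature, assumed here; the exact variant used in the experiments may differ) is
\[
F_T = \frac{1}{T-1} \sum_{j=1}^{T-1} \max_{l \in \{1, \dots, T-1\}} \left( a_{l,j} - a_{T,j} \right),
\]
where $a_{l,j}$ denotes the test accuracy on task $j$ after training on task $l$.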
We demonstrate that increasing width noticeably reduces forgetting in most incremental learning scenarios, though to varying degrees across scenarios. We also show that, depending on the scenario, increasing depth may have no effect or may even worsen forgetting.