Convolutional neural networks have demonstrated excellent performance on computer vision tasks. The central operation of these networks is a convolution with a small, fixed-size kernel. Because the kernel is small, the standard approach for increasing the receptive field is to combine adjacent pixels, which leaves the output resolution too low for many computer vision tasks. This problem is addressed by so-called dilation, which spreads the units of the convolution kernel over a wider area and thereby increases the receptive field. The size of the kernel, however, is set manually and does not change during learning, which can be a problem because we generally do not know its optimal value. To solve this problem, a method has recently been proposed in which the convolution kernel consists of displaced aggregation units (DAU). Each kernel has its own set of parameters and therefore its own receptive field size. In this thesis we address the question of whether it is possible to reduce the model's degrees of freedom without loss of accuracy. We propose three ways to reduce the degrees of freedom by sharing displacements across the inputs and outputs. We implement a forward and backward pass for these three versions, embed them in convolutional neural network architectures of different sizes, and evaluate them on the problem of classifying images into 10 classes. All versions have more than 50% fewer parameters than the original DAU layer. The experimental results show that the model with output-independent displacements has significantly lower computational complexity than the original DAU layer, while its classification accuracy is lower by less than 2%.
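The effect of sharing displacements on the parameter count can be illustrated with a rough sketch. The parameterization below is an assumption, not taken from this abstract: each displaced aggregation unit is assumed to carry one weight and a two-dimensional displacement, and the original layer is assumed to learn both for every input-output channel pair. The function name `dau_param_counts` and the variant names are hypothetical, and the layer sizes are arbitrary illustrative values rather than the configurations evaluated in the thesis.

```python
def dau_param_counts(in_ch, out_ch, units):
    """Count learnable parameters of one layer under an ASSUMED DAU parameterization.

    Assumption: every unit has one weight per (input, output) channel pair,
    plus a 2-D displacement whose sharing pattern depends on the variant.
    """
    weights = in_ch * out_ch * units  # weights are never shared in this sketch
    return {
        # separate displacement for every (input, output, unit) triple
        "original": weights + 2 * in_ch * out_ch * units,
        # displacements independent of the output channel
        "shared_over_outputs": weights + 2 * in_ch * units,
        # displacements independent of the input channel
        "shared_over_inputs": weights + 2 * out_ch * units,
        # one displacement per unit for the whole layer
        "shared_over_both": weights + 2 * units,
    }


# Illustrative layer sizes (hypothetical, not from the thesis).
counts = dau_param_counts(in_ch=64, out_ch=64, units=2)
for name, n in counts.items():
    saving = 1 - n / counts["original"]
    print(f"{name}: {n} parameters ({saving:.0%} fewer than original)")
```

Under these assumptions the weights account for one third of the original parameters, so any variant that shares displacements approaches a saving of about two thirds as the number of channels grows, which is consistent with the reduction of more than 50% reported above.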