We examine the problem of automatic motion detection in video sequences captured by surveillance systems. State-of-the-art methods use convolutional neural networks, but their main limitation is that they must be retrained before being applied to new sequences. In this thesis, we present a novel method based on the architecture of siamese convolutional neural networks. Using its siamese branches, our network semantically describes both the input image from the sequence and a model of the sequence's background; it then applies convolutional layers to detect relevant differences and produces the final probabilistic segmentation mask. This approach allows detection on different video sequences without retraining the network for each new sequence: only a reference background image is required, and the method updates the background image automatically during operation. We trained our network on the CDNET data set and compared our method with the other methods published on the CDNET website, where it ranked eighth among the 46 published methods. We also evaluated our method on the Wallflower and SGM-RGBD data sets, testing it under different circumstances and providing a qualitative analysis of its performance.
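The pipeline described above (shared description of frame and background, difference detection, mask generation, background update) can be sketched in miniature. The snippet below is a pure-Python toy, not the thesis's trained network: the learned siamese encoder is stood in for by a shared box-blur function, the convolutional comparison layers by a thresholded per-pixel feature difference, and the background refresh by a running average. All function names, the threshold, and the update rate are illustrative assumptions.

```python
# Toy sketch of the siamese change-detection idea. A single shared
# "encoder" is applied to both the current frame and the background
# model, their feature difference is thresholded into a motion mask,
# and the background image is refreshed with a running average.

def encode(img):
    """Shared encoder: 3x3 box blur as a stand-in for learned features."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[y + dy][x + dx]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                    if 0 <= y + dy < h and 0 <= x + dx < w]
            out[y][x] = sum(vals) / len(vals)
    return out

def motion_mask(frame, background, threshold=0.25):
    """Encode both images with the shared encoder, then threshold
    the per-pixel feature difference into a binary mask."""
    f, b = encode(frame), encode(background)
    return [[1 if abs(fv - bv) > threshold else 0
             for fv, bv in zip(frow, brow)]
            for frow, brow in zip(f, b)]

def update_background(background, frame, rate=0.05):
    """Running-average refresh of the reference background image."""
    return [[(1 - rate) * bv + rate * fv
             for fv, bv in zip(frow, brow)]
            for frow, brow in zip(frame, background)]

# Toy 5x5 grayscale scene: a bright 2x2 "object" enters a dark frame.
bg = [[0.0] * 5 for _ in range(5)]
frame = [row[:] for row in bg]
for y in (1, 2):
    for x in (1, 2):
        frame[y][x] = 1.0

mask = motion_mask(frame, bg)     # 1 where the object moved in
bg = update_background(bg, frame) # background slowly absorbs the scene
```

In the actual method both the encoder and the comparison layers are learned end-to-end, which is what lets the same weights transfer across sequences; this sketch only mirrors the data flow.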