In this thesis we address the problem of tracking an arbitrary object in a sequence of images. We propose a long-term tracker based on Siamese convolutional neural networks. For detection, we use a template with which we compute cross-correlation at every point of the search image to find the best-matching region. The template is initialized on the first frame by cropping the image to the target object and passing the crop through the convolutional neural network. After each localization, the tracker detects whether tracking has failed. We propose two online methods for updating the visual model: one updates the template, while the other fine-tunes the parameters of the network. We carry out two analyses in which we measure the long-term tracking performance of modifications of our tracker on the LTB35 dataset. The first analysis determines a good setting for generating region proposals. The purpose of the second analysis is to evaluate the proposed methods for updating the visual model. Without updating the visual model, our tracker achieves an F-measure of 0.34; with template updating, 0.22; with fine-tuning, 0.38; and with both methods combined, 0.20. Finally, we compare the performance of our tracker with the trackers submitted to the VOT-LT2018 challenge, placing 11th with fine-tuning and 12th without fine-tuning or template updating.
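As an illustration of the cross-correlation step described above, the following is a minimal sketch assuming a PyTorch-style implementation; the function name, tensor shapes, and the random features standing in for a backbone's output are illustrative assumptions, not the thesis implementation.

```python
import torch
import torch.nn.functional as F

def cross_correlate(template_feat: torch.Tensor, search_feat: torch.Tensor) -> torch.Tensor:
    """Slide template features over search-image features.

    template_feat: (C, h, w) features of the cropped target.
    search_feat:   (C, H, W) features of the larger search image.
    Returns a (H - h + 1, W - w + 1) response map whose maximum
    marks the best-matching location.
    """
    # conv2d with the template as the kernel evaluates the
    # cross-correlation at every spatial offset of the search features.
    response = F.conv2d(search_feat.unsqueeze(0),    # (1, C, H, W) input
                        template_feat.unsqueeze(0))  # (1, C, h, w) single-channel kernel
    return response.squeeze(0).squeeze(0)

# Example with random features in place of real backbone outputs (assumed shapes).
template_feat = torch.randn(256, 6, 6)
search_feat = torch.randn(256, 22, 22)
score_map = cross_correlate(template_feat, search_feat)
peak = torch.argmax(score_map)
row, col = divmod(peak.item(), score_map.shape[1])  # location of the best match
```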