Object tracking is one of the central problems in computer vision. In this thesis, we investigate visual tracking of a single object in the presence of distractors. Distractors are objects that look similar to the target and therefore increase tracking uncertainty. They can cause tracking failure, in which the tracker drifts to one of them instead of the target. The algorithm proposed in this thesis follows the tracking-by-detection paradigm: a detector, as part of the tracker, produces detections of the object and the distractors in each frame. Tracking is implemented as temporal clustering of these detections over several frames of a video sequence into object and distractor trajectories. At each frame, we extend the existing trajectories with the new detections to form a set of trajectory hypotheses, and we estimate the quality of each hypothesis and the cost of each pairwise interaction between hypotheses. These estimates are then used to select the hypotheses that best describe the motion of the object and the distractors, while discarding those that represent incorrect continuations of trajectories. We evaluate our method on the DiDi dataset and compare the results with other single-object trackers. The algorithm achieves an AUC of 0.252, roughly 64% lower than the best tracker on this dataset, DAM4SAM, which achieves 0.694. The gap to the best trackers is largely due to the time complexity of the method used to select the best hypotheses: to keep the clustering tractable, the number of generated hypotheses must be limited, or additional mechanisms must be implemented to remove less important hypotheses.
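The generate-score-select loop described above can be illustrated with a small Python sketch. It is not the thesis implementation. Detections are assumed to be 2-D points. The gating radius `gate`, the quality score `1 - dist / gate`, and the conflict rule (two hypotheses clash when they extend the same trajectory or claim the same detection) are all hypothetical choices made for illustration. Selection is posed as maximising the summed hypothesis quality minus the summed pairwise interaction costs, solved here by exhaustive search over all subsets. Its O(2^n · n^2) running time for n hypotheses shows why the number of hypotheses has to be limited or pruned.

```python
"""Hypothetical sketch of hypothesis generation and selection.

Assumptions (not taken from the thesis): point detections, a gating
radius, a distance-based quality score, and a fixed conflict penalty.
"""
from dataclasses import dataclass
from itertools import combinations, product
import math

Point = tuple[float, float]


@dataclass(frozen=True)
class Hypothesis:
    track_id: int       # index of the trajectory being extended
    detection_id: int   # index of the new detection appended to it
    quality: float      # hypothetical score, higher is better


def generate_hypotheses(tracks: list[list[Point]], detections: list[Point],
                        gate: float = 50.0) -> list[Hypothesis]:
    """Pair every trajectory with every new detection inside the gating radius."""
    hyps = []
    for (t, track), (d, det) in product(enumerate(tracks), enumerate(detections)):
        dist = math.dist(track[-1], det)
        if dist <= gate:
            # Hypothetical quality: 1.0 for zero displacement, 0.0 at the gate.
            hyps.append(Hypothesis(t, d, 1.0 - dist / gate))
    return hyps


def interaction_cost(a: Hypothesis, b: Hypothesis, penalty: float = 10.0) -> float:
    """Hypothetical pairwise cost: a large penalty for mutually exclusive hypotheses."""
    if a.track_id == b.track_id or a.detection_id == b.detection_id:
        return penalty
    return 0.0


def select_hypotheses(hyps: list[Hypothesis]) -> tuple[float, list[Hypothesis]]:
    """Exhaustively maximise sum(quality) - sum(pairwise cost) over all subsets.

    Enumerates 2^n subsets, each scored in O(n^2), so the cost grows
    exponentially with the number of hypotheses n.
    """
    best_score, best = 0.0, []
    for mask in range(1 << len(hyps)):
        chosen = [h for i, h in enumerate(hyps) if mask >> i & 1]
        score = sum(h.quality for h in chosen)
        score -= sum(interaction_cost(a, b) for a, b in combinations(chosen, 2))
        if score > best_score:
            best_score, best = score, chosen
    return best_score, best
```

As a usage example, take a single trajectory ending at (10, 0) and two nearby detections at (20, 0) and (25, 0), where one of them plays the role of a distractor. Both continuations pass the gate, but they extend the same trajectory, so the pairwise penalty rules out selecting both. The search keeps only the closer, higher-quality continuation. A practical system would likely replace the brute-force search with a dedicated optimiser or a pruning heuristic.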