This thesis addresses single-object tracking in videos in the presence of distractors. The main objective is to develop a method that uses graph neural networks and information about surrounding objects to improve the stability and accuracy of the tracker. The proposed method, NSG (Neural Solver with Grouping), is a four-stage pipeline: (1) initial tracking with a baseline tracker to obtain primary detections, (2) a hypothesis generator that produces additional candidate object locations, (3) a neural solver that associates detections into trajectories, and (4) trajectory grouping with a Kalman filter for position prediction and re-identification after track loss. The method was evaluated on the LaSOT and DiDi datasets using metrics from the VOT toolkit. The results show a 1.2 percentage point improvement in tracking quality over the baseline tracker. The method’s main limitations are its sensitivity to the quality of input detections and difficulties in highly dynamic scenarios. Future work could include integrating clustering algorithms into deep neural networks and optimizing the hypothesis generator. The proposed solution is a step towards more robust real-time tracking systems.