Image segmentation and object tracking are gaining traction in computer vision, with applications in industrial quality assurance, autonomous vehicles, and beyond. In this work we investigate how existing segmentation algorithms can be improved by augmenting the RGB images, in which an object is segmented or tracked, with monocular depth. The input to our tracker is a reference RGB image together with a segmentation mask of the tracked object. For the next image in the sequence we extract features and, based on their cosine similarity with the features of the reference image, compute a probability matrix of each pixel belonging to the object. We then predict a depth map for both images and use it to estimate a depth similarity matrix, comparing the depth inside the object in the reference image with the depth of the test image. Next, we crop a template around the object in the reference image, extract its features, and cross-correlate them with the features of the test image to find the maximum response, around which we place a 2D Gaussian a priori probability of the object's location. The segmentation probability matrix, the depth similarity matrix, and the a priori location probability are then merged by a shallow network called MergeNet, whose output is the segmentation mask of the object in the test image. Our segmentation depth tracker (SDT) is first evaluated on DAVIS2016, where it improves the mean Jaccard index by 26% over the basic untrained RGB segmentation model. We also evaluate the tracker on VOT2016.
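
To make the fusion step concrete, the sketch below computes the three per-pixel cues and merges them with a shallow network. It is a minimal reconstruction under assumed tensor shapes, not the paper's implementation: the helper names (cosine_similarity_map, depth_similarity_map, gaussian_prior), the choice of depth similarity measure, and the MergeNet layer sizes are all illustrative assumptions.

```python
# Minimal sketch of the described pipeline's fusion step (illustrative only).
# Assumes feature maps of shape (C, H, W) and depth maps/masks of shape (H, W).
import torch
import torch.nn.functional as F


def cosine_similarity_map(ref_feats, test_feats, ref_mask):
    """Per-pixel object probability from cosine similarity between test
    features and the reference features inside the object mask."""
    C, H, W = test_feats.shape
    ref = F.normalize(ref_feats[:, ref_mask.bool()], dim=0)   # (C, N) object pixels
    test = F.normalize(test_feats.reshape(C, -1), dim=0)      # (C, H*W)
    sim = ref.T @ test                                        # (N, H*W)
    return sim.max(dim=0).values.reshape(H, W).clamp(min=0)   # best match per pixel


def depth_similarity_map(ref_depth, test_depth, ref_mask):
    """Similarity of test-image depth to the object's depth in the reference
    image; a Gaussian around the object's mean depth is one simple choice."""
    obj_depth = ref_depth[ref_mask.bool()]
    mu, sigma = obj_depth.mean(), obj_depth.std() + 1e-6
    return torch.exp(-0.5 * ((test_depth - mu) / sigma) ** 2)


def gaussian_prior(peak_yx, shape, sigma=20.0):
    """2D Gaussian a priori location probability centred on the
    cross-correlation peak of the template response."""
    H, W = shape
    ys = torch.arange(H, dtype=torch.float32).view(-1, 1)
    xs = torch.arange(W, dtype=torch.float32).view(1, -1)
    d2 = (ys - peak_yx[0]) ** 2 + (xs - peak_yx[1]) ** 2
    return torch.exp(-d2 / (2 * sigma ** 2))


class MergeNet(torch.nn.Module):
    """Shallow net fusing the three per-pixel cues into a segmentation mask
    (layer sizes are an assumption for illustration)."""
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Conv2d(3, 16, 3, padding=1), torch.nn.ReLU(),
            torch.nn.Conv2d(16, 1, 3, padding=1),
        )

    def forward(self, seg_prob, depth_sim, loc_prior):
        x = torch.stack([seg_prob, depth_sim, loc_prior]).unsqueeze(0)  # (1, 3, H, W)
        return torch.sigmoid(self.net(x)).squeeze()                     # (H, W) mask
```

In this sketch the three cue maps share the test image's spatial resolution, so they can be stacked channel-wise and fused by MergeNet in a single forward pass.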