In this work, we studied the problem of automatic recognition of wildlife species. Camera traps are widely used for observing animals because they are affordable and non-invasive, but they produce large numbers of images that need to be labeled. Manual labeling is time-consuming and expensive, so we addressed this problem using weakly supervised learning. For this purpose, we used a dataset of images, each annotated with a single animal species. Due to the increasing availability of high-quality labeled image datasets from other domains, we focused on solutions that utilize the knowledge of pre-trained models. We evaluated the performance of three weakly supervised models that use the knowledge of a pre-trained model in different ways. We proposed an improvement to the WS-DETR model, in which we replaced the base architecture with RT-DETR and used a more complex multiple instance learning classifier. The proposed upgrade improved the mAP of the original WS-DETR model by 0.16 and achieved better results than the other two weakly supervised models. Finally, we compared the models with the existing DeepFaune solution and evaluated whether they are suitable for practical use. The DeepFaune model performs better than our solutions; however, due to its low recall, it is also not suitable for complete automation.