In this bachelor's thesis, we examine the task of Deepfake detection. Deepfakes are fake videos that are appearing online with increasing frequency. Because deep learning is used to create them, they have become convincing enough to deceive humans. These videos are often created to spread misinformation or to damage the reputations of celebrities. To detect them, we present two related video-based approaches, both built on the transformer architecture: the Video Vision Transformer (ViViT) and UniFormerV2. We trained models for both approaches on two datasets of fake videos, FaceForensics++ and Celeb-DF-v2, and additionally tested their performance on a test set of videos from the DFDC dataset. With these models, we achieved results comparable to state-of-the-art approaches in this field. In the thesis, we describe our methodology, the technologies used in the approaches, and selected implementation details. We also present detailed results of the trained models, our experiments, and a comparison of our results with several other approaches to Deepfake detection.