Details

Detection of Face Forgeries with Self-Supervised Learning
Todorov, Leon (Author), Peer, Peter (Mentor), Ivanovska Preskar, Marija (Co-mentor)

PDF - Presentation file (26.39 MB)
MD5: 9004643E71CC172D699A9303F04E91B0

Abstract
In biometric identity verification, the growing fidelity of AI-synthesized faces threatens the reliability of face recognition systems. Modern generative models can create extremely realistic facial forgeries that often evade detection by current methods. Among these, morphing attacks are especially dangerous: by digitally merging the faces of two or more individuals, they yield a single image that can fool face recognition systems into matching multiple identities, thereby facilitating identity fraud and other malicious exploits. Most Morphing Attack Detection (MAD) approaches use supervised learning on a fixed set of known morphing techniques. Such models often achieve high accuracy on morphs created by the same algorithms seen during training, but they tend to rely on method-specific artifacts and struggle to generalize to morphs from unseen techniques or under different data conditions. Unsupervised one-class methods avoid overfitting to specific attacks, but they often lack the sensitivity to detect the faint, distributed artifacts left by high-quality morphs. To overcome these challenges, this thesis introduces a new face forgery detection framework with improved robustness and generalization. The approach uses self-supervised training on synthetic forgery artifacts, which helps the detector learn more generic and resilient decision boundaries. At the heart of the model, a gating mechanism fuses two complementary information streams: high-level semantic features from a vision-language foundation model and fine-grained spatial features from a high-resolution convolutional network. The semantic branch is built on a CLIP vision-language backbone and fine-tuned via Low-Rank Adaptation (LoRA), which adapts its image-text embeddings to the forgery detection task and enables accurate discrimination between authentic and manipulated faces.
In parallel, a high-resolution convolutional branch (based on HRNet) preserves detailed spatial information and aggregates multi-scale features, allowing it to capture even very subtle artifacts. An auxiliary segmentation module provides pixel-level guidance to this branch by distinguishing genuine facial regions from likely manipulated ones, which regularizes training. By combining the CLIP branch’s global semantic context with the convolutional branch’s local artifact sensitivity, the model produces a well-balanced and highly discriminative representation for detecting face forgeries. The entire architecture is trained end-to-end with a composite loss that simultaneously enforces semantic alignment, segmentation consistency, and classification accuracy. Evaluated on diverse morphing benchmarks, the proposed method achieves state-of-the-art performance. It significantly outperforms both supervised and unsupervised baseline detectors, attaining an average Equal Error Rate (EER) of just 0.85%. The improvements are most pronounced on high-quality morphs generated by advanced GAN and diffusion models, highlighting the framework’s resilience against next-generation forgery techniques.
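The abstract names two mechanisms, LoRA fine-tuning of the CLIP backbone and a gate that fuses the semantic and spatial streams, without giving their formulas. The sketch below illustrates both on plain Python lists, assuming the standard LoRA update W + (alpha / r) · B A and an element-wise sigmoid gate g = sigmoid(W_g [s; c] + b_g) that mixes the two feature vectors. The exact gate form, the dimensions, and all names (`LoRALinear`, `gated_fusion`) are hypothetical illustrations, not the thesis's implementation.

```python
import math
import random


def matvec(m, v):
    """Multiply a matrix m (list of rows) by a vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def matmul(a, b):
    """Multiply matrices a (n x k) and b (k x m), both given as lists of rows."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


class LoRALinear:
    """Hypothetical LoRA layer: frozen weight W plus a trainable update (alpha / r) * B A.

    In a real model W would be a frozen CLIP projection and only A and B would be trained.
    """

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen pretrained weight, d_out x d_in
        # A: r x d_in with small random init; B: d_out x r with zero init,
        # so the adapted layer starts out identical to W.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def effective_weight(self):
        delta = matmul(self.b, self.a)  # rank-r update, d_out x d_in
        return [
            [w + self.scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(self.weight, delta)
        ]

    def __call__(self, x):
        return matvec(self.effective_weight(), x)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gated_fusion(semantic, spatial, gate_weight, gate_bias):
    """Element-wise gate g = sigmoid(W_g [s; c] + b_g); output g * s + (1 - g) * c.

    semantic and spatial must have the same length d; gate_weight is d x 2d.
    """
    concat = semantic + spatial  # list concatenation [s; c]
    gate = [sigmoid(z + b) for z, b in zip(matvec(gate_weight, concat), gate_bias)]
    return [g * s + (1.0 - g) * c for g, s, c in zip(gate, semantic, spatial)]


# Toy 2-D features; the values are arbitrary and not taken from the thesis.
clip_proj = LoRALinear([[1.0, 0.0], [0.0, 1.0]], rank=1, alpha=2.0)
semantic = clip_proj([1.0, 2.0])  # equals W x until B is trained
spatial = [3.0, 4.0]  # stand-in for pooled HRNet features
fused = gated_fusion(semantic, spatial, [[0.0] * 4 for _ in range(2)], [0.0, 0.0])
print(fused)  # → [2.0, 3.0]
```

With an all-zero gate every element of g is 0.5, so the output is simply the average of the two streams; training the gate lets the model lean on semantic or spatial evidence per feature. Zero-initializing B is the usual LoRA choice because it leaves the pretrained backbone's behavior unchanged at the start of fine-tuning.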
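The reported 0.85% is an Equal Error Rate: the operating point at which the rate of morphs accepted as bona fide (APCER in ISO/IEC 30107-3 terminology) equals the rate of bona fide images flagged as attacks (BPCER). A minimal threshold-sweep sketch follows. The score convention (higher means "more likely a morph"), the label encoding, and the function name are assumptions; the thesis's own evaluation may interpolate the ROC curve rather than sweep discrete thresholds.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER by sweeping thresholds over the observed scores.

    scores: higher means "more likely a morph"; labels: 1 = morph (attack), 0 = bona fide.
    A sample is flagged as an attack when score >= threshold.
    """
    attacks = [s for s, y in zip(scores, labels) if y == 1]
    bona_fide = [s for s, y in zip(scores, labels) if y == 0]
    if not attacks or not bona_fide:
        raise ValueError("need at least one attack and one bona fide sample")

    best_gap, best_eer = float("inf"), None
    # Include +inf so the sweep also covers "flag nothing" (APCER = 1, BPCER = 0).
    for t in sorted(set(scores)) + [float("inf")]:
        apcer = sum(s < t for s in attacks) / len(attacks)  # morphs that slip through
        bpcer = sum(s >= t for s in bona_fide) / len(bona_fide)  # bona fide wrongly flagged
        gap = abs(apcer - bpcer)
        if gap < best_gap:
            best_gap, best_eer = gap, (apcer + bpcer) / 2
    return best_eer


# Toy scores with one overlapping pair; not data from the thesis.
scores = [0.1, 0.6, 0.4, 0.9]
labels = [0, 0, 1, 1]
print(equal_error_rate(scores, labels))  # → 0.5
```

Multiplying the result by 100 gives the percentage form used in the abstract. Taking the mean of APCER and BPCER at the threshold where they are closest is a common discrete approximation; with many samples it converges to the crossing point of the two error curves.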

Language:English
Keywords:computer vision, attack detection, face image morphing attacks, face forgeries, deep learning, self-supervised learning
Work type:Master's thesis/paper
Typology:2.09 - Master's Thesis
Organization:FRI - Faculty of Computer and Information Science
Year:2025
PID:20.500.12556/RUL-171822
COBISS.SI-ID:248411395
Publication date in RUL:03.09.2025

Secondary language

Language:Slovenian
Title:Zaznavanje obraznih ponaredkov s samonadzorovanim učenjem
Abstract:
The increasing quality of AI-synthesized faces threatens the reliability of biometric identity verification systems. Modern generative models easily create extremely convincing forgeries that existing detection methods often miss. Among them, so-called face morphing attacks (face morphs) are particularly dangerous: by digitally merging the faces of two or more people into a single image, they can cause a face recognition system to wrongly attribute that image to multiple identities, opening the door to identity fraud and other malicious activities. Most existing approaches for detecting these attacks rely on supervised learning on a limited set of known face morphing techniques. Such models effectively detect attacks created with the same algorithms that were present in the training set, but they often rely on method-specific artifacts and therefore generalize poorly to samples from unseen techniques or other domains. Unsupervised (one-class) approaches avoid this overfitting, but they often lack sensitivity to the subtle, distributed artifacts characteristic of high-quality morphs. In response to these challenges, this thesis presents a framework for detecting such face forgeries. It uses self-supervised learning on synthetically generated artifacts. At the core of the proposed model is a fusion mechanism that combines two complementary information streams: high-level semantic features from a vision-language model and fine-grained spatial features from a high-resolution convolutional network. The semantic branch is based on the CLIP vision-language model; LoRA adaptation tailors its image-text embeddings to the forgery detection task and enables discrimination between authentic and manipulated faces. The HRNet-based high-resolution convolutional branch preserves detailed spatial information and aggregates multi-resolution features to capture subtle artifacts.
An auxiliary segmentation module guides learning at the pixel level: it separates authentic facial regions from likely manipulated ones and thereby regularizes training. Fusing the global context of the semantic branch with the local sensitivity of the convolutional branch yields a balanced, discriminative representation for face forgery detection. Training uses a composite loss function that simultaneously optimizes image-text alignment, segmentation consistency, and classification accuracy. On diverse evaluation datasets, the proposed approach surpasses state-of-the-art detectors, both supervised and unsupervised. It achieves an average EER of 0.85%, with improvements especially pronounced on high-quality attacks created with advanced GAN and diffusion models, confirming its robustness to next-generation techniques.

Keywords:computer vision, attack detection, face morphing attack images, face forgeries, deep learning, self-supervised learning
