Effective training of face recognition (FR) models requires large, meticulously labeled datasets of facial identities. Acquiring such datasets is time-consuming, and their use is often constrained by privacy concerns. For this reason, as part of this master's thesis, we introduce a new generative diffusion model called IDSync. This model extends Arc2Face and produces high-quality synthetic facial images. By incorporating an additional classification loss and a customized training procedure, IDSync preserves individuals' original identities while ensuring visually convincing outputs. We evaluate the synthetic face image sets generated by our model using multiple metrics: inter-class and intra-class cosine similarity of extracted features, the angular distribution of facial embeddings, the Fréchet distance between synthetic and real feature distributions, and other statistical measures. The primary indicator of dataset quality is the verification accuracy of an FR model trained on these synthetic images, measured on standard FR benchmarks. Our results show that data generated by IDSync are more suitable for training FR models than data produced by Arc2Face, as confirmed by the above metrics. Motivated by these promising findings, we further scale up IDSync training to a larger set of images and report the resulting performance improvements. The thesis also includes a sensitivity analysis of hyperparameters and an ablation study that provides deeper insight into the impact of the additional loss function.
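The inter-class and intra-class cosine similarity metrics mentioned above can be illustrated with a minimal sketch. This is not the thesis's actual evaluation code; it assumes embeddings are provided as a numpy array with one integer identity label per row, and the function name `cosine_similarity_stats` is hypothetical. A good identity-preserving generator should yield high intra-class similarity (images of the same identity cluster together) and low inter-class similarity (different identities stay well separated).

```python
import numpy as np

def cosine_similarity_stats(embeddings, labels):
    """Mean intra-class and inter-class cosine similarity over face embeddings.

    embeddings: (N, D) array of feature vectors (e.g. from an FR backbone)
    labels:     (N,) array of integer identity labels
    """
    # L2-normalize so that dot products equal cosine similarities
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = emb @ emb.T                              # (N, N) pairwise similarities
    same = labels[:, None] == labels[None, :]       # same-identity mask
    off_diag = ~np.eye(len(labels), dtype=bool)     # exclude self-similarity
    intra = sims[same & off_diag].mean()            # same identity, different images
    inter = sims[~same].mean()                      # different identities
    return intra, inter

# Toy example: two synthetic "identities" as tight clusters around random centers
rng = np.random.default_rng(0)
centers = rng.normal(size=(2, 128))
embs = np.vstack([c + 0.05 * rng.normal(size=(5, 128)) for c in centers])
labels = np.repeat(np.arange(2), 5)

intra, inter = cosine_similarity_stats(embs, labels)
# Expect intra close to 1 and inter near 0 for well-separated identity clusters
```

A large gap between the two values is the qualitative signature one looks for when comparing synthetic datasets from different generators.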