In this thesis, we explore the application of large language models to the pseudonymization of named entities in various types of texts containing sensitive information. We focus on generating suitable replacements that preserve the meaning and readability of the text while protecting personal data. We compare several open-source language models of different sizes and evaluate the quality of the generated replacements using the GLiNER model. Additionally, we attempt to improve the performance of two smaller models through supervised fine-tuning and in-context learning. The results show that some models can successfully generate replacements without additional fine-tuning, while the adapted smaller models represent a promising solution for use in resource-constrained environments.