Details

Prilagajanje in vrednotenje jezikovnih modelov pri zamenjavah imenskih entitet za anonimizacijo besedil (Fine-tuning and evaluating language models for named entity replacement in text anonymization)
Kokalj, Nina (Author); Žitnik, Slavko (Mentor); Erik, Novak (Co-mentor)

PDF: Presentation file (3.78 MB)
MD5: 7F96BA6ECDB2269D380A0E7B44E5A886

Abstract
In this thesis, we investigate the use of large language models for pseudonymizing named entities in various types of texts containing sensitive data. We focus on generating suitable replacements that preserve the meaning and readability of the text without revealing personal data. We compare several open-source language models of different sizes and evaluate them with the GLiNER model. We also attempt to improve the performance of two smaller models through supervised fine-tuning and in-context learning. The results show that some models already generate replacements successfully without additional fine-tuning, while the fine-tuned smaller models offer a promising solution for resource-constrained environments.

Language: Slovenian
Keywords: anonymization, named entity recognition, large language models, fine-tuning
Work type: Bachelor thesis/paper
Typology: 2.11 - Undergraduate Thesis
Organization: FRI - Faculty of Computer and Information Science
Year: 2025
PID: 20.500.12556/RUL-170955
COBISS.SI-ID: 243976195
Publication date in RUL: 23.07.2025
Views: 265
Downloads: 65

Secondary language

Language: English
Title: Fine-tuning and evaluating language models for named entity replacement in text anonymization
Abstract:
In this thesis, we explore the application of large language models for the pseudonymization of named entities in various types of texts containing sensitive information. We focus on generating suitable replacements that preserve the meaning and readability of the text while protecting personal data. We compare several open-source language models of different sizes and evaluate them using the GLiNER model. Additionally, we attempt to improve the performance of two smaller models through supervised fine-tuning and in-context learning. The results show that some models can successfully generate replacements without additional fine-tuning, while the adapted smaller models represent a promising solution for use in resource-constrained environments.
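The abstract does not describe how generated replacements are written back into the text. The following is a minimal sketch of span-based pseudonymization, assuming that entity spans have already been detected (for example by an NER model such as GLiNER) and that a replacement has already been chosen for each surface form (for example by a language model). The `Entity` class, the `pseudonymize` function, the `[LABEL]` fallback and the example names are hypothetical illustrations, not the thesis's implementation; spans are also assumed not to overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    start: int  # character offset where the entity begins
    end: int    # character offset one past the entity's last character
    label: str  # entity type, e.g. "person" or "city" (hypothetical labels)


def pseudonymize(text: str, entities: list[Entity], replacements: dict[str, str]) -> str:
    """Replace each (non-overlapping) entity span with its pseudonym.

    The same surface string always maps to the same replacement, so repeated
    mentions of one person stay consistent. Spans are applied right to left,
    so the offsets of entities further left remain valid after each edit.
    """
    result = text
    for ent in sorted(entities, key=lambda e: e.start, reverse=True):
        original = text[ent.start:ent.end]
        # Hypothetical fallback: a generic type placeholder when no pseudonym is known.
        substitute = replacements.get(original, f"[{ent.label.upper()}]")
        result = result[:ent.start] + substitute + result[ent.end:]
    return result


if __name__ == "__main__":
    text = "Ana Horvat lives in Maribor. Ana Horvat works at a bank."
    ents = [Entity(0, 10, "person"), Entity(20, 27, "city"), Entity(29, 39, "person")]
    print(pseudonymize(text, ents, {"Ana Horvat": "Maja Zupan", "Maribor": "Celje"}))
    # prints: Maja Zupan lives in Celje. Maja Zupan works at a bank.
```

Keying the mapping on the surface string keeps repeated mentions consistent, which is one simple way to preserve readability; a real system would likely also need to handle inflected forms, which matter in Slovenian.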

Keywords: anonymization, named entity recognition, large language models, fine-tuning
