
Enhancing machine translation of figurative language by linguistic knowledge injection
Pelko, Jaka (Author), Robnik Šikonja, Marko (Mentor)

PDF - Presentation file (916.40 KB)
MD5: 0D8CAD3FAEC42092ACEF0D2F1DA9AD58

Abstract
Despite advances in machine translation, understanding and translating figurative language remain a major challenge, especially for smaller languages with limited linguistic resources, such as Slovene. In this bachelor's thesis, we investigate the effect of injecting figurative knowledge into large language models with the aim of improving the quality of English-to-Slovene translations. Because existing bilingual resources are scarce, we built IdiomKB-SLO-EN, a parallel collection of nearly 4,000 pairs of figurative expressions. Using retrieval-augmented generation (RAG), we incorporated it into the Slovene large language model GaMS. We developed a hybrid mechanism for retrieving knowledge about the figurative expressions relevant to a given input text; it combines lexical and semantic search and passes the selected information to the model as additional context during translation. We evaluated the approach with a combination of quantitative and qualitative methods: standard machine translation metrics, manual assessment, and assessment with the help of a large language model. The results show that the injected knowledge contributes to more natural and fluent translations and reduces literal translation of figurative language.

Language:Slovenian
Keywords:machine translation, figurative language, large language models, knowledge injection, retrieval-augmented generation
Work type:Bachelor thesis/paper
Typology:2.11 - Undergraduate Thesis
Organization:FRI - Faculty of Computer and Information Science
Year:2026
PID:20.500.12556/RUL-179659
COBISS.SI-ID:270350595
Publication date in RUL:19.02.2026
Views:258
Downloads:42

Secondary language

Language:English
Title:Enhancing machine translation of figurative language by linguistic knowledge injection
Abstract:
Despite recent advances, understanding and translating figurative language remain major challenges for machine translation, particularly for low-resource languages such as Slovene. In this thesis, we explore the impact of injecting figurative knowledge into large language models to improve English-to-Slovene translation quality. Because existing bilingual resources are scarce, we constructed a parallel dataset, IdiomKB-SLO-EN, containing nearly 4,000 pairs of figurative expressions. Using a retrieval-augmented generation (RAG) approach, we integrated the dataset into GaMS, a Slovene large language model. We developed a hybrid retrieval mechanism that combines lexical and semantic search to retrieve knowledge about figurative expressions relevant to the input text and provides it as additional context during translation. We evaluated the approach with a combination of quantitative and qualitative methods: standard machine translation metrics, manual evaluation, and a large language model as a judge. The results show that the injected knowledge leads to more natural and fluent translations and reduces literal translations of figurative expressions.

Keywords:machine translation, figurative language, large language models, knowledge injection, retrieval-augmented generation
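The abstract says the retriever combines lexical and semantic search but does not state how the two signals are merged. The following is a minimal sketch of one plausible reading, not the thesis's implementation. It assumes reciprocal rank fusion (RRF, with the commonly used constant k = 60) to combine a word-overlap ranking with an embedding-similarity ranking, and then lists the retrieved idiom pairs in the translation prompt. `embed()` is a hypothetical placeholder (character trigrams) for a real sentence encoder. The idiom pairs are invented illustrations, not IdiomKB-SLO-EN records. The call to GaMS itself is omitted.

```python
import math
import re
from collections import Counter

# Illustrative (English idiom, Slovene equivalent) pairs. These are hypothetical
# examples chosen for the sketch, NOT entries copied from IdiomKB-SLO-EN.
IDIOM_KB = [
    ("break the ice", "prebiti led"),
    ("piece of cake", "mala malica"),
    ("kick the bucket", "iti rakom žvižgat"),
    ("spill the beans", "izdati skrivnost"),
]


def tokens(text):
    """Lowercased word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexical_score(query, idiom):
    """Lexical signal: share of the idiom's words that occur verbatim in the query."""
    query_words = set(tokens(query))
    idiom_words = tokens(idiom)
    return sum(w in query_words for w in idiom_words) / len(idiom_words)


def embed(text):
    """Hypothetical stand-in for a sentence-embedding model: character-trigram
    counts. Real semantic search would use a neural encoder here."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def ranks(scores):
    """Map each item index to its 1-based rank (highest score gets rank 1)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return {idx: pos for pos, idx in enumerate(order, start=1)}


def hybrid_retrieve(query, kb=IDIOM_KB, top_k=2, rrf_k=60):
    """Fuse the lexical and the (stand-in) semantic ranking with reciprocal rank fusion."""
    lexical = ranks([lexical_score(query, en) for en, _ in kb])
    query_vec = embed(query)
    semantic = ranks([cosine(query_vec, embed(en)) for en, _ in kb])
    # RRF: each list contributes 1 / (rrf_k + rank); higher fused score is better.
    fused = {i: 1 / (rrf_k + lexical[i]) + 1 / (rrf_k + semantic[i]) for i in range(len(kb))}
    best = sorted(fused, key=fused.get, reverse=True)[:top_k]
    return [kb[i] for i in best]


def build_prompt(source, hits):
    """Pass the retrieved idiom pairs to the translation model as extra context."""
    hints = "\n".join(f'- "{en}" -> "{sl}"' for en, sl in hits)
    return (
        "Translate the following English text into Slovene.\n"
        "Figurative expressions that may occur in the text and their Slovene equivalents:\n"
        f"{hints}\n\n"
        f"English: {source}\nSlovene:"
    )


if __name__ == "__main__":
    text = "The project was a piece of cake once we broke the ice with the client."
    print(build_prompt(text, hybrid_retrieve(text)))
```

For the sample sentence, the sketch retrieves "piece of cake" and "break the ice", even though the text uses the inflected "broke the ice". RRF is used here only because it needs ranks, not scores, so the lexical and semantic signals do not have to be on comparable scales.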
