Despite recent advances, understanding and translating figurative language remains a major challenge for machine translation, particularly for low-resource languages such as Slovene. In this thesis, we explore how injecting figurative knowledge into large language models affects the quality of translation from English to Slovene. Because existing bilingual resources are scarce, we constructed a parallel dataset, IdiomKB-SLO-EN, containing nearly 4,000 pairs of figurative expressions. Using a retrieval-augmented generation (RAG) approach, we integrated the dataset into the GaMS Slovene large language model. We developed a hybrid retrieval mechanism that combines lexical and semantic search to retrieve knowledge about figurative expressions relevant to the input text and provides it as additional context during translation. The proposed approach was evaluated using a combination of quantitative and qualitative methods, including standard machine translation metrics, manual evaluation, and a large language model acting as a judge. The results show that the injected knowledge leads to more natural and fluent translations and reduces the occurrence of literal translations of figurative expressions.