
Fine-tuning large language models for target-based summarization in less-resourced languages
Đuranović, Vuk (Author); Robnik Šikonja, Marko (Mentor)

PDF - Presentation file (838.71 KB)
MD5: 33B8E94F501458528C56FB1F483E5239

Abstract
State-of-the-art large language models perform strongly in text summarization, yet their effectiveness varies considerably across languages with limited training resources. This work addresses query-focused summarization in Slovene, a language with few labeled datasets and evaluation tools. We present QFS-Composer, a novel query-focused summarization (QFS) framework that integrates query decomposition, question generation (QG), question answering (QA), and abstractive summarization to improve the factual alignment of a summary with the user's intent. To enable high-quality supervision and evaluation, we develop Slovene QA and QG models based on the large language model (LLM) GaMS-9B-Instruct and adapt the reference-free summary evaluation metrics QAGS, QuestEval, and RQUGE to Slovene. Experimental results show that the QA-guided summarization pipeline yields improved consistency and relevance over baseline LLMs. This research establishes an extensible methodology for advancing QFS in less-resourced languages.
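The abstract describes QFS-Composer as a composition of four stages: query decomposition, question generation, question answering, and abstractive summarization. A minimal sketch of how such stages could be chained is shown below. It assumes each stage is a plain callable and that the summarizer receives the collected question-answer pairs as grounding context. In the thesis the stages are model-backed (for example, the QA and QG models built on GaMS-9B-Instruct); the names `QFSPipeline`, `toy_decompose`, `toy_generate_questions`, `toy_answer`, and `toy_summarize` are hypothetical rule-based stand-ins, not the author's implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

QAPair = Tuple[str, str]  # (question, answer)


@dataclass
class QFSPipeline:
    """Chains the four stages named in the abstract (hypothetical interfaces)."""
    decompose: Callable[[str], List[str]]                # query -> sub-queries
    generate_questions: Callable[[str, str], List[str]]  # (sub-query, document) -> questions
    answer: Callable[[str, str], str]                    # (question, document) -> answer
    summarize: Callable[[str, str, List[QAPair]], str]   # (query, document, QA pairs) -> summary

    def run(self, query: str, document: str) -> str:
        qa_pairs: List[QAPair] = []
        for sub_query in self.decompose(query):
            for question in self.generate_questions(sub_query, document):
                qa_pairs.append((question, self.answer(question, document)))
        # Assumption: the summarizer uses the QA pairs as grounding context.
        return self.summarize(query, document, qa_pairs)


# Toy rule-based stand-ins for the LLM-backed stages (illustration only).
def toy_decompose(query: str) -> List[str]:
    # Split a compound query into sub-queries on the word "and".
    return [part.strip() for part in query.split(" and ")]


def toy_generate_questions(sub_query: str, document: str) -> List[str]:
    # One templated question per sub-query.
    return [f"What does the text say about {sub_query}?"]


def toy_answer(question: str, document: str) -> str:
    # Return the first sentence that mentions the question's topic.
    topic = question.rsplit("about ", 1)[-1].rstrip("?")
    for sentence in document.split(". "):
        if topic in sentence:
            return sentence.rstrip(".")
    return ""


def toy_summarize(query: str, document: str, qa_pairs: List[QAPair]) -> str:
    # Join the non-empty answers, so the summary covers only query-relevant content.
    return ". ".join(answer for _, answer in qa_pairs if answer) + "."


pipeline = QFSPipeline(toy_decompose, toy_generate_questions, toy_answer, toy_summarize)
document = ("Ljubljana is the capital of Slovenia. The Sava river flows through the country. "
            "Slovene is the official language.")
summary = pipeline.run("capital and language", document)
print(summary)
```

Running it prints `Ljubljana is the capital of Slovenia. Slovene is the official language.`; the sentence about the Sava is dropped because no sub-query asks about it. Injecting each stage as a separate callable keeps the orchestration independent of whether a given stage is rule-based or backed by a fine-tuned model.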

Language:English
Keywords:summarization, large language models, less-resourced languages, question answering based evaluation, Slovene
Work type:Master's thesis/paper
Typology:2.09 - Master's Thesis
Organization:FRI - Faculty of Computer and Information Science
Year:2025
PID:20.500.12556/RUL-177208
COBISS.SI-ID:262745859
Publication date in RUL:17.12.2025

Secondary language

Language:Slovenian
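Title:Prilagajanje velikih jezikovnih modelov za ciljno povzemanje besedil v jezikih z manj viri (English translation: Adapting large language models for targeted text summarization in less-resourced languages)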
Abstract:
(English translation) Modern large language models show strong performance in text summarization, but their effectiveness varies considerably among languages with limited resources. This work addresses the challenge of query-focused summarization in Slovene, a language with limited availability of labeled training sets and evaluation tools. We present a new framework for query-focused summarization (QFS), QFS-Composer, which combines query decomposition, question generation (QG), question answering (QA), and abstractive summarization to increase the consistency of the summary with the summarization goal. To enable high-quality supervision and evaluation of training, we developed Slovene QA and QG models based on the large language model GaMS-9B-Instruct, and adapted the QAGS, QuestEval, and RQUGE metrics for evaluating summaries in Slovene. Experimental results show that the QA-guided summarization system provides improved consistency and relevance compared to baseline large language models. The research establishes an extensible methodology for improving QFS in less-resourced languages.

Keywords:text summarization, large language models, less-resourced languages, question-answering-based evaluation, Slovene
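The Slovenian abstract names QAGS, QuestEval, and RQUGE as the QA-based metrics adapted for reference-free evaluation. The core of a QAGS-style consistency score can be sketched as follows, assuming questions have already been generated from the summary and each question has been answered twice, once against the summary and once against the source document. Each answer pair is compared with SQuAD-style token-level F1, and the per-question scores are averaged. The function names and answer lists are hypothetical; the sketch does not reproduce the thesis' Slovene adaptation, QuestEval, or RQUGE.

```python
import string
from collections import Counter
from typing import List, Sequence

_PUNCT = set(string.punctuation)


def normalize(text: str) -> List[str]:
    """Lowercase, drop ASCII punctuation, and split on whitespace.

    SQuAD-style normalization also removes English articles; that step is
    omitted here because Slovene has no articles.
    """
    text = "".join(ch for ch in text.lower() if ch not in _PUNCT)
    return text.split()


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between two short answers."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Both empty (e.g. both unanswerable) counts as agreement.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def qags_style_score(answers_from_summary: Sequence[str],
                     answers_from_source: Sequence[str]) -> float:
    """Average answer agreement over all questions generated from the summary."""
    if len(answers_from_summary) != len(answers_from_source):
        raise ValueError("need exactly one source answer per summary answer")
    if not answers_from_summary:
        return 0.0
    scores = [token_f1(s, d) for s, d in zip(answers_from_summary, answers_from_source)]
    return sum(scores) / len(scores)


# Hypothetical answers to "What is the capital?" and "Which river flows through the country?"
summary_answers = ["Ljubljana", "Drava river"]
source_answers = ["Ljubljana", "Sava river"]
print(qags_style_score(summary_answers, source_answers))
```

Here the first answer matches exactly (F1 = 1.0) and the second shares only the token "river" (precision and recall 0.5, so F1 = 0.5), giving a score of 0.75. A summary that states facts the source does not support produces diverging answers and a lower score, which is why this family of metrics needs no reference summary.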

