Avtomatsko povzemanje daljših besedil v slovenščini
Colnar, Brin (Author), Robnik Šikonja, Marko (Mentor)

Abstract
As part of my bachelor's thesis, I developed a model that summarizes longer texts in Slovenian. To do so, I relied on existing pre-trained transformer neural networks such as mBART and Longformer. For training, I used KAS 2.0, a dataset of academic works and their abstracts. I evaluated the model both with existing summary evaluation metrics and manually. Qualitatively, the model returns a good summary containing the important information from the text for some texts (around 36%), while it is less successful for the majority (around 63%).

Language: Slovenian
Keywords: natural language processing, long text summarization, transformer architecture
Work type: Bachelor thesis/paper
Typology: 2.11 - Undergraduate Thesis
Organization: FRI - Faculty of Computer and Information Science
Year: 2022
PID: 20.500.12556/RUL-140420
COBISS.SI-ID: 123603203
Publication date in RUL: 14.09.2022

Secondary language

Language: English
Title: Automatic summarization of long texts in Slovene
Abstract:
I developed a model that summarizes long texts in Slovenian, building on existing pre-trained transformer-based neural networks such as mBART and Longformer. For training, I used the KAS 2.0 dataset of academic works and their abstracts. I evaluated the model using existing summary evaluation metrics and also manually. Qualitatively, for some texts (around 36%) the model returns a good summary containing the relevant information from the text, while for most texts (around 63%) it performs less well.

Keywords: natural language processing, long text summarization, transformer architecture
