
Prilagoditev velikih jezikovnih modelov za popravljanje slovničnih napak v slovenščini
Božič, Martin (Author), Robnik Šikonja, Marko (Mentor)

PDF - Presentation file (2.62 MB)
MD5: BC0A1B7E403D2F5CF1076E15E523750F

Abstract
Automatic correction of grammatical errors in Slovenian remains an unsolved problem. A solution would make written communication easier. In this master's thesis, we divide the problem into subproblems: correcting word spelling, detecting misspelled words, correcting word inflection, and correcting word order. We achieve the best results by fine-tuning the Slovenian SloT5 model. We use the best models to build a web application. We find that the most important factors in grammatical error correction are the choice of a suitable base language model and the construction of a high-quality training set. When building the training set, we try to capture as many high-quality, realistic grammatical errors as possible without changing or corrupting the original meaning of the text.

Language:Slovenian
Keywords:T5 model, transformers, BERT model, SloBERTa model, grammatical corrections, neural networks, machine learning
Work type:Master's thesis/paper
Typology:2.09 - Master's Thesis
Organization:FRI - Faculty of Computer and Information Science
Year:2023
PID:20.500.12556/RUL-150180
COBISS.SI-ID:168228099
Publication date in RUL:14.09.2023

Secondary language

Language:English
Title:Adaptation of large language models for grammar correction in Slovene
Abstract:
Machine correction of grammatical errors in Slovenian is still an unsolved problem. Solving it would make written communication easier. We divide the problem into subproblems: correcting word spelling, detecting misspelled words, correcting word inflection, and correcting word order. The best results are achieved by fine-tuning the Slovenian SloT5 model. We use the best models in a web application. We conclude that the most important factors in correcting grammatical errors are the choice of a suitable base language model and the construction of a high-quality training set. When building the training set, we try to capture as many realistic grammatical errors as possible without changing the meaning of the text.

Keywords:T5 model, transformers, BERT model, SloBERTa model, grammatical corrections, neural networks, machine learning
