Cross-lingual transfer of POS tagger into a low-resource language
Stojanoska, Sanja (Author); Robnik Šikonja, Marko (Mentor); Ljubešić, Nikola (Co-mentor)

PDF - Presentation file (374.58 KB)
MD5: 8B7A7E917C981E70501B88354CC9D1AE

Abstract
With the continuous growth of online textual content, machine learning is the only feasible approach for implementing advanced language processing systems. Although many natural language processing (NLP) applications exist, most of them are English-centric, and low-resource languages are left behind. We apply cross-lingual transfer from several languages to overcome this limitation. Part-of-speech (POS) tagging, a fundamental text processing task, is a prerequisite for a variety of NLP problems. To implement a POS tagger for Macedonian, a low-resource language, we use pretrained multilingual models together with annotated data in Serbian, Croatian, and Bulgarian. We show that multilingual models fine-tuned on a set of languages similar to the target language achieve good performance on the POS tagging task.
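The abstract gives no implementation details, so the sketch below only illustrates the multi-source transfer setup it describes, under stated assumptions: annotated sentences from the source languages (Serbian, Croatian, Bulgarian) share one tagset (assumed here to be Universal Dependencies UPOS), are pooled into a single training set, and the resulting tagger is scored on Macedonian by token-level accuracy. The function names `pool_sources`, `train_lookup_tagger`, and `token_accuracy`, and the toy sentences, are hypothetical. The lookup tagger is only a placeholder for fine-tuning a pretrained multilingual model, which is what the thesis actually uses.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple

# A sentence is a list of (word, tag) pairs; a shared UPOS tagset is an assumption.
Sentence = List[Tuple[str, str]]
Tagger = Callable[[List[str]], List[str]]


def pool_sources(corpora: Dict[str, List[Sentence]]) -> List[Tuple[str, Sentence]]:
    """Concatenate annotated corpora from several source languages into one
    training set, keeping each sentence's language code for inspection."""
    pooled: List[Tuple[str, Sentence]] = []
    for lang, sentences in corpora.items():
        pooled.extend((lang, sent) for sent in sentences)
    return pooled


def train_lookup_tagger(pooled: List[Tuple[str, Sentence]]) -> Tagger:
    """Hypothetical stand-in for fine-tuning a pretrained multilingual model:
    known words get their most frequent training tag, unknown words get the
    most frequent tag overall."""
    per_word: Dict[str, Counter] = defaultdict(Counter)
    overall: Counter = Counter()
    for _, sent in pooled:
        for word, tag in sent:
            per_word[word.lower()][tag] += 1
            overall[tag] += 1
    fallback = overall.most_common(1)[0][0]

    def tag(words: List[str]) -> List[str]:
        # Check membership first so the defaultdict is not grown by lookups.
        return [
            per_word[w.lower()].most_common(1)[0][0] if w.lower() in per_word else fallback
            for w in words
        ]

    return tag


def token_accuracy(tagger: Tagger, test: List[Sentence]) -> float:
    """Fraction of target-language tokens whose predicted tag equals the gold tag."""
    correct = total = 0
    for sent in test:
        words = [w for w, _ in sent]
        gold = [t for _, t in sent]
        pred = tagger(words)
        correct += sum(p == g for p, g in zip(pred, gold))
        total += len(gold)
    return correct / total if total else 0.0


# Toy data only (not from the thesis).
sources = {
    "sr": [[("Он", "PRON"), ("чита", "VERB"), ("књигу", "NOUN")]],
    "hr": [[("Ona", "PRON"), ("čita", "VERB"), ("knjigu", "NOUN")]],
    "bg": [
        [("Тя", "PRON"), ("чете", "VERB"), ("книга", "NOUN")],
        [("Книгата", "NOUN"), ("е", "AUX"), ("нова", "ADJ")],
    ],
}
mk_test = [[("Тој", "PRON"), ("чита", "VERB"), ("книга", "NOUN")]]

pooled = pool_sources(sources)
tagger = train_lookup_tagger(pooled)
acc = token_accuracy(tagger, mk_test)
print(tagger(["Тој", "чита", "книга"]), round(acc, 3))  # ['NOUN', 'VERB', 'NOUN'] 0.667
```

In this toy run the lookup tagger transfers only exact surface forms shared with Macedonian (чита from Serbian, книга from Bulgarian), so the unseen pronoun Тој falls back to NOUN. A fine-tuned pretrained multilingual model is expected to generalize beyond shared word forms, which is the motivation for the approach in the abstract.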

Language: English
Keywords: cross-lingual transfer, part-of-speech tagging, multilingual language model, low-resource language, Macedonian language
Work type: Master's thesis/paper
Typology: 2.09 - Master's Thesis
Organization: FRI - Faculty of Computer and Information Science
Year: 2021
PID: 20.500.12556/RUL-130311
COBISS.SI-ID: 77541891
Publication date in RUL: 13.09.2021

Secondary language

Language: Slovenian
Title: Medjezikovni prenos oblikoskladenjskega označevalnika v jezik z malo viri (Cross-lingual transfer of a part-of-speech tagger into a low-resource language)
Abstract:
Because of the continuous growth in the amount of online text, machine learning is the only feasible approach for advanced language processing. Although numerous natural language processing applications exist, most of them are English-centric, and low-resource languages are neglected. In this work, we use cross-lingual transfer from several languages into a low-resource language. Part-of-speech tagging is one of the fundamental text processing tasks and a prerequisite for various language tasks. To implement a part-of-speech tagger for Macedonian, which has few available resources, we use multilingual models and annotated data from Serbian, Croatian, and Bulgarian. We show that multilingual models fine-tuned on languages similar to the target language achieve good results for part-of-speech tagging in Macedonian.

Keywords: cross-lingual transfer, part-of-speech tagger, multilingual model, low-resource language, Macedonian language
