
Cross-lingual transfer of resources and models for question answering
Dodevska, Lodi (Author), Robnik Šikonja, Marko (Mentor)

PDF - Presentation file (959.65 KB)
MD5: A04CCF15F31AC01A1A4778B128889CF3

Abstract
Implementing natural language processing (NLP) techniques for low-resource languages is one of the biggest challenges in machine learning today. Most state-of-the-art work focuses on well-resourced languages such as English. However, most languages have scarce resources, which makes developing NLP models for them hard and, in some cases, almost impossible. We focus on implementing automatic question answering (QA) models for Macedonian. Since no QA datasets exist in Macedonian yet, we provide the first semi-automatic translation of the SuperGLUE benchmark. Using three question answering datasets from this benchmark (BoolQ, COPA, and MultiRC), we fine-tune and compare several transformer-based models. The results show that good automatic QA performance is achievable even in a low-resource language such as Macedonian. The translated benchmark and the fine-tuned models can serve as a baseline for further research.

Language:English
Keywords:question answering, cross-lingual transfer, information retrieval, deep learning, Macedonian language, transformer models
Work type:Master's thesis/paper
Typology:2.09 - Master's Thesis
Organization:FRI - Faculty of Computer and Information Science
Year:2022
PID:20.500.12556/RUL-142106
COBISS.SI-ID:128897795
Publication date in RUL:20.10.2022

Secondary language

Language:Slovenian
Title:Medjezikovni prenos virov in modelov za problem odgovarjanja na vprašanja
Abstract:
Implementing natural language processing (NLP) techniques for low-resource languages is one of the major challenges in machine learning. Most research focuses on well-resourced languages such as English. Because resources are limited for most languages, developing NLP models for them is difficult. In this master's thesis, we focus on implementing automatic question answering (QA) models for the Macedonian language. Since no datasets for this purpose exist in Macedonian yet, we produce the first semi-automatic translation of the SuperGLUE benchmark. Using three question answering datasets (BoolQ, COPA, and MultiRC), we fine-tune several models based on the transformer architecture. The results show that good QA results can be obtained even in a low-resource language such as Macedonian. The translated datasets and fine-tuned models provide a starting point for further research.

Keywords:question answering, cross-lingual transfer, information retrieval, deep learning, Macedonian language, transformer model
