Avtomatsko zaznavanje sarkazma v slovenskih besedilih različnih tematik
Kranjec, Matej (Author), Robnik Šikonja, Marko (Mentor)

PDF - Presentation file (1.86 MB)
MD5: 89A7BE2F80C18804EA542C0EF2413323

Abstract
Sarcasm is a linguistic phenomenon in which the meaning of a text expresses the opposite sentiment to its literal wording. Understanding sarcasm would improve sentiment classification in texts. In this master's thesis, we transferred methods for sarcasm detection in English to Slovene. We used two labelled English training datasets, News Headlines Dataset and SARC, machine-translated them first with a publicly available model, then reviewed the translations and obtained better ones with ChatGPT. We fine-tuned the pretrained transformer-based large language models SloBERTa, SloT5, mT5 and Llama 3 and compared their performance. We found that the models misclassify sarcasm in approximately 20% of cases. We used the best model, Llama 3, to analyze part of a news corpus. We divided the analyzed news articles by topic and found differences in the prevalence of sarcasm among them. The model's predictions were false positives in most cases, except in articles on politics and cryptocurrencies. Sarcasm was also most prevalent in these topics.

Language:Slovenian
Keywords:classification, large language models, sarcasm, topic modeling
Work type:Master's thesis/paper
Organization:FRI - Faculty of Computer and Information Science
Year:2024
PID:20.500.12556/RUL-165017
Publication date in RUL:21.11.2024

Secondary language

Language:English
Title:Automatic sarcasm detection in Slovene texts of different topics
Abstract:
Sarcasm is a linguistic phenomenon in which the intended meaning of a text conveys the opposite sentiment to its literal wording. Understanding sarcasm would improve sentiment classification in text. In this thesis, we transferred methods for sarcasm detection from English to Slovene. We used two labelled English datasets, News Headlines Dataset and SARC. We first translated them using a publicly available neural machine translation model and evaluated the translations, then obtained better ones using ChatGPT. We fine-tuned the pretrained large language models SloBERTa, SloT5, mT5 and Llama 3, and compared their performance. We found that the models misclassify sarcasm in approximately 20% of cases. We used the best-performing model, Llama 3, to analyze part of a Slovene news corpus. We split the analyzed articles by topic and observed differences in the prevalence of sarcasm among the topics. The model's predictions were false positives in most cases, except in articles about politics and cryptocurrencies. Sarcasm was also most prevalent in those topics.

Keywords:classification, large language models, sarcasm, topic modeling
