Sarcasm is a linguistic phenomenon in which the literal meaning of the words conveys the opposite sentiment to the one intended. Understanding sarcasm would improve sentiment classification of text. In this thesis, we transferred methods for sarcasm detection from English to Slovene. We used two labelled English datasets, the News Headlines Dataset and SARC. We first translated them using a publicly available neural machine translation model and evaluated the translations, and then obtained better translations using ChatGPT. We fine-tuned the pretrained large language models SloBERTa, SloT5, mT5 and Llama 3 and compared their performance. We found that the models misclassify sarcasm in approximately 20% of cases. We used the best-performing model, Llama 3, to analyze part of a Slovene news corpus. We split the considered articles by topic and observed differences in the prevalence of sarcasm among the topics. In most cases, the predictions of sarcasm were false positives, except in articles about politics and cryptocurrencies, the topics in which sarcasm was most prevalent.