Določanje sentimenta slovenskim spletnim komentarjem s pomočjo strojnega učenja (Determining the Sentiment of Slovene Web Comments Using Machine Learning)

KADUNC, KLEMEN

KADUNC, KLEMEN (Author); Robnik Šikonja, Marko (Mentor)

PDF: presentation file (3.62 MB)
MD5: C574DD44FA4FCDDD8A7A8491013BA59C
PID: 20.500.12556/rul/0a85bc6b-f67b-4577-877f-1d508be49077

Abstract

The goal of this thesis is to build a tool for sentiment analysis of text, specifically of user comments. We tested several machine learning methods and several text preprocessing methods, especially those intended for web texts. Multinomial naive Bayes proved to be the best classifier. To improve the classifier, we prepared a Slovene sentiment lexicon: a list of words and phrases with positive and negative connotations. As its basis, we took an English lexicon of sentiment words and manually translated it into Slovene. Sentiment analysis was performed on a manually annotated corpus of user comments extracted from some of the most visited Slovene news portals. The lexicon and the annotated corpus of user comments are our main contributions to sentiment analysis for the Slovene language.
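The abstract names multinomial naive Bayes as the best classifier and says a sentiment lexicon was used to improve it, but it does not state how the lexicon was combined with the model. The following is a minimal from-scratch sketch assuming one plausible scheme: every lexicon hit appends a pseudo-token (`__POS__` or `__NEG__`) to the comment's bag of words, so lexicon evidence enters the same word counts the classifier uses. The lexicon words, the `pos`/`neg` labels, the whitespace tokenizer, the helper names and the four training comments are hypothetical toy values, not the thesis data or code.

```python
import math
from collections import Counter

# Hypothetical toy lexicon. The thesis lexicon was a manual Slovene translation
# of an English sentiment lexicon; these few words only illustrate the idea.
POSITIVE = {"dobro", "odlično", "super"}
NEGATIVE = {"slabo", "grozno", "zanič"}


def tokenize(text):
    """Lowercase whitespace tokenization (a simplifying assumption)."""
    return text.lower().split()


def add_lexicon_features(tokens):
    """Append a __POS__ or __NEG__ pseudo-token for every lexicon hit (assumed scheme)."""
    extra = []
    for tok in tokens:
        if tok in POSITIVE:
            extra.append("__POS__")
        elif tok in NEGATIVE:
            extra.append("__NEG__")
    return tokens + extra


class MultinomialNB:
    """Multinomial naive Bayes over token counts with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = set()
        self.word_counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        doc_counts = Counter(labels)
        self.log_prior = {c: math.log(doc_counts[c] / len(labels)) for c in self.classes}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_score(self, tokens, c):
        """log P(c) plus the sum of log P(token | c); tokens unseen in training are skipped."""
        denom = self.totals[c] + self.alpha * len(self.vocab)
        score = self.log_prior[c]
        for tok in tokens:
            if tok in self.vocab:
                score += math.log((self.word_counts[c][tok] + self.alpha) / denom)
        return score

    def predict(self, tokens):
        return max(self.classes, key=lambda c: self.log_score(tokens, c))


# Hypothetical training comments, not taken from the thesis corpus.
train = [
    ("to je odlično in super", "pos"),
    ("zelo dobro narejeno", "pos"),
    ("grozno in slabo", "neg"),
    ("popolnoma zanič članek", "neg"),
]
docs = [add_lexicon_features(tokenize(text)) for text, _ in train]
labels = [label for _, label in train]
model = MultinomialNB().fit(docs, labels)

pred_a = model.predict(add_lexicon_features(tokenize("res dobro")))
pred_b = model.predict(add_lexicon_features(tokenize("to je grozno")))
print(pred_a, pred_b)  # pos neg
```

Because the pseudo-tokens are counted like ordinary words, a comment containing a single lexicon word such as `dobro` also benefits from the `__POS__` counts contributed by every other positive lexicon word seen in training. For real data, a library implementation such as scikit-learn's `MultinomialNB` would replace the hand-written class.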

Language:	Slovenian
Keywords:	sentiment analysis, machine learning, opinion mining, natural language processing, classification, text annotation, sentiment lexicon, Slovene language, text preprocessing, user-generated content
Work type:	Undergraduate thesis
Organization:	FRI - Faculty of Computer and Information Science
Year:	2016
PID:	20.500.12556/RUL-91182
Publication date in RUL:	24.03.2017

Secondary language

Language:	English
Title:	Using machine learning for sentiment analysis of Slovene web commentaries
Abstract:
The purpose of this work is to develop a tool for sentiment analysis of user comments. Several machine learning classifiers were tested, and multinomial naive Bayes turned out to be the best predictor. We tried several preprocessing techniques, especially those designed for web texts. The classifier was improved with a Slovene sentiment lexicon, a list of words and set phrases with positive or negative connotations, which was built by manually translating an English sentiment lexicon into Slovene. The analysed corpus of user comments was manually annotated by three annotators; its entries were selected from some of the most visited Slovene news portals. The lexicon and the annotated corpus of user comments are the main contributions of this work.
Keywords:	sentiment analysis, machine learning, opinion mining, natural language processing, classification, annotating text, opinion lexicon, Slovenian language, text preprocessing, user generated content
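The abstract mentions preprocessing techniques for web texts but does not list them. As an illustration only, the sketch below applies a few normalizations commonly used for user comments: replacing URLs with a placeholder token, mapping a handful of emoticons to sentiment tokens, lowercasing, shortening letters repeated three or more times, and stripping punctuation. Whether the thesis used any of these steps is an assumption; the function name `preprocess`, the emoticon list and the token names are hypothetical.

```python
import re

# Hypothetical emoticon-to-token map; real comment data would need a larger list.
EMOTICONS = {
    ":-)": " __smile__ ",
    ":)": " __smile__ ",
    ":D": " __smile__ ",
    ":-(": " __sad__ ",
    ":(": " __sad__ ",
}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# [^\W\d_] matches letters only, so numbers such as "1000" are left alone.
REPEAT_RE = re.compile(r"([^\W\d_])\1{2,}")
# In Python 3, \w is Unicode-aware, so Slovene letters č, š, ž are kept.
PUNCT_RE = re.compile(r"[^\w\s]")


def preprocess(comment):
    """Normalize one web comment into a list of tokens (illustrative steps only)."""
    text = URL_RE.sub(" __url__ ", comment)  # URLs first, before emoticon matching
    for emoticon, token in EMOTICONS.items():
        text = text.replace(emoticon, token)
    text = text.lower()
    # Shorten 3+ repeated letters to two: "suuuper" -> "suuper".
    text = REPEAT_RE.sub(r"\1\1", text)
    text = PUNCT_RE.sub(" ", text)
    return text.split()


print(preprocess("Suuuper članek!!! :) https://example.com"))
# ['suuper', 'članek', '__smile__', '__url__']
```

Collapsing repeats to two letters rather than one keeps an elongated form such as `suuper` distinct from the plain word `super`, so a classifier can learn it as a separate, possibly more intense, cue; collapsing to one letter would merge the two.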
