The purpose of this work is to develop a tool for sentiment analysis of user comments. Several machine learning classifiers were tested, and multinomial naive Bayes proved to be the best predictor. We tried several preprocessing techniques, especially those designed for web texts. The classifier was improved with a Slovene sentiment lexicon, a list of words and set phrases with a positive or a negative connotation. The lexicon was obtained by manually translating an English sentiment lexicon into Slovene. The analysed corpus of user comments was selected from some of the most visited Slovene news portals and manually annotated by three annotators. The lexicon and the annotated corpus of user comments are the main contributions of this work.