The goal of this thesis was to build a sentiment analysis system that can tag exuberant reviews in the Google Play store. We first gave an overview of the sentiment analysis field and analyzed the input comments to better understand our problem domain. We then described the theoretical foundations of every method used to build the system. Input reviews were first split into tokens, which were then normalized, marked for negation, and combined into n-grams. We then applied stemming, spelling correction, and part-of-speech tagging, and added further attributes, to generate eight different feature collections. From every collection we selected the best features with the χ² method. To classify the reviews we used naive Bayes, logistic regression, and support vector machines. We evaluated the classifiers with internal cross-validation, computing classification accuracy, precision, recall, and F1 score, and compared them with statistical tests. Finally, we tagged reviews from our problem domain with existing sentiment analysis solutions and compared the results with our own. The results revealed statistically significant differences between the classifiers, between some of the feature collections, and between the existing solutions and some of our models.
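The first preprocessing steps (tokenization, normalization, negation and n-gram construction) can be illustrated with a minimal sketch. It is not the implementation used in the thesis. It assumes lowercasing as the only normalization, a small hand-picked negation word list, and the common convention of prefixing tokens with `NOT_` until the next punctuation mark. The names `tokenize`, `mark_negation`, `ngrams`, `NEGATIONS` and `CLAUSE_END` are hypothetical.

```python
import re

# Assumed negation cues and clause delimiters (illustrative, not from the thesis).
NEGATIONS = {"not", "no", "never", "cannot"}
CLAUSE_END = {".", ",", ";", "!", "?"}


def tokenize(text):
    """Lowercase the text and split it into words and punctuation marks."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?|[.,;!?]", text.lower())


def mark_negation(tokens):
    """Prefix tokens that follow a negation cue with NOT_ until the clause ends.

    Punctuation ends the negation scope and is dropped from the output.
    """
    marked = []
    negating = False
    for tok in tokens:
        if tok in CLAUSE_END:
            negating = False
            continue
        if tok in NEGATIONS or tok.endswith("n't"):
            negating = True
            marked.append(tok)
            continue
        marked.append("NOT_" + tok if negating else tok)
    return marked


def ngrams(tokens, n):
    """Return all contiguous n-grams of the token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


review = "I don't like this app, but it works."
unigrams = mark_negation(tokenize(review))
bigrams = ngrams(unigrams, 2)
```

Under these assumptions, "like", "this" and "app" are marked as negated, while "but it works" is left unchanged because the comma ends the negation scope. The resulting unigrams and bigrams would then serve as candidate features for selection and classification.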