Every day, we grade and comment on many of the things around us. This thesis aims to show how we can predict the ratings that users give on Amazon, currently the largest online retailer. For that purpose, we build models in three phases, each drawing on a different but closely connected field of data analysis.
At the beginning, we give a short overview of each field and a basic mathematical description of the models and estimators we use. Using those models, we show the big picture of which attributes of a review give us the most information about the user's numerical score.
Each of the three approaches extracts various attributes, such as the time stamp, the helpfulness, the words in the comments, or the interaction between users, to name a few. Furthermore, we explore those features by comparing different combinations of them at each of the three steps. Then, we evaluate how well they predict the numerical score of each review.
At the end, we conclude with some of the advantages and disadvantages of the built models, and with possibilities for future improvements and further work.