This thesis describes a prototype of a recommender system for a Slovenian e-library. The data comprise a large number of books in the ePub format, an anonymized list of users, and a set of transactions between users and books. The book content was tokenized, POS-tagged, and lemmatized. Data extraction methods were used to derive numerical stylometric features of the books, and these features were evaluated with the SPEC and Laplacian scores. We reduced the dimensionality of the feature vectors of both books and users and clustered them. To classify the books, we used the method proposed by Elkan and Noto as well as a one-class SVM. We constructed several variants of recommender systems based on content-based and collaborative filtering. The evaluation results show that a recommender system using only stylometric features is feasible; however, collaborative filtering offers better overall performance.