This master's thesis describes an application of machine learning methods to a data model built from the most frequent sequences of events occurring in the log data of an identity management system, combined with log data from the DHCP server and the access control system of a medium-sized Slovenian enterprise. The goal was to predict employees’ performance on the behavioral competence of enthusiasm. Features were represented as the number of occurrences of each frequent sequence pattern for each user, in multinomial and TF-IDF formats. The sequences with the most discriminatory power with respect to the class label were extracted with the χ² test and with the χ² test with Bonferroni correction on all data.
The first and second chapters present the concept of digital identity, identity management systems, and the main motivations for predicting users’ behavior with machine learning methods based on identity management systems.
The third chapter presents the concepts of machine learning, the training of classification models, the feature generation process, and metrics for evaluating the quality of classification models. The chapter also presents use cases of machine learning methods applied to the analysis of log data for intrusion detection in information systems, and applications of machine learning methods in the field of human resource management.
The fourth chapter describes the process of data collection, data preparation, and data model building, as well as the tools used in the thesis.
The fifth chapter describes the procedures for predicting users’ performance with machine learning methods and comments on the results.
The conclusion discusses the reasons for the poor classification results and proposes other applications of machine learning methods to the analysis of identity management systems.
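The feature representation and pattern selection summarized in the first paragraph can be illustrated with a minimal sketch. It is not the implementation used in the thesis: it assumes a binary class label, a plain TF-IDF weighting (term frequency times the logarithm of inverse user frequency), and a χ² test on a 2×2 table of pattern presence against class label. All function names and the toy data are hypothetical and chosen only for illustration.

```python
import math

def tf_idf(counts):
    """Turn per-user pattern counts into TF-IDF weights (assumed plain variant)."""
    n_users = len(counts)
    n_patterns = len(counts[0])
    # User frequency: number of users whose logs contain each pattern.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_patterns)]
    idf = [math.log(n_users / d) if d else 0.0 for d in df]
    weighted = []
    for row in counts:
        total = sum(row)
        weighted.append(
            [(c / total) * idf[j] if total else 0.0 for j, c in enumerate(row)]
        )
    return weighted

def chi2_presence(counts, labels, j):
    """Chi-square statistic and p-value (1 degree of freedom) for
    presence of pattern j versus a binary class label."""
    a = b = c = d = 0  # present&1, present&0, absent&1, absent&0
    for row, y in zip(counts, labels):
        present = row[j] > 0
        if present and y:
            a += 1
        elif present:
            b += 1
        elif y:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:  # pattern present for all users or for none: no signal
        return 0.0, 1.0
    stat = n * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

def select_patterns(counts, labels, alpha=0.05, bonferroni=True):
    """Indices of patterns whose p-value passes the (optionally corrected) threshold."""
    m = len(counts[0])
    threshold = alpha / m if bonferroni else alpha  # Bonferroni: divide by number of tests
    return [j for j in range(m) if chi2_presence(counts, labels, j)[1] < threshold]

# Toy data (invented): rows are users, columns are frequent sequence patterns.
toy_counts = [
    [3, 1, 2], [2, 1, 1], [1, 2, 0], [4, 1, 0],
    [0, 1, 1], [0, 3, 2], [0, 1, 0], [0, 2, 0],
]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]

toy_weights = tf_idf(toy_counts)
toy_selected = select_patterns(toy_counts, toy_labels)
```

In this toy example, a pattern that occurs for every user receives a TF-IDF weight of zero, and the Bonferroni correction tightens the significance threshold in proportion to the number of patterns tested.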