The purpose of this thesis is to present the basics of supervised machine learning and some of its most widely used methods: linear regression, logistic regression, k-nearest neighbors, random forests, support vector machines, and neural networks.
The first part of the thesis describes the main ideas behind the selected methods. The underlying mathematics is given only for linear regression; for the other methods, the emphasis is on intuitive explanation. The methods are illustrated on the task of predicting whether a customer is good or bad based on the given data. The models are built in Weka, a program that allows for a visual display of data and results. In addition to the main results, such as a model's accuracy, Weka prints out various statistical indicators that measure the model's effectiveness.
The second part describes neural networks, their use, and their implementation for an example that predicts whether a customer is good or bad for a bank, using a larger dataset than in the previous example. This model is built in Python rather than Weka, because Python provides greater freedom in choosing the number of layers, the number of neurons, and other parameters.