With the emergence of new technologies such as cryptocurrencies, new types of fraud appear. We focus on bank fraud that is common on crypto exchanges. Our goal is to classify fraudulent cases and distinguish them from legitimate ones, so that they can be prevented in advance. For classification we use machine learning algorithms based on decision trees, specifically random forests, i.e., ensembles of decision trees. Because fraud is a rare event, predicting it leads to a classification problem with an imbalanced distribution of the target variable, which we address using methods for over- and under-sampling of the training data. We build the final model with the random forest method together with a transformation of categorical variables to numerical ones, which proved to improve the results. Test results show that the final model is very useful, as it identifies almost all fraudulent cases. These encouraging results can be used for further development of the model and its implementation in the exchanges' systems.
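The two preprocessing ideas mentioned above, resampling an imbalanced training set and turning categorical variables into numbers, can be illustrated with a minimal sketch. The abstract does not say which resampling or encoding methods were used, so plain random over- and under-sampling and a simple integer (ordinal) encoding are assumed here purely for illustration. The function names and the toy data are hypothetical and use only the Python standard library.

```python
import random
from collections import Counter


def encode_categories(values):
    """Map each distinct category to an integer code (simple ordinal encoding)."""
    codes = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [codes[v] for v in values], codes


def _split_indices(labels, minority_label):
    """Return (minority indices, majority indices) for the given labels."""
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    if not minority:
        raise ValueError("no samples with the minority label")
    return minority, majority


def random_oversample(rows, labels, minority_label, seed=0):
    """Duplicate random minority rows until both classes have equal size."""
    rng = random.Random(seed)
    minority, majority = _split_indices(labels, minority_label)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = majority + minority + extra
    rng.shuffle(idx)
    return [rows[i] for i in idx], [labels[i] for i in idx]


def random_undersample(rows, labels, minority_label, seed=0):
    """Drop random majority rows until both classes have equal size."""
    rng = random.Random(seed)
    minority, majority = _split_indices(labels, minority_label)
    kept = rng.sample(majority, min(len(majority), len(minority)))
    idx = minority + kept
    rng.shuffle(idx)
    return [rows[i] for i in idx], [labels[i] for i in idx]


# Hypothetical toy data: one categorical feature, 2 fraud cases out of 10.
payment_method = ["card", "transfer", "card", "card", "wallet",
                  "transfer", "card", "wallet", "card", "transfer"]
is_fraud = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0]

encoded, codes = encode_categories(payment_method)
rows = [[x] for x in encoded]

over_rows, over_labels = random_oversample(rows, is_fraud, minority_label=1)
under_rows, under_labels = random_undersample(rows, is_fraud, minority_label=1)

print("category codes:", codes)
print("original:", Counter(is_fraud))
print("oversampled:", Counter(over_labels))
print("undersampled:", Counter(under_labels))
```

Resampling of this kind is normally applied to the training split only, so that the test set keeps the original class distribution. The balanced rows could then be passed to any random forest implementation.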