Peer-to-peer lending is a relatively new investment opportunity, so little research exists on the risk factors of loans on such platforms. This thesis aims to determine whether potential investors can estimate their expected risk using machine learning models and publicly available data. Additionally, we investigate the effect of the COVID-19 pandemic on loan profiles and success rates.
The original dataset was expanded with additional data and cleaned (outlier detection, currency conversion). We tested several classifiers (logistic regression, random forests, neural networks and the AdaBoost algorithm) on the expanded dataset. The neural network performed best, with an F1 score of 0.783 and a classification accuracy of 78.5\%, followed closely by the random forest classifier, although no significant difference between the two models was identified. Nevertheless, the logistic regression model was chosen because of its ability to reveal more information about feature importance. Our analysis revealed that the most influential factors for loan performance are the loan's length, initial amount and loan rate percentage, alongside the loan type itself. In contrast, macroeconomic indicators and loan grades proved to be poor predictors of loan outcomes.