In this thesis, we investigated how effectively three machine learning methods, namely Random Forest, Gradient Boosting, and Lasso, predict the monthly returns of the S&P 500 index. Our analysis used several datasets comprising technical, fundamental, and economic features. Technical data proved the most useful, while economic data gave the worst results. Among the models considered, Lasso performed best, while Random Forest and Gradient Boosting performed comparably to each other. Although return prediction across the different datasets did not yield good results, we found that it is possible, using a long-short strategy, to correctly select stocks whose prices will rise and stocks whose prices will fall.