This thesis applies machine learning algorithms to sports analytics. Sports analytics can be defined as the management of structured data collected from sporting activities in order to gain a competitive advantage for a team or an individual athlete. More specifically, I am interested in predicting the results of basketball games. I used machine learning algorithms to build prediction models from data on the outcomes of games played in the NBA and data on the league's participants. The thesis documents the whole process of data analysis, from the early phase of data collection and transformation, through selecting optimal parameter settings for the algorithms that learn the predictive models, to the final phase of selecting the most accurate predictive models.
The main purpose of the thesis is to prepare a guide for beginners in machine learning that familiarizes interested students with machine learning algorithms through a showcase of their application. To that end, the first, theoretical part of the thesis introduces different machine learning algorithms for building prediction models from data, along with methods for measuring the models' accuracy. The second, practical part focuses on building accurate models for predicting the outcomes of NBA games. Results obtained on data from the last three NBA seasons show that linear models are the most accurate. The first research question concerned the benefit of using principal component analysis for data pre-processing. As expected, given the large number of predictive variables, using principal component analysis leads to more accurate models. The second research question explored the impact of a team's rank in the standings on predictive accuracy. The results show that a team's place in the standings at the end of the season is not related to the accuracy of the model for predicting the outcomes of its games.