In this thesis, we address the prediction of health-related employee absenteeism at a major Slovenian company. Health-related absenteeism is a highly complex phenomenon influenced by many factors, including personal ones. For our analysis, we first transformed the data into a suitable format and then enriched it with new attributes derived from our analysis and from domain knowledge, which we incorporated throughout the work in collaboration with the company. We assessed the quality of the final dataset and used it to train machine learning models, of which gradient-boosted regression trees (GBRT) performed best. We compared our models against a baseline model that predicts the previous year's value. On data from the most recent year, the GBRT model achieved an MAE of ∼0.045, an RMSE of ∼0.10, and an R² of ∼0.40; these results outperform the baseline, and the improvement is statistically significant. We also analyzed attribute importance in our models, created a more detailed dataset, and identified the key attributes for predicting absenteeism. Based on our findings, we developed a set of guidelines for the company's future research into this complex phenomenon.
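To make the evaluation setup concrete, the following is a minimal sketch (not the thesis's actual code) of how the last-year baseline and the three reported metrics (MAE, RMSE, R²) could be computed. It assumes, hypothetically, that each employee's absenteeism is stored as a list of yearly absence rates ordered oldest first; the data layout and all function names are illustrative.

```python
from math import sqrt


def mae(y_true, y_pred):
    """Mean absolute error between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error between targets and predictions."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all targets are equal")
    return 1 - ss_res / ss_tot


def split_last_year(history):
    """Hypothetical helper: history maps employee id -> yearly absence rates
    (oldest first). Returns the most recent year's values as targets and the
    previous year's values as the last-year baseline predictions."""
    y_true = [rates[-1] for rates in history.values()]
    baseline = [rates[-2] for rates in history.values()]
    return y_true, baseline


if __name__ == "__main__":
    # Illustrative toy data only, not data from the thesis.
    history = {"e1": [0.02, 0.05, 0.04], "e2": [0.10, 0.08, 0.12], "e3": [0.00, 0.01, 0.00]}
    y_true, baseline = split_last_year(history)
    print("baseline MAE:", mae(y_true, baseline))
    print("baseline RMSE:", rmse(y_true, baseline))
    print("baseline R²:", r2(y_true, baseline))
```

A trained model's predictions for the most recent year would be scored with the same three functions, so that model and baseline are compared on identical targets.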