Napovedovanje zdravstvenega absentizma s strojnim učenjem

Piciga, Aleksander

Piciga, Aleksander (Author), Kukar, Matjaž (Mentor)

PDF - Presentation file (1.04 MB)
MD5: 8AB88CAE347A70B18CC04E66EED741E6

Abstract

This thesis addresses the problem of predicting health-related absenteeism among employees of a major Slovenian company. Health-related absenteeism is an extremely complex problem, influenced by many different factors, including personal ones. To enable a detailed analysis, we first transformed the data into a suitable form. We then enriched the dataset with new attributes based on the analysis and on domain knowledge, which we incorporated in collaboration with the company. We checked the quality of the enriched dataset and used it with machine learning methods. We compared several different models against a baseline model that predicts the previous year's value. The most successful model proved to be GBRT (gradient boosted regression trees), which in the last year achieves MAE ∼ 0.045, RMSE ∼ 0.10 and R² ∼ 0.40, a statistically significant improvement over the baseline model. We also analyzed the importance of attributes in the models. We created a clear and unified dataset and identified the key attributes for predicting absenteeism. From the findings we also prepared guidelines for further research into this complex problem.

Language:	Slovenian
Keywords:	absenteeism, data analysis, data enrichment, machine learning, key attribute analysis.
Work type:	Bachelor thesis/paper
Organization:	FRI - Faculty of Computer and Information Science
Year:	2024
PID:	20.500.12556/RUL-164834
Publication date in RUL:	13.11.2024

Secondary language

Language:	English
Title:	Predicting Health-Related Absenteeism Using Machine Learning
Abstract:
In this thesis, we address the prediction of health-related employee absenteeism at a major Slovenian company. Health-related absenteeism is a highly complex problem influenced by many factors, including personal ones. We first transformed the data into a format suitable for detailed analysis and then enriched it with new attributes based on that analysis and on domain knowledge, which we incorporated throughout the work in collaboration with the company. We assessed the quality of the enriched dataset and used it to train machine learning models. We compared several models against a baseline model that predicts the previous year's value. The best-performing model was GBRT (gradient boosted regression trees), which achieved an MAE of ∼0.045, an RMSE of ∼0.10, and an R² of ∼0.40 on the most recent year's data, a statistically significant improvement over the baseline model. We also analyzed the importance of attributes in our models. We created a clear, unified dataset and identified key attributes for predicting absenteeism. Based on our findings, we developed a set of guidelines for the company's further research into this complex phenomenon.
Keywords:	absenteeism, data analysis, data enrichment, machine learning, key attribute analysis.
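
The abstract evaluates models against a baseline that predicts the previous year's value and reports MAE, RMSE, and R². The following is a minimal sketch of how such a comparison can be computed. It is not the thesis code: the function names, the assumption that the target is a per-employee absence rate between 0 and 1, and the sample numbers are all hypothetical and chosen only for illustration.

```python
from math import sqrt


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def last_year_baseline(previous_year):
    """Baseline: predict that each employee repeats last year's value."""
    return list(previous_year)


# Made-up absence rates for four employees (assumed target, not thesis data).
rate_previous_year = [0.02, 0.10, 0.00, 0.05]
rate_current_year = [0.04, 0.08, 0.01, 0.03]

baseline_pred = last_year_baseline(rate_previous_year)
baseline_scores = {
    "MAE": mae(rate_current_year, baseline_pred),
    "RMSE": rmse(rate_current_year, baseline_pred),
    "R2": r2(rate_current_year, baseline_pred),
}
```

Predictions from any other model, such as GBRT, can be passed to the same three functions, so the model and the baseline are scored on identical data.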
