Weather and air pollution are known to affect the occurrence of respiratory diseases. In this thesis, we built machine learning models that predict the number of respiratory disease diagnoses in Slovenia from meteorological and air pollution data. We modeled time-ordered monthly data and developed a sliding window forecasting algorithm, which produced better results than the classical forecasting method. The aim of the thesis was to assess the impact of air pollution (the amount of PM10 and NO2) on the number of respiratory disease diagnoses. We did this by comparing models trained only on meteorological data with models trained on meteorological and air pollution data together. We found that air pollution does have an impact, but that it is smaller than the impact of the meteorological data. Finally, we explained our models using the SHAP method and related the results of that analysis to findings from published articles. We expect that the models would improve further if they were trained on daily data, on data for several respiratory diseases, and on a longer time period. In the future, our models could be developed into warning systems.