In this master’s thesis we present the development of a methodology for analysing insurance data. Because insurance data have both temporal and spatial dimensions, developing the methodology requires a special approach. We therefore use spatio-temporal data mining techniques, which allow temporal and spatial attributes to be treated appropriately.
We limit the analysis to personal insurance data and include reports of loss events relating to accidents and diseases. The insurance data are linked to data on the weather conditions on the day the compensation claim was reported. With this linkage we aim to exploit the hypothesis that the weather influences accidents. We hope that this will make it easier to predict the number of compensation claims and the average amount paid out per claim.
First, we treat the task as a classification problem. We apply several basic classification algorithms, but the prediction accuracy of all of them proves to be extremely low. Because the problems are regression problems by nature, we also try to solve them as such, but the regression algorithms do not give much better results either. We check the adequacy of the time window of the training set and confirm that it is appropriate for the data. We also check whether the instances are defined on an appropriate time scale and conclude that they are. We then try to localize the problem by dividing the data into local areas. For each of the divisions of Slovenia that we examine, we find that the prediction estimate for every local area is worse than for Slovenia as a whole.
The proposed methodology proves to be partially useful: it can be used for predicting the number of accidents. Diagnostics confirm that the poor prediction results are due to the small number of events considered. We propose possible improvements and hope that the usefulness of the proposed methodology will become evident in the future.
For the purposes of the master’s thesis we also prepare maps for tracking accidents and diseases in Slovenia over time. On the basis of these maps we identify trends for the future. We identify the most important factors responsible for the occurrence of accidents. Finally, we analyse the influence of altitude on the occurrence of accidents.