When studying the incidence, prevalence, risk and mortality of diseases, the data are often presented on maps divided into specific geographical units (e.g. settlements, municipalities, administrative units, statistical regions, or a grid of squares 1 km across). The units can then be compared by the value of an indicator, which points to possible "hotspots" of the disease. Estimating, for example, the risk of a particular disease can be problematic, especially for small geographical units, because two things have to be accounted for: spatial autocorrelation (observations from geographically closer regions are more alike) and sampling variability (differences can arise from a small population or from heterogeneity among individuals).
Many so-called spatial smoothing models exist; they mainly reduce extreme deviations while taking spatial autocorrelation and sampling variability into account. In this Master's thesis, we used the BYM (Besag-York-Mollié) model with a CAR (conditional autoregressive) distribution for the spatial component. The smoothed values of relative risk can be calculated either with established MCMC (Markov chain Monte Carlo) methods, here the Gibbs sampler, or with the approximate method INLA (integrated nested Laplace approximation). Calculations with the Gibbs sampler take a long time, whereas INLA is fast because it approximates the posterior distribution rather than sampling from it. Spatial smoothing was first performed on real data: the number of new cancer cases by municipality in the period from 2006 to 2015, obtained from the Cancer Registry of Republic of Slovenia, where we considered breast cancer among women aged 49 or younger. In some cases, in addition to the neighbourhood data, we also included the Slovenian version of the deprivation index as a covariate. Slovenian municipalities vary greatly in the number of inhabitants, so cancer cases are unevenly distributed among them. In municipalities with a small population, the variability of the indicator is high solely because of the small number of cases, so in practice such data need to be spatially smoothed. In the second part of the thesis, we compared both methods on generated data in order to answer, in general terms, the question of how the methods behave in different situations.
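For orientation, the BYM model mentioned above is commonly written in the following form. This is the standard textbook formulation and not necessarily the exact parameterisation used in the thesis; all symbols are notation introduced here for illustration.

```latex
\begin{align}
O_i \mid \theta_i &\sim \mathrm{Poisson}(E_i \theta_i), \\
\log \theta_i &= \alpha + \beta x_i + u_i + v_i, \\
v_i &\sim \mathcal{N}(0, \sigma_v^2), \\
u_i \mid u_{j \neq i} &\sim \mathcal{N}\!\left( \frac{1}{n_i} \sum_{j \sim i} u_j,\; \frac{\sigma_u^2}{n_i} \right).
\end{align}
```

Here $O_i$ and $E_i$ are the observed and expected numbers of cases in unit $i$, $\theta_i$ is the relative risk, $x_i$ is an optional covariate (such as a deprivation index), $v_i$ is the unstructured component, and $u_i$ is the spatially structured component with a CAR distribution, where $j \sim i$ denotes the neighbours of unit $i$ and $n_i$ is their number. The conditional mean of $u_i$ is the average over its neighbours, which is what pulls extreme values in small units towards the values of the surrounding units.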
On both real and generated data, we observed only minimal differences between the two methods. However, general advice on when to use which method requires additional research.