Retrospective analyses of rare diseases frequently employ case-control studies,
but retrospective cohort studies can be used as well. In this thesis, we compare
the results of both study designs to assess to what extent they are comparable
and to evaluate which design is better suited to a specific situation.
We re-analyzed a real dataset on tick-borne meningoencephalitis using
univariate and multiple logistic regression. The real-data results were
supplemented with simulations, in which we compared the bias of the regression
coefficient, the variability of the regression coefficient, and the percentage
of statistically significant results.
In the theoretical part of the thesis we briefly described the cohort study and
the case-control study, their main characteristics, and the measures of
association (relative risk and odds ratio). Among other things, we focused on
the problem of confounding variables and described methods that can be used to
control confounding. The statistical methods and concepts used for data
analysis were presented as well: logistic regression, confidence intervals, the
Wald test, statistical power, and type I error. We concluded the theoretical
part with a description of the real data, a brief description of the variables,
and the methods used to analyze the real data.
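
The two measures of association named above can be illustrated with a minimal
sketch on a 2x2 table. The function names and the counts are hypothetical and
chosen only for illustration; they are not taken from the thesis data. The
example also shows why the odds ratio approximates the relative risk when the
outcome is rare.

```python
def relative_risk(a, b, c, d):
    """Relative risk from a 2x2 table.

    a: exposed with event,   b: exposed without event,
    c: unexposed with event, d: unexposed without event.
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed


def odds_ratio(a, b, c, d):
    """Odds ratio from the same 2x2 table (assumes b and c are non-zero)."""
    return (a * d) / (b * c)


# Hypothetical counts for a rare outcome (illustration only).
a, b, c, d = 10, 990, 5, 995

rr = relative_risk(a, b, c, d)
or_ = odds_ratio(a, b, c, d)

print(f"relative risk = {rr:.3f}")
print(f"odds ratio    = {or_:.3f}")
```

Because only about 1% of the exposed and 0.5% of the unexposed experience the
event in this made-up table, the two measures nearly coincide.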
Real data characteristics were explored using descriptive statistics (frequency
and percentage for categorical variables, median and interquartile range for
numerical variables, and the number of missing values). We noticed that two
variables had a high percentage of missing data. In the univariate logistic
regression analysis (retrospective cohort study), all numerical variables
except age were statistically significantly associated with the outcome. The
results of unconditional and conditional logistic regression were almost
identical; the statistical power of unconditional regression was slightly
higher. In the multiple logistic regression, none of the variables achieved
statistical significance, but the results of unconditional and conditional
logistic regression were again comparable.
For the simulations we generated a cohort of comparable size and with a
percentage of events similar to that observed in the real data. The smallest
bias of the regression coefficient of the explanatory variable was found for
the retrospective cohort study with adjustment for confounding. The same design
had the highest statistical power, which was expected, since the cohort study
had a larger sample size than the case-control study. We would like to
emphasise that the case-control study yielded results that were comparable to
those from the cohort study, or that at least approached them as the number of
controls increased. We do not recommend using a case-control study when the
explanatory variable of interest and the confounding variable are highly
correlated: this causes overmatching, so that regression coefficients are
underestimated and statistical power is very low. The simulation results
supported our supposition that there is practically no difference between the
results of unconditional and conditional logistic regression. The only slight
difference that we observed was in statistical power, where unconditional
logistic regression performed slightly better.
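
The comparison of the two designs can be sketched with a much simplified
simulation. This is not the simulation design used in the thesis: it assumes a
single binary explanatory variable, no confounder, and unmatched controls, and
all parameter values and function names are hypothetical. For a single binary
covariate, the maximum-likelihood slope of a logistic regression equals the log
of the sample odds ratio, so the sketch computes that quantity directly instead
of fitting a model.

```python
import math
import random


def simulate_cohort(n, p_exposed, beta0, beta1, rng):
    """Generate (exposure, event) pairs from a logistic model."""
    cohort = []
    for _ in range(n):
        x = 1 if rng.random() < p_exposed else 0
        p_event = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))
        y = 1 if rng.random() < p_event else 0
        cohort.append((x, y))
    return cohort


def log_odds_ratio(sample):
    """Log odds ratio of a 2x2 table (assumes all four cells are non-zero)."""
    a = sum(1 for x, y in sample if x == 1 and y == 1)
    b = sum(1 for x, y in sample if x == 1 and y == 0)
    c = sum(1 for x, y in sample if x == 0 and y == 1)
    d = sum(1 for x, y in sample if x == 0 and y == 0)
    return math.log((a * d) / (b * c))


def case_control_sample(cohort, controls_per_case, rng):
    """Keep all cases and draw unmatched controls at random from non-cases."""
    cases = [r for r in cohort if r[1] == 1]
    noncases = [r for r in cohort if r[1] == 0]
    n_controls = min(len(noncases), controls_per_case * len(cases))
    return cases + rng.sample(noncases, n_controls)


rng = random.Random(2024)          # fixed seed for reproducibility
TRUE_BETA1 = math.log(3.0)         # hypothetical true log odds ratio

cohort = simulate_cohort(20000, 0.3, -4.0, TRUE_BETA1, rng)
cohort_estimate = log_odds_ratio(cohort)

# Case-control estimates with 1, 2, and 4 controls per case.
cc_estimates = {
    k: log_odds_ratio(case_control_sample(cohort, k, rng)) for k in (1, 2, 4)
}

print(f"true beta1      = {TRUE_BETA1:.3f}")
print(f"cohort estimate = {cohort_estimate:.3f}")
for k, est in cc_estimates.items():
    print(f"case-control, {k} control(s) per case: {est:.3f}")
```

Repeating the run over many seeds and recording the bias, the variability, and
the share of significant results would correspond to the three quantities
compared in the thesis, under the simplifying assumptions stated above.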