This thesis evaluates data analysis approaches suggested in the literature for a special type of data that arises in genomic experiments. In our data, we are interested in comparing the viral concentrations of four groups of children. Variables that describe viral concentrations have a high proportion of values clumped at a single point, which poses a problem for the analysis. We refer to such variables as variables with a point-mass.
To analyse the relationship between groups and viral concentrations, we used four methods: the proportional odds model, the Tobit model, a combination of logistic regression and a linear model (Log+Lin model), and the Mann-Whitney test. The first three methods allow us to include additional explanatory variables in the analysis. Variables with a point-mass are treated as outcome variables.
We assessed the performance of the selected methods in different situations through a series of simulation studies. Our results indicate that consonant and dissonant effects, as well as the property of proportionality in the data, have a strong impact on method performance. The proportional odds model, the Tobit model and the Mann-Whitney test have comparable power to identify differences between groups in most situations, but only the Tobit model maintains an adequate type I error rate in all situations. The only two-part test considered in the study, the Log+Lin model, has an advantage over the one-part tests mentioned above when dissonant differences and nonproportionality are present in the data. Because we expect to find both properties in our data, we recommend the use of the two-part approach.
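The idea behind a two-part test can be illustrated with a minimal sketch. It is not the Log+Lin model used in this thesis: it omits covariates and, as an assumption made only for illustration, replaces the logistic regression with a two-proportion z-test and the linear model with a large-sample Welch-type z statistic. The two squared statistics are summed and referred to a chi-square distribution with 2 degrees of freedom. The names `two_part_test` and `simulate_group`, the point-mass location of 0 and all simulation parameters are hypothetical.

```python
import math
import random
from statistics import mean, variance


def two_part_test(x, y, point_mass=0.0):
    """Illustrative two-part test for two groups (large-sample approximation).

    Part 1 compares the proportions of values at the point mass.
    Part 2 compares the means of the remaining (continuous) values.
    Returns (statistic, p_value); the statistic is referred to chi-square(2).
    """
    n1, n2 = len(x), len(y)

    # Part 1: two-proportion z statistic with a pooled standard error.
    m1 = sum(v == point_mass for v in x)
    m2 = sum(v == point_mass for v in y)
    p_pool = (m1 + m2) / (n1 + n2)
    se_b = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z_b = (m1 / n1 - m2 / n2) / se_b if se_b > 0 else 0.0

    # Part 2: Welch-type z statistic on the values away from the point mass.
    c1 = [v for v in x if v != point_mass]
    c2 = [v for v in y if v != point_mass]
    z_c = 0.0
    if len(c1) > 1 and len(c2) > 1:
        se_c = math.sqrt(variance(c1) / len(c1) + variance(c2) / len(c2))
        if se_c > 0:
            z_c = (mean(c1) - mean(c2)) / se_c

    stat = z_b ** 2 + z_c ** 2
    # Survival function of a chi-square distribution with 2 df is exp(-x / 2).
    p_value = math.exp(-stat / 2)
    return stat, p_value


def simulate_group(n, p_zero, mu, sigma, rng):
    """Hypothetical data generator: point mass at 0, normal values otherwise."""
    return [0.0 if rng.random() < p_zero else rng.gauss(mu, sigma)
            for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(1)
    # Dissonant effects: group A has more values at the point mass but a
    # higher mean among the remaining values, so the overall means coincide
    # in expectation (0.5 * 6 = 0.75 * 4 = 3).
    group_a = simulate_group(200, p_zero=0.50, mu=6.0, sigma=1.0, rng=rng)
    group_b = simulate_group(200, p_zero=0.25, mu=4.0, sigma=1.0, rng=rng)
    print(two_part_test(group_a, group_b))
```

Because each part contributes its own squared statistic, differences in opposite directions add up instead of cancelling, which is the situation in which a one-part comparison of overall location may lose power.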
Additionally, we assessed the performance of a test that validates the proportional odds assumption. The test proved to be anti-conservative and to lose power when the sample size is small.