Primerjava skupin pri podatkih z visokim deležem nakopičenih vrednosti

KUSTEC, MARUŠA

Repository of the University of Ljubljana

KUSTEC, MARUŠA (Author), Lusa, Lara (Mentor)

PDF - Presentation file, Download (879,31 KB)
MD5: 1AA28D1389AFB24112113A35705DAA5F
PID: 20.500.12556/rul/9e05fb88-4d5c-4029-85e4-b620dd54ee7e

Abstract

In this master's thesis we study approaches for analysing a special type of data that often arises in genomics research. An example is the virus data considered here, where we want to compare the concentration of each virus in a sample between groups of children who had previously received different diagnoses. Variables describing viral concentrations have part of their values clumped at a single point, which poses a problem for data analysis. We call such variables variables with clumped values. To assess the association between the groups of children and viral concentration, we selected four methods: the proportional odds model, the Tobit model, a combined approach of logistic and linear regression (the Log+Lin model), and the Mann-Whitney test. The first three methods allow additional explanatory variables to be included in the analysis. The variable with clumped values is treated as the dependent variable. We used simulations to study the performance of the selected methods in different situations. The consonance of differences and proportionality in the data turn out to have a strong influence on the performance of the methods. The proportional odds model, the Tobit model and the Mann-Whitney test have comparable power in most situations, but only the Tobit model maintains an appropriate test size in all situations. The only two-part approach considered, the Log+Lin model, has a substantial advantage over the one-part approaches in the presence of dissonant differences and nonproportionality. Because we expect both in the virus data, we identify the two-part approach as the most suitable for the analysis. We additionally examine the performance of a test that checks the validity of the proportional odds assumption. The test is anti-conservative and has low power when the sample is small.

Language:	Slovenian
Keywords:	variables with clumped values, group comparison, proportional odds model, Tobit model, Mann-Whitney test, combined approach of logistic and linear regression, proportional odds, consonance of differences
Work type:	Master's thesis/paper
Organization:	FE - Faculty of Electrical Engineering
Year:	2016
PID:	20.500.12556/RUL-87166
Publication date in RUL:	28.11.2016
Views:	3621
Downloads:	763

Secondary language

Abstract:
Language:	English
Title:	Comparison of groups with spikes in their covariate distributions
This thesis evaluates data analysis approaches suggested in the literature for a special type of data that arises in genomic experiments. In our data, we are interested in comparing viral concentrations among four groups of children. Variables that describe viral concentrations have a high proportion of values clumped at one point, which poses a problem when analysing these data. We refer to such variables as variables with a point mass. To analyse the relationship between groups and viral concentrations, we used four methods: the proportional odds model, the Tobit model, a combination of logistic regression and a linear model (the Log+Lin model), and the Mann-Whitney test. The first three methods allow additional explanatory variables to be included in the analysis. Variables with a point mass are treated as outcome variables. We assessed the performance of the selected methods in different situations through a series of simulation studies. Our results indicate that whether the effects are consonant or dissonant, and whether proportionality holds in the data, strongly affect method performance. The proportional odds model, the Tobit model and the Mann-Whitney test have comparable power to identify differences between groups in most situations, but only the Tobit model maintains an adequate type I error rate in all situations. The only two-part test considered in the study, the Log+Lin model, has an advantage over the one-part tests when dissonant differences and nonproportionality are present in the data. Because we expect to find both properties in our data, we recommend the two-part approach. Additionally, we assessed the performance of a test of the proportional odds assumption. The test proved to be anti-conservative and has low power when the sample size is small.
Keywords:	variables with a point mass, group comparison, proportional odds model, Tobit model, Mann-Whitney test, a combination of logistic regression and a linear model, proportional odds, consonant and dissonant effects
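
The two-part idea mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's Log+Lin implementation: it assumes only two groups, no additional explanatory variables, a point mass at 0 and strictly positive values otherwise, and it uses simplified large-sample stand-ins for the two parts (a two-proportion z statistic in place of logistic regression, and a Welch-type z statistic on log values in place of the linear model). The names `two_part_test` and `simulate_group` and all simulation parameters are hypothetical and chosen only for illustration.

```python
import math
import random
import statistics


def two_part_test(x, y, point_mass=0.0):
    """Two-part test for two samples that share a point mass.

    Part 1 compares the shares of values away from the point mass.
    Part 2 compares mean log values among the remaining observations.
    Returns (statistic, p_value) using a chi-square approximation.
    """
    x_pos = [math.log(v) for v in x if v != point_mass]
    y_pos = [math.log(v) for v in y if v != point_mass]
    n1, n2 = len(x), len(y)
    m1, m2 = len(x_pos), len(y_pos)

    # Part 1: pooled two-proportion z statistic.
    pooled = (m1 + m2) / (n1 + n2)
    var_p = pooled * (1 - pooled) * (1 / n1 + 1 / n2)
    z_bin = (m1 / n1 - m2 / n2) / math.sqrt(var_p) if var_p > 0 else 0.0

    # Part 2: Welch-type z statistic on the log scale (large-sample approximation).
    if m1 > 1 and m2 > 1:
        se = math.sqrt(statistics.variance(x_pos) / m1
                       + statistics.variance(y_pos) / m2)
        z_cont = (statistics.mean(x_pos) - statistics.mean(y_pos)) / se if se > 0 else 0.0
        stat = z_bin ** 2 + z_cont ** 2
        # Chi-square survival function with 2 degrees of freedom.
        return stat, math.exp(-stat / 2)

    # Too few values away from the point mass: fall back to part 1 alone (1 df).
    stat = z_bin ** 2
    return stat, math.erfc(math.sqrt(stat / 2))


def simulate_group(n, p_zero, mu, sigma, rng):
    """Draw n values: 0 with probability p_zero, otherwise log-normal."""
    return [0.0 if rng.random() < p_zero else rng.lognormvariate(mu, sigma)
            for _ in range(n)]


rng = random.Random(1)
# Dissonant scenario: group a has more zeros but higher non-zero values.
group_a = simulate_group(200, 0.6, 2.0, 1.0, rng)
group_b = simulate_group(200, 0.3, 1.0, 1.0, rng)
stat, p_value = two_part_test(group_a, group_b)
print(f"statistic = {stat:.2f}, p-value = {p_value:.3g}")
```

Squaring the two parts before summing means that differences in opposite directions (more zeros but higher non-zero values) add up instead of cancelling, which is one way to see why a two-part approach can help when effects are dissonant.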
