The two-sample problem concerns testing whether two samples are drawn from the same probability distribution. A special case is the paired two-sample problem, in which the samples are dependent in the sense that each observation in one sample is paired with an observation in the other. This classical statistical problem arises frequently in practice, for example in longitudinal studies where data are collected for each subject at two different time points.
The aim of this thesis is to develop a test for the paired two-sample problem when data are missing completely at random (MCAR). To this end, we first introduce the mathematical framework required for constructing the test. We begin by presenting the theory of U- and V-statistics and their asymptotic behavior. This is followed by a discussion of the theory of kernel functions, introducing the concepts of reproducing kernel Hilbert spaces (RKHS) and kernel mean embeddings. Finally, we construct a nonparametric test that incorporates both the complete and the incomplete pairs of observations. The test is based on the maximum mean discrepancy (MMD) statistic, a (pseudo)distance between probability distributions defined as the distance between their kernel mean embeddings in an RKHS. We establish the consistency of the test using the asymptotic properties of degenerate V-statistics. We also present a bootstrap algorithm for testing the null hypothesis and establish its consistency.
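
For orientation, the maximum mean discrepancy described above can be written out in its standard form. The notation is assumed here for illustration and is not fixed by this abstract: $k$ denotes a kernel with RKHS $\mathcal{H}$, $P$ and $Q$ are the two marginal distributions, $\mu_P$ and $\mu_Q$ are their kernel mean embeddings, $X, X' \sim P$ are independent copies, $Y, Y' \sim Q$ are independent copies, and $X$ is independent of $Y'$ in the cross term.

```latex
\mu_P = \mathbb{E}_{X \sim P}\, k(X, \cdot), \qquad
\mu_Q = \mathbb{E}_{Y \sim Q}\, k(Y, \cdot),
\]
\[
\mathrm{MMD}_k(P, Q) = \lVert \mu_P - \mu_Q \rVert_{\mathcal{H}},
\]
\[
\mathrm{MMD}_k^2(P, Q)
  = \mathbb{E}\, k(X, X') - 2\, \mathbb{E}\, k(X, Y') + \mathbb{E}\, k(Y, Y').
```

The squared form follows from expanding the norm with the reproducing property. Replacing each expectation by an empirical average over all index pairs gives a V-statistic, which is why the asymptotic theory of V-statistics enters the analysis.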