This thesis addresses the problem of multiple testing of a hypothesis: when several statistical tests are performed, the resulting $p$-values must be combined into a single decision on whether to reject the null hypothesis. We present several methods for combining $p$-values, including Fisher's method, Brown's empirical method, and combination based on generalized means. We analyze the theoretical properties of the individual and combined $p$-values, and empirically assess the power of each method using simulations in Python. Special emphasis is placed on how correlation between the statistical tests affects the power of the test of $H_0$ based on the combined $p$-values. As an application, we give an example of multiple testing in the context of testing random number generators, which form the basis of gaming products.