The Mann-Whitney test is a commonly used non-parametric alternative to the t-test. Despite its frequent use, it is only rarely accompanied by a confidence interval for an effect size. When an effect size is reported, it is usually the difference of medians or the shift between the locations of the two distributions. Neither of these measures corresponds directly to the test statistic of the Mann-Whitney test, so the interpretation of the test result and that of the confidence interval may differ substantially.
In this paper, we focus on the probability that the value of the random variable X is lower than that of the random variable Y. The estimator of this measure is in a one-to-one relationship with the Mann-Whitney test statistic, and the measure itself is often referred to as the degree of overlap or the probabilistic index. It equals the area under the ROC curve. Several methods have been proposed for constructing a confidence interval for this measure; we review the most promising ones and explain the ideas behind them. We study the properties of the different variance estimators and the small-sample problems of confidence interval construction. We identify scenarios in which the existing approaches yield inadequate coverage probabilities. We conclude that the DeLong variance estimator is a reliable option regardless of the scenario, but the intervals should be constructed on the logit scale to avoid bounds above 1 or below 0 and the poor coverage probability that follows. A correction is needed for the case in which all values from one group are smaller than the values of the other. We propose a method that improves the coverage probability in these cases as well.
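As an illustration of the quantities named above, the following minimal Python sketch estimates P(X < Y) from the pairwise comparisons underlying the Mann-Whitney statistic, computes a DeLong-type variance from the placement values, and builds a Wald interval on the logit scale via the delta method. It is a sketch under stated assumptions, not the procedure evaluated in the paper: the function name `prob_index_ci`, the weight of 1/2 given to ties, and the normal quantile used for a 95% interval are choices made here for illustration. The correction for complete separation is not implemented; in that case the sketch returns no bounds.

```python
import math


def prob_index_ci(x, y, z=1.959963984540054):
    """Estimate theta = P(X < Y) with a logit-scale Wald interval.

    Hypothetical helper for illustration. Assumptions: ties count as 1/2,
    and z is the normal quantile for a two-sided 95% interval.
    """
    m, n = len(x), len(y)
    if m < 2 or n < 2:
        raise ValueError("each group needs at least two observations")

    def kernel(a, b):
        # 1 if a < b, 1/2 if tied, 0 otherwise.
        return 1.0 if a < b else (0.5 if a == b else 0.0)

    # Placement values: for each observation, the average kernel value
    # against all observations of the other group.
    v_x = [sum(kernel(a, b) for b in y) / n for a in x]
    v_y = [sum(kernel(a, b) for a in x) / m for b in y]

    # Point estimate; equals U / (m * n), where U counts pairs with x < y.
    theta = sum(v_x) / m

    # DeLong-type variance: sample variances of the placement values.
    s_x = sum((v - theta) ** 2 for v in v_x) / (m - 1)
    s_y = sum((v - theta) ** 2 for v in v_y) / (n - 1)
    var = s_x / m + s_y / n

    # Complete separation: the logit is undefined and the variance is zero.
    # The correction needed here is NOT implemented in this sketch.
    if theta in (0.0, 1.0):
        return theta, None, None

    # Delta method: se(logit(theta)) = se(theta) / (theta * (1 - theta)).
    logit = math.log(theta / (1.0 - theta))
    se_logit = math.sqrt(var) / (theta * (1.0 - theta))

    def expit(t):
        return 1.0 / (1.0 + math.exp(-t))

    # Back-transforming keeps both bounds strictly inside (0, 1).
    return theta, expit(logit - z * se_logit), expit(logit + z * se_logit)


if __name__ == "__main__":
    print(prob_index_ci([1, 2, 3, 4], [2.5, 3.5, 5, 6]))
```

Because the bounds are back-transformed from the logit scale, they cannot fall outside the unit interval, which is the motivation stated above for preferring that scale over an interval built directly around the estimate.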