The aim of this master’s thesis was to use simulations to examine the impact of various randomization procedures on covariate imbalance in small clinical trials and to assess whether covariate-adaptive randomization (CAR) provides additional benefits in terms of statistical power compared with classical randomization procedures. We considered a study with two treatments and one categorical covariate.
Simulations were designed using the ADEMP approach (aims, data-generating mechanisms, estimands, methods, performance measures). The covariate (age) was generated from a Beta distribution, rescaled to the interval 30–80 years, and then categorized into five age groups. The outcome (VO₂peak) was generated as a function of age, the intervention effect, and random noise. Sample sizes of n = 20–100 were considered, together with scenarios without an intervention effect and with effect sizes of δ = 1, 2, and 3.5 ml/kg/min. In addition to the baseline scenario, we included scenarios with increased asymmetry of the covariate distribution, no association between the covariate and the outcome, and nonlinear associations between the covariate and the outcome. We compared simple, block, and stratified randomization, as well as two CAR approaches: Pocock and Simon minimization and the general approach proposed by Hu and Hu.
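The data-generating mechanism and one of the CAR procedures can be illustrated with a minimal Python sketch. It is not the code used in the thesis: the Beta parameters, the age-group cut points, the outcome coefficients, the noise level, and the biased-coin probability p = 0.8 are all assumed values chosen for illustration, and the function names are hypothetical. The allocation function follows the idea of Pocock and Simon minimization for a single categorical covariate.

```python
import numpy as np

def simulate_covariate(n, rng, a=2.0, b=2.0):
    # Beta(a, b) draw rescaled to 30-80 years; a and b are assumed values.
    age = 30.0 + 50.0 * rng.beta(a, b, size=n)
    # Five equal-width age groups; the cut points are an assumption.
    group = np.digitize(age, [40.0, 50.0, 60.0, 70.0])
    return age, group

def minimization(group, rng, p=0.8, n_levels=5):
    # Minimization in the spirit of Pocock and Simon for one covariate:
    # each new participant is assigned, with probability p, to the arm that
    # is currently under-represented in that participant's age group.
    counts = np.zeros((n_levels, 2), dtype=int)
    arm = np.empty(len(group), dtype=int)
    for i, g in enumerate(group):
        diff = counts[g, 0] - counts[g, 1]
        if diff == 0:
            arm[i] = rng.integers(2)          # tie: fair coin
        else:
            preferred = 1 if diff > 0 else 0  # arm with fewer participants
            arm[i] = preferred if rng.random() < p else 1 - preferred
        counts[g, arm[i]] += 1
    return arm

def simulate_outcome(age, arm, delta, rng, intercept=45.0, slope=-0.3, sigma=5.0):
    # Linear age effect plus intervention effect delta plus Gaussian noise;
    # intercept, slope, and sigma are illustrative assumptions.
    noise = rng.normal(0.0, sigma, size=len(age))
    return intercept + slope * age + delta * arm + noise

rng = np.random.default_rng(2024)
age, group = simulate_covariate(40, rng)
arm = minimization(group, rng)
vo2peak = simulate_outcome(age, arm, delta=2.0, rng=rng)
```

Setting p = 0.5 in this sketch reduces the allocation to simple randomization, which makes the contrast between the two procedures easy to explore.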
Randomization procedures were evaluated using between-group imbalance, imbalance across covariate levels, and the standardized mean difference (SMD) of the covariate. For statistical inference, we used the t-test, a linear regression model including age as a covariate, the adjusted t-test, the bootstrap t-test, and the randomization test. Statistical inference was evaluated using the Type I error rate, statistical power, bias, and root-mean-square error (RMSE). To ensure the stability of the estimates, 7,000 simulation repetitions were performed, and uncertainty was quantified using the Monte Carlo standard error.
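Two of these performance measures can be written down compactly. The sketch below assumes one common definition of the SMD (difference in group means divided by the square root of the average of the two group variances) and the usual binomial formula for the Monte Carlo standard error of an estimated rejection rate; the thesis may define either quantity differently, and the function names are hypothetical.

```python
import math
import numpy as np

def standardized_mean_difference(x, arm):
    # SMD of covariate x between arm 1 and arm 0, using the square root of
    # the average of the two sample variances as the standardizer (assumed).
    x = np.asarray(x, dtype=float)
    arm = np.asarray(arm)
    x1, x0 = x[arm == 1], x[arm == 0]
    pooled_sd = math.sqrt((x1.var(ddof=1) + x0.var(ddof=1)) / 2.0)
    return (x1.mean() - x0.mean()) / pooled_sd

def mcse_proportion(p_hat, n_sim):
    # Monte Carlo standard error of an estimated proportion
    # (e.g. Type I error rate or power) based on n_sim repetitions.
    return math.sqrt(p_hat * (1.0 - p_hat) / n_sim)

# With 7,000 repetitions, a true rejection rate of 0.05 is estimated
# with a Monte Carlo standard error of about 0.0026.
print(round(mcse_proportion(0.05, 7000), 4))  # 0.0026
```

Under this formula, 7,000 repetitions keep the Monte Carlo standard error of any estimated rejection rate below 0.006, the value reached at a rate of 0.5.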
Simple randomization resulted in the greatest variability in imbalance, whereas block, stratified, and CAR procedures were more effective at maintaining group balance. Relative imbalance decreased with increasing sample size across all procedures. Stratified randomization and CAR procedures also achieved the lowest SMD values, indicating more effective covariate balancing between the control and intervention groups.
Effect estimates were unbiased in all scenarios. RMSE decreased with increasing sample size for all methods, with the highest values observed for the t-test combined with simple or block randomization. Stratified randomization, CAR procedures, and the remaining statistical approaches achieved slightly lower RMSE values.
In statistical inference, the linear model including the covariate proved to be the most reliable approach: it controlled the Type I error rate adequately in all scenarios and achieved the highest statistical power. The t-test was appropriate primarily for simple and block randomization, whereas it often became conservative under stratified randomization and CAR. The adjusted t-test improved on the performance of the t-test but did not always ensure adequate calibration in smaller samples. The randomization test and bootstrap t-test from the carat package did not provide consistent control of the Type I error rate, whereas their manually implemented versions showed better calibration. This discrepancy suggests a possible mismatch between the implementation of these tests, or their assumptions, and the simulation framework used.
Statistical power was lowest for the t-test, whereas the remaining approaches achieved higher and comparable power. The linear model achieved the highest power, while the adjusted t-test, the manually implemented randomization test, and the manually implemented bootstrap t-test achieved only slightly lower values.
The results indicate that the greatest contribution to statistical power comes from including the covariate directly in the analysis rather than from the randomization procedure alone. At the same time, choosing a statistical test that matches the randomization procedure is crucial for valid statistical inference.