The increase of medical research generates more and more findings which can result in new or enhanced existing treatments. This increase of medical research leads to problems at ensuring a sufficent number of patients for prospective studies of all of the promising treatments. On the other hand a prospective study can be simulated to a certain degree with a retrospective study using existing data. The main problem with this approach is, that the existing data usually have unbalanced distributions of characteristics over the sets of patients, which makes it difficult to evaluate effects of treatment. An algorithm is described for balancing sets of patients with given characteristics, which creates balanced subsets of patients using pairing and elimination of selected patients. The algorithm uses Pearson's chi-squared test for measuring the balance quality between two sets, and the sum of weighed differences between the characteristics for defining element pairs between sets. Two new element pairing strategies are introduced: a greedy method using an element similarity matrix, and the minimin algorithm using a state tree with limited depth for choosing the next elements to pair. A measure for the quality of a match between two sets is introduced. Results show that the greedy method gives better results from the original algorithm, whereas the minimin algorithm turns out to be time demanding because of the combinatorial complexity. At depths at which the algorithm is still practical to use, it gives results at best comparable to the original algorithm, but worse than the greedy method. The methods were experimentally compared on real data from medical studies in cancer treatment.
|