Rank data is ubiquitous, yet little research has addressed mining it and only a few methods exist. Such data arises in competitions, user preferences, and voting. A rank representation is well suited to values that are hard to compare directly or that differ in magnitude. We implemented two existing rank matrix factorisation algorithms that use the max-product semiring, together with the integer programming formulations that they employ. The Sparse mRMF algorithm searches for recurring subsequences of rankings in the rows of the rank matrix. The mRMT algorithm searches for tiles with high ranks. We converted data that links gene expression to cancer type into rank form and demonstrated that mRMT can, on its own, find existing subclassifications of cancer types.
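As a rough illustration of two ideas mentioned above, the following minimal sketch converts numeric rows into rank form and multiplies two small matrices over the max-product semiring, where addition is replaced by the maximum and multiplication is kept. The helper names `to_ranks` and `max_product`, the convention that the smallest value receives rank 1, the toy values, and the use of a binary indicator matrix `C` are all assumptions made for this example; they are not taken from the implemented algorithms.

```python
def to_ranks(row):
    """Rank the values in a row (assumed convention: smallest value gets rank 1).

    Ties are broken by position, because Python's sort is stable.
    """
    order = sorted(range(len(row)), key=lambda j: row[j])
    ranks = [0] * len(row)
    for rank, j in enumerate(order, start=1):
        ranks[j] = rank
    return ranks


def max_product(C, F):
    """Matrix product over the max-product semiring.

    Entry (i, j) is max over k of C[i][k] * F[k][j].
    """
    return [
        [max(C[i][k] * F[k][j] for k in range(len(F))) for j in range(len(F[0]))]
        for i in range(len(C))
    ]


# Hypothetical measurements on very different scales; ranking makes rows comparable.
expression = [[0.2, 15.0, 3.1], [120.0, 0.5, 47.0]]
rank_matrix = [to_ranks(row) for row in expression]

# Hypothetical factors: C marks which pattern rows of F apply to each output row.
C = [[1, 0], [0, 1], [1, 1]]
F = [[1, 3, 2], [3, 1, 2]]
approx = max_product(C, F)

print(rank_matrix)  # [[1, 3, 2], [3, 1, 2]]
print(approx)       # [[1, 3, 2], [3, 1, 2], [3, 3, 2]]
```

In the last row of `approx`, both patterns apply, so each column takes the larger of the two ranks rather than their sum, which keeps the result within the original rank range.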