Multitask learning is an approach to machine learning, in which algorithm learns to solve multiple related problems. It tries to find one common model instead of building multiple separate models. Such a model is usually smaller than the sum of separate models, easier to understand and less likely to overfit training data. In prediction stage the algorithm predicts values for several problems at the same time. Problems that are learned together must be related, so that learning of one problem can improve learning of other problems. Currently this approach is used with tree models for either multiple classification or multiple regression tasks. In this work we extend the approach to mixed classification and regression tasks.
During construction of trees different attribute selection methods are used in regression and classification. The returned scores are not directly comparable, so in our scenario we rank attributes for each task and choose the attribute that is best ranked in total.
We implement multitask regression and classification tree, multitask bagging and multitask random forest based on rankings of attributes. We compare these algorithms with their single task variants, with regular multitask tree and with multitask neural network.
We propose task relatedness measure based on ranking of attributes. In this way we can find related tasks in a dataset and use them together in multitask approach. On one dataset implemented multitask random forest works statistically significantly better than single-task version. On some datasets implemented algorithms work worse than single-task versions.
|