Data fusion methods based on matrix factorization share a cold-start problem: too little initial data is available to start the algorithm's learning process.
In this master's thesis we focus on the DFMF method and adapt it so that the cold-start problem is addressed by transfer learning.
We implement several adjusted versions of the method and evaluate their performance by cross-validation on synthetic data, where most of the adjusted versions reach higher AUC values than the basic version.
We then apply the adjusted methods to the real-world problem of determining the bacterial hosts of viruses, for which numerous interactions have been confirmed in the laboratory and on the basis of which we aim to suggest potentially new ones.
Transfer learning is achieved with a convolutional neural network that predicts the taxonomic classification of organisms, which we adapt so that the vectors from its last layer can be used to initialize the factor matrix in the DFMF method.
Cross-validation suggests that two of the adjusted versions achieve approximately the same precision as the basic DFMF method, whereas the others perform worse.
Finally, we present some potentially new interactions between bacteriophages and bacteria, predicted with the basic method and with the best-performing adjusted version.