We have developed a neural network that assigns taxonomic labels to short genome sequences obtained by next-generation sequencing. We evaluated the contribution of additional genome attributes and the influence of exploiting dependencies between target classes, which are encoded as a taxonomy. We designed a deep convolutional neural network and evaluated it on reference genomes of viruses and bacteria. In our experiments, the best configuration achieved a ROC AUC score of 0.92. We observed that the additional attributes did not improve accuracy. However, using the information on dependencies among target classes reduced the amount of training data needed to train the convolutional neural network.
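The abstract does not say how reads are fed to the network or how the taxonomy is turned into training targets. The sketch below shows one common way to do both, offered only as an illustration under stated assumptions and not as the authors' method. It assumes each read is one-hot encoded over the nucleotides A, C, G, T, which is a typical input for a convolutional network. It also assumes the taxonomy is used by marking a leaf taxon and all of its ancestors as positive, which gives a multi-label target vector. The toy taxonomy, `TAXONOMY`, `one_hot`, and `hierarchical_target` are all hypothetical names chosen for this example.

```python
# Hypothetical toy taxonomy: each taxon maps to its parent; None marks a top-level node.
TAXONOMY = {
    "Viruses": None,
    "Bacteria": None,
    "Herpesvirales": "Viruses",
    "Proteobacteria": "Bacteria",
    "Escherichia": "Proteobacteria",
}
NODES = sorted(TAXONOMY)  # fixed ordering of the output (label) vector
NUCLEOTIDES = "ACGT"


def one_hot(read):
    """Encode a DNA read as rows of 4 indicators (A, C, G, T); other symbols such as N become all zeros."""
    return [[1 if base == n else 0 for n in NUCLEOTIDES] for base in read.upper()]


def hierarchical_target(leaf):
    """Mark the given taxon and all of its ancestors as positive (assumed multi-label encoding)."""
    positives = set()
    node = leaf
    while node is not None:
        positives.add(node)
        node = TAXONOMY[node]
    return [1 if name in positives else 0 for name in NODES]


if __name__ == "__main__":
    print(NODES)
    print(one_hot("ACN"))
    print(hierarchical_target("Escherichia"))
```

With this encoding, a read labelled `Escherichia` also counts as a positive example for `Proteobacteria` and `Bacteria`. Examples at finer ranks therefore also provide training signal at the coarser ranks. That sharing is one plausible way class dependencies could reduce the amount of training data a classifier needs, but the abstract does not confirm it.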