Protein–RNA interactions play an important role in a wide variety of cellular processes. The structures of protein–RNA complexes can be determined experimentally, but this is a difficult and slow process. Computational predictions are faster and, although less accurate than experimental observations, can be accurate enough to guide experimental validation. In this thesis, we first predict interaction sites at the level of amino acids, the building blocks of the proteins in protein–RNA complexes. For every amino acid we make two predictions: whether the amino acid interacts with the RNA, and which parts of the amino acid interact with the RNA. For these predictions on amino acids, we implemented a 3D convolutional neural network. We also developed a method that combines the predictions on amino acids into a spatial prediction of interactions in 3D protein–RNA complexes. We estimate the performance of our method with classification accuracy and ROC AUC, measured on every 3D protein–RNA complex. The average ROC AUC on the protein–RNA complexes in the test set is 0.79, whereas the average ROC AUC on an additional, independent test set is 0.74. We also observe that more specific predictions on amino acids give better final predictions.
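
The evaluation described above, ROC AUC measured on every complex and then averaged, can be illustrated with a minimal sketch. The data layout (a dictionary mapping a complex identifier to binary labels and predicted scores), the function names, and the toy values are assumptions made for illustration only; they are not the thesis implementation or its data.

```python
from statistics import mean


def roc_auc(labels, scores):
    """ROC AUC as the probability that a randomly chosen positive
    scores higher than a randomly chosen negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC is undefined without both classes")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def mean_auc_per_complex(complexes):
    """Average the ROC AUC computed separately on each complex.

    `complexes` maps a (hypothetical) complex id to a pair
    (labels, scores), where labels are 1 for interacting and 0 otherwise.
    """
    return mean(roc_auc(labels, scores) for labels, scores in complexes.values())


# Hypothetical toy data, not results from the thesis.
toy = {
    "complex_A": ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.7]),
    "complex_B": ([0, 0, 1], [0.1, 0.4, 0.8]),
}
print(mean_auc_per_complex(toy))
```

Averaging per complex weights every complex equally regardless of its size, which differs from pooling all predictions into a single ROC curve.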