Interactions between proteins and RNA play an important role in the regulation of gene expression and therefore in the functioning of cells. Errors in these interactions are often associated with the development of diseases such as neuropathy and cancer. Knowing where these interactions occur is therefore crucial for understanding and managing gene expression and for treating those diseases.
This master's thesis focuses on modeling the amino acids that interact with RNA, based on simulated data from RBDmap experiments, which continue the study by Castello et al. (2012). RBDmap was simulated using 3D structures of ribonucleoprotein complexes from the PDB database. Several machine learning methods, namely support vector machines, classification trees, the naive Bayes classifier and k-nearest neighbours, were evaluated for predicting individual amino acids and fragments of amino acids that interact with RNA. In addition, a method was developed to identify amino acids interacting with RNA that takes into account the characteristics of both the amino acid fragments and the entire protein. The method achieved an AUC of 0.783, which is comparable with current methods. Including fragment-level features did not improve the predictive model.