Protein interactions are involved in most cellular functions, with ligands binding to active sites and affecting the structure and function of molecules. We investigated the relationship between interactions and pathogenicity based on single nucleotide polymorphisms (SNPs). The work was carried out in silico on the GenProBiS server, which allows the prediction and visualization of binding sites, ligands, and genetic variants. The code was written in the Python programming language, and a SPARQL query was constructed to integrate biological data from various sources. Data were collected from PDB, UniProt, Ensembl, ClinVar, DisGeNET, and WikiPathways. SNP frequencies were analyzed with Fisher's exact test. SNPs in binding sites proved to be 2.72 times more likely to be pathogenic than those outside binding sites, and the percentage of pathogenic binding-site SNPs increased (from 62.8% to 91.1%) with the degree of evolutionary conservation. Among amino acids in the reference sequence, the percentage of pathogenicity was highest for tryptophan (95.1%) and cysteine (89.6%), and arginine was the most frequently replaced residue. High odds ratios for SNP pathogenicity were obtained in binding sites for ions (10.10), cofactors (6.77), and nucleic acids (5.66); glycans had no significant effect (0.98). For each ligand type, a 3D visualization example was shown. The higher SNP pathogenicity in binding sites where water is the only ligand suggests that the importance of protein-water interactions has been overlooked. The data selection criteria were observed to have a large impact on the size and diversity of the sample. This work presents the most comprehensive analysis to date of SNP pathogenicity in terms of amino acid composition and predicted ligands, and the first SNP-based analysis of the influence of water molecules on protein interactions, which will allow further research on human pathogenesis at the level of structural interactomics.
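
As an illustration of the statistics mentioned above, the following minimal sketch computes a sample odds ratio and a two-sided Fisher's exact p-value for a 2x2 table of SNP counts (binding site vs. non-binding site, pathogenic vs. benign). It is not the code used in this work: the function names and the counts are hypothetical and chosen only for illustration, and the two-sided rule used here (summing all tables that are no more probable than the observed one) is one common convention, assumed rather than taken from the study.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-7))


# Hypothetical counts (NOT data from the study):
#                      pathogenic  benign
# binding-site SNPs        30        10
# non-binding-site SNPs    20        40
table = (30, 10, 20, 40)
print("odds ratio:", odds_ratio(*table))
print("p-value:", fisher_exact_two_sided(*table))
```

An odds ratio above 1 in such a table would indicate that pathogenic variants are over-represented in binding sites, while a value close to 1 (as reported for glycans) would indicate no association.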