Prediction of Protein–ligand Interaction Based on Sequence Similarity and Ligand Structural Features
Figure 1. Scenarios simulating the prediction of new ligands for known proteins (a), new targets for known ligands (b), and interactions between new ligands and new targets (c).
Figure 2. Prediction accuracy obtained for the datasets of the five protein–ligand groups. Each point corresponds to an AUC value from Table 2 and Table 3. The blue, orange, and gray curves correspond to the first, second, and third scenarios, respectively. Each AUC value is the maximum of the values calculated at frames of 7 and 30.
Abstract
1. Introduction
2. Results and Discussion
- (a) A new ligand for a target with a known ligand spectrum, predicted from comparisons of chemical structures;
- (b) A new target for a ligand with a known target spectrum, predicted from comparisons of protein sequences;
- (c) A new protein–ligand pair in which both objects are uncharacterized; this scenario combines the comparison of ligand structures with the comparison of protein sequences.
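The three scenarios differ in which known data may be used when a query pair is predicted. The sketch below illustrates one common way to emulate them by holding out records; it is an assumption for illustration, not the authors' validation procedure, which is not described in this excerpt. The record layout and the function name `training_records` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical record layout: (protein_id, ligand_id, interaction_index)
Record = Tuple[str, str, int]


def training_records(records: List[Record], protein: str, ligand: str,
                     scenario: str) -> List[Record]:
    """Return the records that may be used to predict the pair (protein, ligand).

    Assumed hold-out logic (illustrative only):
      "a": the ligand is new, so every record of that ligand is hidden;
      "b": the protein is new, so every record of that protein is hidden;
      "c": both are new, so records of either object are hidden.
    """
    if scenario == "a":
        return [r for r in records if r[1] != ligand]
    if scenario == "b":
        return [r for r in records if r[0] != protein]
    if scenario == "c":
        return [r for r in records if r[0] != protein and r[1] != ligand]
    raise ValueError(f"unknown scenario: {scenario!r}")


toy = [("P1", "L1", 1), ("P1", "L2", 0), ("P2", "L1", 1), ("P2", "L2", 1)]
for s in ("a", "b", "c"):
    print(s, training_records(toy, "P1", "L1", s))
```

Under this reading, scenario (c) leaves the least information for the query pair, which is consistent with the lower AUC values reported for the third scenario.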
3. Materials and Methods
3.1. Data Preparation
- Direct testing of protein–ligand binding;
- A precise definition of the protein (neither a homolog nor a protein family);
- Testing of the isolated protein (not in a complex, cellular fraction, etc.);
- IC50, Ki, or Kd as the interaction parameter;
- Exclusion of mutant proteins.
3.2. Activity Definition
- If both fixed and interval values were present, only the fixed values were taken into account, even if the interval values contradicted them. The interaction index was set to 1 if the median of the data did not exceed the cutoff and to 0 otherwise;
- If no fixed values were present, the protein–ligand pair was not considered when the interval values fell in non-overlapping ranges (e.g., <100 and >5000) or bounded the parameter at both ends (e.g., >100 and <5000);
- If there were several intervals bounded from below (e.g., >100, >1000, >5000), the index was set to 0 if the largest bound was higher than the given cutoff; otherwise, the protein–ligand pair was excluded;
- If there were several intervals bounded from above (e.g., <100, <1000, <5000), the index was set to 1 if the smallest bound was less than the cutoff; otherwise, the protein–ligand pair was excluded.
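The rules above can be sketched as a small function. This is a minimal illustration, not the authors' implementation: the measurement encoding (a relation `"="`, `"<"`, or `">"` paired with a value), the function name `interaction_index`, and the use of `None` for an excluded pair are assumptions. Values and the cutoff are assumed to share the same units.

```python
from statistics import median
from typing import Iterable, Optional, Tuple

# Hypothetical encoding: ("=", 50.0) is a fixed value, (">", 5000.0) an interval
Measurement = Tuple[str, float]


def interaction_index(measurements: Iterable[Measurement],
                      cutoff: float) -> Optional[int]:
    """Return 1 (interacting), 0 (non-interacting), or None (pair excluded)."""
    records = list(measurements)
    fixed = [v for rel, v in records if rel == "="]
    lower = [v for rel, v in records if rel == ">"]  # intervals bounded from below
    upper = [v for rel, v in records if rel == "<"]  # intervals bounded from above

    if fixed:
        # Fixed values take priority; interval values are ignored.
        return 1 if median(fixed) <= cutoff else 0
    if lower and upper:
        # Non-overlapping or two-sided intervals: the pair is not considered.
        return None
    if lower:
        return 0 if max(lower) > cutoff else None
    if upper:
        return 1 if min(upper) < cutoff else None
    return None  # no usable measurements


# Example with a cutoff of 1000 (e.g., 1 µmol expressed in nM, an assumed unit)
print(interaction_index([("=", 50), ("=", 200), (">", 5000)], 1000))
print(interaction_index([("<", 100), (">", 5000)], 1000))
print(interaction_index([(">", 100), (">", 1000), (">", 5000)], 1000))
```

Returning `None` rather than raising an error keeps excluded pairs easy to filter out when a whole dataset is processed.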
3.3. Prediction of the Target Proteins from Ligand Structural Features
3.4. Retrieving the Fuzzy Coefficients for Target–Ligand Pairs
3.5. Prediction of Protein–Ligand Interactions
4. Conclusions
Author Contributions
Funding
Conflicts of Interest
Table 1.

Protein Group | Parameter | Cutoff (µmol) | Number of Target Proteins | Number of Ligands |
---|---|---|---|---|
GPCR * | IC50 | 1 | 126 | 546 |
GPCR | IC50 | 10 | 130 | 839 |
GPCR | Kd | 1 | 30 | 8 |
GPCR | Kd | 10 | 32 | 15 |
GPCR | Ki | 1 | 110 | 4754 |
GPCR | Ki | 10 | 112 | 6411 |
Protein kinases | IC50 | 1 | 200 | 3014 |
Protein kinases | IC50 | 10 | 215 | 3883 |
Protein kinases | Kd | 1 | 233 | 111 |
Protein kinases | Kd | 10 | 307 | 120 |
Protein kinases | Ki | 1 | 72 | 277 |
Protein kinases | Ki | 10 | 77 | 339 |
Ion channel (ligand-gated) | Ki | 1 | 16 | 15 |
Ion channel (ligand-gated) | Ki | 10 | 16 | 20 |
Ion channel (voltage-gated) | IC50 | 1 | 29 | 75 |
Ion channel (voltage-gated) | IC50 | 10 | 35 | 163 |
Nuclear receptors | IC50 | 1 | 26 | 121 |
Nuclear receptors | IC50 | 10 | 28 | 340 |
Nuclear receptors | Kd | 1 | 13 | 15 |
Nuclear receptors | Kd | 10 | 13 | 19 |
Nuclear receptors | Ki | 1 | 22 | 55 |
Nuclear receptors | Ki | 10 | 23 | 89 |
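Table 2.

Protein Group | Parameter | Cutoff (µmol) | AUC * |
---|---|---|---|
GPCR | IC50 | 1 | 0.986 |
GPCR | IC50 | 10 | 0.981 |
GPCR | Kd | 1 | 0.979 |
GPCR | Kd | 10 | 0.981 |
GPCR | Ki | 1 | 0.987 |
GPCR | Ki | 10 | 0.984 |
Protein kinases | IC50 | 1 | 0.963 |
Protein kinases | IC50 | 10 | 0.954 |
Protein kinases | Kd | 1 | 0.808 |
Protein kinases | Kd | 10 | 0.803 |
Protein kinases | Ki | 1 | 0.980 |
Protein kinases | Ki | 10 | 0.981 |
Ion channel (ligand-gated) | Ki | 1 | 0.986 |
Ion channel (ligand-gated) | Ki | 10 | 0.991 |
Ion channel (voltage-gated) | IC50 | 1 | 0.986 |
Ion channel (voltage-gated) | IC50 | 10 | 0.968 |
Nuclear receptors | IC50 | 1 | 0.984 |
Nuclear receptors | IC50 | 10 | 0.981 |
Nuclear receptors | Kd | 1 | 0.969 |
Nuclear receptors | Kd | 10 | 0.972 |
Nuclear receptors | Ki | 1 | 0.993 |
Nuclear receptors | Ki | 10 | 0.996 |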
Table 3.

Protein Group | Parameter | Cutoff (µmol) | Second Scenario (Frame = 7) | Second Scenario (Frame = 30) | Third Scenario (Frame = 7) | Third Scenario (Frame = 30) |
---|---|---|---|---|---|---|
GPCR | IC50 | 1 | 0.959 | 0.968 | 0.885 | 0.904 |
GPCR | IC50 | 10 | 0.944 | 0.953 | 0.890 | 0.910 |
GPCR | Kd | 1 | 0.918 | 0.875 | 0.806 | 0.805 |
GPCR | Kd | 10 | 0.967 | 0.962 | 0.874 | 0.882 |
GPCR | Ki | 1 | 0.963 | 0.976 | 0.881 | 0.901 |
GPCR | Ki | 10 | 0.966 | 0.977 | 0.899 | 0.918 |
Protein kinases | IC50 | 1 | 0.896 | 0.924 | 0.824 | 0.857 |
Protein kinases | IC50 | 10 | 0.866 | 0.902 | 0.769 | 0.838 |
Protein kinases | Kd | 1 | 0.686 | 0.790 | 0.642 | 0.655 |
Protein kinases | Kd | 10 | 0.710 | 0.797 | 0.647 | 0.650 |
Protein kinases | Ki | 1 | 0.930 | 0.956 | 0.869 | 0.893 |
Protein kinases | Ki | 10 | 0.907 | 0.937 | 0.866 | 0.856 |
Ion channel (ligand-gated) | Ki | 1 | 0.979 | 0.979 | 0.948 | 0.957 |
Ion channel (ligand-gated) | Ki | 10 | 0.970 | 0.973 | 0.958 | 0.963 |
Ion channel (voltage-gated) | IC50 | 1 | 0.985 | 0.985 | 0.913 | 0.932 |
Ion channel (voltage-gated) | IC50 | 10 | 0.886 | 0.898 | 0.787 | 0.839 |
Nuclear receptors | IC50 | 1 | 0.966 | 0.973 | 0.817 | 0.852 |
Nuclear receptors | IC50 | 10 | 0.984 | 0.988 | 0.924 | 0.943 |
Nuclear receptors | Kd | 1 | 0.995 | 0.995 | 0.973 | 0.961 |
Nuclear receptors | Kd | 10 | 0.995 | 0.995 | 0.982 | 0.972 |
Nuclear receptors | Ki | 1 | 0.994 | 0.988 | 0.986 | 0.976 |
Nuclear receptors | Ki | 10 | 0.993 | 0.987 | 0.992 | 0.989 |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Karasev, D.; Sobolev, B.; Lagunin, A.; Filimonov, D.; Poroikov, V. Prediction of Protein–ligand Interaction Based on Sequence Similarity and Ligand Structural Features. Int. J. Mol. Sci. 2020, 21, 8152. https://doi.org/10.3390/ijms21218152