Abstract
The paper presents an application of conformal predictors to a chemoinformatics problem: predicting the biological activities of chemical compounds. It addresses several challenges specific to this domain: a large number of compounds (training examples), a high-dimensional feature space, sparseness, and a strong class imbalance. A variant of conformal predictors called the Inductive Mondrian Conformal Predictor is applied to deal with these challenges. Results are presented for several non-conformity measures extracted from underlying algorithms and different kernels. A number of performance measures are used to demonstrate the flexibility of Inductive Mondrian Conformal Predictors in dealing with such a complex data set. This approach allowed us to identify the most likely active compounds for a given biological target and present them in ranked order.
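The core step of an inductive Mondrian (label-conditional) conformal predictor can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy scores, and the choice of plain Python are assumptions made for illustration. The sketch assumes that non-conformity scores have already been produced by some underlying algorithm (for example, derived from a classifier's decision values) and shows only how class-wise p-values and a prediction set follow from them.

```python
from collections import defaultdict


def mondrian_p_values(cal_scores, cal_labels, test_scores):
    """Label-conditional (Mondrian) p-values for one test object.

    cal_scores  -- non-conformity scores of the calibration examples,
                   each computed with respect to its own true label
    cal_labels  -- true labels of the calibration examples
    test_scores -- dict {label: non-conformity score of the test object
                   under the hypothesis that it carries this label}
    (All names are hypothetical; scores come from any underlying model.)
    """
    # Group calibration scores by class so that each hypothesis is
    # compared only against calibration examples of the same class.
    by_label = defaultdict(list)
    for score, label in zip(cal_scores, cal_labels):
        by_label[label].append(score)

    p_values = {}
    for label, alpha in test_scores.items():
        same_class = by_label[label]
        # Count calibration examples at least as strange as the test object.
        at_least_as_strange = sum(1 for s in same_class if s >= alpha)
        p_values[label] = (at_least_as_strange + 1) / (len(same_class) + 1)
    return p_values


def prediction_set(p_values, significance):
    """Labels whose p-value exceeds the chosen significance level."""
    return {label for label, p in p_values.items() if p > significance}


# Toy, imbalanced calibration set: 2 actives (label 1), 4 inactives (label 0).
p = mondrian_p_values(
    cal_scores=[0.2, 0.9, 0.1, 0.3, 0.5, 0.7],
    cal_labels=[1, 1, 0, 0, 0, 0],
    test_scores={1: 0.5, 0: 0.6},
)
print(p)
print(prediction_set(p, significance=0.5))
```

Because each p-value is computed only against calibration examples of the same class, the minority (active) class is not swamped by the majority class, which is why the label-conditional form suits strongly imbalanced data. Sorting test compounds by their p-value for the active class is one plausible way to obtain a ranking of candidate actives.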
Acknowledgments
This project (ExCAPE) has received funding from the European Union’s Horizon 2020 Research and Innovation programme under Grant Agreement no. 671555. We are grateful to the Ministry of Education, Youth and Sports (Czech Republic), which supports the Large Infrastructures for Research, Experimental Development and Innovations project “IT4Innovations National Supercomputing Center – LM2015070”, for help in conducting the experiments. This work was also supported by EPSRC grant EP/K033344/1 (“Mining the Network Behaviour of Bots”) and by the Technology Integrated Health Management (TIHM) project awarded to the School of Mathematics and Information Security at Royal Holloway as part of an initiative by NHS England supported by InnovateUK. We are indebted to Lars Carlsson of AstraZeneca for providing the data and for useful discussions. We also thank Zhiyuan Luo and Vladimir Vovk for many valuable comments and discussions.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Toccaceli, P., Nouretdinov, I. & Gammerman, A. Conformal prediction of biological activity of chemical compounds. Ann Math Artif Intell 81, 105–123 (2017). https://doi.org/10.1007/s10472-017-9556-8
DOI: https://doi.org/10.1007/s10472-017-9556-8