Review Article
Published: 11 April 2019

Applications of machine learning in drug discovery and development

Jessica Vamathevan ORCID: orcid.org/0000-0003-2016-9754¹,
Dominic Clark¹,
Paul Czodrowski ORCID: orcid.org/0000-0002-7390-8795²,
Ian Dunham³,
Edgardo Ferran¹,
George Lee⁴,
Bin Li⁵,
Anant Madabhushi⁶,⁷,
Parantu Shah⁸,
Michaela Spitzer³ &
Shanrong Zhao⁹

Nature Reviews Drug Discovery volume 18, pages 463–477 (2019)

84k Accesses
1511 Citations
274 Altmetric

Abstract

Drug discovery and development pipelines are long, complex and depend on numerous factors. Machine learning (ML) approaches provide a set of tools that can improve discovery and decision making for well-specified questions with abundant, high-quality data. Opportunities to apply ML occur in all stages of drug discovery. Examples include target validation, identification of prognostic biomarkers and analysis of digital pathology data in clinical trials. Applications have ranged in context and methodology, with some approaches yielding accurate predictions and insights. The challenges of applying ML lie primarily with the lack of interpretability and repeatability of ML-generated results, which may limit their application. In all areas, systematic and comprehensive high-dimensional data still need to be generated. With ongoing efforts to tackle these issues, as well as increasing awareness of the factors needed to validate ML approaches, the application of ML can promote data-driven decision making and has the potential to speed up the process and reduce failure rates in drug discovery and development.

**Fig. 1: Machine learning applications in the drug discovery pipeline and their required data characteristics.**

**Fig. 2: Machine learning tools and their drug discovery applications.**

**Fig. 3: The challenges of compound structure representation in machine learning models.**

**Fig. 4: Utilizing predictive biomarkers to support drug discovery and development.**

**Fig. 5: Computational pathology tasks for machine learning applications.**

Acknowledgements

The authors thank E. Birney and E. Papa for helpful comments, M. Segler for contributing to the small-molecule optimization subsection and A. Janowczyk for providing the pathology images in Figure 4.

Author information

Authors and Affiliations

European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
Jessica Vamathevan, Dominic Clark & Edgardo Ferran
Technical University of Dortmund, Dortmund, Germany
Paul Czodrowski
Open Targets and European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
Ian Dunham & Michaela Spitzer
Bristol-Myers Squibb, Princeton, NJ, USA
George Lee
Takeda Pharmaceuticals International Co., Cambridge, MA, USA
Bin Li
Case Western Reserve University, Cleveland, OH, USA
Anant Madabhushi
Louis Stokes Cleveland Veterans Affairs Medical Center, Cleveland, OH, USA
Anant Madabhushi
EMD Serono R&D Institute, Billerica, MA, USA
Parantu Shah
Pfizer Worldwide Research and Development, Cambridge, MA, USA
Shanrong Zhao

Corresponding author

Correspondence to Jessica Vamathevan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Glossary

Graphical processing units (GPUs): Processors designed to accelerate the rendering of graphics and that can handle tens of thousands of operations per cycle.
Central processing units (CPUs): Processors designed to solve every computational problem in a general fashion and that can handle tens of operations per cycle. The cache and memory are designed to be optimal for any general programming problem.
Tensor processing units (TPUs): Co-processors manufactured by Google that are designed to accelerate deep learning tasks developed using TensorFlow (a programming framework) and can handle up to 128,000 operations per cycle.
Support vector machine (SVM) classifier: A method that performs classification tasks by constructing separating lines to distinguish between objects with different class memberships in a multi-dimensional space.
CLIP–seq: Ultraviolet crosslinking immunoprecipitation (CLIP) followed by RNA sequencing to identify all RNA species bound by a protein of interest. This method can be used to map RNA protein binding sites or RNA modification sites on a genome-wide scale.
Heuristic method: A function that calculates the approximate cost of a problem (or ranks alternatives).
Chemical fingerprint: A concept used in chemical informatics to compare molecules with each other. The structure of a molecule is encoded in a series of binary digits (bits) that represent the presence or absence of particular substructures in the molecule.
Simplified molecular input line entry system (SMILES): A line notation for entering and representing molecules and reactions; for example, carbon dioxide is represented as O=C=O.
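
The chemical fingerprint and SMILES entries above can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the method of any cheminformatics toolkit: the key list `SUBSTRUCTURE_KEYS` is hypothetical, and plain substring search on the SMILES text stands in for the graph-based substructure matching that real fingerprints use. The Tanimoto coefficient used for comparison is a common choice for bit fingerprints and is an addition here, not something the glossary specifies.

```python
# Toy illustration only: real fingerprints rely on graph-based substructure
# matching, not on substring search over the SMILES text.

# Hypothetical substructure keys, written as SMILES fragments (an assumption).
SUBSTRUCTURE_KEYS = ["=O", "O", "N", "Cl", "c1ccccc1", "CC"]


def fingerprint(smiles: str) -> list[int]:
    """Return one bit per key: 1 if the fragment text occurs in the SMILES."""
    return [1 if key in smiles else 0 for key in SUBSTRUCTURE_KEYS]


def tanimoto(fp_a: list[int], fp_b: list[int]) -> float:
    """Shared set bits divided by bits set in either fingerprint."""
    both = sum(a & b for a, b in zip(fp_a, fp_b))
    either = sum(a | b for a, b in zip(fp_a, fp_b))
    return both / either if either else 0.0


carbon_dioxide = fingerprint("O=C=O")   # SMILES from the glossary entry
acetic_acid = fingerprint("CC(=O)O")
ethanol = fingerprint("CCO")

print(carbon_dioxide)                   # bits for each key, in key order
print(tanimoto(acetic_acid, ethanol))   # similarity between 0 and 1
```

Because each molecule becomes a fixed-length bit vector, any two molecules can be compared with the same cheap set operation, which is what makes fingerprints convenient as inputs to ML models.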

About this article

Cite this article

Vamathevan, J., Clark, D., Czodrowski, P. et al. Applications of machine learning in drug discovery and development. Nat Rev Drug Discov 18, 463–477 (2019). https://doi.org/10.1038/s41573-019-0024-5

Published: 11 April 2019
Issue Date: June 2019
DOI: https://doi.org/10.1038/s41573-019-0024-5
