Cheminformatics articles within Nature Communications

Featured

  • Article
    | Open Access

    Accurate molecular representation of compounds is a fundamental challenge for drug discovery. Here, the authors present a molecular video-based foundation model pretrained on 120 million frames of 2 million molecular videos, and apply it to molecular target and property prediction.

    • Hongxin Xiang, Li Zeng & Feixiong Cheng
  • Article
    | Open Access

    Generative AI holds promise for creating novel compounds. Here, the authors introduce TamGen, a GPT-like model designed to generate molecules tailored to specific target proteins. TamGen identified 14 potent compounds against the tuberculosis ClpP protease, showing its potential for drug discovery.

    • Kehan Wu, Yingce Xia & Tie-Yan Liu
  • Article
    | Open Access

    Chemical language models are deep learning models trained with molecules in string representation. They enable data-driven de novo design of molecules with tailored features. Here, the authors used chemical language models to design multi-target ligands.

    • Laura Isigkeit, Tim Hörmann & Daniel Merk
  • Article
    | Open Access

    Predictive machine learning models, while powerful, are often seen as black boxes. Here, the authors introduce a thermodynamics-inspired approach for generating rationale behind their explanations across diverse domains based on the proposed concept of interpretation entropy.

    • Shams Mehdi & Pratyush Tiwary
  • Article
    | Open Access

    Enhancing retrosynthetic efficiency requires overcoming the vast complexity of chemical space, the limited known interconversions between molecules, and the challenges posed by limited experimental datasets. Here, the authors introduce generative machine learning methods for retrosynthetic planning that generate reaction templates.

    • Yu Shee, Haote Li & Victor S. Batista
  • Article
    | Open Access

    Understanding molecular near neighbours is key for molecular optimization. Here, the authors propose a transformer model that improves the correlation between generation probability and molecular similarity, enhancing exploration of molecular neighbourhoods.

    • Alessandro Tibo, Jiazhen He & Ola Engkvist
  • Article
    | Open Access

    The automatic analysis of patent documents has the potential to accelerate research in chemistry. Here, the authors leverage advances in document understanding to introduce a dataset (PatCID) that allows searching for molecules displayed in documents.

    • Lucas Morin, Valéry Weber & Peter W. J. Staar
  • Article
    | Open Access

    Retrosynthesis aims to identify synthesis solutions for compounds in drug discovery. Here, the authors frame it as a molecular string editing task and utilize an iterative string editing model to provide high-quality and diverse solutions.

    • Yuqiang Han, Xiaoyang Xu & Huajun Chen
  • Article
    | Open Access

    Engineering stabilized proteins is essential for industrial and pharmaceutical biotechnologies. Here, the authors present Stability Oracle, a Graph-Transformer framework trained on protein masked microenvironments to predict protein thermodynamic stability, using less training data while achieving improved generalization.

    • Daniel J. Diaz, Chengyue Gong & Adam R. Klivans
  • Article
    | Open Access

    Artificial intelligence (AI) is accelerating drug discovery. Here, the authors introduce structured state space sequence models as a new approach to de novo molecule design, further extending AI’s capabilities of charting the chemical universe.

    • Rıza Özçelik, Sarah de Ruiter & Francesca Grisoni
  • Article
    | Open Access

    A limitation of robotic platforms in chemistry is the lack of feedback loops to adjust the conditions in-operando. Here the authors present a dynamically programmable robotic system that uses sensors for real-time adaptation, achieving yield improvements in syntheses and discovering new molecules.

    • Artem I. Leonov, Alexander J. S. Hammer & Leroy Cronin
  • Article
    | Open Access

    Regioselectivity in many reactions remains a challenging target for a priori prediction. Here, the authors develop a machine learning model that predicts the outcomes of Minisci reactions.

    • Emma King-Smith, Felix A. Faber & Alpha A. Lee
  • Article
    | Open Access

    There is a need for dataset-dependent MS2 acquisition in trapped ion mobility spectrometry imaging. Here the authors report spatial ion mobility-scheduled exhaustive fragmentation (SIMSEF) which enables on-tissue metabolite and lipid annotation in mass spectrometry bioimaging studies, and use this to visualise the chemical space in rat brains.

    • Steffen Heuckeroth, Arne Behrens & Robin Schmid
  • Article
    | Open Access

    Over their careers, medicinal chemists develop a gut feeling for what is a promising molecule. Here, the authors use machine learning models to learn this intuition and show that it can be successfully applied in several drug discovery scenarios.

    • Oh-Hyeon Choung, Riccardo Vianello & José Jiménez-Luna
  • Comment
    | Open Access

    Machine learning is a powerful tool for the study and design of molecules. Here the authors comment on a recent publication in Nature Communications that highlights the challenges of different molecular representations for data-driven property predictions.

    • Ana Laura Dias, Latimah Bustillo & Tiago Rodrigues
  • Article
    | Open Access

    AI has become a crucial tool for drug discovery, but how to properly represent molecules for data-driven property prediction is still an open question. Here the authors evaluate 62,820 models to highlight existing challenges, the impact of activity cliffs, and the crucial role of dataset size.

    • Jianyuan Deng, Zhibo Yang & Fusheng Wang
  • Article
    | Open Access

    Fatty acids are fundamental biomolecular building blocks that are characterized by extraordinary structural diversity and present a formidable analytical challenge. Here the authors introduce a discovery workflow for de novo identification that adds more than 100 fatty acids to the human lipidome.

    • Jan Philipp Menzel, Reuben S. E. Young & Stephen J. Blanksby
  • Article
    | Open Access

    High-throughput experimentation is an increasingly important tool in reaction discovery, while there remains a need for software solutions to navigate data-rich experiments. Here the authors report phactor™, a software that facilitates the performance and analysis of high-throughput experimentation in a chemical laboratory.

    • Babak Mahjour, Rui Zhang & Tim Cernak
  • Article
    | Open Access

    Accuracy loss and slow speed affect the identification of compounds through matching of mass spectra against a large-scale spectral library. Here the authors use Word2vec spectral embedding and a hierarchical navigable small-world graph to improve the accuracy and speed of spectral matching on their own million-scale in-silico library.

    • Qiong Yang, Hongchao Ji & Zhimin Zhang
  • Article
    | Open Access

    Deoxycytidine kinase is the rate-limiting enzyme of the salvage pathway and it has recently emerged as a target for antiproliferative therapies for cancers where it is essential. Here, the authors develop a potent inhibitor applying an iterative multidisciplinary approach, which relies on computational design coupled with experimental evaluations.

    • Magali Saez-Ayala, Laurent Hoffer & Xavier Morelli
  • Article
    | Open Access

    Rare-earth and actinide complexes are critical for a wealth of clean-energy applications, but three-dimensional (3D) structural generation and prediction for these organometallic systems remain challenging. Here, the authors propose a high-throughput in-silico synthesis code for s-, p-, d-, and f-block mononuclear organometallic complexes.

    • Michael G. Taylor, Daniel J. Burrill & Ping Yang
  • Article
    | Open Access

    Experimental assays are used to determine if compounds cause a desired activity in cells. Here the authors demonstrate that computational methods can predict the bioactivity of compounds given their chemical structure, imaging and gene expression data from historic screening libraries.

    • Nikita Moshkov, Tim Becker & Juan C. Caicedo
  • Article
    | Open Access

    The identification of synthetic routes combining enzymatic and non-enzymatic reactions has been challenging and has required expert knowledge. Here, the authors describe a computational retrosynthetic approach relying on neural network models for planning synthetic routes using both strategies.

    • Itai Levin, Mengjie Liu & Connor W. Coley
  • Article
    | Open Access

    Water-in-salt electrolytes can be useful for future electrochemical energy storage systems. Here, the authors investigate the potential-dependent double-layer structures at the interface between a gold electrode and a highly concentrated aqueous electrolyte solution via in situ Raman measurements.

    • Chao-Yu Li, Ming Chen & Tianquan Lian
  • Article
    | Open Access

    Accurate forecasts of lithium-ion battery performance will ease concerns about the reliability of electric vehicles. Here, the authors leverage electrochemical impedance spectroscopy and machine learning to show that future capacity can be predicted amid uneven use, with no historical data requirement.

    • Penelope K. Jones, Ulrich Stimming & Alpha A. Lee
  • Article
    | Open Access

    Generative models for de novo molecular design attract enormous interest for exploring chemical space. Here the authors investigate the application of chemical language models to challenging modeling tasks, demonstrating their capability of learning complex molecular distributions.

    • Daniel Flam-Shepherd, Kevin Zhu & Alán Aspuru-Guzik
  • Article
    | Open Access

    Interrelating metabolites by their fragmentation spectra is central to metabolomics. Here the authors align fragmentation spectra with both statistical significance and allowance for multiple chemical differences using Significant Interrelation of MS/MS Ions via Laplacian Embedding (SIMILE).

    • Daniel G. C. Treen, Mingxun Wang & Benjamin P. Bowen
  • Comment
    | Open Access

    Achieving autonomous multi-step synthesis of novel molecular structures in chemical discovery processes is a goal shared by many researchers. In this Comment, we discuss key considerations of what an ideal platform may look like and the apparent state of the art. While most hardware challenges can be overcome with clever engineering, other challenges will require advances in both algorithms and data curation.

    • Wenhao Gao, Priyanka Raghavan & Connor W. Coley
  • Article
    | Open Access

    As of now, only rule-based systems support retrosynthetic planning using biocatalysis, while initial data-driven approaches are limited to forward predictions. Here, the authors extend the data-driven forward reaction as well as retrosynthetic pathway prediction models based on the Molecular Transformer architecture to biocatalysis.

    • Daniel Probst, Matteo Manica & Teodoro Laino
  • Article
    | Open Access

    COVID-19 has exposed the fragility of supply chains, particularly for goods that are essential or may suddenly become essential, such as repurposed pharmaceuticals. Here the authors develop a methodology to provide routes to pharmaceutical targets that allow low-supply starting materials or intermediates to be avoided, with representative pathways validated experimentally.

    • Yingfu Lin, Zirong Zhang & Tim Cernak