Article | Open Access
TamGen: drug design with target-aware molecule generation through a chemical language model
Generative AI holds promise for creating novel compounds. Here, the authors introduce TamGen, a GPT-like model designed to generate molecules tailored to specific target proteins. TamGen identified 14 potent compounds against the tuberculosis ClpP protease, showing its potential for drug discovery.
- Kehan Wu
- , Yingce Xia
- & Tie-Yan Liu
-
Article | Open Access
Deep learning prediction of electrospray ionization tandem mass spectra of chemically derived molecules
Chemical derivatization is widely used, but the lack of reference spectra of chemically derived molecules (CDMs) hinders their identification. Here, the authors describe a deep learning approach enabling accurate prediction of ESI-MS/MS spectra for CDMs.
- Bin Chen
- , Hailiang Li
- & Feng Li
-
Article | Open Access
Automated design of multi-target ligands by generative deep learning
Chemical language models are deep learning models trained with molecules in string representation. They enable data-driven de novo design of molecules with tailored features. Here, the authors used chemical language models to design multi-target ligands.
- Laura Isigkeit
- , Tim Hörmann
- & Daniel Merk
-
Article | Open Access
Thermodynamics-inspired explanations of artificial intelligence
Predictive machine learning models, while powerful, are often seen as black boxes. Here, the authors introduce a thermodynamics-inspired approach for generating rationale behind their explanations across diverse domains based on the proposed concept of interpretation entropy.
- Shams Mehdi
- & Pratyush Tiwary
-
Article | Open Access
Site-specific template generative approach for retrosynthetic planning
Enhancing retrosynthetic efficiency requires overcoming the vast complexity of chemical space, the limited known interconversions between molecules, and the challenges posed by limited experimental datasets. Here, the authors introduce generative machine learning methods for retrosynthetic planning that generate reaction templates.
- Yu Shee
- , Haote Li
- & Victor S. Batista
-
Article | Open Access
Exhaustive local chemical space exploration using a transformer model
Understanding molecular near neighbours is key for molecular optimization. Here, the authors propose a transformer model that improves the correlation between generation probability and molecular similarity, enhancing exploration of molecular neighbourhoods.
- Alessandro Tibo
- , Jiazhen He
- & Ola Engkvist
-
Article | Open Access
PatCID: an open-access dataset of chemical structures in patent documents
The automatic analysis of patent documents has potential to accelerate research in chemistry. Here, the authors leverage advances in document understanding to introduce a dataset (PatCID) which allows searching for molecules displayed in documents.
- Lucas Morin
- , Valéry Weber
- & Peter W. J. Staar
-
Article | Open Access
Retrosynthesis prediction with an iterative string editing model
Retrosynthesis aims to identify synthesis solutions for compounds in drug discovery. Here, the authors frame it as a molecular string editing task and utilize an iterative string editing model to provide high-quality and diverse solutions.
- Yuqiang Han
- , Xiaoyang Xu
- & Huajun Chen
-
Article | Open Access
Stability Oracle: a structure-based graph-transformer framework for identifying stabilizing mutations
Engineering stabilized proteins is essential for industrial and pharmaceutical biotechnologies. Here, the authors present Stability Oracle, a Graph-Transformer framework trained on masked protein microenvironments to predict protein thermodynamic stability, using less training data while achieving improved generalization.
- Daniel J. Diaz
- , Chengyue Gong
- & Adam R. Klivans
-
Article | Open Access
Chemical language modeling with structured state space sequence models
Artificial intelligence (AI) is accelerating drug discovery. Here, the authors introduce structured state space sequence models as a new approach to de novo molecule design, further extending AI’s capability to chart the chemical universe.
- Rıza Özçelik
- , Sarah de Ruiter
- & Francesca Grisoni
-
Article | Open Access
In silico fragment-based discovery of CIB1-directed anti-tumor agents by FRASE-bot
Newly identified therapeutic targets are often hard-to-drug proteins. Here, the authors introduce a FRASE-based hit-finding robot (FRASE-bot) to expedite drug discovery for unconventional therapeutic targets.
- Yi An
- , Jiwoong Lim
- & Dmitri Kireev
-
Article | Open Access
Bidirectional generation of structure and properties through a single molecular foundation model
Multimodal pre-training approaches in the molecular domain have been limited. Here, the authors propose a multimodal molecular pre-trained model that incorporates molecular structure and biochemical properties, and apply it to downstream tasks related to both molecular structure and properties.
- Jinho Chang
- & Jong Chul Ye
-
Article | Open Access
Precise atom-to-atom mapping for organic reactions via human-in-the-loop machine learning
Precise atom mapping is crucial for data-driven reaction prediction, but current methods lack the required accuracy. Here, the authors introduce a human-in-the-loop machine learning scheme for that purpose and achieve high accuracy on a wide spectrum of reaction datasets.
- Shuan Chen
- , Sunggi An
- & Yousung Jung
-
Article | Open Access
Difficulty in chirality recognition for Transformer architectures learning chemical structures from string representations
Despite the popularity of NLP models, there has been limited research on how they comprehend diverse chemical structures. Here, the authors examine how the Transformer learns chemical structures and show inherent issues with chirality recognition.
- Yasuhiro Yoshikai
- , Tadahaya Mizuno
- & Hiroyuki Kusuhara
-
Article | Open Access
An integrated self-optimizing programmable chemical synthesis and reaction engine
A limitation of robotic platforms in chemistry is the lack of feedback loops to adjust the conditions in-operando. Here the authors present a dynamically programmable robotic system that uses sensors for real-time adaptation, achieving yield improvements in syntheses and discovering new molecules.
- Artem I. Leonov
- , Alexander J. S. Hammer
- & Leroy Cronin
-
Article | Open Access
SQM2.20: Semiempirical quantum-mechanical scoring function yields DFT-quality protein–ligand binding affinity predictions in minutes
The paper presents a universal QM-based scoring function that accurately and rapidly predicts protein-ligand binding affinities, outperforming current computational tools. This is demonstrated on the PL-REX experimental benchmark dataset.
- Adam Pecina
- , Jindřich Fanfrlík
- & Jan Řezáč
-
Article | Open Access
Predictive Minisci late stage functionalization with transfer learning
Regioselectivity remains a challenging target for a priori prediction in many reactions. Here, the authors develop a machine learning model that predicts the outcomes of Minisci reactions.
- Emma King-Smith
- , Felix A. Faber
- & Alpha A. Lee
-
Article | Open Access
On-tissue dataset-dependent MALDI-TIMS-MS2 bioimaging
There is a need for dataset-dependent MS2 acquisition in trapped ion mobility spectrometry imaging. Here the authors report spatial ion mobility-scheduled exhaustive fragmentation (SIMSEF) which enables on-tissue metabolite and lipid annotation in mass spectrometry bioimaging studies, and use this to visualise the chemical space in rat brains.
- Steffen Heuckeroth
- , Arne Behrens
- & Robin Schmid
-
Article | Open Access
Developing a class of dual atom materials for multifunctional catalytic reactions
This work developed a class of dual atom materials that can act as efficient and stable catalysts for multifunctional catalytic reactions in an uninterrupted water splitting system.
- Xingkun Wang
- , Liangliang Xu
- & Minghua Huang
-
Article | Open Access
Extracting medicinal chemistry intuition via preference machine learning
Over their careers, medicinal chemists develop a gut feeling for what is a promising molecule. Here, the authors use machine learning models to learn this intuition and show that it can be successfully applied in several drug discovery scenarios.
- Oh-Hyeon Choung
- , Riccardo Vianello
- & José Jiménez-Luna
-
Comment | Open Access
Limitations of representation learning in small molecule property prediction
Machine learning is a powerful tool for the study and design of molecules. Here, the authors comment on a recent publication in Nature Communications that highlights the challenges of different molecular representations for data-driven property predictions.
- Ana Laura Dias
- , Latimah Bustillo
- & Tiago Rodrigues
-
Article | Open Access
A systematic study of key elements underlying molecular property prediction
AI has become a crucial tool for drug discovery, but how to properly represent molecules for data-driven property prediction is still an open question. Here the authors evaluate 62,820 models to highlight existing challenges, the impact of activity cliffs, and the crucial role of dataset size.
- Jianyuan Deng
- , Zhibo Yang
- & Fusheng Wang
-
Article | Open Access
Retrosynthesis prediction with an interpretable deep-learning framework based on molecular assembly tasks
Automating retrosynthesis prediction in organic chemistry is a major application of ML. Here the authors present RetroExplainer, which offers a high-performance, transparent and interpretable deep-learning framework providing valuable insights for drug development.
- Yu Wang
- , Chao Pang
- & Leyi Wei
-
Article | Open Access
First fully-automated AI/ML virtual screening cascade implemented at a drug discovery centre in Africa
Streamlined data-driven drug discovery remains challenging, especially in resource-limited settings. Here, the authors present ZairaChem, an AI/ML tool that streamlines QSAR/QSPR modelling, implemented for the first time at the H3D Centre in South Africa.
- Gemma Turon
- , Jason Hlozek
- & Miquel Duran-Frigola
-
Article | Open Access
DECIMER.ai: an open platform for automated optical chemical structure identification, segmentation and recognition in scientific publications
Chemical structures are typically published as non-machine-readable images in the scientific literature. Here, the authors present DECIMER.ai, an open platform for translating chemical structures in publications into machine-readable representations.
- Kohulan Rajan
- , Henning Otto Brinkhaus
- & Christoph Steinbeck
-
Article | Open Access
Ozone-enabled fatty acid discovery reveals unexpected diversity in the human lipidome
Fatty acids are fundamental biomolecular building blocks that are characterized by extraordinary structural diversity and present a formidable analytical challenge. Here the authors introduce a discovery workflow for de novo identification that adds more than 100 fatty acids to the human lipidome.
- Jan Philipp Menzel
- , Reuben S. E. Young
- & Stephen J. Blanksby
-
Article | Open Access
Rapid planning and analysis of high-throughput experiment arrays for reaction discovery
High-throughput experimentation is an increasingly important tool in reaction discovery, but there remains a need for software solutions to navigate data-rich experiments. Here, the authors report phactor™, software that facilitates the performance and analysis of high-throughput experimentation in a chemical laboratory.
- Babak Mahjour
- , Rui Zhang
- & Tim Cernak
-
Article | Open Access
Ultra-fast and accurate electron ionization mass spectrum matching for compound identification with million-scale in-silico library
Accuracy loss and slow speed affect the identification of compounds by mass spectrum matching against a large-scale spectral library. Here, the authors use Word2vec spectral embedding and a hierarchical navigable small-world graph to improve the accuracy and speed of spectral matching on their own million-scale in-silico library.
- Qiong Yang
- , Hongchao Ji
- & Zhimin Zhang
-
Article | Open Access
Reaction performance prediction with an extrapolative and interpretable graph model based on chemical knowledge
Predictive modelling remains a key challenge for designing synthetic transformations. Here, the authors develop a knowledge-based graph model to predict reaction yield and stereoselectivity, offering an extrapolative and interpretable approach for evaluating reaction performance.
- Shu-Wen Li
- , Li-Cheng Xu
- & Xin Hong
-
Article | Open Access
From a drug repositioning to a structure-based drug design approach to tackle acute lymphoblastic leukemia
Deoxycytidine kinase is the rate-limiting enzyme of the salvage pathway and it has recently emerged as a target for antiproliferative therapies for cancers where it is essential. Here, the authors develop a potent inhibitor applying an iterative multidisciplinary approach, which relies on computational design coupled with experimental evaluations.
- Magali Saez-Ayala
- , Laurent Hoffer
- & Xavier Morelli
-
Article | Open Access
Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing
Retrosynthesis prediction is a fundamental problem in organic synthesis. Here, inspired by simplified arrow-pushing reaction mechanisms, the authors develop Graph2Edits, a graph-to-edits framework based on a graph neural network for retrosynthesis prediction.
- Weihe Zhong
- , Ziduo Yang
- & Calvin Yu-Chian Chen
-
Article | Open Access
Architector for high-throughput cross-periodic table 3D complex building
Rare-earth and actinide complexes are critical for a wealth of clean-energy applications, but three-dimensional (3D) structure generation and prediction for these organometallic systems remain challenging. Here, the authors propose a high-throughput in-silico synthesis code for s-, p-, d-, and f-block mononuclear organometallic complexes.
- Michael G. Taylor
- , Daniel J. Burrill
- & Ping Yang
-
Article | Open Access
Chemistry-intuitive explanation of graph neural networks for molecular property prediction with substructure masking
Attempts to explain molecular property predictions of neural networks are not always compatible with chemical intuition based on chemical substructures. Here the authors propose the substructure mask explanation method to tackle this challenge.
- Zhenxing Wu
- , Jike Wang
- & Tingjun Hou
-
Article | Open Access
Single-step retrosynthesis prediction by leveraging commonly preserved substructures
Retrosynthesis is a critical task for organic chemistry with numerous industrial applications. Here, the authors build a machine learning model to learn the concept of substructures from a large reaction dataset to achieve chemist-like intuitions.
- Lei Fang
- , Junren Li
- & Jian-Guang Lou
-
Article | Open Access
Predicting compound activity from phenotypic profiles and chemical structures
Experimental assays are used to determine if compounds cause a desired activity in cells. Here the authors demonstrate that computational methods can predict compound bioactivity given their chemical structure, imaging and gene expression data from historic screening libraries.
- Nikita Moshkov
- , Tim Becker
- & Juan C. Caicedo
-
Article | Open Access
Digital circuits and neural networks based on acid-base chemistry implemented by robotic fluid handling
The complementarity of acids and bases is a fundamental chemical concept. Here, the authors use simple acid-base chemistry to encode binary information and perform information processing including digital circuits and neural networks using robotic fluid handling.
- Ahmed A. Agiza
- , Kady Oakley
- & Sherief Reda
-
Article | Open Access
Leveraging molecular structure and bioactivity with chemical language models for de novo drug design
Generative Deep Learning holds promise for mining the unexplored “chemical universe” for new drugs. Here, the authors demonstrate the de novo design of phosphoinositide 3-kinase gamma (PI3Kγ) inhibitors for the PI3K/Akt pathway in human tumor cells.
- Michael Moret
- , Irene Pachon Angona
- & Gisbert Schneider
-
Article | Open Access
Merging enzymatic and synthetic chemistry with computational synthesis planning
Identifying synthetic routes that combine enzymatic and non-enzymatic reactions has been challenging and has required expert knowledge. Here, the authors describe a computational retrosynthetic approach that relies on neural network models to plan synthetic routes using both strategies.
- Itai Levin
- , Mengjie Liu
- & Connor W. Coley
-
Article | Open Access
Unconventional interfacial water structure of highly concentrated aqueous electrolytes at negative electrode polarizations
Water-in-salt electrolytes can be useful for future electrochemical energy storage systems. Here, the authors investigate the potential-dependent double-layer structures at the interface between a gold electrode and a highly concentrated aqueous electrolyte solution via in situ Raman measurements.
- Chao-Yu Li
- , Ming Chen
- & Tianquan Lian
-
Article | Open Access
Impedance-based forecasting of lithium-ion battery performance amid uneven usage
Accurate forecasts of lithium-ion battery performance will ease concerns about the reliability of electric vehicles. Here, the authors leverage electrochemical impedance spectroscopy and machine learning to show that future capacity can be predicted amid uneven use, with no historical data requirement.
- Penelope K. Jones
- , Ulrich Stimming
- & Alpha A. Lee
-
Article | Open Access
Language models can learn complex molecular distributions
Generative models for de novo molecular design attract enormous interest for exploring chemical space. Here, the authors investigate the application of chemical language models to challenging modeling tasks, demonstrating their capability to learn complex molecular distributions.
- Daniel Flam-Shepherd
- , Kevin Zhu
- & Alán Aspuru-Guzik
-
Article | Open Access
The pocketome of G-protein-coupled receptors reveals previously untargeted allosteric sites
G-protein-coupled receptors bind endogenous ligands at sites that are frequently highly conserved. Here, the authors computationally describe alternative allosteric pockets, several of which have not been targeted by synthetic ligands before.
- Janik B. Hedderich
- , Margherita Persechino
- & Peter Kolb
-
Article | Open Access
SIMILE enables alignment of tandem mass spectra with statistical significance
Interrelating metabolites by their fragmentation spectra is central to metabolomics. Here the authors align fragmentation spectra with both statistical significance and allowance for multiple chemical differences using Significant Interrelation of MS/MS Ions via Laplacian Embedding (SIMILE).
- Daniel G. C. Treen
- , Mingxun Wang
- & Benjamin P. Bowen
-
Article | Open Access
Implicitly perturbed Hamiltonian as a class of versatile and general-purpose molecular representations for machine learning
Molecular representations are fundamental tools for machine-learning models. The current work introduces a new set of molecular representations demonstrated to enable accurate predictions of molecular conformational energy and solvation free energy.
- Amin Alibakhshi
- & Bernd Hartke
-
Article | Open Access
Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments
Reaction route planning remains a major challenge in organic synthesis. The authors present a retrosynthetic prediction model using the fragment-based representation of molecules and the Transformer architecture in neural machine translation.
- Umit V. Ucak
- , Islambek Ashyrmamatov
- & Juyong Lee
-
Comment | Open Access
Autonomous platforms for data-driven organic synthesis
Achieving autonomous multi-step synthesis of novel molecular structures in chemical discovery processes is a goal shared by many researchers. In this Comment, we discuss key considerations of what an ideal platform may look like and the apparent state of the art. While most hardware challenges can be overcome with clever engineering, other challenges will require advances in both algorithms and data curation.
- Wenhao Gao
- , Priyanka Raghavan
- & Connor W. Coley
-
Article | Open Access
Biocatalysed synthesis planning using data-driven learning
As of now, only rule-based systems support retrosynthetic planning using biocatalysis, while initial data-driven approaches are limited to forward predictions. Here, the authors extend the data-driven forward reaction as well as retrosynthetic pathway prediction models based on the Molecular Transformer architecture to biocatalysis.
- Daniel Probst
- , Matteo Manica
- & Teodoro Laino
-
Article | Open Access
Reinforcing the supply chain of umifenovir and other antiviral drugs with retrosynthetic software
COVID-19 has exposed the fragility of supply chains, particularly for goods that are essential or may suddenly become essential, such as repurposed pharmaceuticals. Here the authors develop a methodology to provide routes to pharmaceutical targets that allow low-supply starting materials or intermediates to be avoided, with representative pathways validated experimentally.
- Yingfu Lin
- , Zirong Zhang
- & Tim Cernak
-
Article | Open Access
SpookyNet: Learning force fields with electronic degrees of freedom and nonlocal effects
Current machine-learned force fields typically ignore electronic degrees of freedom. SpookyNet is a deep neural network that explicitly treats electronic degrees of freedom, closing an important remaining gap for models in quantum chemistry.
- Oliver T. Unke
- , Stefan Chmiela
- & Klaus-Robert Müller