Cheminformatics articles within Nature Communications

Featured

  • Article
    | Open Access

    A limitation of robotic platforms in chemistry is the lack of feedback loops for adjusting reaction conditions in operando. Here the authors present a dynamically programmable robotic system that uses sensors for real-time adaptation, improving the yields of syntheses and discovering new molecules.

    • Artem I. Leonov, Alexander J. S. Hammer & Leroy Cronin
  • Article
    | Open Access

    A priori prediction of regioselectivity remains challenging for many reactions. Here, the authors develop a machine learning model that predicts the outcomes of Minisci reactions.

    • Emma King-Smith, Felix A. Faber & Alpha A. Lee
  • Article
    | Open Access

    There is a need for dataset-dependent MS2 acquisition in trapped ion mobility spectrometry imaging. Here the authors report spatial ion mobility-scheduled exhaustive fragmentation (SIMSEF) which enables on-tissue metabolite and lipid annotation in mass spectrometry bioimaging studies, and use this to visualise the chemical space in rat brains.

    • Steffen Heuckeroth, Arne Behrens & Robin Schmid
  • Article
    | Open Access

    Over their careers, medicinal chemists develop a gut feeling for what is a promising molecule. Here, the authors use machine learning models to learn this intuition and show that it can be successfully applied in several drug discovery scenarios.

    • Oh-Hyeon Choung, Riccardo Vianello & José Jiménez-Luna
  • Comment
    | Open Access

    Machine learning is a powerful tool for the study and design of molecules. Here the authors comment on a recent publication in Nature Communications that highlights the challenges that different molecular representations pose for data-driven property prediction.

    • Ana Laura Dias, Latimah Bustillo & Tiago Rodrigues
  • Article
    | Open Access

    AI has become a crucial tool for drug discovery, but how to properly represent molecules for data-driven property prediction is still an open question. Here the authors evaluate 62,820 models to highlight existing challenges, the impact of activity cliffs, and the crucial role of dataset size.

    • Jianyuan Deng, Zhibo Yang & Fusheng Wang
  • Article
    | Open Access

    Fatty acids are fundamental biomolecular building blocks that are characterized by extraordinary structural diversity and present a formidable analytical challenge. Here the authors introduce a discovery workflow for de novo identification that adds more than 100 fatty acids to the human lipidome.

    • Jan Philipp Menzel, Reuben S. E. Young & Stephen J. Blanksby
  • Article
    | Open Access

    High-throughput experimentation is an increasingly important tool in reaction discovery, while there remains a need for software solutions to navigate data-rich experiments. Here the authors report phactor™, a software tool that facilitates the performance and analysis of high-throughput experimentation in a chemical laboratory.

    • Babak Mahjour, Rui Zhang & Tim Cernak
  • Article
    | Open Access

    Identifying compounds by matching mass spectra against a large-scale spectral library suffers from accuracy loss and slow search speed. Here the authors use Word2vec spectral embedding and a hierarchical navigable small-world graph to improve the accuracy and speed of spectral matching on their own million-scale in-silico library.

    • Qiong Yang, Hongchao Ji & Zhimin Zhang
  • Article
    | Open Access

    Deoxycytidine kinase is the rate-limiting enzyme of the salvage pathway and has recently emerged as a target for antiproliferative therapies in cancers where it is essential. Here, the authors develop a potent inhibitor using an iterative multidisciplinary approach that couples computational design with experimental evaluation.

    • Magali Saez-Ayala, Laurent Hoffer & Xavier Morelli
  • Article
    | Open Access

    Rare-earth and actinide complexes are critical for a wealth of clean-energy applications, but three-dimensional (3D) structure generation and prediction for these organometallic systems remain challenging. Here, the authors propose a high-throughput in-silico synthesis code for s-, p-, d-, and f-block mononuclear organometallic complexes.

    • Michael G. Taylor, Daniel J. Burrill & Ping Yang
  • Article
    | Open Access

    Experimental assays are used to determine whether compounds elicit a desired activity in cells. Here the authors demonstrate that computational methods can predict compound bioactivity from chemical structures and from imaging and gene-expression data in historical screening libraries.

    • Nikita Moshkov, Tim Becker & Juan C. Caicedo
  • Article
    | Open Access

    Identifying synthetic routes that combine enzymatic and non-enzymatic reactions has been challenging and has required expert knowledge. Here, the authors describe a computational retrosynthetic approach that relies on neural network models to plan synthetic routes using both types of reaction.

    • Itai Levin, Mengjie Liu & Connor W. Coley
  • Article
    | Open Access

    Water-in-salt electrolytes can be useful for future electrochemical energy storage systems. Here, the authors investigate the potential-dependent double-layer structures at the interface between a gold electrode and a highly concentrated aqueous electrolyte solution via in situ Raman measurements.

    • Chao-Yu Li, Ming Chen & Tianquan Lian
  • Article
    | Open Access

    Accurate forecasts of lithium-ion battery performance will ease concerns about the reliability of electric vehicles. Here, the authors leverage electrochemical impedance spectroscopy and machine learning to show that future capacity can be predicted amid uneven use, with no historical data requirement.

    • Penelope K. Jones, Ulrich Stimming & Alpha A. Lee
  • Article
    | Open Access

    Generative models for de novo molecular design attract enormous interest for exploring chemical space. Here the authors investigate the application of chemical language models to challenging modeling tasks, demonstrating their capability to learn complex molecular distributions.

    • Daniel Flam-Shepherd, Kevin Zhu & Alán Aspuru-Guzik
  • Article
    | Open Access

    Interrelating metabolites by their fragmentation spectra is central to metabolomics. Here the authors align fragmentation spectra with both statistical significance and allowance for multiple chemical differences using Significant Interrelation of MS/MS Ions via Laplacian Embedding (SIMILE).

    • Daniel G. C. Treen, Mingxun Wang & Benjamin P. Bowen
  • Comment
    | Open Access

    Achieving autonomous multi-step synthesis of novel molecular structures in chemical discovery processes is a goal shared by many researchers. In this Comment, we discuss key considerations of what an ideal platform may look like and the apparent state of the art. While most hardware challenges can be overcome with clever engineering, other challenges will require advances in both algorithms and data curation.

    • Wenhao Gao, Priyanka Raghavan & Connor W. Coley
  • Article
    | Open Access

    As of now, only rule-based systems support retrosynthetic planning using biocatalysis, while initial data-driven approaches are limited to forward predictions. Here, the authors extend data-driven forward reaction and retrosynthetic pathway prediction models based on the Molecular Transformer architecture to biocatalysis.

    • Daniel Probst, Matteo Manica & Teodoro Laino
  • Article
    | Open Access

    COVID-19 has exposed the fragility of supply chains, particularly for goods that are essential or may suddenly become essential, such as repurposed pharmaceuticals. Here the authors develop a methodology to provide routes to pharmaceutical targets that allow low-supply starting materials or intermediates to be avoided, with representative pathways validated experimentally.

    • Yingfu Lin, Zirong Zhang & Tim Cernak
  • Article
    | Open Access

    Experimental determination of new cocrystals remains challenging because it requires systematic screening across a large range of coformers. Here the authors develop a flexible deep learning framework based on graph neural networks that rapidly predicts cocrystal formation.

    • Yuanyuan Jiang, Zongwei Yang & Xuemei Pu
  • Article
    | Open Access

    Quantum mechanical calculations of molecular ionized states are computationally expensive. This work extends a previous deep neural network approach to transferable neural-network models that predict multiple properties of open-shell anions and cations.

    • Roman Zubatyuk, Justin S. Smith & Olexandr Isayev
  • Article
    | Open Access

    Small-molecule bioactivity descriptors are enriched representations of compounds that reach beyond chemical structure to capture known biological properties. Here the authors present a collection of deep neural networks able to infer bioactivity signatures for any compound of interest, even when little or no experimental information is available for it.

    • Martino Bertoni, Miquel Duran-Frigola & Patrick Aloy
  • Article
    | Open Access

    The transition from prebiotic to present-day chemistry took place over a very long period, but laboratory investigations of this process are mostly limited to a few days. Here, the authors develop a fully automated robotic prebiotic chemist designed for long-term chemical experiments exploring unconstrained multicomponent reactions; it runs autonomously and uses simple chemical inputs.

    • Silke Asche, Geoffrey J. T. Cooper & Leroy Cronin
  • Article
    | Open Access

    The IDG-DREAM Challenge carried out crowdsourced benchmarking of predictive algorithms for kinase inhibitor activities on unpublished data. This study provides a resource to compare emerging algorithms and prioritize new kinase activities to accelerate drug discovery and repurposing efforts.

    • Anna Cichońska, Balaguru Ravikumar & Tero Aittokallio
  • Article
    | Open Access

    Generating new sensible molecular structures is a key problem in computer-aided drug discovery. Here the authors propose a graph-based molecular generative model that outperforms previously proposed graph-based generative models of molecules and performs comparably to several SMILES-based models.

    • Omar Mahmood, Elman Mansimov & Kyunghyun Cho
  • Article
    | Open Access

    The search for life in the universe is difficult due to issues with defining signatures of living systems. Here, the authors present an approach based on the molecular assembly number and tandem mass spectrometry that allows identification of molecules produced by biological systems, and use it to identify biosignatures from a range of samples, including ones from outer space.

    • Stuart M. Marshall, Cole Mathis & Leroy Cronin
  • Article
    | Open Access

    In organic chemistry, synthetic routes for new molecules are often specified in terms of reacting molecules only. The current work reports an artificial intelligence model to predict the full sequence of experimental operations for an arbitrary chemical equation.

    • Alain C. Vaucher, Philippe Schwaller & Teodoro Laino
  • Article
    | Open Access

    Identifying optimal materials in multiobjective optimization problems is a challenge for new materials design approaches. Here the authors develop an active-learning algorithm that finds Pareto-optimal solutions and apply it successfully to in silico polymer design for a dispersant-based application.

    • Kevin Maik Jablonka, Giriprasad Melpatti Jothiappan & Brian Yoo
  • Article
    | Open Access

    Machine learning algorithms offer new possibilities for automating reaction procedures. The present paper investigates automated reaction prediction with the Molecular Transformer, the state-of-the-art model for reaction prediction, and proposes a new debiased dataset for a realistic assessment of the model’s performance.

    • Dávid Péter Kovács, William McCorkindale & Alpha A. Lee
  • Article
    | Open Access

    Large-scale sequencing efforts have uncovered a large number of secondary metabolic pathways, but the chemicals they synthesise remain unknown. Here the authors present PRISM 4, which predicts the chemical structures encoded by microbial genome sequences, including all classes of bacterial antibiotics in clinical use.

    • Michael A. Skinnider, Chad W. Johnston & Nathan A. Magarvey