Cheminformatics

  • Article
    | Open Access

    Quantum mechanical calculations of molecular ionized states are computationally expensive. This work reports a successful extension of a previous deep neural network approach towards transferable neural-network models for predicting multiple properties of open-shell anions and cations.

    • Roman Zubatyuk, Justin S. Smith & Olexandr Isayev
  • Article
    | Open Access

    Small-molecule bioactivity descriptors are enriched representations of compounds, reaching beyond chemical structures and capturing their known biological properties. Here the authors present a collection of deep neural networks able to infer bioactivity signatures for any compound of interest, even when little or no experimental information is available for it.

    • Martino Bertoni, Miquel Duran-Frigola & Patrick Aloy
  • Article
    | Open Access

    The transition from prebiotic chemistry to present-day chemistry took place over a very long period, but current laboratory investigations of this process are mostly limited to a couple of days. Here, the authors develop a fully automated robotic prebiotic chemist designed for long-term chemical experiments exploring unconstrained multicomponent reactions, which runs autonomously and uses simple chemical inputs.

    • Silke Asche, Geoffrey J. T. Cooper & Leroy Cronin
  • Article
    | Open Access

    The IDG-DREAM Challenge carried out crowdsourced benchmarking of predictive algorithms for kinase inhibitor activities on unpublished data. This study provides a resource to compare emerging algorithms and prioritize new kinase activities to accelerate drug discovery and repurposing efforts.

    • Anna Cichońska, Balaguru Ravikumar & Tero Aittokallio
  • Article
    | Open Access

    Generating new, sensible molecular structures is a key problem in computer-aided drug discovery. Here the authors propose a graph-based molecular generative model that outperforms previously proposed graph-based generative models of molecules and performs comparably to several SMILES-based models.

    • Omar Mahmood, Elman Mansimov & Kyunghyun Cho
  • Article
    | Open Access

    The search for life in the universe is difficult due to issues with defining signatures of living systems. Here, the authors present an approach based on the molecular assembly number and tandem mass spectrometry that allows identification of molecules produced by biological systems, and use it to identify biosignatures from a range of samples, including ones from outer space.

    • Stuart M. Marshall, Cole Mathis & Leroy Cronin
  • Article
    | Open Access

    In organic chemistry, synthetic routes for new molecules are often specified in terms of reacting molecules only. The current work reports an artificial intelligence model to predict the full sequence of experimental operations for an arbitrary chemical equation.

    • Alain C. Vaucher, Philippe Schwaller & Teodoro Laino
  • Article
    | Open Access

    Identifying optimal materials in multiobjective optimization problems represents a challenge for new materials design approaches. Here the authors develop an active-learning algorithm for finding Pareto-optimal solutions and successfully apply it to the in silico design of polymers for a dispersant-based application.

    • Kevin Maik Jablonka, Giriprasad Melpatti Jothiappan & Brian Yoo
  • Article
    | Open Access

    Machine learning algorithms offer new possibilities for automating reaction procedures. The present paper investigates automated reaction prediction with Molecular Transformer, the state-of-the-art model for reaction prediction, proposing a new debiased dataset for a realistic assessment of the model’s performance.

    • Dávid Péter Kovács, William McCorkindale & Alpha A. Lee
  • Article
    | Open Access

    Large-scale sequencing efforts have uncovered a large number of secondary metabolic pathways, but the chemicals they synthesise remain unknown. Here the authors present PRISM 4, which predicts the chemical structures encoded by microbial genome sequences, including all classes of bacterial antibiotics in clinical use.

    • Michael A. Skinnider, Chad W. Johnston & Nathan A. Magarvey
  • Article
    | Open Access

    Accurate prediction of solubility represents a challenge for traditional computational approaches due to the complex nature of the phenomena involved. Here the authors report a successful approach to solubility prediction in organic solvents and water using a combination of machine learning and computational chemistry.

    • Samuel Boobier, David R. J. Hose & Bao N. Nguyen
  • Article
    | Open Access

    Developing algorithms to predict reactants and reagents given a target molecule is key to accelerating retrosynthesis approaches. Here the authors demonstrate that applying augmentation techniques to the SMILES representation of target data significantly improves the quality of the reaction predictions.

    • Igor V. Tetko, Pavel Karpov & Guillaume Godin
  • Article
    | Open Access

    Organic reactions can readily be learned by deep learning models; however, stereochemistry is still a challenge. Here, the authors fine-tune a general model using a small dataset, then predict and experimentally validate regio- and stereoselectivity for various carbohydrate transformations.

    • Giorgio Pesciullesi, Philippe Schwaller & Jean-Louis Reymond
  • Article
    | Open Access

    Extracting experimental operations for chemical synthesis from procedures reported in prose is a tedious task. Here the authors develop a deep-learning model based on the transformer architecture to translate experimental procedures from the field of organic chemistry into synthesis actions.

    • Alain C. Vaucher, Federico Zipoli & Teodoro Laino
  • Article
    | Open Access

    The choice of molecular representation can severely impact the performance of machine-learning methods. Here the authors demonstrate a persistent-homology-based molecular representation through an active-learning approach for predicting CO2/N2 interaction energies at the density functional theory (DFT) level.

    • Jacob Townsend, Cassie Putman Micucci & Konstantinos D. Vogiatzis
  • Article
    | Open Access

    Bond dissociation enthalpies are key quantities in determining chemical reactivity, but computing them with quantum mechanical methods is highly demanding. Here the authors develop a machine learning approach to calculate accurate dissociation enthalpies for organic molecules with sub-second computational cost.

    • Peter C. St. John, Yanfei Guan & Robert S. Paton
  • Article
    | Open Access

    Identifying kinases responsible for specific phosphorylation events remains challenging. Here, the authors leverage kinase inhibitor profiles for the identification of kinase-substrate site pairs in cell extracts, developing a method that can identify the enzymes responsible for unassigned phosphorylation events.

    • Nikolaus A. Watson, Tyrell N. Cartwright & Jonathan M. G. Higgins
  • Article
    | Open Access

    Using machine learning to identify small molecules by predicting their retention times has been challenging so far. Here the authors combine a large database of liquid chromatography retention times with a deep learning approach to enable accurate metabolite identification.

    • Xavier Domingo-Almenara, Carlos Guijas & Gary Siuzdak
  • Article
    | Open Access

    Derivatization of natural products is a powerful approach to generate new molecules for biological screenings. Here, the authors employ C-H oxidation and ring expansion methods for the preparation of a library of medium-sized ring skeleta, which occupy a unique chemical space based on chemoinformatic analysis.

    • Changgui Zhao, Zhengqing Ye & Weiping Tang
  • Article
    | Open Access

    Mapping atoms across chemical reactions represents a challenging computational task. Here the authors show that combining graph theory and combinatorics with expert chemical knowledge makes it possible to map very complex organic reactions.

    • Wojciech Jaworski, Sara Szymkuć & Bartosz A. Grzybowski
  • Article
    | Open Access

    Synthetic chemists develop a "chemical intuition" over years of experience in the lab. Here the authors combine machine learning of (partially) failed experiments with robotic synthesis to capture this intuition used in searching for the optimal synthesis conditions of metal-organic frameworks.

    • Seyed Mohamad Moosavi, Arunraj Chidambaram & Berend Smit
  • Article
    | Open Access

    The incomplete nature and undefined structure of existing catalysis research data have prevented comprehensive knowledge extraction. Here, the authors report a novel meta-analysis method that identifies correlations between a catalyst’s physico-chemical properties and its performance in a particular reaction.

    • Roman Schmack, Alexandra Friedrich & Ralph Kraehnert
  • Article
    | Open Access

    Parasitic nematodes causing onchocerciasis and lymphatic filariasis rely on a bacterial endosymbiont, Wolbachia, which is a validated therapeutic target. Here, Clare et al. perform a high-throughput screen of 1.3 million compounds and identify 5 chemotypes with faster kill rates than existing anti-Wolbachia drugs.

    • Rachel H. Clare, Catherine Bardelle & Stephen A. Ward
  • Article
    | Open Access

    The fast and accurate determination of molecular properties is particularly crucial in drug discovery. Here, the authors employ supervised machine learning to treat differential mobility spectrometry – mass spectrometry data for ten classes of drug candidates and predict several condensed-phase properties.

    • Stephen W. C. Walker, Ahdia Anwar & W. Scott Hopkins
  • Article
    | Open Access

    It is now possible to predict what a chemical smells like based on its chemical structure; however, to date this has only been done for a small number of odor descriptors. Here, using natural-language semantic representations, the authors demonstrate prediction of a much wider range of descriptors.

    • E. Darío Gutiérrez, Amit Dhurandhar & Guillermo A. Cecchi
  • Article
    | Open Access

    Sequence-defined macromolecules have a defined chain length and topology and can be used in applications such as antibiotics and data storage. Here the authors develop two algorithms to encode text fragments and QR codes as a collection of oligomers and to reconstruct the original data.

    • Steven Martens, Annelies Landuyt & Filip Du Prez
  • Article
    | Open Access

    Distributing a reaction workload across laboratories can solve chemical problems more efficiently, but it is challenging to develop viable hardware and software. Here, the authors present an internet-connected network of cheap robots that can perform chemical reactions and share outcomes in real time, demonstrating a digitized approach to chemical collaboration.

    • Dario Caramelli, Daniel Salley & Leroy Cronin
  • Article
    | Open Access

    The success of a fluorescent dye as a molecular probe to monitor the intracellular activity of biomolecules depends on its physicochemical characteristics. Here, the authors use a predictive model to identify key features that allow them to design cell-permeable, background-free fluorescent probes.

    • Samira Husen Alamudi, Rudrakanta Satapathy & Young-Tae Chang