Machine learning articles within Nature Communications

Featured

  • Article
    | Open Access

    Multimodal biological data is challenging to analyze. Here, the authors develop UnitedNet, an explainable deep neural network for analyzing single-cell multimodal biological data and estimating relationships between gene expression and other modalities with cell-type specificity.

    • Xin Tang, Jiawei Zhang & Jia Liu
  • Article
    | Open Access

    Sustained drug delivery is critical for patient adherence to chronic disease treatments. Here the authors apply machine learning to engineer multifunctional peptides with high melanin binding, high cell-penetration, and low cytotoxicity, enhancing the duration and efficacy of peptide-drug conjugates for sustained ocular delivery.

    • Henry T. Hsueh, Renee Ti Chou & Laura M. Ensign
  • Article
    | Open Access

    Alternative algorithms exploiting advantages of multidimensional mass spectrometry in untargeted metabolomics are needed. Here, the authors develop and demonstrate PeakDecoder for confident and accurate metabolite profiling in 116 microbial sample runs and using a library built from 64 standards.

    • Aivett Bilbao, Nathalie Munoz & Kristin E. Burnum-Johnson
  • Article
    | Open Access

    A challenge in diagnostics is integrating different data modalities to characterize physiological state. Here, the authors show, using the heart as a model system, that cross-modal autoencoders can integrate and translate modalities to improve diagnostics and identify associated genetic variants.

    • Adityanarayanan Radhakrishnan, Sam F. Friedman & Caroline Uhler
  • Article
    | Open Access

    Artificial Intelligence (AI) has the potential of assisting the study and diagnosis of veterinary cancers. Here, the authors build a cancer digital pathology atlas encompassing multiple animal species and demonstrate an AI approach for comparative pathology, which yields insights about immune response and morphological similarities.

    • Khalid AbdulJabbar, Simon P. Castillo & Yinyin Yuan
  • Article
    | Open Access

    In this work, the authors study protein families’ VAE latent manifolds and coevolutionary Hamiltonians. These Latent Generative Landscapes predict phylogenetic groupings, fitness and functional properties for several systems, with clear potential for protein engineering and design.

    • Cheyenne Ziegler, Jonathan Martin & Faruck Morcos
  • Article
    | Open Access

    Interpretation of rare genetic variants remains challenging. Here, the authors develop a supervised variant effect predictor for use in clinically actionable genes which incorporates evolutionary and structural relationships between residues and has balanced specificity and sensitivity.

    • Federica Luppino, Ivan A. Adzhubei & Agnes Toth-Petroczy
  • Article
    | Open Access

    Gene selection for spatial transcriptomics is currently not optimal. Here the authors report PERSIST, a flexible deep learning framework that uses existing scRNA-seq data to identify gene targets for spatial transcriptomics; they show that this captures more information with fewer genes.

    • Ian Covert, Rohan Gala & Su-In Lee
  • Article
    | Open Access

    State-of-the-art machine learning models in drug discovery fail to reliably predict the binding properties of poorly annotated proteins and small molecules. Here, the authors present AI-Bind, a machine learning pipeline to improve generalizability and interpretability of binding predictions.

    • Ayan Chatterjee, Robin Walters & Giulia Menichetti
  • Article
    | Open Access

    Experimental assays are used to determine if compounds cause a desired activity in cells. Here the authors demonstrate that computational methods can predict the bioactivity of compounds given their chemical structure, imaging and gene expression data from historic screening libraries.

    • Nikita Moshkov, Tim Becker & Juan C. Caicedo
  • Article
    | Open Access

    Accurately annotating cell types is a fundamental step in single-cell omics data analysis. Here, the authors develop a computational method called Cellcano based on a two-round supervised learning algorithm to identify cell types for scATAC-seq data and perform benchmarking to demonstrate its accuracy, robustness and computational efficiency.

    • Wenjing Ma, Jiaying Lu & Hao Wu
  • Article
    | Open Access

    The signal-to-noise ratio in bioimages is often low, which is problematic for segmentation. Here the authors report a deep learning method, deepflash2, to facilitate the segmentation of ambiguous bioimages through multi-expert annotations and integrated quality assurance.

    • Matthias Griebel, Dennis Segebarth & Christoph M. Flath
  • Article
    | Open Access

    There is a lack of standardisation in slide microscopy imaging data. Here the authors report Slim, an open-source, web-based slide microscopy viewer implementing the Digital Imaging and Communications in Medicine (DICOM) standard to achieve interoperability with a range of existing medical imaging systems.

    • Chris Gorman, Davide Punzo & Markus D. Herrmann
  • Article
    | Open Access

    Antimicrobial peptides emerge as compounds that can alleviate the global health hazard of antimicrobial resistance. Here, the authors propose HydrAMP, an extended conditional variational autoencoder. HydrAMP generated antimicrobial peptides with high activity against bacteria, including multidrug-resistant species.

    • Paulina Szymczak, Marcin Możejko & Ewa Szczurek
  • Article
    | Open Access

    There is interest in measuring the influence of spatial cellular organization on pathophysiology, which is being accomplished through spatial transcriptomics. Here the authors present UniCell Deconvolve, a pre-trained deep learning model that predicts cell identity and deconvolves cell type fractions using a 28 M cell database.

    • Daniel Charytonowicz, Rachel Brody & Robert Sebra
  • Article
    | Open Access

    Advances in spatial transcriptomics technologies have enabled the gene expression profiling of tissues while retaining spatial context. Here the authors present GraphST, a graph self-supervised contrastive learning method that learns informative and discriminative spot representations from spatial transcriptomics data.

    • Yahui Long, Kok Siong Ang & Jinmiao Chen
  • Article
    | Open Access

    Despite recent progress, machine learning methods remain inadequate in modeling the natural protein-protein interaction (PPI) hierarchy for PPI prediction. Here, the authors present a double-viewed hierarchical graph learning model, HIGH-PPI, to predict PPIs and extrapolate the molecular details involved.

    • Ziqi Gao, Chenran Jiang & Jia Li
  • Article
    | Open Access

    The increasing scale of single-cell RNA-seq studies presents new challenges for integrating datasets from different batches. Here, the authors develop scDML, a tool that simultaneously removes batch effects, improves clustering performance, recovers true cell types, and scales well to large datasets.

    • Xiaokang Yu, Xinyi Xu & Xiangjie Li
  • Article
    | Open Access

    Single-cell multi-omics and deep learning could lead to the inference of biological networks across specific cell types. Here, the authors develop DeepMAPS, a deep learning, graph-based approach for cell-type specific network inference from single-cell multi-omics data that is tested on healthy and tumour tissue datasets.

    • Anjun Ma, Xiaoying Wang & Qin Ma
  • Article
    | Open Access

    The current use of elastography ultrasound faces challenges, including vulnerability to subjective manipulation, echo signal attenuation, unknown risks of elastic pressure and high imaging hardware cost. Here, the authors present a virtual elastography method that equips low-end ultrasound devices with state-of-the-art elastography function.

    • Zhao Yao, Ting Luo & JianQiao Zhou
  • Article
    | Open Access

    Computational methods to study protein structural dynamics are a powerful tool in life sciences but are computationally expensive. Here, the authors show that machine learning can be used to efficiently generate protein conformational ensembles and test their method on intrinsically disordered peptides.

    • Giacomo Janson, Gilberto Valdes-Garcia & Michael Feig
  • Article
    | Open Access

    Application of CRISPR-Cas13d is limited by the inability to predict on- and off-targets. Here the authors perform CRISPR-Cas13d proliferation screens followed by modeling of Cas13d on- and off-targets; they design a deep learning model, DeepCas13, to predict the on-target activity of a gRNA.

    • Xiaolong Cheng, Zexu Li & Wei Li
  • Article
    | Open Access

    The PML-RARA gene fusion is the characteristic driver of Acute Promyelocytic Leukaemia (APL) and is known to bind to the genome. Here, the authors characterise the impact of PML-RARA on gene regulation in APL cell lines and patient samples using transcriptomics, epigenomics, and machine learning.

    • William Villiers, Audrey Kelly & Cameron S. Osborne
  • Article
    | Open Access

    Understanding the heterogeneity of growth, response to therapy and progression dynamics in metastatic colorectal cancer (mCRC) remains critical. Here, the authors analyse lesion-specific response heterogeneity in 4,308 mCRC patients and find that organ-level progression sequence is associated with long-term survival.

    • Jiawei Zhou, Amber Cipriani & Yanguang Cao
  • Article
    | Open Access

    Single-cell genomics has expanded to measure diverse molecular modalities within the same cell. Here the authors provide a computational framework called scTriangulate to integrate cluster annotations from diverse independent sources, algorithms, and modalities to define statistically stable populations.

    • Guangyuan Li, Baobao Song & Nathan Salomonis
  • Article
    | Open Access

    Different locations of adipose tissue may have different consequences for cardiometabolic risk. Here the authors report that deep learning enabled accurate prediction of specific adipose tissue volumes, and that after adjustment for BMI, visceral adiposity was associated with increased risk of cardiometabolic disease, while gluteofemoral adiposity was associated with reduced risk.

    • Saaket Agrawal, Marcus D. R. Klarqvist & Amit V. Khera
  • Article
    | Open Access

    Developing computational tools for interpretable cell type annotation in scRNA-seq data remains challenging. Here the authors propose a Transformer-based model for interpretable annotation transfer using biologically understandable entities, and demonstrate its performance on large or atlas datasets.

    • Jiawei Chen, Hao Xu & Jing-Dong J. Han
  • Article
    | Open Access

    Design of recombinases with new target sites is usually achieved through cycles of directed molecular evolution. Here the authors report Recombinase Generator, RecGen, an algorithm for generation of designer-recombinases; they perform experimental validation to show that this can predict recombinase sequences.

    • Lukas Theo Schmitt, Maciej Paszkowski-Rogacz & Frank Buchholz
  • Article
    | Open Access

    Synthetic biology often involves engineering microbial strains to express high-value proteins. Here the authors build deep learning predictors of protein expression from sequence that deliver accurate models with fewer data than previously assumed, helping to lower costs of model-driven strain design.

    • Evangelos-Marios Nikolados, Arin Wongprommoon & Diego A. Oyarzún
  • Article
    | Open Access

    Observation of the chemical and conformational dynamics of biomolecules by diffraction methods is impeded by several physical artifacts. The authors present an extensible framework for accurate correction of such data that can keep pace with rapid developments in diffraction methods.

    • Kevin M. Dalton, Jack B. Greisman & Doeke R. Hekstra
  • Article
    | Open Access

    Single-cell multimodal sequencing technologies are developed to simultaneously profile different modalities of data in the same cell. Here the authors develop a multimodal deep clustering method for the analysis of single-cell multi-omics data that supports clustering different types of multi-omics data and multi-batch data, as well as downstream differential expression analysis.

    • Xiang Lin, Tian Tian & Hakon Hakonarson
  • Article
    | Open Access

    Circulating cell-free DNA can be used to predict cancer, but it is more challenging to assess in early-stage cancer. Here, the authors created a diagnostic model using tumor fractions deciphered from circulating cfDNA methylation signatures, which exhibited an 86% sensitivity in detecting early-stage cancer.

    • Xiao Zhou, Zhen Cheng & Weibin Cheng
  • Article
    | Open Access

    Liquid biopsy offers great promise for noninvasive cancer diagnostics, but the lack of adequate target characterization and analysis hinders its wide application. Here, the authors design a transfer learning-based algorithm to transfer lesion labels from the primary cancer cell atlas to circulating tumor cells.

    • Xiaoxu Guo, Fanghe Lin & Jia Song
  • Article
    | Open Access

    Off-target binding hinders the development of therapeutic antibodies and reproducibility in basic research settings. Here the authors develop a method to quantify and reduce the polyreactivity of antibody fragments based on protein sequence alone.

    • Edward P. Harvey, Jung-Eun Shin & Andrew C. Kruse
  • Article
    | Open Access

    Nucleosome profiling from cell-free DNA (cfDNA) represents a potential approach for cancer detection and classification. Here, the authors develop Griffin, a computational framework for tumour subtype classification based on cfDNA nucleosome profiling that can work with ultra-low pass sequencing data.

    • Anna-Lisa Doebley, Minjeong Ko & Gavin Ha
  • Article
    | Open Access

    Methods for jointly analysing the different spatial data modalities in 3D are lacking. Here the authors report the computational framework STACI (Spatial Transcriptomic data using over-parameterized graph-based Autoencoders with Chromatin Imaging data) which they apply to an Alzheimer’s disease mouse model.

    • Xinyi Zhang, Xiao Wang & Caroline Uhler
  • Article
    | Open Access

    Predicting topological structures from Hi-C data provides insight into comprehending gene expression and regulation. Here, the authors present RefHiC, an attention-based deep learning framework that leverages a reference panel of Hi-C datasets to assist topological structure annotation from a given study sample.

    • Yanlin Zhang & Mathieu Blanchette