Machine learning

  • Article
    | Open Access

    Understanding the heterogeneity of growth, response to therapy and progression dynamics in metastatic colorectal cancer (mCRC) remains critical. Here, the authors analyse lesion-specific response heterogeneity in 4,308 mCRC patients and find that organ-level progression sequence is associated with long-term survival.

    • Jiawei Zhou, Amber Cipriani & Yanguang Cao
  • Article
    | Open Access

    Single-cell genomics has expanded to measure diverse molecular modalities within the same cell. Here the authors provide a computational framework called scTriangulate to integrate cluster annotations from diverse independent sources, algorithms, and modalities to define statistically stable populations.

    • Guangyuan Li, Baobao Song & Nathan Salomonis
  • Article
    | Open Access

    Different locations of adipose tissue may have different consequences for cardiometabolic risk. Here the authors report that deep learning enabled accurate prediction of specific adipose tissue volumes, and that after adjustment for BMI, visceral adiposity was associated with increased risk of cardiometabolic disease, while gluteofemoral adiposity was associated with reduced risk.

    • Saaket Agrawal, Marcus D. R. Klarqvist & Amit V. Khera
  • Article
    | Open Access

    Developing computational tools for interpretable cell type annotation in scRNA-seq data remains challenging. Here the authors propose a Transformer-based model for interpretable annotation transfer using biologically understandable entities, and demonstrate its performance on large or atlas datasets.

    • Jiawei Chen, Hao Xu & Jing-Dong J. Han
  • Article
    | Open Access

    Design of recombinases with new target sites is usually achieved through cycles of directed molecular evolution. Here the authors report Recombinase Generator, RecGen, an algorithm for generation of designer-recombinases; they perform experimental validation to show that this can predict recombinase sequences.

    • Lukas Theo Schmitt, Maciej Paszkowski-Rogacz & Frank Buchholz
  • Article
    | Open Access

    Synthetic biology often involves engineering microbial strains to express high-value proteins. Here the authors build deep learning predictors of protein expression from sequence that deliver accurate models with fewer data than previously assumed, helping to lower costs of model-driven strain design.

    • Evangelos-Marios Nikolados, Arin Wongprommoon & Diego A. Oyarzún
  • Article
    | Open Access

    Observation of the chemical and conformational dynamics of biomolecules by diffraction methods is impeded by several physical artifacts. The authors present an extensible framework for accurate correction of such data that can keep pace with rapid developments in diffraction methods.

    • Kevin M. Dalton, Jack B. Greisman & Doeke R. Hekstra
  • Article
    | Open Access

    Single-cell multimodal sequencing technologies are developed to simultaneously profile different modalities of data in the same cell. Here the authors develop a multimodal deep clustering method for the analysis of single-cell multi-omics data that supports clustering different types of multi-omics data and multi-batch data, as well as downstream differential expression analysis.

    • Xiang Lin, Tian Tian & Hakon Hakonarson
  • Article
    | Open Access

    Circulating cell-free DNA can be used to predict cancer, but it is more challenging to assess in early-stage cancer. Here, the authors created a diagnostic model using tumor fractions deciphered from circulating cfDNA methylation signatures, which exhibited an 86% sensitivity in detecting early-stage cancer.

    • Xiao Zhou, Zhen Cheng & Weibin Cheng
  • Article
    | Open Access

    Liquid biopsy offers great promise for noninvasive cancer diagnostics, while the lack of adequate target characterization and analysis hinders its wide application. Here, the authors design a transfer learning-based algorithm to transfer lesion labels from the primary cancer cell atlas to circulating tumor cells.

    • Xiaoxu Guo, Fanghe Lin & Jia Song
  • Article
    | Open Access

    Off-target binding hinders the development of therapeutic antibodies and reproducibility in basic research settings. Here the authors develop a method to quantify and reduce the polyreactivity of antibody fragments based on protein sequence alone.

    • Edward P. Harvey, Jung-Eun Shin & Andrew C. Kruse
  • Article
    | Open Access

    Nucleosome profiling from cell-free DNA (cfDNA) represents a potential approach for cancer detection and classification. Here, the authors develop Griffin, a computational framework for tumour subtype classification based on cfDNA nucleosome profiling that can work with ultra-low pass sequencing data.

    • Anna-Lisa Doebley, Minjeong Ko & Gavin Ha
  • Article
    | Open Access

    Methods for jointly analysing the different spatial data modalities in 3D are lacking. Here the authors report the computational framework STACI (Spatial Transcriptomic data using over-parameterized graph-based Autoencoders with Chromatin Imaging data) which they apply to an Alzheimer’s disease mouse model.

    • Xinyi Zhang, Xiao Wang & Caroline Uhler
  • Article
    | Open Access

    Predicting topological structures from Hi-C data provides insight into comprehending gene expression and regulation. Here, the authors present RefHiC, an attention-based deep learning framework that leverages a reference panel of Hi-C datasets to assist topological structure annotation from a given study sample.

    • Yanlin Zhang & Mathieu Blanchette
  • Article
    | Open Access

    Identifying the designers of engineered biological sequences would help promote biotechnological innovation while holding designers accountable. Here the authors present the winners of a 2020 data-science competition which improved on previous attempts to attribute plasmid sequences.

    • Oliver M. Crook, Kelsey Lane Warmbrod & William J. Bradshaw
  • Article
    | Open Access

    Sinonasal tumour diagnosis can be complicated by the heterogeneity of disease and classification systems. Here, the authors use machine learning to classify sinonasal undifferentiated carcinomas into 4 molecular classes with differences in differentiation state and clinical outcome.

    • Philipp Jurmeister, Stefanie Glöß & David Capper
  • Article
    | Open Access

    Studying the cell composition of acral melanoma at the single-cell level could provide some clues about its poor response to immunotherapy. Here, the authors analyse acral and cutaneous melanoma patient samples using single-cell RNA-sequencing, and reveal a severe immunosuppressive state in acral melanomas.

    • Chao Zhang, Hongru Shen & Jilong Yang
  • Article
    | Open Access

    Recovering dropout-affected gene expression values is a challenging problem in bioinformatics. Here, the authors propose a data-driven framework that first learns the underlying data distribution and then recovers the expression values by imposing self-consistency on the expression matrix.

    • Md Tauhidul Islam, Jen-Yeu Wang & Lei Xing
  • Article
    | Open Access

    Modeling the dynamics of large proteins reveals a fundamental scaling problem. Here, the authors tackle this challenge by decomposing a large system into smaller independent subsystems, simultaneously modeling each subsystem’s kinetics and ensuring their mutual independence.

    • Andreas Mardt, Tim Hempel & Frank Noé
  • Article
    | Open Access

    Current treatment guidelines for Type-2 diabetes endorse a massive number of potential anti-hyperglycemic treatment options in various permutations and combinations. Here, the authors present a causal deep learning approach for more personalized recommendations of treatment selection.

    • Chinmay Belthangady, Stefanos Giampanis & Beau Norgeot
  • Article
    | Open Access

    The 1+ million publicly available human -omics samples currently remain acutely underused. Here the authors present an approach combining natural language processing and machine learning to infer the source tissue of public genomics samples based on their plain text descriptions, making these samples easy to discover and reuse.

    • Nathaniel T. Hawkins, Marc Maldaver & Arjun Krishnan
  • Article
    | Open Access

    Previous efforts to study the circadian clock using scRNA-seq have relied on time course designs that treat cell collection time as a proxy for circadian time. Here, the authors introduce a statistical method to infer circadian timing directly from expression, enabling researchers to study circadian phase heterogeneity.

    • Benjamin J. Auerbach, Garret A. FitzGerald & Mingyao Li
  • Article
    | Open Access

    Safe clinical deployment of deep learning models for digital pathology requires reliable estimates of predictive uncertainty. Here the authors describe an algorithm for quantifying whole-slide image uncertainty, demonstrating their approach with models trained to distinguish lung cancer subtypes.

    • James M. Dolezal, Andrew Srisuwananukorn & Alexander T. Pearson
  • Article
    | Open Access

    Biomarkers of age and frailty may aid in understanding the aging process, predicting lifespan or health span and in assessing the effects of anti-aging interventions. Here, the authors show that combining physics-based models and deep learning may enhance understanding of aging from big biomedical data, observe effects of anti-aging interventions in laboratory animals, and discover signatures of longevity.

    • Konstantin Avchaciov, Marina P. Antoch & Peter O. Fedichev
  • Article
    | Open Access

    Current methods to reanalyze bulk RNA-seq at spatially resolved single-cell resolution have limitations. Here, the authors develop Bulk2Space, a spatial deconvolution algorithm using single-cell and spatial transcriptomics as references, providing new insights into spatial heterogeneity within bulk tissue.

    • Jie Liao, Jingyang Qian & Xiaohui Fan
  • Article
    | Open Access

    Cryogenic electron tomography suffers from anisotropic resolution due to the missing-wedge problem. Here, the authors present IsoNet, a neural network that learns the feature representation from similar structures in the tomogram and recovers the missing information for isotropic tomogram reconstruction.

    • Yun-Tao Liu, Heng Zhang & Z. Hong Zhou
  • Article
    | Open Access

    Protein language models taking multiple sequence alignments as inputs capture protein structure and mutational effects. Here, the authors show that these models also encode phylogenetic relationships, and can disentangle correlations due to structural constraints from those due to phylogeny.

    • Umberto Lupo, Damiano Sgarbossa & Anne-Florence Bitbol
  • Article
    | Open Access

    Bioactive peptides regulate many physiological functions but progress in discovering them has been slow. Here, the authors use a machine learning framework to predict mammalian peptide candidates from the global and local structure of large-scale tissue-specific mass spectrometry data.

    • Christian T. Madsen, Jan C. Refsgaard & Ulrik de Lichtenberg
  • Article
    | Open Access

    The analysis of protein NMR spectra is time-consuming and can occupy a human expert for weeks or months. The researchers in this work present a deep learning-based method that delivers signal positions, chemical shift assignments, and structures of proteins within hours after completion of the NMR measurements.

    • Piotr Klukowski, Roland Riek & Peter Güntert
  • Article
    | Open Access

    Artificial intelligence can support diagnostic workflows in oncology, but such models are vulnerable to adversarial attacks. Here, the authors show that convolutional neural networks are highly susceptible to white- and black-box adversarial attacks in clinically relevant classification tasks.

    • Narmin Ghaffari Laleh, Daniel Truhn & Jakob Nikolas Kather
  • Article
    | Open Access

    The function of many microbial genes is still unknown. Here the authors repurposed natural language processing algorithms to explore “gene semantics” and infer function for thousands of genes, with defense and secretion systems found to have the most discovery potential.

    • Danielle Miller, Adi Stern & David Burstein
  • Article
    | Open Access

    Predicting treatment response in cancer remains a highly complex task. Here, the authors develop Precily, a deep neural network framework to predict treatment response in cancer by considering gene expression, pathway activity estimates and drug features, and test this method in multiple datasets and preclinical models.

    • Smriti Chawla, Anja Rockstroh & Debarka Sengupta
  • Article
    | Open Access

    Mutations in RAS oncogenes and related pathways are frequent in lung cancers. Here, the authors derive a RAS gene expression signature and a machine learning classifier to predict drug response and clinical outcomes in lung adenocarcinoma and other solid tumours, with improved performance over KRAS mutations alone.

    • Philip East, Gavin P. Kelly & Sophie de Carné Trécesson
  • Article
    | Open Access

    Single-cell gene expression data with positional information is critical to dissect mechanisms and architectures of multicellular organisms, but the potential is limited by the scalability of current data analysis strategies. Here the authors develop a highly scalable method, scGCO, to identify genes whose expression values form spatial patterns from spatial transcriptomics data.

    • Ke Zhang, Wanwan Feng & Peng Wang