Machine learning articles within Nature Communications

Featured

  • Article
    | Open Access

    Nucleosome profiling from cell-free DNA (cfDNA) represents a potential approach for cancer detection and classification. Here, the authors develop Griffin, a computational framework for tumour subtype classification based on cfDNA nucleosome profiling that can work with ultra-low pass sequencing data.

    • Anna-Lisa Doebley, Minjeong Ko & Gavin Ha
  • Article
    | Open Access

    Methods for jointly analysing the different spatial data modalities in 3D are lacking. Here the authors report the computational framework STACI (Spatial Transcriptomic data using over-parameterized graph-based Autoencoders with Chromatin Imaging data) which they apply to an Alzheimer’s disease mouse model.

    • Xinyi Zhang, Xiao Wang & Caroline Uhler
  • Article
    | Open Access

    Predicting topological structures from Hi-C data provides insight into comprehending gene expression and regulation. Here, the authors present RefHiC, an attention-based deep learning framework that leverages a reference panel of Hi-C datasets to assist topological structure annotation from a given study sample.

    • Yanlin Zhang & Mathieu Blanchette
  • Article
    | Open Access

    Identifying the designers of engineered biological sequences would help promote biotechnological innovation while holding designers accountable. Here the authors present the winners of a 2020 data-science competition which improved on previous attempts to attribute plasmid sequences.

    • Oliver M. Crook, Kelsey Lane Warmbrod & William J. Bradshaw
  • Article
    | Open Access

    Sinonasal tumour diagnosis can be complicated by the heterogeneity of disease and classification systems. Here, the authors use machine learning to classify sinonasal undifferentiated carcinomas into 4 molecular classes with differences in differentiation state and clinical outcome.

    • Philipp Jurmeister, Stefanie Glöß & David Capper
  • Article
    | Open Access

    Studying the cell composition of acral melanoma at the single-cell level could provide some clues about its poor response to immunotherapy. Here, the authors analyse acral and cutaneous melanoma patient samples using single-cell RNA-sequencing, and reveal a severe immunosuppressive state in acral melanomas.

    • Chao Zhang, Hongru Shen & Jilong Yang
  • Article
    | Open Access

    Recovering dropout-affected gene expression values is a challenging problem in bioinformatics. Here, the authors propose a data-driven framework that first learns the underlying data distribution and then recovers the expression values by imposing self-consistency on the expression matrix.

    • Md Tauhidul Islam, Jen-Yeu Wang & Lei Xing
  • Article
    | Open Access

    Modeling the dynamics of large proteins reveals a fundamental scaling problem. Here, the authors tackle this challenge by decomposing a large system into smaller independent subsystems, simultaneously modeling each subsystem’s kinetics and ensuring their mutual independence.

    • Andreas Mardt, Tim Hempel & Frank Noé
  • Comment
    | Open Access

    Very few of the COVID-19 ML models were fit for deployment in real-world settings. In this Comment, Huang et al. discuss the main steps required to develop clinically useful models in the context of an emerging infectious disease.

    • Shih-Cheng Huang, Akshay S. Chaudhari & Matthew P. Lungren
  • Article
    | Open Access

    Current treatment guidelines for Type-2 diabetes endorse a massive number of potential anti-hyperglycemic treatment options in various permutations and combinations. Here, the authors present a causal deep learning approach for more personalized recommendations of treatment selection.

    • Chinmay Belthangady, Stefanos Giampanis & Beau Norgeot
  • Article
    | Open Access

    The 1+ million publicly available human -omics samples currently remain acutely underused. Here the authors present an approach combining natural language processing and machine learning to infer the source tissue of public genomics samples based on their plain text descriptions, making these samples easy to discover and reuse.

    • Nathaniel T. Hawkins, Marc Maldaver & Arjun Krishnan
  • Article
    | Open Access

    Previous efforts to study the circadian clock using scRNA-seq have relied on time course designs that treat cell collection time as a proxy for circadian time. Here, the authors introduce a statistical method to infer circadian timing directly from expression, enabling researchers to study circadian phase heterogeneity.

    • Benjamin J. Auerbach, Garret A. FitzGerald & Mingyao Li
  • Article
    | Open Access

    Safe clinical deployment of deep learning models for digital pathology requires reliable estimates of predictive uncertainty. Here the authors describe an algorithm for quantifying whole-slide image uncertainty, demonstrating their approach with models trained to distinguish lung cancer subtypes.

    • James M. Dolezal, Andrew Srisuwananukorn & Alexander T. Pearson
  • Article
    | Open Access

    Biomarkers of age and frailty may aid in understanding the aging process, predicting lifespan or health span and in assessing the effects of anti-aging interventions. Here, the authors show that combining physics-based models and deep learning may enhance understanding of aging from big biomedical data, observe effects of anti-aging interventions in laboratory animals, and discover signatures of longevity.

    • Konstantin Avchaciov, Marina P. Antoch & Peter O. Fedichev
  • Article
    | Open Access

    Current methods to reanalyze bulk RNA-seq at spatially resolved single-cell resolution have limitations. Here, the authors develop Bulk2Space, a spatial deconvolution algorithm using single-cell and spatial transcriptomics as references, providing new insights into spatial heterogeneity within bulk tissue.

    • Jie Liao, Jingyang Qian & Xiaohui Fan
  • Article
    | Open Access

    Cryogenic electron tomography suffers from anisotropic resolution due to the missing-wedge problem. Here, the authors present IsoNet, a neural network that learns the feature representation from similar structures in the tomogram and recovers the missing information for isotropic tomogram reconstruction.

    • Yun-Tao Liu, Heng Zhang & Z. Hong Zhou
  • Article
    | Open Access

    Protein language models taking multiple sequence alignments as inputs capture protein structure and mutational effects. Here, the authors show that these models also encode phylogenetic relationships, and can disentangle correlations due to structural constraints from those due to phylogeny.

    • Umberto Lupo, Damiano Sgarbossa & Anne-Florence Bitbol
  • Article
    | Open Access

    Bioactive peptides regulate many physiological functions but progress in discovering them has been slow. Here, the authors use a machine learning framework to predict mammalian peptide candidates from the global and local structure of large-scale tissue-specific mass spectrometry data.

    • Christian T. Madsen, Jan C. Refsgaard & Ulrik de Lichtenberg
  • Article
    | Open Access

    The analysis of protein NMR spectra is time-consuming and can occupy a human expert for weeks or months. The researchers in this work present a deep learning-based method that delivers signal positions, chemical shift assignments, and structures of proteins within hours after completion of the NMR measurements.

    • Piotr Klukowski, Roland Riek & Peter Güntert
  • Article
    | Open Access

    Artificial intelligence can support diagnostic workflows in oncology, but it is vulnerable to adversarial attacks. Here, the authors show that convolutional neural networks are highly susceptible to white- and black-box adversarial attacks in clinically relevant classification tasks.

    • Narmin Ghaffari Laleh, Daniel Truhn & Jakob Nikolas Kather
  • Article
    | Open Access

    The function of many microbial genes is still unknown. Here the authors repurpose natural language processing algorithms to explore “gene semantics” and infer function for thousands of genes, with defense and secretion systems found to have the most discovery potential.

    • Danielle Miller, Adi Stern & David Burstein
  • Article
    | Open Access

    Predicting treatment response in cancer remains a highly complex task. Here, the authors develop Precily, a deep neural network framework to predict treatment response in cancer by considering gene expression, pathway activity estimates and drug features, and test this method in multiple datasets and preclinical models.

    • Smriti Chawla, Anja Rockstroh & Debarka Sengupta
  • Article
    | Open Access

    Mutations in RAS oncogenes and related pathways are frequent in lung cancers. Here, the authors derive a RAS gene expression signature and a machine learning classifier to predict drug response and clinical outcomes in lung adenocarcinoma and other solid tumours, with improved performance over KRAS mutations alone.

    • Philip East, Gavin P. Kelly & Sophie de Carné Trécesson
  • Article
    | Open Access

    Single-cell gene expression data with positional information is critical to dissect mechanisms and architectures of multicellular organisms, but the potential is limited by the scalability of current data analysis strategies. Here the authors develop a highly scalable method, scGCO, to identify genes whose expression values form spatial patterns from spatial transcriptomics data.

    • Ke Zhang, Wanwan Feng & Peng Wang
  • Article
    | Open Access

    A major informatic challenge in single cell RNA-sequencing analysis is the precise annotation of datasets where cells exhibit complex multilayered identities or transitory states. Here the authors present devCellPy, a Python-based package that enables the automated prediction of cell types across complex cellular hierarchies, species, and experimental systems with high accuracy, particularly for developmental scRNA-seq datasets.

    • Francisco X. Galdos, Sidra Xu & Sean M. Wu
  • Article
    | Open Access

    Existing methods for generating sgRNA predictions do not account for the tracrRNA sequence. Here the authors report an on-target model, Rule Set 3, to generate optimal predictions for multiple tracrRNA variants, and validate this on a new dataset of sgRNAs showing improvement over prior prediction models.

    • Peter C. DeWeirdt, Abby V. McGee & John G. Doench
  • Article
    | Open Access

    Transcription rates are regulated by the interactions between RNA polymerase, sigma factor, and promoter DNA sequences in bacteria. Here the authors combine massively parallel experiments and machine learning to develop a predictive biophysical model of transcription, validated across 22132 bacterial promoters, and apply it to the design and debugging of genetic circuits.

    • Travis L. LaFleur, Ayaan Hossain & Howard M. Salis
  • Article
    | Open Access

    Design of de novo synthetic regulatory DNA is a promising avenue to control gene expression in biotechnology and medicine. Here the authors present ExpressionGAN, a generative adversarial network that uses genomic and transcriptomic data to generate regulatory sequences.

    • Jan Zrimec, Xiaozhi Fu & Aleksej Zelezniak
  • Article
    | Open Access

    Detection of mutational signatures in cell-free DNA (cfDNA) is challenging due to low sequence coverage and low mutant allele fractions. Here, the authors identify mutational signatures in plasma whole genome sequencing of cancer patients and use machine learning to distinguish these patients from healthy individuals.

    • Jonathan C. M. Wan, Dennis Stephens & Luis A. Diaz Jr.
  • Comment
    | Open Access

    A plethora of work has shown that AI systems can systematically and unfairly be biased against certain populations in multiple scenarios. The field of medical imaging, where AI systems are beginning to be increasingly adopted, is no exception. Here we discuss the meaning of fairness in this area and comment on the potential sources of biases, as well as the strategies available to mitigate them. Finally, we analyze the current state of the field, identifying strengths and highlighting areas of vacancy, challenges and opportunities that lie ahead.

    • María Agustina Ricci Lara, Rodrigo Echeveste & Enzo Ferrante
  • Article
    | Open Access

    Protein design aims to build novel proteins customized for specific purposes, thereby holding the potential to tackle many environmental and biomedical problems. Here the authors apply some of the latest advances in natural language processing, generative Transformers, to train ProtGPT2, a language model that explores unseen regions of the protein space while designing proteins with nature-like properties.

    • Noelia Ferruz, Steffen Schmidt & Birte Höcker
  • Article
    | Open Access

    Small molecule kinase inhibitors (SMKIs) are being approved at a fast pace under expedited programs for anticancer treatment. Here, the authors employ a machine-learning model to examine the relationships between kinase targets and adverse events in the trials of 16 FDA-approved SMKIs.

    • Xiajing Gong, Meng Hu & Liang Zhao
  • Article
    | Open Access

    Triage is essential for the early diagnosis and reporting of emergency patients in the emergency department. Here, the authors develop an anomaly detection algorithm with a deep generative model that reprioritizes radiology worklists and provides lesion attention maps for brain CT images with critical findings.

    • Seungjun Lee, Boryeong Jeong & Namkug Kim