Machine learning | Nature Communications

Article
27 December 2022 | Open Access

Prediction of designer-recombinases for DNA editing with generative deep learning

Design of recombinases with new target sites is usually achieved through cycles of directed molecular evolution. Here the authors report Recombinase Generator, RecGen, an algorithm for generation of designer-recombinases; they perform experimental validation to show that this can predict recombinase sequences.

Lukas Theo Schmitt, Maciej Paszkowski-Rogacz & Frank Buchholz

Article
15 December 2022 | Open Access

Accuracy and data efficiency in deep learning models of protein expression

Synthetic biology often involves engineering microbial strains to express high-value proteins. Here the authors build deep learning predictors of protein expression from sequence that deliver accurate models with fewer data than previously assumed, helping to lower costs of model-driven strain design.

Evangelos-Marios Nikolados, Arin Wongprommoon & Diego A. Oyarzún

Article
15 December 2022 | Open Access

Estimating diagnostic uncertainty in artificial intelligence assisted pathology using conformal prediction

Artificial intelligence prediction accuracy can be reduced with new data. Here, the authors utilise conformal prediction to reduce incorrect predictions in histopathological analysis of prostate cancer biopsies.

Henrik Olsson, Kimmo Kartasalo & Martin Eklund

Article
15 December 2022 | Open Access

A unifying Bayesian framework for merging X-ray diffraction data

Observation of the chemical and conformational dynamics of biomolecules by diffraction methods is impeded by several physical artifacts. The authors present an extensible framework for accurate correction of such data that can keep pace with rapid developments in diffraction methods.

Kevin M. Dalton, Jack B. Greisman & Doeke R. Hekstra

Article
13 December 2022 | Open Access

Clustering of single-cell multi-omics data with a multimodal deep learning method

Single-cell multimodal sequencing technologies are developed to simultaneously profile different modalities of data in the same cell. Here the authors develop a multimodal deep clustering method for the analysis of single-cell multi-omics data that supports clustering different types of multi-omics data and multi-batch data, as well as downstream differential expression analysis.

Xiang Lin, Tian Tian & Hakon Hakonarson

Article
13 December 2022 | Open Access

Tumor fractions deciphered from circulating cell-free DNA methylation for cancer early diagnosis

Circulating cell-free DNA can be used to predict cancer, but it is more challenging to assess in early stage cancer. Here, the authors created a diagnostic model using tumor fractions deciphered from circulating cfDNA methylation signatures, which exhibited an 86% sensitivity in detecting early-stage cancer.

Xiao Zhou, Zhen Cheng & Weibin Cheng

Article
12 December 2022 | Open Access

Deep transfer learning enables lesion tracing of circulating tumor cells

Liquid biopsy offers great promise for noninvasive cancer diagnostics, while the lack of adequate target characterization and analysis hinders its wide application. Here, the authors design a transfer learning-based algorithm to transfer lesion labels from the primary cancer cell atlas to circulating tumor cells.

Xiaoxu Guo, Fanghe Lin & Jia Song

Article
07 December 2022 | Open Access

An in silico method to assess antibody fragment polyreactivity

Off-target binding hinders the development of therapeutic antibodies and reproducibility in basic research settings. Here the authors develop a method to quantify and reduce the polyreactivity of antibody fragments based on protein sequence alone.

Edward P. Harvey, Jung-Eun Shin & Andrew C. Kruse

Article
03 December 2022 | Open Access

A framework for clinical cancer subtyping from nucleosome profiling of cell-free DNA

Nucleosome profiling from cell-free DNA (cfDNA) represents a potential approach for cancer detection and classification. Here, the authors develop Griffin, a computational framework for tumour subtype classification based on cfDNA nucleosome profiling that can work with ultra-low pass sequencing data.

Anna-Lisa Doebley, Minjeong Ko & Gavin Ha

Article
03 December 2022 | Open Access

Graph-based autoencoder integrates spatial transcriptomics with chromatin images and identifies joint biomarkers for Alzheimer’s disease

Methods for jointly analysing the different spatial data modalities in 3D are lacking. Here the authors report the computational framework STACI (Spatial Transcriptomic data using over-parameterized graph-based Autoencoders with Chromatin Imaging data) which they apply to an Alzheimer’s disease mouse model.

Xinyi Zhang, Xiao Wang & Caroline Uhler

Article
02 December 2022 | Open Access

Reference panel guided topological structure annotation of Hi-C data

Predicting topological structures from Hi-C data provides insight into comprehending gene expression and regulation. Here, the authors present RefHiC, an attention-based deep learning framework that leverages a reference panel of Hi-C datasets to assist topological structure annotation from a given study sample.

Yanlin Zhang & Mathieu Blanchette

Article
01 December 2022 | Open Access

A unified computational framework for single-cell data integration with optimal transport

Integrating heterogeneous single-cell multi-omics as well as spatially resolved transcriptomic data remains a major challenge. Here the authors report a unified single-cell data integration framework using an unbalanced optimal transport-based deep network.

Kai Cao, Qiyu Gong & Lin Wan

Article
30 November 2022 | Open Access

Analysis of the first genetic engineering attribution challenge

Identifying the designers of engineered biological sequences would help promote biotechnological innovation while holding designers accountable. Here the authors present the winners of a 2020 data-science competition which improved on previous attempts to attribute plasmid sequences.

Oliver M. Crook, Kelsey Lane Warmbrod & William J. Bradshaw

Article
28 November 2022 | Open Access

DNA methylation-based classification of sinonasal tumors

Sinonasal tumour diagnosis can be complicated by the heterogeneity of disease and classification systems. Here, the authors use machine learning to classify sinonasal undifferentiated carcinomas into 4 molecular classes with differences in differentiation state and clinical outcome.

Philipp Jurmeister, Stefanie Glöß & David Capper

Article
25 November 2022 | Open Access

A single-cell analysis reveals tumor heterogeneity and immune environment of acral melanoma

Studying the cell composition of acral melanoma at the single-cell level could provide some clues about its poor response to immunotherapy. Here, the authors analyse acral and cutaneous melanoma patient samples using single-cell RNA-sequencing, and reveal a severe immunosuppressive state in acral melanomas.

Chao Zhang, Hongru Shen & Jilong Yang

Article
21 November 2022 | Open Access

DeepPROTACs is a deep learning-based targeted degradation predictor for PROTACs

The rational design of PROTACs is difficult due to their obscure structure-activity relationship. Here the authors present a deep neural network model, DeepPROTACs, for predicting the degradation capacity of a proposed PROTAC molecule.

Fenglei Li, Qiaoyu Hu & Fang Bai

Article
21 November 2022 | Open Access

Leveraging data-driven self-consistency for high-fidelity gene expression recovery

Recovering dropout-affected gene expression values is a challenging problem in bioinformatics. Here, the authors propose a data-driven framework that first learns the underlying data distribution and then recovers the expression values by imposing self-consistency on the expression matrix.

Md Tauhidul Islam, Jen-Yeu Wang & Lei Xing

Article
19 November 2022 | Open Access

Deep learning to decompose macromolecules into independent Markovian domains

Modeling the dynamics of large proteins reveals a fundamental scaling problem. Here, the authors tackle this challenge by decomposing a large system into smaller independent subsystems, simultaneously modeling each subsystem’s kinetics and ensuring their mutual independence.

Andreas Mardt, Tim Hempel & Frank Noé

Comment
18 November 2022 | Open Access

Developing medical imaging AI for emerging infectious diseases

Very few of the COVID-19 ML models were fit for deployment in real-world settings. In this Comment, Huang et al. discuss the main steps required to develop clinically useful models in the context of an emerging infectious disease.

Shih-Cheng Huang, Akshay S. Chaudhari & Matthew P. Lungren

Article
15 November 2022 | Open Access

Prediction of inter-chain distance maps of protein complexes with 2D attention-based deep neural networks

Predicting inter-chain residue-residue distances of protein complexes is useful for constructing and evaluating quaternary structures of the protein complexes. Here, the authors develop a deep attention-based residual network method (CDPred) to predict inter-chain residue-residue distances of protein dimers.

Zhiye Guo, Jian Liu & Jianlin Cheng

Article
14 November 2022 | Open Access

Causal deep learning reveals the comparative effectiveness of antihyperglycemic treatments in poorly controlled diabetes

Current treatment guidelines for Type-2 diabetes endorse a massive number of potential antihyperglycemic treatment options in various permutations and combinations. Here, the authors present a causal deep learning approach for more personalized recommendations of treatment selection.

Chinmay Belthangady, Stefanos Giampanis & Beau Norgeot

Article
09 November 2022 | Open Access

A formal validation of a deep learning-based automated workflow for the interpretation of the echocardiogram

Deep learning can automate the interpretation of medical imaging tests. Here, the authors formally assess the interchangeability of deep learning algorithms with expert human measurements for interpreting echocardiographic studies.

Jasper Tromp, David Bauer & Scott D. Solomon

Article
08 November 2022 | Open Access

Systematic tissue annotations of genomics samples by modeling unstructured metadata

The more than 1 million publicly available human -omics samples currently remain acutely underused. Here the authors present an approach combining natural language processing and machine learning to infer the source tissue of public genomics samples based on their plain text descriptions, making these samples easy to discover and reuse.

Nathaniel T. Hawkins, Marc Maldaver & Arjun Krishnan

Article
08 November 2022 | Open Access

Deep learning-based image analysis predicts PD-L1 status from H&E-stained histopathology images in breast cancer

Programmed death ligand-1 (PD-L1) has been recently adopted for breast cancer as a predictive biomarker for immunotherapies. Here, the authors show that PD-L1 expression can be predicted from H&E-stained images using deep learning.

Gil Shamai, Amir Livne & Ron Kimmel

Article
05 November 2022 | Open Access

Learning the histone codes with large genomic windows and three-dimensional chromatin interactions using transformer

Existing deep learning-based approaches for the prediction of gene expression by histone modifications (HMs) can only focus on narrow and linear genomic regions around promoters. Here, the authors address these problems by developing a transformer-based deep learning architecture named Chromoformer.

Dohoon Lee, Jeewon Yang & Sun Kim

Article
02 November 2022 | Open Access

Tempo: an unsupervised Bayesian algorithm for circadian phase inference in single-cell transcriptomics

Previous efforts to study the circadian clock using scRNA-seq have relied on time course designs that treat cell collection time as a proxy for circadian time. Here, the authors introduce a statistical method to infer circadian timing directly from expression, enabling researchers to study circadian phase heterogeneity.

Benjamin J. Auerbach, Garret A. FitzGerald & Mingyao Li

Article
02 November 2022 | Open Access

Uncertainty-informed deep learning models enable high-confidence predictions for digital histopathology

Safe clinical deployment of deep learning models for digital pathology requires reliable estimates of predictive uncertainty. Here the authors describe an algorithm for quantifying whole-slide image uncertainty, demonstrating their approach with models trained to distinguish lung cancer subtypes.

James M. Dolezal, Andrew Srisuwananukorn & Alexander T. Pearson

Article
02 November 2022 | Open Access

Deep learning empowered volume delineation of whole-body organs-at-risk for accelerated radiotherapy

Volume delineation of organs-at-risk (OARs) and target tumors is an indispensable step in radiotherapy treatment planning. Herein, the authors propose a lightweight deep learning framework to empower the rapid and precise volume delineation of whole-body OARs and target tumors.

Feng Shi, Weigang Hu & Dinggang Shen

Article
01 November 2022 | Open Access

Unsupervised learning of aging principles from longitudinal data

Biomarkers of age and frailty may aid in understanding the aging process, predicting lifespan or health span and in assessing the effects of anti-aging interventions. Here, the authors show that combining physics-based models and deep learning may enhance understanding of aging from big biomedical data, observe effects of anti-aging interventions in laboratory animals, and discover signatures of longevity.

Konstantin Avchaciov, Marina P. Antoch & Peter O. Fedichev

Article
30 October 2022 | Open Access

De novo analysis of bulk RNA-seq data at spatially resolved single-cell resolution

Current methods to reanalyze bulk RNA-seq at spatially resolved single-cell resolution have limitations. Here, the authors develop Bulk2Space, a spatial deconvolution algorithm using single-cell and spatial transcriptomics as references, providing new insights into spatial heterogeneity within bulk tissue.

Jie Liao, Jingyang Qian & Xiaohui Fan

Article
29 October 2022 | Open Access

Isotropic reconstruction for electron tomography with deep learning

Cryogenic electron tomography suffers from anisotropic resolution due to the missing-wedge problem. Here, the authors present IsoNet, a neural network that learns the feature representation from similar structures in the tomogram and recovers the missing information for isotropic tomogram reconstruction.

Yun-Tao Liu, Heng Zhang & Z. Hong Zhou

Article
22 October 2022 | Open Access

Protein language models trained on multiple sequence alignments learn phylogenetic relationships

Protein language models taking multiple sequence alignments as inputs capture protein structure and mutational effects. Here, the authors show that these models also encode phylogenetic relationships, and can disentangle correlations due to structural constraints from those due to phylogeny.

Umberto Lupo, Damiano Sgarbossa & Anne-Florence Bitbol

Article
20 October 2022 | Open Access

Combining mass spectrometry and machine learning to discover bioactive peptides

Bioactive peptides regulate many physiological functions but progress in discovering them has been slow. Here, the authors use a machine learning framework to predict mammalian peptide candidates from the global and local structure of large-scale tissue-specific mass spectrometry data.

Christian T. Madsen, Jan C. Refsgaard & Ulrik de Lichtenberg

Article
18 October 2022 | Open Access

Rapid protein assignments and structures from raw NMR spectra with the deep learning technique ARTINA

The analysis of protein NMR spectra is time-consuming and can occupy a human expert for weeks or months. The researchers in this work present a deep learning-based method that delivers signal positions, chemical shift assignments, and structures of proteins within hours after completion of the NMR measurements.

Piotr Klukowski, Roland Riek & Peter Güntert

Article
17 October 2022 | Open Access

Comprehensive and clinically accurate head and neck cancer organs-at-risk delineation on a multi-institutional study

Accurate organ-at-risk (OAR) segmentation is critical to reduce post-treatment complications of radiotherapy. Here, the authors develop an automated OAR segmentation system to delineate a comprehensive set of 42 H&N OARs.

Xianghua Ye, Dazhou Guo & Tsung-Ying Ho

Article
06 October 2022 | Open Access

Using domain knowledge for robust and generalizable deep learning-based CT-free PET attenuation and scatter correction

Deep learning-based methods have been proposed to substitute CT-based PET attenuation and scatter correction to achieve CT-free PET imaging. Here, the authors present a simple way to integrate domain knowledge in deep learning for CT-free PET imaging.

Rui Guo, Song Xue & Kuangyu Shi

Article
29 September 2022 | Open Access

Adversarial attacks and adversarial robustness in computational pathology

Artificial intelligence models can support diagnostic workflows in oncology, but they are vulnerable to adversarial attacks. Here, the authors show that convolutional neural networks are highly susceptible to white- and black-box adversarial attacks in clinically relevant classification tasks.

Narmin Ghaffari Laleh, Daniel Truhn & Jakob Nikolas Kather

Article
29 September 2022 | Open Access

Deciphering microbial gene function using natural language processing

The function of many microbial genes is still unknown. Here the authors repurposed natural language processing algorithms to explore “gene semantics” and infer function for thousands of genes, with defense and secretion systems found to have the most discovery potential.

Danielle Miller, Adi Stern & David Burstein

Article
27 September 2022 | Open Access

Gene expression based inference of cancer drug sensitivity

Predicting treatment response in cancer remains a highly complex task. Here, the authors develop Precily, a deep neural network framework to predict treatment response in cancer by considering gene expression, pathway activity estimates and drug features, and test this method in multiple datasets and preclinical models.

Smriti Chawla, Anja Rockstroh & Debarka Sengupta

Article
26 September 2022 | Open Access

RAS oncogenic activity predicts response to chemotherapy and outcome in lung adenocarcinoma

Mutations in RAS oncogenes and related pathways are frequent in lung cancers. Here, the authors derive a RAS gene expression signature and a machine learning classifier to predict drug response and clinical outcomes in lung adenocarcinoma and other solid tumours, with improved performance over KRAS mutations alone.

Philip East, Gavin P. Kelly & Sophie de Carné Trécesson

Article
19 September 2022 | Open Access

Identification of spatially variable genes with graph cuts

Single-cell gene expression data with positional information is critical to dissect mechanisms and architectures of multicellular organisms, but the potential is limited by the scalability of current data analysis strategies. Here the authors develop a highly scalable method, scGCO, to identify genes whose expression values form spatial patterns from spatial transcriptomics data.

Ke Zhang, Wanwan Feng & Peng Wang

Article
19 September 2022 | Open Access

Mutated processes predict immune checkpoint inhibitor therapy benefit in metastatic melanoma

Tumour mutational burden is a biomarker of immune checkpoint inhibitor response, but their association is not fully understood. Here, the authors train classifiers to identify key mutated processes which show stable predictive performance in multiple melanoma cohorts.

Andrew Patterson & Noam Auslander

Matters Arising
12 September 2022 | Open Access

Machine-learning prediction of hosts of novel coronaviruses requires caution as it may affect wildlife conservation

Sophie Lund Rasmussen, Cino Pertoldi & David W. Macdonald

Matters Arising
12 September 2022 | Open Access

Reply to: Machine-learning prediction of hosts of novel coronaviruses requires caution as it may affect wildlife conservation

Marcus S. C. Blagrove, Matthew Baylis & Maya Wardeh

Article
09 September 2022 | Open Access

Enhanced detection of threat materials by dark-field x-ray imaging combined with deep neural networks

Dark-field X-ray imaging is sensitive to the microstructure of a material. Here, the authors combine this with a neural network algorithm to provide efficient material discrimination, e.g., of explosives vs non-threat materials.

T. Partridge, A. Astolfo & A. Olivo

Article
09 September 2022 | Open Access

Integrating and formatting biomedical data as pre-calculated knowledge graph embeddings in the Bioteque

Biomedical data is accumulating at a fast pace and integrating it into a unified framework is a major challenge. Here, the authors present a resource that contains pre-calculated biomedical descriptors derived from a very large knowledge graph.

Adrià Fernández-Torras, Miquel Duran-Frigola & Patrick Aloy

Article
09 September 2022 | Open Access

Traject3d allows label-free identification of distinct co-occurring phenotypes within 3D culture by live imaging

There is currently a lack of tools to detect heterogeneity in 3D cultures. Here the authors report Traject3d as a framework to identify heterogeneous states in 3D culture and to understand how these give rise to distinct phenotypes using label-free multi-day time-lapse imaging.

Eva C. Freckmann, Emma Sandilands & David M. Bryant

Article
07 September 2022 | Open Access

devCellPy is a machine learning-enabled pipeline for automated annotation of complex multilayered single-cell transcriptomic data

A major informatic challenge in single cell RNA-sequencing analysis is the precise annotation of datasets where cells exhibit complex multilayered identities or transitory states. Here the authors present devCellPy, a Python-based package that enables the automated prediction of cell types across complex cellular hierarchies, species, and experimental systems with high accuracy, particularly for developmental scRNA-seq datasets.

Francisco X. Galdos, Sidra Xu & Sean M. Wu

Article
06 September 2022 | Open Access

Accounting for small variations in the tracrRNA sequence improves sgRNA activity predictions for CRISPR screening

Existing methods for generating sgRNA predictions do not account for the tracrRNA sequence. Here the authors report an on-target model, Rule Set 3, to generate optimal predictions for multiple tracrRNA variants, and validate this on a new dataset of sgRNAs showing improvement over prior prediction models.

Peter C. DeWeirdt, Abby V. McGee & John G. Doench

Article
02 September 2022 | Open Access

Automated model-predictive design of synthetic promoters to control transcriptional profiles in bacteria

Transcription rates are regulated by the interactions between RNA polymerase, sigma factor, and promoter DNA sequences in bacteria. Here the authors combine massively parallel experiments and machine learning to develop a predictive biophysical model of transcription, validated across 22132 bacterial promoters, and apply it to the design and debugging of genetic circuits.

Travis L. LaFleur, Ayaan Hossain & Howard M. Salis

Machine learning articles within Nature Communications
