Computational biology and bioinformatics articles within Nature Communications

Featured

  • Article
    | Open Access

    Bacterial microcompartments (BMCs) are organelles consisting of a protein shell in which certain metabolic reactions take place separated from the cytoplasm. Here, Sutter et al. present a comprehensive catalog of BMC loci, substantially expanding the number of known BMCs and describing distinct types and compartmentalized reactions.

    • Markus Sutter, Matthew R. Melnicki & Cheryl A. Kerfeld
  • Article
    | Open Access

    Many proteins exist in various proteoforms but detecting these variants by bottom-up proteomics remains difficult. Here, the authors present a computational approach based on peptide correlation analysis to identify and characterize proteoforms from bottom-up proteomics data.

    • Isabell Bludau, Max Frank & Ruedi Aebersold
  • Article
    | Open Access

    Keratitis is the main cause of corneal blindness worldwide, but most vision loss caused by keratitis is avoidable through early detection and treatment, which are challenging in resource-limited settings. Here, the authors develop a deep learning system for the automated classification of keratitis and other cornea abnormalities.

    • Zhongwen Li, Jiewei Jiang & Wei Chen
  • Article
    | Open Access

    α-Synuclein (αS) aggregation is a driver of several neurodegenerative disorders. Here, the authors identify a class of peptides that bind toxic αS oligomers and amyloid fibrils but not monomeric functional protein, and prevent further αS aggregation and associated cell damage.

    • Jaime Santos, Pablo Gracia & Salvador Ventura
  • Article
    | Open Access

    The authors generate the largest structural dataset of enzymatic and non-enzymatic metalloprotein sites to date. They use this dataset to train a decision-tree ensemble machine learning algorithm that allows them to distinguish between catalytic and non-catalytic metal sites. The computational model described here could also be useful for the identification of new enzymatic mechanisms and de novo enzyme design.

    • Ryan Feehan, Meghan W. Franklin & Joanna S. G. Slusky
  • Article
    | Open Access

    Single-cell RNA-seq loses spatial information on gene expression in multicellular systems because tissue must be dissociated. Here, the authors show that spatial gene expression profiles can be accurately and robustly reconstructed by Perler, a new computational method that uses a generative linear mapping.

    • Yasushi Okochi, Shunta Sakaguchi & Honda Naoki
  • Article
    | Open Access

    Large numbers of mass spectra have been collected from different samples, but identifying small molecules from these spectra requires database searches, which remain challenging. Here, the authors report molDiscovery, a mass spectral database search method that uses an algorithm to generate mass spectrometry fragmentations and learns a probabilistic model to match small molecules with their mass spectra.

    • Liu Cao, Mustafa Guler & Hosein Mohimani
  • Article
    | Open Access

    Despite the consensus that mass vaccination against SARS-CoV-2 will ultimately end the pandemic, it is not clear when and which control measures can be relaxed during the rollout of vaccination programmes. Here, the authors investigate relaxation scenarios using an age-structured transmission model that has been fitted to data for Portugal.

    • João Viana, Christiaan H. van Dorp & Ganna Rozhnova
  • Article
    | Open Access

    People can infer unobserved causes of perceptual data (e.g. the contents of a box from the sound made by shaking it). Here the authors show that children compare what they hear with what they would have heard given other causes, and explore longer when the heard and imagined sounds are hard to discriminate.

    • Max H. Siegel, Rachel W. Magid & Laura E. Schulz
  • Article
    | Open Access

    Directed evolution commonly relies on point mutations but InDels frequently occur in evolution. Here the authors report a protein-engineering framework based on InDel mutagenesis and fragment transplantation resulting in greater catalysis and longer glow-type bioluminescence of the ancestral luciferase.

    • Andrea Schenkmayerova, Gaspar P. Pinto & Jiri Damborsky
  • Article
    | Open Access

    Here, the authors use simulated quantitative gut microbial communities to benchmark the performance of 13 common data transformations in determining diversity as well as microbe-microbe and microbe-metadata associations. They find that quantitative approaches incorporating microbial load variation outperform computational strategies in downstream analyses, and they urge widespread adoption of quantitative approaches, recommending specific computational transformations whenever determining the microbial load of samples is not feasible.

    • Verónica Lloréns-Rico, Sara Vieira-Silva & Jeroen Raes
  • Article
    | Open Access

    Cross-linking mass spectrometry (MS) can identify protein-protein interaction (PPI) networks but assessing the reliability of these data remains challenging. To address this issue, the authors develop and validate a method to determine the false-discovery rate of PPIs identified by cross-linking MS.

    • Swantje Lenz, Ludwig R. Sinn & Juri Rappsilber
  • Article
    | Open Access

    Disentangling the impacts of non-pharmaceutical interventions on COVID-19 transmission is challenging as they have been used in different combinations across time and space. This study shows that, early in the epidemic, school/daycare closures and stopping nursing home visits were associated with the biggest reduction in transmission in the United States.

    • Bingyi Yang, Angkana T. Huang & Derek A. T. Cummings
  • Article
    | Open Access

    Population-based surveys are the gold standard for estimating seroprevalence but are expensive and often only capture a small geographic area or window of time. This study describes a new platform, SCALE-IT, for serosurveillance based on algorithmic sampling of electronic health records, and uses it to estimate the seroprevalence of SARS-CoV-2 in San Francisco.

    • Isobel Routledge, Adrienne Epstein & Isabel Rodriguez-Barraquer
  • Article
    | Open Access

    Systemic light chain amyloidosis (AL) is caused by the production of toxic light chains and can be fatal, yet effective treatments are often not possible due to delayed diagnosis. Here the authors show that a machine learning platform analyzing light chain somatic mutations allows the prediction of light chain toxicity to serve as a possible tool for early diagnosis of AL.

    • Maura Garofalo, Luca Piccoli & Andrea Cavalli
  • Article
    | Open Access

    Despite considerable efforts, quantitative prediction of various molecular properties remains a challenge. Here, the authors propose an algebraic graph-assisted bidirectional transformer, which incorporates massive unlabeled molecular data into molecular representations via a self-supervised learning strategy, assisted by 3D stereochemical information from graphs.

    • Dong Chen, Kaifu Gao & Feng Pan
  • Article
    | Open Access

    Traditional methods to identify genomic regions identical-by-descent (IBD) do not scale well to biobank-level datasets. Here, the authors describe a new IBD algorithm, iLASH, which uses LocAlity-Sensitive Hashing to provide rapid IBD estimation when applied to the PAGE and UK Biobank datasets.

    • Ruhollah Shemirani, Gillian M. Belbin & José Luis Ambite
  • Article
    | Open Access

    Inaccurate cell segmentation has been the major problem for cell-type identification and tissue characterization of in situ spatially resolved transcriptomics data. Here we show a robust cell segmentation-free computational framework (SSAM) for identifying cell types and tissue domains in 2D and 3D.

    • Jeongbin Park, Wonyl Choi & Naveed Ishaque
  • Article
    | Open Access

    Technical advancements have significantly improved early diagnosis of cervical cancer, but accurate diagnosis is still difficult due to various practical factors. Here, the authors develop an artificial intelligence assistive diagnostic solution to improve cervical liquid-based thin-layer cell smear diagnosis according to clinical TBS criteria in a large multicenter study.

    • Xiaohui Zhu, Xiaoming Li & Yanqing Ding
  • Article
    | Open Access

    Non-coding RNA function is poorly understood, partly due to the challenge of determining RNA secondary (2D) structure. Here, the authors present a framework for the reproducible prediction and visualization of the 2D structure of a wide array of RNAs, which enables linking RNA sequence to function.

    • Blake A. Sweeney, David Hoksza & Anton I. Petrov
  • Article
    | Open Access

    Regular exercise promotes overall health and prevents non-communicable diseases, but the adaptation mechanisms are unclear. Here, the authors perform a meta-analysis to reveal time-specific patterns of the acute and long-term exercise response in human skeletal muscle, and identify sex- and age-specific changes.

    • David Amar, Malene E. Lindholm & Euan A. Ashley
  • Article
    | Open Access

    Recent advances in super-resolution microscopy have made it possible to measure chromatin 3D structure and transcription in thousands of single cells. Here, the authors present a deep learning-based approach to characterise how chromatin structure relates to the transcriptional state of individual cells and determine which structural features of chromatin regulation are important for gene expression state.

    • Aparna R. Rajpurkar, Leslie J. Mateo & Alistair N. Boettiger
  • Article
    | Open Access

    Several existing algorithms predict the methylation of DNA using Nanopore sequencing signals, but it is unclear how they compare in performance. Here, the authors benchmark the performance of several such tools, and propose METEORE, a consensus tool that improves prediction accuracy.

    • Zaka Wing-Sze Yuen, Akanksha Srivastava & Eduardo Eyras
  • Article
    | Open Access

    Our understanding of human disease can be improved by integrating the abundance of high-throughput biomedical data. Here, the authors apply deep learning methods that have been successful on images to integrate various types of omics data, improving patient classification and identifying disease biomarkers.

    • Tongxin Wang, Wei Shao & Kun Huang
  • Article
    | Open Access

    Many job sectors classified as ‘essential’ have continued operating with limited restrictions during the COVID-19 pandemic, potentially placing workers at higher risk of infection. Here, the authors show that seropositivity rates in workers vary widely across and between job sectors in Geneva, Switzerland.

    • Silvia Stringhini, María-Eugenia Zaballa & Idris Guessous
  • Article
    | Open Access

    The genome-wide investigation of chromatin organization enables insights into global gene expression control. Here, the authors present a computationally efficient method for the analysis of chromatin organization data and use it to recover principles of 3D organization across conditions.

    • Merve Sahin, Wilfred Wong & Christina S. Leslie
  • Article
    | Open Access

    Allele-specific expression in diploid organisms can be quantified by RNA-seq, and it is common practice to rely on a single library. Here, the authors show that the standard approach has a variable error rate and present Qllelic as a tool to improve the reproducibility of allele-specific RNA-seq analysis.

    • Asia Mendelevich, Svetlana Vinogradova & Alexander A. Gimelbrant
  • Article
    | Open Access

    The identification of HLA peptides by mass spectrometry is non-trivial. Here, the authors extended and used the wealth of data from the ProteomeTools project to improve the prediction of non-tryptic peptides using deep learning, and show their approach enables a variety of immunological discoveries.

    • Mathias Wilhelm, Daniel P. Zolg & Bernhard Kuster
  • Article
    | Open Access

    Single-cell proteomics can provide insights into the molecular basis for cellular heterogeneity. Here, the authors develop a multiplexed single-cell proteomics and computational workflow, and show that their strategy captures the cellular hierarchies in an Acute Myeloid Leukemia culture model.

    • Erwin M. Schoof, Benjamin Furtwängler & Bo T. Porse
  • Article
    | Open Access

    Cyclic peptides are of particular interest due to their pharmacological properties, but their design for binding to a target protein is challenging. Here, the authors present a computational “anchor extension” methodology for de novo design of cyclic peptides that bind to the target protein with high affinity, and validate the approach by developing cyclic peptides that inhibit histone deacetylases 2 and 6.

    • Parisa Hosseinzadeh, Paris R. Watson & David Baker
  • Article
    | Open Access

    Existing studies of chromatin accessibility, the primary mark of regulatory DNA, in Arabidopsis are based mainly on bulk samples. Here, the authors report the regulatory landscape of Arabidopsis thaliana roots at single-cell resolution.

    • Michael W. Dorrity, Cristina M. Alexandre & Josh T. Cuperus
  • Article
    | Open Access

    Sequencing methods such as icSHAPE were developed to probe RNA structures transcriptome-wide in cells. To probe intact RNA structures, the authors develop icSHAPE-MaP and apply it to Dicer-bound substrates, showing that distance measuring is important for Dicer cleavage of pre-miRNAs.

    • Qing-Jun Luo, Jinsong Zhang & Qiangfeng Cliff Zhang
  • Article
    | Open Access

    RNA localization plays an important role in transcriptome regulation. The majority of TERT transcripts are detected in the nucleus and TUG1 lncRNAs in both the nucleus and cytoplasm. Here, the authors combine single-cell RNA imaging, antisense oligonucleotides and splicing analyses to show that retention of specific introns drives stable compartmentalization of TERT and TUG1 transcripts in the nucleus, and that splicing of TERT retained introns is mitotically regulated.

    • Gabrijela Dumbović, Ulrich Braunschweig & John L. Rinn
  • Article
    | Open Access

    The IDG-DREAM Challenge carried out crowdsourced benchmarking of predictive algorithms for kinase inhibitor activities on unpublished data. This study provides a resource to compare emerging algorithms and prioritize new kinase activities to accelerate drug discovery and repurposing efforts.

    • Anna Cichońska, Balaguru Ravikumar & Tero Aittokallio
  • Article
    | Open Access

    Mammalian genomes are scattered with repetitive sequences, but their biology remains largely elusive. Here, the authors show that transcription can initiate from short tandem repetitive sequences, and that genetic variants linked to human diseases are preferentially found at repeats with high transcription initiation levels.

    • Mathys Grapotte, Manu Saraswat & Charles-Henri Lecellier
  • Article
    | Open Access

    Although autophagy has been linked to tumourigenesis, it is unclear how genomic alterations affect autophagy selectivity in tumours. Here, the authors establish a pipeline that integrates computational and experimental approaches to show that altered autophagy selectivity is frequent in cancer cells and link glycogen autophagy with tumourigenesis.

    • Zhu Han, Weizhi Zhang & Da Jia
  • Article
    | Open Access

    Single-cell RNA-Seq allows us to observe snapshots of how biological systems change over time at cellular resolution. Here, the authors develop a generative framework that uses time-resolved single-cell data to model how cells change in physical time, including in response to perturbations.

    • Grace Hui Ting Yeo, Sachit D. Saksena & David K. Gifford
  • Article
    | Open Access

    Current genome mining methods predict many putative non-ribosomal peptides (NRPs) from their corresponding biosynthetic gene clusters, but it remains unclear which of those exist in nature and how to identify their post-assembly modifications. Here, the authors develop NRPminer, a modification-tolerant tool for the discovery of NRPs from large genomic and mass spectrometry datasets, and use it to find 180 NRPs from different environments.

    • Bahar Behsaz, Edna Bode & Hosein Mohimani