Computational biology and bioinformatics articles within Nature Communications

Featured

  • Article
    | Open Access

    The ability to engineer novel protein structures has tremendous scientific and therapeutic impact. Here, authors develop a generative model acting upon an angular representation of protein structures to create high-quality protein backbones.

    • Kevin E. Wu
    • , Kevin K. Yang
    •  & Ava P. Amini
  • Article
    | Open Access

    Telomeres protect the extremities of linear chromosomes and are involved in ageing, senescence and genome stability. Here, the authors have identified peculiar and specific telomeric DNA repeats in the genomes of devastating plant-parasitic nematodes, opening new perspectives for their control.

    • Ana Paula Zotta Mota
    • , Georgios D. Koutsovoulos
    •  & Etienne G. J. Danchin
  • Article
    | Open Access

    The Bioconductor project aims to develop R packages for analysis of genomic datasets. Here the authors show the HiCExperiment package suite and its companion online book (https://bioconductor.org/books/OHCA/) which present data structures, computational methods and visualization tools available in Bioconductor to investigate chromatin conformation capture (3C) data in R.

    • Jacques Serizay
    • , Cyril Matthey-Doret
    •  & Romain Koszul
  • Article
    | Open Access

    Descriptive data in biomedical research are expanding rapidly, but functional validation methods lag behind. Here, authors present Logical Synthetic cis-regulatory DNA, a framework to design reporters that mark cellular states and pathways, showcasing its applicability to complex phenotypic states.

    • Carlos Company
    • , Matthias Jürgen Schmitt
    •  & Gaetano Gargiulo
  • Article
    | Open Access

    Cell type annotation for single-cell data is challenging. Here, authors explore active and self-supervised learning and introduce adaptive reweighting as a tailored heuristic, demonstrating competitive performance and showing that incorporating prior knowledge enhances cell type annotation accuracy.

    • Michael J. Geuenich
    • , Dae-won Gong
    •  & Kieran R. Campbell
  • Article
    | Open Access

    Identifying senescence is complicated by a lack of universal markers. Here, Duran et al. use nuclear morphology features to devise machine-learning classifiers that detect senescence in cell lines and liver sections of patients and mouse models of aging and disease.

    • Imanol Duran
    • , Joaquim Pombo
    •  & Jesús Gil
  • Article
    | Open Access

    In this work, using a combination of cryo-EM, in-cell experiments and biophysical analysis, the authors decode the aggregation propensity of tau, revealing five central hot spots in its primary sequence, and identify PAM4 as a short segment that determines both the structure and the cellular propagation of tau aggregates extracted from Alzheimer’s disease, corticobasal degeneration, and progressive supranuclear palsy patients.

    • Nikolaos Louros
    • , Martin Wilkinson
    •  & Joost Schymkowitz
  • Article
    | Open Access

    Existing tools for structural variation (SV) calling and merging often produce fragmented SVs and can introduce unnecessary errors. Here, the authors report the PanPop pipeline, which addresses these issues by implementing a sequence-aware SV merging algorithm to efficiently merge SVs of various types.

    • Zeyu Zheng
    • , Mingjia Zhu
    •  & Yongzhi Yang
  • Article
    | Open Access

    High-throughput electron microscopy demands minimal human intervention and high image quality. Here, authors introduce DeepFocus, a data-driven method for aberration correction in electron microscopy that is robust to low-SNR images, fast, and easily adaptable to microscopes and samples. Peer Review Information: Nature Communications thanks Yang Zhang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

    • P. J. Schubert
    • , R. Saxena
    •  & J. Kornfeld
  • Article
    | Open Access

    It is unclear whether naturally evolved de novo proteins have stable, folded structures. Here, through systematic identification and structural modeling of de novo genes, this study reveals that a small subset of these proteins may have well-folded structures and were likely born with these structures.

    • Balázs Bálint
    • , Zsolt Merényi
    •  & László G. Nagy
  • Article
    | Open Access

    Age-associated myometrial dysfunction can cause complications during pregnancy and labor. Here, the authors report that aging myometrium is characterized by diminished contractile capillary cells, altered gene expression, and disrupted cellular communication leading to impaired angiogenesis, increased fibrosis and inflammation.

    • Paula Punzon-Jimenez
    • , Alba Machado-Lopez
    •  & Aymara Mas
  • Article
    | Open Access

    Here, the authors develop a high-throughput method to quantify Bifidobacterium longum subsp. infantis (BL. infantis), a proficient HMO-utilizer, from metagenomic sequencing, and apply it to a longitudinal cohort consisting of 21 mother-infant dyads, suggesting that BL. infantis colonization starts late in the breast-feeding period.

    • Dena Ennis
    • , Shimrit Shmorak
    •  & Moran Yassour
  • Article
    | Open Access

    Batch effects hinder multi-sample single-cell data analyses. Here, authors present STACAS, a scalable single-cell RNA-seq data integration tool that uses prior cell type knowledge to preserve biological variability, demonstrating robustness to noisy input cell type labels.

    • Massimo Andreatta
    • , Léonard Hérault
    •  & Santiago J. Carmona
  • Article
    | Open Access

    Using All of Us pilot data, the authors compared short- and long-read performance across medically relevant genes and showcased the utility of long reads to improve variant detection and phasing in both easy- and hard-to-resolve medically relevant genes.

    • M. Mahmoud
    • , Y. Huang
    •  & F. J. Sedlazeck
  • Article
    | Open Access

    scRNA-Seq has enabled the study of dynamic systems such as response to a drug at the individual cell and gene levels. Here the authors introduce a framework to interpret differences at the trajectory, cell population, and individual gene levels.

    • Hector Roux de Bézieux
    • , Koen Van den Berge
    •  & Sandrine Dudoit
  • Article
    | Open Access

    Synthetic microbial communities are suitable for mixed-substrate fermentation and long metabolic pathway engineering. Here, the authors combine fermentation experiments with mathematical modeling to reveal the effect of compositional and temporal changes on division of labor in cellulosic ethanol production using two yeast strains.

    • Jonghyeok Shin
    • , Siqi Liao
    •  & Yong-Su Jin
  • Article
    | Open Access

    No consensus exists on the computationally tractable use of dynamic models for strain design. To tackle this, the authors report a framework, nonlinear-dynamic-model-assisted rational metabolic engineering design, for efficiently designing robust, artificially engineered cellular organisms.

    • Bharath Narayanan
    • , Daniel Weilandt
    •  & Vassily Hatzimanikatis
  • Article
    | Open Access

    The heterogeneity of whole-exome sequencing (WES) data generation methods presents a challenge to joint analysis. Here, the authors present a bioinformatics strategy to generate high-quality data from processing diversely generated WES samples, as applied in the Alzheimer’s Disease Sequencing Project.

    • Yuk Yee Leung
    • , Adam C. Naj
    •  & Li-San Wang
  • Article
    | Open Access

    Bacterial viruses (phages) are generally recognised as rapidly evolving biological entities. Here, Rozwalak et al. analyse DNA sequence datasets generated from ancient palaeofaeces and identify 298 phage genomes from the last 5300 years, including a 1300-year-old phage genome nearly identical to a present-day virus that infects human gut bacteria.

    • Piotr Rozwalak
    • , Jakub Barylski
    •  & Andrzej Zielezinski
  • Article
    | Open Access

    Detection of neoepitopes from tumours is time consuming and requires the integration of genomic and/or RNA sequencing expression data. Here, the authors propose a machine learning method to enable direct identification of additional, tumour-specific sequences using mass spectrometry through integration of de novo peptide sequencing scores, MHC class I binding prediction, and peptide retention time prediction.

    • Hanqing Liao
    • , Carolina Barra
    •  & Nicola Ternette
  • Article
    | Open Access

    Segmentation is an important fundamental task in medical image analysis. Here the authors show a deep learning model for efficient and accurate segmentation across a wide range of medical image modalities and anatomies.

    • Jun Ma
    • , Yuting He
    •  & Bo Wang
  • Article
    | Open Access

    Chronic SARS-CoV-2 infections have been hypothesised to be sources of new variants. Here, the authors use large-scale genome sequencing data to identify mutations predictive of chronic infections, which may therefore be relevant in future variants.

    • Sheri Harari
    • , Danielle Miller
    •  & Adi Stern
  • Article
    | Open Access

    Dengue virus circulation was unusually low in Brazil in 2015-2018 following the emergence of Zika virus, but subsequently resurged causing large outbreaks with a lower mean age of infection. Here, the authors use mathematical modelling to investigate the links between dengue dynamics and prior Zika infection.

    • Francesco Pinotti
    • , Marta Giovanetti
    •  & José Lourenço
  • Article
    | Open Access

    Bacteria use various defense systems to protect themselves from phage infection, and phages have evolved diverse counter-defense measures to overcome host defenses. Here, the authors use protein structural similarity and gene co-occurrence analyses for identification of new anti-phage and counter-defense systems.

    • Ning Duan
    • , Emily Hand
    •  & Akintunde Emiola
  • Article
    | Open Access

    Assessing tumour contamination in normal samples is critical for accurate variant calling in cancer samples. Here, the authors develop TINC, a computational method to determine the level of tumour in normal contamination, and demonstrate its application in the Genomics England 100,000 Genomes Project dataset.

    • Jonathan Mitchell
    • , Salvatore Milite
    •  & Giulio Caravagna
  • Article
    | Open Access

    Single cell transcriptomics can reveal at high resolution the body’s response to infection. Here the authors have applied this technology to a longitudinal SARS-CoV-2 infected cohort and identified gene expression changes that may predict disease severity and reveal the underlying molecular mechanisms.

    • Quy Xiao Xuan Lin
    • , Deepa Rajagopalan
    •  & Shyam Prabhakar
  • Article
    | Open Access

    Seasonal influenza levels were unusually low when non-pharmaceutical interventions for COVID-19 were in place. Here, the authors analyse serological and epidemiological evidence for the hypothesis that such lulls in influenza transmission lead to reduced immunity and therefore larger epidemics in subsequent seasons.

    • Simon P. J. de Jong
    • , Zandra C. Felix Garza
    •  & Colin A. Russell
  • Article
    | Open Access

    Here, the authors present COMEBin, a metagenomics binning method based on contrastive multi-view representation learning that uses data augmentation to generate multiple fragments (views) of each contig, resulting in high-quality embeddings of heterogeneous features. COMEBin outperforms state-of-the-art binning methods, particularly in recovering near-complete genomes from real environmental samples.

    • Ziye Wang
    • , Ronghui You
    •  & Shanfeng Zhu
  • Article
    | Open Access

    SARS-CoV-2 coinfections may lead to recombination events which could be important in the emergence of new variants. Here, the authors develop an automated bioinformatics pipeline to identify coinfections in genomic data and test it on >2 million publicly available raw read data sets collected globally.

    • Orsolya Anna Pipek
    • , Anna Medgyes-Horváth
    •  & István Csabai
  • Article
    | Open Access

    Topologically associating domains (TADs) are critical structural units in 3D genome organization, and their reorganization between health and disease states is associated with essential genome functions. However, computational methods for identifying reorganized TADs are still in the early stages of development. Here, the authors present an algorithm leveraging random matrix theory to identify reorganized TADs.

    • Dunming Hua
    • , Ming Gu
    •  & Dechao Tian
  • Article
    | Open Access

    Cryo-EM is the go-to method for visualizing large, flexible biomolecules. Here, authors introduce a new Gaussian mixture modelling method for cryo-EM modelling tasks, including refinement, composite map generation and ensemble representation.

    • Joseph G. Beton
    • , Thomas Mulvaney
    •  & Maya Topf