Computational biology and bioinformatics articles within Nature Communications

Featured

  • Article
    | Open Access

    Association analyses that capture rare and noncoding variants in whole genome sequencing data are limited by factors like statistical power. Here, the authors present KnockoffScreen, a statistical method using the knockoff framework to detect, localise and prioritise rare and common risk variants at genome-wide scale.

    • Zihuai He, Linxi Liu & Iuliana Ionita-Laza
  • Article
    | Open Access

    A major challenge across a variety of fields is how to process the vast quantities of data produced by sensors without large computation resources. Here, the authors present a neuromorphic chip which can detect a relevant signature of epileptogenic tissue from intracranial recordings in patients.

    • Mohammadali Sharifshazileh, Karla Burelo & Giacomo Indiveri
  • Article
    | Open Access

    Cellular genetic heterogeneity is common across biological conditions, yet application of long-read sequencing to this subject is limited by error rates. Here, the authors present iGDA, a tool for detection and phasing of minor variants from long-read sequencing data, allowing accurate reconstruction of haplotypes.

    • Zhixing Feng, Jose C. Clemente & Eric E. Schadt
  • Article
    | Open Access

    High-grade serous ovarian cancer (HGSOC) is prone to developing resistance to treatment. Here, the authors use single-cell RNA-seq and an analysis of archetypes, and find that shifts in metabolism and proliferation are associated with the response to treatment and clonal heterogeneity in HGSOC.

    • Aritro Nath, Patrick A. Cosgrove & Andrea H. Bild
  • Article
    | Open Access

    Combining scRNA-seq with spatial information to enable the reconstruction of spatially resolved cell atlases is challenging for rare cell types. Here, the authors present ClumpSeq, an approach for sequencing small clumps of tissue-attached cells, and apply it to establish spatial atlases for all secretory cell types in the small intestine.

    • Rita Manco, Inna Averbukh & Shalev Itzkovitz
  • Article
    | Open Access

    The vast majority of somatic mutations observed in tumors are rare. Here, the authors show that these large numbers of rare mutations are more predictive of the tissue of origin of a tumor than the information from a few common driver mutations.

    • Saptarshi Chakraborty, Axel Martin & Ronglai Shen
  • Article
    | Open Access

    Study of human disease remains challenging due to convoluted disease etiologies and complex molecular mechanisms at genetic, genomic, and proteomic levels. Here, the authors propose a computationally efficient Permutation-based Feature Importance Test to assist interpretation and selection of individual features in complex machine learning models for complex disease analysis.

    • Xinlei Mi, Baiming Zou & Jianhua Hu
  • Article
    | Open Access

    Initial COVID-19 containment in the United States focused on limiting mobility, including school and workplace closures, with enormous societal and economic costs. Here, the authors demonstrate the feasibility of a test-trace-quarantine strategy using an agent-based model and detailed data on the Seattle region.

    • Cliff C. Kerr, Dina Mistry & Daniel J. Klein
  • Article
    | Open Access

    One challenge of single cell RNA sequencing analysis is how to consistently identify cell subtypes and states across different datasets. Here the authors propose the use of a reference single-cell atlas as a stable system of coordinates to characterize T cell states across studies, diseases and species.

    • Massimo Andreatta, Jesus Corria-Osorio & Santiago J. Carmona
  • Article
    | Open Access

    Here, combing the massive gene-universe of the gut microbiome to identify strain-specific, cross-disease, associations across seven human diseases, the authors introduce the concept of microbiome architecture, defined as the complete set of positive and negative associations between microbial genes and human host disease, highlighting microbiome architectures as potential diagnostic indicators.

    • Braden T. Tierney, Yingxuan Tan & Chirag J. Patel
  • Article
    | Open Access

    Whole genome sequencing data are increasingly becoming routinely available but generating actionable insights is challenging. Here, the authors describe Pathogenwatch, a web tool for genomic surveillance of S. Typhi, and demonstrate its use for antimicrobial resistance assignment and strain risk assessment.

    • Silvia Argimón, Corin A. Yeats & David M. Aanensen
  • Article
    | Open Access

    Biomedical measurements usually generate high-dimensional data where individual samples are classified in several categories. Vogelstein et al. propose a supervised dimensionality reduction method which estimates the low-dimensional data projection for classification and prediction in big datasets.

    • Joshua T. Vogelstein, Eric W. Bridgeford & Mauro Maggioni
  • Article
    | Open Access

    Differentiating neutrophil functional states is difficult. Here the authors show, using single cell RNA-sequencing and trajectory analyses, that mouse neutrophils can be presented as a transcriptome continuum rather than discrete subsets, but are affected by inflammation to express distinct transcriptional states.

    • Ricardo Grieshaber-Bouyer, Felix A. Radtke & Hideyuki Yoshida
  • Article
    | Open Access

    The basolateral amygdala is implicated in several behavior-related states including anxiety, autism, and addiction. The authors apply circuit-level pathway tracing methods combined with computational techniques to provide a comprehensive connectivity atlas of the mouse basolateral amygdala complex.

    • Houri Hintiryan, Ian Bowman & Hong-Wei Dong
  • Article
    | Open Access

    Classification methods for scRNA-seq data are limited in their ability to learn from multiple datasets simultaneously. Here the authors present scHPL, a hierarchical progressive learning method that automatically finds relationships between cell populations across multiple datasets and constructs a classification tree.

    • Lieke Michielsen, Marcel J. T. Reinders & Ahmed Mahfouz
  • Article
    | Open Access

    Quantifying the effects of individual loci on the human phenome is a challenging task. Here, the authors introduce a modelling technique, TGCA, that assesses total genetic contribution per locus and apply this to UK Biobank phenotype domains, revealing top loci and links to tissue-specific gene expression.

    • Ting Li, Zheng Ning & Xia Shen
  • Article
    | Open Access

    The functional consequences of variation in human regulatory DNA depend on the local chromatin environment and the cell/tissue context. Here the authors use highly diverged hybrid mice to study genetic effects on DNA accessibility in vivo across multiple cell and tissue types.

    • Jessica M. Halow, Rachel Byron & Matthew T. Maurano
  • Article
    | Open Access

    Predicting RNA structure from sequence is challenging due to the relative sparsity of experimentally-determined RNA 3D structures for model training. Here, the authors propose a way to incorporate knowledge on interactions at the atomic and base–base level to refine the prediction of RNA structures.

    • Peng Xiong, Ruibo Wu & Yaoqi Zhou
  • Article
    | Open Access

    Single-nucleotide variants in enhancers or promoters may affect gene transcription by altering transcription factor binding sites. Here the authors present a meta-analysis empowered by a new statistical method covering thousands of ChIP-Seq experiments resulting in the identification of more than 500 thousand allele-specific binding (ASB) events in the human genome.

    • Sergey Abramov, Alexandr Boytsov & Ivan V. Kulakovskiy
  • Article
    | Open Access

    AIM2-ASC inflammasomes are filamentous signalling platforms that play a central role in host innate defence. Here, the authors present the filament cryo-EM structure of the inflammasome receptor AIM2, which is very similar to the adaptor ASC filament structure. By employing Rosetta and molecular dynamics simulations, the authors provide further insights into the directionality and recognition mechanisms of the individual AIM2 and ASC filaments, which are further validated with biochemical and cellular experiments.

    • Mariusz Matyszewski, Weili Zheng & Jungsan Sohn
  • Article
    | Open Access

    Estimates of COVID-19-related mortality are limited by incomplete testing. Here, the authors perform counterfactual analyses and estimate that there were 59,000–62,000 deaths from COVID-19 in Italy until 9th September 2020, approximately 1.5 times higher than official statistics.

    • Chirag Modi, Vanessa Böhm & Uroš Seljak
  • Article
    | Open Access

    Several prognostic indices are available to predict the long-term fate of emerging infectious diseases and the effect of their containment measures, including a variety of reproduction numbers. Here, the authors introduce the epidemicity index, a complementary index to evaluate the potential for transient increases of SARS-CoV-2 epidemics.

    • Lorenzo Mari, Renato Casagrandi & Marino Gatto
  • Article
    | Open Access

    Identifying enriched gene sets in transcriptomic data is a routine analysis. Here, the authors show that conventional gene category enrichment analysis (GCEA) applied to brain-wide atlas data yields biased results and develop a flexible ensemble-based null model framework to enable appropriate inference in GCEA.

    • Ben D. Fulcher, Aurina Arnatkeviciute & Alex Fornito
  • Article
    | Open Access

    Placental dysfunction can have catastrophic or barely discernible effects ranging from miscarriage to apparently normal birth. Here the authors present a comprehensive analysis of the human placental transcriptome and identify circular RNAs and piRNAs.

    • Sungsam Gong, Francesca Gaccioli & D. Stephen Charnock-Jones
  • Article
    | Open Access

    The SARS-CoV-2 gene set remains unresolved, hindering dissection of COVID-19 biology. Comparing 44 Sarbecovirus genomes provides a high-confidence protein-coding gene set. The study characterizes protein-level and nucleotide-level evolutionary constraints, and prioritizes functional mutations from the ongoing COVID-19 pandemic.

    • Irwin Jungreis, Rachel Sealfon & Manolis Kellis
  • Article
    | Open Access

    Cardiac amyloidosis is difficult to identify, given its low prevalence and the similarity of its symptoms to those of more prevalent disorders. Here, the authors present a multi-modality, artificial intelligence-enabled pipeline that enables automated detection of cardiac amyloidosis from inexpensive and accessible measures.

    • Shinichi Goto, Keitaro Mahara & Rahul C. Deo
  • Article
    | Open Access

    Whole genome sequencing is increasingly being adopted for Shigella sonnei outbreak investigation and surveillance, but there is no global classification standard. Here, the authors develop and validate a genomic framework implemented using open-source software, and demonstrate its application using surveillance data.

    • Jane Hawkey, Kalani Paranagama & Kathryn E. Holt
  • Article
    | Open Access

    Nephric duct (ND)-derived ureteric buds (UB) form the kidney collecting duct system, while ureteric tips promote nephron formation. Here the authors use single-cell RNA-seq and introduce Cluster RNA-seq to identify four progenitor populations in developing ND/UB regulated by the transcription factors Tfap2a/b and Gata3.

    • Oraly Sanchez-Ferras, Alain Pacis & Maxime Bouchard
  • Article
    | Open Access

    Tissue damage and turnover lead to the release of DNA in the blood and can be used to monitor changes in tissue state. Here, the authors developed a tool to accurately estimate the proportion of cell types contributing to cell-free DNA in the blood, with an application to pregnant women and ALS patients.

    • Christa Caggiano, Barbara Celona & Noah Zaitlen
  • Article
    | Open Access

    The differentiation of neural stem cells (NSCs) into neurons is critical to devising potential cell-based therapeutic strategies for central nervous system diseases, but determining and predicting NSC fate is problematic. Here, the authors present a deep neural network model for the reliable prediction and identification of NSC fate.

    • Yanjing Zhu, Ruiqi Huang & Rongrong Zhu
  • Article
    | Open Access

    Computational algorithms to infer chromatin sub-compartments and compartment domains require high-resolution Hi-C maps. Here the authors present Calder, an algorithm that can infer sub-compartments and compartment domains with variable resolution Hi-C data, and they apply it to more than a hundred Hi-C experiments to study sub-compartment repositioning.

    • Yuanlong Liu, Luca Nanni & Giovanni Ciriello
  • Article
    | Open Access

    High-content screening prompted the development of software enabling discrete phenotypic analysis of single cells. Here, the authors show that supervised continuous machine learning can drive novel discoveries in diverse imaging experiments and present the Regression Plane module of Advanced Cell Classifier.

    • Abel Szkalisity, Filippo Piccinini & Peter Horvath
  • Article
    | Open Access

    Karyotyping of cancer genomes at the base level is technically challenging. Here, the authors introduce InfoGenomeR, an algorithm that can infer cancer genome karyotypes from whole-genome sequencing data, test their model on breast, ovarian and brain cancer samples, and identify private and shared mutations between primary and metastatic cancer samples.

    • Yeonghun Lee & Hyunju Lee
  • Article
    | Open Access

    Methods to produce haplotype-resolved genome assemblies often rely on access to family trios. The authors present FALCON-Phase, a tool that combines ultra-long range Hi-C chromatin interaction data with a long read de novo assembly to extend haplotype phasing to the contig or scaffold level.

    • Zev N. Kronenberg, Arang Rhie & Sarah B. Kingan
  • Article
    | Open Access

    Highly endangered species like the Sumatran rhinoceros are at risk from inbreeding. Five historical and 16 modern genomes from across the species range show mutational load, but little evidence for local adaptation, suggesting that future inbreeding depression could be mitigated by assisted gene flow among populations.

    • Johanna von Seth, Nicolas Dussex & Love Dalén
  • Article
    | Open Access

    The ability to design functional sequences is central to protein engineering and biotherapeutics. Here, the authors introduce a deep generative alignment-free model for sequence design, apply it to highly variable regions, and design and test a diverse nanobody library with improved properties for selection experiments.

    • Jung-Eun Shin, Adam J. Riesselman & Debora S. Marks
  • Article
    | Open Access

    Data-rich networks can be difficult to interpret beyond a certain size. Here, the authors introduce a platform that uses virtual reality to allow the visual exploration of large networks, while interfacing with data repositories and other analytical methods to improve the interpretation of big data.

    • Sebastian Pirch, Felix Müller & Jörg Menche