Genome informatics | Nature Communications

Article
28 October 2022 | Open Access

Strain level microbial detection and quantification with applications to single cell metagenomics

Here the authors develop CAMMiQ, a combinatorial optimization approach that utilizes variable length, “unique” and “doubly-unique” genomic segments, showing improved identification and quantification of distinct microbes in metagenomic sequence data.

Kaiyuan Zhu, Alejandro A. Schäffer & S. Cenk Sahinalp

Article
23 October 2022 | Open Access

Detection and prevalence of SARS-CoV-2 co-infections during the Omicron variant circulation in France

Monitoring of co-infections of SARS-CoV-2 variants is important to evaluate their clinical impact and the risk of emergence of recombinants. Here, the authors develop and validate a methodological pipeline to detect co-infections and apply it to samples from France in early 2022, when Delta and Omicron were co-circulating.

Antonin Bal, Bruno Simon & Laurence Josset

Article
13 October 2022 | Open Access

Genome-centric analysis of short and long read metagenomes reveals uncharacterized microbiome diversity in Southeast Asians

Reference genomes for gut microbiomes help unravel microbial “dark matter” and serve as a valuable resource for disease-focused studies. Here, the authors perform short and long read metagenomics and metagenome-assembled genomes analyses to profile the gut microbiome of Southeast Asian populations, revealing significant species and strain-level diversity, with thousands of previously uncharacterized biosynthetic gene clusters.

Jean-Sebastien Gounot, Minghao Chia & Niranjan Nagarajan

Article
07 October 2022 | Open Access

Prevalence and mechanisms of somatic deletions in single human neurons during normal aging and in DNA repair disorders

DNA damage has been implicated in aging and neurodegeneration. Here, the authors develop a bioinformatic method to detect deletions in single neuron genome sequences and reveal an increased burden of somatic deletions during aging and in DNA repair disorders.

Junho Kim, August Yue Huang & Eunjung Alice Lee

Article
29 September 2022 | Open Access

Deciphering microbial gene function using natural language processing

The function of many microbial genes is still unknown. Here the authors repurpose natural language processing algorithms to explore “gene semantics” and infer function for thousands of genes, with defense and secretion systems found to have the most discovery potential.

Danielle Miller, Adi Stern & David Burstein

Article
26 September 2022 | Open Access

Modeling tissue-specific breakpoint proximity of structural variations from whole-genomes to identify cancer drivers

Identifying structural variants (SVs) under positive selection in cancer is challenging. Here, the authors develop CSVDriver, a method that computes SV breakpoint proximity and the contribution of elements such as topologically associating domains, and identifies loci that show signs of positive selection and contain known and putative drivers.

Alexander Martinez-Fundichely, Austin Dixon & Ekta Khurana

Article
06 September 2022 | Open Access

The mutational signatures of formalin fixation on the human genome

Many archived tumour samples are stored as formalin-fixed and paraffin-embedded (FFPE) blocks, but this treatment can impact downstream genomics analyses. Here, the authors derive the mutational signatures of formalin on the cancer genome, and present FFPEsig, an algorithm that can distinguish and correct FFPE mutational signatures in archived cancer samples.

Qingli Guo, Eszter Lakatos & Ville Mustonen

Article
08 August 2022 | Open Access

Scarf enables a highly memory-efficient analysis of large-scale single-cell genomics data

As the scale of single-cell genomics experiments grows into the millions, the computational requirements to process this data are beyond the reach of many. Here the authors present Scarf, a modularly designed Python package that makes the analysis workflow highly memory efficient such that even the largest existing datasets can be analyzed on an average modern laptop.

Parashar Dhapola, Johan Rodhe & Göran Karlsson

Article
04 August 2022 | Open Access

Pan-African genome demonstrates how population-specific genome graphs improve high-throughput sequencing data analysis

Graph-based genome reference representations have seen significant development, motivated by the inadequacy of the current human genome reference to represent the diverse genetic information from different human populations and its inability to maintain the same level of accuracy for non-European ancestries. Here the authors present the case for iteratively augmenting tailored genome graphs for targeted populations and demonstrate this approach on the whole-genome samples of African ancestry.

H. Serhat Tetikol, Deniz Turgut & Brandi N. Davis-Dusenbery

Article
25 July 2022 | Open Access

Extreme purifying selection against point mutations in the human genome

Previous work has investigated selection in the coding genome, but it is not as well characterized in the non-coding genome. By analyzing rare variants in 70k genome sequences from gnomAD, the authors detect very strong purifying selection (“ultraselection”) across the human genome, finding it in some microRNAs and coding sequences but generally rare in regulatory sequences.

Noah Dukler, Mehreen R. Mughal & Adam Siepel

Article
14 July 2022 | Open Access

Identifying multicellular spatiotemporal organization of cells with SpaceFlow

A critical task in spatial transcriptomics analysis is to understand inherently spatial relationships among cells. Here, the authors present a deep learning framework to integrate spatial and transcriptional information, spatially extending pseudotime and revealing spatiotemporal organization of cells.

Honglei Ren, Benjamin L. Walker & Qing Nie

Article
25 June 2022 | Open Access

Occult polyclonality of preclinical pancreatic cancer models drives in vitro evolution

It is unclear if the molecular profiles of pancreatic ductal adenocarcinoma (PDAC) preclinical models remain stable during propagation. Here, the authors characterise clonal evolution throughout propagation in PDAC cell lines and a patient-derived organoid using single-cell genomics, transcriptomics and epigenomics.

Maria E. Monberg, Heather Geiger & Anirban Maitra

Article
09 June 2022 | Open Access

Robust and accurate estimation of paralog-specific copy number for duplicated genes using whole-genome sequencing

Low-copy repeats cover up to 5% of the human genome and are prone to extensive copy number variation. Here, the authors present a novel computational method to estimate paralog-specific copy number of such regions using whole-genome sequencing.

Timofey Prodanov & Vikas Bansal

Article
31 May 2022 | Open Access

Structural variant-based pangenome construction has low sensitivity to variability of haplotype-resolved bovine assemblies

Pangenomes have a number of advantages over linear reference assemblies. Here the authors use bovine haplotype-resolved assemblies to show that structural variant-based pangenomes are consistent regardless of sequence platform, assembler, or coverage, suggesting that rigid protocols may not be required.

Alexander S. Leonard, Danang Crysnanto & Hubert Pausch

Article
26 May 2022 | Open Access

Genomic analyses of 10,376 individuals in the Westlake BioBank for Chinese (WBBC) pilot project

Biobanks of genetic data have been primarily in European populations, which gives us an incomplete understanding of complex traits across populations. Here, the authors initiate the Westlake BioBank for Chinese (WBBC) pilot project with 4,535 whole genome sequences and 5,841 high-density genotypes from China, characterizing large-scale genomic variation in Chinese populations.

Pei-Kuan Cong, Wei-Yang Bai & Hou-Feng Zheng

Article
26 May 2022 | Open Access

An integral genomic signature approach for tailored cancer therapy using genome-wide sequencing data

Predicting drug responses in cancer patients requires robust computational frameworks. Here, the authors develop an integral genomic signature (iGenSig) approach to predict drug responses using multi-omics data from tumour samples, and validate this approach using genomic datasets from multiple clinical studies.

Xiao-Song Wang, Sanghoon Lee & Yue Wang

Article
11 May 2022 | Open Access

3D chromatin remodelling in the germ line modulates genome evolutionary plasticity

The role of genome folding in the heritability and evolvability of structural variations is not well understood. Here the authors investigate the impact of the three-dimensional genome topology of germ cells in the formation and transmission of gross structural genomic changes detected from comparing whole-genome sequences of 14 rodent species.

Lucía Álvarez-González, Frances Burden & Aurora Ruiz-Herrera

Article
19 April 2022 | Open Access

An evolutionarily conserved stop codon enrichment at the 5′ ends of mammalian piRNAs

Piwi-interacting RNAs are small RNAs produced by processing of long precursor transcripts. Here the authors report that precursor cleavage typically occurs at positions corresponding to stop codons and that this pattern is conserved across mammals.

Susanne Bornelöv, Benjamin Czech & Gregory J. Hannon

Article
14 April 2022 | Open Access

Differences in RNA polymerase II complexes and their interactions with surrounding chromatin on human and cytomegalovirus genomes

Here the authors digested chromatin with DNA fragmentation factor (DFF) prior to chromatin immunoprecipitation (DFF-ChIP) to depict transcription complex interactions with neighboring nucleosomes in cells. Applying this method to human cytomegalovirus (HCMV)-infected cells, they find that the viral genome is underchromatinized, leading to fewer transcription complex interactions with nucleosomes.

Benjamin M. Spector, Mrutyunjaya Parida & David H. Price

Article
12 April 2022 | Open Access

Anopheles mosquitoes reveal new principles of 3D genome organization in insects

Anopheles mosquitoes are vectors of human malaria, and better understanding of them has implications for public health. Here, the authors apply Hi-C, FISH, RNA-seq, and ChIP-seq techniques to comprehensively characterize chromatin architecture and its evolutionary dynamics in five Anopheles species.

Varvara Lukyanchikova, Miroslav Nuriddinov & Veniamin Fishman

Article
06 April 2022 | Open Access

Zoonotic origin of the human malaria parasite Plasmodium malariae from African apes

Plasmodium malariae is a cause of malaria in humans and related species have been identified in non-human primates. Here, the authors use genomic analyses to establish that human P. malariae arose from a host switch of an ape parasite whilst a species infecting New World monkeys can be traced to a reverse zoonosis.

Lindsey J. Plenderleith, Weimin Liu & Paul M. Sharp

Article
29 March 2022 | Open Access

Empirical prediction of variant-activated cryptic splice donors using population-based RNA-Seq data

Genetic variants affecting the consensus splicing motifs can alter binding of spliceosomal components and induce mis-splicing. Here, the authors develop a method showing that ranking the most common recurring mis-splicing events in public RNA-Seq data can predict the activation of cryptic donors.

Ruebena Dawes, Himanshu Joshi & Sandra T. Cooper

Article
02 March 2022 | Open Access

Antimicrobial resistance and population genomics of multidrug-resistant Escherichia coli in pig farms in mainland China

Use of antimicrobials in livestock contributes to development of antimicrobial resistance but there are few large-scale surveillance studies. Here, the authors describe E. coli surveillance in pig farms in China, reporting high levels of multidrug-resistance across all mainland provinces.

Zhong Peng, Zizhe Hu & Xiangru Wang

Article
18 February 2022 | Open Access

Genome binning of viral entities from bulk metagenomics data

Here, Johansen et al. develop an approach, Phages from Metagenomics Binning (PHAMB), that allows the binning of thousands of viral genomes directly from bulk metagenomics data, while simultaneously enabling clustering of viral genomes into accurate taxonomic viral populations, unveiling viral-microbial host interactions in the gut.

Joachim Johansen, Damian R. Plichta & Simon Rasmussen

Article
27 January 2022 | Open Access

Population structure analysis and laboratory monitoring of Shigella by core-genome multilocus sequence typing

Lab-based surveillance of Shigella has traditionally been based on serotyping but increasing availability of whole genome sequencing could enable higher resolution typing. Here, the authors apply a core genome multilocus sequence typing scheme to Shigella sequence data and describe its population structure.

Iman Yassine, Sophie Lefèvre & François-Xavier Weill

Article
17 January 2022 | Open Access

Epigenetic aging of the demographically non-aging naked mole-rat

The exceptionally long-lived naked mole-rat is characterized by the lack of increased mortality with aging. Here the authors perform epigenetic studies to show that naked mole-rats epigenetically age despite their non-increasing mortality rate.

Csaba Kerepesi, Margarita V. Meer & Vadim N. Gladyshev

Article
10 January 2022 | Open Access

Prediction of biomarkers and therapeutic combinations for anti-PD-1 immunotherapy using the global gene network association

Many cancer patients do not respond to anti-PD-1 therapy. Here, the authors develop a network approach to identify genes, pathways and potential therapeutic combinations and develop an MHC-I gene immunoscore associated with tumour response to anti-PD-1.

Chia-Chin Wu, Y. Alan Wang & P. Andrew Futreal

Article
26 November 2021 | Open Access

Whole-genome analysis of Nigerian patients with breast cancer reveals ethnic-driven somatic evolution and distinct genomic subtypes

Breast cancer heterogeneity and tumour evolutionary trajectories remain largely unknown among women of African ancestry. Here, the authors perform whole genome and transcriptome sequencing of Nigerian breast cancer patients and identify unique evolutionary phenomena.

Naser Ansari-Pour, Yonglan Zheng & Olufunmilayo I. Olopade

Article
18 November 2021 | Open Access

Jumper enables discontinuous transcript assembly in coronaviruses

Here, the authors develop a graph-based method for the assembly of discontinuous transcripts produced in coronaviruses and other Nidovirales, enabling the discovery of transcriptional changes missed by existing methods.

Palash Sashittal, Chuanyi Zhang & Mohammed El-Kebir

Article
04 November 2021 | Open Access

Benchmarking pipelines for subclonal deconvolution of bulk tumour sequencing data

Subclonal deconvolution in cancer sequencing data is a complex task, and the optimal tools to use are unclear. Here, the authors systematically benchmark subclonal deconvolution pipelines with a comprehensive set of simulated tumour genomes and identify the best-performing methods.

Georgette Tanner, David R. Westhead & Lucy F. Stead

Article
07 September 2021 | Open Access

Different historical generation intervals in human populations inferred from Neanderthal fragment lengths and mutation signatures

Historical interbreeding between Neanderthals and humans should leave signatures of historical demographics in modern human genomes. Analysing the size distribution of Neanderthal fragments in non-African genomes suggests consistent differences in the generation interval across Eurasia, and that this could explain mutational spectrum variation.

Moisès Coll Macià, Laurits Skov & Mikkel Heide Schierup

Article
25 August 2021 | Open Access

Evidence for opposing selective forces operating on human-specific duplicated TCAF genes in Neanderthals and humans

Duplications of gene segments can allow novel physiological adaptations to evolve. A detailed analysis of the TCAF gene family in primates and archaic humans suggests that rapid duplication and diversification in this gene family is associated with cold or dietary adaptations.

PingHsun Hsieh, Vy Dang & Evan E. Eichler

Article
24 August 2021 | Open Access

The landscape of alternative polyadenylation in single cells of the developing mouse embryo

Alternative polyadenylation regulates localization, half-life and translation of mRNA isoforms. Here the authors investigate alternative polyadenylation using single cell RNA sequencing data from mouse embryos and identify 3’-UTR isoforms that are regulated across cell types and developmental time.

Vikram Agarwal, Sereno Lopez-Darwin & Jay Shendure

Article
04 August 2021 | Open Access

GIANA allows computationally-efficient TCR clustering and multi-disease repertoire classification by isometric transformation

Grouping T-cell receptors (TCRs) by sequence similarity could lead to new immunological insights. Here, the authors propose a tool that allows the rapid clustering of millions of TCR sequences, identifying TCRs potentially associated with the response to cancer, infectious and autoimmune diseases.

Hongyi Zhang, Xiaowei Zhan & Bo Li

Article
03 August 2021 | Open Access

Haploinsufficiency of SF3B2 causes craniofacial microsomia

Despite being a common congenital facial anomaly, the genetic etiology of craniofacial microsomia (CFM) is not well understood. Here, the authors use exome and genome sequencing of 146 individuals with CFM to identify haploinsufficient variants in SF3B2 as a prevalent underlying cause.

Andrew T. Timberlake, Casey Griffin & Daniela V. Luquetti

Article
23 July 2021 | Open Access

Strainberry: automated strain separation in low-complexity metagenomes using long reads

Existing long-read de novo assembly methods can partially, but not completely, separate strains. Here, the authors develop Strainberry, a metagenome assembly bioinformatic pipeline that exclusively uses long-read data to accurately separate and reconstruct strain genomes from single-sample low-complexity microbiomes.

Riccardo Vicedomini, Christopher Quince & Rayan Chikhi

Article
12 July 2021 | Open Access

Profiling variable-number tandem repeat variation across populations using repeat-pangenome graphs

Variable number tandem repeats (VNTRs) are difficult to analyze by short-read sequencing in disease studies. Here, the authors describe a VNTR mapping strategy for short-read analyses using a repeat pangenome graph. This method will help elucidate the contribution of VNTRs to diversity and disease.

Tsung-Yu Lu, Katherine M. Munson & Mark J. P. Chaisson

Article
22 June 2021 | Open Access

Comprehensive identification of transposable element insertions using multiple sequencing technologies

Identification of transposable element (TE) insertions from whole genome sequencing data remains challenging. Here the authors develop a comprehensive TE insertion detection algorithm, xTea, that can be applied to both short-read and long-read sequencing data.

Chong Chu, Rebeca Borges-Monroy & Peter J. Park

Article
10 June 2021 | Open Access

Rapid detection of identity-by-descent tracts for mega-scale datasets

Traditional methods to identify genomic regions identical-by-descent (IBD) do not scale well to biobank-level datasets. Here, the authors describe a new IBD algorithm, iLASH, which uses LocAlity-Sensitive Hashing to provide rapid IBD estimation when applied to the PAGE and UK Biobank datasets.

Ruhollah Shemirani, Gillian M. Belbin & José Luis Ambite

Article
08 June 2021 | Open Access

Systematic benchmarking of tools for CpG methylation detection from nanopore sequencing

Several existing algorithms predict the methylation of DNA using Nanopore sequencing signals, but it is unclear how they compare in performance. Here, the authors benchmark the performance of several such tools, and propose METEORE, a consensus tool that improves prediction accuracy.

Zaka Wing-Sze Yuen, Akanksha Srivastava & Eduardo Eyras

Article
17 May 2021 | Open Access

A global resource for genomic predictions of antimicrobial resistance and surveillance of Salmonella Typhi at pathogenwatch

Whole genome sequencing data are increasingly becoming routinely available but generating actionable insights is challenging. Here, the authors describe Pathogenwatch, a web tool for genomic surveillance of S. Typhi, and demonstrate its use for antimicrobial resistance assignment and strain risk assessment.

Silvia Argimón, Corin A. Yeats & David M. Aanensen

Article
28 April 2021 | Open Access

Extended haplotype-phasing of long-read de novo genome assemblies using Hi-C

Methods to produce haplotype-resolved genome assemblies often rely on access to family trios. The authors present FALCON-Phase, a tool that combines ultra-long range Hi-C chromatin interaction data with a long read de novo assembly to extend haplotype phasing to the contig or scaffold level.

Zev N. Kronenberg, Arang Rhie & Sarah B. Kingan

Article
13 April 2021 | Open Access

Single cell transcriptional and chromatin accessibility profiling redefine cellular heterogeneity in the adult human kidney

Single cell transcriptomic and epigenomic sequencing of human kidney highlights diverse cell types and states. These findings help characterize a novel population of injured proximal tubule cells and illustrate the power of multi-omic approaches to characterizing human tissue.

Yoshiharu Muto, Parker C. Wilson & Benjamin D. Humphreys

Article
12 April 2021 | Open Access

Uncovering transcriptional dark matter via gene annotation independent single-cell RNA sequencing analysis

Conventional single-cell RNA sequencing analyses rely on genome annotations that may be incomplete or inaccurate, especially for understudied organisms. Here the authors present a bioinformatic tool that leverages single-cell data to uncover biologically relevant transcripts beyond the best available genome annotation.

Michael F. Z. Wang, Madhav Mantri & Iwijn De Vlaminck

Article
25 March 2021 | Open Access

Model-based deep embedding for constrained clustering analysis of single cell RNA-seq data

Clustering cells based on similarities in gene expression is the first step towards identifying cell types in scRNASeq data. Here the authors incorporate biological knowledge into the clustering step to facilitate the biological interpretability of clusters, and subsequent cell type identification.

Tian Tian, Jie Zhang & Hakon Hakonarson

Article
19 March 2021 | Open Access

The transcriptional landscape of Shh medulloblastoma

Sonic Hedgehog medulloblastoma (Shh-MB) comprises four subtypes each with distinct clinical traits. Here the authors characterize the genome, transcriptome, and methylome of Shh-MB subtypes, revealing a complex fusion landscape and the molecular convergence of MYCN and cAMP signaling pathways.

Patryk Skowron, Hamza Farooq & Michael D. Taylor

Article
12 March 2021 | Open Access

Integration of Alzheimer’s disease genetics and myeloid genomics identifies disease risk regulatory elements and genes

This study integrates Alzheimer’s disease (AD) GWAS data with myeloid cell genomics, and reports that myeloid active enhancers are most burdened by AD risk alleles. The authors also nominate candidate causal regulatory elements, variants and genes that likely modulate the risk for AD.

Gloriia Novikova, Manav Kapoor & Alison M. Goate

Article
26 February 2021 | Open Access

PlasmidHawk improves lab of origin prediction of engineered plasmids using sequence alignment

Advances in synthetic biology and genome engineering raise awareness of potential misuse. Here, the authors present PlasmidHawk, a sequence alignment based method for lab-of-origin prediction.

Qi Wang, Bryce Kille & Todd J. Treangen

Article
16 February 2021 | Open Access

Analysis of metagenome-assembled viral genomes from the human gut reveals diverse putative CrAss-like phages with unique genomic features

Here, the authors analyze 4907 Circular Metagenome Assembled Genomes from human microbiomes and identify and characterize nearly 600 diverse genomes of crAss-like phages, finding two putative families with unusual genomic features, including high density of self-splicing introns and inteins.

Natalya Yutin, Sean Benler & Eugene V. Koonin

Article
08 February 2021 | Open Access

Genome-wide fine-mapping identifies pleiotropic and functional variants that predict many traits across global cattle populations

Genomic prediction of phenotype may be improved by using DNA mutations with functional, evolutionary, and pleiotropic consequences. Here the authors describe a method for genome-wide fine-mapping of QTLs and develop a genotyping array for improved prediction of genetic values for cattle traits.

Ruidong Xiang, Iona M. MacLeod & Michael E. Goddard

Genome informatics articles within Nature Communications
