Article | Open Access
Detection and prevalence of SARS-CoV-2 co-infections during the Omicron variant circulation in France
Monitoring of co-infections of SARS-CoV-2 variants is important to evaluate their clinical impact and the risk of emergence of recombinants. Here, the authors develop and validate a methodological pipeline to detect co-infections and apply it to samples from France in early 2022, when Delta and Omicron were co-circulating.
Antonin Bal, Bruno Simon & Laurence Josset

Article | Open Access
Genome-centric analysis of short and long read metagenomes reveals uncharacterized microbiome diversity in Southeast Asians
Reference genomes for gut microbiomes help unravel microbial “dark matter” and serve as a valuable resource for disease-focused studies. Here, the authors perform short- and long-read metagenomics and metagenome-assembled genome analyses to profile the gut microbiome of Southeast Asian populations, revealing significant species- and strain-level diversity, with thousands of previously uncharacterized biosynthetic gene clusters.
Jean-Sebastien Gounot, Minghao Chia & Niranjan Nagarajan

Article | Open Access
Prevalence and mechanisms of somatic deletions in single human neurons during normal aging and in DNA repair disorders
DNA damage has been implicated in aging and neurodegeneration. Here, the authors develop a bioinformatic method to detect deletions in single neuron genome sequences and reveal an increased burden of somatic deletions during aging and in DNA repair disorders.
Junho Kim, August Yue Huang & Eunjung Alice Lee

Article | Open Access
Deciphering microbial gene function using natural language processing
The function of many microbial genes is still unknown. Here, the authors repurpose natural language processing algorithms to explore “gene semantics” and infer the function of thousands of genes, finding that defense and secretion systems have the most discovery potential.
Danielle Miller, Adi Stern & David Burstein

Article | Open Access
Modeling tissue-specific breakpoint proximity of structural variations from whole-genomes to identify cancer drivers
Identifying structural variants (SVs) under positive selection in cancer is challenging. Here, the authors develop CSVDriver, a method that computes SV breakpoint proximity and the contribution of elements such as topologically associating domains, and identifies loci that show signs of positive selection and contain known and putative drivers.
Alexander Martinez-Fundichely, Austin Dixon & Ekta Khurana

Article | Open Access
The mutational signatures of formalin fixation on the human genome
Many archived tumour samples are stored as formalin-fixed and paraffin-embedded (FFPE) blocks, but this treatment can impact downstream genomics analyses. Here, the authors derive the mutational signatures of formalin on the cancer genome, and present FFPEsig, an algorithm that can distinguish and correct FFPE mutational signatures in archived cancer samples.
Qingli Guo, Eszter Lakatos & Ville Mustonen

Article | Open Access
Scarf enables a highly memory-efficient analysis of large-scale single-cell genomics data
As the scale of single-cell genomics experiments grows into the millions, the computational requirements to process this data are beyond the reach of many. Here the authors present Scarf, a modularly designed Python package that makes the analysis workflow highly memory efficient such that even the largest existing datasets can be analyzed on an average modern laptop.
Parashar Dhapola, Johan Rodhe & Göran Karlsson

Article | Open Access
Pan-African genome demonstrates how population-specific genome graphs improve high-throughput sequencing data analysis
Graph-based genome reference representations have seen significant development, motivated by the inadequacy of the current human reference genome to represent the diverse genetic information of different human populations and its inability to maintain the same level of accuracy for non-European ancestries. Here, the authors make the case for iteratively augmenting genome graphs tailored to targeted populations and demonstrate this approach on whole-genome samples of African ancestry.
H. Serhat Tetikol, Deniz Turgut & Brandi N. Davis-Dusenbery

Article | Open Access
Extreme purifying selection against point mutations in the human genome
Previous work has investigated selection in the coding genome, but selection is not as well characterized in the non-coding genome. By analyzing rare variants in 70k genome sequences from gnomAD, the authors detect very strong purifying selection (“ultraselection”) across the human genome, finding it in some microRNAs and coding sequences but generally rare in regulatory sequences.
Noah Dukler, Mehreen R. Mughal & Adam Siepel

Article | Open Access
Identifying multicellular spatiotemporal organization of cells with SpaceFlow
A critical task in spatial transcriptomics analysis is to understand inherently spatial relationships among cells. Here, the authors present a deep learning framework to integrate spatial and transcriptional information, spatially extending pseudotime and revealing spatiotemporal organization of cells.
Honglei Ren, Benjamin L. Walker & Qing Nie

Article | Open Access
Occult polyclonality of preclinical pancreatic cancer models drives in vitro evolution
It is unclear if the molecular profiles of pancreatic ductal adenocarcinoma (PDAC) preclinical models remain stable during propagation. Here, the authors characterise clonal evolution throughout propagation in PDAC cell lines and a patient-derived organoid using single-cell genomics, transcriptomics and epigenomics.
Maria E. Monberg, Heather Geiger & Anirban Maitra

Article | Open Access
Robust and accurate estimation of paralog-specific copy number for duplicated genes using whole-genome sequencing
Low-copy repeats cover up to 5% of the human genome and are prone to extensive copy number variation. Here, the authors present a novel computational method to estimate paralog-specific copy number of such regions using whole-genome sequencing.
Timofey Prodanov & Vikas Bansal

Article | Open Access
Structural variant-based pangenome construction has low sensitivity to variability of haplotype-resolved bovine assemblies
Pangenomes have a number of advantages over linear reference assemblies. Here the authors use bovine haplotype-resolved assemblies to show that structural variant-based pangenomes are consistent regardless of sequence platform, assembler, or coverage, suggesting that rigid protocols may not be required.
Alexander S. Leonard, Danang Crysnanto & Hubert Pausch

Article | Open Access
Genomic analyses of 10,376 individuals in the Westlake BioBank for Chinese (WBBC) pilot project
Biobanks of genetic data have been built primarily from European populations, which leaves an incomplete understanding of complex traits across populations. Here, the authors initiate the Westlake BioBank for Chinese (WBBC) pilot project with 4,535 whole genome sequences and 5,841 high-density genotypes from China, characterizing large-scale genomic variation in Chinese populations.
Pei-Kuan Cong, Wei-Yang Bai & Hou-Feng Zheng

Article | Open Access
An integral genomic signature approach for tailored cancer therapy using genome-wide sequencing data
Predicting drug responses in cancer patients requires robust computational frameworks. Here, the authors develop an integral genomic signature (iGenSig) approach to predict drug responses using multi-omics data from tumour samples, and validate this approach using genomic datasets from multiple clinical studies.
Xiao-Song Wang, Sanghoon Lee & Yue Wang

Article | Open Access
3D chromatin remodelling in the germ line modulates genome evolutionary plasticity
The role of genome folding in the heritability and evolvability of structural variations is not well understood. Here the authors investigate the impact of the three-dimensional genome topology of germ cells on the formation and transmission of gross structural genomic changes detected by comparing whole-genome sequences of 14 rodent species.
Lucía Álvarez-González, Frances Burden & Aurora Ruiz-Herrera

Article | Open Access
An evolutionarily conserved stop codon enrichment at the 5′ ends of mammalian piRNAs
Piwi-interacting RNAs are small RNAs produced by processing of long precursor transcripts. Here the authors report that precursor cleavage typically occurs at positions corresponding to stop codons and that this pattern is conserved across mammals.
Susanne Bornelöv, Benjamin Czech & Gregory J. Hannon

Article | Open Access
Differences in RNA polymerase II complexes and their interactions with surrounding chromatin on human and cytomegalovirus genomes
Here, the authors digest chromatin with DNA fragmentation factor (DFF) prior to chromatin immunoprecipitation (DFF-ChIP) to depict transcription complex interactions with neighboring nucleosomes in cells. Applying this method to human cytomegalovirus (HCMV)-infected cells, they find that the viral genome is underchromatinized, leading to fewer transcription complex interactions with nucleosomes.
Benjamin M. Spector, Mrutyunjaya Parida & David H. Price

Article | Open Access
Anopheles mosquitoes reveal new principles of 3D genome organization in insects
Anopheles mosquitoes are vectors of human malaria, and better understanding of them has implications for public health. Here, the authors apply Hi-C, FISH, RNA-seq, and ChIP-seq techniques to comprehensively characterize chromatin architecture and its evolutionary dynamics in five Anopheles species.
Varvara Lukyanchikova, Miroslav Nuriddinov & Veniamin Fishman

Article | Open Access
Zoonotic origin of the human malaria parasite Plasmodium malariae from African apes
Plasmodium malariae is a cause of malaria in humans and related species have been identified in non-human primates. Here, the authors use genomic analyses to establish that human P. malariae arose from a host switch of an ape parasite whilst a species infecting New World monkeys can be traced to a reverse zoonosis.
Lindsey J. Plenderleith, Weimin Liu & Paul M. Sharp

Article | Open Access
Empirical prediction of variant-activated cryptic splice donors using population-based RNA-Seq data
Genetic variants affecting the consensus splicing motifs can alter binding of spliceosomal components and induce mis-splicing. Here, the authors develop a method showing that ranking the most common recurring mis-splicing events in public RNA-Seq data can predict the activation of cryptic donors.
Ruebena Dawes, Himanshu Joshi & Sandra T. Cooper

Article | Open Access
Antimicrobial resistance and population genomics of multidrug-resistant Escherichia coli in pig farms in mainland China
Use of antimicrobials in livestock contributes to the development of antimicrobial resistance, but there are few large-scale surveillance studies. Here, the authors describe E. coli surveillance in pig farms in China, reporting high levels of multidrug resistance across all mainland provinces.
Zhong Peng, Zizhe Hu & Xiangru Wang

Article | Open Access
Genome binning of viral entities from bulk metagenomics data
Here, Johansen et al. develop an approach, Phages from Metagenomics Binning (PHAMB), that allows the binning of thousands of viral genomes directly from bulk metagenomics data, while simultaneously enabling clustering of viral genomes into accurate taxonomic viral populations, unveiling viral-microbial host interactions in the gut.
Joachim Johansen, Damian R. Plichta & Simon Rasmussen

Article | Open Access
Population structure analysis and laboratory monitoring of Shigella by core-genome multilocus sequence typing
Lab-based surveillance of Shigella has traditionally been based on serotyping, but the increasing availability of whole genome sequencing could enable higher-resolution typing. Here, the authors apply a core genome multilocus sequence typing scheme to Shigella sequence data and describe its population structure.
Iman Yassine, Sophie Lefèvre & François-Xavier Weill

Article | Open Access
Epigenetic aging of the demographically non-aging naked mole-rat
The exceptionally long-lived naked mole-rat is characterized by the lack of increased mortality with aging. Here the authors perform epigenetic studies to show that naked mole-rats epigenetically age despite their non-increasing mortality rate.
Csaba Kerepesi, Margarita V. Meer & Vadim N. Gladyshev

Article | Open Access
Prediction of biomarkers and therapeutic combinations for anti-PD-1 immunotherapy using the global gene network association
Many cancer patients do not respond to anti-PD-1 therapy. Here, the authors develop a network approach to identify genes, pathways and potential therapeutic combinations, and develop an MHC-I gene immunoscore associated with tumour response to anti-PD-1.
Chia-Chin Wu, Y. Alan Wang & P. Andrew Futreal

Article | Open Access
Whole-genome analysis of Nigerian patients with breast cancer reveals ethnic-driven somatic evolution and distinct genomic subtypes
Breast cancer heterogeneity and tumour evolutionary trajectories remain largely unknown among women of African ancestry. Here, the authors perform whole genome and transcriptome sequencing of Nigerian breast cancer patients and identify unique evolutionary phenomena.
Naser Ansari-Pour, Yonglan Zheng & Olufunmilayo I. Olopade

Article | Open Access
Jumper enables discontinuous transcript assembly in coronaviruses
Here, the authors develop a graph-based method for the assembly of discontinuous transcripts produced in coronaviruses and other Nidovirales, enabling the discovery of transcriptional changes missed by existing methods.
Palash Sashittal, Chuanyi Zhang & Mohammed El-Kebir

Article | Open Access
Benchmarking pipelines for subclonal deconvolution of bulk tumour sequencing data
Subclonal deconvolution in cancer sequencing data is a complex task, and the optimal tools to use are unclear. Here, the authors systematically benchmark subclonal deconvolution pipelines with a comprehensive set of simulated tumour genomes and identify the best-performing methods.
Georgette Tanner, David R. Westhead & Lucy F. Stead

Article | Open Access
Different historical generation intervals in human populations inferred from Neanderthal fragment lengths and mutation signatures
Historical interbreeding between Neanderthals and humans should leave signatures of historical demographics in modern human genomes. Analysing the size distribution of Neanderthal fragments in non-African genomes suggests consistent differences in the generation interval across Eurasia, and that this could explain mutational spectrum variation.
Moisès Coll Macià, Laurits Skov & Mikkel Heide Schierup

Article | Open Access
Evidence for opposing selective forces operating on human-specific duplicated TCAF genes in Neanderthals and humans
Duplications of gene segments can allow novel physiological adaptations to evolve. A detailed analysis of the TCAF gene family in primates and archaic humans suggests that rapid duplication and diversification in this gene family are associated with cold or dietary adaptations.
PingHsun Hsieh, Vy Dang & Evan E. Eichler

Article | Open Access
The landscape of alternative polyadenylation in single cells of the developing mouse embryo
Alternative polyadenylation regulates localization, half-life and translation of mRNA isoforms. Here the authors investigate alternative polyadenylation using single cell RNA sequencing data from mouse embryos and identify 3’-UTR isoforms that are regulated across cell types and developmental time.
Vikram Agarwal, Sereno Lopez-Darwin & Jay Shendure

Article | Open Access
GIANA allows computationally-efficient TCR clustering and multi-disease repertoire classification by isometric transformation
Grouping T-cell receptors (TCRs) by sequence similarity could lead to new immunological insights. Here, the authors propose a tool that allows the rapid clustering of millions of TCR sequences, identifying TCRs potentially associated with the response to cancer, infectious and autoimmune diseases.
Hongyi Zhang, Xiaowei Zhan & Bo Li

Article | Open Access
Haploinsufficiency of SF3B2 causes craniofacial microsomia
Despite being a common congenital facial anomaly, the genetic etiology of craniofacial microsomia (CFM) is not well understood. Here, the authors use exome and genome sequencing of 146 individuals with CFM to identify haploinsufficient variants in SF3B2 as a prevalent underlying cause.
Andrew T. Timberlake, Casey Griffin & Daniela V. Luquetti

Article | Open Access
Strainberry: automated strain separation in low-complexity metagenomes using long reads
Existing long-read de novo assembly methods can partially, but not completely, separate strains. Here, the authors develop Strainberry, a metagenome assembly bioinformatic pipeline that exclusively uses long-read data to accurately separate and reconstruct strain genomes from single-sample low-complexity microbiomes.
Riccardo Vicedomini, Christopher Quince & Rayan Chikhi

Article | Open Access
Profiling variable-number tandem repeat variation across populations using repeat-pangenome graphs
Variable number tandem repeats (VNTRs) are difficult to analyze by short-read sequencing in disease studies. Here, the authors describe a VNTR mapping strategy for short-read analyses using a repeat pangenome graph. This method will help elucidate the contribution of VNTRs to diversity and disease.
Tsung-Yu Lu, Katherine M. Munson & Mark J. P. Chaisson

Article | Open Access
Comprehensive identification of transposable element insertions using multiple sequencing technologies
Identification of transposable element (TE) insertions from whole genome sequencing data remains challenging. Here, the authors develop xTea, a comprehensive TE insertion detection algorithm that can be applied to both short-read and long-read sequencing data.
Chong Chu, Rebeca Borges-Monroy & Peter J. Park

Article | Open Access
Rapid detection of identity-by-descent tracts for mega-scale datasets
Traditional methods to identify genomic regions identical-by-descent (IBD) do not scale well to biobank-level datasets. Here, the authors describe a new IBD algorithm, iLASH, which uses LocAlity-Sensitive Hashing to provide rapid IBD estimation when applied to the PAGE and UK Biobank datasets.
Ruhollah Shemirani, Gillian M. Belbin & José Luis Ambite

Article | Open Access
Systematic benchmarking of tools for CpG methylation detection from nanopore sequencing
Several existing algorithms predict the methylation of DNA using Nanopore sequencing signals, but it is unclear how they compare in performance. Here, the authors benchmark the performance of several such tools, and propose METEORE, a consensus tool that improves prediction accuracy.
Zaka Wing-Sze Yuen, Akanksha Srivastava & Eduardo Eyras

Article | Open Access
A global resource for genomic predictions of antimicrobial resistance and surveillance of Salmonella Typhi at pathogenwatch
Whole genome sequencing data are increasingly becoming routinely available but generating actionable insights is challenging. Here, the authors describe Pathogenwatch, a web tool for genomic surveillance of S. Typhi, and demonstrate its use for antimicrobial resistance assignment and strain risk assessment.
Silvia Argimón, Corin A. Yeats & David M. Aanensen

Article | Open Access
Extended haplotype-phasing of long-read de novo genome assemblies using Hi-C
Methods to produce haplotype-resolved genome assemblies often rely on access to family trios. The authors present FALCON-Phase, a tool that combines ultra-long range Hi-C chromatin interaction data with a long read de novo assembly to extend haplotype phasing to the contig or scaffold level.
Zev N. Kronenberg, Arang Rhie & Sarah B. Kingan

Article | Open Access
Single cell transcriptional and chromatin accessibility profiling redefine cellular heterogeneity in the adult human kidney
Single cell transcriptomic and epigenomic sequencing of the human kidney highlights diverse cell types and states. These findings help characterize a novel population of injured proximal tubule cells and illustrate the power of multi-omic approaches to characterizing human tissue.
Yoshiharu Muto, Parker C. Wilson & Benjamin D. Humphreys

Article | Open Access
Uncovering transcriptional dark matter via gene annotation independent single-cell RNA sequencing analysis
Conventional single-cell RNA sequencing analyses rely on genome annotations that may be incomplete or inaccurate, especially for understudied organisms. Here the authors present a bioinformatic tool that leverages single-cell data to uncover biologically relevant transcripts beyond the best available genome annotation.
Michael F. Z. Wang, Madhav Mantri & Iwijn De Vlaminck

Article | Open Access
Model-based deep embedding for constrained clustering analysis of single cell RNA-seq data
Clustering cells based on similarities in gene expression is the first step towards identifying cell types in scRNA-seq data. Here the authors incorporate biological knowledge into the clustering step to facilitate the biological interpretability of clusters and subsequent cell type identification.
Tian Tian, Jie Zhang & Hakon Hakonarson

Article | Open Access
The transcriptional landscape of Shh medulloblastoma
Sonic Hedgehog medulloblastoma (Shh-MB) comprises four subtypes each with distinct clinical traits. Here the authors characterize the genome, transcriptome, and methylome of Shh-MB subtypes, revealing a complex fusion landscape and the molecular convergence of MYCN and cAMP signaling pathways.
Patryk Skowron, Hamza Farooq & Michael D. Taylor

Article | Open Access
Integration of Alzheimer’s disease genetics and myeloid genomics identifies disease risk regulatory elements and genes
This study integrates Alzheimer’s disease (AD) GWAS data with myeloid cell genomics, and reports that myeloid active enhancers are most burdened by AD risk alleles. The authors also nominate candidate causal regulatory elements, variants and genes that likely modulate the risk for AD.
Gloriia Novikova, Manav Kapoor & Alison M. Goate

Article | Open Access
PlasmidHawk improves lab of origin prediction of engineered plasmids using sequence alignment
Advances in synthetic biology and genome engineering raise awareness of potential misuse. Here, the authors present PlasmidHawk, a sequence alignment based method for lab-of-origin prediction.
Qi Wang, Bryce Kille & Todd J. Treangen

Article | Open Access
Analysis of metagenome-assembled viral genomes from the human gut reveals diverse putative CrAss-like phages with unique genomic features
Here, the authors analyze 4,907 circular metagenome-assembled genomes from human microbiomes and identify and characterize nearly 600 diverse genomes of crAss-like phages, finding two putative families with unusual genomic features, including a high density of self-splicing introns and inteins.
Natalya Yutin, Sean Benler & Eugene V. Koonin

Article | Open Access
Genome-wide fine-mapping identifies pleiotropic and functional variants that predict many traits across global cattle populations
Genomic prediction of phenotype may be improved by using DNA mutations with functional, evolutionary, and pleiotropic consequences. Here the authors describe a method for genome-wide fine-mapping of QTLs and develop a genotyping array for improved prediction of genetic values for cattle traits.
Ruidong Xiang, Iona M. MacLeod & Michael E. Goddard