-
Article
| Open Access
A thousand-genome panel retraces the global spread and adaptation of a major fungal crop pathogen
Zymoseptoria tritici is an important fungal pathogen of wheat which has spread globally. Here, the authors perform genomic analyses on a collection of ~1100 Z. tritici samples from 42 countries to describe its global spread and elucidate mechanisms of adaptation to different environmental conditions.
- Alice Feurtey
- , Cécile Lorrain
- & Daniel Croll
-
Article
| Open Access
A molecular atlas reveals the tri-sectional spinning mechanism of spider dragline silk
The genetic basis of spider major ampullate (Ma) gland silk production remains unknown. Hu et al. unveil a molecular atlas of this gland for the golden orb-weaving spider combining genome assembly and multiomics, revealing the single-cell spatial architecture of silk production in the Ma gland.
- Wenbo Hu
- , Anqiang Jia
- & Yi Wang
-
Article
| Open Access
Amniotes co-opt intrinsic genetic instability to protect germ-line genome integrity
Pachytene Piwi-interacting RNAs (piRNAs) expressed in mammalian germ lines are abundant, but their evolution and function are not fully understood. Here, the authors find that pachytene piRNA loci are hotspots of structural variation, which underlies rapid piRNA birth, divergence, and loss.
- Yu H. Sun
- , Hongxiao Cui
- & Xin Zhiguo Li
-
Article
| Open Access
Deciphering the exact breakpoints of structural variations using long sequencing reads with DeBreak
Long-read sequencing is promising for the detection of structural variants (SVs), which requires algorithms with high sensitivity and precision. Here, the authors develop DeBreak, an algorithm for comprehensive and accurate SV detection in long-read sequencing data across different platforms, which outperforms other SV callers.
- Yu Chen
- , Amy Y. Wang
- & Zechen Chong
-
Article
| Open Access
GALA: a computational framework for de novo chromosome-by-chromosome assembly with long reads
Genomes usually contain multiple chromosomes. The paper reports on GALA, a computational framework for chromosome-based sequencing data separation and gap-free de novo assembly. It allows integration of different sources of data.
- Mohamed Awad
- & Xiangchao Gan
-
Article
| Open Access
Thousands of human non-AUG extended proteoforms lack evidence of evolutionary selection among mammals
Analysis of a large number of Ribo-seq datasets and genomic alignments led to the detection of novel non-AUG proteoforms. Unexpectedly, the number of non-AUG proteoforms identified with Ribo-seq greatly exceeds the number with strong phylogenetic support.
- Alla D. Fedorova
- , Stephen J. Kiniry
- & Pavel V. Baranov
-
Article
| Open Access
Graph-based pangenomics maximizes genotyping density and reveals structural impacts on fungal resistance in melon
The power of pangenomic graphs to improve genetic mapping is still unclear. Here, the authors demonstrate their value in identifying genetic variants associated with disease resistance traits in melon using PanPipes, a pangenome construction and low-coverage genotype-by-sequencing pipeline.
- Justin N. Vaughn
- , Sandra E. Branham
- & William P. Wechter
-
Article
| Open Access
Reference panel guided topological structure annotation of Hi-C data
Predicting topological structures from Hi-C data provides insight into comprehending gene expression and regulation. Here, the authors present RefHiC, an attention-based deep learning framework that leverages a reference panel of Hi-C datasets to assist topological structure annotation from a given study sample.
- Yanlin Zhang
- & Mathieu Blanchette
-
Article
| Open Access
Transposable element-mediated rearrangements are prevalent in human genomes
Here the authors show that transposable element-mediated rearrangements impact more than 500 kbp of an average human genome, are a source of individual variation, a substrate for evolutionary change, and can occur through diverse mechanisms.
- Parithi Balachandran
- , Isha A. Walawalkar
- & Christine R. Beck
-
Article
| Open Access
VeChat: correcting errors in long reads using variation graphs
Consensus sequence-based methods for self-correction of long-read sequencing data are affected by biases that can mask true variants characterizing little-covered or low-frequency haplotypes. Here, to address this issue, the authors develop a variation graph-based method for performing haplotype-aware self-correction of long reads.
- Xiao Luo
- , Xiongbin Kang
- & Alexander Schönhuth
-
Article
| Open Access
Strain level microbial detection and quantification with applications to single cell metagenomics
Here the authors develop CAMMiQ, a combinatorial optimization approach that utilizes variable-length, “unique” and “doubly-unique” genomic segments, showing improved identification and quantification of distinct microbes in metagenomic sequence data.
- Kaiyuan Zhu
- , Alejandro A. Schäffer
- & S. Cenk Sahinalp
-
Article
| Open Access
Detection and prevalence of SARS-CoV-2 co-infections during the Omicron variant circulation in France
Monitoring of co-infections of SARS-CoV-2 variants is important to evaluate their clinical impact and the risk of emergence of recombinants. Here, the authors develop and validate a methodological pipeline to detect co-infections and apply it to samples from France in early 2022, when Delta and Omicron were co-circulating.
- Antonin Bal
- , Bruno Simon
- & Laurence Josset
-
Article
| Open Access
Genome-centric analysis of short and long read metagenomes reveals uncharacterized microbiome diversity in Southeast Asians
Reference genomes for gut microbiomes help unravel microbial “dark matter” and serve as a valuable resource for disease-focused studies. Here, the authors perform short and long read metagenomics and metagenome-assembled genome analyses to profile the gut microbiome of Southeast Asian populations, revealing significant species and strain-level diversity, with thousands of previously uncharacterized biosynthetic gene clusters.
- Jean-Sebastien Gounot
- , Minghao Chia
- & Niranjan Nagarajan
-
Article
| Open Access
Prevalence and mechanisms of somatic deletions in single human neurons during normal aging and in DNA repair disorders
DNA damage has been implicated in aging and neurodegeneration. Here, the authors develop a bioinformatic method to detect deletions in single neuron genome sequences and reveal an increased burden of somatic deletions during aging and in DNA repair disorders.
- Junho Kim
- , August Yue Huang
- & Eunjung Alice Lee
-
Article
| Open Access
Deciphering microbial gene function using natural language processing
The function of many microbial genes is still unknown. Here the authors repurpose natural language processing algorithms to explore “gene semantics” and infer function for thousands of genes, with defense and secretion systems found to have the most discovery potential.
- Danielle Miller
- , Adi Stern
- & David Burstein
-
Article
| Open Access
Modeling tissue-specific breakpoint proximity of structural variations from whole-genomes to identify cancer drivers
Identifying structural variants (SVs) under positive selection in cancer is challenging. Here, the authors develop CSVDriver, a method that computes SV breakpoint proximity and the contribution of elements such as topologically associating domains, and identifies loci that show signs of positive selection and contain known and putative drivers.
- Alexander Martinez-Fundichely
- , Austin Dixon
- & Ekta Khurana
-
Article
| Open Access
The mutational signatures of formalin fixation on the human genome
Many archived tumour samples are stored as formalin-fixed and paraffin-embedded (FFPE) blocks, but this treatment can impact downstream genomics analyses. Here, the authors derive the mutational signatures of formalin on the cancer genome, and present FFPEsig, an algorithm that can distinguish and correct FFPE mutational signatures in archived cancer samples.
- Qingli Guo
- , Eszter Lakatos
- & Ville Mustonen
-
Article
| Open Access
Scarf enables a highly memory-efficient analysis of large-scale single-cell genomics data
As the scale of single-cell genomics experiments grows into the millions, the computational requirements to process this data are beyond the reach of many. Here the authors present Scarf, a modularly designed Python package that makes the analysis workflow highly memory efficient such that even the largest existing datasets can be analyzed on an average modern laptop.
- Parashar Dhapola
- , Johan Rodhe
- & Göran Karlsson
-
Article
| Open Access
Pan-African genome demonstrates how population-specific genome graphs improve high-throughput sequencing data analysis
Graph-based genome reference representations have seen significant development, motivated by the inadequacy of the current human genome reference to represent the diverse genetic information from different human populations and its inability to maintain the same level of accuracy for non-European ancestries. Here the authors present the case for iteratively augmenting tailored genome graphs for targeted populations and demonstrate this approach on the whole-genome samples of African ancestry.
- H. Serhat Tetikol
- , Deniz Turgut
- & Brandi N. Davis-Dusenbery
-
Article
| Open Access
Extreme purifying selection against point mutations in the human genome
Previous work has investigated selection in the coding genome, but it is not as well characterized in the non-coding genome. By analyzing rare variants in 70k genome sequences from gnomAD, the authors detect very strong purifying selection (“ultraselection”) across the human genome, finding it in some microRNAs and coding sequences but generally rare in regulatory sequences.
- Noah Dukler
- , Mehreen R. Mughal
- & Adam Siepel
-
Article
| Open Access
Identifying multicellular spatiotemporal organization of cells with SpaceFlow
A critical task in spatial transcriptomics analysis is to understand inherently spatial relationships among cells. Here, the authors present a deep learning framework to integrate spatial and transcriptional information, spatially extending pseudotime and revealing spatiotemporal organization of cells.
- Honglei Ren
- , Benjamin L. Walker
- & Qing Nie
-
Article
| Open Access
Occult polyclonality of preclinical pancreatic cancer models drives in vitro evolution
It is unclear if the molecular profiles of pancreatic ductal adenocarcinoma (PDAC) preclinical models remain stable during propagation. Here, the authors characterise clonal evolution throughout propagation in PDAC cell lines and a patient-derived organoid using single-cell genomics, transcriptomics and epigenomics.
- Maria E. Monberg
- , Heather Geiger
- & Anirban Maitra
-
Article
| Open Access
Robust and accurate estimation of paralog-specific copy number for duplicated genes using whole-genome sequencing
Low-copy repeats cover up to 5% of the human genome and are prone to extensive copy number variation. Here, the authors present a novel computational method to estimate paralog-specific copy number of such regions using whole-genome sequencing.
- Timofey Prodanov
- & Vikas Bansal
-
Article
| Open Access
Structural variant-based pangenome construction has low sensitivity to variability of haplotype-resolved bovine assemblies
Pangenomes have a number of advantages over linear reference assemblies. Here the authors use bovine haplotype-resolved assemblies to show that structural variant-based pangenomes are consistent regardless of sequence platform, assembler, or coverage, suggesting that rigid protocols may not be required.
- Alexander S. Leonard
- , Danang Crysnanto
- & Hubert Pausch
-
Article
| Open Access
Genomic analyses of 10,376 individuals in the Westlake BioBank for Chinese (WBBC) pilot project
Biobanks of genetic data have been established primarily in European populations, which gives us an incomplete understanding of complex traits across populations. Here, the authors initiate the Westlake BioBank for Chinese (WBBC) pilot project with 4,535 whole genome sequences and 5,841 high-density genotypes from China, characterizing large-scale genomic variation in Chinese populations.
- Pei-Kuan Cong
- , Wei-Yang Bai
- & Hou-Feng Zheng
-
Article
| Open Access
An integral genomic signature approach for tailored cancer therapy using genome-wide sequencing data
Predicting drug responses in cancer patients requires robust computational frameworks. Here, the authors develop an integral genomic signature (iGenSig) approach to predict drug responses using multi-omics data from tumour samples, and validate this approach using genomic datasets from multiple clinical studies.
- Xiao-Song Wang
- , Sanghoon Lee
- & Yue Wang
-
Article
| Open Access
3D chromatin remodelling in the germ line modulates genome evolutionary plasticity
The role of genome folding in the heritability and evolvability of structural variations is not well understood. Here the authors investigate the impact of the three-dimensional genome topology of germ cells in the formation and transmission of gross structural genomic changes detected from comparing whole-genome sequences of 14 rodent species.
- Lucía Álvarez-González
- , Frances Burden
- & Aurora Ruiz-Herrera
-
Article
| Open Access
An evolutionarily conserved stop codon enrichment at the 5′ ends of mammalian piRNAs
Piwi-interacting RNAs are small RNAs produced by processing of long precursor transcripts. Here the authors report that precursor cleavage typically occurs at positions corresponding to stop codons and that this pattern is conserved across mammals.
- Susanne Bornelöv
- , Benjamin Czech
- & Gregory J. Hannon
-
Article
| Open Access
Differences in RNA polymerase II complexes and their interactions with surrounding chromatin on human and cytomegalovirus genomes
Here the authors digest chromatin with DNA fragmentation factor (DFF) prior to chromatin immunoprecipitation (DFF-ChIP) to depict transcription complex interactions with neighboring nucleosomes in cells. Applying this method to human cytomegalovirus (HCMV)-infected cells, they find that the viral genome is underchromatinized, leading to fewer transcription complex interactions with nucleosomes.
- Benjamin M. Spector
- , Mrutyunjaya Parida
- & David H. Price
-
Article
| Open Access
Anopheles mosquitoes reveal new principles of 3D genome organization in insects
Anopheles mosquitoes are vectors of human malaria, and better understanding of them has implications for public health. Here, the authors apply Hi-C, FISH, RNA-seq, and ChIP-seq techniques to comprehensively characterize chromatin architecture and its evolutionary dynamics in five Anopheles species.
- Varvara Lukyanchikova
- , Miroslav Nuriddinov
- & Veniamin Fishman
-
Article
| Open Access
Zoonotic origin of the human malaria parasite Plasmodium malariae from African apes
Plasmodium malariae is a cause of malaria in humans, and related species have been identified in non-human primates. Here, the authors use genomic analyses to establish that human P. malariae arose from a host switch of an ape parasite, whilst a species infecting New World monkeys can be traced to a reverse zoonosis.
- Lindsey J. Plenderleith
- , Weimin Liu
- & Paul M. Sharp
-
Article
| Open Access
Empirical prediction of variant-activated cryptic splice donors using population-based RNA-Seq data
Genetic variants affecting the consensus splicing motifs can alter binding of spliceosomal components and induce mis-splicing. Here, the authors develop a method showing that ranking the most common recurring mis-splicing events in public RNA-Seq data can predict the activation of cryptic donors.
- Ruebena Dawes
- , Himanshu Joshi
- & Sandra T. Cooper
-
Article
| Open Access
Antimicrobial resistance and population genomics of multidrug-resistant Escherichia coli in pig farms in mainland China
Use of antimicrobials in livestock contributes to the development of antimicrobial resistance, but there are few large-scale surveillance studies. Here, the authors describe E. coli surveillance in pig farms in China, reporting high levels of multidrug resistance across all mainland provinces.
- Zhong Peng
- , Zizhe Hu
- & Xiangru Wang
-
Article
| Open Access
Genome binning of viral entities from bulk metagenomics data
Here, Johansen et al. develop an approach, Phages from Metagenomics Binning (PHAMB), that allows the binning of thousands of viral genomes directly from bulk metagenomics data, while simultaneously enabling clustering of viral genomes into accurate taxonomic viral populations, unveiling viral-microbial host interactions in the gut.
- Joachim Johansen
- , Damian R. Plichta
- & Simon Rasmussen
-
Article
| Open Access
Population structure analysis and laboratory monitoring of Shigella by core-genome multilocus sequence typing
Lab-based surveillance of Shigella has traditionally been based on serotyping but increasing availability of whole genome sequencing could enable higher resolution typing. Here, the authors apply a core genome multilocus sequence typing scheme to Shigella sequence data and describe its population structure.
- Iman Yassine
- , Sophie Lefèvre
- & François-Xavier Weill
-
Article
| Open Access
Epigenetic aging of the demographically non-aging naked mole-rat
The exceptionally long-lived naked mole-rat is characterized by the lack of increased mortality with aging. Here the authors perform epigenetic studies to show that naked mole-rats epigenetically age despite their non-increasing mortality rate.
- Csaba Kerepesi
- , Margarita V. Meer
- & Vadim N. Gladyshev
-
Article
| Open Access
Prediction of biomarkers and therapeutic combinations for anti-PD-1 immunotherapy using the global gene network association
Many cancer patients do not respond to anti-PD-1 therapy. Here, the authors develop a network approach to identify genes, pathways and potential therapeutic combinations, and develop an MHC-I gene immunoscore associated with tumour response to anti-PD-1.
- Chia-Chin Wu
- , Y. Alan Wang
- & P. Andrew Futreal
-
Article
| Open Access
Whole-genome analysis of Nigerian patients with breast cancer reveals ethnic-driven somatic evolution and distinct genomic subtypes
Breast cancer heterogeneity and tumour evolutionary trajectories remain largely unknown among women of African ancestry. Here, the authors perform whole genome and transcriptome sequencing of Nigerian breast cancer patients and identify unique evolutionary phenomena.
- Naser Ansari-Pour
- , Yonglan Zheng
- & Olufunmilayo I. Olopade
-
Article
| Open Access
Jumper enables discontinuous transcript assembly in coronaviruses
Sashittal et al. develop a graph-based method for the assembly of discontinuous transcripts produced in coronaviruses and other Nidovirales, enabling the discovery of transcriptional changes missed by existing methods.
- Palash Sashittal
- , Chuanyi Zhang
- & Mohammed El-Kebir
-
Article
| Open Access
Benchmarking pipelines for subclonal deconvolution of bulk tumour sequencing data
Subclonal deconvolution in cancer sequencing data is a complex task, and the optimal tools to use are unclear. Here, the authors systematically benchmark subclonal deconvolution pipelines with a comprehensive set of simulated tumour genomes and identify the best-performing methods.
- Georgette Tanner
- , David R. Westhead
- & Lucy F. Stead
-
Article
| Open Access
Different historical generation intervals in human populations inferred from Neanderthal fragment lengths and mutation signatures
Historical interbreeding between Neanderthals and humans should leave signatures of historical demographics in modern human genomes. Analysing the size distribution of Neanderthal fragments in non-African genomes suggests consistent differences in the generation interval across Eurasia, and that this could explain mutational spectrum variation.
- Moisès Coll Macià
- , Laurits Skov
- & Mikkel Heide Schierup
-
Article
| Open Access
Evidence for opposing selective forces operating on human-specific duplicated TCAF genes in Neanderthals and humans
Duplications of gene segments can allow novel physiological adaptations to evolve. A detailed analysis of the TCAF gene family in primates and archaic humans suggests that rapid duplication and diversification in this gene family are associated with cold or dietary adaptations.
- PingHsun Hsieh
- , Vy Dang
- & Evan E. Eichler
-
Article
| Open Access
The landscape of alternative polyadenylation in single cells of the developing mouse embryo
Alternative polyadenylation regulates localization, half-life and translation of mRNA isoforms. Here the authors investigate alternative polyadenylation using single cell RNA sequencing data from mouse embryos and identify 3’-UTR isoforms that are regulated across cell types and developmental time.
- Vikram Agarwal
- , Sereno Lopez-Darwin
- & Jay Shendure
-
Article
| Open Access
GIANA allows computationally-efficient TCR clustering and multi-disease repertoire classification by isometric transformation
Grouping T-cell receptors (TCRs) by sequence similarity could lead to new immunological insights. Here, the authors propose a tool that allows the rapid clustering of millions of TCR sequences, identifying TCRs potentially associated with the response to cancer, infectious and autoimmune diseases.
- Hongyi Zhang
- , Xiaowei Zhan
- & Bo Li
-
Article
| Open Access
Haploinsufficiency of SF3B2 causes craniofacial microsomia
Despite being a common congenital facial anomaly, the genetic etiology of craniofacial microsomia (CFM) is not well understood. Here, the authors use exome and genome sequencing of 146 individuals with CFM to identify haploinsufficient variants in SF3B2 as a prevalent underlying cause.
- Andrew T. Timberlake
- , Casey Griffin
- & Daniela V. Luquetti
-
Article
| Open Access
Strainberry: automated strain separation in low-complexity metagenomes using long reads
Existing long-read de novo assembly methods can partially, but not completely, separate strains. Here, the authors develop Strainberry, a metagenome assembly bioinformatic pipeline that exclusively uses long-read data to accurately separate and reconstruct strain genomes from single-sample low-complexity microbiomes.
- Riccardo Vicedomini
- , Christopher Quince
- & Rayan Chikhi
-
Article
| Open Access
Profiling variable-number tandem repeat variation across populations using repeat-pangenome graphs
Variable number tandem repeats (VNTRs) are difficult to analyze by short-read sequencing in disease studies. Here, the authors describe a VNTR mapping strategy for short-read analyses using a repeat pangenome graph. This method will help elucidate the contribution of VNTRs to diversity and disease.
- Tsung-Yu Lu
- , Katherine M. Munson
- & Mark J. P. Chaisson
-
Article
| Open Access
Comprehensive identification of transposable element insertions using multiple sequencing technologies
Identification of transposable element (TE) insertions from whole genome sequencing data remains challenging. Here the authors develop a comprehensive TE insertion detection algorithm, xTea, that can be applied to both short-read and long-read sequencing data.
- Chong Chu
- , Rebeca Borges-Monroy
- & Peter J. Park
-
Article
| Open Access
Rapid detection of identity-by-descent tracts for mega-scale datasets
Traditional methods to identify genomic regions identical-by-descent (IBD) do not scale well to biobank-level datasets. Here, the authors describe a new IBD algorithm, iLASH, which uses LocAlity-Sensitive Hashing to provide rapid IBD estimation when applied to the PAGE and UK Biobank datasets.
- Ruhollah Shemirani
- , Gillian M. Belbin
- & José Luis Ambite