Article | Open Access
De novo diploid genome assembly using long noisy reads
Most existing assemblers fail to generate high-quality phased assemblies from long noisy reads. Here, the authors present PECAT, a Phased Error Correction and Assembly Tool, for reconstructing diploid genomes from long noisy reads.
Fan Nie, Peng Ni & Jianxin Wang

Article | Open Access
Tradeoffs in alignment and assembly-based methods for structural variant detection with long-read sequencing data
Long-read sequencing can greatly improve detection of genomic structural variants (SVs), and numerous methods have been developed to identify SVs using long-read data. Here the authors compare the performance of these methods and provide guidelines to aid users in selecting the most suitable tools for various scenarios.
Yichen Henry Liu, Can Luo & Xin Maizie Zhou

Article | Open Access
Tuning parameters for polygenic risk score methods using GWAS summary statistics from training data
Some polygenic risk score (PRS) methods for predicting genetic risk for common diseases require an external individual-level dataset for parameter tuning, posing privacy-related concerns. Here, the authors present an empirical Bayes method that tunes PRS models using only summary statistics from the training data.
Wei Jiang, Ling Chen & Hongyu Zhao

Article | Open Access
Deciphering complex breakage-fusion-bridge genome rearrangements with Ambigram
Breakage-fusion-bridge (BFB) is a mechanism that leads to complex genome rearrangements in multiple cancers. Here, the authors develop a computational method for identifying these events, even when further complicated by additional structural variations.
Chaohui Li, Lingxi Chen & Shuai Cheng Li

Article | Open Access
Genomic dissection of endemic carbapenem resistance reveals metallo-beta-lactamase dissemination through clonal, plasmid and integron transfer
Resistance to carbapenems, a class of last-line antibiotics, is a global health threat. This study analysed a two-decade history of carbapenem resistance and identified complex, multi-level (bacterial strain, plasmid, gene) transmission dynamics.
Nenad Macesic, Jane Hawkey & Anton Y. Peleg

Article | Open Access
High throughput single cell long-read sequencing analyses of same-cell genotypes and phenotypes in human tumors
There is a need for methods that allow the analysis of single-cell long-read sequencing data without depending on known barcode lists or short-read sequencing. Here, the authors develop scNanoGPS, a tool that can independently deconvolute long reads into single cells and single molecules, and apply it to tumour and cell line data.
Cheng-Kai Shiau, Lina Lu & Ruli Gao

Article | Open Access
Linear time complexity de novo long read genome assembly with GoldRush
Current state-of-the-art de novo long read genome assemblers follow the Overlap-Layout-Consensus paradigm. GoldRush departs from this paradigm, generating highly contiguous assemblies with linear time complexity and using an order of magnitude less RAM than state-of-the-art methods.
Johnathan Wong, Lauren Coombe & Inanç Birol

Article | Open Access
A molecular atlas reveals the tri-sectional spinning mechanism of spider dragline silk
The genetic basis of spider major ampullate (Ma) gland silk production remains unknown. Hu et al. unveil a molecular atlas of this gland for the golden orb-weaving spider combining genome assembly and multiomics, revealing the single-cell spatial architecture of silk production in the Ma gland.
Wenbo Hu, Anqiang Jia & Yi Wang

Article | Open Access
GALA: a computational framework for de novo chromosome-by-chromosome assembly with long reads
Genomes usually contain multiple chromosomes. Here, the authors report GALA, a computational framework for chromosome-based separation of sequencing data and gap-free de novo assembly. It allows integration of data from different sources.
Mohamed Awad & Xiangchao Gan

Article | Open Access
A comprehensive update to the Mycobacterium tuberculosis H37Rv reference genome
H37Rv is the most widely used Mycobacterium tuberculosis strain, and its genome is the reference sequence for this pathogen. Here, Chitale et al. present a bioinformatic pipeline for accurate assembly of bacterial genome sequences, and use it to provide important updates to the M. tuberculosis reference genome.
Poonam Chitale, Alexander D. Lemenze & David Alland

Article | Open Access
VeChat: correcting errors in long reads using variation graphs
Consensus sequence-based methods for self-correction of long-read sequencing data are affected by biases that can mask true variants characterizing low-coverage or low-frequency haplotypes. Here, to address this issue, the authors develop a variation graph-based method for performing haplotype-aware self-correction of long reads.
Xiao Luo, Xiongbin Kang & Alexander Schönhuth

Article | Open Access
Genome-centric analysis of short and long read metagenomes reveals uncharacterized microbiome diversity in Southeast Asians
Reference genomes for gut microbiomes help unravel microbial “dark matter” and serve as a valuable resource for disease-focused studies. Here, the authors perform short and long read metagenomics and metagenome-assembled genome analyses to profile the gut microbiome of Southeast Asian populations, revealing significant species and strain-level diversity, with thousands of previously uncharacterized biosynthetic gene clusters.
Jean-Sebastien Gounot, Minghao Chia & Niranjan Nagarajan

Article | Open Access
Pangenomic analysis of Chinese gastric cancer
Human pan-genomics is expanding our knowledge of genomic diversity and genetic factors in disease. Here, the authors built a gastric cancer pan-genome that included the sequences of Chinese Han patients, and predicted putative and previously unaligned genes associated with gastric cancer.
Yingyan Yu, Zhen Zhang & Zhenggang Zhu

Article | Open Access
Structural variant-based pangenome construction has low sensitivity to variability of haplotype-resolved bovine assemblies
Pangenomes have a number of advantages over linear reference assemblies. Here the authors use bovine haplotype-resolved assemblies to show that structural variant-based pangenomes are consistent regardless of sequence platform, assembler, or coverage, suggesting that rigid protocols may not be required.
Alexander S. Leonard, Danang Crysnanto & Hubert Pausch

Article | Open Access
3D chromatin remodelling in the germ line modulates genome evolutionary plasticity
The role of genome folding in the heritability and evolvability of structural variations is not well understood. Here the authors investigate the impact of the three-dimensional genome topology of germ cells on the formation and transmission of gross structural genomic changes detected by comparing whole-genome sequences of 14 rodent species.
Lucía Álvarez-González, Frances Burden & Aurora Ruiz-Herrera

Article | Open Access
Genome binning of viral entities from bulk metagenomics data
Here, Johansen et al. develop an approach, Phages from Metagenomics Binning (PHAMB), that allows the binning of thousands of viral genomes directly from bulk metagenomics data, while simultaneously enabling clustering of viral genomes into accurate taxonomic viral populations, unveiling viral-microbial host interactions in the gut.
Joachim Johansen, Damian R. Plichta & Simon Rasmussen

Article | Open Access
Strainberry: automated strain separation in low-complexity metagenomes using long reads
Existing long-read de novo assembly methods can partially, but not completely, separate strains. Here, the authors develop Strainberry, a metagenome assembly bioinformatic pipeline that exclusively uses long-read data to accurately separate and reconstruct strain genomes from single-sample low-complexity microbiomes.
Riccardo Vicedomini, Christopher Quince & Rayan Chikhi

Article | Open Access
Extended haplotype-phasing of long-read de novo genome assemblies using Hi-C
Methods to produce haplotype-resolved genome assemblies often rely on access to family trios. The authors present FALCON-Phase, a tool that combines ultra-long-range Hi-C chromatin interaction data with a long-read de novo assembly to extend haplotype phasing to the contig or scaffold level.
Zev N. Kronenberg, Arang Rhie & Sarah B. Kingan

Article | Open Access
Engineered yeast genomes accurately assembled from pure and mixed samples
The cost and complexity of whole genome sequencing limits its use in identifying and validating sequences used for genetic engineering and synthetic biology. Here the authors present Prymetime, an integrated workflow to sequence engineered strains and identify engineering in metagenomes.
Joseph H. Collins, Kevin W. Keating & Eric M. Young

Article | Open Access
Aquila enables reference-assisted diploid personal genome assembly and comprehensive variant detection based on linked reads
Genome assembly approaches are limited by factors including cost, power and incomplete resolution. Here, the authors present Aquila, a method that uses a reference sequence and linked read data to generate high quality diploid genome assemblies from which genetic variation can be detected and phased.
Xin Zhou, Lu Zhang & Arend Sidow

Article | Open Access
Construction and integration of three de novo Japanese human genome assemblies toward a population-specific reference
Human reference genomes are typically constructed from few individuals, and are biased towards European and African genomes. Here, the authors assemble three Japanese genomes to create a population-specific reference genome. They then demonstrate improved variant calling from exome sequencing with this reference genome.
Jun Takayama, Shu Tadaka & Gen Tamiya

Article | Open Access
Efficient assembly of nanopore reads via highly accurate and intact error correction
Nanopore reads have been advantageous for de novo genome assembly; however, these reads have high error rates. Here, the authors develop an error correction and de novo assembly tool, NECAT, which produces efficient, high-quality assemblies of nanopore reads.
Ying Chen, Fan Nie & Chuan-Le Xiao

Article | Open Access
A diploid assembly-based benchmark for variants in the major histocompatibility complex
Accurate, phased assemblies are a key tool in understanding the human genome, particularly in highly polymorphic regions like the medically important MHC. Here the authors provide an assembly-based benchmark for this difficult-to-characterize region.
Chen-Shan Chin, Justin Wagner & Justin M. Zook

Article | Open Access
Improved haplotype inference by exploiting long-range linking and allelic imbalance in RNA-seq datasets
Haplotype reconstruction of distant genetic variants is problematic in short-read sequencing. Here, the authors describe HapTree-X, a probabilistic framework that uses differential allele-specific expression to better reconstruct paternal haplotypes from diploid and polyploid genomes.
Emily Berger, Deniz Yorukoglu & Bonnie Berger

Article | Open Access
Effect of sequence depth and length in long-read assembly of the maize inbred NC358
Sequence depth and read length determine the quality of genome assembly. Here, using maize as an example, the authors leverage a set of PacBio reads to develop guidelines for allocating finite resources to the sequencing and assembly of complex plant genomes.
Shujun Ou, Jianing Liu & Doreen Ware

Article | Open Access
Assembly of chromosome-scale contigs by efficiently resolving repetitive sequences with long reads
Repetitive sequences in complex eukaryote genomes can cause fragmented assemblies with incomplete gene sequences and unanchored or mispositioned contigs. Here, the authors report HERA, a method to improve genome assemblies by efficiently resolving repeats using single-molecule sequencing data.
Huilong Du & Chengzhi Liang

Article | Open Access
Platanus-allee is a de novo haplotype assembler enabling a comprehensive access to divergent heterozygous regions
Most phasing programmes for sequencing data work well for genomes with low heterozygosity but drop in performance in regions of high heterozygosity. Here, Kajitani et al. develop the assembler Platanus-allee and demonstrate its utility in de novo assemblies of various genomes and the human MHC region.
Rei Kajitani, Dai Yoshimura & Takehiko Itoh

Article | Open Access
Selective single molecule sequencing and assembly of a human Y chromosome of African origin
Due to various structural and sequence complexities, the human Y chromosome is challenging to sequence and characterize. Here, the authors develop a strategy to sequence native, unamplified flow sorted Y chromosomes with a nanopore sequencing platform, and report the first assembly of a human Y chromosome of African origin.
Lukas F. K. Kuderna, Esther Lizano & Tomas Marques-Bonet

Article | Open Access
Whole genome sequencing puts forward hypotheses on metastasis evolution and therapy in colorectal cancer
The evolution and genetic nature of metastatic lesions is not completely characterized. Here the authors perform a comprehensive whole-genome study of colorectal metastases in comparison to matched primary tumors and define a multistage progression model and metastasis-specific changes that, in part, are therapeutically actionable.
Naveed Ishaque, Mohammed L. Abba & Heike Allgayer

Article | Open Access
De novo human genome assemblies reveal spectrum of alternative haplotypes in diverse populations
The majority of the human reference genome assembly is represented as a single consensus haplotype. Here, Wong et al. analyze de novo assemblies of 17 diverse, haplotype-resolved genomes to gain insights into the structure of genetic diversity and compile a list of alternative haplotypes across populations.
Karen H. Y. Wong, Michal Levy-Sakin & Pui-Yan Kwok

Article | Open Access
Sequencing and de novo assembly of a near complete indica rice genome
High-quality reference genomes facilitate analysis of genome structure and variation. Here Du et al. create a near-complete assembly of the indica rice genome by combining single molecule sequencing with mapping data and fosmid sequences and identify genetic variants by comparison with other rice genomes.
Huilong Du, Ying Yu & Chengzhi Liang

Article | Open Access
Genome assembly with in vitro proximity ligation data and whole-genome triplication in lettuce
Genome assembly for many plant species can be challenging due to large size and high repeat content. Here, the authors use in vitro proximity ligation to assemble the genome of lettuce, revealing a family-specific triplication event and providing a comprehensive reference genome for a member of the Compositae.
Sebastian Reyes-Chin-Wo, Zhiwen Wang & Richard W. Michelmore

Article | Open Access
Scaffolding and completing genome assemblies in real-time with nanopore sequencing
Assembling genomes using currently available computational methods can be time-consuming. Here, Coin and colleagues describe a bioinformatics tool named npScarf that can scaffold and complete an existing short read assembly in real time using nanopore sequencing.
Minh Duc Cao, Son Hoang Nguyen & Lachlan J. M. Coin

Article | Open Access
MetaSort untangles metagenome assembly by reducing microbial community complexity
Currently available metagenomic data analysis relies on reference genomes. Here, the authors describe a new de novo metagenomic assembly method, metaSort, that constructs bacterial genomes from metagenomic samples to reduce microbial community complexity while increasing genome recovery and assembly.
Peifeng Ji, Yanming Zhang & Fangqing Zhao

Article | Open Access
High-quality genome (re)assembly using chromosomal contact data
The correct assembly of genomes from sequencing data remains a challenge due to difficulties in correctly assigning the location of repeated DNA elements. Here the authors describe GRAAL, an algorithm that utilizes genome-wide chromosome contact data within a probabilistic framework to produce accurate genome assemblies.
Hervé Marie-Nelly, Martial Marbouty & Romain Koszul