Article | Open Access
GALA: a computational framework for de novo chromosome-by-chromosome assembly with long reads
Genomes usually contain multiple chromosomes. The paper reports on GALA, a computational framework for chromosome-based sequencing data separation and gap-free de novo assembly. It allows integration of different sources of data.
Mohamed Awad & Xiangchao Gan
Article | Open Access
A comprehensive update to the Mycobacterium tuberculosis H37Rv reference genome
H37Rv is the most widely used Mycobacterium tuberculosis strain, and its genome is the reference sequence for this pathogen. Here, Chitale et al. present a bioinformatic pipeline for accurate assembly of bacterial genome sequences, and use it to provide important updates to the M. tuberculosis reference genome.
Poonam Chitale, Alexander D. Lemenze & David Alland
Article | Open Access
VeChat: correcting errors in long reads using variation graphs
Consensus sequence-based methods for self-correction of long-read sequencing data are affected by biases that can mask true variants characterizing little-covered or low-frequency haplotypes. Here, to address this issue, the authors develop a variation graph-based method for performing haplotype-aware self-correction of long reads.
Xiao Luo, Xiongbin Kang & Alexander Schönhuth
Article | Open Access
Genome-centric analysis of short and long read metagenomes reveals uncharacterized microbiome diversity in Southeast Asians
Reference genomes for gut microbiomes help unravel microbial “dark matter” and serve as a valuable resource for disease-focused studies. Here, the authors perform short and long read metagenomics and metagenome-assembled genomes analyses to profile the gut microbiome of Southeast Asian populations, revealing significant species and strain-level diversity, with thousands of previously uncharacterized biosynthetic gene clusters.
Jean-Sebastien Gounot, Minghao Chia & Niranjan Nagarajan
Article | Open Access
Pangenomic analysis of Chinese gastric cancer
Human pan-genomics are increasing our knowledge of genomic diversity and genetic factors in disease. Here, the authors built a gastric cancer pan-genome that included the sequences of Chinese Han patients, and predicted putative and previously unaligned genes associated with gastric cancer.
Yingyan Yu, Zhen Zhang & Zhenggang Zhu
Article | Open Access
Structural variant-based pangenome construction has low sensitivity to variability of haplotype-resolved bovine assemblies
Pangenomes have a number of advantages over linear reference assemblies. Here the authors use bovine haplotype-resolved assemblies to show that structural variant-based pangenomes are consistent regardless of sequence platform, assembler, or coverage, suggesting that rigid protocols may not be required.
Alexander S. Leonard, Danang Crysnanto & Hubert Pausch
Article | Open Access
3D chromatin remodelling in the germ line modulates genome evolutionary plasticity
The role of genome folding in the heritability and evolvability of structural variations is not well understood. Here the authors investigate the impact of the three-dimensional genome topology of germ cells in the formation and transmission of gross structural genomic changes detected from comparing whole-genome sequences of 14 rodent species.
Lucía Álvarez-González, Frances Burden & Aurora Ruiz-Herrera
Article | Open Access
Genome binning of viral entities from bulk metagenomics data
Here, Johansen et al. develop an approach, Phages from Metagenomics Binning (PHAMB), that allows the binning of thousands of viral genomes directly from bulk metagenomics data, while simultaneously enabling clustering of viral genomes into accurate taxonomic viral populations, unveiling viral-microbial host interactions in the gut.
Joachim Johansen, Damian R. Plichta & Simon Rasmussen
Article | Open Access
Strainberry: automated strain separation in low-complexity metagenomes using long reads
Existing long-read de novo assembly methods can partially, but not completely, separate strains. Here, the authors develop Strainberry, a metagenome assembly bioinformatic pipeline that exclusively uses long-read data to accurately separate and reconstruct strain genomes from single-sample low-complexity microbiomes.
Riccardo Vicedomini, Christopher Quince & Rayan Chikhi
Article | Open Access
Extended haplotype-phasing of long-read de novo genome assemblies using Hi-C
Methods to produce haplotype-resolved genome assemblies often rely on access to family trios. The authors present FALCON-Phase, a tool that combines ultra-long range Hi-C chromatin interaction data with a long read de novo assembly to extend haplotype phasing to the contig or scaffold level.
Zev N. Kronenberg, Arang Rhie & Sarah B. Kingan
Article | Open Access
Engineered yeast genomes accurately assembled from pure and mixed samples
The cost and complexity of whole genome sequencing limits its use in identifying and validating sequences used for genetic engineering and synthetic biology. Here the authors present Prymetime, an integrated workflow to sequence engineered strains and identify engineering in metagenomes.
Joseph H. Collins, Kevin W. Keating & Eric M. Young
Article | Open Access
Aquila enables reference-assisted diploid personal genome assembly and comprehensive variant detection based on linked reads
Genome assembly approaches are limited by factors including cost, power and incomplete resolution. Here, the authors present Aquila, a method that uses a reference sequence and linked read data to generate high quality diploid genome assemblies from which genetic variation can be detected and phased.
Xin Zhou, Lu Zhang & Arend Sidow
Article | Open Access
Construction and integration of three de novo Japanese human genome assemblies toward a population-specific reference
Human reference genomes are typically constructed from few individuals, and are biased towards European and African genomes. Here, the authors assemble three Japanese genomes to create a population-specific reference genome. They then demonstrate improved variant calling from exome sequencing with this reference genome.
Jun Takayama, Shu Tadaka & Gen Tamiya
Article | Open Access
Efficient assembly of nanopore reads via highly accurate and intact error correction
Nanopore reads have been advantageous for de novo genome assembly; however, these reads have high error rates. Here, the authors develop an error correction and de novo assembly tool, NECAT, which produces efficient, high quality assemblies of nanopore reads.
Ying Chen, Fan Nie & Chuan-Le Xiao
Article | Open Access
A diploid assembly-based benchmark for variants in the major histocompatibility complex
Accurate, phased assemblies are a key tool in understanding the human genome, particularly in highly polymorphic regions like the medically important MHC. Here the authors provide an assembly-based benchmark for this difficult-to-characterize region.
Chen-Shan Chin, Justin Wagner & Justin M. Zook
Article | Open Access
Improved haplotype inference by exploiting long-range linking and allelic imbalance in RNA-seq datasets
Haplotype reconstruction of distant genetic variants is problematic in short-read sequencing. Here, the authors describe HapTree-X, a probabilistic framework that uses differential allele-specific expression to better reconstruct paternal haplotypes from diploid and polyploid genomes.
Emily Berger, Deniz Yorukoglu & Bonnie Berger
Article | Open Access
Effect of sequence depth and length in long-read assembly of the maize inbred NC358
Sequence depth and read length determine the quality of genome assembly. Here, the authors leverage a set of PacBio reads to develop guidelines for sequencing and assembly of complex plant genomes in order to allocate finite resources using maize as an example.
Shujun Ou, Jianing Liu & Doreen Ware
Article | Open Access
Assembly of chromosome-scale contigs by efficiently resolving repetitive sequences with long reads
Repetitive sequences in complex eukaryote genomes can cause fragmented assemblies with incomplete gene sequences and unanchored or mispositioned contigs. Here, the authors report HERA, a method to improve genome assemblies by efficiently resolving repeats using single-molecule sequencing data.
Huilong Du & Chengzhi Liang
Article | Open Access
Platanus-allee is a de novo haplotype assembler enabling a comprehensive access to divergent heterozygous regions
Most phasing programmes for sequencing data work well for genomes with low heterozygosity but drop in performance in regions of high heterozygosity. Here, Kajitani et al. develop the assembler Platanus-allee and demonstrate its utility in de novo assemblies of various genomes and the human MHC region.
Rei Kajitani, Dai Yoshimura & Takehiko Itoh
Article | Open Access
Selective single molecule sequencing and assembly of a human Y chromosome of African origin
Due to various structural and sequence complexities, the human Y chromosome is challenging to sequence and characterize. Here, the authors develop a strategy to sequence native, unamplified flow sorted Y chromosomes with a nanopore sequencing platform, and report the first assembly of a human Y chromosome of African origin.
Lukas F. K. Kuderna, Esther Lizano & Tomas Marques-Bonet
Article | Open Access
Whole genome sequencing puts forward hypotheses on metastasis evolution and therapy in colorectal cancer
The evolution and genetic nature of metastatic lesions is not completely characterized. Here the authors perform a comprehensive whole-genome study of colorectal metastases in comparison to matched primary tumors and define a multistage progression model and metastasis-specific changes that, in part, are therapeutically actionable.
Naveed Ishaque, Mohammed L. Abba & Heike Allgayer
Article | Open Access
De novo human genome assemblies reveal spectrum of alternative haplotypes in diverse populations
The majority of the human reference genome assembly is represented as a single consensus haplotype. Here, Wong et al. analyze de novo assemblies of 17 diverse, haplotype-resolved genomes to gain insights into the structure of genetic diversity and compile a list of alternative haplotypes across populations.
Karen H. Y. Wong, Michal Levy-Sakin & Pui-Yan Kwok
Article | Open Access
Sequencing and de novo assembly of a near complete indica rice genome
High-quality reference genomes facilitate analysis of genome structure and variation. Here Du et al. create a near-complete assembly of the indica rice genome by combining single molecule sequencing with mapping data and fosmid sequences and identify genetic variants by comparison with other rice genomes.
Huilong Du, Ying Yu & Chengzhi Liang
Article | Open Access
Genome assembly with in vitro proximity ligation data and whole-genome triplication in lettuce
Genome assembly for many plant species can be challenging due to large size and high repeat content. Here, the authors use in vitro proximity ligation to assemble the genome of lettuce, revealing a family-specific triplication event and providing a comprehensive reference genome for a member of the Compositae.
Sebastian Reyes-Chin-Wo, Zhiwen Wang & Richard W. Michelmore
Article | Open Access
Scaffolding and completing genome assemblies in real-time with nanopore sequencing
Assembling genomes using currently available computational methods can be time consuming. Here, Coin and colleagues describe a bioinformatics tool named npScarf that can scaffold and complete an existing short read assembly in real-time using nanopore sequencing.
Minh Duc Cao, Son Hoang Nguyen & Lachlan J. M. Coin
Article | Open Access
MetaSort untangles metagenome assembly by reducing microbial community complexity
Currently available metagenomic data analysis relies on reference genomes. Here, the authors describe a new de novo metagenomic assembly method, metaSort, that constructs bacterial genomes from metagenomic samples to reduce microbial community complexity while increasing genome recovery and assembly.
Peifeng Ji, Yanming Zhang & Fangqing Zhao
Article | Open Access
High-quality genome (re)assembly using chromosomal contact data
The correct assembly of genomes from sequencing data remains a challenge due to difficulties in correctly assigning the location of repeated DNA elements. Here the authors describe GRAAL, an algorithm that utilizes genome-wide chromosome contact data within a probabilistic framework to produce accurate genome assemblies.
Hervé Marie-Nelly, Martial Marbouty & Romain Koszul