Genome assembly algorithms articles within Nature Communications

Featured

  • Article
    | Open Access

    Haplotyping is the process of distinguishing alleles inherited together on a chromosome, a crucial step in assembling and interpreting genome sequences. Here, the authors present a computationally efficient haplotype assembly tool for long read sequencing data.

    • Qian Zhou, Fahu Ji & Jue Ruan
  • Article
    | Open Access

    Most existing assemblers fail to generate high-quality phased assemblies using long noisy reads. Here, the authors present PECAT, a Phased Error Correction and Assembly Tool, for reconstructing diploid genomes from long noisy reads.

    • Fan Nie, Peng Ni & Jianxin Wang
  • Article
    | Open Access

    Long-read sequencing can greatly improve detection of genomic structural variants (SVs), and numerous methods have been developed to identify SVs using long-read data. Here the authors compare the performance of these methods and provide guidelines to aid users in selecting the most suitable tools for various scenarios.

    • Yichen Henry Liu, Can Luo & Xin Maizie Zhou
  • Article
    | Open Access

    Breakage-fusion-bridge (BFB) is a mechanism that leads to complex genome rearrangements in multiple cancers. Here, the authors develop a computational method for identifying these events, even when further complicated by additional structural variations.

    • Chaohui Li, Lingxi Chen & Shuai Cheng Li
  • Article
    | Open Access

    There is a need for methods that allow the analysis of single-cell long-read sequencing data without depending on known barcode lists or short-read sequencing. Here, the authors develop scNanoGPS, a tool that can independently deconvolute long reads into single cells and single molecules, and apply it to tumour and cell line data.

    • Cheng-Kai Shiau, Lina Lu & Ruli Gao
  • Article
    | Open Access

    Current state-of-the-art de novo long read genome assemblers follow the Overlap-Layout-Consensus paradigm. GoldRush departs from this paradigm, generating highly contiguous assemblies with linear time complexity and using an order of magnitude less RAM than state-of-the-art methods.

    • Johnathan Wong, Lauren Coombe & Inanç Birol
  • Article
    | Open Access

    The genetic basis of spider major ampullate (Ma) gland silk production remains unknown. Hu et al. unveil a molecular atlas of this gland for the golden orb-weaving spider combining genome assembly and multiomics, revealing the single-cell spatial architecture of silk production in the Ma gland.

    • Wenbo Hu, Anqiang Jia & Yi Wang
  • Article
    | Open Access

    H37Rv is the most widely used Mycobacterium tuberculosis strain, and its genome is the reference sequence for this pathogen. Here, Chitale et al. present a bioinformatic pipeline for accurate assembly of bacterial genome sequences, and use it to provide important updates to the M. tuberculosis reference genome.

    • Poonam Chitale, Alexander D. Lemenze & David Alland
  • Article
    | Open Access

    Consensus sequence-based methods for self-correction of long-read sequencing data are affected by biases that can mask true variants characterizing little-covered or low-frequency haplotypes. Here, to address this issue, the authors develop a variation graph-based method for performing haplotype-aware self-correction of long reads.

    • Xiao Luo, Xiongbin Kang & Alexander Schönhuth
  • Article
    | Open Access

    Reference genomes for gut microbiomes help unravel microbial “dark matter” and serve as a valuable resource for disease-focused studies. Here, the authors perform short- and long-read metagenomics and metagenome-assembled genome analyses to profile the gut microbiome of Southeast Asian populations, revealing significant species- and strain-level diversity, with thousands of previously uncharacterized biosynthetic gene clusters.

    • Jean-Sebastien Gounot, Minghao Chia & Niranjan Nagarajan
  • Article
    | Open Access

    Human pan-genomics is increasing our knowledge of genomic diversity and genetic factors in disease. Here, the authors built a gastric cancer pan-genome that included the sequences of Chinese Han patients, and predicted putative and previously unaligned genes associated with gastric cancer.

    • Yingyan Yu, Zhen Zhang & Zhenggang Zhu
  • Article
    | Open Access

    Pangenomes have a number of advantages over linear reference assemblies. Here the authors use bovine haplotype-resolved assemblies to show that structural variant-based pangenomes are consistent regardless of sequence platform, assembler, or coverage, suggesting that rigid protocols may not be required.

    • Alexander S. Leonard, Danang Crysnanto & Hubert Pausch
  • Article
    | Open Access

    The role of genome folding in the heritability and evolvability of structural variations is not well understood. Here the authors investigate the impact of the three-dimensional genome topology of germ cells in the formation and transmission of gross structural genomic changes detected from comparing whole-genome sequences of 14 rodent species.

    • Lucía Álvarez-González, Frances Burden & Aurora Ruiz-Herrera
  • Article
    | Open Access

    Here, Johansen et al. develop an approach, Phages from Metagenomics Binning (PHAMB), that allows the binning of thousands of viral genomes directly from bulk metagenomics data, while simultaneously enabling clustering of viral genomes into accurate taxonomic viral populations, unveiling viral-microbial host interactions in the gut.

    • Joachim Johansen, Damian R. Plichta & Simon Rasmussen
  • Article
    | Open Access

    Existing long-read de novo assembly methods can partially, but not completely, separate strains. Here, the authors develop Strainberry, a metagenome assembly bioinformatic pipeline that exclusively uses long-read data to accurately separate and reconstruct strain genomes from single-sample low-complexity microbiomes.

    • Riccardo Vicedomini, Christopher Quince & Rayan Chikhi
  • Article
    | Open Access

    Methods to produce haplotype-resolved genome assemblies often rely on access to family trios. The authors present FALCON-Phase, a tool that combines ultra-long range Hi-C chromatin interaction data with a long read de novo assembly to extend haplotype phasing to the contig or scaffold level.

    • Zev N. Kronenberg, Arang Rhie & Sarah B. Kingan
  • Article
    | Open Access

    The cost and complexity of whole genome sequencing limits its use in identifying and validating sequences used for genetic engineering and synthetic biology. Here the authors present Prymetime, an integrated workflow to sequence engineered strains and identify engineering in metagenomes.

    • Joseph H. Collins, Kevin W. Keating & Eric M. Young
  • Article
    | Open Access

    Genome assembly approaches are limited by factors including cost, power and incomplete resolution. Here, the authors present Aquila, a method that uses a reference sequence and linked read data to generate high quality diploid genome assemblies from which genetic variation can be detected and phased.

    • Xin Zhou, Lu Zhang & Arend Sidow
  • Article
    | Open Access

    Human reference genomes are typically constructed from few individuals, and are biased towards European and African genomes. Here, the authors assemble three Japanese genomes to create a population-specific reference genome. They then demonstrate improved variant calling from exome sequencing with this reference genome.

    • Jun Takayama, Shu Tadaka & Gen Tamiya
  • Article
    | Open Access

    Sequence depth and read length determine the quality of genome assembly. Here, using maize as an example, the authors leverage a set of PacBio reads to develop guidelines for sequencing and assembling complex plant genomes that help allocate finite resources.

    • Shujun Ou, Jianing Liu & Doreen Ware
  • Article
    | Open Access

    Due to various structural and sequence complexities, the human Y chromosome is challenging to sequence and characterize. Here, the authors develop a strategy to sequence native, unamplified, flow-sorted Y chromosomes with a nanopore sequencing platform, and report the first assembly of a human Y chromosome of African origin.

    • Lukas F. K. Kuderna, Esther Lizano & Tomas Marques-Bonet
  • Article
    | Open Access

    The evolution and genetic nature of metastatic lesions are not completely characterized. Here the authors perform a comprehensive whole-genome study of colorectal metastases in comparison to matched primary tumors and define a multistage progression model and metastasis-specific changes that are, in part, therapeutically actionable.

    • Naveed Ishaque, Mohammed L. Abba & Heike Allgayer
  • Article
    | Open Access

    The majority of the human reference genome assembly is represented as a single consensus haplotype. Here, Wong et al. analyze de novo assemblies of 17 diverse, haplotype-resolved genomes to gain insights into the structure of genetic diversity and compile a list of alternative haplotypes across populations.

    • Karen H. Y. Wong, Michal Levy-Sakin & Pui-Yan Kwok
  • Article
    | Open Access

    High-quality reference genomes facilitate analysis of genome structure and variation. Here Du et al. create a near-complete assembly of the indica rice genome by combining single molecule sequencing with mapping data and fosmid sequences and identify genetic variants by comparison with other rice genomes.

    • Huilong Du, Ying Yu & Chengzhi Liang
  • Article
    | Open Access

    Genome assembly for many plant species can be challenging due to large size and high repeat content. Here, the authors use in vitro proximity ligation to assemble the genome of lettuce, revealing a family-specific triplication event and providing a comprehensive reference genome for a member of the Compositae.

    • Sebastian Reyes-Chin-Wo, Zhiwen Wang & Richard W. Michelmore
  • Article
    | Open Access

    Assembling genomes using currently available computational methods can be time consuming. Here, Coin and colleagues describe a bioinformatics tool named npScarf that can scaffold and complete an existing short read assembly in real-time using nanopore sequencing.

    • Minh Duc Cao, Son Hoang Nguyen & Lachlan J. M. Coin
  • Article
    | Open Access

    Currently available metagenomic data analysis relies on reference genomes. Here, the authors describe a new de novo metagenomic assembly method, metaSort, that constructs bacterial genomes from metagenomic samples to reduce microbial community complexity while increasing genome recovery and assembly.

    • Peifeng Ji, Yanming Zhang & Fangqing Zhao
  • Article
    | Open Access

    The correct assembly of genomes from sequencing data remains a challenge due to difficulties in correctly assigning the location of repeated DNA elements. Here the authors describe GRAAL, an algorithm that utilizes genome-wide chromosome contact data within a probabilistic framework to produce accurate genome assemblies.

    • Hervé Marie-Nelly, Martial Marbouty & Romain Koszul