The Vertebrate Genomes Project

A Collection of Research Articles from Phase I of the Vertebrate Genomes Project

Anna's hummingbird (Calypte anna) spotted outdoors in San Francisco

Shutterstock

Reference genome assemblies provide a map of a species’ DNA sequence and its spatial context, that is, where along the chromosomes a specific piece of DNA sequence can be found. In the past, generating reference assemblies was prohibitively expensive and labour-intensive, so they were produced only for humans and the most important model organisms, and even these still contained gaps and errors. More affordable second-generation sequencing technologies made it possible to assemble draft genomes for a larger number of species, but these were of lower quality: they were highly fragmented and parts of their annotation were erroneous.

However, a complete understanding of evolutionary processes and other fundamental questions in biology requires high-quality reference genome assemblies of all species. Technological advances, improved computational methods and the ever-decreasing cost of sequencing enabled the Vertebrate Genomes Project (VGP), launched in 2017, to pursue the ambitious goal of producing a reference genome assembly for every extant vertebrate species on Earth. In its first phase, the VGP has focused on testing and improving genome sequencing and assembly approaches, on assembling a first set of 260 high-quality genomes of species representing all vertebrate orders (work that is still in progress), and on the initial reporting of insights into genome evolution in vertebrates.

The milestone for phase II is the production of assemblies representing about 1,159 vertebrate families, and for phase III the generation of assemblies representing more than 10,000 genera; finally, in phase IV, assemblies will be completed for all vertebrate species. All sequence data and assemblies are being made freely available as they are produced and can be downloaded or browsed at GenomeArk, GenBank, Ensembl, and UCSC.


Did you know that…?

…the VGP aims to produce a high-quality reference genome assembly for each of the 71,657 named vertebrate species. Currently, about 3 assemblies are produced per week and this will be scaled up to 125 per week to achieve this goal within 10 years.
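The scale-up figures above can be checked with a little arithmetic. The following is a minimal sketch, assuming a constant production rate, 52 weeks per year, and no credit for assemblies already completed; the function name `years_to_finish` is illustrative and does not come from the VGP.

```python
# Back-of-the-envelope estimate of how long it takes to assemble every
# named vertebrate species at a fixed weekly rate.
# Assumptions (not from the source): constant rate, 52 weeks per year,
# no ramp-up period and no assemblies already finished.

NAMED_SPECIES = 71_657   # figure quoted in the text
WEEKS_PER_YEAR = 52      # simplifying assumption


def years_to_finish(assemblies_per_week: float, total: int = NAMED_SPECIES) -> float:
    """Return the number of years needed to produce `total` assemblies."""
    weeks = total / assemblies_per_week
    return weeks / WEEKS_PER_YEAR


for rate in (3, 125):
    print(f"{rate:>3} per week -> about {years_to_finish(rate):.0f} years")
```

Under these assumptions, the current rate of about 3 per week would take several centuries, whereas 125 per week brings the total to roughly 11 years, the same order as the ten-year target.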



…the selection of species for the different phases is based on taxonomic hierarchy and gives particular consideration to a species’ conservation status. Phase I will end with the assembly of the genome of a representative of each order within the vertebrate subphylum, phase II will include a representative of each vertebrate family, and phase III will assemble the genomes of representatives of all vertebrate genera, before the VGP concludes with phase IV and the assembly of the genome of each vertebrate species.


…the VGP is setting quality standards in the assembly of genomes and has made recommendations on a minimum set of quality criteria for a high-quality reference genome. An Editorial in Nature Biotechnology covered these in detail in 2018.


…a standard VGP reference genome is assembled using a combination of long-read sequencing, linked-read sequencing, optical mapping and Hi-C data in an automated workflow, and includes a final manual curation step to ensure the highest possible quality. The long reads constitute the basic building blocks of these assemblies, while linked reads, optical maps and Hi-C data provide the ‘scaffold’ information needed to put these building blocks together in their correct order and orientation, organized into the different chromosomes. A polishing step, using more accurate short-read sequencing data, corrects errors that remain from the long reads. In the final curation step, genome curators inspect the resulting ‘draft assemblies’ to identify and correct any anomalies and so create the final curated assemblies.


The VGP standard assembly pipeline. Adapted from: Towards complete and error-free genome assemblies of all vertebrate species.
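The workflow described above can be summarised as an ordered list of stages, each consuming particular data types. This is only an illustrative sketch: the stage names, the `Stage` class and the helper `describe` are hypothetical, the order simply follows the description in the text, and no actual VGP software or tool names are implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One step of the assembly workflow (illustrative, not VGP code)."""
    name: str
    inputs: tuple
    purpose: str


# Order follows the prose description; names are hypothetical labels.
PIPELINE = (
    Stage("contig assembly", ("long reads",),
          "build the basic sequence building blocks"),
    Stage("scaffolding", ("linked reads", "optical maps", "Hi-C"),
          "order and orient the blocks into chromosomes"),
    Stage("polishing", ("short reads",),
          "correct errors remaining from the long reads"),
    Stage("manual curation", ("draft assembly",),
          "inspect and correct anomalies"),
)


def describe(pipeline):
    """Return one human-readable line per stage, numbered in order."""
    return [
        f"{i}. {stage.name}: {', '.join(stage.inputs)} -> {stage.purpose}"
        for i, stage in enumerate(pipeline, start=1)
    ]


for line in describe(PIPELINE):
    print(line)
```

Representing the stages as immutable records keeps the order explicit and makes it easy to see which data type feeds which step.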


…some of the biological discoveries from the 16 genomes in the flagship paper, and 25 genomes in total from more than 20 papers in the first wave of VGP publications, include:

  • A canonical rapid rise in G and C nucleotides in the regulatory regions of protein-coding genes.
  • Repeated evolution of chromosomal rearrangements involving immune genes in bats.
  • A universal evolution-based understanding of oxytocin and vasotocin and their receptors across vertebrates.
  • The evolution of a complex sex chromosome system in monotreme mammals with multiple X and Y chromosomes.
  • An unexpectedly large amount of genetic diversity between maternal and paternal chromosomes in a non-human primate, the marmoset.
  • Extensive gene duplications in mitochondrial genomes.


Flagship paper

The Vertebrate Genomes Project has used an optimized pipeline to generate high-quality genome assemblies for sixteen species (representing all major vertebrate classes), which have led to new biological insights.

Lynx family with four bobcats sitting in a snowy winter forest.

Getty

A Goode’s Thornscrub Tortoise (Gopherus evgoodei) emerging from its burrow in Alamos, Sonora, Mexico.

John Sullivan/Alamy

Thorny skate (Amblyraja radiata) above sea floor. Rhode Island, New England, USA.

Andy Murch/Nature Picture Library

Climbing perch, Anabas testudineus, underwater.

Credit: Paulo Oliveira/Alamy


Companion papers

A revised, universal nomenclature for the vertebrate genes that encode the oxytocin and vasopressin–vasotocin ligands and receptors will improve our understanding of gene evolution and facilitate the translation of findings across species.

New reference genomes of the two extant monotreme lineages (platypus and echidna) reveal the ancestral and lineage-specific genomic changes that shaped both monotreme and mammalian evolution.

Reference-quality genomes for six bat species shed light on the phylogenetic position of Chiroptera, and provide insight into the genetic underpinnings of the unique adaptations of this clade.

A fully phased, high-quality assembly of the common marmoset genome provides insights into the evolution of sex chromosomes and the conservation of brain-related human disease genes in this primate model for biomedical research.

A new computational method, FALCON-Phase, makes it possible to resolve haplotypes in genome assemblies by using the information from natural intrachromosomal interactions identified by Hi-C, without the need for parental data.

The Vertebrate Genomes Project (VGP) has developed a fully automated pipeline for de novo assembly of mitochondrial genomes and reports the completion of mitogenome assemblies for 100 vertebrate species, which reveal errors and missing sequences in previous mitogenome assemblies.

Common Marmoset.

Getty

Two-lined Caecilian (Rhinatrema bivittatum) in French Guiana.

Daniel Heuclin/Nature Picture Library/Alamy


Browse the collection

Anna's Hummingbird (Calypte anna) perched on a branch, Victoria, British Columbia, Canada.

View the Vertebrate Genomes Project collection page, which includes all research articles and the VGP resources.


Springer Nature © 2021 Springer Nature Limited. All rights reserved.