The Genome Aggregation Database (gnomAD)

A collection of research articles and related content from the gnomAD Consortium that describe and analyse human genetic variation.

A group of people walking on a zebra crossing, where the stripes have been painted to represent DNA.

Credit: SciStories


The human genome comprises both our protein-coding genes and the regulatory information that controls when, and to what extent, those genes are expressed. While humans mostly share the same repertoire of genes and regulatory elements, the underlying sequences are as diverse as the people on Earth; each individual’s genome is unique. To reflect this diversity and to capture the extent of variation among a large group of individuals on an unprecedented scale, the Genome Aggregation Database (gnomAD) has aggregated 15,708 whole genomes and 125,748 exomes (the protein-coding part of the genome). Analyses of this rich resource have created a catalogue of the different types of variation present, and revealed their potential functional impact and how this information could help to identify disease-causing mutations and to prioritize potential drug targets.

gnomAD at a glance

More than three petabytes of raw data were contributed to the project from independent human sequencing studies led by more than 100 investigators, and then processed into 35 terabytes of high-quality variant data.

The gnomAD papers report 241 million small genetic variants (single nucleotide variants and short insertion/deletion variants) and 335,470 structural variants (DNA rearrangements of at least 50 base pairs), compared with 7.4 million small genetic variants identified in gnomAD’s predecessor, the Exome Aggregation Consortium (ExAC, which did not analyse structural variation).

Bar chart comparing the gnomAD and ExAC projects.

gnomAD has examined more small variants than the ExAC project (and now also includes structural variants).


gnomAD includes exomes and genomes from European, Latino, African and African American, South Asian, East Asian, Ashkenazi Jewish and other populations.


The analyses detected 443,769 predicted loss-of-function (pLoF) genetic variants in protein-coding genes in the whole-exome sequencing data. These are genetic variants that are predicted to prematurely truncate the protein (stop-gained), or to profoundly change the protein sequence owing to a shift in translational frame (frameshift) or the alternative inclusion or exclusion of exons (splice variant).
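As a rough illustration of the three pLoF categories described above, the following Python sketch flags a variant consequence as pLoF. The label strings and the names PLOF_CONSEQUENCES and is_plof are assumptions made for this example and are not taken from the gnomAD analysis code; a real pipeline would likely apply further quality filters.

```python
# Illustrative consequence labels based on the categories named in the text.
# These strings are assumptions for this sketch, not gnomAD's own vocabulary.
PLOF_CONSEQUENCES = {"stop_gained", "frameshift", "splice_variant"}


def is_plof(consequence: str) -> bool:
    """Return True if the consequence label falls in a pLoF category."""
    # Normalise case, hyphens and spaces so "Stop-gained" matches "stop_gained".
    label = consequence.strip().lower().replace("-", "_").replace(" ", "_")
    return label in PLOF_CONSEQUENCES


if __name__ == "__main__":
    for example in ["stop-gained", "Frameshift", "splice variant", "missense"]:
        print(example, is_plof(example))
```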

There are 1,815 genes for which biallelic pLoF variants (where both copies of a gene are likely to be inactive) are found in at least one individual in the gnomAD database, suggesting that humans can tolerate the loss of these genes or of their function.
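The idea of a gene carrying biallelic pLoF variants in at least one individual can be sketched as a simple filter. The example assumes, hypothetically, that each record already states how many copies of a gene (0, 1 or 2) are inactivated in a given individual; the record layout, gene names and function name are invented for illustration, and the question of how to decide whether two different variants sit on different copies is not addressed.

```python
from typing import Iterable, Set, Tuple

# Hypothetical record layout: (individual_id, gene, inactivated_copies),
# where inactivated_copies is 0, 1 or 2 and is assumed to be known already.
Record = Tuple[str, str, int]


def genes_with_biallelic_plof(records: Iterable[Record]) -> Set[str]:
    """Return genes for which at least one individual has both copies inactivated."""
    return {gene for _individual, gene, copies in records if copies >= 2}


if __name__ == "__main__":
    toy_records = [
        ("sample_1", "GENE_A", 1),  # one inactive copy only
        ("sample_2", "GENE_A", 2),  # both copies inactive
        ("sample_1", "GENE_B", 1),
        ("sample_3", "GENE_C", 0),
    ]
    print(genes_with_biallelic_plof(toy_records))
```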

The predecessor of gnomAD, the Exome Aggregation Consortium (ExAC), has been mentioned and used in over 4,000 publications since it was first reported in Nature in August 2016. (source: Web of Science, May 2020)

Cover of the Nature science journal.

The human exome project on the cover of Nature in 2016.


The gnomAD team is already expanding the resource further and has recently released gnomAD v3, which contains 71,702 genomes.

Flagship paper

A catalogue of predicted loss-of-function variants in 125,748 whole-exome and 15,708 whole-genome sequencing datasets from the Genome Aggregation Database (gnomAD) reveals the spectrum of mutational constraints that affect human protein-coding genes.

Companion papers

Analysis of predicted loss-of-function variants from 125,748 human exomes and 15,708 whole genomes in the Genome Aggregation Database (gnomAD) provides a roadmap for human ‘knockout’ studies and a guide for future research into disease biology and drug-target selection.

A large empirical assessment of sequence-resolved structural variants from 14,891 genomes across diverse global populations in the Genome Aggregation Database (gnomAD) provides a reference map for disease-association studies, population genetics, and diagnostic screening.

A novel variant annotation metric that quantifies the level of expression of genetic variants across tissues is validated in the Genome Aggregation Database (gnomAD) and is shown to improve rare variant interpretation.

Predicted loss-of-function variants in the gene LRRK2 are identified in 1,455 apparently healthy individuals from three large cohorts, suggesting that therapeutic inhibition of LRRK2 might be a safe approach in diseases that are associated with elevated LRRK2 activity, such as Parkinson’s disease.

The authors systematically assess the deleteriousness of genetic variants located in 5’ untranslated regions, which could create or disrupt upstream open reading frames, in the genomes of 15,708 individuals.

A catalogue of multi-nucleotide variants—genetic variants in close proximity to each other on the same haplotype—is assembled and their global mutation rate estimated.

Browse the collection

View the gnomAD Collection page, which includes all research articles, an editorial and News & Views.


Springer Nature © 2020 Springer Nature Limited. All rights reserved.