The human genome comprises both our protein-coding genes and the regulatory information that controls when, and to what extent, those genes are expressed. While humans mostly share the same repertoire of genes and regulatory elements, the underlying sequences are as diverse as the people on Earth; each individual’s genome is unique. To reflect this diversity and to capture the extent of variation among a large group of individuals on an unprecedented scale, the Genome Aggregation Database (gnomAD) has aggregated 15,708 whole genomes and 125,748 exomes (the protein-coding part of the genome). Analyses of this rich resource have created a catalogue of the different types of variation present, and revealed their potential functional impact and how this information could help to identify disease-causing mutations and to prioritize potential drug targets.
gnomAD at a glance
More than three petabytes of raw data were contributed to the project from independent human sequencing studies led by more than 100 investigators, and then processed into 35 terabytes of high-quality variant data.
The gnomAD papers report 241 million small genetic variants (single nucleotide variants and short insertion/deletion variants) and 335,470 structural variants (DNA rearrangements of at least 50 base pairs), compared with 7.4 million small genetic variants identified in gnomAD’s predecessor, the Exome Aggregation Consortium (ExAC, which did not analyse structural variation).
gnomAD includes exomes and genomes from European, Latino, African and African American, South Asian, East Asian, Ashkenazi Jewish and other populations.
The analyses detected 443,769 predicted loss-of-function (pLoF) genetic variants in protein-coding genes in the whole-exome sequencing data. These are genetic variants that are predicted to prematurely truncate the protein (stop-gained), or to profoundly change the protein sequence owing to a shift in translational frame (frameshift) or the alternative inclusion or exclusion of exons (splice variant).
There are 1,815 genes for which biallelic pLoF variants (where both copies of a gene are likely to be inactive) are found in at least one individual in the gnomAD database, suggesting that humans can tolerate the loss of these genes or of their function.
gnomAD’s predecessor, the Exome Aggregation Consortium (ExAC), has been mentioned and used in more than 4,000 publications since it was first reported in Nature in August 2016 (source: Web of Science, May 2020).
The gnomAD team is already expanding the resource further and has recently released gnomAD v3, which contains 71,702 genomes.
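The definitions of pLoF and biallelic pLoF variants given in the list above can be illustrated with a minimal sketch. It is not the gnomAD pipeline: the consequence labels are assumed Sequence Ontology-style terms, the function names and toy records are hypothetical, and "biallelic" is simplified to an individual carrying two copies of the same pLoF variant (compound heterozygotes are ignored).

```python
# Assumed consequence labels (Sequence Ontology-style terms). The text names
# only the categories: stop-gained, frameshift and splice variant.
PLOF_CONSEQUENCES = {
    "stop_gained",
    "frameshift_variant",
    "splice_acceptor_variant",
    "splice_donor_variant",
}


def is_plof(consequence):
    """Return True if the consequence label is one of the assumed pLoF classes."""
    return consequence in PLOF_CONSEQUENCES


def genes_with_biallelic_plof(calls):
    """Collect genes in which at least one individual carries two copies of a pLoF variant.

    `calls` is an iterable of (individual, gene, consequence, alt_copies)
    tuples, a hypothetical record layout used only for this illustration.
    """
    genes = set()
    for _individual, gene, consequence, alt_copies in calls:
        if is_plof(consequence) and alt_copies == 2:
            genes.add(gene)
    return genes


# Toy data with placeholder gene names; not drawn from gnomAD.
toy_calls = [
    ("ind1", "GENE_A", "stop_gained", 2),           # homozygous pLoF
    ("ind1", "GENE_B", "missense_variant", 2),      # homozygous, but not pLoF
    ("ind2", "GENE_C", "frameshift_variant", 1),    # pLoF, but only one copy
    ("ind3", "GENE_D", "splice_donor_variant", 2),  # homozygous pLoF
]

print(sorted(genes_with_biallelic_plof(toy_calls)))  # ['GENE_A', 'GENE_D']
```

In this toy example only GENE_A and GENE_D count, because each has an individual with both copies affected by a pLoF variant. This mirrors the reasoning behind the 1,815 genes mentioned above.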
A catalogue of predicted loss-of-function variants in 125,748 whole-exome and 15,708 whole-genome sequencing datasets from the Genome Aggregation Database (gnomAD) reveals the spectrum of mutational constraints that affect human protein-coding genes.
Analysis of predicted loss-of-function variants from 125,748 human exomes and 15,708 whole genomes in the Genome Aggregation Database (gnomAD) provides a roadmap for human ‘knockout’ studies and a guide for future research into disease biology and drug-target selection.
A large empirical assessment of sequence-resolved structural variants from 14,891 genomes across diverse global populations in the Genome Aggregation Database (gnomAD) provides a reference map for disease-association studies, population genetics, and diagnostic screening.
A novel variant annotation metric that quantifies the level of expression of genetic variants across tissues is validated in the Genome Aggregation Database (gnomAD) and is shown to improve rare variant interpretation.
Predicted loss-of-function variants in the gene LRRK2 are identified in 1,455 apparently healthy individuals from three large cohorts, suggesting that therapeutic inhibition of LRRK2 might be a safe approach in diseases that are associated with elevated LRRK2 activity, such as Parkinson’s disease.
The authors systematically assess the deleteriousness of genetic variants in 5′ untranslated regions that could create or disrupt upstream open reading frames, using the genomes of 15,708 individuals.
A catalogue of multi-nucleotide variants (genetic variants in close proximity to each other on the same haplotype) is assembled, and their global mutation rate is estimated.