Loss-of-function mutations cause many mendelian diseases. Here we aimed to create a catalog of autosomal genes that are completely knocked out in humans by rare loss-of-function mutations. We sequenced the whole genomes of 2,636 Icelanders and imputed the sequence variants identified in this set into 101,584 additional chip-genotyped and phased Icelanders. We found a total of 6,795 autosomal loss-of-function SNPs and indels in 4,924 genes. Of the genotyped Icelanders, 7.7% are homozygotes or compound heterozygotes for loss-of-function mutations with a minor allele frequency (MAF) below 2% in 1,171 genes (complete knockouts). Genes that are highly expressed in the brain are less often completely knocked out than other genes. Homozygous loss-of-function offspring of two heterozygous parents occurred less frequently than expected (deficit of 136 per 10,000 transmissions for variants with MAF <2%, 95% confidence interval (CI) = 10–261).
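The transmission deficit quoted above can be put on a percentage scale with a small calculation. This is a minimal sketch that assumes the deficit is measured against the Mendelian expectation that 25% of offspring of two heterozygous parents are homozygous. The abstract does not state that baseline explicitly, and the function name is illustrative only.

```python
# Assumption: the deficit is relative to the Mendelian expectation of 1/4
# homozygous offspring from two heterozygous parents.
MENDELIAN_HOM_FRACTION = 0.25


def observed_hom_fraction(deficit_per_10k: float) -> float:
    """Convert a deficit per 10,000 transmissions into an observed fraction."""
    return MENDELIAN_HOM_FRACTION - deficit_per_10k / 10_000


# Point estimate and 95% CI bounds for MAF < 2%, as reported in the abstract.
for label, deficit in (("point estimate", 136), ("CI lower", 10), ("CI upper", 261)):
    print(f"{label}: {observed_hom_fraction(deficit):.2%}")
```

Under that assumption, the point estimate corresponds to roughly 23.6% homozygous offspring rather than 25%.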
- Supplementary Figure 1: The probabilities of a variant being seen once and five times as a function of the number of individuals sequenced, by minor allele frequency (MAF).
Variants that are seen at least five times, corresponding to an observed allelic frequency of 0.095%, are likely to be imputed with good quality.
- Supplementary Figure 2: The fraction of individuals among 104,220 individuals with imputed genotypes that have genes completely knocked out by LoF variants with MAF below the given threshold.
The second panel shows a magnified view of MAF below 3%.
- Supplementary Figure 3: The number of genes observed to have at least one LoF variant with MAF below 2%, as a function of the number of sequenced individuals, and the number of genes completely knocked out in at least one individual by LoF variants with MAF below 2%, as a function of the number of chip-genotyped individuals.
The curves are derived from the allele frequency distributions and the number of imputed complete knockouts.
- Supplementary Figure 4: The cumulative number of genes by the number of completely knocked out individuals.
The total number of genes is 1,171.
- Supplementary Figure 5: The frequency distribution of the 6,795 LoF variants among sequenced and imputed individuals.
The leftmost count includes all variants with a frequency less than one and a half divided by twice the number of sequenced individuals, which corresponds to the number of variants seen only once in the sequenced set.
- Supplementary Figure 7: Transcriptome effect of synonymous SNPs by exon rank.
The allele-specific expression of the non-reference allele was calculated for each variant for a set of 262 individuals with blood RNA sequence data. The top, middle and bottom of the boxes are the top quartile, median and bottom quartile values calculated over the set of variants. The whiskers show the lowest and highest data points within 1.5 times the interquartile range (IQR) from the median. The dots indicate data points more than 1.5 times the IQR from the median. The n values given are the number of variants in each class.
- Supplementary Text and Figures
Supplementary Figures 1–7, Supplementary Note and Supplementary Tables 1–3 and 5–11.
- Supplementary Table 4
The observed list of 6,795 LoF mutations in 4,924 genes.
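
The frequency thresholds mentioned in the captions of Supplementary Figures 1 and 5 follow from simple allele counting among the 2,636 sequenced individuals (5,272 chromosomes). The sketch below reproduces that arithmetic and adds an illustrative binomial calculation of how likely a variant is to be seen at least a given number of times. The binomial model with independent chromosomes and the function names are assumptions made here for illustration, not the authors' stated method.

```python
from math import comb

N_SEQUENCED = 2636  # whole-genome-sequenced Icelanders, from the abstract


def allele_frequency(copies: int, n_individuals: int) -> float:
    """Observed frequency of an allele seen `copies` times among 2N chromosomes."""
    return copies / (2 * n_individuals)


def prob_seen_at_least(k: int, maf: float, n_individuals: int) -> float:
    """P(at least k copies among 2N chromosomes).

    Assumption: copies follow a Binomial(2N, maf) distribution, i.e. the
    sampled chromosomes are treated as independent draws.
    """
    n_chrom = 2 * n_individuals
    p_fewer = sum(
        comb(n_chrom, i) * maf**i * (1 - maf) ** (n_chrom - i) for i in range(k)
    )
    return 1.0 - p_fewer


# Five copies among 5,272 chromosomes: the 0.095% quoted for Supplementary Figure 1.
five_copy_freq = allele_frequency(5, N_SEQUENCED)
print(f"five copies: {five_copy_freq:.3%}")  # five copies: 0.095%

# Leftmost bin of Supplementary Figure 5: frequency below 1.5 / (2N).
# A singleton (1 / 2N) falls below this cutoff, a doubleton (2 / 2N) above it.
singleton_cutoff = 1.5 / (2 * N_SEQUENCED)
print(allele_frequency(1, N_SEQUENCED) < singleton_cutoff < allele_frequency(2, N_SEQUENCED))

# Illustrative MAF values (chosen here, not taken from the source).
for maf in (0.0005, 0.001, 0.005):
    once = prob_seen_at_least(1, maf, N_SEQUENCED)
    five = prob_seen_at_least(5, maf, N_SEQUENCED)
    print(f"MAF {maf:.2%}: P(>=1 copy) = {once:.3f}, P(>=5 copies) = {five:.3f}")
```

The cutoff of 1.5 / (2N) sits midway between the frequencies of a variant seen once and a variant seen twice, which is why the leftmost bin contains exactly the singletons.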