Focus on Genomes of Icelanders

Identification of a large set of rare complete human knockouts

Journal name: Nature Genetics
Volume: 47
Pages: 448–452
DOI: 10.1038/ng.3243

Loss-of-function mutations cause many mendelian diseases. Here we aimed to create a catalog of autosomal genes that are completely knocked out in humans by rare loss-of-function mutations. We sequenced the whole genomes of 2,636 Icelanders and imputed the sequence variants identified in this set into 101,584 additional chip-genotyped and phased Icelanders. We found a total of 6,795 autosomal loss-of-function SNPs and indels in 4,924 genes. Of the genotyped Icelanders, 7.7% are homozygotes or compound heterozygotes for loss-of-function mutations with a minor allele frequency (MAF) below 2% in 1,171 genes (complete knockouts). Genes that are highly expressed in the brain are less often completely knocked out than other genes. Homozygous loss-of-function offspring of two heterozygous parents occurred less frequently than expected (deficit of 136 per 10,000 transmissions for variants with MAF <2%, 95% confidence interval (CI) = 10–261).
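
As a rough illustration of the transmission result above, the sketch below converts the reported deficit into a transmission probability. It assumes the deficit is an absolute shortfall from the mendelian expectation of 2,500 homozygous offspring per 10,000 transmissions from two heterozygous parents. The Figure 1 caption states that the confidence intervals were estimated by bootstrap sampling, so a simple percentile bootstrap is also sketched. The function names and the synthetic outcome list are hypothetical, and the study's exact resampling scheme is not given here.

```python
import random

# Mendelian expectation: offspring of two heterozygous parents are
# homozygous for the variant with probability 1/4.
MENDELIAN_HOM = 0.25


def observed_probability(deficit_per_10k):
    """Transmission probability implied by a deficit per 10,000 transmissions.

    Assumption: the deficit is an absolute shortfall from the mendelian 25%.
    """
    return MENDELIAN_HOM - deficit_per_10k / 10_000


def bootstrap_ci(outcomes, n_boot=2000, seed=0):
    """Percentile-bootstrap 95% interval for the mean of 0/1 outcomes.

    `outcomes` holds 1 for a homozygous offspring and 0 otherwise.
    This is a generic percentile bootstrap, not the authors' procedure.
    """
    rng = random.Random(seed)
    n = len(outcomes)
    means = sorted(sum(rng.choices(outcomes, k=n)) / n for _ in range(n_boot))
    lo_idx = (n_boot * 25) // 1000        # 2.5th percentile
    hi_idx = (n_boot * 975) // 1000 - 1   # 97.5th percentile
    return means[lo_idx], means[hi_idx]


# Point estimate and interval bounds from the abstract (136; 10 to 261).
point = observed_probability(136)
upper = observed_probability(10)
lower = observed_probability(261)

# Synthetic illustration only: 236 homozygous offspring among 1,000 triads.
synthetic = [1] * 236 + [0] * 764
ci_low, ci_high = bootstrap_ci(synthetic)
```

Under this reading, a deficit of 136 per 10,000 corresponds to a transmission probability of about 23.64% rather than 25%, and the reported interval of 10–261 corresponds to roughly 22.39% to 24.90%.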

Figures

  1. Figure 1: Transmission probabilities from carrier parents.

    (a–d) Transmission probabilities from a single heterozygous parent (a) and two heterozygous parents (b), with the transmissions of loss-of-function variants being further stratified by Residual Variation Intolerance Score (RVIS) (c) and essentiality score (d) percentiles. Transmissions observed in 35,024 father-offspring pairs and 47,769 mother-offspring pairs were used for the single-heterozygous-parent calculations, and transmissions observed in 26,188 triads were used for the two-heterozygous-parent calculations. The numbers of informative transmissions, where both parents were heterozygous for an indicated type of variant, are shown below each graph. Shown are the transmission probabilities of loss-of-function (LoF; red), moderate-impact (blue) and intergenic (black) sequence variants with MAF below 0.5%, 1% and 2% in a and b and the transmission probabilities for loss-of-function variants with MAF below 2% in c and d. The middle tick of each colored segment indicates the observed transmission probability, and the extreme ticks indicate the 95% confidence intervals estimated by bootstrap sampling. The dotted lines correspond to transmission probabilities under mendelian inheritance.

  2. Figure 2: A histogram of the number of frameshift and stop-gain variants by percentage position within the affected protein sequence and the fraction of rare variants within each bin (derived allele frequency (DAF) < 0.5%).

    For variants affecting multiple transcripts, the mean percentage position was used.

  3. Figure 3: Transcriptome effect of stop-gain SNPs by exon rank.

    The allele-specific expression of the non-reference (stop-gain) allele was calculated for each variant for a set of 262 individuals with blood RNA sequence data. The top, middle and bottom of the boxes are the top quartile, median and bottom quartile values calculated over the set of variants. The whiskers show the lowest and highest data points within 1.5 times the interquartile range (IQR) from the median. The dots indicate data points more than 1.5 times the IQR from the median. The n values given are the number of variants in each class.

  4. Supplementary Fig. 1: The probabilities of a variant being seen once and five times as a function of the number of individuals sequenced by minor allele frequency (MAF).

    Variants that are seen at least five times, corresponding to an observed allelic frequency of 0.095%, are likely to be imputed with good quality.

  5. Supplementary Fig. 2: The fraction of individuals among 104,220 individuals with imputed genotypes that have genes completely knocked out by LoF variants with MAF below the given threshold.

    The second panel shows a magnified view of MAF below 3%.

  6. Supplementary Fig. 3: The number of genes that are observed to have at least one LoF variant with MAF below 2% as a function of the number of sequenced individuals and the number of genes that are completely knocked out in at least one individual by LoF variants with MAF below 2% as a function of the number of chip-genotyped individuals.

    The curves are derived from the allele frequency distributions and the number of imputed complete knockouts.

  7. Supplementary Fig. 4: The cumulative number of genes by the number of completely knocked out individuals.

    The total number of genes is 1,171.

  8. Supplementary Fig. 5: The frequency distribution of the 6,795 LoF variants among sequenced and imputed individuals.

    The leftmost count includes all variants with a frequency less than one and a half divided by twice the number of sequenced individuals, which corresponds to the number of variants seen only once in the sequenced set.

  9. Supplementary Fig. 6: A histogram of the number of meioses between the parents of the 104,220 Icelanders in our study and the fraction of individuals that have at least one gene completely knocked out by rare LoF variants.

  10. Supplementary Fig. 7: Transcriptome effect of synonymous SNPs by exon rank.

    The allele-specific expression of the non-reference allele was calculated for each variant for a set of 262 individuals with blood RNA sequence data. The top, middle and bottom of the boxes are the top quartile, median and bottom quartile values calculated over the set of variants. The whiskers show the lowest and highest data points within 1.5 times the interquartile range (IQR) from the median. The dots indicate data points more than 1.5 times the IQR from the median. The n values given are the number of variants in each class.
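
The frequency thresholds quoted in the captions for Supplementary Figs. 1 and 5 follow from simple counting over the 2 × 2,636 sequenced chromosomes. The sketch below reproduces that arithmetic and gives one possible way to compute the probability of observing a variant a given number of times. It assumes that allele copies are independent binomial draws across chromosomes and reads "seen once" and "seen five times" as "at least once" and "at least five times"; both are assumptions for illustration, and the function name is hypothetical.

```python
from math import comb

N_SEQUENCED = 2636            # whole-genome sequenced Icelanders (from the abstract)
N_CHROM = 2 * N_SEQUENCED     # autosomal chromosomes in the sequenced set


def prob_seen_at_least(k, n_individuals, maf):
    """P(at least k copies among 2n chromosomes) under a binomial model.

    Assumption: each chromosome carries the minor allele independently
    with probability `maf`.
    """
    n_chrom = 2 * n_individuals
    p_fewer = sum(
        comb(n_chrom, i) * maf**i * (1 - maf) ** (n_chrom - i)
        for i in range(k)
    )
    return 1.0 - p_fewer


# Observed allelic frequency of a variant seen exactly five times.
freq_five_copies = 5 / N_CHROM

# Upper bound of the leftmost (singleton) bin in Supplementary Fig. 5.
singleton_bin_upper = 1.5 / N_CHROM

# Example: detection probabilities for a variant with MAF of 0.1%.
p_once = prob_seen_at_least(1, N_SEQUENCED, 0.001)
p_five = prob_seen_at_least(5, N_SEQUENCED, 0.001)
```

Five copies among 5,272 chromosomes gives a frequency of about 0.095%, which matches the figure quoted in the caption for Supplementary Fig. 1.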

Author information

  1. These authors contributed equally to this work.

    • Patrick Sulem &
    • Hannes Helgason

Affiliations

  1. deCODE Genetics/Amgen, Inc., Reykjavik, Iceland.

    • Patrick Sulem,
    • Hannes Helgason,
    • Asmundur Oddson,
    • Hreinn Stefansson,
    • Sigurjon A Gudjonsson,
    • Florian Zink,
    • Eirikur Hjartarson,
    • Gunnar Th Sigurdsson,
    • Adalbjorg Jonasdottir,
    • Aslaug Jonasdottir,
    • Asgeir Sigurdsson,
    • Olafur Th Magnusson,
    • Augustine Kong,
    • Agnar Helgason,
    • Hilma Holm,
    • Unnur Thorsteinsdottir,
    • Gisli Masson,
    • Daniel F Gudbjartsson &
    • Kari Stefansson
  2. School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland.

    • Hannes Helgason,
    • Augustine Kong &
    • Daniel F Gudbjartsson
  3. Department of Anthropology, University of Iceland, Reykjavik, Iceland.

    • Agnar Helgason
  4. Department of Internal Medicine, Landspitali The National University Hospital of Iceland, Reykjavik, Iceland.

    • Hilma Holm
  5. Faculty of Medicine, University of Iceland, Reykjavik, Iceland.

    • Unnur Thorsteinsdottir &
    • Kari Stefansson

Contributions

P.S., H. Helgason, A.O., U.T., G.M., D.F.G. and K.S. designed the experiment. H.S., H. Holm and U.T. collected the samples. Adalbjorg Jonasdottir, Aslaug Jonasdottir, A.S. and O.T.M. performed the sequencing experiments. P.S., H. Helgason, S.A.G., F.Z., E.H., G.T.S., A.K., G.M. and D.F.G. analyzed the data. P.S., H. Helgason, A.H., D.F.G. and K.S. wrote the first draft of the manuscript. All authors contributed to the final version of the manuscript.

Competing financial interests

The authors affiliated with deCODE Genetics are employed by the company: P.S., H. Helgason, A.O., H.S., S.A.G., F.Z., E.H., G.T.S., Adalbjorg Jonasdottir, Aslaug Jonasdottir, A.S., O.T.M., A.K., A.H., H. Holm, U.T., G.M., D.F.G. and K.S.

Supplementary information

PDF files

  1. Supplementary Text and Figures (3,100 KB)

    Supplementary Figures 1–7, Supplementary Note and Supplementary Tables 1–3 and 5–11.

Excel files

  1. Supplementary Table 4 (982 KB)

    The observed list of 6,795 LoF mutations in 4,924 genes.
