Human knockouts and phenotypic analysis in a cohort with a high rate of consanguinity

Journal name: Nature
Volume: 544
Pages: 235–239
DOI: 10.1038/nature22034

A major goal of biomedicine is to understand the function of every gene in the human genome [1]. Loss-of-function mutations can disrupt both copies of a given gene in humans, and phenotypic analysis of such ‘human knockouts’ can provide insight into gene function. Consanguineous unions are more likely to result in offspring carrying homozygous loss-of-function mutations. In Pakistan, consanguinity rates are notably high [2]. Here we sequenced the protein-coding regions of 10,503 adult participants in the Pakistan Risk of Myocardial Infarction Study (PROMIS), which was designed to understand the determinants of cardiometabolic diseases in individuals from South Asia [3]. We identified individuals carrying homozygous predicted loss-of-function (pLoF) mutations, and performed phenotypic analysis involving more than 200 biochemical and disease traits. We enumerated 49,138 rare (<1% minor allele frequency) pLoF mutations. These pLoF mutations are estimated to knock out 1,317 genes, each in at least one participant. Homozygosity for pLoF mutations at PLA2G7 was associated with absent enzymatic activity of soluble lipoprotein-associated phospholipase A2; at CYP2F1, with higher plasma interleukin-8 concentrations; at TREH, with lower concentrations of apoB-containing lipoprotein subfractions; at either A3GALT2 or NRG4, with markedly reduced plasma insulin C-peptide concentrations; and at SLC9A3R1, with mediators of calcium and phosphate signalling. Heterozygous deficiency of APOC3 has been shown to protect against coronary heart disease [4, 5]; we identified APOC3 homozygous pLoF carriers in our cohort. We recruited these human knockouts and challenged them with an oral fat load. Compared with family members lacking the mutation, individuals with APOC3 knocked out displayed marked blunting of the usual post-prandial rise in plasma triglycerides. Overall, these observations provide a roadmap for a ‘human knockout project’, a systematic effort to understand the phenotypic consequences of complete disruption of genes in humans.
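
The link between consanguinity and homozygous loss-of-function genotypes can be illustrated with the standard population-genetics expression for the probability that an individual with inbreeding coefficient F is homozygous for an allele of frequency q, namely q² + Fq(1 − q). The sketch below is illustrative only: the allele frequency is a made-up value, and F = 1/16 is the textbook expectation for offspring of first cousins, not a figure reported for PROMIS.

```python
def prob_homozygous(q: float, f: float = 0.0) -> float:
    """Probability of carrying two copies of an allele with frequency q,
    given inbreeding coefficient f (f = 0 corresponds to random mating)."""
    return q * q + f * q * (1.0 - q)


# Hypothetical rare pLoF allele frequency, chosen for illustration only.
q = 0.001
outbred = prob_homozygous(q)                # random mating
first_cousins = prob_homozygous(q, 1 / 16)  # assumed F for first-cousin offspring

print(f"outbred:       {outbred:.2e}")
print(f"first cousins: {first_cousins:.2e}")
print(f"fold increase: {first_cousins / outbred:.1f}")
```

For rare alleles the Fq(1 − q) term dominates, which is why even modest inbreeding sharply raises the chance of observing a homozygous carrier.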

At a glance

Figures

  1. Figure 1: Homozygous pLoF burden in PROMIS is driven by excess autozygosity.

    a, Most genes are observed in the homozygous pLoF state in only a single individual. b, The distribution of the inbreeding coefficient (F) of PROMIS participants is compared with those of outbred samples of African (AFR) and European (EUR) ancestry. c, The burden of homozygous pLoF genes per individual is correlated with the coefficient of inbreeding. Bars represent 1.5× interquartile range beyond the 25th and 75th percentiles (b, c).

  2. Figure 2: Carriers of PLA2G7 splice mutation have diminished Lp-PLA2 mass and activity but similar risk for coronary heart disease when compared to non-carriers.

    a, b, Carriage of a splice-site mutation, c.663+1G>A, in PLA2G7 leads to a dose-dependent reduction of both lipoprotein-associated phospholipase A2 (Lp-PLA2) mass (P = 6 × 10⁻⁵) and activity (P = 2 × 10⁻⁷), with homozygotes having no circulating Lp-PLA2. c, Despite substantial reductions of Lp-PLA2 activity, PLA2G7 c.663+1G>A heterozygotes and homozygotes have similar coronary heart disease risk when compared with non-carriers (P = 0.87). Bars represent 1.5× interquartile range beyond the 25th and 75th percentiles (a, b).

  3. Figure 3: APOC3 pLoF homozygotes have diminished fasting triglycerides and blunted post-prandial lipaemia.

    a–d, APOC3 pLoF genotype status, apolipoprotein C3, triglycerides, HDL cholesterol and LDL cholesterol distributions among all sequenced participants. Apolipoprotein C3 concentration is displayed on a logarithmic base 10 scale. e, A proband with APOC3 pLoF homozygote genotype as well as several family members were recalled for provocative phenotyping. Surprisingly, the spouse of the proband was also a pLoF homozygote, leading to nine obligate homozygote children. Given the extensive number of first-degree unions, the pedigree is simplified for clarity. f, APOC3 p.Arg19Ter homozygotes and non-carriers within the same family were challenged with a 50 g m⁻² fat feeding. Homozygotes had lower baseline triglyceride concentrations and displayed marked blunting of post-prandial rise in plasma triglycerides. Bars represent 1.5× interquartile range beyond the 25th and 75th percentiles (a–d, f).

  4. Figure 4: Simulations anticipate many more homozygous pLoF genes in the PROMIS cohort.

    Number of unique homozygous pLoF genes anticipated with increasing sample sizes sequenced in PROMIS compared with similar African (AFR) and European (EUR) sample sizes. Estimates derived using observed allele frequencies and degree of inbreeding.

  5. Extended Data Fig. 1: pLoF mutations are typically seen in very few individuals.

    The site-frequency spectrum of synonymous, missense, and high-confidence pLoF mutations is represented. Points represent the proportion of variants within a 1 × 10⁻⁴ minor allele frequency bin for each variant category. Lines represent the cumulative proportions of the variant categories. The bottom inset highlights that most pLoF variants are seen in no more than one or two individuals. The top inset highlights that virtually all pLoF mutations are very rare.

  6. Extended Data Fig. 2: Intersection of homozygous pLoF genes between PROMIS and other cohorts.

    We compared the counts and overlap of unique homozygous pLoF genes in PROMIS with other exome sequenced cohorts.

  7. Extended Data Fig. 3: QQ-plot of recessive model pLoF association analysis across phenotypes.

    Analyses to determine whether homozygous pLoF carrier status was associated with traits were performed where there were at least two homozygous pLoF carriers phenotyped per trait. The observed versus expected results from 15,263 associations are displayed, demonstrating an excess of associations beyond a Bonferroni threshold.

  8. Extended Data Fig. 4: Carriers of pLoF alleles in CYP2F1 have increased IL-8 concentrations.

    Participants who had pLoF mutations in the CYP2F1 gene had higher concentrations of IL-8 than non-carriers in the rest of the cohort, with heterozygotes showing a more modest effect. IL-8 concentration is natural log transformed. Bars represent 1.5× interquartile range beyond the 25th and 75th percentiles.

  9. Extended Data Fig. 5: Carriers of pLoF alleles in TREH have decreased concentrations of several lipoprotein subfractions.

    Participants who had pLoF mutations in the TREH gene had lower concentrations of several lipoprotein subfractions. Bars represent 1.5× interquartile range beyond the 25th and 75th percentiles.

  10. Extended Data Fig. 6: Nondiabetic homozygous pLoF carriers for A3GALT2 have diminished insulin C-peptide concentrations.

    Among nondiabetics, those who were homozygous pLoF for A3GALT2 had substantially lower fasting insulin C-peptide concentrations. This observation was not evident in nondiabetic heterozygous pLoF A3GALT2 participants. Insulin C-peptide is natural log transformed. Bars represent 1.5× interquartile range beyond the 25th and 75th percentiles.

  11. Extended Data Fig. 7: Example of a second polymorphism in-phase which rescues a putative protein-truncating mutation.

    Short reads that align to genomic positions 65,339,112 to 65,339,132 on chromosome 1 are displayed for one individual with a putative homozygous pLoF genotype in this region. The G-to-T SNP at position 65,339,122 is annotated as a nonsense mutation in the JAK1 gene. However, all three homozygotes for this mutation carried a tandem SNP in the same codon (A to G at 65,339,124), so that the codon encodes a glutamine, effectively rescuing the protein-truncating mutation.

  12. Extended Data Fig. 8: Anticipated number of genes knocked out with increasing sample sizes by minimum knockout count.

    We simulate the number of genes expected to be knocked out by minimum knockout count per gene at increasing sample sizes. We perform this simulation with and without the observed inbreeding.

  13. Extended Data Fig. 9: PROMIS participants have an excess burden of runs of homozygosity compared with other populations.

    Consanguinity leads to genomic segments that are identical by descent, which can be observed as runs of homozygosity. Using genome-wide array data in 17,744 PROMIS participants and reference samples from the International HapMap3, the burden of runs of homozygosity (minimum 1.5 Mb) per individual was derived and population-specific distributions are displayed, with outliers removed. This highlights the higher median burden of runs of homozygosity in PROMIS and the higher proportion of individuals with very high burdens.

  14. Extended Data Fig. 10: Down-sampling of synonymous and high-confidence pLoF variants to validate simulation.

    a, b, We ran simulations to estimate the number of unique, completely knocked out genes at increasing sample sizes. Before applying our model, we first applied this approach to a range of sample sizes below 7,078 for variants that were not under constraint, synonymous variants (a), and for high-confidence null variants (b). At the observed sample size, we did not observe significant selection. We expect that at increasing sample sizes, there may be a subset of genes that will not be tolerated in a homozygous pLoF state. In fact, our estimates are slightly more conservative when comparing outbred simulations with a recent description of >100,000 Icelanders using a more liberal definition for pLoF mutations.
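
The projections described for Fig. 4 and Extended Data Figs 8 and 10 rest on observed allele frequencies and the degree of inbreeding. One simple way to approximate that kind of projection is sketched below; it is not the authors' simulation. It assumes each gene has a single cumulative pLoF allele frequency q, that individuals are independent, and that one cohort-wide inbreeding coefficient F applies. The gene frequencies and the value of F are made up for illustration.

```python
import random


def expected_knocked_out_genes(gene_freqs, n, f=0.0):
    """Expected number of genes with at least one homozygous pLoF carrier
    among n independent individuals, given per-gene cumulative pLoF allele
    frequencies and a single cohort-wide inbreeding coefficient f."""
    total = 0.0
    for q in gene_freqs:
        p_ko = q * q + f * q * (1.0 - q)   # per-individual knockout probability
        total += 1.0 - (1.0 - p_ko) ** n    # P(at least one knockout among n)
    return total


# Made-up frequencies for 1,000 hypothetical genes (not study data).
rng = random.Random(0)
gene_freqs = [rng.uniform(1e-5, 1e-3) for _ in range(1000)]

for n in (1_000, 10_000, 100_000):
    outbred = expected_knocked_out_genes(gene_freqs, n)
    inbred = expected_knocked_out_genes(gene_freqs, n, f=0.02)  # assumed F
    print(f"n={n:>7,}  outbred={outbred:7.1f}  inbred={inbred:7.1f}")
```

Because the calculation ignores selection against knockouts, it would tend to overestimate counts for genes that are not tolerated in the homozygous pLoF state, the caveat raised in the caption above.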
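
For the QQ-plot in Extended Data Fig. 3, the Bonferroni threshold follows from the number of tests. The short calculation below uses the 15,263 associations given in that caption; the family-wise α of 0.05 is an assumption, since the caption does not state it, and the (rank − 0.5)/n plotting position is only one common convention.

```python
import math

N_TESTS = 15_263   # number of associations, from the caption
ALPHA = 0.05       # assumed family-wise error rate (not stated in the caption)

threshold = ALPHA / N_TESTS
print(f"Bonferroni threshold: {threshold:.2e}")
print(f"on the -log10 scale:  {-math.log10(threshold):.2f}")


def expected_neg_log10_p(n_tests):
    """Expected -log10(p) values under the null for a QQ plot's x axis,
    ordered from the smallest expected p-value to the largest."""
    return [-math.log10((rank - 0.5) / n_tests) for rank in range(1, n_tests + 1)]
```

Observed −log10(p) values, sorted in decreasing order, would be plotted against `expected_neg_log10_p(N_TESTS)`; points above the diagonal beyond the threshold correspond to the excess of associations mentioned in the caption.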
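
The rescue described for Extended Data Fig. 7 arises because annotating each SNP separately can give a different consequence from annotating both changes in the same codon together. The sketch below reproduces that pattern with a hypothetical codon: the reference codon and substitution positions are placeholders chosen so that one change alone yields a stop codon and both together yield glutamine. They are not taken from the JAK1 reference sequence, and strand orientation is ignored.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def apply_snvs(codon, snvs):
    """Return the codon after applying {position: new_base} substitutions."""
    bases = list(codon)
    for pos, base in snvs.items():
        bases[pos] = base
    return "".join(bases)


def consequence(codon, snvs):
    """Translate the codon after applying the given substitutions."""
    return CODON_TABLE[apply_snvs(codon, snvs)]


REF_CODON = "TAC"     # hypothetical reference codon (tyrosine)
NONSENSE = {2: "A"}   # hypothetical SNP that alone creates a stop codon
RESCUE = {0: "C"}     # hypothetical tandem SNP in the same codon

separately = consequence(REF_CODON, NONSENSE)
jointly = consequence(REF_CODON, {**NONSENSE, **RESCUE})
print(f"annotated alone: {separately}   annotated jointly: {jointly}")
```

An annotation step that evaluates phased variants per codon, rather than per site, would avoid counting such genotypes as knockouts.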
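
The burden of runs of homozygosity summarized in Extended Data Fig. 9 can be approximated by scanning ordered array genotypes for stretches of consecutive homozygous markers. The sketch below is a deliberately simplified illustration, not the procedure used in the study: it handles one chromosome, allows no heterozygous miscalls inside a run, and applies no marker-density check. Only the 1.5 Mb minimum length comes from the caption; the toy markers are invented.

```python
MIN_RUN_BP = 1_500_000  # 1.5 Mb minimum run length, as in the caption


def runs_of_homozygosity(markers, min_bp=MIN_RUN_BP):
    """markers: list of (position_bp, is_homozygous), sorted by position on
    one chromosome. Returns (start, end) pairs for runs of consecutive
    homozygous markers spanning at least min_bp."""
    runs = []
    start = end = None
    for pos, is_hom in markers:
        if is_hom:
            if start is None:
                start = pos
            end = pos
        else:
            if start is not None and end - start >= min_bp:
                runs.append((start, end))
            start = end = None
    # Close a run that extends to the last marker.
    if start is not None and end - start >= min_bp:
        runs.append((start, end))
    return runs


def roh_burden(markers, min_bp=MIN_RUN_BP):
    """Total length in base pairs covered by qualifying runs."""
    return sum(end - start for start, end in runs_of_homozygosity(markers, min_bp))


# Toy genotypes (invented): one 2 Mb run and one 0.5 Mb run that is too short.
toy = [(0, True), (500_000, True), (1_000_000, True), (2_000_000, True),
       (2_500_000, False), (3_000_000, True), (3_500_000, True)]
print(runs_of_homozygosity(toy), roh_burden(toy))
```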

References

  1. Eisenberg, D., Marcotte, E. M., Xenarios, I. & Yeates, T. O. Protein function in the post-genomic era. Nature 405, 823–826 (2000)
  2. Bittles, A. H., Mason, W. M., Greene, J. & Rao, N. A. Reproductive behavior and health in consanguineous marriages. Science 252, 789–794 (1991)
  3. Saleheen, D. et al. The Pakistan Risk of Myocardial Infarction Study: a resource for the study of genetic, lifestyle and other determinants of myocardial infarction in South Asia. Eur. J. Epidemiol. 24, 329–338 (2009)
  4. Crosby, J. et al. Loss-of-function mutations in APOC3, triglycerides, and coronary disease. N. Engl. J. Med. 371, 22–31 (2014)
  5. Jørgensen, A. B., Frikke-Schmidt, R., Nordestgaard, B. G. & Tybjærg-Hansen, A. Loss-of-function mutations in APOC3 and risk of ischemic vascular disease. N. Engl. J. Med. 371, 32–41 (2014)
  6. Narasimhan, V. M. et al. Health and population effects of rare gene knockouts in adult humans with related parents. Science 352, 474–477 (2016)
  7. Sulem, P. et al. Identification of a large set of rare complete human knockouts. Nat. Genet. 47, 448–452 (2015)
  8. Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016)
  9. Di Angelantonio, E. et al. Lipid-related markers and cardiovascular disease prediction. J. Am. Med. Assoc. 307, 2499–2506 (2012)
  10. Gregson, J. M. et al. Genetic invalidation of Lp-Pla2 as a therapeutic target: large-scale study of five functional Lp-Pla2-lowering alleles. Eur. J. Prev. Cardiol. (2016)
  11. Polfus, L. M., Gibbs, R. A. & Boerwinkle, E. Coronary heart disease and genetic variants with low phospholipase A2 activity. N. Engl. J. Med. 372, 295–296 (2015)
  12. White, H. D. et al. Darapladib for preventing ischemic events in stable coronary heart disease. N. Engl. J. Med. 370, 1702–1711 (2014)
  13. O’Donoghue, M. L. et al. Effect of darapladib on major coronary events after an acute coronary syndrome: the SOLID-TIMI 52 randomized clinical trial. J. Am. Med. Assoc. 312, 1006–1015 (2014)
  14. Carr, B. A., Wan, J., Hines, R. N. & Yost, G. S. Characterization of the human lung CYP2F1 gene and identification of a novel lung-specific binding motif. J. Biol. Chem. 278, 15473–15483 (2003)
  15. Standiford, T. J. et al. Interleukin-8 gene expression by a pulmonary epithelial cell line. A model for cytokine networks in the lung. J. Clin. Invest. 86, 1945–1953 (1990)
  16. Murray, I. A., Coupland, K., Smith, J. A., Ansell, I. D. & Long, R. G. Intestinal trehalase activity in a UK population: establishing a normal range and the effect of disease. Br. J. Nutr. 83, 241–245 (2000)
  17. Christiansen, D. et al. Humans lack iGb3 due to the absence of functional iGb3-synthase: implications for NKT cell development and transplantation. PLoS Biol. 6, e172 (2008)
  18. Dahl, K., Buschard, K., Gram, D. X., d’Apice, A. J. & Hansen, A. K. Glucose intolerance in a xenotransplantation model: studies in alpha-gal knockout mice. APMIS 114, 805–811 (2006)
  19. Casu, A. et al. Insulin secretion and glucose metabolism in alpha 1,3-galactosyltransferase knock-out pigs compared to wild-type pigs. Xenotransplantation 17, 131–139 (2010)
  20. Schneider, M. R. & Wolf, E. The epidermal growth factor receptor ligands at a glance. J. Cell. Physiol. 218, 460–466 (2009)
  21. Wang, G. X. et al. The brown fat-enriched secreted factor Nrg4 preserves metabolic homeostasis through attenuation of hepatic lipogenesis. Nat. Med. 20, 1436–1443 (2014)
  22. Murtazina, R. et al. Tissue-specific regulation of sodium/proton exchanger isoform 3 activity in Na+/H+ exchanger regulatory factor 1 (NHERF1) null mice. cAMP inhibition is differentially dependent on NHERF1 and exchange protein directly activated by cAMP in ileum versus proximal tubule. J. Biol. Chem. 282, 25141–25151 (2007)
  23. Karim, Z. et al. NHERF1 mutations and responsiveness of renal parathyroid hormone. N. Engl. J. Med. 359, 1128–1135 (2008)
  24. Huff, M. W. & Hegele, R. A. Apolipoprotein C-III: going back to the future for a lipid drug target. Circ. Res. 112, 1405–1408 (2013)
  25. Pollin, T. I. et al. A null mutation in human APOC3 confers a favorable plasma lipid profile and apparent cardioprotection. Science 322, 1702–1705 (2008)
  26. Gaudet, D. et al. Antisense inhibition of apolipoprotein C-III in patients with hypertriglyceridemia. N. Engl. J. Med. 373, 438–447 (2015)
  27. Gaudet, D. et al. Targeting APOC3 in the familial chylomicronemia syndrome. N. Engl. J. Med. 371, 2200–2206 (2014)
  28. Graham, M. J. et al. Antisense oligonucleotide inhibition of apolipoprotein C-III reduces plasma triglycerides in rodents, nonhuman primates, and humans. Circ. Res. 112, 1479–1490 (2013)
  29. Brown, S. D. & Moore, M. W. Towards an encyclopaedia of mammalian gene function: the International Mouse Phenotyping Consortium. Dis. Model. Mech. 5, 289–292 (2012)
  30. Scott, E. M. et al. Characterization of Greater Middle Eastern genetic variation for enhanced disease gene discovery. Nat. Genet. 48, 1071–1076 (2016)
  31. Kooner, J. S. et al. Genome-wide association study in individuals of South Asian ancestry identifies six new type 2 diabetes susceptibility loci. Nat. Genet. 43, 984–989 (2011)
  32. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007)
  33. Tennessen, J. A. et al. Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science 337, 64–69 (2012)
  34. Do, R. et al. Exome sequencing identifies rare LDLR and APOA5 alleles conferring risk for myocardial infarction. Nature 518, 102–106 (2015)
  35. Fisher, S. et al. A scalable, fully automated process for construction of sequence-ready human exome targeted capture libraries. Genome Biol. 12, R1 (2011)
  36. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009)
  37. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010)
  38. DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011)
  39. Van der Auwera, G. A. et al. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr. Protoc. Bioinformatics http://dx.doi.org/10.1002/0471250953.bi1110s43 (2013)
  40. McLaren, W. et al. Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor. Bioinformatics 26, 2069–2070 (2010)
  41. Karczewski, K. J. Loftee (Loss-of-Function Transcript Effect Estimator), https://github.com/konradjk/loftee (2015)
  42. Jun, G. et al. Detecting and estimating contamination of human DNA samples in sequencing and array-based genotype data. Am. J. Hum. Genet. 91, 839–848 (2012)
  43. Manichaikul, A. et al. Robust relationship inference in genome-wide association studies. Bioinformatics 26, 2867–2873 (2010)
  44. Gold, L. et al. Aptamer-based multiplexed proteomic technology for biomarker discovery. PLoS One 5, e15004 (2010)
  45. Hunter-Zinck, H. et al. Population genetic structure of the people of Qatar. Am. J. Hum. Genet. 87, 17–25 (2010)
  46. Lander, E. S. & Botstein, D. Homozygosity mapping: a way to map human recessive traits with the DNA of inbred children. Science 236, 1567–1570 (1987)
  47. Purcell, S. M. et al. A polygenic burden of rare disruptive mutations in schizophrenia. Nature 506, 185–190 (2014)
  48. Wright, S. Coefficients of Inbreeding and Relationship. Am. Nat. 56, 330–338 (1922)
  49. De Rubeis, S. et al. Synaptic, transcriptional and chromatin genes disrupted in autism. Nature 515, 209–215 (2014)
  50. Samocha, K. E. et al. A framework for the interpretation of de novo mutation in human disease. Nat. Genet. 46, 944–950 (2014)
  51. Wang, T. et al. Identification and characterization of essential genes in the human genome. Science 350, 1096–1101 (2015)
  52. Eppig, J. T., Blake, J. A., Bult, C. J., Kadin, J. A. & Richardson, J. E. The Mouse Genome Database (MGD): facilitating mouse as a model for human biology and disease. Nucleic Acids Res. 43, D726–D736 (2015)
  53. Georgi, B., Voight, B. F. & Bućan, M. From mouse to human: evolutionary genomics analysis of human orthologs of essential genes. PLoS Genet. 9, e1003484 (2013)
  54. Fuchs, M. et al. The p400 complex is an essential E1A transformation target. Cell 106, 297–307 (2001)
  55. Fazzio, T. G., Huff, J. T. & Panning, B. An RNAi screen of chromatin proteins identifies Tip60-p400 as a regulator of embryonic stem cell identity. Cell 134, 162–174 (2008)
  56. Price, A. L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38, 904–909 (2006)
  57. Sambrook, J. & Russell, D. W. Purification of nucleic acids by extraction with phenol:chloroform. CSH Protoc. http://dx.doi.org/10.1101/pdb.prot4455 (2006)
  58. Mosteller, R. D. Simplified calculation of body-surface area. N. Engl. J. Med. 317, 1098 (1987)
  59. Maraki, M. et al. Validity of abbreviated oral fat tolerance tests for assessing postprandial lipemia. Clin. Nutr. 30, 852–857 (2011)

Author information

  1. These authors contributed equally to this work.

    • Danish Saleheen &
    • Pradeep Natarajan
  2. These authors jointly supervised this work.

    • Philippe Frossard,
    • John Danesh,
    • Daniel J. Rader &
    • Sekar Kathiresan

Affiliations

  1. Department of Biostatistics and Epidemiology, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania, USA

    • Danish Saleheen &
    • Wei Zhao
  2. Center for Non-Communicable Diseases, Karachi, Pakistan

    • Danish Saleheen,
    • Asif Rasheed,
    • Mozzam Zaidi,
    • Maria Samuel,
    • Atif Imran,
    • Faisal Majeed,
    • Madiha Ishaq,
    • Saba Akhtar &
    • Philippe Frossard
  3. Center for Genomic Medicine, Massachusetts General Hospital and Department of Medicine, Harvard Medical School, Boston, Massachusetts, USA

    • Pradeep Natarajan &
    • Sekar Kathiresan
  4. Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA

    • Pradeep Natarajan,
    • Irina M. Armean,
    • Konrad J. Karczewski,
    • Anne H. O’Donnell-Luria,
    • Kaitlin E. Samocha,
    • Benjamin Weisburd,
    • Namrata Gupta,
    • Daniel G. MacArthur,
    • Stacey Gabriel,
    • Eric S. Lander,
    • Mark J. Daly &
    • Sekar Kathiresan
  5. Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, USA

    • Irina M. Armean,
    • Konrad J. Karczewski,
    • Anne H. O’Donnell-Luria,
    • Kaitlin E. Samocha,
    • Benjamin Weisburd,
    • Daniel G. MacArthur &
    • Mark J. Daly
  6. Institute for Translational Medicine and Therapeutics, Department of Genetics, and Department of Medicine, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania, USA

    • Sumeet A. Khetarpal,
    • Kevin Trindade,
    • Megan Mucksavage &
    • Daniel J. Rader
  7. Samsung Advanced Institute for Health Sciences and Technology (SAIHST), Sungkyunkwan University, Samsung Medical Center, Seoul, Korea

    • Hong-Hee Won
  8. Division of Genetics and Genomics, Boston Children’s Hospital, Boston, Massachusetts, USA

    • Anne H. O’Donnell-Luria
  9. Faisalabad Institute of Cardiology, Faisalabad, Pakistan

    • Shahid Abbas
  10. National Institute of Cardiovascular Disorders, Karachi, Pakistan

    • Nadeem Qamar,
    • Khan Shah Zaman,
    • Zia Yaqoob,
    • Tahir Saghir,
    • Syed Nadeem Hasan Rizvi &
    • Anis Memon
  11. Punjab Institute of Cardiology, Lahore, Pakistan

    • Nadeem Hayyat Mallick
  12. Karachi Institute of Heart Diseases, Karachi, Pakistan

    • Mohammad Ishaq &
    • Syed Zahed Rasheed
  13. Red Crescent Institute of Cardiology, Hyderabad, Pakistan

    • Fazal-ur-Rehman Memon
  14. The Civil Hospital, Karachi, Pakistan

    • Khalid Mahmood
  15. Liaquat National Hospital, Karachi, Pakistan

    • Naveeduddin Ahmed
  16. Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA

    • Ron Do
  17. The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, New York, USA

    • Ron Do
  18. Children’s Hospital Oakland Research Institute, Oakland, California, USA

    • Ronald M. Krauss
  19. MRC/BHF Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, UK

    • John Danesh
  20. Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK

    • John Danesh
  21. Department of Human Genetics, University of Pennsylvania, USA

    • Daniel J. Rader

Contributions

Sample recruitment and phenotyping were performed by D.S., P.F., J.D., A.R., M.Z., M.S., A.I., S.A., F.Ma., M.I., S.A., K.T., N.H.M., K.S.Z., N.Q., M.I., S.Z.R., F.Me., K.M., N.A., and R.M.K. D.S., P.F., J.D., and W.Z. performed array-based genotyping and runs-of-homozygosity analyses. Exome sequencing was coordinated by D.S., N.G., S.G., E.S.L., D.J.R., and S.K. P.N., W.Z., H.H.W., and R.D. performed exome-sequencing quality control and association analyses. P.N., I.M.A., K.J.K., A.H.O., B.W., and D.G.M. performed variant annotation. D.S., S.K., and D.J.R. performed confirmatory genotyping and lipoprotein biomarker assays. D.S. and A.R. conducted recall-based studies for the APOC3 knockouts. P.N. and M.J.D. performed bioinformatics simulations. P.N. and K.E.S. performed constraint score analyses. D.S., P.N., and S.K. designed the study and wrote the paper. All authors discussed the results and commented on the manuscript.

Competing financial interests

The authors declare no competing financial interests.

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

PDF files

  1. Supplementary Information (621 KB)

    This file contains Supplementary Tables 1-9, the full legend for Supplementary Table 1 (supplied as a separate spreadsheet) and Supplementary References.

Excel files

  1. Supplementary Table (241 KB)

    This file contains Supplementary Table 1 – see the Supplementary Information document for the full description.
