Analysis of loss-of-function variants and 20 risk factor phenotypes in 8,554 individuals identifies loci influencing chronic disease


A typical human exome harbors dozens of loss-of-function (LOF) variants1, which can lower disease risk factor levels and affect drug efficacy2. We hypothesized that LOF variants are enriched in genes influencing risk factor levels and the onset of common chronic diseases, such as cardiovascular disease and diabetes. To test this hypothesis, we sequenced the exomes of 8,554 individuals and analyzed the effects of predicted LOF variants on 20 chronic disease risk factor phenotypes. Analysis of this sample as discovery and replication strata of equal size verified two relationships in well-studied genes (PCSK9 and APOC3) and identified eight new loci. Previously unknown relationships included elevated fasting glucose in carriers of heterozygous LOF variation in TXNDC5, which encodes a biomarker for type 1 diabetes progression, and apparent recessive effects of C1QTNF8 on serum magnesium levels. These data demonstrate the utility of functional-variant annotation within a large sample of deeply phenotyped individuals for gene discovery.

Figure 1: OP ratio trends across gene groups.
Figure 2: Distribution of phenotypes in carriers of LOF variations.


  1. MacArthur, D.G. et al. A systematic survey of loss-of-function variants in human protein coding genes. Science 335, 823–828 (2012).

  2. Phillips, I.R. & Shephard, E.A. Flavin-containing monooxygenases: mutations, disease and drug response. Trends Pharmacol. Sci. 29, 294–301 (2008).

  3. Margaritte, P., Bonaiti-Pellie, C., King, M.C. & Clerget-Darpoux, F. Linkage of familial breast cancer to chromosome 17q21 may not be restricted to early-onset disease. Am. J. Hum. Genet. 50, 1231–1234 (1992).

  4. Liu, X., Jian, X. & Boerwinkle, E. dbNSFP: a lightweight database of human nonsynonymous SNPs and their functional predictions. Hum. Mutat. 32, 894–899 (2011).

  5. Liu, X., Jian, X. & Boerwinkle, E. dbNSFP v2.0: a database of human non-synonymous SNVs and their functional predictions and annotations. Hum. Mutat. 34, E2393–E2402 (2013).

  6. Dang, V.T., Kassahn, K.S., Marcos, A.E. & Ragan, M.A. Identification of human haploinsufficient genes and their genomic proximity to segmental duplications. Eur. J. Hum. Genet. 16, 1350–1357 (2008).

  7. Georgi, B., Voight, B.F. & Bućan, M. From mouse to human: evolutionary genomics analysis of human orthologs of essential genes. PLoS Genet. 9, e1003484 (2013).

  8. Li, B. & Leal, S.M. Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. Am. J. Hum. Genet. 83, 311–321 (2008).

  9. Cohen, J. et al. Low LDL cholesterol in individuals of African descent resulting from frequent nonsense mutations in PCSK9. Nat. Genet. 37, 161–165 (2005).

  10. Crosby, J. et al. Loss-of-function mutations in APOC3, triglycerides, and coronary disease. N. Engl. J. Med. 371, 22–31 (2014).

  11. Jin, Y. et al. Risk of type 1 diabetes progression in islet autoantibody-positive children can be further stratified using expression patterns of multiple genes implicated in peripheral blood lymphocyte activation and function. Diabetes 63, 2506–2515 (2014).

  12. Gizer, I.R. et al. Linkage scan of nicotine dependence in the University of California, San Francisco (UCSF) Family Alcoholism Study. Psychol. Med. 41, 799–808 (2011).

  13. Barbaric, I., Miller, G. & Dear, T.N. Appearances can be deceiving: phenotypes of knockout mice. Brief. Funct. Genomic. Proteomic. 6, 91–103 (2007).

  14. Schäffler, A. & Buechler, C. CTRP family: linking immunity to metabolism. Trends Endocrinol. Metab. 23, 194–204 (2012).

  15. Sheridan, C. Phase 3 data for PCSK9 inhibitor wows. Nat. Biotechnol. 31, 1057–1058 (2013).

  16. Morrison, A.C. et al. Whole-genome sequence-based analysis of high-density lipoprotein cholesterol. Nat. Genet. 45, 899–901 (2013).

  17. ARIC Investigators. The Atherosclerosis Risk in Communities (ARIC) study: design and objectives. Am. J. Epidemiol. 129, 687–702 (1989).

  18. Reid, J.G. et al. Launching genomics into the cloud: deployment of Mercury, a next generation sequence analysis pipeline. BMC Bioinformatics 15, 30 (2014).

  19. Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589–595 (2010).

  20. Challis, D. et al. An integrative variant analysis suite for whole exome next-generation sequencing data. BMC Bioinformatics 13, 8 (2012).

  21. Grove, M.L. et al. Best practices and joint calling of the HumanExome BeadChip: the CHARGE Consortium. PLoS One 8, e68095 (2013).

  22. Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).

  23. Kelso, J. et al. eVOC: a controlled vocabulary for unifying gene expression data. Genome Res. 13, 1222–1230 (2003).

  24. Kamburov, A., Stelzl, U., Lehrach, H. & Herwig, R. The ConsensusPathDB interaction database: 2013 update. Nucleic Acids Res. 41, D793–D800 (2013).

  25. Petrovski, S., Wang, Q., Heinzen, E.L., Allen, A.S. & Goldstein, D.B. Genic intolerance to functional variation and the interpretation of personal genomes. PLoS Genet. 9, e1003709 (2013).



The Atherosclerosis Risk in Communities (ARIC) study is carried out as a collaborative study supported by National Heart, Lung, and Blood Institute (NHLBI) contracts (HHSN268201100005C, HHSN268201100006C, HHSN268201100007C, HHSN268201100008C, HHSN268201100009C, HHSN268201100010C, HHSN268201100011C and HHSN268201100012C). We thank the staff and participants of the ARIC study for their important contributions. Funding support for “Building on GWAS for NHLBI-diseases: the U.S. CHARGE Consortium” was provided by the National Institutes of Health through the American Recovery and Reinvestment Act of 2009 (ARRA) (5RC2HL102419). Sequencing was carried out at the Baylor College of Medicine Human Genome Sequencing Center (U54 HG003273).

Author information


A.H.L. carried out variant quality control, annotation and data analysis. X.L. developed the OP ratio and analyses of this metric. A.H.L., A.C.M., L.M.P. and B.Y. did statistical analyses of quantitative traits. C.K., G.M., D.M. and N.V. ensured that high-quality sequence variants were delivered for analyses. L.A.C., J.A.B. and T.L. were involved with study design. T.H.M. coordinated clinical data collection and recruitment. R.A.G. and E.B. provided materials and project oversight. A.H.L., E.B., A.C.M., X.L., B.Y. and L.M.P. prepared the manuscript.

Corresponding author

Correspondence to Eric Boerwinkle.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Site-frequency spectrum of four categories of exome variation.

The relative proportions of these functional categories are shown binned by allele frequency. LCP, low-confidence or partial LOF criteria (Online Methods); Non-syn, nonsynonymous; Syn, synonymous.

Supplementary Figure 2 OP ratio in stably expressed genes.

We used data from the eGenetics database to select the top 5% of genes expressed in the most tissues (“Universally-Expressed”).

Supplementary Figure 3 OP ratio of highly interacting genes.

We used data from the ConsensusPathDB database to select the top 5% of genes with the most protein interactions (“Highly-Interacting”).
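A "top 5%" selection of this kind can be sketched as follows. This is a minimal illustration, not the authors' code: the gene names, the interaction counts and the helper `top_percent` are all hypothetical, and breaking ties alphabetically is an assumption.

```python
def top_percent(counts, percent=5):
    """Return the genes in the top `percent` of `counts`, highest count first.

    counts  -- dict mapping gene name -> integer count (hypothetical data)
    percent -- integer percentage of genes to keep (5 means the top 5%)
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range 1..100")
    # Sort by descending count; ties are broken by gene name (an assumption).
    ranked = sorted(counts, key=lambda gene: (-counts[gene], gene))
    if not ranked:
        return []
    # Integer ceiling of len * percent / 100, keeping at least one gene.
    k = max(1, (len(ranked) * percent + 99) // 100)
    return ranked[:k]


# Made-up interaction counts for 40 placeholder genes: GENE00 has 0, GENE39 has 39.
interaction_counts = {f"GENE{i:02d}": i for i in range(40)}
highly_interacting = top_percent(interaction_counts)
print(highly_interacting)
```

The same helper would apply to the "Universally-Expressed" group if the counts were numbers of tissues in which each gene is expressed.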

Supplementary Figure 4 Number of LOF variants and genes carrying LOF variants with increasing sample size.

For each sample size n, we randomly chose n ARIC individuals and recorded the number of LOF variants and genes harboring them. This process was repeated 1,000 times to calculate the average numbers of LOF variants and genes carrying LOF variants for sample size n.
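The resampling procedure described in this legend can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' pipeline: the data layout (a dict from individual ID to the set of (gene, variant) pairs carried), the toy individuals and the function name `mean_lof_counts` are hypothetical, and sampling is assumed to be without replacement.

```python
import random


def mean_lof_counts(carriers, n, repeats=1000, seed=0):
    """Average distinct LOF variants and LOF-carrying genes over random subsamples.

    carriers -- dict mapping individual ID -> set of (gene, variant) pairs carried
    n        -- number of individuals drawn per subsample (without replacement)
    repeats  -- number of random subsamples to average over
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    individuals = sorted(carriers)
    total_variants = 0
    total_genes = 0
    for _ in range(repeats):
        variants = set()
        for person in rng.sample(individuals, n):
            variants |= carriers[person]
        total_variants += len(variants)
        total_genes += len({gene for gene, _ in variants})
    return total_variants / repeats, total_genes / repeats


# Toy data: four hypothetical individuals, three LOF variants in two genes.
toy = {
    "id1": {("GENE_A", "v1")},
    "id2": {("GENE_A", "v2")},
    "id3": {("GENE_B", "v3")},
    "id4": set(),
}
for size in range(1, len(toy) + 1):
    print(size, mean_lof_counts(toy, size))
```

Plotting the two averages against the sample size would give curves of the kind the figure title describes.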

Supplementary Figure 5 Quantile-quantile plots of P values from T5 associations with 20 phenotypes.

The 95% confidence intervals are depicted, and each circle represents one gene. Phenotype symbols are defined in Supplementary Table 2.
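For readers unfamiliar with quantile-quantile plots of P values, the point coordinates can be computed as in the sketch below. It assumes the common convention that the i-th smallest of m P values is compared with its expected value under the uniform null, i/(m + 1), on the -log10 scale. The function name `qq_points` and the example P values are hypothetical, and the 95% confidence band is not computed here.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10(P) pairs, most significant first.

    The i-th smallest of m P values is paired with i / (m + 1), the mean of
    the i-th order statistic of m uniform(0, 1) draws.
    """
    if any(not 0 < p <= 1 for p in p_values):
        raise ValueError("P values must lie in (0, 1]")
    observed = sorted(p_values)
    m = len(observed)
    points = []
    for rank, p in enumerate(observed, start=1):
        expected = rank / (m + 1)
        points.append((-math.log10(expected), -math.log10(p)))
    return points


# Hypothetical per-gene P values for one phenotype.
example = [0.9, 0.1, 0.5]
for expected, observed in qq_points(example):
    print(f"{expected:.3f}\t{observed:.3f}")
```

Points lying close to the diagonal (observed equal to expected) indicate P values consistent with the null distribution.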

Supplementary Figure 6 Quantile-quantile plots of P values from T5 homozygous associations with 20 phenotypes.

The 95% confidence intervals are depicted, and each circle represents one gene. Phenotype symbols are defined in Supplementary Table 2.

Supplementary Figure 7 Relationship between OP ratio and RVIS for 15,053 genes.

The y-axis depicts the OP ratio, and the x-axis shows RVIS scores as described by Petrovski et al.25. Each circle represents one gene.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–7 and Supplementary Tables 1, 2 and 5 (PDF 1147 kb)

Supplementary Table 3

Genotype and phenotype associations: T5 burden. This downloadable table presents all associations in which a T5 association reached nominal statistical significance (P < 0.05) in either the discovery or the replication stratum. (XLSX 9939 kb)

Supplementary Table 4

Genotype and phenotype associations: T5 burden, homozygotes only. This downloadable table presents all associations in which a T5 association reached nominal statistical significance (P < 0.05) in either the discovery or the replication stratum. LOF homozygotes were contrasted with LOF heterozygotes and non-LOF samples for phenotype comparison. (XLSX 656 kb)

Supplementary Table 6

List of all loss-of-function sites in ARIC. This downloadable table includes the hg19 coordinates of LOF sites, functional annotation to RefSeq and allele frequency in 8,554 ARIC samples, including the number of homozygous and heterozygous individuals. (CSV 2204 kb)

About this article

Cite this article

Li, A., Morrison, A., Kovar, C. et al. Analysis of loss-of-function variants and 20 risk factor phenotypes in 8,554 individuals identifies loci influencing chronic disease. Nat Genet 47, 640–642 (2015).
