  • Technical Report

A spectral approach integrating functional genomic annotations for coding and noncoding variants

Abstract

Over the past few years, substantial effort has been put into the functional annotation of variation in human genome sequences. Such annotations can have a critical role in identifying putatively causal variants for a disease or trait among the abundant natural variation that occurs at a locus of interest. The main challenges in using these various annotations include their large numbers and their diversity. Here we develop an unsupervised approach to integrate these different annotations into one measure of functional importance (Eigen) that, unlike most existing methods, is not based on any labeled training data. We show that the resulting meta-score has better discriminatory ability using disease-associated and putatively benign variants from published studies (in both coding and noncoding regions) than the recently proposed CADD score. Across varied scenarios, the Eigen score performs generally better than any single individual annotation, representing a powerful single functional score that can be incorporated in fine-mapping studies.
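To make the idea of an unsupervised meta-score concrete, the following is a minimal sketch that combines several annotation columns using the leading eigenvector of their correlation matrix. It is a generic spectral weighting shown under stated assumptions, not the published Eigen estimator: the function name `spectral_meta_score`, the simulated annotations and all parameter values are hypothetical and chosen only for illustration. The hidden labels are used to simulate data and are never passed to the scoring function.

```python
import numpy as np

def spectral_meta_score(annotations):
    """Combine annotation columns into one score per variant.

    Illustrative only: weights are the leading eigenvector of the
    annotation correlation matrix; no labels are used.
    """
    X = np.asarray(annotations, dtype=float)
    # Standardize each annotation so that differences in scale do not dominate.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Correlation matrix among annotations (columns are annotations).
    corr = np.corrcoef(Z, rowvar=False)
    # eigh returns eigenvalues in ascending order, so the last column
    # of eigvecs is the leading eigenvector.
    eigvals, eigvecs = np.linalg.eigh(corr)
    w = eigvecs[:, -1]
    # The sign of an eigenvector is arbitrary; orient it so weights sum to >= 0.
    if w.sum() < 0:
        w = -w
    return Z @ w, w

# Simulated data (hypothetical): 20% of variants are "functional".
rng = np.random.default_rng(0)
n = 2000
functional = rng.random(n) < 0.2   # hidden labels, used only to simulate
signal = 2.0 * functional.astype(float)

# Three informative annotations with increasing noise, plus one pure-noise column.
X = np.column_stack([
    signal + rng.normal(0, 0.5, n),
    signal + rng.normal(0, 1.0, n),
    signal + rng.normal(0, 2.0, n),
    rng.normal(0, 1.0, n),
])

scores, weights = spectral_meta_score(X)
print("weights:", np.round(weights, 3))
print("mean score, functional:", scores[functional].mean())
print("mean score, other:", scores[~functional].mean())
```

In this toy setting the noise-only annotation receives a weight near zero because it is uncorrelated with the others, which is the intuition behind weighting annotations without labeled training data.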

Figure 1: Correlation among different functional annotations for the noncoding variants on chromosome 1 in the training data set.
Figure 2: Violin plots showing the distribution of Eigen scores for de novo mutations in intellectual disability, epileptic encephalopathies, ASD (FMRP targets), ASD, schizophrenia and controls.
Figure 3: Violin plots showing the distribution of Eigen scores for noncoding variants in the COSMIC database that reside in different functional categories.

Acknowledgements

This research was supported by US National Institutes of Health grants R01 MH095797 and R01 MH100233 and by the Seaver Foundation. All analyses were conducted on the Minerva HPC complex at the Icahn School of Medicine at Mount Sinai.

Author information

Contributions

I.I.-L. designed the study and wrote the manuscript. I.I.-L. and K.M. developed the statistical methods and the software. I.I.-L. and K.M. analyzed the data. B.X. and J.D.B. provided bioinformatics support and contributed to the interpretation of the results. All authors have read and contributed to the manuscript.

Corresponding author

Correspondence to Iuliana Ionita-Laza.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Correlation among different functional annotations for the coding variants on chromosome 1 from the training data set.

Supplementary Figure 2 Violin plots for Eigen scores for GWAS SNPs and eQTLs for variants in different functional classes.

Supplementary Figure 3 ROC curves of the z values estimated from a hierarchical model with the Eigen scores.

Solid curves show ROC curves for z values from a hierarchical model that includes the Eigen scores as a functional predictor; dashed curves show ROC curves based on the ranking of the Eigen scores. The association between the Eigen score and the causal status of a variant varies, with relative risks of 1:1 (blue), 2 (green) and 4 (red). Estimates are averaged across 100 simulations.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–3, Supplementary Tables 1–22 and Supplementary Note. (PDF 2051 kb)

About this article

Cite this article

Ionita-Laza, I., McCallum, K., Xu, B. et al. A spectral approach integrating functional genomic annotations for coding and noncoding variants. Nat Genet 48, 214–220 (2016). https://doi.org/10.1038/ng.3477

  • DOI: https://doi.org/10.1038/ng.3477
