Brief Communication

Predicting effects of noncoding variants with deep learning–based sequence model

Nature Methods volume 12, pages 931–934 (2015)

Abstract

Identifying functional effects of noncoding variants is a major challenge in human genetics. To predict the noncoding-variant effects de novo from sequence, we developed a deep learning–based algorithmic framework, DeepSEA (http://deepsea.princeton.edu/), that directly learns a regulatory sequence code from large-scale chromatin-profiling data, enabling prediction of chromatin effects of sequence alterations with single-nucleotide sensitivity. We further used this capability to improve prioritization of functional variants including expression quantitative trait loci (eQTLs) and disease-associated variants.
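The idea of scoring a sequence alteration by comparing model predictions for the reference and alternate alleles can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `toy_model` is a hypothetical stand-in that merely matches a short motif and is not the DeepSEA network, the function names are invented here, and the log-odds difference is just one plausible way to summarize the change in a predicted chromatin-feature probability.

```python
import math

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as rows of 4 values (A, C, G, T); unknown bases become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]

def toy_model(encoded):
    """Hypothetical stand-in for a trained sequence model (NOT DeepSEA).

    Returns a single pseudo-probability based on the best match to the
    motif 'GATA' anywhere in the encoded sequence.
    """
    motif = one_hot("GATA")
    best = 0.0
    for start in range(len(encoded) - len(motif) + 1):
        score = sum(
            encoded[start + i][j] * motif[i][j]
            for i in range(len(motif))
            for j in range(4)
        )
        best = max(best, score)
    # Logistic squashing so the output lies strictly between 0 and 1.
    return 1.0 / (1.0 + math.exp(-(best - 3.0)))

def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def variant_effect(ref_seq, pos, alt_base, model=toy_model):
    """Score a single-nucleotide substitution at 0-based position `pos`.

    Returns (p_ref, p_alt, log-odds difference alt minus ref).
    """
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    p_ref = model(one_hot(ref_seq))
    p_alt = model(one_hot(alt_seq))
    return p_ref, p_alt, logit(p_alt) - logit(p_ref)

# Example: substituting the first A of the GATA motif lowers the toy score.
p_ref, p_alt, effect = variant_effect("CCGATACC", 3, "C")
```

In this toy setting a negative `effect` means the alternate allele is predicted to weaken the feature. With a real trained model the same reference-versus-alternate comparison would be repeated for each predicted chromatin feature.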

Acknowledgements

This work was primarily supported by US National Institutes of Health (NIH) grants R01 GM071966 and R01 HG005998 to O.G.T. This work was supported in part by the US National Science Foundation (NSF) CAREER award (DBI-0546275), NIH award T32 HG003284 and NIH grant P50 GM071508. O.G.T. is supported by the Genetic Networks program of the Canadian Institute for Advanced Research (CIFAR). We acknowledge the TIGRESS high-performance computer center at Princeton University for computational resource support. We are grateful to all Troyanskaya laboratory members for valuable discussions.

Author information

Affiliations

  1. Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, USA.

    • Jian Zhou
    • Olga G Troyanskaya
  2. Graduate Program in Quantitative and Computational Biology, Princeton University, Princeton, New Jersey, USA.

    • Jian Zhou
  3. Department of Computer Science, Princeton University, Princeton, New Jersey, USA.

    • Olga G Troyanskaya
  4. Simons Center for Data Analysis, Simons Foundation, New York, New York, USA.

    • Olga G Troyanskaya

Authors

  1. Jian Zhou

  2. Olga G Troyanskaya

Contributions

J.Z. designed the study, with input from O.G.T. J.Z. developed the method and analyzed the results. O.G.T. supervised the study. J.Z. and O.G.T. wrote the paper.

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Olga G Troyanskaya.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1–8 and Supplementary Note

Excel files

  1. Supplementary Table 1

    List of all publicly available chromatin feature profile files used for training DeepSEA

  2. Supplementary Table 2

    DeepSEA prediction performance for each transcription factor, DNase I hypersensitive site, and histone mark profile

  3. Supplementary Table 4

    Allele-imbalance DNase I hypersensitivity prediction performance for 35 cell types

  4. Supplementary Table 7

    Feature rankings for noncoding functional variant prioritization tasks

CSV files

  1. Supplementary Table 3

    Sequence-based allele-specific DNase I hypersensitivity predictions for allele-imbalanced variants called from Digital Genomic Footprinting DNase-seq data

  2. Supplementary Table 5

    DeepSEA functional variant prioritization model predictions for noncoding GRASP eQTLs and negative variant sets

  3. Supplementary Table 6

    DeepSEA functional variant prioritization model predictions for noncoding GWAS Catalog SNPs and negative variant sets

About this article

DOI

https://doi.org/10.1038/nmeth.3547
