GIGGLE: a search engine for large-scale integrated genome analysis

Abstract

GIGGLE is a genomics search engine that identifies and ranks the significance of genomic loci shared between query features and thousands of genome interval files. GIGGLE (https://github.com/ryanlayer/giggle) scales to billions of intervals and is over three orders of magnitude faster than existing methods. Its speed extends the accessibility and utility of resources such as ENCODE, Roadmap Epigenomics, and GTEx by facilitating data integration and hypothesis generation.

Figure 1: Indexing, searching, performance, and score calibration.
Figure 2: Visualization of GIGGLE scores from various searches.

Acknowledgements

We are grateful to the anonymous reviewers for their suggestions and comments. This research was funded by US National Institutes of Health awards to R.M.L. (K99HG009532) and A.R.Q. (R01HG006693, R01GM124355, U24CA209999).

Author information

Contributions

R.M.L. conceived and designed the study, developed GIGGLE, and wrote the manuscript. B.S.P. developed the GIGGLE score and the Python and Go APIs. T.D. developed the web interface. G.T.M. provided input in the development of the web interface. J.G. conceived and designed the ChIP-seq experiment. A.R.Q. conceived and designed the study and wrote the manuscript.

Corresponding authors

Correspondence to Ryan M Layer or Aaron R Quinlan.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 GIGGLE indexing process.

(a) Three example annotation sets shown graphically (left) and encoded in files (right) by start position, end position, and ID. (b) GIGGLE's bulk indexing process. (c) The GIGGLE interval search process.
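
As a rough illustration of the index-then-search idea in panels (a) to (c), the sketch below indexes a few small annotation sets and counts, per set, how many intervals overlap a query. It is a minimal stand-in, not GIGGLE's actual index structure: it keeps two sorted arrays per set and counts overlaps by binary search. The names `build_index` and `count_overlaps`, the example coordinates, and the closed-interval convention are assumptions made for this example.

```python
from bisect import bisect_left, bisect_right


def build_index(annotation_sets):
    """Build a toy index: for each set, sorted start and sorted end positions.

    annotation_sets maps a set ID to a list of (start, end) pairs.
    Assumption: intervals are closed, i.e. both endpoints are included.
    """
    index = {}
    for set_id, intervals in annotation_sets.items():
        starts = sorted(start for start, _ in intervals)
        ends = sorted(end for _, end in intervals)
        index[set_id] = (starts, ends)
    return index


def count_overlaps(index, q_start, q_end):
    """Count the intervals in each set that overlap the query [q_start, q_end].

    An interval overlaps the query unless it starts after the query ends
    or ends before the query starts, so:
        overlaps = (#intervals with start <= q_end) - (#intervals with end < q_start)
    """
    counts = {}
    for set_id, (starts, ends) in index.items():
        started_by_query_end = bisect_right(starts, q_end)
        ended_before_query = bisect_left(ends, q_start)
        counts[set_id] = started_by_query_end - ended_before_query
    return counts


# Hypothetical annotation sets; the coordinates are made up for illustration.
annotation_sets = {
    "A": [(1, 5), (10, 20), (30, 40)],
    "B": [(3, 12), (25, 28)],
    "C": [(50, 60)],
}

index = build_index(annotation_sets)
hits = count_overlaps(index, 11, 26)

# Rank the sets by raw overlap count, highest first.
ranked = sorted(hits.items(), key=lambda item: item[1], reverse=True)
print(ranked)
```

Ranking by raw overlap count is used here only to keep the example short; it does not model the significance-based GIGGLE score, which also accounts for how surprising the overlap is.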

Supplementary Figure 2 The GIGGLE scores for all pairwise combinations of the ChIP-seq datasets for the MCF-7 cell line.

Group 1 highlights the relationship between CTCF, RAD21, and STAG1. Group 2 highlights ESR1, FOXA1, GATA3, and EP300. Group 3 shows an unexpected relationship between H2AFX and GREB1.

Supplementary Figure 3 A web interface that integrates data from Roadmap and the UCSC genome browser.

(a) Users specify either a single interval or a file to upload as the query, and the server responds with the GIGGLE results from an index, displayed as a heatmap. In this case the index is of ChromHMM predictions from Roadmap. The color of each cell indicates the GIGGLE score, and users can click on a cell (e.g., Myoblast enhancers, marked in red) for more information. (b) When the user selects a cell, a window opens that lists the intervals in that particular Roadmap cell type/genome state annotation that overlap the query. Each interval is a link that can be followed (e.g., chr1:33642000-33642800, marked in red) for more information. (c) When an interval is selected, that interval becomes a query to a GIGGLE index of the UCSC genome browser tracks. The result gives the set of tracks that contain an interval overlapping the query, and the web interface opens a window with a “smartview” in which only those tracks with overlaps are displayed.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–3 and Supplementary Tables 1–4 (PDF 2119 kb)

Life Sciences Reporting Summary (PDF 128 kb)

Supplementary Data 1

Data used to generate Figure 1. (ZIP 133 kb)

Supplementary Data 2

Data used to generate Figure 2. (ZIP 34860 kb)

Supplementary Data 3

Cell line, tissue, and trait names from Figure 1; accession numbers from Figures 1 and 2 and Supplementary Figure 2. (XLSX 41 kb)

Supplementary Software

GIGGLE source code and experiment scripts. (ZIP 3041 kb)

Cite this article

Layer, R., Pedersen, B., DiSera, T. et al. GIGGLE: a search engine for large-scale integrated genome analysis. Nat Methods 15, 123–126 (2018). https://doi.org/10.1038/nmeth.4556

  • DOI: https://doi.org/10.1038/nmeth.4556
