GIGGLE is a genomics search engine that identifies and ranks the significance of genomic loci shared between query features and thousands of genome interval files. GIGGLE (https://github.com/ryanlayer/giggle) scales to billions of intervals and is over three orders of magnitude faster than existing methods. Its speed extends the accessibility and utility of resources such as ENCODE, Roadmap Epigenomics, and GTEx by facilitating data integration and hypothesis generation.
We are grateful to the anonymous reviewers for their suggestions and comments. This research was funded by US National Institutes of Health awards to R.M.L. (K99HG009532) and A.R.Q. (R01HG006693, R01GM124355, U24CA209999).
The authors declare no competing financial interests.
Integrated supplementary information
(a) Three example annotation sets shown graphically (left) and encoded in files (right) by start position, end position, and ID. (b) GIGGLE's bulk indexing process. (c) The GIGGLE interval search process.
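The search described in this caption can be illustrated with a minimal sketch. It assumes each annotation set is a list of (start, end, ID) records, as in panel (a), and it uses a naive linear scan rather than GIGGLE's index. Ranking is by raw overlap count, not the GIGGLE score. The function names and the example data are hypothetical.

```python
from typing import Dict, List, Tuple

# One record per interval: (start, end, ID), mirroring the file encoding
# described in the caption. Coordinates are treated as closed ranges here,
# which is an assumption of this sketch.
Interval = Tuple[int, int, str]


def count_overlaps(intervals: List[Interval], q_start: int, q_end: int) -> int:
    """Count intervals that overlap the closed query range [q_start, q_end]."""
    return sum(
        1 for start, end, _ in intervals
        if start <= q_end and end >= q_start
    )


def rank_sets(
    annotation_sets: Dict[str, List[Interval]], q_start: int, q_end: int
) -> List[Tuple[str, int]]:
    """Rank annotation sets by overlap count with the query, highest first.

    Ties are broken alphabetically by set name. This is a naive stand-in
    for illustration only, not GIGGLE's scoring.
    """
    counts = {
        name: count_overlaps(intervals, q_start, q_end)
        for name, intervals in annotation_sets.items()
    }
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical example data: three small annotation sets.
example_sets = {
    "A": [(1, 5, "a1"), (10, 20, "a2")],
    "B": [(4, 12, "b1")],
    "C": [(30, 40, "c1")],
}

ranking = rank_sets(example_sets, 5, 11)
print(ranking)
```

A linear scan costs time proportional to the total number of intervals per query, which is the cost an index is meant to avoid at the scale of billions of intervals.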
Supplementary Figure 2 The GIGGLE scores for all pairwise combinations of the ChIP-seq datasets for the MCF-7 cell line.
Group 1 highlights the relationship between CTCF, RAD21, and STAG1. Group 2 highlights ESR1, FOXA1, GATA3, and EP300. Group 3 shows an unexpected relationship between H2AFX and GREB1.
Supplementary Figure 3 A web interface that integrates data from Roadmap and the UCSC genome browser.
(a) Users specify either a single interval or a file to upload as the query, and the server responds with a heatmap of the GIGGLE results from an index. In this case, the index is of ChromHMM predictions from Roadmap. The color of each cell indicates the GIGGLE score, and users can click on a cell (e.g., Myoblast enhancers, marked in red) for more information. (b) When the user selects a cell, a window opens that lists the intervals in that particular Roadmap cell type/genome state annotation that overlap the query. Each interval is a link that can be followed (e.g., chr1:33642000-33642800, marked in red) for more information. (c) When an interval is selected, that interval becomes a query to a GIGGLE index of the UCSC genome browser tracks. The result gives the set of tracks that contain an interval overlapping the query, and the web interface opens a window with a “smartview” in which only those tracks with overlaps are displayed.
Supplementary Figures 1–3 and Supplementary Tables 1–4 (PDF 2119 kb)
Data used to generate Figure 1. (ZIP 133 kb)
Data used to generate Figure 2. (ZIP 34860 kb)
Cell line, tissue, and trait names from Figure 1; accession numbers from Figures 1 and 2 and Supplementary Figure 2. (XLSX 41 kb)
GIGGLE source code and experiment scripts. (ZIP 3041 kb)
About this article
Cite this article
Layer, R., Pedersen, B., DiSera, T. et al. GIGGLE: a search engine for large-scale integrated genome analysis. Nat Methods 15, 123–126 (2018). https://doi.org/10.1038/nmeth.4556