A novel data mining algorithm, built on a newly developed indexing data structure called Sequence Bloom Trees (SBTs), efficiently searches large-scale short-read sequencing repositories for experiments that contain a sequence of interest, running at 162 times the speed of existing search methods. The authors, Solomon and Kingsford, validated the algorithm by building an SBT from 2,652 human RNA-seq short-read sequencing runs deposited in the NIH Sequence Read Archive (SRA). These runs represented the entire set of publicly available human RNA-seq runs from blood, brain and breast tissues stored in the SRA at the time. To identify tissue-specific transcripts, they searched the SBT for the expression of all 214,293 known human transcripts. Results were obtained in 3.3 days, compared with an estimated 92 days for an alternative search method. Because the SBT index does not require prior knowledge of the sequences of interest, it could also be used, for example, to identify unknown long non-coding RNAs.
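
The summary above does not describe how an SBT works internally, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that each leaf stores a Bloom filter of the k-mers from one experiment, that each internal node stores the union of its children's filters, and that a query descends only into subtrees where at least a fraction `theta` of the query's k-mers appear to be present. All names (`BloomFilter`, `Node`, `build_tree`, `search`), the filter size, the value of `k`, the `theta` threshold, and the toy reads are illustrative assumptions. Leaves are paired in input order for simplicity, which a real index would likely not do.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterator, List, Optional


def kmers(seq: str, k: int) -> List[str]:
    """Return every overlapping substring of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class BloomFilter:
    """Toy Bloom filter; a Python int serves as the bit vector."""

    def __init__(self, size: int = 4096, num_hashes: int = 3) -> None:
        self.size = size
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item: str) -> bool:
        # May give false positives, never false negatives.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def union(self, other: "BloomFilter") -> "BloomFilter":
        merged = BloomFilter(self.size, self.num_hashes)
        merged.bits = self.bits | other.bits
        return merged


@dataclass
class Node:
    bloom: BloomFilter
    name: Optional[str] = None      # set only on leaves (one per experiment)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_leaf(name: str, reads: List[str], k: int) -> Node:
    bloom = BloomFilter()
    for read in reads:
        for kmer in kmers(read, k):
            bloom.add(kmer)
    return Node(bloom, name=name)


def build_tree(leaves: List[Node]) -> Node:
    """Pair nodes level by level; each parent holds the union of its children."""
    level = list(leaves)
    while len(level) > 1:
        parents = []
        for i in range(0, len(level) - 1, 2):
            a, b = level[i], level[i + 1]
            parents.append(Node(a.bloom.union(b.bloom), left=a, right=b))
        if len(level) % 2:          # an odd node moves up unchanged
            parents.append(level[-1])
        level = parents
    return level[0]


def _query(node: Node, query_kmers: List[str], theta: float) -> List[str]:
    hits = sum(kmer in node.bloom for kmer in query_kmers)
    if hits < theta * len(query_kmers):
        return []                   # prune the whole subtree
    if node.name is not None:
        return [node.name]
    return (_query(node.left, query_kmers, theta)
            + _query(node.right, query_kmers, theta))


def search(root: Node, sequence: str, k: int, theta: float = 0.9) -> List[str]:
    """Return names of experiments that appear to contain the sequence."""
    query_kmers = kmers(sequence, k)
    if not query_kmers:
        return []
    return _query(root, query_kmers, theta)


# Toy data: three "experiments" with made-up reads.
K = 4
experiments = {
    "blood": ["ACGTACGTGGA"],
    "brain": ["TTTTGGGGCCCCAAAA"],
    "breast": ["ACGTACGTGGA", "GGGGGGGGGG"],
}
root = build_tree([build_leaf(n, r, K) for n, r in experiments.items()])
print(search(root, "ACGTACGTGG", K))
```

In this toy example the query is a substring of a read in the blood and breast experiments, so both are expected to be reported, while the brain subtree should be pruned unless the filter produces false positives. The speed of the approach comes from that pruning: a single failed check at an internal node rules out every experiment beneath it.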