Experimental validation of gene enhancers is labour-intensive and is traditionally based on reporter assays that are not amenable to high-throughput applications. Self-transcribing active regulatory region sequencing (STARR-seq) in Drosophila cell lines has allowed the quantitative assessment of the activity of putative enhancers genome-wide. In a companion paper to ENCODE 3, Sethi, Gu and co-workers¹ developed a framework for predicting enhancers by using the fly STARR-seq data in combination with epigenomic datasets. In a first step, by overlapping STARR-seq enhancers with peaks of histone H3 lysine 27 acetylation (H3K27ac) and/or DNase hypersensitivity (DHS), the authors identified high-confidence enhancers and generated metaprofiles of H3K27ac signals across active STARR-seq peaks, followed by dependent metaprofiles for other histone marks. A matched filter score was calculated for all 30 epigenetic modifications, each of which showed better concordance with the activity of STARR-seq peaks than simple enrichment of signals. An integrated model that combined the filter scores for six discriminative epigenetic marks (H3K27ac, monomethylation of lysine 4 of H3 (H3K4me1), H3K4me2, H3K4me3, H3K9ac and DHS) using a linear support vector machine (SVM) performed well in predicting active regulatory regions. Additional models were trained to distinguish enhancer regions from promoter regions. The parameterized models obtained from fly STARR-seq data were then applied to mammalian epigenomic data from ENCODE 3. In each of six mouse tissues at embryonic day 11.5, between 31,000 and 39,000 regulatory regions were predicted. The numbers of predicted regulatory regions in a selection of human cell types were in a similar range.
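The matched filter idea can be illustrated with a minimal sketch: a metaprofile is built by averaging signal tracks around known active regions, and a new track is then scored by how well each window matches that template. Everything below is an assumption made for illustration and is not taken from the study: the function names `build_metaprofile` and `matched_filter_scores`, the toy double-peak shape, and the choice to mean-centre and normalise the template. The published framework additionally combines scores across several marks with a linear SVM, which is not shown here.

```python
import numpy as np

def build_metaprofile(signals):
    """Average equal-length signal tracks into a zero-mean, unit-norm template."""
    template = np.mean(np.asarray(signals, dtype=float), axis=0)
    template = template - template.mean()
    return template / np.linalg.norm(template)

def matched_filter_scores(track, template):
    """Slide the template along a track and return one score per offset."""
    track = np.asarray(track, dtype=float)
    # mode="valid" gives sum(track[i:i+m] * template) for every full-overlap offset i
    return np.correlate(track, template, mode="valid")

# Toy shape (hypothetical): two flanking peaks around a central trough.
shape = np.array([0, 2, 4, 2, 0, 2, 4, 2, 0], dtype=float)

# Hypothetical "training" tracks: the same shape at different amplitudes.
template = build_metaprofile([shape, 1.5 * shape, 0.5 * shape])

# A flat track with the shape embedded at offset 20.
track = np.zeros(50)
track[20:29] = shape

scores = matched_filter_scores(track, template)
best_offset = int(np.argmax(scores))
print("best-matching offset:", best_offset)
```

Because the score rewards agreement with the whole shape of the profile, a window containing only one of the two peaks scores lower than the window containing the full pattern, which is the intuition for why a shape-aware score can track activity better than simple signal enrichment.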
A subset of human and mouse predicted regulatory regions was validated in transgenic mouse enhancer assays, human reporter assays and by look-up of published experiments in the VISTA database, showing overall good performance (area under the receiver operating characteristic curve (AUROC) of 0.8 for mouse regions). It is expected that retraining of the described framework with genome-wide STARR-seq data from mammalian cells, once available, will improve its predictive performance even further.
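For readers unfamiliar with the metric, the AUROC can be read as the probability that a randomly chosen validated region receives a higher prediction score than a randomly chosen non-validated one. The sketch below computes it by direct pairwise comparison. The helper name `auroc` and the labels and scores are invented toy values for illustration only; they are not data from the study.

```python
import numpy as np

def auroc(labels, scores):
    """Fraction of positive/negative pairs ranked correctly (ties count as 0.5)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]
    neg = scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one positive and one negative example")
    # Compare every positive score with every negative score.
    diff = pos[:, None] - neg[None, :]
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return float(wins / (pos.size * neg.size))

# Toy example (hypothetical): 1 = validated in an assay, 0 = not validated.
toy_labels = [1, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.3, 0.4, 0.1]
toy_auc = auroc(toy_labels, toy_scores)
print("toy AUROC:", round(toy_auc, 3))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the reported 0.8 means that a validated region outranks a non-validated one in about four out of five such comparisons.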