EP-DNN: A Deep Neural Network-Based Global Enhancer Prediction Algorithm

We present EP-DNN, a protocol for predicting enhancers based on chromatin features, in different cell types. Specifically, we use a deep neural network (DNN)-based architecture to extract enhancer signatures in a representative human embryonic stem cell type (H1) and a differentiated lung cell type (IMR90). We train EP-DNN using p300 binding sites, as enhancers, and TSS and random non-DHS sites, as non-enhancers. We perform same-cell and cross-cell predictions to quantify the validation rate and compare against two state-of-the-art methods, DEEP-ENCODE and RFECS. We find that EP-DNN has superior accuracy with a validation rate of 91.6%, relative to 85.3% for DEEP-ENCODE and 85.5% for RFECS, for a given number of enhancer predictions and also scales better for a larger number of enhancer predictions. Moreover, our H1 → IMR90 predictions turn out to be more accurate than IMR90 → IMR90, potentially because H1 exhibits a richer signature set and our EP-DNN model is expressive enough to extract these subtleties. Our work shows how to leverage the full expressivity of deep learning models, using multiple hidden layers, while avoiding overfitting on the training data. We also lay the foundation for exploration of cross-cell enhancer predictions, potentially reducing the need for expensive experimentation.


SUPPLEMENTARY
We also carry out an investigation in which we vary the threshold for each of the three protocols (EP-DNN, DEEP-EN, and RFECS) and observe the effect on the validation and invalidity rates.
The detailed results are included here, separately for each of the two same-cell predictions (H1 and IMR90) in Tables S7 and S9, and for each of the two cross-cell predictions (IMR90 → H1 and H1 → IMR90) in Tables S8 and S10. The output from each algorithm can be interpreted as a confidence level: the higher the threshold, the more confident the algorithm is that a particular location is an enhancer. However, the algorithms use different scales for their thresholds. For example, an output of 0.5 from EP-DNN does not reflect the same level of confidence as an output of 0.5 from DEEP-EN (since DEEP-EN is a majority vote). The maximum validation rate in the tables is highest for EP-DNN in all cases. Even when finding a small number of enhancers (keeping this number constant across the three protocols), EP-DNN has a higher validation rate, meaning that it finds high-confidence enhancer sites more accurately.
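Because the three protocols score on different scales, a scale-free way to compare them is to rank candidate loci by confidence and keep the same number of top predictions from each method, as done above when the number of enhancers is held constant. A minimal sketch (not the authors' code; the scores below are made-up illustrations):

```python
import numpy as np

def top_n_predictions(scores, n):
    """Return indices of the n highest-confidence predictions,
    ranked in descending order of score."""
    order = np.argsort(scores)[::-1]  # descending confidence
    return order[:n]

# Hypothetical confidence scores for the same five candidate loci:
ep_dnn_scores = np.array([0.95, 0.10, 0.62, 0.88, 0.30])  # DNN output
deep_en_votes = np.array([0.60, 0.20, 0.55, 0.80, 0.40])  # majority-vote fraction

# Fixing n = 2 makes the comparison independent of each method's scale:
print(top_n_predictions(ep_dnn_scores, 2))  # loci EP-DNN is most confident about
print(top_n_predictions(deep_en_votes, 2))  # loci DEEP-EN is most confident about
```

Note that a raw threshold of, say, 0.5 would select different numbers of loci from each method, which is why the comparisons above fix the number of predictions instead.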
When trained on IMR90, EP-DNN gives a maximum output below 0.8; hence, threshold values of 0.8 and above are not available (N/A). This means that EP-DNN is biased toward zero, since its output never fully reaches 1 (it never even reaches 0.8). This can be interpreted as lower confidence in the elements it predicts to be enhancers. This happens only when using IMR90 as training data, not H1. We therefore conclude that IMR90 does not contain information that fully captures the subtleties of enhancer signatures, and consequently EP-DNN gives less accurate predictions when trained on IMR90. IMR90 is thus not as good a training set as H1, a hypothesis we raised earlier with supporting evidence; this result adds to that body of evidence.
Here we see the effect of varying the threshold on the validation rate and the invalidity rate, for each of the three protocols (EP-DNN, DEEP-EN, and RFECS). A threshold value of k for EP-DNN means that if the output of EP-DNN is greater than k, then that sample is considered an enhancer; a similar definition is used for DEEP-EN and RFECS. As the threshold is increased, the number of (positive) predictions naturally decreases, since the criterion for calling something an enhancer becomes stricter. For EP-DNN, with increasing threshold, the validation rate improves until the very end, when there is a slight dip (threshold going from 0.8 to 0.9). The increase in validation rate is explained by the fact that fewer samples are now being labeled as enhancers, but the ones that are flagged are more likely to be genuine enhancers. Thus, in the formula for validation rate, the denominator (the number of samples predicted to be enhancers) decreases, and so does the numerator (of the samples predicted to be enhancers, how many are validated). But the denominator decreases faster than the numerator, leading to an overall increase in validation rate. However, beyond a certain threshold (somewhere between 0.8 and 0.9), too many genuine enhancers are being missed and consequently the validation rate dips slightly. For both DEEP-EN and RFECS, the validation rate keeps increasing as the threshold is increased. The dip toward the very end is not seen, likely because the jumps in threshold are too large.
With increasing threshold, the invalidity rate monotonically decreases for all three protocols. This can be explained by the fact that the criterion for calling something an enhancer becomes stricter with increasing threshold, so there is less likelihood that a sample is incorrectly predicted to be an enhancer. Mathematically, for the invalidity rate, with increasing threshold, the number of samples predicted to be enhancers (the denominator) decreases, as does the number of those samples that are actually TSS or unknowns (the numerator). But here the numerator decreases faster than the denominator, causing the overall invalidity rate to go down.
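The numerator/denominator reasoning above can be made concrete with a small sketch. This is an illustration with made-up scores and labels, not the authors' evaluation pipeline; here validation rate = validated enhancers / predicted enhancers, and invalidity rate = (TSS or unknown) / predicted enhancers:

```python
import numpy as np

def rates_at_threshold(scores, labels, threshold):
    """Compute (validation rate, invalidity rate) at a given threshold.
    labels: 'E' = validated enhancer, 'T' = TSS, 'U' = unknown."""
    predicted = scores > threshold          # samples called as enhancers
    n_pred = predicted.sum()                # denominator for both rates
    if n_pred == 0:
        return None, None                   # no predictions at this threshold
    validated = np.sum(predicted & (labels == 'E'))
    invalid = np.sum(predicted & ((labels == 'T') | (labels == 'U')))
    return validated / n_pred, invalid / n_pred

# Toy data: confidence scores and ground-truth labels for six samples
scores = np.array([0.95, 0.85, 0.70, 0.65, 0.55, 0.45])
labels = np.array(['E', 'E', 'E', 'T', 'E', 'U'])

for t in (0.4, 0.6, 0.8):
    v, i = rates_at_threshold(scores, labels, t)
    print(f"threshold {t}: validation {v:.2f}, invalidity {i:.2f}")
```

In this toy example, raising the threshold from 0.4 to 0.8 drives the validation rate up and the invalidity rate down, mirroring the trend described above; the eventual dip for EP-DNN occurs in the regime where so few samples pass the threshold that genuine enhancers start to dominate the missed calls.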
A reasonable setting for the threshold should still yield a sizeable number of positive predictions, say 100K samples. Thus, very large threshold values are not very useful because too few of the enhancers are being found.