East African countries rely on rain-fed agriculture, and their economies are highly dependent on the seasonal variability of rainfall. Extreme floods and droughts are therefore posing a serious threat to East African countries1,2. During extreme floods, millions of people are displaced due to loss of property and lack of safe drinking water. In addition, malaria and other water-borne diseases put a strain on poorly equipped health systems, often resulting in life-threatening situations1,3. In the most recent floods in 2019, an estimated 1.5 million people were affected1. Droughts are equally damaging, with child malnutrition, livestock deaths and lack of safe drinking water posing critical problems for millions of people4. These problems become even more dangerous during prolonged droughts, such as the recent one in 2021, which left 20 million people struggling to survive2. Accurate seasonal rainfall forecasting is therefore urgently needed in East Africa. This would greatly improve socio-economic activities by reducing disruption to the various sectors affected.

East Africa receives most of its rainfall in two seasons each year: the first in March-April-May, known as the ‘long rains’, and the second in October-November-December (OND), known as the ‘short rains’. The long rains deliver relatively more rainfall5. However, the interannual variability of the short rains is more intense6 due to the rapid southward movement of the inter-tropical convergence zone7, which moves more slowly during the long rains. Although both seasons pose considerable threats to the region3,4,5, the large-scale climate drivers of long-rains variability are less well known8,9, which limits the predictability of the long rains and the usefulness of their forecasts to society.

In contrast, the interannual variability of the short rains is strongly linked to sea surface temperature (SST) variability over the Indian10,11,12,13,14, Pacific8,15,16,17, and partly the Atlantic Ocean18,19, which thus represent major sources of moisture variability. Among these teleconnections, the Indian Ocean is primarily responsible for above- or below-normal precipitation during short rains10,11,12 due to its independent phenomenon known as the Indian Ocean Dipole (IOD), in which the western part [50°E-70°E, 10°S-10°N] of the Indian Ocean is anomalously warmer (cooler) than the eastern part [90°E-110°E, 0°-10°N], creating a positive (negative) zonal SST gradient that is measured by the Dipole Mode Index (DMI) (see “Methods”)10. The above-normal rainfall arises because the usual westerlies over the Indian Ocean reverse to easterlies11,20,21 during strong positive IOD (pIOD) events11,20,21, when the western Indian Ocean is warmer than a much cooler eastern part, thus bringing more moisture to East Africa11,20,21. In contrast, during negative IOD (nIOD) events, these westerlies become even stronger20,21 and sweep the equatorial Indian Ocean, diverting moisture away from East Africa20,21 and resulting in below-normal short rains. Additionally, the region of the southwestern Indian Ocean, west of Madagascar, is also strongly linked with above- and below-normal short rains via modulation of the south-easterlies22,23 during its high and low phases, respectively. Similarly, several previous studies have shown that the equatorial Pacific Ocean influences the variability of short rains by modifying the Walker circulation. During the warm (cold) phase of the El Niño/Southern Oscillation (ENSO), short rains are observed to be above (below) normal with a positive correlation8,13,15, whereas anomalies in the western Pacific are negatively correlated17.
However, the picture changes when this effect is analyzed for pure ENSO events, for which the correlation is moderate but, importantly, negative11. Hence, the effect of ENSO on the short rains is suggested to be mediated via the state of the Indian Ocean11,12,14, which points to the preeminent role of the Indian Ocean. Additionally, the southwestern Atlantic Ocean also partly influences the variability of short rains by altering the south-easterlies18,19. These global teleconnections, therefore, imply the possibility of skillfully predicting short rains a few seasons in advance.

Various dynamical model-based seasonal prediction systems predict the short rains well, but only at short lead times (when initialized from August/September)24,25,26. Their skill decreases rapidly at longer lead times26,27 (when initialized from April/May), and they then exhibit more false alarms25,26,28. This inability of the dynamical models to predict short rains at long lead times is thought to be associated with failure in simulating the mean state of the Indian Ocean24,29,30, which plays a vital role in controlling the variability of the short rains, and with the spring predictability barrier31, whereby the predictive skill of forecasts declines rapidly when they are initialized during or before spring26,31. Seasonal predictions of short rains have also been studied using statistical models trained on these various teleconnections32,33,34. Interestingly, these showed slightly superior skill compared to the dynamical models at longer lead times; however, several extreme flood and drought events during the short rains season were largely missed25,32,34. Moreover, the predictive skill of these statistical models was considered to be overestimated because the teleconnections were estimated over the entire training-validation period rather than separately for each period35.

The strong teleconnections between short rains and SST variability in different oceanic regions, as well as the shortcomings in dynamical and statistical seasonal prediction models in predicting short rains at long leads, particularly extreme floods and droughts, to which East Africa is more prone, motivated us to investigate the predictability of short rains at both short and longer leads. To bridge this gap, we proposed a methodology based on the ‘convolutional neural network’ (CNN), a deep learning tool. In this study, we used SST anomalies (SSTA) and vertically averaged subsurface temperature anomalies (VATA) as predictors from September (short lead) and May/April (long lead) to develop an ensemble of CNNs to predict the East African short rain index during the OND season (hereafter EASRI). The predictability of various extreme floods and droughts that occurred over East Africa during a recent 39-year period (1983–2021) is discussed in the “Results” section, and the predictive skill is further diagnosed using CNN heatmap analysis to measure the self-sufficiency of these oceanic state-based predictors.


EASRI predictability assessment

The EASRI is predicted using global monthly SSTA and VATA as predictors from April, May, and September initializations for the 1983–2021 period, following the procedures described in “Methods”. The ensemble mean of the CNN predictions (hereafter: CNN predicted) is evaluated against the EASRI estimated from the Global Precipitation Climatology Project (GPCP)36 (hereafter: observed). For the September, May, and April initializations, the anomaly correlation coefficient (ACC) between CNN predicted and observed EASRI was 0.64, 0.64, and 0.61, respectively, significant at the 95% level (see Fig. 1). This consistent ACC across initializations is related to the efficient extraction of precursors from the oceanic predictors (discussed in more detail in subsequent sections). In contrast, when evaluated over a similar time period, the leading seasonal dynamical prediction systems show a very poor to moderate ACC in predicting short rains. For example, the Scale Interaction Experiment-Frontier ver. 2 (SINTEX-F2) shows an ACC below 0.45 for September and June initializations25, the coupled forecast system model version 2 (CFSv2), the Global Environmental Multiscale Nucleus for European Modelling of the Ocean (GEM-NEMO), the Canadian Centre for Climate Modelling and Analysis Coupled Climate Model v.4 (CanCM4I), the Center for Ocean-Land-Atmosphere Community Climate System Model v.4 (COLA_CCSM4), the Geophysical Fluid Dynamics Laboratory (GFDL)-A, and the GFDL-B show moderately poor ACC of 0.4, 0.4, 0.26, 0.24, −0.42, 0.06, and 0.40 (<0.4, 0.19, 0.24, −0.54, 0.04, 0.41, and 0.47)26, respectively, in May (April)-initialized predictions. The poor ability of dynamical models to forecast EASRI from April/May initializations is reportedly linked to the spring predictability barrier26,31 and to bias in simulating the Indian Ocean’s mean state24,29,30.
However, these poor skills can be improved by using a hybrid statistical-dynamical model approach, which has been shown to improve the correlation of May-initialized predictions for a few dynamical models25,26. Interestingly, we found that the CNN model performs even better than those dynamical and hybrid models in predicting EASRI from different initializations and is less affected by predictability barriers.
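As an illustration of how the ACC values and their significance quoted above can be checked, the following minimal sketch computes a Pearson-type correlation between two anomaly series and the corresponding t-statistic for the 39 validation years. It assumes independent yearly samples and uses a tabulated critical value for 37 degrees of freedom; the function names are illustrative and are not taken from the study's code.

```python
import math

def acc(pred, obs):
    """Correlation between two equal-length anomaly series (Pearson form)."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_o = sum(obs) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov / math.sqrt(var_p * var_o)

def t_statistic(r, n):
    """t-statistic of a correlation r from n samples (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# Assumption: tabulated two-tailed 95% critical value of Student's t for 37 dof.
T_CRIT_DF37 = 2.026
N_YEARS = 39  # validation period 1983-2021

# ACC values quoted in the text for each initialization month.
reported = {"September": 0.64, "May": 0.64, "April": 0.61}
t_values = {month: t_statistic(r, N_YEARS) for month, r in reported.items()}
significant = {month: abs(t) > T_CRIT_DF37 for month, t in t_values.items()}
```

Under these assumptions all three reported correlations exceed the critical value, in line with the significance statement in the text. Serial correlation in the index would reduce the effective sample size and is not handled here.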

Fig. 1: CNN ensemble mean prediction of EASRI.

Comparison of observed EASRI (black-dashed) with CNN-predicted EASRI using global SSTA and VATA as predictors from September (blue), May (orange), and April (green) over the period from 1983 to 2021. Gray lines show the 90th and 10th percentile bounds. Ensemble mean ACC at different lead times is significant at the 95% level using a two-tailed t-test.

Given East Africa’s high vulnerability to extremes, we further examine CNN predictions during extreme floods and droughts. These extremes were classified using the 90th and 10th percentiles of the observed EASRI; a year in which the EASRI rises above the 90th percentile (falls below the 10th percentile) is referred to as an extreme flood (drought). During the validation period, 11 extreme events were observed, five of which were extreme floods: 1994, 1997, 2006, 2011, and 2019, while six were extreme droughts: 1996, 1998, 2005, 2010, 2016, and 2021. We also include the recent 2021 drought in the extremes, even though it was above the 10th percentile criterion, because it was part of a series of recurrent droughts. Fig. 2 compares the CNN predictions of these extreme floods and droughts from different initializations.
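The percentile-based classification described above can be sketched as follows. This is only an illustration: the percentile is computed by linear interpolation between sorted values (an assumption, since the exact percentile definition is not stated), and the index values are a toy series rather than observed EASRI.

```python
def percentile(values, q):
    """q-th percentile (0-100) by linear interpolation between sorted values."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac

def classify_extremes(index_by_year, upper_q=90, lower_q=10):
    """Return (flood_years, drought_years) outside the percentile bounds."""
    values = list(index_by_year.values())
    upper = percentile(values, upper_q)
    lower = percentile(values, lower_q)
    floods = sorted(y for y, v in index_by_year.items() if v > upper)
    droughts = sorted(y for y, v in index_by_year.items() if v < lower)
    return floods, droughts

# Toy index (hypothetical values): eleven years with values 0.0 ... 10.0.
toy_index = {2000 + i: float(i) for i in range(11)}
toy_floods, toy_droughts = classify_extremes(toy_index)
```

A strict percentile rule would not flag a year that lies just inside the bounds, which is why the 2021 drought is added to the list by hand in the text.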

Fig. 2: CNN predicted extreme floods and droughts at different lead times.

Comparison of CNN-predicted extreme floods and droughts initialized in September (blue), May (orange), and April (green) with observed GPCP precipitation anomalies (black). Extreme floods and droughts are sorted using the 90th and 10th percentile bounds (see Fig. 1) of EASRI. The standardized DMI (circle) and Niño3.4 (cross) indices calculated using OISSTv2 over September–November and November–January are overlaid.

Extreme floods

The CNN models generally predicted most of the extreme flood events, although with slight variations in the predicted amplitudes at different lead times (Fig. 2). In particular, the phases of the two most severe floods, in 1997 and 2019, are well predicted from different initialization months, due to the co-occurrence of pIOD and El Niño events. Similarly, the flood of 1994 was also predicted in high agreement with observations for all initializations. However, the prediction of the 2006 and 2011 floods was challenging for the CNN models for some initializations. The 2011 flood was predicted with very high agreement for the April initialization but was poorly predicted for the May and September initializations. The 2006 flood was predicted in phase with observations for the April and May initializations but was missed for the September initialization.

Investigating the cause of this high prediction skill during extreme floods using the CNN heatmap analysis described in “Methods”, we observe a pIOD-like pattern in the heatmaps, as shown in Fig. 3. Such a pIOD pattern has been found to produce severe floods11,12 in East Africa due to the reversal of the usual westerlies to easterlies20,21, which bring abundant moisture. This strong pIOD pattern is most clearly seen in the September-initialized heatmaps (Fig. 3a–e), and these findings are consistent with previous analytical investigations11,12,14.

Fig. 3: Oceanic regions linked to predictions of extreme floods in CNN heatmaps.

CNN gradient-based heatmaps observed during extreme flood prediction at different initializations of September (a–e), May (f–j) and April (k–o). Heatmaps are extracted from the first convolutional layer for the best of the ten ensemble members. The shading, red (blue), denotes the positive (negative) relationship with EASRI. The color bar at the bottom denotes the strength of the gradients from each region; higher strengths suggest a stronger influence on EASRI variability. The pIOD regions in the western [50°E-70°E, 10°S-10°N] and eastern [90°E-110°E, 0°-10°N] Indian Ocean and the Mascarene High in the southwestern Indian Ocean [60°E-90°E, 20°S-30°S] are highlighted with solid black boxes, while Niño3.4 is highlighted with a dotted black box.

Similar predictive skill is observed for May-initialized CNN predictions as for September: all five extreme floods are predicted in phase with observations and with reasonable amplitude, and only one (2011) is underestimated. The heatmap analysis for May-initialized CNN predictions (Fig. 3f–j) also reveals a pIOD-like pattern, though the intensity is slightly lower than in the September heatmaps. The pIOD-like pattern appears in the May-initialized heatmaps because of early signals that are already present prior to the peak season (Supplementary Fig. 1: b1-c5). The pIOD-related prediction is aided by the anomalies in western, central, and eastern Pacific regions during a few of the extreme floods, as noted in previous studies15,16,17.

As the initialization month shifts to April, the intensity of the pIOD-like pattern for extreme flooding in the heatmap further decreases drastically, and a part of the southwestern Indian Ocean north of Madagascar, the Mascarene High (MH High), is seen to contribute significantly (Fig. 3k–o). The region above the MH high has a profound influence on the south-easterly winds, which intensify during its high phase and bring abundant moisture to East Africa, as do the unusual easterlies. Such a region is also investigated as a long-lead precursor of short rains in some important studies22,23. Also, we note a slightly greater contribution from the equatorial Pacific in April-initialized heatmaps, compared to the other initializations, as potential long-lead precursors of short rains13,15. In addition, a contribution from the southwestern Atlantic is also observed, which may be due to its effects on the southeastward flow18,19.

Despite the skillful prediction of various extreme floods, the CNN did not predict the 2006 flood from September and underestimated the 2011 flood from September and May (Fig. 2). The poor September-initialized predictions in 2006 and 2011 could be attributed to strong Madden–Julian Oscillation (MJO) activity during those years37,38, whereas the poor May-initialized prediction for 2011 could be attributed to a weak pIOD in the presence of a strong La Niña (see Supplementary Fig. 1: b4), which are known to affect short rains in opposite ways13,38. Nevertheless, the CNN correctly predicted the phase of the 2011 flood because of the pIOD.

Extreme droughts

In a similar analysis for extreme droughts, we found that the CNN predicted the droughts of 1996, 1998, 2010, 2016, and the most recent 2021 in high agreement with observations for the September initialization (Fig. 2) but underestimated the 2005 drought. For the May initialization, four of the six extreme droughts were predicted in phase, whereas two (1996 and 2005) were incorrectly predicted (Fig. 2). Compared to May, April-initialized predictions show consistent underestimation, with two of the six extreme droughts predicted out of phase (2005 and 2021) and the 2010 drought underestimated. Extreme droughts are subjected to the same heatmap analysis as extreme floods, as shown in Fig. 4.

Fig. 4: Oceanic regions linked to predictions of extreme droughts in CNN heatmaps.

Same as in Fig. 3, but observed during extreme droughts at different initializations of September (a–f), May (g–l) and April (m–r).

The high skill of the September initialization in predicting extreme droughts was found to be closely related to the nIOD-like pattern in the Indian Ocean (Fig. 4a–f), in contrast to the pIOD during extreme floods (Fig. 3a–e). Similarly strong nIOD patterns were observed throughout the September initialization (Supplementary Fig. 2: a1–a6). Such nIOD events further enhance the usual westerlies, diverting moisture away from East Africa and leading to droughts20,21; similar findings are reported in many previous studies11,12,14. In addition to the Indian Ocean, the equatorial Pacific Ocean is also observed to contribute, but with slightly less intensity.

In the May-initialized heatmaps (Fig. 4g–l), the nIOD-like pattern is slightly reduced in intensity compared to the September-initialized heatmaps (Fig. 4a–f). In addition, the contribution from the southwestern Indian Ocean, which has been studied as a potential cause of droughts22, is seen to increase in intensity, along with a slightly increased contribution from the equatorial Pacific.

Furthermore, in the April-initialized CNN heatmaps (Fig. 4m–r), this nIOD-like pattern further decreases compared to the May and September-initialized CNN heatmaps, along with an increase in intensities in the southwestern Indian Ocean and the equatorial Pacific Ocean. A contribution from the southern Atlantic is also detected, similar to the extreme flood heatmaps initialized in April.

Similar to extreme floods, there are a few cases where the CNN incorrectly predicted or underestimated droughts. The poor May-initialized predictions of the 1996 and 2005 droughts may be related, in the former case, to the non-stationary relationship between the Indian Ocean and short rains, as in recent years39,40, and, in the latter case, to the warmer Indian Ocean (Supplementary Fig. 2: b3) favoring floods28. Similarly, the poor April-initialized prediction of the 2005 drought was also related to the warmer Indian Ocean (Supplementary Fig. 2: b4), while the poor predictions for the 2010 and 2021 droughts could be due to the stronger MH high (Supplementary Fig. 2: c4, c6), which has been shown to favor floods22,23. In a further section, we discuss the results of the current study in terms of the comparative skill of the dynamical models and the differences in teleconnections observed for extreme flood and drought events.


The prediction of extreme floods and droughts over the last 39 years (1983–2021) has been examined using deep learning-based CNN models trained with global monthly SSTA and VATA. The ensemble mean of the CNN models shows excellent skill in predicting extreme flood and drought years in the short rains season from the September initialization, compared to May and April (Fig. 1). The ACC of the CNN predictions from May and April is much higher than that of the various dynamical and hybrid models25,26 that have the potential to resolve the physical linkages and associated dynamics. This improved long-lead prediction skill of the CNN is related to its ability to capture early precursors in the predictors (Figs. 3 and 4), including the obvious IOD pattern, the low and high phases of the MH, the anomalous warm and cold phases of the equatorial Pacific, and, in part, anomalies in the South Atlantic. Interestingly, the short-lead precursors (Figs. 3a–e and 4a–f) were observed mainly in the Indian Ocean, especially in the western and eastern tropical Indian Ocean, with weak signals in the northern Indian Ocean and the equatorial Pacific Ocean. On the other hand, the long-lead precursors (Figs. 3f–o and 4g–r) show a slight shift toward the southern Indian, equatorial Pacific and southern Atlantic Oceans. The precursors detected by the CNN models are consistent with several studies investigating large-scale drivers of short rains. Moreover, the efficient extraction of these precursors also helped the CNN to reduce the effect of the spring predictability barrier, whereas dynamical models experience a rapid reduction in predictive skill25,26 due to this barrier, leading to false predictions. For example, CFSv2 predicted the drought of 2016 as a flood from April, perhaps due to the lasting memory of the earlier 2015 El Niño26.
However, CNN models trained on long observational datasets correctly predicted the extreme floods of 1997 and 2019, and the withering droughts of 2010, 2016, and the most recent 2021 from May (Fig. 2). Nevertheless, there were a few cases of poor predictions for both extreme floods (2006 and 2011) and droughts (1996, 2005, 2010, and 2021). Such poor predictions can be partly attributed to the non-stationary relationship between the Indian Ocean and the short rains, the unfavorable warming of the western and southwestern Indian Ocean, and the co-occurrence of opposite states of IOD and ENSO. In addition, high-frequency weather and climate variations such as the MJO, which are not resolved by the monthly data used here, may also play a role in these poor predictions. Further study using dynamical models or heatmap-driven machine learning models with high-frequency data may help to understand such exceptional extreme cases.

In summary, the CNN-based models show a high degree of predictability for both extreme floods and droughts over East Africa at different lead times over the recent 39 years (1983–2021), especially for September initialization. This consistency is also evident for May and April initializations barring a few exceptions. The IOD pattern emerges as a dominant precursor for extreme floods and droughts, with a particularly strong impact in September initializations. This is in addition to the South Indian, Atlantic, Western and Central Pacific regions with longer lead times. However, a few cases were poorly predicted at longer lead times. Those were strongly associated with a weak DMI at the time of initialization, allowing factors other than IOD to degrade the relationship. The skills shown by CNN models for predicting extreme floods and droughts two to three seasons ahead are promising and will greatly help in organizing mitigation efforts to manage extremes, especially in cases such as the prolonged droughts of 2021.


Estimation of EASRI

EASRI is estimated over the East African region [35°E-46°E, 5°S-5°N]. This study region (see Supplementary Fig. 3) includes most of Kenya, followed by southeastern Somalia, southern Ethiopia, and northern Tanzania. Rainfall anomalies over this region were calculated by subtracting the long-term climatological mean, calculated over the period 1981–2010, from the actual rainfall values. These anomalies were then averaged over the OND season. This procedure is repeated for the Global Precipitation Climatology Centre (GPCC)41 and GPCP rainfall datasets to prepare EASRI for the training and validation of CNN models.
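The index construction described above can be sketched in a few lines. The sketch assumes that rainfall has already been averaged over the study region, so the input is a hypothetical mapping from (year, month) to area-mean rainfall; the function name and the toy values are illustrative only.

```python
OND = (10, 11, 12)  # October, November, December

def easri(rain, clim_start=1981, clim_end=2010):
    """OND-mean rainfall anomaly per year from area-mean monthly rainfall.

    rain: dict mapping (year, month) -> area-mean rainfall (assumed input layout).
    """
    # Monthly climatology over the base period.
    clim = {}
    for month in OND:
        base = [rain[(year, month)] for year in range(clim_start, clim_end + 1)]
        clim[month] = sum(base) / len(base)
    # Anomaly = actual value minus climatology, then averaged over OND.
    years = sorted({year for (year, _month) in rain})
    return {
        year: sum(rain[(year, m)] - clim[m] for m in OND) / len(OND)
        for year in years
    }

# Toy data (hypothetical): 100 units every OND month, except a wet 2019.
toy_rain = {(y, m): 100.0 for y in range(1981, 2022) for m in OND}
for m in OND:
    toy_rain[(2019, m)] = 130.0
toy_easri = easri(toy_rain)
```

With this toy input, the 2019 index equals the 30-unit excess over climatology, and all other years are zero.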


The DMI is calculated by taking the difference in spatially averaged SSTA between the western [50°E-70°E, 10°S-10°N] and eastern [90°E-110°E, 0°-10°N] Indian Ocean, as described by Saji et al.10. The SSTA were calculated by subtracting the long-term climatological mean SST for 1981–2010 from the SST. The NOAA Optimum Interpolation (OI) SST V2 (OISSTv2)42 monthly datasets were used to calculate the DMI.
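A minimal sketch of the west-minus-east box difference is given below. It takes the box bounds exactly as quoted in the text and exposes them as parameters; the unweighted spatial mean and the dictionary layout of the SSTA field are simplifying assumptions, not details taken from the study.

```python
def box_mean(field, lon_range, lat_range):
    """Unweighted mean of field values inside a lon/lat box (bounds inclusive)."""
    values = [
        value
        for (lat, lon), value in field.items()
        if lon_range[0] <= lon <= lon_range[1] and lat_range[0] <= lat <= lat_range[1]
    ]
    return sum(values) / len(values)

def dmi(ssta,
        west=((50, 70), (-10, 10)),   # (lon range, lat range) as quoted in the text
        east=((90, 110), (0, 10))):   # (lon range, lat range) as quoted in the text
    """Difference of box-mean SSTA: western minus eastern Indian Ocean."""
    return box_mean(ssta, *west) - box_mean(ssta, *east)

# Toy SSTA field (hypothetical values) on a 5-degree grid keyed by (lat, lon).
toy_field = {}
for lat in range(-10, 11, 5):
    for lon in range(40, 121, 5):
        if 50 <= lon <= 70:
            toy_field[(lat, lon)] = 1.0    # warm west
        elif 90 <= lon <= 110:
            toy_field[(lat, lon)] = -0.5   # cool east
        else:
            toy_field[(lat, lon)] = 0.0
toy_dmi = dmi(toy_field)
```

A positive value, as in this toy field, corresponds to the pIOD-like state discussed in the text. On a real latitude-longitude grid, area weighting by the cosine of latitude may be preferable to the plain mean used here.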


We have attempted to predict EASRI using SSTA and VATA as predictors from the months of September, April, and May in separate experiments using an ensemble of CNN43. Here we describe in detail the structure of the CNN used. The proposed CNN involves convolutional processes over the global monthly SSTA and VATA to extract useful patterns from them in relation to the EASRI. This process is briefly elaborated by Eq. 1. Several key constituent parameters of the CNN are listed in Supplementary Table 1, and these are optimized using a random search algorithm44 over a range of values for each hyperparameter in the specific domain as listed in Supplementary Table 1. In a random search algorithm, 300 trials with different combinations of different parameters are considered, and CNNs are trained and evaluated for each of these 300 trials. The selection of 300 different combinations of CNN was considered in relation to the number of hyperparameters of the CNN (i.e., 10, see Supplementary Table 1) and an arbitrary ratio of 30, which is sufficiently large for such analysis44. For validation, we retained the top ten CNNs out of 300 based on high ACC criteria to estimate the ensemble mean skill. The number of ensemble members is equal to the number of hyperparameters, as the performance of the CNN is highly sensitive to variations in each of them44. The “Results” section elaborates on the predictive skill of EASRI based on the ensemble mean of the top ten CNN models.

$${{\rm{EASRI}}}_{{\rm{t}}}=\mathop{\sum }\limits_{l=1}^{L}\left({avgP}\left(\mathop{\sum }\limits_{f=1}^{F}\sum _{{{INP}}_{t-{ld}}}\sigma \left(\mathop{\sum }\limits_{i=1}^{{fw}\times {fh}}\left({W}_{{ifl}}{R}_{{il}}\right)+{b}_{{fl}}\right)\right)\right)$$


$${{INP}}_{t-{ld}} - \text{global SSTA, VATA map of size } ({lat}\times {lon}) \text{ for the first convolutional layer;}$$
$$\text{feature maps of size } \left(\left({lat}-{fh}+1\right)/2,\,\left({lon}-{fw}+1\right)/2\right) \text{ for subsequent convolutional layers}$$
$$F - \text{size of the convolutional filter, height } ({fh}) \times \text{width } ({fw})$$
$${W}_{{ifl}} - \text{weight matrix of size } F \text{, shared over various regions of } {INP}$$
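The random search and ensemble selection described above can be outlined as follows. This is a hedged sketch: the hyperparameter names and ranges are hypothetical stand-ins for Supplementary Table 1, and `toy_score` replaces the actual step of training a CNN and returning its ACC.

```python
import random

# 10 hyperparameters x ratio of 30 = 300 trials, as stated in the text.
N_TRIALS = 10 * 30
N_MEMBERS = 10

# Hypothetical search space (the real one is in Supplementary Table 1).
SEARCH_SPACE = {
    "n_filters": [8, 16, 32],
    "filter_size": [3, 5, 7],
    "dropout": [0.1, 0.3, 0.5],
    "learning_rate": [1e-4, 1e-3, 1e-2],
}

def sample_config(rng):
    """Draw one random combination of hyperparameters."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}

def random_search(train_and_score, n_trials=N_TRIALS, keep=N_MEMBERS, seed=0):
    """Score n_trials random configurations and keep the best `keep` of them."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        config = sample_config(rng)
        trials.append((train_and_score(config), config))
    trials.sort(key=lambda trial: trial[0], reverse=True)  # highest score first
    return trials[:keep]

def toy_score(config):
    """Placeholder for 'train a CNN with this config and return its ACC'."""
    return 1.0 - config["dropout"] - 0.1 * abs(config["filter_size"] - 5)

top_ten = random_search(toy_score)
```

The retained configurations would then each be trained and their predictions averaged to form the ensemble mean.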

Training procedure of CNN

The training input attributes of the CNN, namely monthly SSTA and VATA, were derived from the Centennial in situ Observation-Based Estimates ver. 2 (COBEv2)45 (sea surface temperature) and Simple Ocean Data Assimilation (SODA)46 (subsurface temperature) datasets. These training attributes cover the period 1871–1980 with a spatial resolution of 5° × 5°, regridded from the original grid size by bi-linear interpolation to reduce the number of CNN parameters. In addition, the data were preprocessed by standardization followed by normalization (range −1 to +1) at each grid point. The target EASRI for the CNN is calculated using GPCC datasets. The prediction of EASRI is performed using monthly SSTA and VATA that precede the central month of the short rains season (i.e., November). Different initializations are considered, using April, May, and September monthly SSTA and VATA, where the interval between the initialization month and the central month of the seasonal EASRI is termed the lead. The April, May, and September initializations thus have leads of 7, 6, and 2 months, respectively.
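The lead-time convention and the per-grid-point preprocessing described above can be illustrated with a short sketch. It assumes that standardization means removing the mean and dividing by the population standard deviation, and that normalization means min-max scaling to the range −1 to +1; the text does not specify either choice, and the function names are illustrative.

```python
import math

CENTRAL_MONTH = 11  # November, the central month of the OND season

def lead_months(init_month):
    """Months between the initialization month and the central OND month."""
    return CENTRAL_MONTH - init_month

def standardize(series):
    """Remove the mean and divide by the (population) standard deviation."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    return [(x - mean) / std for x in series]

def scale_to_unit_range(series):
    """Min-max scale a series to the range [-1, +1]."""
    low, high = min(series), max(series)
    return [2.0 * (x - low) / (high - low) - 1.0 for x in series]

# Leads for the three initializations used in the study.
leads = {month: lead_months(month) for month in (4, 5, 9)}

# Toy time series at a single grid point (hypothetical values).
toy_series = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_processed = scale_to_unit_range(standardize(toy_series))
```

In practice the two preprocessing functions would be applied independently to the time series at every grid point of the 5° × 5° maps.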

We validate our proposed CNN model using SSTA obtained from OISSTv2, VATA obtained from the Global Ocean Data Assimilation System (GODAS)47 and the target EASRI index estimated based on GPCP rainfall anomalies (see Supplementary Fig. 4). These datasets are different from those used in the CNN training process, as similar sources may produce biased and non-robust predictions48.

A threefold cross-validation procedure was used to train the CNN: the training data (1871–1980) were divided into three parts, and the CNN was trained on each part to ensure robust learning over data periods (see Supplementary Fig. 4). The hyperparameters of the CNN (see Supplementary Table 1) are optimized over training and cross-validation data sets using a mean square error-based loss function between observed and predicted EASRI. Model validation was performed over the period from 1983 to 2021. The training of the CNN was performed in an open-source Python environment based on Keras49 as the front-end API and TensorFlow50 as the back end, using the Earth Simulator at the Japan Agency for Marine-Earth Science and Technology (JAMSTEC).
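One way to set up a threefold split of the 1871–1980 training years is sketched below. It assumes contiguous blocks of years with one block held out per fold; the actual partition used in the study is shown in Supplementary Fig. 4 and may differ.

```python
def threefold_splits(years, k=3):
    """Split years into k contiguous blocks; yield (train, held_out) per fold."""
    base, extra = divmod(len(years), k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # spread the remainder over early folds
        folds.append(years[start:start + size])
        start += size
    splits = []
    for i in range(k):
        train = [y for j, fold in enumerate(folds) if j != i for y in fold]
        splits.append((train, folds[i]))
    return splits

training_years = list(range(1871, 1981))  # 110 years, 1871-1980 inclusive
splits = threefold_splits(training_years)
fold_sizes = [len(held_out) for _train, held_out in splits]
```

Contiguous blocks keep neighbouring years together, which limits leakage from year-to-year persistence between the training and held-out parts.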

Measures against overfitting

To overcome the overfitting problem in the CNN, several other layers are added in addition to the convolutional layers before the pooling layers. These are the drop-out layer51, the batch normalization layer52, and the l2 regularization layer43. The respective role of each layer is to filter out unnecessary parts of the predictors, to normalize the output after each convolution process to the limits of the transfer function, and to penalize the large trained weights. Apart from these measures, the training and cross-validation losses are also monitored, and the trials with validation losses lower than the training losses are avoided when choosing an ensemble member.

CNN heatmaps

The gradients of the trained CNN models from the first convolutional layer are extracted as heat maps to assess the importance of a specific region in the global ocean in controlling the variability of EASRI. The larger the gradients from a particular region, the more control it has over variability53. These gradient heatmaps differ from those used in some recent past studies48,54 where activation values were multiplied by gradients to produce heatmaps; however, such heatmaps are prone to contamination by large predictor values and thus misrepresent the importance of specific regions53. Equation 3 details the gradient-based heatmap extraction from the first convolution layer of trained CNN models. The heatmaps shown in Figs. 3 and 4 are extracted from the first convolutional layer for the best ensemble member (i.e., the first with the highest ACC) among the top ten ensemble members.

$${{\rm{O}}}_{{\rm{m}}}\,=\,\mathop{\sum }\nolimits_{f=1}^{F}\sum _{{{INP}}_{t-{ld}}}\sigma \left(\mathop{\sum }\nolimits_{i=1}^{{fw}\,\times \,{fh}}\left({W}_{{ifl}}{R}_{{il}}\,\right)\,+{b}_{{fl}}\right)$$
$$\frac{\partial L}{\partial {X}_{i}}=\mathop{\sum }\nolimits_{m=1}^{M}\frac{\partial L}{\partial {O}_{m}}\times \frac{\partial {O}_{m}}{\partial {X}_{i}}$$


$$M-{number\; of\; convolutional\; filters}$$
$$\frac{\partial L}{\partial {X}_{i}}-{gradients}\,{from}\,{convolutional}\,{layer}\,({heatmaps})$$
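The chain rule above can be illustrated with a simplified, single-layer analogue in plain Python. The sketch assumes a tanh activation for σ, treats each O_m as the sum of one filter's activations over all valid positions, and takes the upstream gradients ∂L/∂O_m as given; in a trained network they would come from backpropagation through the later layers. All names and values are illustrative.

```python
import math

def conv_outputs(x, filters, biases):
    """O_m for each filter: sum over valid positions of tanh(response + bias)."""
    height, width = len(x), len(x[0])
    outputs = []
    for w, b in zip(filters, biases):
        fh, fw = len(w), len(w[0])
        total = 0.0
        for r in range(height - fh + 1):
            for c in range(width - fw + 1):
                z = b + sum(w[i][j] * x[r + i][c + j]
                            for i in range(fh) for j in range(fw))
                total += math.tanh(z)
        outputs.append(total)
    return outputs

def input_gradient(x, filters, biases, dL_dO):
    """dL/dX = sum over m of dL/dO_m * dO_m/dX, accumulated per input cell."""
    height, width = len(x), len(x[0])
    grad = [[0.0] * width for _ in range(height)]
    for w, b, upstream in zip(filters, biases, dL_dO):
        fh, fw = len(w), len(w[0])
        for r in range(height - fh + 1):
            for c in range(width - fw + 1):
                z = b + sum(w[i][j] * x[r + i][c + j]
                            for i in range(fh) for j in range(fw))
                dz = upstream * (1.0 - math.tanh(z) ** 2)  # derivative of tanh
                for i in range(fh):
                    for j in range(fw):
                        grad[r + i][c + j] += dz * w[i][j]
    return grad

# Toy 4 x 5 input map and two 2 x 2 filters (hypothetical values).
x = [[0.1 * (r - c) for c in range(5)] for r in range(4)]
filters = [[[0.5, -0.2], [0.3, 0.1]], [[-0.4, 0.6], [0.2, -0.1]]]
biases = [0.05, -0.02]
dL_dO = [1.0, -0.5]  # assumed upstream gradients
heatmap = input_gradient(x, filters, biases, dL_dO)
```

The resulting `heatmap` has the same shape as the input map, so each value can be read as the sensitivity of the loss to the anomaly at that grid cell.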