Anthropogenic influence on extreme precipitation over global land areas seen in multiple observational datasets

The intensification of extreme precipitation under anthropogenic forcing is robustly projected by global climate models, but highly challenging to detect in the observational record. Large internal variability distorts this anthropogenic signal. Models produce diverse magnitudes of precipitation response to anthropogenic forcing, largely due to differing schemes for parameterizing subgrid-scale processes. Meanwhile, multiple global observational datasets of daily precipitation exist, developed using varying techniques and data that are inhomogeneously sampled in space and time. Previous attempts to detect human influence on extreme precipitation have not incorporated model uncertainty, and have been limited to specific regions and observational datasets. Using machine learning methods that can account for these uncertainties and are capable of identifying the time evolution of the spatial patterns, we find a physically interpretable anthropogenic signal that is detectable in all global observational datasets. Machine learning efficiently generates multiple lines of evidence supporting detection of an anthropogenic signal in global extreme precipitation.

Supplementary Text:

More on Layer-wise Relevance Propagation
The αβ-rule with α = 1 and β = 0 (LRPα1β0) only considers the information which positively contributes to the final decision. For regression tasks such as the problem at hand, inputs which contribute to a decrease in the ANN output (i.e. an earlier predicted year; negative relevance) are just as important as inputs which contribute to an increase (i.e. a later predicted year; positive relevance) for understanding what the ANN has learned. Moreover, when α > 1, the αβ-rule might not conserve the relevance from the output value back to the input layer. For these reasons, ref. 1 pointed out that caution should be exercised when applying the αβ-rule with 1) α = 1 for regression and 2) α > 1 in general. This is mainly because the interpretation of relevance heatmaps can be more subjective in these cases. We find that for our simple ANN, applying LRPα2β1 results in a 1:1 relationship between the resultant relevance heatmaps and the ANN output for each input (Supplementary Fig. 4d). This allows the visualization of inputs that contribute to a decrease in the ANN output while maintaining a direct relationship between the ANN predicted value and the LRP heatmaps. Therefore, we proceed with rescaled relevance heatmaps derived from LRPα2β1 for interpreting our ANN. We also found qualitatively similar relevance heatmaps with the basic relevance propagation rule LRPz, which does not treat negative and positive pre-activations separately. More details on LRP can be found in previous work (refs. 1-4). For a toy example of LRP, we refer to ref. 5.
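To make the difference between LRPα1β0 and LRPα2β1 concrete, the following is a minimal sketch of the αβ-rule for a single dense layer. It is an illustration under stated assumptions, not the implementation used in this study: the function name `lrp_alpha_beta`, the omission of biases, and the toy activations, weights and output relevances are all hypothetical.

```python
import numpy as np

def lrp_alpha_beta(a, W, R_out, alpha, beta):
    """Propagate relevance back through one dense layer (biases omitted).

    a:     input activations, shape (n_in,)
    W:     weights, shape (n_in, n_out)
    R_out: relevance of each output neuron, shape (n_out,)
    """
    z = a[:, None] * W                  # contribution of input i to output j
    z_pos = np.maximum(z, 0.0)          # positive contributions only
    z_neg = np.minimum(z, 0.0)          # negative contributions only
    s_pos = z_pos.sum(axis=0)
    s_neg = z_neg.sum(axis=0)
    # Avoid division by zero when an output has no positive (or negative) part.
    frac_pos = z_pos / np.where(s_pos == 0, 1.0, s_pos)
    frac_neg = z_neg / np.where(s_neg == 0, 1.0, s_neg)
    return ((alpha * frac_pos - beta * frac_neg) * R_out).sum(axis=1)

# Toy values (assumed for illustration only).
a = np.array([1.0, 2.0, 0.5])
W = np.array([[0.5, -1.0],
              [-0.25, 0.5],
              [1.0, -2.0]])
R_out = np.array([1.0, 3.0])

R_a1b0 = lrp_alpha_beta(a, W, R_out, alpha=1.0, beta=0.0)  # positive evidence only
R_a2b1 = lrp_alpha_beta(a, W, R_out, alpha=2.0, beta=1.0)  # signed relevance
```

In this toy case every output neuron receives both positive and negative contributions, so both settings redistribute the full output relevance, but only LRPα2β1 assigns negative relevance to inputs that pull the output down. In this sketch, an output neuron with no negative contributions would pass back α times its relevance, which is one way conservation can fail when α > 1.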

Role of model uncertainty in detecting the anthropogenic influence
To assess the influence of model uncertainty in detecting the signal, we repeated the analysis, this time including the widely used, highly quality-controlled HadEX3 dataset 6, which along with its predecessors has been used in traditional detection and attribution of extreme precipitation 7,8. HadEX3 and its predecessors are considered more reliable than the other observational estimates used in this study, but do not provide full global coverage. Therefore, the analyses were done for all GCMs and observations only over the regions with continuous data coverage in HadEX3 for the period 1979-2018 (Supplementary Fig. 5). Two separate analyses were conducted. The first analysis was similar to the main analysis (Figures 1-4), using multimodel simulations to train the ANN (Supplementary Fig. 6) so that model uncertainty is included. The second was done to assess the role of model uncertainty. The ideal ANN input dataset in this case would be a large ensemble of realizations whose ensemble mean evolves in time like the multimodel mean of the CMIP models used in the first analysis.
The difference between realizations in this case represents natural variability, as opposed to the first case, in which it includes model uncertainty as well. We found the CESM large ensemble simulations 9 suitable for this task. We used 40 initial-condition perturbed ensemble members from the dataset for the period 1920-2099. The simulations follow forcing similar to that of the CMIP5 models described in Methods. To follow the same ANN training process as in the first analysis, we used 26 ensemble members for training, 9 ensemble members for validation and the remaining 5 for testing. Thereafter, the analysis is identical to the main analysis (Supplementary Fig. 7).
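The 26/9/5 partition of the 40 ensemble members can be sketched as follows. The text does not state how members were assigned to each subset, so the random permutation, the seed, and the variable names here are assumptions made purely for illustration.

```python
import numpy as np

n_members = 40                       # CESM large ensemble members used
rng = np.random.default_rng(0)       # arbitrary seed (assumption) for reproducibility
members = rng.permutation(n_members) # hypothetical member indices 0..39, shuffled

train_ids = members[:26]             # 26 members for training
val_ids = members[26:35]             # 9 members for validation
test_ids = members[35:]              # remaining 5 members for testing
```

Splitting by ensemble member rather than by individual year keeps every year of a given realization in the same subset, so the test members are realizations the ANN has never seen.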
When model uncertainty is included, the anthropogenic influence is not detected in the observations and reanalyses for the selected domain (Supplementary Fig. 6), whereas when model uncertainty is not included, the signal is detected in 9 out of 12 datasets (Supplementary Fig. 7). This suggests that the power to detect the anthropogenic influence decreases when model uncertainty is considered.
The similar behavior of reanalyses and observations in these results, and their shared difference from the testing data (Supplementary Figs. 6, 7), also justify the use of reanalyses as alternative observations in assessing the anthropogenic influence, as argued in previous studies (e.g. ref. 10).

Sources of the spread in the signal of observations
For the observational datasets used in the main text, the absolute value of the predicted year shows a wide range of values, with an overall underestimation compared to GCMs (Supplementary Fig. 3). A composite difference of the relevance and Rx1day between observations and testing models reveals that different regions contribute to this result (Supplementary Fig. 8). In observations, a lower relevance compared to GCMs can be seen over Asia and North America (Supplementary Fig. 8a,c,e,g). These patterns correspond to an underestimation of Rx1day in the historical observational record compared to GCMs (Supplementary Fig. 8b,d,f,h). Among the observations, the predicted year for MSWEP is the highest, which is due to its higher Rx1day over Greenland, Alaska and East Russia.
To investigate the differences in the anthropogenic signal in the observations and reanalyses, we first calculated the linear trend of Rx1day for each grid cell and weighted it by the normalized relevance for grid cells with a positive relevance (Supplementary Fig. 9). A simple explanation for this difference is that the predicted year increases when more grid cells with a positive relevance show an increase in Rx1day. Confirming this, datasets with a smaller anthropogenic signal (e.g. ERA5 and CFSR, as shown in Figure 4) have a smaller number of grid cells with an increasing relevance-weighted trend in Rx1day compared to the rest of the datasets (Supplementary Fig. 9).
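The relevance-weighted trend described above can be sketched as follows. This is an illustration under stated assumptions rather than the code used in the study: the function name `relevance_weighted_trend` is hypothetical, the relevance is normalized by its maximum positive value (the normalization is not specified in the text), the input is assumed to contain no missing values, and the data are synthetic.

```python
import numpy as np

def relevance_weighted_trend(rx1day, relevance, years):
    """Linear Rx1day trend per grid cell, weighted by normalized positive relevance.

    rx1day:    array (n_years, n_lat, n_lon), assumed free of missing values
    relevance: array (n_lat, n_lon), e.g. a time-mean LRP heatmap
    years:     array (n_years,)
    Cells with non-positive relevance are returned as NaN.
    """
    n_years = rx1day.shape[0]
    flat = rx1day.reshape(n_years, -1)
    # np.polyfit accepts a 2-D y and fits each column; row 0 holds the slopes.
    slope = np.polyfit(years, flat, deg=1)[0].reshape(relevance.shape)
    pos = relevance > 0
    if not pos.any():
        return np.full(relevance.shape, np.nan)
    # Assumed normalization: divide by the largest positive relevance.
    weights = np.where(pos, relevance / relevance[pos].max(), np.nan)
    return slope * weights

# Synthetic 2x2 example (values assumed for illustration only).
years = np.arange(1979, 2019)
true_slope = np.array([[0.1, -0.2], [0.4, 0.3]])
rx1day = 20.0 + true_slope[None, :, :] * (years - years[0])[:, None, None]
relevance = np.array([[1.0, 0.5], [-0.3, 2.0]])

weighted = relevance_weighted_trend(rx1day, relevance, years)
# Number of positive-relevance cells with an increasing weighted trend.
n_increasing = int(np.count_nonzero(np.nan_to_num(weighted) > 0))
```

Counting `n_increasing` for each dataset gives a single number that can be compared across observations and reanalyses, in the spirit of the comparison in Supplementary Fig. 9.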