Improving prediction and assessment of global wildfires using neural networks

Fires determine vegetation patterns, impact human societies, and provide complex feedbacks into the global climate system. Empirical and process-based models differ in their scale and mechanistic assumptions, giving divergent predictions of fire drivers and extent. In particular, the role of anthropogenic drivers remains poorly understood. Taking a data-driven approach, we use an artificial neural network to learn region-specific relationships between fire and its socio-environmental drivers across the globe. Our models achieve higher predictability than previously reported, with correlations between predicted and observed burned area of 0.92 (global spatial), 0.76 (temporal), 0.69 (interannual), and 0.6 (grid-cell level). Our analysis reveals universal global patterns in fire-climate interactions, coupled with strong regional differences in fire-human relationships. Given current socio-anthropogenic conditions, Equatorial Asia, southern Africa, and Australia show a strong positive sensitivity of fire extent to temperature, whereas northern Africa shows a strong negative sensitivity. Overall, forests and shrublands show a stronger sensitivity of burned area to temperature than savannas, potentially weakening their status as carbon sinks under future climate-change scenarios.


Measuring model performance
To rank models by performance, we calculate five performance metrics for each model: temporal and interannual correlations between spatially aggregated monthly and yearly timeseries of burned area ($r_T$ and $r_{IA}$), correlation between predicted and observed yearly anomalies ($r_{An}$), spatial correlation between mean yearly burned area ($r_S$), and fractional deviation of predicted total yearly burned area from that observed ($r_{BA} = 1 - |1 - BA_{predicted}/BA_{observed}|$). We then combine these metrics with weights ($w_i$) into an aggregate performance score

$$P = 100\,\frac{\sum_i w_i r_i^2}{\sum_i w_i},$$

which ranges between 0 and 100, with higher values indicating better performance. These metrics are not used in NN training, but only to rank trained models (i.e., the models are optimized for cell-level, not aggregate, performance). We aim to identify models that have good interannual predictability. However, since the spatial extent of the data is much greater than its temporal extent, if all weights were equal, models that perform well spatially would receive a higher score even if they delivered poor interannual predictability. Therefore, to favour models with better interannual predictability, we use $w_{IA} = 4$ and all other weights $w_i = 1$. We report the correlation between predicted and observed BA in individual grid cells ($r_I$, SI Fig. 1), but do not account for it in evaluating model performance. This is because we found $r_I$ […] more variables, trying out different combinations of drivers and measuring the model performance $P$, until we arrive at the best-performing model. We then drop further variables to arrive at a 'minimal model', i.e., a model that uses the fewest variables without a substantial performance loss compared to the best model (we use the criterion $P_{best} - P_{minimal} \le 3$). For global analysis, we mosaic predictions from the minimal regional models for each timestep.
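The ranking step can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the (months × cells) array layout, the function names, and the definition of yearly anomalies as per-cell deviations from each cell's multi-year mean are all hypothetical. The score is computed as the weighted mean $P = 100\sum_i w_i r_i^2/\sum_i w_i$ (our reading of the formula in the text), with $w_{IA} = 4$ and all other weights 1.

```python
import numpy as np

# Weights from the text: w_IA = 4, all other weights 1.
WEIGHTS = {"T": 1.0, "IA": 4.0, "An": 1.0, "S": 1.0, "BA": 1.0}


def _corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


def performance_metrics(pred, obs):
    """Return r_T, r_IA, r_An, r_S and r_BA for one model.

    pred, obs: burned area with shape (n_months, n_cells); n_months is
    assumed to cover whole years (hypothetical layout).
    """
    n_years = pred.shape[0] // 12
    # Temporal: spatially aggregated monthly timeseries.
    r_t = _corr(pred.sum(axis=1), obs.sum(axis=1))
    # Per-cell yearly burned area, shape (n_years, n_cells).
    yp = pred.reshape(n_years, 12, -1).sum(axis=1)
    yo = obs.reshape(n_years, 12, -1).sum(axis=1)
    # Interannual: spatially aggregated yearly timeseries.
    r_ia = _corr(yp.sum(axis=1), yo.sum(axis=1))
    # Anomalies (assumed definition): each cell's deviation from its own
    # multi-year mean, pooled over all cells and years.
    r_an = _corr((yp - yp.mean(axis=0)).ravel(), (yo - yo.mean(axis=0)).ravel())
    # Spatial: mean yearly burned area per cell.
    r_s = _corr(yp.mean(axis=0), yo.mean(axis=0))
    # Fractional deviation of the mean total yearly burned area.
    r_ba = 1.0 - abs(1.0 - yp.sum(axis=1).mean() / yo.sum(axis=1).mean())
    return {"T": r_t, "IA": r_ia, "An": r_an, "S": r_s, "BA": r_ba}


def aggregate_score(metrics, weights=WEIGHTS):
    """P = 100 * sum(w_i * r_i**2) / sum(w_i)."""
    weighted = sum(weights[k] * metrics[k] ** 2 for k in weights)
    return 100.0 * weighted / sum(weights.values())


def pick_minimal(candidates, tol=3.0):
    """Fewest-variable model with P_best - P <= tol.

    candidates: list of (variable_names, P) pairs that have already been
    trained and scored (hypothetical helper; ties go to the higher P).
    """
    p_best = max(p for _, p in candidates)
    eligible = [c for c in candidates if p_best - c[1] <= tol]
    return min(eligible, key=lambda c: (len(c[0]), -c[1]))
```

As a sanity check, a prediction that is exactly twice the observation gives all correlations equal to 1 but $r_{BA} = 0$, so $P = 100 \cdot 7/8 = 87.5$. The individual grid-cell correlation $r_I$, which is reported but not scored, would be `np.corrcoef(pred.ravel(), obs.ravel())[0, 1]` in this layout.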
3 Results

3.1 Regional differences in the observed fire niche

To lay the groundwork for visualizing our model results and to justify our modelling framework, we first plot the fire niche from observed (GFED4.1s) data. The fire niche can be thought of as an n-dimensional function, where each dimension is a socio-environmental variable and the function value corresponds to burned area. The niche can be further understood by looking at the density distribution of the drivers in the same n-dimensional space, where the density corresponds to the number of times a given driver combination occurs. We show how seemingly similar biomes can have very different fire regimes due to differences in human activity.

First, we find that weather imposes universal constraints on fires: fires are limited when temperatures are very low or very high, occurring largely at temperatures between 15 and 30 °C (Fig. 1). Similarly, high (instantaneous) precipitation suppresses fires, with most fires occurring when precipitation is below 5 mm/month (Fig. 1). […] However, strong regional differences can be observed along anthropogenic dimensions, where similar environmental conditions lead to very different fire regimes. For example, the responses of burned area to population density differ strongly between regions: burned area declines sharply with population density in Australia and South America, declines gradually and persists until much higher population densities in Africa, and even increases with population density in Boreal and Equatorial Asia. The most surprising difference is between northern and southern Africa. Despite similar environmental conditions, similar biomes, and phylogenetically similar vegetation, the fire niches in the two regions are substantially different: in northern Africa, we find a high fire extent in areas with low GPP (marked with a white triangle in Fig. 2), but not in southern Africa; conversely, we find high burned area in southern Africa in regions with high population density and high GPP, but not in northern Africa (marked with a white ellipse in Fig. 2). This makes a strong case for treating different subcontinents separately in fire modelling.

We now compare the predictions of the regionally trained neural-network models with the observed data. At the global scale, predictions from our model (mosaicked regional models) closely match observations (Fig. 3; see SI Table 1 for the complete set of models and their regional performance statistics). Our model accurately captures the spatial, seasonal, and interannual variability in burned area, with the following correlations between predicted and observed data: spatial correlation using temporally averaged burned area, $r_S = 0.92$; temporal correlation using global monthly burned area, $r_T = 0.76$; interannual correlation using global yearly burned area, $r_{IA} = 0.69$; and individual correlation (between the burned area of individual grid cells across time and space), $r_I = 0.6$ (SI Fig. 1). We also evaluated the performance of our model across vegetation types, for which $r_I$ ranges from 0.31 to 0.78. The model performs best in savannas and broadleaved evergreen forests, and worst in closed shrublands and needleleaved forests (SI Fig. 1) […]

Here the precipitation axis is log-transformed with the function y = log(1 + x). Fires are largely confined to temperatures between 15 and 30 °C and precipitation below 5 mm/month (about 1.5 units on the log-transformed scale shown here). Notable outliers can be seen in southern-hemisphere Africa, where fires are observed at lower temperatures and higher precipitation.
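The two panels of Fig. 2 (how often each GPP-population density pair occurs, and the mean burned area observed for each pair) amount to a two-dimensional binning of the observations. The NumPy sketch below illustrates that idea under assumptions: the function name, the toy values and the bin edges are hypothetical, and this is not the authors' plotting code.

```python
import numpy as np


def fire_niche_2d(x, y, burned_area, x_edges, y_edges):
    """Bin observations on a 2-D driver grid.

    Returns (counts, mean_ba): how often each driver pair occurs and the
    mean burned area observed for it (NaN where a bin is empty).
    x, y, burned_area: 1-D arrays of equal length, one entry per
    grid cell and month.
    """
    counts, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges])
    ba_sum, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges], weights=burned_area)
    # Avoid divide-by-zero warnings in empty bins; they are set to NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        mean_ba = np.where(counts > 0, ba_sum / counts, np.nan)
    return counts, mean_ba


# Population density is log-transformed as y = log(1 + x), as in Fig. 2.
gpp = np.array([0.5, 0.5, 1.5])
pop = np.array([0.0, 0.0, 10.0])
ba = np.array([2.0, 4.0, 1.0])
counts, mean_ba = fire_niche_2d(gpp, np.log1p(pop), ba, [0.0, 1.0, 2.0], [0.0, 1.5, 3.0])
```

Summing burned area with `weights=` and dividing by the unweighted counts gives the per-bin mean in two passes, which scales to millions of cell-month records without explicit loops.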

Figure 2. Regional differences in fire regimes can be seen along the GPP-population density axis. The frequency of occurrence of different GPP-population density driver pairs (A), and the mean burned area observed for each pair (B). Population density is log-transformed with the function y = log(1 + x). In general, fires occur at intermediate values of GPP and decrease with population density. However, the responses of burned area to population density are starkly different in different regions: in South America, burned areas are already low at low population densities (likely due to the lower temperatures experienced there), and decrease sharply to almost zero once population density exceeds about 3 persons/km². By contrast, in Africa, fires persist until very high population densities and decline only gradually with increasing population density. In Australia, burned areas are high at near-zero population densities, but decline sharply even at small population densities.

[…] Regression lines indicate the long-term trend in burned area, with the trends in observed ($t_o$) and predicted ($t_p$) burned areas given above each panel. Interannual variability in burned area is well captured by our model, especially in Equatorial Asia, Australia, Southeast Asia, and South America. The long-term decline is largest in northern Africa, where our model predicts 36% of the observed decline. Fires in southern Africa drop sharply after 2013.

For each region, we obtain the sufficient regional predictors of fire from the inputs of the regional minimal models. Regional […]

Table 1. Regional predictors of fire. Performance of the best and minimal models for each region with respect to each of the five performance measures described in Methods, along with the aggregate performance score. In some regions, the best model is the same as the minimal model. The input variables of each model are also listed.
BA is burned area, and LT is the long-term trend in the spatially aggregated yearly timeseries. Variables are as follows: gppl1, cumulative GPP; gppm1, growing-season GPP (northern hemisphere); gppm1s, growing-season GPP (southern hemisphere); pr, precipitation; ts, temperature; cld, cloud cover; vp, vapour pressure; rdtot, total road network density; pop, population density. All models include vegetation type fractions, including cropland fraction. The model for NHAF uses yearly vegetation fractions, whereas the rest of the models use a single snapshot.
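The long-term trends $t_o$ and $t_p$ are slopes of regression lines through yearly burned area. A minimal NumPy sketch follows, assuming ordinary least-squares lines and reading "our model predicts 36% of the observed decline" as the ratio $t_p/t_o$; both readings, the function names and the toy series are assumptions rather than the authors' procedure.

```python
import numpy as np


def linear_trend(years, values):
    """Ordinary least-squares slope of a yearly timeseries (units per year)."""
    slope, _intercept = np.polyfit(years, values, deg=1)
    return float(slope)


def fraction_of_trend_captured(years, observed, predicted):
    """Ratio t_p / t_o of the predicted to the observed long-term trend."""
    return linear_trend(years, predicted) / linear_trend(years, observed)


years = np.arange(2003, 2016)
observed = 100.0 - 2.0 * (years - 2003)   # toy declining series
predicted = 60.0 - 0.72 * (years - 2003)  # toy prediction with a weaker decline
ratio = fraction_of_trend_captured(years, observed, predicted)
```

Because both slopes carry the same sign for a decline, the ratio is positive and reads directly as the fraction of the observed trend that the model reproduces.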

[…] in the subcontinental region (e.g., fuel load is always high in Equatorial Asia, and population density is always very low in […] cover is sufficient to predict fire because both are correlated, but temperature drops out of the minimal model for Australia […], implying that cropland fraction is neither a consistent driver nor a consistent deterrent of fires in these regions.

Road network density and lightning climatology showed no substantial explanatory power within regions, and dropped out of all regional minimal models (for the effects of lightning, compare version 8 models in SI Table 1). However, data on both these variables were not available for multiple years. Therefore, we do not rule out their effect on fires based on this study. Specifically, including monthly lightning data may improve predictions in Boreal regions, where fires are known to be frequently ignited by lightning.

Droughts associated with El Niño events have been shown to strongly influence fires across the tropics, especially South […] Our model also predicted high burned areas in Australia during these years (Fig. 4J).

Previous studies have attributed the long-term decline in fire in northern Africa to cropland expansion 42. However, we find that this decline is instead explained most strongly (39%) by increasing population density. We found no performance drop after excluding cropland fraction from the model (compare version 6 models in SI Table 1), implying a low predictive value of […]

Globally, forest-dominated areas show the highest sensitivity of burned area to temperature, 10.46% to 18.75% per °C (Table 2). […] density associated with an increase in aridity (vegetation type is held constant). Southeastern Australia and the eastern Himalayan regions have relatively few fires, but are highly sensitive to temperature changes in terms of percent change in burned area (SI Fig. 6). SI Fig. 3 […]
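The sensitivities above are percent changes in burned area per °C of warming with other drivers, including vegetation type, held fixed. One way to obtain such numbers from a trained network is a one-at-a-time perturbation, sketched below. The callable `predict`, the feature-matrix layout, the +1 °C step and the aggregation by summing over cells are assumptions for illustration, not the authors' stated procedure.

```python
import numpy as np


def temperature_sensitivity(predict, drivers, temp_col, delta=1.0):
    """Percent change in total predicted burned area per degree of warming.

    predict: any callable mapping an (n_samples, n_features) array to
    burned area per sample (hypothetical interface).
    drivers: float feature matrix; only column `temp_col` is perturbed,
    all other drivers (e.g. vegetation fractions) are held constant.
    """
    base = np.asarray(predict(drivers), dtype=float).sum()
    warmed = drivers.copy()  # leave the caller's array untouched
    warmed[:, temp_col] += delta
    perturbed = np.asarray(predict(warmed), dtype=float).sum()
    return 100.0 * (perturbed - base) / base / delta


# Toy model in which burned area grows as exp(0.1 * temperature).
drivers = np.array([[20.0, 0.3], [25.0, 0.5], [30.0, 0.1]])
s = temperature_sensitivity(lambda X: np.exp(0.1 * X[:, 0]), drivers, temp_col=0)
```

For this toy model the sensitivity is $100(e^{0.1} - 1) \approx 10.5\%$ per °C regardless of the starting temperatures; a real network would instead give values that depend on the co-varying drivers held fixed.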

Our machine-learning model rivals more complex process-based models, delivering higher accuracy with fewer input variables. We found distinct differences in fire drivers across regions, but within each region, between one and five drivers are sufficient to accurately predict burned area. Whereas climatic constraints on fires were universal, differences in anthropogenic niches may drive regional differences in fire activity. We predicted differential effects of increasing temperature in different regions, with forests being disproportionately sensitive to temperature changes compared to savannas, although we have not accounted for changes in co-varying drivers in this analysis. Our work suggests that the predictive accuracy of fire models can be improved by better parameterizing models with fewer drivers, rather than by expanding already complex models with more processes and parameters.

Modelling approaches based on machine learning often face the criticism that they do not provide any understanding of the underlying mechanisms and processes. However, as we demonstrate in this work, it is now possible (due to advances in […] issues, future work could use data resampling to equalize the spatial and temporal extent of the training data.

Studies differ on the predicted drivers of fire for the same regions. For example, the drivers of fire in northern Africa are predicted to be precipitation, population density, and cropland fraction 42, or population density, temperature, and wet days 31. We find precipitation and population density to be important drivers, but no effect of cropland fraction. In southern Africa, the predicted drivers are fuel and climate 42, or wet days and cropland fraction 31, or tree cover, rainfall, dry season, and grazing 30. We, too, find fuel and climate to be the key drivers. While other studies predict fuel and climate to be the drivers in Equatorial Asia 29, we found that precipitation alone explained the variability in fires in this region. Similarly, Abatzoglou et al. 29 find aridity alone to be a driver in Southeast Asia, whereas we find climate, fuel, and population density to be important drivers. In Boreal areas, we find an important effect of fuel load, not predicted by previous studies. SI Table 4 compares the predicted drivers in all regions.

Researchers agree that under future climate scenarios, some areas of the globe will see increased fire activity while others see a decline 4. The contrast in fire sensitivity to increasing temperature between northern and southern Africa may seem surprising given their similar weather and vegetation. However, global climate models agree that fires in northern Africa may decline by the end of the century, whereas model agreement for southern Africa is low, and on average models predict no […] global models is northern Australia, where we predict an increase in winter-spring fires with temperature, whereas global models predict, and agree on, a decrease. However, this discrepancy could be explained by accounting for precipitation, which is expected to increase in this region but is held constant in our analysis. Furthermore, the small predicted increase in fires in already-arid interior Australia is also surprising, but is consistent with the consensus of global models, and could be an artefact of data limitations, as we argue below. A quantitative analysis of changes in burned area using projected future climatic drivers would provide more accurate projections of fire activity under future climate-change scenarios. Our neural-network model can be directly integrated into vegetation models for such analyses.

We caution readers in interpreting the sensitivity values in arid areas of interior Australia and parts of South America. Although we expect fires to decrease at extremely high temperatures due to declines in vegetation cover, data points showing a reduction in burned area at higher temperatures are limited in Australia and South America (Fig. 1). Therefore, the neural network […]

Figure 1. Observed vs predicted burned area for each forest type. The red line is the 1:1 line, and black/white circles show mean BA in each class. Density of points increases from grey to blue to red to yellow.
Correlation is indicated in the top-left corner. To identify the dominant PFT in each grid cell, we first excluded all grid cells with more than 50% non-vegetated or agricultural area (classified as non-vegetated and croplands, respectively). Among the remaining grid cells, we ranked the vegetation types by abundance. If the most abundant type was at least 10% more abundant than the second most abundant one, we classified the grid cell as dominated by that type; otherwise, we classified it as mixed vegetation.

Table 3. Models used for sensitivity analysis, in cases where the best regional model from Table 1 does not include temperature. These models may have a substantial performance loss (score difference > 5) compared to the best model.
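The dominant-PFT rule used for the per-vegetation-type evaluation (SI Fig. 1) can be sketched as a small classifier. This is a hedged illustration: the dictionary keys `nonveg` and `crop` are hypothetical names, the 50% test is applied to each excluded category separately, and "at least 10% more abundant" is read as an absolute difference of 0.10 in area fraction. The text does not settle either of these last two points.

```python
def dominant_pft(fractions, threshold=0.5, margin=0.10):
    """Classify one grid cell by its dominant vegetation type.

    fractions: dict of class name -> area fraction in [0, 1]. The keys
    "nonveg" and "crop" are hypothetical names for non-vegetated and
    agricultural area; every other key is treated as a vegetation type
    (at least one is assumed present). `margin` is read as an absolute
    difference of 0.10 in area fraction, which is an assumption.
    """
    # Cells with >50% non-vegetated or agricultural area are set aside.
    if fractions.get("nonveg", 0.0) > threshold:
        return "non-vegetated"
    if fractions.get("crop", 0.0) > threshold:
        return "croplands"
    # Rank the remaining vegetation types by abundance, largest first.
    ranked = sorted(
        ((frac, name) for name, frac in fractions.items() if name not in ("nonveg", "crop")),
        reverse=True,
    )
    if len(ranked) == 1 or ranked[0][0] - ranked[1][0] >= margin:
        return ranked[0][1]
    return "mixed vegetation"


print(dominant_pft({"savanna": 0.5, "forest": 0.3, "crop": 0.2}))
```

If the 10% criterion was meant relatively (the top type at least 1.1 times the runner-up), the comparison would become `ranked[0][0] >= 1.1 * ranked[1][0]`; the rest of the rule is unchanged.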