## Introduction

Volcanic ash clouds generated by explosive eruptions are transported distally in the atmosphere over distances ranging from a few hundred to thousands of kilometres from the vent. They are mainly composed of the finest ash fraction that survives proximal sedimentation, referred to as very fine ash (<32 µm) following the physical volcanology-derived terminology of explosive eruptions1. However, in some cases coarser particles can reach distal locations, as recently demonstrated during moderate Icelandic eruptions2. Very fine ash can damage aircraft and therefore has a detrimental impact on air traffic safety, as demonstrated by the air traffic disruption during the Eyjafjallajökull3 and Cordón del Caulle4 eruptions. Distal airborne very fine ash represents only a fraction of the total amount of solid particles (referred to as tephra) injected into the volcanic plume column above the crater. Here we examine this partitioning (ε; given in percentage) as the ratio between the very fine ash flux transported in distal clouds (Qa), estimated from satellite-based infrared measurements, and the total flux of tephra emitted at the source (Qs), inferred from ground studies of tephra deposits. The latter is also referred to as the Mass Eruption Rate (MER). The ratio ε quantifies the volcanic ash removal efficiency in proximal areas, which is critical for constraining ash sedimentation processes during the early stages of cloud dispersal, as well as for predicting the properties of ash clouds as they are advected around the globe.

We compiled a database of 22 eruptions of various magnitudes and intensities, carefully selected from remarkably well documented case studies in the published records (Table 1). They are characterized by distinct eruption styles describing the dynamics and phenomenology of the explosive activity. Eruption styles can be defined from various classifications using different parameters5,6. In our case, we adopted the most recent one6 for the following reasons: (i) it uses the MER (equivalent to Qs) and the volcanic ash plume height (H) as input parameters; (ii) it allows individual eruption phases to be easily requalified; and (iii) it permits a real-time, first-order classification of eruptive events, which is useful for operational applications. Therefore, we distinguish sustained eruptions (9 Small/Moderate, 7 Subplinian and 4 Plinian styles), defined by quasi-steady discharge conditions (i.e., with a duration of tephra emission much longer than the time necessary to reach the neutral buoyancy level), from transient eruptions corresponding to unsteady impulsive explosions (2 Vulcanian style).

The database comprises satellite-based infrared measurements of Qa inferred from the extinction properties of ash using the split-window method7,8, except for the May, 2011 Grimsvötn and October 27, 2002 Etna eruptions, whose measurements were carried out using hyperspectral sounders. This technique provides vertical column densities as a mass per unit area, and allows total mass of very fine ash to be retrieved from integration of ash-bearing pixels over the whole cloud surface. Then, the average value of Qa can be calculated by simply dividing the total mass by the duration of ash emission. Most of the data come from Low-Earth Orbiting (LEO) platforms, hence allowing image acquisition 10 hours on average after the start of the eruption. The large Instantaneous Field Of View (IFOV) of these sensors usually allows the observation of the whole cloud from a single or a few images. The extent of the cloud determines the ash emission duration and the timing of the image acquisition allows us to identify the explosive phase within the eruption chronology. The inversion of thermal infrared measurements is particularly relevant for the characterization of the very fine ash content of volcanic clouds because it allows the retrieval of particles in the size range 1–32 µm (at 2σ for a lognormal distribution2). However, some known uncertainties remain from (i) the split-window technique leading to false detection and missed ash-bearing pixels9 as well as from (ii) the limits of the validity domain within the Mie scattering theory2,8. Overall uncertainty associated with satellite-based measurements was estimated to be in the range ±40–60%2.

The average value of Qs was calculated from the published mass of tephra deposited on the ground, divided by the duration of the explosive phase. Importantly, Qa and Qs refer to the same period of explosive activity and can thus be reliably compared. The temporal concordance required between these two parameters explains the relatively low number of eruptions finally selected. The deposit masses selected in this study (Table 1) were calculated by integrating the mass decay rate or the thinning rate of the fallout deposit10,11. These methods are sensitive to the quality and density of field data, to the mathematical function chosen (e.g., exponential, power-law, Weibull) to represent their spatial variations, and to the distal extrapolation limit. Indeed, individual measurements of tephra thickness or mass are extrapolated to distances greater than the maximum sampling distance, allowing the finest ash fraction to be accounted for and the total mass of tephra to be estimated. Thus, Qs represents the average MER of the total grain size distribution at the source vent and is given with an uncertainty of ±10–40%12,13. Note that uncertainties on each individual eruption, for both satellite and ground deposit retrievals, have not been systematically published or inferred, but we specifically selected eruptions for which the measurement errors should be low (e.g., no clouds in the atmosphere, no erosion of the deposit, sampling performed hours to days following the eruptions, etc.). The related error on ε has been calculated from the average bulk uncertainties of Qa and Qs and is reported for each eruption in Table 1.
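As a minimal illustration of the quantities defined above, the following sketch computes the average fluxes and the partitioning ε = Qa/Qs. The function names and the example masses are hypothetical, and combining the relative uncertainties in quadrature is an assumption made here for illustration only; the text does not specify how the bulk uncertainties of Qa and Qs were combined.

```python
import math


def mean_flux(total_mass_kg, duration_s):
    """Average mass flux (kg/s): total mass divided by the emission duration."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return total_mass_kg / duration_s


def partitioning_percent(qa, qs):
    """Partitioning epsilon (%): distal very fine ash flux over source flux."""
    if qs <= 0:
        raise ValueError("Qs must be positive")
    return 100.0 * qa / qs


def relative_error_quadrature(rel_qa, rel_qs):
    """ASSUMPTION: independent relative errors combined in quadrature."""
    return math.hypot(rel_qa, rel_qs)


# Hypothetical values, chosen only to exercise the functions.
duration = 3600.0                 # s, same explosive phase for both fluxes
qa = mean_flux(3.6e8, duration)   # satellite-derived very fine ash mass (kg)
qs = mean_flux(3.6e10, duration)  # deposit-derived total tephra mass (kg)
eps = partitioning_percent(qa, qs)
rel = relative_error_quadrature(0.50, 0.25)
print(f"Qa = {qa:.3g} kg/s, Qs = {qs:.3g} kg/s, epsilon = {eps:.2f}%")
print(f"relative error on epsilon (quadrature assumption) = {rel:.2f}")
```

Both fluxes use the same duration because Qa and Qs must refer to the same period of explosive activity.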

## Results

### Source-to-atmosphere very fine ash partitioning

We show that ε of sustained eruptions spans a wide range of values, from 0.1% (e.g., Plinian Kelut 2014 eruption) to 6.9% (Small/Moderate Ruapehu 1996 eruption; Table 1). Fine ash removal from Plinian eruptions is thus about two orders of magnitude more efficient than that from Small/Moderate ones. Remarkably, the variation of the partitioning coefficient is not arbitrary. From Fig. 1, ε decreases with increasing MER, with respect to eruption styles. The four Plinian eruptions selected have large MER (1.7 × 10⁷ < Qs < 1.8 × 10⁸ kg/s). They all produced copious amounts of volcanic ash, as for the 1980 Mount St Helens and the 1982 El Chichón eruptions, for which the mass fraction of ash smaller than 63 µm represents 50% of the total mass of tephra emitted14,15. Such high fine ash contents are related to efficient magma fragmentation processes, occurrence of phreato-magmatic episodes, and contribution of ash elutriated from pyroclastic density currents (PDC) forming co-PDC plumes16. However, they all exhibit a very small proportion of distal very fine ash, as shown by the weak partitioning coefficient range (0.1 < ε < 0.9%), and fall in a well delimited area in Fig. 1. To explain this observation, we suggest that early enhanced fallout in proximal regions makes the actual proportion of very fine ash transported in distal clouds much lower than expected. This highlights the critical role played by collective settling mechanisms, occurring preferentially in ash-rich plumes, which enhance the sedimentation rate of tephra regardless of grain size. Such mechanisms include aggregation17,18, gravitational instabilities19, diffusive convection20, particle-particle interactions21, and wake-capture effects22. These are inferred to be key processes controlling the early depletion of ash-rich plumes, which cannot be explained by individual particle settling. Aggregation efficiency, in particular, has been identified23,24 to be proportional to a power greater than two of ash concentration. This means that the higher the fine ash concentration, the greater the aggregation efficiency, which is in agreement with the observations made in our study.

Collective settling mechanisms allow en masse sedimentation of particles of different sizes, which explains the significant amount of fine ash as well as the poor grain sorting sometimes observed in proximal tephra fallout deposits of large Plinian events. The fallout deposit from the 18 May 1980 eruption of Mount St. Helens (MSH80), for instance, shows both poor grain sorting in proximal locations25,26 and an increase of mass and thickness at distances >300 km27, demonstrating rapid removal of fine ash from the plume. Different enhanced sedimentation processes have been invoked and successfully tested to explain these observations, including aggregation28 and hydrometeor formation14. Subplinian eruptions, although less powerful than Plinian ones, remain very explosive and capable of efficient fragmentation, also leading to the formation of ash-rich plumes. For example, the August and September 1992 Mt. Spurr eruptions produced fallout deposits with fine ash contents reaching 30% and 40% of the total mass of tephra, respectively29. The origin of this fine ash is debated, and could be related to heterogeneities in the source magma or to secondary particle fragmentation in the volcanic conduit or eruption column30. Subplinian eruptions have MER in the range 0.28–6.7 × 10⁶ kg/s and still exhibit low partitioning coefficients, although spanning a wider range of values (0.1 < ε < 1.6%), hence implying that collective settling mechanisms are at work. The Mt. Spurr fallout deposits also show an increase of mass and thickness at distances >150 km from the source, which can be explained by collective settling mechanisms including aggregation, topographic effects and gravitational instabilities30. Small/Moderate eruptions are drastically different from Plinian and Subplinian ones. The eruption explosivity and the MER are much weaker. The plume column height is generally lower and the fine ash fraction of the size distribution at the source vent is much smaller. Consequently, Small/Moderate eruptions do not produce ash-rich plumes, and enhanced sedimentation in proximal regions is limited, resulting in larger partitioning coefficients (0.5 < ε < 6.9%). The wide range of ε values for Small/Moderate eruptions reflects the heterogeneity of grain size and concentration of the associated plumes. However, this can also be explained by the natural complexity of some long-lasting eruptions. This is the case, in particular, for the Eyjafjallajökull 2010 eruption, which displayed multiple and discontinuous phases of explosive activity with varying intensity.

The inverse relationship between ε and the MER shows that for very powerful eruptions, the proximal sedimentation is mainly controlled by the concentration of fine ash. This suggests that above a given threshold of the fine ash volume fraction, collective mechanisms dominate over individual particle settling, whereas below it individual settling dominates. Assessment of this threshold is very difficult, as proximal measurements (i.e., in the first tens of kilometres from the source vent) of airborne volcanic ash concentration are scarce. Indeed, satellite-based retrievals are usually impossible due to the opacity of the cloud. However, radar instruments operating at larger wavelengths are able to provide volcanic ash concentrations within the proximal cloud. The comparison of ash concentrations at various distances, hence using various techniques, should bring valuable information on early depletion processes and on the evolution of the sedimentation rate. As an example, proximal measurements carried out in the first two hours after the MSH80 Plinian eruption by a 23-cm wavelength radar31 give an ash cloud concentration of 8.5 g/m³. In this region of the cloud we expect the sedimentation rate to be high, with a significant contribution of collective particle settling mechanisms. As a comparison, distal measurements carried out by satellite-based infrared sensors on the Kelut 2014 Plinian eruption32 give an ash cloud concentration 3 orders of magnitude lower (maximum value 9 mg/m³). In this region of the cloud, the sedimentation rate is very low and individual particle settling is most likely to prevail.

### Ash cloud hazards and operational response

The partitioning parameter ε is crucial in operational volcanic risk mitigation, as it is required as input for ash-cloud-dispersal models used by several VAACs responsible for global air traffic safety. Given that satellite images are not systematically available, the VAACs need rapid parameterization schemes to predict Qa, and to provide frequent and reliable up-to-date forecast maps of atmospheric ash concentration during volcanic crises33. With this aim, VAACs (such as London and Toulouse) have typically used a poorly constrained default ε value of 5%34 to forecast the concentration of very fine ash composing distal ash clouds following Qa = ε × Qs. However, as demonstrated in Fig. 1, the fraction of very fine ash that survives proximal settling varies by 2 orders of magnitude (0.1% < ε < 6.9%) with respect to the MER. Therefore, a constant partitioning value cannot be used, even as a first-order estimate for operational purposes. Note that Qs is needed, and is usually obtained operationally from top plume height estimates following a power-law relationship35 between the two parameters. The reliability of the Qs estimate from this method is discussed later in this work and compared with our satellite-based prediction model of Qs (see next section). Here we propose a new operational eruption-style-dependent parameterization of ε using the mean values for Plinian (εP = 0.5%), Subplinian (εSP = 0.8%), and Small/Moderate (εS/M = 3.2%) eruptions (Fig. 1). This parameterization is easily implementable in ash-cloud-dispersal models, allowing operational use by the VAACs. Also, the choice of the correct partitioning parameter to be used during the course of an eruption is not difficult. The phenomenology as well as the real-time assessment of Qs will be particularly useful to discriminate the eruption style. In some cases, the eruptive history at each volcanic target can also be helpful. Our assessed values of ε significantly depart from the default 5% value used by the VAACs (Fig. 1), and the resulting differences will propagate into the modelled ash cloud concentrations.

Therefore, in order to test the sensitivity of concentration variations to partitioning values, distal ash cloud dispersion maps were simulated for 4 eruption scenarios (Supplementary Information Table S1) using MOCAGE-accident, the ash-cloud-dispersal model of VAAC Toulouse. This model is based upon the three-dimensional chemistry and transport model developed by Météo-France, and specifically adapted for the transport and diffusion of accidental releases from the regional to the global scale. For this study, meteorological data were extracted from the Météo-France operational database, including 20 pressure levels, from 1000 to 10 mb, with a time resolution of 1 hour and a horizontal resolution of 0.5°. The MOCAGE-accident internal grid resolution is 0.5°. For each scenario, ash release was constant for the eruption phase duration and uniform along a vertical line rising from the vent to the maximum plume height. The particle size distribution in the distal cloud includes 6 grain size fractions between 0.1 and 100 µm, with 70 wt% of the particles smaller than 30 µm36. For modelling simplicity, we ran the simulations using present-day meteorological data. For the Plinian case (see Supplementary Information Fig. S2 for Subplinian and Small/Moderate cases), we use an eruptive scenario based on the Kelut 2014 eruption (Supplementary Information, Table S1). We compare the ash cloud loading (i.e., integration of ash concentration along the vertical path; in kg/m²) simulated using the VAAC-default ε value (5%) with that simulated using the Plinian partitioning coefficient (εP = 0.5%) derived from our model (Fig. 2). The ash cloud loadings are drastically different, with maximum values of 1.7 × 10⁻¹ and 1.6 × 10⁻² kg/m² for the VAAC-default and our eruption-style-dependent coefficients, respectively (Fig. 2a,b). This means that for such Plinian eruptions, VAAC operational simulations could overestimate the amount of very fine ash in the atmosphere by a factor of 10. Consequently, this would overestimate the extent of the no-fly zone (delimited in Fig. 2 by the black dashed line) set by the European Commission beyond a threshold37 of 4 mg/m³ (Fig. 2). The patterns of the no-fly zones are drastically different, and the extent computed with the VAAC-default value is 6.5 times larger than that computed with our coefficient, which could have serious implications for air traffic regulation during an eruption.
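The style-dependent parameterization Qa = ε × Qs can be sketched in a few lines. The ε values are the means quoted in the text; the dictionary keys, the function name and the example Qs are hypothetical, and the snippet is not taken from any VAAC code. It only reproduces the ratio between the default and Plinian source terms, not a dispersal simulation.

```python
import math

# Mean partitioning coefficients quoted in the text, as fractions.
EPSILON_BY_STYLE = {
    "plinian": 0.005,         # eps_P   = 0.5%
    "subplinian": 0.008,      # eps_SP  = 0.8%
    "small_moderate": 0.032,  # eps_S/M = 3.2%
}
VAAC_DEFAULT_EPSILON = 0.05   # default 5% value discussed in the text


def distal_fine_ash_flux(qs, style=None):
    """Return Qa = epsilon * Qs (kg/s); style=None uses the 5% default."""
    eps = VAAC_DEFAULT_EPSILON if style is None else EPSILON_BY_STYLE[style]
    return eps * qs


# Hypothetical Plinian source flux, inside the Qs range given for Plinian events.
qs_example = 1.0e8  # kg/s
qa_default = distal_fine_ash_flux(qs_example)
qa_plinian = distal_fine_ash_flux(qs_example, "plinian")
ratio = qa_default / qa_plinian
print(f"default: {qa_default:.3g} kg/s, Plinian: {qa_plinian:.3g} kg/s")
print(f"overestimation factor of the source term: {ratio:.1f}")
```

Because the source term is linear in ε, the ratio of the two source terms (5% / 0.5%) matches the factor of about 10 between the simulated maximum loadings.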

Volcanic ash particles can be responsible for the formation of indirect aerosols and/or droplets, which potentially have a short-term effect on the climate38,39,40. However, the systematic overestimation of the amount of fine ash injected into the atmosphere during large sustained eruptions raises questions about the actual impact of volcanic ash on radiative forcing. Conversely, when no calibration is available from ground deposits, the proximal sedimentation can be underestimated by such models (or other tephra-deposition models), as collective settling mechanisms are still not well constrained. This raises the question of the actual impact (building damage, agriculture and water pollution, health and respiratory problems, etc.) of tephra fallout in the vicinity of volcanic areas, which is likely to be larger than expected.

### Satellite-based prediction model of Qs

The interdependence of Qa, Qs and the eruption style leads us to develop statistical models for predicting Qs using satellite measurements of Qa with additional controlling parameters. A reliable assessment of Qs is essential for estimating plume dynamics close to the source, and hence for delineating zones impacted by tephra fallout using tephra-deposition models41. However, direct measurements of Qs remain impossible during the course of an eruption42. Thus, for rapid assessment of Qs, indirect methods have been developed using scaling laws based on relationships between measured plume height H and time-averaged Qs; these are referred to as empirical scaling laws35,43. This methodology currently represents the standard for real-time determination of Qs, although it is associated with uncertainties as large as a factor of 54 at a 95% confidence interval35. The dataset investigated here is small, while the number of explanatory variables is relatively high. We therefore developed a novel and robust statistical technique using a modified Akaike Information Criterion (AICc; see Methods), allowing the selection of the best regression mixture model for the eruptions in our database (all statistical indicators are summarized in Table 2). By combining Qs, Qa and H in three-dimensional space (Fig. 3a), the best model selected follows a power law of the form:

$${Q}_{s}=30.22\,{Q}_{a}^{0.51}{H}^{2.25}\tag{1}$$

This relationship gives an AICc of 12.9 with excellent p-values (Table 2). The RMSE (Root Mean Square Error) yields an error factor of 12.8 at a 95% prediction interval. With an uncertainty four times lower than that of the empirical scaling laws35, this new satellite-derived model significantly improves the estimation of Qs (Table 2). In particular, the error distribution is not uniform, as shown in Fig. 3b by the projection of the 95% prediction interval envelope onto the H-Qa plane. The error factor falls to 2 close to the centre of mass of the data, in a region that encloses 12 of the 22 eruptions of our dataset.
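Equation 1 can be evaluated directly, as in the sketch below. The function names are hypothetical. Two assumptions are made here and are not stated in this section: Qa and Qs are taken in kg/s with H in km, and the quoted error factor is read as a multiplicative bound (Qs divided and multiplied by the factor).

```python
import math


def qs_from_satellite(qa, h):
    """Eq. 1: Qs = 30.22 * Qa**0.51 * H**2.25.

    ASSUMED units: qa in kg/s, h in km, result in kg/s.
    """
    if qa <= 0 or h <= 0:
        raise ValueError("Qa and H must be positive")
    return 30.22 * qa ** 0.51 * h ** 2.25


def prediction_bounds(qs, error_factor=12.8):
    """ASSUMPTION: the error factor is a multiplicative 95% bound on Qs."""
    return qs / error_factor, qs * error_factor


# Hypothetical inputs, for illustration only.
qs_est = qs_from_satellite(qa=1.0e4, h=10.0)
low, high = prediction_bounds(qs_est)
print(f"Qs ~ {qs_est:.3g} kg/s (95% bounds {low:.3g} to {high:.3g} kg/s)")
```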

Then, we also collected 5 additional parameters (P1 to P5, Table 3) related to magmatic system properties and external processes (referred to as modalities), likely to control the amount of very fine ash produced and injected in the plume. Each modality has been coded on a Boolean basis (0/1) so that they can be statistically analysed. We then proceeded to the selection steps to discriminate between all the possible models with 7 different variables (Qa, H, P1 to P5), with modalities (P1, …, P5) being class parameters for Qa and H. The modalities include the SiO2 (P1) and H2O (P2) contents of the magma, the open or closed character of the conduit (P3), the occurrence of phreatomagmatic activity (P4), and the formation of co-pyroclastic density current (co-PDC) plumes (P5). Using our selection model analysis, these modalities allow clustering of the 22 data samples in the 3D space defined by Qs, Qa and H, and the identification of sub-models corresponding to different eruption scenarios (see the Methods section for details). We found that P1 and P3 are the parameters that best improve the fitness criterion, with a low AICc value of 10.4. This leads to a new sub-model yielding an error factor of 9.3 at a 95% prediction interval based on four different equations as follows:

$${Q}_{s}=25.95\,{Q}_{a}^{0.72}{H}^{1.95}\quad \text{low-SiO}_{2}\ \text{and closed-conduit}\tag{2}$$
$${Q}_{s}=25.95\,{Q}_{a}^{0.72}{H}^{1.4}\quad \text{low-SiO}_{2}\ \text{and open-conduit}\tag{3}$$
$${Q}_{s}=25.95\,{Q}_{a}^{0.62}{H}^{1.95}\quad \text{high-SiO}_{2}\ \text{and closed-conduit}\tag{4}$$
$${Q}_{s}=25.95\,{Q}_{a}^{0.62}{H}^{1.4}\quad \text{high-SiO}_{2}\ \text{and open-conduit}\tag{5}$$

The magma SiO2 content (P1), often associated with the magma viscosity, is a critical parameter controlling the pressurization state of the shallow magmatic system, provided sufficient gas is available. The parameter (P3) related to the open/closed character of the conduit acts in the same direction. Indeed, closed systems usually designate volcanic conduits or vents sealed by cooled lava acting as an impermeable plug that prevents gas from escaping easily, and hence allows a pressure increase in the shallow magmatic system. Exclusion of P2 is unexpected, as the gas usually controls the MER at the source vent. This can be explained by the difficulty of comparing H2O content measurements made with different techniques. Exclusion of P4 is also interesting. Indeed, phreatomagmatism is a mechanism involving external water and is frequently observed during recent subglacial Icelandic eruptions44. On the one hand, magma-water interaction can enhance the explosivity and hence the formation of very fine ash. On the other hand, a water-rich eruptive column is likely to cause premature deposition of ash through wet aggregation and hydrometeor formation14,29. However, no significant influence of phreatomagmatism could be demonstrated by our statistical analysis. Significant amounts of very fine ash can be produced by PDC, as for the MSH80 Plinian eruption; therefore the contribution to airborne ash by co-PDC plumes needed to be tested. However, the variability of the dispersion mechanisms of co-PDC plumes (P5)45, together with the difficulty of quantitatively assessing their amplitude, is likely to explain their exclusion. The power-law coefficients are related to the modalities (Pn) and show a strongly non-linear behaviour, with power values of 0.72 and 0.62 on Qa for low and high SiO2, respectively, and power values of 1.95 and 1.4 on H for closed and open conduit, respectively. The constant (c0 = 25.95) is inherent to the general model structure and does not depend on the explanatory variables Qa and H, nor on the modalities P1 and P3.
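The four sub-model equations share the constant c0 and differ only in the two exponents, which a short sketch makes explicit. The function and argument names are hypothetical, the units (Qa in kg/s, H in km) are assumed rather than stated in this section, and the criteria separating low from high SiO2 or open from closed conduit are those of Table 3, which the snippet takes as Boolean inputs.

```python
import math

C0 = 25.95  # constant shared by Eqs 2 to 5


def qs_submodel(qa, h, high_sio2, closed_conduit):
    """Eqs 2 to 5: Qs = c0 * Qa**a * H**b.

    a depends on modality P1 (SiO2 content), b on modality P3 (conduit state).
    ASSUMED units: qa in kg/s, h in km, result in kg/s.
    """
    if qa <= 0 or h <= 0:
        raise ValueError("Qa and H must be positive")
    a = 0.62 if high_sio2 else 0.72        # exponent on Qa (P1)
    b = 1.95 if closed_conduit else 1.4    # exponent on H  (P3)
    return C0 * qa ** a * h ** b


# Hypothetical inputs: the same Qa and H evaluated under the four scenarios.
for high_sio2 in (False, True):
    for closed in (True, False):
        qs = qs_submodel(1.0e4, 10.0, high_sio2, closed)
        label = f"{'high' if high_sio2 else 'low'}-SiO2, {'closed' if closed else 'open'} conduit"
        print(f"{label}: Qs ~ {qs:.3g} kg/s")
```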

Equations 2 to 5 offer a new tool for accurate, near-real-time estimation of Qs during an eruption, provided that Qa and H can be estimated. In order to validate our approach, we simulated the 23 February 2013 eruption of Mount Etna (Sicily) using two different Qs inputs. The goal is to test the ability of each parameterization (Qs1 and Qs2) to reproduce the observed tephra fallout deposits. Simulations were carried out using FALL3D, a tephra-transport and deposition model that is now a standard used at INGV (Italy) and at the VAACs of Buenos Aires (Argentina) and Darwin (Australia). FALL3D is thus a well-suited candidate for this analysis; a full description of its characteristics can be found in the literature41,46. On the one hand, Qs1 was estimated from our satellite-derived statistical model using the parameterization for low-SiO2 content and open conduit (Eq. 3), and used as the input parameter in simulation 1 (Fig. 4a). On the other hand, Qs2 was calculated from the standard empirical scaling law35 (currently used operationally by the London and Toulouse VAACs), and used as the input parameter in simulation 2 (Fig. 4b). Simulations were run between 00:00 (all times are in UTC) on 23 February and 24:00 on 28 February 2013, within a 445 by 445 km grid domain using meteorological fields from ECMWF data. These fields include 37 pressure levels with a time resolution of 6 hours and a horizontal resolution of 0.75°. The FALL3D internal grid resolution was 4 by 4 km, obtained by linearly interpolating the meteorological data. The three main Eruption Source Parameters (ESP) required as input by FALL3D are the plume column height, the Total Grain-Size Distribution and Qs. The ability of each model to reproduce the observed tephra fallout deposits is then assessed using field measurements47 of tephra loading at 10 locations carried out after the 23 February 2013 eruption of Mount Etna (Fig. 4; Supplementary Information Table S2, Table S3 and Fig. S3). The two simulations are strikingly different. The first one (Fig. 4a) provides a faithful reconstruction of the deposits, as shown by the 5 isomass contours (set at 10, 1, 0.1, 0.01, and 0.001 kg/m²) correctly enclosing the sampling points #1 (21 kg/m²), #8 (0.29 kg/m²), #9 (0.013 kg/m²), and #10 (0.0014 kg/m²). In contrast, simulation 2 (Fig. 4b), which uses the empirical H-derived scaling law, fails to reproduce the actual deposits and significantly underestimates the amount of tephra deposited on the ground. This is clearly shown by the restricted extent of the computed isomass contours, and is the direct consequence of the underestimation of Qs. These results illustrate the robustness of our model and highlight the importance of including satellite-derived estimates of Qa for reliable estimations of Qs.

## Conclusion

Volcanic very fine ash clouds can travel great distances and contaminate the atmosphere for long periods of time, disrupting air traffic as demonstrated during recent eruptions. However, the proportion of very fine ash distally transported in the atmosphere, and related proximal settling processes, are difficult to assess. Yet, for the past two decades, several operational meteorological agencies (VAACs) have used an unrealistic default value of ε = 5% as input for forecast models of atmospheric ash cloud concentration. Here, from the combination of field and satellite data, we provide first-time quantitative assessment of the source-to-atmosphere partitioning (ε) of very fine ash from 22 eruptions. We also developed a robust and novel statistical model for predicting the source mass eruption rate (Qs) with an unprecedentedly low level of uncertainty. The main findings are summarized below:

1. The fraction of very fine ash that survives proximal settling varies by 2 orders of magnitude (0.1 < ε < 6.9%) with respect to the MER. This partitioning is not arbitrary, as ε decreases with increasing MER, with respect to eruption styles.
2. Large plumes from Plinian eruptions are much less efficient (up to 50 times lower) at transporting very fine ash through the atmosphere than previously anticipated.
3. We explain this behaviour by the existence of collective particle settling mechanisms occurring in ash-rich plumes, which enhance early and en masse fallout of very fine ash.
4. We suggest that proximal sedimentation during powerful eruptions is controlled by the concentration of fine ash regardless of the grain size.
5. We thus propose a style-derived parameterization of ε (εP = 0.5%; εSP = 0.8%; εS/M = 3.2%) to be used in VAAC ash-cloud-dispersal models for operational applications.
6. We provide a novel and robust statistical model for the estimation of the source Mass Eruption Rate (Qs), with an unprecedented reduction of uncertainties from an error factor of 54 (previous work used by some VAACs) to an error factor of 9.3 at a 95% prediction interval.

The fact that very fine ash from Plinian eruptions is not efficiently transported in the atmosphere and experiences early sedimentation has major implications for risk management. On the ground, tephra fallout can be more severe than predicted by current tephra-deposition models, with a detrimental effect on water infrastructure, buildings or agriculture. In the atmosphere, the concentration of far-travelled ash clouds can be much lower than predicted by current ash-cloud-dispersal models, with important consequences for crisis management related to air traffic safety. We propose incorporating our eruption-style-dependent partitioning coefficients into VAAC ash-cloud-dispersal models, as well as using Eqs 2–5 of our statistical model in tephra-deposition models. For this purpose, we provide (Supplementary Information, Table S4) operational parameters to be used in real time for three standard eruptive scenarios (i.e., Plinian, Subplinian, and Small/Moderate). For each scenario, these parameters include the Total Grain Size Distribution48 (TGSD), the total ash fraction with diameter <64 µm, the distal very fine ash fraction (εP, εSP, εS/M), and the equations of Qs for the estimation of the source mass eruption rate.

## Methods

### Statistical model

The dataset investigated here is small, while the number of explanatory variables is relatively high. A classical solution consists in regularizing the parameter estimation by introducing a penalty term into the maximum likelihood estimation problem. For instance, Ridge or Lasso regressions are based on this principle and have been introduced for variable grouping or to reduce the variance of the residuals49,50. However, each one is specialized either in the selection (grouping of variables) or in the reduction of quadratic errors51,52,53. Consequently, and in keeping with our context, we propose to introduce a new penalty term that allows: (i) grouping explanatory variables to determine the relevant number of predictors; (ii) improving the estimation of the parameters assigned to each class; and (iii) taking into account the small size of the observed data. To avoid making any a priori assumption, the methodology was first set in the general context of Gaussian regression mixture models, but investigation of the data set showed that only one Gaussian regression model is selected by our procedure. Therefore, we only present our methodology in this context, which moreover allows physical interpretation of the parameters involved. Consider $$({y}_{1},\ldots ,{y}_{n})^{t}$$, a sample observed from the variable of interest Y (Mass Eruption Rate, Qs), and let $$({x}_{1}^{t},\ldots ,{x}_{n}^{t})^{t}$$ be a matrix of explanatory variables X (Qa, H, P1, …, P5), where the $${x}_{i}$$ are vectors of $${{\mathbb{R}}}^{p}$$. The estimation problem reduces to maximizing the following penalized log-likelihood function:

$$l(\beta ,\sigma ,Y,X)=\sum _{i=1}^{n}\,\log \,f({y}_{i},{x}_{i}^{t}\beta ,\sigma )-Pe(\beta )$$

where $$f(y,{x}^{t}\beta ,\sigma )$$ is a Gaussian probability density with mean $${x}^{t}\beta$$ and variance $${\sigma }^{2}$$. The penalty term Pe(·) is given by $$Pe(\beta )=\alpha \sum _{j=1}^{p}\,|{\beta }_{j}|+(1-\alpha )\sum _{j=2}^{p}\sum _{l=1}^{j-1}\,|{\beta }_{j}-{\beta }_{l}|$$. The quantity 0 < α < 1 is a tuning parameter whose optimal choice balances the error of the model against the number of predictors used in it. This procedure is confirmed and reinforced by using a model selection criterion. The best known is the Akaike Information Criterion (AIC). It was designed as an asymptotically unbiased estimator of the Kullback divergence between the true model (that actually generated the data) and a statistical approximation of it. The measure of separation between the generating model and a candidate model that we use is given by Kullback’s symmetric divergence54. If we denote Φ = (β, σ) and Φ0 = (β0, σ0), this divergence is defined by:

$$J({{\rm{\Phi }}}_{0},{\rm{\Phi }})=\{d({{\rm{\Phi }}}_{0},{\rm{\Phi }})-d({{\rm{\Phi }}}_{0},{{\rm{\Phi }}}_{0})\}+\{d({\rm{\Phi }},{{\rm{\Phi }}}_{0})-d({\rm{\Phi }},{\rm{\Phi }})\}$$

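where $$d({{\rm{\Phi }}}_{0},{\rm{\Phi }})={E}_{{{\rm{\Phi }}}_{0}}\{-2\,\log \,f(Y|{\rm{\Phi }})\}$$ is the Kullback-Leibler divergence and $${E}_{{{\rm{\Phi }}}_{0}}$$ denotes the expectation with respect to $$f(Y|{{\rm{\Phi }}}_{0})$$. Since $$d({{\rm{\Phi }}}_{0},{{\rm{\Phi }}}_{0})$$ does not depend on Φ, we use: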

$$K({{\rm{\Phi }}}_{0},{\rm{\Phi }})=d({{\rm{\Phi }}}_{0},{\rm{\Phi }})+\{d({\rm{\Phi }},{{\rm{\Phi }}}_{0})-d({\rm{\Phi }},{\rm{\Phi }})\}$$

For large samples, and inspired by55, one may prove that the criterion defined by:

$$AIC=nlog\,{\hat{\sigma }}^{2}+2(p+1)$$

is an asymptotically unbiased estimator of $${E}_{{{\rm{\Phi }}}_{0}}(d({{\rm{\Phi }}}_{0},\hat{{\rm{\Phi }}}))$$. In the case of small samples, one may prove that the criterion defined by:

$$AICc=nlog\,{\hat{\sigma }}^{2}+2\frac{n(p+1)}{n-p-2}$$

is an unbiased estimator of $${E}_{{{\rm{\Phi }}}_{0}}(d({{\rm{\Phi }}}_{0},\hat{{\rm{\Phi }}}))$$ and still satisfies the same asymptotic properties as the AIC. We say that a model is selected through the AICc if it has the lowest AICc in the family of chosen models.
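The two criteria and the selection rule can be written out as a short sketch. The function names and the candidate values below are hypothetical; the snippet only evaluates the formulas as written above, with n the sample size, p the number of predictors and σ̂² the estimated residual variance, and does not reproduce the penalized estimation itself.

```python
import math


def aic(n, sigma2_hat, p):
    """AIC = n * log(sigma2_hat) + 2 * (p + 1)."""
    return n * math.log(sigma2_hat) + 2 * (p + 1)


def aicc(n, sigma2_hat, p):
    """AICc = n * log(sigma2_hat) + 2 * n * (p + 1) / (n - p - 2)."""
    if n - p - 2 <= 0:
        raise ValueError("AICc requires n > p + 2")
    return n * math.log(sigma2_hat) + 2 * n * (p + 1) / (n - p - 2)


def select_model(candidates, n):
    """Return the name of the candidate with the lowest AICc.

    candidates maps a model name to (sigma2_hat, p).
    """
    return min(candidates, key=lambda name: aicc(n, *candidates[name]))


# Hypothetical candidates for a sample of n = 22 observations.
candidates = {
    "two_predictors": (0.60, 2),
    "seven_predictors": (0.55, 7),
}
best = select_model(candidates, n=22)
print(f"selected model: {best}")
```

For small n the AICc correction term grows quickly with p, so a model with many predictors must reduce the residual variance substantially to be selected.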

Let us first observe that, due to the range of the observed values, it is natural to consider log(y) as our new observations. The Gaussianity and independence of the observations will be verified once the selection procedure is performed. Secondly, because the number of observations is very small with respect to the number of covariates, and using parameter estimations in a complete model, we proceed to the selection steps to discriminate between all the possible models with 7 different variables (Qa, H, P1, …, P5). To this end, let us remark that Qa and H are physical parameters (directly related to Qs), while the other parameters (here named P1 to P5) are related to magmatic system properties and external processes likely to impact Qs. We thus choose (P1, …, P5) to be class parameters for Qa and H, which is natural from a physical interpretation of the different volcanoes and meteorological conditions. Namely, we test models depending on the modality of the parameters, leading to the complete model with unknown parameters generically called β:

$${\rm{l}}{\rm{o}}{\rm{g}}\,{y}^{k}={\beta }_{0}+\sum _{i=1}^{7}\,{\beta }_{i}{P}_{i}^{k}+\sum _{j=1}^{2}\sum _{i=3}^{7}\,{\beta }_{j,{P}_{i}^{k}}{P}_{j}^{k}+\sigma {\xi }^{k}$$

where the $${\xi }^{k}$$ are independent standardized Gaussian variables. The selection via the AICc criterion is then performed.