(a) Quality metrics as a function of alert threshold for a model trained excluding the December 2019 eruption: MCC = Matthews correlation coefficient, a balanced quality metric similar to r² (see “Methods”); eruption probability during alert = proportion of all raised alerts that contain an eruption; alert duration = fraction of the analysis period during which the forecast model is in alert. The red dashed line indicates the alert threshold above which the December 2019 eruption would have been missed. (b) Performance of the same model over the analysis period for a threshold of 0.8 (red dotted line). Ensemble mean (black), eruptions (vertical red dashed lines), and alerts with (green) and without (yellow) eruptions are shown. (c) Performance of forecast models under cross-validation, in which each eruptive period is excluded from training in turn; the models anticipate four out of five eruptive periods (five out of seven eruptions). As in (b), the alert threshold is 0.8. Missed eruptions are indicated in red, the remainder in blue. (d) RSAM signal (black) in the three days prior to the 2012, 2013, and 2019 eruptions, alongside the three feature values from Fig. 1d–f (blue, magenta, cyan). To aid comparison, feature values have been normalized in log space to zero mean and unit standard deviation. Precursor signals identified by arrows are referred to in the text.
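The quantities defined in this caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the rule that the model is in alert whenever the ensemble mean is at or above the threshold, the index-based representation of eruption times, and the use of base-10 logarithms are all assumptions made for illustration. The exact procedure is given in “Methods”.

```python
import math

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention (assumed here): return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def alert_windows(in_alert):
    """Return (start, end) index pairs, end exclusive, of contiguous alert runs."""
    windows, start = [], None
    for i, flag in enumerate(in_alert):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            windows.append((start, i))
            start = None
    if start is not None:
        windows.append((start, len(in_alert)))
    return windows

def alert_metrics(ensemble_mean, eruption_idx, threshold):
    """Alert statistics for one threshold.

    Assumption: the model is in alert at every sample whose ensemble
    mean is >= threshold; eruptions are given as sample indices.
    """
    in_alert = [v >= threshold for v in ensemble_mean]
    windows = alert_windows(in_alert)
    with_eruption = sum(
        any(s <= e < t for e in eruption_idx) for s, t in windows
    )
    return {
        # Fraction of the analysis period spent in alert.
        "alert_duration": sum(in_alert) / len(in_alert),
        # Proportion of raised alerts that contain an eruption.
        "eruption_probability": (
            with_eruption / len(windows) if windows else float("nan")
        ),
        # Eruptions that occurred while no alert was active.
        "missed": [e for e in eruption_idx if not in_alert[e]],
    }

def normalize_log(values):
    """Standardize positive values in log space (zero mean, unit std)."""
    logs = [math.log10(v) for v in values]
    mean = sum(logs) / len(logs)
    std = math.sqrt(sum((x - mean) ** 2 for x in logs) / len(logs))
    return [(x - mean) / std for x in logs]
```

Calling `alert_metrics` over a range of thresholds would trace out curves of the kind described for panel (a), and `normalize_log` corresponds to the scaling described for the feature values in panel (d). Raising the threshold shortens the time spent in alert but increases the chance that an eruption falls outside every alert window.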