Introduction

Heavy precipitation increases flood risk and is anticipated to become more frequent and intense due to ongoing human-caused global warming. In climate models, daily heavy precipitation intensifies at the 7% K−1 Clausius-Clapeyron rate of atmospheric moistening1,2 while mean precipitation is limited to 1–3% K−1 due to the efficiency with which the atmosphere can lose the additional latent heat3,4,5. The heaviest precipitation intensification is offset by lengthening dry periods and less intense moderate precipitation2,6, although precise results depend on event rarity and space- or time-averaging7,8,9,10,11,12,13,14. Here we investigate hourly afternoon-to-evening heavy precipitation, gridded at 1° × 1° over much of the central-east Continental United States (CE-CONUS; land within 32–53°N, 107–64°W). Heavy precipitation is defined from percentiles relative to all hours, both dry and wet15, during March–November of 2019 and 2020 (see Methods). We focus specifically on the hours following the 1:30 pm local time (LT) AIRS overpass rather than the 1:30 am LT overpass, to capture the hours in which the heaviest precipitation is most frequent16.

Rain gauge data support rapid local increases in intensity with temperature (T) or near-surface specific humidity (q)17,18,19, or regional time trends in heavy precipitation20,21,22,23,24,25, with evidence for some sub-daily events exceeding the Clausius-Clapeyron rate26,27,28. Rain gauge networks are spatially sparse, so larger-scale trends have been identified using satellites29 or reanalysis8. Products (e.g. refs. 30,31,32) include data artifacts from changes in satellites, calibration or overpass time, which can cause data discontinuities or drift that reduce confidence in conclusions about climate trends. By contrast, AIRS33 had exceptional instrument stability34 and constant overpass times from 2002–2021. However, it retrieves 3-D fields of T and q rather than precipitation, and then only in clear and partly-cloudy conditions at overpass times of approximately 1:30 and 13:30 local time35.

Here we present a case study showing that the AIRS T and q retrieved fields can be predictive of heavy precipitation detected by a surface radar network for hours following local overpass time. Our contribution is to account for mesoscale atmospheric motion in the evolution of the thermodynamic fields.

The prediction skill is drastically improved over the standard proximity sounding method in which observations or forecast model outputs of thermodynamics are directly used. The results are remarkable since our method currently does not explicitly account for surface fluxes, radiation or convection, and may unlock the development of a full AIRS-FCST record to complement studies of multi-decade trends in convection-related risk over large geographical regions.

Results and discussion

Linking sounder-retrieved thermodynamics to convection

The atmospheric thermodynamic state drives convective development, although factors such as aerosols may contribute36,37,38. Proximity sounding research was initially based on radiosonde or rawinsonde profiling near storms, typically within 3–6 h of the event. Extreme phenomena of interest include downburst-related severe winds39, derechos40, and many others41,42,43,44,45. The aim has been to relate convective indices such as CAPE or wind-shear in the pre-convective environment to the likelihood, intensity, or type of subsequent convective event.

AIRS retrievals have an advantage over sonde data thanks to the greatly increased spatial sampling, and have been applied in a similar manner46,47,48, or as a source of input data to improve numerical weather prediction49. Field campaigns have shown that small-scale, near-surface processes in the boundary layer are critical for convection initiation (CI)50. We do not address these processes, however, as we only address mesoscale thermodynamic variations above the surface that are among the observing strengths of satellite-based infrared sounders.

Proximity soundings are typically applied to hazards such as hail and tornadoes rather than precipitation, and in contemporary forecasting it is common to use thermodynamic fields from a combination of observations and forecast model output to relate atmospheric thermodynamics to risk. In this study we are particularly motivated to exploit the spatial coverage and temporal length of the AIRS data record to study multi-decade trends in the risk of heavy precipitation. AIRS’ observed distribution of thermodynamics, \(P(q,T)\), is related to the distribution of convection \(P({conv})\) through:

$$P\left({conv}\right)=P\left({conv}|q,T\right)P(q,T)$$
(1)
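
As a rough illustration of how Eq. (1) could be used, the sketch below reads the relationship per (q,T) bin and sums over bins, so that a fixed conditional distribution combined with a changing thermodynamic distribution yields a changing event probability. The three CAPE classes, all probabilities, and the function name `marginal_event_probability` are hypothetical values chosen for illustration and are not results from this study.

```python
def marginal_event_probability(p_event_given_bin, p_bin):
    """Sum P(conv | bin) * P(bin) over discrete thermodynamic bins."""
    if abs(sum(p_bin.values()) - 1.0) > 1e-9:
        raise ValueError("bin probabilities must sum to 1")
    return sum(p_event_given_bin[name] * p_bin[name] for name in p_bin)


# Hypothetical conditional probabilities of heavy precipitation per CAPE class.
p_conv_given_cape = {"low": 0.0001, "moderate": 0.001, "high": 0.02}

# Hypothetical thermodynamic distributions for two periods.
p_cape_early = {"low": 0.75, "moderate": 0.15, "high": 0.10}
p_cape_late = {"low": 0.70, "moderate": 0.15, "high": 0.15}

p_early = marginal_event_probability(p_conv_given_cape, p_cape_early)
p_late = marginal_event_probability(p_conv_given_cape, p_cape_late)
print(f"P(conv) early: {p_early:.6f}, late: {p_late:.6f}")
```

In this toy case the conditional distribution is strongly nonuniform, so a modest shift of probability toward the high-CAPE class changes the estimated event probability noticeably; with a uniform conditional distribution the same shift would have no effect.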

Fundamentally, all studies of climate change that evaluate changes in thermodynamics in the context of convective risk implicitly assume a relationship akin to Eq. (1), with our case using precipitation as a proxy for \(P({conv})\) and CAPE as our metric of \(P(q,T)\). For the approach to be useful, \(P({conv|q},T)\) must be informative, in the sense that knowledge of \(P(q,T)\) should result in a narrower \(P\left({conv}\right)\) distribution than would be assumed a priori. If convection were unrelated to thermodynamics, then \(P({conv|q},T)\) would be uniform, and knowledge of \(P\left(q,T\right)\) would not affect our understanding of the probability of convection. On the other hand, if thermodynamics is informative of our convective proxy, then \(P({conv|q},T)\) will be nonuniform, and increasing nonuniformity represents increased information. The most extreme case would be a delta-function distribution, where convection only occurs for a particular combination of (q,T) and perfect knowledge of thermodynamics would lead to perfect prediction of convection. For our primary analysis, we therefore select a statistical measure of the nonuniformity of \(P({conv|q},T)\). Our intent is to use the limited overlap time between reliable products to establish \(P({conv|q},T)\), which can be used to generate a longer-term estimate of \(P({conv})\) over the full AIRS record. Since convection is associated with intense precipitation in our study time and region (see Methods), we use heavy precipitation as our convective proxy. After examining multiple indices of (q,T), we found that CAPE alone accounts for nearly all prediction skill (Supplementary Notes 1, Supplementary Fig. 1). CAPE represents the energy available from parcel buoyancy that could be converted to vertical motion (\(w\le \sqrt{2\times {CAPE}}\)); as buoyant parcels are lofted into the upper atmosphere, the local CAPE is consumed.
Sub-hourly profiling from surface radiometers has revealed how convective indices evolve during storm lifecycles, including changes of over 1000 J kg−1 h−1 in CAPE51,52. The rapid atmospheric changes hinder interpretation of relationships between convective indices and convection, as they are derived from atmospheric profiles that do not instantaneously match those with CI.
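
To give a sense of scale for the buoyancy bound \(w\le \sqrt{2\times {CAPE}}\) quoted above, the short sketch below evaluates it for a few CAPE values. The CAPE values and the function name `max_updraft_speed` are illustrative assumptions; the bound is an idealised upper limit, and real updrafts are generally weaker because of effects such as entrainment and condensate loading.

```python
import math


def max_updraft_speed(cape_j_per_kg: float) -> float:
    """Idealised upper bound on vertical velocity (m/s) for a given CAPE (J/kg)."""
    if cape_j_per_kg < 0:
        raise ValueError("CAPE must be non-negative")
    return math.sqrt(2.0 * cape_j_per_kg)


# Illustrative CAPE values only; not taken from the study data.
for cape in (500.0, 1000.0, 2000.0):
    print(f"CAPE = {cape:6.0f} J/kg -> w <= {max_updraft_speed(cape):5.1f} m/s")
```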

CAPE changes are driven by atmospheric motion, surface fluxes, radiation and convection. To capture the impacts associated with mesoscale atmospheric motion, trajectory enhancement was introduced with promising results for severe weather events53. In trajectory enhancement, every point in a 3-D field of soundings is treated as an air parcel. Parcels are then moved in 4-D using weather forecast winds as inputs for the Hybrid Single-Particle Lagrangian Integrated Trajectory (HYSPLIT54) model; T and q are modified following thermodynamic rules for adiabatic parcels (see Methods). A 3-D grid of T and q is produced and convective indices such as CAPE are derived. Unlike individual profiles from sondes, the spatial coverage of sounders such as AIRS ensures that even after parcel motion, there are sufficient data at each location to define profiles and therefore derive convective indices.

Trajectory enhancement first yielded promising results for nowcasting in proof-of-concept case studies with NOAA operational sounders55,56. The analyses related thermodynamic conditions when severe weather events occurred, and therefore reported \(P(q,{T|conv})\), in contrast to Eq. (1)’s required \(P({conv|q},T)\). The next step in justifying trajectory enhancement was a simulation experiment, which demonstrated the ability to capture most of the time variance in T and q over six hours in ERA5 over CE-CONUS57. The CAPE fields derived from trajectory-enhanced ERA5 T and q were predictive of ERA5 heavy precipitation, establishing evidence for useful \(P({conv|q},T)\) in a forecast system. In the present study, we extend the results to real-world observations from AIRS for thermodynamics and the Multi-Radar Multi-Sensor (MRMS) rain gauge-corrected product for heavy precipitation events58. MRMS provides excellent time- and space coverage and performs well at quantitative precipitation estimation (QPE) even in extreme events59,60,61. Our analysis focusses on one-hour accumulations averaged over 1° × 1° grid cells, and so only captures some aspects of convective risk.

Figure 1 shows the development of intense precipitation observed by MRMS over Kansas beginning at 23 UTC on 26th July 2020, a date selected to illustrate typical AIRS-FCST performance. Note the minor data gaps over northern Kansas, representing cloud cover that is too dense to permit AIRS retrievals. AIRS retrieves the clear-sky profiles in partially cloudy scenes, but approximately 13 % of 1° × 1° grid cells do not return valid retrievals during our study period. Our final section will discuss the consequences of this sampling issue.

Fig. 1: Development of an intense convective storm over Kansas beginning on 26th July 2020.

Colours are Multi-Radar Multi-Sensor (MRMS) 1° × 1° grid-cell mean quantitative precipitation estimates (QPE). Contours are “moderate” CAPE > 90th percentile and “high” CAPE > 99th percentile. Shaded grey areas are where Atmospheric Infrared Sounder (AIRS) retrievals were obtained; the white area represents the gap between the two successive satellite swaths.

Nearer to overpass time (pre-2100 UTC), there is only a small region of CAPE > 99th percentile. The AIRS-FCST high-CAPE area expands to encompass the intense precipitation that occurs at midnight UTC. A proximity sounding approach using the AIRS values before 21 UTC would not have reliably identified the storm extent. Figure 1 shows how knowledge of mesoscale atmospheric motion alone improves the identification of areas likely to experience heavy precipitation. However, there are large areas where CAPE exceeds its 90th percentile but precipitation is not observed, a common problem with thermodynamics-based prediction of convection that is discussed below. The results at 01–02 UTC also highlight a feature of trajectory enhancement: since AIRS-FCST does not account for small-scale convective updrafts, the AIRS-FCST CAPE is not consumed as it is in reality, and the high values persist after CI.

Statistical performance for predicting heavy precipitation

We are motivated to determine whether AIRS-FCST thermodynamics are informative about heavy precipitation risk by an amount that justifies development of a full 2002–2021 record. AIRS-FCST would complement radiosonde profiles62,63,64, reanalysis65,66 or climate model outputs67, which have been widely used to infer changes in convective risk or precipitation. These suffer from the same issue as proximity soundings, in that the reported fields are not coincident in time with CI. As modelled hourly or daily precipitation is used in other research68,69,70,71, we compare our performance with ERA5 and the High-Resolution Rapid Refresh (HRRR72) version 3 convection-permitting model. All data are regridded to 1° × 1° in latitude and longitude, a coarser resolution than any individual product (see Methods). Results may be sensitive to horizontal resolution, but our selection is consistent with datasets typically used in climate trend analysis.

Events at the Xth percentile are henceforth labelled using subscripts, for example QPE > QPEX, where percentile thresholds are derived from the entire sample of all times and locations. The full sample size is N > 1 million (>160k per forecast hour), so QPE99.95 corresponds to the top ~600 events. Ideally, percentiles would be defined more locally and for individual seasons, but local calculation would reduce the sample size to the point where statistics of rarer events cannot be reliably determined. By targeting events that reach or exceed QPEX, we face a classification problem. A common method for scoring classifier skill is based on the receiver operating characteristic (ROC73,74,75), but we primarily use the closely related Gini coefficient76,77, which gives the same conclusions (Supplementary Notes 2, Supplementary Fig. 2) and can be more intuitively interpreted. Bootstrapped standard errors are of order ±0.01 in Gini coefficient for QPE99.95 (Supplementary Notes 3, Supplementary Fig. 3), sufficiently tight to support our primary conclusions. Tests with regional or seasonal subsets also suggest that our results are not greatly sensitive to our choice of using the full sample for deriving QPEX thresholds (Supplementary Figs. 4, 5, Supplementary Table 1).

Figure 2a illustrates the Gini coefficient calculation, and from the cumulative distribution function (CDF) QPE > QPEX curve it is apparent that most of the heavy precipitation events occur when CAPE exceeds its 90th percentile (CAPE90). The Gini coefficient is derived from the CDF of \(P({conv|q},T)\) introduced in Eq. (1). A Gini score of zero represents an uninformative, uniform \(P({conv|q},T)\) equivalent to random guessing. A Gini score approaching one would mean all QPE > QPEX events occur at the highest CAPE.

Fig. 2: Derivation of Gini coefficient, and the coefficients of individual products and event rarity.

a Derivation of Gini coefficient from the cumulative distribution function (CDF) of the occurrence of Multi-Radar Multi-Sensor quantitative precipitation estimate (MRMS QPE) above threshold QPEX, in this case QPE99.95, plotted by AIRS-FCST CAPE. b Gini coefficient calculated for QPEX, where X ranges from the 95th to 99.95th percentile. c CDFs for the select QPEX thresholds using Atmospheric Infrared Sounder-Forecast (AIRS-FCST) convective available potential energy (CAPE), d CDFs for two QPEX thresholds comparing AIRS-FCST CAPE (solid) with High-Resolution Rapid Refresh (HRRR) QPE (dashed).

Other measures of classifier skill include the false alarm ratio (FAR) and probability of detection (POD) used in forecast skill diagrams78, which require selecting a predictor threshold (e.g. CAPE > CAPEX) in addition to the QPE threshold. The scores are very sensitive to selected thresholds (Supplementary Notes 4, Supplementary Fig. 6) although AIRS-FCST CAPE is consistently similar in skill to ERA5 QPE and consistently exceeds the skill scores of the other thermodynamic predictors (Supplementary Fig. 7).

In Fig. 2a, 80% of QPE > QPE99.95 events occur for CAPE > CAPE90, meaning a probability of detection of 0.80. However, changing the CAPEX threshold allows a user to select almost any POD. On the other hand, intense precipitation very rarely occurs even in areas of high CAPE, leading to many false alarms when using thermodynamic thresholds alone. For example, Fig. 1 has large areas of CAPE > CAPE90 that do not coincide with precipitation. The triggering of convection is still not fully understood and remains an ongoing area of research79,80,81, and for forecasting individual events the high false-alarm ratio is a serious challenge. However, to better understand climate-related trends in risk, we simply require that \(P({conv|q},T)\) be informative, as discussed when introducing Eq. (1).
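
The dependence of POD and FAR on the chosen thresholds can be made concrete with a minimal sketch using the standard contingency-table definitions (POD = hits / (hits + misses), FAR = false alarms / (hits + false alarms)). The eight toy grid cells, the CAPE threshold of 2000 J kg−1 and the function name `pod_far` are hypothetical; only the 5.1 mm h−1 value echoes the QPE99.95 threshold quoted in Methods.

```python
def pod_far(predictor, observed, predictor_threshold, observed_threshold):
    """Return (POD, FAR) for 'predictor > threshold' as a warning of 'observed > threshold'."""
    hits = misses = false_alarms = 0
    for pred_value, obs_value in zip(predictor, observed):
        warned = pred_value > predictor_threshold
        event = obs_value > observed_threshold
        if warned and event:
            hits += 1
        elif event:
            misses += 1
        elif warned:
            false_alarms += 1
    pod = hits / (hits + misses) if (hits + misses) else float("nan")
    far = false_alarms / (hits + false_alarms) if (hits + false_alarms) else float("nan")
    return pod, far


# Toy values for eight grid cells (hypothetical, for illustration only).
cape = [100.0, 2500.0, 3000.0, 2800.0, 500.0, 3200.0, 2600.0, 50.0]   # J/kg
qpe = [0.0, 0.0, 6.0, 0.0, 5.5, 7.0, 0.0, 0.0]                         # mm/h

pod, far = pod_far(cape, qpe, predictor_threshold=2000.0, observed_threshold=5.1)
print(f"POD = {pod:.2f}, FAR = {far:.2f}")
```

Lowering the predictor threshold in this sketch raises POD at the cost of more false alarms, which is the threshold sensitivity described above.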

The product comparison in Fig. 2b shows how HRRR QPE is the most skilful predictor for all intensities beyond QPE99, but at or above QPE99.9 the AIRS-FCST CAPE displays similar or better skill than all other CAPE products or even ERA5 QPE. AIRS proximity sounding CAPE is consistently the worst performer. In general, CAPE is less predictive of lighter precipitation, which can include larger contributions from stratiform precipitation or shallow convection (Supplementary Notes 5, Supplementary Fig. 8). However, CAPE is necessary for the most intense events, with Fig. 2c showing almost zero QPE > QPE99.9 events for CAPE < CAPE75 (binned distributions of CAPE and QPE are shown in Supplementary Figs. 9 and 10 and discussed in Supplementary Notes 6). Figure 2d shows the CDFs using AIRS-FCST CAPE or HRRR QPE as predictors, and demonstrates how the forecast model is far superior to AIRS-FCST for the lightest events, but is less notably superior for the most intense. A potentially surprising result in Fig. 2b is the lower Gini coefficient for HRRR CAPE compared with ERA5 CAPE, despite HRRR's finer resolution. This may in part be due to the averaging over different spatial resolutions in each product, or due to the selection of surface-parcel CAPE in HRRR versus most-unstable parcel in ERA5 (see Methods).

For the rarest (QPE99.95) events, the AIRS-FCST Gini coefficient in Fig. 3a significantly improves with forecast hour (p < 0.05, see Methods). Meanwhile, the AIRS overpass CAPE degrades with time as expected, and by the latest timesteps AIRS-FCST outperforms ERA5 but not HRRR QPE.

Fig. 3: Changes in Gini coefficient with forecast hour.

Progression of detection cumulative distribution function (CDF) for quantitative precipitation estimates over the 99.95th percentile (QPE99.95) by hour, along with Gini coefficient (in legend) for a convective available potential energy (CAPE) derived from Atmospheric Infrared Sounder-Forecast (AIRS-FCST), b CAPE derived from the AIRS field at overpass time, c European reanalysis 5 (ERA5) precipitation, d High-Resolution Rapid Refresh (HRRR) model precipitation.

We propose two factors for the weaker AIRS-FCST performance at earlier timesteps. Firstly, precipitation at or before overpass time could continue through 21 UTC. Such cases would result in high precipitation at lower AIRS-observed CAPE, as seen at 21 UTC in Fig. 3a. A second factor is a potential shift in the cause and type of events. Earlier hours include more frequent precipitation near the east coast and the Gulf of Mexico, while later hours feature more precipitation in the Great Plains (Supplementary Notes 7, Supplementary Fig. 11). Data south of 32°N were excluded due to poorer performance (see Supplementary Fig. 12), even including other convective indices as predictors. In addition to the more intense ongoing Gulf Coast precipitation at overpass time, convective indices could be a poorer predictor of the coastal precipitation types, or processes neglected in trajectory enhancement could play a larger role near the coastline. A notable example could be where air is advected onshore from the Gulf Coast, as AIRS-FCST would account for the advection, but not the moistening of parcels by extensive evaporation from the warm Gulf waters. Stronger performance inland fits this hypothesis, since surface fluxes play a smaller role in parcel evolution over drier land surfaces. The performance over these locations could be tested using trajectory enhancement of MetOp satellite overpasses at 9:30 am Local Time.

Understanding the mechanics of trajectory enhancement performance

We have shown that CAPE plays a larger role in intense hourly CE-CONUS precipitation than is revealed by proximity soundings. Figure 2 raises a mystery, however. ERA5 and HRRR account for processes that are neglected in AIRS-FCST, and HRRR is at a convection-permitting resolution, yet AIRS-FCST CAPE appears to outperform ERA5 or HRRR CAPE for QPE > QPE99.5. The results are also potentially surprising since ERA5 assimilates AIRS radiances at 13:30 local time, and yet by later hours it is outperformed by AIRS-FCST. Here we argue that one way in which ERA5 and forecast model performance could degrade is through inappropriate triggering of convection.

We now use a case study to illuminate how trajectory enhancement CAPE compares with other datasets, including a hypothetical forecast model which is representative of HRRR or the forecast system used in ERA5. In Fig. 4a, b, an earlier proximity sounding returns CAPE far below the initiation value. In Fig. 4a, the forecast model accurately captures the event timing and strength. Standard practice is to extract model timestep outputs67,82,83, or to take an interpolated or timestep-mean value. In Fig. 4a, the pre-initiation hour reports high CAPE, but using the later hour or an interpolation would falsely report low-to-moderate CAPE. ERA5 or HRRR CAPE is taken from the instantaneous fields at a given hour, and matches one of the filled magenta circles. The idealised AIRS-FCST output shows performance similar to the pre-convective hour value from the model. However, since AIRS-FCST does not explicitly account for convection, the CAPE is not consumed and ends up artificially high in later hours. Such behaviour explains the maintenance of the expansive areas of high CAPE at 02 UTC in Fig. 1, and is what would also occur in ERA5 or forecast models when their convection fails to trigger, or triggers late.

Fig. 4: Timelines of forecast CAPE, actual CAPE and convection initiation.

Two hypothetical cases of convective available potential energy (CAPE) estimation for a single storm. In each case, the real-world evolution is in blue with convection initiation (CI) starred. Potential model output is shown in magenta, in a the model matches the real-world CI while in b the model convection triggers too early, consuming CAPE so that by the time of the real-world event, the model CAPE is too low. a, b show the same estimates of proximity sounding (red) and Atmospheric Infrared Sounder-Forecast (AIRS-FCST, black) outputs, where it is assumed that the large-scale winds driving AIRS-FCST are not strongly affected by CI.

Figure 4b represents a hypothetical case in which the model convection triggers prematurely, during the earlier CAPE peak. Model CAPE is consumed and is therefore artificially low at the point of true CI. Cases akin to Fig. 4b, in which CI is mistimed in a model, have been discussed previously84 and are plausible explanations of why AIRS-FCST CAPE shows equivalent or better performance than ERA5 CAPE for the most intense events.

Conclusions

CAPE is often used in model convective schemes85,86,87 but observational studies have not universally reported a simple relationship between thermodynamic measures of instability (e.g., CAPE) and convection88,89,90. Here we have shown strong evidence that accounting for mesoscale air motion reveals a far stronger relationship between local CAPE and the heaviest precipitation than is found using standard observational methods. In the context of Eq. (1), the results show a far more informative \(P({conv|q},T)\) than for commonly used products, and support using a full AIRS-FCST record to document changes in convective risk over CE-CONUS. Such a record will offer improved spatial coverage compared with radiosondes or surface radiometers.

Future work could expand results to the nighttime overpass, and other geographical regions or convective proxies. The present study considered hourly accumulations averaged over 1° × 1° grid cells only. Future work could extend to different measures of hazard, such as accumulations over longer time periods or larger areas, and could use longer records for a greater sample size that would allow study of rarer events and of the stability of \(P({conv|q},T)\) through time. One notable avenue is that the Gini coefficient is higher when using CAPE to predict the QPE inside precipitating areas, rather than averaging over 1° × 1° grid cells (Supplementary Notes 8, Supplementary Fig. 13).

Trajectory enhancement could enable investigation of less-intensively instrumented regions, perhaps exploiting meteorological satellites’ lightning mapper coverage of South America91 and Africa92. Trajectory enhancement is anticipated to work best in conditions akin to Fig. 4, where the pre-convective atmosphere is sampled by the sounder and the thermodynamics have not already been perturbed by ongoing convection. Ongoing convection results in cloudiness, and while AIRS retrieves the clear-sky part of partially cloudy scenes, it does not work in totally overcast scenes. Trajectory enhancement of combined infrared and microwave data55,56 could include fully overcast scenes, and would allow study of whether the ~13% of AIRS non-retrieved grid cells results in meaningful biases in performance. However, we note that our statistics are derived by matching products where AIRS returns valid data, and so our results showing the large improvements in performance related to trajectory enhancement are not caused by sampling biases. The utility of AIRS analysis will, however, depend on where the overpass time falls within a regional diurnal cycle. For example, the frequent occurrence of ongoing convection at overpass time caused us to exclude the Gulf Coast in this study.

For CE-CONUS, global warming is expected to increase CAPE and convective inhibition (CIN), and decrease wind shear93,94,95. These may interact in complex and time-varying ways. For example, CIN can restrict CI, perhaps reducing moderate events but allowing CAPE to build to relatively higher levels, resulting in more explosive events. We therefore emphasise that our results are solely for predictive skill of intense hourly precipitation, and provide only a narrow description of the link between convective indices and convective risks. Far more complex quantification of precipitation properties is possible, has been applied to HRRR and MRMS96, and could be extended to thermodynamics-convection research.

Despite these limitations, trajectory enhancement of satellite sounder data has provided a major step toward nowcasting intense hourly precipitation and tracking multi-decadal changes in risk through time.

Methods

Data

We use AIRS Version 7 L2Sup infrared-only retrievals for temperature (T) and specific humidity (q) on up to 100 vertical levels with a typical horizontal resolution of ~50 km. Numerical Weather Prediction winds are from the WRF27km runs held at the NOAA Air Resources Laboratory (ARL) for HYSPLIT. AIRS retrievals are included during March–November 2019 and 2020 for footprints within 17–21 UTC, and within the latitude-longitude box of 25–53°N, 107–64°W. All valid “good” or “best” flagged retrievals are included, regardless of whether they are land or ocean, or fall within our CE-CONUS region or not. The western limit was selected to exclude the Rockies, where orography is expected to cause larger issues for the relatively coarse resolution forecast winds. The other limits were selected to capture the typical AIRS afternoon orbits that intersect CONUS. The main analysis was then restricted to north of 32°N based on poorer performance near the Gulf of Mexico. The trajectory-enhancement method described below requires that the first forecast step (21 UTC in this case) occurs after the final AIRS sounding that is included. Expanding west of the Rockies would include a third overpass, necessitating a delay in the first forecast timestep to 23 UTC or 00 UTC, missing much of the development of convection over CE-CONUS.

ERA5 CAPE and total precipitation (labelled QPE in the main manuscript), MRMS gauge-corrected QPE, and HRRR version 3 QPE are time matched and re-gridded to the AIRS-FCST grid and times.

HRRR grib2 data were obtained using the Herbie package and then extracted to netCDF using NOAA’s Weather and Climate Toolkit (https://www.ncdc.noaa.gov/wct). For our time period the Total_precipitation_surface variable was not found by the WCT command line tool, so hourly QPE was determined from instantaneous rain rates (mm/s) available every 15 min. The relevant variable, Precipitation_rate_surface was multiplied by 900 s then the four 15-min accumulations were summed into an hourly QPE, including zero values. Some uncertainty will be introduced by the use of instantaneous rates rather than the true accumulations.
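
The conversion from 15-min instantaneous rates to an hourly total described above can be sketched as follows. The function name `hourly_qpe_from_rates` and the example rates are hypothetical, and which four samples are assigned to a given hour is an assumption here, since that pairing is not specified in the text.

```python
SECONDS_PER_STEP = 900.0  # 15 min, the spacing of the instantaneous rain rates


def hourly_qpe_from_rates(rates_mm_per_s):
    """Approximate hourly accumulation (mm) from four 15-min instantaneous rates (mm/s).

    Each rate is treated as constant over its 15-min window, so the result
    only approximates the true accumulation.
    """
    if len(rates_mm_per_s) != 4:
        raise ValueError("expected exactly four 15-min rates per hour")
    # Zero rates are kept so that dry intervals contribute zero accumulation.
    return sum(rate * SECONDS_PER_STEP for rate in rates_mm_per_s)


# Hypothetical rates for a single grid point and hour.
example_rates = [0.0, 0.001, 0.002, 0.0005]  # mm/s
print(f"Hourly QPE: {hourly_qpe_from_rates(example_rates):.2f} mm")
```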

The years 2019 and 2020 were selected since they are the only two years with complete March–November output from all datasets with constant versions. The primary limitation is version changes in HRRR, with the only two-year stints being July 2018–December 2020 (v3) and December 2020–recent (v4). The latter period was ruled out by an AIRS deep space manoeuvre in September 2021, which has led to an increase in erroneous thermodynamic retrievals that are not well understood at this time. All HRRR data over CONUS were selected and downloaded using the freely available Herbie Python package.

Trajectory enhancement to generate 3-D T and q fields

AIRS L2 data are treated as representing parcels distributed along the 3-D retrieval grid. Firstly, height above ground is calculated using the hypsometric equation with the product pressure levels (P) and retrieved T and q profiles as inputs. The parcel locations are then input into the Hybrid Single-Particle Lagrangian Integrated Trajectory (HYSPLIT) model with WRF27km winds for parcel motion. Parcels are moved forward with hourly resolution until 0200 UTC of the following day. The hours 21–02 UTC are reported as the forecast hours, with the first timestep being 0.5–3.5 h after AIRS overpasses. The large accepted range allows some days to capture two AIRS overpasses for more complete spatial coverage. At each hour, T and q of the parcels within a given 1° × 1° × 30 hPa latitude-longitude-pressure grid cell are averaged to generate 3-D fields.
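
The final averaging step, in which advected parcels are binned into 1° × 1° × 30 hPa cells, could look something like the sketch below. It is not the code used in this study: the function name `grid_parcels`, the alignment of cell edges to whole degrees and multiples of 30 hPa, and the three example parcels are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def grid_parcels(parcels, dlat=1.0, dlon=1.0, dp=30.0):
    """Average parcel T and q within latitude-longitude-pressure cells.

    parcels: iterable of (lat_deg, lon_deg, pressure_hpa, T_kelvin, q_kg_per_kg).
    Returns {cell_index: (mean_T, mean_q, parcel_count)}, where cell_index is
    the integer (lat, lon, pressure) bin of the cell's lower edge.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for lat, lon, pressure, temp, q in parcels:
        key = (math.floor(lat / dlat), math.floor(lon / dlon), math.floor(pressure / dp))
        cell = sums[key]
        cell[0] += temp
        cell[1] += q
        cell[2] += 1
    return {key: (t_sum / n, q_sum / n, n) for key, (t_sum, q_sum, n) in sums.items()}


# Three hypothetical parcels after advection; two fall in the same cell.
parcels = [
    (38.2, -98.7, 850.0, 290.0, 0.010),
    (38.9, -98.1, 845.0, 292.0, 0.012),
    (39.1, -98.5, 850.0, 288.0, 0.009),
]
for cell, (mean_t, mean_q, count) in sorted(grid_parcels(parcels).items()):
    print(cell, f"T = {mean_t:.1f} K, q = {mean_q:.4f} kg/kg, n = {count}")
```

Returning the parcel count alongside the means makes it straightforward to apply a minimum-parcel criterion before deriving convective indices from a profile.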

The horizontal resolution is sufficiently fine to capture features of interest, as shown by the improved skill in predicting heavy precipitation. It is also coarse enough that each grid cell is assigned enough air parcels to allow robust calculation of CAPE values and to prevent data gaps from appearing. For an example of results at finer, 0.5° resolution, see Fig. 4 of ref. 57.

Calculation of thermodynamic indices and selection of data for analysis

In each latitude-longitude grid cell, and for each hour, convective indices are calculated using the SHARPpy module97, for the most-unstable (MU), mean-mixed layer (MML) and surface (SFC) parcels. The calculated indices are CAPE, CIN, equilibrium level (EL), lifted condensation level (LCL) and level of free convection (LFC). Sensitivity tests showed that CAPE alone provided effectively all prediction skill, and among the variables MU_CAPE was most consistently predictive (Supplementary Fig. 2). Therefore MU_CAPE alone is used in the analysis.

Grid cells are included for a day if, for all time steps, there are (i) >20 AIRS-FCST parcels within the profile and (ii) MU_CAPE, MU_EL, MU_LCL and MU_CIN have valid values calculated by SHARPpy (including 0). In addition, the latitude-longitude range is limited to 32–53°N, 107–64°W and grid cells are excluded if their land fraction is below 50 %.

All properties, including those from ERA5, HRRR and MRMS, are matched in time and space to those from AIRS-FCST, and by requiring valid indices across all timesteps, the geographic sample is the same for all forecast hours and all panels of Fig. 3 contain consistent datasets. For Figures 1–3 and all reported statistics, only the forecast hours 21–02 UTC are included. For the AIRS CAPE values, the values calculated at overpass time are replicated for each of the 21–02 UTC forecast timesteps and the calculations proceed in the same way as other methods.

ERA5 CAPE is extracted from the ERA5 files rather than calculated from its profiles, and the ERA5 calculation method approximates the most-unstable parcel but does not exactly match standard calculations. Conclusions are not greatly affected since the ranking of CAPE values is similar when calculated from the profiles (see Fig. S2 of ref. 57). The HRRR hourly files by default provide surface CAPE, so our values for HRRR CAPE are the surface parcel values. Note that both ERA5 and HRRR calculate CAPE at the native horizontal resolution, and we average those values to 1° × 1°, whereas AIRS and AIRS-FCST T and q profiles are first spatially averaged and then CAPE is calculated.

Gini and significance calculations

For a selected threshold (e.g. QPE > QPE99.95, which is QPE > 5.1 mm h−1), entries are flagged as 1 or 0, where 1 is QPE > QPEX. In all cases, thresholds are based on all data, including all locations, all seasons, and both wet and dry hours. The flags are then sorted by the predictor (CAPE or QPE from another product), and the normalised cumulative distribution function (CDF) is calculated for 100 equally sized bins, each of which contains 1 % of the sample data.

For low rankings, predictors are zero, and these entries have a small random perturbation added, of order ±1 × 10−10, to allow unambiguous sorting. The overall results are unaffected by this, and the consequence is that the CDF shows an appropriate linear form at low values of predictor, for example at CAPE below the 65th percentile in the CDFs of Fig. 2.
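
The flag, sort and bin procedure described above can be sketched in a few lines. The normalisation used here, Gini = 1 − 2 × (area under the binned CDF), is an assumption chosen so that a uniform CDF gives zero and events concentrated at the highest predictor values give a score approaching one, in line with the description in the main text; the function name `gini_from_flags` and the synthetic inputs are hypothetical.

```python
import random


def gini_from_flags(predictor, flags, n_bins=100, seed=0):
    """Gini coefficient of event flags (0/1) ranked by a predictor.

    Entries are sorted by predictor, the normalised cumulative event count is
    evaluated at the edges of n_bins equally populated bins, and the Gini
    coefficient is taken as 1 - 2 * (trapezoidal area under that CDF).
    """
    rng = random.Random(seed)
    ranked = []
    for value, flag in zip(predictor, flags):
        if value == 0:
            # Tiny random perturbation so zero-valued predictors sort unambiguously.
            value = rng.uniform(-1e-10, 1e-10)
        ranked.append((value, flag))
    ranked.sort(key=lambda pair: pair[0])

    total_events = sum(flag for _, flag in ranked)
    if total_events == 0:
        raise ValueError("no events above threshold")

    n = len(ranked)
    cdf = [0.0]
    running = 0
    for b in range(1, n_bins + 1):
        lo, hi = (b - 1) * n // n_bins, b * n // n_bins
        running += sum(flag for _, flag in ranked[lo:hi])
        cdf.append(running / total_events)

    area = sum(0.5 * (cdf[i] + cdf[i + 1]) for i in range(n_bins)) / n_bins
    return 1.0 - 2.0 * area


# Synthetic checks: evenly spread events versus events only at the top ranks.
predictor = list(range(1000))
spread_flags = [1 if i % 10 == 0 else 0 for i in range(1000)]
top_flags = [1 if i >= 990 else 0 for i in range(1000)]
print(f"uninformative: {gini_from_flags(predictor, spread_flags):.3f}")
print(f"concentrated:  {gini_from_flags(predictor, top_flags):.3f}")
```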

For the Fig. 3 discussion, we commented on significant changes or differences in Gini coefficient. This was confirmed in two ways. Firstly, for an individual hour we use bootstrapping, by randomly sampling (with replacement) all values of the predictor from all six forecast hours down to a sample size equal to one hour. The Gini coefficient is then calculated for this new sample, the resampling procedure is repeated 500 times, and the standard deviation of the 500 Gini coefficients is assumed to be the 1σ standard error in the Gini coefficient. Individual hours were assumed to be independent, so the standard error in the difference between two hours is calculated from quadrature as \(\sqrt{2{\sigma }^{2}}=\sqrt{2}\sigma\). The p < 0.05 confidence level is then estimated as twice this value, or \(2\sqrt{2}\sigma\). For the hourly samples, this is approximately a difference of ±0.08 in Gini coefficient. Secondly, ordinary least squares was used to calculate the hourly trend in Gini coefficient within each product, and a trend exceeding twice the slope standard error was treated as significant at p < 0.05. Using either the bootstrapped-differencing or the trend approach, the Gini coefficients show significant improvements with forecast hour in AIRS-FCST, and significant declines in AIRS.
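
A minimal sketch of the bootstrapped-differencing test is given below. To keep it self-contained, the statistic being resampled is a simple stand-in (the event fraction) rather than the Gini coefficient, and the pooled sample, sample sizes and function names are hypothetical; only the 500 resamples and the \(2\sqrt{2}\sigma\) criterion follow the description above.

```python
import math
import random
import statistics


def bootstrap_standard_error(samples, statistic, sample_size, n_resamples=500, seed=0):
    """1-sigma standard error of `statistic` from resampling with replacement."""
    rng = random.Random(seed)
    estimates = [
        statistic(rng.choices(samples, k=sample_size)) for _ in range(n_resamples)
    ]
    return statistics.stdev(estimates)


def significant_difference(sigma):
    """Difference between two independent hours treated as significant at p < 0.05."""
    return 2.0 * math.sqrt(2.0) * sigma


# Stand-in pooled sample: 0/1 event flags from six hypothetical forecast hours.
pooled_flags = [1] * 30 + [0] * 5970
sigma = bootstrap_standard_error(pooled_flags, statistics.fmean, sample_size=1000)
print(f"sigma = {sigma:.5f}, significance threshold = {significant_difference(sigma):.5f}")
```

As a consistency check on the numbers quoted above, a threshold of about 0.08 corresponds to an hourly standard error of roughly 0.028, since 2√2 × 0.028 ≈ 0.079.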

Figure 4 generation

Figure 4 is only intended as an illustrative example. The “observed” time series is digitised from the event in Fig. 4a of a study of lightning initiation during storms over China51, where CAPE was estimated by regular profiling with a surface-based microwave radiometer. The “model” field in (a) is the observed field with random Gaussian white noise added (\(\sigma\) of ± 50 J kg−1). In (b) the “model” is assumed to initiate convection early, at the magenta cross point, and then the subsequent CAPE is artificially lowered. The proximity sounding is the true value at 06 time, and the AIRS-FCST values are illustrative.