## Introduction

Hailstorms cause severe damage to crops and property, which has resulted in substantial economic losses. Over the past decade, average annual insured losses due to hail are estimated at \$8–13 billion in the U.S., and losses have increased dramatically over the last few decades (S. Bowen). Understanding the factors that are causing this increase in hail losses is of growing interest to stakeholders who wish to better understand and mitigate their hail risk. While all hail can produce damage, depending on what is being impacted, we focus on large, “significantly severe” hail (≥5 cm), because this class of hail can produce extreme amounts of damage to commonly exposed assets.1,2

A critical component of assessing hail risk is spatially estimating large hail frequency (e.g., average annual number of days with large hail at a location), and there are several approaches for estimating this frequency. First, hail observations, collected through in situ reports3,4 or instrumentation,5,6,7 represent the most direct way to estimate large hail frequency. Storm Data (https://www.ncdc.noaa.gov/stormevents/) contains U.S. hail reports dating back to 1955, but using this database is challenging because of spurious trends and biases in hail reports due to non-meteorological factors.8,9 Second, doppler radar data may be used to derive hail size proxies,10,11,12 but these proxies can have difficulty distinguishing between small versus large hail.13,14 Third, reanalysis and sounding datasets may be used to analyze the frequency of environmental factors that support hailstorms15,16,17,18,19,20 and, more broadly, severe thunderstorm environments.21,22,23 These factors are necessary, not sufficient, for large hail and do not consider convective triggers and in-cloud processes that are important for large hail formation.24,25,26 Given the limitations of each method for estimating frequency, we seek to employ a combination of all three approaches to assess trends in large hail frequency, because each approach supplies independent information to check for consistency.

Are changes in large hail frequency partially responsible for the observed increase in hail losses? There is large regional and decadal variability in hail incidence, regardless of size, in 20th century surface station observations across the U.S.27 While there has been no trend in empirically-modeled national hail occurrences since 1979,15 and no increase in hail days since the early 1990s,9 these findings do not preclude regional changes, nor do they preclude changes in other hail metrics. For example, Schlie et al.28 used a radar-based hail area metric to show that there has been an increasing number of days with hail outbreaks (90th percentile of the hail area metric) in the U.S., despite a decrease in the number of radar-based hail days from 2000 to 2011. A decrease in hail days, along with an increase in hail kinetic energy due to a shift in the distribution to larger hail sizes, has been observed in hailpad networks in Europe,5,6 possibly due to increased melting level heights and greater atmospheric instability.29 Motivated by these findings, we seek to investigate changes in regional large hail frequency using multiple approaches, and analyze possible reasons for any potential changes.

## Results

### Trends in large hail environments and observations

The Large Hail Parameter (LHP) and the Significant Hail Parameter (SHIP; https://www.spc.noaa.gov/exper/mesoanalysis/help/help_sigh.html) are specifically designed to discriminate between environments capable of producing large hail and smaller hail.30 The LHP is a nonlinear combination of six variables: the most-unstable convective available potential energy (MUCAPE), the hail growth zone (–10 °C to –30 °C) thickness (THKHGZ), the 700–500-hPa lapse rate (LR7–5), the bulk wind difference between the surface and parcel equilibrium level (ShearEL), the wind direction difference between the equilibrium level and 3–6-km layer (GRWαEL), and the storm-relative wind difference between the 3–6-km and 0–1-km layers (SRWαMid). The SHIP is a nonlinear combination of MUCAPE, the water vapor mixing ratio of the most-unstable parcel, the LR7–5, the 500-hPa temperature, the bulk wind difference between 0 and 6 km, and the freezing level.

We use the North American Regional Reanalysis31 (NARR) from 1979 to 2017 to calculate the LHP and SHIP at each reanalysis grid point. At a given grid point, we define an LHP day to be any 24-h period, beginning at 1200 UTC, that has a maximum LHP that exceeds 5.8, the 25th percentile of the LHP distribution based on proximity model soundings for hail report sizes ≥5 cm.30 A SHIP day is defined in a similar manner, using a maximum SHIP that exceeds 1.0, which is used as a guideline by forecasters to indicate an environment favorable for large hail.

The thresholds used to define an LHP and SHIP day are based on a balance. A lower threshold results in the inclusion of environments supporting smaller (<5 cm) hail. A higher threshold excludes areas where large hail is uncommon, but where trends in large hail environments may still exist. The chosen thresholds in this study represent a balance to isolate environments supporting ≥5 cm hail, while including as large of an area as reasonably possible. The resulting spatial distribution of the annual mean number of LHP and SHIP days (Fig. 1) has a strong correspondence with annual large hail frequency.8,32

Figure 1a shows that there is a positive trend of annual LHP days over much of the central and eastern portions of the U.S., while there is a negative trend of annual LHP days in the immediate lee of the Rocky Mountains. The trend in annual SHIP days (Fig. 1b) is mostly consistent with the trend in annual LHP days, albeit trends are generally smaller, and statistically significant positive trends are concentrated within the Midwest region.

Trends in LHP days are confined to specific months, depending on the region (Fig. 2). Increases in LHP days in the south-central portions of the U.S. occur in March and April. In May and June, increases in LHP days occur farther northward over the north-central U.S. Small increases in LHP days occur in the Northeast in June and July. Decreases in LHP days in the immediate lee of the Rocky Mountains occur mainly in July and August. The pattern of monthly trends in SHIP days is very similar (Supplementary Fig. 1).

The overall pattern in LHP- and SHIP-day trends in Figs. 1 and 2 indicate there has been an expansion and eastward shift of environments conducive for large hail from 1979 to 2017. This result suggests there has been potential for an overall increase in risk of damages and losses from large hail, albeit the increase in risk would only be realized if large hail actually materializes more frequently.

To assess whether large hail has materialized more frequently, we compare annual LHP/SHIP-day area, which we will refer to as the “large hail environment area”, with annual large-hail-report- (LHR-)day area, which we will refer to as the “large hail report area”. Here, each area refers to the areal sum of grid boxes that satisfy a LHP, SHIP, or LHR day over a given year. Due to the sparse and inhomogeneous nature of LHRs, sums over regions help to alleviate the inherent noisiness in the report data. There is a significant, positive correlation between the annual large hail environment and report area (Fig. 3) when averaged nationally and over each region, except the Rocky Mountains region. The high covariance in interannual variability suggests that there is indeed greater annual large hail report area when there is greater annual large hail environment area east of the Rocky Mountains. Moreover, the trends in both areas mimic one another, especially in regions that have a higher population density (Midwest, Northeast, and Southeast). As a result, these data suggest the possibility that increases in the frequency of large hail reports may be partially attributed to increases in the frequency of LHP/SHIP days through the analysis period. One must keep in mind that increases in LHRs are also due to non-meteorological factors (e.g., changes in reporting practices, establishment of the Next-Generation Radar network, and increases in observers),8,9 although restricting the scope to large hail sizes and examining these data through areal sums of LHR days, and not raw counts, diminishes the effect of these non-meteorological factors.9

To obviate these non-meteorological factors, the maximum estimated size of hail (MESH),10 derived from 3D-gridded radar (GridRad)33 data from 1995 to 2016, is used to calculate the MESH-day area, using a MESH threshold of 6.4 cm (see “Methods” section), which we will refer to as the “large hail radar-derived area”. There is a significant, positive correlation between the annual large hail environment area and radar-derived area in the Midwest and Northeast regions, and there are non-significant, weak correlations in other regions (Fig. 3). There has been a general increase in the large hail radar-derived area nationally, dominated by increases in the Rocky Mountains and Central regions.

MESH data are too limited in time to definitively say whether trends in the large hail radar-derived area are consistent with the longer-term trends in large hail environment and report area. While there is a positive trend in the large hail radar-derived area nationally, it is important to keep in mind that the reliability and resolution of the radar network has increased with time. It is also important to keep in mind that having a MESH ≥6.4 cm cannot be equated to large hail actually reaching the ground, so the large hail radar-derived area is likely encompassing a wider range of hail sizes.13,14

There is a puzzling decrease in the large hail environment area in the Rocky Mountains region, which is opposite of the increases in large hail report area in this region. There are a few possibilities for this inconsistency. The first possibility is that this region has seen large population growth (P. Mackun and S. Wilson), and it could be that these population effects are dominating the increase in large hail report area. As a result, large hail reports have become more reliable since the early 2000s.9 The correlation between LHP-day area and large hail report area increases from 0.08 to 0.51 when only considering the data after 2002. Similarly, the correlation between SHIP-day area and large hail report area increases from 0.17 to 0.43. The second possibility is that the LHP and SHIP are not as well calibrated for high-plains and mountainous environments, where orographic effects are especially important during boreal summer.20 The third possibility is that there are biases in the NARR over the Rocky Mountains region.34,35 Evaluating these possibilities is beyond our research scope here, but it appears the statistics of large hailfall and hail environments in this region may require special consideration.

### Causes for LHP trends

Figure 4 shows trends in the LHP components in order to assess which variables are causing the trends in LHP days. We subjectively choose the LHP, because it has a wider diversity of kinematic and thermodynamic components compared to the SHIP, and the results that follow are very similar if using the SHIP. First, in order for a day to potentially have a nonzero LHP, which we call an eligible day, it must simultaneously meet minimum thresholds of instability (MUCAPE ≥ 400 J kg−1) and bulk wind difference between the surface and 6 km (Shear6 ≥ 14 m s−1). Figure 4a shows there has been a decrease in eligible days in portions of the Rocky Mountains region, and there has been an increase in portions of the Midwest and Northeast. The Northeast, which is typically characterized by more marginal environments for severe weather,36,37 has seen an increase in eligible days. Although not the focus here, we also note that there has been a significant positive trend in eligible days in southern Canada, which is consistent with an observed increase in severe hail frequency over Ontario.38

When considering the six variables that encompass the LHP, we calculate trends in days that exceed variable-specific, spatially-varying thresholds, controlling for trends in eligible days. Instability (MUCAPE) changes have contributed to a positive trend in LHP days in portions of the Midwest, Northeast, and Southeast regions (Fig. 4b). Hail growth zone thickness changes have contributed to modest increases in LHP days in the lee of the Rocky Mountains and in the vicinity of the U.S.–Canada border (Fig. 4c). Mid-level lapse rate changes have contributed to a decrease in LHP days over the Rocky Mountains region and a broad increase in LHP days over much of the eastern half of the U.S. (Fig. 4d), which represents the most robust signal among the LHP variables. Shear (ShearEL) changes have contributed to small increases in LHP days in parts of the Rocky Mountains, Midwest, and Northeast regions (Fig. 4e). Changes in layer wind differences associated with large hail production (GRWαEL and SRWαMid; Fig. 4f, g) have had limited contributions to trends in LHP days, with increases mainly focused in the Great Plains. The sum of the trends in Fig. 4a–g largely resembles the spatial pattern of trends in Fig. 1a, but are not directly comparable because LHP days are defined differently and contain nonlinearities between variables. Upon using the SHIP in lieu of the LHP, similar results are found with the instability and mid-level lapse rate contributions (Supplementary Fig. 2). The remaining SHIP variables contribute less or covary strongly with the instability and mid-level lapse rate.

The combined evidence suggests the risk of large hail has increased since 1979 due in part to environmental factors. In particular, there have been positive trends in annual LHP and SHIP days east of the Rocky Mountains, together with significant, positive correlations between annual large hail environment area and large hail report area; and significant, positive correlations between annual large hail environment area and large hail radar-derived area in the Midwest and Northeast. The increase in LHP days is primarily due to an increase in the number of days with steep lapse rates in mid-levels of the troposphere and, to a lesser extent, increases in the number of days with necessary combinations of instability and vertical wind shear for severe thunderstorms over the northeast U.S.

## Discussion

We compare our results with studies that have examined recent trends in severe weather environments and hail in the U.S. There is little indication that severe thunderstorm environments, often defined as the product of instability and vertical wind shear, have increased over the period examined (1979–2017),22,39,40 which might appear to disagree with the positive trend in environments favorable for large hail. A fairer comparison may be with extreme thunderstorm environments, defined by Sander et al.23 as the 99.99th percentile of the “thunderstorm severity potential”, which have increased in frequency over the last few decades and have been tied to an increase in economic losses. However, caution must be exercised in these comparisons, because large hail is correlated with variables other than just instability and vertical wind shear,41 such as those variables used in the LHP and SHIP. The positive trends in large hail environment, report, and radar-derived area are consistent with a positive trend in the number of hail outbreaks (greater hail area per event) over the U.S. from 2000 to 2011,28 and could be consistent with the shift in the hail distribution to larger sizes observed outside the U.S.5,6,29

Are the observed trends due to natural variability or part of a long-term signal associated with anthropogenic climate change? We cannot discount the possibility that the observed trends are part of natural, multidecadal variability, and 39 years of data is not enough to capture this possible variability. However, we also have reason to believe that there might be a connection with anthropogenic climate change. First, we hypothesize that the change of steep lapse rates in mid-levels of the troposphere is tied to changes in the seasonal evolution of the elevated mixed layer, which forms over source regions in Mexico and the western U.S., and is subsequently advected eastward.42 Stronger surface heating at higher elevations, tied to aridification and earlier snowmelt,43 could result in elevated mixed layers appearing earlier in the season and more frequently, increasing the probability of large hail events as severe thunderstorms form in these elevated-mixed-layer environments.44 Second, we hypothesize that there are more favorable background or synoptic-scale patterns for increased instability and vertical wind shear in the Northeast, the reasons for which warrant further investigation. The signal of increasing instability is also consistent with projections from (downscaled) climate model projections for this century.45,46,47,48,49,50 Whereas small hail might decrease with warming, the trends identified here are consistent with studies that have found that large hail, and the risk of hail damage, might increase in the future because of anthropogenic climate change.29,51,52

One must be mindful that the NARR, as will any reanalysis, have biases and errors that affect the LHP, SHIP, and the variables that comprise them. Previous studies have noted that the NARR has spurious grid-scale precipitation events34 and low-level moisture and instability errors,15,22,35 thus introducing biases and errors into the hail parameters. Nonetheless, the NARR adequately captures convective environments when compared to observed radiosonde data,35 and the NARR has smaller instability biases than other reanalyses.53 As reanalyses improve in representing the convective environment, trends in these hail parameters should be checked for consistency.

The LHP and SHIP are conditional parameters, reflecting only the possibility of large hail based on the environment in which severe thunderstorms might develop. An avenue of possible improvement would be to combine the hail parameters (or variables that make up these parameters) with other observations that reflect the existence of thunderstorms, such as radar, lightning, and satellite data.54 Doing so would decrease the high false alarm rate these purely environmental parameters have in detecting large hail, and may allow for a more refined assessment of trends in large hail and large hail environments.

## Methods

The LHP is computed, using three hourly NARR data, at each grid point from 1979 to 2017. The LHP is zero if Shear6 < 14 m s−1 or MUCAPE < 400 J kg−1. Otherwise, the LHP is defined as

$$\begin{array}{l}{\mathrm{LHP}} = \left( {\frac{{{\mathrm{MUCAPE}}\, - \,2000}}{{1000}} + \frac{{3200\, - \,{\mathrm{THK}}_{{\mathrm{HGZ}}}}}{{500}} + \frac{{{\mathrm{LR}}_{{\mathrm{7-5}}}\, - \,6.5}}{2}} \right)\\ \left( {\frac{{{\mathrm{Shear}}_{{\mathrm{EL}}}\, - \,25}}{5} + \frac{{{\mathrm{GRW}}\alpha _{{\mathrm{EL}}} + 5}}{{20}} + \frac{{{\mathrm{SRW}}\alpha _{{\mathrm{Mid}}}\, - \,80}}{{10}}} \right) + 5.\end{array}$$
(1)

Details about the step-by-step calculation of the LHP and definitions of the variables, and why they are relevant for large hail, may be found in Johnson and Sugden.30 The LHP was originally calibrated using Storm Data hail reports and Rapid Update Cycle soundings. The Sounding and Hodograph Analysis and Research Program in Python55 (SHARPpy) is used to extract the necessary variables from the NARR data to calculate the LHP. At each grid point, the daily maximum LHP is computed for each 1200–1200 UTC period. An LHP day occurs if the daily maximum LHP exceeds 5.8. The daily LHP-day area is defined to be the areal sum of grid boxes that satisfy an LHP day. Thereafter, both the LHP days, for each grid point, and LHP-day area, for each region in Fig. 3, is summed over each year.

SHIP days and SHIP-day area are computed using the NARR in a similar manner. The definition and instructions for calculating the SHIP may be found at https://www.spc.noaa.gov/exper/mesoanalysis/help/help_sigh.html. A SHIP day occurs if the daily maximum SHIP over a 1200–1200 UTC period exceeds 1.0.

Linear trends in annual LHP days, SHIP days, LHP-day area, and SHIP-day area are calculated using the Theil-Sen estimator.56 The Theil-Sen estimator is preferred over least-squares linear regression, because annual LHP days and SHIP days may not be normally distributed, especially in areas where large hail is uncommon. Statistical significance (i.e., testing for a nonzero trend) is determined at the 95% confidence level using Kendall’s tau test.

Hail reports are obtained from the National Oceanographic and Atmospheric Administration (NOAA) Storm Prediction Center Warning Coordination Meteorologist page (https://www.spc.noaa.gov/wcm/). In order to match the NARR period and hail size threshold, only LHRs occurring between 1979 and 2017, and hail report sizes ≥5 cm, are retained. For each NARR grid point, an LHR day occurs if an LHR occurs within 50 km of that grid point for each 1200–1200 UTC period, mimicking the LHP and SHIP day calculation. The 50-km neighborhoods reduces the sparse granularity of the hail reports. Annual LHR days and LHR-day area are both calculated in the same manner as those for the LHP and SHIP.

MESH data are generated from the GridRad dataset. GridRad consists of gridded 3D radar reflectivity at a horizontal resolution of 0.02°, a vertical resolution of 1 km, and a time resolution of 1 h from 1995 to 2016. The reflectivity is converted to the Severe Hail Index (SHI), following the algorithm from Witt et al.10. The algorithm requires estimates of the height of the 0 °C and –20 °C isotherms, which are estimated by interpolating the MERRA-2 reanalysis57 temperatures onto the GridRad grid and times. MESH is then computed from a revised empirical fit between the SHI and the 95th percentile of observed hail size, following Murillo and Homeyer.58 MESH days are then computed identically to LHR days, using a MESH threshold of 6.4 cm and a 50-km neighborhood. The MESH threshold was determined by maximizing the critical success index of a predictor metric and verifying metric. The predictor metric is whether or not the MESH exceeds a threshold within 50 km of a given grid point over a day. The verifying metric is whether or not there is a large hail report within 50 km of a given grid point over the same day. A threshold of 6.4 cm yielded the maximum critical success index (Supplementary Fig. 3). Note that this threshold is specific to this study. Annual MESH days and MESH-day area are both calculated in the same manner as those for the LHP and SHIP.

In order to compare the annual LHP-, SHIP-, LHR-, and MESH-day areas, it is convenient to normalize the variables. Each variable is normalized by subtracting its 1995–2016 mean and then dividing by its standard deviation over the same period. The Pearson (linear) correlation is computed between the normalized variables for their respective overlapping periods.

The contribution of each individual LHP variable to the trends in annual LHP days is assessed in a two-step process. First, at each grid point, an “eligible day” is defined to be a 1200–1200 UTC day that has one 3-hourly time step with MUCAPE ≥ 400 J kg−1 and Shear6 ≥ 14 m s−1, which is required in order for the LHP > 0. As with LHP days, we calculate the trend of annual eligible days using the Theil-Sen estimator. Second, for each LHP variable, we determine a subset of eligible days, which we call “component days”, above a spatially-varying threshold (except below a threshold for THKHGZ). The threshold is determined by calculating the 25th percentile of the variable for all LHP days at a grid point. This percentile is chosen to be consistent with the definition of an LHP day. The more frequently a variable exceeds its threshold (except below its threshold for THKHGZ), the more the variable contributes to increasing the number of LHP days. At each grid point, the annual component days are divided by the annual eligible days to calculate the relative frequency of component days. The relative frequency of component days is then renormalized by multiplying by the 1979–2017 mean number of annual eligible days. The trend for each renormalized component days is computed using the Theil-Sen estimator.

This two-step process controls for trends in annual eligible days, such that one can more fairly compare the components. For example, the annual component days of an LHP variable may appear to have a positive trend if the annual number of eligible days is increasing, simply due to more days having the prerequisite MUCAPE and Shear6, not because of any trend in the climatology of said LHP variable. After transforming to the renormalized component days, a significant, nonzero trend implies that changes in the component days cannot be explained by changes in the eligible days.