Due to their relatively low resolution, current general circulation models cannot simulate mesoscale convective organisation explicitly1. Yet, the recent observed positive trend in regional tropical rainfall was stated to predominantly stem from changes in the frequency of organised deep convection2. Similarly, in mid-latitudes, the majority of flood-producing rain was attributed to mesoscale convective systems (MCSs)3,4. In line with Moeng and LeMone5, we here take MCSs to be long-lived complexes of rain cells spanning ~100 km in diameter. Over continental regions, the clustering of convection in an MCS poses a risk to humans as severe storms can lead to intense downdraughts and flash flooding3,4,6,7. Pronounced long-lived clustering can be simulated in a process known as convective self-aggregation forming over the timescale of weeks8,9,10. There, the radiative–convective equilibrium (RCE) experimental set-up11,12 is usually employed, assuming spatially and temporally constant surface temperature (~300 K). In RCE self-aggregation, radiation feedbacks have emerged as the “smoking gun” for sustaining and increasing clustering8. Still, factors such as sea-surface temperature feedbacks13, domain size, geometry and resolution10, as well as periodically varying insolation14,15 or cold-pool effects16,17 were also stated to influence RCE self-aggregation.

Prescribing constant boundary conditions is an elegant model simplification, but it is not always realistic. Diurnal sea-surface temperature amplitudes can be as large as two18,19 to five20 Kelvin, especially under weak surface wind conditions and strong insolation. Diurnal surface temperature amplitudes of this magnitude change the properties of the atmosphere20, resulting in diurnally varying precipitation intensity and affecting large-scale phenomena, such as the Madden–Julian Oscillation (MJO) or the El Niño phenomenon. Satellite observations show a diurnal cycle in cloud height for the MJO21,22. Diurnally varying precipitation intensity also emerges from numerical simulations that account for differences in insolation23. Further, in contrast to RCE self-aggregation simulations, observed tropical cloud clusters typically are more transient, as they often persist for less than two days24.

Mechanistically, cold pools (CPs), that is, denser air formed by rain evaporation under convective clouds, were long implicated in the organisation of convection25,26, both by thermodynamic and mechanical effects27,28,29,30,31, and were suggested to lead to clustering32,33. In the following, when referring to MCS, we take this to encompass both the cluster of deep convective cells and the associated CPs, which are often long-lived. The continental diurnal cycle of deep convection is characterised by an afternoon or evening peak in rainfall31,34,35,36,37,38. Over the ocean, this daily period can further be modulated by bi–diurnal oscillations, a dynamics observed by Chen and Houze for local cloudiness in MCSs and referred to as “diurnal dancing”39.

Previous studies of convective self-organisation under diurnally varying surface temperature either did not address the emergence of MCSs or simulated a relatively small domain37. Current state-of-the-art regional climate models are now capable of simulating much larger areas (beyond 1000 km horizontally at kilometre resolution)40,41,42,43. Encompassing features, such as terrain variation and inhomogeneous external forcing, adds unique regional insight but makes mechanistic analysis of self-organised clustering more cumbersome.

Through an idealised set-up, we analyse the multi-day organisation in the convective rain field over land and sea. In our set-up, surface heating is spatially homogeneous but oscillates diurnally. Our focus is on the spontaneous emergence of clustering. We show that a transition from a homogeneous to a strongly clustered state occurs spontaneously when the amplitude of the diurnal surface temperature oscillation is sufficiently large. We refer to the emergent convective clusters as MCSs because of their typical scale of ~100 km44 and term the clustering process diurnal self-aggregation due to the lack of imposed spatial scales. This clustering oscillates bi-diurnally and continues to strengthen from day to day, features we back by a conceptual model. Finally, we discuss the relevance for the emergence of extreme continental rainfall and the implications for organised convection over the tropical ocean.


We carry out a suite of numerical experiments on square mesoscale model domains (L × L) of up to L = 960 km horizontal length and horizontal grid resolutions of 1 km and finer. Diurnally oscillating, spatially homogeneous surface temperature, Ts(t), is prescribed as

$${T}_{s}(t)=\overline{{T}_{s}}-{T}_{a}\cos (2\pi \ t/{t}_{0}),$$

where \(\overline{{T}_{s}}=298\ {\mathrm{K}}\), t0 ≡ 24 h is the duration of the simulated model day, and \(\overline{{T}_{s}}\) and Ta represent the temporal average and amplitude of Ts(t), respectively. The diurnal insolation cycle is chosen to be typical for the equator and peaks at noon (just like the surface temperature Ts). We provide sensitivity experiments exploring resolution, domain size, rain evaporation, surface conditions and insolation (see details in “Methods”). We contrast simulations with different diurnal surface temperature amplitudes, Ta, but equal average surface temperature, and refer to experiments with Ta = 2 K, 3.5 K and 5 K as A2, A3.5 and A5, respectively. Modifiers, such as A5b or A5sea (see Table 1), label sensitivity studies. Unless explicitly stated, our key results regarding clustering occurring under varying values of Ta, are qualitatively robust under the sensitivity experiments. Each numerical simulation is run for several days, allowing for a spin-up and quasi-steady-state period (Table 1; Supplementary Fig. 1).

Table 1 Summary of numerical experiments.

Domain-mean time series

Unsurprisingly, the differences in diurnal surface temperature amplitudes are reflected in larger diurnal amplitudes of atmospheric near-surface temperatures (Fig. 1a; Supplementary Fig. 1a, b). Changes in the time series of domain-mean rain rate are more profound (Fig. 1b): whereas A5 yields a relatively sharp mid-afternoon single-peak structure, the curve transitions to a broader and double-peaked structure for A2, where it approximates the diurnal cycle typical of oceanic convection45. For Ta = 0, only a weak nocturnal maximum remains (Supplementary Fig. 2f, compare: Janowiak et al.)46. Again, the differences in the temporal mean (horizontal lines) are minimal, reflecting radiation constraints on rainfall47. Real surface temperatures normally peak at a delay relative to local solar noon. Such delays accordingly shift the onset of organisation; hence, radiative effects are not key in this timing (Supplementary Fig. 2e). Curves that vary in a similar manner to the rain rate are found for rain area fraction, which, by contrast, differs from those of rain rate immediately before the central peak (Fig. 1c). These differences are made more transparent when inspecting rain rates conditional on a threshold (Fig. 1d). Both mean and heavy precipitation show a pronounced evening peak for A5 and an early-morning or nocturnal peak for A2. In summary, whereas time averages of rain rate are nearly identical for numerical experiments with varying diurnal surface temperature amplitudes, the time series differ markedly.

Fig. 1: Transition to a clustered rainfall state.
figure 1

ad Diurnal cycles of domain-averaged quantities. Each quantity was horizontally averaged for the lowest model level (z = 50 m). Time series represent a compound diurnal cycle, where equal times of day were averaged over all available model days. a Temperature for simulations with different imposed diurnal surface temperature amplitudes Ta, as labelled in the legend. Horizontal lines of corresponding colours represent the time average of each simulation. b Analogous to a, but for rain intensity. c Analogous to a, but for rain area fraction. d Mean, 90th and 99th percentiles of event rain intensity (conditional on I > I0 = 0.5 mm h−1) for A5a and A2a, as labelled (see Table 2 for experiment label details and “Methods” for the definition of rain events). e Surface rainfall average during day 1 (spin-up), day 4 and day 5 for A2a. f Similar to e, but for A5a. Boxes of side length 200 km and scale bars of length 200 km highlight the spatial and temporal variation (see Supplementary Fig. 3).

Quantifying clustering

Do the differences in domain-mean time series hint at differences in spatial organisation? Consider the horizontal patterns formed by surface rainfall, visualised by temporally averaging the rainfall pattern during each model day (Fig. 1e, f). During the spin-up from the initial condition (day 1), both A2 and A5 show modest and relatively homogeneous convective activity throughout the domain. During the subsequent model days, convection intensifies for both simulations because near-surface temperatures gradually increase. In A2, the spatial pattern of events remains rather homogeneous (Fig. 1e). In contrast, for A5, an inhomogeneous pattern self-organises, with several subregions receiving pronounced average rainfall, whereas others are all but dry (Fig. 1f, days 4 and 5). Furthermore, for A5, temporal alternations in surface rainfall rate are apparent when comparing one day to the next (compare: Fig. 1f, day 4 vs 5): a cluster on 1 day leaves an almost rain-free area the next day.

To quantify such spatiotemporal inhomogeneities, we determine all surface rain event tracks and compute their centre-of-mass positions. Tracks are thereby defined as spatially and temporally contiguous rainy grid boxes (see “Methods”). We then break the horizontal domain area down into square boxes of side length l, yielding n(l) ≡ (L/l)2 such boxes, and determine the number of tracks located in each of the boxes. The probability pl of a track occurring within one of the boxes at random would be pl = n(l)−1 and the binomial

$${P}_{l}(m)\equiv \frac{N!}{(N-m)!\ m!}{p}_{l}^{m}{\left(1-{p}_{l}\right)}^{N-m}$$

hence describes the probability of m of N randomly distributed tracks lying in one of these boxes during the model day. The variance of counts m, given a side length l, denoted as Varran(l; N), is48

$${\rm{V}}{{\rm{ar}}}_{ran}(l;N)=N\ {p}_{l}(1-{p}_{l}),$$

which we compare with the variance of the empirical data

$${\rm{V}}{{\rm{ar}}}_{emp}(l;N)=\mathop{\sum }\limits_{i=1}^{n(l)}{({m}_{i}-\langle m\rangle )}^{2},$$

where 〈m〉 ≡ N/n(l) is the average number of tracks per box, and the sum runs over all boxes i. The comparison of the two quantities in Eqs. (3) and (4) hence measures how much variation is obtained empirically when contrasted against completely uncorrelated data. This comparison can be expressed in a clustering coefficient, which we define as \({\mathcal{C}}(l)\equiv {\rm{V}}{{\rm{ar}}}_{emp}(l;N)/{\rm{V}}{{\rm{ar}}}_{ran}(l;N)\). \({\mathcal{C}}(l)\) is below unity when tracks are regularly spaced and above unity when tracks are clustered. The results show that, for small diurnal surface temperature amplitudes (experiment A2a, Fig. 2a) \({\mathcal{C}}(l)\,<\,1\) for all box sizes l; hence, the spacing of tracks is generally more regular than expected at random. For larger diurnal surface temperature amplitudes (A3.5 and A5, Fig. 2b, c) regular spacing with \({\mathcal{C}}(l)\,<\,1\) is found only for relatively small box sizes of l ≈ 20 km, whereas spacing at larger box sizes is strongly clustered, that is, \({\mathcal{C}}(l)\gg 1\). Moreover, this clustering increases from day to day (Fig. 2d).

Fig. 2: Quantifying the transition to clustering.
figure 2

a Clustering coefficient \({\mathcal{C}}(l)\) at different box sizes l for A2a. Curves range from day one (red) to seven (green). The pattern of rain events is regular, as \({\mathcal{C}}(l)\,<\,1\) throughout. Note the double-logarithmic axis scaling. b Analogous to a, but for A3.5. c Analogous to a, but for A5a. In b and c, at small scales (l ~10 km) or early times (t < 2 d), rain events are regularly distributed (\({\mathcal{C}}(l)\,<\,1\)), whereas at larger scales (l ~ 180 km) and later times (t ≥ 3d) events are clustered (\({\mathcal{C}}(l)\,>\,1\)). d Maximum of \({\mathcal{C}}(l)\) versus time for different simulations (see: legend and Table 1). Note the general increase for large Ta (A3.5, A5: upward-pointing triangles) but flat behaviour for small Ta (A2: downward-pointing triangles). e Scales of clustering, i.e., the position l at which the maxima in \({\mathcal{C}}(l)\) occur in A3.5 and A5. The grey shaded area marks the standard error of lmax, averaged over all times t ≥ 3d. f Autocorrelation c(τ) for τ = 1 and τ = 2 shows increasing c(τ) with scale l for both A2 and A5. Several points for A2 are not shown due to lack of statistical significance at the 1% confidence level. Each data point represents an average over all possible correlations for the experiment at hand: that is, for c(2), the pairs (1, 3), (2, 4), etc. were used.

We also define a box size lmax, at which \({\mathcal{C}}(l)\) is maximal. Despite some variation, lmax ≈ 180 km can be identified from A5 (Fig. 2c, e). In addition, we measure the autocorrelation c(τ) between day d and day d + τ by computing the Pearson correlation coefficient of daily mean precipitation rates from all grid boxes (Fig. 2f). We find rainfall to be anticorrelated from one day to the next for all box sizes (c(1) < 0), suggesting a local inhibitory effect of rain and positively correlated two days into the future (c(2) > 0). The magnitudes of c(1) and c(2) both increase with box size l, but appear to level off near l = 200 km (Fig. 2f).

Clustering as a result of cell-density difference

We now explore the mechanistic origin of the clustering. Differences in the areal number density, that is, the count of rain cells per unit area, between A2 and A5, are evident from the rainfall diurnal cycle (Fig. 1b, c). The peak in rain intensity and fractional area covered by rain cells is nearly twice as high for A5 compared with A2. Given that the area, intensity and duration of each individual rain cell are all similar for A2 and A5 (Supplementary Table 1), roughly twice the number of rain events will coincide during the peak of rainfall for A5 compared with the peak of A2. As mentioned, CPs are known to mediate interactions between convective rain cells32,49. We expect interactions to become more relevant at increased rain cell density because CPs will more often collide as they spread along the surface. To quantify density effects on the clustering dynamics in A2 versus A5, we first perform a simple CP tracking. The method identifies spatially and temporally contiguous patches of low buoyancy, measured by a 1 K temperature depression (for detail, see “Methods”). In A2, CP areas typically do not exceed 500 km2, have modest temperature depressions and lifetimes of generally less than 2 h (Fig. 3a–c). In A5, where CPs only occur during parts of the day, CP areas often exceed 103 km2, and such large CPs have much stronger temperature depressions and substantially longer lifetimes (Fig. 3d–f). The formation of very large and strongly negatively buoyant CPs suggests that CPs from distinct rain cells often bunch together before each CP has fully expanded. We employ the CP tracking to detect merging events, that is, instances where two previously separate CP areas combine. For A5b, CP merging is indeed ubiquitous, whereas it is all but absent in A2b (Supplementary Fig. 4c).

Fig. 3: Cold pool merging and deepening.
figure 3

a CP occurrence time, maximum area and average temperature depression (colours from red to blue, see legend) on days 1, 2 and 4 of the experiment A2b. The black curve indicates the total CP area at each time, whereas the thick coloured curve (compare legend for colours) highlights the time series of the largest CP during the particular day. b Areas covered by CPs during day 4 of A2b, the colours (see colour bar) indicate the duration during which CPs were present (compare: Supplementary Fig. 5). The scale bar is 100 km long. c CP area versus the corresponding maximum of areal mean temperature depression (day 4). Symbol sizes indicate CP lifetime and colours indicate occurrence time within the model day (see legends). Black and red rectangles along the axes indicate the median and 90th percentile of the corresponding quantity. Note the logarithmic vertical axis scale in a and c. df Analogous to ac, but for A5b.

In addition, we compute the CP height, measured by detecting temperature anomalies in the vertical dimension. In A5b, when CPs first appear, they show heights comparable to those of A2b (Supplementary Figs. 4a, b and 5). Subsequently, a double peak forms, where groups of merged CPs obtain much larger heights, reaching close to the level of free convection (≈1100 m). Larger CP heights h and deeper temperature depressions \(T^{\prime}\) are consistent with larger mean surface wind speed \({v}_{cp} \sim {(hT^{\prime} )}^{-1/2}\) under CPs50 (compare: panels in Supplementary Fig. 5) and surface fluxes (Supplementary Fig. 1e–h), resonating with markedly enhanced near-surface moisture and convective available potential energy (CAPE) along CP boundaries in A5 (compare: Supplementary Fig. 6e, j).

We aim to formulate a simplified model that incorporates the observed number-density differences. To this end, consider first a detailed comparison of CPs formed in A2 and A5: in A2, CPs are spatially isolated from one another, and the area covered by each CP remains small (Fig. 4a). In A5, many CPs occur so close to each other, that their temperature anomalies inevitably merge (Fig. 4a; Supplementary Fig. 4c), forming a large patch of dense air. This combined CP, which we associate with the emergent MCS, reaches a greater height and develops a relatively cold and dry region in its interiour where convection is suppressed. This inhibitory effect in the MCS interiour can be ascribed to the divergence of the level of free convection (LFC) (Fig. 4b) and a depletion of CAPE (Supplementary Fig. 6e, j).

Fig. 4: A simplified model for convective clustering.
figure 4

a Examples of typical CPs on day 4 of A2b (15:35) and A5b (14:00) showing virtual potential temperature anomaly at z = 50 m (compare: boxes in Fig. 3b, e for context). Thin yellow lines show accumulated surface rainfall (1-, 5- and 10-mm contours) within the subsequent hour. A scale bar and an area Acrit are marked (compare: panel c). CP areas (bold black contours of −1 K anomalies) exceeding Acrit are rare in A2b, but frequent in A5b. The combined CPs in A5b can excite subsequent rainfall and feed the emergent MCS. b Respective xz cross-sections along the white horizontal lines in a including the lifting condensation level (LCL) and level of free convection (LFC), and corresponding domain means (dashed). Marked regions correspond to lowered triggering probability (compare: panel c). c Schematic for the simplified model dynamics. (i) Low-density subdomain with vacant (white) and active sites (blue). (ii) Similar to (i) but for high density, showing also boundary sites (orange) with higher probability. d C(l) for ta = 0.5 K, analogous to Fig. 2a–c but now for the simplified model (days as marked). Note the double-logarithmic axis scaling. e, Analogous to d, but for smaller ta. Inset: diurnal cycle for rain area for the simplified model for two values of ta (compare: Fig. 1b–d). f Simple checkerboard model, explaining the increase in variance over time for large ta (red symbols) and small ta (blue symbols). The grids below show the patterns at days 1–4, 10 for large and small ta, respectively. Box colours white → faint red → red → dark red indicate increasing density.

In contrast, as the combined CP spreads outward from the dense region of rain cells, new cells are often triggered at its front—further feeding the MCS. At this combined CP’s gust front, hence at the periphery of the emergent MCS, the greater CP height allows boundary-layer environmental air to be forced higher up and set off new rain cells. In addition, the CP’s surroundings benefit from the moisture transported by the combined CP’s front and the additional latent heat flux and increased CAPE along the combined CP’s edges (compare Supplementary Fig. 6e, j). Indeed, many subsequent rain cells do form near the perimeter of the combined CP, whereas this is rare for A2 (thin black contours in Fig. 4a). To quantify this moisture redistribution from within the MCS to its surroundings, we contrast domain subregions of A5, which receive intense versus weak precipitation during a given model day (Supplementary Fig. 6f–h). Regions of intense rainfall are characterised by enhanced moisture near the cloud base (z ~1 km) before precipitation onset, but marked depletion after rain has occurred. Conversely, areas of weak rainfall show nearly a “mirror image”, with depressed moisture before but enhanced values after precipitation. The bi–diurnal dynamics for A5 can hence be characterised as an alternation of cloud-base moisture, driven by the lateral expansion of MCSs, which leaves behind an inhibitory dry patch in their interior that is marked by low CAPE (Supplementary Fig. 6i, j). On the subsequent day, this drying in locations of previous MCSs leads to relatively enhanced near-surface and cloud-base moisture in those regions not affected by these previous MCSs—in turn providing favourable conditions for convective activity there (for detail, see Supplementary Information Text and Supplementary Fig. 6). We further find a positive correlation between cloud-base moisture and precipitation, which could help promote the transition from shallow to deep convection29,51 and potentially reinforce the updraughts. In A2, day-to-day moisture alternations are lacking (see Supplementary Fig. 6g, h for A5b to Supplementary Fig. 6b, c for A2b), a finding that falls in line with the absence of organised convection. As with rainfall, the spatial pattern of cloud-base moisture again reflects a coarsening of scales for A5 relative to A2 (Supplementary Fig. 6d, i).

In total, the analysis suggests that an MCS emerges, when several CPs combine, become deeper and excite new rain cells along their common gust front. The emergence of an MCS on one day causes a suppressed region the subsequent day. As a further check, we weaken CPs by decreasing the ventilation coefficients (av and bv in Eq. (24) of ref. 52) to a fraction of their default values. These coefficients control rain evaporation, hence temperature depression \(T^{\prime}\) and CP propagation (\(\sim {T}^{^{\prime} 1/2}\)) (for details, see “Methods”). Indeed, decreasing these coefficients systematically reduces the spatial extent of the rain-free regions (Supplementary Fig. 3).

Clustering from a two-level atmosphere model

A key characteristic of A5 is that parts of the day see no rainfall at all. In contrast, immediately after the onset of rain (near mid-day, Fig. 1b), the area covered by precipitation is relatively large—corresponding to a high number density of rain events and CPs. Our simplified model describes the system domain as a square lattice of sites, where each site can reside in two different states. For simplicity, we consider each pair of a rain cell and its CP as one entity. We let this entity occupy one lattice site, termed an “active site”, and say that it fills the area of a rain cell a0 ≈25 km2 (Supplementary Table 1). Lattice sites not active are considered “vacant”. In practice, our model mimics the qualitative clustering dynamics by allowing MCSs to emerge, whenever a sufficient number of active sites occur close to each other.

Assume that, at a given time, the fraction of active sites is p0, and sites are independently populated. That is, each site of a square lattice contains a rain event at probability p0. To exemplify, in Fig. 4c, i, p0 is relatively small, and few sites are active. In Fig. 4c, ii, p0 is larger, more sites are therefore active and many of them are now neighbours. To incorporate the effect of an MCS, we simply let the remaining vacant sites be more likely to become active when they are immediate neighbours to a sufficiently large area A > Acrit of spatially contiguous active sites (compare: Fig. 4a). This is accomplished by assigning increased probabilities to the neighbourhood sites (Fig. 4c, ii, orange sites). When p0 is small (p0 1), the system will, however, be unlikely to contain contiguous rain areas exceeding Acrit (compare: shaded box in Fig. 4a). To be explicit: the probability of finding two active sites on two neighbouring sites is proportional to \({p}_{0}^{2}\), and this probability will decay exponentially with the number of sites contained in the contiguous area53.

But how does p0 emerge from the diurnal cycle dynamics, and how can bi–diurnal temporal correlations be captured (Fig. 2)? To self-consistently incorporate these features, we describe the population of rain cells within a two-layer atmosphere model consisting of a prescribed boundary- layer temperature Tbl, varying sinusoidally with a small diurnal surface temperature amplitude ta (see Fig. 1a), and an interactive free-tropospheric temperature Tft. We compute the probability for a rain cell to become active by coupling it to the temperature difference between the boundary layer and the free troposphere ΔT = Tbl − Tft, as an approximation for atmospheric stability. When Tbl increases during the model day, rain events will eventually be set off. Two key processes impact on the free-tropospheric temperature: thermal radiation to space reduces Tft, whereas latent heat transfer by rain events increases it. The boundary-layer temperature (Tbl) is not directly affected by rain cells. However, buoyancy depressions, arising mainly from sustained drying after rain events, are implemented by an inhibitory potential. In practice, once an active site transitions back to vacant, it needs to “wait” until it can become active again (for details, see “Methods”).

We implement reasonable coefficients for these processes and find the simulations to reach a repetitive diurnal cycle (Fig. 4e, inset). Indeed, for small ta, modest rainfall is present during the entire day, whereas for larger ta, rainfall is either intense or absent. Time-averaged rain areas for large and small ta match (compare: Fig. 1b), a result of the radiative constraint and in agreement with the numerical experiments. The earlier phase in Fig. 4e, inset versus Fig. 1b, can be attributed to the fact that we directly prescribed atmospheric boundary-layer temperature Tbl, rather than surface temperature itself, causing an immediate precipitation response.

Considering the variance of the spatial pattern for large ta, the simplified model indeed produces increased clustering over time and larger spatial scales (Fig. 4d, colours from red to green, and Supplementary Fig. 7). Conversely, clustering is absent for small ta (Fig. 4e). This can be explained intuitively: when the thermal forcing caused by the boundary-layer temperature Tbl increases rapidly during the day, many rain events will be set off during a short time—leading to large p0 during those times. The negative feedback on the free-tropospheric temperature Tft will then rapidly cause the “budget” of rainfall to be used up. MCS will form as long as the increased probability at the edges of the contiguous CP patches counteracts the ongoing increase of Tft. Hence, MCSs will be able to spread, as long as this is the case, thus setting the time (≈6 h) and space (≈100 km) scale for MCS, which is significantly larger than the scale of a single rain event (≈1 h and ≈5 km).

Conceptualising further

The two scales of individual rain cells (~5 km) and MCS (~100 km) allow for a further simplified conceptual view, which builds on domain total moisture being conserved from day-to-day. In this view, the occurrence of rain cells is directly proportional to the available moisture and rain cells act to redistribute moisture, once they form an MCS. Take the model domain to be broken down into a square lattice consisting of blocks. Each “block” consists of 20 × 20 sites for single rain cells of area a0, together yielding a block size of 100 km × 100 km, which provides space for an entire MCS. In each block, an MCS is only set off if a sufficient number of rain cells are present. We find that a simple set of rules can model the MCS dynamics: (1) first, initialise each block in this square lattice, by assigning an integer, encoding the number of rain cells, drawn from a binomial distribution; (2) update the system once per day, by letting all blocks above a particular threshold (an MCS forms) hand over their content (the moisture transported by the MCS) at equal parts to their four neighbouring blocks; after this distribution step, all sites collect the spilt content, hence conserving total moisture.

When the mean of the binomial distribution is low compared with the threshold, which is the case for small ta, no MCS will form and no redistribution will take place (Fig. 4f, blue points, and lower row of squares). In contrast, when ta is sufficiently large, a long sequence of reallocations will occur, leading to a checkerboard-like clustering, which strengthens in time (red curve and upper row of squares). This example is sufficient to capture the increase of normalised variance for A5 and the lack of it for A2 (Fig. 4f).


Our study analyses the spontaneous emergence of convective clustering, when the amplitude of the diurnal cycle surface temperature is varied. This “diurnal self-aggregation” occurs without imposing spatial scales by external forcing, as with radiative–convective equilibrium (RCE) self-aggregation8,9,10. Clearly, the two cases differ regarding the surface boundary conditions. However, also the spatial and temporal scales of the clustering, and in particular the physical processes causing it, differ. In RCE self-aggregation, clusters are temporally coherent over multiple days, whereas in diurnal self-aggregation, the spatial patterns of convection are almost opposite from one day to the next. The findings here suggest that, when the diurnal surface temperature variation is sufficiently large, the clustering, and thereby the potential for flash floods, increases from day-to-day. Increasing model horizontal resolution to 0.5 or 0.2 km (A5c and A5d in Supplementary Fig. 3) leads to even stronger clustering effects, in agreement with recent findings for continental surface conditions42. It should be tested if higher resolution leads to stronger cold-pool interaction, hence more activation in areas with dense rain cell occurrence. The current findings suggest that observational studies on extremes54,55,56 should additionally focus on diurnal surface temperature amplitude, not just the mean.

In RCE self-aggregation, the standard explanation for clustering invokes arguments based on differential cooling and circulation changes induced by this cooling. Remarkably, the memory enabling clustering in diurnal self-aggregation is purely thermodynamically driven: we do not detect any sustained changes of circulation that harbour a memory from one day to the next (Supplementary Fig. 1i, j). The findings hence suggest two possible “shortcuts” to observed mesoscale clustering over the ocean: (i) at times, sea-surface temperature oscillations may be sufficient to kick-start clustering; (ii) clustering may emerge over land surfaces, where a strong diurnal cycle prevails, and may then be advected over the sea, where it could grow further under more RCE-like conditions. Practically, it could be checked through simulations, whether a state of persistent, spatially and temporally coherent self-aggregation is reached more quickly when sea-surface temperatures transiently oscillate—thereby exciting initial clustering on the scale of lmax. Both “shortcuts” require that clustering can persist or grow once formed—a requirement consistent with hysteresis effects57. In our conceptual model, hysteresis is, in fact, easy to achieve. Once an imbalance between neighbouring grid cells is established, sustained re-distributions will be possible, even when the overall event number density is then lowered. It should further be explored, if the day-to-day variation we find can be amplified by an interactive surface, such as by “cloud shading”8,13.

The specific feature of the MCS, formed by higher cell and thus cold-pool density at larger diurnal surface temperature amplitude, is to trigger new convective cells at its periphery. These new cells form cold pools, feeding the emergent MCS and further forcing updraughts near its boundary. Notably, new convective cells are also formed within the interior of the MCS (Fig. 4a), a finding in line with collision effects of multiple cold-pool gust fronts17,32. These interior cells could further act to deepen and cool the combined cold pool driving the MCS expansion. Together, MCSs hence act to excite new convection both within and around the combined cold-pool area. In conventional self-aggregation studies, strong cold pools were found to inhibit clustering16. However, under the diurnal cycle conditions studied here, they combine to promote the formation of MCSs due to the high numbers of cold pools occurring in spatially confined subregions within a relatively short time.

Altogether, the present work interpolates between the established RCE set-up for oceanic convection and typical boundary conditions for continental convection. Over the ocean, the origin of initial clustering could be found in modest sea-surface temperature fluctuations—in line with those observed18,19,20. Over continents, fluctuating surface temperatures are well-known to imprint pronounced diurnal rainfall variations31,34,35,36,37,38, and observed MCS rainfall was reported to increase2,58, elevating the risk of flash floods. In summary, the present results suggest that explanations for tropical and summertime mid-latitude flash floods might also be found in temperature variations.


Large-eddy model, boundary and initial conditions

We simulate the convective atmosphere using the University of California, Los Angeles (UCLA) Large Eddy Simulator (LES) with sub-grid-scale turbulence parameterised after Smagorinsky59. This is combined with a delta four-stream radiation scheme60 and a two-moment cloud microphysics scheme61. Radiation interacts with the atmosphere including clouds, but it does not impact the surface temperature and hence surface fluxes. The diurnal insolation cycle is chosen typical for the equator (Supplementary Fig. 2d). Rain evaporation is implemented after Seifert and Beheng52. Diurnally oscillating, spatially homogeneous, surface temperature, Ts(t), is prescribed, as

$${T}_{s}(t)=\overline{{T}_{s}}-{T}_{a}\cos (2\pi \ t/{t}_{0}),$$

where \(\overline{{T}_{s}}=298\ {\mathrm{K}}\), t0 = 24 h is the duration of the simulated model day, \(\overline{{T}_{s}}\) represents the temporal average and Ta the amplitude of Ts(t). Surface heat fluxes are computed interactively and depend on the vertical temperature and humidity gradients as well as horizontal wind speed. Horizontal wind speed is approximated using the Monin–Obukhov similarity theory, which applies scaling arguments to assume logarithmically increasing horizontal wind speed with height62. Temperature and humidity were initialised using observed profiles that potentially represent convective conditions31. However, due to the repeated diurnal cycle forcing, the system eventually establishes a self-consistent vertical temperature and moisture profile.

Model grid, dynamics and output

The model integrates the anelastic equations of motion on a regular horizontal domain with varying horizontal grid spacing dx and periodic boundary conditions (Table 1). The vertical model resolution is 100 m below 1 km, stretches to 200 m near 6 km and reaches 400 m in the upper layers, with the model top located at 16.5 km. A sponge layer is employed between 12.3 km and the model top. Horizontal resolution dx and domain size vary (Table 1). The Coriolis force and the mean wind were set to zero with weak random initial perturbations added as noise to break complete spatial symmetry. For all two- and three-dimensional model variables, the output timestep varies between experiments between Δtout = 5 min and 15 min. At each output timestep, instantaneous surface precipitation intensity, as well as the three-dimensional moisture and velocity fields, is recorded for the entire model domain. In addition, at 30-s and 5-min intervals, respectively, spatially as well as horizontally averaged time series were extracted from the numerical experiments.

Sensitivity experiments

The principal focus of this study is the response to different values of diurnal surface temperature amplitude, Ta. The main experiments (A5a, A2a, A5b and A2b) were carried out at Ts = 298 K, contrasted Ta = 2 K and Ta = 5 K (Table 1) and used dx = 1 km horizontal model resolution. One intermediate value of Ta = 3.5 K was tested to constrain the transition to clustering further. The control case of Ta = 0 K was also tested as a benchmark. These simulations assumed that surface latent heat fluxes occurred at 70% of their potential value. This mimics a land surface, which is reasonable given the mean surface latent and sensible heat fluxes of LHF ≈ 57 W/m2 and SHF ≈18 W/m2, respectively, yielding a Bowen ratio of B ≈ 0.30, realistic for forested land. These main experiments already explore domain-size effects, as the clustering observed could be influenced by the finite system size. We find that the results vary little between these domain sizes. We conducted additional experiments to test the sensitivity to a range of modifications: horizontal model resolution (A5c, A5d, Supplementary Fig. 3), where dx = 0.5 km and dx = 0.2 km were used; cold-pool strength (A5vent05, A5vent001, Supplementary Fig. 3), where the ventilation coefficients av and bv in Eq. (24) of Seifert and Beheng52 were respectively reduced to 0.5 and 0.01 of their default values; surface conditions, where the surface evaporation was that of a sea surface (Supplementary Fig. 2a–c, as well as A5sea and A2sea in Supplementary Fig. 3, here B ≈ 0.15); the phase of the surface temperature diurnal cycle, described by a term τ, when generalising Eq. (5) to

$${T}_{s}(t)=\overline{{T}_{s}}-{T}_{a}\cos (2\pi \ (t-\tau )/{t}_{0})$$

relative to that of the diurnal insolation cycle. τ = 3.6 h and τ = 6 h (A5ph3.6h and A5ph6h, respectively) were applied, accounting for a possible lag of the surface temperature peak towards the mid or late afternoon, as well as a constant solar radiative forcing (A0constrad and A5constrad). The dominant effect of nonzero τ is to shift the diurnal precipitation cycle accordingly (Supplementary Fig. 2e), whereas clustering still occurs (Supplementary Fig. 3). In contrast, setting insolation to the temporal average (dashed line in Supplementary Fig. 2d) has the effect of shifting the diurnal cycle of precipitation to earlier times (Supplementary Fig. 2e, dotted vs solid red line). However, also for constant insolation, clustering is still observed for A5 (Supplementary Fig. 3) (for further details, see Table 1 and Supplementary Table 1).

Rain cell and cold-pool tracking

In all experiments, we track rain cells using the Iterative Rain Cell Tracking Method (IRT63). In the two-dimensional surface precipitation field corresponding to any output timestep, the IRT first detects all spatially contiguous patches (using connectivity 4) of rain intensities exceeding a threshold I0 = 0.5 mm h−1, termed rain objects. Tracks are then identified, by determining any rain objects that overlap from one output timestep to the next. In the case of merging of tracks, the track of a larger area is continued under the same track ID (the threshold value θ was set to unity). In our variance analysis, we use the set of coordinates formed by the initial positions of each track, which is defined here as the precipitation-weighted centre of mass of the first rain object belonging to each track. The locations of all subsequent rain objects of the same track are discarded in our variance analysis, to avoid artefacts from double counting of the positions of the nearly collocated objects.

To account for possible changes in the CP extent between the different simulations, CPs were tracked by a simple temperature-depression method. In this method, the IRT was modified, such that at any timestep, any grid box within the lowest model level with a temperature depression, measured relative to the spatial mean at the same time, exceeding a threshold of one Kelvin, was recorded. As for rainfall, spatially contiguous patches of temperature depression were collected into objects and assigned a unique track identifier. To increase the signal, only objects exceeding an area threshold of 10 km2 were considered. Going forward in time, overlapping objects were identified and considered to be the same CP track. Again, the track of the larger CP area is continued under the same track identifier63.

For CP height, a criterion based on a fixed temperature deviation was found inappropriate, as temperature differences are smaller aloft. Therefore, we measure CP height by evaluating the average temperature of the domain and the standard deviation at each output timestep and vertical model level k and comparing the local temperature T0 at each grid cell (i, j, k) with this value. If T0 is two standard deviations below the domain mean at the same height level k, this grid box is considered part of a CP. Within each column (i, j), the CP height is determined as the highest level that fulfils this criterion.

A simplified mesoscale convective system (MCS) model

A square lattice of s × s sites is initialised, where the area of each site is taken as the average area a0 occupied by a single convective rain cell, a0 ≈ 5 km × 5 km (compare: Supplementary Table 1). The total domain area A hence is A ≡ a0s2. Each site can be in one of two convective states, namely active or inactive. Our model assumes boundary-layer processes to be local. In contrast, the free troposphere acts as a reservoir of large heat capacity, where heat is quickly redistributed in the entire free troposphere by gravity waves64; thus, spatial homogeneity is assumed there. The model incorporates three fundamental processes affecting event formation: (1) spontaneous activation due to the moderate drive of the diurnal cycle temperature forcing, (2) refractory dynamics due to CPs and (3) strong activation due to combined CPs ahead of MCS fronts. Below, we discuss the estimation of model parameters for these processes (summarised in Table 2).

Table 2 Parameters in the simplified MCS model.

Spontaneous activation. To define a vertical temperature gradient, we consider quantities Tbl and Tft, which represent the deviation of the boundary layer and free-tropospheric temperature from their assumed steady-state values. Tbl(t) is prescribed and oscillates harmonically as

$${T}_{bl}(t)=-{t}_{a}\cos (2\pi \ t/{t}_{0}),$$

where t0 = 24 h. The free-troposphere temperature anomaly Tft is initialised to zero. We also define the temperature difference ΔT ≡ Tbl − Tft, and its normalised version \(\Delta T^{\prime} \equiv \Delta T/\Delta {T}_{0}\), where the reference scale ΔT0 ≈ 0.3 K is taken as two standard deviations of LES-simulated near-surface temperature. \(\Delta T^{\prime}\) serves as a proxy for both convective available potential energy (CAPE) and convective inhibition (CIN). The basic dynamics proceeds in discrete model timesteps of 0.5 h, reflecting the typical timescale of convective cloud formation. At each timestep, the default probability p0 for a vacant site to become active is taken as

$${p}_{0}=\left\{\begin{array}{ll}0&\,\text{if}\,\ \Delta T^{\prime}\, <\,0\\ \Delta T^{\prime} &\,\text{if}\,\ 0\,<\,\Delta T^{\prime}\, <\,1\\ 1&\,\text{if}\,\Delta T^{\prime}\, >\,1.\end{array}\right.$$

Equation (8) makes the qualitative assumption that there is no activity at all for stratified conditions (\(\Delta T^{\prime}\, <\,0\)), and events occur with certainty when the temperature gradient is much larger than typical fluctuations. Both of these limits could be softened, as could the assumption of linearity at intermediate \(\Delta T^{\prime}\). One could equally argue for a smooth function of \(\Delta T^{\prime}\), such as an error function. Nonetheless, Eq. (8) captures the observed fact that more initial activity occurs when surface temperature changes quickly. Without any further perturbations, the probability pij to initiate convection in a site (i, j) will be equal for all sites (pij = p0).

Spatial structure. Notably, Eq. (8) does not depend on the position within the lattice. Local modifications cause spatial structure (compare: Fig. 4a). Cold pools have two effects: reduction of local temperature and reduction of local humidity. However, the recovery timescale for the former is fast (τcp ≈ 3 h), whereas that of the latter can be slower (τinh ≈ 24 h, compare: Supplementary Fig. 6)27. To consider the former, we take active sites to persist for τcp, during which no further rain cell initiation is possible at the same site. Simultaneously, the local probability pij is reduced as \({p}_{ij}={p}_{0}-{p}_{{\mathrm{inh}}}\exp (-\delta t/{\tau }_{{\mathrm{inh}}})\), where δt is the time after the occurrence of the rain event and pinh > 0. pinh controls the fidelity of the anticorrelation from day-to-day, and the results are qualitatively not affected by it.

An MCS is defined to occur when a 4-connected contiguous patch of active sites exceeds the threshold area Acrit ≡ n0a0, with n0 = 20 (Fig. 4). While this is the case, and \(\Delta T^{\prime}\, >\,0\), the probability p at each of the surrounding sites (i, j) becomes pijpij + pact, which acts to lower the barrier presented by CIN. The increased CIN above CPs is reflected in the strongly elevated LFC above the CP or MCS (Fig. 4a, b). When pij is no longer a surrounding site, the term pact is no longer applied for this site. The initiation probability pij is then defined analogously to the one in Eq. (8).

Radiative constraint. Generally, Tbl rises during the day as the surface is heated by insolation. pij will then eventually become positive at some sites, and rain cells can be produced there. When this occurs, latent heat is transferred to the free troposphere, increasing Tft, thus subsequently lowering ΔT and thereby pij. Simultaneously, the site (i, j) is shut down by the adverse buoyancy effects through CP formation. Tft will relax by heat loss (Pout ≈ 200 W m−2) through outgoing thermal radiation. According to the Stefan–Boltzmann law, \({P}_{out}=\sigma {T}_{eff}^{4}\), with σ ≈ 5.7 × 10−8 W m−2 K−4, the Stefan–Boltzmann constant—translating to an effective emission temperature Teff ≈ 250 K. Through Pout, the free troposphere would hence cool by ~2 K d−1. As a crude estimate of the change in Tft through latent heat transfer, we estimate the free-tropospheric heat capacity Cft = Mftcpd, assuming dry air. cpd ≈ 1 kJ kg−1 and the mass of the free troposphere

$${M}_{ft}=\int_{z = {z}_{{\mathrm{LCL}}}}^{{z}_{{\mathrm{top}}}}dz\ \rho (z)\approx 8\times 1{0}^{3}\ {\mathrm{kg}},$$

using zLCL ≈ 1 km and ztop ≈ 16 km for the lifting condensation level (LCL) and top of the troposphere, respectively. Hence, Cft ≈ 8 × 106 Jm−2 K−1. From rain cell tracking of our LES simulations, we obtain the average rain cell lifetime to be ≈1 h (Supplementary Table 1), the mean cell area a0 ≈ 20 km2 and the average cell rain rate of ≈4 mm h−1 (Supplementary Table 1); hence, each rain cell can be assumed to yield Mevent = 4 kg m−2 of liquid water. Assuming that the corresponding latent heat was previously deposited in the free troposphere upon ascent, each rain event heats the free troposphere by Qevent = Lv Mevent. We take this heat to increase Teff accordingly by \(\delta \ {T}_{{\mathrm{eff}}}={Q}_{{\mathrm{event}}}{C}_{ft}^{-1}{{\mathrm{s}}}^{-2}\), where s−2 results from the ratio a0/A, yielding δTeffs2 ≈ 1 K: if one rain event occurred instantaneously at each site, the free-tropospheric temperature would rise by 1 K.