Spectra, intermittency, and extremes of weather, macroweather and climate

It was recently found that the accepted picture of atmospheric variability was in error by a large factor. Rather than being dominated by a series of narrow scale-range quasi-oscillatory processes with an unimportant white noise “background”, it turned out that the variance was instead dominated by a few wide range scaling processes albeit occasionally interspersed with superposed quasi-oscillations. Although the classical model implied that successive million year global temperature averages would differ by mere micro Kelvins, the implausibility had not been noticed. In contrast, the new picture inverts the roles of background and foreground and involves four (possibly five) wide range scaling processes. As with any new paradigm, there are consequences; in this paper we focus on the implications for the spectra, intermittency and the extremes. Intermittency is an expression of the spatio-temporal sparseness of strong events whereas the extremes refer to the tails of their probability distributions and both affect the spectra. Although we give some results for the macro and mega climate regimes, we focus on weather, macroweather and climate: from dissipation to Milankovitch scales.

in thick lines (from left to right, the solid curves are data set #1, bottom, #2, top #3), #4, #5, #7, #8). The dashed lines are reproduced from ref. 2 left to right, datasets #2, #6, #7, #8. Here as in b, the fluctuations were multiplied by the canonical calibration constant of 2 so that when the slopes are positive, the fluctuations are close to difference fluctuations. The various scaling regimes are indicated at the bottom. (b) (bottom panel): Same as (a) but for spatial transects. The bottom aircraft (data set #9) and top (daily resolution, data set #10) data are in the weather regime ("W"), the line labelled "M" is in the macroweather regime (data set #11), and the one labelled ("C"), the climate regime (data set #12, see Table 1 for details). At scales >≈2500 km, the aircraft fluctuations from a single flight, have a dip caused by poor statistics. The breaks in the other (weather, macroweather and climate) transects only occur at scales of about 10,000 km (upper right).
Scientific REPORTS | (2018) 8:12697 | DOI: 10.1038/s41598-018-30829-4 where ϕ is a dimensional quantity driving the process and the equality is understood in a statistical sense, the subscript indicates the resolution in time or in space. H is the fluctuation exponent; it is used in honour of Edwin Hurst who introduced it in hydrology 5 , but it is only the same as Hurst's exponent for Gaussian processes; the fluctuation exponent is of more general utility. For example, in turbulence in space, if we replace T by a velocity component, and ϕ by the energy flux to the one third power, then H = 1/3 and the equation represents the Kolmogorov law. In general H (and the other scaling exponents) are different in space and in time and different in the horizontal and vertical directions. This is the problem of scaling stratification, scaling anisotropy, "Generalized Scale Invariance" 6 it is out of our present scope; hence to lighten the notation we drop the t, x indices. In the basic picture, ϕ is a Fourier space flux, in this example, the energy per time passing from small to large wavenumbers (large to small scales). The normalized spikes Δ Δ T T / can thus be interpreted as an estimate of the nondimensional, normalized driving flux: The interpretation in terms of fluxes comes from turbulence theory and is routinely used to quantify turbulence (the physically significant flux may be some power of ϕ), but it turns out that for scaling processes, eq. 1 is of general validity. In the weather regime, in respectively time and space, the squares and cubes of the wind spikes are estimates of the turbulent energy fluxes. The spikiness is because most of the dynamically important events are sparse, hierarchically clustered (fractal), occurring mostly in storms and the centre of storms. The scaling in eq. 1 implies that the statistics of the fluctuations fall off in a (scale free) power law way. In comparison, classical autoregressive or moving average processes (time) or Kriging (space) involve much more rapid (exponential) decays of correlations. In contrast, scaling implies "long range" statistical dependencies, in series (time) they have "long range memories" (in Gaussian models, more restrictive definitions of long range memory are occasionally used that depend on the value of H).
The spikes could be considered as manifestations of "sudden transitions from quiescence to chaos"; a rather general definition of intermittency. Quantitative definitions applicable to fields (space) are of the "on-off " type, the idea being that when the temperature, wind or other field exceeds a threshold then it is "on" i.e. in a special state -perhaps of strong/violent activity. At a specific measurement resolution, the on-off intermittency can be defined as the fraction of space that the field is "on" (where it exceeds the threshold). In a scaling system, for any threshold the "on" region will be a fractal set and both the fraction and threshold will be characterized by exponents (c and by γ, eq. 4) that describe the intermittency over all scales and all intensities (thresholds). The spikes of each amplitude thus each have their own sparseness, and intermittency. In scaling time series, the same intermittency definition applies; note however that other definitions are sometimes used in series in deterministic chaos.
As long as H < 1 (true for nearly all geoprocesses), the differencing that yields the spikes acts as a high pass filter, the spikes are dominated by the high frequencies.
Smoothed Gaussian white noises such as the scaling fractional Gaussian noise (fGn) and fractional Brownian motion (fBm) processes, or nonscaling processes such as autoregressive and moving average processes and the hybrid fractional versions of these, will have spikes that look like the macroweather spikes: in Fig. 2a, they will be roughly bounded by the solid horizontal lines. Mathematically, the scaling fGn, fBm are the results of power law filters (fractional integrals) of Gaussian white noises whereas more generally scaling processes are power law filters of densities of multifractal measures (the Fractionally Integrated Flux model 7 ).
To go beyond Gaussians, each spike is considered to be a singularity of order γ: λ is the scale ratio: λ = (the length of the series)/(the resolution of the series) = the number of pixels; in Fig. 2a,b, λ = 360, 1000 respectively. The extreme spikes ( Δ Δ T T / ) in any 1000 point long series (e.g. Fig. 2b) have a probability ≈1/1000. For Gaussian processes, the spikes with this probability are Δ Δ T T / = 4.12, this is shown by the solid lines in Fig. 2b, the line therefore corresponds to γ = Δ Δ λ ≈ . ≈ . T T log( / )/log log 4 12/ log 1000 0 20. For comparison, Table 2 gives the both the observed maximum γ for each series in Fig. 2 as well as the generally comparable theoretically expected maxima for the multifractal processes with the parameters estimated for the series in question.
In the general scaling case, the set of spikes that exceed a given threshold form a fractal set whose sparseness is quantified by the fractal codimension c(γ) = D − d(γ) where D is dimension of the space (D = 1 for series and transects) and d(γ) is the corresponding fractal dimension. The codimension is fundamental since it is the exponent of the probability distribution: is the probability that a randomly chosen spike Δ Δ T T / exceeds a fixed threshold s, P(s) is a nonscaling prefactor. c(γ) characterizes sparseness because it quantifies how the probabilities of spikes of different amplitudes change with resolution λ (for example when they are smoothed out by averaging). The larger c(γ), the sparser the set of spikes that exceed the threshold s = λ γ . A series is intermittent whenever it has spikes with c > 0. Gaussian series are not intermittent since c(γ) = 0 for all the spikes. To see this, note that for Gaussian processes the cumulative probability of a spike exceeding a fixed threshold s is independent of the resolution λ, where here, P(s) is related to the error function. Comparing this with Eq. 4, we see that c = 0. Equation 4 is true for scaling series of any scale ratio λ (i.e. any length, any resolution); its characterization in terms of γ and c(γ) is independent of λ, scale invariant. Scale invariance is in fact the physical symmetry principle respected by the dynamics. Eq. 4 can be used to estimate the most extreme spike expected on a series with λ pixels. The probability of the corresponding singularity γ max is ≈1/λ, hence from eq. 4, c(γ max ) = 1 (equivalently, d(γ max ) = 0) so that γ max can be estimated from the inverse codimension function: γ max = c −1 (1); the far right column of Table 2 shows that this provides reasonable estimates.
Finally, we may note that c(γ) is not changed under certain transformations; for example under a linear change of variables. This means that although many of the analyses discussed here are proxy-based, as long as their calibration is linear, c(γ) and hence the intermittency will not be affected. In contrast, noise in the proxies (e.g. from bioturbation in ocean sediments, or isotope diffusion in ice cores) breaks the scaling at high frequencies; it will not affect the exponents estimated over the lower frequency scaling ranges. It is more difficult to generalize about the consequences of variable and uncertain chronologies, but at least in some cases -such as ice cores -the sampling intervals themselves can be highly variable -indeed spiky and scaling -yet at least in principle, the sampling intermittency may be statistically separated from the proxy intermittency (see box 11.3 in ref. 8

).
No.  Table 2. The scaling parameters H, C 1 , α and probability exponents q D . The far right columns give theoretical estimates of the maximum spike heights using the parameters C 1 , α, q D and the scale ratio λ of the plots in Fig. 2 (=1000 except for data sets 10, 11, 12 where λ = 360, 360, 180 respectively; the spike plot for data set #6 is not shown). The theory column uses these parameters with the multifractal theory described in the text to estimate the solution to the equation c(γ max ) = 1. The "observed" column determines γ max from the spike plot directly: For comparison, for λ = 1000, Gaussian probabilities of 10 −3 , 10 −6 , 10 −9 yield respectively γ max = 0.20, 0.26, 0.30. Error estimates for the right hand columns (extremes) were not given due to their sensitivity to the somewhat subjective choice of range over which the regressions were made.
Scientific REPORTS | (2018) 8:12697 | DOI:10.1038/s41598-018-30829-4 Spectra Figure 2 shows that superficially routine series and transects can hide enormous nonclassical statistical variability. Until revealed by the spike plots, these fluctuations are virtually invisible yet they do not hide from spectral analysis which is highly sensitive to jumps. To explore this, we made a sample of 5000 realizations of a mildly intermittent multifractal process with parameters similar to those of the spiky regimes in Fig. 2, see Table 1 and Methods; here H = 0.33, C 1 = 0.05, α = 1.6 and each realization had 2048 points. Figure 3 (bottom left) shows one of the simulations and the corresponding spike plot (top left) confirming that while the process itself is apparently only weakly variable (bottom left), it is nevertheless highly spiky and non-Gaussian with two of the spikes having Gaussian p ≪ 10 −9 . The spikiness is well detected by the spectral analysis which itself has spikes. The spectrum (the black curve) is of a single realization, it is a "periodogram" obtained with a standard Hanning window. The right panel of Fig. 3 reveals that with respect to the ensemble averaged spectrum (over 5000 periodograms from the same process, smooth, blue), nearly a dozen spectral spikes exceeded the conventional 2 standard deviation significance level (the arrows), with some exceeding the 4 standard deviation level (p ≈ 10 −5 ). Although to make our point, we subjectively chose one of the more extreme of the first 50 simulations, this mild selection effect cannot explain the numerous strong spectral spikes. However, these spikes were to be expected since for each frequency, the tail of the probability distribution of the Fourier components was found to be a power law (here with exponent q D ≈ 5, see below).
Imagine that we performed a spectral analysis, but didn't know what to expect; e.g. an early analysis of ice core paleo temperatures. Encouraged by the idea that variability is synonymous with oscillations (e.g. the NOAA paleo-data website states that "because a particular phenomenon is called an oscillation, it does not necessarily mean there is a particular oscillator causing the pattern. Some prefer to refer to such processes as variability", cited in ref. 2 ), we might attempt to find a physical interpretation for some of the stronger spectral spikes. This temptation would be strengthened by the finding that some spikes are broad whereas (Gaussian) theory leads us to expect nearly independent Fourier components. Using sophisticated statistical methods such as multi-taper analysis (MTA) or singular spectral analysis (SSA), we could estimate the precise positions of the spectral spikes (see e.g. ref. 9 ). With this, we could judge their statistical significance. The conventional null hypothesis is that the background residuals are autoregressive order 1 Gaussian processes. For many of the spikes in Fig. 3, we could reject this null hypothesis with very highly levels of confidence. For example, out the 100 lowest Fourier components shown, 10 would be significant at the 95% level (the arrows) and one would have a probability of less than  Table 1). The smooth blue curve is the ensemble average of the process (with theory and numerics superposed: they cannot be distinguished). Above the blue curve, is an orange 2 standard deviation curve, and (red), 3, 4 standard deviation curves (probabilities 0.1%, 0.003% respectively). The arrows indicate spikes with Gaussian p < 0.05.
It is tempting to conclude that many ephemeral (and presumably spurious) claims of oscillatory atmospheric (and other) processes might have been consequences of fast computers, Fast Fourier Transforms, combined with inappropriate null hypotheses.

Understanding and Quantifying the Spikiness
In scaling processes, the variability systematically grows with scale range; this is the idea behind the cascade processes used in simulations (Fig. 3). Building up over a wide range of scales, the spikes exhibit clusters within clusters within clusters; we have seen how their sparseness can be quantified by fractal codimensions that decrease for stronger and stronger spikes. While it is possible to estimate c(γ) by directly analyzing the probabilities, it turns out to be advantageous to use mathematically equivalent, but technically simpler methods based on statistical moments. Although more details are given in Methods, the idea can be simply illustrated by plotting the ratio of the average absolute fluctuation to the root mean square fluctuation: the hierarchical clustering (i.e. clusters within clusters within clusters...) implies that this changes with scale in a power law manner: 1 where a is a constant close to 1 (Methods) and C 1 is the codimension of the mean, the basic intermittency exponent. The notation "<.>" denotes an average in the statistical (ensemble) sense. Figure 4a ; it is a scale-independent constant indicated as a dashed line. Although more precise estimates of C 1 are given in Table 1, the reference lines show that they are of the order of 0.03-0.10 in time and 0.075 to 0.15 in space (the ratios are noisy because of the large statistical fluctuations in each of the estimates of Δ Δ T t ( ) and Δ Δ T t ( ) 2 1/2 that are amplified when the ratio is taken). The C 1 values quantify how the intermittency near the mean builds up from one scale to another; even these apparently small C 1 values can imply significant effects when λ is large. In addition, as shown in Methods their importance for moments grows rapidly with their order. For example, for the fourth order moment important for the kurtosis (the ratio of the fourth moment to the square of the second), the effective codimension is amplified by a factor of ≈9 (see Eq. M2 with q = 4, α = 1.6). In this case, even if the scaling only holds over a range of λ = 1000 and C 1 = 0.1, this implies a kurtosis ≈36 compared to the Gaussian value 3.

Extremes, Outliers, Black Swans and Tipping Points
The codimension c(γ) quantifies the rate at which -due to the clustering -the spike levels λ γ build up as a function of the range of scales over which the dynamical processes act (eq. 3). c(γ) is nonlinear, requiring at least two parameters: the codimension near the mean (C 1 ) and the multifractal index α that determines its curvature near the mean (eq. M3). However, it generally requires an additional third parameter q D that characterizes the large γ behaviour, i.e. the tails of the probability distributions (eq. M8). This is because scaling in space and/or time generically gives rise to power law probability distributions (also known as "Pareto" or "fat-tailed" distributions). Specifically, the probability of a random temperature fluctuation ΔT exceeding a fixed threshold s is: where q D is a different exponent, this time characterizing the extremes.
To get an idea of how extreme the extremes can be, consider an example with q D = 5 (as has been estimated for the wind or temperature going back to ref. 10 see Fig. 5 and Table 2). With this exponent, temperature fluctuations 10 times larger than typical fluctuations occur only 10 5 times less frequently. In comparison, for a Gaussian, they would be ≈10 23 times less likely; they would never be observed. Understanding the nature of the extremes is fundamental since it determines our interpretation of large events as either extreme -but nevertheless within the expected range, and hence "normal" -or outside this range, hence an "outlier" or perhaps even a "tipping point".
A relevant example is global warming. Over about a century, there has been 1 °C warming of the globally averaged temperature -a roughly a 5 standard deviation event (with Gaussian probability ≈3 × 10 −6 ). In spite of this, climate skeptics claim that it is no more than a Giant Natural Fluctuation (GNF) i.e. a change that might nevertheless be normal -albeit extreme. The relevant extreme centennial changes are indeed non-Gaussian, and bounding the probability tail between power laws with 4 < q D < 6, ref. 11 showed that the probability of extremes were enhanced by a factor of as much a factor of 1000. Yet the GNF hypothesis could nevertheless be statistically rejected with more than 99.9% confidence.
In the log-log plots shown in Fig. 5, we see that -as expected from eq. 5 -most of the distributions show evidence of log-log linearity near the extremes. When judging possible deviations, it could be recalled that due to inadequate instrumental response times, post-processing noise reduction procedures (e.g. smoothing) or via "outlier" elimination algorithms, that extremes can easily be underestimated. Also, several of the distributions in Fig. 5 are from reanalyses which are numerical model outputs. Since physically, the extremes are consequences of variability building up over a wide range of spatial scales, the models' small hyperviscous scale range (truncated at ≈10 5 m rather than at the viscous scale of ≈10 −3 m), effectively truncates the extreme tails. Any of these effects could explain deviations from perfect power law tails or might explain some of the larger (i.e. less extreme) q D values in Table 2. Finally, while power law probabilities arise naturally in scaling systems, the distributions are not necessarily power laws; non-power law (curved tails in Fig. 5) may simply correspond to the special cases where → ∞ q D . In any event, for each empirical distribution, Fig. 5 shows the Gaussian with the same mean and standard deviation (dashed). We see that the empirical distributions are generally quite far from Gaussian. The power law fluctuations in Fig. 5a,b are so large that according to classical assumptions, they would be outliers. While Gaussians are mathematically convenient and can be justified when dealing with measurement errors, in atmospheric science thanks to the scaling, very few processes are Gaussian. Extremes occur much too frequently, a fact that colleagues and I regularly underscored starting in the 1980's (for a review see Table 5.1a,b. in ref. 8 ).
At best, Gaussians can be justified for additive processes, with the added restriction that the variance is finite. However, once this restriction is dropped, we obtain "Levy distributions" with power law extremes, but with exponents q D < 2. The Gaussian assumption also fails for the additive but scaling H model 2,12 . Most importantly, Gaussians are irrelevant for multiplicative processes: these generally lead to power law extremes but without any restriction on the value of q D 7,13 . Related models include Self-Organized Criticality 14 and Correlated Additive and Multiplicative noise 15 . Note that purely multiplicative random variables lead to somewhat less extreme log Levy and lognormal distributions (i.e. the logarithms are Levy or Gaussian; for the latter, see 16 for an early review: their tails are "long" but not "fat"). The enhanced variability of multiplicative processes when compared to multiplicative variables is due to the singular small scale limit of the former; it The ratios of the first moments (S 1 (Δt)) to RMS moments (S 2 (Δt) 1/2 ) for the same data used in Fig. 2b. The slopes give the exponent aC 1 (see the text). Using estimates of a = 0.86 (see Methods), reference lines with various intermittency exponents C 1 are shown. Gaussian processes are independent of scale Δt, they would lie along the dashed line. Since each curve is from ratios of statistics taken from data at different resolutions, we do not expect that the separate curves will be part of a single overall curve as they were in Fig. 1. has been theorized in the framework of multifractal phase transitions 17 . We could also mention that power law distributions also appear as the special (Frechet) case of Generalized Extreme Value Distributions although due to long range statistical dependencies, standard Extreme Value theory does not generally apply to scaling processes.
To underscore the importance of nonclassical extremes, Taleb 18 introduced the terms "grey and black swans". Originally, the former designated Levy extremes, and the latter was reserved for extremes that were so strong that they were outliers with respect to any existing theory. However, the term "grey swan" never stuck, and the better-known expression "black swan" is increasingly used for any power law extremes.
All of this is important in climate science where extreme events are often associated with tipping points. The existence of black swan extremes leads to a conundrum: since black swans already lead to exceptionally big extremes, how can we distinguish "mere" black swans from true tipping points?  Table 2. Also shown for reference (dashed curves) are the Gaussian distributions that give the same mean and standard deviations. (b) (bottom panel): The same as (a) but for the transects used in Fig. 2c. Note that there plausible arguments that numerical model outputs and reanalyses (the two middle curves) tend to be too smooth and to underestimate extremes; this might explain some of the larger q D values (for weather and macroweather).

Discussion
The emerging picture of atmospheric dynamics is based on processes acting over wide ranges of scale occasionally interspersed with narrow scale range, quasi periodic processes, most notably the daily, annual and Milankovitch cycles. This picture not only clarifies the distinction between weather, macroweather and climate, it has implications for our understanding of spectra, intermittency and extremes. To make these obvious, we introduced spike plots that highlight hidden extreme jumps in series and transects, showing that the only regime plausibly compatible with Gaussian assumptions was macroweather in time. Unsurprisingly, Fourier analysis is sensitive to these jumps so that we expected (and found) random spikes in Fourier space (spectra) as well, potentially explaining numerous ephemeral claims of quasi-periodic behaviour.
The spikes have two features: they are clustered, occurring on sparse fractal sets, and their probability distributions have "fat" power law tails. Both features can be quantified by a codimension function c(γ) that quantifies how the sparseness changes with resolution, and is itself determined by three parameters. The first was quantified by estimating the exponent C 1 from the variation of the ratio of the first and RMS moments, another (α) by the curvature of c(γ), and a final parameter (q D ) from log-log plots of the probabilities of exceeding thresholds. We argued that in the atmosphere, strong intermittency and extreme black swan events are ubiquitous and that this has numerous consequences including for spectral analysis and for distinguishing "normal", "expected" extremes from tipping points.

Methods
Data descriptions. The following are detailed descriptions of the data described in Table 1 and analyzed in the text. In Fig. 1a the solid lines are analyzed over the ranges indicated; the dashed lines are from the same sources but analyzed over wider scale ranges. The dashed and solid analyses of data # 2 differ lightly due to different daily and annual detrendings, were used 19 . Data from 0 to 40 °N (21 latitude bands, 180 longitude bands) were broken into five 28 year segments (=336 months each), a total of 21 × 180 × 5 × 336 = 6,350,400 points. 28 years is roughly the longest possible without the low frequencies being dominated by anthropogenic warming. 5. Greenland GRIP paleo temperatures at 5 year resolution were degraded by factor of 17 (to 85 years) so as to have (roughly) pre-industrial climate regime resolution with a series 1032 points long (87,720 years). For analysis, this was broken into 20,000 year segments (the rough limit of the climate regime). 6. EPICA (Antarctica) ice core paleo temperatures; there were 5788 points spanning 801,000 years. In order to avoid highly nonuniform sampling, the q D estimate was made from the equal depth data (55 cm resolution), see Fig. 5.21 in ref. 8 ; the Haar analysis (shown as a dashed "S" shaped line in Fig. 1a) was performed using the offical chronology and a non interpolation algorithm 2 and is reproduced from the latter. 7. A stack of benthic δ 18 O ratios 20 , with 14,825 points over the last 67 million years using the recommended −6.5 °C/mil calibration. The data from the first 6.85 million year were averaged to a 5,000 year resolutions (with only 3 missing data points being interpolated), 1,368 points in all. To avoid the larger scale mega-climate regime, this was then segmented into 8 segments each 855,000 years long to roughly represent the macroclimate period. The full length of the series was analyzed in ref. 2 , and is reproduced in Fig. 1a as a dashed line. 8. A stack of benthic δ 18 O ratios 2980 points 21 very unevenly distributed over the Phanerozoic -back 553 million years. For the spike plot (Fig. 2b) and probability distribution (Fig. 5a), the range was broken into 1000 intervals (553 k year resolution) and the missing data was filled by linear interpolation. For the scaling exponents, the data was further degraded to 1.54 M year resolution to avoid the macroclimate regime at small scales ( Fig. 1a which also shows the full analysis from ref. 2 , where it is indicated by a dashed line). The temperature calibration was the recommended value −4.5 °C/mil. This was the only series with significant interpolation (visible in Fig. 2b,  Analysis techniques. The Haar fluctuation analyses (Fig. 1), the spike plots (Fig. 2), the spectral analysis ( Fig. 3), the intermittency plot (Fig. 4) and the probability distributions (Fig. 5) are straightforward and were described in the main text. In this section, we indicate how the exponents (Table 2) were estimated. In terms of fluctuations, we can conveniently express the generic scaling result by taking the q th order statistical moments of eq. 1: , and "<.>" is the ensemble average. Δ Δ T t ( ) q is the (generalized) structure function and ξ(q) is its exponent (it is often convenient -e.g. eqs 3, 4 -to express the above in terms of the scale ratio λ. For series of length T, resolution Δt, we have λ = T/Δt so that λ ∝ 1/Δt). K(q) is the exponent that characterizes how the statistics of the driving flux ϕ Δt change with scale Δt; it directly characterizes the spikes (eq. 2). When ϕ Δt is fairly constant -e.g. quasi Gaussian -then the various moments ϕ Δt q are independent of scale so that K(q) = 0 and the structure function exponent ξ(q) is linear (=qH, e.g. fGn or fBm). More generally, K(q) is a convex function (K″ > 0) with only constraints K(0) = 0 (for a nonzero process) and K(1) = 0 (the mean is independent of scale). A convenient characterization of the intermittency is thus K′(1) = C 1 , where C 1 is the fractal co-dimension of the regions that give the dominant contribution to the mean of the process. More generally, due to the existence of stable, attractive, "universal" scaling processes: where 0 ≤ α ≤ 2 is the Levy index that characterizes the degree of multifractality 7 since α = K″(1)/K′ (1), it also characterizes the curvature of K near the mean (q = 1). The relationship in eq. M2 is only valid for q < q D (below); for larger q, K diverges (note: this is only true for an infinite ensemble, for a finite ensemble, there will be "spurious scaling" and estimates of K will grow with the sample size).
We have already introduced the codimension function c(γ) that characterizes how the probability distributions change with scale ratio λ (eq. 4). Since a description in terms of probabilities is equivalent to one in terms of moments, it turns out that c(γ) and K(q) are related in a surprisingly simple way: by a Legendre transformation 23 . A consequence is that each order of moment is determined by a single level of spikes (singularities, γ). C 1 is therefore the codimension of the set of spikes that dominates the calculation of the mean level of the spikes. The Legendre transformation of eq. M2 yields: 1 1 valid for γ > − C 1 α′/α otherwise = 0 and where the auxiliary variable α′ is defined by: 1/α′ + 1/α = 1. This means that processes whose moments have exponents K(q) given by eq. M2, have probabilities with exponents c(γ) given by eq. M3. Since Gaussian processes are specified by their variances (the means of their signed fluctuations are zero), in Fig. 1a,b we have plotted the conventional the Root Mean Square fluctuation Δ Δ ∝ Δ − T t t ( ) H K 2 1/2 (2)/2 (eq. M1) -rather than the scaling of the absolute mean Δ Δ ∝ Δ T t t ( ) H . For a Gaussian, K(2) = 0, the two would be the same; here they are different. In order to bring out this difference, and to quantify the intermittency which is responsible for it, in Fig. 4a (2)/2 . In ref. 8 we show how to construct a scaling function whose exponent yields C 1 directly; however, since empirically, α ≈ 1.6 (see Table 1) the exponent ratio: varies in only a narrow range with α, (for α = 1.6, a = 0.86) in Fig. 4 we used this to plot various reference lines for C 1 . We see that C 1 is larger in space than in time and is largest of all for the climate transects ( Fig. 4b far right). This large intermittency is the statistical expression of the existence of various climate zones. For comparison, values of C 1 in the range 0.05-0.10 are typical of turbulent fields; see Table 4.5 in the review ref. 8 . In this paper, we estimated ΔT(Δt) using Haar fluctuations; these were estimated by dividing the series and transects into disjoint intervals each of length Δt or Δx and taking the absolute difference of the first and second halves. Each interval yielded an estimate of a single fluctuation from which various powers were calculated. The corresponding statistical moments were estimated by averaging the powers over all the available fluctuations. For each order of magnitude of scale spanned by the data, 20 logarithmically spaced Δt's were chosen. The exponents ξ(q) were estimated by linear regression on a plot of the log(moments) versus logΔt, and the uncertainties were estimated in the standard way. The use of more sophisticated regressions (and uncertainty estimation) techniques is not warranted since the values of ξ(q) depend on the somewhat subjective choice of scaling range used for their estimation. In addition, one would require the use of multifractal (rather than Gaussian) models for the underlying variability, and the corresponding statistical results are not yet available.
Estimating α. The exponent α (eq. M2) is difficult to estimate since it characterizes how the nonlinear part of the structure function exponent ξ(q) varies with statistical moment q. Although various methods have been suggested, in Table 1 it was estimated by the following method. Following ref. 8  where a is defined in eq. S3. Finally, we estimate α from the ratio: which is independent of Δt (uncertainties can be estimated from its small variations about a constant mean level). We can numerically invert eq. M4 to obtain α from knowledge of a.
Estimating q D . We made histograms of absolute first differences of the series and transects using logarithmically spaced bins. We then accumulated these absolute changes starting with the largest to the smallest, yielding Pr(Δt > s) for the probability that an absolute change Δt exceeds the threshold s (note that the usual Cumulative Distribution Function (CDF) is Pr(Δt < s) = 1 − Pr(Δt > s)). Figure 4a,b shows the resulting log-log plots whose slope is expected to asymptote to a constant value = −q D (see eq. 5). The difficulty in the estimation is that the theory does not tell us the probability levels over which the power law is expected to hold and the value of q D depends on which part of the probability distribution that the regressions are made. Once the power law region is identified, then an unbiased "Hill" estimator can be used to determine q D , (alternatively, the maximum likelihood method 24 ) but the basic problem remains. In Fig. 5a,b, plausible reference lines were shown, and in Table 2, their slopes are given. For a review of several dozen relevant atmospheric q D estimates, see Table 5.1a,b in ref. 8 .
In terms of c(γ), the extreme power law behaviour (eq. 5) corresponds to the linear large γ behaviour: where γ D is a critical singularity, given by the solution to the equation: q D = c′(γ D ). For universal multifractals, eq. M3 holds for γ < γ D and so that C 1 , α, q D completely determine c(γ), and hence (via eq. 4), the probabilities at all scales. Since, we showed that the expected maximum γ max satisfies c(γ max ) = 1, we can therefore theoretically calculate the maximum expected "spikes" in Fig. 2. The results are shown in Table 2 in the column γ max (theory).
Data availability. All the data used are from publically accessible sources. The only exceptions are the thermistor data (#1) and the aircraft data (#9) that are available from the author on request.