Background & Summary

Existing global datasets of isothermal layer (ITL) depth (h) are all climatologies on regular grids with 1° horizontal resolution and standard World Ocean Atlas (WOA) vertical depths, either computed from 3D gridded temperature climatology1,2 or identified from observational temperature profiles and then averaged in 1° or 2° cells3,4. Such datasets provide climatological monthly mean gridded values. However, many processes occurring in the ITL cause h to change on diurnal, seasonal, to interannual time scales5,6,7, which cannot be represented by the existing climatological monthly mean datasets.

Besides, underneath the ITL, there exists a thermocline with a strong gradient (G). Both h and G are important parameters in studying climate variability, interaction between atmosphere and oceans, and ocean prediction8. For example, the thermocline affects the heat exchange between the ITL and a deeper layer, and changes the ITL heat content (HITL), influences the evolution of the sea surface temperature (SST), and in turn the heat/moisture fluxes and upward long wave radiation at the surface9. Thus, establishment of a synoptic dataset of (h, G, HITL, SST) from temperature profiles becomes urgent.

The traditional approach to determine h from temperature profiles is because of a single near-zero gradient in the ITL using difference and gradient methods. The difference method requires the deviation of temperature from its value at a reference depth (zref) to be smaller than a certain fixed value, which varies among 1.0 °C10, 0.8 °C11,12, 0.2 °C4,13,14, to 0.1 °C15. The reference level changes from near the surface12 to 10 m depth4,15. The gradient method requires the vertical temperature gradient ∂T/∂z to be smaller than a certain fixed value, which varies from 0.015 °C/m16 to 0.025 °C/m17,18. A recent approach is to use the transition from the near-zero gradient in the ITL to the non-zero gradient in the thermocline to determine h. This leads to a maximum curvature method19,20,21,22. Large errors may occur in noisy profile data since the curvature involves the calculation of second derivative versus depth20,22. To improve the curvature method, the optimal linear fitting23 and maximum angle24 methods were developed with capability of handling noisy profile data. However, these two methods are iterative and not as straightforward as earlier methods such as the difference, gradient, and maximum curvature methods. Furthermore, none of these methods determines G.

Recently, the exponential leap-forward gradient (ELG)25 was developed to archive (G, h, SST) and temperature below the ITL (Tm) simultaneously from a temperature profile, and has been verified as optimal among all the existing methods with the highest skill score using the Q-index22 and the lowest Shannon information entropy (representing the least uncertainty)25. Vertical integration of temperature profile from the surface (z = 0) down to the base of the ITL (z = −h) leads to the ITL heat content (HITL). The benefit of using HITL rather than heat content within fixed depths such as H700 for the upper 700 m is that HITL is fully determined by the upper ocean mixed layer dynamics and directly interacts with the atmosphere but H700 is not. Dynamics is more complicated for H700 than for HITL.

We analysed the global ocean CTD and XBT temperature profiles (1961–2017), downloaded from the NCEI website https://www.nodc.noaa.gov/OC5/WOD/pr_wod.html in May 2018, with the ELG method. After that, the (G, h) data pairs were generated by the ELG method from 446,811 out of 964,942 CTD profiles and 755,086 out of 2,303,433 XBT profiles. The lower number of (G, h) data pairs compared to the observational temperature profiles is due to their quality and vertical resolution. Nine criteria are used for the data filtering such as too few data points between 10 m and 40 m, too few total observational points, maximum depth less than 20 m and so on. Please see MATLAB function (getgradient.m). The NCEI/WOD observational XBT profiles (no correction) were used because the attached algorithm is for analysing observational temperature profiles with real vertical data distribution (not on the standard depths). The quality of the (G, h) data were identified by both Q-index22 and identification index (see the Technical Verification Section). Altogether, the dataset contains 1,201,897 sets of (G, h, HITL, SST, Tm, Q-index, I-index), and is located at the NCEI website for public use.

Methods

ELG Method

The ELG method25 was used to process observational temperature profile data to obtain (G, h). It contains four steps: (1) estimating the ITL gradient (near-zero), (2) identifying the thermocline gradient G, (3) to computing the vertical gradient at each depth (non-dimensionalized by G), and (4) to determining h with a given threshold (or user input) to separate the near-zero gradient layer (i.e., the ITL) and the non-zero gradient layer (i.e., the thermocline). Figure 1 illustrates the procedures of this method.

Fig. 1
figure 1

Characteristics of the isothermal layer and thermocline. (a) Temperature profile with the illustrated z(0.1) and z(0.7). (b) Vertical gradient and the defined z1. (c) Gradient ratio.

Step 1

Temperature profile data are often noisy, sometimes with unrealistically high vertical gradients near the surface. The reference levels (zref) of (0 m, −3 m, −10 m) are used for the difference method criteria in determination of h to reduce such noise. Thus, a layer with depth deeper than zref, say 20 m, represents the vertical temperature variation of the upper layer. A depth (z1) with a minimum gradient

$${\left|\frac{{\rm{\Delta }}T}{{\rm{\Delta }}Z}\right|}_{z={z}_{1}}=\mathop{Z\ge -20\,{\rm{m}}}\limits^{min}\left|\frac{{\rm{\Delta }}T}{{\rm{\Delta }}z}\right|,$$
(1)

is identified within this layer (i.e., 0 ≥ z ≥ −20 m). This minimum gradient at z1 is the best representation of the ITL gradient (Fig. 1a).

Step-2

Let an observational temperature profile starting from z1 to any depth zk be represented by [T(zk), k = 1, 2, …, b] with zb the bottom of the profile. The vertical temperature difference from z1 to zb, Td = T(z1) − T(zb), is the variation of temperature across the ITL, thermocline, and deep layer. Since the vertical gradient is strongest in the thermocline and weakest in the ITL (Fig. 1b,c), it is reasonable to assume that the vertical temperature difference inside the ITL is within 10% of Td. Since the deep layer is usually not vertically well mixed, the temperature variation is more in the deep layer than in the ITL. Thus, the main part of the thermocline can be roughly identified between the upper depth [z(0.1)] with 10% temperature difference to z1 comparing to Td

$$T({z}_{1})-T({z}_{(0.1)})=0.1{T}_{d},$$
(2a)

and the lower depth [z(0.7)] with 70% temperature difference to z1 comparing to Td (Fig. 1a).

$$T({z}_{1})-T({z}_{(0.7)})=0.7{T}_{d}.$$
(2b)

Let (Nth + 1) be the number of vertical data points for the closed interval [z(0.1), z(0.7)] with [T0 = T(z(0.1)), TNth = T(z(0.7))], and (Ti, i = 0, 1, 2, …, Nth − 1) in between. The ordinary vertical gradients are calculated from the depth z(0.1) to each depth through z(0.7),

$${G}_{i}=\left[T({z}_{(0.1)})-{T}_{i}\right]/[{z}_{(0.1)}-z{| }_{{T}_{i}}],\quad i=1,2,\,\ldots ,{N}_{th}.$$
(3)

Here, the gradient data [G1, G2, …, GNth] are not normally distributed. Thus, the overall feature [i.e., thermocline gradient (G)] is represented by the median,

$$G={\rm{Median}}\left[{G}_{1},{G}_{2},\,\ldots ,{G}_{{N}_{th}}\right].$$
(4)

If the computed G is extremely small,

$$G\le 0.001\,^\circ {\rm{C}}\,{{\rm{m}}}^{-1}$$
(5)

the thermocline vanishes. Since the temperature difference between z(0.7) and zb is only 30% of Td, and the vertical distance of neighbouring data points increases with depth below z(0.7), the gradient below z(0.7) is also extremely small. The ITL extends to the bottom of the profile zb.

Step 3

Let Ng be the number of the vertical data points between z1 and z(0.7) (inclusive), and let \(N=\left\langle {{\rm{log}}}_{2}({N}_{g})\right\rangle \) with the bracket indicating the integer part of the real number inside. N is much smaller than Ng. Starting from z1, the (N + 1) exponential leap-forward gradients (ELGs) are calculated at each depth zk [between z1 and z(0.7)]

$${D}_{n}T({Z}_{k})=[T({Z}_{k})-T({T}_{k+{2}^{n}})]/[{Z}_{k}-{Z}_{k+{2}^{n}}],\,n=1,\,2,\,\ldots ,\,N.$$
(6)

where the computation stops if \({z}_{k+{2}^{n}}\) is equal to or deeper than zb. It is also noted that Dn is an identifier and not a numerical value. The averaged value among (N + 1) gradients [D0T(zk), D1T(zk), …, DNT(zk)] is computed by

$${G}_{\ast }({z}_{k})=\left[\sum _{n=1}^{N}{D}_{n}T({z}_{k})\right]/N,$$
(7)

which represents the gradient effectively at the depth zk with capability to filter out noises in the gradient calculation.

Step 4

Since \({G}_{\ast }({z}_{k})\approx 0\) if zk in the ITL; \({G}_{\ast }({z}_{k})=G\) if zk in the thermocline,

$$R({z}_{k})\equiv {G}_{\ast }({z}_{k})/G=\left\{\begin{array}{l} \sim 0,\quad {z}_{k}\,{\rm{in}}\,{\rm{the}}\,{\rm{ITL}}\\ \sim 1,\quad {z}_{k}\,{\rm{in}}\,{\rm{the}}\,{\rm{thermocline}}\end{array}\right.$$
(8)

A threshold of the gradient ratio, R(zk), is needed to determine if zk is in the ITL or the thermocline. We make it a user input parameter. Since there is no layer between the ITL and thermocline, and gradient in the thermocline is near 1, this threshold should be reasonably near 1 to be included in the thermocline. Here, 0.8 is suggested, but readers can change it in their practice.

Data filtering

The ELG method requires a certain quality of the profile data. A temperature profile is flagged if it has the following features: (1) number of data points ≤ 2 between 10 and 40 m depths, (2) total number of data points ≤ 5, (3) maximum depth < 20 m (not reaching thermocline), (4) upmost data point deeper than 50 m (no way to identify the ITL), (5) variance above 20 m > variance below 20 m, (6) range (TmaxTmin) < 1.0 °C, (7) too small thermocline gradient (<0.001 °C/m) (no thermocline), (8) big change (>5 °C) between two neighboring data points. Finally, 446,811 out of 964,942 CTD profiles and 755,086 out of 2,303,433 XBT profiles are not flagged (i.e., normal). Thus, 1,201,897 sets of (G, h, HITL, SST, Tm, Q-index, I-index) have been established. Note that SST here is the temperature at the ITL base (z = −h). The bulk SST is easily obtained from HITL and h or from the first depth level of the profile.

Data Traced Back to the WOD Profiles

The dataset contains 1,201,897 sets of (G, h, HITL, SST, Tm, Q-index, I-index). Each set has exactly the same metadata as the original temperature profile in the WOD such as time, location (longitude, latitude), country code, WOD cruise identifier, and WOD unique cast. Note that we added 200° to the WOD latitudes. With these parameters, each set of (G, h, HITL, SST, Tm, Q-index, I-index) is easy to trace back to the original WOD temperature profile.

Use of WOD CTD (2013-14) for Illustration

The WOD dataset (1961–2017) contains 964,942 CTD and 2,303,433 XBT profiles with one file for one profile. We uploaded the WOD CTD temperature data (2013–2014) with 55,554 profiles in the folder named ‘wod20132014CTD’ and the derived dataset of (G, h, HITL, SST, Tm, Q-index, I-index) named ‘WODCTD1314ELGtemp.nc’ to the Naval Postgraduate School website http://faculty.nps.edu/pcchu/ocean_data.htm (Item 12) for readers to practise. The following MATLAB codes are ready to use for this sub-dataset. It is easy to change dataset name to process other profile data.

Data Records

This global dataset for ocean synoptic thermocline gradient, isothermal-layer depth, and other upper ocean parameters26 is publicly available at the NOAA/NCEI data repository as a NetCDF file, which includes data citation, dataset identifiers, metadata, and ordering instructions. The dataset is located at the NCEI website (https://doi.org/10.25921/dgak-7a43) for public use.

Technical Validation

The key parameter of this dataset is h. The error at the data point is defined by the difference between the fitted and observed temperatures. The whole temperature profile data fitted to a single linear function (thick lines in Fig. 2) represents the maximum error since it disregards the existence of ITL and thermocline. Such a maximum error is represented by the total sum of the square error (SSET). After the ITL depth is identified, two lines are used to fit the observational data to get the fitted profile with the first one (near zero gradient in the ITL) from the top to the ITL base (circles in Fig. 2) and the second one (non-zero gradient in the thermocline) from the ITL base to the bottom of the profile. Such fitting has the errors in the ITL represented by the sum of the square error in ITL (SSEITL) and in the thermocline represented by the sum of the square error in thermocline (SSETH). The identification index (called I-index) is defined by

$${I}_{ITL}=\sqrt{1-(SS{E}_{ITL}+SS{E}_{TH})/SS{E}_{T}}.$$
(9)
Fig. 2
figure 2

The identification index to represent the quality of the ITL depth determination. (a) ITL not existence or ITL in existence, but not identified (h = 0). (b) Identified ITL depth shorter than the real ITL depth. (c) Perfectly identified ITL depth. (d) Identified ITL depth longer than the real ITL depth. (e) Identified ITL depth at z(0.7), i.e., thermocline not existence. Here, the ITL depth is marked by a circle. The thick line in each figure is a single linear function fitted to the temperature profile data.

If an ITL exists, but is not identified (h = 0) (Fig. 2a), SSEITL = 0, SSETH = SSET; which gives IITL = 0. If the identified h is shorter than the real one (Fig. 2b), SSEITL = 0, SSETH > 0, SSETH < SSET; which leads to 0 < IITL < 1. If the identified h is the same as the real one (Fig. 2c), SSEITL = 0, SSETH = 0, SSET > 0; which makes IITL = 1. If the identified h is longer than the real one (Fig. 2d), SSEITL > 0, SSETH = 0, SSETH < SSET; which leads to 0 < IITL < 1. If the identified h reaches the bottom of the thermocline [i.e., at z(0.7)] (Fig. 2e), SSETH = 0, SSEITL = SSET; which gives IITL = 0. The histograms of IITL show high quality of the ELG method for identifying ITL depth using the WOD/XBT data with the mean IITL of 0.884 (Fig. 3a), the WOD/CTD data with the mean IITL of 0.843 (Fig. 3b). However, the quality is lower with the climatology dataset such as World Ocean Atlas (WOA) 2013, downloaded from the website: https://www.nodc.noaa.gov/OC5/woa13/woa13data.html. The mean IITL is 0.788 using the WOA-13 on 1° resolution (WOA13-1°) (Fig. 3c) and 0.755 using the WOA-13 on 0.25° resolution (WOA13-0.25°) (Fig. 3d). The low scores may be caused by the coarser vertical resolution in WOA dataset than in the WOD/CTD and WOD/XBT. Besides, the ELG method has the highest score of the commonly used Q-index22,25. Quite a few zero values for IITL in Fig. 3 indicate that no ITL or thermocline can be identified from those profiles.

Fig. 3
figure 3

Histograms of the identification index showing the quality of determination of the ITL depth. (a) WOD/XBT data. (b) WOD/CTD data. (c) WOA annual 1° × 1°. (d) WOA annual 0.25° × 0.25°.

Figure 4 shows the spatial and temporal distributions of the derived 446,811 sets (from CTD) and 755,086 sets (from XBT) of (G, h, HITL, SST, Tm, Q-index, I-index). The longitude range of this dataset has the same range as the WOD from −180° to 180°. The values of the horizontal axis in Fig. 4a,b had 200° added for the sake of plotting. The dataset covers the global oceans pretty well except the Southern Ocean, where there are data-poor regions. The WOD XBT (Fig. 4c) and CTD (Fig. 4d) temperature profile data increase drastically after 1970.

Fig. 4
figure 4

Temperature profiles were processed by the ELG method. (a) Horizontal distribution of 755,086 WOD/XBT stations. (b) Horizontal distribution of 446,811 WOD/CTD stations. (c) Monthly number of the WOD/XBT profiles. (d) Monthly number of the WOD/CTD profiles.

Differently from the existing climatological datasets of h, this dataset provides variabilities on various time scales. Here, we take decadal variability as an example for illustration. To do so, we divided the time from 1961 to 2017 into six periods: 1961–1970, 1971–1980, 1981–1990, 1991–2000, 2001–2010, and 2011–2017, along with the climatology WOA-1° and WOA-0.25° for comparison. For each period, we used the global data to construct the histograms of h (Fig. 5), G (Fig. 6), and HITL (Fig. 7). They clearly show temporally varying (on decadal time scales) non-Gaussian distributions with strong positive skewness and higher values (>3) of kurtosis. Furthermore, we calculated the statistical parameters for variables (h, G, HITL) within each period to determine the decadal variabilities.

Fig. 5
figure 5

Histograms showing decadal variation for the isothermal layer depth (h) in comparison to the climatology. (a) 1961–1970. (b) 1971–1980. (c) 1981–1990. (d) 1991–2000. (e) 2001–2010. (f) 2011–2017. (g) WOA annual 1° × 1°. (h) WOA annual 0.25° × 0.25°.

Fig. 6
figure 6

Histograms showing decadal variation for the thermocline gradient (G) in comparison to the climatology. (a) 1961–1970. (b) 1971–1980. (c) 1981–1990. (d) 1991–2000. (e) 2001–2010. (f) 2011–2017. (g) WOA annual 1° × 1°. (h) WOA annual 0.25° × 0.25°.

Fig. 7
figure 7

Histograms showing decadal variation for the isothermal layer heat content (HITL) in comparison to the climatology. (a) 1961–1970. (b) 1971–1980. (c) 1981–1990. (d) 1991–2000. (e) 2001–2010. (f) 2011–2017. (g) WOA annual 1° × 1°. (h) WOA annual 0.25° × 0.25°.

Table 1 shows the decadal variations of statistical parameters for the global h with a monotonically increasing mean (standard deviation) from 33.1 m (32.7 m) during 1961–1970 to 67.5 m (61.7 m) during 2011–2017, and monotonically decreasing skewness (kurtosis) from 4.01 (29.3) during 1971–1980 to 2.70 (15.1) during 2011–2017. There is no clear pattern for skewness and kurtosis for h. Table 2 shows the decadal variations of statistical parameters for the global G with a monotonically decreasing mean (standard deviation) from 0.244 °C/m (0.178 °C/m) during 1961–1970 to 0.121 °C/m (0.134 °C/m) during 2011–2017, and monotonically increasing skewness (kurtosis) from 1.23 (4.40) during 1961–1970 to 2.55 (10.9) during 2011–2017. Table 3 shows the decadal variations of statistical parameters for the global HITL with a monotonically increasing mean (standard deviation) from 2.79 × 109 J/m2 (2.58 × 109 J/m2) during 1961–1970 to 5.43 × 109 J/m2 (4.63 × 109 J/m2) during 2011–2017. All the statistical parameters are comparable between the synoptic and climatological datasets except the mean and standard deviation of G, which were much lower in the climatology datasets such as (0.034 °C/m, 0.022 °C/m) in the WOA13-1°, and (0.034 °C/m, 0.023 °C/m) in the WOA13-0.25°.

Table 1 Decadal variation of statistical characteristics of the global isothermal layer depth (h) in comparison to climatology.
Table 2 Decadal variation of statistical characteristics of the global thermocline gradient (G) in comparison to climatology.
Table 3 Decadal variation of statistical characteristics of the global isothermal layer heat content (HITL) in comparison to climatology.