Global ocean synoptic thermocline gradient, isothermal-layer depth, and other upper ocean parameters

Different from the existing global ocean climatological datasets of isothermal layer (ITL) depth (h), a global ocean synoptic dataset of h along with other parameters has been established from temperature profiles of the National Centers for Environmental Information (NCEI) world ocean database 1961–2017. The exponential leap-forward gradient method was used to identify h and thermocline gradient (G) from each temperature profile measured by expendable bathythermograph (XBT) and conductivity, temperature, depth (CTD) instruments. Due to quality and vertical resolution of the profiling data, the numbers of (G, h) pairs are 446,811 out of 964,942 CTD profiles and 755,086 out of 2,303,433 XBT profiles. With the given h, other parameters such as ITL heat content (HITL), sea surface temperature (SST), temperature below ITL (Tm), and quality measures (Q, I) indices are provided. Altogether, the dataset contains 1,201,897 temporally and horizontally varying sets of (G, h, HITL, SST, Tm, Q-index, I-index). Note that we added 200° to the longitude. This synoptic dataset is located on the NCEI website for public use.

Recently, the exponential leap-forward gradient (ELG) 25 was developed to archive (G, h, SST) and temperature below the ITL (T m ) simultaneously from a temperature profile, and has been verified as optimal among all the existing methods with the highest skill score using the Q-index 22 and the lowest Shannon information entropy (representing the least uncertainty) 25 . Vertical integration of temperature profile from the surface (z = 0) down to the base of the ITL (z = −h) leads to the ITL heat content (H ITL ). The benefit of using H ITL rather than heat content within fixed depths such as H 700 for the upper 700 m is that H ITL is fully determined by the upper ocean mixed layer dynamics and directly interacts with the atmosphere but H 700 is not. Dynamics is more complicated for H 700 than for H ITL .
We analysed the global ocean CTD and XBT temperature profiles , downloaded from the NCEI website https://www.nodc.noaa.gov/OC5/WOD/pr_wod.html in May 2018, with the ELG method. After that, the (G, h) data pairs were generated by the ELG method from 446,811 out of 964,942 CTD profiles and 755,086 out of 2,303,433 XBT profiles. The lower number of (G, h) data pairs compared to the observational temperature profiles is due to their quality and vertical resolution. Nine criteria are used for the data filtering such as too few data points between 10 m and 40 m, too few total observational points, maximum depth less than 20 m and so on. Please see MATLAB function (getgradient.m). The NCEI/WOD observational XBT profiles (no correction) were used because the attached algorithm is for analysing observational temperature profiles with real vertical data distribution (not on the standard depths). The quality of the (G, h) data were identified by both Q-index 22 and identification index (see the Technical Verification Section). Altogether, the dataset contains 1,201,897 sets of (G, h, H ITL , SST, T m , Q-index, I-index), and is located at the NCEI website for public use.

Methods
ELG Method. The ELG method 25 was used to process observational temperature profile data to obtain (G, h). It contains four steps: (1) estimating the ITL gradient (near-zero), (2) identifying the thermocline gradient G, (3) to computing the vertical gradient at each depth (non-dimensionalized by G), and (4) to determining h with a given threshold (or user input) to separate the near-zero gradient layer (i.e., the ITL) and the non-zero gradient layer (i.e., the thermocline). Figure 1 illustrates the procedures of this method. www.nature.com/scientificdata www.nature.com/scientificdata/ Step 1. Temperature profile data are often noisy, sometimes with unrealistically high vertical gradients near the surface. The reference levels (z ref ) of (0 m, −3 m, −10 m) are used for the difference method criteria in determination of h to reduce such noise. Thus, a layer with depth deeper than z ref, say 20 m, represents the vertical temperature variation of the upper layer. A depth (z 1 ) with a minimum gradient  www.nature.com/scientificdata www.nature.com/scientificdata/ is identified within this layer (i.e., 0 ≥ z ≥ −20 m). This minimum gradient at z 1 is the best representation of the ITL gradient (Fig. 1a).
Step-2. Let an observational temperature profile starting from z 1 to any depth z k be represented by [T(z k ), k = 1, 2, …, b] with z b the bottom of the profile. The vertical temperature difference from z 1 to z b , T d = T(z 1 ) − T(z b ), is the variation of temperature across the ITL, thermocline, and deep layer. Since the vertical gradient is strongest in the thermocline and weakest in the ITL (Fig. 1b,c), it is reasonable to assume that the vertical temperature difference inside the ITL is within 10% of T d . Since the deep layer is usually not vertically well mixed, the temperature variation is more in the deep layer than in the ITL. Thus, the main part of the thermocline can be roughly identified between the upper depth [z (0.1) ] with 10% temperature difference to z 1 comparing to T d − = . .
and the lower depth [z (0.7) ] with 70% temperature difference to z 1 comparing to T d (Fig. 1a).  www.nature.com/scientificdata www.nature.com/scientificdata/ Let (N th + 1) be the number of vertical data points for the closed interval [z (0.1) , z (0.7) ] with [T 0 = T(z (0.1) ), T Nth = T(z (0.7) )], and (T i , i = 0, 1, 2, …, N th − 1) in between. The ordinary vertical gradients are calculated from the depth z (0.1) to each depth through z (0.7) , Here, the gradient data [G 1 , G 2 , …, G Nth ] are not normally distributed. Thus, the overall feature [i.e., thermocline gradient (G)] is represented by the median, www.nature.com/scientificdata www.nature.com/scientificdata/ If the computed G is extremely small, 1 the thermocline vanishes. Since the temperature difference between z (0.7) and z b is only 30% of T d , and the vertical distance of neighbouring data points increases with depth below z (0.7) , the gradient below z (0.7) is also extremely small. The ITL extends to the bottom of the profile z b .
Step 3. Let N g be the number of the vertical data points between z 1 and z (0.7) (inclusive), and let = ( ) with the bracket indicating the integer part of the real number inside. N is much smaller than N g . Starting from z 1 , the (N + 1) exponential leap-forward gradients (ELGs) are calculated at each depth z k [between z 1 and z (0.7) ] www.nature.com/scientificdata www.nature.com/scientificdata/ where the computation stops if + z k 2 n is equal to or deeper than z b . It is also noted that D n is an identifier and not a numerical value. The averaged value among (N + 1) gradients [D 0 T(z k ), D 1 T(z k ), …, D N T(z k )] is computed by k n N n k 1 which represents the gradient effectively at the depth z k with capability to filter out noises in the gradient calculation. www.nature.com/scientificdata www.nature.com/scientificdata/ A threshold of the gradient ratio, R(z k ), is needed to determine if z k is in the ITL or the thermocline. We make it a user input parameter. Since there is no layer between the ITL and thermocline, and gradient in the thermocline is near 1, this threshold should be reasonably near 1 to be included in the thermocline. Here, 0.8 is suggested, but readers can change it in their practice.  WoD CTD (2013-14) for Illustration. The WOD dataset (1961-2017 contains 964,942 CTD and 2,303,433 XBT profiles with one file for one profile. We uploaded the WOD CTD temperature data (2013-2014) with 55,554 profiles in the folder named 'wod20132014CTD' and the derived dataset of (G, h, H ITL , SST, T m , Q-index, I-index) named 'WODCTD1314ELGtemp.nc' to the Naval Postgraduate School website http://faculty. nps.edu/pcchu/ocean_data.htm (Item 12) for readers to practise. The following MATLAB codes are ready to use for this sub-dataset. It is easy to change dataset name to process other profile data.

Data records
This global dataset for ocean synoptic thermocline gradient, isothermal-layer depth, and other upper ocean parameters 26 is publicly available at the NOAA/NCEI data repository as a NetCDF file, which includes data citation, dataset identifiers, metadata, and ordering instructions. The dataset is located at the NCEI website (https:// doi.org/10.25921/dgak-7a43) for public use.

technical Validation
The key parameter of this dataset is h. The error at the data point is defined by the difference between the fitted and observed temperatures. The whole temperature profile data fitted to a single linear function (thick lines in Fig. 2) represents the maximum error since it disregards the existence of ITL and thermocline. Such a maximum error is represented by the total sum of the square error (SSE T ). After the ITL depth is identified, two lines are used to fit the observational data to get the fitted profile with the first one (near zero gradient in the ITL) from the top to the ITL base (circles in Fig. 2) and the second one (non-zero gradient in the thermocline) from the ITL base to the bottom of the profile. Such fitting has the errors in the ITL represented by the sum of the square error in ITL (SSE ITL ) and in the thermocline represented by the sum of the square error in thermocline (SSE TH ). The identification index (called I-index) is defined by

ITL I TL TH T
If an ITL exists, but is not identified (h = 0) (Fig. 2a), SSE ITL = 0, SSE TH = SSE T ; which gives I ITL = 0. If the identified h is shorter than the real one (Fig. 2b), SSE ITL = 0, SSE TH > 0, SSE TH < SSE T ; which leads to 0 < I ITL < 1. If the www.nature.com/scientificdata www.nature.com/scientificdata/ identified h is the same as the real one (Fig. 2c), SSE ITL = 0, SSE TH = 0, SSE T > 0; which makes I ITL = 1. If the identified h is longer than the real one (Fig. 2d), SSE ITL > 0, SSE TH = 0, SSE TH < SSE T ; which leads to 0 < I ITL < 1. If the identified h reaches the bottom of the thermocline [i.e., at z (0.7) ] (Fig. 2e), SSE TH = 0, SSE ITL = SSE T ; which gives I ITL = 0. The histograms of I ITL show high quality of the ELG method for identifying ITL depth using the WOD/ XBT data with the mean I ITL of 0.884 (Fig. 3a), the WOD/CTD data with the mean I ITL of 0.843 (Fig. 3b). However, the quality is lower with the climatology dataset such as World Ocean Atlas (WOA) 2013, downloaded from the website: https://www.nodc.noaa.gov/OC5/woa13/woa13data.html. The mean I ITL is 0.788 using the WOA-13 on 1° resolution (WOA13-1°) (Fig. 3c) and 0.755 using the WOA-13 on 0.25° resolution (WOA13-0.25°) (Fig. 3d). The low scores may be caused by the coarser vertical resolution in WOA dataset than in the WOD/CTD and WOD/XBT. Besides, the ELG method has the highest score of the commonly used Q-index 22,25 . Quite a few zero values for I ITL in Fig. 3 indicate that no ITL or thermocline can be identified from those profiles. Figure 4 shows the spatial and temporal distributions of the derived 446,811 sets (from CTD) and 755,086 sets (from XBT) of (G, h, H ITL , SST, T m , Q-index, I-index). The longitude range of this dataset has the same range as the WOD from −180° to 180°. The values of the horizontal axis in Fig. 4a,b had 200° added for the sake of plotting. The dataset covers the global oceans pretty well except the Southern Ocean, where there are data-poor regions. The WOD XBT (Fig. 4c) and CTD (Fig. 4d) temperature profile data increase drastically after 1970.
Differently from the existing climatological datasets of h, this dataset provides variabilities on various time scales. Here, we take decadal variability as an example for illustration. To do so, we divided the time from 1961 to 2017 into six periods: 1961-1970, 1971-1980, 1981-1990, 1991-2000, 2001-2010, and 2011-2017, along with the climatology WOA-1° and WOA-0.25° for comparison. For each period, we used the global data to construct the histograms of h (Fig. 5), G (Fig. 6), and H ITL (Fig. 7). They clearly show temporally varying (on decadal time scales) non-Gaussian distributions with strong positive skewness and higher values (>3) of kurtosis. Furthermore, we calculated the statistical parameters for variables (h, G, H ITL ) within each period to determine the decadal variabilities. Table 1 Table 3 shows the decadal variations of statistical parameters for the global H ITL with a monotonically increasing mean (standard deviation) from 2.79 × 10 9 J/m 2 (2.58 × 10 9 J/m 2 ) during 1961-1970 to 5.43 × 10 9 J/m 2 (4.63 × 10 9 J/m 2 ) during 2011-2017. All the statistical parameters are comparable between the synoptic and climatological datasets except the mean and standard deviation of G, which were much lower in the climatology datasets such as (0.034 °C/m, 0.022 °C/m) in the WOA13-1°, and (0.034 °C/m, 0.023 °C/m) in the WOA13-0.25°.

Code Availability
We present custom codes with the MATLAB to determine (G, h, I ITL ) from an individual temperature profile (see 'Supplementary Material'). The main program is 'ThermoclineMLD.m' , which contains four Matlab functions: 'validata.m' to filter out bad profiles, 'getgradient.m' to calculate the vertical gradients between z (0.1) and z (0.7) , 'ELGCore.m' to calculate the ELGs from z 1 to z (0.7) , and "Iindex.m' to calculate the Identification index (I ITL ) for the technical validation (see the Technical Validation Section). The code 'getgradient.m' can use temperature or potential density profiles to obtain the ITL and thermocline gradient or mixed layer and pycnocline gradient, but only the work with temperature profiles will be explained in this data descriptor. Interested readers may use our MATLAB codes to analyse the WOD 2013-14 CTD temperature profiles and to get the derived dataset (G, h, SST, T m , H ITL , Q-Index, I-index).   Table 3. Decadal variation of statistical characteristics of the global isothermal layer heat content (H ITL ) in comparison to climatology.