Background & Summary

The Global Drifter Program (GDP) funded by the U.S. National Oceanic and Atmospheric Administration (NOAA) maintains an array of satellite-tracked water-following drifting buoys, hereafter referred to as drifters, designed to acquire in situ observations of near-surface ocean current, sea surface temperature (SST), and atmospheric sea level pressure1. The requirement of the Global Ocean Observing System (GOOS) to achieve a nominal 5° × 5° coverage of the world’s ocean has been fulfilled since September 2005 with a pool of 1250 drifters2. In near-real time, drifter locations and sensor data are relayed to the World Meteorological Organization’s Global Telecommunication System (WMO GTS), contributing to the collection of critical information needed for the World Weather Watch programme. Drifter data are also harvested by various national and international projects and organizations which aim at assembling in situ SST observations to produce quality-controlled and reformatted datasets for scientific analysis, climate monitoring, and calibration and validation of satellite-based SST observations. In delayed-time, the GDP maintains the historical database of drifter data and metadata, and delivers regular updates of drifter data products of surface currents and SST following quality control and estimation procedures. The historical observations, with the earliest ones from 1979, have been processed in incremental steps to generate a 6-hour joint dataset of drifter position, velocity, and SST estimates, along with their uncertainty estimates3,4. Because the frequency of drifter observations has increased since the onset of the array, an hourly product of drifter velocity estimates with uncertainties has been generated following a new estimation methodology, since 20165. This paper describes the methods that have now been devised to generate a new dataset of SST estimates at hourly time steps along drifters’ trajectories, aimed at accompanying the on-going hourly drifter velocity dataset5. The dataset of drifter position and velocity estimates augmented with SST estimates is now the version 2.00 of the Hourly location, current velocity, and temperature collected from Global Drifter Program drifters world-wide dataset6. A summary of the products generated by the GDP is contained in Table 1.

Table 1 Table of location, velocity, and temperature data products and availability from the Global Drifter Program (GDP) as described in Hansen and Poulain (1996), Elipot et al. (2016), and this paper which defines three levels of data processing.

Hourly estimates of SST along drifters’ trajectories are ultimately obtained from in situ sea water temperature observations. Estimates are obtained by least squares fitting a mathematical model of SST temporal evolution to temporally-uneven SST observations. The adopted fitting method is an adaptation of the locally weighted scatterplot smoothing method, known as LOWESS7. The method operates in an iterative manner in order to gradually reduce originally-uniform weights given to observations, eventually rejecting observations diagnosed as outliers. The method first generates SST estimates at the original times of the drifter SST sensor observations, and second generates SST estimates at regular top-of-hour times that typically do not coincide with the observation times. After fitting the mathematical model, the local error variance of the assumed observational process is estimated by summing the variance of the residuals from the fit and an ad hoc term aimed at taking into account the quantization error arising from temperature sensor resolution8. The ultimately chosen mathematical SST model is the sum of a polynomial function of order one, meant to capture non-diurnal variability, and the sum of three pairs of cosine and sine functions at harmonic frequencies of the diurnal frequency, meant to capture diurnal variability. The error variance estimates are subsequently propagated through the least squares method to derive standard uncertainties of the model parameter and of the SST estimates. The parameters of the model and of the fitting method have been chosen by analyzing two limited subsets of the global observational dataset. The choices made aim at minimizing both the mean square error calculated from the residuals of the fits and the estimates of the error variance of the observational process model. Ultimately, the variables added to the existing hourly dataset of drifter positions and velocities consist of SST diurnal hourly estimates, SST non-diurnal hourly estimates, and total SST hourly estimates. Each of these estimates is accompanied by its respective standard error estimates and specifically devised quality flag. Global statistics from our estimates indicate that the square root of the typical error variance is 0.031 °C for drogued drifters and 0.036 °C for undrogued drifters, and that the typical standard error for SST estimates is 0.016 °C for drogued drifters and 0.019 °C for undrogued drifters. There exist, however, marked geographical differences for these values across the world’s ocean. The magnitudes of these uncertainties are an order of magnitude smaller than previously estimated measurement uncertainties for drifting buoys9 because of differences of methods.

Methods

The methods described in this paper define four levels of SST data denoted Level-0,1,2,3, as explained in the following sections:

  • Level-0 corresponds to the original, temporally unevenly distributed data as reported by the SST sensor and transmitted to the GDP DAC by Service Argos system or by the Lagrangian Drifter Laboratory at Scripps Institution of Oceanography;

  • Level-1 data result from the application of initial processing and quality-controls to Level-0 data;

  • Level-2 corresponds to SST estimates at the same unevenly distributed times as Level-1;

  • Level-3 corresponds to SST estimates at a regular hourly interval, at the top of each hour. Level-3 estimates are obtained at the same times as the position and velocity estimates for drifters of the of the GDP hourly dataset, release 1.04c5. The dataset of drifter position and velocity estimates augmented with SST estimates is release 2.00.

This paper describes the derivations of Level-1,2,3 datasets and announces the release of the Level-3 data as a product.

Data collation

In its basic configuration, a standard SVP drifter (from the Surface Velocity Program of the World Ocean Circulation Experiment) is composed of a surface float tethered to a holey-sock “drogue”, or sea anchor, centered at 15 m depth when deployed. As a result, the surface displacement of the float tracked by satellites is predominantly representative of oceanic velocity at 15 m10. With time, a drifter can lose its drogue and becomes “undrogued2” but still continues to transmit its position and its sensor data until it dies11. In addition to the standard SVP configuration, a number of drifters are equipped with additional sensors such as a barometer for sea-level atmospheric pressure12 or a conductivity sensor to measure salinity13 (see https://www.aoml.noaa.gov/phod/dac/deployed.html for the historical deployment log of the GDP). Yet, all drifters are equipped with a temperature sensor attached to the surface float and located at about 18 cm depth when at rest. Despite being an environmental variable of climate importance, sea surface temperature does not have a unique definition, and the depth at which a measurement is taken is crucial to interpret its value and variability. The definitions for near-surface seawater temperature from the Group for High Resolution Sea Surface Temperature (GHRSST, https://www.ghrsst.org/ghrsst-data-services/products/) would suggest to call the temperature data from surface drifters “observations of sea water temperature at a depth of 18 cm”. In the rest of this paper, for simplicity, we will refer to temperature observations from drifters, as well as temperature estimates derived from these, as SST data.

At the onset of the GDP in 1979, drifters were tracked by Argos which is both a positioning system and a data transmission system. At the beginning of the program, battery power and money were conserved by sampling location and transmitting data using different 1/3 and 2/3 schemes. As an example, data was transmitted for one day, followed by two days of no transmission, or data was transmitted for 8 hours, followed by 16 hours of no transmission14. Since 2000, this sampling scheme has been abandoned thanks to increased battery life and other technological advancements. At the same time, the number of operational satellites of the Argos constellation increased with time, so that the typical time interval between two consecutive Argos fixes reduced to between 1 and 2 hours15. However, stemming from the original sampling pattern, the GDP has continued to routinely process and interpolate the location fixes and temperature observations to produce drifter locations and temperature estimates continuously along trajectories at 6-hour intervals. The general method of interpolation, called kriging3, provides an estimate of location, or of SST, at a given time as a weighted linear combination of observations close in time (the five previous ones and the five subsequent ones in this case). Finding the optimal set of weights involves assuming a mathematical expression for a so-called structure function which is half of the variance of observation differences as a function of temporal lag. For the 6-hourly GDP product, structure functions for location or temperature are fitted regionally and in discrete time periods to observations. As such, the kriging implementations for either location or temperature differ because of the structure function employed, and estimates of location and SST are independent from each other in the sense that no location information is used to estimate SST and vice versa. Drifter velocities are subsequently computed from the 6-hour locations by 12-hour central differencing3. The GDP started to phase out the Argos positioning system in 2014 in favor of the Global Positioning System (GPS). This system provides locations with estimated O(10)-meter scale accuracy5 that are relayed almost instantly via the Iridium Satellite Communication system, along with sensor data, at regular temporal interval (typically hourly but not always), in contrast to Argos locations and transmissions. At the time of writing, the transition to GPS tracking and Iridium transmission is complete. A few drifters of the array were equipped with GPS receivers that transmitted their data via the Argos system5.

All transmitted locations and sensor data from Argos-tracked drifters were collected by Collecte Localisation Satellite (CLS) which relayed them, first in near-real time to the WMO GTS, and second to the GDP Data Assembly Center (DAC) located at the NOAA Atlantic Oceanographic and Meteorological Laboratory (AOML) in Miami, Florida. The Argos data received by the DAC are organized in messages, each associated with an Argos localization from a single Argos satellite pass, with a time stamp contained within the 10 to 20 min duration of that pass16,17. Each message may contain one or more sets of sensor observations, each set having its own sensor time which differs from the localization time, but is typically within plus or minus the pass’ duration. Sometimes, an Argos message does not contain a location and a location time but nevertheless contains some sets of observations. In this case, the DAC assigns the location and time of the previous message to these observations. The sensor observations are subsequently processed by the DAC as follows. In the case of a message containing multiple and distinct sets of observations, the median value of all observation times and SST observations are retained for that message. For some drifters with specific sampling configurations, observations may explicitly include an age, which is a time interval that needs to be subtracted from the nominal observation time to obtain the true time of observations. Next, the DAC reorganizes Argos data as a row file, one per drifter, with each row containing in its columns Argos location (latitude and longitude), Argos location time, observation time, and observation data (SST and other sensors). Observation data are originally in sensor count but are decoded and converted throughout this process to physical units according to sensor equations found within each drifter specification sheet. Note that because several Argos satellites can be within the view of a single drifter at the same time, it is possible for the same set of observations to be transmitted by a drifter to different satellites, and to be eventually repeated in the dataset collated by the DAC, but with different locations and location times. As a result of the disconnection between Argos localization and acquisition of observations, there is no strict temporal coordination of location data and sensor data for Argos-tracked drifters.

For the modern GDP drifters that relay their data through the Iridium Satellite System, the geographical location from a GPS receiver is treated as another sensor variable, like the sensor SST, and as such is not subject to the semi-aleatory transmission schedule like with the Argos system. The data are transmitted in Short Burst Data (SBD) format which contains a number of parameters that depends on the type of drifter and the manufacturer, but typically includes date and time and sensor data including GPS when available. GPS location times and sensor data times are therefore concurrent. If a GPS position is not available at an observation time, the previous position is reported with a recorded time delay. Depending on a drifter’s firmware, the GPS location sampling interval may differ from the sensor data sampling intervals. Drifter locations and sensor data are relayed in near-real time to the GDP Data Processing Center (DPC) located at the Lagrangian Drifter Laboratory (LDL) of the Scripps Institution of Oceanography and, from there, are sent the WMO GTS after decoding the GPS and sensor data according to manufacturers’ specification sheets. The drifter messages decoded by the LDL are also made available as text files to the DAC at AOML for inclusion in the GDP database. All data relayed by CLS and the GDP DPC, collated by the GDP DAC, form what we call here Level-0 data.

Pre-processing and initial quality controls

Out of the Level-0 data, we consider for the initial release of this new augmented product the SST observations from 20-Dec-1978 02:00:00 to 06-Jul-2020 22:59:31, which totals 285,886,818 data tuples of SST values and observation times. We apply a number of pre-processing and quality control procedures to these data to form the Level-1 data. These initial procedures, as well as all subsequent estimation methods, are applied to all drifters irrespective of their tracking and data transmission systems. As we saw for Argos drifters, the nominal sampling patterns for drifter SST and location acquisition are essentially independent. Note that at this stage, as described in the previous section, an approximate or “raw” geographical location with varied, or unknown, uncertainty is associated with a SST data point.

The GDP DPC and DAC harvest drifter deployment sheets filled out at sea by operators, as well as conduct a number of diagnostics based on location and sensor data, in order to maintain a directory file at the DAC. This directory file lists the dates, times, and locations of trajectory starts, the dates, times, and locations of trajectory ends (i.e. drifter deaths), and the estimated dates and times of drogue losses11,18. The start and death dates and times are used to truncate if needed the SST time series for data points before oceanic deployment and after oceanic death (“post-death” sensor data may exist in the transmitted data for example if a drifter had been picked up by a vessel or run aground but continued to transmit its sensor data). Next, we further determine blocks of time for which SST observations are deemed valid by applying a quality-control step that has been in place as part of the production of the 6-hourly SST dataset. The NOAA Optimum Interpolation (OI) SST V219 at monthly time steps is used to calculate a monthly climatology which is subsequently interpolated to the raw locations associated with the drifter SST observations. These interpolated values are then visually compared to drifter SST observations to determine a first and a last “good” SST observation per drifter trajectory, based on an expert human assessment. The corresponding dates and times of these two points are recorded in a dedicated master file for all drifters. Another master file is maintained by the DAC that lists periods of time for which it appears that an SST sensor failed temporarily, also based on the comparison to climatological values. These potential periods of sensor failure, and the periods before and after the first and last good points, are subsequently discarded from the SST observation time series. Note that this comparison to a climatology is not used to remove seemingly outlying individual points, but rather to determine blocks of invalid SST observations. After this stage, we remove some data records contain filling values for missing data points. Next, we find two or more SST observations existing at the same time for a single drifter. In that case, all observations are kept and will be processed for obtaining Level-1 estimates at the same times. Next, we identify a number of drifter SST time series with only a single point, which are removed, and constant-value time series which are also removed. In the end, the final Level-1 dataset consists of 197,916,695 tuples of SST and time data originating from 24,597 drifter trajectories.

As a result of the differing technologies of the data transmission systems (Argos and Iridium), of the number of different drifter manufacturers for the GDP, and of changing firmwares with time, the Level-1 dataset of time series of SST observations is heterogeneous in its sampling intervals and apparent levels of noisiness. Two-dimensional histograms of occurrences of time differences and absolute temperature differences between two subsequent data points for all trajectories (Fig. 1) generally indicate that larger absolute SST differences are found for smaller time differences, for both Iridium and Argos drifters. For Iridium drifters, time differences are concentrated around multiples of one hour or 30 min, but with deviations from these because of possible delays of GPS signal acquisitions compared to a specified schedule. The distribution of time differences for Argos drifters is more continuous but exhibits local peaks near one hour and 101 min, the latter corresponding to the orbital period for an Argos satellite5. Even when considering the distribution of the median of time differences (or sampling interval) per trajectory, the Argos drifters exhibit a much more varied set of values compared to the Iridium drifters (Fig. 2).

Fig. 1
figure 1

Distribution of time differences and absolute temperature differences between two consecutive SST observations of the Level-1 data. (a) Two-dimensional histogram for Iridium drifters, and (b) Two-dimensional histogram for Argos drifters. Only values of Δt less than 24 hours and values of ΔSST less than  40 °C are shown.

Fig. 2
figure 2

Histograms in 3-minute bins of median SST temporal sampling intervals per drifter trajectory for Level-1 data (24,597 SST time series from 4,495 Iridium drifters and 20,102 Argos drifters). Only values smaller than 10 hours are displayed. 105 Argos time series have a median sampling larger than 10 hours.

Model of SST temporal evolution

We seek to obtain SST estimates by fitting a local temporal model to the temperature data acquired along drifter trajectories. In situ observations of SST from drifting or moored platforms, as well as remote sensing observations, suggest that two types of temporal variability typically co-exist: a relatively fast evolution on a diurnal time scale, sometimes referred to as a diurnal warming, and a relatively slower background evolution. For this background evolution (also referred to as non-diurnal in the rest of this manuscript), there is a priori no expectation of a dominant physical process acting at all times and all places. As such, it is reasonable to model this evolution with a local polynomial model as an approximation of a Taylor series expansion of an unknown underlying function20. For the diurnal evolution, we follow a number of previous studies21,22,23,24 and model this evolution as the sum of cosine functions with fundamental frequency ω = 2π radians per day. In contrast to some previous studies however, our diurnal model is exactly periodic in the sense that it is locally zero-mean. A mean SST value and a possible difference of SST between the beginning and the end of a diurnal period will be both captured by the background non-diurnal polynomial model (as it will be at least of order 1). In addition, the amplitudes and phases of each of the cosine functions contributing to the diurnal model are not constant within a day, but rather vary locally since they are fitted at every time step using data within a sliding window centered on that time step. This diurnal model allows us to accommodate various environmental conditions (e.g. momentum and heat fluxes) affecting the shape of the diurnal signal in time and space as a drifter is advected by ocean currents. Note that since the diurnal SST estimate is locally zero-mean and does not represent solely a diurnal warming, the contemporaneous non-diurnal SST estimate differs from what is called a foundation temperature, that is a temperature free of diurnal temperature variability. In other words, the non-diurnal SST estimate typically contains the local mean of the SST diurnal variability.

In summary, the complete SST model is the sum of a polynomial sP of order P, and a sum sD of N cosine functions at harmonic frequencies of the diurnal frequency ω = 2π radians per day. Next, we consider that a number of drifter SST observations si, found in the temporal vicinity of a single observation sk at time tk, are generated by the process described by the equation

$${s}_{i}={s}_{m}({t}_{i};{t}_{k})+{\sigma }_{i}{\varepsilon }_{i},$$
(1)

where \({s}_{m}({t}_{i};{t}_{k})\) is the SST model and the noise model, εi, is zero-mean, has unit variance [\(E({\varepsilon }_{i})=0\), \({\rm{Var}}({\varepsilon }_{i})=1\)], and is independent of the noise at other times. The noise is locally scaled by the square root of \({\sigma }_{i}^{2}\) which is the error variance of the observations, is conditional to time ti, and will be estimated a posteriori.

The temporal evolution model is

$${s}_{m}({t}_{i};{t}_{k})={s}_{P}({t}_{i};{t}_{k})+{s}_{D}({t}_{i};{t}_{k})$$
(2)
$$=\mathop{\sum }\limits_{p=0}^{P}{s}_{p,k}{({t}_{i}-{t}_{k})}^{p}+\mathop{\sum }\limits_{n=1}^{N}{A}_{n,k}\,{\rm{\cos }}\;\left[n\omega ({t}_{i}-{t}_{k})+{\phi }_{n,k}\right]$$
(3)
$$=\mathop{\sum }\limits_{p=0}^{P}{s}_{p,k}{({t}_{i}-{t}_{k})}^{p}+\mathop{\sum }\limits_{n=1}^{N}\left[{\alpha }_{n,k}\;{\rm{\cos }}\;n\omega ({t}_{i}-{t}_{k})+{\beta }_{n,k}\,{\rm{\sin }}\;n\omega ({t}_{i}-{t}_{k})\right],$$
(4)

with

$${\alpha }_{n,k}={A}_{n,k}\,{\rm{\cos }}\;{\phi }_{n,k},$$
(5)
$${\beta }_{n,k}=-{A}_{n,k}\,{\rm{\sin }}\;{\phi }_{n,k}.$$
(6)

The last form (4) of the model shows that the P + 1 + 2N parameters of this model can be estimated by forming a linear system of equations. Ultimately, once the model parameters are estimated, the SST estimate itself at time tk is evaluated by setting t = tk in (4) to obtain:

$${\widehat{s}}_{m,k}\equiv {s}_{m}({t}_{k};{t}_{k})={s}_{0,k}+\mathop{\sum }\limits_{n=1}^{N}{\alpha }_{n,k},$$
(7)

which involves only N + 1 parameters of the P + 1 + 2N estimated parameters. The other N + P parameters nevertheless provide further physical information such as the SST tendency for the non-diurnal evolution (e.g. \({s}_{1,k}=\partial {s}_{P}({t}_{k};{t}_{k})/\partial t\) if P ≥ 1) or the phase and amplitude of the diurnal harmonics:

$${\phi }_{n,k}={\rm{\arctan }}\left(\frac{-{\beta }_{n,k}}{{\alpha }_{n,k}}\right),$$
(8)
$${A}_{n,k}=\frac{{\alpha }_{n,k}}{{\rm{\cos }}\;{\phi }_{n,k}}.$$
(9)

Ultimately, we will select P = 1 and N = 3 on the basis of the analyses of two subsets of surface drifters, as explained in the section Model selection. As explained further in the next section, the model is first fitted at all original observation times of a drifter trajectory in a iterative manner in order to gradually adjust the weight of the data in the estimation, as well as identify outlier data points. After a given number of iterations, the model is ultimately fitted once at regular, top-of-the-hour, times that do not typically coincide with the original times.

Estimation of model parameters and SST

The devised method to estimate SST continuously along a drifter trajectory is adapted from the method known as the locally weighted scatterplot smoothing or LOWESS7. This method is iterative, and thus robust to outlying data points which are commonly observed in SST time series from surface drifters (see an example in Fig. 3). Our method goes as follows. For a given SST time series from a single drifter, for each SST observation sk at time tk, we compute by weighted least squares the P + 1 + 2N parameters of the model sm that minimize

$$\mathop{\sum }\limits_{i=1}^{K}{\left[{s}_{i}-{s}_{m}({t}_{i};{t}_{k})\right]}^{2}{K}_{{h}_{k},i},$$
(10)

where \({K}_{{h}_{k},i}\), is a set of weights given by

$${K}_{{h}_{k},i}=K\left(\frac{{t}_{i}-{t}_{k}}{{h}_{k}}\right),$$
(11)

with K the tricube kernel function7:

$$K(\tau )={\left(1-| \tau {| }^{3}\right)}^{3}{I}_{\left[-1,1\right]}(\tau ),\quad {\rm{with}}\,\;{I}_{[-1,1]}(\tau )=\left\{\begin{array}{ll}1, & | \tau | \le 1\\ 0, & | \tau | > 1.\end{array}\right.$$
(12)
Fig. 3
figure 3

Time series of SST data for GDP drifter ID 55366 (WMO number 3100541). This drifter was built by Pacific Gyre and is of the Surface velocity Program (SVP) type, tracked by the Argos positioning system. The median time interval between SST observations for this drifter is about 52 min. The SST equation for this drifter is SST(°C)  =  0.05 × n - 5.00 where n is a 10-bit sensor count. This equation defines the data resolution (0.05°C) as well as the minimum value (0.05 × 0–5.00 = −5.00° C) and maximum value (0.05 × (210−1)−5.00 = 46.15° C) that should be returned by the temperature sensor, indicated by the horizontal dashed lines on this figure. The Level-1 data (sk) are indicated by gray dots. The circled dots are the data points that are ultimately down-weighed to zero by the iterative estimation method, and thus flagged as outliers (δk = 0).

In (11), hk is called the bandwidth of the kernel K, that is the half-width of the temporal window around the observation time tk within which the weights \({K}_{{h}_{k},i}\) are different from zero. The least squares calculation therefore involves practically only those data points with non-zero weights. Because the complete model sm includes a diurnal oscillation, we initially set hk = 1 day for all points, but this value is automatically and gradually increased as needed in 1-hour steps in order to include more data points to ensure that the least squares system of equations is not undetermined, up to a maximum value of 2 days. If not enough data points are available within the temporal window of maximum length, then no SST estimate is obtained. For the data selected to match the GDP hourly dataset version 1.04c (released in February 2021, with data through June 2020), fewer than 0.4% of the data points require a half-bandwidth longer than 1 day.

Using matrix notation for convenience, the minimization problem (10) can be written

$$\mathop{{\rm{\min }}}\limits_{{\boldsymbol{\beta }}}{({\bf{s}}-{\bf{X\beta }})}^{T}{\bf{W}}({\bf{s}}-{\bf{X\beta }}),$$
(13)

with solution

$$\widehat{{\boldsymbol{\beta }}}={({{\bf{X}}}^{T}{\bf{WX}})}^{-1}{{\bf{X}}}^{T}{\bf{Ws}}.$$
(14)

In (13), X is the design matrix for linear model (4):

$${\bf{X}}=\left[\begin{array}{lll}{{\bf{X}}}_{1} & {{\bf{X}}}_{2} & {{\bf{X}}}_{3}\end{array}\right],$$
(15)

with

$${{\bf{X}}}_{1}=\left[\begin{array}{llll}1 & ({t}_{1}-{t}_{k}) & \cdots & {({t}_{1}-{t}_{k})}^{P}\\ \vdots & \vdots & & \vdots \\ 1 & ({t}_{i}-{t}_{k}) & \cdots & {({t}_{i}-{t}_{k})}^{P}\\ \vdots & \vdots & & \vdots \\ 1 & ({t}_{K}-{t}_{k}) & \cdots & {({t}_{K}-{t}_{k})}^{P}\\ & & & \end{array}\right],$$
(16)
$${{\bf{X}}}_{2}=\left[\begin{array}{llll}{\rm{\cos }}\{\omega ({t}_{1}-{t}_{k})\} & {\rm{\cos }}\{2\omega ({t}_{1}-{t}_{k})\} & \cdots & {\rm{\cos }}\{N\omega ({t}_{1}-{t}_{k})\}\\ \vdots & \vdots & & \\ {\rm{\cos }}\{\omega ({t}_{i}-{t}_{k})\} & {\rm{\cos }}\{2\omega ({t}_{i}-{t}_{k})\} & \cdots & {\rm{\cos }}\{N\omega ({t}_{i}-{t}_{k})\}\\ \vdots & \vdots & & \\ {\rm{\cos }}\{\omega ({t}_{K}-{t}_{k})\} & {\rm{\cos }}\{2\omega ({t}_{K}-{t}_{k})\} & \cdots & {\rm{\cos }}\{N\omega ({t}_{K}-{t}_{k})\}\\ & & & \end{array}\right],$$
(17)
$${{\bf{X}}}_{3}=\left[\begin{array}{llll}{\rm{\sin }}\{\omega ({t}_{1}-{t}_{k})\} & {\rm{\sin }}\{2\omega ({t}_{1}-{t}_{k})\} & \cdots & {\rm{\sin }}\{N\omega ({t}_{1}-{t}_{k})\}\\ \vdots & \vdots & & \\ {\rm{\sin }}\{\omega ({t}_{i}-{t}_{k})\} & {\rm{\sin }}\{2\omega ({t}_{i}-{t}_{k})\} & \cdots & {\rm{\sin }}\{N\omega ({t}_{i}-{t}_{k})\}\\ \vdots & \vdots & & \\ {\rm{\sin }}\{\omega ({t}_{K}-{t}_{k})\} & {\rm{\sin }}\{2\omega ({t}_{K}-{t}_{k})\} & \cdots & {\rm{\sin }}\{N\omega ({t}_{K}-{t}_{k})\}\\ & & & \end{array}\right].$$
(18)

The weight matrix W is defined by

$${\bf{W}}={\rm{diag}}\left\{{K}_{{h}_{k},i}\right\},$$
(19)

and s and β are the vector of data points and the vector of dimension P + 1 + 2N of parameters to be estimated, respectively:

$${\bf{s}}=\left[\begin{array}{l}{s}_{1}\\ \vdots \\ {s}_{K}\end{array}\right],\beta =\left[\begin{array}{l}{s}_{0,k}\\ \vdots \\ {s}_{P,k}\\ {\alpha }_{1,k}\\ \vdots \\ {\alpha }_{N,k}\\ {\beta }_{1,k}\\ \vdots \\ {\beta }_{N,k}\end{array}\right].$$
(20)

Next, following the initial iteration of estimating the model parameters and calculating the corresponding SST estimates at all times tk, we consider the residuals at all observation times:

$${r}_{k}={s}_{k}-{\widehat{s}}_{m,k},$$
(21)

and compute M, the median value of the distribution of their absolute values. A set of robust weights are next calculated as

$${\delta }_{k}=B\left(\frac{{r}_{k}}{DM}\right),$$
(22)

where

$$B(t)={\left(1-| t{| }^{2}\right)}^{2}{I}_{[-1,1]}(t),$$
(23)

is the biweight kernel function7 and D is a real factor to be determined.

The next step of the method consists in iterating the weighted least squares estimation of all parameters of the model at all times tk, but this time using modified weights \({\delta }_{i}{K}_{{h}_{k},i}\) instead of \({K}_{{h}_{k},i}\) in (10). How many data points are down-weighted is dependent on the coefficient D in the denominator of (22) which is typically set to 67 but here is set to 14, as discussed in the section Model selection. The number of iterations is chosen here to be three after the initial least squares estimation without modified weights. The modified weights can effectively become zero when δk = 0, that is when the absolute value of a residual is larger than D times M for a given SST time series associated with one drifter trajectory. This implies that such data points are ultimately not used for any estimation but SST values at the corresponding time are nevertheless obtained using all available non-zero-weighted data points within the temporal window centered on any of these points. This method effectively flags as outliers some of the Level-1 SST data point like a “de-spiking” procedure would do, for example by applying a median filter3. An example of flagged outliers in a Level-1 drifter SST time series is shown in Fig. 3. One implicit assumption of using (22) to modify the weights of the data is that all residuals from a given drifter SST time series originate from a common distribution, or equivalently that the statistics of the observations are constant within a given trajectory. This assumption may be violated if an entire drifter trajectory is long enough to experience environmental condition changes, or the characteristics of the SST sensor changes in an undetected fashion. An illustration of a Level-2 estimation step is provided in Fig. 4.

Fig. 4
figure 4

Time series of SST estimates for GDP drifter ID 55366 (WMO number 3100541) between 2005/9/1 and 2005/9/10. Top panel: Black dots are the original SST data (Level-1, sk) and circles are the data points down-weighed to zero (δk = 0). The blue dots with vertical lines show the total SST estimates and their plus or minus two standard errors (\({\widehat{s}}_{m}\pm 2{\widehat{\sigma }}_{m}\)). The red dots and vertical lines show the non-diurnal SST estimates and their plus or minus two standard errors (\({\widehat{s}}_{P}\pm 2{\widehat{\sigma }}_{P}\)). The blue estimates are the sum of the red estimates and purple estimates shown in the lower panel of the figure. Lower panel: Black dots show SST data minus the non-diurnal SST estimate (\({s}_{k}-{\widehat{s}}_{P}\)). The purple dots and vertical lines show the corresponding diurnal SST estimates and their plus or minus two standard errors (\({\widehat{s}}_{D}\pm 2{\widehat{\sigma }}_{D}\)).

Finally, as the last step of the method, the SST model (4) with the same number of parameters is fitted to the data but at times tk corresponding to the top of the hour UTC (00:00, 01:00, etc.), in one iteration with weights \({\delta }_{i}{K}_{{h}_{k},i}\) where the δi were calculated prior to the last iteration for the original data times (not posterior). As before, the bandwidth is set to 1 day but is allowed to increase in increments of one hour, up to two days, to make sure the linear estimation problem is not under-determined. This last step generates the final Level-3 data product. An illustration of Level-3 estimated data is provided in Fig. 5.

Fig. 5
figure 5

Time series of continuous hourly SST estimates for GDP drifter ID 55366 (WMO number 3100541) between 2005/9/1 and 2005/9/10. Top panel: Black dots show the original SST data (Level-1, sk). The blue line and shaded region show continuously the hourly total SST estimates and twice their standard errors (\({\widehat{s}}_{m}\pm 2{\widehat{\sigma }}_{m}\)). The red line and shaded region show continuously the hourly non-diurnal SST estimates and twice their standard errors (\({\widehat{s}}_{m}\pm 2{\widehat{\sigma }}_{P}\)). The blue line is the sum of the red line and the purple line shown in the lower panel. Lower panel: Black dots show SST data minus the non-diurnal SST estimate (\({s}_{k}-{\widehat{s}}_{P}\)). The purple line and shaded region show continuously the hourly diurnal SST estimates and twice their standard errors (\({\widehat{s}}_{D}\pm 2{\widehat{\sigma }}_{D}\)).

Error variance estimates and uncertainty estimates

As part of the method, we quantify the uncertainties of the parameter estimates and thus the uncertainties of the diurnal SST estimates, of the non-diurnal SST estimates, and of the total SST estimates. Formally, the covariance matrix of the weighted least squares solution at time tk is

$${{\bf{C}}}_{\beta }\equiv {\rm{Var}}\left(\widehat{\beta }\right)={({{\bf{XW}}}^{* }{\bf{X}})}^{-1}({{\bf{X}}}^{T}{{\bf{W}}}^{* }\Sigma {{\bf{W}}}^{* }{\bf{X}}){({{\bf{XW}}}^{* }{\bf{X}})}^{-1},$$
(24)

where W* is the weight matrix containing in its diagonal the modified weights \({\delta }_{i}{K}_{{h}_{k}}({t}_{i}-{t}_{k})\) from the penultimate iteration of the least squares estimation, and Σ is the unknown covariance matrix of the observation errors from the process model (1). In order to proceed, we assume local homoscedasticity and that the errors are independent which results in \({\boldsymbol{\Sigma }}={\sigma }^{2}({t}_{k}){\bf{I}}\), where the local error variance \({\sigma }^{2}({t}_{k})\) is unknown and needs to be estimated. In the case of a local polynomial regression of order P, it is recommended20 to re-conduct a polynomial fit of order P + 2 and to estimate the error variance from the residuals of that fit. In our case, which is not a sole polynomial regression since model (2) also includes trigonometric functions, the optimal course of action is unclear. Yet, to proceed, we classically calculate a first estimate of the error variance from the normalized weighted residual sum of squares:

$${\widehat{\sigma }}_{1}^{2}({t}_{k})=\frac{{\left({\bf{s}}-{\bf{X}}\widehat{\beta }\right)}^{T}{{\bf{W}}}^{* }\left({\bf{s}}-{\bf{X}}\widehat{\beta }\right)}{{\rm{tr}}\left\{{{\bf{W}}}^{* }-{{\bf{W}}}^{* }{\bf{X}}{({{\bf{X}}}^{T}{{\bf{W}}}^{* }{\bf{X}})}^{-1}{{\bf{X}}}^{T}{{\bf{W}}}^{* }\right\}}$$
(25)
$$=\frac{{\Sigma }_{i}{\left[{s}_{i}-{\widehat{s}}_{m}({t}_{i};{t}_{k})\right]}^{2}{\delta }_{i}{K}_{{h}_{k}}({t}_{i}-{t}_{k})}{\nu }.$$
(26)

The denominator of (25), referred to as ν in (26), is the effective number of degrees of freedom for the residuals for weighted least squares20. For ordinary least squares, ν would simply be the number of data points used to calculate \(\widehat{\beta }\) minus the number of parameters of the model (P + 1 + 2N), but for weighted least squares cases, ν is smaller.

The majority of drifters from the GDP database are equipped with temperature sensors returning a bit count n used to calculate SST following the sensor equation:

$$SST=an+b,$$
(27)

where a is the resolution of the temperature sensor. As such, the Level-1 data should be rounded due to the resolution of the instrument recording. In the signal processing literature this is known as quantization, and has the effect of removing high resolution information in the data. As a result, the estimated error variance [\({\widehat{\sigma }}_{1}^{2}({t}_{k})\), (25)] should be increased to reflect the additional uncertainty created through quantization, as this information cannot be recovered. In the extreme case that the input data is the same value within a full window length then the increase to the error variance is a2/128, following from the properties of the uniform distribution. As a result, adjusting for resolution, our total error variance is

$${\widehat{\sigma }}^{2}({t}_{k})={\widehat{\sigma }}_{1}^{2}({t}_{k})+\frac{{a}^{2}}{12}.$$
(28)

This adjustment is conservative, in that the effect of resolution on the error variance will decrease as the input values have more variance8. For simplicity, we use the conservative adjustment proposed above as this ensures the reported standard errors always include the resolution effect which should not be ignored.

For about 85% of the drifters, representing 83% of the Level-2 estimates, the resolution a can be obtained from the individual specification sheets provided by the manufacturers. We identify in this way 179 different resolutions, ranging from 0.00260877 °C to 0.17 °C. The three most common resolutions are 0.01 °C, 0.05 °C, and 0.08 °C. For the remaining 15% of the drifters, some have an SST equation which is not a linear function of a sensor single bit count and the impact of the quantization error cannot be simply modeled as in (28). Some other drifters have an unknown resolution because of the lack of available metadata. For these drifters, we estimate the resolution from the data as follows: we consider the time series of absolute SST temporal difference, bin these differences in 0.001 °C bins, and assign the resolution to the most common value that is not zero. In this way, the three most commonly estimated resolutions are 0.05 °C, 0.08 °C., and 0.043 °C. This method is successful in 92% of cases when tested on the data of the drifters for which the resolution is known from the metadata. The overall distribution of all resolution values, as well as their temporal distribution, are illustrated in Fig. 6.

Fig. 6
figure 6

Distributions of drifter SST sensors resolution. (a) Temporal distribution of drifter SST resolution in the GDP database from February 1979 to July 2020. The blue points corresponds to resolution a obtained from the drifter metadata from the SST equation: SST(°C) = a × n + b where n is a bit sensor count. The red points corresponds to drifters for which the resolution is not available from the metadata and was estimated directly from the data (see text). (b) Histogram of drifter SST resolution values in 0.001 °C bins. The red bars correspond to the estimated resolution values and the gray bars correspond to all values. Note that the horizontal axis is on a log scale. The three most common resolution values in the dataset are in order: 0.05, 0.01, and 0.08 °C.

We found necessary to consider the resolution error for two reasons. First, since we have allowed our estimation algorithm to obtain a solution with as few data points as the number of model parameters to be estimated, and because of numerical precision errors, we find a small number of instances (0.33% of the Level-2 data) for which ν, and therefore the first estimated error variance \({\widehat{\sigma }}_{1}^{2}({t}_{k})\), is small and negative. These instances are resolved by adding the resolution error. Second, in some other instances (0.22% of the Level-2 results, see Fig. 7), we find that the residuals, and hence the estimated error variance and the parameter uncertainties, are locally zero within numerical precision despite an ample number of data points available for the estimation. This occurs when the Level-1 SST data does not change in value within the estimation window for reasons which are not clear. Once again, these instances are resolved by adding the resolution error, resulting in more realistic error estimates. Nevertheless, these two instances define two populations of the results that are clearly separated within a two-dimensional histogram of \({\widehat{\sigma }}_{1}^{2}({t}_{k})\) and ν (Fig. 7). As such, we can flag these results using an empirical and ad-hoc condition:

$${{\rm{\log }}}_{10}\,{[\widehat{{\sigma }_{1}^{2}}]}^{1/2} < -\frac{1}{2}{{\rm{\log }}}_{10}| \nu | -10.$$
(29)
Fig. 7
figure 7

Two-dimensional histogram of the effective degrees of freedom for the residuals (ν) and estimates of the error variance neglecting the resolution error variance [\({\widehat{\sigma }}_{1}^{2}({t}_{k})\)] for Level-2 data results. The two populations found below the black dashed line (\({{\rm{\log }}}_{10}{[{\widehat{\sigma }}_{1}^{2}]}^{1/2} < -\frac{1}{2}{{\rm{\log }}}_{10}| \nu | -10\)) correspond to failed estimations of the error variance and are flagged with quality flag 1. The left population below the dashed line (0.33% of the data) corresponds to negative estimated variance (see text). The right population below the dashed line (0.22% of the data) corresponds to uncharacteristically flat SST records leading to unrealistic near-zero estimated error variances. The peak of the distribution is found within the upper-right population for ν ≈ 22.4 and \({[{\widehat{\sigma }}_{1}^{2}({t}_{k})]}^{1/2}\approx 0.02{8}^{\circ }\)C.

The final error variance estimate \({\widehat{\sigma }}^{2}({t}_{k})\) (28) is a function of time tk and specific to a drifter because of the sensor resolution. This estimate is subsequently used to calculate an estimate of the local covariance matrix of the observations \(\widehat{{\boldsymbol{\Sigma }}}={\widehat{\sigma }}^{2}({t}_{k}){\bf{I}}\) and to calculate the covariance matrix Cβ [expression (24)].

From the expression for the SST estimate (7), its variance is

$${\sigma }_{m}^{2}\equiv {\rm{Var}}\left[{s}_{m}({t}_{k};\,{t}_{k})\right]={\rm{Var}}\left[{s}_{0,k}+{\Sigma }_{n=1}^{N}{\alpha }_{n,k}\right]$$
(30)
$$=\;{\rm{Var}}\left[{s}_{0,k}\right]+2{\rm{Cov}}\left[\left({s}_{0,k}\right),\left({\Sigma }_{n=1}^{N}{\alpha }_{n,k}\right)\right]+{\rm{Var}}\left[{\Sigma }_{n=1}^{N}{\alpha }_{n,k}\right]$$
(31)
$$=\;{\sigma }_{P}^{2}+2{\rm{Cov}}\left[\left({s}_{0,k}\right)\left({\Sigma }_{n=1}^{N}{\alpha }_{n,k}\right)\right]+{\sigma }_{D}^{2}.$$
(32)

This last expression describes how the variance of the total SST estimate (\({\sigma }_{m}^{2}\)) is the sum of the variance of the non-diurnal SST estimate (\({\sigma }_{P}^{2}\), containing one term), of the variance of the diurnal estimate (\({\sigma }_{D}^{2}\), containing N2 terms), and of 2N additional cross-covariance terms. The (N + 1)2 needed terms to estimate \({\sigma }_{m}^{2}\), \({\sigma }_{P}^{2}\), and \({\sigma }_{D}^{2}\) are extracted and summed appropriately from the calculated covariance matrix Cβ [expression (24)] at each time step. The square root of each of these three estimated variances, referred to subsequently as \({\widehat{\sigma }}_{m}\), \({\widehat{\sigma }}_{P}\), and \({\widehat{\sigma }}_{D}\), define the standard errors, or standard uncertainties, of the three SST estimates. Illustrations of estimated square roots of error variances and SST uncertainties is provided in Figs. 4, 5, and 8. These figures, and Fig. 8 in particular, illustrate that the uncertainty estimates are temporally correlated for each individual drifter. Because the estimation procedure takes place within a sliding window, one should expect correlation at least among uncertainty estimates separated in time by less than the total length of the window (or twice the bandwidth hk, typically 2 days). Additional discussions of error variance and uncertainty estimates are provided in section Interpretation of uncertainty estimates.

Fig. 8
figure 8

Time series for GDP drifter ID 55366 (WMO ID 3100541) between 2005/9/1 and 2005/9/10 of SST standard error estimates (\({\widehat{\sigma }}_{m}\)), non-diurnal SST standard error estimates (\({\widehat{\sigma }}_{P}\)), diurnal SST standard error estimates (\({\widehat{\sigma }}_{D}\)), square root of error variance estimates from residuals [\(\sqrt{{\widehat{\sigma }}_{1}^{2}}\), Eq. (25)], square root of total error variance estimates [\(\sqrt{{\widehat{\sigma }}^{2}}\), Eq. (28)], and absolute residuals (\(| {\widehat{s}}_{m}-{s}_{k}| \)). The curves for \(\sqrt{{\widehat{\sigma }}_{1}^{2}}\) and \(\sqrt{{\widehat{\sigma }}^{2}}\) are most often indistinguishable except around 09/03 and 09/09-09/10.

Model selection

In order to fit the total SST model to the data, choices need to be made for the order P of the polynomial of the non-diurnal model and the number N of harmonics of the diurnal model. We consider a total of 14 models with P varying between 0 and 3, and N between 2 and 6 (Table 2). We test and assess the performances of the models by fitting them to two limited subsets of GDP drifters as it would be computationally prohibitive to conduct tests on the entire Level-1 data.

Table 2 Table of temporal models considered for SST as a function of polynomial order and number of diurnal harmonics.

The first subset is from the Salinity Processes in the Upper Ocean Regional Study (SPURS) in the subtropical North Atlantic13,25. The drifters released as part of SPURS were manufactured by Pacific Gyre Inc. but differed from standard SVP-type drifters of the GDP10. Instead of a temperature sensor on their buoys, these drifters were equipped with an unpumped Sea-Bird Electronics SBE37-SI MicroCAT CTD placed underneath the surface buoy with its sensors located at a depth of 50 cm. The Microcat instruments were set to acquire conductivity and temperature at 30-min intervals by sampling once a minute for 5 min and averaging the values. According to the manufacturer, the initial accuracy and resolution of the temperature sensor are 0.002 °C and 0.0001 °C respectively, but the data transmitted and relayed to the DAC exhibit a resolution of 0.01 °C. For this study, we select 80 drifters which generated temperature data (considered to be SST observations) for time periods spanning between 29 and 660 days. These drifters transmitted their locations and sensor data via the Argos satellite system, including their position data from GPS receivers. These GPS data were previously used as a test set to devise the methodology being used to generate the global dataset of hourly position and velocity for the GDP5. However, here, the original Argos message data files for these drifters are re-processed to eliminate redundant and corrupted data by taking into consideration a previously ignored checksum flag indicating the integrity of Argos data transmissions. Next, the SST time series are further truncated to match the beginnings and ends of the regular hourly time series of position and velocity for these drifters, as well as truncated for their first and last good data points as diagnosed by the QC procedures of the DAC. The resulting dataset consists of 80 time series of SST at uneven temporal intervals (multiple of 30 minutes), totalling nearly 1.26 M data points over 29,018 drifter days.

The second subset of drifter SST data, hereafter referred to as the “test” subset, is built from the global database by selecting at random 14 drifters within each 10° latitude band between −70°S and 70°N with an average SST temporal sampling interval of between 50 and 70 minutes, resulting in a total of 98 individual SST time series which are further truncated in time for deployment times etc. The resulting dataset consists of 98 time series of SST at uneven temporal intervals, totalling 697,045 data points over 29,408 drifter days. These test drifters constitute a limited subset but represent a variety of drifter types deployed between years 2000 and 2019. Fifty of them are drifters with barometer (SVPB type), and 48 are standard SVP drifters. Fifty-seven of them were Iridium drifters and 41 Argos drifters. The test drifters were built by a variety of manufacturers: 18 by DBi, 19 by Metocean, 9 by Clearwater, 30 by Pacific Gyre, 18 by Scripps Institution of Oceanography, 2 by Technocean, 1 by Marlin-Yug, and 1 by NKE. Finally, the stated resolution of their SST sensors as specified by their respective specification sheets were varied: 0.01 °C for 68 of them, 0.05 °C for 20, 0.04 °C for 2, 0.043 °C for 2, 0.04329 °C for 1, 0.04343 °C for 1, 0.08 °C for one, and unknown for three of them.

We proceed to fit the 14 models listed in Table 2 to the SPURS and test subsets of drifter SST time series, and subsequently consider two statistics calculated for each time series. The first statistic is the weighted root mean square error (WRMSE) calculated from the residuals of a given fit. For this calculation, the weights are the robust weights calculated by the algorithm described previously after the penultimate iteration (that is the weights calculated before the last estimation at the original times), but with a further normalization to ensure that weights sum to one:

$${\rm{WRMSE}}={\left[\mathop{\sum }\limits_{k=1}^{M}{w}_{k}{({s}_{k}-{\widehat{s}}_{m,k})}^{2}\right]}^{1/2},$$
(33)
$${w}_{k}=\frac{{\delta }_{k}{K}_{k,k}}{{\sum }_{i=1}^{M}{\delta }_{i}{K}_{i,i}}.$$
(34)

The second statistic considered is the square root of the weighted median of the error variance estimates: after the final iteration, we consider the error variance estimates \({\widehat{\sigma }}^{2}({t}_{k})\) given by (28) and using the weights defined by (34), we calculate the weighted median defined as the value \({\widehat{\sigma }}^{2}({t}_{n})\) such that

$$\mathop{\sum }\limits_{k=1}^{n-1}{w}_{k}\le 1/2\,{\rm{and}}\;\mathop{\sum }\limits_{k=n+1}^{M}{w}_{k}\le 1/2.$$
(35)

For these two statistics, using weighted calculations effectively filters out the outlier data points diagnosed from the method (that for which wk = 0). We also tried non-weighted calculations that include all data points: besides shifting numerical values of the results in the sense of worsening performances, this did not change the relative performances of the models nor our overall conclusions and model selection choice.

We find that varying the number of parameters of either the diurnal model or the non-diurnal model affects the two statistics differently. We present the results in Fig. 9, displaying the WRMSE and the square root of the weighted median error variance, both in unit of degrees Celsius. The figures display scatter plots of the two statistics averaged over each of the subsets, along with ellipses representing the 95% confidence intervals for the means in order to illustrate the scatter of the results. Not surprisingly, the scatter of the results is relatively smaller for the SPURS subset for which the SST records have the same nominal characteristics compared to the test subset composed of heterogeneous records.

Fig. 9
figure 9

Summary statistics for the 14 models listed in Table 2 for (a) the subset of 80 drifters from the SPURS experiment and (b) the subset of 98 “test” drifters selected from the global database. Colored dots with numbers indicate the average values of the square root of the weighted median of the error variance (horizontal axis) versus the average values of the weighted root mean square error (WRMSE, vertical axis). Ellipses correspond to 95% confidence intervals across ensemble statistics. Note the different axis ranges between panels (a) and (b). The black dotted line indicates the slope-1 intercept-0 curve.

For both data sets, we find that for a fixed number of harmonics of the diurnal model, increasing the polynomial order of the non-diurnal model (going right through the columns of Table 2) reduces the error variance with little change to the WRMSE. The most dramatic reduction occurs when going from models for which P = 0 (models 1, 2, and 3) to models for which P = 1 (models 4 and higher) for which the square root of the weighted median error variance is at least approximately halved. Conversely, for a fixed order of the polynomial non-diurnal model, we find that increasing the number of harmonics (going down the rows of Table 2) decreases the WRMSE with little change to the error variance. Considering these two general tendencies together, as well as the scatter of the results as depicted by the ellipses, we find that model 5 (P = 1 and N = 3) provides a good balance between the two statistics. Further, we find that from model 5, no significant improvement is obtained for the WMRSE error variance by increasing the polynomial order from 1 to 2 (going to model 8), and no significant improvement is obtained for the error variance by increasing the number of harmonics from 3 to 4 (going to model 6). Significant improvements are obtained for both statistics by both increasing the polynomial order from 1 to 2 and the number of harmonics from 3 to 4 (going from model 5 to model 9) for the SPURS subset but not for the test subset, which is expected to be representative of a much greater fraction of the total data. We also tested models with P = 1 and N = 5,6 (models 13 and 14) but these, while reducing significantly the WRMSE from model 5, did not reduce significantly the error variance, and started to show larger error variances for the test subset. As a result, model 5 is our final choice of model to be fitted to the entire SST drifter dataset to generate the Level-2 and Level-3 datasets.

We now discuss briefly the choice of the bandwidth length [hk, (11)] and the choice of the factor D for the robust weights [see (22)]. The sensitivity of the results to these choices is summarized in Fig. 10 for model 5 only. The choice of hk technically implies that data points within a 2hk window centered on the estimation time are considered [Eqs. (10) and (11)]. Yet, because the weighing window is not uniform but a tricube kernel, the effective number of degrees of freedom used for each estimation is closer to the number of data points one would find in a uniform window of length hk. Here, our choice hk = 1 day is based on observations that the characteristics of diurnal SST oscillations change on a daily time scale26. Yet, we examine the summary statistics for model 5 for hk varying between 0.25 and 1.25 days at 0.25 day interval for the test subset (Fig. 10). We find that decreasing hk to less than 1 day significantly decreases the error variance yet does not decrease the WRMSE, and thus does not overall improve the performances of model 5. In contrast, we find that increasing hk to 1.25 days significantly increases the WRMSE and increases the error variance. The results for the SPURS subset are similar (not shown). These overall results therefore suggest that hk = 1 day is an appropriate choice for the bandwidth.

Fig. 10
figure 10

Panel (a) Summary statistics for the 14 models for the test set of drifters as in Fig. 9(b). Here are also shown the results of varying the D factor in Eq. (22) from 4 to 20 in increments of 2 for model 5 (black dots and ellipses) and the results of varying the bandwidth parameter hk from 0.25 days to 1.25 days in increments of 0.25 (white dots and ellipses). The white ellipse around the black dot for model 5 corresponds to hk = 1 and D = 14 as in Fig. 9. Panel (b) Ensemble averages of the fraction of data points not labeled as outliers as a function of factor D. The shading indicates plus or minus one standard deviation around the ensemble averages. The vertical dotted line indicates D = 14 ultimately chosen here.

The choice of the factor D in the denominator of the biweight kernel for calculating the robust weights [Eq. (22)] effectively sets the threshold for labeling data points as outliers. Our final choice of D = 14 is compared to alternatively choosing D between 4 and 20 at intervals of 2. The sensitivity of the summary statistics to the value of D is displayed in Fig. 10 for the test subset. We find that varying D has a modest impact on the performances of model 5 and only for D less than 6 does model 5 exhibits significantly better WRMSE, but no better standard deviation error. We also consider the ensemble average of the fraction of data points not labeled as outliers as a function of the choice of D (Fig. 10), and find that this fraction starts to decrease strongly as D decreases from 8. In the original LOWESS method7, D is set to 6 without justification, and such a choice in our case would result in around 10% of the data points labeled as outliers. In the end, we settled on D = 14 which results in only between 1% and 4% of the data points being labeled as outliers, but maintains approximately the performance of model 5 compared to D = 6. The results are similar for the SPURS subset (not shown).

Quality indication

The Level-3 data product is intended to provide SST estimates contemporaneous to the estimated positions and velocities of drifters at hourly top-of-the-hour times from the hourly GDP dataset version 1.04c5, which with SST now included we shall call version 2.00. Since the sampling of SST sensors onboard drifters can be independent from the positioning, it sometimes occurs that our methodology is able to provide an SST estimate at times when no location estimate is available. Since there is little use for an SST estimate with no associated location estimate, these are not included in the Level-3 data product (see Table 1).

We devise three different quality indication flag schemes, one for each component of SST (non-diurnal, diurnal, total), with flag values ranging from 0 (worst) to 5 (best), with the intention of characterizing an increasing level of correctness. For all three schemes, when no SST estimate could be obtained from the methodology (for example for lack of enough Level-1 data within the sliding temporal window), or when SST data was simply not transmitted by a drifter (as an example because of a faulty sensor), the estimate is assigned quality flag 0 (and the NetCDF file contains a standard filling value). When an SST estimate could be obtained but not an SST uncertainty estimate, the estimate is assigned quality flag 1 (and the NetCDF file contains a filling value for the uncertainty estimate).

For higher flag values, the schemes for the non-diurnal SST estimates and for the total SST estimates are the same, as illustrated in Fig. 11. When an SST estimate and an uncertainty estimate both exist, the quality flag is based on the relative position of the interval formed by the SST estimate plus or minus its standard error estimate with respect to the [−2,50]°C range of physically-acceptable temperature values27. If the estimated interval is completely contained within this range, the assigned quality flag is the highest, at 5. If one or two end points of the interval are located outside of the range but the SST estimate is inside the range, the assigned quality flag is 4. If the SST estimate is outside the range but one of the end point of the interval is within the range, the assigned quality flag is 3. Finally, if the interval is located completely outside the physical range, then the quality flag is 2. For analyses of total and non-diurnal SST, only estimates with quality flag 4 and 5 should be utilized. Estimates with quality flag 1, 2, and 3 are suspect and should not be used. They suggest that their corresponding true SST value are located outside of the physically-acceptable range. These estimates are nevertheless retained in the dataset with their distinct flags for traceability of the methods.

Fig. 11
figure 11

Illustration of the quality flag determination scheme for time series of non-diurnal and total SST estimates. Flag 0 indicates both missing SST estimates (\(\widehat{s}\)) and uncertainty estimates (\(\widehat{\sigma }\)). Flag 1 indicates a missing or failed uncertainty estimate only. Flags 2, 3, 4, and 5 are based on the values of \(\widehat{s}\) and \(\widehat{s}\pm \widehat{\sigma }\) with respect to the range of acceptable temperature values ([−2,50]°C).

The quality flag scheme for the diurnal SST estimates differs from the scheme described above because a diurnal SST estimate is an anomaly around zero for which a range of physically plausible values is not straightforward to define. A climatology of SST diurnal variability24 constructed by fitting a model to temperature observations from drifters within zonal bands, by seasons, and by environmental categories (clear or cloudy sky, wind speed) provides amplitude of SST diurnal anomalies no larger than 2.4 °C (from observations) or 0.689 °C (from modeled values). Yet, diurnal warming as large as 6.6 °C has been be detected in coastal regions28. As a consequence, rather than defining here an acceptable amplitude threshold for diurnal SST anomalies, we consider three criteria for the quality flag of a diurnal SST estimate:

  1. 1.

    Is the absolute value of the diurnal anomaly estimate strictly larger than its standard error estimate?

  2. 2.

    Is the standard error estimate for the diurnal estimate smaller than 1 °C?

  3. 3.

    Were more than 24 Level-1 data points used to obtain an estimate?

As illustrated in Fig. 12, criteria (1) and (2) define specific sub-regions in the parameter space defined by the absolute value of the diurnal estimates and the value of the standard error estimate of the diurnal estimate. In contrast, criterion (3) does not strictly defines a sub-region in that parameter space, but rather an average region which can be visualized by mapping in that space the average number of data points used for the estimations. On average, estimates obtained with 24 data points or more are found in the parameter space for which the diurnal anomaly estimates are smaller than 10 °C and the standard error of the estimates are most often smaller than 1 °C. In conclusion, we use the three criteria listed above to define self-exclusive quality flags as follows: a quality flag 5 indicates that all criteria (1), (2), and (3) are fulfilled; a quality flag 4 indicates that (1) and (2) are fulfilled but not (3); a quality flag 3 indicates that (1) is fulfilled but not (2) nor (3); and quality flag 2 indicates that none are fulfilled. For analyses of diurnal SST, we recommend utilizing only estimates with quality flag 5. Further, a user may want to discard diurnal SST estimates for which the corresponding total and non-diurnal SST estimates both have quality flag less than 4. This occurs for less than 0.03% of the diurnal SST estimates with quality flag 5.

Fig. 12
figure 12

(a) Two-dimensional histogram of absolute diurnal SST estimates (\({\widehat{s}}_{D}\)) and standard error estimates for diurnal estimates (\({\widehat{\sigma }}_{D}\)) for Level-3 data. The dotted line corresponds to the slope 1 line (\({\widehat{\sigma }}_{D}={\widehat{s}}_{D}\)). (b) Average number of Level-1 data points used for estimating SST mapped onto the two-dimensional distribution shown in the top panel. The black contour corresponds to 24 data points on average.

The inventory of Level-3 estimates for each type (total, non-diurnal, and diurnal) and each quality flag class (0 to 5) is provided in Table 3. The number of position and velocity estimates for the GDP hourly dataset version 2.00 is 165,754,333 from 17,324 individual drifter trajectories. Of this target number, 95.59% are with a quality flag 5 for the total SST estimates, 95.60% are with a quality flag 5 for the non-diurnal SST estimates, but only 75.58% are with quality flag 5 for the diurnal SST estimates. Note that estimates of total SST and non-diurnal SST with quality flag 3 or 2 are outside the physically-acceptable range of values and should be used and interpreted with extreme caution. We assessed that diurnal SST estimates with quality flag 5 are plausible but we could not conclude the same for lesser quality flags.

Table 3 Inventory of quality flags for Level-3 estimates.

Interpretation of uncertainty estimates

In order to interpret our uncertainty estimates, we examine the distribution of the residuals of all model fits, normalized by their associated estimates of error standard deviations. This constitutes an assessment of the distribution of the error term εi of the process model (1):

$${\widehat{\varepsilon }}_{i}=\frac{{s}_{k}-{\widehat{s}}_{m,k}}{\widehat{\sigma }({t}_{k})}.$$
(36)

The results are shown in Fig. 13 for both the SPURS and test drifter subsets. For both sets, the distributions are never Gaussian for any of the models. The distributions are nearly centered but exhibit central peaks more narrow than Gaussian distributions with the same means and standard deviations (only comparisons to model 5 are shown). We observe that increasing the number of harmonics of the diurnal oscillation model consistently renders the peak of the residual distribution to be narrower and higher, and the tails to be slightly lighter. The opposite is true when increasing the order of the non-diurnal polynomial model, while still being non-Gaussian in the sense of exhibiting a higher kurtosis. We find that a t location-scale distribution (also known as non-standardized Student’s t distribution), previously used to model Argos location errors5, is a better fit to the observed distributions than Gaussian distributions, yet still does not completely capture their shapes (not shown). An implication of the non-gaussianity of the normalized residuals is that the error term εi of the process model (1) is also not Gaussian-distributed. As a result, a classic least squares estimation of the parameters of the models would tend to give too much weight to outliers in the data. Fortunately, we are applying an iterative least squares estimation method based on the LOWESS7 which is expected to temper such outliers, but the exact impact on the estimation is difficult to quantify here.

Fig. 13
figure 13

Probability density function (PDF) estimates of normalized residuals following the fitting of the 14 models for (a) the subset of 80 drifters from the SPURS experiment and (b) the subset of 98 “test” drifters selected from the global database. The PDFs are estimated using an Epanechnikov kernel20 at 0.01 resolution using only residuals with non-zero final robust weights. In each panel, the thin dashed black lines indicate the 16-th and 84-th percentiles of the distribution of residuals for model 5 whereas the thick dashed black lines indicate the 2.5-th and 97.5-th percentiles. The gray curve in each panel corresponds to the fit to a normal distribution for the residuals for model 5 and the gray dashed vertical lines indicate the mean plus or minus 1 (thin line) and 1.96 (thick line) standard deviation and therefore correspond to the 2.5-, 16-, 84-, and 97.5-th percentiles of that fitted normal distribution.

Nevertheless, a further implication of the non-gaussianity is that caution should be taken when interpreting the standard errors for the SST estimates described above: whereas for Gaussian-distributed errors one standard error can be used to calculate a 68% confidence interval for an estimate, in our case, a standard error represents an interval encompassing more probable values of the true unknown values of a quantity, thus a more conservative confidence interval. As shown in Fig. 13 for model 5, the 16-th and 84-th percentiles, encompassing 68% of the residual distribution, define an interval narrower than the interval defined by plus or minus one sample standard deviation around the sample mean. Plus or minus one standard deviation actually encompasses approximately 78% of the distribution of the residuals for model 5 (and approximately the same percentage for the other models, not shown). In other words, the standard error for our estimates can be interpreted as being representative of a 78% confidence interval rather than a 68% confidence interval. In contrast, the 2.5-th and 97.5-th percentiles, encompassing 95% of the distribution, define an interval slightly wider but close to the one defined by plus or minus 1.96 sample standard deviation around the sample mean, which encompasses approximately 94% of the distribution of the residuals for model 5 (and approximately the same percentage for the other models, not shown). In conclusion, considering 1.96 standard errors to quantify uncertainty in this case happens to represent an approximate 95% confidence interval, as would be the case if the errors were Gaussian-distributed. Note that for the Level-3 hourly product (Table 1), the uncertainty estimates provided for location and velocity is 95% confidence intervals, whereas for SST estimates the uncertainty estimates are standard error estimates.

Global characteristics of error variance estimates and uncertainty estimates

In Fig. 14, we examine the distribution of error variance estimates from residuals [\({\widehat{\sigma }}_{1}^{2}\), Eq. (25)] not including data points for which the error estimation failed, and the distribution of total error variance estimates incorporating the resolution error variance [\({\widehat{\sigma }}^{2}\), Eq. (28)] for Level-3 estimates. We show the distributions for Level-3 estimates only because the ones for Level-2 estimates are extremely similar. We also report some statistics rounded to the nearest 0.001 in Table 4, which differ by no more than 0.001 °C between Level-2 and Level-3 estimates. Based on the distributions in Fig. 14, we assess that the mode value, or most probable value, of the square root of the error variance estimates from residuals is 0.020 °C for drogued drifters, but is 50% larger at 0.030 °C for undrogued drifters. Over all data, the mode value is 0.026 °C. Further, we assess that the mode value of the square root of the error variance estimates incorporating the resolution error variance is 0.031 °C for drogued drifters and 0.036 °C for undrogued drifters. Over all data, the mode value of the total error variance estimates is 0.033 °C. Median values of each of these variables are typically higher by a few 1/1000-th of a degree (See Table 4): the overall median value of the square root of the total error variance estimates is 0.036 °C. The distribution of the total error variance is however not unimodal (Fig. 14, right) because of the resolution error variance is dominated by a few discrete values (Fig. 6).

Fig. 14
figure 14

Distribution of error variance estimates from residuals [left, \({\widehat{\sigma }}_{1}^{2}\), Eq. (25)] and total error variance estimates incorporating the resolution error [right, \({\widehat{\sigma }}^{2}\), Eq. (28)] for Level-3 data. The histograms of the decimal logarithm of the square root of the estimates are displayed. Mode values at the peak of the distributions and 50-th percentile values are listed in Table 4.

Table 4 Statistics of error variance estimates from residuals [\({\widehat{\sigma }}_{1}^{2}\), Eq. (25)] and final error variance estimates incorporating the resolution error [\({\widehat{\sigma }}^{2}\), Eq. (28)].

The error variance estimates are however very heterogeneous in space, which is revealed when these estimates are averaged in half-degree geographical bins (Fig. 15, top). The spatial distribution of the error variance estimates is clearly related to ocean surface dynamics: it is found to be the highest in regions of high surface kinetic energy such as western boundary currents and equatorial regions29, but is also relatively high at mid-latitudes within regions of high wind stress variability. Largest mean error variance estimates are found on average within the Agulhas Retroflection region in the Indian Ocean and north of the Gulf Stream in the North Atlantic Ocean. The geographical distribution of error variance estimates suggests that the temporal evolution model (2) might be improved by allowing the order of the polynomial sP to change spatially, in order to reduce the error variance estimates in these regions. Such a spatially-dependent model is beyond the scope of the present work but may be investigated in the future.

Fig. 15
figure 15

Top: Square root of total error variance estimates \(\left[{(\widehat{{\sigma }^{2}})}^{1/2}\right]\) averaged in half-degree spatial bins. Middle: Non-diurnal SST uncertainty estimates (\({\widehat{\sigma }}_{P}\)) averaged in half-degree spatial bins. Bottom: diurnal SST uncertainty estimates (\({\hat{\sigma }}_{D}\)) averaged in half-degree spatial bins. The maps are obtained with Level-3 data with quality flags 5 for all estimates. In all three panels the units are decimal logarithm of degrees Celsius.

Whereas an error variance estimate provides a local quantification of the magnitude of the background noise, an uncertainty estimate for SST [Eq. (24)] provides a statistical characterization of the distance between a SST estimate and the true, but unknown, SST value. In Fig. 16, we examine the distributions of standard error estimates for the non-diurnal SST estimates, the diurnal SST estimates, and the total SST estimate for Level-3 data, and we report overall statistics rounded to the nearest 0.001 in Table 5. The results for Level-2 data are extremely similar and their distributions are not shown. Overall, the uncertainty estimates for diurnal SST estimates are a factor of 2 to 3 times larger than the uncertainty estimates for non-diurnal estimates. In turn, the uncertainty estimates for total SST estimates are larger than the uncertainty estimates for diurnal SST estimates but by no more than a few 1/1000-th of a degree. The most probable value of the uncertainty estimate for non-diurnal SST estimate is 0.006 °C for all data. For drogued drifters only it is 0.005 °C, and for undrogued drifters only it is 0.006 °C. The most probable value of the uncertainty estimate for diurnal SST estimate is 0.016 °C for all data. For drogued drifters only it is 0.015 °C, and for undrogued drifters only it is 0.017 °C. The most probable value of the uncertainty estimate for total SST estimate is 0.018 °C for all data. For drogued drifters only it is slightly smaller at 0.016 °C, and slightly higher for undrogued drifters at 0.019 °C. The spatial distribution of the uncertainty estimates follow closely the spatial distribution of the error variance estimates (Fig. 15, middle and bottom). The spatial distribution of the uncertainty estimates for total SST estimates is not shown as it is extremely similar to the spatial distribution of the uncertainty estimates for diurnal SST estimates. The fact that the maps of averaged uncertainty estimates exhibit spatially coherent features related to geophysical variability implies that uncertainty estimates between drifters may also be correlated in relation to the geographical distance separating them.

Fig. 16
figure 16

Probability density function (PDF) estimates of standard error estimates for the non-diurnal (\({\widehat{\sigma }}_{P}\)), diurnal (\({\widehat{\sigma }}_{D}\)), and total (\({\widehat{\sigma }}_{m}\)), SST estimates, separated between data from drogued and undrogued drifters. The normalized histograms of the decimal logarithm of the estimates are displayed. Mode values at the peak of the distributions and 50-th percentile values are listed in Table 5.

Table 5 Statistics of SST standard uncertainty estimates.

The overall statistics of uncertainty estimates for SST estimates (Table 5) are an order of magnitude smaller than previously estimated measurement uncertainties for drifting buoys9. Such uncertainty estimates range between 0.1 °C and 0.7 °C and are typically based on analyses of collocated SST observations from drifting buoys, ships, and satellites30,31,32,33. These uncertainty estimates encompass not only the instrumental error of the drifter SST sensors but also the spatial and temporal differences between the different measurands that are targeted by the different observational platforms, such as a SST satellite’s ground footprint versus a pointwise drifter measurement. Here, our uncertainty estimates represent only random sources of instrumental and communication noise, as well as sub-hourly unresolved geophysical variability. These uncertainties are on the order of 1/100-th of a Kelvin, rather than on the order of a 1/10-th of a Kelvin, because they are not estimated from a single observation, but rather are benefiting from time series of observations that typically provide around 22 effective degrees of freedom over a 2-day observational estimation window (see mode value of ν in Fig. 7). As a result, sources of instrumental and geophysical random noise are averaged downward for our estimates. What our uncertainty estimates are not able to capture is any drifter-specific original bias of a SST sensor, and which may, or not, have evolved in time since the times of manufacture and deployment (i.e. a sensor drift)34.

Data Records

The Level-3 estimates of total SST, non-diurnal SST, diurnal SST, and each of their respective standard error estimates, along with quality flag variables for each of the three SST estimates, are distributed as part of the hourly drifter dataset of the GDP5, now in its version 2.00 with the addition of these SST estimates. The dataset, assembled as a contiguous ragged array in a single file, is officially available from the NOAA National Center for Environmental Information (NCEI) as a data collection6 called “Hourly location, current velocity, and temperature collected from Global Drifter Program drifters world-wide” and accessible at https://doi.org/10.25921/x46c-3620. The original and future releases (also called accession) of this collection can be accessed and downloaded through the “Lineage” tab of the landing page. Future releases of the dataset, scheduled twice a year, will add estimates of position, velocity, and SST variables as they become available from the GDP.

The data are also available via the ERDDAP server of the NOAA Observing System Monitoring Center at http://osmc.noaa.gov/erddap/tabledap/gdp_hourly_velocities.html where subsets of the data can be selected according to a number of temporal and spatial criteria.

Table 6 lists the names of the variables included in the NetCDF files, including the new SST-related variables. Usage of this SST data product in combination with any of the position and/or velocity data5 for release 2.00 or subsequent releases must cite this present paper as well as the original 2016 paper describing the hourly position and velocity dataset (Elipot et al.5).

Table 6 Data record information for the Level-3 data product and details of variables in drifter NetCDF files.

Technical Validation

The spatial and temporal distributions of the Level-3 hourly SST estimates are displayed in Fig. 17 in order to verify the technical quality of the product. The map of spatial data density is the result of historical deployments and the efforts of the GDP to fulfill the requirement of the array, and of the patterns of the convergence and divergence of the near-surface oceanic circulation2,11. The temporal histogram of SST estimates closely follows the distribution of hourly position and velocity estimates, showing the maturity of the array at the beginning of 2006 as well as the drop in the amount of data between 2011 and 2014 because of unfortunately numerous short-lived instruments. To support the technical validation of the new SST dataset, we compute the mean and standard deviation of SST estimates globally within 0.5° × 0.5° geographical bins (Fig. 18). The mean total SST map exhibits the expected meridional gradients as well as the west-east asymmetries within each ocean basins. As also expected, the standard deviation map of total SST estimates exhibits larger values within regions of higher surface kinetic energy such as in western boundary current regions35 but also within the mid-latitude regions where high variability of air-sea fluxes is expected to enhance SST variance. The map of diurnal SST standard deviation exhibits different patterns resulting from the competing effects of the spatial pattern of solar heating increasing diurnal variability, and the spatial pattern of wind speed decreasing diurnal variability. At the scales displayed here, the maps of mean and standard deviation of non-diurnal SST estimates (not shown) are indistinguishable from the maps for the total SST estimates. The map of mean diurnal SST estimates (not shown) is approximately zero everywhere as expected from the model of temporal SST evolution used to derive this product.

Fig. 17
figure 17

Top panel: Spatial distribution of Level-3 total SST estimates expressed as a density per (50 km)2 in half-degree spatial bins. Only quality flag 5 data are counted. Bottom panel: Level-3 total (\({\widehat{s}}_{m}\)) and diurnal (\({\widehat{s}}_{D}\)) SST estimates temporal distribution in 10-day bins from 03-Oct-1987 13:00:00 to 30-Jun-2020 23:00:00. The temporal distribution of the matching position and velocity hourly dataset5 release 2.00 is also displayed. The temporal distribution of non-diurnal SST estimates is not displayed as it would be indistinguishable from the distribution of the total SST estimates.

Fig. 18
figure 18

Top: Level-3 total SST estimates averaged in half-degree spatial bins. Middle: Level-3 total SST estimates standard deviation in half-degree spatial bins. Bottom: Level-3 diurnal SST estimates standard deviation in half-degree spatial bins. Only quality flag 5 estimates for each respective variable is used to produce these maps.

We proceed to verify the consistency of the drifter hourly SST estimates against the gridded, multi-sensor, interpolated SST Climate Change Initiative data product (ESA SST CCI Analysis v2.1, hereafter CCI), available from 1981 to 201636. The CCI product is generated by combining measurements of infrared radiance from two suites of radiometers on multiple satellites, eventually aggregated and gap-filled on a daily 0.05° grid27. The CCI product provides SST estimates representative of daily mean values and at a depth of 20 cm. These estimates are obtained by converting the instantaneous skin SST captured by satellite measurements at various times throughout a day to the closest of 10:30 or 22:20 local mean solar times, using a one-dimensional turbulence closure model driven by atmospheric fluxes. Arguably, this conversion makes the CCI estimates comparable to the drifter SST estimates because of the depth conversion, yet differences are expected to remain between the two estimates because of the drifter SST sub-daily temporal variability mostly associated with the diurnal cycle. We conduct the comparison by interpolating bilinearly in space the gridded values of the CCI product for a given day onto all drifter hourly geographical locations of that same day (from 00:00 to 23:00). SST estimates derived from satellite measurements are not independent from in situ measurements, including from drifters, because these are generally used for calibration purposes. Yet, the ESA SST CCI Analysis v2.1 is supposed to achieve a “high degree of independence” from in situ observations from 1995 onward27, so that we limit our comparison to the 1995 to 2016 time period. We choose to compare only drifter total and non-diurnal SST estimates with quality flag 5 to successfully interpolated values of the CCI product, that is when no land pixel or pixel with non-zero sea ice concentration were involved in the bilinear interpolation. Two-dimensional histograms of drifter SST estimates versus their corresponding CCI interpolated values for nearly 122 M data pairs (Fig. 19a,b) suggest qualitatively a very good consistency between the two datasets. Next, we examine difference statistics between the two datasets in order to conduct a more quantitative comparison. We interpolate the evaluated standard uncertainty of the CCI product at the drifter locations in order to assess the difference statistics.

Fig. 19
figure 19

Comparison between the drifter hourly SST dataset and the ESA SST CCI Analysis v2.1 product27,36. (a) Two-dimensional histogram in 0.05 °C bins of drifter total SST estimates (\({\widehat{s}}_{m}\)) versus the interpolated CCI values. (b) Same as in a) but for the drifter non-diurnal SST estimates (\({\widehat{s}}_{P}\)). (c) Normalized histograms of absolute differences between drifter estimates and CCI values. (d) Average differences between non-diurnal SST estimates (\({\widehat{s}}_{P}\)) and CCI values as a function of time and latitude.

We calculate the difference statistics \({d}_{m}={\widehat{s}}_{m}-{s}_{cci}\) and \({d}_{P}={\widehat{s}}_{P}-{s}_{cci}\), where scci is the interpolated CCI SST value, and the uncertainties corresponding to these differences, i.e. \({\left({\widehat{\sigma }}_{m}^{2}+{\sigma }_{cci}^{2}\right)}^{1/2}\) and \({\left({\widehat{\sigma }}_{P}^{2}+{\sigma }_{cci}^{2}\right)}^{1/2}\), where σcci is the interpolated CCI SST uncertainty value. Over all estimates, we find that the mean and median values of dm are 0.048 °C and 0.031 °C, respectively, and that the mean and median values of dP are 0.047 °C and 0.037 °C, respectively, suggesting a global positive bias of the drifter estimates compared to the CCI estimates. We next conduct a 95% confidence level two-tailed test by determining the number of instances for which the absolute values of the difference statistics are smaller than 1.96 times the difference uncertainties. We thus find that 85.5% of the differences with the drifter total SST estimates are not statistically significant, whereas we find that 88.1% of the differences with the drifter non-diurnal SST estimates are not significant, overall suggesting a high level of consistency between the two products. Because the CCI product does not resolve diurnal variability, it is expected that the consistency between the two datasets would be better for the drifter non-diurnal estimates than for the drifter total estimates, as evidenced here by the two-tailed test results. However, the proportion of statistically significant differences for the non-diurnal drifter SST estimates is still over twice the expectation from a 95% confidence level test, at 11.9% compared to 5%. This excess proportion may be due to the incorrect assumption of Gaussian-distributed errors made in our two-tailed test where heavier tailed errors are expected in practice (see section Interpretation of uncertainty estimates and Fig. 14), stochastic variability unresolved by the uncertainty estimates of either data product, loss of spatial resolution of the CCI analysis product because of its mapping, or a global bias as revealed by the mean difference values reported above. The most probable difference of values between the two products, as indicated by the maximum of the distribution of the absolute difference statistics (Fig. 19c) is 0.250 °C with the drifter total SST estimates, and 0.223 °C with the drifter non-diurnal SST estimates. These mode values are roughly consistent with a linear sum of the stated global uncertainty for the CCI product27 (0.18 °C), plus the typical uncertainty values of the drifter estimates (0.018 °C and 0.007 °C, see Table 5), plus potential global biases (0.048 °C and 0.047 °C), amounting to 0.246 °C and 0.234 °C for the total SST estimates and the non-diurnal SST estimates, respectively.

While it is beyond the scope of this paper to systematically investigate the potential sources of differences between our drifter hourly SST dataset and the CCI product (or other satellite-based products), we further examine the differences between the drifter non-diurnal SST estimates and the CCI values as a function of latitude and time in 10-day intervals from 1996 to 2016 (Fig. 19d). This analysis is consistent with a similar analysis conducted previously27, but provides more details and is performed with our Level-3 hourly drifter product. Our results show that the drifter non-diurnal estimates generally exhibit a positive bias compared to the CCI product, except between approximately 15° and 45°, N or S, where the differences exhibit alternating signs with an annual periodicity propagating poleward. These propagating differences might be related to inadequacies of the turbulent closure model used to convert satellite-measured skin SST to CCI depth SST, or to the inability of representing accurately seasonal processes in either the model or the forcing fields of the model. We also examine the dependency of differences on local mean solar time (Fig. 20). We find that the differences between drifter total SST estimates and CCI values as a function of local mean solar time follow a distribution pattern consistent with the typical SST diurnal cycle24: the drifter total SST estimates generally capture a higher temperature between the hours of 10:30 and 22:30, but generally capture a lower temperature between 22:30 and 10:30 the next day (panel a). In contrast, the distribution of differences between the drifter non-diurnal SST estimates and CCI values do not appreciably exhibit a dependency on local mean solar time, as expected, but only an overall positive bias (panel b).

Fig. 20
figure 20

(a) Two-dimensional histogram of drifter total SST estimates minus CCI values (\({\widehat{s}}_{m}\)-CCI) versus local mean solar time. (b) Same as in (a) but for drifter non-diurnal SST estimates (\({\widehat{s}}_{P}\)-CCI). In both panels the vertical dotted lines indicate 10:30 and 22:30 local mean solar times.

Usage Notes

In the NetCDF file, all SST estimates are provided to three decimal places, with the last digit rounded towards the nearest 0.001. Total SST estimates are the sum of non-diurnal SST estimates and diurnal SST estimates but because of rounding, discrepancies exist within the NetCDF files for about 44% of values between the numerical value the user will read for the total SST value and the value of the sum of the non-diurnal and diurnal SST values.

The uncertainty estimates are also provided to three decimal places but with the last digit rounded “up” (i.e. towards infinity) to the nearest 0.001. The reason for the rounding up of the uncertainty estimates is to prevent reporting null uncertainties for 13,502 non-diurnal SST estimates for which the calculated uncertainty is smaller than 0.001 but larger than 0.0001. Rounding up uncertainties is acceptable as this provides more conservative uncertainties but only increasing their values by typically 6% for the non-diurnal SST uncertainties, and by typically 2% for the diurnal and total SST uncertainties.

Instructions on how to read the data file, as well as examples of typical Lagrangian analyses, can be found in a publicly-accessible Python-based Jupyter Notebook at https://github.com/Cloud-Drift/earthcube-meeting-2022.