Background & Summary

Remote sensing, the acquisition of information about the Earth’s surface through satellite imagery, has become a powerful tool for monitoring the environment and predicting risks associated with environmental changes1,2,3,4. Among many applications, remotely sensed data have been used to detect landscape change5,6,7,8, assess biodiversity9, monitor carbon emissions10,11, predict infectious diseases12,13,14, and track marine coasts15. Marine ecosystems, however, have been studied less intensively than terrestrial ecosystems due, in part, to data limitations. One limitation of global-level remotely sensed data is that they are time-consuming to use: data compilation, curation, standardization, and storage can be complex and may require high-performance computational facilities16,17. An open-access, free-of-cost database of global ocean conditions is therefore instrumental in advancing our understanding of coastal phenomena12,15.

A significant benefit of satellite-derived information is the historical archives of data2,10,12. Technological advances and innovative design have resulted in new generations of satellite sensors that monitor marine environments, such as the Moderate Resolution Imaging Spectroradiometer (MODIS). MODIS sensors are part of the National Aeronautics and Space Administration’s Earth Observing System onboard the Terra and Aqua satellites and were designed to provide measurements of global dynamics of terrestrial, freshwater, and marine ecosystems18,19,20. MODIS provides the longest standing observational marine time series data, given that both the Aqua and Terra satellites have been in orbit since the early 2000s, and it provides a larger set of marine variables for potential evaluation at the same spatial and temporal scale18. Nevertheless, there are other enhanced satellite instruments21, such as the Along Track Scanning Radiometer22, Suomi National Polar-orbiting Partnership23, Visible Infrared Imaging Radiometer Suite24,25, and Sentinel26, which offer opportunities for future multi-sensor marine variables.

Out of the possible marine variables derived from observations of MODIS, sea surface temperature (SST) and Chlorophyll-a (Chlo-a) have the potential to increase our understanding of abiotic (e.g., temperature) and biotic (e.g., primary productivity) ocean conditions4,20. SST measured by MODIS infrared radiometers is also referred to as the skin temperature of the ocean. This is because the radiance measured by infrared radiometers originates in the surface thermal skin layer of the ocean and not the water below as measured by in situ thermometers27. SST provides fundamental information on the global climate systems, and it is an essential parameter in weather prediction28. Chlo-a is a proxy for understanding fluctuations in algae and pigmented bacteria as it can elucidate photosynthetic activity in coastal systems4,20,29. The near-surface concentration of Chlo-a is calculated using an empirical relationship derived from in situ measurements, and the implementation of the standard O’Reilly band ratio OCx (e.g., OC3M, for the MODIS sensor) algorithm merged with the color index algorithm of Hu et al.30,31. SST and Chlo-a have been crucial in studies to reconstruct environmental phenomena, such as Vibrio cholerae emergence13,32,33, algae blooms29,34,35, El Niño and La Niña dynamics36, and coral bleaching37.

Satellite-derived data have many limitations given their sensitivity to absorption of solar insolation, heat exchange with the atmosphere, and sub-surface turbulence. Nevertheless, since these conditions are known and common, validation and uncertainty are estimated relative to in situ buoys to correct final datasets38,39,40. Satellite-derived data provide an opportunity to analyze large study areas during extended periods, at the cost of limiting the information to surface level. Complementary approaches may include the addition of more oceanic and atmospheric observations like bathymetry, wind direction, and wind speed1. We compiled remotely sensed data of monthly SST and Chlo-a from the exclusive economic zone (EEZ) of coastal areas globally for an 18-year period (2003–2020). Data were used to generate summary statistics as yearly and monthly composites. Code is included to update the database as data are released. This database can be downloaded freely through Figshare41.

Methods

This section describes the procedures used to generate the individual data records that comprise the SST and Chlo-a databases. Data retrieval and analysis performed during the development of the database were executed using the statistical software R42. The SST and Chlo-a databases were developed in four stages: (a) data procurement, (b) preparation, (c) processing, and (d) analysis. The first two stages were associated with input data, while the third stage applied specific methods to construct the core of each database. The fourth stage comprised the statistical analyses of the data. The methodological stages are summarized in Fig. 1 and described in detail below.

Fig. 1

Workflow diagram. (a) Remotely-sensed data were downloaded from the NOAA ERDDAP server in the form of NetCDF files. (b) Data were then transformed into a raster object. (c) Data were then cropped and masked to the exclusive economic zone and exported as GeoTIFF. (d) Summary statistics were computed and exported as raster files.

Data procurement

The database is based on satellite observations from the MODIS instrument. The Terra and Aqua satellites have orbited the Earth since their launches in 1999 and 2002, respectively, acquiring data of Earth’s surface every one to two days at three spatial resolutions (250, 500, and 1000 m) and in 36 spectral bands (from 0.405 to 14.385 µm). From the atmospheric and oceanic observations available from NASA’s Aqua spacecraft, sea surface temperature (SST) in °C and chlorophyll-a (Chlo-a) in mg m−3 were selected since they summarize major physical and biological phenomena. SST and Chlo-a are available at temporal resolutions of 1-day, 8-day, and monthly composites and at a spatial resolution of ~4 km (Table 1).

Table 1 Data specifications for MODIS remotely-sensed data.

SST and Chlo-a, among other environmental variables, can be accessed through the Environmental Research Division’s Data Access Program (ERDDAP) data server of the National Oceanic and Atmospheric Administration’s (NOAA) CoastWatch program. NOAA’s CoastWatch provides timely access to near-real-time satellite data to monitor, restore, and manage coastal ocean resources. The ERDDAP data server supports manual downloads through a web application and remote downloads from any computer program (e.g., MATLAB, R, JSONP, Python) of both gridded and tabular data43.

Data downloading

The remote request to the ERDDAP Data Server relies on specially formed URLs that query the server for a specific database. A URL consists of a root, a target, and a constraint expression43. To procure the inputs needed to assemble this database, specially formed URLs were created programmatically in R (Auxiliary Materials44).

The root or base URLs that provided the location of the gridded database were obtained from the ERDDAP griddap documentation webpage (https://coastwatch.pfeg.noaa.gov/erddap/griddap/documentation.html) and remained constant in all requests for a specific database.

The target is the unique identifier, or dataset ID, previously assigned by ERDDAP (https://coastwatch.pfeg.noaa.gov/erddap/griddap), combined with a data file type extension. For this study, .nc was selected, producing NetCDF-3 binary files with COARDS/CF/ACDD metadata. NetCDF (Network Common Data Form) files are recommended when using software tools to analyze geospatial data, as they provide multidimensional scientific data in a standardized manner (https://coastwatch.pfeg.noaa.gov/erddap/griddap/documentation.html)45,46.

The constraint expression (or query) helped define the parameters, which correspond to the study period and spatial coverage. Regarding the first parameter, the study period comprised all available observations from the MODIS instrument aboard the Aqua satellite (i.e., monthly composites from 2003 to 2020). The spatial coverage was defined by the minimum and maximum latitude (i.e., 89.98°S to 89.98°N) and longitude (i.e., 179.98°W to 179.98°E) from the original satellite image for global coverage.
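The three URL parts described above can be assembled as in the following minimal sketch. It is written in Python for illustration and is not the authors' R code. The dataset ID `exampleDatasetID`, the variable name `sst`, the time stamps, and the dimension order (time, latitude, longitude) are assumptions; the real values must be taken from the dataset's ERDDAP page, and some datasets store latitude from north to south.

```python
def build_griddap_url(root, dataset_id, variable,
                      time_range, lat_range, lon_range, file_type=".nc"):
    """Assemble a griddap request from root, target, and constraint expression."""
    def dim(start, stop):
        # griddap subsets one dimension as [(start):stride:(stop)]; stride 1 keeps every value
        return f"[({start}):1:({stop})]"

    target = f"{dataset_id}{file_type}"  # dataset ID plus file type extension
    constraint = variable + dim(*time_range) + dim(*lat_range) + dim(*lon_range)
    return f"{root.rstrip('/')}/{target}?{constraint}"


# Hypothetical request: the dataset ID, variable name, and time stamps are placeholders.
url = build_griddap_url(
    root="https://coastwatch.pfeg.noaa.gov/erddap/griddap",
    dataset_id="exampleDatasetID",
    variable="sst",
    time_range=("2003-01-16T00:00:00Z", "2020-12-16T00:00:00Z"),
    lat_range=(-89.98, 89.98),
    lon_range=(-179.98, 179.98),
)
print(url)
```

Only the target and the constraint change between requests, so the root can be held constant for a given server.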

Data preparation

Data within the NetCDF files were imported into R using the RNetCDF package47. A NetCDF object contains a list of at least four attributes: time, longitude, latitude, and the values of the variable being measured (i.e., SST and Chlo-a). The attribute corresponding to the specific variable being measured was extracted from the NetCDF object and transformed into a raster object using the RNetCDF and raster packages in R48. A raster object consists of a matrix of cells (i.e., pixels) organized into rows and columns, where each cell contains a value representing information (i.e., temperature and pigmentation), together with metadata describing the spatial information of the object49.

As the last piece of the data preparation process, the extent of the raster was verified to match that of the original satellite data. Extent was set to latitude and longitude of 89.98°S to 89.98°N and 179.98°W to 179.98°E, respectively. The coordinate reference system (CRS) was defined to be relative to the WGS84 datum for easy manipulation by the end user.
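To make the notions of raster, extent, and CRS concrete, the sketch below models a raster as a matrix of cells plus its spatial metadata and looks up the cell containing a coordinate. It is an illustrative Python stand-in, not the R raster package; the class name `Raster`, its fields, and the tiny 2 × 4 grid are hypothetical, and rows are assumed to run from north to south.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Raster:
    values: List[List[Optional[float]]]  # rows run north to south, columns west to east
    xmin: float
    xmax: float
    ymin: float
    ymax: float
    crs: str = "EPSG:4326"  # geographic coordinates on the WGS84 datum

    def resolution(self):
        """Cell size in degrees as (x, y), derived from the extent and the grid shape."""
        nrows, ncols = len(self.values), len(self.values[0])
        return (self.xmax - self.xmin) / ncols, (self.ymax - self.ymin) / nrows

    def cell_at(self, lon, lat):
        """Return the value of the cell that contains the point (lon, lat)."""
        if not (self.xmin <= lon < self.xmax and self.ymin < lat <= self.ymax):
            raise ValueError("point lies outside the raster extent")
        xres, yres = self.resolution()
        col = int((lon - self.xmin) / xres)
        row = int((self.ymax - lat) / yres)
        return self.values[row][col]


# Toy grid using the extent stated in the text (values are made up).
r = Raster(values=[[10.0, 11.0, 12.0, 13.0],
                   [20.0, 21.0, 22.0, 23.0]],
           xmin=-179.98, xmax=179.98, ymin=-89.98, ymax=89.98)
print(r.resolution(), r.cell_at(10.0, 45.0))
```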

Data processing

A significant feature of the SST and Chlo-a databases is the segmentation by the world’s exclusive economic zones (EEZ). An EEZ is a marine zone extending up to 200 nautical miles from a country’s coastline, within which the country claims jurisdiction over economic activities50. Given the oceanographic nature of the data, focusing on the 200-nautical-mile buffer of the EEZ provides a more comprehensive explanation of oceanic changes, with the potential to promote the development of ocean planning initiatives directly influencing human settlements on the coasts. To represent the EEZ, a geospatial vector file in shapefile format was constructed by delimiting a buffer of ~200 nautical miles off coastlines globally.

The EEZ regions were defined using the crop and mask functions from the raster48 package. The mask function overlays the area of interest (i.e., the EEZ) on each monthly raster and assigns no value to cells outside the area of interest, while the crop function ensures that each raster matches the extent of the area of interest (Fig. 2). The core database included 408 individual rasters cropped and masked to the EEZ of each country.
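The effect of masking and cropping can be illustrated on a toy grid. The sketch below is plain Python, not the raster package: the boolean grid `inside` is a hypothetical, already rasterized stand-in for the EEZ polygon, and `None` plays the role of a cell with no value.

```python
def mask(grid, inside):
    """Keep values where `inside` is True; assign no value (None) elsewhere."""
    return [[v if keep else None for v, keep in zip(row, keep_row)]
            for row, keep_row in zip(grid, inside)]


def crop(grid, inside):
    """Trim the grid to the bounding box of the True cells in `inside`."""
    rows = [i for i, keep_row in enumerate(inside) if any(keep_row)]
    cols = [j for j in range(len(inside[0])) if any(keep_row[j] for keep_row in inside)]
    if not rows:
        raise ValueError("area of interest is empty")
    return [row[cols[0]:cols[-1] + 1] for row in grid[rows[0]:rows[-1] + 1]]


# Toy monthly raster and a made-up area of interest.
grid = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12]]
inside = [[False, False, False, False],
          [False, True, True, False],
          [False, False, True, False]]

masked = mask(grid, inside)
cropped = crop(masked, inside)
print(cropped)
```

Masking alone leaves the grid at its original size with empty cells outside the area of interest; cropping then removes the empty margin, which reduces file size for each country-level raster.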

Fig. 2

Data masking and cropping. Example of masking and cropping a raster. (a) Raster from original NetCDF. (b) Exclusive economic zone (solid lines). (c) Raster after crop and mask.

Statistical analysis

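Complementary to the core database, the data were treated as an m by n matrix, where m represents the years and n represents the months, and were stacked in two distinct ways: (1) yearly composites, which aggregate across the months of each year, and (2) monthly composites, which aggregate each calendar month across the years.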

$$\sum_{j=1}^{n} x_{ij}\quad \mathrm{for}\; i=1,\ldots,m \tag{1}$$
$$\sum_{i=1}^{m} x_{ij}\quad \mathrm{for}\; j=1,\ldots,n \tag{2}$$

We created the annual and monthly stacks using the stack function in the raster package48. The mean, range, maximum, minimum, and standard deviation values were estimated for annual and monthly SST and Chlo-a. We obtained a total of 90 rasters for the yearly composites (18 years, five different statistics) and 60 rasters for the monthly summaries (12 months, five different statistics).
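The two stacking directions can be sketched for a single grid cell, whose time series forms a years-by-months matrix. This is an illustrative Python sketch rather than the authors' R code; the function names are hypothetical, and the sample standard deviation (n − 1 denominator) is an assumption about how the statistic was computed.

```python
import statistics


def summarize(values):
    """Five summary statistics for one stack of values."""
    values = list(values)
    return {
        "mean": statistics.fmean(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "sd": statistics.stdev(values),  # sample standard deviation (assumed)
    }


def yearly_composites(x):
    """One summary per year i, aggregating across the months j of that year."""
    return [summarize(row) for row in x]


def monthly_composites(x):
    """One summary per calendar month j, aggregating across the years i."""
    return [summarize(col) for col in zip(*x)]


# Made-up values for one cell: 2 years (rows) by 3 months (columns).
x = [[1.0, 2.0, 3.0],
     [3.0, 4.0, 8.0]]
yearly = yearly_composites(x)
monthly = monthly_composites(x)
print(yearly[0]["mean"], monthly[2]["range"])
```

With 18 years and 12 months, this layout yields 18 × 5 = 90 yearly and 12 × 5 = 60 monthly summaries per variable, matching the raster counts reported above.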

Data Records

Final data are provided as GeoTIFFs for the EEZ boundaries and statistical analysis results41. Data can be downloaded as annual, monthly, or summary composites of the 18-year period. Data can also be updated using the code included in the Auxiliary Material in Figshare44.

Technical Validation

Remotely sensed environmental observations from the MODIS instrument, including SST and Chlo-a, have been validated extensively by the scientific community against a number of models and in situ measurements51,52,53,54,55,56,57,58 and used in a diverse set of studies13,14,19,59,60,61,62,63,64,65,66,67. For instance, validation of the SST observations uses accurate ship-based infrared radiometers and drifting and moored buoys with thermometers at a depth of a meter38,56,57. NASA’s standard processing and distribution of the SST products are performed using software developed by the Ocean Biology Processing Group18. SST products are validated internally by NASA using a collocated matchup database of in situ observations that are collected within 30 minutes of an overpass and 10 km of a pixel. MODIS SST observations represent the thermal skin layer of the ocean, which is <1 mm thick and is cooler than the underlying water due to vertical heat flux68,69. At night or when wind speeds are greater than ~6 m/s, the skin and subsurface temperatures are nearly equal. It is under these conditions that validation and uncertainty estimates relative to sub-surface in situ buoys are typically reported20,38. The relationship between estimates and observations, however, can be highly variable under conditions of low wind speeds and reduced sub-surface turbulence21,70. Furthermore, NASA MODIS uses a collection of cloud classification algorithms to indicate when a pixel corresponds to clear-sky conditions (i.e., no cloud coverage). The most recent cloud-classification method is the Alternating Decision Trees71. Other SST validation tests include a regional ice test, where reflectance thresholds are determined using the Sentinel-2 MSI calibrated reflectance72, and correction of dust contamination73.

MODIS Chlo-a observations are derived from the O’Reilly OC3M algorithm and the Hu color index30,31. The algorithm uses an empirical relationship derived from in situ measurements and remote sensing reflectance in the blue-to-green region of the visible spectrum. Level 3 MODIS data may contain biased minimum and maximum values when observations are affected by errors such as cloud contamination or sunlight altering the value captured by the sensor. Due to potential atmospheric contamination, some regions could have a limited number of observations from which to estimate the monthly values, which increases uncertainty. There is an estimated ±35% nominal uncertainty related to the OC3M algorithm used to derive the global Chlo-a product. Nevertheless, error could increase in optically complex waters like those present in coastal areas74,75.

We performed a data validation procedure comparing MODIS observations of SST and Chlo-a against gold-standard sensors. More specifically, we compared MODIS data against SST data from Sentinel-376 during the year 2020. We found that data from MODIS and Sentinel-3 were statistically indistinguishable with a Pearson correlation coefficient of r = 0.99 for the annual mean, minimum, and maximum composites (R2 = 0.99, p < 0.05; Supplementary Fig. S1). Additionally, Chlo-a data were evaluated by comparing MODIS data against SeaWiFS30 observations for the year 2010, when the SeaWiFS satellite ended operations. We found that MODIS Chlo-a data were significantly correlated with SeaWiFS Chlo-a data but with less strength than for SST evaluations. More specifically, correlation was r = 0.83 (R2 = 0.67, p < 0.05) for the mean, r = 0.71 (R2 = 0.53, p < 0.05) for the maximum, and r = 0.76 (R2 = 0.52, p < 0.05) for the minimum Chlo-a composites (Fig. S2). Together, these results suggest that MODIS data have a robust representation of environmental conditions in global coastal waters, at least when compared against gold-standard datasets of SST and Chlo-a.
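A cell-by-cell comparison of this kind can be sketched with a Pearson correlation over paired raster values. The Python function below is a minimal illustration, not the authors' procedure: it assumes that both rasters have been flattened to lists on the same grid, that cells with no value are `None`, and that such cells are dropped pairwise. The input values are made up.

```python
import math


def pearson_r(a, b):
    """Pearson correlation of paired values, skipping pairs with a missing cell."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Made-up flattened cell values from two sensors on a shared grid.
sensor_a = [1.0, 2.0, None, 3.0, 4.0]
sensor_b = [2.0, 4.0, 5.0, 6.0, 8.0]
print(pearson_r(sensor_a, sensor_b))
```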

Usage Notes

The proposed use of this dataset is for coarse-scale, regional or global-level studies of coastal environmental conditions. Fine-scale assessments of SST and Chlo-a are warranted to improve accuracy and detail of these variables for local-level applications. The data can be used to identify anomalies for SST and Chlo-a at local, regional, and global levels. As an example, we explored SST and Chlo-a data in tropical and temperate localities to identify patterns over time (Fig. 3). Areas in the mid-Atlantic region of the United States show an increase in mean SST from June to October (Fig. 3a), while areas in the subtropics of the Americas (i.e., Ecuador and Colombia) reveal cooler temperatures during the same period (Fig. 3b). Additional exploration of the data in tropical and subtropical zones of different latitudes reveals that Chlo-a increases from September to December in the subtropics (Fig. 4b). In contrast, in the tropics, Chlo-a concentration increases between March and May (Fig. 4a).

Fig. 3

Sea surface temperature mean monthly values from 2003–2020. (a) Temperate zone monthly averages between the years 2003–2020 (east coast of the United States). (b) Subtropical zone monthly averages between the years 2003–2020 (coast of Chile).

Fig. 4

Chlorophyll-a mean monthly values from 2003–2020. (a) Tropical zone monthly averages between the years 2003–2020 (coast of Ecuador and Colombia). (b) Subtropical zone monthly averages between the years 2003–2020 (coast of Chile).