Stratifying ocean sampling globally and with depth to account for environmental variability

With increasing depth, the ocean is less sampled for physical, chemical and biological variables. Using the Global Marine Environment Datasets (GMED) and Ecological Marine Units (EMUs), we show that spatial variation in environmental variables decreases with depth. This is also the case over temporal scales because seasonal change, surface weather conditions, and biological activity are highest in shallow depths. A stratified approach to ocean sampling is therefore proposed whereby deeper environments, both pelagic and benthic, would be sampled with relatively lower spatial and temporal resolutions. Sampling should combine measurements of physical and chemical parameters with biological species distributions, even though species identification is difficult to automate. Species distribution data are essential to infer ecosystem structure and function from environmental data. We conclude that a globally comprehensive, stratification-based ocean sampling program would be both scientifically justifiable and cost-effective.

The sea surface and the sea bottom are key ocean boundaries with different roles in ecosystem function, which we contrast here.
The processes of gas exchange at the ocean surface, photosynthetic productivity in the underlying epipelagic zone and associated nutrient, oxygen and carbon dioxide concentrations 22 , carbon dioxide release from respiration in the deeper twilight or mesopelagic zone, and consumption of oxygen creating large low-oxygen layers in mid-depths, are of increasing interest for measuring how much carbon is deposited into ocean sediments 23 . These analyses will need to include estimates of particle settlement and of contributions from larger materials, such as large animals, plants, and natural and artificial debris, as well as of how benthic animals consume, bury and re-suspend such material 24 . However, this does not mean that all depths in the ocean need equal sampling effort. Here, we illustrate the spatial variation in physical and chemical variables with depth in the ocean, derived from two new open access online resources: the Ecological Marine Units (EMU) and the Global Marine Environment Datasets (GMED) (see Methods for details).

Depth gradients
At least in the water column, environmental variation decreased rapidly with depth, as illustrated by temperature, oxygen and nitrate concentrations, and current velocity (Fig. 1). An exception to this generally uniform decline occurs between 4,000 and 5,000 m (Fig. 1). This depth represents the deepest points of the Mediterranean Sea and the Gulf of Mexico, which evidently are warmer and have higher oxygen and nutrient concentrations than the open oceans. At regional and local scales, topographic variation and slope may influence variation in water conditions. Thus, to better capture the increased variation in environmental data from shallow waters, sampling needs to occur on a finer spatial scale at those depths and within different sea areas. This is also the case over temporal scales because seasonal change, surface weather conditions, and biological activity are highest in shallow depths.
The average, standard deviation, and maximum current speed decreased with depth (Fig. 1). Because the standard deviation was highly correlated with the mean (in contrast to the situation with other environmental variables), the coefficient of variation (CV) was calculated. The CV was c. 150% of the mean between 100 m and 2,000 m, indicating very variable velocities at these depths. The lower CV shallower than 100 m and deeper than 2,500 m indicates more uniformly high and low velocities at these depths, respectively.
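The CV calculation described above can be sketched in a few lines of Python. This is a minimal illustration only: the depth bands, the speed values and the function name `coefficient_of_variation` are hypothetical and are not taken from GMED or the EMU, and the population standard deviation is assumed because the text does not state which estimator was used.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Return the CV (standard deviation divided by mean) as a percentage."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    # Assumption: population standard deviation; the source does not specify.
    return 100.0 * pstdev(values) / m


# Hypothetical current speeds (m/s) per depth band, for illustration only.
speeds_by_depth = {
    "0-100 m": [0.20, 0.25, 0.30, 0.22, 0.28],
    "100-2000 m": [0.01, 0.02, 0.15, 0.03, 0.01],
    ">2500 m": [0.010, 0.012, 0.011, 0.009, 0.010],
}

cv_by_depth = {
    band: coefficient_of_variation(values)
    for band, values in speeds_by_depth.items()
}

for band, cv in cv_by_depth.items():
    band_mean = mean(speeds_by_depth[band])
    print(f"{band}: mean = {band_mean:.3f} m/s, CV = {cv:.0f}%")
```

Dividing by the mean removes the dependence of the standard deviation on the overall speed, so a band with slow but erratic currents can be compared directly with a band of fast, steady currents.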

Surface versus seabed
Most variables differ strikingly between the surface and sea bottom (Fig. 2). Concentrations of the primary plant nutrients, nitrogen, phosphate and silicate, are much higher near the seabed than at the sea surface, but temperature is lower, and salinity is similar (Fig. 3). Median oxygen is also similar, at around 5.0 to 4.8 ml l⁻¹, reflecting that areas of high and low oxygen can occur in both shallow and deep waters. Moreover, the strong spatial correlations between variables within their depth zone can aid estimation of variables where field recordings are lacking.
Although the variability of water column parameters is lower in the deep than in the shallow sea (Figs 1, 2 and 3), variation in seabed slope shows a different pattern (Fig. 4). It is highest in the deepest ocean areas, but these occupy a small area and volume. Another indicator that the overall oceanic environment is more homogeneous with depth is species distribution. The depth distributions and geographic ranges of marine species increase with depth, reflecting environmental homogeneity and the low temperature and productivity of the deep sea 25 . This results in lower spatial diversity (beta diversity), species richness and endemicity in the deep sea 26 .

3D framework
The spatial and depth (  EMU numbers with depths reflects the decreasing environmental variability in deeper waters. To get an accurate representation of the state and trends in ocean biodiversity, physical, chemical and biological data need to be collected in each EMU. The EMU were clustered hierarchically based on the similarity of six environmental variables (see Methods, Fig. 6, Supplementary Material Figure S2). All EMU were significantly different from one another (P < 0.05, SIMPROF test in PRIMER-E 27 ), except EMU 5 and 22. Cluster analysis (also using Bray-Curtis and group average on normalised data) showed the relative significance of the variables in distinguishing EMU. Depth was the most important variable, followed in order by salinity, silicate, temperature, oxygen, and, equally, nitrate and phosphate (Figure S2).

Discussion
We thus propose that a spatially and temporally stratified sampling strategy will be most cost-effective for exploration in the deep sea. In accordance with decreasing environmental variability, fewer samples will be necessary with depth for improved characterization of the deeper water column. The EMU provide a 3D framework up to 5,500 m depth to stratify ocean sampling. However, one caveat in the use of the EMU is that their relationship to species distributions and abundance awaits more detailed analysis. It is possible that they could be aggregated, such as at a higher level on the hierarchy in Fig. 6, or need to be more finely divided, to capture spatial and temporal trends in biodiversity. For example, while EMU with a similar environment exist in different parts of the world, their species composition will vary due to geographic isolation. Contiguous EMU, such as within the Baltic, Black and Caspian Seas, may contain the same species, which have adapted to the varying environmental conditions. In addition to analysis of the biological applicability of the EMU, mapping EMU with additional environmental variables and more in situ data will produce a more accurate framework. However, while additional data are available for the sea surface (e.g., in GMED), they are not available at depth.
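One way to turn "fewer samples with depth" into numbers is Neyman allocation, a standard survey-sampling rule that assigns samples to each stratum in proportion to its size multiplied by its standard deviation. The sketch below is an assumption-laden illustration and not the method used in this study: the rule itself is swapped in for illustration, and the strata, relative volumes, standard deviations and the function name `neyman_allocation` are all hypothetical.

```python
def neyman_allocation(strata, total_samples):
    """Split total_samples across strata in proportion to size * std deviation.

    strata maps a stratum name to a (size, standard_deviation) pair.
    Returns fractional sample counts; rounding is left to the caller.
    """
    weights = {name: size * sd for name, (size, sd) in strata.items()}
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one stratum needs positive size and variability")
    return {name: total_samples * w / total_weight for name, w in weights.items()}


# Hypothetical depth strata: (relative volume, std deviation of temperature, deg C).
# These values are illustrative only and are not derived from the EMU or GMED.
strata = {
    "0-200 m": (1.0, 8.0),
    "200-1000 m": (4.0, 3.0),
    "1000-5500 m": (20.0, 0.5),
}

allocation = neyman_allocation(strata, total_samples=1000)

# Samples per unit volume show the implied spatial resolution in each stratum.
density = {name: allocation[name] / strata[name][0] for name in strata}

for name in strata:
    print(f"{name}: {allocation[name]:.0f} samples, {density[name]:.1f} per unit volume")
```

With these illustrative inputs the deepest stratum still receives a substantial share of the samples because of its large volume, but its sampling density per unit volume is far lower than near the surface, which is the pattern the stratified strategy calls for.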
Although global maps of ocean geomorphology are available 28 , and there is a growing catalogue of seabed composition, such as rock, compacted sediment, and soft muds, with promising methods to use these data to model wider spatial scales 29 , there remains a great deal of uncertainty about substratum composition, texture and depth on a global scale 30,27 . However, particle flux data are becoming available and considerable data can be available at national and regional scales (e.g., 31,32 ). Ocean sampling and observation thus need to emphasise the surface and sea bottom environments where physical, chemical and biological variability and biodiversity are highest. To some extent, annual averages capture some of the environmental variation, but significant seabed disturbances can be episodic and extreme events may have significant long-term effects on biodiversity at all depths 33 . Thus the optimal frequency of sampling over time also merits further assessment. It is likely that this could also be stratified based on how strongly environmental and biological parameters vary over time. While satellites, acoustic methods, and other sensors are most cost-effective for collecting data, the more challenging observation of species must be emphasised 34 . Last et al. 14 emphasised that species endemicity was an essential part of any classification designed to aid biological resource management. Knowing the distribution and dynamics of species is fundamental to sustainable use of biodiversity, including fisheries and fish food, and also because of the biological effects on the carbon cycle and other bio-chemical processes. Furthermore, the distribution of species signals longer-term environmental conditions. Efforts to associate species distribution data with distinct abiotic environments like EMUs will further illuminate the relationship between environmental drivers and species distributions. A global classification of marine biogeographic realms based on species endemicity is now available 28 . As expected from our knowledge of the environmental variation and productivity, it found lower species' endemicity and thus fewer realms in open ocean and deep-sea environments than coastal. This could be cross-matched with EMU 35,36 , seascapes 8 , geomorphological units 30 , and other environmental regions to provide a fully integrated system for marine management and monitoring.
Figure 4. Ocean depth (m) and slope (degrees). Red is shallower and higher slope, blue is deeper and flatter slope, respectively. Because colour scales are relative values, actual median (horizontal line), 95 percentile (box), and range (vertical line) are provided as box and whisker plots (format as in Fig. 3). The graph shows how seabed slope, as both mean (narrow red line) and CV (dotted line), is low over the largest area (wide line, thousands of km²) and volume (large dotted line, thousands of km³) of the ocean between 3,000 m and 6,000 m (data from 2 ). Maps from GMED.
[Figure caption, beginning not recovered] (1,000 m). Pink colours represent warmer, and blue colder, EMU (see 34,35 for more details).
Ocean exploration benefits from international collaboration and publication of data 37,38 , as well as data products such as EMUs and GMED. The marine community has a good track record in collaboration and data management 39 , having established the Argo floats programme, a complete inventory of all marine species (World Register of Marine Species 40 ), an open access database on marine species distributions (Ocean Biogeographic Information System), and partnering in oceanography 41 . The Group on Earth Observations (GEO), supported by over 100 countries, provides a world brokerage for collaboration in the ocean sciences 42 and has initiated a Marine Biodiversity Observation Network (MBON) 43 . This synergy of field efforts will improve the quality and cost efficiency of data collection and management through mutual exchange of know-how and resources. The benefits will include a new understanding of ocean ecosystems that will inform sustainable resource use and government policies.

Methods
The 'Ecological Marine Units' (EMU) provided the first three-dimensional (3D) partitioning of the ocean based on environmental variables 37,38 . The EMU represent 37 physically and chemically distinct volumetric regions in the ocean that were objectively derived from a non-supervised clustering of ocean environmental data. The variables clustered were from NOAA's World Ocean Atlas (WoA); namely, 57-year averages of temperature, salinity, oxygen concentration, oxidised nitrogen (~nitrate), phosphate, and silicate 44-47 . Prior to analysis using Euclidean distance and group average clustering, the data were normalised by subtracting the mean of each variable and dividing by its standard deviation, so that each variable had equal weight. The lower extent of the EMU is 5,500 m due to data availability. A depth of 5,500 m is sufficient to include much of the global seabed because the average depth of the ocean is approximately 3,400 m, although it extends to 11,000 m 39 . This illustrates that a priority for ocean science is not only to provide more variables in 3D, but also to extend them to the seabed everywhere. Using spatial data interpolation techniques, the EMU have also been attributed with current velocity data from a global model simulation hindcast representing the climatological mean for the period 2000-2012 48,49 .
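The normalisation and clustering steps described above can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions, not the PRIMER-E workflow used in the study: the function names `zscore_columns` and `group_average_clusters` and the four example water parcels are hypothetical, and the population standard deviation is assumed because the text does not state which estimator was used.

```python
from itertools import combinations
from math import dist
from statistics import mean, pstdev


def zscore_columns(rows):
    """Subtract each column's mean and divide by its standard deviation.

    Assumes every column varies (a zero standard deviation would divide by zero).
    """
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    sds = [pstdev(c) for c in columns]
    return [tuple((x - m) / s for x, m, s in zip(row, means, sds)) for row in rows]


def group_average_clusters(points, n_clusters):
    """Agglomerative clustering with Euclidean distance and group-average linkage."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Group average: mean distance over all pairs drawn from the two clusters.
        return mean(dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters (a < b) and merge b into a.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical water parcels: (temperature in deg C, salinity). Illustrative only.
parcels = [(25.0, 35.5), (24.0, 35.4), (2.0, 34.7), (1.5, 34.6)]

normalised = zscore_columns(parcels)
clusters = group_average_clusters(normalised, n_clusters=2)
print(clusters)
```

Without the normalisation step, temperature (which spans tens of degrees) would dominate the Euclidean distance over salinity (which spans less than one unit), so the z-scores are what give each variable equal weight.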
The Global Marine Environment Datasets (GMED) resource is an open-access online compendium of most global-scale marine data in a standardised spatial resolution 50 , and includes variables representing both sea surface and near-seabed conditions. As the source for near-seabed data is also WoA, these data are limited to about 5,500 m depth. As with the EMUs, GMED seeks to make already existing data more accessible to non-specialists, such as educators, biologists and ecologists.