A database of zooplankton biomass in Australian marine waters

Zooplankton biomass data have been collected in Australian waters since the 1930s, yet most datasets have been unavailable to the research community. We have searched archives, scanned the primary and grey literature, and contacted researchers, to collate 49187 records of marine zooplankton biomass from waters around Australia (0–60°S, 110–160°E). Many of these datasets are relatively small, but when combined, they provide >85 years of zooplankton biomass data for Australian waters from 1932 to the present. Data have been standardised and all available metadata included. We have lodged this dataset with the Australian Ocean Data Network, allowing full public access. The Australian Zooplankton Biomass Database will be valuable for global change studies, research assessing trophic linkages, and for initialising and assessing biogeochemical and ecosystem models of lower trophic levels.


Methods
Data held in the Australian Zooplankton Biomass Database have been collated from literature, active and retired researchers, consultancies, archives and databases. Only project data with the relevant corresponding metadata regarding collection location, date and methods have been included. Samples have been collected on research and commercial vessels in coastal and oceanic waters across a range of depths from the tropics to the sub-Antarctic, with estuarine waters excluded. Samples for zooplankton biomass measurement were collected in one of three ways in the current dataset. First, using towed nets 5,11,12 , with a variety of mesh sizes and diameters. Second, using the CPR 13 , which has an aperture of 1.61 cm 2 . Last, by using optical counting instruments (e.g. the OPC or LOPC) 14,15 , which were towed in situ and measure size and number of zooplankton via attenuation of a light or a laser beam.
The collection strategy (sampling method, tow direction and time of day) used by individual projects has an important influence on the biomass of zooplankton collected. Many zooplankton vertically migrate diurnally, and larger ones can swim sufficiently to evade collection. This dataset includes samples collected by nets using different tow directions (vertical, horizontal and oblique, Fig. 3a). Vertical net sampling can mitigate effects of vertical migration if the entire water column is sampled such as in shallow coastal and shelf waters, but in oceanic locations vertical net drops are often stopped prior to the bottom, typically at about 200 m, missing deeper zooplankton. Oblique net tows sample diagonally through the water column between specified depths, and are often used to target larger, faster-swimming zooplankton. Horizontal net tows can be taken at the surface or at specific depths and are often used to target a specific feature of the water column (e.g. the deep chlorophyll maximum, the bottom of the mixed layer or an acoustically detected swarm) or to understand vertical migration. Samples in the dataset have been collected throughout the day and night and collection times are included where available.  www.nature.com/scientificdata www.nature.com/scientificdata/ We have generated the variable: day_night, in the database to determine whether the sample was collected in the day or night to aid analysis of diurnal migration. This was estimated using the crepuscule function in the maptools package in R 16 , which compares the local time of the sample with the times of dawn and dusk for the date and latitude. Where bathymetric data were directly measured by the project it is provided by variable: bot-tom_depth_measured. To provide bathymetrical data for each record in the database, we determined the bottom depth based on latitude and longitude. Bathymetric depth estimates were obtained from the ETOPO1 bathymetry and topography database developed using an one Arc-Minute Global Relief Model 17 using the Marmap package in R 18 , as the variable: bottom_depth_ETOPO_estimate.
The size of zooplankton targeted in net tows, whether micro-, meso-, or macro-zooplankton, is dependent upon the mesh size used, the diameter of the net, and the direction the net is towed, and the tow speed 19 . Finer-mesh nets generally collect more zooplankton and thus measure higher biomass because smaller species and more life stages are captured. However, finer-mesh nets (e.g. 100 µm) are more prone to clogging by phytoplankton and consequently can only be towed for short distances and at slow speeds, therefore sampling a smaller water volume. These smaller water volumes are appropriate for sampling microzooplankton and mesozooplankton such as copepods, but not ideal for sampling the rarer and larger macrozooplankton. This can be compensated  Table 1 for project details using project id. www.nature.com/scientificdata www.nature.com/scientificdata/ for by increasing the filtering area of the net 20,21 . Coarser-mesh nets (e.g. 500 µm), often with larger diameters can be towed at higher speeds to capture larger, more-motile zooplankton such as euphausiids and amphipods, but miss the small zooplankton [22][23][24][25] . Coarse mesh nets are often towed obliquely and deployed in deep waters to target the larger zooplankton that can be strong vertical migrators which swim to the surface at night and sink deeper during the day. There is no one ideal net or mesh size to capture all zooplankton sizes.
Many net mesh sizes exist in the project datasets ranging from 73-550 µm mesh, Fig. 3b. Net types and dimensions also vary considerably among projects. Some nets can open and close to target particular depths, and some vertical nets (e.g. ring nets) are deployed free-fall (sampling going down) or hauled upwards (sampling going up). The various sampling gears and measurement methods used require that care should be taken when comparing biomass values across projects, while gear and methods are usually consistent within each project. In particular, mesh size is a major determinant of the magnitude of the biomass measurement. For example, McKinnon 26 found the biomass of the 73 µm mesh fraction of the zooplankton was 2.3 times greater than the 350 µm mesh fraction collected in waters of the Great Barrier Reef and 3.6 times greater in waters off the Kimberley coast. The database includes the total zooplankton biomass as recorded by the individual projects, this may include sporadically high abundances of gelatinous species. For further information on how each project dealt with this component of the plankton, please consult the relevant project citation.
Biomass calculations require the volume of water filtered during the zooplankton collection; the volume is the product of the distance travelled and net mouth area. This can be obtained by a combination of the following variables: tow duration, calibrated flowmeter values, tow speed and/or sampling depth. Hence, a variety of calculations exist to determine the volume of water sampled 2,27-30 . The collection method and biomass measurement are included for inter-project comparisons (Online-only Table 1). For the Australian Zooplankton Biomass Database, the initial zooplankton biomass values were all converted into standard units of µg Carbon per litre (µg C/L) using accepted calculations 2,28,31-35 for the zooplankton dry mass, wet mass, displacement volume or settled volume. The frequency of use of the different biomass measurement methods varies over time, Fig. 4.
Probably the most important effect of gear on the magnitude of zooplankton biomass is mesh size 25,36,37 . Comparisons of zooplankton biomass collected by different nets and samplers were summarised by Skjoldal 38 . Certainly the traditional standard 200 μm mesh net was used more in Australia in the past, but is now considered inappropriate for capturing the smaller zooplankton typical of oligotrophic tropical waters in much of the region 26,39 . In the analysis of the global COPEPOD dataset [40][41][42][43][44] , they found 333 μm mesh to be the most commonly used mesh size, so a factor was derived to convert all the other biomass records to 333 µm-equivalent values to enable comparison among datasets. It was noted that each mesh size did not offer a complete geographic coverage and that the use of 333 μm mesh was absent in their Southern Ocean data 33,[40][41][42][43][44] . The mesh size in the Australian Zooplankton Biomass Database is highly variable and varies spatially as well as temporally (Fig. 3b). No attempt is made in this database to convert to a standardised mesh size and the mesh size is therefore included as a factor to be interpreted by the data users or used in statistical models.
In addition to the net-collected data, this database also contains 2256 zooplankton biomass records from CPR samples from on-going the Integrated Marine Observing System (IMOS) project (P_597). The CPR is towed primarily by "ships of opportunity" (commercial vessels plying their trading routes) as well as research vessels, at speeds of 10 to 25 knots and at ~10 m depth for distances up to 450 nautical miles per tow. Methods of counting and processing data are described by Richardson et al. 13 1930 1940 1950 1960 1970 1980 1990 2000 2010 2020 1930 1940 1950 1960 1970 1980 1990  www.nature.com/scientificdata www.nature.com/scientificdata/ the phytoplankton and zooplankton washed of the silk and filtered through a 75 μm filter, although the biomass is usually dominated by zooplankton. Further, the CPR underestimates zooplankton abundance and biomass compared with net-collected data sets 13,45,46 . Although conversions between CPR abundance (not biomass) and other sampling methods have been used before 47 , these are most likely species-and area-specific 13,48 . CPR data included in this dataset can therefore be used by themselves as they are internally consistent, but they are a poor measure of absolute abundance or biomass 13,48 .
Zooplankton biomass can also be estimated from measurements of plankton body-size by optical instruments. In this database, we include measurements from both the OPC 49 and LOPC 15 (P_1135). The optical instruments measure the change in light attenuance as particles pass in front of their light emitting diode (OPC) or laser (LOPC) light source. Changes in light attenuance are recorded as an Equivalent Spherical Diameter (ESD) which is the diameter of each particle, assuming each to be spherical in shape. As the many of the zooplankton (e.g. copepods) are the shape of an oblate spheroid (Length:Width:Depth ratio of 3:1:1) 49 , we use the cross-sectional area of the sphere to calculate the equivalent cross-sectional area of an oblate spheroid, giving an improved estimate of the length, width and depth measurements of the particle. These measurements, along with the density of water (1 g/mL), are used to calculate a biovolume and then biomass of each particle. The use of these instruments in oceanographic studies provides a standard procedure to rapidly count and size zooplankton at a fine spatial (metres) and temporal (0.5 seconds) resolution. However, there are limitations such as the lack of taxonomic information and the influence of sediment and marine snow 50 . The OPC and LOPC data in this dataset were collected from the RV Southern Surveyor and RV Investigator between the surface and a maximum depth of 300 m using a modified SeaSoar and Triaxus (MacArtney, Denmark). Each vertical cast was split into 50 m increments, and the summed biomass between ESDs of 300 μm and 12 mm reported 51,52 . As the OPC/LOPC data are derived from a measurement of individual size (rather than a direct biomass measurement) these data are not converted to a carbon equivalent and should be analysed separately unless standardised.
A  Table 1). Our data set contains >8000 additional net collected records of zooplankton biomass for the Australian region.
Unfortunately, while some of the earliest research cruises in the Australian region sampled zooplankton, they did not quantitatively measure the biomass 53 63,64 ; NAGA cruise S11 (cruises S1-S10 collected biomass but cruise S11 in Australian waters did not) 65 . The historical dataset with biomass data from Discovery from 1932 to 1934 66 is included from COPEPOD along with datasets from 1956-2002. Details on individual projects included from COPEPOD are available from the links provided in Online-only Table 1.

Data Records
In this database 6 , each data record belongs to a project (e.g. P_599 is the IMOS National Reference Stations dataset) and represents the total zooplankton biomass at a certain point in space and time and has been given a unique record identification number: P_(project_id) _(record_id). A project is defined as a set of data records that have been analysed together, usually as a cruise or study with the same sampling and processing methods. Metadata ascribed to a project relates to all the data records within that project (Online-only Table 1). Each sample within that project has a unique sample_id. The sample_id for each record is composed of a string of fields from the original data set to maintain traceability to the original source. Database fields for each data record are: latitude, longitude, sample date and time (local and UMT), depth (minimum, maximum, bottom_depth_measured and bottom_depth_ETOPO_ estimate), biomass (value, measurement units and method) as in the original project, biomass converted to carbon (µgC/m 3 ), net/gear type, mesh size, tow direction (vertical, horizontal, oblique) and day/night.
Most of the projects have ceased, and the final project dataset is collated for this database. However, the IMOS National Reference Stations (P_599) and Continuous Plankton Recorder Survey (P_597) datasets are ongoing long-term IMOS projects. Data are continually updated and available through the AODN separately. These two surveys from 2009 to the present, currently contribute 2988 records to the database and this figure will continue to increase.

technical Validation
Zooplankton biomass has been sampled using a variety of methods and measured in many ways. The biomass measurement method is recorded as the original measurement used by each project: biomass calculated as dry/ wet mass or carbon; or biovolume calculated as displacement volume or settled volume. While projects are internally consistent, if the original values are to be used for analyses across projects, attention to the units of measurement is required as these vary (e.g. mg/m 3 and g/m 3 ). All biomass values (except from the optical dataset, P_1135) have been converted to a carbon equivalent (µg C/L) to allow inter-project comparison, but it is possible that the conversions (developed elsewhere in the world) do not hold true in some circumstances (e.g. high volumes of gelatinous specimens). If using multiple datasets, users are encouraged to build their own statistical models using the original projects as fixed or random factors. (2020) 7:297 | https://doi.org/10.1038/s41597-020-00625-9 www.nature.com/scientificdata www.nature.com/scientificdata/ Original data records with missing location and/or sampling dates have been excluded from the dataset. Some projects are missing data in some fields such as minimum and maximum depths, tow direction and/or net type. Using 1/6 th degree squares, the bottom_depth_ETOPO_estimate values from the ETOPO1 data is quite coarse and may be inaccurate for sample locations with rapidly changing bottom topography. For some records close to land, the ETOPO1 bathymetry could not be resolved for this dataset or was inaccurately calculated (e.g. Darwin Harbour samples: P_4). This is 125 records for the net data and 700 for the optical data; depth could be estimated from nautical charts if required. At 1/6 th degree squares, the ETOPO1 data is quite coarse and may be inaccurate for areas with rapidly changing bottom topography.
Day or night was calculated for each data record that had a time of collection, the records with no time have a blank. There are also some projects which had no time of collection provided but did note whether the sample was collected in day or night, so they also have the day_night data.
The optical plankton dataset (P_1135) provides biomass estimates from ESDs of 300 μm upwards. It has a collection aperture, but minimum organism size "sampled" is set by what is resolved by the instrument. The optical plankton biomass is an estimated wet weight calculated from the biovolume and these values are not directly comparable with the conventional net-collected biomass values. Optical samplers do not discriminate between particle types (i.e. between detritus and plankton), therefore the carbon equivalent is not calculated for this project dataset.
The CPR sampler (P_597) uses silk mesh, which retains smaller organism than conventional smoother nylon mesh 13 . We therefore suggest that the optical plankton and CPR data should be either analysed separately from the net-collected data or analysed together with net samples using appropriate statistical models. Note that the CPR and OPC/LOPC datasets have a high level of internal consistency.
As with the Australian Phytoplankton Database 8 , Zooplankton Abundance Database 7 , Chlorophyll a Database 9 , the Australian Zooplankton Biomass Database has been built with ease of use and minimising user error in mind. Therefore, it provides clean, easily comparable data at a level that requires minimal interpretation. The CSIRO database holds all the raw data, original datasets and ambiguous records from the constituent datasets, and researchers can request further information from the custodians detailed in Online-only Table 1. In this database, the COPEPOD datasets include the "copePID": a permanent unique identifier to link to each project data set in COPEPOD, this is the first 8 digits of the sample_id. This enables access to the dataset and associated metadata in COPEPOD (e.g. au-04001: https://www.st.nmfs.noaa.gov/copepod/data/au-04001/).

Usage Notes
With a comprehensive continental-scale dataset, care must be taken with analysis as the data were collected using very different methods. Data were collected under vastly different environmental conditions, from tropical, to sub-Antarctic waters.
This database 6 can be used in conjunction with the Australian Zooplankton Abundance Database 7 , which provides species-level data and is available through the AODN. The Australian Phytoplankton Abundance Database 8 and the Australian Chlorophyll a Database 9 (both also available through the AODN) provide complementary data that can be matched to the zooplankton biomass data via Project_id and to individual records via sample_id or by sampling date and time details. The Australian Larval Fish database 67 also includes information from Project P_599: the IMOS National Reference Stations and is also available for comparison through the AODN.