Sounding out life in the deep using acoustic data from ships of opportunity

Shedding light on the distribution and ecosystem function of mesopelagic communities in the twilight zone (~200–1000 m depth) of global oceans can bridge the gap in estimates of species biomass, trophic linkages, and carbon sequestration role. Ocean basin-scale bioacoustic data from ships of opportunity programs are increasingly improving this situation by providing spatio-temporal calibrated acoustic snapshots of mesopelagic communities that can mutually complement established global ecosystem, carbon, and biogeochemical models. This data descriptor provides an overview of such bioacoustic data from Australia’s Integrated Marine Observing System (IMOS) Ships of Opportunity (SOOP) Bioacoustics sub-Facility. Until 30 September 2020, more than 600,000 km of data from 22 platforms were processed and made available to a publicly accessible Australian Ocean Data Network (AODN) Portal. Approximately 67% of total data holdings were collected by 13 commercial fishing vessels, fostering collaborations between researchers and ocean industry. IMOS Bioacoustics sub-Facility offers the prospect of acquiring new data, improved insights, and delving into new research challenges for investigating status and trend of mesopelagic ecosystems.

, with units of decibels referred to x ref (dB re x ref ) is capitalized.
Echosounder data. In a widely used Simrad echosounder (Table 1), the proprietary format raw data (.raw) from each transmission and reception cycle (hereafter 'ping') includes the received echo power $p_{er}$ (W), together with the General Purpose Transceiver (GPT) settings: frequency $f$ (kHz), transmit power $p_{et}$ (W), pulse duration $\tau$ (s), transducer on-axis gain $G_0$ (dB re 1), area backscattering coefficient $s_a$ (m² m⁻²) correction factor $S_{a\,corr}$ (dB re 1), and equivalent two-way beam angle $\Psi$ (dB re 1 sr) of the transducer. These data and associated settings were used to calculate and display volume backscattering strength $S_v$ (dB re 1 m² m⁻³) for one or more frequency channels as 3 :

$$S_v[i,j] = P_{er}[i,j] + 20\log_{10} r[i,j] + 2\alpha_a r[i,j] - 10\log_{10}\left(\frac{p_{et}\,\lambda^2\,g_0^2\,c_w\,\tau\,\psi}{32\pi^2}\right) - 2S_{a\,corr} \quad (1)$$

where $P_{er}$ (dB re 1 W) is the received power, $r$ (m) is the range to the target, $\alpha_a$ (dB m⁻¹) is the absorption coefficient, $\lambda$ (m) is the wavelength, $g_0$ (dimensionless) is the transducer on-axis gain, $c_w$ (m s⁻¹) is the sound speed in water, $\psi$ (sr) is the equivalent two-way beam angle, and the indices $i$ and $j$ represent the vertical sample number and horizontal ping number respectively.
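As an illustration of Eq. 1, the following minimal Python sketch evaluates $S_v$ for a single sample. The function name, argument names, and the example values are assumptions made for illustration only; they are not taken from Table 1 or Table 2, and the sketch is not the processing code used by the sub-Facility.

```python
import math


def volume_backscattering_strength(p_er_db, r_m, alpha_a, p_et, wavelength,
                                   g0_db, c_w, tau, psi_db, sa_corr_db):
    """Evaluate Eq. 1 for one sample (all names here are illustrative).

    p_er_db    received power (dB re 1 W)
    r_m        range to the target (m)
    alpha_a    absorption coefficient (dB per m)
    p_et       transmit power (W)
    wavelength acoustic wavelength (m)
    g0_db      transducer on-axis gain G0 (dB re 1)
    c_w        sound speed in water (m per s)
    tau        pulse duration (s)
    psi_db     equivalent two-way beam angle (dB re 1 sr)
    sa_corr_db s_a correction factor (dB re 1)
    """
    g0 = 10 ** (g0_db / 10)      # linear on-axis gain
    psi = 10 ** (psi_db / 10)    # beam angle in steradians
    system_term = 10 * math.log10(
        p_et * wavelength ** 2 * g0 ** 2 * c_w * tau * psi / (32 * math.pi ** 2)
    )
    tvg = 20 * math.log10(r_m) + 2 * alpha_a * r_m  # time-varied gain
    return p_er_db + tvg - system_term - 2 * sa_corr_db


# Hypothetical 38 kHz example values, chosen only to exercise the function.
example = volume_backscattering_strength(
    p_er_db=-120.0, r_m=500.0, alpha_a=0.0095, p_et=2000.0,
    wavelength=1500.0 / 38000.0, g0_db=25.5, c_w=1500.0,
    tau=2.048e-3, psi_db=-20.6, sa_corr_db=-0.6,
)
print(round(example, 2))
```

Because $g_0$ enters Eq. 1 squared, a 1 dB increase in $G_0$ lowers the computed $S_v$ by 2 dB, which is why accurate calibration parameters matter for the derived data.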
Echosounder calibration. Echosounder calibration is a prerequisite for quantitative bioacoustic studies.
The overall on-axis performance of echosounders installed on the participating platforms was routinely evaluated using the established sphere calibration method 3,4 . This method provides the calibrated G 0 and S a corr required for standardizing S v data (Eq. 1) collected by diverse platforms with a traceable calibration history. The sphere calibration also provides a check of the transducer beam-pattern characteristics and the related Ψ. The manufacturer-specified Ψ, adjusted for the local sound speed at the calibration location, was used because of the difficulty in obtaining an independent measurement of the beam pattern of a hull-mounted transducer. The raw data acquired using ES60 and ES70 echosounders were modulated with a triangle wave error sequence 65 . The triangle wave error (with a 1 dB peak-to-peak amplitude and a 2720 ping period) was removed from calibration data before calculating G 0 and S a corr . Open ocean transit (hereafter 'transect') data were initially not corrected for the triangle wave error due to data management and storage constraints at the start of the program. Generally, this error averages to zero over a full period of 2720 pings for normal operations and the 1 km horizontal resolution of the processed data. To facilitate the processing of high-resolution data (e.g. 100 m horizontal resolution) and data from slow ping rate systems, transect data files have been corrected for this error (where applicable), with associated metadata, since September 2020.
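The statement that the triangle wave error averages out over a full period, but not over shorter stretches of pings, can be checked with a small sketch. The waveform below is an assumed idealized triangle wave (1 dB peak-to-peak, 2720 ping period, arbitrary starting phase) averaged in the dB domain; the true phase of the error in a given raw file has to be estimated from the data and is not modelled here.

```python
def triangle_wave_error_db(ping_index, period=2720, peak_to_peak_db=1.0):
    """Idealized triangle-wave error (dB) for a ping; the phase is assumed."""
    half = period / 2
    x = ping_index % period
    amplitude = peak_to_peak_db / 2
    if x < half:
        # Rising edge: from -amplitude up to +amplitude.
        return -amplitude + peak_to_peak_db * x / half
    # Falling edge: from +amplitude back down to -amplitude.
    return amplitude - peak_to_peak_db * (x - half) / half


def mean_error_db(first_ping, n_pings):
    """Mean error (dB) over n_pings consecutive pings."""
    total = sum(triangle_wave_error_db(first_ping + k) for k in range(n_pings))
    return total / n_pings


print(mean_error_db(0, 2720))  # full period: effectively zero
print(mean_error_db(0, 300))   # short stretch: a residual bias remains
```

A short averaging interval, such as a 100 m cell or a slow ping rate system, spans only part of the period, so a residual bias of up to half the peak-to-peak amplitude can remain.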
Data acquisition. To accommodate the operational needs of participating platforms (e.g. fishing), the data acquisition settings in Table 2 were used to optimize the quality and practical utility of the collected data. The transmit power was selected based on the recommended settings 20 for commonly used Simrad echosounders. The pulse duration was chosen as a trade-off between sample resolution and an acceptable signal-to-noise ratio (SNR, dB re 1) in the mesopelagic zone, and the logging range was set to provide robust estimates of echosounder background noise (dB re 1 W) levels 66 .
Data registration and management. Depending on the primary purpose of participating platforms, raw data received from operators (Table 1) may cover transects and periods of fishing or scientific activities. A custom Java software suite was developed to assist data management and help identify transects for post processing (Fig. 5). These tools were used to create information (inf) files. The inf file is in plain text format that contains user-defined metadata (platform name, relevant platform call sign, voyage name, transect attributes, and relevant comments). It also includes key data acquisition settings extracted from the raw data files including frequency, transmit power, pulse duration, and echosounder details (GPT channel identifier and transducer model). The platform navigation details (total travel time, total distance covered, and average platform speed), temporal extent (start and end time of data volume), and geographic extent (limits of latitude and longitude) were also captured in the inf files. These inf files were checked for consistent data acquisition settings, transect selection, and excluding continental shelf water column backscatter data. Raw data files with inconsistent data acquisition or unknown calibration settings were not considered for further processing and archived locally.
Data processing routines. Data sets were initially processed using Echoview ® software (Echoview Software Pty Ltd, Hobart, Tasmania, Australia), which includes a sequence of data processing filters 5 designed to remove noise and improve data quality. After applying the relevant time offset to Coordinated Universal Time (UTC) and the calibration parameters, transect data files were visualized (Eq. 1) as frequency-specific echograms in Echoview ® for visual inspection, transducer motion correction, and filtering processes (Fig. 5). Subsequent processing and packaging were completed using MATLAB ® software (MathWorks, Natick, Massachusetts, USA). All processing steps were semi-automated using a custom MATLAB ® Graphical User Interface (GUI) integrated with Component Object Model (COM) objects controlling Echoview ® software.
Visual inspection of data. Acoustic data quality from different platforms can vary significantly due to signal attenuation (i.e. attenuation of transmit and/or received signal to a level below the analysis threshold), and signal degradation due to combined transducer motion and noise. Data quality control involved visual inspection of echograms (Fig. 5), followed by marking the seabed (if present) and regions of bad data using echogram tools available in Echoview ® . Pings with prolonged noise interference or signal attenuation were flagged as bad data. Data shallower than 10 m were removed to exclude the echosounder transmit pulse and echoes in the transducer nearfield. Similarly, data deeper than the seabed (if present) were removed from the analyses. Additionally, regions of aliased seabed echoes (i.e. seabed reverberations from preceding pings coinciding with the current ping) were manually flagged as bad data. Valid high scattering from biological sources (e.g. pelagic fish schools that may occur between the surface and 250 m depth) causing an apparent transition in backscatter intensities was manually preserved from the transient noise filter described below 5 .

Fig. 5. Flowchart of methods implemented to produce quality-controlled bioacoustic data, providing an overview of data processing sequences in the context of key data variables present in a NetCDF file. Note that before transducer motion correction and filtering steps, calibrated S v values within each ping were resampled (by taking mean in the linear domain) to a specified vertical resolution of 2 m to smooth out vertical sample-to-sample variations in S v .

Table 2. Commonly used data acquisition settings for IMOS Bioacoustics sub-Facility platforms. Note that high-frequency 70 and 120 kHz echosounders are not capable of recording high-quality biological scattering down to 1800 m range. This logging range was set to provide robust estimates of echosounder background noise levels with a presumption that at far ranges the noise will be dominating over the biological scattering due to beam spreading and absorption losses. The absorption of sound in water increases rapidly with frequency and high-frequency echosounders are limited to short ranges.

Transducer motion correction. Echo-integration results will be biased if the change in orientation of the transducer beam between the times of each ping is not accounted for. The effect of transducer motion on echo-integration was studied by Stanton 67 , and Dunford 68 later developed a single correction function that can be applied to a wide range of circular transducers and related s v data. To fully characterize platform movement, the Dunford 68 algorithm implemented in Echoview ® requires motion data (i.e. pitch and roll of a platform) recorded at a rate above the Nyquist rate of the platform's angular motion 69 to avoid temporal aliasing due to an inadequate sampling rate. When platform motion data were available at a suitable sampling rate (see 'Technical Validation' section), transducer motion effects were corrected using the Dunford 68 algorithm, ensuring time synchronization with the recorded acoustic data (Fig. 5).
Data processing filters. Fishing vessels (FV) contributing to the IMOS Bioacoustics sub-Facility were not purpose-built for collecting high-quality bioacoustic data. Various factors, including inclement weather and vessel design, can degrade data quality and cause large biases in derived s v values. To minimize these biases, data processing filters were applied to the raw data (Fig. 5). Transducer motion-corrected data were subject to a sequence of data processing filters 5 designed to mitigate impulse noise, signal attenuation, transient noise, and background noise 66 .
Data processing filters were applied to each $S_v$ sample in an echogram, identified by a vertical sample number $i$ and horizontal ping number $j$. The 'context window' defined for the filters includes the current ping and the surrounding pings on either side of it. Depending on the filter used, the context window is centred on either the current ping or the current sample, and slides over the entire echogram.

Impulse noise removal. Impulse noise affects discrete sections of the data with a duration of less than one ping, for example, transmit pulses originating from other unsynchronized acoustic systems. The impulse noise removal algorithm implemented in Echoview ® (based on Ryan, et al. 5 ) compares each $S_v$ sample in the current ping to the adjacent $S_v$ samples (at the same depth) in surrounding pings defined by a context window of specified width $W$ (see details of context window in Table 3). A smoothed copy of the original $S_v$ values (i.e. unfiltered data) within the context window was used to identify impulse noise (see details of smoothing window in Table 3). The original $S_v$ samples were identified as impulse noise if the corresponding smoothed $S_v$ samples satisfy the condition:

$$S_{v\,smooth}[i,j] - S_{v\,smooth}[i,j-m] > \delta \quad \text{and} \quad S_{v\,smooth}[i,j] - S_{v\,smooth}[i,j+m] > \delta \quad (2)$$

where $S_{v\,smooth}$ denotes the smoothed $S_v$ values, $j \pm m$ are the surrounding pings within the context window, $W$ is an odd integer value in the range 3 to 9, and $\delta$ (dB re 1 m² m⁻³) is an empirically determined impulse noise removal threshold value. Identified noise values were replaced as 'no data'. The impulse noise removal parameters defined in Echoview ® are given in Table 3.

Table 3. Impulse noise removal parameters defined in Echoview ® .
Parameter | Unit | Description | Value
Time-varied threshold TVT(r) | dB re 1 m² m⁻³ | The value of a time-varied threshold TVT(r) defined at 1 m range (see note below). | −170
Vertical size of smoothing window | Metre | Vertical window size used for smoothing. Corresponding horizontal smoothing window is one ping wide. | 5
Horizontal size of context window (W) | Number | Width of the context window (i.e. number of pings including the current ping) used to identify noise. |
Note: the threshold varies as a function of range from the transducer as

$$TVT(r) = S_v(1) + 20\log_{10} r + 2\alpha_a r$$

where $S_v(1)$ (dB re 1 m² m⁻³) is the volume backscattering strength at one-metre range, $r$ (m) is the range, and $\alpha_a$ (dB m⁻¹) is the absorption coefficient. Any $S_v(r)$ values below this calculated $TVT(r)$ were preserved from the impulse noise filter.

Attenuated signal removal. Signal attenuation is generally caused by air bubbles beneath the transducer and may affect a single ping or persist over multiple pings. The attenuated signal removal algorithm implemented in Echoview ® (based on Ryan, et al. 5 ) compares the percentile score of $S_v$ samples in the current ping with the percentile score of $S_v$ samples in surrounding pings defined by a context window (see details of context window in Table 4). The current ping was removed and replaced as 'no data' if:

$$\mathrm{perc}_p\{S_v \text{ in context window}\} - \mathrm{perc}_p\{S_v \text{ in ping } j\} > \delta \quad (3)$$

where $\mathrm{perc}_p$ denotes the $p$-th percentile and $\delta$ (dB re 1 m² m⁻³) is the detection threshold. The attenuated signal removal parameters defined in Echoview ® are given in Table 4.

Transient noise removal. Transient noise is introduced to the received signal at irregular intervals and can persist over multiple pings. The transient noise removal algorithm implemented in Echoview ® (based on Ryan, et al. 5 ) compares each $S_v$ sample in the current ping with the percentile score of $S_v$ samples in surrounding pings defined by a context window (see details of context window in Table 5). A smoothed copy of the original $S_v$ values (i.e. unfiltered data) within the context window was used to identify noise (see details of smoothing window in Table 5). The original $S_v$ samples were identified as transient noise if the corresponding smoothed $S_v$ samples satisfy the condition:

$$S_{v\,smooth}[i,j] - \mathrm{perc}_p\{S_{v\,smooth} \text{ in context window}\} > \delta \quad (4)$$

The transient noise removal parameters defined in Echoview ® are given in Table 5.
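The attenuated signal removal step described above can be sketched in a few lines of Python. This is a simplified illustration of the percentile comparison only, assuming a detection percentile of 50 (the median), a context window that includes the current ping, and a window that is truncated at the ends of the echogram. The function name and the toy data are hypothetical, and the sketch does not reproduce the Echoview ® operator.

```python
import statistics


def flag_attenuated_pings(sv_pings, n=5, delta_db=8.0):
    """Flag pings whose median S_v is more than delta_db below the median
    of the surrounding context window of n pings.

    sv_pings: list of pings, each a list of S_v samples (dB) taken between
              the 'exclude above' and 'exclude below' depth lines.
    Returns one boolean per ping (True = attenuated, replace with 'no data').
    """
    half = n // 2
    ping_median = [statistics.median(ping) for ping in sv_pings]
    flags = []
    for j in range(len(sv_pings)):
        start = max(0, j - half)
        stop = min(len(sv_pings), j + half + 1)
        window = [s for ping in sv_pings[start:stop] for s in ping]
        flags.append(statistics.median(window) - ping_median[j] > delta_db)
    return flags


# Toy echogram: seven pings of three samples; ping 3 is attenuated by 15 dB.
normal = [-70.0, -72.0, -68.0]
attenuated = [-85.0, -87.0, -83.0]
echogram = [normal, normal, normal, attenuated, normal, normal, normal]
print(flag_attenuated_pings(echogram))
# → [False, False, False, True, False, False, False]
```

Using a percentile rather than the mean keeps the comparison insensitive to a few strong targets inside the window, which is presumably why a wide window (301 pings in Table 4) can be used.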
Background noise removal. Background noise is introduced to the received signal and can vary in intensity and pattern (see section 'Technical Validation'). According to De Robertis and Higginbottom 66 , the calibrated $S_v$ values (Eq. 1) can be expressed as the sum of contributions from the signal and noise as:

$$S_{v\,cal}[i,j] = 10\log_{10}\left(10^{S_{v\,signal}[i,j]/10} + 10^{S_{v\,noise}[i,j]/10}\right) \quad (5)$$

where $S_{v\,cal}[i,j]$ (dB re 1 m² m⁻³) is the calibrated $S_v$ samples derived from the raw data (i.e. Eq. 1), $S_{v\,signal}$ (dB re 1 m² m⁻³) is the calibrated $S_v$ samples representing the contribution from signal, $S_{v\,noise}$ (dB re 1 m² m⁻³) is the calibrated $S_v$ samples representing the contribution from noise, and the indices $i$ and $j$ represent the vertical sample number and horizontal ping number respectively.

To estimate background noise levels, calibrated received power $P_{er\,cal}[i,j]$ (dB re 1 W) values were calculated from $S_{v\,cal}[i,j]$ values by subtracting the time-varied gain (TVG) function 2 (i.e. $20\log_{10} r + 2\alpha_a r$) from Eq. 1 as:

$$P_{er\,cal}[i,j] = S_{v\,cal}[i,j] - 20\log_{10} r[i,j] - 2\alpha_a r[i,j] \quad (6)$$

The calibrated $P_{er\,cal}[i,j]$ values were averaged 66 (in linear domain) within an 'averaging cell' of $M$ vertical samples (with an index $k$) and $N$ horizontal pings (with an index $l$) to estimate noise as:

$$\overline{P}_{er\,cal}[k,l] = 10\log_{10}\left(\frac{1}{MN}\sum_{i \in k}\sum_{j \in l} 10^{P_{er\,cal}[i,j]/10}\right), \qquad Noise(l) = \min_{k}\,\overline{P}_{er\,cal}[k,l] \quad (7)$$

where $\overline{P}_{er\,cal}[k,l]$ (dB re 1 W) is the mean of $P_{er\,cal}$ values calculated for each averaging cell with a vertical sample interval $k$ and horizontal ping interval $l$, and $Noise(l)$ (dB re 1 W) is the representative noise estimate for the 'middle ping' in each horizontal interval $l$. Note that the averaging cell slides over the entire echogram (see details of averaging cell in Table 6).

An empirically determined maximum threshold $Noise_{max}$ (dB re 1 W) (see Table 6) was applied to $Noise(l)$ values as an upper limit of background noise levels. Any $Noise(l)$ value exceeding this threshold was replaced with the predefined $Noise_{max}$ value.

The $Noise(l)$ value estimated for a given horizontal ping interval $l$ was assigned to all individual pings constituting the interval to establish a noise estimate $Noise(j)$ (dB re 1 W) for each ping. The effect of TVG was added to the $Noise(j)$ levels to compute $S_{v\,noise}$ for each vertical sample number $i$ and horizontal ping number $j$ as:

$$S_{v\,noise}[i,j] = Noise(j) + 20\log_{10} r[i,j] + 2\alpha_a r[i,j] \quad (8)$$

The background noise corrected values $S_{v\,bnc}[i,j]$ follow from Eq. 5 by subtracting the noise contribution in the linear domain:

$$S_{v\,bnc}[i,j] = 10\log_{10}\left(10^{S_{v\,cal}[i,j]/10} - 10^{S_{v\,noise}[i,j]/10}\right) \quad (9)$$

The SNR, a measure of the relative contribution of signal and noise, was estimated as:

$$SNR[i,j] = S_{v\,bnc}[i,j] - S_{v\,noise}[i,j] \quad (10)$$

where $SNR[i,j]$ (dB re 1) is the signal-to-noise ratio for each vertical sample number $i$ and horizontal ping number $j$.

An empirically determined threshold Minimum SNR (dB re 1) (see Table 6) was used as an acceptable SNR for background noise corrected $S_{v\,bnc}[i,j]$ values. $S_{v\,bnc}[i,j]$ values with a corresponding $SNR[i,j]$ below this threshold were set to '−999' dB re 1 m² m⁻³ (an approximation of zero in the linear domain). The background noise removal parameters defined in Echoview ® are given in Table 6.

Table 4. User-defined attenuated signal removal parameters in Echoview ® .
Parameter | Unit | Description | Value
Vertical size of context window | | Vertical size of the context window used to identify pings with attenuated signal. This window size defines the vertical separation between 'exclude above' and 'exclude below' depth lines (see above). | 100
Horizontal size of context window (n) | Number | Horizontal size of the context window (i.e. number of pings) used to identify pings with attenuated signal. | 301
Detection percentile (p) | Percentile | The percentile value used for comparison between the current ping and context window. | 50
Detection threshold (δ) | dB re 1 m² m⁻³ | The threshold value used to identify pings with attenuated signal. | 8
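The per-sample part of the background noise correction (adding TVG to the ping noise estimate, subtracting noise in the linear domain, and applying the SNR threshold) can be sketched as follows. The noise level, absorption coefficient, and minimum SNR used in the example are placeholder values, not the Table 6 settings, and the function names are hypothetical.

```python
import math

NO_DATA_DB = -999.0  # approximation of zero in the linear domain


def tvg_db(r_m, alpha_a):
    """Time-varied gain: 20 log10(r) + 2 * alpha_a * r (dB)."""
    return 20 * math.log10(r_m) + 2 * alpha_a * r_m


def background_noise_correct(sv_cal_db, r_m, noise_db, alpha_a, min_snr_db):
    """Return (S_v corrected for background noise, SNR) for one sample."""
    sv_noise_db = noise_db + tvg_db(r_m, alpha_a)
    signal_linear = 10 ** (sv_cal_db / 10) - 10 ** (sv_noise_db / 10)
    if signal_linear <= 0:
        # The sample is at or below the noise estimate: nothing to keep.
        return NO_DATA_DB, float("-inf")
    sv_bnc_db = 10 * math.log10(signal_linear)
    snr_db = sv_bnc_db - sv_noise_db
    if snr_db < min_snr_db:
        return NO_DATA_DB, snr_db
    return sv_bnc_db, snr_db


# Placeholder inputs: noise of -150 dB re 1 W, 100 m range, 0.01 dB per m.
strong = background_noise_correct(-80.0, 100.0, -150.0, 0.01, min_snr_db=0.0)
weak = background_noise_correct(-107.0, 100.0, -150.0, 0.01, min_snr_db=0.0)
print(strong)  # barely changed: the signal is well above the noise
print(weak)    # set to -999: the SNR is below the threshold
```

Because TVG grows with range, the same ping noise estimate translates into a higher $S_{v\,noise}$ at depth, so samples at far ranges are the first to fall below the SNR threshold.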
Residual noise removal. In the final stage, a 7 × 7 median filter was applied to remove residual noise retained in the core filtering stages (especially at far ranges). A median filter replaces the current S v sample with the median value of S v samples in an M × M neighbourhood. It is important to note that the output of the 7 × 7 median filter was not directly used for echo-integration; rather, it was used to flag residual noise retained from the core filtering process. A maximum data threshold of −50 dB re 1 m 2 m −3 and a time-varied threshold TVT(r) with a reference value of −160 dB re 1 m 2 m −3 (defined at 1 m range) were applied to the background noise corrected S v bnc [i, j] data before applying the 7 × 7 median filter (see Table 3).

Quality-controlled S v data along with: (1) calibrated and motion corrected raw data, (2) transducer motion correction factor (i.e. difference between 'motion corrected' and 'calibrated raw' data), (3) background noise, and (4) SNR were exported from Echoview ® as echo-integration cells (i.e. grid on an echogram) with a resolution of 1 km horizontal distance (i.e. ping-axis interval p) and 10 m vertical depth (i.e. range-axis interval r). Echo-integration values were stored as comma-separated values (CSV) files. Exported S v data were converted to linear scale for further processing and packaging in MATLAB ® (Fig. 5).
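Averaging in the linear domain, as used both for resampling and for echo-integration cells, differs from averaging the decibel values directly. A minimal sketch with a hypothetical helper name:

```python
import math


def mean_sv_db(sv_db_values):
    """Mean of S_v samples taken in the linear domain, returned in dB."""
    linear = [10 ** (v / 10) for v in sv_db_values]
    return 10 * math.log10(sum(linear) / len(linear))


print(round(mean_sv_db([-70.0, -80.0]), 3))   # → -72.596
print(round(mean_sv_db([-70.0, -999.0]), 3))  # → -73.01
```

The arithmetic mean of −70 and −80 dB would be −75 dB, whereas the linear-domain mean is about −72.6 dB because the stronger sample dominates. A sample set to −999 dB contributes effectively nothing to the sum but still counts in the number of samples, which is consistent with its role as an approximation of zero.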

Secondary corrections for sound speed and absorption variation.
Quality-controlled S v data were echo-integrated and exported using nominal sound speed c w (m s −1 ) and absorption coefficient α a (dB m −1 ) values estimated using the equations of Mackenzie 70 and Francois and Garrison 71 respectively (see sound speed and absorption coefficient variables in Eq. 1 used for S v calculation). However, open ocean transects pass through different hydrographical conditions, so a secondary range-dependent correction was applied to each echo-integration cell (range-axis interval r and ping-axis interval p) to account for the changes in horizontal and vertical cumulative mean sound speed and absorption.

The temperature and salinity data for sound speed and absorption coefficient calculations were interpolated from either the CSIRO Atlas of Regional Seas 72 (CARS, http://www.marine.csiro.au/~dunn/cars2009/ version 2009) or the Synthetic Temperature and Salinity (SynTS) 73 analyses (http://www.marine.csiro.au/eez_data/doc/synTS.html), but can also be derived from oceanographic reanalysis and ocean circulation models. CARS2009 is a digital climatology or atlas of seasonal ocean water properties. It is based on a comprehensive set of quality-controlled vertical profiles of in situ ocean properties (i.e. temperature, salinity, oxygen, nitrate, silicate, and phosphate) collected between 1950 and 2008. CARS2009 NetCDF files contain a gridded mean of these ocean properties and average seasonal cycles generated from the collated observations. CARS2009 covers global oceans at a 0.5 × 0.5 degree grid spatial resolution, and is mapped onto 79 standard depth levels from the sea surface to 5500 m (from which vertical profiles of ocean properties along a bioacoustic transect can be extracted). SynTS is a daily three-dimensional (3D) temperature and salinity product generated by CSIRO, where the CARS temperature and salinity fields are adjusted with daily satellite sea surface temperature (SST) and gridded sea level anomaly (GSLA). SynTS has a 0.2 × 0.2 degree grid spatial resolution, and is mapped onto 66 standard depth levels from the sea surface to 2000 m. Due to its limited spatial coverage (60°S-10°N and 90°E-180°E), the SynTS product may not always cover the transect region (e.g. Southern Indian Ocean); in that case CARS climatology values were used for the secondary corrections (Fig. 5).

Table 6. User-defined background noise removal parameters in Echoview ® .
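For readers who want to reproduce the sound speed values from temperature and salinity profiles, the sketch below implements the nine-term Mackenzie equation with the coefficients as they are commonly quoted. The coefficients are reproduced from memory of the published equation and should be checked against the original reference before use; the function name and example inputs are illustrative.

```python
def mackenzie_sound_speed(temperature_c, salinity, depth_m):
    """Sound speed in seawater (m per s) from the nine-term Mackenzie equation.

    temperature_c in degrees Celsius, salinity in practical salinity units,
    depth_m in metres. Coefficients as commonly quoted; verify before use.
    """
    t, s, d = temperature_c, salinity, depth_m
    return (
        1448.96
        + 4.591 * t
        - 5.304e-2 * t ** 2
        + 2.374e-4 * t ** 3
        + 1.340 * (s - 35.0)
        + 1.630e-2 * d
        + 1.675e-7 * d ** 2
        - 1.025e-2 * t * (s - 35.0)
        - 7.139e-13 * t * d ** 3
    )


# Illustrative profile point: 10 degrees C, salinity 35, 1000 m depth.
print(round(mackenzie_sound_speed(10.0, 35.0, 1000.0), 2))  # → 1506.26
```

Evaluating this function at each standard depth level of a CARS or SynTS profile gives the sound speed profile from which a cumulative mean to each range can then be formed.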
Data review, packaging and submission routines. For each processed transect, the secondary-corrected s v corr data, together with metrics of data quality and other auxiliary data variables, were stored in a Network Common Data Form (NetCDF, www.unidata.ucar.edu) file (NetCDF-4 format) with a resolution of 1 km horizontal distance (i.e. ping-axis interval) and 10 m vertical depth (i.e. range-axis interval) (see 'Data Records' section for data contents). This NetCDF file conforms to the standardized naming conventions and metadata content defined by the Climate and Forecast (CF) 74 , IMOS 75 , and International Council for the Exploration of the Sea (ICES) 76 conventions published over the years (Fig. 6).
Processed NetCDF files were independently reviewed by both the analyst and the principal investigator to further investigate data quality. If suitable, the NetCDF file along with ancillary files: (1) acquired raw data (.raw files), (2) platform track in CSV format (containing date, time, latitude, longitude, and time offset to UTC), (3) platform motion data (if recorded) in CSV format (including date, time, pitch, and roll measurements), and (4) a snapshot of the processed echogram in Portable Network Graphics (PNG) format were packaged and submitted to the publicly accessible AODN Portal (Fig. 5).

Data Records
The primary components of a processed NetCDF file are shown in Fig. 6 and described in Table 7 to provide an overview of data contents and structure. Each variable in a NetCDF file is described with an associated description, specifying the data output resulting from each data-collection or analytical step (Table 7).
Processed NetCDF files are published via the Australian Ocean Data Network (AODN) Portal at: https://portal.aodn.org.au/search?uuid=8edf509b-1481-48fd-b9c5-b95b42247f82. This portal allows transect selection and data download with spatial and temporal subset options implemented for each platform and frequency.
A generic metadata record of the project is available via GeoNetwork at: https://catalogue-imos.aodn.org.au/geonetwork/srv/api/records/8edf509b-1481-48fd-b9c5-b95b42247f82. The NetCDF files are also accessible via the AODN THREDDS data server that can be accessed remotely using the OPeNDAP protocol at: http://thredds.aodn.org.au/thredds/catalog/IMOS/SOOP/SOOP-BA/catalog.html. A snapshot of processed NetCDF files at the time of this publication has been assigned a Digital Object Identifier (https://doi.org/10.26198/dv5p-t593) and will be maintained in perpetuity by the AODN 77 . Readers are directed to check the AODN Portal for the latest data set.
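When working with a downloaded file, the TIME coordinate (described in Table 7 as a decimal number of days since 1950-01-01 00:00:00 UTC) usually needs converting to calendar time. A minimal standard-library sketch, with a hypothetical function name:

```python
from datetime import datetime, timedelta, timezone

# Reference time stated for the TIME coordinate variable.
IMOS_EPOCH = datetime(1950, 1, 1, tzinfo=timezone.utc)


def imos_time_to_datetime(days_since_1950):
    """Convert decimal days since 1950-01-01 00:00:00 UTC to a datetime."""
    return IMOS_EPOCH + timedelta(days=days_since_1950)


print(imos_time_to_datetime(25840.5).isoformat())
# → 2020-09-30T12:00:00+00:00
```

NetCDF reading libraries can often perform this conversion automatically from the variable's units attribute; the explicit version is shown only to make the convention concrete.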

Technical Validation
Routine calibration and monitoring of echosounders. In the context of echosounder calibration, it is important to note that a ±X dB re 1 change (where X is a real number) in the calibration parameters G 0 and S a corr represents a corresponding twofold ∓2X dB re 1 m 2 m −3 variation in the derived S v (Eq. 1), which would result in a ∓(100 × 10^(2X/10) − 100)% change in echo-integration results (if accurate calibration parameters are not applied). In principle, properly calibrated echosounders operating at the same frequency should provide matching echo-integration results for a given sampling region. However, due to platform performance (e.g. aeration beneath the transducer), the derived data may be biased, and this can be verified by an intercalibration 4,78 experiment with two or more platforms simultaneously sailing over the same region and later comparing the echo-integration results. In suitable circumstances, large uncertainty in the absolute calibrations and platform-specific factors can be quantified. This generic principle was applied to prioritize platforms for potential long-term data collection by comparing data quality metrics between participating platforms. As the spatio-temporal coverage of the data series improves, it will be possible to perform more direct comparisons between platforms and with an acoustic climatology of the regions.
Time series calibration results of selected platforms (with consistent echosounder configuration) are shown in Fig. 7 as an example to highlight repeatability and challenges with monitoring long-term performance and stability of echosounders. The FV Rehua, FV Antarctic Discovery, and RV Investigator demonstrate reasonable repeatability of 38 kHz transducer measurements with G 0 values varying between 25.4 ± 0.2, 27.0 ± 0.3, and 24.9 ± 0.2 dB re 1 respectively (Fig. 7a,c,d). In contrast, the FV Austral Leader II (Fig. 7b) indicates gradual degradation of system performance (possibly ageing effect) over six years with 1.3 dB re 1 decrease in calibrated G 0 values 79 . Keeping p et , τ, α a , c w , and ψ constant (Eq. 1), this performance change would result in ~44% decrease in S v data if G 0 and S a corr factor calculated in 2009 is applied for processing 2015 data sets.
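The relationship between a gain error and the resulting change in echo-integration results can be made concrete with a short sketch. It assumes that only G 0 is in error and that all other terms in Eq. 1 are held constant; the function name is hypothetical.

```python
def echo_integration_change_percent(gain_error_db):
    """Percentage change in linear s_v when the applied G0 is too high by
    gain_error_db (a negative value means the applied gain is too low)."""
    sv_shift_db = -2.0 * gain_error_db  # gain enters Eq. 1 twice
    return 100.0 * (10 ** (sv_shift_db / 10.0) - 1.0)


print(round(echo_integration_change_percent(1.3), 1))   # → -45.0
print(round(echo_integration_change_percent(-1.3), 1))  # → 82.0
```

Applying a gain that is 1.3 dB too high lowers the linear echo-integration result by roughly 45%, the same order as the decrease quoted above for FV Austral Leader II. The asymmetry between the two signs shows why the percentage change should be computed in the linear domain rather than read directly from the decibel offset.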
Although the established sphere calibration method standardizes bioacoustic data collected by multiple participating platforms, there is a need for an additional diagnostic method to ensure echosounder performance in between routine calibrations. Along with calibration results, the peak values of instantaneous received power P er (Eq. 1) measured within 0-1 m range (i.e. ringdown zone) are used as a complementary diagnostic method to ensure stability of echosounders, noting that monitoring is not calibration. This method can highlight noticeable gradual or abrupt changes in system performance over time. For example, spatio-temporal variations in peak power for FV Atlas Cove (Fig. 7e) highlight gradual degradation of 38 kHz echosounder performance with ~11 dB re 1 W decrease in peak power values over a year, complicating data usage. In contrast, a comparison between 18 and 38 kHz peak power values for FV Antarctic Discovery (Fig. 7f) highlights an unknown abrupt change (~3 dB re 1 W) in 18 kHz echosounder performance over a 15-day docking period, necessitating routine monitoring. Such performance changes (if observed) are reported back to the operator for system maintenance (Fig. 1), and juxtaposed with relevant calibration results to assess repeatability of measurements and to prioritize transects for processing.

(2021) 8:23 | https://doi.org/10.1038/s41597-020-00785-8 | www.nature.com/scientificdata
Simmonds and MacLennan 2 consider that in fisheries acoustics applications, properly maintained low-frequency scientific echosounders can demonstrate consistent performance within 10% in the long-term. The aim should be to develop a routine or protocol for calibration that would help achieve this accuracy consistently irrespective of the echosounder system used. But in practice, variability in echosounder on-axis sensitivity could result from a combination of factors including system electronics, data acquisition settings, SNR, environmental conditions, and density and composition of the calibration sphere 3 . The performance of an echosounder may degrade gradually or abruptly (Fig. 7e,f), and transducers are vulnerable to mechanical damage and ageing effects 80 . Therefore, it is important to quantify such changes routinely for all participating platforms to apply suitable calibration corrections required for data processing. This would further facilitate existing feedback mechanism with platform operators for subsequent system maintenance and technical inspection. transducer motion correction. Transducer motion can reduce the received signal and degrade data quality substantially at long ranges depending on the sea state and platform design. For hull-mounted circular transducer, the platform motion and transducer motion can be considered synonymous 81 . Accordingly, the angular motion of platform can be used to correct for the change in orientation of transducer beam between the times of each ping, with a precondition that platform motion data (i.e. pitch and roll) need to be recorded at a sampling rate above the Nyquist rate of platform's angular motion. The Power Spectral Density (PSD) analyses 82 of motion www.nature.com/scientificdata www.nature.com/scientificdata/

Global attributes
The global attribute section of a NetCDF file contains mandatory metadata that describes general contents and facilitates data discovery. This section is composed of the following key attributes: project, metadata record, cruise, ship, transect, instrument, calibration, data acquisition, data processing, dataset, and data. Note that global attribute names are case sensitive.

Dimensions
Dimensions provide information on the size of data variables contained in a NetCDF file, and additionally match coordinate variables to data variables. The dimensions of a data variable define the axes (i.e. TIME and DEPTH) of the quantity it contains.

Variables
NetCDF variables include coordinate variables, data and data quality metrics derived from the echosounder primary measurement (i.e. received power), and environmental parameters as given below.

Coordinate variables
Coordinate variables locate the data in space and time.

LATITUDE
Specified in decimal degrees relative to the World Geodetic System (WGS84) coordinate reference system.

LONGITUDE
Specified in decimal degrees relative to the WGS84 coordinate reference system.

DEPTH
Measures the depth (m) below the sea surface that is positive in downward direction.

TIME
Represented as decimal number of days since the reference time of 1950-01-01 00:00:00 UTC.

Primary data variables
Contains data and data quality metrics derived from the echosounder primary measurement.

motion_correction_ factor
Percentage correction applied to calibrated raw s v values for transducer motion correction (if platform motion data is available). This variable is the percentage difference between calibrated raw s v values before and after applying transducer motion correction algorithm.

Sv_unfilt
Unprocessed mean s v values (m 2 m −3 ). These are an echo-integration of calibrated and transducer motion corrected acoustic data. background_noise Background noise (dB re 1 W) values for each ping-axis interval (i.e. horizontal distance). See Eq. 7 for more information.
signal_noise Signal-to-noise-ratio (dB re 1) for each echo-integration cell. See Eq. 10 for more information.

Auxiliary data variables
Auxiliary data variables contain environmental parameters such as climatology and satellite derived data.

CARS derived climatology temperature (°C) values for each echo-integration cell.

CARS_salinity
CARS derived climatology salinity (PSU) values for each echo-integration cell.

CARS_oxygen
CARS derived climatology oxygen (ml l⁻¹) values for each echo-integration cell.

CARS_nitrate

sound_speed
Sound speed (m s⁻¹) in water calculated for each echo-integration cell.

absorption
Absorption coefficient (dB m⁻¹) of sound in water calculated for each echo-integration cell.

Table 7. Description of key variables present in a NetCDF file. These variables are described with mandatory variable attributes, linking associated quality flags as ancillary variables (not applicable to all variables in a file). Quality flags provide an assessment of quality control performed.
data (Fig. 8a) recorded from selected platforms indicate that a minimum sampling rate of 4 Hz is generally adequate to meet this precondition and subsequent correction. Sources of error may exist in motion-corrected data if there is a large discrepancy between measured and manufacturer specified (or nominal) beamwidths of the transducer used 83 .
Owing to the magnitude of angular displacement and the beamwidths of the transducers used, transducer motion can result in a non-linear, range-dependent s v correction. If motion correction is not applied, echo-integration results can be negatively biased (underestimated), and the required correction increases with range. The correction is greater for narrow-beam transducers and comparatively smaller for wide-beam transducers (Fig. 8c,d). The variable 'motion_correction_factor' (Table 7) is now stored in NetCDF files for assessing the magnitude of transducer motion correction and, if needed, recalculating calibrated raw data.
Data processing filters. The quality of bioacoustic data collected using ships of opportunity sampling methods can be complex and extremely variable. If noise is not removed, it can be misinterpreted as biological signal, biasing echo-integration results. Statistical quantification of bias and error potential for data retained after filtering is challenging and beyond the scope of the present study. However, selected examples of bioacoustic data with good and compromised quality are presented to demonstrate the combined effectiveness of data processing filters. The application of data processing filters has considerably improved the quality of bioacoustic data and has been demonstrated to be robust across diverse platforms and weather conditions 5 . A caution is that there are subjective elements in 'visually' determining the quality of the final data product after filtering, but this can be made objective to a certain extent by comparing raw and filtered echograms with metrics of data quality stored in NetCDF files.
As an example, good quality data collected by FV Will Watch in the Indian Ocean are presented in Fig. 9, highlighting diel vertical migration and a deep scattering layer without any apparent artefacts in the data. To broadly quantify the combined effect of data processing filters, the mean difference between unfiltered and filtered echograms (i.e. difference in mean S v before and after filtering) is calculated for the epipelagic, upper mesopelagic, and lower mesopelagic layers respectively, indicating 0.3 ± 0.9 (~7%), 0.1 ± 0.4 (~2%), and 0.1 ± 0.1 dB re 1 m² m⁻³ (~2%) reduction in the filtered data (see Table 7 for layer description). The SNR data quality metric (Fig. 9b) in the epipelagic, upper mesopelagic, and lower mesopelagic layers is 59.1 ± 4.6, 34.6 ± 3.5, and 31.5 ± 4.4 dB re 1 respectively, with the mean ping-axis interval background noise level calculated as −165.6 ± 2.1 dB re 1 W (Fig. 9c). After the filtering process, approximately 98%, 98%, and 99% of S v data are retained in the epipelagic, upper mesopelagic, and lower mesopelagic layers respectively (Fig. 9d).
To demonstrate the usefulness of data quality metrics, data collected by FV San Tongariro in the Tasman Sea are presented in Fig. 10. This example compares raw and filtered echograms, highlighting predominant transient noise in the data amplified as a function of TVG. The mean difference between unfiltered and filtered echograms in the epipelagic, upper mesopelagic, and lower mesopelagic layers respectively indicates 1.7 ± 1.9 (~48%), 1.2 ± 1.5 (~31%), and 3.6 ± 1.8 dB re 1 m² m⁻³ (~129%) reduction in the filtered data, highlighting the range-dependent effect of combined noise 5 (i.e. the sum of impulse, transient, and background noise). The associated SNR data quality metric (Fig. 10b) in the epipelagic, upper mesopelagic, and lower mesopelagic layers is 32.6 ± 7.9, 22.2 ± 5.3, and 17.8 ± 4.4 dB re 1 respectively, with the mean ping-axis interval background noise level (Fig. 10c) calculated as −152.5 ± 3.4 dB re 1 W (note this background noise is ~13 dB re 1 W higher as compared to the good quality data presented in Fig. 9c). After the filtering process, approximately 84%, 88%, and 86% of S v data are retained in the epipelagic, upper mesopelagic, and lower mesopelagic layers respectively (Fig. 10d). The raw data quality of this transect is not satisfactory (Fig. 10a) and, despite the visual appearance of the filtered data, the quality metrics SNR, background noise, and percentage of S v data retained after filtering are not considered acceptable when compared with the other transect with high data quality (Fig. 9a).
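The percentages quoted in parentheses for the two transects appear consistent with converting each dB difference into a relative change in the linear domain. This is an interpretation of the reported numbers, not a method stated in the text, and the function name below is hypothetical. A minimal sketch:

```python
def db_difference_to_percent(delta_db):
    """Relative change (%) in linear backscatter for a difference given in dB.

    Assumes the percentage is (10^(delta/10) - 1) * 100, which is an
    interpretation of the reported values rather than a documented formula.
    """
    return (10.0 ** (delta_db / 10.0) - 1.0) * 100.0


if __name__ == "__main__":
    # Mean unfiltered-minus-filtered differences reported for Figs. 9 and 10.
    for delta in (0.3, 0.1, 1.7, 1.2, 3.6):
        print(f"{delta:.1f} dB -> {db_difference_to_percent(delta):.1f}%")
```

Under this assumption the 1.2 dB case evaluates to roughly 32% rather than the quoted ~31%, a gap that could plausibly come from rounding of the underlying mean before it was reported.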
These examples (Figs. 10 and 11) suggest that caution is needed when reviewing a final data product: filtering and subsequent resampling to the predefined NetCDF file resolution (1 km horizontal distance and 10 m vertical depth) may produce a visually clean echogram without any apparent artefacts, yet may have removed significant biological signal and/or retained noise in the process. Accordingly, we have not posted these two transects to the AODN, and similar data sets from other platforms with compromised data quality are archived locally. Metrics of data quality are stored in NetCDF files to assist users in making an independent assessment of data quality based on the examples demonstrated here.
Secondary corrections for sound speed and absorption variation. The difference between the variable 'uncorrected_Sv' (i.e. filtered data before secondary corrections) and 'abs_corrected_sv' (i.e. the same data after secondary corrections but before depth interpolation) is calculated (uncorrected_Sv − abs_corrected_sv) to demonstrate the effect of secondary corrections (Fig. 12). This step introduces a range-dependent correction (Fig. 12f) that can differ substantially based on the equation used for calculating sound absorption in seawater (see Fig. 5 in Doonan, et al. 84 for a comparison between two commonly used equations for absorption calculation). Note that the range-dependent percentage correction to the data can differ by up to 45% between Doonan, et al. 84 and Francois and Garrison 71 for 38 kHz data at 1000 m depth.
The processed bioacoustic data sets are consistently corrected using the Mackenzie 70 sound speed and Francois and Garrison 71 absorption equations, following the recommendations of Simmonds and MacLennan 2 , until more evidence is available to select another formula. Macaulay, et al. 85 conducted field measurements of acoustic absorption in seawater from 38 to 360 kHz, and their results were consistent with the Francois and Garrison 71 equation for frequencies of 200 kHz and lower. Macaulay, et al. 85 observed a significant difference around 333 kHz, indicating that the Francois and Garrison 71 equation is incorrect for some input parameters (note that 333 kHz data are not processed under the IMOS Bioacoustics sub-Facility).
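For readers who want to reproduce a sound speed value, the sketch below implements the nine-term Mackenzie equation as it is commonly quoted in the literature. It assumes that reference 70 is that nine-term form and that temperature, salinity, and depth are supplied in °C, PSU, and m. The function name is hypothetical, and the sub-Facility's own implementation may differ in detail.

```python
def mackenzie_sound_speed(temperature_c, salinity_psu, depth_m):
    """Sound speed in seawater (m/s) from the nine-term Mackenzie equation.

    Hypothetical helper; coefficients are the commonly quoted published values.
    """
    t, s, d = temperature_c, salinity_psu, depth_m
    return (
        1448.96
        + 4.591 * t
        - 5.304e-2 * t**2
        + 2.374e-4 * t**3
        + 1.340 * (s - 35.0)
        + 1.630e-2 * d
        + 1.675e-7 * d**2
        - 1.025e-2 * t * (s - 35.0)
        - 7.139e-13 * t * d**3
    )


if __name__ == "__main__":
    # Illustrative inputs only: 10 degC, 35 PSU, 1000 m depth.
    print(f"{mackenzie_sound_speed(10.0, 35.0, 1000.0):.2f} m/s")
```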
It is important to note that the percentage correction shown in Fig. 12 applies only to the example transect, because it depends on the nominal sound speed and absorption values used during initial processing and echo-integration (Eq. 1). Other transects (e.g. Southern Ocean) have different correction factors that are related to regional changes in temperature and salinity values.
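To illustrate why the absorption part of the secondary correction grows with range, the sketch below isolates the two-way absorption term 2αr of Eq. 1. It assumes the range and every other term are held fixed, so the effect of sound speed on range and on the calibration term is ignored. The function name and the absorption values are hypothetical.

```python
def absorption_correction_db(alpha_nominal, alpha_local, range_m):
    """Change in S_v (dB) when nominal absorption is replaced by a local value.

    Only the two-way absorption term 2 * alpha * r is considered; absorption
    coefficients are in dB per metre and range is in metres.
    """
    return 2.0 * (alpha_local - alpha_nominal) * range_m


if __name__ == "__main__":
    # Hypothetical values: nominal 0.0100 dB/m, local 0.0092 dB/m.
    for r in (200.0, 600.0, 1000.0):
        delta = absorption_correction_db(0.0100, 0.0092, r)
        print(f"range {r:6.0f} m -> correction {delta:+.2f} dB")
```

Because the term is linear in range, a small difference in the absorption coefficient has little effect in the epipelagic layer but a much larger one in the lower mesopelagic layer.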
The key intermediate variable 'uncorrected_Sv' is stored in NetCDF files for recalculating secondary corrections using other equations or data sources (if needed). The equation used for sound absorption calculation is documented in the global attribute section of a NetCDF file as 'data_processing_absorption_description' and 'history'. Similarly, the equation used for sound speed calculation is captured in the global attributes 'data_processing_soundspeed_description' and 'history'.

Usage Notes
When interpreting bioacoustic data it is important to understand the corrections applied at each processing step, particularly calibration, transducer motion correction, data filtering, and secondary corrections for sound speed and absorption variation (Fig. 5). The transducer motion and secondary corrections are range-dependent and can greatly influence metrics derived for the lower mesopelagic layer. Our goal is to keep updates to data processing steps and data records to a minimum so that the database remains consistent and comparable.
Measurement uncertainty. The widely used Simrad EK60 echosounder is now discontinued and replaced by the Simrad EK80. A recent comparison study 86 highlighted that EK80 raw power measurements were 3–12% lower as compared to EK60, affecting weak scatterer and/or long-range acoustic observations due to nonlinear amplification of low-power signals by the EK60. At present, users need to correct the data for this bias, and we are in the process of providing a correction update to the data sets with associated metadata. In addition to calibration and unknown methodological uncertainties, this new measurement uncertainty highlights the ongoing challenges in maintaining a diverse data series, and the need for storing the fundamental echosounder measurement (i.e. received echo power) and appropriate metadata to enable unforeseen corrections in the future as needed.
Challenges with biomass estimation. Ships of opportunity bioacoustic sampling methods have clear advantages as well as limitations. Their usefulness in resource assessment, ecosystem monitoring, and cost-effective mapping of mesopelagic communities at regional and global scales is established with diverse acoustic-based indicators and metrics, but credible conversion of bioacoustic data (s v ) to open ocean fish biomass is a multi-step procedure that requires the lowest possible degree of bias at each step.
For example, the processed s v values are vertically integrated over a measurement range (r 1 to r 2 ) to calculate the area backscattering coefficient s a (m² m⁻²) along a transect. Scatterer areal density (number m⁻²), i.e. the number of organisms (e.g. fish) within the measurement range, is calculated by dividing s a by the backscattering cross-section σ bs (m²) of a representative single fish. Biomass of fish (kg m⁻²) can be estimated by multiplying this scatterer areal density by the weight W (kg) of a single fish. This requires separation of bioacoustic data by species composition, location, and σ bs distribution. Mean weight can be derived from observed weights (using nets) or a length to weight regression. Similarly, σ bs values are obtained from in situ measurements and/or σ bs to length regressions 2 . Biomass calculations from these equations will be biased if the weight and target strength TS [$\mathrm{TS} = 10 \log_{10}(\sigma_{bs})$, dB re 1 m²] of the organism are uncertain (assuming accurate calibration and echosounder linearity). For that reason, in situ and/or modelled TS must be calculated with the goal of obtaining a representative distribution.
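The chain of steps described above can be sketched as follows. This is a minimal illustration assuming a single species, depth cells of uniform thickness, and linear s v values; the function names, the TS of −50 dB re 1 m², and the 2 g fish weight are hypothetical and are not values from this data descriptor.

```python
def area_backscattering_coefficient(sv_linear, cell_thickness_m):
    """Integrate linear s_v (m^2 m^-3) over depth cells to obtain s_a (m^2 m^-2)."""
    return sum(sv * cell_thickness_m for sv in sv_linear)


def biomass_density(sv_linear, cell_thickness_m, ts_db, weight_kg):
    """Estimate biomass (kg m^-2) from s_v, target strength, and fish weight."""
    s_a = area_backscattering_coefficient(sv_linear, cell_thickness_m)
    sigma_bs = 10.0 ** (ts_db / 10.0)   # backscattering cross-section (m^2)
    areal_density = s_a / sigma_bs      # organisms per square metre
    return areal_density * weight_kg


if __name__ == "__main__":
    # Ten 10 m cells, each with s_v = 1e-7 m^2 m^-3 (S_v = -70 dB re 1 m^2 m^-3).
    sv_cells = [1e-7] * 10
    print(biomass_density(sv_cells, 10.0, ts_db=-50.0, weight_kg=0.002))
```

The result scales inversely with σ bs, so an error of 3 dB in the assumed TS changes the biomass estimate by about a factor of two, which is why a representative TS distribution matters.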
Credible estimation of biomass using a vessel-based echosounder is very difficult for the highly diverse mesopelagic community, where gas inclusions may be present in many organisms (depending on the region) and can cause frequency- and depth-dependent resonance scattering 87 that dominates the received signal. Multiple methods, including ecosystem models, net capture, acoustic backscattering models, and in situ profiling acoustic optical systems 19,88 , are needed to provide the necessary information to convert basin-scale bioacoustic data into specific biological metrics such as species-specific biomass 61 .
Reading the data. Generated data are stored in NetCDF files that can be readily imported into a wide variety of cross-platform software programs and programming languages. A custom MATLAB ® function 'viz_sv' to read and visualize NetCDF files conforming to the conventions described in this data descriptor can be downloaded from the IMOS Bioacoustics sub-Facility web site http://imos.org.au/facilities/shipsofopportunity/bioacoustic or GitHub https://github.com/CSIRO-Acoustics/Visualize-IMOS-Bioacoustics-data.

Code availability
Echosounder raw data files are recorded in proprietary formats that typically require dedicated commercial or open-source acoustic processing software for visualization and processing. The custom Java software tool 'basoop.jar' used for incoming data registration and management, along with the MATLAB ® GUI used for controlling data processing steps in Echoview ® and NetCDF file creation, can be obtained at: https://github.com/CSIRO-Acoustics/IMOS-Bioacoustics. The MATLAB ® codes used for generating relevant figures are available at: https://github.com/CSIRO-Acoustics/Publications/tree/main/Scientific_Data/Data_Descriptor.