## Background & Summary

Since 2010, as a part of existing ocean industry collaboration, Australia’s Integrated Marine Observing System (IMOS) Ships of Opportunity (SOOP) Bioacoustics sub-Facility (here onwards IMOS Bioacoustics sub-Facility) has been collecting opportunistic, supervised, and unsupervised active bioacoustic data from different platforms including commercial fishing and research vessels transiting ocean basins1 (Fig. 1). The resulting acoustic snapshots2 provide a proxy for the combined effects of size, abundance, distribution, diversity, and behavior of mid-trophic mesopelagic communities including macro-zooplankton and micronekton in the twilight zone of global oceans (Fig. 2). The broad goal of IMOS Bioacoustics sub-Facility is to provide repeated active bioacoustic observations for the status and trend of ocean life to 1000 m at basin and decadal scales.

The primary data-type derived from IMOS Bioacoustics sub-Facility is the georeferenced, calibrated3,4, and processed5 single-beam water column volume backscattering coefficient sv (m2 m−3) values, representing the linear sum of backscatter from acoustically detectable individual organisms within the sampling volume2 (Fig. 2). In suitable circumstances, it is proportional to the density of dominant scattering organisms, and the primary data for estimating biomass from acoustics at regional and global scales using existing6,7 and future methods.

Mesopelagic communities are mid-water predators and prey in the twilight zone, and presumed to make the largest natural daily animal movement on earth based on their biomass, revealing diel vertical migration8,9 and large-scale spatio-temporal patterns in pelagic sound scattering layers10 (Fig. 2). They have been identified as one of the least investigated components of open ocean ecosystems11, transferring energy from primary producers to higher predators, and regulating carbon transfer from surface to deep-ocean by linking epipelagic and deep-water food chains12,13,14,15,16.

Ships of opportunity bioacoustic sampling methods are useful for cost-effective mapping and biomass estimation of mesopelagic communities at regional and global scales with recognized potentials and challenges1. Acoustic estimation of biomass using vessel-based echosounder is complicated by many confounding factors17 including lack of accurate taxonomic information about insonified organisms, complex size distribution of scattering organisms, unknown species composition, and frequency-specific selectivity of echosounder measurements18,19. From an integrated ocean observing system perspective, a way forward to address these challenges would be to acquire multi-frequency data20,21 for improved segregation of dominant scattering groups. With advancing and diverse applications of multi-frequency systems, acoustic methods have been used to classify dominant organisms into gas-filled or fluid-filled categories22,23,24,25,26,27,28,29. Such methodologies are improving with the availability of broadband and wideband echosounders30 that would help to segment and attribute basin-scale bioacoustic observations into different scattering (or functional) groups.

Despite uncertainties with echosounder calibration, methodological challenges, species identification, and frequency-specific scattering of individual organisms, ships of opportunity bioacoustic sampling methods offer great potential to better understand the structure and ecosystem function of global mesopelagic communities, necessitating continued data acquisition and processing efforts5,31,32 with increased global accessibility33. The acoustic snapshots revealing spatio-temporal sound scattering patterns, deep scattering layer, and diel vertical migration (Fig. 2) can offer improved ecological insights34,35 for marine ecosystem acoustics36,37,38, in addition to established linkages with oceanographic processes39,40,41,42,43,44,45,46,47,48 and environmental covariates49,50,51 including light52,53,54, oxygen concentration55,56,57, temperature58,59, chlorophyll a60, and primary production6,59.

Currently, 22 platforms have contributed data to IMOS Bioacoustics sub-Facility, including both commercial fishing and research vessels (Table 1). Key time series data sets have been collected across the Tasman Sea, Southern Ocean, and Southern Indian Ocean. The extent of processed bioacoustic data archived under this sub-Facility is expanding with an improved spatial (Fig. 3) and temporal coverage (Fig. 4, until 30 September 2020). The majority of archived data are single-frequency 38 kHz (565,661 km) echosounder observations, but also include growing coverage of multi-frequency 18 kHz (118,260 km), 70 kHz (44,368 km), and 120 kHz (70,400 km) data, highlighting different scattering layers and functional groups.

The main goals and potential uptake values of IMOS Bioacoustics sub-Facility data are: (1) provide calibrated time series acoustic observations for the status and trend of mesopelagic ecosystem, (2) develop a framework synergizing active bioacoustic observations and ecosystem models61,62 for studying open ocean ecosystem dynamics, and (3) develop an active bioacoustic data-based ecosystem Essential Ocean Variable (eEOV)63 to complement established and future ocean observing systems measuring physical, chemical, and biological environment of the ocean. These frameworks would help to advance scientific knowledge of marine food chains and manage marine ecosystems sustainably.

## Methods

The terminology used in this data descriptor follows Demer, et al.3, based mostly on Maclennan, et al.64. All symbols signifying variables are italicized. Any symbol for a variable (x) that is not logarithmically transformed is in lower case. Any symbol for a logarithmically transformed variable, e.g. $$X=1{0\log }_{10}\left(x/{x}_{ref}\right)$$, with units of decibels referred to xref  (dB re xref) is capitalized.

### Echosounder data

In a widely used Simrad echosounder (Table 1), the proprietary format raw data (.raw) from each transmission and reception cycle (here onwards ping) includes received echo power per (W), with the General Purpose Transceiver (GPT) settings: frequency f (kHz), transmit power pet (W), pulse duration $$\tau$$ (s), transducer on-axis gain G0 (dB re 1), area backscattering coefficient sa (m2 m–2) correction factor Sa corr (dB re 1), and equivalent two-way beam angle Ψ (dB re 1 sr) of the transducer. These data and associated settings were used to calculate and display volume backscattering strength $${S}_{v}$$ (dB re 1 m2 m−3) for one or more frequency channels as3:

$${S}_{v}[i,j]={P}_{er}[i,j]+20\,{\log }_{10}r[i,j]+2{\alpha }_{a}r[i,j]-10\,{\log }_{10}\left(\frac{({p}_{et}{\lambda }^{2}{g}_{0}^{2}{c}_{w}\tau \psi )}{32{\pi }^{2}}\right)-2{S}_{a{\rm{c}}{\rm{o}}{\rm{r}}{\rm{r}}},$$
(1)

where $${P}_{er}$$ (dB re 1 W) is the received power, $$r$$ (m) is the range to the target, $${\alpha }_{a}$$ (dB m–1) is the absorption coefficient, $$\lambda$$ (m) is the wavelength, $${g}_{0}$$ (dimensionless) is the transducer on-axis gain, $${c}_{w}$$ (m s–1) is the sound speed in water, $$\psi$$ (sr) is the equivalent two-way beam angle, and the index i and j represent vertical sample number and horizontal ping number respectively.

### Echosounder calibration

Echosounder calibration is a prerequisite for quantitative bioacoustic studies. The overall on-axis performance of echosounders installed on the participating platforms was routinely evaluated by established sphere calibration method3,4. This method provides calibrated $${G}_{0}$$ and $${S}_{a{\rm{c}}{\rm{o}}{\rm{r}}{\rm{r}}}$$ required for standardizing $${S}_{v}$$ data (Eq. 1) collected by diverse platforms with a traceable calibration history. The sphere calibration also provides a check for transducer beam-pattern characteristics and related Ψ. The manufacturer-specified Ψ adjusting for the local sound speed variation at the calibration location was used due to the difficulty in obtaining an independent measurement of hull-mounted transducer beam pattern.

The raw data acquired using ES60 and ES70 echosounders were modulated with a triangle wave error sequence65. The triangle wave error (with a 1 dB peak-to-peak amplitude and a 2720 ping period) was removed from calibration data before calculating $${G}_{0}$$ and $${S}_{a{\rm{c}}{\rm{o}}{\rm{r}}{\rm{r}}}$$. Open ocean transit (here onwards transect) data were not corrected for the triangle wave error due to data management and storage constraints at the start of the program. Generally, this error will average to zero over a full period of 2720 pings for normal operations and 1 km horizontal resolution of the processed data. To facilitate the processing of high-resolution data (e.g. 100 m horizontal resolution) and slow ping rate systems, transect data files were corrected for this error (if applicable) with associated metadata, since September 2020.

### Data acquisition

Ensuring the operational need of participating platforms (e.g. fishing), the data acquisition settings in Table 2 were used to optimize quality and practical utility of collected data. The transmit power was selected based on the recommended20 settings for commonly used Simrad echosounders. The pulse duration was chosen as a trade-off between sample resolution and acceptable signal-to-noise ratio (SNR, dB re 1) in the mesopelagic zone, and the logging range was set to provide robust estimates of echosounder background noise (dB re 1 W) levels66.

### Data registration and management

Depending on the primary purpose of participating platforms, raw data received from operators (Table 1) may cover transects and periods of fishing or scientific activities. A custom Java software suite was developed to assist data management and help identify transects for post processing (Fig. 5). These tools were used to create information (inf) files. The inf file is in plain text format that contains user-defined metadata (platform name, relevant platform call sign, voyage name, transect attributes, and relevant comments). It also includes key data acquisition settings extracted from the raw data files including frequency, transmit power, pulse duration, and echosounder details (GPT channel identifier and transducer model). The platform navigation details (total travel time, total distance covered, and average platform speed), temporal extent (start and end time of data volume), and geographic extent (limits of latitude and longitude) were also captured in the inf files. These inf files were checked for consistent data acquisition settings, transect selection, and excluding continental shelf water column backscatter data. Raw data files with inconsistent data acquisition or unknown calibration settings were not considered for further processing and archived locally.

### Data processing routines

Data sets were initially processed using Echoview® software (Echoview Software Pty Ltd, Hobart, Tasmania, Australia) that includes a sequence of data processing filters5 designed to remove noise and improve data quality. Transect data files applying related time offset to Coordinated Universal Time (UTC) and calibration parameters were visualized (Eq. 1) as frequency-specific echograms in Echoview® for visual inspection, transducer motion correction, and filtering processes (Fig. 5). Subsequent processing and packaging were completed using MATLAB® software (MathWorks, Natick, Massachusetts, USA). All processing steps were semi-automated using a custom MATLAB® Graphical User Interface (GUI) integrated with Component Object Model (COM) objects controlling Echoview® software.

### Visual inspection of data

Acoustic data quality from different platforms can vary significantly due to signal attenuation (i.e. attenuation of transmit and/or received signal to a level below the analysis threshold), and signal degradation due to combined transducer motion and noise. Data quality control involved visual inspection of echograms (Fig. 5), followed by marking the seabed (if present) and regions of bad data using echogram tools available in Echoview®. Pings with prolonged noise interference or signal attenuation were flagged as bad data. Data shallower than 10 m were removed to exclude echosounder transmit pulse and echoes in the transducer nearfield. Similarly, data deeper than the seabed (if present) were removed from the analyses. Additionally, regions of aliased seabed echoes (i.e. seabed reverberations from preceding pings coinciding with the current ping) were manually flagged as bad data. Valid high scattering from biological sources (e.g. pelagic fish schools that may occur between surface and 250 m depth) causing an apparent transition in backscatter intensities was manually preserved from the transient noise filter described below5.

### Transducer motion correction

Echo-integration results will be biased if the change in orientation of transducer beam between the times of each ping is not accounted for. The effect of transducer motion on echo-integration was studied by Stanton67 and later Dunford68 developed a single correction function that can be applied for a wide range of circular transducers and related $${s}_{v}$$ data. To fully characterize platform movement, the Dunford68 algorithm implemented in Echoview® requires motion data (i.e. pitch and roll of a platform) recorded at a rate above the Nyquist rate of platform’s angular motion69 to avoid temporal aliasing due to an inadequate sampling rate. When platform motion data were available at a suitable sampling rate (see ‘Technical Validation’ section), transducer motion effects were corrected using Dunford68 algorithm by ensuring time synchronization with recorded acoustic data (Fig. 5).

### Data processing filters

Fishing vessels (FV) contributing to IMOS Bioacoustics sub-Facility were not purposely built for collecting high-quality bioacoustic data. Various factors including inclement weather and vessel design can affect data quality that could cause large biases in derived $${s}_{v}$$ values. To minimize these biases, data processing filters were applied to the raw data (Fig. 5). Transducer motion-corrected data were subject to a sequence of data processing filters5 designed to mitigate impulse noise, signal attenuation, transient noise, and background noise66.

Data processing filters were applied to each $${S}_{v}$$ sample in an echogram, identified by a vertical sample number $$i$$ and horizontal ping number $$j$$. The ‘context window’ defined for filters include a current ping, and surrounding pings on either side of the current ping. Depending on the filter used, the context window either centres on the current ping or current sample, and slides over the entire echogram.

### Impulse noise removal

Impulse noise affects discrete sections of the data with a duration of less than one ping, for example, transmit pulses originated from other unsynchronized acoustic systems. The impulse noise removal algorithm implemented in Echoview® (based on Ryan, et al.5) compares each $${S}_{v}$$ sample in a current ping to the adjacent $${S}_{v}$$ samples (at the same depth) in surrounding pings defined by a context window of specified width $$W$$ (see details of context window in Table 3). A smoothed copy of original $${S}_{v}$$ values (i.e. unfiltered data) within the context window was used to identify impulse noise (see details of smoothing window in Table 3). The original $${S}_{v}$$ samples were identified as impulse noise if the corresponding smoothed $${S}_{v}$$ samples satisfy the condition:

$${S}_{v}[i,j]-{S}_{v}[i,j-m] > \delta \,{\rm{a}}{\rm{n}}{\rm{d}}\,{S}_{v}[i,j]-{S}_{v}[i,j+n] > \delta ,$$
(2)

where $${S}_{v}[i,j]$$ (dB re 1 m2 m−3) represents smoothed copy of current ping with a vertical sample number $$i$$ and horizontal ping number $$j$$, $$m$$ and $$n$$ are the positive integer offsets from the current ping determined by the width ($$W$$) of context window, where $$m,n\in \left\{1,\ldots ,\frac{W-1}{2}\right\}$$ and $$W$$ is an odd integer value in the range 3 to 9, and $$\delta$$ (dB re 1 m2 m−3) is an empirically determined impulse noise removal threshold value. Identified noise values were replaced as ‘no data’. The impulse noise removal parameters defined in Echoview® are given in Table 3.

### Attenuated signal removal

Signal attenuation is generally caused by air bubbles beneath the transducer that may occur for one ping or can persist over multiple pings. The attenuated signal removal algorithm implemented in Echoview® (based on Ryan, et al.5) compares the percentile score of $${S}_{v}$$ samples in a current ping with the percentile score of $${S}_{v}$$ samples in surrounding pings defined by a context window (see details of context window in Table 4). The current ping was removed and replaced as ‘no data’ if:

$$p({S}_{v}[m\times n])-p({S}_{v}[i,j])\ge \delta ,$$
(3)

where the symbol $$p$$ denotes the desired percentile value, $${S}_{v}[i,j]$$ (dB re 1 m2 m−3) is the current ping with a vertical sample number $$i$$ and horizontal ping number $$j$$, $${S}_{v}\left[m\times n\right]$$ (dB re 1 m2 m−3) represents $${S}_{v}$$ samples in the context window defined by $$m$$ vertical samples and $$n$$ horizontal pings, and $$\delta$$ (dB re 1 m2 m−3) is an empirically determined attenuated signal removal threshold value. The attenuated signal removal parameters defined in Echoview® are given in Table 4.

### Transient noise removal

Transient noise is introduced to the received signal that can occur at irregular intervals and persists over multiple pings. The transient noise removal algorithm implemented in Echoview® (based on Ryan, et al.5) compares each $${S}_{v}$$ sample in a current ping with the percentile score of $${S}_{v}$$ samples in surrounding pings defined by a context window (see details of context window in Table 5). A smoothed copy of original $${S}_{v}$$ values (i.e. unfiltered data) within the context window was used to identify noise (see details of smoothing window in Table 5). The original $${S}_{v}$$ samples were identified as transient noise if the corresponding smoothed $${S}_{v}$$ samples satisfy the condition:

$${S}_{v}[i,j]-p\left({S}_{v}\left[m\times n\right]\right) > \delta ,$$
(4)

where the symbol $$p$$ denotes the desired percentile value, $${S}_{v}[i,j]$$ (dB re 1 m2 m−3) represents smoothed copy of current ping with a vertical sample number $$i$$ and horizontal ping number $$j$$, $${S}_{v}\left[m\times n\right]$$ (dB re 1 m2 m−3) represents smoothed copy of $${S}_{v}$$ samples in the context window defined by $$m$$ vertical samples and $$n$$ horizontal pings, and $$\delta$$ (dB re 1 m2 m−3) is an empirically determined transient noise removal threshold value. Identified noise values were replaced as ‘no data’. The transient noise removal parameters defined in Echoview® are given in Table 5.

### Background noise removal

Background noise is introduced to the received signal that can vary in intensity and pattern (see section ‘Technical Validation’). According to De Robertis and Higginbottom66, the calibrated $${S}_{v}$$ values (Eq. 1) can be expressed as the sum of contributions from the signal and noise as:

$${S}_{{v}_{{\rm{cal}}}}[i,j]=10\,{{\rm{\log }}}_{10}\left(1{0}^{\left({S}_{{v}_{{\rm{signal}}}}[i,j]/10\right)}+1{0}^{\left({S}_{{v}_{{\rm{noise}}}}[i,j]/10\right)}\right),$$
(5)

where $${S}_{{v}_{{\rm{cal}}}}$$ (dB re 1 m2 m−3) is the calibrated $${S}_{v}$$ samples derived from the raw data (i.e. Eq. 1), $${S}_{{v}_{{\rm{signal}}}}$$ (dB re 1 m2 m−3) is the calibrated $${S}_{v}$$ samples representing the contribution from signal, $${S}_{{v}_{{\rm{noise}}}}$$ (dB re 1 m2 m−3) is the calibrated $${S}_{v}$$ samples representing the contribution from noise, and the index i and j represent vertical sample number and horizontal ping number respectively.

To estimate background noise levels, calibrated received power $${P}_{e{r}_{{\rm{cal}}}}[i,j]$$ (dB re 1 W) values were calculated from $${S}_{{v}_{{\rm{cal}}}}[i,j]$$ values by subtracting the time-varied gain (TVG) function2 (i.e. $$2{0\log }_{10}r+2{\alpha }_{a}r$$) from Eq. 1 as:

$${P}_{e{r}_{{\rm{cal}}}}[i,j]={S}_{{v}_{{\rm{cal}}}}[i,j]-20\,{{\rm{\log }}}_{10}r[i,j]-2{\alpha }_{a}r[i,j].$$
(6)

The calibrated $${P}_{e{r}_{{\rm{cal}}}}[i,j]$$ values were averaged66 (in linear domain) within an ‘averaging cell’ of $$M$$ vertical samples (with an index $$k$$) and $$N$$ horizontal pings (with an index $$l$$) to estimate noise as:

$$Noise\left(l\right)={\rm{\min }}(\bar{{P}_{e{r}_{{\rm{cal}}}}}[k,l]),$$
(7)

where $$\bar{{P}_{e{r}_{{\rm{cal}}}}}[k,l]$$ (dB re 1 W) is the averaged $${P}_{e{r}_{{\rm{cal}}}}[i,j]$$ values calculated for each averaging cell with a vertical sample interval $$k$$ and horizontal ping interval $$l$$, and $$Noise\left(l\right)$$ (dB re 1 W) is the representative noise estimate for the ‘middle ping’ in each horizontal interval $$l$$. Note that the averaging cell slides over the entire echogram (see details of averaging cell in Table 6).

An empirically determined maximum threshold $$Nois{e}_{max}$$ (dB re 1 W) (see Table 6) was applied to $$Noise\left(l\right)$$ values as an upper limit of background noise levels. Any $$Noise\left(l\right)$$ values exceeding this threshold was replaced with the predefined $$Nois{e}_{max}$$ value.

The $$Noise\left(l\right)$$ value estimated for a given horizontal ping interval $$l$$ was assigned to all individual pings constituting the interval to establish noise $$Noise\left(j\right)$$ (dB re 1 W) estimate for each ping. The effect of TVG was added to the $$Noise\left(j\right)$$ levels to compute $${S}_{{v}_{{\rm{noise}}}}$$ for each vertical sample number $$i$$ and horizontal ping number $$j$$ as:

$${S}_{{v}_{{\rm{noise}}}}[i,j]=Noise\left(j\right)+20\,{{\rm{\log }}}_{10}r\left[i,j\right]+2{\alpha }_{a}r[i,j].$$
(8)

The background noise corrected volume backscattering strength $${S}_{{v}_{{\rm{bnc}}}}[i,j]$$ (dB re 1 m2 m−3) values for each vertical sample number $$i$$ and horizontal ping number $$j$$ were estimated as:

$${S}_{{v}_{{\rm{bnc}}}}[i,j]=10\,{{\rm{\log }}}_{10}\left(1{0}^{\left({S}_{{v}_{{\rm{cal}}}}[i,j]/10\right)}-1{0}^{\left({S}_{{v}_{{\rm{noise}}}}[i,j]/10\right)}\right).$$
(9)

The SNR, a measure of the relative contribution of signal and noise was estimated as:

$$SNR[i,j]={S}_{{v}_{{\rm{bnc}}}}[i,j]-{S}_{{v}_{{\rm{noise}}}}[i,j],$$
(10)

where $$SNR[i,j]$$ (dB re 1) is the signal-to-noise ratio for each vertical sample number $$i$$ and horizontal ping number $$j$$.

An empirically determined threshold $$Minimu{m}_{SNR}$$ (dB re 1) (see Table 6) was used as an acceptable SNR for background noise corrected $${S}_{{v}_{{\rm{bnc}}}}[i,j]$$ data. The $${S}_{{v}_{{\rm{bnc}}}}[i,j]$$ values with corresponding $$SNR[i,j]$$ below this threshold were set to ‘−999’ dB re 1 m2 m−3 (an approximation of zero in the linear domain). The background noise removal parameters defined in Echoview® are given in Table 6.

### Residual noise removal

In the final stage, a 7 × 7 median filter was applied to remove residual noise retained in the core filtering stages (especially at far ranges). A median filter replaces the current $${S}_{v}$$ sample with the median value of $${S}_{v}$$ samples in a $$M$$ × $$M$$ neighbourhood. It is important to note that the output of 7 × 7 median filter was not directly used for echo-integration, rather it was used to flag residual noise retained from the core filtering process. A maximum data threshold of −50 dB re 1 m2 m−3 and a time-varied threshold $$TVT(r)$$ with the reference value of −160 dB re 1 m2 m−3 (defined at 1 m range) was applied to the background noise corrected $${S}_{{v}_{{\rm{bnc}}}}[i,j]$$ data before applying 7 × 7 median filter (see Table 3 for a description of time-varied threshold). $${S}_{{v}_{{\rm{bnc}}}}[i,j]$$ values above the maximum threshold (i.e. −50 dB re 1 m2 m-3) were set to ‘−999’ dB re 1 m2 m−3. Similarly, $${S}_{{v}_{{\rm{bnc}}}}[i,j]$$ values below the calculated $$TVT(r)$$ values were set to ‘−999’ dB re 1 m2 m-3 (note that median filter may replace ‘−999’ with the median of samples in the 7 × 7 neighbourhood). The output of the median filter was used to create a Boolean data range bitmap (between −998 to −20 dB re 1 m2 m−3) with ‘true’ or ‘false’ values for each sample. This Boolean data range bitmap was applied to the background noise corrected $${S}_{{v}_{{\rm{bnc}}}}[i,j]$$ data for removing any residual noise before echo-integration. $${S}_{{v}_{{\rm{bnc}}}}[i,j]$$ values corresponding to ‘false’ values in the data range bitmap were set to ‘−999’ dB re 1 m2 m−3.

Quality-controlled $${S}_{v}$$ data along with: (1) calibrated and motion corrected raw data, (2) transducer motion correction factor (i.e. difference between ‘motion corrected’ and ‘calibrated raw’ data), (3) background noise, and (4) SNR were exported from Echoview® as echo-integration cells (i.e. grid on an echogram) with a resolution of 1 km horizontal distance (i.e. ping-axis interval $$p$$) and 10 m vertical depth (i.e. range-axis interval $$r$$). Echo-integration values were stored as comma-separated values (CSV) files. Exported $${S}_{v}$$ data were converted to linear scale for further processing and packaging in MATLAB® (Fig. 5).

### Secondary corrections for sound speed and absorption variation

Quality-controlled $${S}_{v}$$ data were echo-integrated and exported using a nominal sound speed $${c}_{w}$$ (m s−1) and absorption coefficient $${\alpha }_{a}$$ (dB m−1) values estimated using the equations of Mackenzie70 and Francois and Garrison71 respectively (see sound speed and absorption coefficient variables in Eq. 1 used for $${S}_{v}$$ calculation). However, open ocean transects pass through different hydrographical conditions, so a secondary range dependent correction was required to account for the changes in horizontal and vertical cumulative mean sound speed and absorption as:

$$\bar{{S}_{{v}_{{\rm{corr}}}}}[r,p]=\bar{{S}_{{v}_{{\rm{uncorr}}}}}[r,p]+20\,{{\rm{\log }}}_{10}\left(\frac{\bar{{c}_{w}}\left[r,p\right]}{{c}_{w}}\right)+2{r}_{n}[r,p]\left(\bar{{\alpha }_{a}}[r,p]\frac{\bar{{c}_{w}}\left[r,p\right]}{{c}_{w}}\,-\,{\alpha }_{a}\right)-10\,{{\rm{\log }}}_{10}\left(\frac{{c}_{w}[r,p]}{{c}_{w}}\right),$$
(11)

or in linear terms:

$$\bar{{s}_{{v}_{{\rm{corr}}}}}[r,p]=\frac{\bar{{s}_{{v}_{{\rm{uncorr}}}}}[r,p]{\left(\frac{\bar{{c}_{w}}[r,p]}{{c}_{w}}\right)}^{2}1{0}^{\frac{2{r}_{n}[r,p]}{10}\left(\bar{{\alpha }_{a}}[r,p]\frac{\bar{{c}_{w}}[r,p]}{{c}_{w}}-{\alpha }_{a}\right)}}{\left(\frac{{c}_{w}[r,p]}{{c}_{w}}\right)},$$
(12)

where $$\bar{{s}_{{v}_{{\rm{uncorr}}}}}[r,p]$$ (m2 m−3) is the uncorrected (but filtered) volume backscattering coefficient values exported from Echoview® at the specified range-axis interval $$r$$ (i.e. 10 m) and ping-axis interval $$p$$ (i.e. 1 km), $${r}_{n}[r,p]$$ (m) is the regularly spaced depth values for each echo-integration cell, $$\bar{{c}_{w}}\left[r,p\right]=\frac{{\sum }_{r=1}^{n}{c}_{w}\left[r,p\right]}{n};\,\forall p$$ (m s−1) is the cumulative mean sound speed values estimated at each echo-integration cell for the new range $${r}_{a}[r,p]={r}_{n}[r,p]\frac{\bar{{c}_{w}}[r,p]}{{c}_{w}}$$ (m) calculation, $$\bar{{\alpha }_{a}}\left[r,p\right]=10\,{{\rm{\log }}}_{10}\left(\frac{{\sum }_{r=1}^{n}1{0}^{\left(\frac{{\alpha }_{a}\left[r,p\right]}{10}\right)}}{n}\right);\forall p$$ (dB m−1) is the cumulative mean absorption coefficient values ‘interpolated’ at the new range $${r}_{a}[r,p]$$, and $$\bar{{s}_{{v}_{{\rm{corr}}}}}[r,p]$$ (m2 m−3) is the corrected volume backscattering coefficient values at the new range $${r}_{a}[r,p]$$.

Due to changes in cumulative mean sound speed, this correction step creates a grid with irregular $${r}_{a}[r,p]$$ values. Therefore, the $$\bar{{s}_{{v}_{{\rm{corr}}}}}[r,p]$$ values at the new ranges $${r}_{a}[r,p]$$ were interpolated and reported at the regularly spaced $${r}_{n}[r,p]$$ values.

The sound speed and absorption coefficient values for secondary corrections were estimated using the equations of Mackenzie70 and Francois and Garrison71 respectively. Francois and Garrison71 estimate their ‘total absorption equation’ to be accurate within 5% for ocean temperature values of −1.8–30 °C, frequencies of 0.4–1000 kHz, and salinity values of 30–35 PSU. The typical hydrographical conditions (temperature values of 0–27 °C and salinity values of 34–36 PSU) present along the open ocean transects are generally within the reliability limits of Francois and Garrison71 equation.

The temperature and salinity data for sound speed and absorption coefficient calculations were interpolated from either CSIRO Atlas of Regional Seas72 (CARS, http://www.marine.csiro.au/~dunn/cars2009/ version 2009) or Synthetic Temperature and Salinity (SynTS)73 analyses (http://www.marine.csiro.au/eez_data/doc/synTS.html), but can also be derived from oceanographic reanalysis and ocean circulation models. CARS2009 is a digital climatology or atlas of seasonal ocean water properties. It is based on a comprehensive set of quality‐controlled vertical profiles of in situ ocean properties (i.e. temperature, salinity, oxygen, nitrate, silicate, and phosphate) collected between 1950 and 2008. CARS2009 NetCDF files contain a gridded mean of these ocean properties and average seasonal cycles generated from the collated observations. CARS2009 covers global oceans on a 0.5 × 0.5 degree grid spatial resolution, and are mapped onto 79 standard depth levels from the sea surface to 5500 m (from this vertical profiles of ocean properties along a bioacoustic transect can be extracted). SynTS is a daily three-dimensional (3D) temperature and salinity product generated by CSIRO, where the CARS temperature and salinity fields are adjusted with daily satellite sea surface temperature (SST) and gridded sea level anomaly (GSLA). SynTS has a 0.2 × 0.2 degree grid spatial resolution, and is mapped onto 66 standard depth levels from the sea surface to 2000 m. Due to limited spatial coverage (60°S–10°N and 90°E–180°E), the SynTS products may not always cover the transect region (e.g. Southern Indian Ocean), in that case CARS climatology values were used for the secondary corrections (Fig. 5).

### Data review, packaging and submission routines

For each processed transect, secondary corrections applied $$\bar{{s}_{{v}_{{\rm{corr}}}}}$$ data together with metrics of data quality and other auxiliary data variables were stored in Network Common Data Form (NetCDF, www.unidata.ucar.edu) file (NetCDF-4 format) with a resolution of 1 km horizontal distance (i.e. ping-axis interval) and 10 m vertical depth (i.e. range-axis interval) (see ‘Data Records’ section for data contents). This NetCDF file conforms standardized naming conventions and metadata content defined by the Climate and Forecast (CF)74, IMOS75, and International Council for the Exploration of the Sea (ICES)76 published over the years (Fig. 6).

Processed NetCDF files were independently reviewed by both analyst and principal investigator to further investigate data quality. If suitable, the NetCDF file along with ancillary files: (1) acquired raw data (.raw files), (2) platform track in CSV format (containing date, time, latitude, longitude, and time offset to UTC), (3) platform motion data (if recorded) in CSV format (including date, time, pitch, and roll measurements), and (4) a snapshot of processed echogram as Portable Network Graphics (PNG) format were packaged and submitted to the publicly accessible AODN Portal (Fig. 5).

## Data Records

The primary components of a processed NetCDF file are shown in Fig. 6 and described in Table 7 to provide an overview of data contents and structure. Each variable in a NetCDF file is described with an associated description, specifying the data output resulting from each data-collection or analytical step (Table 7).

Processed NetCDF files are published via the Australian Ocean Data Network (AODN) Portal at:

This portal allows transect selection and data download with spatial and temporal subset options implemented for each platform and frequency.

A generic metadata record of the project is available via GeoNetwork at:

The NetCDF files are also accessible via the AODN THREDDS data server that can be accessed remotely using the OPeNDAP protocol at:

A snapshot of processed NetCDF files at the time of this publication has been assigned a Digital Object Identifier (https://doi.org/10.26198/dv5p-t593) and will be maintained in perpetuity by the AODN77. Readers are directed to check the AODN Portal for the latest data set.

## Technical Validation

### Routine calibration and monitoring of echosounders

In the context of echosounder calibration, it is important to note that respective $$\pm X$$ dB re 1 (where $$X$$ is a real number) change in calibration parameters $${G}_{0}$$ and $${S}_{a{\rm{c}}{\rm{o}}{\rm{r}}{\rm{r}}}$$ factor represents a corresponding twofold $$\mp 2X$$ dB re 1 m2 m-3 variation in the derived $${S}_{v}$$ (Eq. 1) that would result in $$\mp \left(100\left(1{0}^{(2X/10)}\right)-100\right)$$% change echo-integration results (if accurate calibration parameters are not applied). In principle, properly calibrated echosounders operating at the same frequency should provide matching echo-integration results for a given sampling region. However, due to platform performance (e.g. aeration beneath the transducer), the derived data may be biased and this can be verified by an intercalibration4,78 experiment with two or more platforms simultaneously sailing over the same region, and later comparing the echo-integration results. In suitable circumstances, large uncertainty in the absolute calibrations and platform-specific factors can be quantified. This generic principle was applied to prioritize platforms for potential long-term data collection by comparing data quality metrics between participating platforms. As the spatio-temporal coverage of the data series improves, it will be possible to perform more direct comparisons between platforms and with an acoustic climatology of the regions.

Time series calibration results of selected platforms (with consistent echosounder configuration) are shown in Fig. 7 as an example to highlight repeatability and challenges with monitoring long-term performance and stability of echosounders. The FV Rehua, FV Antarctic Discovery, and RV Investigator demonstrate reasonable repeatability of 38 kHz transducer measurements with $${G}_{0}$$ values varying between 25.4 ± 0.2, 27.0 ± 0.3, and 24.9 ± 0.2 dB re 1 respectively (Fig. 7a,c,d). In contrast, the FV Austral Leader II (Fig. 7b) indicates gradual degradation of system performance (possibly ageing effect) over six years with 1.3 dB re 1 decrease in calibrated $${G}_{0}$$ values79. Keeping $${p}_{et}$$, $$\tau$$, $${\alpha }_{a}$$, $${c}_{w}$$, and $$\psi$$ constant (Eq. 1), this performance change would result in ~44% decrease in $${S}_{v}$$ data if $${G}_{0}$$ and $${S}_{a{\rm{c}}{\rm{o}}{\rm{r}}{\rm{r}}}$$ factor calculated in 2009 is applied for processing 2015 data sets.

Although the established sphere calibration method standardizes bioacoustic data collected by multiple participating platforms, there is a need for an additional diagnostic method to ensure echosounder performance in between routine calibrations. Along with calibration results, the peak values of instantaneous received power $${P}_{er}$$ (Eq. 1) measured within 0–1 m range (i.e. ringdown zone) are used as a complementary diagnostic method to ensure stability of echosounders, noting that monitoring is not calibration. This method can highlight noticeable gradual or abrupt changes in system performance over time. For example, spatio-temporal variations in peak power for FV Atlas Cove (Fig. 7e) highlight gradual degradation of 38 kHz echosounder performance with ~11 dB re 1 W decrease in peak power values over a year, complicating data usage. In contrast, a comparison between 18 and 38 kHz peak power values for FV Antarctic Discovery (Fig. 7f) highlights an unknown abrupt change (~3 dB re 1 W) in 18 kHz echosounder performance over 15 days docking period, necessitating routine monitoring. Such performance changes (if observed) are reported back to the operator for system maintenance (Fig. 1), and juxtaposed with relevant calibration results to assess repeatability of measurements and prioritizing transects for processing.

Simmonds and MacLennan2 consider that in fisheries acoustics applications, properly maintained low-frequency scientific echosounders can demonstrate consistent performance within 10% in the long-term. The aim should be to develop a routine or protocol for calibration that would help achieve this accuracy consistently irrespective of the echosounder system used. But in practice, variability in echosounder on-axis sensitivity could result from a combination of factors including system electronics, data acquisition settings, SNR, environmental conditions, and density and composition of the calibration sphere3. The performance of an echosounder may degrade gradually or abruptly (Fig. 7e,f), and transducers are vulnerable to mechanical damage and ageing effects80. Therefore, it is important to quantify such changes routinely for all participating platforms to apply suitable calibration corrections required for data processing. This would further facilitate existing feedback mechanism with platform operators for subsequent system maintenance and technical inspection.

### Transducer motion correction

Transducer motion can reduce the received signal and degrade data quality substantially at long ranges depending on the sea state and platform design. For hull-mounted circular transducer, the platform motion and transducer motion can be considered synonymous81. Accordingly, the angular motion of platform can be used to correct for the change in orientation of transducer beam between the times of each ping, with a precondition that platform motion data (i.e. pitch and roll) need to be recorded at a sampling rate above the Nyquist rate of platform’s angular motion. The Power Spectral Density (PSD) analyses82 of motion data (Fig. 8a) recorded from selected platforms indicate that a minimum sampling rate of 4 Hz is generally adequate to meet this precondition and subsequent correction. Sources of error may exist in motion-corrected data if there is a large discrepancy between measured and manufacturer specified (or nominal) beamwidths of the transducer used83.

Owing to the magnitude of angular displacement and beamwidth values of transducers used, the effects of transducer motion can result in a non-linear range dependent $${s}_{v}$$ correction. If motion correction is not applied, it can negatively bias (or underestimate) echo-integration results, where the amount of correction increases with range. The correction is greater for narrow-beam transducers and comparatively smaller for wide-beam transducers (Fig. 8c,d). The variable ‘motion_correction_factor’ (Table 7) is now being stored in NetCDF files for assessing the magnitude of transducer motion correction and recalculating calibrated raw data (if needed).

Associated global attributes (Fig. 6) ‘data_processing_motion_correction’ and ‘data_processing_motion_correction_description’ capture important metadata of transducer motion correction applied.

### Data processing filters

The quality of bioacoustic data collected from ships of opportunity sampling methods can be complex and extremely variable. If noise is not removed, it can be misinterpreted as biological signal, biasing echo-integration results. Statistical quantification of bias and error potential for data retained after filtering is challenging and beyond the scope of present study. However, selected examples of bioacoustic data with good and compromised quality are presented to demonstrate combined effectiveness of data processing filters. The application of data processing filters has considerably improved the quality of bioacoustic data and demonstrated to be robust across diverse platforms and weather conditions5. A caution is that there are subjective elements in ‘visually’ determining the quality of final data product after filtering, but this can be made objective to a certain extent by comparing raw and filtered echograms with metrics of data quality stored in NetCDF files.

As an example, good quality data collected by FV Will Watch in the Indian Ocean is presented in Fig. 9, highlighting diel vertical migration and deep scattering layer without any apparent artefacts in the data. To broadly quantify the combined effect of data processing filters, mean difference between unfiltered and filtered echograms (i.e. difference in mean $${S}_{v}$$ before and after filtering) is calculated for epipelagic, upper mesopelagic, and lower mesopelagic layers respectively, indicating 0.3 ± 0.9 (~7%), 0.1 ± 0.4 (~2%), and 0.1 ± 0.1 dB re 1 m2 m-3 (~2%) reduction in the filtered data (see Table 7 for layer description). The data quality metric SNR (Fig. 9b) in epipelagic, upper mesopelagic, and lower mesopelagic layers are 59.1 ± 4.6, 34.6 ± 3.5, and 31.5 ± 4.4 dB re 1 respectively, with mean ping-axis interval background noise level calculated as −165.6 ± 2.1 dB re 1 W (Fig. 9c). After the filtering process, approximately 98%, 98%, and 99% of $${S}_{v}$$ data are retained in the epipelagic, upper mesopelagic, and lower mesopelagic layers respectively (Fig. 9d).

To demonstrate the usefulness of data quality metrics, data collected by FV San Tongariro in Tasman Sea is presented in Fig. 10. This example compares raw and filtered echograms, highlighting predominant transient noise in the data amplified as a function of TVG. The mean difference between unfiltered and filtered echograms in epipelagic, upper mesopelagic, and lower mesopelagic layers respectively indicates 1.7 ± 1.9 (~48%), 1.2 ± 1.5 (~31%), and 3.6 ± 1.8 dB re 1 m2 m-3 (~129%) reduction in the filtered data, highlighting range-dependant effect of combined noise5 (i.e. the sum of impulse, transient, and background noise). Associated data quality metric SNR (Fig. 10b) in epipelagic, upper mesopelagic, and lower mesopelagic layers are 32.6 ± 7.9, 22.2 ± 5.3, and 17.8 ± 4.4 dB re 1 respectively, with mean ping-axis interval background noise level (Fig. 10c) calculated as −152.5 ± 3.4 dB re 1 W (note this background noise is ~13 dB re 1 W higher as compared to the good quality data presented in Fig. 9c). After the filtering process, approximately 84%, 88%, and 86% of $${S}_{v}$$ data are retained in the epipelagic, upper mesopelagic, and lower mesopelagic layers respectively (Fig. 10d). The raw data quality of this transect is not satisfactory (Fig. 10a) and despite the visual appearance of filtered data, the quality metrics SNR, background noise, and percentage of $${S}_{v}$$ data retained after filtering are not considered to be acceptable as compared to the other transect with high data quality (Fig. 9a).

Similarly, data acquired by FV Isla Eden is presented in Fig. 11, highlighting an abrupt (~5 dB re 1 W) change in the background noise level over 24 hours, presumably indicating electrical interference and electrical noise in the echosounder. The mean difference between unfiltered and filtered echograms in epipelagic, upper mesopelagic, and lower mesopelagic layers respectively indicates 1.0 ± 2.1 (~25%), 1.0 ± 1.6 (~25%), and 1.4 ± 1.2 dB re 1 m2 m-3 (~38%) reduction in the filtered data, predominantly highlighting range-dependant effect of background noise. Related data quality metric SNR (Fig. 11b) in epipelagic, upper mesopelagic, and lower mesopelagic layers are 37.5 ± 8.2, 12.9 ± 4.2, and 12.9 ± 2.3 dB re 1 respectively, with mean ping-axis interval background noise level (Fig. 11c) calculated as −145.5 ± 2.1 dB re 1 W (note this background noise is ~20 dB re 1 W higher as compared to the good quality data presented in Fig. 9c). After the filtering process, approximately 89%, 85%, and 82% of $${S}_{v}$$ data are retained in the epipelagic, upper mesopelagic, and lower mesopelagic layers respectively (Fig. 11d).

These examples (Figs. 10 and 11) suggest that caution is needed while reviewing a final data product where filtering and subsequent resampling to predefined NetCDF file resolution (1 km horizontal distance and 10 m vertical depth) may produce a visually clean echogram without any apparent artefacts, but potentially removed significant biological signal and/or retained noise in the process. Accordingly, we have not posted these two transects to the AODN, and similar data sets from other platforms with compromised data quality are archived locally. Storing metrics of data quality in NetCDF files is intended for assisting users to make an independent assessment of data quality based on the examples demonstrated here.

### Secondary corrections for sound speed and absorption variation

The difference between variable ‘uncorrected_Sv’ (i.e. filtered data before secondary corrections) and ‘abs_corrected_sv’ (i.e. same data after secondary corrections but before depth interpolation) is calculated (uncorrected_Svabs_corrected_sv) to demonstrate the effect of secondary corrections (Fig. 12). This step introduces a range-dependent correction (Fig. 12f) that can differ substantially based on the equation used for calculating sound absorption in seawater (see Fig. 5 in Doonan, et al.84 for a comparison between two commonly used equations for absorption calculation. Note that the range-dependant percentage correction to the data can differ up to 45% between Doonan, et al.84 and Francois and Garrison71 for a 38 kHz data at 1000 m depth).

The processed bioacoustic data sets are consistently corrected based on Mackenzie70 sound speed and Francois and Garrison71 absorption equations following the recommendations by Simmonds and MacLennan2 until more evidence is available to select another formula. Macaulay, et al.85 conducted field measurements of acoustic absorption in seawater from 38 to 360 kHz, indicating consistent results with Francois and Garrison71 equation for frequencies of 200 kHz and lower. Macaulay, et al.85 observed a significant difference around 333 kHz, indicating that Francois and Garrison71 equation is incorrect for some input parameters (note that 333 kHz data is not processed under IMOS Bioacoustics sub-Facility).

It is important to note that the percentage correction shown in Fig. 12 is applicable to the example transect only that depends on the nominal sound speed and absorption values used during initial processing and echo-integration (Eq. 1). Other transects (e.g. Southern Ocean) have different correction factors that are related to regional changes in temperature and salinity values.

The key intermediate variable ‘uncorrected_Sv’ is stored in NetCDF files for recalculating secondary corrections using other equations or data sources (if needed). The equation used for sound absorption calculation is documented in the global attribute section of a NetCDF file as ‘data_processing_absorption_description’ and ‘history’. Similarly, the equation used for sound speed calculation is captured in the global attributes ‘data_processing_soundspeed_description’ and ‘history’.

## Usage Notes

When interpreting bioacoustic data it is important to understand the corrections applied at each processing step particularly calibration, transducer motion correction, data filtering, and secondary corrections for sound speed and absorption variation (Fig. 5). The transducer motion and secondary corrections are range-dependant that can greatly influence the lower mesopelagic layer derived metrics. Our goal is to keep minimum updates to data processing steps and data records so that the database remains consistent and comparable.

### Measurement uncertainty

The widely used Simrad EK60 echosounder is now discontinued and replaced by the Simrad EK80. A recent comparison study86 highlighted that EK80 raw power measurements were 3–12% lower as compared to EK60, affecting weak scatterer and/or long-range acoustic observations due to nonlinear amplification of low-power signals by the EK60. Presently the users need to correct the data for this bias, and we are in the process of providing a correction update to the data sets with associated metadata. In addition to calibration and unknown methodological uncertainties, this new measurement uncertainty highlights the ongoing challenges in maintaining a diverse data series, and the need for storing fundamental echosounder measurement (i.e. received echo power) and appropriate metadata to enable unforeseen corrections in the future as needed.

### Challenges with biomass estimation

Ships of opportunity bioacoustic sampling methods have clear advantages as well as limitations. Their usefulness in resource assessment, ecosystem monitoring, and cost-effective mapping of mesopelagic communities at regional and global scales is established with diverse acoustic-based indicators and metrics, but credible conversion of bioacoustic data ($${s}_{v}$$) to open ocean fish biomass is a multi-step procedure and require lowest degree of bias.

For example, the processed $${s}_{v}$$ values are vertically integrated over a measurement range ($${r}_{1}$$ to $${r}_{2}$$) to calculate area backscattering coefficient $${s}_{a}$$ (m2 m−2) along a transect. Scatterer areal density (number m−2) i.e. the number of organisms (e.g. fish) within the measurement range is calculated by dividing $${s}_{a}$$ by the backscattering cross-section $${\sigma }_{bs}$$ (m2) of a representative single fish. Biomass of fish (kg m−2) can be estimated by multiplying this scatterer areal density by the weight $$W$$ (kg) of a single fish. This requires separation of bioacoustic data by species composition, location, and $${\sigma }_{bs}$$ distribution. Mean weight can be derived from observed weights (using nets) or length to weight regression. Similarly, $${\sigma }_{bs}$$ are obtained from in situ measurements and/or $${\sigma }_{bs}$$to length regressions2. Biomass calculations from these equations will be biased if the weight and target strength $$TS$$ [$$10\,{{\rm{\log }}}_{10}({\sigma }_{bs})$$, dB re 1 m2] of the organism are uncertain (assuming accurate calibration and echosounder linearity). For that reason, in situ and/or modelled $$TS$$ must be calculated with the goal of obtaining a representative distribution.

Credible estimation of biomass using a vessel-based echosounder is very difficult for the highly diverse mesopelagic community, where gas inclusions may present in many organisms (depending on the region) that can cause frequency- and depth-dependent resonance scattering87, dominating the received signal. Multiple methods of ecosystem models, net capture, acoustic backscattering models, and in situ profiling acoustic optical systems19,88 are needed to provide the necessary information to convert basin-scale bioacoustic data into specific biological metrics such as species-specific biomass61.

Generated data are stored in NetCDF files that can be readily imported into a wide variety of cross-platform software programs and programming languages. A custom MATLAB® function ‘viz_sv’ to read and visualize NetCDF files conforming to the conventions described in this data descriptor can be downloaded from the IMOS Bioacoustics sub-Facility web site http://imos.org.au/facilities/shipsofopportunity/bioacoustic or GitHub https://github.com/CSIRO-Acoustics/Visualize-IMOS-Bioacoustics-data.