A 14-year time series of marine megafauna bycatch in the Italian midwater pair trawl fishery

Fisheries bycatch is recognised as a global threat to vulnerable marine megafauna, and historical data can help to quantify the magnitude of the impact. Here, we present a collection of three datasets generated between 2006 and 2019 by a monitoring programme on marine megafauna bycatch in one of the main Italian fisheries, the northern central Adriatic midwater pair trawl fishery. The three datasets consist of: (i) monitored fishing effort; (ii) bycatch and biological data of dolphins, sea turtles and elasmobranchs; and (iii) dolphin sightings. Some of the information included in these datasets has already provided a unique opportunity to estimate the total incidental capture of species of conservation concern and trends in their relative abundance over time in the northern central Adriatic Sea. These datasets are expected to be used by different end users to improve species conservation and the fishery management approaches that assess the impact of a fishery on species of conservation concern.
Measurement(s): marine megafauna bycatch
Technology Type(s): monitoring • digital curation
Factor Type(s): temporal interval
Sample Characteristic - Organism: Tursiops truncatus • Caretta caretta • Elasmobranchs
Sample Characteristic - Environment: sea
Sample Characteristic - Location: Adriatic Sea
Machine-accessible metadata file describing the reported data: https://doi.org/10.6084/m9.figshare.17152577


Background & Summary
The incidental catch of non-target species (bycatch) during fishing operations is one of the major global threats to marine megafauna, including cetaceans, sea turtles and elasmobranchs 1-5. Although these species differ in life-history traits and distribution, their populations are considered particularly vulnerable to direct mortality caused by fishing operations, and fisheries bycatch can contribute to their decline [6][7][8]. However, the ability of scientists to advise on potential management measures for vulnerable species is still limited by data availability 9.
Within the EU Data Collection Framework, Member States have an obligation to collect and deliver a wide range of fisheries data needed for scientific advice 10. Data are usually recorded and stored at national or regional level by different bodies (e.g., public institutions and non-governmental organizations, NGOs) in different databases. Hence, datasets are fragmented and are neither readily available nor easily accessible for scientific, management and conservation purposes.
Appropriate management strategies that can mitigate the impact of fisheries on vulnerable species are urgently needed, but they require easily accessible data from systematic and independent monitoring programmes 11. In the Mediterranean Sea, many authors have documented that dolphins 12, sea turtles 13, and sharks and rays 14,15 interact with, and are incidentally taken by, different types of fishing gear, including trawls, longlines and gillnets 16. Nevertheless, few quantitative historical bycatch data are available for these marine megafauna, and only recently have a few authors started to share time series of Mediterranean fishery data in public repositories [17][18][19][20].
To our knowledge, this is one of the first initiatives to make public historical bycatch data of marine megafauna recorded in the most heavily impacted basin of the Mediterranean Sea 21, the northern central Adriatic Sea. This area supports a rich and valuable marine biodiversity, including marine megafauna, and is subject to a variety of anthropogenic pressures, mainly intense fishing activities, eutrophication, large urban development along coastal areas, and environmental pollution [22][23][24][25]. Since the early eighties, the northern central Adriatic Sea has been intensively exploited by many fisheries, including the Italian midwater pair trawl fishery, which is one of the largest in the Mediterranean 26. Between 2006 and 2019, an extensive monitoring programme of accidental catches of marine megafauna was conducted on this fishery under a permit issued by the Italian Ministry of Agriculture, Food and Forestry (Fishery and Aquaculture directorate), in compliance with the Italian obligations under Council Regulation (EC) 812/2004 and the EU Data Collection Framework. The primary goal of the programme was to identify and assess the impact of fisheries bycatch on cetaceans in the Italian midwater pair trawl fishery. Later, other species of conservation concern, such as sea turtles and elasmobranchs, were also included in the monitoring activity. In this framework, long-term fishery-dependent data collected by trained observers provided the most reliable information on the interaction of different vulnerable species with a specific fishing gear in the northern central Adriatic Sea.
From the data collection, a database was built, and three datasets were extracted and are described in the present work. Some of the information included in these datasets has already proved useful for evaluating the impact 27 and predicting the incidental catches of elasmobranchs 28,29 and sea turtles 30 in the Italian midwater pair trawl fishery. The information gathered in the datasets can help to assess the extent of fisheries bycatch of different vulnerable species in the Mediterranean Sea. Hence, these datasets can be particularly helpful for understanding the ecology of these species and identifying appropriate fishery management measures.
For each haul, the fishery observers recorded operational parameters (e.g., haul duration, time of net setting and hauling, trawling speed) and environmental variables (e.g., geographical coordinates, water depth). Bycatch specimens were measured to the nearest cm using a measuring board and weighed to the nearest gram using an electronic scale, or a dynamometer for the largest specimens. For each individual, physical status was assessed by examining body condition, including the presence of any injuries or bleeding, the response to external stimuli, and general activity and locomotion (Table 1).

Methods
The monitoring activity was designed according to fleet dynamics, which in the midwater pair trawl fishery are highly variable in space (depending on the distribution of the target species, small pelagics) and in time (owing to the national regulation of fishing effort, i.e., trawlers must respect temporal closures during weekends and spawning periods of the target species and should operate within Italian waters). Based on these considerations, observations covered between 3% and 7% of the total annual fishing effort of midwater pair trawlers operating in the northern central Adriatic Sea.
Database framework. All information collected on board by fishery observers was reported in a dedicated spreadsheet. Each file was read and checked for potentially erroneous entries using a series of Python routines. After validation, the data were uploaded to the "BYCATCH" database hosted at the Italian National Research Council (CNR) Institute of Marine Biological Resources and Biotechnologies (IRBIM) of Ancona, Italy. BYCATCH was built in MySQL and was managed and maintained using Python, R and different database management tools (e.g., phpMyAdmin and MySQL Workbench). BYCATCH consists of a collection of tables that store interrelated data. The main database tables are illustrated in the diagram shown in Fig. 1. Each record is associated with a unique ID, which makes it possible to create relationships between tables and to generate different datasets. BYCATCH contains the fishing operations of the Italian midwater pair trawl fishery monitored between 2006 and 2019, the geographic distribution and biological information of incidental catches of cetaceans, sea turtles and elasmobranchs, and dolphin sightings. An overview of all data stored in BYCATCH is provided in Online-only Table 1. The geographic distribution of all bycatch specimens is shown in Fig. 2 and, for each species, the number of specimens recorded every year and their bycatch rate are shown in Fig. 3. While BYCATCH was built specifically to house bycatch data of the species noted above, the database structure was designed so that it can easily be applied to other species and extended to include different types of information (e.g., catches of target species and environmental variables). From BYCATCH, three datasets were extracted via queries run directly in MySQL or through the free software environment R 31, using the RMySQL package, a Database Interface and 'MySQL' Driver for R 32.
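The role of the unique IDs in linking tables and generating datasets can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table names, column names and records (haul, bycatch, cell_id and so on) are hypothetical: they are not taken from the actual BYCATCH schema shown in Fig. 1.

```python
import sqlite3

# Hypothetical, simplified schema: names and records are illustrative only
# and do not reproduce the actual BYCATCH tables.
conn = sqlite3.connect(":memory:")  # in-memory SQLite as a stand-in for MySQL
conn.executescript("""
CREATE TABLE haul (
    haul_id    INTEGER PRIMARY KEY,   -- unique ID of the fishing operation
    year       INTEGER NOT NULL,
    cell_id    TEXT    NOT NULL,      -- grid cell containing the haul
    duration_h REAL    NOT NULL
);
CREATE TABLE bycatch (
    bycatch_id INTEGER PRIMARY KEY,   -- unique ID of the specimen
    haul_id    INTEGER NOT NULL REFERENCES haul(haul_id),
    species    TEXT    NOT NULL
);
""")
conn.executemany("INSERT INTO haul VALUES (?, ?, ?, ?)", [
    (1, 2006, "A1", 1.0),
    (2, 2006, "A1", 1.5),
    (3, 2007, "B2", 2.0),
])
conn.executemany("INSERT INTO bycatch VALUES (?, ?, ?)", [
    (1, 1, "Caretta caretta"),
    (2, 1, "Caretta caretta"),
    (3, 3, "Myliobatis aquila"),
])


def bycatch_per_year_cell(connection):
    """Count monitored hauls and bycatch specimens per year and grid cell.

    The LEFT JOIN keeps hauls without bycatch, so zero-catch effort is
    still counted in n_hauls.
    """
    query = """
        SELECT h.year,
               h.cell_id,
               COUNT(DISTINCT h.haul_id) AS n_hauls,
               COUNT(b.bycatch_id)       AS n_specimens
        FROM haul AS h
        LEFT JOIN bycatch AS b ON b.haul_id = h.haul_id
        GROUP BY h.year, h.cell_id
        ORDER BY h.year, h.cell_id
    """
    return connection.execute(query).fetchall()


rows = bycatch_per_year_cell(conn)
```

In this sketch the join on haul_id plays the role of the relationships between tables, and the grouped query mimics the extraction of a dataset aggregated per year and per cell.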

Data records
The three datasets are available on the Marine Data Archive (MDA) 33. All datasets include data aggregated per year and per cell, using grid cells of 5 nautical miles (nm) that cover the northern central Adriatic Sea. The datasets consist of a collection of shapefiles arranged in a GeoPackage format 34. In all shapefiles, the attribute table displays the cell ID and the mean geographical coordinates of each cell. In addition, each shapefile contains the following specific information:

Technical Validation
The collection of tables stored in BYCATCH is the result of an intense compilation and validation process of a 14-year time series of marine megafauna bycatch. All information collected on board by fishery observers was reported in a dedicated spreadsheet developed in Microsoft Excel. To preserve the quality of the data and avoid data entry and typing errors (e.g., wrong entry format, misspelling, missing information), a series of validation rules was created with Excel Visual Basic for Applications (VBA) macros. While the user was filling in the spreadsheet, any entry that did not match the expected format or range was highlighted by an error message. The user could then immediately resolve the issue and proceed with the data entry process. Specifically, the rules were set up to highlight potential:
- Mismatches between fishery observers and their corresponding monitored fishing vessel and harbour of origin: each fishery observer monitored a specific fishing vessel from a specific harbour.
- Inconsistencies between temporal variables: all fishing operations recorded at specific dates and times should fall within the time frame of the corresponding fishing trip. In addition, the start time of each haul should precede its end time.
- Inconsistencies in recorded fishing operations: the duration of each fishing operation recorded on board was compared with an estimated duration, calculated as the ratio between the length of the operation (the distance between its starting and ending points) and the average trawling speed (kn) of the vessel. A maximum discrepancy of 30% between observed and calculated durations was allowed.
- Wrong vessel speed: since the usual speed of a midwater pelagic trawler should fall within the range of 3-5 kn (maximum), only values inside this range were allowed.
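The temporal, duration and speed rules listed above can be sketched in Python as follows. This is an illustration rather than the VBA or Python routines actually used in the programme: the Haul record and its field names are hypothetical, distances are assumed to be in nautical miles, and the 30% tolerance is assumed to be relative to the calculated duration, which the text does not specify.

```python
from dataclasses import dataclass
from datetime import datetime

MIN_SPEED_KN = 3.0               # lower bound of the usual trawling speed
MAX_SPEED_KN = 5.0               # upper bound of the usual trawling speed
MAX_DURATION_DISCREPANCY = 0.30  # assumed relative to the calculated duration


@dataclass
class Haul:
    """Hypothetical record of one monitored fishing operation."""
    trip_start: datetime
    trip_end: datetime
    haul_start: datetime
    haul_end: datetime
    distance_nm: float  # distance between start and end positions
    speed_kn: float     # average trawling speed


def validate_haul(haul):
    """Return a list of messages describing the rules the record violates."""
    errors = []

    # Temporal rule 1: the haul must fall within the fishing trip.
    if not (haul.trip_start <= haul.haul_start and haul.haul_end <= haul.trip_end):
        errors.append("haul outside the time frame of the fishing trip")

    # Temporal rule 2: the haul must start before it ends.
    times_ok = haul.haul_start < haul.haul_end
    if not times_ok:
        errors.append("haul start is not before haul end")

    # Speed rule: only values within the 3-5 kn range are accepted.
    speed_ok = MIN_SPEED_KN <= haul.speed_kn <= MAX_SPEED_KN
    if not speed_ok:
        errors.append("trawling speed outside the 3-5 kn range")

    # Duration rule: compare observed duration with distance / speed.
    # Only meaningful when the times and the speed are themselves valid.
    if times_ok and speed_ok:
        expected_h = haul.distance_nm / haul.speed_kn
        observed_h = (haul.haul_end - haul.haul_start).total_seconds() / 3600
        if abs(observed_h - expected_h) > MAX_DURATION_DISCREPANCY * expected_h:
            errors.append("observed and calculated durations differ by more than 30%")

    return errors
```

A record that passes every rule returns an empty list, so flagged records can be selected simply by keeping those with a non-empty result.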
After data validation, all spreadsheets were uploaded to BYCATCH through Python, and the tables were updated to hold the data in a consistent format. During the upload, the programme repeated some checks on the data (e.g., temporal inconsistencies, length and speed of fishing operations) and screened the geographical location of fishing operations (e.g., positions on land). If an error was found, the record was flagged for further investigation and correction.
www.nature.com/scientificdata

Usage Notes
The datasets are freely available and stored in the Marine Data Archive (MDA) and should be appropriately referenced by citing the present paper. The datasets can be used by various end users. For instance, fishery scientists can examine where the Italian midwater pair trawl fishery operated over 14 years in the northern central Adriatic Sea. Knowing where fishing occurs is crucial to assess the potential impact on the different vulnerable species taken in the basin. The monitored fishing effort in the datasets can be coupled with data transmitted by the Automatic Identification System (AIS) to quantify unknown vessel tracks, providing a more realistic picture of the overall fishing activity. Ecologists can also couple the present historical bycatch data of different species with other existing time series from different sources (e.g., monitoring activities in different Mediterranean regions, aerial surveys, commercial landings, interviews with fishers). In this context, meta-analyses, Bayesian statistics and other approaches can be used to combine and analyse data from a variety of sources to evaluate long-term trends of potentially threatened marine megafauna and how their populations have changed over time 14,[35][36][37][38]. The datasets described in this work can also help conservation biologists and managers to fill gaps in marine megafauna knowledge and to develop more accurate management strategies.
However, users should take into account some limitations of the datasets. Overall, historical bycatch data of different marine megafauna species exhibit a large number of zero observations (no catch). This is common in fisheries data on non-target species, such as dolphins, sea turtles and elasmobranchs, which are caught less frequently than target species (small pelagic fish in the case of the midwater pair trawl fishery). Thus, when evaluating the relative abundance of such species over time, the modelling approach should deal with potential interpretation problems 35,36. Previous studies showed that catch per unit effort (CPUE) data can be modelled to address the effect of operational and environmental factors that can affect catch rate. For instance, the historical CPUE of two myliobatids included in the dataset, the common eagle ray (Myliobatis aquila) and the bull ray (Aetomylaeus bovinus), was modelled in ref. 28 using Generalized Additive Models (GAMs). This procedure was applied in a delta modelling approach, which made it possible to model the probability of species occurrence and the magnitude of catch events separately (see details in ref. 28). The results indicated that the predictive accuracy of the delta-modelling strategy was rather good, and a similar approach was used in ref. 30 to evaluate the seasonal distribution of bycatch events of loggerhead turtles (Caretta caretta). Furthermore, Zero-inflated General Linear Models (GLMs) were used in ref. 29 to examine the relative abundance of four elasmobranch species included in the dataset: common smooth-hound (Mustelus mustelus), common eagle ray (Myliobatis aquila), spiny dogfish (Squalus acanthias), and pelagic stingray (Pteroplatytrygon violacea). That analysis aimed to standardize annual trends of the CPUE, considering the best set of covariates among those tested (see details in ref. 29).
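The logic of a two-part (delta) approach to zero-heavy CPUE data can be illustrated with a minimal sketch. It is deliberately simplified: the cited studies fitted GAMs and zero-inflated models with operational and environmental covariates, whereas the function below only computes the covariate-free analogue (proportion of positive hauls multiplied by the mean CPUE of positive hauls). The function name and the example values are hypothetical.

```python
def delta_cpue(catches, efforts):
    """Two-part (delta) summary of catch per unit effort (CPUE).

    catches: number of specimens caught in each haul (many zeros expected)
    efforts: fishing effort of each haul, e.g. hours trawled (must be > 0)

    Returns (p_occurrence, mean_positive_cpue, delta_index), where
    delta_index = p_occurrence * mean_positive_cpue.
    """
    if len(catches) != len(efforts) or not catches:
        raise ValueError("catches and efforts must be non-empty and of equal length")
    if any(e <= 0 for e in efforts):
        raise ValueError("effort must be strictly positive")

    cpue = [c / e for c, e in zip(catches, efforts)]
    positive = [v for v in cpue if v > 0]

    # Part 1: probability that a haul has any bycatch at all.
    p_occurrence = len(positive) / len(cpue)
    # Part 2: magnitude of the catch, given that bycatch occurred.
    mean_positive = sum(positive) / len(positive) if positive else 0.0

    return p_occurrence, mean_positive, p_occurrence * mean_positive


# Hypothetical example: eight one-hour hauls, two of them with bycatch.
p, m, index = delta_cpue([0, 0, 0, 2, 0, 1, 0, 0], [1.0] * 8)
```

Separating occurrence from magnitude in this way is what allows each part to be modelled with its own error distribution and covariates in a full analysis.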
In addition, users should consider that the datasets include fishery-dependent data, which can suffer from intrinsic bias related to the stochasticity of the distribution of marine megafauna and to the lack of a well-defined sampling design of the monitoring activity in space and time. Indeed, following the considerations in refs. 27,29,30, the unequal distribution of the monitoring activity was conditioned by fleet dynamics (e.g., fishing closures, fish market preferences and prices) and by bureaucratic delays in the project, which affected both the observed pattern and the estimation of bycatch events.

Code availability
The datasets were extracted via queries run directly in MySQL Workbench and through the free software environment R 31, using the dbConnect function from the RMySQL package, a Database Interface and 'MySQL' Driver for R 32. The maps in Fig. 2 were created using several R packages: tidyverse 39 for data handling; rgdal 40, a package providing bindings to the "Geospatial Data Abstraction Library"; sf, a package providing a standardized way of encoding spatial vector data in R 41; and ggplot2 42 for graphical visualization. The package geosphere 43 was also used to calculate haul midpoints, starting from the coordinates of gear deployment and retrieval of each fishing operation. A few examples of these operations (MySQL and R) and of the R code for the creation of the maps are available among the shared datasets attached to this work as Supplementary Information.
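For readers working outside R, the kind of haul midpoint calculation described above can be sketched in Python. This is an assumption-laden illustration, not the authors' code: it uses the standard great-circle midpoint formula on a sphere, so results may differ slightly from those of geosphere if an ellipsoidal model was used, and the function name is hypothetical.

```python
import math


def haul_midpoint(lat1, lon1, lat2, lon2):
    """Great-circle midpoint between gear deployment and retrieval positions.

    Inputs and outputs are in decimal degrees. A spherical Earth is
    assumed, which is adequate for hauls a few nautical miles long.
    """
    phi1, lam1, phi2, lam2 = map(math.radians, (lat1, lon1, lat2, lon2))
    dlam = lam2 - lam1

    # Cartesian components of the second point relative to the first meridian.
    bx = math.cos(phi2) * math.cos(dlam)
    by = math.cos(phi2) * math.sin(dlam)

    phi_m = math.atan2(
        math.sin(phi1) + math.sin(phi2),
        math.sqrt((math.cos(phi1) + bx) ** 2 + by ** 2),
    )
    lam_m = lam1 + math.atan2(by, math.cos(phi1) + bx)

    # Normalise longitude to the [-180, 180) range.
    lon_m = (math.degrees(lam_m) + 540.0) % 360.0 - 180.0
    return math.degrees(phi_m), lon_m
```

For example, haul_midpoint(44.0, 13.0, 44.1, 13.2) returns a position lying between the two input points; over such short distances it is very close to the arithmetic mean of the coordinates.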