Using a Bayesian modelling approach (INLA-SPDE) to predict the occurrence of the Spinetail Devil Ray (Mobular mobular)

To protect the most vulnerable marine species it is essential to have an understanding of their spatiotemporal distributions. In recent decades, Bayesian statistics have been successfully used to quantify uncertainty surrounding identified areas of interest for bycatch species. However, conventional simulation-based approaches are often computationally intensive. To address this issue, in this study, an alternative Bayesian approach (Integrated Nested Laplace Approximation with Stochastic Partial Differential Equation, INLA-SPDE) is used to predict the occurrence of Mobula mobular species in the eastern Pacific Ocean (EPO). Specifically, a Generalized Additive Model is implemented to analyze data from the Inter-American Tropical Tuna Commission’s (IATTC) tropical tuna purse-seine fishery observer bycatch database (2005–2015). The INLA-SPDE approach had the potential to predict both the areas of importance in the EPO, that are already known for this species, and the more marginal hotspots, such as the Gulf of California and the Equatorial area which are not identified using other habitat models. Some drawbacks were identified with the INLA-SPDE database, including the difficulties of dealing with categorical variables and triangulating effectively to analyze spatial data. Despite these challenges, we conclude that INLA approach method is an useful complementary and/or alternative approach to traditional ones when modeling bycatch data to inform accurately management decisions.


Results
All the models that included the spatial effect showed lower DIC than those without it (Supplementary Table S4). Similarly, most of the models that do not account for non-linear relationship showed higher DIC values than the ones using smoothing functions. When the type of set was included as a dummy variable good prediction performance statistics and smoother predictions were obtained (Supplementary Table S4). Based on the combination of different aspects to obtain the most accurate model, both in terms of estimations and predictions (AUC, Sensitivity, Specificity, prediction and DIC values), the best fit INLA model included presence-absences as the response variable and oxygen, chlorophyll, nitrate, sea surface temperature, month and type of set as explanatory variables. The spatial effect was included in the model.
The final INLA-SPDE (option 10) model had both the lowest DIC (8773.68) and LCPO (3.66), compared to the others (see Supplementary Table S4). The mean posterior probability of occurrence, the standard deviation and the first and third quartiles for each parameter of the fixed effects included in the final model are shown in Table 1. Results showed a positive relationship between chlorophyll and the presence of M. mobular between 0.1-0.2 mg·m −3 . Similarly, results demonstrated that higher occurrences of M. mobular are expected to be found in waters with oxygen concentrations between 210-220 mg/l and low-medium nitrate concentrations (Fig. 1). A negative correlation was also identified between sea surface height values and the probability of occurrence of M. mobular, with higher probability in low SSH. Finally, the highest probability of presence of M. mobular was found mainly during winter (Fig. 1). The lowest relationship between the type of set and the presence of the species was found in Floating object sets (posterior mean = − 1.918; SD = 18.239); compared with the presence in School (posterior mean = 1.026; SD = 18.239) and Dolphin sets (posterior mean = 0.917; SD = 18.239) ( Table 1).
The overall predictability of the models was evaluated using the Area Under the receiver-operating Curve (AUC), Sensitivity, Specificity and Kappa. Kappa measures the proportion of correctly classified presence and Table 1. Numerical summary of the marginal posterior distribution of the fixed effects for the best INLA model for Mobula mobular. For each variable the mean, standard deviation, median (Q 0.5 ) and a 95% credible intervals (Q 0.025 -Q 0.975 ) are provided, containing 95% of the probability under the posterior distribution.

Discussion
This study uses a Bayesian approach to model the occurrence of Mobula mobular using IATTC observer bycatch data from the tropical tuna purse-seine fishery in the eastern Pacific Ocean (EPO). We consider the INLA-SPDE Bayesian approach as a complementary method to SDM traditional ones to obtain the prediction of hotspots of vulnerable species and to inform accurately management decisions. SDMs have become one of the most powerful tools to address certain fisheries issues, such as bycatch species distribution 19 . One of the first steps to reducing bycatch mortality is to identify and manage conservation priority areas, or "hotspots", where bycatch species may be important 19,33 . Correct identification of these areas could lead to effective spatial management strategies for their conservation. However, for regulations to be effective it requires an understanding of the spatiotemporal distribution of the species, given that wrong identification of bycatch "hotspots" can lead to erroneous mitigation practices with irreversible ecological consequences 34 . Ideally, space and time should be better incorporated into models when bycatch data is analyzed, and the choice of the best SDM model should depend on the spatial pattern of the input data 19,35 . The Bayesian approach considered in this study tried to describe these issues along with the advantages and disadvantages of using this technique in an effort to predict M. mobular occurrence in the EPO.
Results of the model confirm that the presence of M. mobular is determined by the most important seasonal upwelling systems in the EPO. The Bayesian method was able to estimate the relationship between the distribution of a species and its environment.
The non-lineal relationships observed by the models suggest that M. mobular may inhabit areas with different environmental characteristics but showing higher preferences for coastal, productive (with concentrations of chlorophyll between 0.1-0.2 mg·m −3 ) and low oxygen areas (around 210-220 mg/l). The presence of the species in areas with negative SSH values also suggest the association of the M. mobular to mesoscale process, such as eddies and coastal upwelling systems, where the food availability seems to be more abundant.
Spatial autocorrelation of residuals is normally induced by lack of a random distribution of individuals, absence of a covariate in the model or incorrect specification of the relationship between the covariate and the response variable 36 . Generally when analyzing geo-referenced by-catch data, geographic coordinates (latitude and/or longitude) are included in the models as continuous explicative variables 37,38 given that fixed effects and, therefore, the spatial dependency of observations, is not considered. Similarly, non-random spatial variables or geographic fishing boundaries can be included as predictors in models to try to capture spatial species trends. For example, Escalle, et al. 39 accounted for spatial autocorrelation by incorporating a contiguity matrix based www.nature.com/scientificreports/ on a residual's autocovariate (RAC) as an explanatory variable in their models. However, only geo-statistical techniques intrinsically incorporate a component to account for spatial autocorrelation. Hierarchical Bayesian spatial models extend the concept of spatial autocorrelation in multilevel structures, including a spatial random effect that is a stochastic process indexed in space, which represents all spatially explicit processes that may influence the species pattern. By applying hierarchical Bayesian spatial models to species data the multiple sources of uncertainty associated with both the observed data and the species process can be included in the analysis to generate a more robust statistical inference and lead to more realistic predictions 1,35 . The standard deviation, the first and third quantile of the posterior distribution of the prediction and the spatial effect map and its uncertainty can also be mapped as another component of the model. Moreover, one of the advantages of using INLA-SPDE approach is that is permits Delaunay triangulation over the regular grids that are normally used in SDMs. This technique congregates more information in the areas where there are more observations and, therefore, triangulation is denser in these regions and contributed to more accurate predictions. This technique is also less computationally demanding and considers the boundary effect by generating a mesh with a smooth transition from areas dominated by small triangles (which correspond to the domain of interest) to areas with larger triangles (areas out of the domain and used to avoid boundary effects). Since inference is deduced from the domain rather than the observations (which could change from year by year), the corresponding interpolation creates a better prediction surface than the traditional one using regular grid 17 . This study is also an example of these advantages. The INLA-SPDE approach was able to highlight new areas of interest, such as the Gulf of California, where the species are known to inhabit these areas. The Gulf of California is known to be an important ecological hotspot for this species 32 . Indirect exploitation of this species in the Gulf of California is mainly attributed to small-scale Mexican fisheries 40 , as there is scarce information of presence of mobulid rays due to little fishing effort of the large-scale tropical tuna purse-seine fisheries in this area 28,29 . Because this study has no access to small-scale fishery data, the correct prediction of the spatial distribution of Mobula mobular in the Gulf of California is even more important, as it could be considered a possible area for conservation purposes. Since most surveys and research are carried out in coastal waters (due to accessibility, funding, etc.), results from the model in this area should be taken into consideration in future analyses. The Equatorial area was also predicted to be an important area of presence for the species. In that sense, the INLA-SPDE model confirms the results obtained by Lezama-Ochoa, et al. 41 with the correct identification of the most important areas for the species.
In this work, the model fit was different depending of the parameters considered as well as the covariates selected. For example, the inclusion of the spatial effect in the model improved significantly the model fit (lower DIC values). Therefore, we suggest including the spatial effect in future works for accounting the spatial autocorrelation of the occurrence data; really necessary to obtain real model predictions that may be used to inform management decisions. In the case of the variables chosen to explain the distribution of M. mobular, we also found that specific variables significantly contributed to obtain a good model fit. This is the case of "month" or "oxygen (O 2 )". When these variables were included in the model, lower DIC, Specificity and accuracy values were obtained, representing a better model performance. These results lead to consider that the species could have a seasonal distribution and that oxygen is a limiting factor on their horizontal but also vertical distribution. However, all the covariables included in the different models were having non-linear effect on the presence of the species (since marine species do not usually respond linearly to the environment), but showing variability depending on areas or time of the day. Therefore, future work should explore a combination of linear and nonlinear effects when modelling presence/absence data with environmental variables.
The spatial effect map ( Fig. 1) created with the INLA-SPDE approach suggests that most of the variability in the occurrence dataset of M. mobular could not be explained by only the variables selected by the model. This could be true for oceanographic variables related to productivity features, such as upwelling systems, e.g., chlorophyll and sea surface height. The spatial effect represents the intrinsic spatial variability of the data after excluding the environmental variables. Therefore, when the pattern of the spatial map is similar to the map of the species prediction, it implies that there is an unconsidered effect that is driving the majority of the observed spatial distribution. In that sense, including the spatial effect as another component in the model improves model fit in addition to identifying the spatial effects that affect the distribution of the species of interest 42 .
The Bayesian approach uses probability distributions to model uncertainty in the value of parameters 43 . In that sense, not only is a point estimate of the probability of presence obtained, but it is also possible to assess the uncertainty surrounding an estimation 20 . Indeed, by using INLA-SPDE approach, it is possible to obtain the classical statistics, including standard deviation and the credible interval of the posterior probability of occurrence of the species, therefore providing an explicit quantification of the uncertainty associated to the prediction trough spatial maps. Explicitly quantifying uncertainty through spatial maps is essential to providing end-users with a reliable species distribution to determine management options.
INLA-SPDE is a relatively new approach, it is continuously being tested and improved. INLA models can also deal with traditional smoothing approaches (such as GAMs) but they also provide full inference by quantifying the uncertainty of each model parameter in a fast computational way compared to traditional MCMC simulations 17,42,44,45 . Moreover, INLA models also offer additional advantages, such as the capability to (i) simultaneously calculate inference and prediction, (ii) deal with missing data or (iii) consider data biases (e.g., survey effort can be incorporated into the models as a spatial-random effect) 10,17 .
Although the number of studies where INLA models have been compared to other approaches using fisheries data is limited 1,16,20,42 , the available studies have shown good results using Bayesian approaches. However, improvements are still needed. For instance, Lezama-Ochoa, et al. 41  www.nature.com/scientificreports/ database. When lineal components were considered, it took minutes to run models compared to approximately one hour with the non-linear relationships, however, the predictions were less precise when linear relationships were modeled (Supplementary Table S4). The Matérn covariate function was used to model spatial autocorrelation. The correlation of every cell with every other cell in the modelling approach has a high computational cost, known as the big n problem 13 . The SPDE approach is normally used to address this problem, i.e., dealing with a big dataset that requires some additional computational time 12 . As such, the regression model process is faster and easier. Specific distribution models should be developed, depending of the objective of the study and the data limitations. The present study revealed that when either multiple factors or complex relationships are included in the INLA-SPDE model, the running process finished but the estimation was difficult to interpret. For example, when the variable "Type of set" (Dolphin set = 1, Floating object sets = 2, School sets = 3) was considered to be a factor (in preliminary analysis of the model) in both the estimation and the prediction, estimation of the model was correct but the evaluation and prediction was wrong.
Thus, INLA-SPDE models still face some difficulties when it comes to dealing with factors when compared to frequentist GAM models that provide easy interpretation of the ecological relationships. When "type of set" was introduced as a dummy variable in the prediction, the results improved considerably (Supplementary Table S4). This does not necessarily mean that this is the best model, but it is a good option to obtain correct predictions with our data. Regarding the standard errors or set type dummy variables (1 and 0), they seemed very large. The model without type of set showed an increment in DIC of 699.62 (Option 8, Supplementary Table S4). SD gives some rapid information about the degree of "balance" in the data from groups coded 0 and 1. For example, hypothetically the mean for the set type "Dolphin" equal to 0.95 would mean that 95% of our sample is coded 1 and the rest 0. The same in the case of "School" set type. For "Floating object", the mean is be sensibly lower than those for "Dolphin" and "School" set type, indicating that the data are less balanced for the groups determined by the values of "Floating object". The dummy variables included in the model, in this case the type of set, had an effect on the response variable (i.e. the distribution of the species). The negative values estimated from the model in the case of the Fishing Aggregating devices show a weak preference of these species for areas where FAD fishery is operating. This is corroborated by the fact that mobulid rays seem to be found significantly more in Dolphin and School sets compared with floating object sets 41 . The reason are unknown, but probably is due to the distribution of FAD sets in open ocean far away from coastal areas; where the productivity is much lower and, hence, mobulid rays do not find high aggregations of food available as in coastal areas. Moreover, mobulid rays do not seem to show a strong aggregating behavior around FADs as other pelagic species, such as sharks. Their preference for shallow and productive waters makes them more likely to be found in areas of the other two types of sets. This fact could explain why the variability of mobulid presence in the case of the floating object sets was so high.
Moreover, for the INLA-SPDE approach, careful consideration should be given to the selection of prior distributions or the triangulation process given that the wrong choice could lead to biased results and, therefore, more options should be compared to improve performance of the Bayesian model.
Regarding evaluation of the model predictions, there are not many differences between frequentist and Bayesian approaches. For example, Lezama-Ochoa, et al. 41  In any case, as this conclusion is based on a comparison between similar models with the same environmental variables, more research is needed to compare different SDM algorithms and model parametrizations of different environmental variables. One of the objectives of this study was to explore the weaknesses and strengths of the INLA model when using observer bycatch information to model the habitat of a data poor species Mobula mobular. Ultimately, selection of the best model should be determined based on the objective of the study and the data. One limitation of this work arises from the lack of detailed fishing effort information. Therefore, it wasn't possible to account for the effect of the number of sets in a particular grid on the probability of presence of the species. Future studies should consider the inclusion of fishing effort as an offset or as another explanatory variable in the model especially when modelling abundance.
From a conservation point of view, M. mobular, along with the rest of mobulid rays, has recently included in Appendix II of the Convention on International Trade in Endangered Species (CITES) (Appendix II) and Appendices I and II of the Convention of Migratory Species (CMS) (Appendices I & II) 47,48 . Given that the species could be exploited both as target and bycatch species 27 , it is believed that some populations could be declining in some regions 27,49 .
In the EPO, the IATTC adopted a resolution (Res. 15-04) that aims to reduce the mortality of these rays in purse seine vessels 50 . This conservation measure prohibits retaining onboard, transshipping, landing, storing, selling, or selling any part or entire carcasses of mobulid rays taken by purse seiners. Given this decision adopted by IATTC, the conservation of this species may be expected to improve in the region, however, for that best practices for handling and safe-release should be developed and implemented to ensure the highest post-release survival possible. From this perspective, prediction of the spatial distribution and hotspots will contribute to incorporate spatial strategies in the future as management options to reduce their mortality, while keeping an economically viable fishery.
This work implements a Bayesian GAM to investigate habitat occurrence of the Spinetail Devil Ray using data from the IATTC tropical tuna purse-seine fishery observer bycatch database. Using a novel approach and methodology it provides good model habitat occurrence predictions, which are as good as the predictions obtained with other algorithms (e.g. Random Forest, Maxent, GLM, etc.). These predictions are considered www.nature.com/scientificreports/ enough accurate to be included in future management plans by the tuna RFMOs. For example, model predictions from this work could be included in a new Ecological Risk Assessment approach (EASI-Fish) 51 to study the impact of the fishery on data-poor bycatch species. This methodology could be extended to other mobulid rays or vulnerable bycatch species (i.e. sharks, turtles) and other Oceans to obtain accurate habitat occurrence predictions to inform management actions. The main achievement of this work was to provide novel and relevant information on the distribution of M. mobular that usually is only available from diver surveys or tagging studies limited to coastal areas.
To obtain realistic and accurate hotspots of the species, comparisons between different species distribution models (e.g., Random Forests, Maxent, Classification or Boosted Regression Trees) are needed. This would allow researchers to identify each model weaknesses and strengths to be taken into account when informing management decisions to protect the species. A community of researchers, in collaboration with the fishing industry, governments and the NGOs, that work together to implement science-based specific spatial management measures and plans depending on the areas of importance (i.e., nursery areas, reproductive, or feeding areas, etc.) or species characteristics (vulnerable, endemic, migratory, etc.) is essential for the conservation of mobulid rays.

Conclusion
This study used a Bayesian approach to model the occurrence of Mobula mobular using data from the IATTC tropical tuna purse-seine fishery observer bycatch database in the EPO. The spatially-explicit Bayesian INLA-SPDE model performed well as it was able to account for the spatial autocorrelation in the data and quantify the uncertainty of parameters. Additionally, contrary to other SDM models using the same bycatch data, INLA-SPDE model correctly predicted areas of importance, such as the Gulf of California, where the presence of the species is known to occur. Although INLA-SPDE methods offer improvements to traditional models, we consider that both frequentist and Bayesian model approaches should still be combined in a complementary approach to benefit from the advantages of each method and, thus, better interpret the species distribution patterns of this vulnerable bycatch species to inform management decisions.

Species data. Mobula mobular bycatch data were collected between 2005 and 2015 by the Agreement on the
International Dolphin Conservation Program (AIDCP) onboard observer program, which employs observers from both the National Observer Program and Inter-American Tropical Tuna Commission (IATTC). Data were collected in large purse seine vessels (> 363 t carrying capacity-Class 6) using three types of fishing modes or sets: tunas associated with dolphins ("Dolphin sets"), tunas associated with Floating objects [encountered ("Log sets") or deployed by the fishers ("Fish Aggregating Devices or FAD sets")] and unassociated schools ("School sets"). The difference between the fishing modes is the strategy used to find the school of tuna and how the set is performed: School sets are normally monospecific and schools of tuna are detected by sonar marks, jumpers or breezes in surface waters. Drifting Fish Aggregating Device sets (FADs) are done on floating objects and are used to attract tuna and other species around them. Finally, in the case of the eastern Pacific Ocean (EPO) tuna (mainly yellowfin tuna) they are frequently associated with groups of dolphins and, therefore, called Dolphin sets ( Supplementary Fig. S1) 29 .
Two topographic covariates were also included in the models: bathymetry and distance to the coast. Both variables were obtained in raster format (ASCII format) from the Global Marine Environmental Datasets (GMED) database (https ://gmed.auckl and.ac.nz/downl oad.html), and positions were matched with the positions of the fishing sets (Table 3).
To avoid correlation and collinearity between explicative variables, the Pearson's rank correlation index and the variance inflation factor (VIF) 52 were calculated before running the models. Specifically, correlation among variables was checked by performing a Pearson's correlation test with the corrplot package in R software 53 . Red ellipses represent negative correlation and blue ellipses positive correlation. High correlation between two variables was represented in both cases by ellipses with thin thickness. Collinearity was tested by computing the generalized variance-inflation factors (GVIF), which are the corrected VIF values, by the number of degrees of freedom of a predictor variable. GVIF was assessed using the corvif function in R software. Pairs of variables with high correlation values (Pearson correlation r > 0.6) or high variance inflation (VIF > 5) were identified and only one was included in the modelling process ( Supplementary Fig. S2) 38 .
Modeling mobulid presence. Generalized Additive models (GAMs) 8 are semi-parametric extensions of Generalized Linear Models (GLMs) that are able to model continuous and categorical variables, yet show nonlinear responses by fitting smooth functions to predictor variables 54 .
The general structure of a GAM is as follows 5 : www.nature.com/scientificreports/ where g is the link function (logit for binomial family), µ i is the expected response variable (probability of bycatch in a binomial structure), a is the intercept, f n are smooth functions (regression splines), and X n are the covariates 5 .
Overall, the IATTC observer bycatch database recorded 260,002 species absences and 1270 species presences during the study period, obtained from surveys (i.e., sets with no presence of M. mobular recorded).
The INLA framework 14 was implemented using the inla package in R software. A hierarchical Bayesian spatial GAM was implemented to model the M. mobular bycatch data 55 . INLA uses the Stochastic Partial Differential Equations (SPDE) approach 56 for the spatial effect, which approximates a continuously indexed Gaussian Field (GF), where z(s) is a zero-mean Gaussian Markov Random Field (GMRF) in which the correlation between locations s i and s j, is Matérn. The smoothness of the field under this condition is typically denoted by the Kappa statistics index 57 . The spatial effect is a numeric vector that links each observation to a spatial location, and thus it accounts for independent region-specific noise that cannot be explained by the available covariates 20 . As recommended by Lindgren and Rue 57 , multivariate Gaussian distributions with zero means and a spatially-structured covariance matrix were assumed for the spatial component.
The response variable was modelled using the common binomial family and logit link function. All explanatory variables, except the type of set, were modeled using a second order random walk (RW2) latent model that allowed for possible non-linear relationships 12 . The variable month was included in the model as a cyclical effect. The type of set was considered a factor (Dolphin, Floating object, School) in the inference and a dummy variable in the prediction (i.e., 1 and 0 for each level of the factor, see Supplementary Table S4 for details). Blangiardo and Cameletti 10 recommend that dummy variables be used to best deal with factors in INLA models. Thus, the model can be specified as: species presence or absence at fishing location i (i = 1,…,n, n = 261,272) is given as y i , where y i = 1 if species was present, and y i = 0 if species was not present. We assumed y i ~ Bernoulli(π i ) where π i is the probability of presence of Mobula mobular at location i. Then we define the model as logit(π i ) = α 0 + X i β + W i where α 0 is the intercept, β is the vector of regression parameters, X i is the matrix of the explanatory covariates at location i, and W i represents the spatially structured random effect at location i .
Because no prior information was available, a vague zero-mean Gaussian prior distribution with a variance of 100 was used for all the parameters involved. Posterior distributions were obtained for all the parameters that delimit the region of each posterior distribution by the 0.025 and 0.975 quantiles, where each unknown parameter is 95% likely to fall within this range of values 58 .
Model selection. Different options were tested to obtain the best model. First, variables were included in the model without a smoothing function (i.e., linear relationship). Second, the influence of the spatial effect was explored by removing it from the model. Third, the type of set was included in the model as a dummy variable (Supplementary Table S4). Selection of the final models also occurred after carrying out a forward stepwise procedure. These options were evaluated by considering the Deviance Information Criterion (DIC) 59 . The DIC values were selected as they are the most common ones used to evaluate the performance of the models. Moreover, the Condition Predictive Ordinate (CPO) was also calculated. CPO is computed via its logarithmic score (LCPO) according to Roos and Held 60 . The CPO was used as effective index to evaluate the predictions as it is able to make an internal cross-validation taking each time just one value. Specifically, DIC measures the compromise between fit and parsimony in the model, and LCPO is a "leave one out" cross-validation index to assess the predictive power of the model 17,18 . Lower DIC and LCPO values suggest better model performance. www.nature.com/scientificreports/ Model validation and evaluation. A cross-validation was applied with a k-fold partitioning method (with k = 5), to assess model performance 61,62 . The relationship between occurrence data and the environmental variables was modeled by using a training dataset (80% of data), and the quality of predictions was assessed using test data for validation (20% of data) 17,38,39 . Validation was repeated five times for the best model and results were averaged over the different random subsets 17 . Models were evaluated to formally assess their overall predictability by calculating the Area Under the receiver-operating Curve (AUC), Sensitivity, Specificity and Kappa 63,64 . The AUC measures the ability of the model to correctly predict presences and absences, Sensitivity measures the percentage of presences correctly predicted, and Specificity measures the percentage of absences correctly predicted 65 . Kappa is a statistic index that corrects the overall accuracy of model predictions by the accuracy expected to occur by chance. The index ranges from 1 to + 1, where + 1 indicates perfect agreement and values of zero or less indicate a performance no better than random 65 . Model validation was performed using the cmx function of the PresenceAbsence package 66 in R software.
Model prediction. Prediction maps of the posterior mean, standard deviation, first and third quartile of probability of presence of M. mobular were obtained from the INLA model. Predictions were made using the inla.mesh.project and raster functions of the inla and the raster packages 14 in R software. A Bayesian kriging was applied by treating the parameters as random variables in order to incorporate uncertainty into the prediction process 17 . Bayesian kriging is incorporated into the INLA approach through the SPDE module, which enables Delaunay triangulation around the presence/absence points in the sampling area ( Supplementary Fig. S3) 57 . INLA perform inference and prediction simultaneously, by considering prediction locations to be points where the response is missing 15,17,20,42 . Once the prediction is generated in the selected locations, additional functions interpolate linearly to generate results for the entire study area. Model outputs were scaled from 0 to 1.

Data availability
The datasets generated during and/or analyzed for the current study are not publicly available due to fishers' confidentiality but are available from the IATTC's Director under reasonable request. However, the dataset aggregated by 1 × 1º level are available at the public domain (https ://www.iattc .org/publi cdoma indat a/iattc -catch -by-speci es1.htm).