Introduction

The Northeastern coast of the United States is one of the most populated portions of the country and supports some of the highest intensity of shipping, fishing and marine development in the nation. Not only has ocean use increased dramatically during the past 40 years, but the underlying marine ecosystem has also experienced changes in ocean water temperatures1.

A number of cetacean species listed in the Endangered Species Act (ESA) and protected by the Marine Mammal Protection Act (MMPA) are subject to these environmental and anthropogenic pressures2. Cetaceans play important roles in the marine ecosystems as predators whose dynamics are associated with the mid-trophic levels through trophic linkage3. Consequently, these species not only affect entire food webs, but are also affected by the dynamics of the physical and biological environment4,5. Detailed current knowledge of the distributions of cetaceans and their suitable habitat is important for the effective management and conservation not only of cetacean species but also of entire marine ecosystems3. This is particularly important given the rapidly changing oceanic environment in the Northwest Atlantic Ocean off the U.S.6 and the increasing demands for energy production that promoted the development of renewable energy areas on the outer continental shelf7.

Results from habitat suitability models, their underlying spatial-temporal density distribution maps and the relationships between habitat features and density patterns are a cornerstone to support conservation and management. For example, they can be used to predict and monitor species’ response to changes in the climate and anthropogenic impacts8,9, and generate abundance estimates that support conservation and management10,11. In addition, these models have the potential to identify priority conservation areas, and diversity hot or cold-spots12. Several U.S. federal agencies require information about spatial-temporal density, habitat, abundance, population size and predictive models for marine protected species to support their environmental compliance documentation related to the National Environmental Policy Act (NEPA), MMPA and ESA. For example, the U.S. Department of Energy and the U.S. Bureau of Ocean Energy Management are working closely with several states, to establish offshore renewable energy developments within 50 miles of the eastern U.S. coastline on the outer continental shelf7. Other examples are regulations under the MMPA to govern the unintentional taking of marine mammal incidental to training and testing activities conducted by the U.S. Navy13. In both of these examples, identifying suitable habitat would help constituents to determine how to minimize human and cetacean interactions and an important component of their respective Environmental Impact Statements.

The objective of this study was to provide this background information for the above conservation and management needs; specifically, to use generalized additive models to establish a current habitat suitability description for cetacean species, to identify the main environmental covariates related to cetacean distribution, and to estimate abundance accounting for habitat relationships. Further, and example application of these models in conservation and management issues is discussed.

Results

Habitat models

A total of 103,395 km of track line was divided into 13,792 spatial-temporal cells (3,329 for spring, 5,978 for summer, 3,237 for fall and 1,248 for winter) that covered offshore and coastal habitats including the renewable energy areas (Fig. 1, Supplementary Table S1 previously reported in Palka et al.)14. A total of 3,158 sightings of cetacean species/groups were available for 1,413 of these spatial-temporal cells (273 for spring, 796 for summer, 257 for fall and 87 for winter). The species/groups sightings per season are summarized in Supplementary Table S2, (previously reported in Palka et al.)14 and the environmental covariates are summarized in Supplementary Table S3. Information related to effort and seasonal sightings were previously reported in Palka et al.14.

Figure 1
figure 1

Effort track lines in the AMAPPS study area during 2010–2013, most track lines were surveyed multiple times by shipboard and aerial surveys. Renewable energy areas include a 10 km buffer zone.

A total of 19 season/species predictive models were developed, which included 10 single species models using data from all seasons combined (spring, summer and fall), 6 species/group models using data from summer only, and 3 models for harbour porpoise which had sufficient sample size to develop separate seasonal models to explain fine scale seasonal migration for spring, summer and fall. Due to low effort and low animal density detected for most of the species, the winter data was not included in the model development, with the exception of the model for common bottlenose dolphin where sufficient numbers of animals were detected.

The most parsimonious model for each species included 4–6 different environmental covariates. The total deviance explained by the models ranged from 16.2% for the summer model of Atlantic spotted dolphin to 69.3% for the summer model of harbour porpoise. Overall, latitude and SST were included in 67% and 61% of the models respectively, followed by bottom temperature, primary productivity and sea floor slope in 39% of the models. The least frequent predictors included in the models were particulate organic carbon and sea surface height anomaly both with 5.6%. (Table 1).

Table 1 Percent deviance explained (DE) by term for each species model.

Habitat suitability maps generated from the model outputs identified clear differences in the core habitat for the species that have the tendency to converge in the same space and time, giving indications of habitat partition (Supplementary Figs S1 to S17).

Goodness-of-fit measures of the models were determined to be adequate as evaluated in two ways. First, overall the models fitted the input data well according to the Spearman’s rank correlation, mean absolute error, and mean absolute percent error statistics, using both the non-zero input data and cross-validation methods (Table 2). Second, a comparison of habitat suitability maps to field sighting locations revealed a good correspondence between the model predictions and observed data used to develop the models (Supplementary Figs S1 to S17).

Table 2 Results of diagnostic tests to evaluate fit of the habitat models.

Distribution-abundance connection with environmental covariates

Partial regression smooth plots generated by the habitat models provided a good metric of the physical and biological habitat of these species, and represent how animal abundance changes relative to its mean in response to changes in each model covariate term. For example, the relationship between dolphin abundance and sea surface temperature showed well-defined thermal habitats that provide evidence of habitat partitioning (Fig. 2). This is illustrated by white-sided and bottlenose dolphins who are found more frequently in cooler waters, peaking at about 10 °C and 12 °C, respectively. Common and striped dolphins are found in warmer waters peaking at about 16 °C and 20 °C, respectively. In contrast, Risso’s dolphins are found most often in the warmest waters that are over 20 °C. The partial regression plots of sea surface temperature also provided information on the range of temperatures commonly used by the species. For example, common dolphins are found in waters with a wider range of temperatures (5–24 °C) in contrast to striped dolphins (14–25 °C).

Figure 2
figure 2

Examples of the partial effect of SST (°C) on the changes in abundance relative to its mean for common dolphin (CODO), white-sided dolphin (WSDO), common bottlenose dolphin (BODO), Risso’s dolphin (GRAM), striped dolphin (STDO), humpback whale (HUWH), minke whale (MIWH), sei whale (SEWH).

The habitat covariates in the models were able to capture seasonal movement patterns and fine scale distribution patterns by including not only static covariates, like latitude, but also dynamic environmental covariates that change over the seasons and on a finer scale. For example, both humpback and minke whales have similar patterns in the partial regression relationships for sea surface temperature (Fig. 2) and latitude (Fig. 3). However, the humpback model also includes chlorophyll a and the minke whale model includes particulate organic carbon which aggregates phytoplankton, zooplankton, bacteria and detritus, suggesting a difference in trophic relationships and also resulting in very different habitat suitability patterns (Table 1, Supplementary Figs S8 and S10).

Figure 3
figure 3

Examples of the partial effect of latitude on the changes in abundance relative to its mean for common dolphin (CODO), Atlantic spotted dolphin (ASDO), humpback whale (HUWH), minke whale (MIWH), fin whale (SEWH), Sowerby’s beaked whale (SBWH) and Cuvier’s beaked whale (CBWH).

Abundance estimates

Average seasonal abundance estimates derived from the habitat suitability models for the entire study area for all the cetacean species and species guilds are found in Table 3, these estimates were previously reported in Palka et al.14. For species that were detected during spring, summer and fall, various distribution and abundance patterns were observed. For the pilot whale, beaked whale, and Kogia groups, the abundance estimates were developed for the guild and not for individual species. In addition sightings of Kogia spp., beaked whales and striped dolphins occurred in habitats further offshore in regions only the shipboard surveys were able to access during summer. Consequently, the estimate of abundance for these species/groups only represents the summer season.

Table 3 Average seasonal abundance estimates derived from the 2010–2013 habitat models with its associated coefficient of variation in parenthesis.

Robustness validation

The models were also shown to be robust as defined by comparing the predicted model values to the data that were collected at times different than the data used to develop the models. Specifically, a total of 386 sightings were not included in the development of the models for humpback whale, fin whale, sperm whale, short/long-finned pilot whale, Risso’s dolphin, common dolphin and common bottlenose dolphin collected during spring 2014, were located within the core habitat regions as predicted by the habitat models when applied to the values of the covariates for the spring of 2014 (Supplementary Figs S18 to S24). In further analysis only for common dolphin, the 2010-2013 model definition applied to the 2004 summer environmental covariates predicted an abundance estimate that was less than 1% greater (not statistically different) than the previously reported 2004 abundance estimate when corrected for availability bias (Table 4). The previously reported 2004 estimate15 was produced from only data collected by shipboard and aerial surveys during summer of 2004. Even though the species distribution patterns detected during the surveys between summer 2004 and spring 2014 were quite different, the predicted habitat suitability maps matched the common dolphin sightings distribution recorded for both seasons and years (Fig. 4). Further indications of model fit are found in Palka et al.14 where the results from the goodness-of-fit tests of each the modelling steps are provided. In addition visual comparisons of the predicted seasonal density maps and locations of historical sightings since 1970 from OBIS-SEAMAP16 are also provided.

Table 4 Robustness validation results of the common dolphin model.
Figure 4
figure 4

Robustness validation results of the common dolphin habitat model. Habitat suitability for (A) summer 2004; and (B) spring 2014 overlapped with the actual species sightings for the correspondent season and year. These sightings were not included in the habitat model development.

Discussion

Federal agencies like U.S. Department of Energy, National Marine Fisheries Service, U.S. Fish and Wildlife Service, BOEM, and U.S. NAVY and other ocean developers require information from a diverse suite of topics such as density/abundance, distribution, stock structure, life history, behaviour, habitat use, environmental drivers, impact assessment and spatial modelling to support their mandates. Habitat models and model outputs presented in this paper provide some of the background information need by those agencies for spatial planning and conservation purposes and are available in a user friendly interface as a part of the AMAPPS model viewer at www.nefsc.noaa.gov/AMAPPSviewer. First, we will discuss the development of the models, and then compare these models to other developed in the same general region, then finally potential uses of the model results.

Partial regression smooth plots of the animal density in relation to SST and latitude generated by the models are in agreement with the current knowledge of the species distributions17,18. Despite model uncertainty, the robustness validation supported the predictive inferences of the models tested and expanded the potential application to detect species shifts in response to habitat changes. Though the amount of deviance explained is comparable to similar studies, improved deviance explained may be possible if additional physical and biological environmental covariates were included, such as fronts due to temperature, salinity and primary productivity, and densities of forage fish or other potential prey species.

One important assumption about regression type models, like those presented in this document, is that the models assume the link between animal density and habitat factors have consistent statistical relationships within the spatial-temporal variables included in the model. Given this assumption, it is then possible to predict the average density in locations or time periods where surveys did not actually occur19. However, if those proxies are unable to detect changes in the underlying ecological processes through time and space, then those assumptions are no longer valid. This means that a causal or mechanistic relationship is not explicitly assumed. Consequentially, the type of model used in this document provides an average pattern of the habitat suitability and abundance.

Comparison with previous studies in the region

This effort is not the first of its kind for the western North Atlantic waters. The evolution of the theoretical and computational improvements related to modelling animal density is evident by the past and present efforts that have used line transect sightings data collected by the NEFSC and SEFSC. Hamazaki17 used multiple logistic regression to model the presence/absence of sightings from 1990–1996 with oceanographic and topographic variables to predict habitat maps. The U.S. Department of the Navy20,21 used sightings data from 1998–2005 in generalized additive density surface models to predict the density in a prediction grid, where g(0) was assumed to be one. The habitat suitability maps and abundance estimates presented in this document were generated with additional data that were not used in previous studies, and built upon past efforts. Most recently Roberts et al.11 used data from 1992–2014 to develop habitat-based climatological density maps and abundance estimates, where g(0) was not assumed to be one; though none of the data used in the current paper were used in the Roberts et al.11 models. In all of these efforts, the modelling approaches used the best available data and make logical assumptions and decisions.

Though making a direct comparison between these studies is a complex task, it is important to establish some of the fundamental differences between Roberts et al.11 and the present paper which include: (A) Differences in spatial and temporal coverage: most of the data used to develop the Roberts et al.11 models used data that were collected mostly from 1995 to 2009 (though there are some data from some species up to 2014) and did not include the data used in this paper, in addition the data were from surveys ranging from the US Pacific to Europe collected by multiple organizations and methods. In contrast, the models developed in this paper only use data from the area of interest. (B) Differences in analysis strategies and methods: to standardize all surveys used in Roberts et al.11 it was necessary to restrict data collected by only one sighting team per platform. Consequently, to correct for perception and availability bias, when local information was not available, it was necessary to apply correction factors from surveys conducted in the Pacific Ocean, Eastern Atlantic and Gulf of Mexico, which may result in unrepresentative corrections because of differences between the survey methods and animal’s behaviour. Finally, the current analyses also accounted for availability bias by estimating species-specific correction factors, some of which were estimated from recently tagged animals within the study area14. Another difference was the way the covariates were processed. For example, Robert’s developed climatological covariates which were a mean from 1995 to 2014 for a specific time period, say 8-days. In contrast, in the current paper the covariates were an average over a time period (8-days) from only the same year as the sighting observation. And (C) difference in presentation: average annual and monthly density-surface maps and abundances were presented in Roberts et al.11, in contrast to average seasonal estimates and the habitat suitability for the species presented in this study. In general, the average annual estimates from Roberts et al.11 are the most different from the seasonal estimates presented in this document for species that migrate out of US waters during some parts of the year, or for species that changed their seasonal spatial distribution patterns over the last two decades.

In summary, the two observer teams approach used in the present study that includes data from only the area of interest allowed the estimation of the perception bias correction from the same data that was used to calculate the density estimates thus resulting in regionally more representative and current abundance estimates and habitat suitability maps.

Example applications of the habitat models

Society has increasing demands for energy production triggering the development of renewable energy areas on the outer continental shelf, currently reaching a total of 16,149 km2 from Massachusetts to Florida. These areas have a significant overlap with MMPA strategic dolphins and ESA whale species’ habitats and the level of interaction was documented and quantified by the models (Supplementary Table S4 and Figs S1 to S17). For example, even though most of the waters in the potential renewable energy areas are shallow and close to the shore, the models for pilot whales, Risso’s dolphins, white sided dolphins, common dolphins and Atlantic spotted dolphins identified the deeper offshore regions of the potential renewable energy areas as part of their preferred habitat. Interestingly, the model for sperm whales, which are generally considered deep water animals, predicted very low abundance at the farthest offshore regions of the potential renewable energy areas that were either close to the shelf break or extended into deeper waters like those in Massachusetts/Rhode Island and North Carolina and this has been confirmed by recorded sightings14.

There was a spatial difference in diversity patterns associated with latitude, in which the northern areas were more diverse in comparison to the southern areas. For example, the Massachusetts/Rhode Island area is located in a region with the highest species richness and showed the highest estimated abundance of ESA whale species (humpback, fin, sei and sperm whales) for all seasons. In addition the diversity index changed seasonally driven by animal migration (Supplementary Table S4). In the rest of the renewable areas the habitats become suitable for these species only during spring when the whales migrate through. In the case of MMPA strategic dolphins (harbour porpoises, white-sided dolphins, common dolphins, long/short finned pilot whales and common bottlenose dolphin), Massachusetts/Rhode Island, New Jersey and North Carolina/South Carolina showed the highest abundance estimates with similar spatial diversity patterns.

Changes in key physical and biological oceanographic features can alter marine ecosystems and atmospheric patterns. For example, in the Gulf of Maine spatial shifts of species assemblages associated with shallower, warmer waters tended to shift towards waters with cooler temperatures, while species assemblages associated with relatively cooler and deeper waters shifted deeper, but with little latitudinal change. Species assemblages associated with warmer and shallower water on the broad, shallow continental shelf from the Mid-Atlantic Bight to Georges Bank shifted strongly northeast along latitudinal gradients with little change in depth22. Habitat-based cetacean models such as those developed here will be able to be used to explore the potential changes in the distribution and abundance of cetaceans relative to the changes to the physical and biological changes.

It is clear that the effects on the movement and extent of species assemblages will hold important implications for management, mitigation and adaptation on these waters. The models and maps presented in this document provide a recent habitat characterization of the species discussed, and based on the assumptions and the predictive nature, have the potential to support management decisions and conservation measures in a marine spatial planning context.

Methods

Study area

The study area ranged from Halifax, Nova Scotia, Canada to the southern tip of Florida; from the coastline to slightly beyond the US exclusive economic zone and covers approximately 1,193,320 km2 (Fig. 1). It was subdivided into 10 × 10 km cells and sampled during 16 Atlantic Marine Assessment Program for Protected Species (AMAPPS) surveys, using NOAA Twin Otter aircrafts in coastal regions and NOAA ships Henry B. Bigelow by the Northeast Fisheries Science Center (NEFSC), and Gordon Gunter by the Southeast Fisheries Science Center (SEFSC) in offshore regions. These surveys covered approximately 103,995 km of line-transect survey effort during July 2010 to August 2013 (Supplementary Table S1). Habitat suitability models were built for 14 species and 3 species guilds (Table 1).

Habitat predictors

Habitat predictors included a suite of static physiographic data and dynamic environmental covariates and were obtained from ETOPO1 1-min global relief data23, AVISO+24, the Hybrid Coordinate Ocean Model (HYCOM)25, and from NOAA’s Environmental Research Division Data Access Program (ERDDAP)26 website (Supplementary Table 3). The environmental data were downloaded from the source using a bounding box whose extent covered the study area, and subsequently processed using custom code developed in R (v. 3.1.1)27 with the R packages “raster” (v 2.5-2)28, “ncdf” (v 1.6.8)29, “rgdal” (v 1.1-6)30, “RNetCDF” (v 1.8-2)31, “lubridate” (v 1.5.3)32, “RODBC” (v 1.3–10)33 and “geosphere” (v 1.5–1)34. The process included a re-sampling of the data to the geographical midpoint of each 10 × 10 km stratum using oblique Mercator grid with bilinear interpolation. When possible, the data were obtained for dynamic covariates on an 8-day basis. Alternatively, daily images were downloaded and spatially synced to the cells and averaged into 8-day periods. In case of cells with missing values, a simple interpolation process was applied using the mean from the nearest-neighbour cells, and if needed the mean from the 8-day period before and after.

Distance Analysis

Samples for modelling animal density were created by dividing the AMAPPS continuous survey effort into the 10 × 10 km cells. Species-specific information related to the number of sightings and group size was assigned to each cell. In addition, average sea state and glare within each cell was included as a continuous predictor variable to account for sighting conditions encountered on the surveyed track lines. Line-transect sightings parameter estimates derived from the surveys were based on effort in Beaufort Sea states from 0 through 435, because the probability of detection decreases as the sea states increases36,37.

The density estimates were based on the independent observer approach assuming point independence38, calculated using the mark-recapture distance sampling (MRDS) with the computer program Distance (version 6.2)39, for each sampled 10 × 10 km cell using a Horvitz-Thompson-like estimator40. With this approach there was no need to independently model group size and the error due to extrapolation was minimized. To capture sightability differences between observation platforms and regions, data collected by each aircraft and ship from SEFSC and NEFSC surveys were analysed independently due to the differences in observers, data collection methods and habitats surveyed. A traditional MRDS distance analysis was used for the data collected by the shipboard surveys35. Data collected by the aerial surveys were analysed using a two-step process as described by Palka et al.14.

Significant covariates were chosen following the method suggested by Marques & Buckland41 and Laake & Borchers38. For all of the analyses, the detection probabilities were estimated using right truncated perpendicular distances as suggested in Buckland et al.42 and model selection was based on the goodness-of-fit using the AIC score (Akaike Information Criterion)43, Chi- squared test, Kolmogorov-Smirnov goodness-of-fit test, Cramer-von Mises goodness-of-fit test and a visual inspection of the fit, the results of these test are available in Palka et al.14. The estimated sighting probability included an estimation of g(0) which is the probability of detecting an animal on the survey track line.

To ensure sufficient sample sizes to accurately estimate model parameters, several similar species were pooled when needed. The criteria used to define species guilds included shape of the species’ detection functions, general animal behaviour, perceived sightability of the species, and sample size. The estimated global parameters were applied to the values of the covariates associated with each species in the species group to account for species-specific detection functions. An overall species-specific abundance estimate was then calculated for each cells/time period and corrected for species-specific availability bias by platform, as described in Palka et al.14. The availability bias correction was based on the probability of an animal being detectable at the surface during a survey, and took into consideration the species diving and aggregation behaviours, in addition to the amount of time the observer had to analyse any spot of water from each of the survey platforms. This correction tended to be larger for aerial surveys than for shipboard surveys, and larger for long diving species than for short diving species.

Modelling

Generalized Additive Models (GAM)44 were developed in R (v. 3.1.1)27 using the package “mgcv” (v.1.8–6)45. The density estimates for each species/group in sampled cells by the shipboard and aerial surveys were defined as the response variable. The parameter estimates were optimized using restricted maximum likelihood criterion and the data were assumed to follow an overdispersed Tweedie distribution46 with null space penalization and thin plate splines with shrinkage47. Further, to avoid overfitting that could render parameter estimates difficult to interpret biologically, the maximum number of degrees of freedom was limited to 4. Correlations among environmental covariates ranged from 0.01–0.80 in absolute values. Although “mgcv” is considered to be robust to such correlations45, variables in a highly correlated pair above r = 0.60, were not used together in the same model.

Variable selection was performed with automatic term selection48 and a suite of diagnostic tests as proposed by Kinlan et al.49 and Barlow et al.50. Models with the lowest overall prediction errors and the highest percentage of deviance explained were selected for further diagnostic testing which included k-fold cross-validation with 25 random data subsets. K-fold cross-validation methods, in contrast to the method where data are partitioned into separate training and test sets, have the advantage of deriving a more accurate model, especially in cases with limited sample sizes51, such as in this study.

The relative importance of each term of the final model was estimated by calculating the terms’ approximate deviance explained following the process described by Whitlock et al.52. Briefly, this process involves fitting a sequence of models to obtain the deviance of the full model, null model and reduced models in which one smooth term was removed at a time, while retaining the other parameter estimates from the full model constant. Deviance explained (DE) for each term i was then calculated with the Eq. (1):

$$D{E}_{i}=(\frac{{D}_{ireducedmodel}-{D}_{Fullmodel}}{{D}_{nullmodel}})$$
(1)

where D is the deviance for a model, and Di reduced model is a model where variable i is omitted.

Following model selection and validation for each of the species, the 2010–2013 average modelled seasonal (spring, summer and fall) abundance estimates for all cells in the study area were used to generate habitat suitability maps using QGIS (v. 2.10)53.

The habitat suitability (HS) was assumed to be directly correlated with the species’ abundance and distribution. That is, in times and regions with the greatest estimated abundance it was assumed that the habitat was the most suitable for the species, then using the Eq. (2):

$$HS=\sum _{i}^{n}\,\widehat{{N}_{i}}$$
(2)

where \(\widehat{{N}_{i}}\) was the seasonal estimated abundance for each cell from the species-specific model. The seasonal abundance estimates were calculated by summing the mean predicted abundance of each cell, and the uncertainty estimates reflect only the uncertainty in the GAM parameter estimates.

Abundance estimates for smaller scale regions within study area that are being considered for development of offshore renewable energy were also summarized. In some cases it was needed to merge several wind energy lease areas/wind planning areas together when the areas were relatively small and close together. In addition a buffer zone was added around all areas in an attempt to designate a generic area in which an animal may be exposed to due to construction/operation activities within the renewable energy area. The size of an appropriate buffer is dependent on a variety of factors including species-specific factors, such as natural short-term foraging and movement patterns which could then influence the animal’s response and sensitivity to the activity. In addition, the types of activities being undertaken in the offshore renewable energy area, and the physical topography and oceanographic features have a direct impact on the sound level and propagation. However, for simplicity in this study, offshore wind energy areas in addition to 10 km buffer zone is referred to as renewable energy areas and is reflected in the abundance estimates.

Robustness validation of the habitat models was investigated in two ways. First the 2010–2013 model parameters were applied to the spring 2014 environmental data, the resulting predicted habitat suitability was compared with the actual spring 2014 sightings locations from the AMAPPS surveys. From the species included in this document, only common dolphin, Risso’s dolphin, common bottlenose dolphin, short/long-finned pilot whale, fin whale, humpback whale and sperm whale were detected during the surveys, thus only the habitat models of these species were included in the comparison. The second way was by hindcasting the 2010–2013 models by using the summer 2004 environmental data. But given the quality of the environmental data needed for the models were not readily available for the entire study area for 2004, the test was restricted to the common dolphin model. Thus, the modelled output was compared to not only the summer 2004 sightings locations from a NEFSC abundance survey but also the abundance estimate reported in the 2005 Stock Assessment Report15 that was derived from the NEFSC 2004 summer abundance survey data. The work presented in this document conforms to accepted international ethical standards.