Background & Summary

Very high-resolution (VHR) satellite imagery allows us to survey regularly remote and large areas of the ocean, difficult to access by boats or planes. The interest in using VHR satellite imagery for the study of great whales (including sperm whales and baleen whales) has grown in the past years1,2,3,4,5 since Abileah6 and Fretwell et al.7 showed its potential. This growing interest may be linked to the improvement in the spatial resolution of satellite imagery, which increased in 2014 from 46 cm to 31 cm. This upgrade enhanced the confidence in the detection of whales in satellite imagery, as more details could be seen, such as whale-defining features (e.g. flukes).

Detecting whales in the imagery is either conducted manually1,4,5,7, or automatically2,3. A downside of the manual approach is that it is time-demanding, with manual counter often having to view hundred and sometimes thousands of square kilometres of open ocean. The development of automated approaches to detect whales by satellite would not only speed up this application, but also reduce the possibility of missing whales due to observer fatigue and standardize the procedure. Various automated approaches exist from pixel-based to artificial intelligence. Machine learning, an application of artificial intelligence, seems to be the most appropriate automated method to detect whales efficiently in satellite imagery2,3,8,9.

In machine learning an algorithm learns how to identify features by repeatedly testing different search parameters against a training dataset10,11. Concerning whales, the algorithm needs to be trained to detect the wide variety of shapes and colour characterising whales. Shapes and colour will be influenced by the type of species, the environment (e.g. various degree of turbidity), the light conditions, and the behaviours (e.g. foraging, travelling, breaching), as different behaviours will result in different postures. The larger a training dataset is, the more accurate and transferable to other satellite images the algorithm will be. At the time of writing, such a dataset does not exist or is not publicly available.

Creating a large enough dataset necessary to train algorithms to detect whales in VHR satellite imagery will require the various research groups analysing VHR satellite imagery to openly share examples of whales and non-whale objects in VHR satellite imagery, which could be facilitated by uploading such data on a central open source repository, similar to the GenBank12 for DNA code or OBIS-Seamap13 for marine wildlife observations. Ideally clipped out image chips of the whale objects would be shared as tiff files, which retains most of the characteristics of the original image. However, all VHR satellites are commercially owned, except for the Cartosat-3 owned by the government of India14, which means it is not possible to publicly share image chips as tiff file. Instead, image chips could be shared in a png or jepg format, which involve loosing some spectral information. If tiff files are required, georeferenced and labelled boxes encompassing the whale objects could also be shared, including information on the satellite imagery to allow anyone to ask the commercial providers for the exact imagery.

Here we present a database of whale objects found in VHR satellite imagery. It represents four different species of whales (i.e. southern right whale, Eubalaena australis; grey whale, Eschrichtius robustus; humpback whale, Megaptera novaeangliae; fin whale, Balaenoptera physalus; Fig. 1), which were manually detected in images captured by different satellites (i.e., GeoEye-1, Quickbird-2, WorldView-2, WorldView-3). We created the database by (i) first detecting whale objects manually in satellite imagery, (ii) then we classified whale objects as either “definite”, “probable” or “possible” as in Cubaynes et al.1; and (iii) finally we created georeferenced and labelled points and boxes centered around each whale object, as well as providing image chips in a png format. With this database made publicly available, we aim to initiate the creation of a central database that can be built upon.

Fig. 1
figure 1

Database of annotated whales detected in satellite imagery covering different species and areas. Humpback whales were detected in Maui Nui, US (a); grey whales in Laguna San Ignacio, Mexico (b); fin whales in the Pelagos Sanctuary, France, Monaco and Italy (c); southern right whales were observed in three areas, off the Peninsula Valdes, Argentina (d); off Witsand, South Africa (e); and off the Auckland Islands, New Zealand (f). The dot size represents the number of annotated whales per location. Whale silhouettes were sourced from philopic.com (the grey and humpback whales silhouettes are from Chris Luh).

Methods

Image acquisition

Twelve satellite images were used to build the database. They were acquired by different very high-resolution satellites owned by Maxar Technologies, formerly known as DigitalGlobe (Table 1). The choice of imagery was linked to other projects1,3,7,8 or specifically acquired to enlarge the database. Some images were selected from Maxar Technologies’ archives15 and others were requested to be captured during a specific time window (see “Usage note” section for advice about access to satellite images).

Table 1 Characteristics of the satellite imagery analysed for the presence of whales.

Criteria to select the imagery were: 1) less 20% cloud cover, 2) calm sea state (i.e. no white caps and low swell), and 3) where it was known that only one species would be present at the time of image acquisition. The percentage of cloud coverage was assessed by the satellite imagery provider. We visually assessed the sea state for the presence of white caps and the level of swell. As it is currently unknown whether species could be differentiated in VHR satellite images, we selected well studied locations to ensure the presence in the imagery of only one great whale species.

Detecting whales

The satellite images were manually scanned for the presence of whales using ArcGIS 10.4 ESRI 2017, following Cubaynes et al.1 systematic method, which involved overlaying a grid on top of the imagery and scanning one cell after the other at a scale of 1:1,500 m. Prior to scanning, the imagery was pansharpened, a process of joining the high spatial resolution of the panchromatic image (grey scale image) to the high spectral resolution of the multispectral image (colour image) to get one image of high spatial and spectral resolutions. We used the ESRI pansharpening algorithm.

Whale objects were marked with a point and were subsequently assigned a level of confidence as explained below in the “Technical Validation” section.

Creating labelled and georeferenced points and boxes

For each detected whale, a point was placed on it with associated metadata (see Data description). Boxes were created around each point indicating a whale object using ArcGIS 10.4 ESRI 2017, and following the workflow illustrated in Fig. 2. We created square boxes with a power of 2 (i.e. 128 × 128 pixels) to facilitate its use for machine learning approaches, particularly deep learning algorithms. Each whale object was represented by a point and a box (delimiting the pixels in the pansharpened image). The boxes created around the whale object were saved as one shapefile (a georeferenced file) per satellite image, as the coordinate system varied from one image to the next (Table 2), similarly for points. With the exception of the four satellite images of the Pelagos Sanctuary, for which one box and one point shapefile were created for the four images.

Fig. 2
figure 2

Workflow presenting the various steps to create the Whales from Space database, using ArcGIS 10.4 ESRI 2017. The multispectral image is outlined by large black dashes, the panchromatic by small black dashes and the pansharpened by a full black line. Satellite images © 2022 Maxar Technologies.

Table 2 List of shapefiles included in the dataset that represents whale objects examples in VHR satellite imagery.

Creating image chips

Image chips were created using the box created in the above section, following the workflow presented in Fig. 3. Prior to creating the image chips for Valdes 2012 and 2016, the corresponding box shapefiles and satellite images had to be re-projected to WGS 1984 UTM Zone 20 S. We did the same for Auckland 2006 and 2011, using WGS 1894 UTM Zone 58 S. The file name of the image chips corresponds to the box ID of the respective boxes, allowing to find the associated data within the corresponding box shapefiles that defines the specific raw satellite image that the image chip came from.

Fig. 3
figure 3

Workflow presenting the steps to create the image chips using ArcGIS 10.4 ESRI 2017 and the pansharpened image and boxes created in Fig. 2. Satellite images © 2022 Maxar Technologies.

Future updates of the datasets

As we acquire and analyse more satellite imagery, we aim to annually update the Whales from Space dataset. The updates will be available under the Whales from Space dataset deposited on the NERC Polar Data Centre repository16,17 to ensure consistency and long-term public availability of the data.

Data Records

The “Whales from space dataset” is available on the NERC UK Polar Data Centre repository and separated in two sub-datasets: a dataset that contains the whale annotations (box and point shapefiles with associated csv files) named “Whales from space dataset: Box and point shapefiles”16; and a dataset with the image chips named “Whales from space dataset: Image chips”17. The “Whales from space dataset: Box and point shapefiles” dataset can be accessed on the NERC UK Polar Data Centre directly using the DOI link (https://doi.org/10.5285/C1AFE32C-493C-4DC7-AF9F-649593B97B2C). This dataset contains nine shapefiles with boxes centered on each whale and nine point shapefiles marking each individual detected whales (Table 2) totalling 633 annotated whale objects (Table 3 and Fig. 4). This dataset also includes four csv files: 1) a csv file joining all the attribute tables linked to every box and point shapefiles for whale objects (WhaleFromSpaceDB_Whales.csv); 2) a second csv file explaining each column of the first csv file (WhaleFromSpace_Guidance.csv); and 3) two other csv files describe the naming of each box (WhaleFromSpaceDB_BoxNaming.csv) and each point (WhaleFromSpaceDB_PointNaming.csv).

Table 3 Summary of the number of whale objects counted in the imagery.
Fig. 4
figure 4

Proportion of whale objects included in the database per species (top to bottom: southern right whale, humpback whale, fin whale and grey whale) and per certainty categories (“definite”, “probable”, and “possible”). The proportion is given separately for each satellite image analysed in this study (Table 1).

The “Whales from space dataset: Image chip” comprises of the 633 annotated whale objects as image chips. To fulfil the End User Licence Agreement with Maxar Technologies18, these image chips are shared in a png format, and access to the dataset is available upon request from the NERC UK Polar Data Centre that can be contacted at PDCServiceDesk@bas.ac.uk. Data access requires user name and email address, which will be shared with Maxar Technologies. Anyone using any of the image chips is also required to attribute the images properly (See Usage Notes).

Each box and point has metadata associated to it, which is included in the attribute table associated to the specific shapefile. It contains information about the detected whale: certainty level (i.e. “definite”, “probable”, “possible”) derived from the classification score assessed based on various criteria (i.e. body length, body width, body shape, body colour, flukeprint, blow, contour, wake, after-breach, defecation, other disturbance, fluke, flipper, head callosities and mudtrail) following Cubaynes et al.1 method, most likely species, and potential other species. For each annotated whale, we also provide information about the imagery analysed: the location, latitude and longitude (in decimal degrees and recorded using the same geographic coordinate system and projection as the satellite imagery), imagery ID, imagery date, type of satellite, spatial resolution, number of multispectral bands, product level and type (e.g. Standard2A). The size of each boxes was also specified in terms of pixels.

Technical Validation

Certainty of whale identification

Ground truthing, the process of verifying on the ground what is observed in a satellite image19, is not possible when attempting to detect a mobile object, such as whales, because whales visible in the imagery will have moved by the time the imagery is received by the customer for analysis, which can take between six hours up to a couple days. Alternatives have been tried, such as timing the collection of satellite image with a boat or aerial survey20,21. However, it is difficult to synchronise the acquisition of a satellite image with such surveys, due to several factors; including competing tasking where satellite image orders for defence and disaster relief take priority over other orders. This is currently relevant as only one very high-resolution satellite can acquire 30 cm resolution imagery. The presence of clouds is also a limiting factor, as it will prevent the detection of whales in satellite images but not impact the detection capabilities from a boat survey20,21. There has also been an attempt to match whales tagged with tracking devices to those observed in an imagery, but the low accuracy of the coordinates provided by the tracking devices fixed on whales did not permit this matching8. With this dataset, to assess our confidence whether the object observed was a whale, 1) we analysed images of well surveyed areas, where only one species was recorded at a specific time1; and 2) we have established a certainty level reflecting our confidence in the detection. As whales will not always be at the sea surface and as light gets attenuated with increasing depth, whales below the surface will not be as visible as those near the surface, for which characteristic-whale features can be observed (e.g. fluke, flipper). Recognising that some whale objects will be easier to detect than others, we created three levels of confidence (i.e. definite, probable, and possible). The certainty level was assigned based on a combination of criteria1. We recommend that only the whales with a “definite” certainty level be used to train automated detection systems.

Species differentiation

As species differentiation has not been tested when analysing satellite images, we reference the most likely species in this database. The most likely species was assigned based on the scientific literature, hence our decision to acquire images of specific areas when only one large whale species was expected to be present1.

Usage Notes

Correct attribution for satellite images

Anyone using any of the image chips is required to attribute the image chips as follow: “Satellite image © 2022 Maxar Technologies”.

Advice on getting access to satellite images

All the satellite images that we have used to build the dataset were provided by Maxar Technologies (formerly DigitalGlobe). We recommend contacting Maxar Technologies national office to enquire about acquisition and cost, as pricing is conducted on a user case scenario. To ensure you acquire the same satellite images we have created the boxes for, we have provided the Catalogue ID in Table 1. All the images we have used are now considered archival and accessible at a lower cost. There are different types and levels for a same satellite image and we recommend acquiring the satellite images with the same product level and type, as specified in Table 1. Acquiring a different product level or type may shift the image, meaning the whale-object boxes will not be centred on the whales they were created for.