Background & Summary

This study describes a unique point-based data set for coral reef environments, collected using a photoquadrat survey method published for seagrass environments1. The data set describes the spatial and temporal distribution of benthic community abundance and composition for Heron Reef, a 28 km² shallow platform reef located in the Capricorn Bunker Group, Southern Great Barrier Reef (GBR), Australia. On average, 3,600 coral reef data points were collected annually over the period 2002 to 2018. Annual data sets were acquired for independent research projects, but the collection methods were consistent. The initial field data collection was designed to acquire detailed field data describing the spatial distribution and variability of benthic composition across the study site, to assist with calibration and validation of earth observation-based mapping products.

To create a map based on earth observation imagery, it is common to use training or calibration data to transform the imagery into a map of surface properties using a supervised algorithm (e.g. multivariate statistical clustering, random forest)2. To report on the accuracy measures of the maps, reference or validation data are contrasted with the output maps3. Hence for calibration and validation purposes, georeferenced field data must be representative of all the features to be mapped and collection should ideally coincide with satellite image acquisition. Many earth observation approaches have been implemented for mapping the benthic communities of Heron Reef4,5,6,7,8,9,10,11,12 and several of these maps are now accessible online6,13,14.

Several studies have utilised time series benthic data to analyse changes in benthic community and coral type trends, supporting broad ecological knowledge of coral reef ecosystems such as the Caribbean reef degradation15 and coral cover decline on the GBR16. Similarly, benthic community and coral cover data sets have been identified as important indicators of coral reef health providing the backbone for monitoring and management initiatives around the world17,18.

Articles and data sets have been published that describe the benthic community properties of Heron Reef; however, their spatial coverage, number of georeferenced data points, and revisit times are limited19. The time series photoquadrat data sets presented in this paper could be used to further understand benthic community distribution, including statistical analysis of trends in coral cover and analysis of changes in benthic community and coral type, or to test other earth observation-based mapping and modelling approaches. Additionally, as our methodology describes machine annotation of the field photoquadrats, it would be possible to reanalyse the photoquadrats with new categories not previously considered important from a biological perspective (e.g. unknown disease or impact, or a specific benthic community type), or for other features (e.g. the counting of sea cucumbers (Holothuroidea sp.)).

Detailed analyses of our complete data set may permit a greater understanding of the persistence and/or dynamics of the benthic community at Heron Reef. Our ongoing analyses therefore include evaluation of changes in community composition following major impacts such as cyclones, coral bleaching and crown-of-thorns starfish predation, as well as statistical analyses of coral recovery after such impacts. In this respect, these benthic community data sets are invaluable.

Methods

The photoquadrat-based data in this study were collected for Heron Reef, Southern Great Barrier Reef, Australia (Fig. 1). Here we provide a short overview of the collection methods; a detailed description can be found in ref. 11. These methods are applicable to any habitat. Photoquadrats were analysed for substrate and/or benthic community types known to be present on the reef (Fig. 1). The benthic community classes included in the analysis are shown in Table 1.

Fig. 1 Heron Reef, southern Great Barrier Reef, Australia. (a) Location of photoquadrat transect surveys on Heron Reef collected over a period of 17 years, (b) example of the individual photoquadrat locations along the transect survey where each individual point represents a photoquadrat, and (c) conceptualisation of snorkeler-based georeferenced photoquadrat transect surveys.

Table 1 Benthic community and coral type descriptions and their class codes used for photoquadrat annotation.

Georeferenced photoquadrat data collection

Detailed information on benthic community composition was gathered at Heron Reef on the reef flat (0–2 m depth) and at the 5 m contour on the reef slope using a repeatable and fine spatial scale (sampling every 2–4 m) technique for surveying benthic cover11. The technique required a snorkeler or diver to manually capture georeferenced photoquadrats along defined transect surveys using a standard digital camera in a waterproof housing (e.g. Sony Cyber shot, Canon AA540, Lumix, or Olympus T4). A plumb-line attached to the camera ensured that the footprint of each photoquadrat approximated 1 m² of the benthos.

From 2002–2004, a 100 m transect tape was deployed at each defined survey start site at a maximum depth of 3 m, or on scuba at 5 m depth. From 2005 onwards, instead of deploying a tape, the surveyor towed a standard handheld GPS (e.g. Garmin eTrex, Garmin 72) at the surface in a waterproof bag for all surveys. This enabled accurate registration of the location of each photoquadrat, which was assigned afterwards by synchronising the photoquadrat timestamp with the track log from the towed GPS. Once this method was established, transect survey lengths were extended to distances of 500 m–1500 m. The start and end points of each transect were defined by GPS waypoints, permitting accurate revisits in subsequent years. The distance between successive photoquadrats was estimated by the surveyor's kick cycle. However, this was not considered a problem, as the location of each photograph was known through the GPS synchronisation.
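The time synchronisation step can be illustrated with a minimal Python sketch. It is not the processing code used for this data set: the function name `locate_photo`, the track representation and the `clock_offset` parameter are assumptions made for illustration, and positions are linearly interpolated between the two GPS fixes that bracket each photoquadrat timestamp.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def locate_photo(photo_time, track, clock_offset=timedelta(0)):
    """Interpolate (lat, lon) for a photo from a time-sorted GPS track log.

    track: list of (datetime, lat, lon) tuples sorted by time (hypothetical format).
    clock_offset: camera clock minus GPS clock, assumed to be known.
    Returns None when the photo falls outside the logged track.
    """
    t = photo_time - clock_offset  # express the photo time on the GPS clock
    times = [fix[0] for fix in track]
    if not times or t < times[0] or t > times[-1]:
        return None
    i = bisect_left(times, t)  # index of the first fix at or after t
    if times[i] == t:
        return track[i][1], track[i][2]
    (t0, lat0, lon0), (t1, lat1, lon1) = track[i - 1], track[i]
    w = (t - t0) / (t1 - t0)  # fraction of the way between the two fixes
    return lat0 + w * (lat1 - lat0), lon0 + w * (lon1 - lon0)
```

Linear interpolation is a reasonable simplification here because successive fixes from a towed GPS are only a few metres apart; a photoquadrat taken before the first or after the last fix is left unlocated rather than extrapolated.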

All surveys were performed during the day, and sunlight conditions and sun angle can be derived from the timestamp of each photoquadrat and its corresponding GPS location. Reef flat surveys were conducted at high tide to provide sufficient water depth for the snorkeler to safely traverse the reef. Reef slope surveys were conducted at low tide. No water quality information was recorded.
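As an illustration of how a sun angle could be derived from a photoquadrat timestamp and position, the following Python sketch uses a common textbook approximation for solar declination and hour angle. It is an assumption-laden example rather than part of the published workflow: the function name `solar_elevation_deg` is hypothetical, the timestamp is assumed to be in UTC, and the equation of time and atmospheric refraction are ignored, so results are only accurate to within a few degrees.

```python
import math
from datetime import datetime


def solar_elevation_deg(when_utc, lat_deg, lon_deg):
    """Approximate solar elevation in degrees for a UTC datetime and position."""
    day = when_utc.timetuple().tm_yday
    # Approximate solar declination from the day of the year.
    decl = math.radians(-23.44 * math.cos(math.radians(360.0 / 365.0 * (day + 10))))
    # Local solar time in hours (equation of time ignored).
    hours_utc = when_utc.hour + when_utc.minute / 60.0 + when_utc.second / 3600.0
    solar_time = (hours_utc + lon_deg / 15.0) % 24.0
    hour_angle = math.radians(15.0 * (solar_time - 12.0))
    lat = math.radians(lat_deg)
    sin_elev = (math.sin(lat) * math.sin(decl)
                + math.cos(lat) * math.cos(decl) * math.cos(hour_angle))
    # Clamp against rounding error before taking the arcsine.
    return math.degrees(math.asin(max(-1.0, min(1.0, sin_elev))))


# Example call with approximate coordinates for Heron Reef (assumed values).
elevation = solar_elevation_deg(datetime(2010, 12, 21, 2, 0, 0), -23.44, 151.91)
```

For work that needs higher accuracy, a dedicated solar position library would be a better choice than this approximation.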

The locations of the transect surveys were chosen to ensure they traversed gradients or edge features, so that any change in benthic cover over these features could be detected. This was done initially through visual assessment of existing satellite imagery in combination with expert knowledge of the study area. The aim was to produce data that provided an adequate representation of the variation in benthic community cover across Heron Reef. Few transect surveys were located within the deep lagoonal area of the reef, as this area is hard to access by boat because the tidal range permits only short working times in the lagoon. Transect surveys were revisited in subsequent years, and additional transect surveys were included on subsequent trips based on increased knowledge of the environment. The benthic data sets and photoquadrat images are available at ref. 20.

Automated photoquadrat analysis for benthic community composition

Percentage cover of the benthic communities in each photoquadrat was determined through a machine-learning (ML) approach that assessed benthic community composition. A previously devised category scheme of 63 class codes, which differentiates all major GBR-specific coral morphologies and other bottom types, was used21. Following machine annotation, these classes were collapsed first into broad groups and subsequently into six simplified groups for validation purposes (Table 1).
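The collapsing of detailed class codes into simplified groups, and the derivation of percentage cover from annotated points, can be sketched as follows. The class codes and group names in `GROUP_OF` are hypothetical placeholders (the real scheme is given in Table 1 and ref. 21), and `percent_cover` is an illustrative helper rather than the code used to produce the published values.

```python
from collections import Counter

# Hypothetical mapping from detailed class codes to simplified groups.
# The actual scheme contains 63 class codes (Table 1, ref. 21).
GROUP_OF = {
    "ACR_BRA": "Coral",
    "POR_MAS": "Coral",
    "TURF": "Algae",
    "MACRO": "Algae",
    "SAND": "Sand",
    "RUBBLE": "Rubble",
}


def percent_cover(point_labels, group_of=GROUP_OF, unknown="Other"):
    """Return {group: percentage of annotation points} for one photoquadrat."""
    if not point_labels:
        return {}
    # Map each point's class code to its group; unmapped codes fall into `unknown`.
    counts = Counter(group_of.get(code, unknown) for code in point_labels)
    total = len(point_labels)
    return {group: 100.0 * n / total for group, n in counts.items()}
```

With 50 annotation points per photoquadrat, each point contributes 2% to the cover estimate of its group.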

Initial training of the ML platform was achieved via manual annotation of approximately 5% of the total number of photoquadrats (equivalent to 108,700 annotated points; based on ref. 21), to achieve a machine annotation accuracy of >70% as determined by the classifier21. A unique source was created for each camera used. To give a default and uniform image annotation area, the boundaries were set at 5% of the image dimensions for the top and left sides of the photoquadrat and at 95% for the right and bottom sides. Fifty annotation points were generated randomly over the entire annotation area of each photoquadrat. For manual annotation of photoquadrat sets, the level of confidence was set to 100%. A further approximately 2.5% of photoquadrats were manually annotated in an identical manner to provide a validation data set for calculating the accuracy of the machine annotation. The remaining 92.5% of the photoquadrats were subsequently annotated automatically22.
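The annotation area and random point placement described above can be illustrated with a short Python sketch. It assumes pixel coordinates with the origin at the top-left corner and uniform random sampling; the function name `annotation_points` and its parameters are hypothetical, and the annotation platform's own point generator may differ in detail.

```python
import random


def annotation_points(width, height, n_points=50, lo=0.05, hi=0.95, seed=None):
    """Generate random (x, y) pixel positions inside the annotation area.

    The area spans from `lo` to `hi` of the image width and height, i.e. a
    5% margin is excluded on every side with the default values.
    """
    rng = random.Random(seed)  # seeded generator for reproducible point sets
    x_min, x_max = round(width * lo), round(width * hi)
    y_min, y_max = round(height * lo), round(height * hi)
    return [
        (rng.randint(x_min, x_max - 1), rng.randint(y_min, y_max - 1))
        for _ in range(n_points)
    ]


# Example: 50 points for a 4000 x 3000 pixel photoquadrat (assumed image size).
points = annotation_points(4000, 3000, seed=1)
```

Fixing the seed makes a point set reproducible, which is useful when the same photoquadrat is annotated both manually and by the classifier.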

Data Records

Detailed information regarding the output benthic cover percentages and the number of benthic photoquadrats acquired for each field campaign is documented in Table 2. The benthic data sets and photoquadrat images are available at ref. 20, with the photoquadrats and benthic cover analysis for individual survey years accessible online through the campaign-specific DOIs listed in the table, from where the data can be downloaded directly.

Table 2 Overview of the data files that represent the 58,941 georeferenced photoquadrats captured during the field campaigns, in addition to links to the percentage benthic cover data sets generated via machine learning for each year.

Technical Validation

To understand the validation technique applied to these data sets, it is important to reiterate the purpose of collecting the data set, which was to provide a fast field method for gathering benthic community information over a large spatial extent whilst accurately representing variability. Validation of the data set was conducted on two levels: standardisation of the photoquadrat capture method and conditions, and a quantitative accuracy assessment.

Standardisation of photoquadrat image capture

To standardise photoquadrat image capture, the camera and lens setup was calibrated prior to each annual survey so that every photoquadrat captured a footprint covering the same extent of the benthos. This was accomplished by attaching a plumb-line to the camera system such that, when the weight touched the bottom, the captured photoquadrat represented ~1 m² of the benthos. To set the plumb-line length, the camera was moved vertically over a marked 1 m² area until the field of view enveloped the area, and the plumb-line was then fixed. During the survey the operator used the plumb-line to determine the camera height above the substrate. When the camera was held vertically with the weight touching the substrate, this permitted reproducible capture of photoquadrats that covered the same area for all surveys. Light conditions were generally the same within each expedition, as the data were collected over a consecutive 4–5 day period with stable weather, water clarity conditions and tidal range. Ideally, light conditions would have been standardised using a strobe; however, this would have slowed down the transect surveys.

Quantitative accuracy assessment

To determine the accuracy of the machine annotation, we constructed a confusion matrix that compared, for a select set of validation photoquadrats, the benthic composition output from the machine learning annotation (modelled data) with the equivalent manual annotations (reference data). Using the confusion matrix, we calculated the overall accuracy and the user's and producer's accuracy for each benthic label, following a well-documented method3. All cameras demonstrated an overall accuracy of between 74% and 82% (Table 3; ref. 3). To provide the validation data set, ~2.5% of photoquadrats were manually annotated in an identical manner to the training data (36,950 annotated points; see Methods section).
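The accuracy measures named above can be computed from paired point labels as in the following minimal Python sketch. It uses the standard definitions (overall accuracy is the share of points on the matrix diagonal, producer's accuracy is relative to the reference totals, and user's accuracy is relative to the modelled totals); the function name `accuracy_report` and the example labels are illustrative assumptions, not the code or data behind Table 3.

```python
from collections import Counter


def accuracy_report(reference, predicted):
    """Return (overall, users, producers) accuracy from paired point labels.

    reference: manual annotations; predicted: machine annotations.
    """
    if not reference or len(reference) != len(predicted):
        raise ValueError("reference and predicted must be non-empty and equal in length")
    # Each (reference, predicted) pair is one cell of the confusion matrix.
    cells = Counter(zip(reference, predicted))
    labels = sorted(set(reference) | set(predicted))
    overall = sum(cells[(lab, lab)] for lab in labels) / len(reference)
    users, producers = {}, {}
    for lab in labels:
        ref_total = sum(cells[(lab, p)] for p in labels)   # points manually labelled lab
        pred_total = sum(cells[(r, lab)] for r in labels)  # points machine-labelled lab
        producers[lab] = cells[(lab, lab)] / ref_total if ref_total else None
        users[lab] = cells[(lab, lab)] / pred_total if pred_total else None
    return overall, users, producers
```

A label that never occurs in the reference (or modelled) data has no defined producer's (or user's) accuracy, which is why the sketch returns `None` in that case rather than zero.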

Table 3 Quantitative assessment of the machine annotation via construction of a confusion matrix.