Throughout the tropics, coral reef ecosystems, which are critically important to people, have been greatly altered by humans. Differentiating human impacts from natural drivers of ecosystem state is essential to effective management. Here we present a dataset from a large-scale monitoring program that surveys coral reef fish assemblages and habitats encompassing the bulk of the US-affiliated tropical Pacific, and spanning wide gradients in both natural drivers and human impact. Currently, this includes >5,500 surveys from 39 islands and atolls in Hawaii (including the main and Northwestern Hawaiian Islands) and affiliated geo-political regions of American Samoa, the Commonwealth of the Northern Mariana Islands, Guam, and the Pacific Remote Islands Areas. The dataset spans 2010–2017, during which time, each region was visited at least every three years, and ~500–1,000 surveys performed annually. This standardised dataset is a powerful resource that can be used to understand how human, environmental and oceanographic conditions influence coral reef fish community structure and function, providing a basis for research to support effective management outcomes.
Background & Summary
Coral reef ecosystems are critically important to people; they provide food and livelihoods to millions of people worldwide1 and contribute to the cultural fabric of coastal communities. They are also important from a global heritage perspective, due to the intrinsic value of the biodiversity and richness of life they contain2. Globally, we have just experienced the third and longest mass coral reef bleaching event on record3. The dramatic loss of living coral and reef-associated populations caused by human-induced climate change4,5 threatens the integrity of coral reef ecosystems worldwide. Around human population centres, the impacts of climate change are typically compounded by multiple other stressors, such as unsustainable fishing practices and land-based sources of pollution6,7. There is therefore an urgent need to better understand the natural geographic and environmental variability of these systems, along with the key drivers of change, to inform and promote effective coral reef ecosystem management.
Large-scale and long-term monitoring datasets have an important role to play in this process. In particular, the implementation of standardised monitoring methods across gradients of oceanographic conditions and levels of human impact yields a powerful data resource that can be used to better understand the natural variability and differential susceptibility of coral reef ecosystems to local and global drivers. Here, we present such a dataset, which is collected for a long-term monitoring program, the National Oceanic and Atmospheric Administration (NOAA) Pacific Reef Assessment and Monitoring Program (RAMP). The NOAA Ecosystem Science Division (ESD) and partners have implemented RAMP—i.e., multi-disciplinary coral reef ecosystem monitoring—across U.S. and U.S.-affiliated territories in the Western and Central Pacific Ocean since 2000. The focus of this data descriptor is the reef fish and paired benthic habitat-monitoring component of Pacific RAMP that has been implemented since 2010, that being the entire period in which the survey design and monitoring methods have followed those specified in the National Coral Reef Monitoring Plan (NCRMP)8.
Prior to 2010, the Pacific RAMP used different survey methods and a different statistical sampling design to assess fish populations. Specifically, each region was visited every 2 years, with reef fish surveys conducted using belt-transects at haphazardly-located sites. Following a 2-year methods comparison period between 2007 and 2009, the reef fish-monitoring component was revamped into its current form, with the aims of systematising the design, maximizing survey site replication, and broadening the survey domain. To a large degree, the current Pacific RAMP statistical sampling design and survey methods were modelled on the fishery-independent diver visual survey program conducted in Florida by the NOAA Southeast Fisheries Science Center and partners9. The overarching goal that motivated the change was to generate data representative of coral reef hard-bottom substrate at the island scale for the numerous Pacific jurisdictions covered by our program. To that end, a wide (39 islands surveyed) but thin (~3 days per island per survey cycle, approximately 30–50 sites per island) sampling approach was adopted. Around that time, we also shifted from a 2-year cycle to a 3-year cycle (i.e., each jurisdiction surveyed once every three years). Pacific RAMP surveys have been supplemented by additional survey efforts around American Samoa, Hawaiʻi and Guam and by data gathered on monitoring cruises led by Papahānaumokuākea Marine National Monument (PMNM), in all cases using identical methods, design and, often, the same personnel.
This data descriptor is limited to data collected from 2010 onwards using a stationary point count (SPC) survey method and a randomized depth-stratified design. The survey domain for Pacific RAMP is all hard-bottom substrate in ≤30 m depth. In addition to fish counts, divers visually estimate benthic cover and habitat structural complexity, so that each fish count is paired with habitat information.
Between 2010 and 2017, ~4,700 such surveys were conducted across 39 islands and atolls. The data collected serve four main purposes: 1) to fulfil NCRMP mandates to assess the status and trends of reef fish assemblages across coral reefs of the U.S.8; 2) to provide data suitable to assess the status of coral reef fisheries stocks10,11; 3) to support federal and jurisdictional management by providing a broad spatial context to status and trends apparent from, generally, spatially smaller-scale surveys conducted by those agencies12; and 4) to generate a consistent and large-scale dataset as a resource for the scientific community13,
The data cycle spans three steps: pre-field, in the field and post-field (Fig. 1). Prior to field data collection, we selected sites via a randomized depth-stratified design. In the field, the monitoring team accessed the survey regions on board the NOAA Ships Hiʻialakai and Oscar Elton Sette, but daily work was conducted from small boats that were launched from the ship and recovered each day. Survey divers collected data while using open-circuit SCUBA and entered data into a relational database (Microsoft Access). Upon completion of a survey mission, the data were migrated to an enterprise relational database (Oracle), and subsequently synthesized and processed into analysis-ready data using standard scripts. Each of the three steps is described in turn below, and more detail is available in our standard operating procedure document19.
Statistical sampling design
Monitoring occurred in four regions: American Samoa, the Mariana Archipelago (the Commonwealth of the Northern Mariana Islands and Guam), Hawaiʻi (the main and Northwestern Hawaiian Islands) and the Pacific Remote Islands Areas (an administrative rather than geographic grouping). The goal was to survey reefs as widely as possible: survey effort is spread across many islands and atolls and, within each reef area, across as wide a domain as feasible, in this case all hard-bottom substrate in water shallower than 30 m. In total, 39 islands and atolls spread across the U.S. Pacific territories are surveyed for this program (Table 1). The 30-m depth limit is a safe-diving regulation. Typically, 3–5 days were spent at each island or atoll during each visit (generally once every 3 years), conducting 30–50 fish surveys during that time. Each island or atoll (henceforth ‘reef area’) is stratified by reef zone (backreef, forereef, protected slope, or lagoon, although the majority of reef areas only have forereef) and by depth zone: shallow (0–6 m), mid (6–18 m), and deep (18–30 m). In addition, there is a level of stratification based on ‘sector’ (i.e., section of coastline and/or management status). Sectors are only utilized at a number of the larger populated islands where there can be very different levels of management (e.g., protected areas), human population density or access. For example, Guam is subdivided into three sectors: ‘Marine Preserve’ (being all areas within Guam’s Marine Preserve System); ‘Guam Open East’ (areas outside of Marine Preserves on east side of Guam); and ‘Guam West.’ Similarly, the main Hawaiian Islands and Tutuila have between two and seven sectors per island, with sector boundaries designed to reflect broad differences in oceanographic exposure, reef structure, local human population density and management status.
At the majority of the inhabited islands, supplemental survey operations, or additional survey days during routine cruises, have allowed for higher sampling density around those human population centres than at remote islands. Finally, three neighbouring islands in the northern Commonwealth of the Northern Mariana Islands (CNMI), namely Alamagan, Guguan and Sarigan (‘AGS’), are routinely pooled into a single statistical sampling and reporting unit, as their small size makes it infeasible to allocate sufficient time (and therefore number of surveys) for us to be able to generate meaningful summary metrics for each island individually. The statistical sampling design, terminology and reporting units are summarized in Table 2. A summary of the number of sites surveyed per island per habitat stratum is presented in Table 3.
Prior to each survey mission, sample site locations were randomly selected from geographic information system (GIS) substrate and strata maps maintained by the ESD. These maps were created using information from the NOAA National Centers for Coastal Ocean Science (NCCOS), reef zones (e.g., forereef) and geomorphologic structures digitized from IKONOS satellite imagery or nautical charts, bathymetric data from the ESD-affiliated Pacific Islands Benthic Habitat Mapping Center, University of Hawaiʻi at Mānoa, and prior knowledge gained from previous visits to survey locations.
Logistical and weather conditions factor into the planning and allocation of survey effort around each island. Small islands can be assumed to be randomly surveyed in their entirety, i.e., any stretch of coastline can have a random site assigned to it, as these islands, weather permitting, can typically be circumnavigated by a small boat in a day. For islands too large to sample in their entirety, we break the coastline into 4–6 fixed, evenly spread sections, to which random sites are assigned. Prior to data collection, these constraints determine the section of target habitat from which sites are randomly selected and the position of the ship during the survey mission. Prior to each cruise, the target number of sites per stratum is determined by proportionally allocating the total expected sites at each reef area (generally 30–50) based on a weighting factor calculated from the size of the stratum and the variance of the target output metrics (e.g., consumer group biomass and total fish biomass), and then adjusting for what is feasible given operational constraints, e.g., the safety limit on frequency and duration of deep dives.
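The proportional allocation of sites to strata can be sketched as follows. This is a minimal illustration only: it assumes the weighting factor is the product of stratum area and the standard deviation of the target metric (a Neyman-style allocation), which is one common way to combine stratum size and variance but is not confirmed here as the program's exact formula. The function name, stratum values and site total are hypothetical, and the subsequent adjustment for operational constraints is not modelled.

```python
import math

def allocate_sites(total_sites, strata):
    """Split total_sites across strata in proportion to area * SD.

    strata: dict mapping stratum name -> (area, sd_of_target_metric).
    The area * SD weighting is an assumption (Neyman-style allocation).
    """
    weights = {name: area * sd for name, (area, sd) in strata.items()}
    total_weight = sum(weights.values())
    raw = {name: total_sites * w / total_weight for name, w in weights.items()}
    # Round down first, then give the remaining sites to the strata with
    # the largest fractional remainders so the total is preserved.
    alloc = {name: math.floor(x) for name, x in raw.items()}
    leftover = total_sites - sum(alloc.values())
    by_remainder = sorted(raw, key=lambda n: raw[n] - alloc[n], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc

# Hypothetical forereef depth strata: (area in ha, SD of total fish biomass)
example = {"shallow": (100.0, 20.0), "mid": (300.0, 10.0), "deep": (200.0, 5.0)}
allocation = allocate_sites(40, example)
```

In practice the deep-stratum target would then be capped to respect the limits on frequency and duration of deep dives, with the excess redistributed to shallower strata.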
At each reef fish survey site, two types of data are collected as part of a rapid ecological assessment (REA): visual counts of the fish assemblage and an assessment of benthic habitat including site characteristics such as water clarity and depth. We use a form of stationary point count (SPC)19, which involves a pair of divers conducting simultaneous counts in adjacent, visually estimated 15-m diameter cylindrical plots extending from the substrate to the limits of vertical visibility. Prior to beginning each SPC pair, a 30-m gray polyester transect line is laid across the substratum. Markings at 7.5, 15 and 22.5 m enable survey divers to locate the midpoint (7.5 m or 22.5 m) and two edges of their survey plots. Prior to 2015, divers visually estimated water visibility, but since 2015 horizontal visibility on survey sites has been measured using a Secchi disc. To do this, the diver who laid out the transect swims back down the 30 m transect line towards the second diver who, positioned at 0 m, holds a slate up with black and white Secchi quadrants. The first diver then estimates water clarity as the point along the line where the black and white Secchi disc quadrants become visible.
Surveying the reef fish assemblage.
Each fish count consists of two main parts. The first of these is a 5-min species enumeration period in which each diver generates a list of the taxa observed within their cylinder—to species if possible. Divers record the taxa using four-letter codes, which are linked to a species table in the database with full names and other information about each species. Species identification is based on the assessment of experienced fish survey divers, who are trained using the Fish SPC Method training package available on the Pacific Islands Fisheries Science Center webpage (https://www.pifsc.noaa.gov/cred/survey_methods/fish_surveys/rapid_ecological_assessment_of_fish-survey_method_training.php), and who verify species identification with various sources (e.g., www.fishbase.org20,
Surveying the reef habitat.
After completing the fish survey, both divers scan the benthos in their survey cylinder for 2–3 min and visually estimate the percentage cover of: encrusting algae, fleshy macroalgae, hard coral, sand and other (turf algae, soft coral and cyanobacteria grouped together). Divers also record the depth at the centre and high- and low-edges of their cylinders (the latter two values providing a measure of slope), broad habitat type and structural complexity. Since 2012, divers record reef habitat complexity by visually estimating the percentage of the cylinder that falls into a series of bins representing different heights from the plane of the reef: <0.20 m, 0.20–0.50 m, 0.50–1 m, 1–1.5 m and >1.5 m. Prior to 2011, divers estimated reef substrate complexity on a five point scale (1–5). Divers also record the maximum height of substrate within their cylinders. Divers conduct a rapid visual assessment of the abundance of ‘free’ (e.g., Tripneustes spp., Heterocentrotus spp., Diadema spp. and Echinothrix spp.) and ‘boring’ (e.g., Echinometra spp. and Echinostrephus spp.) urchins using a DACOR scale for each urchin category (for free urchins those are: D: Dominant [>100], A: Abundant [51–100], C: Common [21–50], O: Occasional [6–20], R: Rare [<5]; for boring urchins categories are D: Dominant [>500], A: Abundant [251–500], C: Common [101–250], O: Occasional [26–100], R: Rare [<25]). Finally, divers identify the broad-scale habitat type in the general area of the survey. The habitat codes follow the geomorphological structures identified by the NCCOS24: 1) aggregate reef; 2) aggregated patch reefs; 3) aggregated patch reef (i.e., an individual patch reef); 4) pavement; 5) pavement with patch reefs; 6) pavement with sand channels; 7) rock/boulder; 8) reef rubble; 9) spur and groove; and 10) sand with soft coral or rock.
Calculating fish biomass and benthic cover estimates per site
Using the count and size estimate data collected per diver in each replicate cylinder survey, the body weight of individual fish is calculated using length-to-weight (LW) conversion parameters, and, where necessary, length-length (LL) parameters [e.g., to convert TL to fork length (FL) for species with LW parameters based on FL]. LW and LL conversion parameters are largely taken from two sources25,26. Weight is calculated as $W = aL^{b}$, where W is weight in grams, L is length in centimeters and a and b are constants. The term ‘biomass’ herein refers to the aggregate body weight of a group of fishes per unit area (g m−2). The diver-level data that are collected in adjacent cylinders at the same survey site are not independent replicates; therefore, a survey is always the combined data from the adjacent cylinders and these combined site-level data are the base sample unit of survey data. Site-level estimates (e.g., abundance, biomass, benthic cover, complexity) are calculated by taking the mean of the values from the adjacent diver-level counts conducted for each survey. Site-level fish metrics (e.g., abundance and biomass) can be pooled into the standard Pacific RAMP consumer group classification. The consumer groups are Primary Consumer, Secondary Consumer, Planktivore and Piscivore, and are based largely on diet data from FishBase.
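The conversion from a count and size estimate to biomass per unit area can be sketched as below. This is an illustrative sketch, not the code in the Supplementary Files: it assumes the standard allometric form W = a * L**b and assumes that density is obtained by dividing by the area of a 15-m diameter cylinder. The function names and the example parameter values are hypothetical.

```python
import math

CYLINDER_DIAMETER_M = 15.0  # plot diameter stated in the survey method
CYLINDER_AREA_M2 = math.pi * (CYLINDER_DIAMETER_M / 2) ** 2  # about 176.7 m^2

def fish_weight_g(length_cm, a, b):
    """Weight in grams of one fish from W = a * L**b (L in cm)."""
    return a * length_cm ** b

def record_biomass_g_m2(count, length_cm, a, b, area_m2=CYLINDER_AREA_M2):
    """Biomass density (g per m^2) contributed by one data record."""
    return count * fish_weight_g(length_cm, a, b) / area_m2

# Hypothetical record: 4 fish of 20 cm, with made-up parameters a=0.02, b=3.0
example_biomass = record_biomass_g_m2(4, 20.0, 0.02, 3.0)
```

For species whose LW parameters are defined on fork length, the recorded total length would first be converted with the LL parameters before calling `fish_weight_g`.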
The complete Pacific RAMP fish SPC dataset (2010–2017) (Data Citation 1: Figshare https://doi.org/10.6084/m9.figshare.c.3808039: xxx) is provided as a comma-separated file named NOAA_PACIFIC_RAMP_FISH_SPC_2010_2017_SCI_DATA_.csv. Each data record in the data file is an estimated count and size of a single fish taxon from a single observer (DIVER) at a single cylinder as part of a single survey at a particular site. Each record includes an observation type (see method description) along with metadata that relates to the species observed. Each data record carries a numeric identifier (SITEVISITID) that is unique to the survey (i.e., one dive at one site by one group of divers). Each site, i.e., survey location, also has a unique identifier (SITE) and, because we do not revisit survey locations, SITE is also a unique identifier for the survey. Typically a site contains data records from a single SPC-pair, i.e., the two adjacent cylinders that are simultaneously surveyed by the dive team. Metadata on benthic habitat, site location and sampling date relate to all data records within the same survey. Each column field in the provided data file is explained in full in Table 4 (available online only).
The processing code to generate a variety of survey-level summary metrics from the data records in the data file is provided as Supplementary File 1; this code depends on loading the custom set of functions provided as Supplementary File 2. The dataset we are providing is derived from the raw observation-level data we store in Oracle at the Pacific Islands Fisheries Science Center. It differs in that redundant fields from the base data export from Oracle are excluded, diver identifiers are converted to unique numeric codes to preserve their anonymity and obsolete benthic habitat categories are removed.
Underwater visual censuses (UVCs) are commonly used to survey fish assemblages and benthic habitats in coral reef ecosystems. Potential sources of error within UVC methods include: inter- and intra-diver variability; the depth and time restrictions associated with using SCUBA, which limit more detailed assessments; differential detectability of species due to the habitat and environment (e.g., highly complex versus less complex habitats); and the behavioural interaction (attraction or repulsion) between divers and fish species. Estimates of fish abundance can also vary depending on the UVC method of choice. The impact that these potential sources of uncertainty can have on the quality of the data presented here is discussed in turn.
We address intra- and inter-diver variability in this dataset in two ways, both of which are considered a part of our routine data quality and validation procedures. Firstly, new divers who collect fish and benthic data for Pacific RAMP are trained in both fish identification and the survey protocol, in classroom and in-water sessions. The complete training package for CREP fish divers is available at: https://www.pifsc.noaa.gov/cred/survey_methods/fish_surveys/rapid_ecological_assessment_of_fish-survey_method_training.php. Prior to each cruise, all divers (whether new or experienced) must accurately identify >90% of fishes shown in a region-specific fish species identification test. This test is intended to be difficult, in that it is weighted towards rare species and those that closely resemble other species. Secondly, outside of the survey cruise season, fish divers take part in regular training exercises, typically on a bimonthly basis. Generally, this in-water training includes two dives: one to conduct a practice SPC survey, including the benthic habitat assessment, and another to estimate fish sizes, using a series of fish models of various sizes from 10 to >150 cm. Divers’ size estimates are then compared against the known sizes of the fish models used in that test (Fig. 2).
The remainder of our routine data quality and checking methods occur in the field, where we typically have between 4 and 10 fish survey divers. The dive buddy pairs are regularly mixed up throughout a survey cruise, and divers routinely discuss and compare species identification and sizes in the field. This is done immediately after a survey, as well as during the data entry stage, when divers check data entered by their dive partner against their datasheet for sizing, species identification and data entry errors. The difference between the estimates of each diver and those of their dive partner at each site is calculated, and referred to as diver performance. This can be done for any parameter estimated, but during field operations, we assess total fish biomass, species richness (number of unique species counted), the size distributions of commonly observed species, and benthic cover. Real differences between dive partners are expected, as divers survey adjacent cylinders, but not identical areas of reef. However, if there is no consistent bias in the estimates made by a diver, the median difference with their partner should be close to zero (i.e., half their estimates being higher than their partners’ and half lower). Boxplots of diver performance, therefore, give 1) a strong but general indication of relative bias (if there is no consistent bias, then the median differences between a single diver and their dive partners will be close to zero), and 2) an indication of how variable each diver’s counts are compared to those of their dive partners. We generate boxplots of diver performance every few days to provide feedback on diver performance relative to the rest of the team and to allow for the early detection of observer error27.
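The diver performance check described above can be sketched as follows. This is a minimal sketch under stated assumptions: the input layout (one tuple per site holding both divers' estimates), the function name and the example values are hypothetical, and the summary returned here stands in for the boxplots used in the field.

```python
from statistics import median

def diver_performance(site_estimates):
    """Summarise each diver's differences from their dive partner.

    site_estimates: list of (diver_a, value_a, diver_b, value_b) tuples,
    one per site, e.g. total fish biomass from the two adjacent cylinders.
    """
    diffs = {}
    for diver_a, value_a, diver_b, value_b in site_estimates:
        # Each site contributes one difference to each diver, with opposite sign.
        diffs.setdefault(diver_a, []).append(value_a - value_b)
        diffs.setdefault(diver_b, []).append(value_b - value_a)
    return {
        diver: {"median_diff": median(d), "range": max(d) - min(d), "n": len(d)}
        for diver, d in diffs.items()
    }

# Hypothetical biomass estimates (g m^-2); divers are anonymised numeric codes.
sites = [
    (1, 30.0, 2, 24.0),
    (1, 18.0, 3, 15.0),
    (2, 22.0, 3, 23.0),
    (1, 41.0, 2, 33.0),
]
performance = diver_performance(sites)
```

In this made-up example diver 1 is consistently higher than their partners (median difference well above zero), which is the kind of pattern that would prompt feedback during a cruise.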
Experienced survey divers are capable of accurately estimating coral cover based on visual assessments of the survey area28,29. During the fish surveys both a rapid visual assessment and a photo-quadrat survey of the benthos (not part of this Data Descriptor) are conducted after the fish count. From an earlier study comparing the visual assessment with photo-quadrats at the Pacific RAMP SPC survey sites, we know that, relative to photo-quadrat surveys of the same survey plot, divers tend to underestimate hard coral cover (by −3%), and encrusting algae (−2.3%) and overestimate fleshy macroalgae (6.5%)30. Rapid visual estimates have greater scope for observer-bias relative to the photo-transect method. Nevertheless, we believe this method provides a coarse but meaningful and immediately available estimate of benthic cover at the functional group level, estimates that are suitable for characterizing the benthos at a survey-site.
As with all visual survey techniques of fish assemblages, survey counts are affected by imperfect detectability. With the SPC this is particularly true for very small, cryptic and nocturnally active species. We consider the detectability of fishes in an SPC survey similar to that of other common whole-assemblage reef fish survey techniques (such as a belt transect), because: 1) divers remain within the same survey area throughout the survey, and therefore have multiple opportunities to observe species present within their cylinders; 2) fishes are counted in a series of rapid sweeps of the cylinder with similar species grouped together (i.e., divers are focused on one search image per sweep), which prevents divers from being overwhelmed by the exceptional abundance and high diversity we encounter at some, particularly remote, locations we survey; and 3) divers carefully swim through their cylinder at the end of the survey, recording species and fishes that may have been missed. Methods that survey larger areas of reef, such as a long belt transect, may allow for greater detectability of skittish fishes that move relatively far ahead of the survey divers, before they move out of the survey area. However, as we record observations of species for up to 30 min after the start of the survey, the SPC method provides opportunities to record fishes that return to the survey area, if divers are not perceived as a threat31.
Differences in fish behaviour in response to diver presence can be a source of bias in survey estimates. For example, target fishes may be wary of divers in locations where they have come to associate divers with a risk of being hunted; and alternatively, curious fishes may be more likely to approach divers closely enough to be counted in remote locations where divers are not perceived as a threat32. Although that effect can be substantial in some cases33, the scale of those effects varies among locations and depending on method used. In an attempt to quantify one important component of the disturbance caused by divers, we recently compared counts by divers using our methods on SCUBA (which emit noisy and conspicuous bubbles) with counts made by divers using closed circuit re-breathers (CCR, which do not) at a range of locations in the main Hawaiian Islands. While there were significantly higher counts of target fishes by CCR divers around the most heavily fished location (Oʻahu), those effects were much smaller than those reported around heavily fished parts of Guam33, and we found no clear effect at other locations, including Maui-nui where there is still considerable fishing effort34. Diver avoidance is clearly a potential concern for all underwater survey programs, but the evidence available indicates that these effects are relatively insignificant except at the extreme high-end of fishing pressure.
Each underwater visual survey method has its strengths, weaknesses and inherent biases. One of the more common UVC methods is the belt transect, and indeed before 2010, when we fully adopted the current survey design using the SPC method, our program monitored fishes via multiple 25-m belt transects per site. Between 2007 and 2009, we co-located belt and SPC surveys at 332 sites, across the regions surveyed for Pacific RAMP. Comparisons of the data generated by these two different methods indicated that (i) densities were similar for most taxa; (ii) SPC data tended to have lower variability (with the exception of small, benthic associated fishes such as damselfish, wrasse and hawkfish); and that (iii) the SPC method tended to generate fewer zero counts for most taxa than belt surveys (Supplementary File 3). Clearly these results are highly dependent on the specific variation of the belt method we implemented, but this comparison justified our program’s adoption of the SPC approach from 2010 on.
One aspect of the SPC that makes it suitable for the purposes of Pacific RAMP is our recording of ‘instantaneous’ count data, which are equivalent to a series of snapshots of fish presence. Divers systematically record one group of fishes at a time, carefully estimating their sizes in one sweep, rather than counting and sizing multiple groups simultaneously. These instantaneous data are used to generate density estimates per unit area. In contrast, methods such as the belt transect typically record fishes present in, moving into or across a survey area ahead of the diver during some (often undefined) period of time. Depending on the purpose of the survey, it can be beneficial that open counts provide increased opportunities to record observations of mobile species, but time-integrated counts of ‘open’ survey areas, like the belt transect, will tend to overestimate the density of mobile species, potentially substantially35.
Indeed, the biomass estimates derived from the Pacific RAMP surveys tend to be lower than those from other reef fish surveys in the Pacific36. This is likely, in part, due to the aforementioned methodological differences between the SPC and the more commonly implemented belt transect method. The lower biomass recorded in this dataset could also be due to our sampling design. In particular, the sampling domain of Pacific RAMP is likely broader than most other reef fish surveys. Specifically, we sample all hard-bottomed habitats in less than 30-m, which includes considerable areas of low relief and low coral cover habitats that typically have lower biomass than more structurally-complex habitats that are the focus of most survey programs (e.g., spur and groove or aggregate reef). Survey method choice and statistical sampling design can have large impacts on reef fish biomass estimates produced by reef fish surveys and, therefore, we caution data users against simply blending the data provided here together with data from other sources and recommend that the biomass estimates generated should always be considered as a relative, rather than an absolute, measure when compared to other data sources.
The code to generate site-level estimates of summary fish and benthic estimates from the raw observations is available with this paper (Supplementary Files 1 and 2). Pooling these data to generate island-level estimates requires knowledge of the statistical sampling scheme for each year and whether there were any additional projects that deviate from the standard Pacific RAMP design, such as an intensive survey effort within a particular bay. For this reason, we encourage data users to contact us (email: firstname.lastname@example.org with subject line: For the Attention of the Fish Team Lead) to discuss how best to handle these instances.
Users of these Pacific RAMP reef fish and paired benthic survey data should be aware of the following aspects of the dataset:
The different observation types. Data are recorded as one of five different ‘observation types.’ The majority of records (those where a species is observed during the enumeration period and where individuals of that species are present in the cylinder at the time of the tallying portion for that taxon) are recorded as ‘instantaneous’ observations (OBS_TYPE=‘I’). When a species is observed during the enumeration period but is not present during the instantaneous sweep for that taxon, divers record size and number present in the cylinder when it was first observed during the enumeration period and mark the data record as ‘non-instantaneous’ (OBS_TYPE=‘N’). Since 2012, we also record three other types of observations: 1) when a species is first observed in the cylinder between 5 and 10 min into the survey (i.e., in the first 5 min of the tallying portion), the diver conducts a rapid visual sweep of their cylinder for that species and records number and size as ‘five-to-ten’ (OBS_TYPE=‘F’); 2) when a species is first observed inside the cylinder any time after that, up to 30 min into the survey, the diver records the number and size as ‘ten-to-thirty’ (OBS_TYPE=‘T’); and 3) the presence of other species of interest in the general vicinity of the survey, seen at any time throughout the survey period, is recorded as ‘present’ (OBS_TYPE=‘P’). ‘Instantaneous’ data therefore come from a ‘closed count’ (i.e., representing the density of fishes within a defined area at one point in time). Other data types allow us to integrate data over longer time periods (i.e., to count fishes that are present in or move across the cylinder at some point through the course of the survey). Those integrated data allow us to gather systematic data on relative abundance and size distribution of relatively rare, or skittish and/or more mobile species. Depending on the question of interest, we filter the data by its observation type.
By default, we pool ‘I’ and ‘N’ data for routine reporting of density estimates, as that allows for the most continuous, comparable dataset, and because we found biomass estimates from I and N data to be relatively similar to those from our previous method (25-m belt transects, which are conducted using an ‘open’ survey method).
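Filtering records by observation type before computing density estimates can be sketched as below. This is a hedged example: OBS_TYPE and SITEVISITID are column names given in this descriptor, whereas SPECIES and COUNT, the four-letter codes and the in-memory sample rows are made-up stand-ins for the real data file.

```python
import csv
import io

DEFAULT_OBS_TYPES = {"I", "N"}  # pooled by default for density estimates

def filter_records(rows, obs_types=DEFAULT_OBS_TYPES):
    """Keep only records whose OBS_TYPE is in obs_types."""
    return [row for row in rows if row["OBS_TYPE"] in obs_types]

# Tiny stand-in for the data file; SPECIES and COUNT are assumed column names
# and the species codes are invented for illustration.
sample = io.StringIO(
    "SITEVISITID,OBS_TYPE,SPECIES,COUNT\n"
    "101,I,AAAA,3\n"
    "101,N,BBBB,1\n"
    "101,T,CCCC,1\n"
    "101,P,DDDD,1\n"
)
rows = list(csv.DictReader(sample))
density_rows = filter_records(rows)              # 'I' and 'N' pooled
closed_count_rows = filter_records(rows, {"I"})  # strict closed count only
```

Because the ‘F’, ‘T’ and ‘P’ types were only recorded from 2012 onwards, analyses that include them should be restricted to years in which they are available.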
Data from adjacent SPC cylinder surveys conducted simultaneously are non-independent replicates that are averaged to create a mean estimate for the SPC-pair that is the normal base-level unit of data. In some cases, a site was surveyed by means of two SPC-pairs. However, we still do not consider those independent replicates as those were typically conducted within 20 m or less of each other. When that happens, data are averaged within the SPC-pairs, and then between SPC-pairs to generate site-level estimates.
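The two-step averaging described above (within SPC-pairs, then between SPC-pairs) can be sketched as follows. The tuple layout, the pair index and the site codes are hypothetical; in the data file the grouping would be done on the site and replicate fields described in Table 4.

```python
from collections import defaultdict
from statistics import mean

def site_level_mean(cylinder_values):
    """Average within each SPC-pair, then across SPC-pairs at a site.

    cylinder_values: list of (site, spc_pair, value) tuples, one per cylinder.
    Returns a dict mapping site -> site-level estimate.
    """
    by_pair = defaultdict(list)
    for site, pair, value in cylinder_values:
        by_pair[(site, pair)].append(value)
    by_site = defaultdict(list)
    for (site, _pair), values in by_pair.items():
        by_site[site].append(mean(values))
    return {site: mean(pair_means) for site, pair_means in by_site.items()}

# Hypothetical cylinder-level biomass (g m^-2); site codes are made up.
cylinders = [
    ("SITE-A", 1, 10.0), ("SITE-A", 1, 14.0),  # single SPC-pair
    ("SITE-B", 1, 20.0), ("SITE-B", 1, 30.0),  # first SPC-pair
    ("SITE-B", 2, 40.0), ("SITE-B", 2, 50.0),  # second SPC-pair
]
estimates = site_level_mean(cylinders)
```

Averaging in two steps keeps each SPC-pair equally weighted even if one pair ends up with fewer usable cylinders than the other.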
These data are hierarchical in statistical sampling design. Summary statistics (e.g., means and variances) of survey quantities (e.g., biomass) are calculated from the surveys within each stratum. To pool those up into larger units (e.g., island), we weight each stratum by its relative size (i.e., if a stratum is 50% of the total area for a reporting unit (typically island or atoll), then the weighting factor will be 0.5, and the weighting factors sum to 1; ref. 9). Per-stratum mean and variance values are aggregated to a higher level (e.g., to island scale) using the formulas below:
pooled mean biomass (X) across S strata:
$$X = \sum_{i=1}^{S} w_i X_i$$
pooled variance of mean biomass (VAR) across S strata:
$$\mathrm{VAR} = \sum_{i=1}^{S} w_i^{2}\,\mathrm{VAR}_i$$
where $X_i$ is the estimate of mean biomass within stratum i, $\mathrm{VAR}_i$ is the estimated variance of $X_i$ and $w_i$ is the stratum-weighting factor.
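Pooling stratum-level estimates to an island-level estimate can be sketched as below. This sketch assumes the standard stratified estimator with independent strata: the pooled mean is the weighted sum of stratum means, and the pooled variance is the sum of squared weights times the stratum variances, with weights equal to relative stratum area. The function name and example numbers are hypothetical.

```python
def pool_strata(strata):
    """Pool per-stratum estimates to a reporting unit (e.g., island).

    strata: list of (mean_i, var_i, area_i) tuples, where var_i is the
    estimated variance of mean_i. Weights are areas rescaled to sum to 1.
    """
    total_area = sum(area for _, _, area in strata)
    weights = [area / total_area for _, _, area in strata]
    pooled_mean = sum(w * m for w, (m, _, _) in zip(weights, strata))
    pooled_var = sum(w ** 2 * v for w, (_, v, _) in zip(weights, strata))
    return pooled_mean, pooled_var

# Hypothetical island with three depth strata:
# (mean biomass in g m^-2, variance of that mean, stratum area)
island = [(20.0, 4.0, 50.0), (30.0, 9.0, 30.0), (10.0, 1.0, 20.0)]
pooled_mean, pooled_var = pool_strata(island)
pooled_se = pooled_var ** 0.5  # standard error of the pooled mean
```

Note that the stratum areas and the set of strata sampled can differ between survey years, so the weights should be recomputed for each year and reporting unit.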
The SPC is a generalist survey technique, which is the method of choice for Pacific RAMP because our priority indicators are composite groups of reef fishes, as opposed to focusing on individual species. Cryptic, nocturnal, and rare species are not well represented by these surveys.
The presence of divers has the potential to alter fish behaviour, which can inflate or deflate the counts of fishes: over-counting species that are attracted to divers, as is the case for sharks and jacks in the Northwestern Hawaiian Islands, or undercounting species that tend to avoid divers, presumably through a flight response triggered by fishes associating divers with fishing29,36,37.
The method with which reef substrate complexity is measured has changed over time. To use substrate complexity data from 2010–2017, a linear regression can be applied to generate a standard conversion formula between the two methods38.
How to cite this article: Heenan, A. et al. Long-term monitoring of coral reef fish assemblages in the Western central pacific. Sci. Data 4:170176 doi:10.1038/sdata.2017.176 (2017).
Heenan, A. Figshare https://doi.org/10.6084/m9.figshare.c.3808039 (2017)
The statistical sampling design and survey methodology of this dataset is largely based on the coral reef monitoring efforts in Florida, led by the NOAA Southeast Fisheries Science Center and the University of Miami. Steve Smith and Jerry Ault, in particular, were instrumental in ESD adopting the random-depth stratified sampling design. We thank the many fish survey divers who contributed to data collection, in particular the core ESD fish survey divers, Jacob Asher, Paula Ayotte, Kelvin Goropse, Andrew Gray, Kevin Lino, Kaylyn McCoy and Jill Zamzow. These data are collected for NOAA’s National Coral Reef Monitoring Plan (NCRMP), which is funded by the Coral Reef Conservation Program (CRCP), and the Pacific Islands Fisheries Science Center. Surveys conducted in the northwestern Hawaiian Islands in years 2011–2013, and 2014–15 were led and funded by NOAA’s Papahānaumokuākea Marine National Monument (PMNM). Our thanks to Amanda Dillon who created Fig. 1.