A global coral-bleaching database, 1980–2020

Coral reefs are the world’s most diverse marine ecosystems, and they provide resources and services that benefit millions of people globally. Yet coral reefs have recently experienced an increase in the frequency and intensity of thermal-stress events that cause coral bleaching. Coral bleaching results from the breakdown of the symbiosis between corals and their symbiotic microalgae, which causes the loss of pigments and symbionts and gives corals a pale, bleached appearance. Bleaching can be temporary or fatal for corals, depending on the species, the geographic location, historical conditions, and local and regional influences. Indeed, marine heat waves are the greatest threat to corals worldwide. Here we compile a Global Coral-Bleaching Database (GCBD) that encompasses 34,846 coral bleaching records from 14,405 sites in 93 countries, from 1980 to 2020. The GCBD provides vital information on the presence or absence of coral bleaching along with site exposure, distance to land, mean turbidity, cyclone frequency, and a suite of sea-surface temperature metrics at the times of survey.


Background & Summary
The ubiquity of reef-building corals stems from their capacity to support symbiotic unicellular dinoflagellates, from the family Symbiodiniaceae, within their tissues 1 . The symbionts photosynthesize and translocate photosynthates to the coral animals, and in return corals produce organic wastes upon which the symbionts thrive 2 . This mutually beneficial relationship between corals and their symbionts has allowed corals to thrive in shallow, tropical and subtropical localities and build coral reefs for millennia. Recently, however, this relationship has become dysfunctional during marine heat waves, when seawater temperatures are anomalously high 3,4 . This dysfunctionality leads to the paling of corals through loss of pigmentation or loss of symbionts, more commonly referred to as coral bleaching (Fig. 1) 3,5 . There is, however, considerable spatial and temporal variation in coral bleaching, depending on the intensity of thermal-stress events, geographic location 6 , the coral species 7 , historical conditions 8 , and local and regional influences 9 . This variation motivated us to collate data on coral bleaching from around the globe, starting from 1980.
Two databases have previously been compiled, one by ReefBase (4146 records) (http://www.reefbase.org), which was terminated around 2010, and the second by Donner et al. 10 , who collated 7429 data records on coral bleaching. Here we follow the previous database conventions to present a Global Coral-Bleaching Database (GCBD), obtained from seven data sources, that encompasses 34,846 coral bleaching records from 14,405 sites in 93 countries, over 40 years, from 1980 to 2020 (Fig. 2). The database contains information on the presence and absence of coral bleaching, which allows comparative analyses and the determination of geographical bleaching thresholds, together with site exposure, distance to land, mean turbidity, cyclone frequency, and a suite of sea-surface temperature metrics at the times of survey.

Methods
The Global Coral Bleaching Database (GCBD) is available as a Microsoft Access database file and as a SQLite database file, the latter of which is directly accessible through R 11 . Examples of the R code that extracts data from the SQLite files ready for data analysis are provided in Table "R_Scripts_tbl". Data in the GCBD are stored in 20 related tables (see Fig. 3 for a schematic of the database structure). The static location data (latitudinal and longitudinal coordinates, distance to land, and exposure) are stored in the Table "Site_Info_tbl". The primary geographical variable is a 'site' on a reef, recorded as latitude and longitude coordinates. A site can have multiple sampling events (i.e., multiple depths and/or multiple dates sampled), and these temporal events are stored separately in the Table "Sample_Event_tbl". Data collected during these sampling events are stored in three related tables: "Coral Bleaching data tbl" (% bleaching), "Coral Cover data tbl" (% hard coral cover), and "Environmental data tbl". Bleaching is an estimate of the number of bleached coral colonies relative to the number of colonies that are not bleached at a given site (i.e., site-wide bleaching). For any range estimates of coral bleaching, we took the mean value. Published works and any R code related to extracting or manipulating data are also stored in the R_Scripts_tbl and the Relevant_Works_tbl connected to the sampling event. Tables with enumerated lists are used to ensure integrity in naming conventions; such tables are denoted with "LUT", where LUT stands for look-up table.
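The site-to-event-to-measurement relationship described above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module rather than the R scripts shipped with the database, and it builds a tiny in-memory stand-in instead of opening the real file. Only Site_Info_tbl, Sample_Event_tbl, Site_ID, Latitude_Degrees, and Longitude_Degrees come from the text; Bleaching_tbl, Sample_ID, Date_Year, Depth_m, Percent_Bleached, and the helper range_to_mean are hypothetical names chosen for illustration.

```python
import sqlite3

# Hypothetical, simplified stand-in for the GCBD schema. Table and column
# names other than Site_Info_tbl, Sample_Event_tbl, Site_ID, Latitude_Degrees
# and Longitude_Degrees are assumptions, not the real schema.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Site_Info_tbl (
    Site_ID INTEGER PRIMARY KEY,
    Latitude_Degrees REAL,
    Longitude_Degrees REAL
);
CREATE TABLE Sample_Event_tbl (
    Sample_ID INTEGER PRIMARY KEY,
    Site_ID INTEGER REFERENCES Site_Info_tbl(Site_ID),
    Date_Year INTEGER,
    Depth_m REAL
);
CREATE TABLE Bleaching_tbl (
    Sample_ID INTEGER REFERENCES Sample_Event_tbl(Sample_ID),
    Percent_Bleached REAL
);
""")


def range_to_mean(low, high):
    """Collapse a reported bleaching range (e.g. 10-30 %) to its mean value."""
    return (low + high) / 2.0


# One site, sampled twice (two sampling events at different dates and depths).
con.execute("INSERT INTO Site_Info_tbl VALUES (1, -18.25, 147.5)")
con.executemany(
    "INSERT INTO Sample_Event_tbl VALUES (?, ?, ?, ?)",
    [(10, 1, 1998, 5.0), (11, 1, 2016, 10.0)],
)
con.executemany(
    "INSERT INTO Bleaching_tbl VALUES (?, ?)",
    [(10, range_to_mean(10, 30)), (11, 55.0)],
)

# Join static site data to each sampling event and its bleaching estimate.
rows = con.execute("""
SELECT s.Site_ID, s.Latitude_Degrees, s.Longitude_Degrees,
       e.Date_Year, e.Depth_m, b.Percent_Bleached
FROM Site_Info_tbl AS s
JOIN Sample_Event_tbl AS e ON e.Site_ID = s.Site_ID
JOIN Bleaching_tbl AS b ON b.Sample_ID = e.Sample_ID
ORDER BY e.Date_Year
""").fetchall()
```

Each returned row pairs one sampling event with the coordinates of its site, which is the flat layout most analyses would start from.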
Normalization. Site coordinates that were not already in decimal degrees were converted to decimal degrees. The coordinates were entered into Google Earth, and the location names, distance to land in meters, and exposure were determined for each site. Exposure was defined based on a site's potential exposure to predominant winds, swell, and fetch (i.e., extent of open ocean). Sampling points that fell on land or were >1 km from any coral reef were removed. The Marine Ecoregions of the World (MEOW) shapefiles were used to determine the marine realm of each site 12 . Veron's ecoregions shapefiles were used to determine the ecoregion of each site 13 . Sea surface temperature variables from the Coral Reef Temperature Anomaly Database (CoRTAD version 6) were extracted for each sampling event 14 . CoRTAD values were only extracted for a sampling event if the coral bleaching data had a clearly defined month and year; where sampling events were missing the day of the month, the 15th day of the month was used. Cyclone frequency and turbidity data were added for each site 15 . For turbidity, we used 4-km resolution data from NASA's (National Aeronautics and Space Administration's) Earth Observing System Data and Information System (EOSDIS) MODIS-Aqua satellite database. We acquired these data from mid-2002 through to December 2017 (https://oceandata.sci.gsfc.nasa.gov/MODIS-Aqua/Mapped/Monthly/4km/Kd_490/). Cyclone data were collected from the International Best Track Archive for Climate Stewardship (IBTrACS; www.ncdc.noaa.gov/ibtracs/index.php?name=ibtracs-data) as spatial points and imported into R 11 . These data were subset into storm categories based on wind speed, according to the Saffir-Simpson scale 15 . A raster file for the spatial frequency of cyclones was made in Quantum Geographical Information Systems (QGIS) using the 'heatmap' function, with a radius matching the radius of damaging winds (>26 m s−1) for each cyclone category.
These radii followed Moyer et al. 16 and considered 50 years of consistent sampling effort, between 1964 and 2014. Individual yearly raster files were summed to determine the number of cyclones per 9.2 km cell for the 50-year period. A raster file for the frequency of cyclones was created by interpolating wind speeds across all storm tracks using the inverse distance weighted interpolation in QGIS 15 . The Atlantic and Gulf Rapid Reef Assessment (AGRRA) 17 and the Florida Reef Resilience Program (FRRP) 18 had bleaching codes that were presented by transect instead of by site; these data were averaged and presented here at the site level. We did not include coral cover estimates for AGRRA and FRRP because both sampling strategies were designed to estimate coral populations at regional scales and not specifically to examine coral cover on reefs. Average depths (m) were used for the Donner et al. 10 data that had ranges in depth.
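Three of the normalization steps above (coordinate conversion, the mid-month default for surveys without a recorded day, and averaging transect-level values to the site level) can be sketched as follows. This is an illustrative Python sketch, not the authors' processing code; the function names and argument layout are assumptions.

```python
from datetime import date
from statistics import mean


def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus N/S/E/W to signed decimal degrees."""
    value = abs(degrees) + minutes / 60.0 + seconds / 3600.0
    # Southern latitudes and western longitudes are negative.
    return -value if hemisphere.upper() in ("S", "W") else value


def survey_date(year, month, day=None):
    """Return the date used to look up temperature metrics.

    Returns None when the year or month is unknown (no extraction is
    attempted); a missing day defaults to the 15th of the month.
    """
    if year is None or month is None:
        return None
    return date(year, month, 15 if day is None else day)


def site_level_value(transect_values):
    """Average transect-level values into a single site-level value."""
    return mean(transect_values)
```

For example, `dms_to_decimal(18, 15, 0, "S")` gives a latitude of 18.25 degrees south as a negative decimal value, and `survey_date(2016, 3)` falls back to 15 March 2016.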

1) Site Information (Site_Info_tbl)
Latitude_Degrees: latitude coordinates in decimal degrees.
Longitude_Degrees: longitude coordinates in decimal degrees.
Ocean_Name: the ocean in which the sampling took place.
Realm_Name: identification of realm as defined by the Marine Ecoregions of the World (MEOW) 12 .
Ecoregion_Name: identification of the Ecoregions (150) as defined by Veron et al. 13 .
Country_Name: the country where sampling took place.
State_Island_Province_Name: the state, territory (e.g., Guam) or island group (e.g., Hawaiian Islands) where sampling took place.
City_Town_Name: the region, city, or nearest town, where sampling took place.
Site_Name: the accepted name of the site or the name given by the team that sampled the reef.
Distance_to_Shore: the distance (m) of the sampling site from the nearest land.
Exposure: a site was considered exposed if it had >20 km of fetch, if there were strong seasonal winds, or if the site faced the prevailing winds. Otherwise, the site was considered sheltered or 'sometimes'. 'Sometimes' refers to a few sites with a >20 km fetch through a narrow geographic window, and therefore we considered that the site was potentially exposed during cyclone seasons. We left the category 'sometimes' in the database because those sites were not clearly exposed sites, nor were they clearly sheltered sites, and future researchers may be interested in temporary exposure.
Turbidity: kd490 with a 100-km buffer.
Cyclone_Frequency: number of cyclone events from 1964 to 2014.
Comments: comments on any issues with the site or additional information.
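The Exposure rule above can be read as a small decision procedure. The sketch below is one possible interpretation in Python, not the procedure the authors used (exposure was determined by inspecting each site in Google Earth); the function name, argument names, and lowercase return labels are assumptions.

```python
def classify_exposure(fetch_km, strong_seasonal_winds=False,
                      faces_prevailing_winds=False, narrow_window=False):
    """Classify a site as 'exposed', 'sheltered', or 'sometimes'.

    narrow_window marks sites whose >20 km fetch is only open through a
    narrow geographic window (hypothetical flag for illustration).
    """
    # Strong seasonal winds or facing the prevailing winds: exposed.
    if strong_seasonal_winds or faces_prevailing_winds:
        return "exposed"
    if fetch_km > 20:
        # Long fetch through a narrow window only: potentially exposed
        # during cyclone seasons, so neither clearly exposed nor sheltered.
        return "sometimes" if narrow_window else "exposed"
    return "sheltered"
```

Keeping 'sometimes' as a separate label, as the database does, lets an analyst decide later whether to pool those sites with exposed or sheltered sites.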

2) Sample Event Information (Sample_Event_tbl)
Site_ID: site ID field from Site_Info_tbl.
Reef_ID: name of reef site that was adopted by sampling group (from ReefCheck

Percent_Bleaching_RC_Old_Method: old method of determining percent bleaching from Reef_Check.
Severity_Code: coded range of bleaching severity from Donner et al. 10

11) Data Source Information (Data_Source_LUT)
Data_Source: name of source of original data set.
Sample_Method: description of the sampling methods used to collect the data. If more than one method was used, we stated that an amalgamation of methods was used to collect the data; the original papers are listed in "Relevant_Papers_tbl" and can be referenced therein.

Database Queries
Fourteen summary queries have been created so researchers can easily extract the information they might need from the database and generate spreadsheets for data analysis. The queries are labelled sequentially. For example, a summary query has been generated that shows the sites, dates, mean coral cover, and mean bleaching, which is entitled "Query 1_Summary_Bleaching_Cover." Some queries are necessary for the summary queries and are labelled subqueries.
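To give a sense of what a site-by-date summary of mean coral cover and mean bleaching involves, the following Python sketch runs an aggregate query against a small in-memory SQLite database. It is not the stored "Query 1_Summary_Bleaching_Cover"; the table names Bleaching_tbl and Cover_tbl and the columns Sample_ID, Survey_Date, Percent_Bleached, and Percent_Hard_Coral are hypothetical stand-ins, and the rows are made-up illustration values.

```python
import sqlite3

# Toy stand-in tables; names and values are assumptions for illustration only.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Sample_Event_tbl (
    Sample_ID INTEGER PRIMARY KEY, Site_ID INTEGER, Survey_Date TEXT
);
CREATE TABLE Bleaching_tbl (Sample_ID INTEGER, Percent_Bleached REAL);
CREATE TABLE Cover_tbl (Sample_ID INTEGER, Percent_Hard_Coral REAL);
INSERT INTO Sample_Event_tbl VALUES
    (1, 100, '2016-03-15'), (2, 100, '2016-03-15'), (3, 200, '2010-07-15');
INSERT INTO Bleaching_tbl VALUES (1, 40.0), (2, 60.0), (3, 0.0);
INSERT INTO Cover_tbl VALUES (1, 30.0), (2, 20.0), (3, 45.0);
""")

# One row per site and date, averaging over all sampling events (e.g. depths)
# recorded for that site on that date. LEFT JOINs keep events that lack a
# cover or bleaching record; AVG ignores the resulting NULLs.
summary = con.execute("""
SELECT e.Site_ID,
       e.Survey_Date,
       AVG(c.Percent_Hard_Coral) AS Mean_Cover,
       AVG(b.Percent_Bleached)   AS Mean_Bleaching
FROM Sample_Event_tbl AS e
LEFT JOIN Bleaching_tbl AS b ON b.Sample_ID = e.Sample_ID
LEFT JOIN Cover_tbl     AS c ON c.Sample_ID = e.Sample_ID
GROUP BY e.Site_ID, e.Survey_Date
ORDER BY e.Site_ID
""").fetchall()
```

The sketch assumes at most one bleaching record and one cover record per sampling event; with several records per event, the two joins would multiply rows and the averages should instead be computed in separate subqueries, which is likely why the database distinguishes subqueries from summary queries.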

Technical Validation
The GCBD was curated by a Database Administrator (CK). No outside contributions are expected at this time. When coral bleaching datasets were added, there was a procedure to validate and standardize the site localities, including the following: