The Coral Trait Database, a curated database of trait information for coral species from the global oceans

Trait-based approaches advance ecological and evolutionary research because traits provide a strong link to an organism’s function and fitness. Trait-based research might lead to a deeper understanding of the functions of, and services provided by, ecosystems, thereby improving management, which is vital in the current era of rapid environmental change. Coral reef scientists have long collected trait data for corals; however, these are difficult to access and often under-utilized in addressing large-scale questions. We present the Coral Trait Database initiative that aims to bring together physiological, morphological, ecological, phylogenetic and biogeographic trait information into a single repository. The database houses species- and individual-level data from published field and experimental studies alongside contextual data that provide important framing for analyses. In this data descriptor, we release data for 56 traits for 1547 species, and present a collaborative platform on which other trait data are being actively federated. Our overall goal is for the Coral Trait Database to become an open-source, community-led data clearinghouse that accelerates coral reef research.


Background & Summary
Most ecosystems are rich in species that display a wide diversity of characteristics 1 (i.e., traits). One way to make meaningful generalizations from this diversity has been to identify physiological, ecological or functional traits of organisms to infer (e.g., using traits as explanatory variables) patterns of demography, distribution and abundance, and more broadly, ecosystem function and evolution 2 . Moreover, species traits can be used as explanatory variables for the responses of ecosystems to environmental change, as functionally significant traits mediate species' responses to disturbances 3 . Recently, research has demonstrated the utility of trait-based approaches for understanding the effects of anthropogenic disturbances 4 , the provisioning of ecosystem services 5 , species distributions 6-8 , species composition 9,10 , and energetic and ecological trade-offs 11,12 . In seminal papers, compilations of species trait data with broad taxonomic coverage have revealed, for example, a general axis of variation in plants that describes costs and benefits of key chemical, structural and physiological traits 11 ; and factors influencing the metabolic rates of organisms 13 . However, such broad-scale insights have been restricted to relatively few taxonomic groups, often due to lack of data, particularly information about the ecological context in which data were collected, when such data do exist. Trait data for stony corals (Cnidaria: Scleractinia) have been collected for more than 100 years and published in many languages. Sufficient data might well exist already for addressing broad-scale hypotheses regarding the ecology and evolution of corals. 
Although trait compilations are accumulating 4,14-16, and new statistical approaches for analysing such data are emerging 7,12, these datasets are typically gathered for specific traits in isolation to address specific questions, which can result in duplication of effort by separate research groups (e.g., Darling et al. 12 and Pratchett et al. 17 independently compiled growth rate data). Trait data also tend to be gathered rapidly, for instance with means extracted from tables that present a mixture of original data and data collected previously by others (i.e., meta-analyses). Such rapid assembly of data can result in omission of important contextual information (e.g., local environmental conditions and levels of variation and replication), confusion about the origin of the data that prevents appropriate attribution of provenance and credit 18, and accidental duplication of data points in large datasets.
In this data descriptor, we introduce the Coral Trait Database: a curated database of trait information for coral species from the global oceans. The goals of the Coral Trait Database are: (i) to assemble disparate information on coral traits, (ii) to provide unrestricted, open-source access to coral trait data, (iii) to facilitate and encourage the appropriate crediting of original data sources, and (iv) to engage the reef coral research community in the collection and quality control of trait data. We release 56 error-checked, validated and referenced traits, and also provide their context of measurement, together with an online system for transparently and accurately archiving and presenting coral trait data in future research. Our vision is an inclusive and accessible data resource to more rapidly advance the science and management of a sensitive ecosystem at a time of unprecedented environmental change.

Methods
The data are held in the Coral Trait Database (https://coraltraits.org). The database was designed to contain individual-level traits and species-level characteristics and is currently focused on shallow water zooxanthellate ('reef building') scleractinian corals. Individual-level traits include any potentially heritable quality of an organism 19,20. In the database, individual-level traits are accompanied by contextual characteristics, which give information about the environment or situation in which an individual-level trait was measured (e.g., characteristics of the habitat, seawater or an experiment). These contextual variables are important for understanding variation in individual-level traits (e.g., as predictor variables in analyses). For example, if colony growth rate was measured at a given depth, the depth is included as context for the focal measurement. Some individual-level traits have little or no variation (e.g., mode of larval development), and therefore contextual information is not required. Species-level characteristics do not have contextual information because they are characteristics of species as entities (such as geographical range size and maximum depth observed). For simplicity, we use the single term 'trait' to refer to individual-level (variant and invariant), species-level (emergent) and contextual (environmental or situational) measurements. Moreover, these traits are grouped into ten use-classes based on various sub-disciplines of reef coral research: biomechanical, conservation, ecological, geographical, morphological, phylogenetic, physiological, reproductive, stoichiometric, and contextual.

Observation and measurements
The database contains two core data tables, Observations and Measurements, each of which has a series of associated tables (Fig. 1). We follow the high-level structure of the Observation and Measurement Ontology 21 in that observations bind related measurements and potentially provide context for other observations. The Observations table contains information about the observation of a coral or coral species. Observation-level data must include the Enterer, Species, Location and Resource. Access is an optional variable that can be controlled by database users entering data for a project that has not yet been published (see https://coraltraits.org/procedures for more information). Observation-level data are the same for all measurements corresponding to the observation. Measurement-level data include the Trait, Value, Standard (measurement unit), Methodology, and estimates of precision (if applicable). The hypothetical example given in Fig. 1b is for a growth rate that was measured within the context of a water depth and habitat that were given in the published resource.
The Species table provides taxonomy that is regularly updated by the Taxonomy Advisory Board (https://coraltraits.org/procedures) to keep pace with the rapid rate of revision 22-24. The table contains the valid name for each coral species based largely on the World Register of Marine Species (http://www.marinespecies.org), the major clade (Basal, Robust or Complex 25), family based on molecular work 26, family based on morphology (following Cairns 27 or Veron 28), and other names and synonyms.
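The relationship between the two core tables can be sketched in a few lines of Python. This is an illustration only, not the database's actual schema: the class layout, the `private` flag standing in for the Access variable, the `precision_type` field and all example values are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Measurement:
    trait: str                            # focal trait or contextual trait
    value: str                            # text, so numeric and categorical values share a field
    standard: str                         # measurement unit
    methodology: Optional[str] = None
    precision: Optional[float] = None     # estimate of precision, if applicable
    precision_type: Optional[str] = None  # hypothetical field, e.g. "standard deviation"


@dataclass
class Observation:
    enterer: str
    species: str
    location: str
    resource: str
    private: bool = False                 # hypothetical stand-in for the optional Access variable
    measurements: List[Measurement] = field(default_factory=list)

    def context(self, focal_trait: str) -> List[Measurement]:
        """Return the measurements that accompany the focal trait."""
        return [m for m in self.measurements if m.trait != focal_trait]


# Invented example values, loosely following the growth-rate example in Fig. 1b.
obs = Observation(
    enterer="A. Contributor",
    species="Acropora hyacinthus",
    location="Example Reef",
    resource="Example et al. (hypothetical)",
)
obs.measurements.append(Measurement("Growth rate", "10.5", "mm per year"))
obs.measurements.append(Measurement("Water depth", "5", "m"))
obs.measurements.append(Measurement("Habitat type", "reef crest", "category"))

context_traits = [m.trait for m in obs.context("Growth rate")]
print(context_traits)
```

Because observation-level fields are stored once and shared by every measurement in the observation, the depth and habitat measurements stay attached to the growth rate they qualify.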

Data acquisition
All public data in the Coral Trait Database and included in this data descriptor release are linked with published resources, which include peer-reviewed papers, taxonomic monographs and books. The original source of entered data must be included (called the primary resource), even when extracted from secondary compilations (e.g., for the purpose of meta-analyses). Secondary sources can be included optionally, and so the database captures both the original data collector and subsequent data compilers, which allows both to be credited when re-using data. Measurement value types, which can be flexibly added to, currently include: raw, mean, median, maximum, minimum, expert opinion (the view of a single expert), group opinion (the consensus of a group of experts), and model derived. Continuous data are typically means extracted from tables or figures unless raw data are available. When available, aggregate values such as means and medians should be accompanied by the number of replicates and a measure of dispersion (e.g., standard deviation). Means and estimates of dispersion from figures in resources were captured using ImageJ 29 . The data released in this data descriptor have broad taxonomic (Fig. 2), global (Fig. 3) and phylogenetic (Fig. 4) coverage. However, some large data gaps exist, because few species have been comprehensively measured in many locations.
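The entry conventions described above lend themselves to a simple completeness check. The following is a minimal sketch and not part of the database software: the record layout and the key names (`value_type`, `replicates`, `dispersion`, `primary_resource`) are assumptions, and the set of value types is the current list given above, which may be extended.

```python
# Value types listed in the text; the database allows this list to grow.
VALUE_TYPES = {
    "raw", "mean", "median", "maximum", "minimum",
    "expert opinion", "group opinion", "model derived",
}
AGGREGATES = {"mean", "median"}


def check_record(record):
    """Return a list of warnings for one measurement record (a dict)."""
    warnings = []
    value_type = record.get("value_type")
    if value_type not in VALUE_TYPES:
        warnings.append(f"unknown value type: {value_type!r}")
    # Aggregate values should carry replicates and a measure of dispersion.
    if value_type in AGGREGATES:
        for key in ("replicates", "dispersion"):
            if record.get(key) is None:
                warnings.append(f"{value_type} without {key}")
    # The primary resource is mandatory; secondary resources are optional.
    if not record.get("primary_resource"):
        warnings.append("missing primary resource")
    return warnings


complete = {"value_type": "mean", "replicates": 12, "dispersion": 1.3,
            "primary_resource": "Example et al. (hypothetical)"}
incomplete = {"value_type": "mean",
              "primary_resource": "Example et al. (hypothetical)"}
print(check_record(complete))
print(check_record(incomplete))
```

The check returns warnings rather than raising errors because replicates and dispersion are requested only "when available".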

Data Records
A static release of the 56 traits contained in this descriptor is available from the Coral Trait Database (Data Citation 1) and Figshare (Data Citation 2). Details and references for the trait data are summarised in Table 1 (available online only). Up-to-date data can be downloaded directly from the database. However, because validation (see Technical Validation, below) and data entry are ongoing, users are advised to use the static releases to ensure that results remain consistent as the database is updated. Both static releases and datasets downloaded from the database are accompanied by the primary (and, if applicable, secondary) resource lists for the data, which should be credited wherever feasible.

Technical Validation
The database is curated on a voluntary basis by a Managerial Board, Editorial Board, Taxonomy Advisory Board and Database Administrator (https://coraltraits.org/procedures). Database Contributors who add data for a new trait are typically asked to be that trait's editor. Quality control of data and editorial procedures include:
Contributor approval: Database users must request permission to become a database contributor, and any observations entered by the contributor are associated with their user account.
Editorial approval: Once a contributor enters an observation of a coral trait, an email is sent automatically to the editor of that trait. The editor must approve the observation to remove the 'pending' flag from the observation record.
User feedback: Data issues can be reported for any observation using a simple form. Editors are automatically emailed if an issue with one of their traits is reported.
Duplicate detection: Measurements with the same value, resource, location and species are flagged for confirmation.
Outlier detection: Frequency histograms are generated in real time when loading trait pages. Outliers can be detected visually (e.g., a very large value for continuous data or a category that has one or few associated measurements for categorical data).
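The duplicate and outlier checks can be illustrated with a short sketch. It is not the database's own implementation: the record keys and function names are assumptions, and the rare-category count is only a programmatic analogue of the visual histogram inspection described above.

```python
from collections import Counter, defaultdict


def flag_duplicates(measurements):
    """Group record indices sharing value, resource, location and species."""
    groups = defaultdict(list)
    for idx, m in enumerate(measurements):
        key = (m["value"], m["resource"], m["location"], m["species"])
        groups[key].append(idx)
    # Only groups with more than one member need confirmation.
    return [ids for ids in groups.values() if len(ids) > 1]


def rare_categories(values, max_count=1):
    """Return categories with at most max_count measurements."""
    counts = Counter(values)
    return sorted(c for c, n in counts.items() if n <= max_count)


# Invented records for illustration.
records = [
    {"value": "10.5", "resource": "R1", "location": "Reef A", "species": "Sp1"},
    {"value": "10.5", "resource": "R2", "location": "Reef A", "species": "Sp1"},
    {"value": "10.5", "resource": "R1", "location": "Reef A", "species": "Sp1"},
]
duplicates = flag_duplicates(records)
rare = rare_categories(["spawner"] * 3 + ["brooder"] * 2 + ["broder"])
print(duplicates, rare)
```

Flagged groups are candidates for confirmation rather than automatic deletion, since two genuine measurements can share the same value.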

Usage Notes
The data release is a compressed folder containing two files:
1. A csv-formatted data file containing all publicly available observation and measurement data, which includes contextual data.
2. A csv-formatted resource file containing all the resources (primary and secondary) that correspond with the data.
Users are expected to cite the data correctly using these resources.
An example of extracting and reshaping release data for analysis can be found online (https://coraltraits.org/procedures).
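As a rough indication of what such reshaping involves, the sketch below converts long-format records (one row per measurement) into one record per observation. It is not the example from the procedures page: the column names (`observation_id`, `species`, `trait`, `value`) and the inline sample data are assumptions, and the real release file should be inspected for its actual headers.

```python
import csv
import io
from collections import defaultdict

# Invented sample standing in for the release data file.
SAMPLE = """observation_id,species,trait,value
1,Acropora hyacinthus,Growth rate,10.5
1,Acropora hyacinthus,Water depth,5
2,Porites lobata,Growth rate,8.0
"""


def long_to_wide(rows):
    """Collect the measurements of each observation into a single dict."""
    wide = defaultdict(dict)
    for row in rows:
        record = wide[row["observation_id"]]
        record["species"] = row["species"]
        # A trait repeated within one observation would overwrite the earlier value.
        record[row["trait"]] = row["value"]
    return dict(wide)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
wide = long_to_wide(rows)
for observation_id, record in wide.items():
    print(observation_id, record)
```

Reshaping by observation keeps each focal measurement alongside its contextual measurements, so context such as depth can be used directly as a predictor variable.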