Individual back-calculated size-at-age based on otoliths from Pacific coral reef fish species

Somatic growth is a critical biological trait for organismal, population, and ecosystem-level processes. Due to its direct link with energetic demands, growth also represents an important parameter to estimate energy and nutrient fluxes. For marine fishes, growth rate information is most frequently derived from sagittal otoliths, and most of the available data stem from studies on temperate species that are targeted by commercial fisheries. Although the analysis of otoliths is a powerful tool to estimate individual growth, the time-consuming nature of otolith processing is one barrier to the collection of comprehensive datasets across multiple species. This is especially true for coral reef fishes, which are extremely diverse. Here, we provide back-calculated size-at-age estimates (including measures of uncertainty) based on sagittal otoliths from 710 individuals belonging to 45 coral reef fish species from French Polynesia. In addition, we provide Von Bertalanffy growth parameters, which are useful to predict community-level biomass production.
Measurement(s): growth. Technology Type(s): otolithometry. Sample Characteristic - Organism: Actinopterygii. Sample Characteristic - Environment: coral reef. Sample Characteristic - Location: French Polynesia.
Machine-accessible metadata file describing the reported data: https://doi.org/10.6084/m9.figshare.13027817


Background & Summary
Anthropogenic disturbances, such as resource exploitation, pollution, and climate change, can significantly alter the structure and function of marine ecosystems [1][2][3]. Species differ in their contributions to ecological processes 4,5; thus, accurately gauging the susceptibility of ecosystems to disturbances requires high-resolution data on life history traits across a broad suite of species, especially in highly diverse ecosystems 1,6,7. Somatic growth, the increase of size (and weight) over time, is a critical trait to gauge biological processes that range from individuals to entire ecosystems. For fishes, this trait is particularly important because it links past, present, and future population trajectories in the context of fisheries and stock management; thus, it directly pertains to the provision of ecosystem services. Moreover, somatic growth rate is directly correlated with the energetic demands of organisms. As such, it underlies bioenergetic models that quantify energetic fluxes from individuals to ecosystems [8][9][10], such as biomass production [11][12][13] and nutrient cycling 14,15. Quantifying somatic growth offers an opportunity to examine ecosystem function based on rates of ecological processes rather than employing traditional variables such as abundance or standing biomass 12,16. Numerous temperate species have been extensively studied due to their commercial importance, but less information exists for the majority of coral reef species 17. Reef fishes are extremely diverse, display a wide range of life history strategies, and provide an invaluable food source to millions of people in the world's tropics. Therefore, a detailed understanding of reef fish growth rates is critical.
Fish growth parameters can be estimated using several approaches, but those that link age to body size are the most common. Growth can be measured from features preserved in hard structures, such as scales, vertebrae, fin spines, cleithra, opercula, and otoliths 18. Otoliths grow with the deposition of successive calcium carbonate layers, which respond to both circadian and seasonal rhythms [19][20][21][22]. Fish growth parameters can then be obtained with various models such as Gompertz, Logistic, or Von Bertalanffy (with the latter being the most commonly used approach) 23. Such growth models can only be fitted based on a large number of individuals that cover the complete size range of the study species. However, due to the required sample sizes and the need for lethal sampling, obtaining such datasets is time-consuming. Further, the raw data that permit size-at-age estimates are often unpublished, available only from technical reports, and/or available for a limited suite of commercial species. Multi-species growth curve comparisons are particularly rare, especially across a wide range of environmental conditions that may influence individual growth rates. Therefore, a back-calculation model that estimates fish size across previous ages based on otoliths offers an alternative way to model growth 24.
Here, we provide a comprehensive dataset of raw otolith reads (51 species, 855 individuals) for coral reef fishes, collected across six islands in French Polynesia. Further, we provide the back-calculated size-at-age by species (45 species, 710 individuals) and by species across multiple locations (44 species, 669 individuals), using a Bayesian back-calculation model inspired by Vigliola and Meekan 24. The inclusion of back-calculated size-at-age values alongside the raw data allows users to fit any regression model in line with their scientific question (Fig. 1). Finally, we provide Von Bertalanffy growth parameters estimated within a Bayesian framework, both by species and by species across multiple locations (when possible).

Study locations.
Extending over 2,500,000 km², French Polynesia includes 118 islands spread across five archipelagos: the Society Islands, Tuamotus, Marquesas, Austral Islands and Gambiers. We collected data across four archipelagos, including six distinct islands: Mo'orea and Manuae (Society Islands), Hao and Mataiva (Tuamotus), Mangareva (Gambiers), and Nuku Hiva (Marquesas) (Fig. 2). All fishes were collected in the lagoon and/or reef slope, depending on the accessibility of the respective habitats. Sea surface temperatures (SST) vary substantially among these six islands (Table 1). Additional fishes from Mataiva were bought at the fish market in Tahiti. All applicable international, national, and/or institutional guidelines for the care and use of animals were followed.
Taxonomy and systematics. Fishes were identified using Bacchet et al. 25 and Moore and Colas 26 .
Permits. Sample collection was permitted by the French Polynesian government (authorization number: 681MCE/ENV).
Research methods. Field/Laboratory. In the laboratory, total length (TL) was measured to the nearest millimeter, and fishes were weighed to the nearest 0.1 grams. Then, pairs of sagittae (the largest otoliths of the inner ear) were extracted, cleaned with distilled water, dried, and stored in microtubes.
For each species, otoliths were cut transversely using a diamond disc saw (Presi Mecatome T210) to obtain a section of 500 µm. Sections were then fixed on a glass slide with thermoplastic glue (Crystalbond™). Small otoliths were directly embedded in the thermoplastic glue and polished to obtain a transversal section. Otoliths were sanded with abrasive discs of decreasing grain size (2,400 and 1,200 grains cm⁻²) and polished with a 0.25 µm diamond suspension to reach the nucleus. All sections were photographed under a Leica DM750 light microscope with a Leica ICC50 HD microscope camera and LAS software (Leica Microsystems). When sections were too large for a single photograph, multiple photographs were taken and assembled with the software Photostitch (Canon).
A standardized transect across the otoliths (from the nucleus to the edge) was chosen for each species. On this transect, fish age was estimated and distances between annual growth increments were measured using the software ImageJ (Supplementary File 1). The age estimation was performed twice by two independent researchers to prevent biases induced by a single observer. When the coefficient of variation between the two observers was greater than 5%, a common reading was assessed for each section 21 .
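The reader-agreement check described above can be sketched in a few lines of Python. This is only an illustration: it assumes the coefficient of variation is the sample standard deviation of the two age estimates divided by their mean, and the function names and the 5% default are hypothetical helpers rather than part of the published workflow.

```python
import statistics

def reader_cv_percent(age_a, age_b):
    """Coefficient of variation (%) between two age estimates for one otolith.

    Assumption: CV = 100 * sample SD / mean of the two reads.
    """
    reads = [age_a, age_b]
    mean = statistics.mean(reads)
    if mean == 0:
        # Both reads are zero (perfect agreement) or the CV is undefined.
        return 0.0 if age_a == age_b else float("inf")
    return 100.0 * statistics.stdev(reads) / mean

def needs_common_reading(age_a, age_b, threshold=5.0):
    """True when the two reads differ by more than the threshold (in %)."""
    return reader_cv_percent(age_a, age_b) > threshold
```

Under this definition, reads of 10 and 11 years give a CV of about 6.7% and would trigger a common reading, whereas reads of 20 and 21 years (about 3.4%) would not.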
Back-calculation. We then used a back-calculation procedure 24 to estimate fish length at previous ages, which we modified to also quantify the uncertainty around the obtained length estimates. This method requires an examination of the shape of the relationship between the length at capture ($L_{cpt}$) and the otolith radius at capture ($R_{cpt}$) across all samples, as follows:

$$L_{cpt} = L_{0p} - b\,R_{0p}^{c} + b\,R_{cpt}^{c} \qquad (1)$$

where $L_{0p}$ and $R_{0p}$ are the fish length and otolith radius at hatching, and b and c are estimated parameters.

For some individuals, it was not possible to measure the $R_{0p}$ value. Nevertheless, these individuals were still included in the back-calculation model. To do so, we treated all missing $R_{0p}$ values as model parameters that are estimated in the posterior 28. Specifically, these missing $R_{0p}$ values were modelled simultaneously with the known $R_{0p}$ values, so that their prior distribution was defined by the distribution of the known $R_{0p}$ values. These prior distributions were then updated with the information provided by the aforementioned relationship (Eq. 1). Consequently, each missing $R_{0p}$ value had a unique posterior distribution.
For each of the 4,000 iterations used to fit the models, we used parameters b and c (Eq. 1) to derive a third parameter, a (Eq. 2):

$$a = L_{0p} - b\,R_{0p}^{c} \qquad (2)$$

Next, the back-calculation with the Modified Fry (MF) model (Eq. 3) 29 was applied to quantify fish lengths at all ages for each individual, using parameter a for each iteration:

$$L_{i} = a + \exp\left( \ln(L_{0p} - a) + \frac{\left[\ln(L_{cpt} - a) - \ln(L_{0p} - a)\right]\left[\ln(R_{i}) - \ln(R_{0p})\right]}{\ln(R_{cpt}) - \ln(R_{0p})} \right) \qquad (3)$$

where $L_{i}$ and $R_{i}$ are the fish length and otolith radius at age i, and $L_{0p}$ and $R_{0p}$ are the fish length and otolith radius at hatching. $L_{0p}$ is provided for each species (Online-only Table 1). We calculated $L_{i}$ for the species that had sufficient replicates and, when possible, also per species in each location separately. The estimation of parameters b and c (Eq. 1) required at least two values of $R_{0p}$, so the back-calculation was not carried out when only one $R_{0p}$ was available for a given species (or a given species in a certain location).
Individuals with an estimated age at capture of one year were not used for back-calculation. Finally, we reported the averages and standard deviations of those length estimates based on the 4,000 iterations. As such, the back-calculated estimates include a measure of uncertainty that can be integrated in future applications.
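The per-iteration back-calculation and its summary can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: it assumes the Modified Fry formulation as commonly given by Vigliola and Meekan, and the function names and the `draws` argument (a list of posterior (b, c) pairs) are hypothetical.

```python
import math
import statistics

def intercept_a(l0p, r0p, b, c):
    """Assumed form of Eq. 2: a = L0p - b * R0p**c."""
    return l0p - b * r0p ** c

def modified_fry_length(r_i, r_cpt, l_cpt, r0p, l0p, a):
    """Assumed form of Eq. 3: back-calculated length at age i (Modified Fry)."""
    log_length_span = math.log(l_cpt - a) - math.log(l0p - a)
    radius_fraction = (math.log(r_i) - math.log(r0p)) / (
        math.log(r_cpt) - math.log(r0p)
    )
    return a + math.exp(math.log(l0p - a) + log_length_span * radius_fraction)

def summarise_over_draws(r_i, r_cpt, l_cpt, r0p, l0p, draws):
    """Mean and SD of the back-calculated length across posterior (b, c) draws."""
    lengths = [
        modified_fry_length(r_i, r_cpt, l_cpt, r0p, l0p,
                            intercept_a(l0p, r0p, b, c))
        for b, c in draws
    ]
    return statistics.mean(lengths), statistics.stdev(lengths)
```

A useful sanity check on this form is that it returns the length at hatching when the radius equals the radius at hatching, and the length at capture when the radius equals the radius at capture, whatever the value of a.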
Von Bertalanffy growth curves. The Von Bertalanffy growth model (Eq. 4) is the most frequently used model to describe fish growth. This model is defined as:

$$L_{t} = L_{\infty}\left(1 - e^{-K(t - t_{0})}\right) \qquad (4)$$

where $L_{t}$ is the average length at age t, $L_{\infty}$ is the asymptotic average length, K is the growth rate coefficient, and $t_{0}$ is the age when the average length was zero. To validate the accuracy of our back-calculated size-at-age data, we compared growth curves fitted with raw data (total length at capture and estimated age at capture) to those fitted with back-calculated data. As back-calculated size-at-age data within individuals are highly auto-correlated, we designed a Bayesian hierarchical model that takes this auto-correlation into account by fitting individual growth curves as well as an average population-level growth curve. The model was applied to back-calculated data for species with at least five individuals and for individuals with an age at capture greater than two years.
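As a point of reference for the growth curve described above, the standard Von Bertalanffy function can be evaluated with a short Python sketch. The parameter values below are hypothetical and chosen only for illustration; they are not estimates from this dataset, and the sketch does not reproduce the hierarchical Bayesian fit.

```python
import math

def von_bertalanffy(t, l_inf, k, t0):
    """Average length at age t: L_inf * (1 - exp(-K * (t - t0)))."""
    return l_inf * (1.0 - math.exp(-k * (t - t0)))

# Hypothetical parameter values (illustration only, not from the dataset).
L_INF_MM = 200.0   # asymptotic average length
K_PER_YEAR = 0.5   # growth rate coefficient
T0_YEARS = -0.1    # age at which the average length is zero

# Predicted average lengths at ages 0 to 5 years.
predicted = [von_bertalanffy(age, L_INF_MM, K_PER_YEAR, T0_YEARS)
             for age in range(6)]
```

The predicted lengths increase with age and approach, but never exceed, the asymptotic length.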
We fitted models both for each species and for each species per location. In all models, we used informative priors for growth parameters extracted from FishBase (https://www.fishbase.se/search.php). We ran models with 2,000 iterations and a warmup of 1,000. When $\hat{R}$ was above one, indicating non-convergence of the Markov chain Monte Carlo (MCMC) chains, we ran the models again, increasing the number of iterations to 4,000 with a warmup of 2,000. If model convergence was still not achieved, we used MCMC chain plots of the model parameters to identify and remove the individual(s) responsible for non-convergence.
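To make the convergence criterion concrete, the sketch below computes the classic Gelman-Rubin potential scale reduction factor for a single parameter from several chains. It is an assumption-laden illustration: Stan-based tools report a refined (split-chain) variant of this statistic, and the function name is hypothetical.

```python
import statistics

def gelman_rubin_rhat(chains):
    """Classic potential scale reduction factor for one parameter.

    chains: list of equal-length lists of post-warmup draws (at least two chains).
    """
    n = len(chains[0])
    chain_means = [statistics.mean(chain) for chain in chains]
    # W: average within-chain variance.
    within = statistics.mean([statistics.variance(chain) for chain in chains])
    # B: between-chain variance, scaled by the chain length.
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5
```

Chains that explore the same distribution give values close to one, whereas chains stuck in different regions give values well above one.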
As a comparison, we also ran a general non-linear Bayesian model on the raw data (i.e. using size and age at capture only). Back-calculated data contain more points (multiple points for each individual) than raw data (one point per individual), so the comparison was limited to the species with a sufficient number of individuals (n > 10) and a sufficient age range in the raw data. These models were run using the package brms 30.
All analyses were done with the software R v.3.6.3 31.

Technical Validation
• The validity of fish names and families was verified on the World Register of Marine Species (WoRMS; http://www.marinespecies.org/index.php) and FishBase (https://www.fishbase.in/search.php).
• Each otolith was read twice by two readers to limit observer biases for age estimations. When the coefficient of variation between observers was greater than 5%, a common reading was assessed for each section 21. Moreover, for each species, we provide a photograph of an otolith section with annual increments and reading axes (Supplementary File 1).
• To validate the accuracy of back-calculated data, growth curves fitted on back-calculated size-at-age were compared to those from raw data (total length at capture and estimated age at capture) (Fig. 3). This comparison was not possible for a species when the number of collected individuals was too low to fit a growth curve. Comparisons were possible for fifteen species, and for each of them, the 95% credible intervals overlapped between growth curves fitted on back-calculated data versus raw data, suggesting negligible differences between the two approaches (Fig. 3). Moreover, for all species, the curves from back-calculated data were always below those from raw data, indicating no overestimation of $L_{\infty}$. Further, because the back-calculated lengths also include the lengths at age zero, the length at hatching is more realistically represented in the regression models from the back-calculated data. Consequently, when using back-calculation, estimates of K tend to be higher and estimates of $L_{\infty}$ tend to be lower. The Von Bertalanffy growth parameters from our back-calculated size-at-age data by species and by species across multiple locations are available online (Online-only Table 2).
• Further, Von Bertalanffy growth parameters estimated from otoliths were extracted from published articles, book chapters, reports and Ph.D. theses and compared to back-calculated parameters from our study (Online-only Table 3). For most species, the growth parameters from our study were similar to those in the literature. Differences may stem from different geographical locations (different temperatures, primary productivity, etc.), the number of analyzed fishes, different length measurements (standard, fork, or total length), or variations in modeling approaches.
• Finally, we compared our age estimates to the maximum ages reported in the literature (Online-only Table 3).

Usage Notes
The dataset is provided as a csv file, which can be directly used by most statistical software. It contains eighteen variables, as described in Table 2. Additional growth parameters can be obtained by fitting other growth models (e.g. Gompertz model) using the variables 'Age i ' and 'Li_sp_m' (species across all locations) or 'Li_sploc_m' (species by location). Back-calculated data are highly auto-correlated, so we recommend using a hierarchical structure to fit growth models.
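A minimal Python sketch of how the csv file might be read and grouped by individual before fitting a growth model is given below. The column names ('ID', 'Agei', 'Li_sp_m') are assumed from the variable names in the text and may be spelled differently in the released file. The inline sample is hypothetical and stands in for the real file, and 'NA' entries are treated as missing values.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt; real column names and values may differ.
SAMPLE = """ID,Agei,Li_sp_m
F1,0,2.1
F1,1,85.3
F1,2,NA
F2,0,2.0
F2,1,90.4
"""

def parse_value(text):
    """Convert a csv field to float, mapping 'NA' to None."""
    return None if text == "NA" else float(text)

def size_at_age_by_individual(handle):
    """Group (age, back-calculated length) pairs by individual, skipping NAs."""
    series = defaultdict(list)
    for row in csv.DictReader(handle):
        length = parse_value(row["Li_sp_m"])
        if length is None:
            continue
        series[row["ID"]].append((int(row["Agei"]), length))
    return dict(series)

growth_series = size_at_age_by_individual(io.StringIO(SAMPLE))
```

Keeping one series per individual preserves the grouping that a hierarchical growth model needs to account for the auto-correlation among back-calculated lengths.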
Within the dataset, 'NA' indicates a missing value. Missing values are present for the variables 'R i ' (n = 387), 'R 0p ' (n = 2,811), 'Li_sp_m' (n = 410), 'Li_sp_sd' (n = 410), 'Li_sploc_m' (n = 757), 'Lp_sploc_sd' (n = 757), and 'Weight' (n = 603). For the variable 'R i ' , missing values correspond to individuals for which it was impossible to estimate the radius at hatching from photographs. The 'R 0p ' values correspond to 'R i ' values where 'Age i ' is equal to zero. Because the 'R 0p ' value is the same for all 'Age i ' of a given individual ('ID'), a large number of NAs arises as soon as the 'R i ' value