## Background & Summary

Anthropogenic disturbances, such as resource exploitation, pollution, and climate change, can significantly alter the structure and function of marine ecosystems1,2,3. Species differ in their contributions to ecological processes4,5; thus, accurately gauging the susceptibility of ecosystems to disturbances requires high-resolution data on life history traits across a broad suite of species, especially in highly diverse ecosystems1,6,7. Somatic growth, the increase of size (and weight) over time, is a critical trait to gauge biological processes that range from individuals to entire ecosystems. For fishes, this trait is particularly important because it links past, present, and future population trajectories in the context of fisheries and stock management; thus, it directly pertains to the provision of ecosystem services. Moreover, somatic growth rate is directly correlated with the energetic demands of organisms. As such, it underlies bioenergetic models that quantify energetic fluxes from individuals to ecosystems8,9,10, such as biomass production11,12,13 and nutrient cycling14,15. Quantifying somatic growth offers an opportunity to examine ecosystem function based on rates of ecological processes rather than employing traditional variables such as abundance or standing biomass12,16. Numerous temperate species have been extensively studied due to their commercial importance, but less information exists for the majority of coral reef species17. Reef fishes are extremely diverse, display a wide range of life history strategies, and provide an invaluable food source to millions of people in the world’s tropics. Therefore, a detailed understanding of reef fish growth rates is critical.

Fish growth parameters can be estimated using several approaches, but those that link age to body size are the most common. Growth can be measured from features preserved in hard structures, such as scales, vertebrae, fin spines, cleithra, opercula, and otoliths18. For teleost fishes, the most commonly used and reliable approach to estimate age is the analysis of growth rings found on otoliths. Otoliths are calcified structures of the inner ear that grow with the deposition of successive calcium carbonate layers, which respond to both circadian and seasonal rhythms19,20,21,22. Fish growth parameters can then be obtained with various models such as Gompertz, Logistic, or Von Bertalanffy (with the latter being the most commonly used approach)23. Such growth models can only be fitted based on a large number of individuals that cover the complete size range of the study species. However, due to the required sample sizes and the need for lethal sampling, obtaining such datasets is time-consuming. Further, the raw data that permit size-at-age estimates are often unpublished, available only from technical reports, and/or available for a limited suite of commercial species. Multi-species growth curve comparisons are particularly rare, especially across a wide range of environmental conditions that may influence individual growth rates. Therefore, a back-calculation model that estimates fish size at previous ages from otoliths offers an alternative way to model growth24.

Here, we provide a comprehensive dataset of raw otolith reads (51 species, 855 individuals) for coral reef fishes, collected across six islands in French Polynesia. Further, we provide the back-calculated size-at-age by species (45 species, 710 individuals) and by species across multiple locations (44 species, 669 individuals), using a Bayesian back-calculation model inspired by Vigliola and Meekan24. The inclusion of back-calculated size-at-age values alongside the raw data allows users to fit any regression model in line with their scientific question (Fig. 1). Finally, we provide Von Bertalanffy growth parameters estimated within a Bayesian framework, both by species and by species across multiple locations (when possible).

## Methods

### Study locations

Extending over 2,500,000 km², French Polynesia includes 118 islands spread across five archipelagos: the Society Islands, Tuamotus, Marquesas, Austral Islands and Gambiers. We collected data across four archipelagos, including six distinct islands: Mo’orea and Manuae (Society Islands), Hao and Mataiva (Tuamotus), Mangareva (Gambiers), and Nuku Hiva (Marquesas) (Fig. 2). All fishes were collected in the lagoon and/or reef slope, depending on the accessibility of the respective habitats. Sea surface temperatures (SST) vary substantially among these six islands (Table 1).

### Sampling design

Fishes were collected from Mo’orea (March 2016, March 2018, July 2018, and November 2018), Manuae (December 2014), and Nuku Hiva (August 2016 and March 2017) by spearfishing and clove oil, while fishes were collected from Hao (March 2017 and July 2017) and Mangareva (June 2018) only by spearfishing. Additional fishes from Mataiva were bought at the fish market in Tahiti. All applicable international, national, and/or institutional guidelines for the care and use of animals were followed.

#### Taxonomy and systematics

Fishes were identified using Bacchet et al.25 and Moore and Colas26.

#### Permits

Sample collection was permitted by the French Polynesian government (authorization number: 681MCE/ENV).

### Research methods

#### Field/Laboratory

In the laboratory, total length (TL) was measured to the nearest millimeter, and fishes were weighed to the nearest 0.1 g. Then, pairs of sagittae (the largest otoliths of the inner ear) were extracted, cleaned with distilled water, dried, and stored in microtubes.

For each species, otoliths were cut transversely, using a diamond disc saw (Presi Mecatome T210) to obtain a section of 500 µm. Sections were then fixed on a glass slide with thermoplastic glue (Crystalbond™). Small otoliths were directly embedded in the thermoplastic glue and polished to obtain a transversal section. Otoliths were sanded with abrasive discs of decreasing grain size (2,400 and 1,200 grains cm⁻²) and polished with a 0.25 µm diamond suspension to reach the nucleus. All sections were photographed under a Leica DM750 light microscope with a Leica ICC50 HD microscope camera and LAS software (Leica Microsystems). When sections were too large for a single photograph, multiple photographs were taken and assembled with the software Photostitch (Canon).

A standardized transect across the otoliths (from the nucleus to the edge) was chosen for each species. On this transect, fish age was estimated and distances between annual growth increments were measured using the software ImageJ (Supplementary File 1). The age estimation was performed twice by two independent researchers to prevent biases induced by a single observer. When the coefficient of variation between the two observers was greater than 5%, a common reading was assessed for each section21.
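The agreement check between the two readers can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and it assumes the coefficient of variation is the sample standard deviation of the two age estimates divided by their mean, which the text does not specify.

```python
from statistics import mean, stdev


def reading_cv(age_reader_1, age_reader_2):
    """Coefficient of variation (%) between two age readings of one section.

    Assumption: CV = sample SD / mean * 100 (the source does not state
    which SD estimator was used).
    """
    ages = [float(age_reader_1), float(age_reader_2)]
    average = mean(ages)
    if average == 0:
        # Both readers counted zero increments: perfect agreement.
        return 0.0
    return 100.0 * stdev(ages) / average


def needs_common_reading(age_reader_1, age_reader_2, threshold_pct=5.0):
    """True when the two readings disagree by more than the 5% CV threshold."""
    return reading_cv(age_reader_1, age_reader_2) > threshold_pct


# Hypothetical readings: a one-year disagreement matters more for young fish.
young_fish_flagged = needs_common_reading(10, 11)  # CV is about 6.7%
old_fish_flagged = needs_common_reading(20, 21)    # CV is about 3.4%
```

Under this definition, a one-increment disagreement triggers a common reading only when the fish is relatively young.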

#### Back-calculation

We then used a back-calculation procedure24 to estimate fish length at previous ages, which we modified to also quantify the uncertainty around the obtained length estimates. This method requires an examination of the shape of the relationship between the length at capture (Lcpt) and the radius of the otolith at capture across all samples (Rcpt) as follows:

$${L}_{cpt}={L}_{0p}-b{R}_{0p}^{c}+b{R}_{cpt}^{c}\tag{1}$$

where L0p and R0p are the fish size and radius of the otolith at hatching. The regression parameters b and c were estimated by fitting Bayesian models with RStan27. We used informative priors for both parameters [b ~ normal (200, 200) and c ~ normal (1, 1)].

For some individuals, it was not possible to measure the R0p value. Nevertheless, these individuals were still included in the back-calculation model. To do so, we included all missing R0p values as parameters in the model that are estimated in the posterior28. Specifically, these missing R0p values were simultaneously modelled with the known R0p values, so that their prior distribution was defined by the distribution of the known R0p values. These prior distributions were then updated with the information provided by the aforementioned relationship (Eq. 1). Consequently, each missing R0p value had a unique posterior distribution.

For each of the 4,000 iterations used to fit the models, we combined parameters b and c (Eq. 1) to compute an additional parameter, a (Eq. 2).

$$a[i]={L}_{0p}-b\times {R}_{0p}{[i]}^{c}\tag{2}$$
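As a brief worked step, reading the index [i] in Eq. 2 as the individual (our interpretation), substituting Eq. 2 into Eq. 1 shows that a plays the role of an individual-specific intercept in the length-radius relationship:

$${L}_{cpt}[i]=\underbrace{{L}_{0p}-b\,{R}_{0p}{[i]}^{c}}_{a[i]}+b\,{R}_{cpt}{[i]}^{c}=a[i]+b\,{R}_{cpt}{[i]}^{c}$$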

Next, the back-calculation with the Modified Fry (MF) model (Eq. 3)29 was applied to quantify fish lengths at all ages for each individual, using parameter a for each iteration.

$${\rm{MF}}\,{\rm{model}}:{L}_{i}=a+\exp\left(\ln\left({L}_{0p}-a\right)+\frac{\left[\ln\left({L}_{cpt}-a\right)-\ln\left({L}_{0p}-a\right)\right]\left[\ln\left({R}_{i}\right)-\ln\left({R}_{0p}\right)\right]}{\left[\ln\left({R}_{cpt}\right)-\ln\left({R}_{0p}\right)\right]}\right)\tag{3}$$

where Li and Ri are the fish length and otolith radius at age i, L0p and R0p are the fish size and radius of otolith at hatching. L0p is provided for each species (Online-only Table 1).

We calculated Li for the species that had sufficient replicates, and when possible also per species in each location separately. The estimation of parameters b and c (Eq. 1) required at least two values of R0p, so the back-calculation was not carried out when only one R0p was available for a given species (or a given species in a certain location).

Individuals with an estimated age at capture of one year were not used for back-calculation.

Finally, we reported the averages and standard deviations of those length estimates based on the 4,000 iterations. As such, the back-calculated estimates include a measure of uncertainty that can be integrated into future applications.
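The back-calculation steps (Eqs. 2 and 3, followed by the summary across iterations) can be sketched in a few lines. This is an illustrative sketch only: the function names, the three (b, c) pairs standing in for posterior draws, and all numeric values are hypothetical and do not come from the dataset or the fitted models.

```python
import math
from statistics import mean, stdev


def fry_intercept(l0p, r0p, b, c):
    """Eq. 2: a = L0p - b * R0p**c."""
    return l0p - b * r0p ** c


def modified_fry_length(r_i, r_cpt, l_cpt, r0p, l0p, a):
    """Eq. 3: back-calculated length L_i at otolith radius R_i."""
    log_l0 = math.log(l0p - a)
    slope = (math.log(l_cpt - a) - log_l0) / (math.log(r_cpt) - math.log(r0p))
    return a + math.exp(log_l0 + slope * (math.log(r_i) - math.log(r0p)))


def summarise_back_calculation(radii, r_cpt, l_cpt, r0p, l0p, draws):
    """Mean and SD of L_i at each radius across draws of (b, c)."""
    summary = []
    for r_i in radii:
        lengths = [
            modified_fry_length(r_i, r_cpt, l_cpt, r0p, l0p,
                                fry_intercept(l0p, r0p, b, c))
            for b, c in draws
        ]
        summary.append((mean(lengths), stdev(lengths)))
    return summary


# Hypothetical stand-ins for posterior draws of (b, c); the study used 4,000.
draws = [(140.0, 0.95), (150.0, 1.00), (160.0, 1.05)]
# Hypothetical increment radii (mm): first value is R0p, last is Rcpt.
radii = [0.01, 0.4, 0.8, 1.0]
summary = summarise_back_calculation(
    radii, r_cpt=1.0, l_cpt=200.0, r0p=0.01, l0p=2.0, draws=draws
)
```

By construction, Eq. 3 returns L0p at Ri = R0p and Lcpt at Ri = Rcpt for every draw, so the uncertainty is concentrated at intermediate ages.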

#### Von Bertalanffy growth curves

The Von Bertalanffy growth model (Eq. 4) is the most frequently used model to describe fish growth. This model is defined as:

$${L}_{t}={L}_{\infty }\left(1-{e}^{-K\left(t-{t}_{0}\right)}\right)\tag{4}$$

where Lt is the average length at age t, L∞ is the asymptotic average length, K is the growth rate coefficient, and t0 is the theoretical age at which the average length is zero. In order to validate the accuracy of our back-calculated size-at-age data, we compared growth curves fitted with raw data (total length at capture and estimated age at capture) to those fitted with back-calculated data. As back-calculated size-at-age data within individuals are highly auto-correlated, we designed a Bayesian hierarchical model that takes this auto-correlation into account by fitting individual growth curves as well as an average population-level growth curve. The model was applied to back-calculated data for species with at least five individuals, using only individuals with an age at capture greater than two years.
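For readers who want to evaluate Eq. 4 directly, a minimal sketch follows. The function name and the parameter values are hypothetical and chosen only for illustration; they are not estimates from this study.

```python
import math


def von_bertalanffy(t, l_inf, k, t0):
    """Eq. 4: expected length at age t."""
    return l_inf * (1.0 - math.exp(-k * (t - t0)))


# Hypothetical parameters: L_inf in mm, K in 1/year, t0 in years.
l_inf, k, t0 = 250.0, 0.4, -0.1
curve = [von_bertalanffy(age, l_inf, k, t0) for age in range(0, 11)]
```

The curve rises monotonically towards L∞ and crosses zero length at t = t0, which is why a negative t0 corresponds to a positive length at hatching.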

We fitted models both for each species and for each species per location. In all models, we used informative priors for growth parameters extracted from FishBase (https://www.fishbase.se/search.php). We ran models with 2,000 iterations and a warmup of 1,000. When $$\widehat{R}$$ was above one, indicating non-convergence of the Markov chain Monte Carlo (MCMC) chains, we ran the models again, increasing the number of iterations to 4,000 with a warmup of 2,000. If convergence was still not achieved, we used MCMC chain plots of the model parameters to identify and remove the individual(s) responsible for non-convergence.
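To make the convergence diagnostic concrete, the sketch below computes the classic (non-split) Gelman-Rubin potential scale reduction factor for one parameter. This is a simplified stand-in: Stan reports a more elaborate version of this statistic, and the function name and the toy chains are hypothetical.

```python
from statistics import mean, variance


def gelman_rubin_rhat(chains):
    """Classic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])                                # draws per chain
    chain_means = [mean(chain) for chain in chains]
    within = mean([variance(chain) for chain in chains])   # W
    between = n * variance(chain_means)                     # B
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


# Toy chains: the first pair explores the same region, the second does not.
mixed = gelman_rubin_rhat([[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]])
stuck = gelman_rubin_rhat([[1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0]])
```

Chains that sample different regions inflate the between-chain variance and push the statistic well above one, which is the pattern that motivates rerunning a model with more iterations.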

As a comparison, we also ran a general non-linear Bayesian model on the raw data (i.e. using size and age at capture only). Back-calculated data contain more points (multiple points per individual) than raw data (one point per individual), so the comparison was limited to the species with a sufficient number of individuals (n > 10) and age range in the raw data. These models were run using the package brms30.

All analyses were done with the software R v.3.6.331 and the packages rstan (2.19.3), tidyverse (1.3.0)32, plyr (1.8.6)33, rfishbase (3.0.4)34, and brms (2.13.0)30.

## Data Records

The dataset is publicly accessible in the permanent figshare repository (https://doi.org/10.6084/m9.figshare.12156159.v5)35. This dataset consists of:

1. 855 individuals from 51 fish species in 15 families collected across six locations in French Polynesia,
2. Fish total length and weight (when measured) for each individual,
3. Age estimations and back-calculated size-at-age for each individual, by species (45 species and 710 individuals) and by species across multiple locations (44 species and 669 individuals).

## Technical Validation

• The validity of fish names and families was verified on the World Register of Marine Species (WoRMS; http://www.marinespecies.org/index.php) and FishBase (https://www.fishbase.in/search.php).

• Each otolith was read twice by two readers to limit observer biases for age estimations. When the coefficient of variation between observers was greater than 5%, a common reading was assessed for each section21. Moreover, for each species, we provide a photograph of an otolith section with annual increments and reading axes (Supplementary File 1).

• To validate the accuracy of back-calculated data, growth curves fitted on back-calculated size-at-age were compared to those from raw data (total length at capture and estimated age at capture) (Fig. 3). This comparison was not possible for a species when the number of collected individuals was too low to fit a growth curve. Comparisons were possible for fifteen species, and for each of them, the 95% credible intervals overlapped between growth curves fitted on back-calculated data versus raw data, suggesting negligible differences between the two approaches (Fig. 3). Moreover, for all species, the curves from back-calculated data were always below those from raw data, indicating no overestimation of L∞. Further, because the back-calculated lengths also include the lengths at age zero, the length at hatching is more realistically represented in the regression models from the back-calculated data. Consequently, when using back-calculation, estimates of K tend to be higher and estimates of L∞ tend to be lower. The Von Bertalanffy growth parameters from our back-calculated size-at-age data by species and species across multiple locations are available online (Online-only Table 2).

• Further, Von Bertalanffy growth parameters estimated from otoliths were extracted from published articles, book chapters, reports and Ph.D. theses and compared to back-calculated parameters from our study (Online-only Table 3). For most species, the growth parameters from our study were similar to those in the literature. Differences may stem from different geographical locations (different temperatures, primary productivity, etc.), the number of analyzed fishes, different length measurements (standard, fork, or total length), or variations in modeling approaches.

• Finally, we compared our age estimates to the maximum ages reported in the literature (Online-only Table 3). Comparisons were possible for species with available data (seventeen species). Only five species exceeded the maximum reported age (Caranx melampygus, Cephalopholis urodeta, Chlorurus spilurus, Epinephelus merra, Plectropomus laevis).

## Usage Notes

The dataset is provided as a CSV file, which can be directly used by most statistical software. It contains eighteen variables, as described in Table 2. Additional growth parameters can be obtained by fitting other growth models (e.g. Gompertz model) using the variables ‘Agei’ and ‘Li_sp_m’ (species across all locations) or ‘Li_sploc_m’ (species by location).

Back-calculated data are highly auto-correlated, so we recommend using a hierarchical structure to fit growth models.

Within the dataset, ‘NA’ indicates a missing value. Missing values are present for the variables ‘Ri’ (n = 387), ‘R0p’ (n = 2,811), ‘Li_sp_m’ (n = 410), ‘Li_sp_sd’ (n = 410), ‘Li_sploc_m’ (n = 757), ‘Lp_sploc_sd’ (n = 757), and ‘Weight’ (n = 603). For the variable ‘Ri’, missing values correspond to individuals for which it was impossible to estimate the radius at hatching from photographs. The ‘R0p’ values correspond to ‘Ri’ values where ‘Agei’ is equal to zero. Because the ‘R0p’ value is the same for all ‘Agei’ of a given individual (‘ID’), a large number of NAs arises as soon as the ‘Ri’ value is missing (where ‘Agei’ is equal to zero). For the variables ‘Li_sp_m’, ‘Li_sp_sd’, ‘Li_sploc_m’, and ‘Lp_sploc_sd’, missing values correspond to values with insufficient numbers of individuals or known ‘R0p’ measurements to accurately fit the Bayesian back-calculation model. The number of NAs for the variables ‘Li_sp_m’ and ‘Li_sp_sd’ (estimates by species) is lower than the number of NAs for the variables ‘Li_sploc_m’ and ‘Lp_sploc_sd’ (estimates for species by location). Finally, for the variable ‘Weight’, missing values are the result of missing sampling measurements.
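As a hedged illustration of handling the ‘NA’ entries, the sketch below reads a small in-memory CSV and groups the usable size-at-age pairs by individual. The column names (‘ID’, ‘Agei’, ‘Ri’, ‘R0p’, ‘Li_sp_m’) follow the variables named above, but the rows, the identifiers, and the function names are invented for illustration and are not records from the dataset.

```python
import csv
import io
from collections import defaultdict

# Invented example rows; the real file has eighteen variables (Table 2).
SAMPLE = """ID,Agei,Ri,R0p,Li_sp_m
fish_A,0,0.01,0.01,2.1
fish_A,1,0.45,0.01,98.4
fish_A,2,0.80,0.01,151.0
fish_B,0,NA,NA,NA
fish_B,1,0.50,NA,NA
"""


def parse_value(text):
    """Convert a CSV field to float, mapping 'NA' to None."""
    return None if text == "NA" else float(text)


def load_size_at_age(handle, length_column="Li_sp_m"):
    """Group (age, length) pairs by individual, skipping missing values."""
    per_fish = defaultdict(list)
    for row in csv.DictReader(handle):
        age = parse_value(row["Agei"])
        length = parse_value(row[length_column])
        if age is not None and length is not None:
            per_fish[row["ID"]].append((age, length))
    return dict(per_fish)


size_at_age = load_size_at_age(io.StringIO(SAMPLE))
```

Keeping the observations grouped by ‘ID’ preserves the within-individual structure that a hierarchical growth model needs; passing ‘Li_sploc_m’ as `length_column` would select the location-specific estimates instead.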