## Introduction

Body size is one of the most prominent sexually dimorphic traits in animals. Squamate reptiles, a vertebrate group with enormous variability in body size, are no exception. In many squamate lineages, even closely related species may differ in both the magnitude and the direction of sexual dimorphism in body size1,2,3,4. Despite several studies, the proximate mechanisms underlying this notable evolutionary plasticity are still poorly understood, and several hypotheses on the proximate control of the ontogeny of sexual dimorphism in body size have been proposed for squamates2,3,4,5,6,7.

In our setup, we tested the effect of castration, with and without T replacement via exogenous T, in males, and the effect of exogenous T in females, on growth curves, final body size and casque size. This design allows us to test contrasting hypotheses on the control of the growth of skeletal structures. The male androgen hypothesis predicts that removal of the gonads will demasculinize growth in males and that this effect will be reversed by the application of exogenous T in male castrates. In females, this hypothesis predicts masculinization of growth after treatment with exogenous T. In contrast, the ovarian control hypothesis predicts that castration, with or without subsequent application of exogenous T, will have little effect on structural growth in males, while exogenous T will defeminize growth in females.

## Materials and Methods

The veiled chameleon, Chamaeleo calyptratus, is an arboreal lizard native to southern Arabia25. It is a popular pet and has also become a well-studied laboratory reptile in physiology, developmental biology and behavioural ecology26,27,28,29,30,31.

The experimental animals were obtained from a private breeder at the age of two days; they were the progeny of two related females. We housed them individually in standardized plastic boxes containing tree branches for climbing. The ambient temperature ranged from 23 to 26 °C, with a basking spot of around 34 °C produced by UVB bulbs (Exo Terra Reptile UVB100) during the 12-hour light phase of the day. The chameleons were fed to satiety every day with crickets (Gryllus assimilis) dusted with vitamins (Roboran, Univit, Olomouc, Czech Republic) and calcium (Vitacalcin, Zentiva, Prague, Czech Republic).

We determined the sex of individuals by the presence of a tarsal spur, a male-typical sexually dimorphic trait that is visible in males already at hatching, although this species generally does not reach sexual maturity before six months of age30. At the age of one month, we established three groups of males and two groups of females, each balanced with respect to body mass and head length. Before any surgery or experimental manipulation, we randomly assigned the groups to three treatments in males and two in females: Control males (sham-operated), Testosterone males (castrated males treated with T), Castrated males, Control females (sham-operated virgin females) and Testosterone females (sham-operated virgin females treated with T).

According to the experimental design, surgery was performed on all 40 experimental animals (8 individuals per treatment group) at the very young age of 40 to 43 days, long before they reached final body size or sexual maturity (see the arrow marking the time of surgery in Supplementary Fig. 1, which shows the whole growth trajectory of every experimental animal). Prior to surgery, animals were anaesthetized by intramuscular injection of ketamine (Narkamon 5%, Spofa a.s., Prague, Czech Republic; applied twice at a 15-min interval, 300 μg/g of body mass in total) combined with hypothermia. The gonads were exposed via a lateral incision. Bilateral orchiectomy was performed on Castrated males and Testosterone males by ligating each testis with surgical silk and then ablating and removing it. In the remaining groups, sham surgeries were performed in which the gonads were exposed via incision but left intact. The incision was closed using Prolene surgical suture (Ethicon Inc., Somerville, NJ, USA) and covered with Glubran 2 surgical glue (GEM S.r.l., Viareggio, Italy). The stitches were removed within two weeks, once the wound had healed sufficiently, and at this time cutaneous application of oil-diluted T commenced in the T-treated groups7,12,13,14. Briefly, 0.25 μg of crystalline T (Sigma Aldrich) per gram of body mass, dissolved in pharmaceutical-quality sunflower oil, was applied to the skin on the back of the casque of each individual twice a week at regular intervals (every 3 to 4 days). The mixture was absorbed into the skin within several hours. At the same time, pure sunflower oil was applied in the same way to Control males, Control females and Castrated males. Experimental animals were weighed twice a week to calculate the mass-specific hormone or placebo dose. In total, hormonal manipulation or placebo treatment continued for 82 weeks.

Head measurements, i.e., head length (HL, measured from the tip of the snout to the mandibular joint), head-casque height (HCH, measured from the lowest part of the mandible to the top of the casque) and head height (HH, measured from the lowest part of the mandible to the superciliary arc), were taken with callipers every two weeks during the period of fast growth (first 26 weeks) and every four weeks thereafter. To avoid stressing the animals through excessive handling, leg length (LL, length of the hind tibia and tarsus) was measured every second week during the period of fast growth (first 26 weeks) and then again from the 52nd week of age at four-week intervals. Snout-vent length is often used as a measure of structural body size in squamates; however, it was not included here because it is extremely difficult to measure in chameleons, especially in larger individuals. All Control females and Testosterone females laid clutches of unfertilized eggs during the experiment. Chameleons have large clutch sizes32, which cause dramatic fluctuations in female body mass depending on the reproductive stage; we therefore did not include growth in body mass in our analyses. The growth experiment was terminated in the 89th week of age, long after the growth of all experimental animals had slowed considerably, which occurred c. 50 weeks after surgery (e.g., Supplementary Fig. 1). Before the last measurement, the social behaviour of the animals was recorded for another project (up to eight 20-minute interactions per individual).

Originally, each treatment group consisted of 8 individuals. Unfortunately, some animals died before the end of the growth experiment. Seven of them (one Control male, two Testosterone males, one Castrated male, one Control female and two Testosterone females) died suddenly of unknown causes between 18 and 48 weeks of age. Their growth rates did not differ from those of the surviving members of their experimental groups, and we observed no sign of reduced viability before they died. Nevertheless, these seven prematurely deceased animals were excluded from all statistical analyses. After their exclusion, the experimental groups did not differ in body mass (mean of 4.61 g) or LL (mean of 10.81 mm) at the time of surgery (ANOVA: F4,28 < 0.71, p > 0.59 in both cases), but they differed significantly in HL (ANOVA: F4,28 = 2.97, p = 0.036), which was slightly larger in Control males (mean of 12.96 mm) than in Testosterone males (mean of 12.05 mm) and Testosterone females (mean of 11.63 mm) (Post-hoc Fisher tests: p < 0.029 for comparisons between these groups). HL was comparable in all other pairwise comparisons at the time of surgery (Post-hoc Fisher tests: p > 0.07); the mean HL was 12.26 mm in Control females and 12.31 mm in Castrated males. Relative head shape, represented by either HCH or HH with HL as a covariate, was comparable among all treatment groups at the time of surgery (full-factorial ANCOVA: neither the interaction nor the group factor was significant, F4,23 < 1.12, p > 0.37 in all cases).

Another five animals (two Control males, one Testosterone male and two Control females) died relatively close to the final measurement (at the age of 63–72 weeks). Veiled chameleons are relatively short-lived reptiles with an average life span of two years30, and all five of these animals died well after their first year of life. Their growth also did not depart from the common trend of their experimental groups (Supplementary Fig. 1). Given these circumstances, their deaths can tentatively be attributed to natural senescence. Moreover, because the growth data collected before their deaths were already sufficient for the analyses, we were able to estimate their asymptotic HL and LL and, with one exception (a Testosterone male), to collect their blood plasma for subsequent treatment validation. The treatment of this Testosterone male was verified by a behavioural assay with a female stimulus, in which he behaved like other males with high T levels: he performed courtship behaviour towards the stimulus female, whereas none of the Castrated males did (own unpublished data). The final number of animals in each experimental group was: seven Control males, six Testosterone males, seven Castrated males, seven Control females and six Testosterone females.
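The final group sizes, and the degrees of freedom reported for the ANOVA (F4,28) and the full-factorial ANCOVA (F4,23) at the time of surgery, can be cross-checked with a few lines of arithmetic. This is illustrative bookkeeping rather than part of the original analysis, and the variable names are hypothetical.

```python
# Bookkeeping sketch: final group sizes and the degrees of freedom they imply.
INITIAL_PER_GROUP = 8

# Early deaths (18-48 weeks of age), excluded from all analyses.
EARLY_DEATHS = {
    "Control males": 1,
    "Testosterone males": 2,
    "Castrated males": 1,
    "Control females": 1,
    "Testosterone females": 2,
}
# The five later deaths (63-72 weeks of age) were retained, so they are not subtracted.

final_n = {group: INITIAL_PER_GROUP - d for group, d in EARLY_DEATHS.items()}
n_total = sum(final_n.values())
k_groups = len(final_n)

# One-way ANOVA: (between-group df, within-group df).
anova_df = (k_groups - 1, n_total - k_groups)
# Full-factorial ANCOVA fits a separate intercept and slope per group,
# so the residual df is N - 2k.
ancova_residual_df = n_total - 2 * k_groups

if __name__ == "__main__":
    print(final_n)
    print("N =", n_total, "| ANOVA df =", anova_df, "| ANCOVA residual df =", ancova_residual_df)
```

The sum of 33 animals across five groups gives (4, 28) for the one-way ANOVA, and the ten group-specific intercepts and slopes of the full-factorial ANCOVA leave 23 residual degrees of freedom, matching the values quoted above.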

Before the end of the experiment, we collected blood plasma from a tail blood vessel of each experimental individual to verify the hormonal treatment. Circulating plasma levels of estradiol (E2), androstenedione (AD), dihydrotestosterone (DHT) and T were measured at the Institute of Endocrinology (Prague, Czech Republic). AD and T were determined by liquid chromatography-tandem mass spectrometry (LC-MS/MS) following Sosvorova et al.33; unlike immunoassays, this method should not be biased by possible cross-reaction with other androgens34. Briefly, the method consists of plasma extraction with diethyl ether, followed by a derivatization step (to enhance the detection response of steroids in the MS) and separation on an Eksigent ultraLC 110 ultra-high-performance liquid chromatography system (Redwood City, CA, USA). Analytes were detected on an API 3200 mass spectrometer (AB Sciex, Concord, Canada) with the electrospray ionization probe operating in positive mode, and were quantified using calibration curves based on known analyte concentrations. The limit of detection was 0.01 ng per ml. DHT was measured using the standard radioimmunoassay (RIA) protocol of Hampl et al.35, which consists of extracting plasma with diethyl ether followed by a radioimmunoassay using rabbit polyclonal antiserum to a dihydrotestosterone-7-(carboxymethyloxime) bovine serum albumin conjugate and [³H]DHT. Selective oxidation with potassium permanganate was applied to the samples to eliminate T and thus prevent its cross-reaction with the antiserum. The intra-assay and inter-assay coefficients of variation of this assay are typically 17.1% and 17.7%, respectively, and its limit of detection was 0.001 ng per ml. E2 was assayed using a commercial estradiol RIA kit A21854 (LOT No. 180607; Beckman Coulter, Brea, California, USA) with a declared detection limit of 0.006 ng per ml.
Because hormone levels were measured with three different assays (one LC-MS/MS and two RIAs), a relatively large volume of blood plasma was required. Consequently, in several cases the plasma obtained from a single individual was not sufficient for E2 measurement.
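Quantification against a calibration curve, combined with a limit-of-detection check, can be sketched as follows. This is a simplified, hypothetical illustration: it assumes an unweighted linear external-standard calibration of instrument response on known concentration, whereas the cited LC-MS/MS method may use internal standards or weighted fits. The function names and example standards are invented; only the 0.01 ng per ml limit of detection comes from the text.

```python
# Sketch of linear calibration and a limit-of-detection flag (hypothetical helpers).
from __future__ import annotations

LOD_NG_PER_ML = 0.01  # LC-MS/MS limit of detection reported for AD and T


def fit_calibration(concs: list[float], responses: list[float]) -> tuple[float, float]:
    """Ordinary least-squares line: response = intercept + slope * concentration."""
    if len(concs) != len(responses) or len(concs) < 2:
        raise ValueError("need at least two paired standards")
    n = len(concs)
    mx = sum(concs) / n
    my = sum(responses) / n
    sxx = sum((x - mx) ** 2 for x in concs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(concs, responses))
    slope = sxy / sxx
    return my - slope * mx, slope


def quantify(response: float, intercept: float, slope: float,
             lod: float = LOD_NG_PER_ML) -> tuple[float, bool]:
    """Invert the calibration line; the flag is True when the value is below the LOD."""
    conc = (response - intercept) / slope
    return conc, conc < lod
```

For instance, with hypothetical standards at 0, 0.1, 0.5 and 1.0 ng per ml giving responses of 2, 12, 52 and 102 (arbitrary units), a sample response of 52 maps to 0.5 ng per ml, while a response of 2.5 maps to 0.005 ng per ml and is flagged as below the limit of detection.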

Because lizards are characterized by indeterminate growth (i.e., growth does not stop after sexual maturation), reliable estimation of sexual dimorphism in body size independent of the effects of age requires knowledge of the growth curves of both sexes1. To compare final size among groups, we used asymptotic HL and LL estimated from the von Bertalanffy asymptotic growth model:

$$L=a(1-{e}^{-k(t-{t}_{0})})$$

where a is the asymptotic value of length L, e is the base of the natural logarithm, k is the rate of approach to the asymptotic length, t is age (in days), and t0 is the hypothetical age at which length is zero. This model has previously been used successfully to describe growth in many lizards1,7,16,17,36,37. Because asymptotic size is estimated by fitting the growth curve to multiple measurements, it is much less sensitive to the error of any single measurement. Moreover, asymptotic size allows animals of different ages to be compared, which in our case permitted the inclusion of five individuals that died shortly before the end of the experiment. The Shapiro-Wilk test was applied to detect departures from normality; when the null hypothesis of normality was rejected at α = 0.05, non-parametric tests were used. We used one-way ANOVA to compare asymptotic HL among treatment groups, followed by Post-hoc Fisher LSD tests to reveal differences between particular groups. Asymptotic LL and hormone levels departed from normality; therefore, Kruskal-Wallis ANOVA with Post-hoc Conover pairwise multiple comparison tests was used to compare groups. In addition, we compared static allometries of relative head shape among groups using full-factorial ANCOVA, with either HCH or HH as the dependent variable, HL as the continuous predictor (covariate) and group identity as the categorical predictor. We also analysed between-group differences in the ontogenetic allometries of casque size (expressed by HCH) and of HH on HL using linear mixed-effects models, to account for repeated measurements of the same individual. Individual identity was included in the null model as a random effect to account for individual differences in the intercepts of the ontogenetic allometries.
We then built several models by adding to the null model treatment group alone as a fixed factor, HL alone as a fixed continuous predictor, both group and HL as fixed predictors, and group, HL and their interaction as fixed predictors. We compared the fit of these models using ANOVA and the Akaike information criterion (AIC)38,39. When ΔAIC was < 2, the models were considered equivalent and the one with fewer variables was selected as the preferred model, whereas the more complex model was considered supported when ΔAIC > 2.
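Estimation of asymptotic size from one individual's measurements can be illustrated with a minimal pure-Python sketch. It assumes an ordinary least-squares fit and uses the fact that, for a fixed k, the model is linear in a and b = −a·exp(k·t0): a and b are obtained by linear regression, and log10(k) is found by golden-section search on the resulting error. This profile-likelihood approach is a swapped-in technique chosen for self-containment, not necessarily how the curves were fitted in Statistica; the function name and the search bracket for k are hypothetical. The returned `r2` is the proportion of variability explained, the quantity reported per individual in the Results.

```python
# Sketch: fit L = a * (1 - exp(-k * (t - t0))) by profiling out a and b for each k.
import math


def _linear_fit(xs, ys):
    """OLS of ys on xs; returns (intercept, slope, sum of squared errors)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, sse


def fit_von_bertalanffy(ages, lengths, log10_k_range=(-4.0, -0.5), tol=1e-10):
    """For fixed k the model is L = a + b * exp(-k t) with b = -a * exp(k t0);
    golden-section search over log10(k) minimises the profiled error."""
    def profile_sse(log10_k):
        k = 10.0 ** log10_k
        return _linear_fit([math.exp(-k * t) for t in ages], lengths)[2]

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = log10_k_range
    c, d = hi - inv_phi * (hi - lo), lo + inv_phi * (hi - lo)
    fc, fd = profile_sse(c), profile_sse(d)
    while hi - lo > tol:
        if fc < fd:  # minimum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = profile_sse(c)
        else:        # minimum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = profile_sse(d)
    k = 10.0 ** ((lo + hi) / 2.0)
    a, b, sse = _linear_fit([math.exp(-k * t) for t in ages], lengths)
    t0 = math.log(-b / a) / k  # requires b < 0 < a, i.e. a growing curve
    mean_len = sum(lengths) / len(lengths)
    sst = sum((y - mean_len) ** 2 for y in lengths)
    return {"a": a, "k": k, "t0": t0, "r2": 1.0 - sse / sst}
```

On noise-free synthetic data (for example a = 30 mm, k = 0.01 per day and t0 = −20 days, sampled fortnightly from day 40), the sketch recovers the generating parameters; with real measurements, the search bracket and the assumption of a single minimum within it should be checked.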

Statistical analyses were conducted using Statistica version 10.0 (StatSoft, Tulsa, USA), the lme4 package40 in R41, and BrightStat.com (© Daniel Stricker and scians GmbH, Switzerland). Graphs were prepared using GraphPad Prism (version 6.07; GraphPad Software, San Diego, USA) and MS Excel (MS Office 365).

We followed the national guidelines for the care and use of animals. The experiment was conducted with the approval of the Ethical Committee of Charles University and the Central Commission for Animal Welfare and the environment of the Czech Republic (permit number 10803/2016-2).

## Results

The hormone assays validated the treatment of all examined individuals (Table 1; Supplementary Fig. 2). In some cases, hormone levels were below the detection limit; in comparisons among treatment groups, these cases were assigned the value of the limit of detection for the given hormone. T and DHT plasma levels differed significantly among treatment groups (Kruskal-Wallis ANOVA: T: H4,N=32 = 27.90, p < 0.001; DHT: H4,N=32 = 26.63, p < 0.001; Table 1; Supplementary Fig. 2A,B). T and DHT levels were highest in Testosterone males and Testosterone females and comparable between these two groups (Post-hoc Conover test: p > 0.05). All other pairs of groups differed significantly in T and DHT levels (Post-hoc Conover test: p < 0.05 in all cases), with Control females having the lowest T levels. AD also differed significantly among treatment groups (Kruskal-Wallis ANOVA: H4,N=32 = 25.92, p < 0.001; Table 1; Supplementary Fig. 2C). AD levels were highest and comparable in Testosterone males and Testosterone females (Post-hoc Conover test: p > 0.05), lowest and comparable in Castrated males and Control females (Post-hoc Conover test: p > 0.05), and intermediate in Control males (Post-hoc Conover test: p < 0.001 in all cases). All remaining pairwise comparisons were significant (Post-hoc Conover test: p < 0.001 in all cases). Plasma E2 levels did not differ significantly among treatment groups (Kruskal-Wallis ANOVA: H4,N=27 = 6.15, p = 0.188; Table 1; Supplementary Fig. 2D). Here, however, the sample size was smaller, as there was not enough plasma to measure this hormone accurately in five animals (two Control males, one Castrated male, one Control female and one Testosterone female). Nevertheless, when the animals were grouped by sex regardless of experimental treatment, females had higher E2 levels than males (Mann-Whitney U test: U = 44.00, p = 0.032).
Both Control and Testosterone females, the latter despite the hormonal treatment, laid one to five clutches of unfertilized eggs. The two female treatment groups did not differ in the mean number of clutches laid during the experiment (t-test: t = −0.61; p = 0.556).
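The hormone comparisons above (non-detects set to the limit of detection, followed by Kruskal-Wallis ANOVA) can be sketched in pure Python. This is an illustrative implementation of the tie-corrected H statistic with the usual chi-square approximation, not the software the authors used, and it omits the Conover post-hoc step; the function names are hypothetical. The tie correction matters here because substituting the limit of detection creates tied values. The closed-form chi-square tail below is valid only for even degrees of freedom, which covers the five-group case (df = 4).

```python
# Sketch: LOD substitution followed by a tie-corrected Kruskal-Wallis H test.
import math


def substitute_lod(values, lod):
    """Replace non-detects (None) with the limit of detection."""
    return [lod if v is None else v for v in values]


def _average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return (tie-corrected H, degrees of freedom) for a list of samples."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = _average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum * rank_sum / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3.0 * (n + 1)
    # Tie correction: C = 1 - sum(t^3 - t) / (N^3 - N) over groups of tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    c = 1.0 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if c <= 0.0:
        raise ValueError("all values are tied; H is undefined")
    return h / c, len(groups) - 1


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total
```

As a check, `chi2_sf_even_df(6.15, 4)` gives approximately 0.188, the p value reported for E2 (H4,N=27 = 6.15).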

The von Bertalanffy model fitted to the original data explained 94.4 to 99.2% of the variability in HL and 95.2 to 99.9% in LL in each individual, demonstrating that this growth model is applicable to both body-size measurements. Asymptotic HL did not differ significantly from HL at the final measurement (t-test for dependent samples: t = 2.00; p = 0.054), and the same was true for LL (Wilcoxon matched pairs test: Z = 0.49, p = 0.62), showing that the animals had already reached a size close to their asymptotic size.

The treatment groups differed significantly in asymptotic HL (ANOVA: F4,28 = 12.59, p < 0.001; Fig. 1A). Males in all three groups reached comparable asymptotic HL (Post-hoc Fisher test: p > 0.18 for comparisons between these groups), and their heads were significantly longer than those of both female groups (Post-hoc Fisher test: p < 0.013 in all comparisons). Control females and Testosterone females attained comparable asymptotic HL (Post-hoc Fisher test: p = 0.15). Similarly, asymptotic LL differed significantly among treatment groups (Kruskal-Wallis ANOVA: H4,N=33 = 22.60, p < 0.001; Fig. 1B). Asymptotic LL was comparable among the three male treatment groups (Post-hoc Conover test: p > 0.05 in all cases) and between the two female treatment groups (Post-hoc Conover test: p > 0.05). In intersexual comparisons, all male treatment groups had longer legs than both female treatment groups (Post-hoc Conover test: p < 0.001 in all intersexual comparisons; Fig. 1B). Although we are aware that our sample sizes are relatively small, it should be kept in mind that we were testing large effects reflecting pronounced differences in size between the sexes. On average, Control females were 30% and 18% smaller than Control males in asymptotic HL and LL, respectively. Castrated males were comparable in size to Control males, being on average 6% smaller in asymptotic HL and 2% larger in asymptotic LL (Fig. 1). These differences between Castrated males and Control males might become significant with larger sample sizes; however, for the test of the masculinization hypothesis, what matters is that these male groups are comparable in size and both are much larger than Control females. When planning the experiment, we knew that the studied species shows very large sexual size dimorphism, with males being around one-third larger than females in linear dimensions.
Our question was whether manipulation of androgens would change final body size and, most importantly, whether castration would shift males onto the female growth trajectory or whether castrated males would retain typical male growth; i.e., we tested a rather large effect. We selected such a highly dimorphic chameleon species so that this large effect could be tested robustly even with a small sample size. A post hoc analysis of effect size confirmed that our sample size is adequate for testing sexual dimorphism and its ontogeny in final body size in this species. Cohen's d (> 2.5), computed from the means, standard deviations and sample sizes of Control males and Control females in asymptotic head length (our proxy of body size), indicates a large effect, and the estimated minimum sample size for obtaining a significant result in a two-tailed t-test between these two groups is eight individuals (four per group). Given the large differences between males and females, our sample size (n = 14, i.e., seven per group) is adequate to reach statistically significant differences at α = 0.05 and power (1 − β) = 0.8. Similar differences were observed in leg length, our alternative measure of body size. It is therefore not surprising that the differences between Control males and Control females were significant in our ANOVA analyses, as we expected when designing the experiment. At that time, based on the results of long-term growth experiments in other lizards15,16,17, we also assumed that the effect of castration would be small. The key test of the male androgen hypothesis is the comparison between Control males and Castrated males; comparisons among the other groups are less important. There was a risk that the effect of castration would be larger than expected and that the results would then be difficult to interpret given our small sample size. However, the results confirmed that the effect of castration is indeed very small.
Based on the effect size (Cohen's d) between Control males and Castrated males in asymptotic head length, a minimum sample size of 94 individuals (47 per group) would be needed at α = 0.05 and power (1 − β) = 0.8 to reach a significant result in a two-tailed t-test between these two groups. Moreover, mean asymptotic leg length is higher in Castrated males than in Control males, which suggests that the differences in means are likely random and strongly supports the interpretation that the effect of castration on structural body size is negligible. According to the post hoc effect-size analysis, the experiment would have to be repeated with at least 250 individuals, likely many more, to reach a statistically significant difference between Control males and Castrated males in asymptotic leg length. Even then, the difference would remain small and would change little in the interpretation that Control males and Castrated males grow to a very similar size. Our conclusion that male gonadal androgens are not responsible for the large differences in body size between male and female chameleons is thus robust even with a small sample size. This robustness rests not on the non-significance of the differences between Control males and Castrated males, but on the negligible effect of castration on size.
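The effect-size reasoning above can be reproduced approximately with the standard library. The sketch computes Cohen's d with a pooled standard deviation and an approximate minimum sample size per group for a two-tailed two-sample t-test, using the normal approximation plus a common small-sample correction term (z²/4). It is an assumption that this approximates what the authors computed; exact power calculations based on the noncentral t distribution, as in dedicated power software, may differ by about one individual per group. The function names are hypothetical, and `power` is 1 − β.

```python
# Sketch: Cohen's d and an approximate per-group sample size for a two-tailed t-test.
import math
from statistics import NormalDist, mean, stdev


def cohens_d(a, b):
    """Standardised mean difference using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def n_per_group(d, alpha=0.05, power=0.8):
    """Normal approximation plus a z_alpha^2 / 4 small-sample correction, rounded up."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)  # two-tailed critical value
    z_beta = z.inv_cdf(power)               # power = 1 - beta
    n = 2.0 * (z_alpha + z_beta) ** 2 / d ** 2 + z_alpha ** 2 / 4.0
    return math.ceil(n)
```

For d = 2.5 this yields 4 individuals per group, consistent with the eight individuals (four per group) stated above; for a conventional medium effect of d = 0.5 it yields 64 per group.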

At the time growth ceased (56th week after surgery, Supplementary Fig. 1), head shape, represented by HCH with HL as a covariate, was comparable among all treatment groups (full-factorial ANCOVA: neither the interaction nor group was significant, F4,23 < 0.51, p > 0.73 in both cases; Fig. 2). Head shape was also similar among groups when HH was used instead of HCH (full-factorial ANCOVA: neither the interaction nor group was significant, F4,23 < 0.47, p > 0.76 in both cases). However, the ontogenetic allometries of HCH relative to HL revealed a very different pattern. Control males and Control females differed greatly in the relationship between HL and HCH during ontogeny, with males showing a much steeper ontogenetic allometry (mixed-model ANOVA comparison of all models: difference in slope: p < 0.05, ΔAIC = 108.6; Fig. 3A). Castrated males and Testosterone males had ontogenetic allometries of casque size similar to those of Control males (mixed-model ANOVA: differences in slopes and intercepts between groups: p > 0.10, ΔAIC < 2.0; Fig. 3B), whereas Testosterone females had a significantly steeper ontogenetic allometry of HCH than Control females (mixed-model ANOVA: difference in slope: p < 0.05; ΔAIC = 18.3), albeit not as steep as that of Control males (mixed-model ANOVA: difference in slope: p < 0.05; ΔAIC = 48.7; Fig. 3C). There were no significant differences among treatment groups in the ontogenetic allometry of HH on HL (neither adding group nor adding group and its interaction with HL significantly improved the mixed models; ΔAIC < 2), supporting the view that the above differences in HCH reflect casque size.
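The ΔAIC decision rule used in these mixed-model comparisons can be sketched as a simple sequential comparison. This is one reading of the rule, stated as an assumption: models are ordered by number of parameters, and a more complex model replaces the current choice only when it lowers AIC by more than 2. The function names, parameter counts and AIC values in the example are hypothetical; in practice the AIC values would come from the fitted lme4 models.

```python
# Sketch of the Delta-AIC < 2 parsimony rule (hypothetical helpers).
from __future__ import annotations


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L."""
    return 2.0 * n_params - 2.0 * log_likelihood


def preferred_model(models: list[tuple[str, int, float]], threshold: float = 2.0) -> str:
    """models: (name, n_params, AIC) tuples. Starting from the simplest model, a more
    complex one replaces the current choice only if it lowers AIC by more than
    `threshold`; otherwise the models are treated as equivalent and the simpler is kept."""
    ordered = sorted(models, key=lambda m: m[1])
    best = ordered[0]
    for candidate in ordered[1:]:
        if best[2] - candidate[2] > threshold:
            best = candidate
    return best[0]
```

With hypothetical AICs of 500 (null), 420 (HL), 419 (group + HL) and 410 (group × HL), the group + HL model is not preferred over HL (ΔAIC = 1), but the interaction model is (ΔAIC = 10).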