Phylogeny and herbivory are related to avian cecal size

Avian ceca, a pair of blind sacs arising from the junction of the ileum and colon, are homologous to the cecum in mammals. Cecal size is hypothesized to depend on dietary proclivities and pressures, with faunivorous species having short ceca, whereas herbivorous species have long ceca. Previous tests of this hypothesis, however, did not account for phylogenetic pseudoreplication among closely related taxa. We collated published data on cecal length, dietary category, flying ability, and body mass from 155 avian taxa. Character states were mapped onto a phylogenetic framework, and the permutation tail probability test was used to detect phylogenetic signal in each character. Phylogenetic signal is significant among the characters. As with the cecoappendicular complex in mammals, closely-related birds tend to have similar cecal length. To account for phylogenetic pseudoreplication, we performed phylogenetic generalized least squares regression on cecal length and body mass with dietary category, superordinal-level clade, and flying ability as cofactors. The best-fitting regression model supports the dietary hypothesis for the avian cecum. Among sampled birds of comparable body mass, mean cecal length is significantly longer in herbivorous species than in carnivorous ones (p = 0.008), presumably allowing the extraction of nutrients without the burden of fermenting bulky masses of dietary fiber. Exceptions to this trend, however, suggest that avian ceca are functionally complex and may have additional roles in water balance and nitrogen recycling.


Results
Assessment of phylogenetic signal. In this sample of 155 avian taxa (Table S1), values of Pagel's lambda for log-transformed body mass and log-transformed cecal length approximate one (p < 0.001), consistent with a close fit to Brownian motion evolution (Table 1). In addition, the permutation tail probability (PTP) tests on categorical variables reveal that none of randomly permuted trees had fewer steps than 46 for dietary category and 13 for flying ability (Fig. 1). Therefore, these tests confirm that each continuous and categorical character shows significant phylogenetic signal (p < 0.01). phylogenetic comparative methods. Regression analysis reveals that models with phylogenetic weighting have substantially stronger evidential support than those without weighting (i.e., OLS: ordinary least squares).
The AIC values of simple, additive, and interaction (multiplicative) models with phylogenetic weighting are substantially smaller (better supported) than corresponding OLS models ( Table 2). This result is broadly consistent with assessments using Pagel's lambda and the PTP test, and suggests that the data (specifically their residual values) have significant phylogenetic signal.

Categorical characters
Dietary category -0.001 Table 1. Phylogenetic signal in sampled categorical and continuous characters. All characters were found to have a highly significant phylogenetic signal. Continuous characters were assessed using Pagel's Lambda (λ). Categorical characters were evaluated using permutation tail probability (PTP) tests.
An alternative model with Ornstein-Uhlenbeck-weighting (α = 0.172) and flying ability as an additional additive cofactor has substantially less support than the best model ( Table 2: ΔAIC = 3.3). At least in the current sample of birds, flying ability does not significantly affect cecal length (p > 0.051).
We could not assess a multiplicative model with different slopes for clade, dietary category, and flying ability due to the small sample size of some categories. However, multiplicative models, each with a single cofactor (either clade, dietary category, or flying ability) generally have less support than corresponding additive models with a single cofactor ( Table 2).

Discussion
Cecal length, body mass, dietary categories, and flying ability each show phylogenetic signal. The presence of phylogenetic signal in these characters recommends caution when interpreting previous analyses of these characters based on traditional non-phylogenetic techniques. That stated, our results support the hypothesis linking long ceca to herbivory 4 . Phylogenetic regression demonstrates that when accounting for body size and clade, herbivory and relatively long ceca are correlated. As suggested in Fig. 3, herbivory evolved independently in Galloanserae and Palaeognathae; in both clades, herbivorous species tend to have long ceca (color-coded green to red). However, exceptions to the general trend (e.g., Opisthocomus hoazin and Fulica americana, and Meleagris gallopavo) suggest that for some species, additional factors other than dietary category may influence cecal length. For example, experimental studies in quail and grouse have shown that ceca elongate as a response to changes in food consumption rates rather than in fiber content 29,30 . The ceca filter large volumes of food, selecting the fibrous indigestible fraction for frequent excretion while retaining the nutrient-rich liquid fraction for additional processing and absorption. In this way, ceca may be an avian adaptation for efficient processing of ingested food 29 .
Alternatively as suggested by DeGolier and colleagues 4 , avian ceca may correlate with water balance and nitrogen recycling. To our knowledge, no phylogenetically-informed analyses have tested the water-balance and nitrogen-cycling hypotheses. Whereas herbivorous species are predicted have large ceca to filter and absorb the nutrient-rich fraction from bulky indigestibles, carnivorous species may also benefit from these organs, which may further process uric acid that forms as a waste product of high protein consumption. Thus, there may be several adaptive pressures selecting for large ceca and herbivory may simply be just one of them.
Interestingly, avian ceca show similar functional and evolutionary patterning to the mammalian cecoappendicular complex. Smith and colleagues 5,6 tracked cecoappendicular evolution across mammals, and found no correlation between dietary category and any of the variables associated with the cecum or appendix, including appendix size, appendix presence, cecal morphology, or cecal size. Therefore, they concluded that dietary proclivities alone are not driving cecoappendicular evolution in mammals 5,6 , just as we have shown that diet alone is not driving cecal evolution in birds. Instead, both the mammalian cecoappendicular complex and avian colic ceca demonstrate significant phylogenetic signal, indicating that behavioral or body size characters are not independent of ancestry. Factors other than diet affect cecoappendicular size and shape, and this is likely true for birds as well. For example, accommodation also plays a role in determining appendix morphology, such that the appendix can change in size and histological composition throughout an individual's lifetime. In humans, for example, the appendix reduces size and changes shape with age, due to loss of lymphoid tissue [31][32][33] . Future studies could investigate how heritable cecal accommodation is in birds to determine whether its role in the evolution of avian cecal morphology.
Previous studies have hypothesized that the constraints of flight may have led to reduced cecal size and fermentation capabilities in flighted birds 3,34 . Our analyses did not detect a correlation between cecal length and flying ability across the sample, suggesting that flight is not an inherently limiting factor for cecal length. It is www.nature.com/scientificreports www.nature.com/scientificreports/ possible, however, that other measures of cecal size and capabilities not included here, such as cecal volume, may be the variable limiting flight.

sampling.
We used the framework of a recently published avian phylogeny, which is based on conserved regions in 259 nuclear genes across 198 avian species 28 . Dense taxonomic sampling of non-passerine birds clarifies controversial relationships, particularly deep nodes towards the base of crown-group Aves 27,28 . For superficial nodes, we relied on the topology of a recent supertree containing approximately 5000 species 26 . This supertree  www.nature.com/scientificreports www.nature.com/scientificreports/ was assembled using source trees that were recovered mostly from molecular data, with cytochrome b sequencing being the largest contributor (38.9% of source trees). Far fewer source trees came from morphological characters, with 0.004% of source trees based on digestive morphological data. A shortcoming of the supertrees is the lack of accurate branch-length information. In lieu of this information, we estimated branch lengths in our  www.nature.com/scientificreports www.nature.com/scientificreports/ composite phylogenetic topology using Pagel's method 35 , which generates arbitrary ultrametric (time-proportional) branches. Assembly of the composite phylogeny was performed using Mesquite Phylogenetic Software 36 (v. 3.51).
Onto these phylogenies, we mapped morphological, dietary, and flight ability data from 155 avian species 4,37-46 (Table S1). Morphological data consist of cecal length and body mass, both of which have strong right-skewed distributions that violate normality assumed by many statistical analyses. To better approximate the normal distribution, body mass and cecal length data were log-transformed. A small constant (0.1 cm) was added to cecal length data before log-transformation to avoid calculating undefined values. Primary dietary category was assigned for each species using cadaveric dissection and analysis of stomach contents (e.g., herbivore-greens, including the leaves of aquatic and terrestrial plants, comprise at least 50% of the stomach contents) 4 . In the absence of stomach contents, data from the literature were used. Primary dietary category was coded as a multi-state factor (herbivore, carnivore: aquatic invertebrates, carnivore: vertebrates, granivore/frugivore, insectivore, nectarivore, and omnivore). Flight ability was coded as a multi-state factor (strong flyer, weak flyer, flightless). All data collected and analyzed during this study are included in this published article (and accompanying Supplementary Information files, Data S1 and Table S1).
Assessment of phylogenetic signal. Comparative data as described above are results of evolution and phylogenetic processes. Such data may show strong phylogenetic signal and violate the assumption that data points are statistically independent as required in many traditional statistical techniques (e.g., pairwise comparisons or regression) 22 . There are different methods to test for phylogenetic signal in continuous and categorical characters.
We assessed phylogenetic signal in continuous characters such as log-cecal length and log-body mass using the package "phytools" 47 written for R 48 . This package implements Pagel's lambda, which estimates the transformation needed to fit a Brownian phylogenetic model to the character data 49 . Pagel's lambda typically ranges between zero (characters are independent of phylogeny) and one (characters evolved following Brownian motion); lambda greater than one is possible if traits are more similar than expected by Brownian motion. Unlike www.nature.com/scientificreports www.nature.com/scientificreports/ alternative measures of phylogenetic signal (e.g., Blomberg's K), Pagel's lambda seems robust to phylogenies with suboptimal branch-length information 50 as is the case with the composite avian phylogeny. The null hypothesis (no phylogenetic signal) was rejected if the p-value was less 0.05.
To assess phylogenetic signal in categorical characters (diet and flight), we performed a permutation tail probability (PTP) test 51 using Mesquite 36 . The minimum number of evolutionary changes (steps or tree length) required to explain the distribution of each character across the original tree was recorded. Two sets of randomized trees were generated using the settings uniform (yule) speciation (with a tree depth of ten) and parsimony character steps. The first set included 999 permutations and the second included 9,999 permutations. Character data were mapped onto the random topologies in each simulation and the number of evolutionary steps was recorded. This procedure permits the character distributions across the real phylogenetic tree to be compared to a random distribution, in order to determine whether the actual character distribution was significantly different than would be expected by chance alone. To calculate a p-value for each character, we counted the number of permutations with a tree length less than or equal to the observed length and divided by the total number of permutations 52 . Note that if none of the permutated data sets were as short or shorter than the observed values, we reported the p-value as 10 divided by the number of permutations-that is the probability of a Type I error is very small but not zero 53 . The null hypothesis (no phylogenetic signal) was rejected if the proportion of permutated data sets with tree lengths as short or shorter than the original data set was less than 0.05 54 . phylogenetic comparative methods. We assessed relationships among cecal length, body mass, and various factors, using increasingly complex sets of regression models in the following order. In the first set, we evaluated log-transformed cecal length (dependent variable) regressed on log-transformed body mass (independent variable)-simple allometry. The second set consists of regressions in which categorical factors (i.e., dietary category, superordinal-level clade, and flying ability) were modeled as additive effects in isolation as well as in combination. The final set consists of models with the categorical factors as interactive (multiplicative) effects. Unfortunately, small sample size in various categories precluded the evaluation interactive effects beyond isolated factors.
Regressions were performed using phylogenetic generalized least squares (PGLS) 24,25 with the "ape" 55 and "nlme" 56 package in R 48 . PGLS uses weighting to "correct" for non-independence of residual variation among data points 57 . Weighting is in the form of a variance-covariance matrix based on the sample phylogeny and model of evolution. Three models of evolution are commonly assumed in comparative analyses: (1) no evolution-all taxa are statistically independent 23 ; (2) Brownian motion evolution-variance among related taxa increases linearly with time 58 ; and (3) Ornstein-Uhlenbeck evolution-variance among related taxa increases exponentially with time 59 . We repeated PGLS regressions using each method of weighting.
Regression models differ in their complexity and fit to the data, so we compared them using Akaike's Information Criterion (AIC). In general, the best supported model has the lowest AIC value 60 . Relative support between the best model and alternative models was assessed with difference (ΔAIC) values. Alternative models with ΔAIC values of at least 3, which is equivalent to a p-value of 0.051 61 , were rejected as substantially weaker than the best model.