Evidence against tetrapod-wide digit identities and for a limited frame shift in bird wings

In crown group tetrapods, individual digits are homologized in relation to a pentadactyl ground plan. However, testing hypotheses of digit homology is challenging because it is unclear whether digits represent distinct and conserved gene regulatory states. Here we show dramatic evolutionary dynamism in the gene expression profiles of digits, challenging the notion that five digits have conserved developmental identities across amniotes. Transcriptomics shows diversity in the patterns of gene expression differentiation of digits, although the anterior-most digit of the pentadactyl limb has a unique, conserved expression profile. Further, we identify a core set of transcription factors that are differentially expressed among the digits of amniote limbs; their spatial expression domains, however, vary between species. In light of these results, we reevaluate the frame shift hypothesis of avian wing evolution and conclude only the identity of the anterior-most digit has shifted position, suggesting a 1,3,4 digit identity in the bird wing.

shows diversity in the patterns of genetic differentiation of digits, although the anterior-most digit 23 of the pentadactyl limb has a unique, conserved expression profile. Further, we identify a core set 24 of transcription factors that are differentially expressed among the digits of amniote limbs; their 25 spatial expression domains, however, vary between species. In light of these results, we 26 reevaluate the frame shift hypothesis of avian wing evolution and conclude that only the identity 27 of the anterior-most digit has shifted position, suggesting a 1,3,4 digit identity in the bird wing. 28 29 30 Limbs evolved from paired fins in the Late Devonian, and early tetrapods possessed 31 more than five digits on the fore-and hindlimbs 1,2 . Later in the tetrapod stem, a pentadactyl 32 pattern stabilized as the ground plan for the limb. Individual digits are homologized between 33 species and between fore-and hindlimbs in reference to this pentadactyl ground plan 3 . However, 34 it remains controversial whether such hypotheses of identity correspond to distinct developmental 35 programs among the digits (developmental identities), or just the relative position of digits along 36 the limb's anteroposterior axis (positional identities) [4][5][6][7] . Below we use the symbols D1, D2, etc. to 37 indicate positional identities in the pentadactyl ground plan, rather than to indicate developmental 38 identities. 39 The anterior-most digit (D1) (e.g., human thumb) appears to have a distinct 40 developmental identity as compared to the more posterior digits (D2-D5). D1 is marked by a 41 unique gene expression profile-low expression of HoxD11 and HoxD12 and high expression of 42 Zic3 relative to other digits 7-9 -and it appears able to develop independently of Shh signaling 9-11 . 43 Additionally, analysis of morphological variation in primates identified a high degree of variational 44 independence of D1 relative to the more posterior digits 12 . Models of posterior digit identity have 45 been proposed according to the relative exposure of limb bud mesenchymal cells to Shh, which 46 emanates from the zone of polarizing activity prior to digit condensation 10,11 . However, broadly 47 conserved marker genes for individual posterior digits have not been identified in the interdigital 48 mesenchyme, the signaling center that patterns digits 13,14 . For instance, while the combinatorial 49 expression of Tbx2 and Tbx3 is necessary to generate the phenotypes of D3 and D4 in chicken 50 hindlimb 15 , it is questionable whether these developmental identities are conserved in other 51 species, like mouse, with limited morphological differentiation of the posterior digits. 52 Debates of digit homology are especially challenging to resolve when limbs have fewer 53 than five digits. This problem has been most actively investigated in the tridactyl avian wing, 54 because of the appearance of conflict between paleontological and developmental data 16 . The 55 fossil record of theropod dinosaurs shows a clear pattern of reduction of the posterior two digits in 56 the lineage leading to birds, yet digits in the wing have been described as developing in the 57 middle three positions of a pentadactyl developmental groundplan [17][18][19][20][21][22] . To explain this 58 discrepancy, the frame shift hypothesis was proposed 16 . It posited that a homeotic shift occurred 59 in the avian stem such that the developmental programs that were once expressed in D1, D2, 60 and D3 are now executed in the digits that develop in positions D2, D3, and D4 respectively. 61 Comparative analyses of gene expression have found support for this hypothesis: in situ 62 hybridization and transcriptomics have revealed similarity between the anterior digit of the adult 63 avian wing, which develops in position D2, and D1 of other limbs 7,23 . 64 To characterize the gene expression profiles of digits in pentadactyl amniote limbs, we 75 sequenced RNA of developing digits and their associated posterior interdigital mesenchyme from 76 the forelimbs of mouse, green anole (Anolis), and American alligator (Fig. 1 a). In each of these 77 species, hierarchical cluster analysis (HCA) and principal component analysis (PCA) of the 78 transcriptomes shows a weak signal of sample clustering by digit (Extended Data Fig. 1). The 79 strongest signals of digit-specific expression profiles are observed in D1 of mouse and D4 of the 80 alligator. Groupings of the other digit samples are not well supported. We hypothesized that this 81 result might imply that any signal of gene expression differentiation among digits is overwhelmed 82 by noise when all genes are considered, because most genes are likely irrelevant to the 83 developmental identity of digits. If such a signal exists, we predict that it will be reflected 84 preferentially in the expression of transcription factor and signaling genes. Therefore, we again 85 performed HCA and PCA on the samples of each species, this time using two gene lists: a 86 curated set of known limb patterning genes that are sensitive to Shh signaling (N=159) 24 , and 87 transcription factor genes (N=2183) 25 . 88 In mouse and alligator, HCA and PCA of known limb patterning genes results in 89 clustering of samples by digit ( Fig. 1 b, c). In mouse, D1 is strongly differentiated from the other 90 digits. In alligator, an anterior cluster, comprised of digits D1, D2, and D3, is differentiated from a 91 posterior cluster, comprised of D4 and D5. By contrast, analysis of known limb patterning genes 92 in Anolis shows weak clustering of samples by digits ( Fig. 1 d). This suggests a level of 93 homogeneity among Anolis digits that is not observed in either mouse or alligator. Analysis of all 94 transcription factors for these species yields comparable results to what is recovered for limb 95 patterning genes, but with generally lower adjusted uncertainty values in HCAs (Extended Data 96 To further test the hypothesis that there is limited gene expression differentiation among 98 Anolis digits as compared to the other pentadactyl limbs sampled, we took advantage of a result 99 from multiple testing theory 26 : If a differential expression analysis is conducted on two sample 100 types that are not genetically differentiated, then the resultant frequency distribution of p values 101 will be uniform within the [0, 1] interval. On the other hand, if there are truly differentially 102 expressed genes among the compared sample types, then the p value distribution is expected to 103 be biased towards p=0. We conducted differential expression analyses of adjacent digits of the 104 forelimbs of mouse, alligator, and Anolis using EdgeR 27,28 and inspected p value distributions 105 ( Fig. 2). In Anolis, all comparisons of adjacent digits result in p value distributions that are close to 106 uniform, suggesting that there is very weak, if any, genetic differentiation of adjacent fingers. We 107 note that this result is independent of any p value significance threshold or false discovery 108 correction method. By contrast, most of adjacent pairwise digit comparisons for mouse and 109 alligator show a strongly biased p value distribution, the exception being D2 and D3 in mouse. 110 c). Tbx2, which was previously shown to regulate posterior digit identity in the chicken hindlimb 14 , 148 shows divergent patterns of expression in the posterior digits of other species. Tbx3 differentiates 149 D3 from D4 in mouse, alligator, and chicken hindlimb, and the likelihood that it was recovered by 150 chance alone is 7.2x10 -6 (binomial test); however, it is not differentially expressed at this position 151 in Anolis forelimb. Tbx15 differentiates D4 from D5 among pentadactyl limbs (Fig. 3), and the 152 likelihood that it was recovered by chance alone is 1.9x10 -5 (binomial test). 153 Analyses aiming to identify genes that are conserved and differentially expressed at a 154 particular position within the limb (e.g., between D1 and D2 in mouse, alligator, and Anolis) can 155 be affected by the threshold stringency of the false discovery rate (FDR). Binomial tests, as 156 presented above, are one means of accounting for this. We present a second strategy for 157 assessing whether genes identified as differentially expressed in one species behave similarly in 158 other species that does not depend on a particular FDR threshold being reached in all species. 159 Specifically, we consider the genes identified as differentially expressed in one species between 160 adjacent digits (e.g., in mouse, 129 transcription factor genes are identified between D1 and D2). 161 Then we ask how expression fold change between the two digits in the original species compares 162 to expression fold change of the same genes and also a set of randomly selected genes of similar 163 expression levels in other species. To make these comparisons, we calculated Pearson's 164 correlation of the fold changes between the original genes versus each of the two gene sets 165 (orthologs and random genes) in other species. Results of this approach broadly mirror those 166 described, above. 167 Among the pentadactyl limbs sampled, genes differentially expressed between D1 and 168 D2 behave consistently between species and can be distinguished from random genes, and 169 comparisons of the more posterior digits do not clearly distinguish orthologs from random genes, 170 (Extended Fig. 5 a-d). If chicken hindlimb is considered instead of the Anolis forelimb, we again 171 obtain strong support for conserved behavior of genes at the position D1 and D2, weaker support 172 for conserved gene behavior between D2 and D3, and comparisons at the position D3 and D4 do 173 not clearly distinguish orthologs from random genes (Extended Fig. 5 e-g). Thus, testing for 174 genes that are differentially expressed at the same position can recover genes that behave 175 consistently across species (i.e., Tbx15 between D4 and D5 among pentadactyl limbs, and Tbx3 176 between D3 and D4 between mouse, alligator, and Anolis), while comparisons of all genes 177 differentially expressed for these species might not show evidence of broadly conserved profiles. 178 Conversely, while we might obtain modest evidence for shared behavior among differentially 179 expressed genes (i.e., between digits D2 and D3 among mouse, alligator, and chicken), there 180 might be no individual genes recovered as differentially expressed among the taxa at that 181 position. However, both types of comparisons between digits D1 and D2 paint the consistent 182 picture that D1 exhibits a shared digit identity across these limbs. 183 184 185 A core set of digit patterning genes 186 Given our result that the gene expression profile of digits is evolutionarily dynamic, we 187 next tested whether a conserved set of genes might pattern amniote autopods, albeit in different 188 spatial patterns. Specifically, we reanalyzed transcriptomic data of mouse, alligator and Anolis 189 forelimbs and chicken hindlimb, conducting ANOVA to test for genes that were differentially 190 expressed between any two digits in the limb, not just adjacent digits. This analysis recovers 191 genes that are differentially expressed between some digits in the limb, but it does not indicate 192 between which digits a gene is differentially expressed. The number of differentially expressed 193 transcription factor genes differs greatly among species: 356 in mouse, 377 in alligator, 34 in 194 Anolis, and 144 in the chicken hindlimb (FDR <0.05, Fig. 5 a). This is consistent with previous 195 results (above) that showed the Anolis forelimb to be more homogeneous than other sampled 196 limbs. Therefore, we focused on transcription factor genes that are one-to-one orthologous 197 between mouse, alligator, and chicken and identified a set of 49 genes that are differentially 198 expressed in these three limbs (Fig. 5 b). We call these conserved differentially expressed genes 199 (CDEGs). The expected number of overlapping genes among these sets by chance alone is 7.57, 200 and the probability of observing an overlap of 49 genes or more by chance is <10 -6 (binomial 201 test). Thirteen of the CDEGs are included in the list of limb patterning genes sensitive to Shh 202 signaling 24 . To assess whether this gene set is biologically meaningful, we performed HCA and 203 PCA on the samples of each species using the 49 CDEGs. In Anolis, we considered the subset 204 (n=42) that are one-to-one orthologs across all four species. In combination, CDEGs can produce 205 unique expression profiles of each digit within a limb (Fig. 5 c) and show patterns similar to those 206 generated by analyses of known limb patterning genes ( Fig. 1 b-d, Extended Fig. 3

c). 207
Analysis of amniote limbs showed that targeted gene lists generated either 208 experimentally (i.e., known limb patterning genes 24 ), by gene ontology (i.e., all transcription 209 factors 25 ), or statistically (i.e., 49 CDEGs), can reveal distinct gene expression profiles among 210 digits of a limb, which are not observed in the full transcriptome. The spatial digit expression 211 profiles of these genes, however, is species specific. In light of these results, we reevaluated the 212 frame shift hypothesis of bird wing origin 16 . 213 214 215

Reevaluating the frame shift hypothesis 216
The frame shift hypothesis predicts that the three digits of the adult avian forelimb, which 217 we refer to here as D2, D3, and D4 according to their developmental position 17-22 , will express the 218 developmental programs observed in the digits D1, D2, and D3 of other limbs 16 . This hypothesis 219 was tested previously by analyzing the transcriptomes of chicken fore-and hindlimb digits 7 . That 220 study found correspondence between forelimb D2 and hindlimb D1, consistent with the frame 221 shift hypothesis. However, correspondence of more posterior digits was not detected 7 . 222 We re-analyzed published transcriptomic data of digits from the chicken forelimb 7 and 223 compared them to digits of the chicken hindlimb. Surprisingly, when the 49 CDEGs are 224 considered, gene expression profiles of forelimb digits D2, D3, and D4 correspond to hindlimb 225 digits D1, D3, and D4, respectively (Fig. 6 a). Analyses of transcription factor genes and known 226 limb patterning genes show a consistent pattern (Extended Data Fig. 6). Similarity between the 227 posterior two digits of the chicken fore-and hindlimb (D3 and D4 in each limb) can also be 228 observed in the expression patterns of numerous individual genes that are known to be involved 229 in the patterning of digits (Fig. 6 b). 230 To assess whether spatial gene expression profiles can be conserved between the fore-231 and hindlimbs of a species, even when they differ in digit number, we performed in situ 232 hybridization in alligator. We evaluated expression of Tbx2, Tbx3, and Sall1, three transcription 233 factor genes identified as differentially expressed between alligator forelimb D3 and D4. Serial homologs are repeated body parts, generated by a common developmental 260 program. In the case of digits, chondrogenic condensations are generated by a reaction-diffusion 261 Turing-type mechanism 31,32 . Serial homologs can be developmentally identical (homomorph 262 parts) or they can assume distinct developmental identities through the differential expression of 263 regulatory genes (paramorph parts) 33 . The degree to which serial homologs are individuated can 264 be difficult to assess from morphology alone, because the same developmental program can lead 265 to different morphological outcomes depending on the developmental environment 34,35 . However, 266 detailed analyses of gene expression and regulation can identify developmentally individualized 267 body parts. 268 In this study, we performed a comparative analysis of whole genome expression data to 269 test the hypothesis that digits have conserved developmental identities. In interpreting our data, 270 we acknowledge that gene expression does not demonstrate gene function. Nevertheless, a lack 271 of differential gene expression between digits is evidence of a lack of developmental 272 individuation, and a high level of differential expression (particularly in transcription factor and 273 signaling genes) is evidence for distinct gene regulatory states. Although not all species were sampled at multiple time points, we argue on the basis of these 287 comparisons in chicken and alligator that it is unlikely our conclusions on the evolution and 288 development of digit identity are biased by temporal dynamism in gene expression within the 289 developmental window studied here. 290 Our analyses show that patterns of regulatory gene expression in digits are evolutionarily 291 dynamic (Fig. 7 a). The developmental identities of digits are evolving across amniotes and can 292 be lineage-specific. The exception is a conserved developmental identity that characterizes the 293 D1 of mouse, alligator and Anolis forelimbs, chicken hindlimb, and human fore-and hindlimbs 294 ( Fig. 4 b). This digit identity is unlikely to be an edge-effect (i.e., merely a corollary to which digit 295 occupies the most-anterior position in a limb). In the rabbit hindlimb, which has lost the digit D1, 296 this developmental identity is not observed in D2, despite that digit now occupying the anterior-297 most position in the limb 36 . Additionally, in the hindlimb of Silkie chicken mutants, which have 298 additional anterior digit on their foot, developmental identity is preserved in the digit of the 299 morphology of the native D1, despite that digit no longer occupying the anterior-most position in 300 the limb 23 . 301 In contrast to D1 we do not find support for conserved digit identities in the more posterior 302 digits. Among the pentadactyl limbs we studied, no genes consistently differentiate the median 303 digits (D2, D3, and D4) from one another. And when we consider the chicken hindlimb rather than 304 Anolis, because similarity among Anolis digits might be secondarily derived, we find no gene 305 differentiates D2 and D3, and only one gene (Tbx3) differentiates D3 and D4. There is limited 306 evidence for a conserved developmental identity for digit D5. A single gene (Tbx15) is 307 differentially expressed between D4 and D5 among mouse, alligator and Anolis, however more 308 genes are shared between just mouse and alligator ( Our analyses also identified a core set of regulatory genes, which we call CDEGs, that 310 are differentially expressed among digits, although species differ in which digits differentially 311 express the genes (Fig. 5). We propose that the CDEGs represent a "digit differentiation tool kit" 312 deployed for the individuation of different sets of digits in different lineages, depending on the 313 adaptive needs of the species. Between mouse and human, 28 the 49 CDEGs have 314 demonstrated roles in patterning distal limb skeleton (Extended Data Table 1). Of the CDEGs, 315 only 15 are differentially expressed across the Anolis forelimb. This homogeneity appears to be a 316 derived condition among the taxa sampled, as it is unlikely that the other 34 CDEGs reflect 317 homoplasy between mammals and archosaurs. 318 In Anolis most fingers, though they differ in number of phalanges, lack developmental 319 individuality and, thus, appear to be homomorphic. We consider a number of alternative, non-320 biological explanations for the unique Anolis pattern; however, these do not adequately explain 321 homogeneity in the data. For example, it is possible that is the limbs were sampled at too-late a 322 stage, after signals pertinent to digit patterning were expressed. We regard this explanation as 323 unlikely because, as discussed above, in limbs sampled at multiple time points gene expression 324 profiles are stable over broad developmental window, through late stages of development. 325 Another possible alternative explanation is that variance among Anolis samples is greater as 326 compared to other data sets, and that this diminished our ability to detect differentially expressed 327 genes. We assessed this possibility in two ways. First, we repeated all differential expression 328 analyses considering only the two most highly correlated samples of each digit for mouse, 329 alligator and Anolis, which consistently had correlation values above 0.99 (Extended Data Fig. 7  330 a). Results of these two-sample comparisons are consistent with analyses of all three samples 331 (e.g., compare Fig. 3 and Extended Data Fig. 7 b), indicating that the unique Anolis pattern is not 332 an artifact of sample quality. 333 Second, we evaluated the dispersion values of our samples. Dispersion is a measure of 334 variance among samples that is calculated by the software edgeR. This parameter affects the 335 sensitivity of differential expression analyses (e.g., a set of samples with high dispersion will have 336 low sensitivity in tests of differential expression), and it can be impacted by specimen pedigree 37 . 337 Anolis embryos were collected from non-siblings, whereas mouse and alligator samples were 338 collected from siblings. As expected, the mean dispersion value of Anolis samples is greater than 339 either mouse or alligator (Extended Data Fig. 8). The Anolis mean dispersion value is consistent 340 with other data sets in which samples were collected across a population 37 . However, such 341 differences in dispersion cannot explain the unique Anolis pattern. Chicken hindlimb digits, which 342 were also collected from non-siblings and have dispersion values comparable to Anolis 343 (Extended Data Figure 8), show patterns of differential expression comparable to mouse and 344 alligator (Fig. 2, Extended Data Fig. 4 a, Fig. 5 a). Thus, neither timing, sample quality, nor 345 pedigree appears sufficient to explain the Anolis data. It appears that homogeneity among the 346 digits reflects biological reality, and digits in this lineage have undergone secondary 347 homogenization. Other lineages might have similarly experienced loss of digit identities (e.g., 348 ichthyosaur forelimbs), and the secondary homogenization of paramorphic serial homologs has 349 been described in other anatomical systems (e.g., the homodont dentition in cetaceans 38  wing digits (D3 and D4), however, do not show evidence of translocation. This is observed most 357 clearly by comparison to the hindlimb of the chicken, with the pattern recovered when three 358 different gene lists are considered (transcription factors, limb patterning genes, and CDEGs). As 359 discussed above, although we cannot diagnose conserved gene expression profiles for the digits 360 D3 and D4 across amniotes, we obtain indirect evidence for a correspondence of avian digits to 361 the digits D1, D3, and D4 of other amniote limbs (Fig. 6 d). The possibility of a 1-3-4 pattern of 362 digit identity in the bird wing has been proposed previously 40 on the basis of experimental 363 studies 41 . Still, this pattern of correspondence is surprising. It challenges the predominant 364 hypotheses of digit identity and suggests an alternative scenario for how limb development 365 evolved in the lineage leading to Aves (Fig. 7 b). Significantly, it indicates that diagnoses of digit 366 identity from the paleontological record and hypotheses of digit identity based upon gene 367 expression profiles have a more complex relationship than previously anticipated. 368 The frame shift hypothesis is an integrative model. It aimed to explain an apparent 369 incongruity between paleontological and neontological data sets by providing a developmental 370 account for evolutionary transformation rooted in a mechanistic basis of homology. Our results 371 show that any such integrative model will be more complicated than previously presumed. Moving 372 forward, we recommend systematic reappraisal of phalangeal and metacarpal characters along 373 the avian stem. It has been proposed that patterns of digit reduction in theropods might be more 374 complex than is generally assumed 40 . For example, study of the ceratosaur Limusaurus led to the 375 hypothesis that in basal tetanurans metacarpal characters correspond to identities 2-3-4, while 376 phalanges have identities 1-2-3 41 , although specifically how this taxa informs the plesiomorphic 377 avian condition has been contested 42 . Additionally, we recommend continued, broad taxonomic 378 sampling in studies of limb development. Building expanded, comparative data sets will allow for 379 quantification of homoplasy between species and between the fore-and hindlimbs, which could 380 impact hypotheses of digit identity presented here. Finally, continued functional genetic studies 381 are required to understand how digit-specific phenotypes are regulated and to test the hypothesis 382 that CDEGs play privileged roles in establishing gene regulatory states in the interdigital 383 mesenchyme. 384 The question of how to diagnose the digits of the avian wing is among the oldest in 385 comparative morphology 3,43 . This study tests several assumptions that underlay many 386 contemporary studies of the homology and developmental identity of digits. Indeed, it is the first to 387 comparatively analyze the full gene expression profiles of digits of different species. Such data, 388 and a willingness to consider hypotheses that previously might have been regarded as heterodox, 389 is required for the testing and refinement of integrative theories on the nature of limbs. 390 391

Methods 392
Limbs of each species were sampled after digital condensations have formed and after inter-393 digital webbing has begun to reduce. RNA was extracted from digits and their associated 394 posterior inter-digital webbing following the dissection strategy shown in Figure 1 a of Wang et 395 al. 7 . A summary of the taxonomic and tissue sampling strategy is presented in Extended Data 396 Digits were pooled into a single vial (n=20 digits) and divided into four samples of five randomly 416 selected digits. RNA was extracted from each sample with TRIzol (Thermo Fisher Scientific) 417 following described methods 46 . For digit D3 stage 18, two of the extractions yielded too little RNA 418 for sequencing approach described below; therefore, there are only two replicates of this sample 419 type. RNA quality was assessed using an Agilent Technologies 2100 Bioanalyzer, and samples 420 with RIN scores above 8.5 were submitted for sequencing at the Yale Genome Sequencing 421 Center. Sample size (three replicates per sample type) was selected for downstream differential 422 expression analyses, according to References 27 and 28. To generate strand-specific 423 polyadenylated RNA libraries, samples were processed as follows: Approximately 500 ng of RNA 424 was purified with oligo-dT beads and the mRNA recovered was sheared by incubation at 94 C. 425 First strand synthesis was performed with random primers, and then second strand synthesis was 426 performed with dUTP to generate strand-specific libraries for sequencing. cDNA libraries were 427 end-paired, A-tailed adapters were ligated, and the second strand was digested with Uricil-DNA-428 Glycosylase. qRT-PCR was performed using a commercially available kit (KAPA Biosystems) to 429 confirm library quality, and insert size distribution was determined with Agilent Bioanalyzer. 430 Samples were multiplexed on an Illumina Hiseq 2000. Each sample was sequenced to a depth of 431 approximately 50 million reads (single-stranded, 75 base pair length). 432 Reads were mapped to the American alligator genome assembly (allMis0.2) with genome 433 assembly described by Green et al. 47 . Sequenced reads were mapped to the genome using 434 Tophat2 v2.0.6 on Yale University's Ruddle computing cluster. In Tophat2, reads were first 435 mapped to the transcriptome, and the remaining reads were then mapped to the genome. 436 Mapped reads were assigned to genes with HTSeq v0.5.3p 48 , which was implemented with 437 Python v2.7.2. In HTSeq, we required that reads be mapped to a specific strand, and to account 438 for reads that mapped to more than one feature, we ran with the setting "intersection-nonempty." Differential expression testing of adjacent digits. EdgeR (Release 3.1) 27,28 was used to test 502 for differential expression of adjacent digits (e.g., D1 vs. D2) of mouse, alligator, Anolis and 503 chicken. We used function glmFit and glmLRT in EdgeR, which implemented a generalized 504 regression model for differential expression test. Specifically, in alligator and chicken PCA and 505 PCA revealed stable grouping of samples by digit number across stages. Therefore, subsequent 506 analyses consider these data from two stages simultaneously. 507 In Anolis, pairwise testing of samples revealed nonconventional p value distribution, with 508 a decrease near zero and sometimes a bump near 0.5. Because correlation between replicates 509 was lower than what was observed in either mouse or alligator (Extended Data Fig. 7), and 510 because PCA of the full transcriptomes revealed two major clusters of data that did not 511 correspond to biological phenomena (Extended Data Fig. 1 c), we corrected for the artifact of the 512 non-biological clusters by including the first principle component in the regression model in 513 EdgeR. Analyses were also run without this PC1 correction. Results presented in the manuscript 514 are robust to both analytic approaches, although PC1 correction results in discovery of slightly 515 more differentially expressed genes for a given false discovery rate. (e.g., compare Fig. 3 and 516 Extended Data Fig. 10). 517 Following analyses of differential expression, multiple hypothesis testing was accounted 518 for by adjusting p values following the Benjamini-Hochberg method 26 . We also considered a 519 second correction method, the q-value of Storey 56 . The major results presented in the study are 520 robust to both methods. Although the Storey method uniformly called more genes as significant at 521 the FDR threshold of 0.05, the same genes are recovered in the center of Venn diagrams (Fig. 3, 522 Fig 4a, Fig 5 b). 523 524

Comparing correlation of fold change between orthologs and random genes. 525
To assess whether genes differentially expressed at a given position in one species are behaving 526 similarly in other species, we compared the relative fold change of these genes to random genes 527 of similar expression level in the other species. For example, between D1 and D2 of alligator 46 528 genes are recovered as differentially expressed among the one-to-one orthologs of the three 529 pentadactyl species sampled (FDR threshold of 0.05) (Fig. 3). For these genes, we calculated the 530 fold change in TPM according to the equation in Fig. 4 b for all three species (e.g., mouse, 531 alligator and Anolis). Next, for each of the 46 genes, we identified the gene most similar in its 532 TPM value at the position of the anterior digit (e.g., for the comparisons of D1 vs D2, we matched 533 the TPM of D1) among the one-to-one orthologous transcription factor genes for the two other 534 species. Then, we calculated to Pearson's correlation for the vector comprised the gene fold 535 changes from the original species (alligator) and the orthologs of the other species (mouse and 536 Anolis). We also calculated Pearson's correlation between gene fold changes from the original 537 species (alligator) and the random list of random genes of similar expression level for each of the 538 other species. This was repeated for each the three species, and if a limb was sampled at 539 multiple time points, for each time point. Finally, to assess whether at a given position 540 orthologous genes could be distinguished in their behavior to random genes, we compared the 541 means of these correlations using two tests: t-test and a Mann-Whitney U test. 542 543 Differential expression testing between all digits. EdgeR was also used to test for genes that 544 differed between any combination of digits within a limb. This was done by specifying multiple 545 coefficients to function glmLRT. As with the pairwise tests, we considered both stages of alligator 546 and chicken simultaneously and included the first principle component in the regression model for 547 Anolis. Genes identified as CDEGs for each species are available as supplemental information. human Ensembl gene IDs in Ensembl assembly v85 using BioMart (N=2183). To recover the 552 species-specific lists of transcription factors, two approaches were taken. In mouse and Anolis, 553 orthologous genes were identified using BioMart's orthology predictions for Ensemble assembly 554 v85. In alligator and chicken, because these species were analyzed using different assembly 555 builds, orthology was determined by matching gene symbols to those of human from Ensembl 556 assembly v85. By this approach, we recovered transcription factors for each species as follows: 557 1838 in mouse, 1563 in Anolis, 1455 in Alligator, and 1217 in chicken. To identify human 558 orthologs of select genes identified by differential expression analyses, we first used gene 559 symbols and then confirmed that the ensemble IDs were consistent across Ensemble 560 assemblies. Genes identified as transcription factors for each species are available as 561 supplemental information. 562 To identify one-to-one orthologous transcription factor genes in mouse, alligator, and 563 Anolis, we used BioMart to generate a list of one-to-one orthologous genes between mouse and 564 Anolis in Ensembl assembly v85 that correspond to the published transcription factor Entrez 565 IDs 31 . This gene list was then matched to alligator and chicken by gene symbol to recover 566 transcription factor genes that are one-to-one orthologs in multiple species. 567 568 Limb patterning genes. A Ph.D. dissertation by Carkett 24 identified genes that are sensitive to 569 Shh signaling by experimental pertubations and in silico analyses. From these studies, a 570 summary list of genes that pattern the autopod was produced (pg. 172). This gene list includes 571 transcription factor and signaling genes. These gene symbols were matched with each species to 572 identify the subset of genes present in the genome assemblies that we considered for mouse 573 (n=151), alligator (n=142), Anolis (n=140), and chicken (n=136). Genes identified as limb 574 patterning genes for each species are available as supplemental information. 575 576 577 In situ hybridization. RNA from a stage 18 alligator limb was extracted, as described above for 578 sequencing, and cDNA was generated with the High Capacity cDNA Reverse Transcription kit 579 (Applied Biosystems). Primers were designed with Primer3 57 to amplify fragments of Tbx2 580  1  596  597  2  598  599  600  3  601  4  602  603  5  604  605  6  606  607  608  7  609  610  611  8  612  613  614  9  615  616  617  10  618  619  11  620  621  622  12  623  624  625  626  13  627  628  629  14  630  631  632  15  633  634  635  16  636  637  638  17  639  640  18  641  642  19  643  644  645  20  646  21  648  649  650  22  651  652  653  654  23  655  656  24  657  658  659  25  660  661  26  662  663  664  665  27  666  667  668  28  669  670  671  29  672  673  30  674  675  676  31  677  678  32  679  680  681  33  682  34  683  35  684  685  36  686  687  688  37  689  690  691  38  692  693  39  694  40  695  696  41  697  698  699  42  700  701 43 702

Data availability 748
The RNA-sequencing data for mouse, alligator and Anolis (including counts of mapped reads), is 749 available on Gene Expression Omnibus (GEO) repository (https://www.ncbi.nlm.nih.gov/geo/) 750 under accession number GSE108337. Sequencing data for human limb samples is available 751 through the database of Genotypes and Phenotypes (dbGaP) under study accession number 752 phs001226.v1.p1. Supplementary data files include (1)             Extended Data Figure 5 | Comparing fold change of differentially expressed genes to random genes. Fold change of genes differentially expressed between adjacent digits for one species was compared to those of orthologous genes in other species and also to randomly selected genes of similar expression level for the other species. Comparisons were made among the pentadactyl limbs (a-d), and also considering chicken, rather than Anolis (e-g). Broadly, genes differentially expressed between D1 and D2 behave consistently between species. Among more posterior digits there is limited evidence for conserved behavior. The genes number of genes recovered as differentially expressed at each position for each species are reported in Fig 3. and Extended Data Fig. 4 b. Vertical lines in each plot represent the mean values of correlation among comparisons between the sets of orthologous or random genes.    replicates are correlated with values comparable to the other species (>0.99). Therefore, to assess whether variance in Anolis was biasing our analyses, tests of differential expression were replicated using only the two most-highly correlated replicates of each digit. (b) Venn diagrams of one-to-one orthologous transcription factors genes identified as differentially expressed between adjacent digits with a FDR threshold of 0.05 when differential expression analyses considered only the two most-highly correlated samples of each digit.     -Sampled at st.18 -2 replicates per sample type, designated a and b when plotted on Fig. 2 b.
-One sample of each type from Individual 1; one sample of each forelimb type from Individual 2; one sample of each hindlimb type from Individual 3