Giant tortoise genomes provide insights into longevity and age-related disease

Giant tortoises are amongst the longest-lived vertebrate animals and as such provide an excellent model to study traits like longevity and age-related diseases. However, genomic and molecular evolutionary information on giant tortoises is scarce. Here, we describe a global analysis of the genomes of Lonesome George, the iconic last member of Chelonoidis abingdonii, and the Aldabra giant tortoise (Aldabrachelys gigantea). The comparison of these genomes to those of related species, using both unsupervised and supervised analyses, led us to detect lineage-specific variants affecting DNA repair genes, inflammatory mediators and genes related to cancer development. Our study also hints at specific evolutionary strategies linked to increased lifespan and expands our understanding of the genomic determinants of ageing. These new genome sequences also provide important resources to help the efforts for restoration of giant tortoise populations.


Supplementary Information
Giant tortoise genomes provide insights into longevity and agerelated disease Quesada  1. Genomic sequencing and annotation 1.1. Genome sequencing and assembly DNA was recovered from a blood sample from Lonesome George, the last member of the species Chelonoidis abingdonii, one of the several giant tortoises living in the Galapagos Islands (Fig. 1a, main text). This DNA was sequenced using the Illumina HiSeq 2000 platform, from a 180 bp-insert paired-end library, a 5 kb-insert mate-pair library and a 20 kb-insert mate-pair library. The assembly of these libraries was performed with the AllPaths algorithm 2 to form a draft genome of 64,657 contigs with an N50 of 74 kb. Then, we scaffolded these contigs using Sspace 3 (v3.0) employing the long-insert mate-pair libraries. Finally, we filled the gaps using PBJelly 4 (v15. 8.24) and the reads obtained from 18 PacBio cells. Since PacBio sequencing features frequent errors, we compiled the coordinates of these regions and added them to the header of the assembly. Point variants belonging to these regions are not reported unless validated by other means. This procedure generated 10,623 scaffolds with an N50 of 1.27 Mb ( Table S1). The final assembly (CheloAbing 1.0) is 2.3 Gb long, a size consistent with other turtle assemblies such as Chrysemys picta bellii (2.59 Gb) 5 , and Chelonia mydas and Pelodiscus sinensis (about 2.2 Gb) 6 . Over this final assembly, we soft-masked repeated regions with RepeatMasker 7 using a database of chordate repeated elements (provided by the software) as reference. These repetitive elements represent 28.49% of the genome, including retroelements (17.86%) ( Table  S2). The type of repeat, occurrences (n), length (bp) and total percentage of the genome covered by them are indicated.
The same values for C. p. bellii's genome are shown for comparative purposes.
In parallel, we used the Benchmarking Universal Single Copy Orthologs (BUSCO v3.0.0) 8 to estimate the overall completeness of the genome. This tool tests the status of a set of 2,586 well-conserved vertebrate genes (from the comprehensive catalog of complete, fragmented or missing orthologs 9 ). The results showed that a 92.4% of the genes were complete (0.6% of them with duplications), 5.7% were fragmented, and 1.9% were missing. This result is quite similar to those of other genomes we tested for comparison, such as humans or the Agassiz's desert tortoise, Gopherus agassizii (G. agassizii) ( Table S3). The average characteristics of the predicted genes are also similar to those predicted in other genomes (Table S16). For the DNA sample from the Aldabra tortoise we used Illumina technology and a 180 bp paired-end library to obtain whole-genome data. We then aligned the resulting reads to CheloAbing 1.0 with BWA 10 (v0.7.5a). The resulting alignment mapped 97.8% of the 511,873,470 reads. We also calculated the callable proportion of the genome with the A. gigantea reads using GATKs CallableLoci walker. We found 1,885,758,801 callable bases, or 81.9% of the assembly size. We similarly aligned the genomic reads from C. abingdonii to CheloAbing 1.0.
With both alignments, we found 427,949 SNPs in C. abingdonii and 18,852,620 SNPs in A. gigantea. An analysis of coalescence using the Pairwise Sequentially Markovian Coalescent (PSMC) model 11 with these data showed that the effective population of C. abingdonii has been steadily declining for the past million years, with a slight uptick about 90,000 years ago (Fig. 1b, main text). Using a generation time of 25 years and a genome-wide mutation rate of 2.5 x 10 -8 , we estimated that the effective population size was no more than a few thousand individuals for the past tens of thousands of years. By contrast, this analysis suggests that the effective population of A. gigantea has experienced greater variations in the last million years, though some have pointed out that similar patterns may represent historical population structure, leading to an overestimation of effective population size by the model 12 . It must be noted that the prediction for C. abingdonii loses statistical power at the million-year time frame, probably due to complete coalescence. This suggests that overall diversity in C. abingdonii must have been very low throughout many generations. Consistent with this, the genome-wide median heterozygosity per bp is 0.00013 for C. abingdonii one of the lowest values reported to date  and 0.00078 for A. gigantea.
Finally, we aligned RNA-Seq data from C. abingdonii's blood and A. gigantea's granuloma to the assembled genome using TopHat 13 (v2.0.14). Additionally, we used Illumina technology and a 180 bp paired-end library to obtain whole-genome data from A. gigantea. We then aligned the resulting reads to our C. abingdonii's assembly with BWA 10 (v0.7.5a). Similarly, raw reads from C. abingdonii were aligned to CheloAbing 1.0 for manual curation purposes.

Automatic annotation of CheloAbing 1.0
We performed de novo annotation on the genome assembly of C. abingdonii using MAKER2, a multi-threaded, parallelized computational tool designed to produce accurate annotations for novel genomes based on a machine-learning approach 14 . We provided the algorithm with both the CheloAbing 1.0 assembly and the RNA-Seq data, as well as reference sequences from human and P. sinensis. It completed two runs in a Microsoft Azure virtual machine. As a result, we obtained 27,208 predicted genes, to which a putative function was assigned using MAKER2.
In a preliminary analysis of the general expansion of gene families, we performed several pairwise alignments of the predicted proteins from the automatic annotation to the Uniprot 15 databases of human and P. sinensis proteins using BLAST 16 (v2.6.011). Using in-house perl scripts (https://github.com/vqf/LG), we grouped these sequences into one-to-one, one-to-many and many-to-many orthologous relationships. Only alignments with a coverage of at least 80% of the longer protein and with more than 60% of identity were considered for the analysis. Finally, we searched for family expansions specifically present in C. abingdonii, by examining the aforementioned groups of orthologs. The results were manually curated. This way, we constructed extended orthology sets that may contain more than one sequence per species. These orthology sets capture most of the known protein families, although some of these families appear split according to sequence similarity. Almost all of these splits occur both in the human-to-P. sinensis and in the human-to-C. abingdonii comparisons. Since assembly errors may mimic gene losses, we decided to only test these sets for C. abingdonii-specific expansions. The interrogation of these sets suggests the existence of several high copy-number gene families in tortoises and turtles but absent in humans. Most of these families show homology to viral retrotranscriptases, consistent with the hypothesized expansion of CR1 retrotransposons in ancestral amniotes followed by dominance of L1 LINEs in therians 17 . After manual curation of the results, we found 12 examples of gene families displaying extra copies in C. abingdonii compared to turtles ( Table S4). Each of those genes was also identified from the aligned reads in the Aldabra tortoise genome, and 10 of these amplifications were also identified in the recently sequenced genome of Agassiz's desert giant tortoise (G. agassizii) 18 . Interestingly, a functional annotation clustering analysis using DAVID 19 of these 12 genes found a significant enrichment in the "extracellular exosome" GO category (8 genes, p=0.044 after Benjamini correction). Exosomes play an important role in cell-to-cell communication and have a strong impact on multiple biological processes and signalling pathways related to immunity, cancer and ageing [20][21][22] . Therefore, these data suggest that gene expansion of exosome-related genes may have provided a mechanism related to gigantism and cancer protection in giant tortoises. Table S4. Tortoise-specific gene expansions. Family Next, we searched for signatures of selection affecting the predicted set of genes. For this purpose, we used BLAST and in-house perl scripts to pairwise align all available protein sequences from human (Homo sapiens), mouse (Mus musculus), dog (Canis lupus familiaris), gecko (Gekko japonicus), green anole lizard (Anolis carolinensis), python snake (Python bivittatus), common garter snake (Thamnophis sirtalis), Habu viper (Trimeresurus mucrosquamatus), budgerigar (Melopsittacus undulatus), zebra finch (Taeniopygia guttata), flycatcher (Ficedula albicollis), duck (Anas platyrhynchos), turkey (Meleagris gallopavo), chicken (Gallus gallus), Chinese softshell turtle (P. sinensis), green sea turtle (C. mydas) and painted turtle (C. p. bellii). We focused only on those genes with a one-to-one ortholog status in every species, and missing in no more than 3 species (excluding C. abingdonii), as described in previous studies 23 . This way, we obtained 1,592 groups of sequences. We then aligned each group separately with PRANK v.150803 using the codon model and analysed the alignments with codeml from the PAML package 24 .
To search for genes with signatures of positive selection affecting C. abingdonii genes specifically, we executed two different branch models, M0, with a single ω0 value (where ω represents the ratio of non-synonymous to synonymous substitutions) for all the branches (nested), and M2a, with a foreground ω2 value exclusive for C. abingdonii and a background ω1 value for all the other branches. Genes with a high ω2 value (>1) and a low ω1 value (ω1<0.2 and ω1≈ω0) in C. abingdonii, but not in P. sinensis (Table S5 and Table S17) were then considered to be under positive selection. As a control, M2a was repeated using P. sinensis as the foreground branch, and no overlapping genes were found in the result. After this, we used the M8 branch model to assess the individual importance of every site in these positively selected genes, obtaining a list of sites possibly under selection. The equivalent sites were examined in the Aldabra tortoise through alignments, to evaluate which of these important residues were altered (Table S18). The ω value for C. abingdonii (ω2) and the background (ω1), both of them under selection model 2, and the ω value under selection model 1 for all the aligned species (ω0) are indicated. Differences between C. abingdonii-specific and the general values (ω2-ω1) are also shown. The complete Table resulting from this analysis is presented as Table S17.
The results of this analysis point to multiple biochemical pathways which may have been affected by selection, although automatic functional annotation of this set did not find significant overrepresentation of any GO term (data not shown). Two of the top three genes with the strongest evidence for positive selection were tubulins (TUBE1 and TBG1), suggesting that changes in cytoskeletal dynamics have been important in the evolution of giant tortoises. Consistent with the important role of this pathway in the biology of the cell, the alterations found in the selected residues (p.E169D and p.I186V in TUBE1) are conservative and probably do not affect the main role of this protein in microtubule formation. However, the effects of regulating tubulin assembly on cell cycle progression 25 suggest that these alterations might be related to the increase in the number of cellular divisions associated to gigantism. In addition, two genes with evidence for positive selection, BAG2 (NEF) and UBE2J1 (Ubc6/7) are involved in endoplasmic-reticulum-associated protein degradation (ERAD) (Fig. S1). Notably, one of the genes duplicated in giant tortoises (VCP, also known as p97) also plays a central role in this pathway. ERAD is important in the unfolded-protein response and consequently in the correct proteostasis of the cell, whose deregulation constitutes a hallmark of ageing 26,27 . In this regard, the list of positively selected genes also features TDO2, whose product is involved in the regulation of tryptophan-mediated proteostasis. Interestingly, the inhibition of TDO2 has been shown to protect against age-related neurodegeneration 28 . Therefore, the selective process acting on these proteins might reflect the constraints imposed on proteostasis due to increasing lifespan.
In addition, two positively selected genes, AHSG and FGF19, are listed in a panel of four proteins whose expression levels correlate with successful ageing in humans 29 . Notably, one of the selected alterations in FGF19 (p.S116A) is expected to affect the receptor-interaction site of this protein. These factors intervene in the regulation of glucose and lipid metabolism 30,31 , another hallmark of ageing, suggesting that the adaptation to the challenges that longevity poses on this system may have been important in the evolution of giant tortoises. Finally, three genes with evidence for positive selection in these organisms (MVK, IRAK1BP1 and IL1R2) play important roles in the modulation of the immune system, which in turn participates in the phenotypes of altered intercellular communication associated with ageing 26 . In this regard, it is important to notice that, in addition to its role in proteostasis, ERAD is a target of viral infection, as multiple viruses depend on this process for successful delivery to the cytoplasm 32 . Taken together, this hypothesis-free analysis highlights proteostasis, metabolism regulation and immune response as key processes during the evolution of giant tortoises, and provide starting points for future work on this subject. Figure S1. Positively selected and amplified genes in the genome of C. abingdonii implicated in Endoplasmic Reticulum Associated Degradation (ERAD). Highlighted genes are shown on top of the protein processing pathway in endoplasmic reticulum (KEGG ko04141) 33-35 . We also performed an analysis of microRNA (miRNA) sequences in C. abingdonii's genome with BLASTN using as databases the high-confidence miRNA set available at miRBase v21 (http://www.mirbase.org/) and the list of miRNAs from P. sinensis. Due to limitations in the BLASTN algorithm, we used the longer consensus precursor sequence instead of the seed sequence. The results were manually analysed to assess their reliability. We identified a total of 316 miRNAs, 123 of those being present in both lists (Table S19).

Manual annotation of CheloAbing 1.0
In addition to the automatic annotation process, we manually annotated about 3,000 genes of general interest in the fields of ageing, cancer, immunology and DNA repair, and involved in physiological functions such as reproduction, development, shell formation and sensorial perception (Fig. S2). These genes were specifically selected a priori based on our experience in these fields and after a revision of the available publications in each subject. BAG2 VCP UBE2J1 UBE2J1 CRYAA We used a custom human protein database from Ensembl (http://www.ensembl.org/ index.html) as a reference to predict each ortholog in the genome of C. abingdonii by using the BATI algorithm (Blast, Annotate, Tune, Iterate) 36 . Accordingly, protein residues in all alignments have been numbered using the human transcript as reference. Those cases with a low identity level between human and C. abingdonii were treated separately using sequences from closer relatives such as the chicken (G. gallus), the green anole lizard (A. carolinensis) and other turtles like P. sinensis, the western painted turtle (C. p. bellii) or the green turtle (C. mydas) as references. Once the selected genes were properly annotated, we performed several comparisons between the gene sequences of C. abingdonii, human (H. sapiens), mouse (M. musculus), Sauria (A. carolinensis), chicken (G. gallus), and the other turtles (P. sinensis, C. p. bellii and C. mydas). We also listed changes considered interesting in aligned genomic reads from A. gigantea as a representative of another giant tortoise species that is distantly related to Galapagos giant tortoises (Table S6). We first focused on collecting the putatively most relevant truncations, point mutations (according to functional information in the human ortholog), and copy-number variations. The last column of the table summarizes the results of our screen of other Galapagos tortoise species, the outgroups for both the Galapagos and Aldabra tortoises, and the other turtle species for which we have data publicly available (G. agassizii, P. sinensis and C. p. bellii) species, and the continental relative for giant tortoise species (continental outgroups). To validate the most relevant findings, we used DNA samples of different tortoises from the Galapagos Islands, including the islands of Santa Cruz (Chelonoidis porteri), Española (Chelonoidis hoodensis), Isabela (Chelonoidis becki and Chelonoidis vicina), Pinzón (Chelonoidis duncanensis) and Pinta (Chelonoidis abingdonii). The DNA sample from the island of Pinta corresponded to Lonesome George, as it was the only available individual from the C. abingdonii species. We also used DNA samples of tortoises from the Aldabra atoll (Aldabrachelys gigantea) as well as from two continental outgroups: Chelonoidis chilensis, Chelonoidis carbonaria and Chelonoidis denticulata for Galapagos Islands tortoises, and Stigmochelys pardalis for Aldabra tortoise (Fig. S3).
To validate the existence of specific SNPs we carried out PCR experiments with locus-specific primer pairs. When studying specific variants, primers were designed targeting the region where they are located. To study copy-number variations, we performed PCR reactions with primer pairs which allow the amplification of each copy separately. When this method was not feasible, we designed specific pairs to amplify a target region with different nucleotide variants in each copy, and then examined the resulting electropherogram for evidence of both copies. A list of the primers used and other additional information about every PCR reaction is summarized in Table S20. In all cases, PCR reactions were carried out with 10 ng of DNA under the same specific conditions, where melting temperature (Tm) depends on the pairs used. For each PCR reaction these conditions were: 95 o C for 3 min; 35 cycles (95 o C for 60 s; Tm for 30 s; 72 o C for 30 s); 72 o C for 7 min; 20 o C for 5 min. Then, we tested the success of the PCR reactions by electrophoresis in a 2% agarose gel with the resulting products. Finally, the products were sequenced using the Sanger method with 1 μl of one of the PCR primers at 10 μM and 50 ng of amplified DNA in a total volume of 10 μl. When feasible, the results of the manual annotation were also confirmed via RNA-Seq data from C. abingdonii whole blood and from an A. gigantea granuloma (Table S7).
When orthology relationships were dubious, phylogenetic analyses were performed. In the cases of myosins and RAS proteins, the alignment of the sequences was performed separately for each gene with MAFFT v.7 with default parameters and auto-strategy 38 . The Bayesian Information Criteria (BIC) implemented in ProtTest v.3.4.2 was used for the selection of the bestfit model of amino acid replacement 39 .
Bayesian Inference (BI) and Maximum Likelihood (ML) analyses were performed with MrBayes 40 and RAxML 8.1.21 41 in RAxMLGUI v.1.5.2, respectively. In BI, eight concurrent chains (one cold and seven heated) for 5x10 7 generations were run and recorded samples every 5,000 generations. The first 25% of the samples were discarded as burn-in, and the remaining samples were used to summarize the posterior probability distributions for parameters (≥95% indicate significant support) 42 . Results were analysed in Tracer v.1.6 to assess convergence and effective sample sizes (ESS) for all parameters. The best ML tree was selected from 500 iterations and the confidence of the branches of the best ML tree was assessed based on 1,000 thorough bootstrap replicates using the PROTGAMMA model. In both analyses, Drosophila amino acid sequences were used as outgroup. Maximum likelihood bootstrap values (bs) and Bayesian posterior probability (pp) values were joined and mapped to the Bayesian tree (i.e., the 50% majority-rule consensus tree calculated from the posterior distribution of trees).

Species names and groups
To clarify the usage of the different taxonomic terms, "giant tortoises" refers to C. abingdonii, other tortoises from Galapagos Islands (described below in section 1.5.1.) and A. gigantea. The term "tortoises" also includes G. agassizii. Sauropsida is a reference to turtles, birds and lizards, while "reptiles" excludes the birds. However, Archelosauria refers to birds and turtles only. Additionally, when using the terms Testudines, Chelonii or "turtles" we are referring to all turtles, but when referring to Testudinoidea, we are excluding the soft-shell turtle. Divergence times between taxa where obtained using TimeTree 43 . This analysis places the split between the two clades leading to the Galapagos and the Aldabra giant tortoises at about 40 million years ago, and the split between the ancestors of these tortoises and the ancestors of humans at about 312 million years ago (Fig. S3). We manually annotated about 3,000 genes in CheloAbing 1.0, including the complete degradome 45 (set of protease genes) and other genes related to the above mentioned topics of special interest. We observed hallmarks suggesting selective pressure over some of these processes, such as longevity and stress response (including its interplays with pathogens). The immune system, sometimes referred to as an 'evolutionary driver' 46 , presents several variants that, taken together, suggest a stronger innate immune system in turtles and a tight modulation of the inflammatory process, which could also play a role in the control of cancer development and longevity. Another major source of variations in C. abingdonii's genome was found in genes associated with development. Thus, we observed a substantial copy-number variation in the βkeratin family, which is likely to play an important role in shell development. Additionally, we present evidence for variants in the transcription factor TP53 and in the family of globins that may be related to enhanced hypoxia tolerance in turtles. We also describe the putative role of the IGF system in both size and longevity of giant tortoises. Finally, other variants found in different giant tortoises suggest biochemical changes that might merit further study (Tables S6  and S7).
Green boxes indicate the duplication/expansion of the mentioned gene, while blue boxes indicate its putative absence in the genomic sequences of C. abingdonii. Red boxes indicate premature stop codons or frameshifts in the genomic sequence. In addition, the presence of missense variants localized in functional domains of the protein was indicated with orange boxes. *, annotated using automatic approaches. **, genes belonging to a specific gene family, but whose orthology relationship is unclear. The approximate number of genes found in these cases is indicated between parentheses. ***, genes with very low identity to their human orthologs. ****, shorter genes, possibly due to a different initial methionine.

Genetic basis of development
The determination of morphological identity and anteroposterior regionalization of structures in vertebrates has been classically considered of main interest in evolutionary and developmental biology 47 . As central players of animal development, homeobox-containing (Hox) genes are involved in controlling and determining the final shape and body-plan of an organism 48 . Thus, loss of Hox gene functions or differences in their regulation can lead to striking morphological changes 49,50 . In this context, we analysed the Hox gene set in C. abingdonii and identified 39 different family members in this tortoise, which are highly conserved throughout evolution. This set of 39 Hox genes was also conserved in A. gigantea.
As in other Testudines, placental mammals, crocodiles and birds, no HOXC3-related sequences could be identified in C. abingdonii, A. gigantea or G. agassizii (Fig. S4). This gene is only present in fishes, amphibians, snakes and lizards 51 . The absence of HOXC3 in this tortoise supports its early loss in the radiation of Archelosauria 6,52 . Furthermore, we assessed the presence of paraHox genes in C. abingdonii, involved in the anteroposterior development of gut and nervous system of animals 53 . The annotation of this group of genes revealed the presence of a duplication affecting CDX4 (Table S21) which is characteristic of Sauria.
Other genes that perform basic functions in the embryo developmental process and the maintenance of homeostasis, such as those coding for bone morphogenetic proteins (BMPs) and growth differentiation factors (GDFs) 54 , are well conserved in CheloAbing 1.0. Thus, we identified orthologues for BMP1, BMP2, BMP3, BMP4, BMP5, BMP6, BMP7, BMP10 and BMP15, but contrary to primates which have duplicated BMP8 (BMP8A and BMP8B), there is only a single BMP8-like gene in C. abingdonii. We also identified the orthologs of GDF2, GDF5, GDF6, GDF7, GDF8, GDF9, GDF10, GDF11 and GDF15 in C. abingdonii. By contrast, we could not distinguish GDF1 and GDF3 orthologs, as only one common ancestor-related gene, GDF1L, is present in Archelosauria. The high level of identity between these genes in the selected species suggests that their function is indispensable in Testudines. Figure S4. Evolutionary scenario of HOXC3 loss within different vertebrates. Green lines show the putative independent losses of the HOXC3 gene in mammalian and Archelosauria lineages, in contrast to fish, amphibians and the rest of reptiles. The phylogenetic tree was generated using Common Tree Taxonomy tool from NCBI 1 .
Finally, we studied the WNT family, which includes a large number of cysteine-rich glycoproteins that activate signal-transduction cascades. These cascades are involved in the control of a wide range of developmental events during embryogenesis, and in adult tissue homeostasis 55 . The analysis of CheloAbing 1.0 revealed duplications in two members of this family, WNT4 and WNT11 (Table S21). Although both genes are essential during the development of the embryo, the case of WNT11 is particularly interesting, since this gene contributes to skeletal development through osteoblast maturation 56 . This duplication was found in turtles and G. gallus, but not in A. carolinensis. On the other hand, the two copies of WNT4 were only identified in turtles. Given the role of these genes in the development of bone structures, these duplications might be related to the growth of the shell.

Teeth
Testudines lost the ability to grow teeth approximately 150-200 million years ago, becoming the oldest extant edentulous lineage of tetrapods 57 . Previous studies in birds and edentulous Mysticeti whales proved that tooth loss is closely associated with the pseudogenization and subsequent loss of the tooth-specific genes enamelin (ENAM), amelogenin (AMEL), ameloblastin (AMBN), dentin sialophosphoprotein (DSPP), and the proteases kallikrein-4 (KLK4) and enamelysin (MMP20) 23,58 . Recently, these findings were also confirmed in the western painted turtle 5 . As expected, we did not find any of these tooth-specific genes in C. abingdonii or in A. gigantea, indicating their possible absence or pseudogenization in Testudines. These results confirm a pattern of multiple pseudogenizations associated with tooth loss, followed consistently and independently in multiple vertebrates.

Neurological development
The development of the nervous system is a complex and intricate process in which, alongside the paraHox family, the participation of multiple genes is required. Our study revealed a series of variants in different genes involved in this mechanism and putatively associated with neurological disorders.
In this regard, the serine/threonine protein kinase ATM, involved in cell cycle arrest, apoptosis, brain development and synaptic function, showed two variants in conserved residues (p.R250Q and p.Y316H). Both variants are present in all the Galapagos tortoises, while the first variant (p.R250Q) is also present in their continental outgroups and the Aldabra giant tortoise (Fig. S5). These variants (confirmed by Sanger sequencing) have been linked to ataxia telangiectasia in humans (ClinVar 59 ), a syndrome characterized by impairment in movement and coordination, weakened immune system and difficulty to repair DNA breaks 60 . Presumably, these variants found in tortoises are associated with other balancing changes to avoid any deleterious consequences. This mechanism of the Dobzhansky class 61 would allow the genome of tortoises to explore different solutions for the regulation of ATM that are not accessible to other organisms. Moreover, this gene also presents a variant (p.V2873I) at one of the nine residues involved in the catalytic loop formation 62 . This variant was also confirmed through Sanger sequencing and is present in all the different species of Galapagos giant tortoises tested and also in fishes, but it is absent in A. gigantea and in both continental outgroups. Alongside these changes in ATM, the neurology-related protease gene XPNPEP1 presents a premature stop codon (confirmed by RNA-Seq) at the beginning of the protein (p.D16*). This variant is present in C. abingdonii, other Galapagos tortoises, A. gigantea and their continental outgroups, as revealed through Sanger validation, and likely results in the loss of the function of this protease. The absence of XPNPEP1 is associated with several dysfunctions, including microcephaly and neurodevelopmental retardation 63 . Similarly, CNDP1 presents a truncating frameshift (at residue p.E274) in all Galapagos giant tortoises, the Aldabra tortoise and their continental outgroups, as validated by Sanger sequencing. In mammals, the loss of this protein causes carnosinemia, a recessive deficiency that leads to mental retardation, developmental delay and various neuropathies 64,65 .
Motopsin, PRSS12, encodes a serine protease linked to nonsyndromic mental retardation 66 . This gene appeared to be truncated only in C. abingdonii (validated by Sanger sequencing) due to a premature stop codon (p.R691*). Similarly, the aspartyl protease PSEN1 was found to present a point variant (p.R352E) common to all Sauropsida, validated through Sanger sequencing in giant tortoises and their relative outgroups. In humans, a mutation at the same residue (p.R352C) has been linked to the early development of Alzheimer's disease 67,68 .
Altogether, these alterations suggest that the neurological development of giant tortoises has undergone a number of molecular changes involving losses of gene functions. On the other hand, the evidence for positive selection affecting genes involved in neural development (FGF19, ITM2B and NTF3) suggests that this system has evolved through complex mechanisms, which may include and balance apparently deleterious variants. In this context, changes specifically associated to giant tortoises, such as those in XPNPEP1, could be linked to the evolutionary history of these species.

Reproduction
In contrast with placental mammals, turtles and reptiles are oviparous 69 . To explore the genomic hallmarks associated with this trait, we annotated genes related to egg and eggshell formation. Among the egg-related genes, we identified in C. abingdonii and A. gigantea the presence of vitellogenins (VTG1, 2 and 3), apovitellenin-1 (APOV1), ovomucin-α (also known as MUC5B), 4 paralogues of VMO1 (vitelline membrane outer layer protein 1), and 4 paralogues of AVD (avidin). The formation of the eggshell is crucial for the protection of the egg contents and the embryo after the lay, as well as for the prevention of contamination by microorganisms. To exert its function correctly, the eggshell has to undergo a complex process of calcification, which occurs in the uterus and is one of the most rapid biological mineralization processes known 70 .
To study this process both in C. abingdonii and A. gigantea, we analysed several genes that play an important role in both Ca 2+ transport and Ca 2+ signalling pathways, which are essential mechanisms in eggshell formation. The presence of CACNA1D, CACNA1E, CACNA1H, PRKCB, NCX1 (or SLC8A1), GNA11 and OXTR reinforces the importance of these pathways in eggshell calcification. Likewise, it is well known that calcium transport in the uterus is associated with the counter-transport of Na + and Cl -, and with the secretion of shell matrix proteins 71 . In this context, we found different genes involved in these processes, like SCNN1A, SCNN1B, SCNN1G (transport of Na + ) and CLCN2 (transport of Cl -). Finally, we also identified in CheloAbing 1.0 and confirmed in the genome of A. gigantea some genes, such as SPP1, LTF, ITGA8, LGALS3 and COL11A1, which encode proteins of the eggshell matrix, the non-mineral component of the eggshell.
Between his discovery in 1971 and his death in 2012 there were many attempts to mate the last individual of C. abingdonii with females from other species of Galapagos giant tortoises 72 . Unfortunately, non-fertilized eggs were found in the pen where he lived together with two closely related females. To explore his inability to reproduce, we analysed key genes involved in this process such as CCNB1IP1 or FSHR. However, we did not find any specific change to explain this observation. That being said, alterations in genes related to the thyroid function that might be also involved in reproductive difficulties were found (see section 4.3).

Shell formation
One of the most conspicuous characteristics of turtles is the presence of a shell that covers their entire body. In fact, the turtle shell constitutes an evolutionary novelty in which the developmental pattern of the ribs is radically modified 73 . Specifically, turtle ribs grow laterally to form the carapace, the dorsal roof of the shell, unlike the ribs of other amniotes that grow ventrally 74 . A possible candidate for these turtle-specific changes in development is the carapacial ridge (CR), a specific structure of the turtle embryo 75 . Some CR-specific genes such as LEF1, APCDD1, CRABP1 and SP5 which are effectors or transcriptional targets of the canonical WNT pathway -were identified in both the C. abingdonii and A. gigantea genomes. Then, by using manual annotation, we detected most of the remaining genes involved in this pathway. All the members of the WNT family are conserved in giant tortoises including the aforementioned duplication of WNT4 and WNT11. The DVL and FZD families are also apparently complete and functional in C. abingdonii and A. gigantea. LRP5, LRP6, GSK3A, GSK3B and the orthologues of LEF1, TCF7, TCF7L1 and TCFL2 are also present in these tortoises. Likewise, RUNX1, RUNX2, GREM1, GREM2, MSX1, TWIST1, SHH, HGF, MYF5, FGF8, FGF10 and genes of the BMP family (discussed in section 2.1), likely involved in shell formation, are also present in C. abingdonii and A. gigantea. In summary, we found a wide range of presumably functional genes that likely participate in the formation of the shell. These genes are well conserved between vertebrates. Currently, the place and time of expression is thought to be a main determinant to define the role of these genes in shell formation 75 .
During evolution, the integument of amniotes has transitioned from a basic cornified epidermis, for protection against the environment and retention of water, into an elaborate covering studded with epidermal structures 76 . These structures, such as claws, beaks, scales or feathers of reptiles and birds, are used additionally for sexual display, camouflage, locomotion and thermoregulation 77 . These complex evolutionary inventions result from the products of two gene families: alpha (α) and beta (β) keratins. α-keratins are divided in two groups, type I and type II, and form heterodimers that constitute the structural basis of the cornified epidermis and epidermal appendages. β-keratins are exclusive of Sauropsida and contribute to the formation of the hard keratinous claws and scales of reptiles, as well as the beaks and feathers in birds 78 . Our results based on the manual annotation of type I and II α-keratins in CheloAbing 1.0 suggest a reduction in the number of these fibrous proteins when compared to other organisms. Specifically, we identified a total of 35 α-keratins (19 type I and 16 type II). By contrast, mammals present more than 50 family members on average and other reptiles display 40 or more. For instance, the green anole lizard has 41 α-keratins, while the American alligator and the saltwater crocodile have 44 and 40, respectively 77 . It must be noted that the annotation of keratins is very sensitive to the quality of the assembly, and these numbers can be over-or underestimated in some cases. If confirmed, we hypothesize that this reduction in the α-keratins repertoire of C. abingdonii could be a consequence of the emergence of β-keratins, essential for the development of the turtle shell, an evolutionary novelty exclusive of this clade.
Several studies suggest that, concomitant with shell formation, the β-keratin gene family underwent repeated duplications in the turtle lineage from genes involved in the formation of the claws and scales in the ancestor of Sauropsida 78,79 . Using manual annotation, we found 30 β-keratins in CheloAbing 1.0, 26 of which seem to be functional. These results suggest that the number of these genes in C. abingdonii is comparable to that of birds 77 , but differ from recent studies that have identified 89 β-keratins in C. p. bellii and 74 in P. sinensis, respectively 78 . However, using our supervised approach, we found 70 apparently functional β-keratins in C. p. bellii and 57 in P. sinensis. Finally, and also in relation to genes implicated in shell formation, it is remarkable that the list of genes under positive selection in C. abingdonii features KDSR, whose mutations cause keratinisation disorders 80 .

Growth and size
Giant tortoises are classified in the upper end of the size scale for extant Chelonii 81 . However, the genetic determinants of body size and growth in these turtles remain unexplored. In this context, we analysed different genes involved in the growth process in order to better understand the mechanism underlying the gigantism of these animals and provide baseline data for future functional studies.
We first examined in CheloAbing 1.0 and in the A. gigantea genomes the GHR gene, coding for the growth-hormone receptor, as well as all the components of the insulin-like growth factor (IGF) system, including IGF1/2, IGF1R, IGF2R, INS, INSR, INSRR, IGFBPs, and PAPP-A. IGFs and insulin play important roles in the regulation of multiple processes, like metabolism and growth, and disruptions in the GH/IGF pathway cause growth disorders in humans 82,83 . Furthermore, studies carried out with domestic dogs (C. lupus familiaris) have suggested that genetic variants in IGF1, IGF1R and GHR genes are related to differences in breed size 84 . Our annotation revealed a high degree of conservation between all species considered, illustrating the basic functions of these proteins. None of the changes in IGF1R (p.R204H) or GHR (p.P177L; p.E191K) that are associated with small size 84 were found in C. abingdonii, A. gigantea or G. agassizii. In addition to the IGF system, other genes have been recently connected to the regulation of animal size. Therefore, we also annotated stanniocalcins (STC1 and STC2), widespread glycoproteins that seem to be involved in growth through the interaction with PAPP-A, a protease essential in IGF signalling 85 . We found a high degree of STC1 and STC2 conservation in C. abingdonii and A. gigantea, suggesting that these proteins are functional in giant tortoises.
The gigantism in tortoises is probably caused by a combination of different genetic factors, as body size is regulated at multiple levels. Our finding that most known factors involved in this trait are perfectly conserved in C. abingdonii and A. gigantea suggests that other evolutionary pathways may have contributed to this gigantism in these tortoises (Fig. S6), which was likely a preadaptation that contributed to the successful colonization of remote oceanic islands 86 .

Sensorial perception
Due to the extraordinary diversity of turtles, the study of sensorial perception in these animals including giant tortoises could reveal some interesting insights about their lifestyle. To study these traits, we analysed genes associated with the senses of sight, taste and smell.

Sight
For most Craniata, photosensitivity is critical for the control of behavioural response. In vertebrates, vision is mediated via cones that are functional under bright-light conditions and rods that detect low levels of light 87 . The photopigments that reside in the retinal photoreceptors are responsible for the detection of light and receive the name of opsins. Five classes of these visual pigments have been identified in Craniata: rod opsin RH1 (RHO), and cone opsins RH2, LWS (OPN1LW), SWS1 (OPN1SW) and SWS2 (OPN1SW2) 88 . We identified all five opsin genes in the genomes of G. agassizii, C. abingdonii and A. gigantea, providing the potential for tetrachromacy in these tortoises (Fig. S7). Figure S7. Divergence tree showing the repertoire of opsin genes along different clades.

Taste
Taste perception is a key ability for the survival of animals and is highly specialized to sense nutritious or harmful compounds in potential food sources. This sense is triggered by the physical interactions between tastant molecules from the external environment and taste receptors 89 . Vertebrates can typically detect five taste qualities: sweet, umami, bitter, sour and salty. The candidate receptors for sour and salty tastes are transient receptor potential ion channel PKD2L1 and the sodium channel ENaC (constituted by subunits SCNN1A, SCNN1B and SCNN1G), respectively. Umami and sweet tastes are sensed by the TAS1R family of G proteincoupled receptors (GPCRs), with the TAS1R1-TAS1R3 heterodimer functioning as the umami receptor and the TAS1R2-TAS1R3 heterodimer being the sweet receptor. Bitter tastants are detected by the TAS2R family of GPCRs 90 . We identified all the genes encoding the receptors responsible for sour and salty tastes (PKD2L1, SCNN1A, SCNN1B and SCNN1G) in the genomes of C. abingdonii, A. gigantea and G. agassizii. However, in the TAS1R family we found that only TAS1R2 and TAS1R3 are apparently functional. The other member of this gene family, TAS1R1, is truncated by a premature stop codon (p.T436*, found in C. abingdonii and A. gigantea; and p.W297* in G. agassizii). Finally, regarding bitterness receptors, our results showed that, as in birds, tortoises seemed to have a lower number of members of this gene family (8 in CheloAbing 1.0, 9 annotated in desert tortoises) than mammals (36 members in humans and 42 in mice) 91 . Altogether these results suggest that while sour, salty and sweet tastes could be perfectly perceived by C. abingdonii, A. gigantea and G. agassizii, these species might have lost the ability of tasting umami. In addition, the reduced number of TAS2R receptors suggests a lesser perception of the bitter taste (Fig. S8). Figure S8. Divergence tree showing the status of taste receptors in different clades. As representatives for each clade we chose A. carolinensis (lizards), G. gallus (birds), P. sinensis (turtles) and C. abingdonii, A. gigantea, and G. agassizii (tortoises). The number of TAS2R family members that are apparently functional in these organisms is shown in parentheses 92 .

Smell
Olfaction is also essential for survival in mammals. This important physiological function is controlled by a large multigene family of olfactory receptors (ORs) that recognize different odour molecules from the environment 93 . These receptors are known to be the largest multigene family in vertebrates. However, the number of OR genes differs largely among different species, and include several pseudogenes in addition to the functional copies 94 . In this context, we identified 622 apparently functional genes of this family in CheloAbing 1.0. This number is higher than the ORs annotated in Chelonia mydas (green sea turtle), in which a total of 254 intact receptors have been found 6 .

Immunity
The identification and analysis of specific genes related to immune response could provide new insights about the development and evolution of the immune system in different organisms. The dynamics of host-pathogen interactions promotes a strong selective pressure during evolution, increasing the diversity of the immune system among different species 95 . Even though immunity has been extensively studied in mammals and birds, little is known about its specific components and functionality in non-avian reptiles, including turtles 96 . As the sister group of archosaurs 97 , chelonians fill a critical gap in the evolution of immunity in amniotes 98 . Therefore, and due to the major roles that the immune system plays in multiple biological processes such as tumour protection and ageing, we exhaustively studied a group of 892 immunity-related genes in the genome of C. abingdonii (Table S9), in addition to the MHC gene family, which due to its importance and complexity was analysed separately. Finally, variants selected for their potential functional associations were subsequently investigated in the genome of A. gigantea. Table S9. Status of immune system-related genes analysed in C. abingdonii.

Balance between innate and acquired immunity
Although several immune mediators can have dual functions in innate and adaptive immune responses, it is thought that the innate branch of the immune system in vertebrates evolved earlier than the adaptive route 96 . All multicellular organisms have some form of innate immune response, which acts as an initial step in the defence against pathogens 96 . Among vertebrates, Reptilia are the only ectothermic amniotes, and therefore the study of their immune system could provide new important insights into its evolution under different circumstances 96 .
Previous genomic studies in Testudines suggested that the adaptive immune system in these animals is less robust than in mammals 5 . In this regard, we identified some features in the genomes of C. abingdonii and A. gigantea that support the enhanced role of innate immune defence in these organisms. Specifically, we found some putatively deleterious changes in genes involved in B-lymphocyte maturation. As an illustrative example, MEP1A  a metalloprotease involved in the inactivation of interleukin-6 (IL6)  shows several premature stop codons (p.Q320*, p.I535*, and p.I538*) due to the presence of two separate frame-shift mutations both detected only in the Galapagos giant tortoises (as assessed by Sanger sequencing validation, Table S7). Consistent with the role of IL6 in the differentiation of B cells 99 , disruptions in MEP1A have been associated with altered homeostasis of monocytes and natural killer (NK) cells in mice 100 .
On the other hand, the immune system seems to have been strengthened at different stages in Galapagos and Aldabra giant tortoises. Thus, we have identified an expansion of the PRF1 gene in C. abingdonii, with 12 copies (Table S21), although three of them are pseudogenized through the generation of premature stop codons. Similarly, A. gigantea genome exhibits an expansion of PRF1 and contains at least 12 functional copies of this gene. The existence of all these PRF1 genes in both species of giant tortoises was validated by Sanger sequencing (Table  S7). PRF1 encodes a perforin protein, which is a component of the cytotoxic T lymphocyte (CTL) and NK cell secretory granules. This protein plays a key role in granule-dependent cell death and protection against virus-infected or neoplastic cells 101 . Other Testudines also present several copies of this gene, whereas other reptiles present two copies and birds only one (Fig. S9). Figure S9. Divergence tree showing PRF1 copy number in different species and the expansion of this gene in C. abingdonii, A. gigantea and other organisms. Icons represent PRF1 copy number in each species. ΨPRF1 represents a copy of PRF1 with premature stop codons. Green lines indicate that manual annotation has been instrumental in characterizing these duplications. The phylogenetic tree was generated using the Common Tree Taxonomy tool from NCBI 1 .
We also found another family expansion involving the granzyme (GZM) serine proteases (Fig.  S10). This well-conserved family of proteolytic enzymes is expressed in 3 clusters, the chymase, the met-ase and the GZM A/K loci, the latter being conserved among Craniata 102 . The chymase locus, encoding GZMH, GZMB, GZMG, CMA1 and CTSG, is greatly expanded, with 4 copies of GZMH, 5 of GZMB, 7 of CTSG and 2 of GZMH, but only one member of CMA1 (Table S21).
These duplications detected in the genome of C. abingdonii are also present in all the other Galapagos giant tortoises tested and in A. gigantea, as assessed by Sanger sequencing of these genomic regions (Table S7). We detected a similar amplification in the genome of G. agassizii. Although some of these copies are pseudogenes, this copy-number variation evidences the importance of effector immunological pathways in tortoises, as previously reported in other organisms such as M. musculus, where a similar expansion was linked to an enhanced immune response 103 . Similar to PRF1, these serine proteases are key components of the CTL and NK cell secretory granules, playing important roles in defence against both pathogens and cancer 102 . Figure S10. Granzyme clusters in different species and their expansion in C. abingdonii and other organisms. Blue boxes display the whole gene (both exons and introns), while the vertical black lines (when feasible) mark the exons of these genes. The cluster to which each group of contigs belongs, based on the flanking genes or the overall presence of granzymes, is indicated by a letter preceding the box of the cluster. The expansions are showed by the extension number following the underscore. Pseudogenized copies are showed by adding 'ps' or an asterisk at the end of the name. Similar to perforins and granzymes, the APOBEC1 gene shows a consistent increase in copy number in different turtles, with three copies in P. sinensis, five in G. agassizii, six in C. p. bellii and seven in both C. abingdonii and A. gigantea (Fig. S11; Table S21). The antiviral function of this family in humans is reduced to the APOBEC3 genes, with APOBEC1 expression restricted to small intestine where it acts as an RNA editor 104 . In other organisms, such as Rattus norvegicus or M. musculus, APOBEC1 expression is much higher and its antiviral activity has been demonstrated 105,106 . In this context, the identified amplification could participate in the enhancement of innate immunity. Another remarkable gene expansion involves the cathelicidin family of antimicrobial peptides (CAMPs). These proteins play important roles in host immune defence against microbial infections, and as anti-inflammatory and immunomodulatory factors 107 . We identified twelve putative copies in C. abingdonii and A. gigantea, and at least one pseudogene with premature stop codons (Fig. S12). Only two copies show aligned RNA-Seq data ( Table S7), suggesting that the expression of the remaining copies could be conditioned to stress situations 108 . Interestingly, the production of these peptides appears to be associated with healthy ageing, with comparable levels between healthy elderly and young individuals and lower levels in elder adults with recurrent infections 109 . The expansion of this family may therefore be related to prevention of infections and be an important factor in the longevity of giant tortoises. Figure S12. Divergence tree showing CAMP copy number in different species and the expansion of this gene in C. abingdonii, A. gigantea and other organisms. Icons represent CAMP copies in each species. ΨCAMP represents a copy of CAMP with premature stop codons. Species emphasized in green indicate that manual annotation has been instrumental in characterizing these duplications. Crosses represent the absence of these peptides in two different species of fishes and the arrows show the type of feeding of each turtle analysed. The phylogenetic tree was generated using the Common Tree Taxonomy tool from NCBI 1 .
In addition, Testudines present an expansion, validated by Sanger sequencing, of the CHIA gene (Table S7; Table S21), which encodes a chitinase involved in the degradation of fungal cell walls, among other functions 110 . We found 3 copies of this gene in C. abingdonii, although one of them seems to be pseudogenized. Furthermore, we found in C. abingdonii and A. gigantea a unique set of toll-like receptors (TLRs), which are key elements of the innate immune system. Compared with C. p. bellii, these giant tortoises and G. agassizii seem to have lost several TLRs (TLR6, TLR9, TLR15 and TLR21). On the other hand, we found expansions in giant tortoises, validated by Sanger sequencing, of TLR2, TLR5, TLR8 and TLR13 ( Fig. S13; Table S21). Out of these, only TLR2 and TLR13 are also expanded in the genome of G. agassizii. Each one of these receptors recognizes different specific elements of bacteria and viruses: TLR2 binds to lipoproteins and other bacterial cell wall components, TLR5 recognises bacterial flagellin, and TLR8 and TLR13 detect different types of bacterial and viral RNAs [111][112][113] . These changes might be linked to adaptations to specific pathogenic challenges in the habitat of giant tortoises.
Notably, we documented the absence of most of the interferon (IFN) and interferon-related genes in Sauropsida, including C. abingdonii (Table S9). However, some genes described as conserved in reptiles, such as IFTM5 and IFTM10, are present 114,115 . Finally, we found that defensins, small proteins that participate in basic defence against bacterial, fungal or viral infections, were expectedly missing in the C. abingdonii genome assembly. Their avian counterparts, gallinacins 116 , were annotated using the available sequences from C. mydas, C. p. bellii and P. sinensis. Figure S13. Comparative analysis of TLR gene set in C. abingdonii, A. gigantea and different vertebrates. Boxes marked with an asterisk refer to pseudogenes. Green boxes indicate the presence of at least one functional copy, while red boxes refer to absences. If there is more than one functional copy, the number is indicated. Data from other species was obtained from multiple studies 5,117-119 .

Immune regulators of inflammation
Parasitic microorganisms depend on another organism for survival. Thus, they have developed strategies to invade, multiply and propagate within their host while avoiding its defences. Likewise, hosts have acquired different mechanisms to detect and respond rapidly to diverse groups of pathogens, including the activation of an innate immune response and apoptosis. However, this response must be tightly regulated, as an excessive reaction could lead to sepsis, a deleterious systemic hyperinflammation.
In this context, we analysed several members of NLRP family, which play a key role in the inflammasome, an innate immune system protein complex that recognizes infectious pathogens and regulates the activation of inflammatory caspases 120 . We found 26 putative NLRP genes in C. abingdonii, although at least one of them seems to be pseudogenized (Table S21). While there is a similar number in G. agassizii and in C. p. bellii (28 genes), there is an amplification compared with P. sinensis (18 genes) and H. sapiens (14 genes). Although its importance in mammals has been widely described, its role in other vertebrates remains uncharacterized 120,121 .
We also examined CASP12, a modulator of the activity of inflammatory caspases which is highly conserved throughout evolution and is associated with higher sensitivity to infection and sepsis in humans 122 . Notably, most human populations and Balaena mysticetus show different loss-of-function variants 23 . We found that CASP12 was perfectly functional in all Testudines, without changes in essential residues.
Overall, these results suggest that the innate immune system of C. abingdonii, A. gigantea, and other turtles has been modulated to play a predominant role in the immune response, with genomic expansions affecting components involved mainly in this process.

Major Histocompatibility Complex
The major histocompatibility complex (MHC) is an essential component in both innate and adaptive immune systems which can be functionally divided into class I and II. MHC class I is present in almost all cells, except enucleated cells of the blood myeloid lineage, and presents an extreme variability in populations, conferring individual specificity while providing a mechanism for the immune system to recognize host cells. MHC class II genes are only expressed in antigenpresenting cells, helping them in the recognition of foreign antigens and recruiting immune response components. These protein complexes are encoded by numerous genes that are classified, according to their classical or non-classical function, in 6 MHC-I gene families (A, B, C, E, F and G) and at least 12 MHC-II genes in the human genome 123 . MHC genes are located in a well-conserved cluster, with MHC class-II-related genes grouped closer to the centromere and MHC class-I-related genes closer to the telomere. There is an additional MHC class, known as MHC-III, that is defined by its genomic location between MHC-I and MHC-II 124 . This complex comprises several genes with heterogeneous functions, some of them implicated in the complement cascade or the inflammatory response.

MHC class I and II genes
Due to the difficulty to find reliable sequences of MHC genes in reptiles, we decided to first manually annotate them in P. sinensis, C. p. bellii and G. agassizii, and then perform a comparison with the sequences present in C. abingdonii. The available data from Aldabra tortoise did not meet the requirements to include this organism in our comparison. Our annotation showed the presence of 10 MHC class-I-related and 8 MHC class-II-related genes apparently functional in P. sinensis, in contrast to the 8 and 10 respectively found in NCBI databases. In addition, we found 20 MHC class-I-related and 12 MHC class-II-related genes in C. p. bellii and 17 MHC class-I-related and 19 MHC class-II-related genes in G. agassizii, showing an expansion of the family in these turtles. C. abingdonii presents 20 MHC class-I-related and 15 MHC class-II-related genes, exhibiting a similar MHC expansion as C. p. bellii (Fig. S14). We also annotated CD1D sequences, which, while phylogenetically closer to MHC class I, are similar to class II proteins in terms of protein functions and distribution 125 . The extreme variability between closely related species and within individuals in a species makes these results difficult to interpret. Thus, we were not able to precisely identify the human orthologs of MHC genes found in C. abingdonii, a difficulty experienced by MHC analysis in other reptiles. Studies about the MHC complex in other turtles are focused on allele variability in populations or individuals, without describing any defined gene 126 . This makes the study of MHC variants in C. abingdonii difficult. Nevertheless, our results suggest that the observed expansion of the MHC cluster in turtles could have been caused by a duplication event in a common ancestor of C. p. belli, G. agassizii and C. abingdonii, a conclusion supported by the fact that the ratios between class I and II genes have barely changed between these three lineages (Fig. S15). Further investigation of MHC gene patterns and levels of variability could be useful to understand the demographic history of different species of Galapagos giant tortoises and also to better inform conservation management strategies, given the known link between MHC genes with the species abilities to face off environmental changes 127 . This is particularly important as we know that different species have experienced bottlenecks of different size and duration since the colonization of the Galapagos Islands 128 . Figure S15. Hypothesis for MHC cluster duplication in a common ancestor of C. p. bellii, G. agassizii and C. abingdonii. Ratios between class I and II MHC genes appear to be unchanged, supporting the idea of cluster duplication.

MHC class III genes
As mentioned above, the MHC class III region includes a heterogeneous group of genes, some of which play a role in the inflammatory response. Due to this heterogeneity, its definition is uniquely based on their location between MHC class I and MHC class II regions in eutherian mammals 129,130 . In other species, like G. gallus, MHC III genes are more variable in their location 131 . To determine how these genes have evolved in turtles, we manually annotated in the genome of C. abingdonii the 60 MHC class III genes present in humans and found the presence of 40 of them in this giant tortoise (Table S10). These genes are also clustered in this species, which supports the conservation of this cluster through evolution (Fig. S16).  Figure S16. MHC III genes present in C. abingdonii, X. laevis, M. musculus and H. sapiens. C. abingdonii presents 40 MHC III genes, a reduced number compared to the 60 genes present in humans but expanded in relation to the 21 genes in chicken. In this tortoise, these genes appear to be grouped in clusters, similar to some the other species analysed -H. sapiens, M. musculus and X. laevis-.

Metabolism and Diet
Metabolism is essential for life and impacts every aspect of biology. In fact, we are beginning to understand the key roles played by metabolites in both physiological and pathological processes, including cancer and age-associated diseases 132,133 . In this regard, genomic studies of different organisms could broaden our knowledge about how diet and lifestyle influence metabolism. To this end, we manually annotated a set of more than 140 key components of metabolic pathways in the C. abingdonii's genome ( Table S11). As before, all variants of special interest in these metabolic genes were checked in the genome of the Aldabra tortoise. Table S11. Status of the metabolism gene set analysed in C. abingdonii.

Glucose metabolism
Glucose is one of the most important carbohydrates due to its pivotal position as a source of energy, as well as a precursor of vitamins and different polymers essential for cells. Among the multiple enzymes involved in these metabolic pathways, glyceraldehyde-3-phosphate dehydrogenase (GAPDH) is a key component of the glycolysis pathway. We found three copies of this gene in CheloAbing 1.0, one of them being a retrotranscript (Fig. S17). The presence of the two apparently functional GAPDH copies with a 95% of identity between them  was confirmed by RNA-Seq and Sanger sequencing in all analysed tortoises (including the other Galapagos tortoises, the Aldabra tortoise and their continental outgroups) ( Table S7; Table S21). We also evaluated their selection status by comparing several selection models, as explained in section 1.2, with no evidence of a positive selection process in any of them, as suggested by its dN/dS value. On the other hand, the analysis of the retrotranscript suggested that it is not present in any of the other tortoise species tested. Although the putative physiological effect of GAPDH duplication has not been described to date, based on its metabolic role it may promote a faster consumption of glucose. Similarly, the mitochondrial metalloprotease neurolysin (NLN) is truncated due to a premature stop codon (W50*, present in all tested species of Galapagos giant tortoises and their continental outgroups, but not present in Aldabra tortoise). This alteration was validated through PCR amplification and Sanger sequencing ( Table S7). The genome of G. agassizii shows a different premature stop codon, which suggests a case of convergent evolution. Interestingly, NLN knock-out mice have greater glucose tolerance, insulin sensitivity and an upregulation of several mRNAs involved in hepatic gluconeogenesis pathways 134 . C. abingdonii also displayed a duplication of CTRB1, a pancreatic serine protease involved in digestion (Table S21). This gene is also duplicated in A. gigantea, while it is not duplicated in other Sauria (Table S7). Along with the benefits to digestion itself, this could also acts as an extra defence against glucose intolerance, since mutations in this gene are related to this trait 135 .
It is well known that glucose levels have to be carefully maintained in the organism in order to keep homeostasis, which is mainly achieved by the action of various hormones 136 . We found in the C. abingdonii genome that the gene encoding glycogen synthase kinase 3 alpha (GSK3A) presents a missense variation that affects a conserved residue (p.R272Q) in the activation loop of its protein. This variation was also validated by Sanger sequencing, and found in all Galapagos giant tortoises tested, the Aldabra tortoise, their continental outgroups, G. agassizii and C. p. bellii (Table S7). GSK3A acts as a negative regulator in the hormonal control of glucose homeostasis, Wnt signalling and regulation of transcription factors and microtubules 137 . Thus, the observed variation may alter its function, causing differences in the regulation of glucose levels in blood. Moreover, the loss of function of ITGA1 (see section 7.6) has been correlated with severe hepatic insulin resistance in a null mouse model when fed a high-fat diet 138 . Thus, we found evidence for selective pressure acting on the regulation of glucose metabolism in giant tortoises and related turtles. Our results point to a putative deregulation of glucose levels in the blood of these animals, which might be compensated with a greater tolerance to and faster consumption of glucose.
Furthermore, the pro-inflammatory cytokine MIF (macrophage migration inhibitory factor) presents a p.N111C variant in C. abingdonii, A. gigantea, as well as in the other species of Galapagos giant tortoises, their outgroups and G. agassizii, as confirmed through Sanger sequencing (Table S7). Surprisingly, previous functional analyses of this variant have shown that it leads to the inhibition of intracellular signalling cascades 139 . MIF deficiency has been shown to hinder the development of insulin resistance, glucose intolerance and associated atherosclerotic disease 140 .
Altogether, these results suggest that tortoises present an altered glucose metabolism compared to other turtles (Fig. S18). Figure S18. Potential glucose metabolism alterations in giant tortoises. The putative alterations affecting glucose metabolism may be associated with four different effects. The extra copies of GAPDH may be linked to a faster consumption of glucose. NLN loss of function could point to a greater glucose tolerance in Galapagos tortoises and their outgroups, whereas the mutation in the activation loop of GSK3A may contribute to a deregulation of glucose levels in blood in these organisms. Finally, the mutation in MIF, present in all analysed tortoises, may be related to insulin resistance.

Diet
The source of food and its relationship with the environment that a species inhabits is a driver of evolution and has been shown to modulate, among others, body size through genetic changes that include copy-number variation and point mutations 141 . In this regard, vitamins are essential in the maintenance of systemic homeostasis and normally obtained from the diet, but in some species, they can be also synthesized by the organism itself. In this context, vitamin C is unique in that some vertebrates have lost the ability to synthesize it due to loss of the GULO gene 142 . Gulonolactone (L-) oxidase, the enzyme encoded by GULO, is indispensable for the final step in vitamin C biosynthesis, and the absence of this protein alone is sufficient to disrupt this process 143 . Primates, guinea pigs, bats, some birds and teleost fishes do not have a functional GULO gene 142 . However, GULO is present in C. abingdonii, A. gigantea and in other Testudines, thereby indicating the capacity to synthesize vitamin C in this clade. This feature may be an important factor for protecting cells and tissues against oxidative damage. In addition, we identified a functional copy of embigin (EMB) in the genome of C. abingdonii. None of the variants in this gene known to be associated with carnivorism 144 were present in C. abingdonii or in A. gigantea, consistent with the well-known fact that giant tortoises are herbivores. In this regard, and as discussed above (see section 3.1), we have found an expansion in cathelicidinrelated genes in C. abingdonii and A. gigantea (Fig. S12), which is also present in other species [145][146][147] , and could be related with an herbivore diet 148 .

Thyroid metabolism
Iodine is an essential nutrient due to its role as a component of thyroid hormones that must be obtained through the diet. Thyroid hormones are involved in a variety of aspects in human health, including relevant physiological processes, such as neurologic development or ageing 149 . To evaluate the genetic status of C. abingdonii's thyroid hormones, we studied the main genes involved in thyroid metabolism. Our results showed that the TSH receptor (TSHR), which plays a pivotal role in regulating thyroid gland physiology, contains a variant that affects a conserved residue (p.K621R) in its G-protein recognition intracellular loop. This variant was validated by Sanger sequencing and found in all the other Galapagos giant tortoise species tested, the Aldabra tortoise and their continental outgroups (Table S7). However, this variant is not present in the genome of G. agassizii. Mutations in this and nearby residues are associated with several problems in the TSH response along with T3 and T4 hormone synthesis defects 150 . Consistently, an association between reduced thyroid signalling and extended longevity has been reported, as hypothyroidism is overrepresented among supercentenarians 151,152 . Therefore, it is tempting to speculate that this TSHR variant could contribute to the extreme longevity found among these tortoises. However, the absence of this variant in the long-lived G. agassizii suggests that other mechanisms are involved in this trait.

Oxygen homeostasis
Globins play an essential role in oxygen binding and transport. Furthermore, these haemcontaining proteins are also involved in nitric oxide metabolism, detoxification of reactive oxygen species (ROS), and intracellular signaling 154 .
The family of globins has been extensively studied and constitutes a classical model to unravel the divergence and evolution of genes and proteins 155 . Five types of globins may be present in most jawed vertebrates (Gnathostomata): haemoglobin (Hb), myoglobin (Mb), cytoglobin (Cygb), neuroglobin (Ngb) and androglobin (Adgb) 154 . However, three additional globins with still poorly defined functions have been recently discovered. These proteins, globin E (GbE), globin Y (GbY), and globin X (GbX) have a more restricted taxonomic distribution and have been lost in many vertebrate taxa 156 . Ancestral Gnathostomata had the complete repertoire of globins with distinct functions, but nowadays only coelacanths and turtles show all eight different types of globins 155 (Fig. S19).
Using human, C. p. bellii, and P. sinensis sequences as references, we annotated the full set of globins in C. abingdonii and A. gigantea, including three copies of α-Hb and β-Hb genes. Multiple α-like and β-like globin genes have been recently derived from a common ancestor gene. In both types of globins, the observed gene amplifications seem to be lineage-specific, and therefore there is not a clear orthology relationship among them.
Recently, it has been discovered that GbX, which was first designated as a neuronal globin, is actually present in fish red blood cells, suggesting a role in oxygen binding and transport similar to Hb 157 . Hypoxia-resistant fishes and Testudines exhibit Hbs with faster nitrite reductase activity 158,159 , indicating that nitric oxide metabolites may contribute to the physiological response to anoxia/hypoxia periods 158 . It has been also reported that deoxygenated GbX is involved in nitrite reduction to nitric oxide, faster than human Hb. This suggests that GbX may participate in the anoxia/hypoxia response mechanism, releasing oxygen at very low oxygen pressures when Hb is already deoxygenated 157 . Accordingly, the presence of this globin in C. abingdonii and A. gigantea, as well as in other Testudines, may be one of the mechanisms underlying the exceptional tolerance to hypoxic conditions of these vertebrates 160 . Figure S19. Set of globins present in different vertebrates. Note that only turtles and the coelacanth have the complete repertoire of globins. The phylogenetic tree was generated using Common Tree Taxonomy tool from NCBI 1 .

Coagulation
We next analysed different genes involved in the coagulation pathway, as both coagulation and blood homeostasis can be greatly impacted by environmental changes and species adaptation to new habitats 23 . We found interesting variations in some members of the coagulation factor family, such as factor VII, factor X, and factor XI. These variants, common to all species of Galapagos giant tortoises tested, A. gigantea and G. agassizii, lead to putative F7 (p.F64L), F10 (p.E91K), and F11 (p.E315K) deficiencies (Table S7). This could result in altered functions of these proteases, since mutations in these residues are associated with pathologies such as factor VII, X, and XI deficiency 161,162 .
Additionally, KLKB1  encoding a serine protease that participates in the kinin-kallikrein system 163  was absent both in C. abingdonii and in P. sinensis. This loss could affect blood pressure and coagulation control mechanisms, through an increased serum level of vasoactive peptides 164 . Likewise, plasminogen (PLG) displays point variants in fibrin-binding sites (p.R134L, also present in A. gigantea and P. sinensis, and p.R136K, found in all studied turtles) that may alter the protein function, despite not being specific of these species. On the other hand, the serine protease PROC that inhibits the generation of plasmin, was absent only in C. abingdonii. The absence of this gene may contribute to compensate other alterations in the coagulation system in turtles.

Transcription factor TP53
The p53 transcription factor is a conserved DNA damage response protein with different roles in regulating cell cycle, apoptosis and energy metabolism, and plays a master role in tumour suppression 165 . As in all other Reptilia genomic studies so far 6,69 , we found one functional copy of this transcription factor in C. abingdonii and A. gigantea, which is likely essential for proper DNA maintenance and cancer protection. However, C. abingdonii and A. gigantea share with other turtles a modification at codon 106 (p.S106E) found also in root voles (Microtus oeconomus) (Fig. S20; Table S7). This variant was first described as an adaptation to hypoxia and cold, given that this vole species lives on the Tibet plateau, and it is also shared by four fish species (Barbus barbus, Tetraodon miurus, Platichthys flesus and Xiphophorus hellerii) and a squid species (Loligo forbesi), which are also hypoxia-tolerant. Collectively, these results suggest a convergent evolutionary process. Specifically, p.S106E hinders p53 from activating the apoptotic genes IGFBP3, APAF1, BAX, NOXA, PUMA and HDM2, and therefore suppresses cell apoptosis by p53 under stress conditions, without affecting cell-cycle genes 166 . In addition, giant tortoises present the ancestral p.R72P variant in p53, which has been previously associated with longevity and better cancer prognosis in human populations 167 . Figure S20. Partial amino acid alignment of the TP53 sequence in C. abingdonii, A. gigantea and other vertebrate species, including four turtles. Codon 106 changes from serine (S) to glutamic acid (E) are highlighted with red boxes.

Cell-stress response proteins
In addition to the involvement of globins and p53 in anoxia/hypoxia resistance, other cellular proteins, such as heat shock proteins (HSPs), play essential roles in maintaining the homeostasis because of their implication in cellular response processes. HSPs are a group of chaperones that act in response to different types of cellular stress, such as elevated temperature, heavy metals, proteasome inhibitors, and hypoxic/anoxic conditions 153,168 . HSP expression is regulated at the transcriptional level by heat shock factors (HSFs), being HSF1 the main mediator in stressful events. Previous studies carried out in Testudines showed that HSF1 and other HSPs play an important role in the anoxia response mechanism, in addition to other cellular processes such as antioxidant defences and regulation of apoptosis 153 . In our study, we identified homologs in C. abingdonii for different members of HSPA and HSPB families (Table S13). We also found the complete HSF repertoire, including HSF3, which has been pseudogenized in primates. The presence of the complete set of HSF genes in all Testudines studied so far suggests the existence in these animals of fine-tuning regulatory mechanisms, acting in the cellular responses to physiological and environmental insults. tested and in their corresponding continental outgroups analysed by Sanger sequencing (Table  S7 and Fig. S21). Previous studies have associated an elevated expression of NEIL1 with extended longevity in naked mole rat and humans compared to mice 170 . Furthermore, downregulation of this gene has been described in different types of cancer 171 . This result points to NEIL1 as a candidate gene for longevity extension and cancer protection through genomic stability control. Figure S21. Partial amino acid alignment of NEIL1 sequence in C. abingdonii, A. gigantea, and other vertebrate species, including four turtles. Copy-specific variants present in C. abingdonii and A. gigantea are emphasized with a red rectangle.
We found another duplication affecting RMI2 (Table S21), which plays an essential role in the processing of homologous recombination intermediates to limit DNA crossover formation in cells 172 . Both copies seem to be conserved and expressed in G. agassizii, C. abingdonii and A. gigantea, although the latter may have additional similar pseudogenes. Other Sauropsida such as C. p. bellii, P. sinensis, A. carolinensis, and G. gallus, only have one functional copy, although a second pseudogenized copy is present in some of these species.
In a preliminary attempt to explore the way these duplications in NEIL1 and RMI2 could modulate the repair response, we cloned the corresponding cDNAs on a pCDH vector and infected HEK-293T cells with them. Clones from each condition -pCDH, pCDH-NEIL1, pCDH-RMI2 and pCDH-NEIL1+pCDH-RMI2-were isolated and cells were exposed to sub-lethal doses of UV light stress (20 J/m 2 ) or H2O2 treatment (500 µM). After 24 or 48 h, we studied the activation of DNA damage response pathways by Western blot analysis, targeting the phosphorylation of the histone H2AX and the cleavage of poly (ADP-ribose) polymerase (PARP). We observed a reduction of both damage-associated forms in all different conditions (Fig. S22 and Figs. S32-S33). This suggests that overexpression of these factors may lead to an improved response to stress and a quicker repairing capacity. Therefore, duplications in NEIL1 and RMI2 could provide giant tortoises with an enhanced DNA-repair response, contributing to their expanded lifespan and protection against cancer. Similarly, for the samples from H2O2 treatment, the Western-blots shown were carried out with the same samples run in parallel in three identical blots (one for PARP and actin; a second blot for Flag (NEIL1 and RMI2); and a third blot for pH2AX).
In addition to these duplications, some potentially relevant point mutations were found in genes involved in DNA repair. For example, we found one variant affecting protein kinase ATM, which activates checkpoint signalling upon double strand breaks, apoptosis and genotoxic stresses, thereby acting as a DNA damage sensor. As discussed above (see section 2.3), one of the nine residues involved in ATM catalytic loop formation 62,173 is mutated in C. abingdonii (p.V2873I). This mutation was also found in all the analysed Galapagos giant tortoises and in fishes but was absent in Aldabra tortoise and in their continental relatives (as confirmed through Sanger sequencing; Table S7).
Other alterations were detected in genes involved in non-homologous end-joining (NHEJ), an important pathway that repairs double-strand breaks in DNA. In this context, XRCC6 (Ku70) and XRCC5 (Ku80), two ssDNA-dependent ATP-dependent helicases involved in this process 174,175 , present changes affecting sumoylation sites in C. abingdonii (p.K556R and p.K568N, respectively) 176 . The variant affecting XRCC6 is present in C. abingdonii, A. gigantea and other Galapagos giant tortoises used for Sanger sequencing validation. The variant affecting XRCC5 was also found in all the Galapagos giant tortoises species tested, Aldabra tortoise, and in other turtles, such as G. agassizii, C. p. belli, and P. sinensis. Interestingly, the naked mole rat, the longest-lived rodent, also presented a similar point mutation in XRCC6 (p.K556N) 177 (Fig. 3b, main text). Sumoylation is generally induced after DNA damage and plays a key role in DNA repair response and in multiple regulatory processes by enabling specific protein interactions 178 . Therefore, changes in this process may have an important effect in the repair system. Collectively, the results described in this section point to a proper or even enhanced stress response in C. abingdonii and A. gigantea with expansion of DNA damage response genes. Minor changes in some of them could suggest small differences in the regulation, interaction and activation of repair proteins, especially those involved in non-homologous end-joining. In addition, duplications in some DNA repair genes could enhance DNA integrity maintenance, contributing to the extended longevity and cancer resistance of C. abingdonii and related turtles. The analysis of globins and p53 showed specific adaptations to hypoxia shared with other Testudines. As C. abingdonii and A. gigantea did not experience the pressure of living in an anoxic or cold environment, these variants are probably relics of the environment where their ancestors thrived.
Other tumour suppressors duplicated in C. abingdonii and A. gigantea among other analysed Sauropsida species include PML, PTPN11 and P2RY8 (Table S21). Manual annotation was instrumental in characterizing these duplications. In some species, such as G. gallus and P. sinensis, the specific expansion of P2RY8 was not clearly detected, although there is an expansion in some of the known purinergic receptors. Similarly, the aforementioned premature stop codon found in ITGA1 (see section 7.6) removes the GFFKR motif which has been proposed to interact with RASSF5 185 , a recognized tumour suppressor involved in the RAS pathway. The putative loss of this interplay might impact on the activity of RASSF5 leading to a tumourprotecting effect.
Some oncogenic processes can be suppressed by a mechanism known as immunosurveillance 186 . The copy-number expansion of PRF1 in Testudines could participate in this process. The absence of perforin-dependent cytotoxicity seems to play a critical role during the earliest stages of epithelial carcinogenesis 187 , and these proteins act as tumour suppressors in B-cell malignancies 188 . Moreover, perforin-deficient mice are more susceptible to the development of spontaneous lymphomas 189 . Therefore, these reports suggest that the expansion of perforins could have a protective effect against cancer and ageing in Testudines.
We also found in C. abingdonii and A. gigantea two copies of PRDM1 (Table S21), a transcriptional repressor that binds specifically to the PRDI element in the promoter of the -IFN gene, driving the maturation of B-lymphocytes into Ig-secreting isotypes 190 . This copynumber gain was also found in C. p. bellii and G. agassizii, but not in other Testudines. In addition, PRDM1 is a tumour suppressor gene in NK cell malignancies 191 , suggesting a protective role against cancer and a robust innate immune system in these animals carrying an extra copy of PRDM1 5 . Altogether, the redundancy of tumour suppressor genes observed in Testudines, and particularly in giant tortoises, could contribute to explain their apparent natural resistance to cancer 192 (Fig. S25). This might provide an interesting example of convergent evolution, as other large and long-lived vertebrates also show genomic expansions affecting tumour suppressors 193 . Figure S25. Putative mechanism of cancer control by expansion of tumour suppressor genes.

Proto-oncogene evolution
Proto-oncogenes play important roles in multiple biological pathways. However, when activated, the resulting proteins may favour unregulated cell growth and cancer. Therefore, oncogene activation is a tightly regulated process resulting from the interplay of multiple factors. In this regard, we detected the duplication of several proto-oncogenes in C. abingdonii genome. Namely, we found two conserved copies of MYCN and SET (Table S21), also present in A. gigantea but not in other Testudines, which suggests that these changes are exclusive of the group including giant tortoises. Of note, the sequences of both SET copies are almost identical, and therefore we have not been able to independently validate this duplication. MYCN encodes a member of the MYC family, highly expressed in the fetal brain and critical for its normal development 194 . The newly identified MYCN-like gene (MYCNL) is present in all the species of Galapagos giant tortoises tested and in A. gigantea, but it is not present in other turtles such as P. sinensis, C. p. bellii or G. agassizii. SET encodes a PP2A inhibitor which regulates a variety of processes such as apoptosis, cell cycle and cell motility 195 , and is overexpressed in many human cancers. In addition, this protein belongs to the SET complex, which mediates oxidative stress responses induced by mitochondrial damage through the action of PRF1 and GZMA in CTL and NK-mediated cytotoxicity 196 . This extra copy of SET is also present in A. gigantea but not in G. agassizii, P. sinensis or C. p. bellii. Related to this, USP12 encoding a cysteine protease involved in the activation of c-MYC 197 was found to be duplicated in G. agassizii, C. abingdonii, other Galapagos giant tortoises, A. gigantea and their continental outgroups, as assessed through Sanger sequencing validation (Table S7; Table S21).
Another remarkable duplication in C. abingdonii affects unconventional myosins 198 . Thus, we have annotated a new myosin copy that resembles MYO5A and MYO5B. This additional copy is also present in A. gigantea and has been annotated in P. sinensis, C. p. bellii, C. mydas and G. agassizii. Based on sequence identity of this novel gene with both MYO5A and MYO5B (Table  S21), we have called it MYO5AB, even though the phylogenetic analysis does not support a direct relationship with those genes (Fig. S26). This new gene contains a missense mutation in its ATPbinding site (p.E114K), which is expected to ablate its ATPase activity 199 . This variant was also present in non-muscle myosins type II as MYH9, which is involved in diverse functions such as cytokinesis, cell motility, maintenance of cell shape and tumour suppression 200 . Thus, the presence of a non-functional copy of this protein might modulate the activity of the functional copy by sequestering substrates, suggesting a new regulatory mechanism affecting this family. Another gene duplication of potential interest due to its role in cancer involves RAS genes. RAS is a family of related proteins belonging to the small GTPase class, which are implicated in transmitting signals within cells 201 . We annotated a new gene, which we have called LRAS, in C.
abingdonii, A. gigantea, P. sinensis, C. p. bellii, G. agassizii and A. carolinensis. This gene shares a common ancestor with HRAS, NRAS and KRAS (Fig. S27). Figure S27. Topology of the BI and ML trees showing the phylogenetic relationships of the RAS family proteins in C. abingdonii, A. gigantea, and other vertebrate species. Numbers represent Bayesian posterior probabilities and bootstrap scores (above and below each branch, respectively). Text colour represents the type of animal to which each species belongs. Dash line (-) indicates no support in this analysis.
We also identified an additional copy of NRAS in C. abingdonii and A. gigantea, probably a pseudogene due to the presence of several premature stop codons in both species. Interestingly, LRAS displays a divergent C-terminus with an intact farnesylation signal. The accumulation of amino acid substitutions at this site could provide a modulatory effect of signal transduction mediated by RAS (Fig. S28). Dominant-negative mutations were described as mutations that act in opposition to the normal activity of the gene product when overexpressed 203 . The presence of a second copy of a gene, as described herein for genes of oncological relevance present in the genomes of giant tortoises, would open up additional opportunities in evolution to develop new functional properties 204 .

Modifications in the Hippo pathway
The Hippo pathway consists of a large network of proteins that play a pivotal role in organ size control during development and regeneration by promoting cell-cycle arrest and apoptosis, as well as in pathological states like cancer. The core of this pathway comprises a kinase cascade in which STK3/MST2 and STK4/MST1, in association with its regulatory protein SAV1, phosphorylate and activate LATS1/2, which in turn phosphorylates and inactivates YAP oncoprotein and WWTR1/TAZ. Phosphorylation of YAP by LATS2 inhibits its translocation into the nucleus, where it regulates important genes for cell proliferation, migration and death 205 (Fig. S29).
Multiple studies have shown that key junctional proteins can regulate YAP and TAZ by sequestering them at cell junctions, thus preventing nuclear entry and contact with their specific phosphatases 206 . Ecadherin, the best characterized and prototype member of the classical cadherins in epithelial cells, is a key component of adherens junctions which acts as a potent tumour suppressor. We found a missense variant (p.Y754H) in all analysed Figure S29. Modifications in Hippo pathway in C. abingdonii and other analysed tortoises. Results suggest a differential modulation of Hippo pathway, promoting cell cycle arrest and inhibiting anti-apoptotic signals. tortoise species and also in C. p. bellii, which was confirmed by Sanger sequencing (Table S7). This is an essential phosphorylated residue for the interaction of E-cadherin with its negative regulator CBLL1 207 , an E3 ligase that mediates ubiquitination of the E-cadherin complex. While the resulting histidine residue may also be phosphorylated, this reaction must be mediated by different enzymes. Therefore, we hypothesize that this missense variation could modify the regulation of the interaction of E-cadherin with CBLL1, perhaps promoting the stabilization of Ecadherin in the surface of cytoplasmic membrane. Due to the role of E-cadherin in preventing epithelial-mesenchymal transition and downregulating cell proliferation, the existence of this variant may suggest a new protective mechanism against tumorigenesis. Of note, the aforementioned functional duplication of NF2 (Table S21) may also affect the regulation of the Hippo pathway by binding and recruiting LATS to the plasma membrane, where it is activated by MST kinases 208 . Taken together, these results suggest a tendency favouring activation of the Hippo pathway, which in turn would lead to cell proliferation arrest, and anti-apoptotic pathway inhibition (Fig. S29).
In summary, our analysis shows a trend towards a lower tumorigenic potential in giant tortoises. Namely, several tumour suppressors have been genomically amplified, which might prevent cells from undergoing transformation. In addition, we have found several putative modulators of proto-oncogenes, suggesting novel regulatory mechanisms specific of this group of genes. While these results may be important in understanding the development of these giant organisms, they may also provide clues about the enhanced protection against tumour development suggested by Peto's paradox. Given the limitations of this analysis, further works are needed to confirm these hypotheses.

Genomic basis of longevity
The unprecedented advances over the last years in ageing research have led to the postulation of nine hallmarks that define ageing and therefore affect longevity 26 . Moreover, the development of next-generation sequencing techniques has led to the comprehensive study of the genome of different long-lived organisms such as the naked mole rat and the bowhead whale 23,177 . Some members of Testudines, such as C. abingdonii, all other Galapagos giant tortoises and A. gigantea are also long-lived. Estimates have placed Lonesome George at over 100 years. The maximum lifespan of the Aldabra giant tortoise ranges from estimates of 70 years 209 to 65-90 years 210 , with unverified higher estimates reported. Moreover, the Mojave Desert tortoise is estimated to live up to 56 years in the wild 211 . Accordingly, we exhaustively analysed a set of almost 495 genes involved in the molecular control of longevity and found relevant genetic modifications that could affect some of the well-established hallmarks of ageing (Fig. 2a, main text) (Table S15). Table S15. Status of ageing-related genes analysed in C. abingdonii.

Genomic instability
One characteristic feature associated with ageing is the accumulation of genetic damage as a result of continuous environmental and cellular stresses, along with the disruption of the function of the DNA repair machinery. In the context of proper DNA maintenance, and as mentioned above (section 5.5), we found in the genomes of C. abingdonii, A. gigantea, their continental outgroups, C. p. bellii and G. agassizii a duplication affecting NEIL1, a key protein involved in the base excision repair (BER) process (Table S7). A higher expression of NEIL1 was previously correlated with extended lifespan in several species 170 , pointing to this gene as one of the candidate putative factors involved in the exceptional longevity of giant tortoises.
Additionally, the aforementioned variants affecting ATM, XRCC5 and XRCC6, as well as the duplications of RMI2 and GAPDH (Table S7) could also contribute to improve the efficiency of DNA repair and therefore, the stability of the genome.

Telomere attrition
As mentioned above, cell repair mechanisms play an essential role in multiple cellular processes, including transcription, apoptosis and telomere maintenance. The accumulation of mutations due to the saturation of the DNA repair machinery with age is accompanied by telomere shortening 26 . Thus, cells need numerous strategies to counteract the natural tendency to telomere attrition. In this regard, the exonuclease DCLRE1B participates in the protection of telomeres against NHEJ-mediated repair and plays a key role in telomeric loop (T-loop) formation after being recruited by TERF2 212 . In C. abingdonii, A. gigantea, G. agassizii and all the other species of Galapagos giant tortoises and continental outgroups used in the Sanger sequencing validation (Table S7), this protein presented a modification in the heterodimer interface binding TERF2 (p.R498C) (Fig. 3b, main text). Furthermore, TERF2 had several alterations in the domain responsible for the binding to DCLRE1B (p.S161T, p.M164T, p.N176G and p.M180L), also shared by other turtles and birds. This variation in key binding residues could affect the interactions between these two proteins and therefore vary the T-loop formation. Moreover, the aforementioned variants present in DNA repair genes, such as ATM, XRCC5 and XRCC6, may also impinge on telomere dynamics in tortoises 213-215 .

Loss of proteostasis
Protein homeostasis (proteostasis) in cells usually deteriorates during ageing and correlates with the downregulation of genes responsible for the synthesis and recycling of proteins 26,216 . In this context, EEF1A1 which encodes an isoform of the alpha subunit of the elongation factor-1 complex  is responsible for the delivery of tRNAs to the ribosome. In C. abingdonii's genome, we identified three copies of this gene, one of which is a retrogene. Comparative analysis with data from A. gigantea and G. agassizii revealed in both of them the existence of two or more copies and no retrotranscript, which suggests a duplication mechanism initiated in a common ancestor, and the later incorporation of the reverse-transcribed copy to C. abingdonii's genome. To confirm these results, we validated all copies by Sanger sequencing (Table S7; Table S21), which led us to identify the two functional copies of EEF1A1 in all the studied species (including Galapagos giant tortoises, A. gigantea and their continental outgroups), whereas the retrotransposed gene seems to be exclusive to C. abingdonii. Interestingly, overexpression of EEF1A1 increases lifespan in Drosophila melanogaster 217 . These findings suggest that proteostasis maintenance may represent a pro-longevity mechanism in these animals.

Deregulated nutrient sensing
Different signalling pathways that recognize and respond to fluctuations in nutrient levels are commonly deregulated during ageing 132 . Over time, glucose dysregulation can result from alterations throughout metabolism control pathways, such as glycolysis or gluconeogenesis. In this regard, the aforementioned variant in giant tortoises affecting the activation loop of GSK3A (section 4.1) may be involved in the maintenance of glucose homeostasis in these organisms (C. abingdonii, the other Galapagos giant tortoises, A. gigantea, their continental outgroups, G. agassizii and C. p. bellii) ( Table S7). Interestingly, the inhibition of GSK3 can extend lifespan in D. melanogaster 218 , which highlights its critical role as a regulator of longevity across the evolution and, particularly, in tortoises. Likewise, the identified alterations in giant tortoises affecting other genes implicated in glucose metabolism may also represent pro-longevity determinants in these animals. In fact, the NLN inactivation in giant tortoises and their continental outgroups due to the presence of premature stop codons could provide a metabolic advantage for these animals, as mutant mice deficient in this enzyme present increased glucose tolerance and insulin sensitivity 134 .

Mitochondrial dysfunction
The correct function of mitochondria is critical for homeostasis maintenance, whilst these organelles are crucial in energy production through beta oxidation and Krebs cycle, and their malfunction or dysregulation leads to severe disorders 132,133,219 . In this context, two variants affecting ALDH2, which encodes a mitochondrial aldehyde dehydrogenase, were found in several analysed tortoises. Specifically, a variant affecting an NAD-binging site (p.Q366M) 220,221 is common to all the Galapagos giant tortoises, but not to their continental outgroup C. chilensis, nor the Aldabra tortoise or its outgroups. Another variant located in an homotetrameric interface site 220 (p.M487T) was found in all the tortoises studied (including both Galapagos and Aldabra tortoises, along with their outgroups) ( Table S7). This protein is implicated in alcohol metabolism and lipid peroxidation, among other detoxification processes 222 . The metabolic machinery responsible for alcoholic degradation arose in the evolution as an adaptation to the consumption of fermented fruits 223 , but it also participates in cellular homeostasis through the elimination of potential damaging molecules such as reactive aldehydes 222 . Therefore, these mutations could also alter the detoxification process and contribute to pro-longevity mechanisms. Likewise, the above described alterations in other genes such as NLN and GAPDH, which encode enzymes associated with mitochondrial functions 224,225 , may also impinge on this hallmark of ageing and contribute to extend the lifespan and healthspan of these animals.

Altered intercellular communication
Several changes in intercellular signalling take place during ageing, such as those involved in immunosurveillance, neurohormonal signalling and inflammatory reactions 26 . Accordingly, we specifically examined the genomes of C. abingdonii and A. gigantea for variants occurring in genes affecting intercellular communication of putative relevance for ageing. In this context, the integrin alpha-1 gene (ITGA1), a heterodimeric cell-surface protein involved in cell-matrix and cell-cell interactions, presents a premature stop codon in position p.R990* only in C. abingdonii. This premature stop codon is expected to remove the C-terminus of the protein, previously described as a critical structural element for the activation of alpha integrins 226 (Fig. S30). Figure S30. Domain structure of human and C. abingdonii ITGA1. The identified mutation affecting ITGA1 generates a stop codon (red bar) at the residue 990 in the protein, removing the C-terminal and a transmembrane region that participates in the dimerization process.

Inflammation and ageing
Among the multiple age-related alterations that affect intercellular communication, inflammation takes a prominent role resulting in age-associated dysfunctions. This situation is known as "inflammaging", and is caused by many factors, such as the accumulation of proinflammatory tissue damage, dysfunction of the immune system, or production of proinflammatory cytokines by senescent cells, among others 227 . In this context, the discussed variant in the pro-inflammatory cytokine MIF in giant tortoises an G. agassizii (see section 4.1) has been shown by mutagenesis studies to cause the formation of interchain disulfide bonds between the altered p.N111C and the p.C81 residue of an adjacent subunit, inhibiting intracellular signalling cascades 139 . Moreover, MIF deficiency tends to reduce chronic inflammation in white adipose tissue 140 and to expand lifespan, especially in response to caloric restriction 228 . In general, this result reinforces the hypothesis that giant tortoises could have decreased chronic inflammatory response, and reduced production of pro-inflammatory interleukins, thereby expanding their longevity.

IGF system and ageing
Insulin and insulin-like growth factors (IGFs) play important roles in the regulation of multiple processes, including metabolism and growth 229 . The IGF system is also involved in the ageing process, as suggested by the extended lifespan in different species correlated to an IGF signalling decrease 230,231 . Thus, the characterization of the IGF system is important to further understand its contribution to growth in giant tortoises and its possible implication in the extremely long lifespan of these animals. The interaction between IGF1/2 and their receptors (IGF1R and IGF2R) is a key step in this signalling mechanism 231 . We found that all of the key residues for this interaction are conserved between the selected species, with the exception of Asn-724 in IGF1R, which corresponds to an aspartic acid in C. abingdonii. Further comparative analysis revealed the presence of the same specific amino acid substitution in G. agassizii, Galapagos and Aldabra tortoises and also in the continental outgroups, which are also long-lived (Fig. 4a, main text) ( Table S7). The presence of this specific change suggests that the correct interaction between IGF1/2 and IGF1R may have changed in these tortoises, probably regulating the downstream signalling mechanism. To explore this possibility, we treated with IGF1 different cells expressing pCDH, pCDH-IGF1R WT and pCDH-IGF1R N724D plasmids, and observed a significant reduction in the phosphorylation levels of the receptor carrying the mutation (Fig. 4b, main text, Fig. S31). Consequently, this unique change in IGF1R provides an attractive target to study the cellular mechanisms underlying the exceptional long lifespan of these animals. Also in this regard, it is remarkable our finding in tortoises of a short deletion in the coding region of IGF2R which results in the loss of two amino acids in this growth factor receptor. Interestingly, an IGF2R SNP has been associated with human longevity 232 , which opens the possibility that the variant found in the studied tortoises in this additional component of the IGF system could somewhat contribute to increase the lifespan of these long-lived animals.

Conclusions
Following a hypothesis-driven strategy, we have carefully annotated a large set of genes related to multiple physiological processes, focusing specially on response to stress, ageing, and DNA repair. By using this manual approximation, we could identify several changes affecting relevant residues of pivotal proteins, which may had been overlooked by more automatic methods.
By comparing our annotations to those of other turtles, our results may shed light on several aspects of the evolution of these vertebrates. Thus, we have found genomic duplications affecting WNT4 and WNT11 in turtles which, given their known roles in development, might be associated with the growth of their shell. We have also pointed out several genomic amplifications in Testudines that suggest an enhanced innate immune system, in addition to a much richer perforin and granzyme system than mammals. We have also found several examples of adaptations to hypoxia, including variants in TP53 and expression of specific globins, shared by all Testudines analysed so far.
In addition, the comparison of these data with the genomic sequence of the Aldabra tortoise may provide important clues about the specific evolution of giant tortoises. In this regard, C. abingdonii and the Aldabra tortoise share several variants predicted to affect their neurological development. Surprisingly, most of these alterations are expected to lead to lower cognitive capabilities and even neurological diseases. However, these alterations may simply reflect a different regulation of these processes in these tortoises compared to other organisms. Also, the phylogenetic group containing giant tortoises seem to have undergone multiple changes affecting glucose metabolism. Several specific alterations suggest that both glucose consumption and glucose tolerance were increased in their lineage. In addition to the different caloric requirements of giant tortoises, these characteristics may also be related with changes affecting the insulin system, which is known to play a role in size regulation. This hypothesis is reinforced by our finding of a specific variant affecting an important residue in IGF1R in tortoises.
An important consequence of gigantism is a potential increase in the risk of cancer development. We have found a number of gene changes specific of giant tortoises which might contribute to explain the low prevalence of cancer in Chelonians, which is remarkably lower than in the other reptilian counterparts (2.7% in Chelonians vs. 15% in snakes) 192 . Indeed, when Chelonians were split into turtles and tortoises, the prevalence of neoplasia in the last group was 1.4%. Thus, changes affecting genes involved in DNA repair, like RMI2, XRCC5 and XRCC6, suggest a differential regulation of DNA integrity in this lineage. Moreover, several known tumour suppressor genes, like PRDM1 and PRF1, were expanded in a common ancestor to both giant tortoises. Notably, we have also found duplications affecting proto-oncogenes, but some of the novel copies show evidence for loss of function or dominant negative functions, which might actually point to a lower tumorigenic potential.
Finally, giant tortoises are also examples of extreme longevity. In this regard, we have found multiple specific variants that might be related to this trait, affecting ageing hallmarks like genomic instability (NEIL1), telomere attrition (DCLRE1B, TERF2), loss of proteostasis (EEF1A1), mitochondrial dysfunction (ALDH2), and intercellular signalling (IGF1R). The interplay of these factors with the immune and insulin systems may hold important clues about the ageing process in these organisms, improving our understanding of longevity by opening new lines of investigation. Figure S31. Raw data of Western-blots from Figure 4b, main text. Figure S32. Raw data of Western-blots from Figure S22, upper panel. Figure S33. Raw data of Western-blots from Figure S22, lower panel. Table S16. Summary of average characteristics of the automatic annotation set of C. abingdonii compared to sets of other species. This table is attached as SuppTables.xlsx (sheet A1).    Table S21. Expansion events with coverage and identity for nucleotide and amino acid sequences. Each row shows the comparison between the firstly identified copy and the rest of them, consecutively. This table is attached as SuppTables.xlsx (sheet A6).