Reciprocal genomic evolution in the ant–fungus agricultural symbiosis

The attine ant–fungus agricultural symbiosis evolved over tens of millions of years, producing complex societies with industrial-scale farming analogous to that of humans. Here we document reciprocal shifts in the genomes and transcriptomes of seven fungus-farming ant species and their fungal cultivars. We show that ant subsistence farming probably originated in the early Tertiary (55–60 MYA), followed by further transitions to the farming of fully domesticated cultivars and leaf-cutting, both arising earlier than previously estimated. Evolutionary modifications in the ants include unprecedented rates of genome-wide structural rearrangement, early loss of arginine biosynthesis and positive selection on chitinase pathways. Modifications of fungal cultivars include loss of a key ligninase domain, changes in chitin synthesis and a reduction in carbohydrate-degrading enzymes as the ants gradually transitioned to functional herbivory. In contrast to human farming, increasing dependence on a single cultivar lineage appears to have been essential to the origin of industrial-scale ant agriculture.

Correlation between GC (Guanine-Cytosine) content and sequencing depth in the five attine ant 30 genome assemblies. The X-axis represents GC content; the Y-axis represents average sequencing 31 depth. We used 10kb non-overlapping sliding windows and calculated GC content and average 32 depth across these windows.

Acep-F G Acol-F T Aech-F G Aech-F T Tcor-F T Tsep-F T Tzet-F T Ccos-F G
Supplementary Methods 266 267 Overview 268 269 Our sequencing efforts focused on five species of ants and six fungal cultivars: Atta colombica 270 (Acol), Acromyrmex echinatior (Aech), Trachymyrmex septentrionalis (Tsep), T. cornetzi (Tcor), 271 T. zeteki (Tzet), and Cyphomyrmex costatus (Ccos) (Supplementary Table 1). The genome of Ac. 272 echinatior has previously been published 1 and the same is true for the genomes of Atta 273 cephalotes (Acep) 2 and its cultivar 3 . Accession numbers for the data generated in this study are 274 given in Supplementary protocol enclosed in the kit with modifications. Ant tissues were manually disrupted in G2 lysis 314 buffer using a Teflon pestle. After addition of 1 % proteinase K (20 mg/ml) and 0.2 % RNAse A 315 (QIAGEN, 100 mg/ml), the samples were incubated for 2 -3 hours at 50 °C on a rotating wheel. 316 One sample volume of chloroform was added after which samples were incubated for 30 -60 317 min at 50 °C and centrifuged for 10 min at 5,000 g, transferred to the Genomic-tip, and 318 processed according to the protocol. 319 320 DNA was extracted from ca. 2 g fungal mycelium by grinding the mycelium in liquid nitrogen 321 with a mortar and pestle. 400 mg ground mycelium was mixed with 5 ml extraction buffer (2 % 322 CTAB, 1.4 M NaCl, 0.1 M Tris, 20 mM EDTA, 1 % Polyvinylpyrrolidone, pH 8), 50 µl 323 Proteinase K (20 mg/ml), 50 μl RNAse A (100 mg/ml) and 50 µl β-mercaptoethanol and 324 incubated at 65 °C for 3 hours. Samples were centrifuged for 10 minutes at 3000 g, and the 325 supernatant transferred to a clean tube. One sample volume of phenol:chloroform:isoamylalcohol 326 (25:24:1), pH 8 was added, after which the tubes were mixed thoroughly by inversion, incubated 327 at room temperature for 5 minutes, and centrifuged for 30 minutes at 3,000 g. The upper phase 328 was transferred to a clean tube and mixed with one sample volume of chloroform:isoamylalcohol 329 (24:1), mixed by inversion and centrifuged for 10 minutes at 3,000 g. The upper phase was again 330 transferred to a clean tube and mixed with 1/3 sample volume of 5 M NaCl and 2/3 sample 331 volume of isopropanol. After careful mixing by inversion, the tubes were centrifuged for 20 332 minutes at 3,000 g. The supernatant was discarded and the pellet washed with 70 % ethanol. 333 After a brief centrifugation the supernatant was discarded and the DNA pellet dried at room 334 temperature. The DNA was dissolved in TE buffer over night at 5 °C. The samples were finally 335 centrifuged for 10 minutes at 15,000 g to pellet any undissolved material, and the supernatant 336 was transferred to a clean tube. 337 338 Total RNA was extracted from ant tissue using the QIAGEN RNeasy Mini Kit with modifications. 339 Ant tissue was disrupted in 500 µl RLC buffer (with 1 % β-mercaptoethanol) in a Fastprep 340 machine at level 6 for 30 seconds, with 5 mm ceramic beads. One sample volume of 341 phenol:chloroform:isoamylalcohol (25:24:1), pH 8, was added, and the samples were vortexed 342 and then centrifuged for 30 minutes at 20,000 g. The upper phase was transferred to a clean tube 343 and one sample volume of chloroform:isoamylalcohol (24:1) was added, after which the samples 344 were vortexed and centrifuged for 15 minutes at 20.000 g. Before assembly, we performed filtering to exclude low quality raw reads that met any of the 387 following criteria: 1) ≥ 5% Ns or polyA; 2) ≥ 50 low-quality bases (Phred score ≤ 7); 3) adapter 388 contamination present; 4) paired reads overlapping each other with ≥ 10 bp (allowing 10 % 389 mismatch); 5) PCR duplicates (reads are considered duplicates if read1 and read2 of the same 390 paired end reads are identical). Low-quality ends were trimmed directly. For small insert size 391 libraries (200 bp, 500 bp and 800 bp), we also performed an error correction step using the 392 correction tool released with SOAPdenovo 4 . The statistics for raw and cleaned data are given in 393 Supplementary Table 3. We also used the cleaned data to estimate the genome sizes of the five 394 ants and the fungal cultivars of C. costatus and Ac. echinatior using k-mer depth distribution 395 analysis 4 ( Supplementary Figure 1 and Supplementary Figure 2 blasting the assemblies against the bacterial (for the fungal assembly) or bacterial and fungal (for 408 ant assemblies) NCBI nt databases. If a scaffold aligned with identity greater than 80% and total 409 alignment length longer than 50 % of the scaffold length, we considered it a contaminant 410 sequence and removed it from the assembly after manual confirmation. In the ant assemblies, 411 some bacterial contamination was found (Supplementary Table 6) but no significant fungal 412 contamination. In the C. costatus cultivar assembly, we found no contaminant sequences. 413 414 Overall statistics of the obtained assemblies after excluding contaminant sequence are given in 415 Supplementary Table 7. For the Ac. echinatior cultivar genome both contigs and scaffolds were 416 very short, presumably due to the heterozygosity problem specified above. As the obtained 417 assembly was too fragmented for overall analyses, we did not perform repeat or gene annotation 418 for the Ac. echinatior cultivar genome. 419 420 Compared to the other ant assemblies, T. cornetzi has a relatively large genome and its assembly 421 has relatively short scaffolds, due to a higher repeat content, as detailed below. GC content 422 distributions of the assemblies are very similar across the attine ants (Supplementary Figure 3).

423
Coverage was calculated by mapping the clean short reads back to the assemblies using 424 SOAPaligner 8 , which produced the sequencing depth distributions given in Supplementary 425 Figure 4 and Supplementary Figure 5. 426

427
Repeat annotation 428 429 Following assembly, repeat content of the ant and fungal cultivar assemblies was annotated using 430 a combination of several programs. First, tandem repeats were identified using Tandem  Markov model was estimated from the 500 training gene sets used in the de novo annotation by 500 two awk scripts which are included with Geneid gene annotation tools (v1.3) 21 . Second, for the 501 exon sequences we estimated the transition probability distribution of each nucleotide given the 502 pentanucleotide preceding it for each of the three possible frames, and an initial probability 503 matrix from the pentamers observed at each codon position using the awk script 504 MarkovMatrices.awk. Third, for the intron sequences a single transition matrix was computed as 505 well as a single initial probabilty matrix using the awk script MarkovMatrices-noframe.awk. 506 Fourth, the coding potential of each reading frame in the inferred transcript was computed based 507 on the Markov model. Transcripts with complete ORFs were picked out and the redundant 508 isoforms were removed by keeping the longest ORF for each locus. Then these ORFs were 509 integrated with the GLEAN annotation to replace the incomplete GLEAN gene models. 510 511 We also performed several filtering steps to refine the gene sets: 1) Removing transposon-related 512 genes based on functional annotation. 2) Filtering out single-exon genes with length < 400 bp 513 and no functional annotation (see below) and support from either D. melanogaster or Ac. 514 echinatior homology predictions. 3) Replacing fragmentary gene models with the original ORF 515 based on Cufflinks transcripts when they overlapped with two or more genes in the integrated 516 gene set. 4) Replacing fragmentary gene models with homology-based gene models when they 517 overlapped with two or more genes in the integrated gene set. 5) Removing species-specific 518 genes (based on gene family clustering, see below) that overlapped for > 80 % of repeats or had 519 no functional annotation or homology support. 6) Adjusting the gene set of T. cornetzi that 520 remained much larger than the other gene sets, because it contained many high-copy-number (≥ 521 20) genes without functional annotation and RNA-seq read support, by discarding these elements 522 as being transposon-related. 523 524 The statistics of the final gene sets of the five ants are given in Supplementary Table 13. 525 Distributions of some general features (CDS length, intron length, etc.) of the final gene sets are 526 shown in Supplementary Figure 7. 527 528 Functional annotation of ant protein-coding genes 529 530 Gene functions were predicted based on the best match of the alignments to the SwissProt 531 database 22 using BLASTP (E-value cutoff 1e-5 Fungal assemblies and annotation 551 552 The raw Ilumina reads were filtered to exclude low quality reads with the following criteria for 553 raw reads: 1) remove reads with ≥ 10% of Ns; 2) remove reads with ≥ 40 low-quality bases 554 (Phred score <=7); 3) remove reads with adapter contamination. Also, to avoid GC bias during 555 read sequencing, we trimmed the first 10 bp of each read. Overall statistics of raw and cleaned 556 data are given in Supplementary  Table  588 16) than other ants (6-13 Mb), SDs only make up a small portion of the whole genome (4.42 %), 589 suggesting that the large genome of T. cornetzi is mainly due to an abundance of relatively short 590 repeat sequences, as described above (see Supplementary Table 18). 591 592 Ant gene family clustering 593 594 To gain insight into the evolution of gene families of attine ants, we clustered the genes of the 595 seven attine ants and five other sequenced ant species (S. invicta, P. barbatus, Ca. floridanus, L. 596 humile and H. saltator) as well as three outgroup insects (Ap. mellifera, D. melanogaster, 597 Nasonia vitripennis) into gene families using OrthoMCL v2.0.9 35 . To identify homologous 598 relationships among sequences, they were first aligned using BLASTp with an e-value cutoff of 599 1e-5 and an alignment length cutoff of 50 % of the gene length. The genes were then clustered 600 using MCL 36 with the inflation parameter (-I) set at 1.5 based on the BLAST results 601 (Supplementary Table 19). 2795 families were single-copy in all species and were used for 602 phylogenetic inference (see below). 603 604 One-to-one ortholog assignment 605 606 We used reciprocal best BLAST hits to identify one-to-one orthologs between different ants. 607 First, protein sequences of the seven attine ant species and two outgroup ant species (S. invicta 608 and P. barbatus) were used to perform all against all BLASTP, with an E-value cut-off of 1e-5. 609 Pairwise bi-directional best hits were considered orthologous pairs. Next we iteratively chose a 610 reference species and, for each reference gene, we put paired orthologous genes from other 611 species together to create an ortholog group. If an ortholog was absent in a given species, we 612 considered the gene locus missing for that species. Finally, we merged the ortholog groups from 613 each reference and removed redundant groups, while keeping the orthologous groups that are 614 present in all species to generate one-to-one orthologous groups. This resulted in 7443 one-to-615 one ant ortholog groups (Supplementary Data 2). 616 617 We used the same method to identify orthologs groups of the symbiotic fungi. Genes of the 618 cultivars of At. colombica, Ac. echinatior, T. septentrionalis, T. zeteki, T. cornetzi and C. 619 costatus, along with genes from Sc. commune and Ag. bisporus as outgroups, were used to 620 perform ortholog assignment. Due to the relatively low completeness of transcriptome-based 621 gene set, we used a looser criterion for creating the transcriptome ortholog groups: at most one 622 ortholog is absent in 6 transcriptomes, and at least one ortholog is present in the outgroup. This 623 resulted in 3499 fungal ortholog groups (Supplementary Data 2). Of these, 1075 ortholog groups 624 were present in a single-copy in all species and were used to build the fungal phylogeny. 625 626 Ant phylogenies 627 628 We first aligned the protein sequences of 2795 single-copy gene families using MUSCLE 37 with 629 default parameters and then converted the protein alignments into CDS alignments. The 2795 630 loci were concatenated in Geneious v7.0 38 , resulting in a data matrix consisting of 1,886,151 631 amino acid sites and 13 taxa (see Supplementary Figure 8). The concatenated matrix was 632 analyzed under the parsimony criterion using a heuristic search and 100 random-taxon-addition 633 replicates in PAUP* 39 , resulting in a single optimal tree. Using this maximum-parsimony tree as 634 a reference tree (user_tree_topology), and the 2795 loci as the maximum number of possible 635 partitions, a partitioning analysis was conducted in PartitionFinder 40 in which all possible protein 636 models were considered and compared (models = all_protein) under the Bayesian Information 637 Criterion (BIC) using the hcluster search algorithm, resulting in a scheme consisting of 132 638 partitions. These partitions and models were employed in a maximum-likelihood analysis in 639 RAxML 7.7.7 41 with hybrid MPI/Pthreads parallelization 42,43 , resulting in a best tree with the 640 topology in Supplementary Figure 8, which is identical to the maximum-parsimony topology. 641 The partitions and models were also employed in maximum-likelihood bootstrap analyses in 642 RAxML consisting of 1152 pseudoreplicates under the "-b" (thorough search) bootstrap option, 643 resulting once again in the same topology ( Supplementary Figure 8) with bootstrap frequencies 644 of 1.0 at all nodes. 645 646 We inferred divergence dates for the maximum-likelihood tree using the penalized likelihood 647 approach implemented in r8s v.1.7 44 . The bee outgroup Ap. mellifera was excluded from the 648 dating analyses. We calibrated two nodes in our tree with fixed ages based on the results from a 649 large-scale diversification analysis of the ant subfamily Myrmicinae that employed a total of 27 650 fossil calibrations across 251 species 45 . The two calibrated nodes in our tree correspond to (1) the 651 most recent common ancestor (MRCA) of C. costatus and its sister group and (2) the MRCA of 652 P. barbatus and its sister group. Three separate analyses were conducted, using the mean, 5% 653 minimum credibility interval, and 95% maximum credibility interval from Ward  The resulting mean dated tree is given in Supplementary Figure 9.  The 1075 loci were concatenated in Geneious v7.0 38 , resulting in a data matrix consisting of 705 825,686 amino acid sites and 8 taxa. The concatenated matrix was analyzed under the parsimony 706 criterion using an exhaustive search in the program PAUP* 39 , resulting in a single optimal tree. 707 Using this maximum-parsimony tree as a reference tree (user_tree_topology), and using the 1075 708 loci as the maximum number of possible partitions, a partitioning analysis was conducted in 709 PartitionFinder 40 in which all possible protein models were considered and compared (models = 710 all_protein) under the Bayesian Information Criterion (BIC) using the hcluster search algorithm, 711 resulting in a scheme consisting of 19 partitions. These partitions and models were employed in a 712 maximum-likelihood analysis in RAxML7.7.7 41 with hybrid MPI/Pthreads parallelization 42,43 , 713 resulting in a best tree with the topology given in Supplementary Figure 10, which is identical to 714 the maximum-parsimony optimal topology. 715 The partitions and models were also employed in maximum-likelihood bootstrap analyses in 716 RAxML consisting of 1152 pseudoreplicates under the "-b" (thorough search) bootstrap option, 717 resulting once again in the same topology (Supplementary Figure 10) with bootstrap frequencies 718 of 1.0 at all nodes. We inferred divergence dates for the maximum-likelihood tree using the 719 penalized likelihood approach implemented in r8s v.1.7 44 . The most distant outgroup taxon 720 Schizophyllum commune was used to root the tree, providing estimates for branch lengths 721 descended from this root node, and was excluded from the dating analyses. We applied a fixed 722 age calibration to the node corresponding to the MRCA of the outgroup Agaricus and its sister 723 group using the results from a previous study 46 , a procedure similar to another diversification 724 date analysis of lepiotaceous attine cultivars 47 . We conducted three separate analyses using 725 different fixed ages for this node. These fixed ages were obtained from previous age estimates 726 for this node from Geml et al. 2004 46 . Thus, we conducted analyses using the mean age (73 727 MYA), the 5% minimum age (55 MYA), and the 95% maximum age (91 MYA) calibrations. 728 The resulting mean dated tree is given in Supplementary Figure 11. 729 730 The PartitionFinder analyses were carried out on the Smithsonian Hydra supercomputer (Linux-based 767 with AMD processors). 768 Obtaining natural history data of attine ants and their fungal cultivars 769 770 The natural history data included in Figure 1 were obtained as follows: 771 772 Maximum colony size: Colony sizes in the field vary depending on habitat and colony age, but 773 can be satisfactorily captured in orders of magnitude of maximal attainable size, as many 774 previous studies have done as well (see Kooij et al., 2015 Table 20). 819 820 To identify syntenic blocks, orthologous relationships were first identified using BLASTp 821 searches (e-value cutoff 1e-5) between all species pairs within each phylogenetic group. 822 Reciprocal best hits (RBH) were considered orthologs. Pairwise syntenic blocks were then 823 identified based on coordinates of these orthologs as follows: We required each syntenic block to 824 contain at least five contiguous orthologous genes, and for a block to be extended the gap had to 825 be be smaller than five genes. No more than five gene inversions were allowed in syntenic 826 blocks between two species. 827 828 829 Rates of loss of synteny 830 831 The loss of synteny between species pairs was assumed to follow an exponential decay process, 832 and rates of synteny loss were calculated accordingly as 1-p s 1/T , where T is divergence time (in 833 millions of years) and p s the estimated proportion synteny between two species. There was some 834 evidence that rates of synteny loss were higher for species pairs that had diverged very recently 835 (<5 million years ago; see Supplementary Figure 12). This might be expected for recently-836 diverged species, where chromosomal rearrangements may evolve rapidly to reinforce genetic 837 isolation of species pairs 59 , but it may also result from the way we have defined syntenic blocks 838 (see above), as the choice of number of orthologous genes and gap sizes is expected to have a 839 greater effect on initial divergence rates. However, there is no reason to suppose that either of 840 these effects would vary between larger taxonomic groups, and inclusion or exclusion of pairs 841 with <5 MY divergence gave similar results (see below). Overall differences between taxonomic 842 groups in their rates of pairwise synteny loss were tested using a Kruskal-Wallis non-parametric 843 test, and pairs of groups were compared using a Steel-Dwass pairwise post-hoc test. 844 845 Overall difference in rates of synteny loss between groups were highly significant (Kruskal-846 Wallis test, H = 104.8, d.f. = 5, P < 0.0001), and all post-hoc pairwise comparisons were also 847 significant (|Z| >3, P < 0.0001 to 0.0229) with the exception of those between primates and birds 848 (Z = -2.08, P = 0.295), mosquitoes and Drosophila (Z = -1.26, P = 0.808) and mosquitoes and 849 non-attine ants (Z = 2.68, P = 0.079). Excluding species pairs with divergence times <5 MY gave 850 similar results (overall difference: H = 101.2, d.f. = 5, P < 0.0001), but now the difference 851 between mosquitoes and non-attine ants was also significant (Z = 3.11, P = 0.023). Calculations 852 were performed in JMP version 11.2.0. 853 854 These results confirm earlier findings that amniotes (primates and birds) have reduced rates of 855 chromosomal rearrangement 60 , but show more variation between insect groups than previously 856 found 61 . 857 858 Mapping loss of synteny onto the ant phylogeny 859 860 Loss of synteny along the branches of the ant phylogeny was estimated by using the FITCH 861 package in the PHYLIP suite of programs v. 3.695 62 , which reconstructs phylogenies based on 862 distance matrices, which are assumed to be additive, but does not make assumptions about an 863 evolutionary clock. The input file was the pairwise loss of synteny between pairs of ant species, 864 which was treated as a distance matrix and mapped onto the ant phylogeny by using the "U" 865 option to specify a user-defined tree with branch lengths, derived from the dated phylogeny 866 based on genome sequences. 867 868 Mapping rates of synteny divergence onto the ant phylogeny showed that differences in the rates 869 of synteny loss between attine and non-attine ants were primarily due to high rates of loss of 870 synteny along the terminal branches in the attine clade. 871 872 Consistently expanded or contracted gene families 873 874 We initially used badirate 65 (with -bmodel FR, -rmodel BDI, -ep ML -out, and using the "mean 875 tree" phylogenetic time estimates as described in 'Phylogenies' above) to estimate the gene birth, 876 death, and innovation rates in the attine ant gene families (see 'Assemblies and annotation' for 877 gene family assignment methods). We used gene family counts from the seven attine genomes 878 and the two outgroups S. invicta and P. barbatus. However, the resulting rates were inflated and 879 highly correlated with branch lengths (Pearson's R up to 0.97, P < 0.002), likely due to short 880 ancestral branches and incomplete lineage sorting. We therefore disregarded the rate estimates 881 themselves and used only the "outlier" gene families that were inferred to evolve at significantly 882 increased rates. Gene models and family assignments for these candidate outlier families were 883 manually checked, resulting in the identification of two significantly expanded gene families: 884 Nardilysin and Tom70, both of which were expanded in all attine ants, see Supplementary Table  885 21 and Supplementary Figure 13. Subcellular localization of potentially full length Ac. echinatior 886 and At. colombica Nardilysin proteins was inferred using the WoLF PSORT web interface 887 (www.genscript.com/psort/wolf_psort.html) 66 . 888 889 To assess overall trends in gene family expansions and contractions, we counted the number of 890 consistently expanded or contracted gene families at ancestral nodes based on gene family sizes 891 at the terminal nodes. At each ancestral node, we compared gene family sizes of the ingroup 892 (speciation after this node) versus outgroups (speciation before this node). For consistent 893 expansions, we required that the minimum family size of the ingroup be greater than the 894 maximum family size of the outgroups. The estimates therefore include novel gene families with 895 0-counts in all outgroups. For consistent gene family contractions we conversely required that 896 the maximum family size of the ingroup be smaller than the minimum family size of the 897 outgroups. 898 899 Since sampling alone could account for some of the observed differences, we sampled all 900 possible permutations for subsets of two or greater and calculated consistently expanded and 901 contracted families as described above. Based on these observed distributions the 5th and 95th 902 percentiles were calculated and compared to the observed data ( Supplementary Figure 14).

903
Calculations were done in R version 3.0.3 67 . 904 905 To check whether novel genes (gene families with 0-counts at all ancestral nodes) were derived 906 de novo or could originate from other protein-coding sequences, we used BLASTp of the novel 907 genes against the NCBI nr database with a relaxed cutoff of 0.1. Genes with no matches to any 908 metazoan sequence (based on the NCBI Taxonomy classification) were considered likely de 909 novo derived. To rule out horizontal gene transfer, these genes were additionally checked with 910 BLASTp (e-value cutoff 0.01) against NCBI nr sequences of plant, fungal, or bacterial origin. 911 No matches were found. 912 Arginine biosynthesis pathway loss 913 914 Two genes encoding the argininosuccinate synthase and argininosuccinate lyase enzymes that 915 are involved in arginine biosynthesis were earlier found lost in the genomes of the evolutionarily 916 derived leaf-cutting ants Ac. echinatior and At. cephalotes 1,2 . To find out when these two genes 917 were lost during the evolution of the attine ants, we used the intact CDS sequences from S. 918 invicta and P. barbatus as references to map to the attine assembly by BLAT (v .35x1, default 919 parameters) 63 . This showed that the argininosuccinate synthase gene is completely lost in all 920 attine ants, while the three Atta and Acromyrmex leaf-cutting ants, T. septentrionalis and T. zeteki 921 have retained regions similar to the argininosuccinate lyase gene. To clarify whether these 922 regions were pseudogenized, we used Genewise (v2.2.0, default parameter) 15 to predict gene 923 structures from the peptide references of S. invicta and P. barbatus. This procedure identified 924 several frame shifts and pre-stop codons in these regions, indicating that all these 925 argininosuccinate lyase gene regions were pseudogenized. 926 927 To confirm that these gene loss events were not caused by assembly errors, we checked the gene 928 synteny of the flanking regions and found that these were intact ( Supplementary Figure 15 and 929 Supplementary Figure 16). We also aligned the pseudogenized argininosuccinate lyase gene 930 sequences to the S. invicta and P. barbatus references to establish which mutations were 931 responsible for the loss of function. 932 933 dN/dS ratio estimations 934 935 Sequences of one-to-one orthologous groups of seven attine ants and outgroup ants were used to 936 generate multiple codon-based alignments by PRANK v.120716 69 using default parameters. 937 Guidance v1.2 70 was then used for assessing alignment qualities (set "--bootstraps 10" and other 938 default parameters). We considered aligned codons with Guidance site-wise scores of < 0.5 as 939 being low-quality sites and marked them as Ns in the alignments for subsequent PAML 71 940 analyses. 941 942 To investigate changes in dN/dS ratios associated with evolutionary transitions in the attine 943 phylogeny, we used three different models: Model 1 had one dN/dS ratio for the outgroup ants, 944 and another for all the attine ants (for a total of two dN/dS ratios). Model 2 added an extra dN/dS 945 ratio for all higher attines (including leaf-cutting ants, for a total of three ratios), while model 3 946 additionally had a specific dN/dS ratio for leaf-cutting ants only (for a total of 4 dN/dS ratios). 947 PAML 71 version 4.7 was run twice for each alignment with different start values (Kappa 2.5 or 1, 948 Omega 0.2 or 2) and non-converging alignments, and those yielding dN/dS estimates >3 were 949 removed. This resulted in 6057 ortholog alignments. Likelihoods of model 2 versus model 1 950 (distinct dN/dS ratio for higher attine ants) and of model 3 versus model 2 (distinct dN/dS ratio 951 for leaf-cutting ants) were then compared with log-ratio tests (LRT Clustering of species was done using the R-package pvclust 78 version 1.3-2, using complete 981 clustering and euclidian distances of normalized CAZy counts (Supplementary Data 1). 982 Statistical significance of C. costatus cultivar CAZy transcriptomic counts versus the 983 domesticated cultivar transcriptomic counts of the higher attine ants and leaf-cutting ants were 984 assessed using the binomial probability of observing counts equal to or greater than the C. 985 costatus count, assuming the sum of all species' counts to be distributed among species with 986 equal probability and treating the domesticated cultivars as a single group (sum) for the purpose 987 of the test. All tests were performed in R version 3.0.3 67 . 988 Fungal Interpro domain losses 989 990 Protein Interpro (IPR) 24 annotations of fungal genes were carried out as described in 'Assemblies 991 and annotation' above. Based on these annotations, domains that were observed in the C. 992 costatus cultivar and the Ag. bisporus outgroup, but were absent in all domesticated cultivars 993 were inferred to be lost in the higher attine ant cultivars. To ensure that the absence of an IPR 994 domains was not due to annotation artefacts, we used HMMER 75 searches with the potentially 995 lost IPR domain profiles against all transcriptomes as well as the genomic assemblies of the C. 996 costatus, Ac. echinatior, and At. cephalotes 3 cultivars. The genomic and 997 transcriptomic sequences were first converted to six frame peptide sequences before searching 998 with HMMER using an e-value of 1e-2 and requiring the length of the match to be greater than 999 30% of the domain length. This resulted in 20 reliably lost domains in the higher attine cultivars