Comparative genomics reveals the dynamics of chromosome evolution in Lepidoptera

Chromosomes are a central unit of genome organization. One-tenth of all described species on Earth are butterflies and moths, the Lepidoptera, which generally possess 31 chromosomes. However, some species display dramatic variation in chromosome number. Here we analyse 210 chromosomally complete lepidopteran genomes and show that the chromosomes of extant lepidopterans are derived from 32 ancestral linkage groups, which we term Merian elements. Merian elements have remained largely intact through 250 million years of evolution and diversification. Against this stable background, eight lineages have undergone extensive reorganization either through numerous fissions or a combination of fusion and fission events. Outside these lineages, fusions are rare and fissions are rarer still. Fusions often involve small, repeat-rich Merian elements and the sex-linked element. Our results reveal the constraints on genome architecture in Lepidoptera and provide a deeper understanding of chromosomal rearrangements in eukaryotic genome evolution.


Section 1. Phylogeny
The dataset used for the phylogeny included a scaffold-level trichopteran genome (Hydropsyche tenuis) to increase the taxonomic breadth of the Trichoptera used as an outgroup to Lepidoptera. The phylogeny is consistent with previously published time-calibrated phylogenies, including a recent, comprehensive molecular analysis of lepidopteran phylogeny 5. All 30 families were recovered as monophyletic, and Micropterix aruncella (family Micropterigidae) was positioned as the earliest-diverging lineage within Lepidoptera, as expected 5.

Section 2. Assignment of orthologues to Merian elements
Assigning the remaining orthologues to a Merian element, or inferring them as absent from the last common ancestor, will likely become possible with the sequencing of further early-diverging relatives, including species of Micropterigoidea, Agathiphagoidea and Heterobathmioidea. Given that all species used to build the lepidoptera_odb10 set are from Ditrysia, it is likely that many of these orthologues were simply not present in the last common ancestor of Lepidoptera.
We also inferred the ancestral gene set and gene organisation of Lepidoptera using AGORA46. All orthologues that were assigned to Merian elements by syngraph were also in the ancestral gene set from AGORA. Of the 3093 orthologues that were assigned to Merian elements by syngraph and to contiguous ancestral regions (containing two or more orthologues) by AGORA, we found only a single conflicting orthologue assignment.
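The comparison between the two sets of assignments can be illustrated with a minimal sketch. It does not reproduce the actual syngraph or AGORA output formats: the two dictionaries, the function name `find_conflicts`, and the definition of a conflict (an orthologue whose Merian element differs from the majority element of its contiguous ancestral region) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def find_conflicts(merian_of, car_of):
    """Return orthologues whose Merian element differs from the majority
    element of their contiguous ancestral region (CAR).

    merian_of: dict, orthologue ID -> Merian element (hypothetical input)
    car_of:    dict, orthologue ID -> CAR identifier (hypothetical input)
    """
    by_car = defaultdict(list)
    for orthologue, car in car_of.items():
        if orthologue in merian_of:  # keep only orthologues placed by both methods
            by_car[car].append(orthologue)

    conflicts = []
    for car, orthologues in by_car.items():
        if len(orthologues) < 2:  # only CARs with two or more orthologues
            continue
        counts = Counter(merian_of[o] for o in orthologues)
        majority = counts.most_common(1)[0][0]  # ties go to the first element seen
        conflicts.extend(o for o in orthologues if merian_of[o] != majority)
    return sorted(conflicts)


# Toy data: og3 sits in car1 but carries a different Merian element.
merian_of = {"og1": "M1", "og2": "M1", "og3": "M2", "og4": "M3"}
car_of = {"og1": "car1", "og2": "car1", "og3": "car1", "og4": "car2"}
conflicts = find_conflicts(merian_of, car_of)
print(conflicts)
```

In this toy input, car2 holds a single orthologue and is skipped, mirroring the restriction to regions containing two or more orthologues.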

Section 3. Filtering the dataset when inferring fusion and fission events
Before inferring fusion and fission events, the distribution of Merian elements was visualised across the chromosomes of all 210 lepidopteran genomes to check for data quality issues. This enabled us to be confident that the inferred fusion and fission events were not artefacts of misassembly. Most genomes were of high quality, with multiple single-copy orthologues found only on chromosomal scaffolds. However, three genomes contained unlocalised scaffolds carrying multiple single-copy orthologues that belonged to Merian elements.
First, Spodoptera frugiperda 6 contained a scaffold (WMCG01000038.1) carrying a set of orthologues corresponding to M5. To prevent this scaffold from being called as a fission event, it was manually removed from the assembly for subsequent analyses.
Second, Dendrolimus kikuchii 7 contained two scaffolds (JAHHIN010000030.1 and JAHHIN010000032.1) which contained 18 and 7 BUSCOs respectively, of which 16 and 6 BUSCOs were duplicated. This suggested that these scaffolds are the result of haplotypic duplication. As their presence prevented a fusion between MZ and M31 from being inferred, these two scaffolds were manually removed from the assembly for subsequent analyses.
Third, Heliconius sara (GCA_917862395.1) contained four scaffolds carrying sets of orthologues that belonged to Merian elements. Scaffolds CAKJTV010000120.1 and CAKJTV010000131.1 contained 117 and 89 BUSCOs which belong to M17 and M20 respectively. Scaffolds CAKJTV010000302.1 and CAKJTV010000313.1 contained 23 and 9 BUSCOs which belong to M11 and M20 respectively. To prevent these Merian elements from being called as fission events, the assembly was updated by the Darwin Tree of Life team with the above scaffolds assigned to chromosomes. The resulting assembly (GCA_917862395.2) was used in the analysis of fusion and fission events. However, as this updated assembly was not available at the time of the analyses of gene and repeat content, it was not used in these analyses.
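The screening described above was performed by visual inspection and manual curation. As a rough illustration only, the same check could be sketched programmatically as follows. The input layout (pairs of scaffold name and BUSCO status), the function name, and both thresholds (`min_buscos=2`, `dup_threshold=0.8`) are assumptions for this sketch and are not values taken from the analysis.

```python
from collections import defaultdict


def flag_unlocalised_scaffolds(records, chromosomes, min_buscos=2, dup_threshold=0.8):
    """Flag non-chromosomal scaffolds that carry several BUSCOs.

    records:     list of (scaffold, status) pairs, where status is a BUSCO
                 status string such as "Complete" or "Duplicated"
    chromosomes: set of scaffold names known to be chromosomes
    Thresholds are hypothetical defaults chosen for illustration.
    """
    total = defaultdict(int)
    duplicated = defaultdict(int)
    for scaffold, status in records:
        if scaffold in chromosomes:
            continue  # chromosomal scaffolds are expected to carry BUSCOs
        total[scaffold] += 1
        if status == "Duplicated":
            duplicated[scaffold] += 1

    flags = {}
    for scaffold, n in total.items():
        if n < min_buscos:
            continue  # a lone BUSCO is not treated as a candidate artefact here
        fraction = duplicated[scaffold] / n
        if fraction >= dup_threshold:
            label = "likely_haplotypic_duplication"
        else:
            label = "inspect_manually"
        flags[scaffold] = (n, duplicated[scaffold], label)
    return flags


# Toy data loosely modelled on the counts reported above (18 BUSCOs, 16 duplicated).
records = (
    [("scaf_a", "Duplicated")] * 16
    + [("scaf_a", "Complete")] * 2
    + [("scaf_b", "Complete")] * 5
    + [("scaf_c", "Complete")]
    + [("chr1", "Complete")] * 10
)
flags = flag_unlocalised_scaffolds(records, chromosomes={"chr1"})
print(flags)
```

A scaffold with mostly duplicated BUSCOs is labelled as a likely haplotypic duplicate, whereas one with mostly single-copy BUSCOs is only flagged for inspection, since it may be a genuine chromosome segment that was left unplaced.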

Supplementary Figure 8. Merian elements painted across the chromosomes of Eupithecia centaureata demonstrate fission and fusion involving M1 and M6.
Each chromosome is represented by a rectangle within which the position of each orthologue is grey if it belongs to the most common Merian element for that chromosome or is coloured if it belongs to an alternative Merian element. Chromosomes that have undergone fusion and/or fission events are outlined in red. The painting reveals that a segment of M1 has fused to a segment of M6 (row 2). The remainder of M1 and the remainder of M6 exist as two separate chromosomes (rows 3 and 4 respectively), indicating two fission events.
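The colouring rule in this legend can be sketched in a few lines. This is an illustrative reconstruction, not the plotting code used for the figure: the input tuples, the function name `paint_chromosomes`, and the `min_support` cut-off used to call candidate fusions and fissions are assumptions.

```python
from collections import Counter, defaultdict


def paint_chromosomes(assignments, min_support=3):
    """Colour orthologues by Merian element and list rearrangement candidates.

    assignments: list of (chromosome, position, merian_element) tuples
    min_support: hypothetical minimum number of orthologues needed before an
                 element is considered genuinely present on a chromosome
    """
    per_chrom = defaultdict(Counter)
    for chrom, _pos, element in assignments:
        per_chrom[chrom][element] += 1

    # Most common element per chromosome is drawn grey, others keep their colour.
    dominant = {c: counts.most_common(1)[0][0] for c, counts in per_chrom.items()}
    painted = [
        (chrom, pos, "grey" if element == dominant[chrom] else element)
        for chrom, pos, element in assignments
    ]

    # Fusion candidate: a chromosome carrying two or more supported elements.
    fused = sorted(
        c for c, counts in per_chrom.items()
        if sum(1 for n in counts.values() if n >= min_support) >= 2
    )

    # Fission candidate: an element supported on two or more chromosomes.
    chroms_per_element = defaultdict(set)
    for chrom, counts in per_chrom.items():
        for element, n in counts.items():
            if n >= min_support:
                chroms_per_element[element].add(chrom)
    split = sorted(e for e, chroms in chroms_per_element.items() if len(chroms) >= 2)
    return painted, fused, split


# Toy layout echoing the figure: one chromosome mixes M1 and M6,
# and the remainders of M1 and M6 sit on separate chromosomes.
assignments = (
    [("chr2", i, "M1") for i in range(5)]
    + [("chr2", i, "M6") for i in range(5, 9)]
    + [("chr3", i, "M1") for i in range(6)]
    + [("chr4", i, "M6") for i in range(6)]
)
painted, fused, split = paint_chromosomes(assignments)
print(fused, split)
```

Counting orthologues alone cannot distinguish a true rearrangement from a misassembly, which is why candidates of this kind still need the manual checks described in Section 3.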