Dynamic genome evolution in a model fern

The large size and complexity of most fern genomes have hampered efforts to elucidate fundamental aspects of fern biology and land plant evolution through genome-enabled research. Here we present a chromosomal genome assembly and associated methylome, transcriptome and metabolome analyses for the model fern species Ceratopteris richardii. The assembly reveals a history of remarkably dynamic genome evolution including rapid changes in genome content and structure following the most recent whole-genome duplication approximately 60 million years ago. These changes include massive gene loss, rampant tandem duplications and multiple horizontal gene transfers from bacteria, contributing to the diversification of defence-related gene families. The insertion of transposable elements into introns has led to the large size of the Ceratopteris genome and to exceptionally long genes relative to other plants. Gene family analyses indicate that genes directing seed development were co-opted from those controlling the development of fern sporangia, providing insights into seed plant evolution. Our findings and annotated genome assembly extend the utility of Ceratopteris as a model for investigating and teaching plant biology.


SUPPLEMENTARY INFORMATION
Abiotic stress response gene repertoire in ferns 90 Ferns are known for their tolerance of heavy metals and metalloids in both soil and 91 freshwater 1-4 . Another Ceratopteris species, C. pteridoides, accumulates cadmium and is a viable 92 candidate for phytoextraction and remediation of cadmium-contaminated ecosystems 5 . Although 93 the genes directly controlling cadmium accumulation have not been identified, we discovered the 94 expansion and functional diversification of the Natural Resistance-Associated Macrophage Protein 95 (NRAMP) heavy metal transporter gene family, which has been shown to control cadmium uptake 96 in Arabidopsis ( Supplementary Fig. 7) 6 . Further analyses of heavy metal and metalloid tolerance 97 in various fern species alongside genome-informed experimental investigations of gene function 98 in Ceratopteris will expand understanding of molecular processes contributing to heavy metal 99 hyperaccumulation in plants. With their abundant natural defenses and phytoremediation potential, 100 ferns provide an important resource for genetic engineering of crop resistance and environmental 101 restoration.

103
Evolution of the APETALA2 gene lineage 104 The APETALA2/ETHYLENE RESPONSIVE FACTOR (AP2/ERF) family is essential to 105 plant development across green plants and encompasses genes known to control flower, seed, and 106 fruit development in angiosperms 7 . To identify the APETALA2 and AINTEGUMENTA homologs 107 in Ceratopteris richardii, we performed a BLAST search using homologs previously reported 108 across land plants 8,9 . A total of 199 sequences was compiled and used to predict the evolutionary 109 history of this gene lineage. Full nucleotide sequences were aligned using the online version of 110 MAFFT 10 with a gap open penalty of 3.0, offset value of 0.8, and default parameters. To find the 111 nucleotide substitution model that best fits our data, using the Akaike Information Criterion 11 , we 112 used jModelTest 2 12 , which identified the GTRGAMMA model. RAxML v.8.0.0 was used to 113 estimate phylogenetic relationships under a maximum likelihood (ML) framework 13 . The 114 GTRGAMMA model was assigned, and a full ML search was implemented, using the autoMRE 115 bootstrapping criterion to assess nodal support (-f a -# autoMRE option). Closely related genes 116 from The AP2/ERF family is split into two major subclades, euAP2 and ANT 119 (AINTEGUMENTA), in which three and 11 Ceratopteris genes were placed, respectively 120 ( Supplementary Fig. 2). CerAP2 was previously found to be expressed in the inner sporangium 121 wall, young spores, and leaf vasculature during sporogenesis, based on in situ hybridization 122 analyses 7 . We found CerAP2 evenly expressed throughout the ten tissues, while the Ceratopteris 123 euANT genes were more highly expressed in stem, root, and young sporophytes and the 124 Ceratopteris basalANT genes were also evenly expressed throughout the different tissues 125 ( Supplementary Fig. 8 Ceratopteris genes, two clustered with the seed plant and bryophyte SEU genes, and one, SEUSS-145 like 3 (CrSEU3), appeared in a clade consisting only of fern sequences ( Supplementary Fig. 9B).

146
Neither the LUG nor the SEU gene families expanded or collapsed in Ceratopteris but both kept a 147 moderate number of members during land plant evolution, suggesting that even after gene copies 148 originating from a WGD are purged from the genome, the loss of one gene family member cannot 149 be tolerated. SEU-and LUG-like Ceratopteris genes were expressed in sporophytic and 150 gametophytic tissue alike, with lowest expression in sporangia and developing leaves, and highest 151 expression in the stem, root, expanding leaf, and young sporophyte, suggesting pleiotropic 152 functions for the members of both gene families.

154
Evolution and selection of cell cycle gene families 155 Homosporous plants have on average considerably higher chromosome numbers than do 156 heterosporous plants 17,18 . Presumably, this is related either to mechanisms of genome doubling or 157 of genome downsizing, and we hypothesize there could be differences in genes associated with the 158 cell cycle in homosporous and heterosporous plants. Thus, we analyzed genes associated with 159 cyclin-dependent kinases and cyclins, chromatin remodeling, and spindle formation and mapped 160 significant gene copy expansions and contractions on a phylogenetic tree of land plants for which 161 genome sequences are available ( Supplementary Fig. 10).

162
We ran Orthofinder 2.4.0 19 on 21 plant species representing all major clades of land plants, 163 for which whole genomes are currently available (July 2021). We selected genes associated with 164 cyclin-dependent kinases and cyclins 20 and genes associated with spindle fiber formation and 165 chromatin remodeling directly from The Arabidopsis Information Resource (TAIR), using the 166 native search function in TAIR and the search terms "spindle" and "chromatin." We selected 143 167 orthogroups from the output of Orthofinder and examined the evolution of gene copy number. For 168 this we used Computational Analysis of gene Family Evolution (CAFE) v5.0 21 , which uses a birth 169 and death model to simulate gene family evolution. We then mapped nodes and tips that CAFE 170 indicated had significant expansions or retractions in copy number. 171 We aligned orthogroups with MUSCLE 22 and synchronized the headers of the peptide 172 orthogroup files to match the corresponding CDS files (https://github.com/carol-173 rowe666/ortho_group_refile). We then used the peptide alignments to guide the CDS alignment 174 with pal2nal (http://www.bork.embl.de/pal2nal/). Gene models in Pinus taeda were often 175 mismatched between CDS and peptide sequences, so this species was removed from the analysis. 176 The pal2nal-aligned CDS files were then used to generate gene trees using IQtree2 177 (https://github.com/iqtree/iqtree2). Three of the 129 orthogroups included fewer than four 178 sequences and could not be bootstrapped for gene tree estimation. We paired the Newick outputs 179 generated by IQtree2 with the pal2nal-generated aligned CDS orthogroups to test for selection 180 using the absREL model built into Hyphy (http://hyphy.org).

181
We found no distinct patterns of changes in gene copy number in the three heterosporous 182 lineages. However, in the lineage leading to Ceratopteris, we observed five gene expansion events:  Azolla, 11 in Salvinia), which encode omega fatty acid hydroxylases, enzymes essential for spore 207 wall polymers 24 . Interestingly, this subfamily has even lower diversity in seed plants, with a mean 208 count of 1.9 CYP704B genes. 209 Independent of the CYP450 subfamilies, which were largely identified from angiosperm 210 reference genes, we constructed a CYP450 phylogeny from 14 of these taxa, with representatives 211 from each major green plant lineage (Supplementary Figure 11) and ultimately receive training in how to map RNA-seq data onto a reference genome within a 228 semester. Ceratopteris is also capable of undergoing apogamy (a haploid sporophyte forms from 229 gametophytic tissue without fertilization) and apospory (diploid gametophytes form from 230 sporophytic tissue), readily producing polyploid individuals and essentially permitting virtually 231 every means of plant reproduction with this single species 28,29 . The genome assembly and the 232 sequenced doubled haploid F 2 mapping population of Ceratopteris provide the resources and 233 techniques to ensure that Ceratopteris and the C-Fern Curriculum will be an important model 234 system for teaching plant development, genetics, genomics, breeding, physiology, and 235 bioinformatics for years to come.