Fern genomes elucidate land plant evolution and cyanobacterial symbioses

Ferns are the closest sister group to all seed plants, yet little is known about their genomes other than that they are generally colossal. Here, we report on the genomes of Azolla filiculoides and Salvinia cucullata (Salviniales) and present evidence for episodic whole-genome duplication in ferns—one at the base of ‘core leptosporangiates’ and one specific to Azolla. One fern-specific gene that we identified, recently shown to confer high insect resistance, seems to have been derived from bacteria through horizontal gene transfer. Azolla coexists in a unique symbiosis with N2-fixing cyanobacteria, and we demonstrate a clear pattern of cospeciation between the two partners. Furthermore, the Azolla genome lacks genes that are common to arbuscular mycorrhizal and root nodule symbioses, and we identify several putative transporter genes specific to Azolla–cyanobacterial symbiosis. These genomic resources will help in exploring the biotechnological potential of Azolla and address fundamental questions in the evolution of plant life.


Gene annotation
We identified 51,098 and 28,968 protein-coding gene models in Azolla and Salvinia, respectively, using the MAKER-P pipeline 1 (Supplementary Fig. 2). Genes were classified as high-confidence (HC) if they were supported by transcript evidence or had significant sequence similarity to other known plant proteins ( Supplementary Fig. 1, Supplementary Table 3). Gene models only supported by ab initio predictions were classified as low-confidence (LC) and were excluded from analyses of gene families. The mean length of HC protein-coding genes is 5 kb and 3.4 kb with a mean of 5.3 and 5.2 introns per gene in Azolla and Salvinia, respectively (Supplementary Table 3).

RNA gene profiles
The number of rRNA genes is similar in Azolla and Salvinia (1,397 and 1,161,respectively;Supplementary Fig. 2,Supplementary Table 3). In contrast, the Salvinia genome contains 50% more tRNA genes (an increase of 3,515 genes) compared to Azolla. These tRNA genes are primarily distributed evenly across the genome in both species ( Supplementary Fig. 3), but a few tRNA genes appear to have proliferated locally. For example, high numbers of tRNA-Glu genes are clustered on scaffolds 43, 46, and 48 in Salvinia, and tRNA-Asp genes are clustered on scaffolds 10 and 19 in Azolla ( Supplementary Fig. 3). Azolla has nearly twice as many tRNA-Asp genes as its second most abundant tRNA, 95% of which have one (ATC) of the two possible Asp anticodons. The two most abundant tRNA gene types in Salvinia are tRNA-Arg and tRNA-Glu, which are 4.5 and 6.3 times more than the third (tRNA-His). Like Azolla, specific anticodons are disproportionately represented ( Supplementary Fig. 3).

Repetitive elements
In Azolla, we found 17,484 putative full length long terminal repeat retrotransposons (LTR-RT), more than six times the number in Salvinia (Supplementary Figs. 1 and 4). We estimated sequence divergences between LTRs for all full length LTR-RT predictions that were supported by having homology to LTR-RTs in the Dfam 2.0 and Repbase 22.04 databases. Assuming a low rate of gene conversion among LTRs 2 , the divergence between LTRs could serve as a proxy for time since element insertion due to the nature of the LTR-RT transposition mechanism. Interestingly, the density plots in Supplementary Fig. 4 show the distribution of LTR divergences in Salvinia as potentially bimodal. Given a constant background mutation rate and a constant birth rate for LTR-RTs, one would expect a smoothly tapered right-skewed distribution. The bimodality could be due to recent deletion of many newer LTR-RTs, a burst of transposition in the past, and/or heterogeneous historical substitution rates. The Azolla and Salvinia assemblies include 12.138 Mb and 13.095 Mb of centromere-like sequences, respectively. These sequences are concentrated on particular scaffolds and have been identified on 514 scaffolds in Azolla and 940 scaffolds in Salvinia ( Supplementary Fig. 2).

Tandem gene duplications
In addition to examining gene evolution associated with whole genome duplications, we also characterized tandem gene duplication in the Azolla and Salvinia genomes. To distinguish gene duplicates as syntenic or tandem, we used SynMap and DAGChainer algorithm to extract syntenic paralogs. Duplicates that are within ten genes apart in the same region of the genome were identified as tandem duplicates. Functional enrichment analysis revealed the GO term 'protein binding' as the most significantly over-represented in both Azolla and Salvinia tandemly duplicated genes, most of them annotated as belonging to the highly diverged pentatricopeptide repeat protein (PPR) family. A second group of Azolla tandem duplicates was found to be involved in chitin-binding and chitinase activities, belonging to a distinct family of glycosyl hydrolases involved in breaking down glycoside bonds in chitin, a polymer of the glucose derivative N-acetylglucosamine found in the cell walls of fungi and the exoskeletons of arthropods such as crustaceans and insects. These tandem genes formed a cluster of 12 genes, located in a genomic region syntenic to a cluster of four tandem duplicates in the Salvinia genome (the microsynteny analysis can be regenerated at https://genomevolution.org/r/zsy2).

Global gene expression pattern comparing cyano-absent and cyano-present individuals
A total of 6,644 genes are differentially expressed between AzCy-and AzCy+ individuals, and 2,254 of them exhibit at least 2-fold expression difference. Under the N-conditions, 3,433 genes are up-regulated and 2,777 are down-regulated. Far fewer genes, 1,286 and 839 genes, are respectively up-and down-regulated under the N+ conditions.

Candidate gene set
We show here that cyanobacterial N2-fixation rate is highly induced when plants are grown without nitrogen nutrient ( Supplementary Fig. 10), indicating an active control of plants on the cyanobionts. To identify likely candidates involved in this symbiotic regulation, we focused on genes that, when cyanobionts are present, are differentially expressed between the N treatments, but not or to a lesser degree when cyanobionts are gone. In other words, for the up-regulated genes, they have to satisfy these three criteria: (1) when cyanobionts are present, they have a higher expression in N-than in N+, (2) when limited by nitrogen nutrients, they have a higher expression in cyano+ than cyano-, and (3) they are not down-regulated in N-compared to N+ when cyano-. And the opposite pattern would apply to the down-regulated genes. We found a total of 88 up-regulated and 72 down-regulated genes in this category that we termed "putative symbiotic genes". These include an ammonium transporter, a metal ion transporter, and a chalcone synthase that might be involved in flavonoid signaling. The importance of these genes is discussed below.

Symbiosis-specific transporters
Azolla has five ammonium transporter paralogs (AMT) within its genome. Ammonium transporters come in two major classes in plants: AMT1s and AMT2s. In plants, AMT1 genes are mainly expressed in the roots, and are responsible for transporting ammonium from the external environment into the xylem. These genes are usually constitutively expressed. In contrast, AMT2s are inducibly-expressed in all other plant tissues, such as shoots, leaves, and flowers. Azolla filiculoides has one AMT1 (Azfi_s0034.g025388) and four AMT2s. One A. filiculoides AMT2, AfAMT2-4 (Azfi_s0034.g025227), appears to be symbiosis-specific, as its expression is up-regulated when the cyanbiont is present, particularly under the nitrogen-depleted condition (i.e. when cyanobionts are fixing the most nitrogen; Supplementary Fig. 11). AfAMT2-4 is therefore likely the main transporter for exchange of ammonium with Nostoc in the leaf pocket. On the other hand, the expression profile of AfAMT2-3 (Azfi_s0093.g043301) suggests that it is a nitrogen-starvation responsive gene, whereas AfAMT1 is likely a general ammonium transporter, as it is expressed similarly regardless of cyanobacterial presence ( Supplementary  Fig. 11). The AfAMT2-1 and AfAMT2-2 genes are nearly identical to each other, so that their expressions cannot be measured correctly and were thus excluded. In addition to ammonium, there are myriad cofactors that are needed by Nostoc for N2-fixation. Metal ions, such as molybdenum, copper, and iron, are among the most crucial of these cofactors 3,4 . We found a particular paralog of molybdate transporter (AfMOT1; Azfi_s0167.g054529) and a paralog of vacuolar iron transporter (AfVIT) in the putative symbiotic gene list ( Supplementary Fig. 11). Similarly, in Medicago truncatula, a root nodule-specific MtMOT1.3 paralog was recently identified to mediate Mo transfer from plants to the symbiotic rhizobium 5 .

Possible roles of flavonoids in Azolla-cyanobacteria communication
The identification of a chalcone synthase (CHS) in our putative symbiosis gene list is of particular interest (Figure 5e). CHS produces naringenin chalcone, and is the first committed step in flavonoid biosynthesis pathway. Flavonoids are major plant signals used in symbioses with rhizobia and Frankia. Silencing of CHS in Medicago truncatula 6 and Casuarina glauca 7 both resulted in a defective nodule formation. Interestingly, flavonoids also have significant effects on cyanobacteria growth and cellular differentiation. Naringenin was shown to stimulate growth of a number of cyanobacteria species including ones in Nostoc 8 . Furthermore, naringin was found to be one of the most potent hormogonia-repressing factors (HRF) 9 . Hormogonia are the motile stage of cyanobacteria and do not contain N2-fixing heterocysts. In the Azolla-Nostoc symbiosis, hormogonia are maintained in the shoot apex, and upon entering nascent leaf cavities, they return to the vegetative stage, and develop heterocysts for N2-fixation. Because hormogonia cannot fix nitrogen, boosting the hormogonia-repressing signals can promote N2-fixation rates and cyanobacteria maturation. Given the expression pattern of CHS, we hypothesize that flavonoids act as a HRF in Azolla-Nostoc symbiosis, and are a major communication signal to help time the development of the leaf cavity with the metabolic development of the cyanobiont. Consistent with our hypothesis, Azolla aqueous extract was found to contain flavonoids and, importantly, can effectively suppress hormogonia differentiation 10 .    Figure 3. Genomic distribution of tRNA genes. Circos plots showing locations and densities of non-pseudogenized tRNA genes predicted by tRNAscan-SE organized by their predicted anticodon for the largest scaffolds greater than 1 Mb for Salvinia (a) and Azolla (b). The shade of each 1 Mb region in the outermost track corresponds to the total tRNA density for that sequence region. Each inner track shows the location of each 1 Mb sequence region that contains tRNA genes for a specific amino acid; square size is proportional to the density of the given tRNA gene in that region. Numbers of tRNA genes in genome by amino acid (c) and anticodon (d). Circular plot areas are proportional to the amount of sequence shown.