24-nt reproductive phasiRNAs are broadly present in angiosperms

Small RNAs are key regulators in plant growth and development. One subclass, phased siRNAs (phasiRNAs) require a trigger microRNA for their biogenesis. In grasses, two pathways yield abundant phasiRNAs during anther development; miR2275 triggers one class, 24-nt phasiRNAs, coincident with meiosis, while a second class of 21-nt phasiRNAs are present in premeiotic anthers. Here we report that the 24-nt phasiRNA pathway is widely present in flowering plants, indicating that 24-nt reproductive phasiRNAs likely originated with the evolutionary emergence of anthers. Deep comparative genomic analyses demonstrated that this miR2275/24-nt phasiRNA pathway is widely present in eudicots plants, however, it is absent in legumes and in the model plant Arabidopsis, demonstrating a dynamic evolutionary history of this pathway. In Solanaceae species, 24-nt phasiRNAs were observed, but the miR2275 trigger is missing and some loci displaying 12-nt phasing. Both the miR2275-triggered and Solanaceae 24-nt phasiRNAs are enriched in meiotic stages, implicating these phasiRNAs in anther and/or pollen development, a spatiotemporal pattern consistent in all angiosperm lineages that deploy them.

P hased secondary siRNAs or "phasiRNAs" in plants play key regulatory roles in growth, development, stress responses, and of importance to this work, in reproduction [1][2][3][4] . They are typically triggered by a 22-nt microRNA (miRNA) directing Argonaute-mediated slicing of the primary precursor ("PHAS") transcript, leading to recruitment of enzymes that make the sliced mRNA double-stranded and then sequentially process it by a Dicer-like (DCL) protein into 21-or 24-nt siRNAs. In rice and maize, hundreds of loci on all chromosomes yield abundant 21-nt phasiRNAs in premeiotic anthers [4][5][6] and 24-nt phasiRNAs enriched in meiotic-stage anthers 4,5 . Perturbation of 21-PHAS loci underlies agronomically important, photoperiod-sensitive male sterility in rice 7 , while disruption of 24-nt phasiRNA production yields conditional male sterility in maize 8 . The evolutionary origins and phylogenetic distribution of these small RNA (sRNA) pathways have not been described.
The 24-nt phasiRNAs are triggered by miR2275, thus far characterized only in monocots 4 , and then produced by a monocot-specific DCL protein, DCL5 4,9 . The absence of miR2275 and DCL5 in genomes of well-studied eudicot families (Brassicaceae, Fabaceae, Solanaceae) led us and others to hypothesize the restriction of this 24-nt phasiRNA pathway to the monocots 4 . Here, we report that this pathway is widely present in eudicots, consistent with an important role in flowering plant reproductive biology.

Results
Discovery in litchi of the pathway generating 24-nt phasiRNAs. During an analysis of litchi (Litchi chinenses Sonn.) sRNA data generated previously 10 , we unexpectedly identified three miR2275 stem-loop precursors yielding six canonical miRNA duplexes (two abundantly expressed example precursors shown in Fig. 1a). Two precursors encode polycistronic copies of miR2275 ( Fig. 1a, b, Supplementary Fig. 1A and B). Mature miR2275 accumulated substantially in flowers (Fig. 1c). The only known role of miR2275 is to trigger 24-nt reproductive pha-siRNAs; therefore, we sought these "24-PHAS" loci in litchi. We identified >178 24-PHAS loci; 113 had a target motif for miR2275, the only conserved sequence element in these loci (Fig. 1d, Supplementary Data 1). miR2275-directed cleavage was confirmed by PARE data, initiating the registry of the sRNA phasing (exemplified in Supplementary Fig. 1C), for floral pha-siRNAs (Fig. 1e). These results were strikingly similar to our prior observations in maize 4 , yet litchi, an eudicot which is phylogenetically distinct.
To assess whether litchi 24-nt phasiRNA exhibit as tight spatiotemporal control as grasses, we collected stages from early buds to fully open flowers. The dioecious flowers were segregated when the sex was identifiable (Fig. 2a), yielding five stages before sexual differentiation (MDS_I to MDS_V), male flowers from four stages (MFB_I to MFB_IV), and female flowers from three stages (FFB_I to FFB_III). The 24-nt phasiRNA abundance correlated with development of flower buds until a peak at stage MDS_V (Fig. 2a). After sexual specification, anthers either develop rapidly or abort; both miR2275 and 24-nt phasiRNAs were enriched in immature anthers, with slightly distinct peaks (Fig. 2a). In situ localization showed an enrichment of both miR2275 and 24-nt phasiRNAs in the tapetum and meiocytes of meiotic anthers, in which the archesporial cells and pollen become less arranged compared to cells in premeiotic stages (Fig. 2b). The spatiotemporal distribution in litchi was consistent with maize 4 , suggesting a role in pollen development 8 .
Confirmation of 24-nt phasiRNA pathway in four eudicots. The ephemeral abundance peak of reproductive 24-nt phasiRNAs may explain why these highly abundant sRNAs were not previously discovered in eudicots. To ascertain their distribution in flowering plants, we examined more eudicots. Citrus sinensis (orange) is phylogenetically close to litchi; genomic analysis identified ten MIR2275 precursors, with six mature miR2275 copies expressed (Supplementary Fig. 2A) plus 32 24-PHAS loci enriched in flowers (Supplementary Fig. 2A and Supplementary Data 2). The grape genome encodes ten MIR2275 stem-loops (five unique mature miR2275 sequences detected), and inflorescence-enriched 24-nt phasiRNAs from 42 loci were identified (Fig. 3a, Supplementary Data 3). The strawberry genome (Fragaria vesca) encodes 17 MIR2275 precursors (12 unique miR2275 sequences), but we previously failed to find miR2275 11 , likely reflecting utilization of uninformative floral staging. We collected early floral buds of strawberry (stages 6-9), and found both miR2275 and 24-nt phasiRNAs from 221 loci (Fig. 3b, Supplementary Data 4). We also found miR2275 and 24-nt phasiRNAs in cotton enriched in anthers at the tetrad stage ( Supplementary Fig. 2B, Supplementary Data 5).
Conserved from these five eudicots to grasses is the clustering of MIR2275 stem-loops, separated by just tens of nucleotides on a single precursor transcript ( Supplementary Fig. 2C), an arrangement not described as conserved for other plant miRNAs. In litchi, three or two MIR2275 stem-loops cluster (Fig. 1a, Supplementary Fig. 1A). In strawberry, as many as six MIRNA stem-loops cluster within~1 kb, encoding four unique miR2275 sequences (Fig. 3c). We infer by its deep conservation that this unique, polycistronic precursor structure is of functional importance.
Broad presence of the miR2275 phasiRNA pathway in eudicots. As supported by extensive public sRNA data and genomic analyses, miR2275 is absent in Arabidopsis and in well-studied species in the Fabaceae and Solanaceae, nor have 24-nt pha-siRNAs been described in these species. This discrepancy with the eudicots analyzed in the previous section supports variable presence/absence of the pathway across eudicots. We next assessed the presence of the pathway more broadly in eudicots, via in silico analysis of sequenced eudicot genomes. Given the tight association of miR2275 with 24-nt phasiRNAs, we analyzed 209 plant genomes for MIR2275 loci. MIR2275 was identified in the basal angiosperm Amborella trichopoda ( Supplementary Fig. 2D, Supplementary Data 6), but absent from gymnosperms or earlierdiverging species (Supplementary Data 7). In monocots, MIR2275 was in all grasses (Poales) plus close lineages (Asparagales, Arecales, and Commelinales), although absent in some basal monocots (Alismatales and Pandenales) (Fig. 3d, Supplementary Data 7). However, MIR2275 was widely observed in eudicots (Fig. 3d, Supplementary Data 6), present in the Malpighiales, Fagales, Sapindales, Ericales, but absent in the Brassicales, Caryophyllales, Cucurbitaceaes, and Lamiales. MIR2275 was not detected in several well-studied plant families, including the Brassicaceae, Solanaceae, Cucurbitaceae, and Fabaceae. In almost all species containing MIR2275, at least one polycistronic MIR2275 cluster was observed (Fig. 3d, Supplementary Data 7).
In grasses, only the specialized DCL5 generates 24-nt phasiRNAs 8 . Which DCL performs this role in eudicots? We searched for DCL genes from the 209 plant genomes. DCL5 was monophyletic in monocots, suggesting another DCL generates 24-nt phasiRNAs in eudicots ( Supplementary Fig. 3A). DCL3 is the only eudicot Dicer known to produce 24-mers, i.e. the 24-nt heterochromatic siRNAs (hc-siRNAs); DCL3 can process mRNA substrates, as demonstrated from tasiRNA loci in the Arabidopsis dcl2/4 double mutant 12 . DCL3 is conserved in all eudicot species we examined ( Supplementary Fig. 3A). Many eudicot lineages demonstrated duplicated DCL3 genes, yet just a few lineages had ancestral duplications, for instance, the Rosaceae ( Supplementary Fig. 3B). However, DCL3 is single copy in some eudicots with 24-nt phasiRNAs, for example, grape (Vitis vinifera) and rubber (Hevea brasiliensis) (Supplementary Fig. 3B). Therefore, we hypothesize that DCL3 in some eudicots has dual functions, processing distinct substrates (RNA Pol II vs Pol IV) for biogenesis of either 24-nt phasiRNAs or 24-nt hc-siRNAs, a duality relieved in many monocots via subfunctionalization of DCL3 and DCL5. Application of machine learning demonstrated that angiosperm 24-nt phasiRNAs are more similar across species than to 24-nt hc-siRNAs from within a species ( Supplementary Fig. 4), supporting the distinction of these pathways.  Fve-miR2275c Fve-miR2275d Fve-miR2275e Fve-miR2275f Fve-miR2275g Fve-miR2275h Fve-miR2275i Fve-miR2275j Fve-miR2275k Fve-miR2275l   (2) Apocynaceae (2) Phrymaceae (1) Asclepiadaceae (1) Rubiaceae ( An analysis of data from Solanaceous species unexpectedly (because of an absence of miR2275) yielded 137 highly confident 24-PHAS loci enriched in meiotic tomato inflorescences (Fig. 4a,  Supplementary Data 8). These loci had unique characteristics: no apparent miRNA target site and a 12-nt phasing of the 24-nt siRNAs (Fig. 4b). Similar loci were observed in petunia (Supplementary Fig. 5, Supplementary Data 9 and 10). We revisited the 24-PHAS loci of rice and maize and found a number of 12-nt phased loci also lacking an miR2275 target site ( Supplementary  Fig. 6); in the absence of the miR2275 target site, phasing was less precise (Supplementary Fig. 6). How is 12-nt phasing generated? One possibility is cis activity of 24-nt phasiRNAs, demonstrated for 24-nt heterochromatin siRNAs 13 , but in this case cis-directed cleavage from either source strand would promote the generation of secondary 24-nt phasiRNAs shifted by 12 nt (Fig. 4c,   Supplementary Fig. 7). Yet, it is unclear how this feedback loop of cis-triggered phasiRNA biogenesis would initiate-a topic for future analyses.

Discussion
We conclude that 24-nt phasiRNAs emerged in angiosperms, but were lost in some lineages, presumably reflecting life history adaptations. Genetic components of pathway were enhanced in monocots via specialization of DCL5. If 24-PHAS loci predate the DCL3/DCL5 duplication and divergence, the emergence of DCL5 could represent a case of "escape from adaptive conflict" (EAC), the theory that gene duplication can segregate functions from a single-copy progenitor 14 . EAC was invoked as an explanation for the expansion of RNA polymerases in plants 15 , including Pol IV subunits that make DCL3 substrates for hc-siRNAs. Coupled with  Fig. 4 24-nt phasiRNAs with 12-nt phasing in tomato. a 24-nt phasiRNA accumulation in different tissues of tomato, with each row denoting a 24-PHAS locus. b A representative locus generating 24-nt phasiRNAs in tomato. Distribution of sRNAs (above) and deduced phasing score (below) are viewed along the coordinates of the PHAS locus, with phasiRNA distribution in registers displayed in a radar plot on the left. c A hypothesis for the generation of 24-nt phasiRNAs with a 12-nt phasing pattern. Cis activity of either strand of a 24-nt phasiRNA duplex could trigger secondary phasing with a 12-nt shift to the primary 24-nt phase. The combination of the primary and secondary phases leads to the formation of the 12-nt phase pattern the demonstration that 24-nt phasiRNAs are required for full male fertility in maize 8 , our observations extend the analogy of the independently evolved plant reproductive phasiRNAs and metazoan piRNAs, including our observation that in some plant lineages an alternative mechanism of biogenesis may occur, that is, cis-mediated cleavage that is reminiscent of the "ping-pong" production of piRNAs 16 . The presence of the miR2275-24-PHAS-phasiRNA pathway across the angiosperms supports an important functional role for these molecules, likely in reproduction, as recently demonstrated 8 . However, the absence of miR2275 in many eudicots and some basal monocots suggests that the function of this pathway is dispensable for the development of some plant species. Alternatively, perhaps the 24-nt phasiRNAs are indispensable in angiosperms, but they are produced without miR2275, as found in tomato and petunia, and in such a spatiotemporally limited manner that they have escaped detection in the Brassicaeae, Fabaceae, etc. Yet another possibility is that there are other pathways in these lineages that functionally substitute for the role of the miR2275-24-PHAS-phasiRNA pathway, without the necessity of generating phasiRNAs. As with mammalian meiotic piRNAs, the functions of plant meiotic phasiRNAs are as yet largely unclear; as these are worked out in the coming years, the answers to these questions are likely to be addressed.

Methods
Plant materials. Early-stage flower buds for litchi ( Supplementary Fig. 8) were collected from nine litchi trees, cultivar "Feizixiao", grown together in an orchard at South China Agricultural University (Guangzhou, China). Samples from three sets of three trees per biological replicate were used for sequencing. Flower buds of 12 different stages with three biological replicates were collected, yielding 36 samples in total. Strawberry (Fragaria vesca) inbred line Yellow Wonder 5AF7 (YW5AF7) was grown in a growth chamber with 12 h light at 25°C followed by 12 h dark at 20°C. Anthers from different stages including buds from stages 6 to 9 17 were hand-dissected under a stereomicroscope and frozen immediately in liquid nitrogen.
sRNA sequencing. For flower buds of litchi, total RNA was extracted using TRIzol (Thermo Fisher Scientific) plus Fruit-mate (Takara) for polysaccharide/polyphenol removal, both according to the manufacturers' instructions. For sRNA sequencing, 10 μg of the mixed total RNAs with RNA integrity number (RIN) ≥ 7.5 were used for sRNA library construction. sRNA libraries were constructed and sequenced on Illumina HiSeq 2500 platform at RIBOBIO (Guangzhou, China).
RNA of strawberry flower buds was extracted using PureLink Plant RNA Reagent. sRNA libraries were constructed using the Illumina TruSeq sRNA kit, and sequenced on the Illumina HiSeq platform at the University of Delaware.
sRNA data analysis and PHAS locus annotation. Annotation of PHAS loci was conducted largely by application of a P-value-based approach 18 . The P-value cutoff was set to 0.001. In the initial analyses (as described in the main text), only genomic loci with a target site of miR2275 were considered as valid PHAS loci and used for further analyses.
Phylogenetic tree construction. DCL protein sequences were identified from each plant genome using genBlastG 19 and aligned using MUSCLE 20 and then trimmed using trimAL 21 . The resulting multiple alignment was utilized to construct the phylogenetic tree using RAxML 22 .
In situ RNA hybridizations. sRNAs were detected using LNA probes by Exiqon (Woburn, MA). Samples were vacuum fixed using 4% paraformaldehyde, and submitted to a histology lab (A.I. DuPont Hospital for Children) for paraffin embedding. We followed published protocols for the prehybridization, hybridization, and post-hybridization steps 23 . For the detection step, fluorescent in situ hybridization was carried out using anti-Digoxigenin Fab fragment (Sigma-Aldrich cat# 11214667001) and donkey anti-sheep IgG (H + L) AF647 (Thermo Fisher Scientific cat# A-21448) antibody combination. We used a Zeiss LSM 880 multiphoton confocal microscope for the final imaging process with an excitation of 745 nm and EC Plan-Neofluar 40×/1.30 oil DIC objective (University of Delaware Bioimaging Center). Probe sequences are listed in Supplementary Data 11.
Reporting Summary. Further information on experimental design is available in the Nature Research Reporting Summary linked to this article.