The pollen virome of wild plants and its association with variation in floral traits and land use

Pollen is a unique vehicle for viral spread. Pollen-associated viruses hitchhike on or within pollen grains and are transported to other plants by pollinators. They are deposited on flowers and have a direct pathway into the plant and next generation via seeds. To discover the diversity of pollen-associated viruses and identify contributing landscape and floral features, we perform a species-level metagenomic survey of pollen from wild, visually asymptomatic plants, located in one of four regions in the United States of America varying in land use. We identify many known and novel pollen-associated viruses, half belonging to the Bromoviridae, Partitiviridae, and Secoviridae viral families, but many families are represented. Across the regions, species harbor more viruses when surrounded by less natural and more human-modified environments than the reverse, but we note that other region-level differences may also covary with this. When examining the novel connection between virus richness and floral traits, we find that species with multiple, bilaterally symmetric flowers and smaller, spikier pollen harbored more viruses than those with opposite traits. The association of viral diversity with floral traits highlights the need to incorporate plant-pollinator interactions as a driver of pollen-associated virus transport into the study of plant-viral interactions.


nature research | reporting summary
April 2020

Data analysis
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors and reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.

Data
Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:
- Accession codes, unique identifiers, or web links for publicly available datasets
- A list of figures that have associated raw data
- A description of any restrictions on data availability

Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.

Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences
For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf

Ecological, evolutionary & environmental sciences study design
All studies must disclose on these points even when the disclosure is negative.
The raw reads were deposited in GenBank under Bioproject number PRJNA589022 and will be publicly available upon publication. The Pickaxe output from viral read alignments to VRS and the Pickaxe output from viral contig alignments to the GenBank nucleotide and protein databases are included as Supplementary Datasets 1 and 2, respectively. All contig sequences are also included in Supplementary Dataset 2. In addition, supplementary information and source data are provided with this paper.
All other data used for this manuscript are reported in the "Data Collection" section of the "Software and Code" section of the Reporting Summary in conjunction with the software used to analyze them and are reported similarly in the "Methods" and "Supplementary Methods" sections of the paper.
To uncover the diversity of pollen-associated viruses, and understand landscape and floral features that drive pollen-mediated viral spread, we performed a species-level metagenomic survey of pollen from wild, asymptomatic plants (24 species, 16 families, five subclasses), located in one of four regions (California Grasslands, California Coast, Central Appalachia, Eastern Deciduous Agro-forest Interface) in the United States that vary in human land use.

Sampling strategy

Data collection
Timing and spatial scale
No manipulations were made.
In each of the four regions, we identified visually asymptomatic individuals of wild plant species that were in full flower and in high enough abundance to achieve our pollen sample minimum (30 mg). To achieve the broadest representation of plant species, we selected species in different families when possible. We focused mainly on perennial species to avoid any effects of life-history variation. From these, we collected 30 to 50 mg of pollen from newly dehiscing anthers in situ using a sterile sonic dismembrator (Fisherbrand Model 50, Fisher Scientific, Waltham, MA, USA) with a frequency of 20 Hz. We removed non-pollen tissues (e.g., anther debris) with sterile forceps. Visibly pure pollen from a single species was transferred to a 2-mL collection tube with Lysing Matrix D (MP Biomedicals, Irvine, CA, USA) and kept on dry ice until transported to and stored at -80 °C at the University of Pittsburgh. Statistical methods were not used to predetermine sample size. Instead, before collecting pollen for this project, we conducted RNA extraction trials, using the methods detailed for this project, on varying amounts of pollen collected from flowers available in the University of Pittsburgh's greenhouse. Through the trials, we aimed to find an amount of pollen from which we could consistently extract enough high-quality RNA for subsequent sequencing.
We collected one hundred leaf discs (500 mg of leaf tissue) using a sterile hole punch from the same Raphanus sativus individuals from which we collected pollen. All the leaf discs spanned the mid-leaf vein and were immediately submerged in RNAlater (Invitrogen, ThermoFisher Scientific, Waltham, MA, USA) and kept at room temperature for seven days until frozen, transported to the University of Pittsburgh, and stored at -80 °C. The amount of leaf tissue collected followed the manufacturer's recommendation in the Quick-RNA Plant Miniprep Extraction Kit (Zymo Research Corporation, Irvine, CA, USA), which we used to extract RNA throughout the project.
Pollen sampling data for each plant species included the number of flowers and plants from which pollen was collected and the GPS coordinates of collection sites within each of the regions (see the "Location" section of the "Field work, Collection, and Transport" section of the Reporting Summary). Additional plant traits were scored from the literature, and land use in each region was calculated using GIS technology (see the "Data Collection" section of the "Software and Code" section of the Reporting Summary). All data except land use were recorded using pen and paper. The GPS coordinates were determined using the "Maps" app on an iPhone.
Raphanus sativus leaf sampling data included the number of plants from which the tissue was collected (the same 18 individuals from which we collected pollen), the GPS coordinates of collection sites within each of the regions (see the "Location" section of the "Field work, Collection, and Transport" section of the Reporting Summary), and how the tissue was preserved and stored (see the "Sampling Strategy" section of the "Ecological, Evolutionary, and Environmental Studies Design" section of the Reporting Summary). These data were recorded using pen and paper. The GPS coordinates were determined using the "Maps" app on an iPhone.
Pollen RNA extraction data included pollen grain size and texture, how long a pollen sample was lysed, and the concentration, A260:A280 purity ratio, and RNA integrity value of the total RNA extracted from a pollen sample. The Raphanus sativus leaf extraction data included how the tissue was lysed (in liquid nitrogen), and the concentration, A260:A280 purity ratio, and RNA integrity value of the total RNA extracted. Data were recorded using pen and paper. The concentrations were measured using a Qubit 2.0 fluorometer (Invitrogen, ThermoFisher Scientific, Waltham, MA, USA), the purity ratios were measured using a NanoDrop spectrophotometer (ThermoFisher Scientific, Waltham, MA, USA), and the RNA integrity values were measured by the Genomics Research Core (GRC) at the University of Pittsburgh via TapeStation analysis.
Only RNA from pollen was sequenced. Next-generation sequencing data included the number of raw reads, recorded into Excel spreadsheets. Pickaxe output/data, deposited into Excel spreadsheets, included the number of non-host reads, number of read alignments to Virus RefSeq (NCBI), number of contigs that passed the quality control steps, and the number of viral contigs. Other relevant Pickaxe output/data, deposited into Excel spreadsheets, also included the length of a contig, the top hit from viral non-host read or contig alignments to Virus RefSeq (NCBI) or GenBank protein or nucleotide databases (NCBI), how similar our viral non-host read or contig was to a reference genome (percent identity), and how much of a reference genome our viral non-host read or contig covered (percent sequence coverage or query coverage, respectively). The open reading frames and conserved domains found in a contig, as well as their stop/start positions within a genome and lengths, were recorded into Excel spreadsheets after using ORFfinder (NCBI) and searching the Conserved Domain Database (NCBI), as described above in the "Data Collection" section of the "Software and Code" section of the Reporting Summary. The conservative estimate of virus richness was calculated in Excel by tallying the known viruses and the novel coding-complete genomes and variants; the relaxed estimate was calculated by adding the novel partial genomes and variants (RdRps only) to the conservative estimate.
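The two richness estimates described above can be illustrated with a minimal Python sketch. It is not the authors' workflow, which was carried out in Excel. The category labels (`known`, `novel_coding_complete`, `novel_partial_rdrp`), the `VirusCall` type, and the virus names in the example are hypothetical and serve only to show the tallying logic.

```python
from dataclasses import dataclass

# Hypothetical category labels; the reporting summary does not define codes.
CONSERVATIVE_CATEGORIES = {"known", "novel_coding_complete"}
RELAXED_EXTRA_CATEGORIES = {"novel_partial_rdrp"}


@dataclass(frozen=True)
class VirusCall:
    """One virus identified in a plant species' pollen sample (illustrative)."""
    name: str
    category: str


def virus_richness(calls):
    """Return (conservative, relaxed) richness for one plant species.

    Conservative: known viruses plus novel coding-complete genomes/variants.
    Relaxed: conservative plus novel partial genomes/variants (RdRps only).
    """
    conservative = len(
        {c.name for c in calls if c.category in CONSERVATIVE_CATEGORIES}
    )
    extra = len(
        {c.name for c in calls if c.category in RELAXED_EXTRA_CATEGORIES}
    )
    return conservative, conservative + extra


# Made-up example data, not results from the study.
example_calls = [
    VirusCall("virus_A", "known"),
    VirusCall("virus_B", "known"),
    VirusCall("virus_C", "novel_coding_complete"),
    VirusCall("virus_D", "novel_partial_rdrp"),
    VirusCall("virus_E", "novel_partial_rdrp"),
]
conservative, relaxed = virus_richness(example_calls)
```

By construction, the relaxed estimate is always at least as large as the conservative one, because it only adds the partial genomes to the conservative tally.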
The representative plant species used for the evaluations of pollen sample purity detailed in the Supplementary Methods were chosen because they had either relatively low or relatively high estimates of pollen-associated virus richness. The data in the microscopy analysis were recorded using pen and paper and included the counts of intact pollen grains, pollen grain exine, intine, or cytoplasm fragments, debris similar to that seen in the control (e.g., dust particles), and unidentified debris (i.e., potential contaminants) in three aliquots of pollen samples from Packera aurea, Raphanus sativus, and the Solidago sp., or in three aliquots from a control. Calculations were done in Excel. The data from the RNAseq analysis on our trimmed raw reads from Fragaria chiloensis and Raphanus sativus or Arabidopsis thaliana leaf tissue (NCBI SRA accession SRP018034) were TPM values of pollen- and chloroplast-specific genes and enrichment ratios between genes of each group in all three RNAseq datasets and were recorded into Excel. The data from the RT-PCR experiment on Raphanus sativus pollen and leaves included raw Ct values (i.e., technical replicate Ct values) of pollen- and chloroplast-specific genes in each tissue type and were output into an .sds file by the GRC at the University of Pittsburgh, which we copied directly into Excel. We calculated the relative expression of the genes in each tissue type from the raw Ct values using the double delta Ct method in Excel. Lastly, we report the custom forward and reverse primers used to detect the expression of the pollen- and chloroplast-specific genes, which were designed as described above in the "Data Collection" section of the "Software and Code" section of the Reporting Summary.
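For readers unfamiliar with the double delta Ct calculation mentioned above, the following Python sketch shows the standard form of the method. It is an illustration under stated assumptions, not the authors' Excel calculation: it assumes technical replicate Ct values are averaged, that one gene serves as the reference and one tissue as the calibrator, and that amplification efficiency is 100% (a doubling per cycle). The function name and all Ct values are hypothetical.

```python
from statistics import mean


def relative_expression(target_sample, reference_sample,
                        target_calibrator, reference_calibrator):
    """Relative expression by the double delta Ct method (2^-ddCt).

    Each argument is a list of technical replicate Ct values.
    Assumes 100% amplification efficiency for both genes.
    """
    # Delta Ct: target gene normalized to the reference gene, per tissue.
    delta_ct_sample = mean(target_sample) - mean(reference_sample)
    delta_ct_calibrator = mean(target_calibrator) - mean(reference_calibrator)
    # Double delta Ct: sample tissue relative to the calibrator tissue.
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Made-up Ct values: a pollen-specific gene as the target and a
# chloroplast-specific gene as the reference, with pollen as the sample
# tissue and leaf as the calibrator (all of these roles are assumptions).
fold_change = relative_expression(
    target_sample=[20.0, 20.0, 20.0],
    reference_sample=[25.0, 25.0, 25.0],
    target_calibrator=[28.0, 28.0, 28.0],
    reference_calibrator=[22.0, 22.0, 22.0],
)
```

A value above 1 means the target gene is more highly expressed, relative to the reference gene, in the sample tissue than in the calibrator tissue.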
All authors participated in at least one type of data collection.

Reporting for specific materials, systems and methods
We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response.
Interface August 7-11, 2018. We chose to sample at these times because we wanted to include as many diverse plant species in our study as possible. We collected the pollen samples in the morning or early afternoon to avoid competition with pollinators.
The Raphanus sativus leaf tissue was collected from the California Coastal region in March 2018 at the same time the pollen was collected from that plant species.
No data were excluded from this study.
Although this study was not experimental, we have disclosed all software that was used (see the "Data Collection" section of the "Software and Code" section of the Reporting Summary) and all output (Supplementary Datasets 1 and 2) from the viral discovery pipeline (Pickaxe) so that reviewers and readers can easily follow how decisions were made concerning the identification of known viruses and novel viral genomes found to be in association with pollen.
In addition, any R code generated for standard data analysis for this manuscript is available upon request and the Pickaxe code is available upon request or can be accessed at https://github.com/pcantalupo/pickaxe or at Zenodo: doi: 10.5281/zenodo.5718362.
Randomization was not relevant to this study. This study was not experimental/manipulative (i.e., there were no treatments or groups). Instead, we identified known viruses and novel viral genomes in association with pollen and related those results to plant traits and human land use within the regions.
Blinding was not relevant to this study. This study was not experimental/manipulative (i.e., there were no treatments or groups). Instead, we identified known viruses and novel viral genomes in association with pollen and related those results to plant traits and human land use within the regions.
No environmental parameters were relevant to this study, but all pollen sampling was done in fair weather (i.e., days on which no precipitation was falling). We collected the pollen samples in the morning or early afternoon to avoid competition with pollinators. Many of the pollen samples (and the leaf sample) in this study were collected from public roadsides. However, some from the California Grasslands were collected from the University of California McLaughlin Natural Reserve, and some from the Eastern Deciduous Agro-forest Interface were collected from the University of Pittsburgh Pymatuning Laboratory of Ecology. We had permission to sample in both places. In addition, we obtained permission from the USDA Forest Service to sample in the Till Ridge Cove area of the Chattahoochee-Oconee National Forest from April 18 to April 25, 2018, though some of the Central Appalachia pollen samples were also collected from public roadsides.
All pollen samples were preserved on dry ice in the field. They were also shipped overnight on dry ice to the University of Pittsburgh, where they were stored at -80 °C until the RNA extraction phase of this study. The Raphanus sativus leaf discs were immediately submerged in RNAlater (Invitrogen, ThermoFisher Scientific, Waltham, MA, USA) in the field and kept at room temperature for seven days until frozen and shipped overnight on dry ice to the University of Pittsburgh, where they were stored at -80 °C until the RNA extraction phase of this study.
No short- or long-term disturbance to any habitat was caused by this study.