Grass pollen is the world’s most harmful outdoor aeroallergen. However, it is unknown how airborne pollen assemblages change across time and space. Human sensitivity varies between different species of grass that flower at different times, but it is not known whether temporal turnover in species composition matches terrestrial flowering or whether species richness steadily accumulates over the grass pollen season. Here, using targeted, high-throughput sequencing, we demonstrate that all grass genera displayed discrete, temporally restricted peaks of incidence, which varied with latitude and longitude throughout Great Britain, revealing that the taxonomic composition of grass pollen exposure changes substantially across the grass pollen season.
Allergens carried in airborne pollen are associated with both asthma1 and allergic rhinitis (hay fever), negatively affecting 400 million people worldwide2. Pollen from the grass family (Poaceae) constitutes the most important outdoor aeroallergen3,4, and more people are sensitized to grass pollen than to any other pollen type5. However, despite the harmful impact of grass pollen on human health, as reported in developed nations, current studies and forecasts categorize grass pollen at the family level (Poaceae)6,7 owing to the difficulty of differentiating species based on morphology8. Furthermore, although different species of temperate grass flower at different time points9,10, it is unknown whether the disparate phenology of local grass taxa at the ground level is useful for making predictions on the seasonal variation in airborne pollen. Airborne pollen is highly mobile11,12 and pollen concentrations often do not directly correlate with local flowering times11. Persistence and mobility of grass pollen could result in a steadily increasing species richness of airborne pollen over the grass pollen season. Conversely, if grass pollen does not persist for an extended time in the air, pollen assemblages should reflect temporal turnover in the composition of species over the summer months. Understanding the taxon-specific phenology of airborne pollen would fill an important gap in our knowledge of allergen triggers and would have associated benefits for healthcare providers, pharmaceutical industries and the public.
Many species within the subfamilies Pooideae, Chloridoideae and Panicoideae release allergenic pollen into the atmosphere5, including Phleum spp. (for example, timothy grasses), Dactylis spp. (cocksfoot or orchard grasses), Lolium spp. (ryegrasses), Festuca spp. (fescues), Poa spp. (meadow grasses and bluegrasses) and Anthoxanthum spp. (vernal grasses). Furthermore, some grass taxa, such as diverse cultivars and hybrids of Lolium spp., are widely sown in agricultural grasslands and are likely to contribute disproportionately to airborne pollen. However, it is unknown whether particular grass species, or varieties or cultivars within species, contribute more to the prevalence of allergic symptoms and related diseases than others13. Although some grass species have been identified as more allergenic than others in vitro (triggering higher levels of immunoglobulin E (IgE) antibody production), there is a high degree of cross-reactivity between grass species14,15,16. In addition, the allergen profiles (the characterization of the different allergens that are common to different grass) and the degree of sensitization differ between grass species14,17 and the overall allergenicity of grass pollen in the air varies across seasons18. Therefore, family-level estimates of grass pollen concentrations cannot be considered to be a reliable proxy for either the concentration of pollen-derived aeroallergens or pollen-induced public health outcomes.
The identification of biodiversity through high-throughput analysis of taxonomic marker genes (also termed metabarcoding) provides an emerging solution to semi-quantitatively identify complex mixtures of airborne pollen grains19,20,21,22. Furthermore, recent global DNA barcoding initiatives and coordinated regional efforts have now resulted in near-complete genetic databases of national native plants, including grass species. In Great Britain, the vast majority of angiosperms are included in mature DNA barcoding databases for multiple markers23, meaning that we are now in a position to investigate the aerial composition of pollen during the grass pollen season at a national scale.
Here, using two complementary DNA barcode marker genes (rbcL and ITS2), we characterize the spatial and temporal distribution of airborne grass pollen throughout the temperate summer grass-pollen season (May–August) across the latitudinal and longitudinal range of Great Britain (Fig. 1). We hypothesize that the composition of airborne grass pollen from different grass taxa will be broadly homogenous across the grass pollen season, regardless of terrestrial Poaceae phenology, and that it will be homogenous across Great Britain owing to the potential for long-distance transport of windborne pollen grains.
Airborne grass pollen from each genus occupied distinct temporal windows across the grass pollen season in 2016 (May to August), thus we reject the first hypothesis that the composition of airborne grass pollen from different grass taxa is broadly homogenous across the grass pollen season (Fig. 2 and Supplementary Fig. 1). Time, measured as the number of days after the first sample was collected, is a good predictor of the composition of airborne grass pollen taxa using both markers (Fig. 2 and Supplementary Fig. 1; ITS2, likelihood ratio (LR)1,74 = 128.8, P = 0.001; rbcL, LR1,71 = 46.71, P = 0.001). Community-level ordination reveals that the airborne grass pollen community as a whole changed across the grass pollen season (Supplementary Figs. 2, 3); similar overarching trends were observed for the most-abundant airborne pollen families, including Poaceae, Pinaceae and Urticaceae (Supplementary Fig. 4). In addition, observations of the dates of first flowering from a citizen science project (www.naturescalendar.org.uk) and metabarcoding data show similar sequences of seasonal progression with a lag time similar to that found in observational studies11 (see Supplementary Text and Supplementary Fig. 5), suggesting that there is a link between local phenology of Poaceae and the composition of airborne grass pollen.
Focusing on the more taxonomically specific ITS2 marker dataset, Alopecurus and Holcus typically dominated the early grass pollen season (Fig. 2), which coincides with typical peaks in allergic rhinitis24; however, further research will be required to confirm this association. Lolium featured prominently for the majority of the later grass season. The popularity of Lolium species as a forage crop means that they are widely sown in agricultural grasslands25, although most agricultural grasslands are managed by grazing, silage cutting or mowing, which prevents the growth of flowering heads25. The length of time over which Lolium pollen dominated may be because many varieties have been bred with the potential to mature at different times throughout the year26, although it should be noted that Lolium species frequently hybridize with each other and that it is therefore difficult to distinguish these taxa using genetic material alone. Additionally, although there is some evidence that some species of grass appear to be more allergenic than others18, it is unknown how much allergenicity differs within a species (that is, at the cultivar or hybrid level)16. Although Lolium was the dominant genus in airborne grass pollen from July to the end of the sampling period, the total grass pollen concentration declined in August, indicating that the absolute number of Lolium pollen grains at this time is low (Fig. 1 and Supplementary Fig. 6).
The top five genera that contributed to airborne pollen, indicated by the relative abundance of taxonomic marker genes, were Alopecurus, Festuca, Holcus, Lolium and Poa (Fig. 2 and Supplementary Fig. 6). Each of these genera is widespread in the United Kingdom, although long-distance pollen transport means that they may also originate further afield27. These dominant genera have all been shown to provoke IgE-mediated responses in grass-sensitized patients14, providing candidate species for links with hay fever and asthma exacerbation. Conversely, less prevalent species in the dataset could contribute disproportionately to the allergenic load. Species such as Phleum pratense have been identified to be a major allergen5,28. However, we found that Phleum made up a very small proportion of metabarcoding reads (Supplementary Fig. 1), corresponding with the results of an earlier phenological study9. Most genera, such as Phleum, Anthoxanthum and Dactylis, show distinct and narrow temporal incidences (Supplementary Fig. 1) and could enable researchers to identify the grass species that are associated with allergenic windows with greater accuracy.
Changes in species composition over time were localized. We found that peaks in abundance of airborne pollen occurred at different times at each location during the summer (Fig. 2 and Supplementary Fig. 1). For example, the relative abundance of airborne grass pollen from the genus Poa peaked in mid-June in Worcester and Bangor, but 6–8 weeks later in Invergowrie (Fig. 2), probably owing to latitudinal effects on flowering time7,27. This is supported by a significant interaction between latitude and time of year for both markers (Fig. 2 and Supplementary Fig. 1; ITS2, LR68,1 = 34.2, P = 0.002; rbcL, LR68,1 = 47.36, P = 0.001). Differences in species composition of airborne grass pollen between the six sampling sites are supported by a significant effect of latitude (Fig. 2 and Supplementary Fig. 1; ITS2, LR1,73 = 73.2, P = 0.001; rbcL, LR1,70 = 26.4, P = 0.025) and longitude (Fig. 2 and Supplementary Fig. 1; ITS2, LR1,69 = 33, P = 0.003; rbcL, LR1,69 = 27.10, P = 0.010), which are proxies for a broad range of environmental variables. These results do not support our second hypothesis that the composition of airborne grass pollen will be homogenous across the United Kingdom, and instead suggest that taxon-specific effects of regional geography, climate and environmental conditions underpin distributions that have been demonstrated for Poaceae pollen as a whole7. Further investigations into the mechanisms of pollen production and transport, and interactions with a range of climatic, seasonal and meteorological effects, will therefore provide valuable future research foci that will help to expand our mechanistic knowledge of the deposition of grass pollen across time and space.
Enabled by contemporary molecular biodiversity assessments and mature, curated DNA barcoding databases, here we provide a taxonomic overview of the distribution of airborne grass pollen throughout an entire grass pollen season and across geographical scales. The grass pollen season is defined by discrete temporal windows of different grass genera, and some genera showed geographical variation. Temporal pollen distributions in metabarcoding data follow observed flowering times. The data provide an important step towards developing genus-level and, in certain cases, species-level grass pollen forecasting. Additionally, the research presented here enables future studies to elucidate the relationships between grass pollen and disease, the understanding of which will have considerable global public health relevance and socioeconomic importance.
Sampling and experimental design
We collected aerial samples from six sites across Great Britain (Fig. 1 and Supplementary Table 3) using Burkard Automatic Multi-Vial Cyclone Samplers (V2; Burkard Manufacturing) designed to simplify collection of pollen and spores by sampling directly into a microcentrifuge tube29. The volumetric aerial sampler uses a turbine to draw in air (16.5 l min−1) and aerial particles, using mini-cyclone technology. The aerial particles are collected into 1.5-ml sterile microcentrifuge tubes located on a carousel; the carousel is programmed to sample into a new tube every 24 h, thus providing daily samples of airborne pollen (Supplementary Fig. 7). Sample tubes were sent to Bangor and stored at −20 °C before processing. Each sampling unit was mounted alongside a seven-day volumetric trap of the Hirst design30 (Burkard Manufacturing) belonging to the Met Office UK Pollen Monitoring Network, which provided daily pollen concentrations (Fig. 1; map produced using ArcGIS). In the seven-day volumetric trap, a turbine draws air in (10 l min−1) and particles are impacted upon an adhesive-coated tape carried on a clockwork-driven drum. The tape is cut into 24-h sections and mounted on glass slides using a gelatine–glycerol mountant containing basic fuchsin to stain the pollen grains. Pollen grains were identified and counted under a microscope and the counts were converted to volumetric concentrations7. Although the high cost of the pollen samplers precluded routine replicate sampling, our methodologies mirror methodologies that have been used for several decades in the UK network31,32 and are in agreement with recommended terminology described previously33. All pollen samplers were situated in elevated positions on flat-roofed buildings between 4 and 6 floors in height to sample from a mixed air flow. Fins on the samplers (both Burkard Multi-Vial Cyclone and Hirst-type seven-day volumetric samplers) direct the cyclone inlet port into the wind.
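The conversion from a daily count to a volumetric concentration follows from the flow rates given above. The following is a minimal sketch of that arithmetic, not the network's actual procedure: the function names and the `fraction_of_slide_counted` correction are assumptions introduced for illustration, as the text states only that counts were converted to volumetric concentrations.

```python
HIRST_FLOW_L_PER_MIN = 10.0     # seven-day volumetric (Hirst-type) trap
CYCLONE_FLOW_L_PER_MIN = 16.5   # multi-vial cyclone sampler


def sampled_volume_m3(flow_l_per_min, hours=24.0):
    """Volume of air drawn through a sampler over the period, in cubic metres."""
    litres = flow_l_per_min * 60.0 * hours
    return litres / 1000.0


def grains_per_m3(grains_counted, flow_l_per_min, fraction_of_slide_counted=1.0):
    """Daily mean concentration (grains per cubic metre) from a 24-h count.

    fraction_of_slide_counted is a hypothetical correction for the common
    practice of counting only part of a slide; 1.0 means the whole 24-h
    section was counted.
    """
    if not 0.0 < fraction_of_slide_counted <= 1.0:
        raise ValueError("fraction_of_slide_counted must be in (0, 1]")
    estimated_total = grains_counted / fraction_of_slide_counted
    return estimated_total / sampled_volume_m3(flow_l_per_min)
```

At 10 l min−1 a Hirst-type trap samples 14.4 m3 of air per day, and at 16.5 l min−1 the cyclone sampler draws 23.76 m3 per day.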
Bangor was the only sampling site that was not part of the pollen monitoring network, but the same methodology for observational concentrations and molecular analysis of pollen was used there (sampling at Bangor began on 24 June 2016; Fig. 1).
Sampling began in late May 2016. During alternate weeks, aerial samples were collected daily for seven days, giving a total of seven sampling weeks between 25 May and 28 August. Exact sampling dates varied slightly between sites and a total of 279 aerial samples were collected (Supplementary Table 4).
DNA extraction, PCR and sequencing
From the 279 daily aerial samples, 231 were selected for downstream molecular analysis, as described below. Within each sampling week, two series of three consecutive days were pooled. Days were selected for pooling based on grass pollen concentrations determined by microscopy. The final, unselected day was not used in downstream molecular analysis. In total, 77 pools of DNA were created. In one instance, three consecutive days of pollen samples were unavailable (Invergowrie, week 2, pool 2) owing to trap errors. For this sample, the next sampling day was selected for pooling (Supplementary Table 4). DNA was extracted from daily samples using a DNeasy Plant Mini kit (Qiagen), with some modifications to the standard protocol, as previously described34. DNA from daily samples was pooled and eluted into 60 µl of elution buffer at the binding stage of the DNeasy Plant Mini kit.
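The pooling design can be illustrated with a short sketch. A seven-day week admits only three ways of placing two non-overlapping runs of three consecutive days, leaving day 1, day 4 or day 7 unused. The selection rule below (keep the layout with the highest summed grass pollen concentration) is an assumption made for illustration; the text states only that selection was based on microscopy-derived concentrations. The function name is hypothetical.

```python
# The three possible layouts of two runs of three consecutive days in a
# seven-day week (0-based day indices); the unused day is 6, 3 or 0.
LAYOUTS = [
    ((0, 1, 2), (3, 4, 5)),
    ((0, 1, 2), (4, 5, 6)),
    ((1, 2, 3), (4, 5, 6)),
]


def choose_pools(daily_concentrations):
    """Return the two three-day pools with the highest summed concentration.

    Assumed rule: maximise the total grass pollen concentration over the
    six pooled days, which is equivalent to discarding the lowest of
    days 1, 4 and 7.
    """
    if len(daily_concentrations) != 7:
        raise ValueError("expected seven daily concentrations")

    def pooled_total(layout):
        return sum(daily_concentrations[d] for pool in layout for d in pool)

    return max(LAYOUTS, key=pooled_total)
```

With two pools per site per week and three daily samples per pool, the 77 pools account for the 231 daily samples taken forward.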
Illumina MiSeq paired-end indexed amplicon libraries were prepared following a two-step protocol. Two marker genes were amplified with universal primer pairs RBCLaf and RBCLr50623,35, and ITS2 and ITS318 (Supplementary Table 5). A 5′ universal tail was added to the forward and reverse primers and a 6N sequence was added between the forward universal tail and the template-specific primer; this is known to improve clustering and cluster detection on MiSeq sequencing platforms36 (Integrated DNA Technologies). The first round of PCR was carried out in a final volume of 25 µl, including forward and reverse primers (0.2 µM), 1× Q5 HS High-Fidelity Master Mix (New England Biolabs) and 1 µl of template DNA. Thermal cycling conditions were an initial denaturation step at 98 °C for 30 s; 35 cycles of 98 °C for 10 s, 50 °C for 30 s, 72 °C for 30 s; and a final extension step of 72 °C for 5 min. Products from the first PCR were purified using Agencourt AMPure XP beads (Beckman Coulter) with a 1:0.6 ratio of product:AMPure XP beads.
The second round of PCR added the unique identical i5 and i7 indexes and the P5 and P7 Illumina adaptors, along with universal tails complementary to the universal tails used in the first round of PCR (Supplementary Tables 4, 5) (Ultramer, Integrated DNA Technologies). The second round of PCR was carried out in a final volume of 25 µl, including forward and reverse index primers (0.2 µM), 1× Q5 HS High-Fidelity Master Mix (New England Biolabs) and 5 µl of purified PCR product. Thermal cycling conditions were: 98 °C for 3 min; 98 °C for 30 s, 55 °C for 30 s, 72 °C for 30 s (10 cycles); 72 °C for 5 min, 4 °C for 10 min. Both PCRs were run in triplicate. The same set of unique indices was added to the triplicates, which were then pooled following visual inspection on an agarose gel (1.5%) to ensure that indices were added successfully. Pooled metabarcoding libraries were cleaned a second time using Agencourt AMPure magnetic bead purification, run on an agarose gel (1.5%) and quantified using the Qubit high-sensitivity kit (Thermo Fisher Scientific). Positive and negative controls were amplified in triplicate with both primer pairs and sequenced alongside airborne plant community DNA samples using the MiSeq. Sequence data, including metadata, are available at the Sequence Read Archive (SRA) using the project accession number SUB4136142.
Initial sequence processing was carried out following a modified version of the workflow described previously37. In brief, raw sequences were trimmed using Trimmomatic v.0.3338 to remove short reads (<200 bp), adaptors and low-quality regions. Reads were merged using FLASH v.1.2.1137,39 and merged reads shorter than 450 bp were excluded. Identical reads were merged using fastx-toolkit (v.0.0.14) and reads were split into ITS2 and rbcL on the basis of the primer sequences.
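The last two steps, collapsing identical reads and splitting them by marker, can be sketched in plain Python. This is an illustrative stand-in and not the fastx-toolkit implementation used in the pipeline. The function names are hypothetical, the primer sequences passed in are placeholders rather than the real rbcL or ITS2 primers, and exact prefix matching (no mismatches, no allowance for the 6N spacer) is a simplifying assumption.

```python
from collections import Counter


def dereplicate(reads, min_length=450):
    """Collapse identical merged reads into sequence -> count.

    Reads shorter than min_length are dropped first, mirroring the
    length filter applied to merged reads.
    """
    return Counter(r for r in reads if len(r) >= min_length)


def split_by_primer(unique_reads, primers):
    """Assign each unique read to the marker whose primer it starts with.

    primers maps a marker name to a forward-primer sequence. Reads that
    match no primer are returned separately rather than discarded.
    """
    by_marker = {marker: Counter() for marker in primers}
    unassigned = Counter()
    for seq, count in unique_reads.items():
        for marker, primer in primers.items():
            if seq.startswith(primer):
                by_marker[marker][seq] += count
                break
        else:
            unassigned[seq] += count
    return by_marker, unassigned
```

Keeping the per-sequence counts through dereplication is what later allows rare reads to be removed by abundance.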
To prevent spurious BLAST hits, custom reference databases that contained rbcL and ITS2 sequences from UK plant species were generated. Although all native species of the UK have been DNA-barcoded23, a list of all species found in the United Kingdom was generated to gain coverage of non-native species. A list of UK plant species was generated by combining lists of native and alien species40 with a list of cultivated plants obtained from Botanic Gardens Conservation International, which represented horticultural species. All available rbcL and ITS2 records were downloaded from NCBI GenBank, and sequences belonging to UK species were extracted using the script ‘creatingselectedfastadatabase.py’, archived on GitHub.
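Restricting a downloaded sequence set to a species list amounts to filtering FASTA records by the taxon named in each header. The sketch below is not the archived script. It assumes, for illustration only, that each header has the form `<accession> <Genus> <species> <free text>`; the function names are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def species_from_header(header):
    """Extract 'Genus species' from a header in the assumed layout."""
    parts = header.split()
    return " ".join(parts[1:3]) if len(parts) >= 3 else None


def select_species(records, wanted_species):
    """Keep only records whose binomial is in wanted_species (case-insensitive)."""
    wanted = {name.lower() for name in wanted_species}
    for header, seq in records:
        name = species_from_header(header)
        if name is not None and name.lower() in wanted:
            yield header, seq
```

In practice, taxon names would be better taken from structured GenBank annotations than from header text, because header layouts vary between records.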
Metabarcoding data were searched against the relevant sequence database using blastn41, using the script ‘blast_with_ncbi.py’. The top 20 blast hits (identified using the highest bit-score) were tabulated (using ‘blast_summary.py’), then manually filtered to limit results to species currently present in Great Britain. Reads occurring fewer than four times were excluded from further analysis. All scripts used are archived on GitHub: https://doi.org/10.5281/zenodo.1305767.
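The tabulation of top hits and the abundance filter can be sketched as follows. This is an illustration and not the archived ‘blast_summary.py’. It assumes blastn was run with the default tabular output (`-outfmt 6`), in which the query ID, subject ID and bit-score are columns 1, 2 and 12; the function names are hypothetical.

```python
from collections import defaultdict


def top_hits(blast_lines, n=20):
    """Return, per query, the subject IDs of the n hits with the highest bit-score.

    Expects default BLAST tabular output: tab-separated fields with the
    query ID first, the subject ID second and the bit-score twelfth.
    """
    hits = defaultdict(list)
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        hits[fields[0]].append((float(fields[11]), fields[1]))
    return {
        query: [subject for _, subject in
                sorted(scored, key=lambda pair: -pair[0])[:n]]
        for query, scored in hits.items()
    }


def drop_rare(read_counts, min_count=4):
    """Exclude reads that occur fewer than min_count times."""
    return {seq: n for seq, n in read_counts.items() if n >= min_count}
```

The manual step that restricts hits to species currently present in Great Britain is not reproduced here, because it relied on expert curation rather than a fixed rule.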
To understand how the composition of grass pollen changed across space and time, the effect of time (measured as the number of days after the first sampling date), and latitude and longitude of sampling location were included in a two-tailed generalized linear model using the ‘manyglm’ function in the package ‘mvabund’42. The proportion of sequences was set as the response variable; proportion data were used as this has been shown to be an effective way of controlling for differences in read numbers43. The effect of time, latitude, longitude and the interaction between time and latitude were included as explanatory variables in the models to test our hypotheses. The effect of longitude is also consistent when York, the most easterly sampling site, with missing data from mid-July until the end of the sampling period, is removed from the analysis (Supplementary Table 6).
The data best fit a negative binomial distribution, most likely owing to the large number of zeros (zeros indicate that a grass genus is absent from a sample), resulting in a strong mean-variance relationship in the data (Supplementary Fig. 8). The proportion of sequences was scaled by 1,000 and values were converted to integers so that a generalized linear model with a negative binomial distribution could be used. Overfitting of the models was tested using ‘dropterm’ in R and based on the lowest Akaike information criterion score, no terms were removed from the models. In addition, the appropriateness of the models was checked by visual inspection of the residuals against predicted values from the models (Supplementary Fig. 9).
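The scaling step that turns proportions into integers suitable for a negative binomial model can be sketched as below. The function name is hypothetical, and rounding to the nearest integer is an assumption: the text does not say whether values were rounded or truncated.

```python
def proportions_to_counts(read_counts, scale=1000):
    """Convert per-genus read counts in one sample to scaled integer proportions.

    Each genus receives round(scale * reads / total reads). A genus with
    no reads stays at zero, so absences are preserved as zeros.
    """
    total = sum(read_counts.values())
    if total == 0:
        return {genus: 0 for genus in read_counts}
    return {
        genus: int(round(scale * n / total))
        for genus, n in read_counts.items()
    }
```

Because every sample is rescaled to the same nominal total, differences in sequencing depth between samples do not carry through to the response variable.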
In order to compare the metabarcoding data with flowering time data, we used phenological records of first flowering collected in 2016 by citizen scientists from the UK’s Nature’s Calendar (www.naturescalendar.org.uk). First flowering time was compared to genus-level ITS2 metabarcoding data for three species: Alopecurus pratensis, Dactylis glomerata and Holcus lanatus. Because grass pollen could only be reliably identified to genus level in the metabarcoding data, the taxa compared may not have been exactly equivalent since both Alopecurus and Holcus contain other widespread species within the United Kingdom. However, A. pratensis and H. lanatus are the most abundant species within their respective genera. The comparison was only carried out for ITS2 data, because two of the three genera were not identified by the rbcL marker.
Non-metric multidimensional scaling (NMDS) ordination was carried out using package ‘vegan’ in R44, based on the proportion of total high-quality reads contributed by each grass genus, using Bray–Curtis dissimilarity (Supplementary Figs. 2, 3). Ordination is used to reduce multivariate datasets (for example, abundances of many species) into fewer variables that reflect overall similarities between samples. A linear model was fitted using the ‘lm’ function within the ‘stats’ package in R to investigate the relationship between the number of reads obtained for each genus using the rbcL and ITS2 markers.
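The Bray–Curtis dissimilarity that underlies the ordination is the sum of absolute differences in abundance between two samples divided by the sum of all abundances in both. The sketch below computes it directly in Python for illustration; it is not the ‘vegan’ implementation, the function names are hypothetical, and the NMDS step itself is not reproduced.

```python
def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two genus -> abundance mappings.

    Returns 0.0 for identical compositions and 1.0 when the samples
    share no genera. Genera missing from a sample count as zero.
    """
    genera = set(sample_a) | set(sample_b)
    numerator = sum(abs(sample_a.get(g, 0.0) - sample_b.get(g, 0.0)) for g in genera)
    denominator = sum(sample_a.get(g, 0.0) + sample_b.get(g, 0.0) for g in genera)
    return numerator / denominator if denominator else 0.0


def dissimilarity_matrix(samples):
    """Pairwise Bray-Curtis matrix for a mapping of sample name -> composition."""
    names = sorted(samples)
    matrix = [[bray_curtis(samples[i], samples[j]) for j in names] for i in names]
    return names, matrix
```

When both samples are expressed as proportions that sum to one, the value reduces to half the sum of the absolute differences.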
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
All sequence data are available at the Sequence Read Archive (SRA) using the project accession number SUB4136142. Archived sequence data were used to generate Fig. 2 and Supplementary Figs. 1–6, 8–10. Data on first flowering dates used in Supplementary Fig. 5 were obtained from Nature’s Calendar, Woodland Trust and are available upon request. The sequence analysis pipeline is available at https://github.com/colford/nbgw-plant-illumina-pipeline.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
We thank J. Kenny, P. Koldkjær, R. Gregory and A. Lucaci for sequencing support; J. Winn for ArcGIS assistance with Fig. 1; W. Grail and the technical support staff at Bangor University; the Botanic Gardens Conservation International (BGCI) for access to the list of plant collections in the National Gardens in the United Kingdom and Ireland; the Met Office network for providing additional observational grass pollen count data; the Woodland Trust and the Centre for Ecology & Hydrology for supplying the UK Phenology Network data and the citizen scientists who have contributed to the latter scheme. We acknowledge the computational services and support of the Supercomputing Wales project, which is part-funded by the European Regional Development Fund (ERDF) via Welsh Government. This work was supported by the Natural Environment Research Council (https://nerc.ukri.org/), awarded to S.C. (NE/N003756/1), C.A.S. (NE/N002431/1), N.J.O. (NE/N002105/1), and G.W.G., N.d.V. and M.H. (NE/N001710/1). IBERS Aberystwyth receives strategic funding from the BBSRC. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.