Particulate matter emission sources and meteorological parameters combine to shape the airborne bacteria communities in the Ligurian coast, Italy

Aim of the present study is to explore how the chemical composition of particulate matter (PM) and meteorological conditions combine in shaping the air microbiome in Savona (Italy), a medium-size, heavily inhabited urban settlement, hosting a wide range of industrial activities. In particular, the air microbiome and PM10 were monitored over six months in 2012. During that time, the air microbiome was highly dynamic, fluctuating between different compositional states, likely resulting from the aerosolization of different microbiomes emission sources. According to our findings, this dynamic process depends on the combination of local meteorological parameters and particle emission sources, which may affect the prevalent aerosolized microbiomes, thus representing further fundamental tools for source apportionment in a holistic approach encompassing chemical as well as microbiological pollution. In particular, we showed that, in the investigated area, industrial emissions and winds blowing from the inlands combine with an airborne microbiome which include faecal microbiomes components, suggesting multiple citizens’ exposure to both chemicals and microorganisms of faecal origin, as related to landscape exploitation and population density. In conclusion, our findings support the need to include monitoring of the air microbiome compositional structure as a relevant factor for the final assessment of local air quality.


Results
Particulate matter emission sources and atmospheric parameters. The PMF model application on PM 10 samples resulted in a solution with an optimum number of seven source factors at the receptor site, i.e. the station where the PM 10 samples were collected. Like other multivariate methods, these factors correspond to linear combinations of the original compositional parameters, each potentially identifiable as an emission source profile. The fractional contribution per sample for each of the seven factors is reported in Supplementary  Table S1.
In order to associate the factors with specific emission sources, prior knowledge about the receptor site (Savona, Italy) was used together with a critical analysis of the factor fingerprints (Fig. 1a). Moreover, the percentage contribution of the seven identified sources to the total variable was reported (PM 10 , Fig. 1b). As a result, the seven factors extracted by PMF analysis can be described as follows. Factor 1 is characterized by the prevalence of elements attributable to the geochemical composition because of the high percentages of Si, Al, and Ti. Therefore, this factor was identified as "crustal material and road dust resuspension", deriving from the soil and/or road surface 32,33 . Factor 2 is linked to organic carbon (OC), Cu, Zn, Cr, and K + . OC and K + are strictly related to combustion processes, including biomass burning, as previously described 34 . Cu, Zn, and Cr are associated with traffic: Cu and Cr are well-known tracers of the brakes of motor vehicles, while Zn is known as a tracer of tire wear [35][36][37] . Therefore, this factor was identified as a combination of "traffic and biomass burning" sources. Factor 3 is mainly associated with NO 3 from gas-to-particle conversion of NO x (g) in the atmosphere to which traffic and other high-temperature combustion processes may contribute 38,39 ; as such it can hardly be attributed to a single well-defined source, especially in such a complex emissive scenario. Therefore, this factor was identified collectively as "secondary nitrate". Factor 4 relates to SO 4 2and NH 4 + from gas-to particle reactions, leading to secondary ammonium sulphate [40][41][42] . Similarly to secondary nitrate, this component can be contributed by various sources (both natural and anthropogenic) due to the multiplicity of fossil fuel sources of the precursor gaseous SO 2 and the ubiquity of NH 3 (g) 43,44 . Therefore, this factor was collectively identified as "secondary ammonium sulphate". Factor 5 is associated with Na, Mg ins, V, and Ni. The distinctive association of V and Ni reveals emissions attributable to the combustion of heavy oil [45][46][47] . The association of these species with Na and Mg suggests a "naval-maritime transport" source. Factor 6 is mainly characterized by high scores of Pb, K ins, Zn, OC, and elemental carbon (EC). The fine particles produced by coal combustion are characterized by significant fractions  48 . Therefore, this factor was identified as "coal burning". Factor 7 is connected to a large score of Cl -, Na, and Mg ++ , and clearly identified as "sea spray" aerosol 49 .
In order to confirm the PMF analysis results, the origin of the polluted air masses was investigated by analyzing the PMF factors as a function of wind direction, calculating the respective cumulative distribution functions and generating the corresponding wind polar plots. This method associates the emissive profile obtained by PMF with wind direction and intensity to which the receptor site is downwind. The plots obtained are shown in Supplementary Figure S1. In particular, factors 1, 3, 4, and 6 (respectively crustal material and road dust resuspension, secondary nitrate, secondary ammonium sulphate, and coal burning) are associated with winds blowing from the inland towards the coast covering traffic and industrial sources. Factor 5 (naval-maritime transport) is oriented downwind from the sea, confirming that it is associated with the fuel oil used for sea shipping. Finally, while factor 2 shows a local origin indicating sources in the proximity of the receptor site, factor 7 is meridionally oriented, indicating once more a marine origin. It should be noted, however, that, unlike factor 5 characterized by elements typical of the submicron fraction likely flushed back and forth by sea-land breezes from the harbor, factor 7 is associated with coarse particles requiring different meteorological conditions (possibly more intense winds from the open sea in order to sustain heavier particles). AM overall composition. Next generation sequencing of the V3-V4 hypervariable region of the 16S rRNA gene from the total microbial DNA extracted from PM 10 air filters resulted in 98 samples containing more than 1 000 reads per samples which were retained for the rest of the study, for a total of 797,781 high-quality sequences with an average of 8,058 ± 3,410 (mean ± SD) paired-end reads per sample, binned into 4 189 ASVs. According to our data, AM is dominated by the phyla Proteobacteria (mean relative abundance ± SD = 42.8 ± 19.4%) and Firmicutes (27.4 ± 18.9%), with Actinobacteria (14.8 ± 10.9%) and Bacteroidetes (9.2 ± 8.6%) being subdominant. At the family level, the most represented taxa are Comamonadaceae (6.1 ± 13.4%) and Sphingomonadaceae (4.3 ± 5.0%), both belonging to Proteobacteria. Other represented families are Ruminococcaceae (3.9 ± 7.6%), Enterobacteriaceae (3.7 ± 5.9%), Clostridiaceae (3.6 ± 6.8%), Bacillaceae (3.5 ± 5.0%) and Flavobacteriaceae (3.4 ± 5.7%). Please see Supplementary Figure S2 for a graphical representation of the overall compositional structure of AM throughout the entire sampling period.
In order to explore connections between the AM structure and seasonality, we compared the levels of AM diversity over the different months (Fig. 2). Diversity measurements indicated a general trend of microbial richness to decrease from winter to summer, although the differences did not reach statistical significance (Kruskal-Wallis test, FDR corrected p-value > 0.05) (Fig. 2a). Conversely, the PCoA of unweighted UniFrac distances between the AM compositional profiles showed sample segregation according to the month of sampling Variation of the AM topological structure and association with PM emission sources and meteorological parameters. To further explore the overall AM variation across the sampling period, a clustering analysis of the AM compositional profiles was carried out. Hierarchical Ward-linkage clustering based on the Spearman correlation coefficients of family-level AM profiles resulted in the significant separation of 4 clusters, named C1, C2, C3 and C4, respectively (FDR-corrected permutation test with pseudo-F ratio, p-value ≤ 0.001) (Fig. 3). Confirming the robustness of the identified clusters, the PCoA of the unweighted UniFrac distances between samples revealed a sharp segregation based on the assigned cluster (Fig. 4). Interestingly, when we searched for correlations between PCoA coordinates and measured meteorological parameters or PMF factors (Supplementary Tables S2 and S1, respectively), we found that factor 5 (naval-maritime transport) and relative humidity (RH) were both positively correlated with the PCo1 axis (Kendal's test, FDR-corrected p-value ≤ 0.001), while factor 6 (coal burning) was negatively correlated with the PCo1 coordinates (p-value ≤ 0.001) (Fig. 4). This indirect gradient analysis allowed to highlight positive associations between clusters C1 and C3 and factors 6 and 5, respectively. Further, cluster C3 was found to be positively related to RH. As for seasonality, the clusters C3 and C4 are the most prevalent in summer and winter, respectively, while for C1 and C2 we did not observe any prevalence for a particular sampling period. We also compared the microbial diversity values of samples included in the different clusters, using three different diversity metrics. Our data indicated higher biodiversity in clusters C1 and C2 (PD whole tree, chao1, and observed ASVs, mean ± SD: 3 Compositional specificity and prevalent microbiological source of the four AM clusters. We subsequently compared the relative abundance of AM families among the four clusters in order to find out the most distinctive families of each of them (Supplementary Figure S3). According to our findings, the discriminating families (i.e. families with significantly different relative abundance, based on Kruskal-Wallis test) for the microbial cluster C1 are Prevotellaceae, Erysipelotrichaceae, Coriobacteriaceae, Christensenellaceae, Lachno-  In an attempt to identify the most likely prevalent microbial origin of the four AM clusters, we first derived the respective compositional peculiarities at the OTU level. To this aim, 16S rRNA gene reads were clustered at 97% homology, resulting in 3 821 OTUs. By linear regression, we subsequently obtained 80 OTUs specifically discriminating the four clusters. In particular, for 52 of these OTUs a significantly different distribution in the four clusters was confirmed by a Kruskal-Wallis test, as shown in Supplementary Figure S4. For each of them, the isolation source of the closest BLAST match within the NCBI 16S rRNA sequence database was recovered (Supplementary Table S3). Interestingly, according to our findings, the cluster C1 is mainly characterized by OTUs of faecal origin. These OTUs include sequences assigned to typical components of the human gut microbiome, such as Faecalibacterium prausnitzii, Ruminococcus faecis, Prevotella copri, Eubacterium eligens, Ruminococcus bromii, Roseburia inulinivorans and Blautia faecis [50][51][52] , the cattle rumen components Succinivibrio dextrinosolvens 53 and Oscillibacter ruminantium 54 , and the porcine gut microbiome member Treponema porcinum 55 . Differently, the cluster C2 is characterized by OTUs assigned to microorganisms isolated from plant roots and leaves, including Curtobacterium flaccumfaciens 56 , Glutamicibacter halophytocola 57 and Frigoribacterium endophyticum 58 , as well as by a specific pattern of environmental bacteria, from soil, air, and fresh and marine water ecosystems. Similarly, both clusters C3 and C4 are characterized by a peculiar combination of environmental microorganisms from different sources, including soil, fresh and marine waters, and airborne microbial ecosystems.

Discussion
In order to explore connections between the local air microbiome, atmospheric pollution and meteorological factors, here we provide a longitudinal survey of the near-ground AM, atmospheric particulate and atmospheric parameters in Savona, Italy. According to our findings, the local AM appears dominated by the phyla Proteobacteria, Firmicutes, Actinobacteria and Bacteroidetes, well matching the general layout of an AM community 4,25 .
The application of the PMF receptor modelling on the chemical compositional pattern of the PM 10 samples collected during the field campaign allowed the identification of seven emission sources: "crustal material and road dust resuspension", "traffic and biomass burning", "secondary nitrate", "secondary ammonium sulphate", "naval-maritime transport", "coal burning" and "sea spray". Each source factor was subsequently subjected to anemological analysis based on polar plots, allowing each emission source to be associated with the corresponding wind direction to which the receptor site is downwind, similarly to the approach used by Innocente et al. 28 . Specifically, emission sources as "crustal material and road dust resuspension", "secondary nitrate", "secondary ammonium sulphate" and "coal burning" were associated with winds blowing from the inland toward the sampling site, intercepting substantially traffic and industrial particulate sources. Conversely, emission sources such as "naval-maritime transport" and "sea spray" were associated with a sea breeze, supporting a marine origin for both. Finally, the "traffic and biomass burning" emission source mostly showed a local origin. It is known that bacterial communities structure and composition can be influenced by environmental conditions, such as seasons, air masses origin and landscape characteristics 28,31,59 , as well as PM 10 chemical Specifically, the emission source factor 5 (naval-maritime transport) and relative humidity are both positively correlated with the PCo1 axis (Kendall correlation test, FDR-corrected p-value ≤ 0.001), while the emission source factor 6 (coal burning) is negatively correlated with the PCo1 coordinates (p-value ≤ 0.001). For each AM cluster, the proportion of samples based on the sampling time (from February (dark blue) to July (yellow)) is shown as a pie chart. www.nature.com/scientificreports/ compositional pattern 27 . When we explored the AM structure variation during the observation period, we were able to identify four distinct clusters of samples, named C1 to C4. Interestingly, the four clusters were associated with a peculiar combination of seasonality, meteorological variables and emission sources, as integrated into factors by PMF analysis. In particular, the AM cluster C1 was associated with the "coal burning" emission source, suggesting not actually the industrial facility as a microbiome source, but rather the influence of an air mass whose transport over a given district harvests chemical and microbiological components along the same tropospheric path. Instead, the cluster C3, most represented in the warm period, probably has a marine origin due to its association with the "naval-marine transport" emission source and high relative humidity. Finally, the clusters C2 and C4 did not show any specific association with the aerosol sources assessed by PMF, even if they showed a different seasonal behaviour, with C4 being more represented in the cold period. The results obtained by the application of such double-step multivariate analysis-associating multidimensional microbiome and chemical datasets-represent the true novelty of our study. Indeed, we were able to associate the AM composition to well discriminated emission sources, each with a characteristic chemical composition and origin.
The four AM clusters revealed a distinct, well-defined compositional structure, each being enriched with a specific set of microbial families and OTUs. The specificity of each bacterial profile de facto serves as a microbiological fingerprint, allowing to single out the probable microbiome sources characterizing each cluster that, similar to what occurs to abiotic particles, allow to trace back the origin of the air mass. In particular, the clusters C3 and C4 substantially reflect interconnected environmental microbiomes, encompassing a specific combination of microorganisms from soil resuspension, as well as from marine and fresh waters (possibly from rivers and streams flowing into the Ligurian Sea) and from the air. C2 cluster reveals the plant microbiome as an additional source, showing a further combination of plant-associated and environmental microorganisms, due to the contact of air masses over a vegetation landscape. Interestingly, the feasibility of air mass tracing also using bacterial species clearly emerges when we observe in detail the compositional structure of C1. This is in fact the only AM cluster carrying a recognizable pool of bacterial moieties of faecal origin, which are consistently part of the animal gut microbiome, suggesting not only a well-defined origin but also the potential use of this information in the assessment of microbiological impacts. Faecal material as a potential source of bacteria was already reported by Bowers et al. 24 . It should be noted that in the area upwind C1 no sewage treatment plant as a possible source of faecal microbiome was present at the time of sampling. However, the area is very densely populated and forested areas populated by local fauna are closely found within a few kilometres.
Taken together, our data on the temporal dynamics of the near-ground AM in Savona, highlight the relevant degree of plasticity of AM over time. As such, we demonstrated how meteorological factors (e.g. wind direction and humidity) and atmospheric pollution (particles emission sources) can combine in shaping the AM configuration. In particular, coal burning and winds blowing from the inlands mix to establish a characteristic AM with a prevalence of aerosolized faecal microorganisms, regardless of seasonality. Conversely, in the summer season, humidity, sea breeze and naval-marine transport pollutants result in an AM mainly originating from environmental microbiomes, including microorganisms that are typically found in seawater and soil. Even if we were not able to establish connections between the other identified emission sources and specific AM clusters, we would stress the importance of seasonality in shaping the AM structure. Indeed, the variation between the clusters C2 and C4, for which no connection with any emission source was observed, was shown to be dependent on the sampling period, with the cluster C2 most prevalent during the warm season and including plant microbiomes as possible characteristic sources, as previously observed by Franzetti et al. 60 and Bertolini et al. 61 .
In conclusion, our results suggest that, in an urban settlement, the influence of air masses harvesting chemical components from industrial sources may increase the proportion of aerosolized faecal microorganisms in the atmosphere, ultimately increasing citizens' exposure to faecal microbes. Similar results have recently been obtained by exploring AM in Beijing over 6 months 23 . Our findings strengthen the importance of including the monitoring of the AM compositional structure as a determinant factor in the currently used air quality indexes. Indeed, in urban areas, the possible increased exposure to faecal-associated microbiomes as a result of increasing pollution can pose a possible threat to human health, particularly in regions with high-intensity animal farming, due to the inherent propensity of opportunistic pathogens to aerosolize.

Materials and methods
Site description. The PM 10 samples treated in this work were collected in Savona, one of the main towns in the Ligurian region (Fig. 5). The whole region overlooks the Tyrrhenian sea but is entirely occupied by the Apennine range down to the coast, where only a narrow strip of plain is present. Therefore, the coastal area is densely inhabited and crossed by an extremely busy traffic road mainly connecting Italy to France. Besides being occupied by a medium-size heavily inhabited urban settlement, the Savona district also hosts a wide industrial area, including a coal-fired power plant active at the time of our experimental field activity and a large and very busy harbour. The climate of this site is classified as warm-temperate (Csa, according to Köppen and Geiger classification) 62,63 with an average annual temperature of 14.6 °C and average precipitation of 910 mm (https :// en.clima te-data.org, accessed 28/07/2020). Intense northern winds characterize the circulation in winter 64 , while sea-land breeze regimes prevail in the warm season, usually starting from March 65,66 . The air sampling site was located at an altitude of 12 m on the sea level in a rural area as classified according to the EU Directive 2008/50/ EC on Air quality, with a distance of 2 km from the sea. www.nature.com/scientificreports/ man PM 2.5 PTFE). Samples were stored frozen in the dark at − 10 °C until processing. In this work, PTFE membranes were used for gravimetry, ion chromatography and elemental analysis with particle induced X-ray emission and inductively coupled plasma mass spectrometry, while quartz membranes were used for the analysis of carbonaceous macrocomponents and microbiology. A subset of 98 samples, uniformly distributed across the sampling period, was used for the analyses reported in the present paper. During the sampling campaign, meteorological parameters were measured simultaneously on site using a Davis Vantage Pro2 Weather Station (Davis Instruments, Hayward, CA), placed in proximity of the PM 10 sampler, for the measurement of temperature, pressure, relative humidity, rainfall, and wind direction and speed with a time resolution of 30 min. Subsequently, the data obtained were averaged on a daily scale (Supplementary Table S2), i.e. at the same time resolution as the PM 10 samples, using the "openair" package 67 of the R software (version 3.6.1) 68 .

Sample collection and atmospheric parameters.
Chemical characterisation of the samples. Chemical characterization of PM 10 filters was carried out using several analytical techniques. First, PM 10 mass load (μg/m 3 ) was determined by gravimetric analysis. Elemental and organic carbon were determined on quartz membranes by thermal-optical transmittance analysis (TOT), as previously described 69  www.nature.com/scientificreports/ Positive Matrix Factorization analysis. Positive Matrix Factorization (PMF) is an advanced multivariate factor analysis technique widely used in receptor modelling for the chemometric evaluation and modelling of environmental datasets 3,32,[71][72][73][74][75] . PMF allows the identification and quantification of the emissive profile of a receptor site, i.e. the monitoring site where an air quality station is operated. We applied EPA PMF 5.0 software 76 . The dataset was checked and re-arranged prior to PMF modelling according to the model criteria previously described 76 and, after data pre-processing, a concentration matrix of 98 samples × 25 variables was obtained. After careful evaluation of the input data and uncertainty matrices, an optimum number of factors was found by analysing the values of Q, a parameter estimating the goodness of the fit performed 77 , and the distribution of residuals. In order to assess the reliability of the model reconstruction, measured (input data) and reconstructed (modeled) values together with the distribution of residuals were compared. Our results indicated a good general performance of the model in reconstructing PM 10 (coefficient of determination equal to 0.79) for most variables. In order to confirm the results of receptor modelling, the origin of the air masses associated with the factors obtained was investigated through the creation of wind polar plots using the source contribution of the factors produced by PMF. In particular, polar plots were produced for each single PMF factor using the "openair" package of R 67 , utilizing the conditional probability function (CPF) 78 with an arbitrary threshold set to the 75th percentile.
Microbial DNA extraction, 16S rRNA gene amplification and sequencing. Microbial

Bioinformatics and statistics.
A pipeline combining PANDAseq 82 and QIIME 2 83 was used to process raw sequences. DADA2 84 was used to bin high-quality reads (min/max length = 350/550 bp) into amplicon sequence variants (ASVs). After taxonomy assignment using the VSEARCH algorithm 85 and the SILVA database (December 2017 release) 86 , the sequences assigned to eukaryotes (i.e. chloroplasts and mitochondria) or unassigned were discarded. Three different metrics were used to evaluate alpha diversity-Faith's Phylogenetic Diversity (PD whole tree) 87 , Chao1 index for microbial richness, and number of observed ASVs-and unweighted UniFrac distance was used for Principal Coordinates Analysis (PCoA). Permutation test with pseudo-F ratio (function "adonis" in the "vegan" package of R), Kruskal-Wallis test or Wilcoxon rank-sum test were used to assess data separation. Kendall correlation test was used to determine associations between the PCoA coordinates of each sample and the factors identified by the PMF analysis. P-values were corrected for multiple testing with the Benjamini-Hochberg method, with a false discovery rate (FDR) ≤ 0.05 considered statistically significant. All statistical analyses and respective figures were produced with the R software 68 using "Made4" 88 and "vegan" (https ://cran.r-proje ct.org/web/packa ges/vegan /index .html) packages. Clustering analysis of family-level AM profiles, filtered for family subject prevalence of at least 20%, based on the SILVA taxonomy assignment, was carried out using hierarchical Ward-linkage clustering based on the Spearman correlation coefficients. We verified that each cluster showed significant correlations between samples within the group (multiple testing using the Benjamini-Hochberg method) and that the clusters were statistically significantly different from each other (permutational MANOVA using the Spearman distance matrix as input, function adonis of the vegan package in R). Additionally, PANDAseq assembled paired-end reads were also processed with the QIIME1 89 pipeline for Operational Taxonomic Units (OTUs) clustering based on 97% similarity threshold. Taxonomy was then assigned using the SILVA database. OTUs were processed through the R package "MaAsLin2" 90 to determine their association with microbial clusters. Kruskal-Wallis test was used to find OTUs whose relative abundance was significantly different among microbial clusters. The resulting OTUs were taxonomically assigned against the NCBI 16S rRNA database using the BLAST algorithm (https ://blast .ncbi.nlm.nih.gov/).