Air mass source determines airborne microbial diversity at the ocean–atmosphere interface of the Great Barrier Reef marine ecosystem

The atmosphere is the least understood biome on Earth despite its critical role as a microbial transport medium. The influence of surface cover on composition of airborne microbial communities above marine systems is unclear. Here we report evidence for a dynamic microbial presence at the ocean–atmosphere interface of a major marine ecosystem, the Great Barrier Reef, and identify that recent air mass trajectory over an oceanic or continental surface associated with observed shifts in airborne bacterial and fungal diversity. Relative abundance of shared taxa between air and coral microbiomes varied between 2.2 and 8.8% and included those identified as part of the core coral microbiome. We propose that this variable source of atmospheric inputs may in part contribute to the diverse and transient nature of the coral microbiome.


Sample Recovery
Our study took advantage of a persistent flat-calm sea state between 30 th September to 22 nd October 2016 (www.marineweather.net.au) that minimised interference from microorganisms that are aerosolised by marine spray in heavier sea states at the Great Barrier Reef. The voyage of the RV Investigator circumnavigated the reef. We sampled air mass daily from the deck of the research vessel approximately 25m above the sea surface. Massive bulk-phase air volumes (18-108 m 3 per sample) were collected for each discrete sampling station. Overall, we retrieved 53 independent bulk-phase air samples (Supplementary Information, Table S1).
We employed a high-volume liquid impinger apparatus (Coriolis µ, Bertin Technologies, France) and collected aerosol particulate fractions directly into phosphate-buffered saline (PBS) using collection cones supplied by the manufacturer.
Evaporation was compensated for using a peristaltic pump set to between 0.5 and 1.3 mL per minute with PBS. Random collection cones were also assembled into the machine but not activated, and these were used as the negative controls. All collection cones were soaked in 1.5 % sodium hypochlorite (NaClO) then washed with 70 % ethanol and three washes of Milli-Q H20 before being filled with filtered PBS. All sampling equipment was disassembled between locations and cleaned with NaClO, ethanol and Milli-Q H20. Samples were processed immediately on board the vessel.
Back-trajectories of air mass arriving at each sampling interval were generated using the National Oceanic and Atmospheric Administration (NOAA) HYSPLIT-WEB model (https://ready.arl.noaa.gov/HYSPLIT.php) (Fig. S1). Calculations were made with a 3 day back trajectory because this is estimated as the average residence time for microbial cells in air [1] (Fig. S1). We acknowledge that large variation is inherent in models of atmospheric transport and uncertainty increases with duration of the back trajectory estimate. With our model the estimated trajectories greater than two weeks resulted in diffuse origins for all samples from above the Southern Ocean. The HYSPLIT back trajectories were calculated using the GDAS database and the model vertical velocity option. The continental or oceanic origin of air mass was further verified by measuring the concurrent concentration of atmospheric radon gas on board the research vessel in real time by the Australian Nuclear Science and Technology Organisation (ANSTO) (Supplementary Information, Table S1) [2], with the hypothesis that radon levels are typically higher from back trajectories originating over continental Australia compared with those that originate from the ocean. Radon is emitted only from unsaturated, unfrozen terrestrial surfaces and has a half-life of 3.8 days, making it a powerful tracer of continental air transport and mixing.

Environmental DNA sequencing and bioinformatics
Airborne microbial samples suspended in PBS were filtered onto a 25 mm 0.2 µm polycarbonate filter and stored frozen until processed. Total DNA was directly extracted using a CTAB protocol [3]. DNA yield for these ultra-low biomass samples was quantified using the Qubit 2.0 Fluorometer (Invitrogen) (Supplementary Information, Table S1). Samples were then stored at -20 °C until processed. We used DNA yield as an indirect estimate for biomass.
Illumina MiSeq libraries were prepared as per manufacturer's protocol Sequencing data for bacterial 16S rRNA gene and fungal ITS1 amplicons was processed based on the DADA2 v1.8 [9] pipeline. Primers sequences were removed using cutadapt [10] to remove forward (CCTACGGGNGGCWGCAG) and reverse (GACTACHVGGGTATCTAATCC) for 16S rRNA gene, and forward (CTTGGTCATTTAGAGGAAGTAA) and reverse (CTTGGTCATTTAGAGGAAGTAA) for fungal ITS1 region. Bacterial reads were uniformly trimmed to 280 bp (forward) and 250 bp (reverse) and then filtered by removing reads exceeding maximum expected error of 2 for forward reads and 5 for reverse reads or reads containing ambiguity N symbol. Fungal reads were untrimmed (due length hyper-variability of the region) and reads exceeding maximum expected error of 5 for forward reads and 8 for reverse reads, or those with ambiguity N symbols were removed. The reads were used to train the error model and then dereplicated to acquire unique sequences, which were used to infer sequence variants with the trained error model. The forward and reverse reads were merged and chimeric sequences were removed. For bacteria we used amplicon sequence variants (ASVs) to assign taxa since this has been shown as the most robust method currently available for bacterial 16S rRNA gene-defined taxa identification [11].
ASVs were given taxonomic assignment using DADA2 with SILVA nr v132 database [12] for bacterial reads to provide species level assignment based on exact match between ASVs and known reference sequences. . Fungal reads were taxonomically classified using UNITE v7.2 database [13]. All sequence data generated by this study has been submitted to the EMBL European Nucleotide Archive (ENL) under BioProject PRJEB31630 with sample accession numbers ERS3215240 to ERS3215312.
The resulting taxa were then processed as previously described [14] [15]. The R packages phyloseq [16], DESeq2 [17] and ggplot2 [18] were used for downstream analysis and visualisation including ordination and alpha/beta diversity calculations.

Meta-analysis of atmospheric microbiome with coral microbiomes
A meta-analysis of our data with publicly available sequence data from coralassociated microbiomes was conducted. We first searched for relevant recent published studies and constrained our search as follows: The first 20 matches by relevance from the search parameters ["16S" "coral"] and ["16S" "reef"] from 2015 to 2019 with Google Scholar were employed to determine if there was any publicly available next-generation sequencing data that encompassed the compatible 16S rRNA gene region. Based on these criteria, we identified a single publication [20] which allowed for direct ASV analysis. Another dataset with limited overlap was also assessed based on similarity searches of representative sequences to determine shared taxa [21]. Some excellent studies on Great Barrier Reef coral microbiomes could not be included because they have employed the V1-V3 region of the 16S rRNA gene for bacteria and this does not overlap with the V3-V4 region used in our study. We also searched nucleotide databases (NCBI and EMBL) for unpublished but publicly available coral microbiome sequences using the same search criteria. This search revealed three entries in the top 50 matches by relevance but our initial quality checking revealed these were unsuitable for further inclusion in our analysis.
PacBio SMRT sequencing of 16S rRNA gene amplicons from corals of Gulf of Thailand and Andaman Sea was conducted by Pootakham et al., (2017) [20]. The raw reads were retrieved from NCBI Short Read Archive (accession SRR5149735) via NCBI SRA toolkit (https://github.com/ncbi/sra-tools/wiki). As the reads from this study extend beyond those of our study (V3-V4), we were able to process the reads in a manner similar to our study and trim the reads based on the primer regions of our study (341F/785R). This resulted in reads targeting homologous regions for comparison between the two studies. The trimmed reads followed a similar processing pipeline as our study (dada2 for ASV-calling and taxonomic classification, see below), which enabled direct and precise comparison of ASV sequences.
The Wainwright et al., (2019) study [21] conducted paired-end Illumina sequencing of 16S rRNA gene amplicons of the coral Pocillopora acuta around Singapore. The coral study utilised comparable data processing pipeline (specifically dada2 to generate ASV) but used a different set of primers to amplify 16S rRNA gene (V4 vs our V3-V4 region). However, a significant portion of the partial 16S rRNA gene amplicon still overlapped (~250 bp). The ASVs from our study were searched against reference ASVs(https://github.com/gzahn/Pocillopora_Bacteria) from Wainwright et al. (2019) using the usearch v10.0.240_i86linux32 global function [22]. Both strand orientations were searched (-strand both) and a minimum identity threshold (-id) of 0.97 was set. The ASVs were sorted with descending number in relation to mean relative abundance (i.e. ASV1 has highest mean relative abundance).
We also conducted a more relaxed analysis where the 100 most abundant bacterial and fungal taxa at genus level (Table S1-3) were compared to published studies that used different approaches for coral microbiome diversity estimation [23][24][25][26][27][28][29][30][31][32][33][34][35]. Microorganisms unassigned at the Genus taxonomic level were excluded from further analysis. A weighted average of the relative abundance was calculated for each ASV identified in this study, grouping samples with an ocean air mass source, and a continental air mass source. Near full length 16S rRNA gene sequences of coral associated bacteria were searched against our ASV sequences using usearch_local function in usearch v10.0.240 to identify highly similar matches.
For the fungi no comparable sequence data for coral-associated fungi is available. We therefore performed an approximate comparison at the Genus level with reference to published descriptions of coral-associated fungi [23,24,28,29]. There are no bacterial or fungal taxa that have yet been unequivocally been identified as either obligate symbionts [27] or pathogens [36] of corals.

Statistical Treatments
Community phylogenetic metrics including were calculated using the R [37] packages picante [38], ape [39], phylobase [40], adephylo [41], and phytools [42]. We tested partitioning of Bray-Curtis dissimilarity matrices by air mass origins (continent or ocean) through PERMANOVA, using the adonis function from R package vegan 2.5-5 [43] with 999 unrestricted permutations. Phylogenetic trees for community phylogenetic structure analysis were constructed for all OTUs with FastTree v2.1.9 on multiple alignment of sequences produced by MUSCLE v3.8.31. For Bacteria an approximately Maximum-Likelihood approach was used whilst for Fungi an alignmentfree distance approach with Neighbour-Joining method was employed in order to generate a ITS-based phylogenetic tree for community metrics [44]. The distance approach was used for Fungi as the hypervariability of ITS1 loci hinders multiple sequence alignment required for most phylogenetic analyses. To validate this approach, we compared our tree topology with the most recent whole genome phylogenies for the Fungi [45,46]. This approach was robust at higher taxonomic levels (there is no consensus for fungal phylogenies using multiple loci or whole genomes below Order rank) and has been used successfully with other eukaryotic taxa as a workflow for Net Relatedness analysis [47]. Although we acknowledge limitations to this approach, the advantages of using ITS loci for taxonomic identification vastly outweighed its shortcomings, and we also triangulated data from this test with additional analytical approaches, so overall our inclusion of this test is justified and interpreted conservatively. Mean phylogenetic distance (MPD) [48] was calculated to measure phylogenetic distance between ASVs and OTUs in each sample. A null model algorithm based on independent swap (999 randomisations) was used to test the extent of phylogenetically clustering (positive values for the NRI effect size) or over-dispersion (negative values for the NRI effect size) [49].

1.
Burrows     Geographical distance (m) Bray Curtis dissimilarity Figure S6. Relative abundance of bacterial taxa between air and coral sequence libraries for all taxa with >0.05% mean relative abundance. a) direct ASV matches between our study by source and the Pootakham et al coral microbiome [20], b) direct ASV matches that are classified at genus level between our study and the Pootakham et al coral microbiome [20] c) shared ASVs (by ≥97% sequence identity) between our study and the Wainwright et al coral microbiome [21]. . Table S2. Relative abundances of the 100 most abundant bacterial ASV's from genera identified in the literature as components of coral associated microbiomes. ND = not detected. Table S3. Relative abundance of the most abundant fungal ASV's at genus level to genera identified from the literature as components of coral associated microbiomes.