Horizontal gene transfer (HGT) is thought to be an important driving force for microbial evolution and niche adaptation. This process is often driven by mobile genetic elements (MGE), such as transposons, integrons and phages [1,2,3]. It has been reported that genetic exchange between bacteria living in a biofilm is higher than between those in a free-living/planktonic state [4, 5], which is thought to be mainly due to short physical distances between microorganisms as a result of the high cell densities often found in biofilms [6]. For example, Gold and Moellering created an artificial biofilm system, which consisted of a Bacillus subtilis strain carrying a tetracycline resistance gene and a sensitive Staphylococcus species. From this biofilm, Staphylococcus isolates with resistance to tetracycline were recovered and were shown to be carrying the resistance gene from the Bacillus strain [7]. Indeed, biofilms have been extensively studied for their contribution to the horizontal dissemination of antibiotic resistance genes; however, the vast majority of these studies have used artificial, lab-based biofilm set-ups involving mostly only one donor and one recipient bacterium of medical relevance [8]. The true extent to which HGT takes place and what functions are being shared in more complex and natural biofilm systems remains, however, largely unknown.

Marine macroalgae (seaweeds) are naturally covered in dense biofilms and the diversity and function of their associated microbiota is distinct from those of the surrounding seawater [9,10,11]. Macroalgal-associated microbial communities are often enriched in gene functions involved in vitamin synthesis, the detection and attachment to host surfaces, biofilm formation, polysaccharide catabolism and various defence and stress mechanisms [12,13,14,15]. In addition, functions associated with HGT were found to be more abundant in the microbial communities associated with the green macroalga Ulva australis compared to planktonic communities [13]. These observations suggest that HGT might play a role in the distribution of certain functional genes among the members of macroalgae-associated communities. However, beyond these observations, there is no further knowledge on the role of HGT in macroalgae-associated microbial communities.

To address this specific issue and broaden our understanding of HGT in complex, natural biofilm communities, we generated an extensive metagenomic dataset (764 Gbp from 90 samples) for the biofilm community of the kelp (large brown macroalga) Ecklonia radiata. Aside from being a major habitat former in temperate rocky shores [16], this kelp hosts abundant and diverse epiphytic communities, which have an important role in its health and performance [17, 18]. Based on the reconstruction of metagenome-assembled genomes (MAGs) and a newly developed pipeline for HGT detection called MetaCHIP [19], we investigated here the HGT and the enrichment of gene function in this kelp-biofilm community in comparison to planktonic microbiomes derived from the Tara Ocean survey.


Sample collection, DNA extraction and sequencing

Surface swabs of microbial biofilm communities were obtained from Ecklonia radiata individuals as previously described [18]. Briefly, kelp samples were identified based on their unique morphological properties [18], collected and brought to shore, where they were rinsed with sterile seawater to remove loosely-associated (i.e. planktonic) microorganisms. An area of 20 cm2 of a secondary lamina at mid-thallus was then swabbed for 30 s with a sterile cotton swab, which was immediately placed in a sterile tube in liquid nitrogen. Kelp swabs from n = 3 individuals were collected ~bi-monthly from four locations (Salmon Haul, Cobblers Beach, Bare Island and Balls Head) in Sydney, Australia, from April 2016 to June 2017 in order to cover spatial and temperal variability in the biofilm community (Table S1). Genomic DNA was extracted from these swabs and sequenced on an Illumina HiSeq 2500 sequencer according to standard protocols developed by the Australian Microbiome Initiative ( A total of 90 kelp swabs were analysed yielding 764 Gbp of raw sequencing data. Sequencing data are available through the Australian Microbiome data portal ( with the accession numbers given in Table S1.

Metagenomic assembly, binning and dereplication

Raw sequencing reads of the kelp-biofilm communities, were quality trimmed with Trimmomatic v0.36 [20] with parameters specified as ʻHEADCROP:15 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:100’ and subsequently assembled with metaSPAdes v3.11.1 [21] with default parameters. Scaffolds smaller than 2500 bp were removed. The coverage of the remaining scaffolds was determined by mapping of the quality-filtered reads using bowtie v2.3.2 [22]. After converting and sorting the mapping format using samtools [23], the coverage was determined with the jgi_summarize_bam_contig_depths script [24]. MAGs were generated using MetaBAT v0.32.5 [24] with default parameters and MyCC [25] with the 56 kmer option and subsequently refined using the Binning_refiner v1.2 [26] with default parameters. MAGs were directly taken from the TARA Oceans project, which covers a substantial fraction of the spatial and temporal variation for the planktonic microbial communities of the world’s ocean, including temperate waters in the Southern Pacific Ocean [27]. The completeness and contamination of MAGs were determined with CheckM v1.0.7 using the lineage_wf workflow [28]. Only MAGs with completeness greater than 50% and contamination lower than 5% were kept for further analysis. These MAGs were then separately dereplicated for the two community types (i.e. kelp-biofilm and plankton) at 99% average nucleotide identity using dRep v2.3.2 [29]. The actual genome size was estimated by dividing the size of the MAG by the estimated completeness.

Taxonomic classification, functional enrichment analysis and HGT detection

Taxonomic classification of MAGs was performed with GTDB-Tk v0.1.3 [30]. Protein sequences were predicted with Prodigal [31] using the-meta mode A phylogenetic tree for the dereplicated MAGs was constructed using the protein sequences of 43 universal single-copy genes [19, 28]. Functional annotation of the MAGs was performed by running similarity searches with Diamond v0.9.24 [32] at an e value threshold of <0.001 against the Cluster of Orthologous Groups of proteins (COG, December 2014 release) [33, 34] and the Kyoto Encyclopaedia of Genes and Genomes Orthology (KEGG) [35] databases as well as by running dbCAN ( [36] against the CAZy database (July 31th, 2019 release) [37].

Functions that were enriched in the MAGs in either community type were identified using Mann–Whitney U tests followed by a Bonferroni correction with a p value cut-off of 0.05 being considered significant. Only significantly different functions with greater than twofold mean differences were considered to be enriched. Functions detected only in the MAGs from one community type were considered to be enriched if they existed in at least 50% of the MAGs.

HGT among MAGs was detected with MetaCHIP v1.2.0 [19] at phylum, class, order, family and genus levels. MetaCHIP uses best-hit analysis between defined phylogenetic/taxonomic groups to identify candidate genes for HGT, which are subsequently further refined using phylogenetic analysis and the reconciliation of species and gene trees. This gives high confidence in the prediction and allows to determine the direction of transfer. HGTs detected at all taxonomic levels were combined and then dereplicated. Horizontally transferred functions were considered to be enriched if their relative proportions in the HGTs dataset were above the 75% percentile of the relative abundance across the entire dataset.

Results and discussion

Phylogenetic composition and genome size analysis of MAGs

After quality filtering and dereplication, 300 and 730 MAGs were obtained from the kelp-biofilm and planktonic communities, respectively. Taxonomic classification revealed that they are spanning 11 and 31 phyla, respectively. Kelp-biofilm MAGs were most frequently assigned to the phyla Proteobacteria (48.0%), Planctomycetota (22.3%), Bacteroidota (12.7%) and Verrucomicrobiota (7.7%) (Fig. 1), consistent with previous analysis on bacterial communities associated with E. radiata using 16S rRNA gene sequencing and gene-centric metagenomics [14, 18]. Proteobacteria (43.2%) and Bacteroidota (13.6%) were the two most frequent bacterial phyla for the planktonic MAGs. The proportion of Planctomycetota for the planktonic MAGs (4.8%) was lower than for the kelp-biofilm MAGs (22.3%) (Fig. 1), consistent with previous observation that this phylum is often enriched on macroalgae [9, 38,39,40].

Fig. 1: Phylogenetic relationship of the 1030 MAGs based on the protein sequences of 43 universal single-copy genes.
figure 1

Internal branches of the tree are coloured by phylum. The inner ring displays MAG’s lifestyle (kelp-biofilm or planktonic), while the outer ring shows the estimated genome size (in Mbp).

The average estimated genome size for kelp-biofilm and planktonic MAGs were 3.77 ± 1.47 and 2.62 ± 1.28 Mbp, respectively (Fig. 1), which was statistically different (t = 12.52, df = 1028, p = 1.43e−33). Genome size of planktonic prokaryotes have previously been found to positively correlate with cell sizes [41], with a smaller cell size benefitting nutrients acquisition due to the increased surface-to-volume ratio [42]. Marine planktonic bacteria have been proposed to reduce their genome size in response to metabolic constraints in nutrient-limited seawater [43]. A collateral of such a conclusion is also that the relatively larger genome size of kelp-biofilm bacteria would indicate that nutrient limitation is not driving a genome reduction, which is consistent with measurements of relatively high nutrient concentrations on the surface or surroundings of seaweeds [44, 45].

Enrichment of functions in kelp-biofilm MAGs

We have previously used gene-centric metagenomics to analyse the functionality of entire microbial communities on different types of host surfaces (e.g. macroalgae, seagrasses) and found an enrichment of functions related to host-specific properties or a surface-associated lifestyle [13, 14]. Here we extended this analysis and investigated which specific functions were enriched on the individual genome-level across the communities associated with E. radiata when compared to genomes found in seawater (Table S2 and Fig. 2). We found 164 out of 4350 COG functions, 180 out of 6582 KEGG functions and 27 out of 261 CAZy function to be enriched in the kelp-biofilm genomes when compared to the seawater genomes. Function in the COG categories transcription (K), signal transduction mechanisms (T), carbohydrate transport and metabolism (G), cell wall/membrane/envelope biogenesis (M) and inorganic transport and metabolism (P) were most prevalent in the enriched function of the kelp-biofilm MAGs.

Fig. 2: Enrichment of functional categories in kelp-biofilm bacteria.
figure 2

The number of COG functions aggregated by category that are enriched in MAGs from kelp-biofilms relative to planktonic communities.

Specifically, the most enriched functions in the kelp-biofilm MAGs include pilus assembly and adherence proteins (e.g. COG4961, COG4965, COG4964, K02651, K02279, K02280, K02282, K12510, K12511, K02278, K02283; all >4× enrichment), which speaks to the need for bacteria to be able to attach to the macroalgal surface [11]. Gene functions for conductance mechanosensitive channels (COG1970, K03282, K03442) and aquaporins (K06188), which regulate water and osmolyte flow across the cytoplasmic and outer membrane, respectively [46, 47], were also enriched and these might play a role in osmoadaptation within the steep solute gradients (i.e. diffusion boundary layer) found on algal surfaces [48, 49]. In addition, the enrichment for a number of transcriptional regulators (e.g. COG3682, COG1695, COG1510, COG3655, COG3279, K16137, K05800, K10778) and signal transducers (e.g. COG0515, COG2972) indicate the need to regulate physiological response to environmental conditions, which has also been seen in community-level analysis of macroalgal microbiomes [13, 14]. Kelp-biofilm MAGs were furthermore enriched in enzymes involved in the degradation of fucoidans (CAZy CBM47, GH29, GH95), alginates (PL6, PL7, PL14, PL15, PL17) and xylans (CE3, CE6) (Table S2), which are common carbohydrate polymers in brown algae [50, 51].

Mapping of these function across the 300 kelp-biofilm MAGs revealed that gene functions for pilus assembly and adherence were virtually absent in the Verrucomicrobiota, but consistently present and abundant in MAGs assigned to the Actionobacteriota and the classes Planctomycetes and Alphaproteobacteria. In contrast, conductance mechanosensitive channels and aquaporins appear to be more evenly distributed across the entire phylogenetic breadth, but with a perhaps patchier distribution in MAGs of the Alphaproteobacteria. Enriched transcriptional regulators and signal transducer function were also evenly distributed across all taxa, but appeared in particular high relative abundances in members of the class Planctomycetes. Finally, gene functions associated with carbohydrate degradation were found in representatives of all major phyla and classes, but genes involved in alginate degradation were rarely found among the Verrucomicrobiota and Actinobacteriota MAGs, while those involved in degradation of fucoidans and xylan were less abundant in MAGs from the class Alphaproteobacteria (Fig. 3). Together this illustrates how certain enriched functions exist across the full taxonomic breadth of our dataset, while other enriched functionality seemed to be performed by particular taxa. The former would provide a degree of functional redundancy within the biofilm communities [13], while the latter would allow for niche differentiation of some taxa.

Fig. 3: Distribution of kelp-biofilm enriched functions across the phylogenetic and taxonomic diversity.
figure 3

Branches are coloured by phylum-level taxonomy, while the two inner-most rings show class and order taxonomies. Square root of relative abundances for enriched gene function are shown as heat map rings and are based on aggregated values for the annotation ID given in the text.

Functional analysis of horizonal gene transfer

Given the clear selection of specific functions in bacteria associated with the biofilms of E. radiata and their distribution across community members, we next investigated how HGT contributes to the functional gene profiles of their genomes. Our HGT detection algorithm MetaCHIP identified 629 and 2085 transfers from the 300 kelp-biofilm and 730 planktonic MAGs (or an average 2.10 ± 5.09 and 2.86 ± 5.70 transfers per MAG), respectively. 597 of the genes transferred between kelp-biofilm MAGs were annotated to 360 unique COG functions, while 426 genes could be assigned to 295 unique KEGG functions. Five HGT in the kelp-derived dataset were assigned to unique CAZy functions. Functional annotation against the COG database revealed that functions belonging to the carbohydrate (G), lipid (I) and secondary metabolites (Q) transport and metabolism were preferentially subject to HGT between the kelp-biofilm MAGs (Fig. 4). COG categories for metabolism processes, energy production and conversion (C); amino acid (E) and inorganic ion (P) transport and metabolism were enriched for HGTs in both the kelp-biofilm and planktonic MAGs (Fig. 4). Table 1 shows those specific functions subjected to HGT that are further discussed here (see Table S3 for a complete listing).

Fig. 4: Enrichment analysis of HGTs in kelp-biofilm and planktonic MAGs.
figure 4

The boxes in the plot represent the relative abundances of COG categories in the entire MAG dataset and are bound by the 25–75% quartile proportions with the thick line being the median value. Q1, Q3 and IQR refer to the 25%, 75% and interquartile range, respectively. The upper whisker refers to the largest observation less than or equal to upper Q3 + 1.5 × IQR, while the lower whisker refers to the smallest observation greater than or equal to Q1 − 1.5 × IQR. Symbols show the relative abundance of COG categories subject to HGT and are highlighted, if enriched (red triangle), depleted (blue triangle) or neither (grey square).

Table 1 Horizontally transferred gene functions in bacteria associated with kelp-biofilms.

The enrichment of transport functions of COG categories G, I and Q was reflected in the high number of HGT assigned to KEGG pathways C_02000 (Transport) and C_02010 (ABC transporters) (Table S4). Specific functions identified cover the predicted transport of the metabolites glycerol-3-phosphate (permease component of ABC transporter, COG1653), sugars (permease component of ABC transporter, COG1175 and K02027), peptides or nickel (ABC transporter, COG0601, K02035, K02032, KO2033, K02034), branched amino acids (ABC transporter, COG0410, K01996, K01998, K01997, K01995, K1999), mannitol (periplasmic compound of the TRAP-type transporter, COG4663) and ammonium (K03320, COG0004) (Table 1). Sugars, including mannitol, as well as proteinaceous material are abundant and prevalent constituents of the extracellular matrix of brown algae [52, 53] and there might be a selective advantage for bacteria to be able to efficiently take up these as nutrients or energy sources. In contrast, brown algae have so far not been noted to excrete excess ammonium, but instead might rely to some extent on ammonium released by invertebrates associated with their surface [54]. This could indicate that the surface-associated biofilm is indeed nitrogen limited, making the need for efficient ammonium uptake by bacteria even more important.

Utilisation of specific sugars might further be facilitated by acetyl xylan esterases (CAZy ID CE6) that were shared within the bacterial community and were also enriched in the kelp-biofilm bacteria (see above). Related to polymer degradation are also the frequently transferred arylsulfatases (COG3119), which cleave ether linkages between sulfates and phenolic hydroxyl groups [55]. Brown algae contain and secrete large amounts of phlorotannins, which have broad antagonistic activities (including against bacteria) and are composed of hydroxylated and sometime sulfated poly-benzens linked by ether bonds [56, 57]. Degradation of these phlorotannins could initially involve the attack of the sulfate group through arylsulfatases. Further degradation of phlorotannins could then be performed by the frequently shared genes for catechol 2,3-dioxygenase (COG0346), which is a key enzyme in the phenol degradation pathway [58], as well as the transferred genes for the subunit B, C and E of the ring-1,2-phenylacetyl-CoA epoxidase (K02610, K02611, K02613), the phenylacetate-CoA ligase (K01912) and the oxepin-CoA hydrolase (K02618) (Table 1), which represent the degradation pathway of phenylacetate, a major intermediate in the bacterial catabolism of many aromatic compounds [59]. Interestingly, the genes for phenylacetate degradation were also enriched in the kelp-biofilm MAGs compared to the planktonic ones. This indicates a particular advantage for kelp-biofilm bacteria to share these genes and thus be able to degrade specific, host-derived and antagonistic metabolites.

Bacterial biofilm communities on E. radiata also frequently shared genes involved in oxidative stress response, and in particular the genes for a alkylhydroperoxidase family protein (COG2128) and for a bifunctional enzyme with both catalase and broad-spectrum peroxidase activity (KatG, COG0376, K03782). Alkylhydroperoxidases and KatG have been shown to protect Mycobacterium tuberculosis against toxic reactive oxygen species (ROS), including hydrogen peroxide and organic peroxides, which are produced by macrophages as part of an innate immune response [60, 61]. Seaweeds, including brown algae, also produce ROS in response to pathogen or grazer attacks (so called ʻoxidative bursts’) [62] and algae-associated bacteria need to defend themselves against this defence mechanism in order to ensure their survival [63]. Genes for alkylhydroperoxidase family proteins (COG2128) were also significantly more abundant in the kelp-biofilm MAGs compared to the planktonic MAGs. Sharing as well enriching this function could facilitate the maintenance of stable host-associated communities by providing protection against a ROS-based defence of the macroalgae. Related to stress response are also the frequently shared genes for MoxR-like ATPases (COG0714), which have been described to have an important function in the cell envelope stress response in Rhizobium leguminosarum and well as in the oxidative stress responses in Francisella tularensis [64].

We also noted that kelp-biofilm MAGs frequently shared ribosomal proteins, including S10 (COG0051), S11 (COG0100) and S12 (COG0048) (COG category J) (Table 1). It has been proposed that ribosomal proteins are less likely to be horizontally transferred and therefore their sequences have been widely used for phylogenetic inferences [28, 65]. Interestingly, Jeong et al. have also recently found that some ribosomal proteins (including S10, S11 and S12) are subject to HGT in human-associated microorganisms [66]. While the exact biological advantage for HGT for these conserved ribosomal proteins is unclear, these two independent observations should be considered when using ribosomal proteins for phylogenetic analysis.

Mode of HGT in the biofilm community

The HGTs on the kelp-biofilms were dominated by transfers between member within the same class or order (Fig. 5), in accordance with previous findings that showed HGT are more frequent among closely related organisms [67,68,69,70]. The transfer of functions related to sugar and phlorotannins degradation were found to occur within most orders (and occasionally also between them), which indicates their general importance for being acquired and present in all community members. The same applies for the transport function, however, members of the phylum Planctomycetota and the orders Caulobacterales, UBA10353 and Pseudomonadales seem not to share them (Fig. 5).

Fig. 5: Visualisation of gene flow among phyla, classes and orders for different functional groups.
figure 5

The inner ring shows donors gene function (coloured) and recipients (grey) and bands connect the various taxa, with the band width correlating to the number of HGTs. The outer three rings are coloured by MAG taxonomy going from phylum, class to order from the outer to inner rings.

Only three out of the 629 HGTs (0.48%) in the kelp-biofilm MAGs were related to MGE (2× COG1943: REP element-mobilising transposase RayT and 1× COG3436: transposase) (Table S3). This contrasts a recent study on bacterial and archaeal symbionts of sponges using the same methodology as here (Robbins et al. [71], in review), where the 4963 HGTs detected contained 152 genes (or 3.06%) that were assigned to integrases (COG0582, COG2452), 156 gene (or 3.14%) were annotated as transposases (COG3547, COG2963, COG2963, COG4584, COG3436, COG3415, COG3293, COG5421, COG0675, COG2801, COG3039, COG3464, COG2826, COG3385, COG3666, COG3316, COG3676, COG1662, COG3328, COG5433) and 32 gene (0.64%) had annotations related to phages (COG4679, COG4422, COG3654, COG4626, COG5004, COG2369, COG3600, COG3617, COG3728, COG3772, COG4387, COG4397). This indicates that HGT in the kelp-biofilm bacteria is likely not actively driven by MGE, but might rather involve the unspecific uptake (e.g. natural transformation) of foreign DNA, which is amply present in the biofilm matrix [72, 73].

Only 22 of the 164 COG functions, 25 of the 180 KEGG functions and 1 out of 27 CAZy functions found to be enriched in kelp-biofilm MAGs were also subject to HGT. While these low overlap between the enriched-genes and the HGT datasets might be due to limited sensitivity of detecting HGTs in metagenomic datasets, and in particular for more recent transfers [19], one might have nevertheless reasonably expected that gene functions with frequently observed HGTs (e.g. such as arylsulfatase A (COG3119), Table S3) would have also been detected in the enrichment analysis. This indicates that much of the functionality enriched and required for bacteria to live in an E. radiata biofilm might be derived from vertical transmission processes and not from HGT-driven evolution, as one might have predicted from laboratory-based biofilm studies (see Introduction).

Interestingly, this scenario stands in contrast to our recent observation for bacterial and archaeal symbionts of sponges, where a much higher overlap was found between the gene functions subject to HGT and those that were enriched in comparison to planktonic microorganisms (Table 2) (Robbins et al. [71], in review). This indicates that bacteria and archaea in these two host-associated systems have likely undergone different evolutionary scenarios, with the evolution of sponge-associated genomes being more influenced by HGT than those from kelp-biofilm genomes. One underlying reason might be the different modes of symbiont transmission between host generations in these two systems. Many examples of vertical transmission have been described for sponges [74,75,76,77,78,79], while evidence for the same occurring in seaweeds is so far lacking and thus the assembly of macroalgae-associated communities being more likely driven by environmental acquisition (i.e. horizontal transmission) of symbionts [11]. Vertical transmission would favour HGT between symbionts to generate genomic diversity and distribute critical functions between members of a sponge-associated community, while environmental acquisition of symbionts on marine macroalgal surfaces could select the required community functionality from the vast microbial diversity of the surrounding seawater [27]. Future comparison of HGT and functional enrichment across multiple systems will likely shed further light on how evolutionary and ecological processes drive community assembly and genomic diversity of host-associated and biofilm communities

Table 2 The number and proportion of HGT gene function that are also enriched in the kelp- and sponge-associated MAGs.