Introduction

Marine microbial eukaryotes are major contributors to nutrient cycling and photosynthesis, responsible for a sizable proportion of the global primary production (Field et al., 1998; Worden et al., 2015). A subset of these organisms produces toxins involved in harmful algal blooms, which have major impacts on ecosystem functioning and economic impacts on the aquaculture and fisheries industries (Hallegraeff, 1993, 2010 and references therein). Despite their importance, comparatively little is known regarding key biosynthetic pathways in protists (Kalaitzis et al., 2010). In all organisms studied to date, fatty acid synthases (FASs) and polyketide synthases (PKSs) are closely related and have a common evolutionary history (Jenke-Kodama et al., 2005). FASs and PKSs share a similar enzymatic domain structure in which the acyl transferase (AT), ketosynthase (KS) and acyl carrier protein (ACP) domains form the core structure for condensation of acyl units and are essential for both PKSs and FASs. The other domains, ketoreductase (KR), enoyl reductase (ER) and dehydratase (DH), modify the acyl units after condensation; they are essential for FASs but selectively present or absent in PKSs (Cane et al., 1998; Khosla et al., 1999; Jenke-Kodama et al., 2005).

Type I (modular) PKSs consist of a single protein containing all catalytic domains, which are used in a progressive fashion for chain elongation until the thioesterase domain releases the finished polyketide, analogous to FASs in animals and fungi (Khosla et al., 1999; Jenke-Kodama et al., 2005). Type II PKSs carry each catalytic domain on separate polypeptides (mono-functional proteins) that form multiprotein complexes, analogous to type II FASs in bacteria and plants (McFadden, 1999; Jenke-Kodama et al., 2005). Type III PKSs are self-contained homodimeric enzymes in which each monomer performs a specific function; they are found in plants, brown algae, bacteria and fungi (Khosla et al., 1999; Jenke-Kodama et al., 2005; Cock et al., 2010). In plants, FAS genes are encoded in the nucleus and the proteins are targeted towards the chloroplast, where fatty acid synthesis occurs (McFadden, 1999). Gene identification/cloning and functional characterisation of all the FAS enzymes have been carried out in higher plants and bacteria (White et al., 2005; Brown et al., 2010 and references therein).

In the unicellular chlorophyte Chlamydomonas, fatty acid synthesis is thought to be carried out in the chloroplast stroma via a type II FAS, which was characterised by identifying the genes encoding type II FAS enzymes (Riekhof and Benning, 2009 and references therein). Fatty acid desaturation takes place via multiple desaturases, as the majority of fatty acids in Chlamydomonas are unsaturated (Riekhof and Benning, 2009). In Apicomplexa, an exceptionally large type I FAS has been characterised from Cryptosporidium parvum (Zhu et al., 2000). In contrast, genes encoding some type II FAS enzymes have been identified in Toxoplasma gondii and Plasmodium falciparum (Waller et al., 1998), with no type I FAS genes found in these organisms (Gardner et al., 2002). The mechanism and genetic basis for fatty acid synthesis remain largely unknown in many eukaryote lineages (Ryall et al., 2003; Armbrust et al., 2004).

Type I PKS genes are known from only a handful of Apicomplexa, haptophytes, chlorophytes and dinoflagellates (Bachvaroff and Place, 2008; John et al., 2008; Monroe and Van Dolah, 2008; Place, 2008; Eichholz et al., 2012; Murray et al., 2012; Salcedo et al., 2012; Pawlowiez et al., 2014; Meyer et al., 2015). Recent transcriptome surveys have demonstrated the possibility that protists may encode a massive diversity of PKS genes (Pawlowiez et al., 2014; Kohli et al., 2015; Meyer et al., 2015). Studies of the natural products produced by some protists continue to identify new polyketide compounds at a rapid rate. This indicates that our current knowledge about the genetic basis of PKS in protists is highly incomplete (Pawlowiez et al., 2014; Meyer et al., 2015).

The first comprehensive genetic information on 210 marine microbial genera (305 unique species, 396 unique strains, 678 transcriptomes), encompassing most of the major lineages of eukaryotes, has recently been generated by the Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP) (Keeling et al., 2014). Using this vast data resource, the major aims of this study were to (i) identify the genetic basis of FAS and PKS synthesis in the major lineages of eukaryotes and (ii) infer the constraints and processes in the evolutionary history of polyketide and fatty acid synthesis in eukaryotes.

Materials and methods

RNA extraction and construction of transcriptomic libraries

Alexandrium margalefi CS322 and Gambierdiscus australes CAWD149 were cultured at 18 and 25 °C, respectively, in f/2 medium, under cool white fluorescent light at a light intensity of 60 μmol m−2 s−1 and a 12:12 light:dark cycle. RNA was first extracted via TriReagent (Life Technologies, Carlsbad, CA, USA), then purified using the RNeasy Plant mini kit (Qiagen, Limberg, Netherlands) and residual DNA removed via the TURBO DNA-free Kit (Life Technologies) according to the manufacturer’s protocols. Total RNA was submitted to MMETSP for sequencing. Procedures used by MMETSP to generate transcriptomic libraries have been described in detail in Keeling et al. (2014). Underlying culturing conditions, environmental and experimental metadata for all the other MMETSP libraries used in this study are described in Supplementary Table S5.

Identification of FAS and PKS genes

Transcriptomic libraries representing 213 strains and 152 genera were obtained from MMETSP and other studies (Supplementary Table S1). For strains where multiple transcriptomic libraries were constructed (e.g. four libraries were constructed for Alexandrium monilatum, each from a culture grown under a different physiological stress), a combined assembly of data generated from all the transcriptomic libraries was provided by MMETSP to maximise transcriptomic coverage. In this study we used combined libraries where available. All the MMETSP assembled transcriptomes can be accessed from http://data.imicrobe.us/project/view/104. Transcriptomic libraries for Alexandrium fundyense (Wisecaver et al., 2013), A. pacificum (Hackett et al., 2013), A. tamarense (Hackett et al., 2013), Gambierdiscus belizeanus (Kohli et al., 2015) and Symbiodinium sp. CassKB8 (Bayer et al., 2012) were obtained from GenBank; culturing conditions and metadata can be obtained from their respective references. Sequences for T. gondii, Neospora caninum, Eimeria falciformis and Eimeria tenella were obtained from ToxoDB (Gajria et al., 2008) using the BLAST tool and Azadinium spinosum sequences as a query. The Emiliania huxleyi CCMP1516 reference genome and transcriptome were obtained from the JGI genome portal (Read et al., 2013). Genes encoding type I PKSs and type II FASs were identified using HMMER (Finn et al., 2011; Supplementary Tables S1 and S3), for which in-house HMM databases were developed for each enzyme investigated in this study. Separate HMM profiles for each domain involved in type I PKSs and type II FASs were developed so that the profiles could recognise multiple domains on a single transcript. The presence of a type I FAS enzyme in all the transcriptomic libraries was recorded only when the transcript encoded the full domain. For type I PKS enzymes, the presence of both partial and full domains was recorded (Supplementary Table S3). However, the completeness of the transcripts encoding full/partial domains could not be determined. Functional prediction and identification of conserved active site amino acid residues in the transcripts were carried out using CDD (Marchler-Bauer et al., 2015) and Pfam (Punta et al., 2012). Transit peptides targeting the chloroplast were detected using ChloroP (Emanuelsson et al., 1999). Geneious software was used for sorting all these sequences (Kearse et al., 2012).
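As an illustration of how per-domain HMM profiles can recognise several domains on one transcript, the following minimal sketch groups hits from an hmmsearch per-domain table (--domtblout) by transcript and labels each hit as full or partial. It is not the pipeline used in this study: the E-value cut-off, the 90% profile-coverage rule for calling a domain "full", and the toy contig and profile names are all assumptions made for the example.

```python
from collections import defaultdict

def parse_domtblout(lines, max_ievalue=1e-5, full_cov=0.9):
    """Group per-domain HMMER hits by transcript.

    Returns {transcript: [(profile, ali_from, ali_to, status), ...]},
    ordered by position on the transcript. The thresholds are
    hypothetical defaults, not values taken from the study.
    """
    hits = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/footer comments
        f = line.split()
        if len(f) < 22:
            continue  # not a complete domtblout record
        target, profile = f[0], f[3]
        profile_len = int(f[5])
        i_evalue = float(f[12])              # independent E-value of this domain
        hmm_from, hmm_to = int(f[15]), int(f[16])
        ali_from, ali_to = int(f[17]), int(f[18])
        if i_evalue > max_ievalue:
            continue
        # Fraction of the profile HMM covered by the hit
        coverage = (hmm_to - hmm_from + 1) / profile_len
        status = "full" if coverage >= full_cov else "partial"
        hits[target].append((profile, ali_from, ali_to, status))
    return {t: sorted(d, key=lambda h: h[1]) for t, d in hits.items()}

# Toy records in domtblout column order (invented values)
EXAMPLE = [
    "# target name  accession  tlen  query name ...",
    "contig_1 - 900 KS - 400 1e-80 250.0 0.1 1 1 1e-82 1e-80 249.5 0.1 1 400 10 410 8 412 0.98 -",
    "contig_1 - 900 KR - 180 1e-30 100.0 0.0 1 1 1e-32 1e-30 99.0 0.0 20 110 500 590 498 600 0.95 -",
    "contig_2 - 300 KS - 400 0.5 10.0 0.0 1 1 0.4 0.5 9.0 0.0 1 50 5 55 1 60 0.70 -",
]
domains = parse_domtblout(EXAMPLE)
```

In this toy input, contig_1 carries two domains (KS and KR), whereas the weak hit on contig_2 is discarded by the E-value filter.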

Phylogenetic analysis

MAFFT (Katoh et al., 2002) and ClustalW (Thompson et al., 1994) were used to align the protein sequences from different data sets. The alignments were manually trimmed to ensure they spanned the same coding region of each enzyme, and maximum likelihood phylogenetic analysis was carried out using RAxML with 1000 bootstraps, using the LG substitution model with GAMMA-distributed rate heterogeneity (Stamatakis, 2006). Details of each alignment and phylogenetic tree (newick format) used in this study are listed in Supplementary Table S6, and sequences used to generate a concatenated alignment of type II FAS genes for Figure 3 are listed in Supplementary Table S7. Phylogenetic trees were visualised using iTOL (Letunic and Bork, 2011) and MEGA version 6 (Tamura et al., 2013).
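A concatenated alignment of this kind can be assembled as in the minimal sketch below, which joins per-enzyme alignments taxon by taxon and pads a taxon that lacks a given enzyme with gaps. The function name, the gap-padding choice and the toy sequences are assumptions for illustration; the text does not specify how missing enzymes were handled.

```python
def concatenate_alignments(alignments, gap="-"):
    """Concatenate per-gene alignments into one supermatrix.

    alignments: list of {taxon: aligned_sequence} dicts, one per gene.
    Taxa missing from a gene are padded with gap characters
    (an assumption of this sketch).
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    parts = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, gap * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}

# Toy alignments for two enzymes (invented sequences)
fabH = {"taxonA": "MK-L", "taxonB": "MKAL"}
fabD = {"taxonA": "GGS", "taxonC": "GAS"}
supermatrix = concatenate_alignments([fabH, fabD])
```

Every taxon in the result has the same total length (here 7 characters), as a phylogenetic program expects of its input alignment.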

Results and discussion

Fatty acid biosynthesis in protists

Transcriptomic libraries representing 213 strains and 152 genera were screened for seven key enzymes, that is, 3-ketoacyl ACP synthase I, II and III (KASI-FabB, KASII-FabF, KASIII-FabH), ACP S-malonyltransacylase (AT-FabD), trans3-ketoacyl ACP reductase (KR-FabG), 3-hydroxyacyl-ACP dehydratase (DH-FabZ) and enoyl-ACP reductase (ER-FabI), involved in type II fatty acid synthesis. The presence of six of the seven genes (all except KASI-FabB) was confirmed in all phototrophic lineages of alveolates (dinoflagellates, apicomplexa, Vitrella, Chromera), stramenopiles (diatoms, bolidophytes, chrysophytes, pelagophytes, raphidophytes, synurophytes, dictyochophytes, pinguiophytes, xanthophytes), Rhizaria (chlorarachniophytes and Haplosporidia), Viridiplantae (chlorophyceans, prasinophytes, trebouxiophytes), excavates (euglenids, only three enzymes in Eutreptiella), cryptophytes and haptophytes (Table 1 and Supplementary Table S1). KASI-FabB was completely absent in Rhizaria and Viridiplantae, and selectively present/absent in all other phototrophic lineages examined (Supplementary Table S1). The presence of all seven type II FAS enzymes was also confirmed in the reference genome of E. huxleyi (Supplementary Table S2).

Table 1 List of organisms screened for fatty acid synthesis enzymes in marine microbial eukaryotes

Unique among the phototrophic lineages, the glaucophytes did not possess any of the type II FAS enzymes we screened for. It is noteworthy that the two glaucophyte genera screened here belonged to freshwater habitats. The absence of FAS genes may indicate insufficient depth of sequencing or that the FAS genes were not being expressed at the time of analysis. Detailed screening of other genera would shed more light on fatty acid synthesis in glaucophytes.

Previously, only two genes involved in type II FAS synthesis were known from a limited number of phototrophic lineages: haptophyte Prymnesium parvum, synurophyte Mallomonas rasilis, bacillariophyte Phaeodactylum tricornutum and Thalassiosira pseudonana, oomycete heterokont Thraustotheca clavata and cryptophytes Guillardia theta and Hemiselmis virescens (Ryall et al., 2003; Armbrust et al., 2004).

Among the heterotrophic lineages, we detected the presence of genes coding type II FAS enzymes in the dinoflagellate Oxyrrhis marina (DH-FabZ and KR-FabG) and the opisthokont choanoflagellate Acanthoeca-like sp. (KASI-FabB, KASIII-FabH and KR-FabG), possibly related to secondary acquisition of these genes from a prey item (Heterosigma akashiwo in the case of O. marina), as supported by their phylogenetic position (Supplementary Table S1 and Supplementary Figures S1D and S2). The heterotrophic lineages of alveolates (ciliates), Rhizaria (Foraminifera), stramenopiles (bicosoecids, labyrinthulids-thraustochytrids and chrysophytes), Amoebozoa (tubulinids and dactylopodids), excavates (kinetoplastids) and Palpitomonas did not possess any genes coding type II FAS enzymes (Table 1 and Supplementary Table S1). This suggests that these organisms obtain fatty acids from their diet and/or have a different FAS pathway.

The amino acid residues comprising the active sites of all seven enzymes have been elucidated previously in plants and bacteria, and mutations at these sites abolish the function of the respective enzymes (White et al., 2005; Brown et al., 2010 and references therein). We found these active site residues to be highly conserved in protists (Figure 1), suggesting that these FAS genes are likely to be functional.

Figure 1

Conserved active sites in key fatty acid synthase enzymes in eukaryotes: 3-hydroxyacyl-ACP dehydratase (a), enoyl-ACP reductase (b), 3-ketoacyl ACP reductase (c), S-malonyltransacylase (d), 3-ketoacyl ACP synthase II (e), 3-ketoacyl ACP synthase III (f) and 3-ketoacyl ACP synthase I (g). Active site residues are highlighted in black boxes and numbers above residues are according to the Azadinium spinosum sequences except for Alexandrium monilatum sequence in (g).

In type II FAS, KASIII initiates the condensation reaction, while fatty acid chain elongation is carried out by KASI or KASII. Depending on the length of the fatty acid being produced and on substrate specificity, different types of KASI and KASII are present in plant type II FAS systems and are encoded by different gene families (Kunst et al., 1992; Millar and Kunst, 1997; Millar et al., 1999; Fiebig et al., 2000; Dunn et al., 2004; White et al., 2005; Brown et al., 2010). Here, KASII was confirmed in all phototrophic lineages of protists (Supplementary Table S1); however, KASI was absent in Rhizaria, Viridiplantae, raphidophytes, synurophytes, pinguiophytes, xanthophytes, Vitrella and Chromera (Supplementary Table S1). Our results show the presence of at least six different gene families that encode KASII (Supplementary Figure S1C), suggesting the production of different types of fatty acids. The active site residues Cys-His-His in KASI and KASII and Cys-His-Asn in KASIII found in higher plants and bacteria (White et al., 2005 and references therein) are conserved in protists (Figure 1).
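Conservation of an active-site triad such as Cys-His-His can be checked programmatically once sequences are aligned to a reference with known residue numbering. The sketch below is a minimal illustration rather than the screening procedure used here: the function names, the reference positions and the toy sequences are hypothetical.

```python
def column_of(ref_aligned, ref_pos, gap="-"):
    """Return the 0-based alignment column of the ref_pos-th
    (1-based, ungapped) residue of the reference sequence."""
    seen = 0
    for col, ch in enumerate(ref_aligned):
        if ch != gap:
            seen += 1
            if seen == ref_pos:
                return col
    raise ValueError("reference position lies beyond the sequence end")

def active_site_conserved(alignment, ref_name, ref_positions, expected):
    """Report, for each aligned sequence, whether the residues at the
    reference active-site positions match the expected string."""
    cols = [column_of(alignment[ref_name], p) for p in ref_positions]
    return {
        name: "".join(seq[c] for c in cols) == expected
        for name, seq in alignment.items()
    }

# Toy alignment (invented); the triad sits at reference residues 2, 4 and 6
toy = {
    "reference": "MC-AHGH",
    "seq1": "MCSAHGH",   # Cys-His-His intact
    "seq2": "MS-AHGN",   # Cys and final His replaced
}
result = active_site_conserved(toy, "reference", (2, 4, 6), "CHH")
```

For KASIII the expected string would be "CHN"; the real residue positions depend on the reference sequence chosen for numbering.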

There are two types of dehydratases, that is, DH-FabA and DH-FabZ, described in bacteria and higher plants (White et al., 2005 and references therein). DH-FabA has the additional function of performing isomerisation (in addition to dehydration) essential for formation of unsaturated fatty acids and normally co-occurs with KASI-FabB (White et al., 2005 and references therein). However, in protists only genes encoding the DH-FabZ enzyme were found. DH-FabA was absent from all lineages (Supplementary Table S1).

Evolution of type II FAS in protists

Several lines of evidence from our analysis support the notion that type II FAS genes are nuclear encoded, and that the initial steps of fatty acid synthesis take place in the chloroplast: (1) the presence of all seven type II FAS enzymes in the reference genome of E. huxleyi (Supplementary Table S2); (2) the detection of transit peptides targeted towards the chloroplast (ChloroP; Emanuelsson et al., 1999) in ~80% of the sequences (Supplementary Table S1); (3) the presence of eukaryotic polyA tails; and (4) the presence of 5′ trans-spliced leader sequences (Zhang et al., 2007) on dinoflagellate type II FAS gene transcripts. These features suggest that type II FAS genes were transferred from the plastid to the host genome at some point during their evolutionary history. Transfer of plastidial genes to the nucleus in the host organism allows for selection processes to act on genes according to their functional advantage to the host and restricts the accumulation of deleterious mutations (Muller’s ratchet; Felsenstein, 1974) in the endosymbiotic genome, which cannot recombine. The nuclear location also provides protection from reactive oxygen species generated during the process of photosynthesis (Martin et al., 1998; Martin, 2003; Deusch et al., 2008; McFadden, 2014). As fatty acid synthesis is essential for survival, these genes were likely retained by protists in the nucleus due to strong selective pressure.

The origin of protistan plastids has been traced to either an ancestral rhodophyte or chlorophyte, with a clear distinction between the two clades (Janouškovec et al., 2010). Evidence presented here indicates two possible scenarios in which either (i) type II FAS genes may have been transferred from the plastid to the nuclear genome in an early ancestral protist, before the initial split of the rhodophyte and chlorophyte lineages (Figures 2a and b) 1750–2000 million years ago (Parfrey et al., 2011) or (ii) there has been a more recent transfer event from the ancestor of chlorophytes/rhodophytes to the ancestors of stramenopiles, alveolates, Rhizaria, haptophytes and cryptophytes (Figures 2a and b), which happened 1250–1500 million years ago (Parfrey et al., 2011). Within the type II FAS clade, the evolution of these genes broadly follows the trend of microbial eukaryotic evolution (Keeling, 2013; Keeling et al., 2014) in which chlorophytes, higher plants, Rhizaria, haptophytes and alveolates form separate monophyletic clades, with the exception of cryptophytes, which form a monophyletic clade placed within the polyphyletic stramenopile clade (Figure 3). In addition, evidence presented here supports the hypothesis that some type II FAS genes were transferred more recently from the plastid to the nucleus of its tertiary hosts, specifically from haptophytes to their dinoflagellate hosts Karenia spp. and Karlodinium spp. (Yoon et al., 2002; Figure 3 and Supplementary Figure S1) and from diatoms to their dinoflagellate hosts Glenodinium spp., Durinskia spp. and Kryptoperidinium spp. (Imanian et al., 2012; Figure 3 and Supplementary Figure S1), some 250–750 million years ago (Figure 2b; Parfrey et al., 2011).

Figure 2

Comparative evolution of fatty acid and polyketide synthases. (a) Concatenated phylogeny, inferred from protein sequences of five enzymes (3-ketoacyl ACP synthase III; S-malonyltransacylase; 3-hydroxyacyl-ACP dehydratase; enoyl-ACP reductase; trans3-ketoacyl ACP reductase, 1431 characters) involved in type II fatty acid synthesis (inferred using RAxML, GAMMA model of rate heterogeneity, 1000 bootstraps). Solid circles indicate bootstrap values above 90. (b) For comparison, a dated molecular clock phylogeny of the eukaryotic tree of life, showing absolute time scale (million years) (from Parfrey et al., 2011). These phylogenetic analyses show that the evolution of fatty acid synthase genes broadly follows the evolutionary pattern of the organisms. (c) Phylogenetic analysis of 25 type II 3-ketoacyl ACP synthase II and 67 type I ketosynthase domains from prokaryotic and eukaryotic polyketide synthases and fatty acid synthases, showing the position of each major group, inferred in RAxML using the GAMMA model of rate heterogeneity and 1000 bootstraps. Solid circles indicate bootstrap values above 90. Owing to relaxed selection pressure, polyketide synthase genes were retained or lost by protists based on the functionality their polyketide product provided to the organism.

Figure 3

Fatty acid synthase gene phylogeny in eukaryotes: concatenated phylogeny of five enzymes involved in type II fatty acid synthesis: 3-ketoacyl ACP synthase III; S-malonyltransacylase; 3-hydroxyacyl-ACP dehydratase; enoyl-ACP reductase; trans3-ketoacyl ACP reductase (1431 characters), inferred using RAxML, GAMMA model of rate heterogeneity, 1000 bootstraps. Solid circles indicate bootstrap values above 90.

Polyketide synthesis in protists

We found an enormous diversity of type I PKS genes in selected alveolates (dinoflagellates: 46 strains, 24 genera; Vitrella: 1 genus), stramenopiles (labyrinthulids, thraustochytrids, chrysophytes, pelagophytes, synurophytes, dictyochophytes: 15 genera), haptophytes (12 strains, 9 genera) and chlorophytes (6 strains, 5 genera) (Figure 4 and Supplementary Table S3). We confirm the absence of expressed type I PKS genes in other alveolates (Chromera, ciliates: 17 strains, 15 genera), stramenopiles (bacillariophytes, bicosoecids, bolidophytes, raphidophytes, pinguiophytes, xanthophytes: 62 strains, 43 genera), Rhizaria (chlorarachniophytes, haplosporidia, foraminifera: 12 strains, 9 genera), cryptophytes (8 strains, 7 genera), glaucophytes (2 genera) and Palpitomonas bilix (Figure 4 and Supplementary Table S3). In the type I PKSs of stramenopiles, haptophytes, chlorophytes and Vitrella, each transcript encoded multiple PKS domains, and 286 contigs encoding multiple type I PKS domains were found (Supplementary Table S3). Increased expression of certain PKS genes has been indirectly linked to higher toxin production in the haptophyte Prymnesium parvum (Freitag et al., 2011); therefore, the presence of these genes in the haptophyte Chrysochromulina polylepis and the stramenopile Aureococcus anophagefferens is intriguing, as they produce polyketide toxins that cause fish kills (John et al., 2010; Freitag et al., 2011; Gobler et al., 2011).

Figure 4

Survey of polyketide synthase genes in eukaryotes: The figure shows the abundance of expressed type I polyketide synthases (PKS)-ketoacyl synthase (KS) domains from various eukaryotic lineages. The KS domain gene family is highly expanded in dinoflagellates and haptophytes, and also present in Vitrella, labyrinthulids, thraustochytrids, chrysophytes, pelagophytes, synurophytes, dictyochophytes, chlorophyceans, trebouxiophytes and prasinophytes. The KS domains were absent in Chromera, ciliates, bacillariophytes, bicosoecids, bolidophytes, raphidophytes, pinguiophytes, xanthophytes, chlorarachniophytes, haplosporidia, foraminifera, cryptophytes, glaucophytes and Palpitomonas.

In the reference genome of E. huxleyi CCMP1516 (Read et al., 2013), a total of 30 contigs encoding multiple type I PKS genes were found (Supplementary Table S2). A comprehensive expressed sequence tag (EST) library containing sequences from 14 isolates of E. huxleyi (Read et al., 2013) was used to study the expression of type I PKS genes found in the E. huxleyi genome. Transcripts corresponding to 26 contigs found in the E. huxleyi genome were observed in the EST library of E. huxleyi (Supplementary Table S2). It is possible that type I PKS genes on the other four contigs were not being expressed at the time of analysis, or were not present in the transcriptome due to an insufficient depth of sequencing. Interestingly, seven sequences encoding partial type I PKS genes (KS domains) from the EST library were not found in the reference genome of E. huxleyi CCMP1516 (Supplementary Table S2). Read et al. (2013) found that E. huxleyi has a pan-genome in which certain genes are variably distributed between different strains. This might explain the absence of seven sequences encoding type I PKS genes from the reference genome of E. huxleyi CCMP1516.

To date, no gene has been definitively linked to the synthesis of a particular polyketide toxin produced by a eukaryotic harmful algal bloom species. This is in part due to the difficulty in producing genetically transformable protists. Therefore, genetic screening methods for detecting patterns of genes expressed by toxin-producing protists have been the most fruitful approach to date (Bachvaroff and Place, 2008; John et al., 2008, 2010; Monroe and Van Dolah, 2008; Stüken et al., 2011; Eichholz et al., 2012; Murray et al., 2012; Pawlowiez et al., 2014; Meyer et al., 2015). Through transcriptomic analysis, PKS genes in dinoflagellates were found to be evolutionarily related to type I PKS, but expressed as mono-functional proteins, a feature characteristic of type II PKSs (Millar et al., 1999; Monroe and Van Dolah, 2008; Eichholz et al., 2012; Salcedo et al., 2012; Pawlowiez et al., 2014; Kohli et al., 2015; Meyer et al., 2015). In this study, we identified a much larger range of unique KS domains than expected or previously found: an average of 56/strain, comprising a total of 2577 unique KS domains (1976 full and 601 partial) and 234 KR domains (190 full and 44 partial) in 24 genera and 46 strains of dinoflagellates (Figure 4 and Supplementary Tables S3 and S4). Azadinium spinosum, which produces the polyketide toxin azaspiracid and its analogues, had the largest number of KS domains: 140 (Meyer et al., 2015), while the non-toxic dinoflagellate species Togula jolla encoded seven KS domains (Supplementary Table S4). Like type I and II FASs, KS domains in type I and type II PKS have highly conserved active site residues, Cys-His-His, which are essential for their functionality (Kwon et al., 2002), and their presence was confirmed in 66% of our sequences (Supplementary Table S4). Thus, PKS gene families appear to have expanded dramatically within the dinoflagellates, suggesting numerous duplications and the evolution of novel functions.
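The domain totals and the per-strain average quoted above follow directly from the full and partial counts, as the short check below shows. The variable names are ours; the numbers are those reported in the text.

```python
# Counts reported for 46 dinoflagellate strains
ks_full, ks_partial = 1976, 601
kr_full, kr_partial = 190, 44
strains = 46

ks_total = ks_full + ks_partial          # total unique KS domains
kr_total = kr_full + kr_partial          # total unique KR domains
mean_ks_per_strain = ks_total / strains  # average KS domains per strain

print(ks_total, kr_total, round(mean_ks_per_strain))
```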

Previously, no KS domains resembling type I or type II FAS had been found in dinoflagellates, possibly because of low sequencing depth, which led to the hypothesis that dinoflagellate fatty acid synthesis is carried out by enzymes resembling PKS enzymes (Pawlowiez et al., 2014). However, our results suggest that fatty acid synthesis is likely carried out by type II FAS in dinoflagellates. These findings are important, as the differentiation of PKS and FAS will facilitate approaches to investigating harmful algal toxin biosynthesis pathways in dinoflagellates. The inferred distinction between type II FAS and type I PKS genes in dinoflagellates is based on sequence analysis, and functional proof is still required.

The phylogeny of PKS KS and KR domains shows that protistan KS and KR domains form a monophyletic group within which dinoflagellate, chlorophyte, haptophyte and apicomplexan KS and KR domains form monophyletic clades (John et al., 2008; Monroe and Van Dolah 2008) (KS—Figure 2c, KR—Supplementary Figure S5). KS domains from stramenopiles also form two well-supported clades within the protistan clade (Figure 2c). The phylogeny of 1591 KS domains within the dinoflagellate clade further shows that their KS domains form three distinct clades (Figure 5), each of which includes sequences from numerous species of multiple dinoflagellate orders, clearly not related to the species phylogeny (Orr et al., 2012). No clear pattern could be established between these three clades and the chemical structure of the compounds known to be produced by these organisms. For example, Karenia brevis, which produces over 15 polyketide compounds (Baden et al., 2005), had sequences in each of the three clades of dinoflagellate PKS (Figure 5), as did the species Togula jolla (not shown), which is not known to produce any toxins. It is likely that dinoflagellates produce many polyketide compounds that are as yet undetected and uncharacterised. Polyketide compounds produced by dinoflagellates that do not clearly impact fisheries or aquaculture industries are likely to have been unnoticed by researchers.

Figure 5

Polyketide synthase gene phylogeny in dinoflagellates: Phylogenetic analysis of type I ketoacyl synthase (KS) domains from prokaryotic and eukaryotic polyketide synthases (PKS) and fatty acid synthases (FAS). In total, 1633 KS domains representing 43 dinoflagellate and 30 other prokaryotic and eukaryotic taxa were analysed using RAxML with the GAMMA model of rate heterogeneity and 1000 bootstraps (653 characters). PKS gene families are highly expanded in dinoflagellates, forming three distinct clades (clades I–III, coloured green, pink and orange, respectively), where the pattern of distribution is not related to the species phylogeny and/or the chemical structure of the compounds these organisms produce. Solid circles indicate bootstrap values above 80. The clade labelled as outgroup/others consists of type I PKS-KS domains from fungi (reducing/non-reducing) and bacteria (cis and trans AT modular), type I FAS-KS domains from animals and type II PKS-KS from bacteria.

There is a growing body of evidence for the ecological benefits of some marine microbial eukaryotic toxins in the form of antipredator or allelopathic impacts (Cembella, 2003; Ianora et al., 2011 and references therein), given the importance of grazing as a selective force in the marine planktonic ecosystem (Smetacek, 2001). While only one or few compounds may be necessary to produce these ecological impacts, the presence of the genetic basis for the production of a vast number of distinctive polyketide compounds within a species may be related to the Screening Hypothesis (Jones et al., 1991; Firn and Jones 2003), based on the principle that ‘potent biological activity is a rare property for any one molecule to possess’ (Jones et al., 1991; Firn and Jones 2003). This would predict that organisms that produce and screen a larger variety of chemical compounds have an increased likelihood of enhanced fitness, as the chance of producing a rare chemical with a useful biological activity will be increased. An example of this may be the production of many different congeners of brevetoxins and ciguatoxins by Karenia brevis and Gambierdiscus spp. respectively (Kalaitzis et al., 2010 and references therein), which differ from one another in biological activity (Chinain et al., 2010). In dinoflagellates the lack of correlation with the species phylogeny (Figure 5), and the large intraspecific diversity in KS domains, suggests that multiple gene duplication events, domain shuffling and losses have occurred. This suggests that relaxed selection pressures have acted on the evolution of these secondary metabolite genes (Kroymann 2011; Weng et al., 2012), which may have been acquired or lost based on the functionality they provided to the organism (Murray et al., 2015).

Ecological experiments have been used to determine the predicted function of some polyketide compounds for the producing marine microbial eukaryotes (Cembella, 2003; Ianora et al., 2011 and references therein). The elucidation of genes involved in polyketide synthesis in these organisms opens up the possibility that these ecological roles can be further investigated by examining factors affecting gene regulation, and by producing genetically transformed knockouts. This information will be crucial in revealing the biochemical and molecular basis of marine microbial eukaryotic community interactions.