Sponges (Porifera) are one of the most ancestral metazoan groups. They are characterized by a simple body plan lacking the true tissues and organ systems found in other animals. Members of this phylum display a remarkable diversity of form and function, yet little is known about the composition and complexity of their genomes. In this study, we sequenced the transcriptomes of two marine haplosclerid sponges belonging to Demospongiae, the largest and most diverse class within phylum Porifera, and compared their gene content with members of other sponge classes. We recovered 44,693 and 50,067 transcripts expressed in adult tissues of Haliclona amboinensis and Haliclona tubifera, respectively. These transcripts translate into 20,280 peptides in H. amboinensis and 18,000 peptides in H. tubifera. Genes associated with important signaling and metabolic pathways, regulatory networks, as well as genes that may be important in the organismal stress response, were identified in the transcriptomes. Furthermore, lineage-specific innovations were identified that may be correlated with observed sponge characters and ecological adaptations. The core gene complement expressed within the tissues of adult haplosclerid demosponges may represent a streamlined and flexible genetic toolkit that underlies the ecological success and resilience of sponges to environmental stress.
Sponges are important components of coral reef ecosystems. They can filter large amounts of seawater and are central players in bentho-pelagic coupling and nutrient cycling1,2. They also contribute to substrate modification through bioerosion and consolidation of reef structures, while providing unique habitats for various marine organisms and microorganisms3. Through these functional roles, sponges exert major impacts on the overall functioning of marine ecosystems. Sponges have also become important model organisms for understanding early animal evolution and the emergence of multicellularity, as well as for studying regeneration, biomineralization, and their unique chemical defense mechanisms4. However, despite their evolutionary significance and ecological impact on marine habitats, sponges remain poorly investigated and underappreciated.
Sponges arose more than 600 million years ago and are considered one of the most ancestral metazoan groups5,6. Their simple body plan has proven effective and has remained largely unchanged over time. A typical sponge body consists of three distinct functional layers, the pinacoderm (exterior), mesohyl (middle) and choanoderm (interior), arranged around an intricate system of water canals, choanocyte chambers, incurrent pores (ostia) and excurrent pores (oscules)7. The skeletal elements of sponges can consist of inorganic spicules (either siliceous or calcareous), proteinaceous spongin, chitin and collagen8. Unlike other animals, sponges have neither a centralized gut nor conventional muscles and nerves8. Sponge tissues house a dense and varied community of prokaryotic, as well as eukaryotic, symbionts that contribute to the metabolic complexity of the holobiont9.
Despite the diversity within phylum Porifera, the genetic potential within this group is largely understudied. Analysis of sponge species representing the major classes within Porifera reveals the surprising genetic complexity of sponges and debunks the view that morphologically simple animals have simple genomes. Sponges possess a genetic toolkit corresponding to key metazoan features, such as cell adhesion, cell cycle control, tissue differentiation, apoptosis, innate immunity and development4. Surprisingly, animal-associated genes that were previously thought to have evolved only in the last common ancestor of cnidarians and bilaterians have been discovered in some sponges. This suggests that the last common ancestor of metazoans already possessed a complex genome that conferred the capacity to sense and respond to the environment while maintaining multicellular homeostasis4,10,11,12,13. The differences in genomic content between basal metazoans and the different sponge lineages are driven by gene loss and gene family expansion events that occurred after their divergence from the last common ancestor of metazoans4,10,13.
Porifera can be divided into four major classes based on skeleton and tissue composition. These classes are the Homoscleromorpha, Calcarea, Hexactinellida, and Demospongiae. Demospongiae are by far the most diverse in terms of number of species, abundance, and ecosystem distribution. Of the 8,553 species of modern sponges, 83% belong to the demosponge class14. Demosponges exhibit ecological adaptability and plasticity in response to different environmental factors, such as light, temperature, turbidity and hydrodynamics, allowing them to integrate into diverse habitats from marine to freshwater, intertidal to deep seafloors, even in caves and polar seas14,15. Many demosponge species have emerged as ‘winners’, thriving in extreme environments, exhibiting resilience to environmental disturbances, and successfully outcompeting other organisms16. Unfortunately, taxonomic classification of demosponges remains a major challenge because of deep divergence between major clades, which hinders phylogenetic resolution at various levels17.
In this study, we sequenced the adult transcriptomes of two marine haplosclerid demosponges and compared their gene content to other sponge classes. Our findings point to the similarity of the genetic complement of closely related haplosclerid demosponges and highlight some differences that may underlie the unique characteristics of each species. These genetic differences may be hallmarks of the ecological requirements and adaptive potential of each sponge.
Results and Discussion
This study generated transcriptome data using Illumina sequencing for adult tissues of two non-model Haliclona species belonging to order Haplosclerida of class Demospongiae. Comparison to other sponge genomes and transcriptomes reveals that haplosclerid demosponges possess a similar core genetic repertoire despite broad differences in morphology and ecology.
Haliclona amboinensis and Haliclona tubifera are classified as haplosclerid demosponges in the family Chalinidae, yet they exhibit very different morphological and ecological characteristics. H. amboinensis (Fig. 1, Supplementary Figure 1) is a blue-purple encrusting sponge with a broad depth range. It can be found attached to hard substrates, such as rocks or corals. The surface of this sponge is rough to the touch with no visible ostia, and its texture is brittle or crumbly. Its skeleton is characterized by an isotropic reticulation formed by oxeas, which are either straight or curved at the center18. Microscleres such as c-shaped sigmas can also be found in this species. H. tubifera (Fig. 1, Supplementary Figure 1) is a soft pink, occasionally light brownish, tubular sponge found in association with coral skeletons on shallow-water reef flats. This sponge has a hispid surface and a spongy, compressible texture. H. tubifera resembles Haliclona sinyeoensis19 in its growth form, shape and choanosomal skeleton. However, H. tubifera has oxeas of a single size, while H. sinyeoensis has oxeas of two different sizes (Supplementary Table 1).
Order Haplosclerida is the largest group within class Demospongiae. It consists of three main suborders: Haplosclerina, Petrosina (both marine) and Spongillina (freshwater). Based on traditional morphological cladistics, H. amboinensis and H. tubifera are classified within family Chalinidae of suborder Haplosclerina, while Amphimedon queenslandica belongs to family Niphatidae within the same suborder20. Cytochrome oxidase I (COI) gene sequencing, however, reveals that members of the genus Haliclona are interspersed among different subclades within Haplosclerida. H. amboinensis COI sequences cluster with both Chalinidae and Niphatidae sponges (Niphates and Amphimedon), while H. tubifera sequences cluster with Petrosia of suborder Petrosina. In previous studies, H. tubifera was found to be closely related to other niphatids, such as Neopetrosia and Xestospongia, and such species were positioned within the same clade as A. queenslandica21. However, inclusion of sequences from H. amboinensis reveals that this species has greater affinity with the niphatid sponges than H. tubifera does. Our COI tree is congruent with recent phylogenetic studies that support the monophyly of marine haplosclerids but fail to resolve most families and the two suborders, Haplosclerina and Petrosina, as monophyletic21,22. It should be noted, however, that the 5′ end of the COI gene has been shown to evolve more slowly in sponges and thus may not always finely resolve their phylogenetic relationships23. Nonetheless, the COI phylogeny is recapitulated by phylogenetic analysis based on sequences of nuclear-encoded genes, such as the membrane-associated guanylate kinases DLG and MAGI (Supplementary Figure 2).
Transcriptome sequencing and de novo assembly
Barcoded cDNA libraries with an average insert size of 319 bp were constructed using the Illumina TruSeq RNA sample prep kit. Libraries were sequenced on the Illumina HiSeq 2000 platform to generate an average of 53 million clean paired-end reads per library with a read length of 100 bp. Trinity de novo assembly rendered 107,470 and 124,476 total transcripts for H. amboinensis and H. tubifera, respectively (Supplementary Table 2). After clustering of transcripts to reduce assembly redundancy, 44,693 and 50,067 transcripts were retained for H. amboinensis and H. tubifera, respectively. Longer transcripts were assembled for H. tubifera (N50 = 1,583 bp) compared to H. amboinensis (N50 = 1,527 bp). Based on these assembly statistics, our transcriptomes are of comparable quality to recently published sponge transcriptomes13.
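N50, the statistic used above to compare assembly contiguity, is the transcript length L such that transcripts of length L or longer together contain at least half of all assembled bases. The minimal Python sketch below computes it from a list of lengths. It is only an illustration of the definition, not the tool that produced the reported values, and the toy lengths are hypothetical rather than taken from the Haliclona assemblies.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of sequence lengths (in bp)."""
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    # Accumulate bases from the longest sequence downward until at least
    # half of the assembly is covered; that sequence's length is the N50.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the loop always covers all bases")


# Hypothetical toy assembly, not the Haliclona data.
print(n50([2000, 1500, 1000, 500, 300, 200]))  # 1500
```

Because the statistic is weighted by bases rather than by transcript counts, a handful of long transcripts can raise N50 even when many short fragments remain.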
Protein-coding regions within the non-redundant reference transcripts were identified. About 35–45% of the total assembled transcripts could be translated into proteins, suggesting that the reference transcriptome assemblies may still include non-protein-coding sequences, as well as truncated or potentially misassembled sequences (Supplementary Table 2). More complete open reading frames (ORFs) were recovered from the H. tubifera assembly than from the H. amboinensis assembly. Retaining only the longest ORF for each transcript returned 20,280 and 18,000 reference peptides for H. amboinensis and H. tubifera, respectively.
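TransDecoder applies additional criteria when it predicts coding regions (for example, a minimum ORF length, coding-likelihood scoring and reporting of partial ORFs), so the following is only a simplified sketch of the core idea of keeping the longest ORF per transcript. It assumes that an ORF starts at ATG and ends at the first in-frame stop codon, scans all six reading frames, and ignores partial ORFs; the function names and test sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(transcript: str) -> tuple[str, str, int]:
    """Return (strand, ORF, start) for the longest ATG...stop ORF in six frames.

    `start` is the 0-based offset on the scanned strand; ("", "", -1) means
    no complete ORF was found. ORFs lacking a stop codon are ignored."""
    best = ("", "", -1)
    seq = transcript.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # offset of the first ATG of the ORF being extended
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    orf = s[start:i + 3]
                    if len(orf) > len(best[1]):
                        best = (strand, orf, start)
                    start = None
    return best


# Hypothetical toy transcript.
print(longest_orf("CCATGAAATTTTAGCC"))  # ('+', 'ATGAAATTTTAG', 2)
```

Keeping the first ATG of each open stretch yields the longest ORF for that stretch, since any internal ATG would start a shorter ORF ending at the same stop codon.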
To annotate the reference transcriptomes, transcripts were aligned by Blastx against the UniProt database with an e-value cutoff of 1 × 10⁻⁵ (Supplementary Table 3 and Supplementary Figure 3). Gene ontology terms were assigned based on the top Blastx hits. Overall, 32% of H. amboinensis and 23% of H. tubifera transcripts aligned to proteins in the UniProt database (Fig. 2a). About half of the sequences with hits to UniProt have associated gene ontology assignments. Predicted peptides were similarly annotated by Blastp alignment to the UniProt database with an e-value cutoff of 1 × 10⁻⁵. Protein domains were identified using HMMER v3.1b1 against the Pfam 28.0 database. Approximately 63% of the predicted peptides in both sponges have matches in UniProt or contain identifiable protein domains, but only about 40% are associated with gene ontology annotations (Fig. 2a). The low percentage of annotation for predicted peptides may be due to the scarcity of poriferan sequences in most public data repositories. Alignment to the Ensembl metazoan database reveals that about 75–85% of predicted peptides in the two sponges are similar to sequences from other animals, with the majority of sequences matching peptides in the sponge A. queenslandica (Fig. 2b).
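The annotation step keeps, for each transcript or peptide, its top-scoring hit that passes the 1 × 10⁻⁵ e-value cutoff. The sketch below assumes BLAST+ tabular output (`-outfmt 6` with the default 12 columns, so the e-value is column 11 and the bit score column 12) and applies the cutoff after the search; in practice the cutoff can also be passed to BLAST directly with `-evalue`. The rows, subject names and helper names are hypothetical.

```python
EVALUE_CUTOFF = 1e-5  # cutoff stated in the text


def top_hits(blast_tab: list[str], cutoff: float = EVALUE_CUTOFF) -> dict[str, tuple[str, float, float]]:
    """Return {query: (subject, e-value, bit score)} for the best hit of each query.

    Assumes BLAST+ -outfmt 6 rows with the default 12 columns; hits whose
    e-value exceeds the cutoff are ignored."""
    best: dict[str, tuple[str, float, float]] = {}
    for line in blast_tab:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        query, subject = cols[0], cols[1]
        evalue, bitscore = float(cols[10]), float(cols[11])
        if evalue > cutoff:
            continue
        current = best.get(query)
        # Lower e-value wins; ties are broken by the higher bit score.
        if current is None or (evalue, -bitscore) < (current[1], -current[2]):
            best[query] = (subject, evalue, bitscore)
    return best


def percent_annotated(n_sequences: int, hits: dict) -> float:
    """Share of all sequences (with or without hits) that received an annotation."""
    return 100.0 * len(hits) / n_sequences


# Hypothetical rows for two transcripts; t2's only hit fails the cutoff.
rows = [
    "t1\tprotA\t45.0\t200\t100\t2\t1\t600\t1\t200\t1e-30\t250.0",
    "t1\tprotB\t40.0\t180\t100\t2\t1\t540\t1\t180\t1e-20\t180.0",
    "t2\tprotC\t30.0\t80\t50\t1\t1\t240\t1\t80\t0.01\t40.0",
]
hits = top_hits(rows)
print(hits)                        # {'t1': ('protA', 1e-30, 250.0)}
print(percent_annotated(2, hits))  # 50.0
```

The denominator in `percent_annotated` is the full sequence set, which is why annotation rates such as the 32% and 23% above remain low when many sequences have no significant hit.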
Global comparison of sponge genomes
To determine the similarity of the transcriptomes of H. amboinensis and H. tubifera to gene sequences in other sponges, we performed global Blast comparisons between all the transcripts or predicted peptides from different sponge species. As expected, pairwise global comparison reveals greater similarity (>50% for transcripts; >80% for peptides) of Haliclona sequences to other demosponges (Fig. 3a). In contrast, Haliclona sequences have fewer (<40% for transcripts; <65% for peptides) Blast hits to calcareous and homoscleromorph sequences, supporting the substantial divergence between demosponges and the calcareous-homoscleromorph sister group13. Within the demosponge group, the marine haplosclerids exhibit greater sequence similarity (>55% for transcripts; >80% for peptides) amongst themselves than to the freshwater demosponge, Ephydatia muelleri. 85% of H. amboinensis and 75% of H. tubifera peptides are similar to genes in the A. queenslandica genome. Reciprocally, Blastp alignment reveals that more than 80% of A. queenslandica genes possess significant similarity to peptides recovered from the adult transcriptomes of H. amboinensis and H. tubifera (Supplementary Table 4). This suggests that despite being derived from a single developmental stage, the adult transcriptomes of Haliclona capture many of the genes present in the genome of the model demosponge.
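The percentages compared here are directional: the share of species A's sequences with a significant hit in species B need not equal the share of B's sequences with a hit in A, which is why the H. amboinensis-to-A. queenslandica and A. queenslandica-to-Haliclona figures come from separate searches. A minimal sketch of how such a query-by-subject table could be computed, assuming BLAST+ `-outfmt 6` output and using hypothetical species codes, identifiers and counts:

```python
def queries_with_hits(blast_tab: list[str], cutoff: float = 1e-5) -> set[str]:
    """IDs of query sequences with at least one hit at or below the e-value cutoff.

    Assumes BLAST+ -outfmt 6 rows (column 11 = e-value)."""
    hits = set()
    for line in blast_tab:
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 12 and float(cols[10]) <= cutoff:
            hits.add(cols[0])
    return hits


def percent_table(n_queries: dict[str, int],
                  hit_ids: dict[tuple[str, str], set[str]]) -> dict[tuple[str, str], float]:
    """Percent of each query species' sequences with a hit in each subject species.

    Keys are (query_species, subject_species); the table is generally asymmetric."""
    return {(q, s): 100.0 * len(ids) / n_queries[q] for (q, s), ids in hit_ids.items()}


# Hypothetical toy data: 4 "Hamb" peptides searched against "Aque", and the reverse.
hamb_vs_aque = [
    "h1\ta1\t60.0\t200\t80\t0\t1\t200\t1\t200\t1e-40\t180.0",
    "h2\ta3\t45.0\t150\t82\t1\t1\t150\t5\t154\t1e-12\t70.0",
    "h3\ta2\t30.0\t90\t63\t2\t1\t90\t1\t90\t0.003\t35.0",
]
table = percent_table(
    {"Hamb": 4, "Aque": 5},
    {("Hamb", "Aque"): queries_with_hits(hamb_vs_aque),
     ("Aque", "Hamb"): {"a1", "a3"}},
)
print(table)  # {('Hamb', 'Aque'): 50.0, ('Aque', 'Hamb'): 40.0}
```

A table of this shape, with query species as rows and subject species as columns, is the kind of input a heatmap function such as pheatmap expects.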
To determine the number of shared and unique protein families among different sponges, we constructed orthologous gene clusters for eight sponge species representing three classes using OrthoMCL with default settings. After all-against-all comparisons of sequences, 264,663 proteins were clustered into 6,996 orthologous groups (Fig. 3b). Of these groups, 2,908 (42%) are common to all eight species. Calcareous sponges possess the highest number (1,215) of unique orthologous clusters, indicating that this sponge lineage may have experienced gene expansions after divergence from the last common ancestor of sponges. Considering only the orthologous clusters present in haplosclerid demosponges, we found that 3,591 of 5,397 (66%) are common to the four species represented in the analysis (Fig. 3c). Very few orthologous protein clusters are unique to any single haplosclerid demosponge, suggesting that these sponges possess highly similar sets of genes.
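Counts of shared and lineage-specific clusters, as in Fig. 3b,c, can be derived by projecting each orthologous group onto the set of species under consideration. The sketch below assumes OrthoMCL-style group lines of the form `GROUP_ID: taxon|seqID taxon|seqID ...`; the taxon codes and groups are hypothetical. Groups with no member in the chosen subset are dropped, which is one way to obtain a haplosclerid-only comparison from the full clustering.

```python
from collections import Counter


def parse_groups(lines: list[str]) -> dict[str, set[str]]:
    """Map each orthologous group ID to the set of taxa it contains.

    Assumes OrthoMCL-style lines: 'GROUP_ID: taxon|seqID taxon|seqID ...'."""
    groups: dict[str, set[str]] = {}
    for line in lines:
        if ":" not in line:
            continue
        group_id, members = line.split(":", 1)
        groups[group_id.strip()] = {m.split("|", 1)[0] for m in members.split()}
    return groups


def summarize(groups: dict[str, set[str]], species: set[str]) -> tuple[int, int, Counter]:
    """Return (groups involving `species`, groups shared by all, unique groups per taxon)."""
    projected = [taxa & species for taxa in groups.values()]
    projected = [taxa for taxa in projected if taxa]  # drop groups outside the subset
    shared = sum(1 for taxa in projected if taxa == species)
    unique = Counter(next(iter(taxa)) for taxa in projected if len(taxa) == 1)
    return len(projected), shared, unique


# Hypothetical groups and taxon codes.
lines = [
    "OG1: hamb|a1 htub|b1 aque|c1",
    "OG2: hamb|a2 htub|b2",
    "OG3: aque|c2 aque|c3",
    "OG4: scil|d1 scil|d2",
]
groups = parse_groups(lines)
print(summarize(groups, {"hamb", "htub", "aque"}))
# (3, 1, Counter({'aque': 1}))
```

Single-taxon groups such as OG3 arise because OrthoMCL can cluster in-paralogs from one species, and these are what the "unique" counts tally.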
EvolMap analysis was performed to estimate gene gains and losses in the different sponge lineages. This method identifies gene orthologs by all-to-all Blast followed by Needleman-Wunsch alignment to determine similarity scores. The algorithm then traverses the species tree to estimate ancestral gene content at each node and applies a Dollo parsimony-based comparison of these ancestral genes to determine how many have been lost or gained within each branch24. Peptide sequences from the non-redundant assemblies of H. amboinensis, H. tubifera, and P. ficiformis were used in the analysis to minimize the inflation of gene gain estimates, although this data set may underestimate the true genetic diversity in haplosclerid sponges because the transcriptomes are derived solely from the adult stage. The O. carmela genome was used as the outgroup to compare the gene expansions occurring along the demosponge and calcareous lineages. EvolMap analysis reveals that more gene losses and fewer gains occurred on the branches leading to the demosponges (Fig. 3d,e), suggesting that the ancestor of this sponge class underwent periods of genomic reduction. Marine haplosclerid lineages exhibit a similar extent of gene loss, whereas gene gains are slightly higher for A. queenslandica, which may be due to the inclusion of its full genome in the analysis. In contrast, extensive gains and fewer losses were observed on the branch leading to the calcareous group (Supplementary Table 5), which is also reflected in the high number of unique orthologous gene clusters observed for this class. Because the sequences for S. ciliatum and L. complicata are derived from genome and non-redundant multi-developmental-stage transcriptome data10, respectively, the estimated gains are likely to be true gains and not merely artifacts of assembly.
Nevertheless, while this analysis provides us with an estimate of genomic changes in different sponge lineages, a more accurate picture of the genetic history of diverse sponge groups will emerge with future genome sequencing efforts.
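Under Dollo parsimony, a gene family is assumed to arise only once, and any absence among its descendants is explained by independent losses. The sketch below illustrates that logic on a toy species tree; it is not EvolMap's implementation, which additionally reconstructs ancestral gene content from pairwise similarity scores. The tree, taxon codes and family presence sets are hypothetical, and families whose last common ancestor is the root are simply recorded at the root (that is, as ancestral).

```python
from collections import Counter
from typing import Union

Tree = Union[str, tuple]  # a leaf (taxon code) or a tuple of child subtrees


def leaves(node: Tree) -> frozenset:
    """Taxa below a node; also used as the label of the branch above it."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def dollo_gains_losses(tree: Tree, families: list[set[str]]) -> tuple[Counter, Counter]:
    """Count gains and losses per branch under Dollo parsimony.

    Each family is gained once, at the smallest clade containing every taxon
    that has it; each maximal clade below that point lacking it is one loss."""
    gains: Counter = Counter()
    losses: Counter = Counter()
    for present in families:
        if not present:
            continue
        # Walk down to the last common ancestor of all taxa carrying the family.
        node = tree
        while not isinstance(node, str):
            child = next((c for c in node if present <= leaves(c)), None)
            if child is None:
                break
            node = child
        gains[leaves(node)] += 1  # the single Dollo origin
        # Every maximal clade below the origin that lacks the family = one loss.
        stack = [node]
        while stack:
            current = stack.pop()
            if isinstance(current, str):
                continue
            for child in current:
                below = leaves(child)
                if below & present:
                    stack.append(child)
                else:
                    losses[below] += 1
    return gains, losses


# Hypothetical tree (with "ocar" as outgroup) and presence sets.
tree = ((("hamb", "htub"), "aque"), "ocar")
families = [
    {"hamb", "htub", "aque", "ocar"},
    {"hamb", "aque", "ocar"},
    {"hamb", "htub"},
    {"ocar"},
    {"hamb", "htub", "ocar"},
]
gains, losses = dollo_gains_losses(tree, families)
print(dict(losses))  # {frozenset({'htub'}): 1, frozenset({'aque'}): 1}
```

Charging one loss per maximal clade, rather than per leaf, keeps the reconstruction parsimonious: a family missing from a whole subclade is counted as lost once on the branch leading to it.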
Genomic innovations for ecological adaptation
Sponges possess a functional metazoan toolkit4. Although only about 25% of putative peptides in the four haplosclerid demosponges are associated with gene ontology assignments based on their top Blastp match in the UniProt database (Supplementary Figure 4), metabolic processes, transport, signal transduction, and catalytic activity are the most abundant gene ontology terms. As in other sponges, haplosclerids possess genes that underlie the basic characteristics of multicellular organisms, including control of the cell cycle, cell differentiation, apoptotic processes, extracellular matrix, cell adhesion, and innate immunity13,25. The most abundant protein domains are known to be involved in metabolism, cell adhesion, cell structural support, signaling, immune response, apoptosis and transcriptional control (Supplementary Figure 5). Below we highlight some gene families that may underlie the unique characteristics of each sponge lineage.
Comparison of selected Pfam domains in the four haplosclerid sponges reveals similar abundance patterns for the different transcription factor families, which may have integral roles in controlling development, shaping sponge tissue architecture, and regulating responses to the environment (Fig. 4a). Sponges possess transcription factor families similar to those of other metazoans. In A. queenslandica, these transcription factors have been found to be expressed at various stages of the sponge life cycle. bZip, Tbox, bHLH, and homeobox factors are enriched in the competent larvae of A. queenslandica and may regulate the genes necessary for settlement, while zinc finger, forkhead, ETS, and homeobox domain-containing factors may control the widespread transcriptional changes observed immediately after settlement as metamorphosis begins26. Members of these transcription factor families are also present in the transcriptomes of H. amboinensis, H. tubifera and P. ficiformis. Whether they are expressed in the same manner, or whether they regulate the same sets of genes, remains to be discovered.
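Domain abundance comparisons of this kind count how many predicted peptides carry each Pfam domain. The sketch below assumes per-domain output from `hmmscan --domtblout`, in which column 1 is the domain name, column 4 the peptide identifier and column 13 the independent E-value (i-Evalue). The i-Evalue threshold is an arbitrary assumption rather than the setting used in this study, and the rows are hypothetical. Each peptide is counted once per domain, even when the domain occurs in tandem.

```python
from collections import defaultdict


def domain_counts(domtbl_lines: list[str], max_i_evalue: float = 1e-5) -> dict[str, int]:
    """Number of distinct peptides containing each Pfam domain.

    Assumes `hmmscan --domtblout` rows: column 1 = domain name, column 4 =
    peptide ID, column 13 = domain i-Evalue (whitespace-delimited, with a
    free-text description from column 23 onward)."""
    peptides: dict[str, set[str]] = defaultdict(set)
    for line in domtbl_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split(maxsplit=22)
        domain, peptide, i_evalue = cols[0], cols[3], float(cols[12])
        if i_evalue <= max_i_evalue:
            peptides[domain].add(peptide)  # tandem copies count once
    return {domain: len(ids) for domain, ids in peptides.items()}


def toy_row(domain: str, peptide: str, i_evalue: str) -> str:
    """Build a hypothetical domtblout-shaped row for demonstration only."""
    return (f"{domain} - 100 {peptide} - 300 1e-30 100.0 0.1 1 1 "
            f"{i_evalue} {i_evalue} 50.0 0.1 1 100 1 100 1 100 0.90 toy domain")


rows = [
    toy_row("SRCR", "pep1", "1e-20"),
    toy_row("SRCR", "pep1", "1e-18"),    # second SRCR copy in the same peptide
    toy_row("Pkinase", "pep2", "1e-30"),
    toy_row("Pkinase", "pep3", "0.5"),   # fails the threshold
]
print(domain_counts(rows))  # {'SRCR': 1, 'Pkinase': 1}
```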
The haplosclerid sponges possess a diverse set of peptides containing domains involved in signal transduction (Fig. 4a). PDZ domains and protein kinases are similarly abundant in all four species. Sponges are also known to possess an expanded and diversified set of rhodopsin, secretin/adhesion, and glutamate G-protein-coupled receptors (GPCRs)27. Most groups of glutamate receptors are represented in the four haplosclerids, while rhodopsin-family GPCRs have expanded in A. queenslandica (Fig. 4a,b).
GPCRs are membrane proteins that mediate a wide variety of cellular responses to environmental stimuli, such as allelochemicals and inducers of settlement and metamorphosis28. Differences in GPCR content and function reflect differences in sponge physiology and behavior. GPCRs expressed in sponge larvae may influence the patterns of settlement that govern sponge assemblage distribution28. Some sponge larvae exhibit photosensitive behavior and swim away from light, thus ensuring their settlement in shaded areas under rocks or coral rubble for protection against intense sunlight or UV radiation29. This behavior may be regulated by photoreceptive pigments and the expression of specific sets of GPCRs. Other potential functions of the diverse GPCR families in sponges are coordination of cellular contractions, regulation of the uptake of dissolved organic matter from seawater, or even regulation of sponge morphology and tissue properties through the detection of fluid shear stress30,31,32.
Cell adhesion and structural components
Haplosclerid sponge bodies are composed of proteinaceous mesohyl in which various cells are embedded. Sponges maintain tissue integrity within a colony through strong cell aggregation and allorecognition properties33,34. These same properties also promote reaggregation and regrowth of damaged or fragmented colonies. Not surprisingly, cell adhesion and extracellular matrix-related domains, such as integrin, immunoglobulin I-set, and cadherin are similarly abundant in the sponges (Fig. 4a). All four haplosclerids possess many collagen, calcium-binding EGF, fibronectin, and immunoglobulin domains, although these are more enriched in the A. queenslandica genome. It is important to note, however, that except for A. queenslandica, the domain counts for the other sponges are derived from single-stage transcriptome data and may not capture the complete repertoire of genes in these species.
Spongin is a distinguishing character of most demosponges, providing support and stability to the sponge tissue. Spongin short-chain collagens are homologous to type IV collagen, one of the main extracellular matrix components, which is ubiquitous in vertebrates and some invertebrates35. As expected, we identified spongin short-chain collagen sequences in the haplosclerid demosponges (Supplementary Figure 5). We further identified sequences for silicatein, an enzyme involved in biosilica formation and spiculogenesis36. Although the phylogenetic tree for this gene is not well resolved, the demosponge sequences still cluster together. Similarly, we identified homologs of spherulin, a gene acquired from bacteria that has become an integral component of the biomineralization strategy of the coralline demosponge Astrosclera willeyana37. Interestingly, we did not find spherulin homologs outside of the demosponge group, suggesting that the horizontal transfer event occurred specifically in this lineage.
In Haliclona, intracellular spongin fibres formed by chains of specialized cells function as supplementary skeletal support during the production of the more rigid spicule and spongin skeleton8. The loose organization of the sponge body enables morphological plasticity that is responsive to environmental factors38. For example, sponges in high water flow environments can induce spongin production and spicule formation to form tougher tissues that can withstand high water flow while still maintaining efficient filter feeding capacity38,39.
The scavenger receptor cysteine-rich (SRCR) domain is a highly conserved domain found in sponges and in other animals40. Scavenger receptors, which are known to associate with diverse co-receptors, are highly versatile and can recognize a large repertoire of ligands. They may function in cell-cell recognition, aggregation, lipid recognition, pattern recognition, phagocytosis, and pathogen clearance and transport41. We identified many SRCR-containing peptides in the four haplosclerids (Fig. 4a). H. amboinensis has the greatest number of SRCR-containing peptides, although the majority of these are composed of SRCR repeats only. Both A. queenslandica and H. amboinensis possess more SRCR peptides associated with various other protein domains (Fig. 4c). The presence of this diverse class of receptors in the cell membranes of sponges may be important for the selection or clearance of bacteria from sponge tissues. Moreover, the modified domain architectures suggest that they may also be involved in a wider range of cellular functions.
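Distinguishing SRCR-only peptides from those in which SRCR is combined with other domains (Fig. 4c) amounts to classifying domain architectures. The following is a sketch under two assumptions: each peptide's Pfam domains have already been ordered by their position along the sequence, and tandem repeats of the same domain are collapsed into one architecture element. The peptide IDs and domain lists are hypothetical.

```python
from collections import Counter
from itertools import groupby


def architecture(domains: list[str]) -> tuple[str, ...]:
    """Collapse tandem repeats, e.g. ['SRCR', 'SRCR', 'Trypsin'] -> ('SRCR', 'Trypsin')."""
    return tuple(name for name, _ in groupby(domains))


def classify(peptide_domains: dict[str, list[str]], target: str = "SRCR") -> tuple[int, int, Counter]:
    """Return (target-only peptides, mixed-domain peptides, architecture counts).

    Assumes each peptide's domain list is already ordered by position."""
    only = mixed = 0
    arch: Counter = Counter()
    for domains in peptide_domains.values():
        if target not in domains:
            continue
        arch[architecture(domains)] += 1
        if set(domains) == {target}:
            only += 1
        else:
            mixed += 1
    return only, mixed, arch


# Hypothetical peptides with Pfam-style domain names.
peptides = {
    "p1": ["SRCR", "SRCR", "SRCR"],
    "p2": ["SRCR", "Trypsin"],
    "p3": ["Pkinase"],
    "p4": ["EGF_CA", "SRCR", "SRCR"],
}
only, mixed, arch = classify(peptides)
print(only, mixed)  # 1 2
```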
The microbial abundance in sponge tissues is reflected in the complement of immune-related genes in the host. For instance, H. tubifera, A. queenslandica, and H. amboinensis, which are low microbial abundance sponges42, exhibit expansion and diversification of immune-related genes, particularly scavenger receptors, which may be involved in the selection of specific symbionts and maintenance of a lower bacterial population. On the other hand, P. ficiformis, a known high microbial abundance sponge43, has a reduced repertoire of scavenger receptors. The core microbiome of sponges effectively extends the genetic potential of the host, providing a pool of novel genes that represent functional innovations that may contribute to the survival of the holobiont under stressful conditions. Rapid shifts in the natural bacterial community due to environmental perturbation can affect the stability of the sponge-bacteria interactions and result in the deterioration of sponge health44.
Stress response-related protein domains, such as heat shock protein (Hsp) and thioredoxin domains, which are involved in protein stabilization and antioxidant defense, respectively, are abundant in H. amboinensis and P. ficiformis (Fig. 4a). These two sponges also possess many proteins with death effector domains (DED) that regulate cellular homeostasis through the simultaneous control of proliferation and apoptosis45. Caspases, which are cysteine proteases that regulate apoptosis, as well as glutathione S-transferase (GST) domains, which function in the antioxidant response, are similarly abundant in all four sponge species. The deployment of stress response proteins and antioxidant defenses is a critical mechanism that protects the organism and supports the restoration of cellular homeostasis. Modifications of the complement of stress response proteins found in each sponge species may underlie differences in their tolerance to environmental perturbations.
Transcriptome sequencing of two haplosclerid demosponges and comparative analysis with other sponge species reveal that haplosclerid sponges possess a streamlined gene complement compared to other sponge classes. This core gene repertoire contains all the tools needed for sponge function and is flexible enough to allow diversification into different habitats. The invention of new functions based on this genetic toolkit may have occurred through exaptation of preexisting genes and rewiring of gene regulatory networks46. The streamlining of genomes after bursts of molecular invention associated with species radiation is a common theme in organismal evolution47 and is also likely to have contributed to the diversification of haplosclerid demosponges.
Haplosclerid demosponges are found in various habitats but are common in shallow water environments that are characterized by high water flow, variable temperatures, intense light exposure, fluxes in salinity, as well as input of nutrients from coastal ecosystems. To survive in these highly dynamic environments, sponges must possess mechanisms to maintain tissue integrity and repair tissue damage. While haplosclerid sponges share a highly similar set of genes, key differences in specific gene families, such as GPCRs, immune response-related genes, and stress response genes, provide clues to the unique adaptations of each species.
Thus, sponges, and demosponges in particular, provide a glimpse into how different molecular innovations emerging from an essentially similar core gene complement allow adaptation to variable conditions encountered within a habitat range. In future studies, it would be of interest to look at the sponge holobiont as a whole and determine how functional genes of the sponge-associated microbiota contribute to local adaptation of their host. Further genome and metagenome-level comparisons with other sponge species will provide a deeper understanding of the genomic features that support the ecological success of this diverse group of organisms.
Sponges were collected by SCUBA diving in Malilnep Channel, Bolinao, Pangasinan (H. amboinensis: 16.43968°N 119.94434°E; H. tubifera: 16.43530°N 119.94062°E) in September 2013, at a depth of 7–10 meters for H. amboinensis and 1–2 meters for H. tubifera. Collections were conducted with permission from the Philippines Department of Agriculture Bureau of Fisheries and Aquatic Resources (DA-BFAR GP-0075-14). Tissues were dissected and cleaned of macroscopic contaminants before storage in RNAlater (Ambion). Samples were transferred to liquid nitrogen for transport and stored at −80 °C for subsequent molecular analyses.
Sponge characterization and identification
Sponge characterization was done by observing the morphological features of collected sponges (in situ and voucher specimens), including growth form, texture, spicule forms and architecture. Fresh sponge samples were fixed in 95% ethanol and used for spicule observation and DNA extraction. Spicules were prepared following the bleach digestion protocol20. Briefly, a longitudinal section of the sponge was dissolved in household bleach (5% sodium hypochlorite), and the remaining mineral skeleton was viewed by optical microscopy (Zeiss Primo Star and Nikon E200). Hand-sectioning was also done to examine the skeleton structure of the sponge. Features were matched to descriptions in Systema Porifera20 and The Sponge Guide48 to confirm sponge identity.
Genomic DNA extraction and COI amplification
Genomic DNA was extracted from sponge tissues using the xanthogenate DNA isolation method49. The 5′ region of the cytochrome oxidase I (COI) gene was amplified using 0.5 µM of the degenerate primers dgLCO1490 and dgHCO2198 (ref. 50) in a reaction mix containing buffer (20 mM Tris-HCl pH 8.4, 50 mM KCl), 1.5 mM MgCl2, 0.2 µM dNTPs, and 0.25 units of Taq DNA polymerase (Invitrogen). Amplification was performed using a standard three-step PCR with initial denaturation for 3 minutes at 94 °C, followed by 40 cycles of 30 seconds at 94 °C, 30 seconds at 40 °C and 1 minute at 72 °C, and a final extension step of 5 minutes at 72 °C. PCR amplicons were purified and sequenced (1st Base Laboratories, Malaysia).
RNA extraction, quantity and quality assessment
RNA was extracted using TRIzol Reagent (Invitrogen) following the manufacturer’s protocol. Longitudinal sections of sponge tissues were manually homogenized to minimize RNA shearing. Contaminating genomic DNA was removed using the DNA-free kit (Ambion). Nucleic acid concentrations were obtained using a BioSpec Nanodrop spectrophotometer (Shimadzu). The integrity of RNA samples was determined by electrophoresis on a native agarose gel with denaturing loading dye. RNA quality was further assessed using the mRNA Pico Series II assay on the Agilent Bioanalyzer 2100 System (Agilent Technologies).
RNA sequencing, data filtering, and de novo assembly
Genetic material from three independent sponge colonies was used in the assembly of a reference transcriptome for each species. Barcoded cDNA libraries derived from non-reproductive adult sponge tissues were prepared using the Illumina TruSeq RNA Sample Prep Kit protocol. The rRNA-depleted and mRNA-enriched libraries were sequenced on the Illumina HiSeq 2000 platform with 100 bp paired-end reads (Beijing Genomics Institute, Hong Kong). Raw sequence reads were filtered to remove adapter sequences and low-quality reads. Read quality was visualized using FastQC 0.10.1 (Babraham Bioinformatics). Trimmomatic51 (v0.32) was used to trim the first 15 bases of each read, as well as bases with a quality score below 30 at the leading and trailing ends. Reads were then scanned with a 4-base sliding window and cut when the average quality per base dropped below 30. Only reads that passed the quality filters and were longer than 36 bases were retained for further analysis. De novo transcriptome assemblies were carried out with Trinity (trinityrnaseq_r2013-11-10)52. To evaluate the quality of the transcriptome assemblies, all clean paired-end reads used for assembly were mapped back to the assembled transcripts using the alignReads.pl script included in the Trinity package.
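The trimming steps above correspond to Trimmomatic's HEADCROP, LEADING, TRAILING, SLIDINGWINDOW and MINLEN operations. The simplified Python re-implementation below is only meant to make the order and effect of these steps concrete. It assumes Phred+33 quality encoding, cuts the read at the start of the first 4-base window whose mean quality drops below 30, and keeps reads of at least 36 bases (Trimmomatic's MINLEN drops reads below the given length). Trimmomatic's own sliding-window trimmer may place the cut point differently, so results need not match it exactly; the read is hypothetical.

```python
from typing import Optional


def phred33(qual: str) -> list[int]:
    """Convert a Phred+33 quality string to integer scores."""
    return [ord(ch) - 33 for ch in qual]


def trim_read(seq: str, qual: str, headcrop: int = 15, edge_q: int = 30,
              window: int = 4, window_q: float = 30.0,
              min_len: int = 36) -> Optional[tuple[str, str]]:
    """Simplified HEADCROP/LEADING/TRAILING/SLIDINGWINDOW/MINLEN trimming.

    Returns the trimmed (sequence, quality) pair, or None if the read is dropped."""
    q = phred33(qual)
    start, end = min(headcrop, len(seq)), len(seq)
    while start < end and q[start] < edge_q:      # LEADING
        start += 1
    while end > start and q[end - 1] < edge_q:    # TRAILING
        end -= 1
    for i in range(start, end - window + 1):      # SLIDINGWINDOW, 5' to 3'
        if sum(q[i:i + window]) / window < window_q:
            end = i                               # cut at the failing window
            break
    if end - start < min_len:                     # MINLEN
        return None
    return seq[start:end], qual[start:end]


# Hypothetical 70-base read with a low-quality dip ('#' = Q2, 'I' = Q40).
seq = "ACGT" * 17 + "AC"
qual = "I" * 60 + "##" + "I" * 8
trimmed = trim_read(seq, qual)
print(len(trimmed[0]) if trimmed else None)  # 43
```

Applying the steps in this fixed order matters: the head crop and edge trimming shorten the read before the window scan, and the length filter is evaluated last, on whatever survives.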
Filtering of transcriptome assemblies and peptide prediction
Reads were first mapped back to the reference transcriptome assembly using RSEM53. For each gene, the isoform with the highest combined isoform percentage (IsoPct) across the three libraries was retained or, when isoforms had the same IsoPct, the longest transcript. Isoforms with zero IsoPct in all libraries were removed, as these may be misassembled transcripts. CD-HIT-EST54 was then run with default parameters to cluster similar sequences and filter out redundant transcripts. The non-redundant reference transcript set was used for all further analyses. Coding regions were identified using the TransDecoder script included in the Trinity package. Only the longest ORF from each transcript was included in building the reference peptide set for each sponge.
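The isoform-selection rule can be expressed compactly. In the sketch below, each record holds a gene ID, a transcript ID, a length and its RSEM IsoPct value in each of the three libraries, and "combined" IsoPct is assumed to mean the sum across libraries. The record layout and the Trinity-style IDs are hypothetical, and the subsequent CD-HIT-EST clustering is not reproduced.

```python
from typing import NamedTuple


class Isoform(NamedTuple):
    gene_id: str
    transcript_id: str
    length: int
    isopct: tuple[float, ...]  # RSEM IsoPct in each library


def select_isoforms(isoforms: list[Isoform]) -> dict[str, Isoform]:
    """Keep one isoform per gene: the highest IsoPct summed over libraries,
    with ties broken by length; isoforms at 0 IsoPct in every library are dropped."""
    chosen: dict[str, Isoform] = {}
    for iso in isoforms:
        total = sum(iso.isopct)
        if total == 0:
            continue  # unsupported by any library; possibly misassembled
        best = chosen.get(iso.gene_id)
        if best is None or (total, iso.length) > (sum(best.isopct), best.length):
            chosen[iso.gene_id] = iso
    return chosen


# Hypothetical Trinity-style IDs and IsoPct values for three libraries.
records = [
    Isoform("comp1_c0", "comp1_c0_seq1", 500, (10.0, 20.0, 30.0)),
    Isoform("comp1_c0", "comp1_c0_seq2", 800, (90.0, 80.0, 70.0)),
    Isoform("comp2_c0", "comp2_c0_seq1", 300, (50.0, 50.0, 50.0)),
    Isoform("comp2_c0", "comp2_c0_seq2", 600, (50.0, 50.0, 50.0)),
    Isoform("comp3_c0", "comp3_c0_seq1", 400, (0.0, 0.0, 0.0)),
]
kept = select_isoforms(records)
print(sorted(iso.transcript_id for iso in kept.values()))
# ['comp1_c0_seq2', 'comp2_c0_seq2']
```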
Annotation of transcripts and peptides
Transcripts and peptide sequences for the longest open reading frames (ORFs) were mapped against the UniProtKB/Swiss-Prot database (February 2015) and to predicted peptides of the A. queenslandica4, E. muelleri, S. ciliatum and O. carmela genomes, as well as the P. ficiformis and L. complicata transcriptomes. Sponge sequences were downloaded from Compagen55, except for A. queenslandica sequences, which were downloaded from Ensembl Metazoa, and P. ficiformis sequences, which were shared by Ana Riesgo (Natural History Museum, London). Only sponge peptides longer than 100 amino acids were included in all subsequent analyses. The top Blast hit for each sequence was used as input into Blast2GO56 to retrieve gene ontology terms. Protein domains were identified by mapping predicted peptides against the Pfam57 database (release 28.0) using HMMER58 (v3.1b1). Gene homologs were identified by reciprocal Blastp alignments with an e-value cutoff of 1 × 10⁻⁵.
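Reciprocal best hits are one common way to implement a reciprocal Blastp criterion: a pair (a, b) is retained when b is the best hit of a and a is the best hit of b. The sketch assumes BLAST+ `-outfmt 6` tables for both search directions and ranks hits by bit score after applying the 1 × 10⁻⁵ cutoff. The identifiers are hypothetical, and this is not necessarily the exact procedure used for every gene family here.

```python
def best_hits(blast_tab: list[str], cutoff: float = 1e-5) -> dict[str, str]:
    """Best subject per query by bit score, among hits passing the e-value cutoff.

    Assumes BLAST+ -outfmt 6 rows (column 11 = e-value, column 12 = bit score)."""
    best: dict[str, tuple[float, str]] = {}
    for line in blast_tab:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 12:
            continue
        query, subject = cols[0], cols[1]
        evalue, bits = float(cols[10]), float(cols[11])
        if evalue <= cutoff and (query not in best or bits > best[query][0]):
            best[query] = (bits, subject)
    return {query: subject for query, (_, subject) in best.items()}


def reciprocal_best_hits(a_vs_b: list[str], b_vs_a: list[str],
                         cutoff: float = 1e-5) -> list[tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    ab = best_hits(a_vs_b, cutoff)
    ba = best_hits(b_vs_a, cutoff)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)


def row(q: str, s: str, evalue: str, bits: str) -> str:
    """Hypothetical 12-column row; only columns 1, 2, 11 and 12 matter here."""
    return "\t".join([q, s, "50.0", "100", "10", "0", "1", "100", "1", "100", evalue, bits])


hamb_vs_aque = [row("h1", "a1", "1e-50", "200"), row("h1", "a2", "1e-20", "90"),
                row("h2", "a2", "1e-30", "120")]
aque_vs_hamb = [row("a1", "h1", "1e-50", "200"), row("a2", "h1", "1e-25", "100"),
                row("a2", "h2", "1e-22", "95")]
print(reciprocal_best_hits(hamb_vs_aque, aque_vs_hamb))  # [('h1', 'a1')]
```

The pair (h2, a2) is rejected because a2's best hit in the reverse search is h1, which illustrates how the reciprocal requirement filters out one-directional matches to paralogs.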
Analysis of sequence similarity, orthologous groups and gene histories
Pairwise sequence comparisons between different sponge species were performed using Blastx and Blastp alignments at an e-value cutoff of 1 × 10⁻⁵. The percentage of transcripts or predicted peptides with detectable sequence similarity to various sponges was visualized as a heatmap using the pheatmap R package. Orthologous gene groups were identified using OrthoMCL59 with default settings (e-value 1 × 10⁻⁵, protein identity 50%, and MCL inflation of 1.5). EvolMap24 was used to infer ancestral genome content and trace the history of gene gains and losses in the different sponge species. All-to-all Blast was run to retrieve the 300 top-scoring gene pairs, for which normalized similarity scores were generated. Orthologous genes were identified as symmetrical best alignments scoring above the minimum threshold of 250.
Sequences were aligned using ClustalW60 and trimmed with Gblocks61. The best-fit substitution model for each alignment was determined using jModelTest (v2.1.7) for nucleotides and ProtTest (v3.4) for amino acids. Maximum-likelihood analysis was implemented in PhyML62 (v3.0) with 1,000 bootstrap replicates. Bayesian inference analysis was executed in MrBayes63 (v3.2.2) using two independent MCMC runs with four chains per run. Each analysis was run for 1 million generations, with trees sampled every 100 generations, or until the standard deviation of split frequencies was <0.01. The first 25% of trees were discarded as burn-in. COI sequences of other sponges were downloaded from NCBI (Supplementary Table 6). Specific genes and gene families were identified through reciprocal Blast. Other peptide sequences used for phylogenetic comparisons were downloaded from NCBI.
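Convergence of the two MrBayes runs is judged by the standard deviation of split frequencies. The sketch below computes an average standard deviation of split frequencies in that spirit: each tree is represented as a set of splits (bipartitions), the first 25% of samples are discarded as burn-in, and only splits reaching a minimum frequency (0.1 here, an assumption) in at least one run are averaged. MrBayes's internal computation may differ in detail, and the toy trees are hypothetical.

```python
from collections import Counter
from statistics import stdev

Split = frozenset  # taxa on one side of a bipartition
Tree = frozenset   # a tree represented as the set of its splits


def split_frequencies(trees: list[Tree], burnin: float = 0.25) -> dict[Split, float]:
    """Frequency of each split among the trees retained after burn-in."""
    kept = trees[int(len(trees) * burnin):]
    counts = Counter(split for tree in kept for split in tree)
    return {split: n / len(kept) for split, n in counts.items()}


def asdsf(runs: list[list[Tree]], min_freq: float = 0.1, burnin: float = 0.25) -> float:
    """Average standard deviation of split frequencies across two or more runs."""
    freqs = [split_frequencies(run, burnin) for run in runs]
    all_splits = set().union(*freqs)
    sds = [stdev([f.get(s, 0.0) for f in freqs])
           for s in all_splits
           if max(f.get(s, 0.0) for f in freqs) >= min_freq]
    return sum(sds) / len(sds) if sds else 0.0


# Hypothetical four-sample runs over taxa A-D; the first sample is burn-in.
t_ab = frozenset({frozenset({"A", "B"})})
t_ac = frozenset({frozenset({"A", "C"})})
t_ad = frozenset({frozenset({"A", "D"})})
run1 = [t_ad, t_ab, t_ab, t_ab]
run2 = [t_ad, t_ab, t_ac, t_ac]
print(round(asdsf([run1, run2]), 4))  # 0.4714
```

Values near zero indicate that the independent runs sample the same splits at similar frequencies, which is what the <0.01 stopping criterion above checks.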
How to cite this article: Guzman, C. and Conaco, C. Comparative transcriptome analysis reveals insights into the streamlined genomes of haplosclerid demosponges. Sci. Rep. 6, 18774; doi: 10.1038/srep18774 (2016).
The authors wish to thank the Bolinao Marine Laboratory for assistance with collections. We also thank Belinda Longakit and Fleur deliz Panga for assistance in sponge identification and Kenneth S. Kosik for access to the UCSB CNSI server. This study was funded by the University of the Philippines System PhD Recruitment Grant (OVPAA-BPhD-2012-04) and a L’Oreal-UNESCO For Women in Science National Fellowship to C.C.