Diversity of a bacterial community associated with Cliona lobata Hancock and Gelliodes pumila (Lendenfeld, 1887) sponges on the South-East coast of India

Marine sponges are sources of various bioactive metabolites, including several anticancer drugs, produced mainly by sponge-associated microbes. Palk Bay, on the south-east coast of India, is an understudied, highly disturbed reef environment exposed to various anthropogenic and climatic stresses. In recent years, Palk Bay has suffered from pollution due to the dumping of untreated domestic sewage, effluents from coastal aquaculture, tourism, salt pans, cultivation of exotic seaweeds, and geogenic heavy-metal pollution, especially arsenic, mercury, cadmium, and lead. Low-microbial-abundance sponge species, such as Gelliodes pumila and Cliona lobata, were found to be ubiquitously present in this reef environment. Triplicate samples of each of these sponge species were subjected to Illumina MiSeq sequencing using V3–V4 region-specific primers. In C. lobata and G. pumila, the phyla Candidatus Saccharibacteria and Proteobacteria were overwhelmingly dominant (98% and 99%, respectively). The overall number of operational taxonomic units (OTUs) was 68 (40 and 13 OTUs unique to G. pumila and C. lobata, respectively; 15 shared OTUs). Alphaproteobacteria was the most abundant class in both sponge species. Unclassified species of phylum Candidatus Saccharibacteria from C. lobata and Chelotivorans composti from G. pumila were the most abundant bacterial species. The predominance of Alphaproteobacteria also revealed the occurrence of various xenobiotic-degrading, surfactant-producing bacterial genera in both sponge species, indirectly indicating the possibly polluted reef status of Palk Bay. Studies on sponge microbiomes at various understudied geographical locations might be helpful in predicting the status of reef environments.


Introduction
The SILVAngs analysis pipeline primarily targets the analysis of large-scale small- and large-subunit (SSU/LSU) ribosomal RNA (rRNA) gene tag sequencing projects but can also be run on metagenome studies. Each project normally includes thousands to millions of reads from many different samples produced by massively parallel high-throughput "next generation" sequencing (NGS) technologies. Each read is aligned, quality checked, and classified based on the SILVA Reference alignment and taxonomy. Intuitive graphical outputs are provided for statistical information about the taxonomical distribution of the reads within and across samples. Interactive tax breakdowns are available for detailed inspection of the diversity in the samples.
Processing of the data is performed by five basic modules: alignment, quality control, dereplication, clustering, and classification.
In the first step, the alignment is used to verify that each read is indeed, depending on the project, an LSU or SSU rRNA gene sequence. Ambiguous reads and reads that are not of the required rRNA gene type are rejected based on the alignment score and the alignment identity. This module also checks the sequence quality of each read and filters out low-quality reads based on ambiguous bases or too many homopolymers. The number of aligned bases within the boundaries of the rRNA genes is determined, and sequences below a user-defined minimal length cut-off are rejected.
After alignment and quality checks, the remaining sequences are dereplicated, clustered, and classified. SILVAngs implements an approach similar to map and reduce. First, all reads that are 100% identical (allowing overhangs) to another read are marked as replicates by the dereplication module. Next, the clustering module creates clusters of sequences with 98% sequence identity to each other (pairwise distance and single linkage clustering). The longest read in each cluster is selected as its reference. Finally, the classification module classifies all reference sequences. Currently, BLAST in combination with the SILVA SSU or LSU Ref datasets is used to classify the sequences. The resulting classification of the reference sequence of a cluster is mapped to all members of the respective cluster as well as their replicates. Sequences having an average BLAST alignment coverage and alignment identity of less than 93% are considered unclassified and assigned to the virtual taxonomical group "No Relative".
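The dereplicate-then-map-back idea can be illustrated with a minimal Python sketch. It is not the SILVAngs implementation: the function names `dereplicate` and `propagate` are hypothetical, substring containment is used as a simplified stand-in for "100% identical, allowing overhangs", and the clustering step itself is assumed to have been done already (its result is passed in as `cluster_of`).

```python
def dereplicate(reads):
    """Map each read ID to the ID of the unique read that represents it.

    ASSUMPTION: a read is treated as a replicate when its sequence is
    contained in an equal-length or longer read already kept as unique.
    """
    # Longest first, so the representative is always the longest copy.
    ordered = sorted(reads.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    unique = []          # (read_id, sequence) of reads kept as unique
    representative = {}  # read_id -> representative read_id
    for read_id, seq in ordered:
        for rep_id, rep_seq in unique:
            if seq in rep_seq:
                representative[read_id] = rep_id
                break
        else:
            unique.append((read_id, seq))
            representative[read_id] = read_id
    return representative


def propagate(representative, cluster_of, taxonomy_of_reference):
    """Give every read the classification of its cluster's reference read.

    cluster_of maps each unique read to the reference read of its cluster;
    taxonomy_of_reference maps each reference read to its classification.
    """
    return {
        read_id: taxonomy_of_reference[cluster_of[rep_id]]
        for read_id, rep_id in representative.items()
    }
```

With four toy reads where two are copies or sub-reads of the first, `dereplicate` assigns all three to one representative, and `propagate` then copies the reference classification to each of them, so only one sequence per cluster has to be classified.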
All results can be downloaded as CSV and SVG files. Aligned sequence data can be downloaded in the FASTA and ARB file formats.
If you consider this tool useful and use its results in a publication, please consider citing SILVA and the SILVAngs pipeline. The pipeline itself uses the following tools: SINA for the alignment of sequences (Pruesse et al., 2012), CD-HIT for the clustering of sequences (Li and Godzik, 2006), BLAST for the classification of sequences (Camacho et al., 2009), and KRONA for some parts of the visualisation of results (Ondov et al., 2011).
Overview of method for analysis (this can roughly be paraphrased for the purpose of manuscripts and grants): All sequence reads were processed by the NGS analysis pipeline of the SILVA rRNA gene database project (SILVAngs 1.3). Each read was aligned using the SILVA Incremental Aligner (SINA v1.2.10 for ARB SVN (revision 21008)) (Pruesse et al., 2012) against the SILVA SSU rRNA SEED and quality controlled. Reads shorter than 50 aligned nucleotides and reads with more than 2% of ambiguities or 2% of homopolymers were excluded from further processing. Putative contaminations and artefacts, as well as reads with a low alignment quality (50 alignment identity, 40 alignment score reported by SINA), were identified and excluded from downstream analysis.
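The length, ambiguity, and homopolymer criteria above can be sketched as a simple read filter. This is an illustrative Python sketch, not the SILVAngs code: the function names are hypothetical, and the text does not state how the homopolymer percentage is computed, so the definition used here (share of bases lying in runs of at least `min_run` identical bases, with `min_run=5`) is an assumption.

```python
import re

# IUPAC nucleotide codes other than A, C, G, T/U
AMBIGUOUS = set("NRYKMSWBDHV")


def percent_ambiguous(seq):
    """Percentage of bases in seq that are ambiguity codes."""
    seq = seq.upper()
    return 100.0 * sum(base in AMBIGUOUS for base in seq) / len(seq)


def percent_homopolymer(seq, min_run=5):
    """Percentage of bases lying in runs of >= min_run identical bases.

    ASSUMPTION: this definition and the default min_run are illustrative;
    the source does not specify how homopolymers are measured.
    """
    seq = seq.upper()
    pattern = r"(.)\1{%d,}" % (min_run - 1)
    in_runs = sum(len(m.group(0)) for m in re.finditer(pattern, seq))
    return 100.0 * in_runs / len(seq)


def passes_quality(seq, aligned_length, min_aligned=50,
                   max_ambiguous=2.0, max_homopolymer=2.0):
    """Apply the thresholds quoted in the text: 50 aligned nt, 2%, 2%."""
    if not seq or aligned_length < min_aligned:
        return False
    return (percent_ambiguous(seq) <= max_ambiguous
            and percent_homopolymer(seq) <= max_homopolymer)
```

In this sketch `aligned_length` is supplied by the caller, since the number of aligned bases comes from the aligner rather than from the raw read.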
After these initial steps of quality control, identical reads were identified (dereplication), the unique reads were clustered (OTUs) on a per-sample basis, and the reference read of each OTU was classified. Dereplication and clustering were done using cd-hit-est (version 3.1.2; http://www.bioinformatics.org/cd-hit) (Li and Godzik, 2006) running in accurate mode, ignoring overhangs, and applying identity criteria of 1.00 and 0.98, respectively. The classification was performed by a local nucleotide BLAST search against the non-redundant version of the SILVA SSU Ref dataset (release 132; http://www.arb-silva.de) using blastn (version 2.2.30+; http://blast.ncbi.nlm.nih.gov/Blast.cgi) with standard settings (Camacho et al., 2009).
The classification of each OTU reference read was mapped onto all reads that were assigned to the respective OTU. This yields quantitative information (number of individual reads per taxonomic path), within the limitations of PCR and sequencing technique biases as well as multiple rRNA operons. Reads without any BLAST hits or reads with weak BLAST hits, where the function "(% sequence identity + % alignment coverage)/2" did not exceed the value of 93, remain unclassified. These reads were assigned to the meta group "No Relative" in the SILVAngs fingerprint and Krona charts (Ondov et al., 2011).
This method was first used in the publications of Klindworth et al. (2013) and Ionescu et al. (2012).
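The acceptance rule and the per-path read counts can be written out as a short Python sketch. The function names `classify_hit` and `reads_per_path` and the input layout are hypothetical; only the formula "(% sequence identity + % alignment coverage)/2", the cut-off of 93, and the "No Relative" group come from the description above.

```python
from collections import Counter


def classify_hit(taxonomy, percent_identity, percent_coverage, threshold=93.0):
    """Return the taxonomic path of a BLAST hit, or "No Relative".

    taxonomy is None when the read has no BLAST hit at all. A hit is
    accepted only if (identity + coverage) / 2 exceeds the threshold.
    """
    if taxonomy is None:
        return "No Relative"
    similarity = (percent_identity + percent_coverage) / 2
    return taxonomy if similarity > threshold else "No Relative"


def reads_per_path(otu_sizes, otu_taxonomy):
    """Sum the number of reads per taxonomic path across OTUs.

    otu_sizes maps OTU ID -> number of reads assigned to that OTU;
    otu_taxonomy maps OTU ID -> classification of its reference read.
    """
    counts = Counter()
    for otu_id, n_reads in otu_sizes.items():
        counts[otu_taxonomy[otu_id]] += n_reads
    return counts
```

For example, a hit with 98% identity and 90% coverage scores 94 and is accepted, whereas a hit with 93% identity and 93% coverage scores exactly 93, does not exceed the cut-off, and is assigned to "No Relative".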