RNA-Seq has proven to be an excellent method for obtaining information about the regulation and transcript levels of genes. We used this method for profiling genes in the flatworm Schistosoma mansoni. This parasite causes schistosomiasis, an infectious disease of global importance for humans and animals. The pathology of schistosomiasis is associated with the eggs, which are synthesized as a final consequence of the pairing of male and female adults. The male induces processes in the female that lead to the full development of its gonads as a prerequisite for egg production. Unpaired females remain sexually immature. Based on an organ-isolation method we obtained gonad tissue for RNA extraction from paired and unpaired schistosomes, with whole adults included as controls. We applied high-throughput cDNA sequencing (RNA-Seq) on the Illumina platform to a total of 23 samples to profile gene expression between genders and tissues, with and without pairing influence. The data obtained provide a wealth of information on the reproduction biology of schistosomes and a rich resource for exploitation through basic and applied research activities.
Machine-accessible metadata file describing the reported data (ISA-tab format)
Background & Summary
Transcriptome profiling has substantially benefitted from sequencing approaches such as RNA-Seq. Compared to microarrays or SAGE/SuperSAGE, RNA-Seq offers a wider dynamic range and can theoretically cover all transcripts in a biological sample1. Incomplete coverage was a major limitation of previous microarray approaches, where the sequences used as hybridization baits did not represent the complete genome/transcriptome, and where fluorescence intensities tend to saturate for highly expressed genes. SAGE/SuperSAGE is an RT-PCR and cloning-based technology. Although transcript quantification is possible, SAGE/SuperSAGE fails when transcripts/cDNAs lack specific restriction sites needed for cloning, or when a specific restriction site occurs too close to the polyA-tail region of an otherwise detectable mRNA/cDNA2,
RNA-Seq has previously been used to profile gene expression in parasites like schistosomes8,
Schistosomes have a complex life-cycle including an intermediate snail host and a vertebrate final host in which the adult worm develops21. Whereas flatworms are generally hermaphrodites, schistosomes have evolved separate sexes. Sexual dimorphism is visible at the adult stage only24. An exceptional feature of schistosome biology, however, is the constant pairing contact, which is the prerequisite for the sexual maturation of the female. Pairing induces processes in the female which lead to the differentiation of its gonads20,21,24. The latter comprise the ovary, producing oocytes, and the vitellarium, delivering vitelline cells that provide egg-shell proteins needed for egg production and resources for embryogenesis20,21. Composite eggs are finally formed within the ootype, the egg-forming organ, which is connected to the ovary and vitellarium by separate ducts and additionally to the uterus, which ends at the tegumental surface and ensures egg transport to the environment. The eggs play a dual role: they are essential for life-cycle maintenance but also cause the pathological consequences of schistosomiasis. Trapped in host tissues such as gut, spleen, and liver, eggs induce inflammatory processes which finally lead to liver cirrhosis25.
Few studies have been initiated to unravel the reproductive development of schistosomes at the molecular level23,26,
Our data sets represent the first gonad-specific transcriptome profiles of a parasitic flatworm, including genes regulated by pairing in ovaries and testes of S. mansoni, one of the three major species affecting humans14,15. In the related work39 published in Scientific Reports we performed a first in-depth bioinformatics analysis to explore and mine the data, providing an integrated overview of pairing-influenced processes that cover the majority of genes in ovaries but also genes in testes. The dataset reported here will support future studies investigating the reproductive biology of schistosomes and further parasitic flatworms, for which to our knowledge no comparable studies exist.
Detailed methods on schistosome maintenance, gonad isolation, RNA extraction, RNA-Seq analyses and data processing can be found in our related work39.
The workflow for the comparative RNA-Seq approach included the following steps (Fig. 1). After isolating adult worms from final hosts by perfusion38, paired worms were separated and their gonads isolated the same day37,39. All biological material was kept in Trizol in liquid nitrogen until RNA extraction. For each sample, 100 ng of total RNA was used for generating RNA-Seq libraries, which were sequenced as 100 bp paired-end reads. After quality assessment, raw reads were mapped to the S. mansoni reference genome using TopHat40 (version 2.0.8b). Differential gene expression was analyzed using the R package edgeR41 (v3.6.7), for which raw reads were counted by HTSeq42 (v0.5.4). Mean RPKM values based on normalized reads were used for illustrative purposes (barplots and heatmaps), and standard error of the mean (s.e.m.) values were calculated and added where applicable.
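The RPKM and s.e.m. summary mentioned above can be illustrated with a minimal Python sketch. It is not the authors' pipeline (which used TopHat, HTSeq and edgeR); it only applies the standard RPKM definition to plain library sizes, whereas the study reports values based on normalized reads. The replicate counts, library sizes and gene length are hypothetical.

```python
import math
import statistics

def rpkm(count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return count * 1e9 / (gene_length_bp * total_mapped_reads)

def mean_and_sem(values):
    """Return the mean and the standard error of the mean (sample SD / sqrt(n))."""
    n = len(values)
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n) if n > 1 else float("nan")
    return mean, sem

# Hypothetical read counts for one gene in three biological replicates
replicates = [
    {"count": 500, "library_size": 10_000_000},
    {"count": 660, "library_size": 12_000_000},
    {"count": 450, "library_size": 9_000_000},
]
gene_length = 2000  # hypothetical transcript length in bp

values = [rpkm(r["count"], gene_length, r["library_size"]) for r in replicates]
mean_rpkm, sem_rpkm = mean_and_sem(values)
print(f"mean RPKM = {mean_rpkm:.2f}, s.e.m. = {sem_rpkm:.2f}")
```

Dividing by both gene length and library size makes values comparable across genes and samples, which is why RPKM suits barplots and heatmaps, while the differential expression test itself works on counts.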
To confirm the RNA-Seq results, real-time quantitative RT-PCRs (qPCRs) were performed. Total RNA of verified quality from each sample was used for cDNA synthesis using the QuantiTect Reverse Transcription Kit (Qiagen), including a genomic DNA wipe-out step. The subsequent qPCR was done with SYBR Green for detection (PerfeCTa SYBR Green Super Mix, Quanta) and a Rotor-Gene Q cycler (Qiagen). Each gonad sample was analyzed with two biological replicates and two technical replicates. Ct (threshold cycle) values were obtained and compared with corresponding RPKM values from the RNA-Seq results. The online tools Primer3 Plus (http://www.bioinformatics.nl/cgi-bin/primer3plus/primer3plus.cgi/), OligoCalc (http://biotools.nubic.northwestern.edu/OligoCalc.html), and OligoAnalyzer (https://eu.idtdna.com/calc/analyzer) were applied to design and analyze primers. All primer pairs were designed to have an annealing temperature of 60 °C. Their sequences are shown in Table 1.
Code has been deposited as part of the archive in figshare (Data Citation 1: Figshare https://doi.org/10.6084/m9.figshare.4868093.v2, Data Citation 2: Figshare https://doi.org/10.6084/m9.figshare.4884290.v2).
All sequence data can be obtained from the European Nucleotide Archive (ENA) (Data Citation 3: European Nucleotide Archive PRJEB14695) and from Array Express (Data Citation 4: ArrayExpress E-ERAD-516). Transcript profiles of each gene detected in the study were deposited as individual pdf files for downloading in figshare (Data Citation 1: Figshare https://doi.org/10.6084/m9.figshare.4868093.v2, Data Citation 2: Figshare https://doi.org/10.6084/m9.figshare.4884290.v2). Furthermore, a web interface was created that offers immediate access to individual transcript plot data by Smp numbers of genes of interest (http://schisto.xyz/geneexp).
Qualitative and quantitative control of extracted RNA
Quality and quantity of total RNA from each sample were checked by electropherogram analyses using the Agilent RNA 6000 Pico Kit (Agilent Technologies). Representative results of all samples are shown in Fig. 2. Note that the 28S rRNA peak is not present due to a known gap region within the molecule43.
Quality control of RNA-Seq reads
RPKM values were calculated from non-normalized reads. Density plots of log2(RPKM+0.001) values were generated to check RPKM distributions and the agreement between replicates. All samples exhibited bimodal distributions, with the first peak representing genes without any reads and the second peak showing the highest RPKM density (Fig. 3). In addition, for all eight sample types, the biological replicates matched each other, indicating good correlation.
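The following sketch shows why the log2(RPKM+0.001) transformation produces a bimodal distribution: genes without reads all collapse onto log2(0.001), roughly -9.97, and form the first peak, well separated from expressed genes. A simple histogram stands in for the density plot, and the RPKM values are hypothetical.

```python
import math
from collections import Counter

PSEUDOCOUNT = 0.001  # offset stated in the text; avoids log2(0)

def log2_rpkm(values, pseudocount=PSEUDOCOUNT):
    """Apply the log2(RPKM + pseudocount) transformation."""
    return [math.log2(v + pseudocount) for v in values]

def histogram(values, bin_width=1.0):
    """Count values per bin; dictionary keys are the lower bin edges."""
    bins = Counter(math.floor(v / bin_width) * bin_width for v in values)
    return dict(sorted(bins.items()))

# Hypothetical RPKM values: three genes without reads, five expressed genes
rpkm_values = [0.0, 0.0, 0.0, 3.5, 8.0, 12.0, 15.0, 40.0]

transformed = log2_rpkm(rpkm_values)
hist = histogram(transformed)
for edge, n in hist.items():
    print(f"[{edge:6.1f}, {edge + 1.0:6.1f}): {'#' * n}")
```

Overlaying such distributions for the replicates of one sample type gives a quick visual check that they agree.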
Principal component analysis and representative transcripts
Principal component analysis (PCA) clustered all 23 samples into distinct groups, indicating differences and/or similarities among their transcriptomes that were caused not only by gender or tissue but also by pairing. Furthermore, we identified transcription profiles that allowed conclusions about specialized gene functions, which is of high value for basic as well as applied research (Fig. 4).
These examples illustrate that we detected genes with specialized pairing-dependent or -independent functions that act either in one gender independently of the gonads, in the gonads of both genders, or gonad-specifically in one gender39. Although the vitellarium was not covered as a separate reproductive female organ, genes with functions in this organ are among those differentially regulated in bF compared to sF. The vitellarium is the biggest organ in schistosome females, representing 70–80% of the body of a sexually mature female. Thus it appears likely that genes whose transcripts are more abundant in bF compared to sF, and for which few transcripts could be detected in ovaries of paired females (bO), represent genes with functions associated preferentially or specifically with the vitellarium39. Representative examples are p14 (Smp_131110) or fs800 (Smp_000290) (Fig. 4), both egg-shell precursor genes whose activities were demonstrated before to be specific for the vitellarium44,45.
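How PCA separates samples into groups can be sketched in plain Python. The example below computes only the first principal component, using power iteration on the sample-by-sample Gram matrix of gene-centered data; this is an illustrative technique and not necessarily the implementation used in the study. Sample names and expression values are hypothetical.

```python
import math

def center_columns(matrix):
    """Subtract each gene's (column) mean across samples (rows)."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[x - m for x, m in zip(row, means)] for row in matrix]

def first_pc_scores(matrix, iterations=200):
    """PC1 scores of the rows via power iteration on the Gram matrix X X^T."""
    x = center_columns(matrix)
    n = len(x)
    gram = [[sum(a * b for a, b in zip(x[i], x[j])) for j in range(n)]
            for i in range(n)]
    vec = [float(i + 1) for i in range(n)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        vec = [sum(g * v for g, v in zip(row, vec)) for row in gram]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    # Rayleigh quotient: top eigenvalue equals the squared singular value
    eigenvalue = sum(v * sum(g * w for g, w in zip(row, vec))
                     for v, row in zip(vec, gram))
    return [v * math.sqrt(eigenvalue) for v in vec]

# Hypothetical log2 expression of three genes in four samples
samples = {
    "ovary_rep1": [5.0, 1.0, 3.0],
    "ovary_rep2": [5.2, 0.8, 3.1],
    "testis_rep1": [1.0, 6.0, 3.0],
    "testis_rep2": [1.1, 6.3, 2.9],
}
scores = first_pc_scores(list(samples.values()))
for name, score in zip(samples, scores):
    print(f"{name}: PC1 = {score:+.2f}")
```

Replicates of one tissue receive PC1 scores of the same sign, while the two tissues fall on opposite sides of zero. The overall sign of a principal component is arbitrary.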
Validation of gene expression by qPCR
Ct values were obtained by qPCR, and Pearson’s correlations between RNA-Seq expression (RPKM) and qPCR (Ct) were calculated (Fig. 5). For the analyzed genes, all gonad samples demonstrated a good correlation between RNA-Seq and qPCR expression (Pearson’s r>0.8 in all cases).
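A Pearson correlation of this kind can be computed as in the sketch below. The text does not state how Ct and RPKM values were transformed before correlation, so this example assumes log2(RPKM+0.001) compared against negated Ct (a lower Ct means higher expression). That choice and all values are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical values for five genes in one gonad sample
rpkm = [2.0, 16.0, 64.0, 250.0, 1000.0]
ct = [30.1, 27.2, 24.8, 23.1, 21.0]

log_rpkm = [math.log2(v + 0.001) for v in rpkm]  # assumed transformation
neg_ct = [-c for c in ct]                        # assumed sign convention

r = pearson_r(log_rpkm, neg_ct)
print(f"Pearson's r = {r:.3f}")
```

Without the sign flip the same data give a strongly negative r, so the sign convention matters when comparing against a threshold such as r>0.8.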
How to cite this article: Lu, Z. et al. A gene expression atlas of adult Schistosoma mansoni and their gonads. Sci. Data 4:170118 doi: 10.1038/sdata.2017.118 (2017).
ArrayExpress E-ERAD-516 (2016)
The authors thank Fabian Tann for technical support. This work was supported by the Wellcome Trust, grants 107475/Z/15/Z (C.G.G., M.B.; F.U.G.I.) and 098051 (M.B.), and by a grant of the Deutsche Forschungsgemeinschaft (D.F.G.), GR1549/7-2 (C.G.G.).