A gene expression atlas of adult Schistosoma mansoni and their gonads

RNA-Seq has proven excellence in providing information about the regulation and transcript levels of genes. We used this method for profiling genes in the flatworm Schistosoma mansoni. This parasite causes schistosomiasis, an infectious disease of global importance for human and animals. The pathology of schistosomiasis is associated with the eggs, which are synthesized as a final consequence of male and female adults pairing. The male induces processes in the female that lead to the full development of its gonads as a prerequisite for egg production. Unpaired females remain sexually immature. Based on an organ-isolation method we obtained gonad tissue for RNA extraction from paired and unpaired schistosomes, with whole adults included as controls. From a total of 23 samples, we used high-throughput cDNA sequencing (RNA-Seq) on the Illumina platform to profile gene expression between genders and tissues, with and without pairing influence. The data obtained provide a wealth of information on the reproduction biology of schistosomes and a rich resource for exploitation through basic and applied research activities.

RNA-Seq has proven excellence in providing information about the regulation and transcript levels of genes. We used this method for profiling genes in the flatworm Schistosoma mansoni. This parasite causes schistosomiasis, an infectious disease of global importance for human and animals. The pathology of schistosomiasis is associated with the eggs, which are synthesized as a final consequence of male and female adults pairing. The male induces processes in the female that lead to the full development of its gonads as a prerequisite for egg production. Unpaired females remain sexually immature. Based on an organ-isolation method we obtained gonad tissue for RNA extraction from paired and unpaired schistosomes, with whole adults included as controls. From a total of 23 samples, we used high-throughput cDNA sequencing (RNA-Seq) on the Illumina platform to profile gene expression between genders and tissues, with and without pairing influence. The data obtained provide a wealth of information on the reproduction biology of schistosomes and a rich resource for exploitation through basic and applied research activities.
Design Type(s) parallel group design • transcription profiling by high throughput sequencing design • organism part comparison design

Measurement Type(s) gene expression
Technology Type(s)

RNA-seq assay
Factor Type(s) experimental condition • biological sex • animal body part

Background & Summary
Transcriptome profiling has substantially benefitted from sequencing approaches like as RNA-Seq. Compared to microarrays or SAGE/SuperSAGE, RNA-Seq offers a wider dynamic range and can theoretically cover all transcripts in a biological sample 1 . This was a major limitation of previous microarray approaches, where the sequences used as hybridization baits did not represent the complete genome/transcriptome, and where fluorescence intensities tend to saturate for highly expressed genes. SAGE/SuperSAGE is a RT-PCR and cloning-based technology. Although transcript quantification is possible, SAGE/SuperSAGE fails when transcripts/cDNAs lack specific restriction sites needed for cloning, or when a specific restriction site occurs too close to the polyA-tail region of an otherwise detectable mRNA/cDNA 2-7 . RNA-Seq has previously been used to profile gene expression in parasites like schistosomes [8][9][10][11][12][13] . The latter cause schistosomiasis, a devastating infectious disease that ranks second only to malaria 14 . A vaccine is not available, and only one widely used drug (Praziquantel) exists 15 . There is a justifiable fear resistance emerging, for which experimental and field-study evidence has been obtained [16][17][18][19] . Schistosomes exhibit remarkable reproductive biology, which is not understood at the molecular level 20-23 but may represent a vulnerability that could be exploited by novel control measures.
Schistosomes have a complex life-cycle including an intermediate snail host and a vertebrate final host in which the adult worm develops 21 . Whereas flatworms are generally hermaphrodites, schistosomes have evolved separate sexes. Sexual dimorphism is visible at the adult stage only 24 . An exceptional feature of schistosome biology, however, is the constant pairing contact which is the prerequisite for the sexual maturation of the female. Pairing induces processes in the female which lead to the differentiation of its gonads 20,21,24 . The latter comprise the ovary, producing oocytes, and the vitellarium, delivering vitelline cells that provide egg-shell proteins needed for egg production and resources for embryogenesis 20,21 . Composite eggs are finally formed within the ootype, the egg-forming organ which is connected to the ovary and vitellarium by separate ducts and additionally to the uterus which ends at the tegumental surface and ensures egg transport to the environment. The eggs possess dual capacity being important for life-cycle maintenance but also causing the pathologic consequences of schistosomiasis. Trapped in host tissues such as gut, spleen, and liver, eggs induce inflammatory processes which finally lead to liver cirrhosis 25 .
Few studies have been initiated to unravel the reproductive development of schistosomes at the molecular level 23,[26][27][28][29][30][31][32][33][34][35][36] . Although the first molecular insights into the exceptional male-female interaction have been obtained, our knowledge is still fragmentary. To fill existing gaps we approached pairingdependent processes and female sexual maturation by a comparative RNA-Seq analysis. To this end we isolated gonad tissue of S. mansoni by an organ-isolation method 37 . Adult worms were raised by rodent infections either with both genders, leading to paired worms (b; bisex), or with a single gender (s; single sex), leading to sexually naïve males (M) or virgin females (F) 38 , and RNA was extracted from complete testes (T) and ovaries (O) as well as whole worms. In this way 23 libraries of paired (bF) and unpaired (sF) females and the corresponding ovaries (bO, sO) as well as of paired (bM) and unpaired (sM) males and the appropriate testes (bT, sT) were obtained for Illumina sequencing. After mapping to the reference genome, reads were counted and used to calculate differential gene expression caused by adult pairing, gender difference, and tissue orgin.
Our data sets represent the first gonad-specific transcriptome profiles of a parasitic flatworm including genes regulated by pairing in ovaries and testes of S. mansoni, one of the three major species affecting humans 14,15 . In the related work 39 published in Scientific Reports we performed a first in-depth bioinformatics analysis to explore and mine the data, providing an integrated overview on pairinginfluenced processes that cover the majority of genes in ovaries but also genes in testes. The dataset reported here will be supportive for future studies investigating the reproductive biology of schistosomes and further parasitic flatworms, for which to our knowledge no comparable studies exist.

Methods
Detailed methods on schistosome maintenance, gonad isolation, RNA extraction, RNA-Seq analyses and data processing can be found in our related work 39 .

Experimental design
The workflow for the comparative RNA-Seq approach included the following steps (Fig. 1). After isolating adult worms from final hosts by perfusion 38 , paired worms were separated and their gonads isolated the same day 37,39 . All biological material was kept in Trizol in liquid nitrogen until RNA extraction. For each sample 100 ng of total RNA was used for generating RNA-Seq libraries, of which sequencing was performed with 100 bp paired-ends. After quality assessment, raw reads were mapped to S. mansoni reference genome using TopHat 40 (version 2.0.8b). Differential gene expression was analyzed using the R package edgeR 41 (v3.6.7), for which raw reads were counted by HTSeq 42

Quantitative RT-PCR
For confirming the RNA-Seq results, real-time quantitative RT-PCRs (qPCRs) were performed. Total RNA of approved quality of each sample was used for cDNA synthesis using the QuantiTect Reverse Transcription Kit (Qiagen) including a genomic DNA wipe-out step. The subsequent qPCR was done with SYBR Green for detection (PerfeCTa SYBR Green Super Mix, Quanta) and a Rotor-Gene Q cycler (QIAGEN). Each gonad sample was analyzed with two biological replicates and two technical replicates. Ct (threshold cycle) values were obtained and compared with corresponding RPKM values from the RNA-Seq results. The online tools Primer3 Plus (http://www.bioinformatics.nl/cgi-bin/primer3plus/ primer3plus.cgi/), OligoCalc (http://biotools.nubic.northwestern.edu/OligoCalc.html), and OligoAnalyzer (https://eu.idtdna.com/calc/analyzer) were applied to design and analyze primers. All primer pairs were conceived to have an annealing temperature of 60°C. Their sequences are shown in Table 1.

Code availability
Codes have been deposited as part of the archive in figshare (Data Citations 1, 2).

Data Records
All sequence data can be obtained from the European Nucleotide Archive (ENA) (Data Citation 3) and from Array Express (Data Citation 4). Transcript profiles of each gene detected in the study were deposited as individual pdf files for downloading in figshare (Data Citations 1, 2). Furthermore, a web Figure 1. Schematic overview of the experimental design. All biological replicates (n = 2 for sO and n = 3 for all other samples) were obtained from the species Schistosoma mansoni, which was maintained in hamsters as final hosts. The separation of couples and the isolation of ovaries and testes were always performed at the same day to avoid any in vitro-culture effects. Total RNA was extracted by Trizol and its quality assessed by electropherogram analyses using an Agilent 2100 Bioanalyzer and Pico chips (Agilent Technologies). For each sample 100 ng of total RNA was used for synthesizing cDNA and generating libraries. Pair-end 100 bp sequencing was performed on Illumina HiSeq 2500 running two technical replicates for each sample. Raw reads were assessed using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and mapped to Schistosoma mansoni reference genome (v5.0) using TopHat. Reads for each transcript were counted using HTSeq and imported to edgeR for normalization across all samples and differential expression analysis. Mean RPKM and s.e.m. values based on normalized reads were calculated and used for barplots. A transcript profile was generated for each gene, including the barplot, as well as the log 2 fold-change (log 2 FC) and adjusted P (PAdjust) values based on various comparisons (see Smp_051920.2 as representative example, a nanosortholog that is abundantly transcribed in ovaries of unpaired females = sO).

Technical Validation
Qualitative and quantitative control of extracted RNA Quality and quantity of total RNA from each sample was checked by electropherogram analyses using the Agilent RNA 6000 Pico Kit (Agilent Technologies). Representative results of all samples are shown in Fig. 2. Note that the 28S rRNA peak is not present due to a known gap region within the molecule 43 .

Quality control of RNA-Seq reads
RPKM values were calculated from non-normalized reads. Density plots were generated from transformed log 2 (RPKM+0.001) values as a method to check RPKM distributions and replicate matchings. All samples exhibited bimodal distributions, with the first peak representing the percentage of  genes without any reads and the second peak showing the highest RPKM density (Fig. 3). In addition, for all eight samples, the biological replicates matched to each other indicating good correlations.

Principal component analysis and representative transcripts
By Principle Component Analysis (PCA), all 23 samples were clustered into different groups, indicating differences and/or similarities within their transcriptomes, which except for gender or tissue were also caused by pairing. Furthermore, we identified transcription profiles that allowed conclusions about specialized gene functions, which is of high value for basic as well as applied research aspects (Fig. 4). These examples illustrate that genes were detected with specialized pairing-dependent or -independent function in either one gender and gonad-independently, or in gonads of both genders, or gonadspecifically in one gender 39 . Although the vitellarium has not been covered as a separate reproductive female organ, genes with functions in this organ are among those differentially regulated in bF compared to sF. The vitellarium is the biggest organ in schistosome females, representing 70-80% of the body of a sexually mature female. Thus it appears likely that genes whose transcripts are more abundant in bF compared to sF, and for which few transcripts could be detected in ovaries of paired females (bO), represent genes with functions associated preferentially or specifically to the vitellarium 39 . Representative examples are p14 (Smp_131110) or fs800 (Smp_000290) (Fig. 4), both egg-shell precursor genes whose activities were demonstrated before to be specific for the vitellarium 44,45 .   . Correlations between RNA-Seq and qPCR results. Each dot indicates a single gene whose corresponding Smp-number can be found in Table 1. The selected genes include those with high (large RPKM and small Ct-values) and low transcript abundance (small RPKM and large Ct) in the gonad samples. Pearson´s correlation values are given in each case. Full names of gene symbols: psm (proteasome), rad50 (DNA double-strand break repair rad50 ATPase), melk (maternal embryonic leucine zipper kinase), hmgcs (Hydroxymethylglutaryl-CoA synthase), meg13 (micro exon gene 13), vwa (von Willebrand factor A), twik (twik family of potassium channels), sox (transcription factor SOX), syt14 (synaptotagmin XIV), cpeb1 (cytoplasmic polyadenylation element binding), dia (diaphanous), EIF4G (eukaryotic translation initiation factor 4 gamma), letem1 (LETM1 and EF hand domain containing protein 1), lin9 (protein lin9), rgs7 (regulator of G protein signaling), elav (embryonic lethal abnormal visual system), and cdc25 (M phase inducer phosphatase 3 cdc25).

Validation of gene expression by qPCR
Ct-values were calculated by qPCR. Pearson's correlations between RNA-Seq expression (RPKM) and qPCR (Ct) were calculated (Fig. 5). For the analyzed genes, all gonad samples demonstrated a good correlation between RNA-Seq and qPCR expression (Pearson's r>0.8 in all cases).