Introduction

The Ixodidae share the most significant common denominator of blood feeding behaviour with all other blood-feeding arthropods: the host-vector interface. Hosts on which ticks feed possess effective defence mechanisms to prevent blood loss and possible infection. Tick saliva in turn forms the key to successful feeding by counteracting the hosts’ wound healing responses and innate and acquired immune responses during prolonged feeding1. This is due to the expansion of certain dominantly expressed protein families in their saliva2,3. In this regard, an understanding of the biology of tick feeding, the hosts’ immune evasion tactics and genes involved in these processes contribute to the ongoing search towards improved tick control methods.

Transcriptomics enabled by RNA next generation sequencing technologies (RNA-Seq) has revealed the complex dynamics of the salivary glands of ticks as evident in the growing list of published tick salivary gland transcriptomes generated from various tick life stages and/or genders4. These studies identified thousands of transcripts coding for numerous protein families secreted over the course of feeding with defined differential expression patterns between male and female ticks. This approach shaped the concept of ‘sialome switching’, which is hypothesised to aid the tick in feeding for days on the host and still evade the host’s immune system5,6,7. Transcriptomes enabled identification of orthologs and paralogs i.e. genes that started diverging by speciation or duplication, that allow tracing of protein functionality to common ancestors. Transcriptomics also allow for the cataloguing of known and unique genes that can be used to identify proteins using proteomic methods. This is especially relevant for neglected, emerging and re-emerging diseases, such as tick toxicosis and tick induced paralysis, caused by non-model organisms8.

Rhipicephalus evertsi evertsi is an economically important two-host tick widely distributed across sub-Saharan Africa. It is the major trans-stadial vector of Theileria equi and Babesia caballi, making it an important agent of equine piroplasmosis9. Adults prefer large mammals, especially cattle and Equidae. However, the synchronised appearance of adults in large numbers during spring in certain areas in South Africa, burden lambs extensively and cause an ascending paralysis10,11 comparable to the phenotypic paralyses caused by other ticks12. To date, no molecular data exist that can explain the ability of R. evertsi evertsi to modulate inflammation, host immune responses, haemostasis or cause paralysis. We therefore sequenced the salivary gland transcriptome of male and female R. evertsi evertsi, over the course of feeding. The aim of this study was to de novo assemble a gene catalogue of R. evertsi evertsi salivary gland proteins of early, mid and late feeding stages. This enabled the identification of genes putatively involved in feeding as extrapolated from expression profiles of protein families during the different feeding stages of both sexes. The catalog may also be used in future as resource for identification of functional proteins.

Results

Transcriptome assembly and completeness

Sixty-six datasets were generated from different feeding stages for male and female salivary glands. Rigorous adapter and quality trimming removed 44–58% of the reads summarized in Table S1. Using various assemblers and kmer sizes, the datasets were used to de novo assemble the different time points of the male and female salivary glands of R. evertsi evertsi yielding 1320 separate assemblies. Open reading frames (ORFs) were extracted from the assembled contigs and clustered at 95% identity using CD-HIT13. The ORF set was reduced by selection of coverage cut-offs (reads per million, RPKM > 5; transcripts per million, TPM > 1) and a BLASTX cut-off of E-004. All ORFs (TPM ≥ 1) were manually inspected for duplicate sequences by BLASTP14 analysis and alignment before selecting those with the highest percent identity, bit score and complete domains to reduce redundancy to a minimum dataset (MDS) of 15,115. The annotation completeness and transcriptome assembly quality as determined by the Benchmarking Universal Single-Copy Orthologs (BUSCO)15 reference-based metrics graded the transcriptome as 82.9% complete with 60.7% as complete single genes, 22.2% as complete duplicated genes, 4.0% fragmented and 13.1% missing. These values compare well with those from other Rhipicephalus transcriptomes (Fig. 1).

Figure 1
figure 1

Comparison of BUSCO values of Rhipicephalus evertsi evertsi and transcriptomes published from other Rhipicephalus species. Indicated in parenthesis is the number of transcripts from each transcriptome submitted to BUSCO analysis. The percentage of single, duplicated, fragmented and missing genes are presented as a percentage fraction contributing to the whole transcriptome. The number of ORFs analysed are indicated in parenthesis.

Comparison of the R. evertsi evertsi assembled transcriptome to transcriptomes of ticks from the Rhipicephalus genus

To obtain a comprehensible assessment of the quality and completeness of the transcriptome, the 15,115 ORFs were translated to proteins and compared with proteins from publicly available Rhipicephalus transcriptomes. To get an estimate on whether the proteins from R. evertsi evertsi are shorter than other Rhipicephalus transcriptomes (indicating truncated transcripts), the protein lengths from various transcriptomes were compared (Fig. 2A). This confirmed that the translated proteins from R. evertsi evertsi are generally longer than the other Rhipicephalus transcriptomes, except for Rhipicephalus sanguineus. For an estimate of shared orthologs, the transcriptome was compared with the transcriptomes of Rhipicephalus appendiculatus16, Rhipicephalus microplus17,18, Rhipicephalus pulchellus19, R. sanguineus20 and Rhipicephalus zambeziensis21 using the reciprocal best hit approach (RBH). Pairwise comparisons indicated that RBHs between transcriptomes are similar, suggesting that the R. evertsi evertsi transcriptome possessed similar numbers of orthologs when compared to existing transcriptomes (Fig. 2B). A Venn diagram plot indicated that R. evertsi evertsi shared 7726 orthologs in total with all Rhipicephalus transcriptomes analysed (Fig. 2C). A total of 2294 orthologs was shared between all transcriptomes, while 5432 orthologs were shared by two or more transcriptomes. Similar results were obtained for the other Rhipicephalus transcriptomes (supplementary material), indicating that the R. evertsi evertsi transcriptome is comparable to these published transcriptomes. The total number of orthologs could be classified as 83% housekeeping and 12% secretory (classification below).

Figure 2
figure 2

Comparison of the Rhipicephalus evertsi evertsi (Reve) transcriptome with other publically available Rhipicephalus transcriptomes that include Rhipicephalus appendiculatus (Rapp), Rhipicephalus microplus (Rmic), Rhipicephalus pulchellus (Rpul), Rhipicephalus sanguineus (Rsan) and Rhipicephalus zambeziensis (Rzam). (A) A plot of the protein length in amino acid residues presented up to 2000 residues against the number of proteins. Proteins were binned in windows of 20 based on protein length. (B) Pairwise reciprocal best hit analysis for the publically available Rhipicephalus transcriptomes. (C) Venn diagram to show the number of shared orthologs found for the Rhipicephalus evertsi evertsi transcriptome based on reciprocal best hits. Similar diagrams can be found for the other publically available Rhipicephalus transcriptomes in the supplementary material. Numbers in parenthesis indicate total number of reciprocal best hits for each transcriptome and while those shared among transcriptomes are indicated in numbers.

Annotation and predicted protein classes

The transcriptome was annotated with sequence similarity searches to proteins in at least one of the listed databases. Six percent (n = 869) of the ORFs could not be annotated with a known function but contributed 3% of the expression represented by TPM values with a dynamic transcript expression range from TPM 1 to 3321. This underlined the fact that many expressed proteins with potential physiological and biological functions, remains uncharacterized. Fourteen percent (n = 2040) could be annotated with a putative secretory protein function contributing 46% of the expression while the majority (80%, n = 12,206) represented a housekeeping function contributing 51% to the overall expression.

Expression profiling and transcript abundance by functional class in R. evertsi evertsi salivary glands

The R. evertsi evertsi transcriptome displayed a range of expression over time in terms of transcripts per million (TPM) from the lowest expressed (TPM = 1) to the most abundant (TPM = 13,234). A moderate positive Pearson’s correlation was obtained over the whole transcriptome between expression and abundance (counts) of transcripts of each class (r(85) = 0.5083, p < 0.05) as well as between the most abundant transcripts and their expression (r(20) = 0.7096, p < 0.05). On the other hand, the highly expressed transcripts were not necessarily the most abundant (r(20) = 0.331, p < 0.05) with the twenty most abundant proteins contributing ~ 57% to the total expression based on TPM values. The majority of these were housekeeping proteins involved in metabolism, genetic information processing, cellular processes and organismal systems. Secretory proteins also highly expressed were the glycine-rich family (26%), lipocalins (5%), trypsin inhibitor-like cysteine rich (TIL) domain proteins (3%), immunoglobulin-binding proteins (2%), Kunitz-type inhibitors (BPTI) (2%), the 8.9 kDa protein family and metalloproteases, each contributing ~ 1% to the total expression in the transcriptome respectively.

Housekeeping functional class

The 12,206 transcripts assigned to the housekeeping class were divided into five main groups based on KEGG GhostKOALA database searches (Fig. 3). These groups relate to metabolism (n = 2120), environmental information processing (n = 1072), genetic information processing (n = 836), cellular processes (n = 961) and organismal systems (n = 1860). To infer functionality within the 5 groups, 9327 (76.4%) of the transcripts mapped to KEGG reference pathways.

Figure 3
figure 3

Housekeeping proteins from Rhipicephalus evertsi evertsi that mapped to the five groups of known housekeeping pathways defined by Kegg orthology.

Key metabolic pathways included carbohydrate metabolism (12%), amino acid metabolism (9%), and lipid metabolism (7%). While information regarding tick metabolic pathways remains limited, carbohydrate, lipid and amino acid metabolism may play a role during pathogen infection22.

The environmental information processing group (n = 1072) included important housekeeping processes such as signal transduction (94%), membrane transport (2%), signalling molecules and interaction (4%). Important signal transduction pathways identified were the highly conserved phosphoinositide-3-kinase-protein kinase B activated/Akt pathway (PI3K-Akt signalling pathway), the mammalian target of rapamycin (mTOR) signalling pathway, which both control cell metabolism, growth, proliferation and survival19 and the mitogen-activated protein kinase (MAPK) signalling pathway. These pathways are ubiquitously expressed and evolutionary conserved in eukaryotes23.

The genetic information processing group contained the genetic informative processes encapsulated in the Central Dogma represented by 15% transcription, 38% translation, 14% DNA replication and repair and 33% folding, sorting and degradation of proteins.

Secretory functional class

Protein families within the secretory class (n = 2040) were expressed at varying levels ranging from TPM 3.59–90813 or 0.002% to 57% of the total TPM with no correlation between the number of proteins in the family and what they contribute toward total transcript expression within the class (Table 1). The largest contributor to the expression of the secreted proteins were the glycine-rich protein family (cement) representing 57% to the total transcript expression in the secretory protein class from only 177 transcripts, followed by the lipocalins contributing 11% from 235 transcripts. The remaining nine protein families had TPM values from 9525 to 1907 (6 to 1%) and consisted of a range of transcripts per family (8 to 306) (Fig. 4). These included TIL domain proteins, immunoglobulin-binding proteins (IGBPB), bovine pancreatic trypsin inhibitor (BPTI), ML domain lipid-recognition proteins, proteins belonging to the 8.9-kDa polypeptide family unique to hard ticks, reprolysin/metalloproteases, basic tail secretory proteins (BTSP), mucins and evasins. Those with the lower TPM values included proteins belonging to the 28-kDa metastriate specific protein family, Ixodegrin B, a protease inhibitor similar to the Hirudin-like superfamily, antigen 5 cysteine rich proteins (AG5), 24 kDa proteins, defensins, gluzincin, 5′ nucleotidases like apyrase (5NT) and secretory proteins with unknown functions (Table 1) (Fig. 4).

Table 1 Characteristics of major tick secretory protein family expression in the Rhipicephalus evertsi evertsi transcriptome.
Figure 4
figure 4

Expression analysis of the secretory (SEC) proteins in the Rhipicephalus evertsi evertsi transcriptome (n = 2040). The glycine-rich protein family (cement) accounted for 57% of the total expression within the class while contributing less than 10 percent of the transcripts in the class. Expression was measured by transcript per million (TPM).

Orthologs in the Rhipicephalus evertsi evertsi transcriptome for proteins with known functions involved at the tick-host interface

To date functions for ~ 120 tick proteins involved at the tick-host interface have been experimentally validated24. Finding such orthologs within a transcriptome is an important quality measure, since it allows assignment of function based on homology, experimentally verified functions and provide confirmation that such functional proteins are present in the transcriptomes of evolutionary related organisms. A BLASTP14 analysis of protein sequences with known functions followed by phylogenetic analysis allowed identification of 22 potential orthologs (Table 2).

Table 2 Orthologs identified in the Rhipicephalus evertsi evertsi transcriptome for proteins with known functions.

Gender dependent transcript expression patterns

To determine expression patterns over time for each sex, single reads for each time point were mapped so the MDS to estimate transcript expression per sampling point of unfed, early feeding, mid-feeding and late feeding (Fig. 5). The male and female transcriptomes shared 83.3% ORFs with 5.1% unique ORFs in the female transcriptome and 6% unique male ORFs. The class with unknown functions shared 4.8% ORFs between the sexes with 0.2% unique ORFs in the males and 0.6% ORFs in females.

Figure 5
figure 5

Female and male housekeeping and secretory protein expression patterns per day expressed as percentage per day based on TPM ≥ 0.5 cut off for females and males. Protein families with a significant increase or decrease in expression are indicated with stars. Represented housekeeping proteins were: translation (TRSL), folding sorting and degradation (FSDE), transcription (TRSC), replication and repair (REPL), transport and catabolism (TRCA), signal transduction (STDU), carbohydrate metabolism (CMET), energy metabolism (EMET), amino acid metabolism (AMET), lipid metabolism (LMET), glycan biosynthesis and metabolism (GBME), nucleotide metabolism (NMET), signaling molecules and interaction (SIGN), cell motility (CMOT), digestive system (DIGS).

The most abundant housekeeping classes, namely translation (TRSL), folding, sorting, degradation (FSDE) and transcription (TRSC) in both male and female transcriptomes accounted for the majority of the overall expression in each (51% female, 47% male) (Fig. 5). The housekeeping proteins involved in translation, folding sorting, degradation and transcription followed the trend of the whole transcriptome at being the most abundant and highly expressed in both the male and female salivary gland transcriptomes although there was a 5% difference in expression in the male (TRSL TPM = 98,784) and the female’s (TRSL TPM = 127,389).

The male dominant secretory protein, immunoglobulin-binding protein (IGBP), was highly expressed from only a few transcripts and accounted for 7% of the male secretory protein transcript expression. This was followed by the glycine-rich superfamily (cement), which was expressed 4% higher in the male transcriptome than in the female. The metalloproteases made up 2% of each gender’s transcriptome while the Kunitz-type inhibitors (BPTI), lipocalins and basic tail secretory protein (BTSP) families each contributed to 1% in both male and female expressed proteins with major expressed lipocalins having predicted functional folds of serotonin and histamine binding activity using the Phyre2 protein fold recognition server25. However, only 25% (18 of the 72 identified female lipocalins) had the canonical biogenic amine-binding motif that would support binding of histamine and serotonin26.

The initial high expression of cement by the females decreased between day 4 to 6 (~ 50% to less than 10%). The secretion of lipocalins, evasins, Kunitz-type inhibitors, TIL and BTSPs increased after day 0 and varied during the feeding period of day 2 to 4 (~ 10 to 30%) but lipocalins had a gradual decrease of expression as feeding progressed. At the day 4 to 6 feeding ixodegrin B, Ix8.9 kDa and a protein assigned to the hirudin class of anticoagulants, increased in expression value to be the most expressed proteins just before the females detached (< 10% and > 10%). In contrast, the expression of cement by the males remained relatively constant throughout the feeding period as did evasins, BPTIs and TIL with an increase of IGBPA toward day 6. Lipocalins were not as highly expressed in the males as in the females.

In both the male and the female transcriptome, the highest expressed housekeeping proteins (TRSL, FSDE, TRSC and TRCA) stayed constant with a gradual decrease in expression of TRSL and increase from day 2 for FSDE, TRSC and TRCA. To confirm the observation of an increase in expression during the feeding period and a down tapering toward the end, a simple illustration of the housekeeping, secretory and unknown classes followed suit (Fig. 6). Overall, certain protein families were predominantly expressed in either one of the sexes, especially in the case of the secretory proteins; the lipocalins, 28 kDa metastriate and ixodegrins appear to be more expressed in the female transcriptome.

Figure 6
figure 6

Illustrative effect of expression of the housekeeping (blue), secretory (orange) and unknown (grey) functional classes over time. Expression was measured as TPM ≥ 0.5 for both the male and female transcriptomes.

Differential expression between male and female transcriptomes

Principle component analysis (PCA) confirmed differential expression between male and female tick salivary gland transcripts and clustered feeding points based on similarities in their expression patterns (Fig. 7). There was a clear grouping of the unfed (day 0) males and females before changes in expression started. Males of day 2, 4 and 6 clustered together but separate from day 0, suggesting that differential expression occurs in males once feeding starts, but then stabilized. In contrast, expression patterns for the females of day 2 and 4 clustered together before a remarkable change in expression of the day 6 female salivary glands. The female differentially expressed proteins’ ranged from TPM 31,945 for a secretory Ixodegrin B-like protein to TPM 0.1 for a translational housekeeping protein. In the case of the males the highest differentially expressed protein was a ML-domain housekeeping protein (TPM 19,725) as compared to the lowest of the housekeeping proteins involved in the metabolism of terpenoids and polyketides (TPM 0.05).

Figure 7
figure 7

Principle component and subsequent expression analysis (volcano plot) of the male and female transcripts controlled for by day to identify differentially expressed genes over time in the male and female transcriptome. The scatter plot displays the largest variance between unfed male and female ticks (Reve Fem Day 0 and Reve Male Day 0) (31.29%) followed by the second largest variance between feeding time points of males (25.2%). The volcano plot represents the differentially expressed female and male transcripts over the feeding period as a fold change against absolute confidence. Differentially expressed transcripts were double filtered to have FDR value < 0.05 and a log2 fold change larger than 4 and smaller than − 4 (dots in blue). Those that are highly up- or downregulated are further to the right and left sides, while highly significant changes appear higher on the plot.

The projection by the PCA analysis presented the transformation from a metabolically inactive to metabolically active tick salivary gland, observing the most variation between day 0 males and females and the rest of the male feeding stages (31.2%) with the second largest variation being between the sequential days of male transcriptome (25.2%) and the female’s (13.1%) which had a dynamic range of variance in time points sampled (Fig. 7).

Both sexes had similar patterns in terms of overall expression of protein classes however, by applying the filters to the volcano plot data (FDR < 0.05, log2 fold, − 4 to 4 expression) 3258 transcripts from the female and 2086 from the male differentially expressed transcripts were identified. Interestingly, the largest fold changes in terms of the magnitude of the difference of expression values in the group were as predicted by the principle component analysis, between proteins of the unfed and day six transcriptomes especially the housekeeping proteins (Table 3). This was even more evident in the female transcriptome. Most of these major fold changes occur from day two up to day 6 of feeding for the females with 12% for the housekeeping proteins and 20% for the secretory proteins. Similar to the females, the male’s largest fold changes of differentially expressed proteins were also between the unfed and day 6 males but had a lot less expression changes (< 10%) during feeding i.e. day 2 versus 4, day 2 versus 6 and day 4 versus 6 for both the housekeeping and secretory proteins (Table 3). This may be expected from the male transcriptome.

Table 3 Counts of transcripts of each class; housekeeping (HKP), secretory protein (SP) and unknown function proteins (UNK), displaying their largest fold change or mean expression value relative to time points of sampling the female and male salivary gland transcriptome based on differential expression by RNA-Seq analysis using global trimmed mean of M values (TMM) normalization.

By removing the shared transcripts between the male and female differentially expressed transcripts, the males had 968 differentially expressed transcripts with 77% housekeeping, 16% secretory and 7% ‘unknown’ hits while the females had 2140 of which 4% were ‘unknown’, 44% secretory and 52% housekeeping.

These differentially expressed proteins, in both the male and female transcriptomes, with the highest significance in relation to fold change over time, measured as log2 fold change, were of particular interest (Fig. 8). The visual trend in the differentially expressed proteins of the male transcriptome is one of downregulation as opposed to the female’s upregulation dominated differentially expressed proteins. The majority of the differentially expressed transcripts were of the secretory class in both sexes. It would also support the observation of the high fold changes from day 2 to 6 for the females. Almost all secretory proteins are differentially expressed to a lesser degree in the males. This included the previously highly expressed cement, evasin, TIL proteins, IGBP and BPTIs. Gluzincin and the 24 kDa proteins displayed some upregulation. The females on the other hand, displayed a range of upregulated differentially expressed proteins with a downregulation of their evasin protein secretion. Those differentially expressed protein families with significant upregulation in the female salivary glands were the BPTIs, Ixodegrin, TIL, lipocalins, cement and mucins.

Figure 8
figure 8

Differentially expressed secretory proteins in the male and female transcriptome with the highest significance as related to fold change over time compared to protein abundance as counts. Positive log2 fold changes are indicative of upregulation while negative log2 fold changes indicate downregulation.

Discussion

The aim of the current study was the de novo assembly and annotation of a salivary gland transcriptome for R. evertsi evertsi representing various feeding stages that included unfed, early feeding and late feeding of males and females. Sequence data for R. evertsi evertsi have been oddly scarce since it has the widest geographical range of Rhipicephalus in Africa24,27, with only 153 nucleotide and 28 protein sequences available in GenBank before the start of the current study (confirmed on 13 June 2020). This study improves on this shortfall by contributing 15,115 annotated proteins. The assembled transcriptome quality in terms of read mapping-based assessment and reference sequence-based assessment, even after rigorous removal of duplicated sequences to render a minimal dataset, indicated it to be representative of the sequenced reads with a comparable level of accuracy and completeness. It is comparable to available salivary gland transcriptomes from other genetically related ticks. Most of the transcripts (95%) could be annotated or at least assigned to a functional class which left 6% with an unknown status contributing ~ 3% (n = 869) of the overall expression in the salivary glands. Functional classes represented those transcripts involved in housekeeping of the cell, secretory proteins with the presence of signal peptides, or of unknown function. Those with an unknown status showed homology to other unknown proteins in the database based on their BLAST E-values. Putative functions were attributed based on previously characterized proteins in publicly available databases. Experimental confirmation of such functions needs to be performed in future.

The housekeeping functional class contributed to the majority of the composition of the transcriptome (80%) but with lower expression (51%), while the secretory protein functional class represented only 14% of the transcriptome but 46% of the total coverage. This was comparable to the overall description of the secretory category of other salivary gland transcriptomes from Amblyomma americanum, Ixodes ricinus, R. appendiculatus, R. microplus, R. pulchellus, R. sanguineus and R. zambeziensis (between 12 and 63%)5,6,16,18,19,20,21. Nonetheless, the transcriptomes displayed a dynamic range of expression with a few transcripts responsible for most of the expression in the salivary glands. The unknown class was not analysed further although it may contain housekeeping proteins. The housekeeping classes is close to the ~ 7000 shared core housekeeping genes previously predicted as part of the core eukaryotic set of orthologous genes28, suggesting that the R. evertsi evertsi transcriptomes represent a large portion of the expected housekeeping genes.

Housekeeping functional class

More than 15% of the housekeeping transcripts that could map to KEGG reference pathways were responsible for environmental information processing relating to membrane transport, signalling transduction, signalling molecules and interaction. These processes rely on proteins (receptors) and their chemical signals (ligands) to regulate a cell’s internal microenvironment in response to external changes through the movement of molecules or fluids across membranes. For example, the active secretion of saliva from the salivary gland is controlled by dopamine (ligand) which activates two signal transduction pathways for salivary secretion: the cyclic adenosine monophosphate (cAMP)-dependant signal transduction pathway that leads to fluid secretion and the calcium-dependant signalling pathway that activates prostaglandin E2 production which ultimately also leads to protein secretion in tick saliva29. Orthologs for the dopamine D1-like receptors were present in this transcriptome30. Furthermore, osmoregulation and salivation via the salivary glands is a continuous process controlled by a complex neural and endocrine system31 that requires concentrating the nutrients from the blood meal and pumping approximately 75% of ingested water and ions back into the host. The major intrinsic protein (MIP) neuropeptide and respective G-protein coupled receptors forms part of osmoregulation32. MIP and its predicted G-protein coupled receptor were also identified in the current transcriptome. MIP and its receptors function in the modulation of membrane channels selectively transporting water and ions in and out, as well as between cells assisted by aquaporin. More recently, aquaporin’s involvement was extended to inflammatory processes in humans as reviewed33. This function could be extended to the complex mechanisms regulating inflammation in the tick-host feeding site and possibly the complexity of molecules needed in tick saliva to regulate host aquaporin signalling pathways during inflammation. Additional research into aquaporin’s regulatory mechanisms might be an alternative therapeutic target in the prevention or treatment of inflammation.

The majority of transcripts involved in genetic information processing were associated with mRNA translation to proteins (38%). This is the protein synthesis function of the ribosomes in the cytoplasm or endoplasmic reticulum after gene coding DNA has been transcribed to mRNA in the cell nucleus. It is a continuous process evident by the differential expression during feeding and illustrative of how the tick is able to constantly adapt to a changing feeding site and host immune defences. These housekeeping proteins (41%) contribute to the successful continuous feeding by the tick and are supported by key metabolic pathways with the majority of the transcripts involved in amino acid metabolism, carbohydrate metabolism and lipid metabolism.

Secretory functional class

The secretory functional class represented 46% of the overall expression in the transcriptome with the most abundant secretory protein transcripts being the BPTI (15% of the transcripts of the secretory protein class contributing ~ 4% to the secretory class and 2% of overall expression), the lipocalins (11.5% of the transcripts of the secretory protein class contributing ~ 11% to the secretory class and 5% of overall expression) and metalloprotease/Reprolysin families (10.5% of the transcripts of the secretory protein class contributing ~ 2% to the secretory class and 1% of overall expression). This was to be expected since these are major protein families that expanded due to gene duplication in the evolution of blood-feeding tick lineages2,3. To put the secretory protein families into functional context a brief description of each is provided with regard to known functions.

Cement formation

The dominantly expressed glycine-rich protein family accounted for 26% of the total expression and 57% in the secretory class alone. Glycine-rich proteins are involved in formation of the cement cone34. The dominant expression profiles of glycine-rich proteins have been noted before16,21,35,36.

Immunoglobulin scavengers

The male dominant IGBPA transcripts accounted for 2% of the total expression and 5% of the secretory protein class alone. Immunoglobulin-binding proteins have orthologs in all ixodid ticks, but vary in size between species and time of expression during feeding37,38. IGBPAs are essential proteins with the purpose of the inactivation of host IgG, otherwise detrimental to tick survival, while aiding its secretion back into the host39. This is probably an adaption to their extended periods of feeding and being exposed to the onslaught of the host’s immune system as compared to the shorter feeding strategies of soft ticks. Orthologs have been found for at least IGBPA and IGBPC in this R. evertsi evertsi transcriptome.

Cytokine storms and the evasins

Inflammation is driven by host chemokines, referred to as ‘the cytokine storm’40 and play an important role in recruiting immune cells to the feeding site, resulting in inflammation and possible tick rejection41. Evasins inhibit chemokine function and a large array of inhibitors has been found in ticks42. In R. evertsi evertsi a large number of the evasin A and evasin B class were found in the transcriptome. Orthologs for EVA-P991 and evasin B were detected.

Antimicrobial activities and TIL domain containing proteins

The TIL domain containing proteins (peptidase inhibitors) were described from mosquitoes, hard and soft ticks43 including the transcriptomes of the non-hematophagous Antricola delacruzi44. These transcripts displayed a steady decrease in the female transcriptome. However, TIL protein expression increased in the male transcriptome toward the end of feeding after a relatively constant expression thereof during the preceding days. The function of these proteins suggest a role in tick innate immunity and an antimicrobial function in preventing the tick from being infected after feeding45.

Fibrinogenolytic and immunomodulatory metalloproteases

Even though the functionality of many of the metalloproteases are unknown, those isolated and characterized from Ixodes scapularis targeted fibrin and fibrinogen, thereby preventing blood clotting or dissolving formed blood clots46, affect inflammation and wound healing by degrading integrin47. An ortholog to the fibrinogenolytic metalloprotease was found in the R. evertsi evertsi transcriptome. Other authors reported this protein family to target the innate immune response, reduce pain and cause vasodilation through the degradation of bradykinin48. An ortholog for angiotensin-converting enzyme has been found that may target bradykinin49,50.

Anti-inflammatory lipocalins

Proteins involved in the regulation of inflammation include proteins that minimize itching by targeting biogenic amines such as the histamine-binding lipocalins48,51. The lipocalin family of proteins have diverse functions from modulating host immune responses by scavenging complement C5, leukotriene-C4, -B4, histamine and serotonin, to the inhibition of blood-clotting through the prevention of platelet aggregation. Orthologs has been found for histamine-binding protein 1 specific to females from R. appendiculatus. These orthologs were also specific to females in the current study. However, given the large number of lipocalins found in the salivary glands other functions may be expected.

Complement inhibitors

Recently, complement inhibitors have been found for R. appendiculatus that specifically targets complement C5 to control complement-mediated inflammation52,53. Complement inhibitor (CirPT1) presents the knottin fold (inhibitor cysteine knot (aka ICK or Knottin)), while R. appendiculatus complement inhibitor (RaCI) presents a unique fold that resembles the disulphide-rich structure of small toxins from snake venom53. Orthologs for CirTP1 were present but not RaCI.

Blood clotting inhibitors

The BPTI family are slow tight binding Kunitz proteins preventing blood clotting by inhibiting thrombin. Known thrombin inhibitors are Amblin, Variegin, Boophilin, Americanin, Savignin and Ornithodorin54,55,56,57,58,59. Orthologs for Amblin, Boophilin and Rhipilin-1 were present in the R. evertsi evertsi salivary transcriptome. Other proteins that may also target thrombin include serpins such as Aas19 and RmS-1560,61,62, for which orthologs were found in the transcriptome. Longistatin act as plasminogen activator and enolase as plasminogen receptor, leading to formation of plasmin (fibrinolytic enzyme) that result in feedback inhibition of thrombin via a feedback loop of the extrinsic or tissue factor pathway63,64. Orthologs for both proteins were present. This would indicate that targeting of the blood-clotting cascade is an important part of the feeding strategy for this tick species.

The basic tail secretory proteins (BTSP)

The basic tail secretory proteins are lysine-rich at the C-terminus and were well represented in other transcriptomes65,66. One protein was shown to exhibit anticoagulant function for the inhibition of factor Xa67. The initial high expression of BTSP and subsequent decrease is similar to what was found for I. ricinus68.

The involvement of a protein similar to hirudin with potent thrombin inhibiting activity and ixodegrin B, which is only expressed in tick salivary glands in the late phase of feeding is indicative of a greater prevention of platelet aggregation which would suit the purpose during the engorgement phase of feeding from the blood pool. However, they are related to lectins which may also be indicative of a function in the tick innate immunity68.

Paralysis toxins

BLASTP14 analysis of the 18 available holocylotoxin genes in Genbank69 (the paralysis-causing toxins from Ixodes holocyclus), found no homologs. A PSI-BLAST analysis to identify its novel inhibitory cysteine knot (ICK) toxin fold70, as performed previously for holocyclotoxin12, retrieved all family members but did not detect any homologs. It was suggested that the ICK fold is related to the 5.3 kDa family66. No members of the 5.3 kDa family has been detected in the R. evertsi evertsi transcriptome. However, the small size and high genetic divergence of the holocyclotoxins may prevent identification of toxin orthologs at present. Even so, the availability of a salivary gland transcriptome may in future enable identification of a purified toxin using proteomic methods.

Differential expression

The differentially expressed secretory proteins, especially the multi-gene families, increased to 16% expression (compared to the initial overall 10%) and the housekeeping differentially expressed proteins decreased to 80% compared to the initial 87% overall expression. Most female differentially expressed transcripts were upregulated as opposed to those from the males, which were downregulated if assuming that they were upregulated during the initial stages of feeding. This was similar to what has been reported for R. pulchellus and R. zambeziensis19,21, but contradictory to R. appendiculatus16. Feeding processes between day 2 and 6 (day 3 to 4) seem to be characterized by an increase in differentially expressed transcripts as compared to the earlier and later feeding stages and was similar to expression patterns for nymphs of I. scapularis71. This period coincides with the start of the rapid engorgement phase which overlaps with the proliferative and remodelling phase of wound healing from day 3 after attachment1. The process of angiogenesis (the formation of new blood vessels and granulation tissue) also occurs during the proliferative phase of wound healing. The saliva of ticks therefore has potent cell proliferation and angiogenesis inhibitors, such as calreticulin72 and troponin-I73,74 to prevent wound healing and contraction. Calreticulin transcripts were present within the R. evertsi evertsi transcriptome as well as transcripts with the novel tropomodulin domain. Tropomodulin is a tropomyosin regulatory protein also identified in R. sanguineus, Amblyomma triste, Amblyomma sculptum, Amblyomma maculatum, Hyalomma excavatum, I. scapularis and R. pulchellus19,35,75,76,77,78,79.

The differential expression observed during progression of feeding time resonate with the concept of “sialome switching” where ticks changes their salivary gland protein expression patterns to evade the hosts immune system or as response to specific host species and environmental cues5,7,80. As more differential expression studies are performed this is becoming more apparent18,20,81. Differential expression as means to increase local concentrations at the feeding site and in response to the changing feeding site during the prolonged feeding of ixodids has also been considered24.

Male and female expression patterns

Because tick attachment and feeding is not a synchronized process, graphical illustration and analysis of transcripts by means of a principle component analysis and volcano plot allowed visualization of temporal changes in transcript expression over time in both male and female transcriptomes. Gender biased expression in response to blood feeding was confirmed with differential expression analysis confirming nearly half (~ 44%) of the female differentially expressed transcripts belonging to the secretory class, similar to other reports5,6. For both sexes, the glycine-rich cement expression was the most dominantly expressed per day. This group of proteins is exclusive to the ixodid lineage, with the exception of the Ixodes genus, and is important for tick attachment during feeding1,48,82. The current study’s observation of the increase in spatiotemporal transcript expression for female tick salivary gland glycine-rich transcripts up to day 4 of feeding suggests a feeding induced function for maintaining the integrity of the tick-feeding site as suggested before36. The function of the glycine-rich transcript expression as observed in the current and other studies with regard to the male expression patterns that remain relatively constant during the feeding process16,19,83, remains to be tested since this presumably does not function in cement cone maintenance. The time point expression profiles of the males and females, as obtained by pooling the salivary glands of males and females of each time point together and mapping each time point’s reads to the assembled transcriptome, revealed gender dominated secretory proteins as was found in other transcriptomes5,6,16,21,84,85,86,87.

Focusing on transcriptional response during tick feeding and large expression differences, the male dominated IGBPA appear to drastically increase after day 2 of feeding, to a lesser degree so for trypsin inhibitor-like (TIL domain) transcripts, while evasin expression remained relatively constant. The confirmation of male dominated expression of IGBP in the salivary glands conformed to previous studies on ixodid tick expression profiles of these proteins37,38,88, suggesting males have a biological support role during feeding of the female ticks. With early female feeding (day 0 to 4) bovine pancreatic trypsin inhibitor (BPTI), evasin, lipocalins and BTSP dominating salivary gland expression. From day 4 to the end of feeding (day 6) which coincides with the rapid engorgement phase, ixodegrin B, Ix 8.9 kDa and a protein similar to hirudin started dominating expression along with lipocalins. They are also the families that are differentially expressed as feeding progress.

Previous studies on the proteome of the spermatophore revealed ML-domain containing proteins, TIL domain proteins and immune proteins hypothesised to be important in sperm motility and mating89. The observation of the increase in expression patterns of these protein families from day 4 to day 6 of feeding males might associate these proteins with the concept of male seminal fluid if mating occurred with the rapid engorgement phase of the female (from day 4 onward). These families have also been implicated in tick innate immunity, protease inhibitors, antimicrobial peptides and venom toxins1,90. Other studies also noted the expression of serine proteases, cystatins, immune proteins and metalloproteases as potential seminal fluid proteins19,81,91. However, these proteins were not annotated as ‘seminal proteins’ hindering possible orthologous comparisons between species. In the current transcriptome using a NCBI conserved protein domain search, proteins relating to male reproduction were identified. These included transcripts coding for a zonadhesin-like protein, which is a sperm-specific membrane protein associated with cell adhesion consisting of several TIL domains and an uncharacterised spermatogenesis-associated protein 20 from the conserved YyaL, SSP411 protein family, containing thoiredoxin and six-hairpin glycosidase-like domains. In humans and rats it is expressed in the testis in an age-dependent manner with SSP411 mRNA increasing during spermatogenesis92. A vesicle-associated membrane protein from the MSP (Major sperm protein) domain important for sperm motility also formed part of the male expressed transcriptome93. Notably, these proteins were present in the male transcriptome but not differentially expressed.

Conclusions

The advances in next generation sequencing form the baseline towards functional genomic research especially in the case of a non-model organism, like R. evertsi evertsi, where genetic information is limited with no reference genome or proteome to annotate putative identified genes with little or no homology to other model-organisms. The current study represents the first de novo assembled transcriptome of the salivary glands from R. evertsi evertsi with time-dependant gender-specific transcript expression. The transcriptome was constructed using Illumina sequenced reads from unfed to representative feeding time phases of both male and female salivary glands. Unique salivary gland expression patterns in secretory protein families were observed for both sexes, the majority with unknown function, highlighting the shortfalls in tick feeding biology. This transcriptome will contribute to the increasing number of known tick transcriptomes that enabled identification of tick proteins, improving our basic understanding of tick biology and salivary gland function evident by the orthologous transcripts and those differentially expressed during male and female feeding. The identification of platelet aggregation inhibitors, blood clotting inhibitors and immune-modulators orthologous to the ancestral tick lineages confirm a common tick ancestor with similar functions than that of R. evertsi evertsi adding to our insight to their evolutionary salivary gland protein biology. Not having biological replicates might influence statistical variation when doing analysis across groups. However, the technology seems robust enough to be able to deduce expression trends in the R. evertsi evertsi transcriptome that is comparable to existing transcriptomes. Including biological replicates will offer stronger support for further statistical data analysis. It should also be kept in mind that trying to elucidate the molecular basis of tick paralysis, as possibly caused by R. evertsi evertsi is likely obscured by the pharmacological intricacies of tick saliva. Nevertheless, a transcriptome of this sort, sampled over time during feeding, contribute data of importance into salivary gland biology and blood feeding physiology.

Methods

Ethics approval

Animals used during this study were housed in the East Coast fever quarantine stable complex of the Agricultural Research Council-Onderstepoort Veterinary Research (ARC-OVR). The use of animals was approved by the ARC-OVR animal ethics committee (AEC 01.15), the animal ethics committee of the University of Pretoria (V023-16) and permission to do research in terms of Section 20 of the Animal Diseases Act, 1984 of South Africa (No: 12/11/1). All animal experiments were performed in accordance with relevant animal ethics guidelines and regulations and adhering to ARRIVE guidelines.

Tick feedings

The ARC-OVR R. evertsi evertsi colony was expanded from field collected engorged females from the farm Uitspanning, Amsterdam, Mpumalanga province, South Africa in 2013. This 7-year-old colony is maintained under laboratory conditions as described94. Samples for the current study were obtained in 2017. For this study, female ticks were fed to repletion in the presence of males on the back of South African meat merino sheep weighing 92 kg at a ratio of 1.7 ticks/kg bodyweight with defined protocols94.

Sample preparation for RNA isolation

Twenty male and females were removed, respectively, at intervals during feeding to recover material representing unfed (day 0 weighing < 5 mg), early feeding (day 2 weighing 6–15 mg for females and < 5 mg for males), mid feeding (day 4 weighing 16–23 mg for females and ~ 5 mg for males) and late feeding (day 6 weighing > 24 mg for females and ~ 7 mg for males) ticks. Salivary glands were dissected out within two hours of tick removal, cleaned from other internal tissues and placed into at least ten times its volume RNAlater (Qiagen, AMBION, Inc., Austin, Texas), and left overnight at 4 °C before storing at -70 °C. Each day’s samples were pooled by gender to render one male (40 glands) and one female (40 glands) sample per time point. Salivary glands were suspended in 600 µl RLT buffer (RNeasy Protect Mini Kit) per 20 mg tissue and total RNA extracted using the RNeasy Protect Mini Kit (QIAGEN Group). Residual genomic DNA was removed with DNase I digestion (QIAGEN Group). Total RNA quantification and assessment of the integrity of RNA was done using the Qubit fluorometer 2.0 (Life Technologies, Carlsbad, CA) and the Agilent Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA).

Library construction and next generation sequencing

Libraries were prepared for sequencing using 1–1.5 µg purified total RNA and the TruSeq RNA sample Preparation kit (Illumina, San Diego, CA) with modifications. This comprised isolation of poly-A mRNA, fragmentation (for 3 min), conversion to double stranded cDNA, adaptor ligation and PCR amplification for 12 cycles. The libraries were size selected using agarose gel electrophoresis to 450–1200 bp. Bands were excised, purified and sequenced at the Biotechnology Platform of the Agricultural Research Council (South Africa) using the Illumina MiSeq system (300 bp × 300 bp). Raw sequence reads were submitted to Genbank under Bioproject PRJNA670032 and the TSA database with numbers SAMN16482386, SAMN16482387, SAMN16482388, SAMN16482389, SAMN16482390, SAMN16482391, SAMN16482392 and SAMN16482393. The final ORFs can be found under the same Bioproject number.

Transcriptome assembly, extraction of ORFs and quality assessment

A multiple kmer, multi-algorithm, multi-dataset approach was used for the assembly. This entailed the use of three short read assemblers: Minia95 and Trinity96 along with CLC Genomics Workbench [CLC Genomics Workbench 7.0 and 12.0 (https://www.qiagenbioinformatics.com/)]. Trinity v2.4.0 was used with default parameter settings using a kmer size of 25. Minia v3.2.4 was used with kmer sizes in step sizes of 10 starting at 29 up to 99 (8 assemblies). CLC kmer sizes were used in step sizes of 5 starting at 15 up to 60 and an additional assembly using kmer 64 (11 assemblies). CLC assembly parameters were set up as follows: mismatch cost-2, insertion cost-3, deletion cost-3, length fraction-0.9, similarity-0.9, minimum contig length-240, kmer size-variable, bubble size-automatic. The number of assemblies produced in this manner totalled 66 for Trinity, 528 for Minia and 726 for CLC Genomics Workbench. ORFs were extracted using a Perl-script and chimeric and duplicate sequences removed by clustering at 90% identity using CD-HIT v4.6.413,97. The All-Single dataset was mapped against the clustered ORFs using CLC Genomics Workbench and ORFs with RPKM > 5 were selected. Further reduction was performed with BLASTX analysis against the ACARI database and hits with E-values below 0.004 were selected for further analysis. To identify additional duplicates, translated ORFs with the same BLASTP14 hits were pairwise compared and analysed against the ACARI database using BLASTP14 analysis to identify ORFs with intact domains with misassembled regions. ORFs were retained that showed intact domains and highest coverage in TPM value that could be unambiguously assigned to evolutionary conserved homologs. This enabled construction of a final transcriptome database with high confidence. The transcriptome quality was measured for accuracy, completeness, contiguity and chimerism using the Benchmarking Universal Single-Copy Orthologs (BUSCO v3) approach15.

Transcriptome annotation: identification of tick secretory and house-keeping proteins

Translated protein sequences were analysed against the ACARI protein database using BLASTP14 and hits with a cut off of E-value ≤ e−4 were retained. The ACARI (mites and ticks) protein database is an in-house curated database of protein sequences derived from Genbank or VectorBase28,98 and the EuKaryotic Orthologous Groups (KOG) dataset99. Where no protein sequences were available, nucleotide, expressed sequence tag (EST) or assembled contigs from the short read archive (SRA) were retrieved from Genbank or VectorBase and ORFs were extracted as described above and annotated as TPA (third party annotation). The ACARI database is curated based on Kyoto Encyclopedia of Genes and Genomes (KEGG)100 for the annotation for house-keeping pathways along with KEGG Automatic Annotation Server (KAAS) and GhostKOALA to assign KEGG orthology (KO) identifiers to the transcripts101. Annotations from Acari genome projects and PSI-BLAST analysis102 using tick secretory proteins from literature were used to identify secretory protein families. This first pass analysis assigned proteins to house-keeping or secretory classes based on sequence homology. To confirm this, ORFs were submitted to the SignalP 4.1 Server103, TMHMM server v.2.0104 and Phobius105. Orthologs to proteins with experimental determined functions were identified using BLASTP14 analysis followed by multiple alignment using MAFFT v7.311106 and phylogenetic analysis using IQ-TREE v1.6.12107. To identify reciprocal best hits, BLASTP14 analysis was preformed using pairwise analysis and a final cutoff value of e-6108. Results from the analyses were presented using InteractiVenn109.

Male and female expression patterns over time

The single reads of each time point (Day 0, Day 2, Day 4 and Day 6) for male and female salivary glands were mapped against the de novo assembled transcriptome using CLC Genomics Workbench (12.0), using the RNAseq module. This was obtained using default parameters of Mismatch cost = 2, Insertion cost = 3, Deletion cost = 0, Length fraction = 0.8, similarity fraction = 0.9, both strands specific and the maximum number of hits for a read = 10. Each day’s transcripts were grouped as being housekeeping or secretory and according to functional class. Expression values (TPM > 0.5) were used to determine sex dependant expression patterns over time.

Differential expression of housekeeping and secretory proteins

Quality filtered sequence reads for each time point were analysed against the de novo assembled transcriptome as reference using CLC Genomics Workbench (12.0). A principle component analysis as a function of time on the unclustered datasets determined the amount of variance in the transcriptomes of males and females during the feeding period using normalized log counts per million (CPM) values in CLC Genomics Workbench (12.0). This incorporated the use of the trimmed mean of M values (TMM) normalization method110 as used in EdgeR or DESeq2 packages. This method was also suggested for exploratory studies without biological replicates. In such a case, the algorithm shares data between genes with similar expression to estimate technical and biological variability.

To identify differentially expressed genes by male and female ticks across the feeding period a metadata table of male and female reads respectively was constructed to analyse the variance (ANOVA) across groups to test differential expressions due to day on CLC Genomics Workbench (12.0). A volcano plot was used as visual tool to demonstrate structure in the data and to display both the fold-change and statistical significance (the -log10 of the p value) while taking noise level into account when performing gene filtering. Differentially expressed transcripts (FDR value < 0.05) with a log2 fold change larger than 4 and smaller than − 4 were extracted.