De novo assembled salivary gland transcriptome and expression pattern analyses for Rhipicephalus evertsi evertsi Neuman, 1897 male and female ticks

Ticks secrete proteins in their saliva that change over the course of feeding to modulate the host inflammation, immune responses, haemostasis or may cause paralysis. RNA next generation sequencing technologies can reveal the complex dynamics of tick salivary glands as generated from various tick life stages and/or males and females. The current study represents 15,115 Illumina sequenced contigs of the salivary gland transcriptome from male and female Rhipicephalus evertsi evertsi ticks of early, mid and late feeding stages from 1320 separate assemblies using three short read assemblers. The housekeeping functional class contributed to the majority of the composition of the transcriptome (80%) but with lower expression (51%), while the secretory protein functional class represented only 14% of the transcriptome but 46% of the total coverage. Six percent had an unknown status contributing 3% of the overall expression in the salivary glands. Platelet aggregation inhibitors, blood clotting inhibitors and immune-modulators orthologous to the ancestral tick lineages were confirmed in the transcriptome and their differential expression during feeding in both genders observed. This transcriptome contributes data of importance to salivary gland biology and blood feeding physiology of non-model organisms.

www.nature.com/scientificreports/ evertsi transcriptome is comparable to these published transcriptomes. The total number of orthologs could be classified as 83% housekeeping and 12% secretory (classification below).
Annotation and predicted protein classes. The transcriptome was annotated with sequence similarity searches to proteins in at least one of the listed databases. Six percent (n = 869) of the ORFs could not be annotated with a known function but contributed 3% of the expression represented by TPM values with a dynamic transcript expression range from TPM 1 to 3321. This underlined the fact that many expressed proteins with potential physiological and biological functions, remains uncharacterized. Fourteen percent (n = 2040) could be annotated with a putative secretory protein function contributing 46% of the expression while the majority (80%, n = 12,206) represented a housekeeping function contributing 51% to the overall expression.
Expression profiling and transcript abundance by functional class in R. evertsi evertsi salivary glands. The R. evertsi evertsi transcriptome displayed a range of expression over time in terms of transcripts per million (TPM) from the lowest expressed (TPM = 1) to the most abundant (TPM = 13,234). A moderate positive Pearson's correlation was obtained over the whole transcriptome between expression and abundance (counts) of transcripts of each class (r(85) = 0.5083, p < 0.05) as well as between the most abundant transcripts and their expression (r(20) = 0.7096, p < 0.05). On the other hand, the highly expressed transcripts were not necessarily the most abundant (r(20) = 0.331, p < 0.05) with the twenty most abundant proteins contributing ~ 57% to the total expression based on TPM values. The majority of these were housekeeping proteins involved in metabolism, genetic information processing, cellular processes and organismal systems. Secretory proteins also highly expressed were the glycine-rich family (26%), lipocalins (5%), trypsin inhibitor-like cysteine rich (TIL) domain proteins (3%), immunoglobulin-binding proteins (2%), Kunitz-type inhibitors (BPTI) (2%), the 8.9 kDa protein family and metalloproteases, each contributing ~ 1% to the total expression in the transcriptome respectively.
Housekeeping functional class. The 12,206 transcripts assigned to the housekeeping class were divided into five main groups based on KEGG GhostKOALA database searches (Fig. 3). These groups relate to metabolism (n = 2120), environmental information processing (n = 1072), genetic information processing (n = 836), cellular processes (n = 961) and organismal systems (n = 1860). To infer functionality within the 5 groups, 9327 (76.4%) of the transcripts mapped to KEGG reference pathways. Key metabolic pathways included carbohydrate metabolism (12%), amino acid metabolism (9%), and lipid metabolism (7%). While information regarding tick metabolic pathways remains limited, carbohydrate, lipid and amino acid metabolism may play a role during pathogen infection 22 .
The environmental information processing group (n = 1072) included important housekeeping processes such as signal transduction (94%), membrane transport (2%), signalling molecules and interaction (4%). Important signal transduction pathways identified were the highly conserved phosphoinositide-3-kinase-protein kinase B activated/Akt pathway (PI3K-Akt signalling pathway), the mammalian target of rapamycin (mTOR) signalling pathway, which both control cell metabolism, growth, proliferation and survival 19 and the mitogen-activated protein kinase (MAPK) signalling pathway. These pathways are ubiquitously expressed and evolutionary conserved in eukaryotes 23 .
The genetic information processing group contained the genetic informative processes encapsulated in the Central Dogma represented by 15% transcription, 38% translation, 14% DNA replication and repair and 33% folding, sorting and degradation of proteins.
Secretory functional class. Protein families within the secretory class (n = 2040) were expressed at varying levels ranging from TPM 3.59-90813 or 0.002% to 57% of the total TPM with no correlation between the number of proteins in the family and what they contribute toward total transcript expression within the class ( Table 1). The largest contributor to the expression of the secreted proteins were the glycine-rich protein family (cement) representing 57% to the total transcript expression in the secretory protein class from only 177 transcripts, followed by the lipocalins contributing 11% from 235 transcripts. The remaining nine protein families had TPM values from 9525 to 1907 (6 to 1%) and consisted of a range of transcripts per family (8 to 306) (Fig. 4). These included TIL domain proteins, immunoglobulin-binding proteins (IGBPB), bovine pancreatic trypsin inhibitor (BPTI), ML domain lipid-recognition proteins, proteins belonging to the 8.9-kDa polypeptide family unique to hard ticks, reprolysin/metalloproteases, basic tail secretory proteins (BTSP), mucins and evasins. Those with the lower TPM values included proteins belonging to the 28-kDa metastriate specific protein family, Ixodegrin B, a protease inhibitor similar to the Hirudin-like superfamily, antigen 5 cysteine rich proteins (AG5), 24 kDa proteins, defensins, gluzincin, 5′ nucleotidases like apyrase (5NT) and secretory proteins with unknown functions (Table 1) (Fig. 4).
Orthologs in the Rhipicephalus evertsi evertsi transcriptome for proteins with known functions involved at the tick-host interface. To date functions for ~ 120 tick proteins involved at the tick-host interface have been experimentally validated 24 . Finding such orthologs within a transcriptome is an important quality measure, since it allows assignment of function based on homology, experimentally verified functions and provide confirmation that such functional proteins are present in the transcriptomes of evolutionary related organisms. A BLASTP 14 analysis of protein sequences with known functions followed by phylogenetic analysis allowed identification of 22 potential orthologs ( www.nature.com/scientificreports/ Gender dependent transcript expression patterns. To determine expression patterns over time for each sex, single reads for each time point were mapped so the MDS to estimate transcript expression per sampling point of unfed, early feeding, mid-feeding and late feeding (Fig. 5). The male and female transcriptomes shared 83.3% ORFs with 5.1% unique ORFs in the female transcriptome and 6% unique male ORFs. The class with unknown functions shared 4.8% ORFs between the sexes with 0.2% unique ORFs in the males and 0.6% ORFs in females. The most abundant housekeeping classes, namely translation (TRSL), folding, sorting, degradation (FSDE) and transcription (TRSC) in both male and female transcriptomes accounted for the majority of the overall expression in each (51% female, 47% male) (Fig. 5). The housekeeping proteins involved in translation, folding sorting, degradation and transcription followed the trend of the whole transcriptome at being the most abundant and highly expressed in both the male and female salivary gland transcriptomes although there was a 5% difference in expression in the male (TRSL TPM = 98,784) and the female's (TRSL TPM = 127,389).
The male dominant secretory protein, immunoglobulin-binding protein (IGBP), was highly expressed from only a few transcripts and accounted for 7% of the male secretory protein transcript expression. This was followed by the glycine-rich superfamily (cement), which was expressed 4% higher in the male transcriptome than in the female. The metalloproteases made up 2% of each gender's transcriptome while the Kunitz-type inhibitors (BPTI), lipocalins and basic tail secretory protein (BTSP) families each contributed to 1% in both male and www.nature.com/scientificreports/ female expressed proteins with major expressed lipocalins having predicted functional folds of serotonin and histamine binding activity using the Phyre 2 protein fold recognition server 25 . However, only 25% (18 of the 72 identified female lipocalins) had the canonical biogenic amine-binding motif that would support binding of histamine and serotonin 26 .
The initial high expression of cement by the females decreased between day 4 to 6 (~ 50% to less than 10%). The secretion of lipocalins, evasins, Kunitz-type inhibitors, TIL and BTSPs increased after day 0 and varied during the feeding period of day 2 to 4 (~ 10 to 30%) but lipocalins had a gradual decrease of expression as feeding progressed. At the day 4 to 6 feeding ixodegrin B, Ix8.9 kDa and a protein assigned to the hirudin class of anticoagulants, increased in expression value to be the most expressed proteins just before the females detached (< 10% and > 10%). In contrast, the expression of cement by the males remained relatively constant throughout the feeding period as did evasins, BPTIs and TIL with an increase of IGBPA toward day 6. Lipocalins were not as highly expressed in the males as in the females.    www.nature.com/scientificreports/ In both the male and the female transcriptome, the highest expressed housekeeping proteins (TRSL, FSDE, TRSC and TRCA) stayed constant with a gradual decrease in expression of TRSL and increase from day 2 for FSDE, TRSC and TRCA. To confirm the observation of an increase in expression during the feeding period and a down tapering toward the end, a simple illustration of the housekeeping, secretory and unknown classes followed suit (Fig. 6). Overall, certain protein families were predominantly expressed in either one of the sexes, especially in the case of the secretory proteins; the lipocalins, 28 kDa metastriate and ixodegrins appear to be more expressed in the female transcriptome.

Differential expression between male and female transcriptomes. Principle component analysis
(PCA) confirmed differential expression between male and female tick salivary gland transcripts and clustered feeding points based on similarities in their expression patterns (Fig. 7). There was a clear grouping of the unfed (day 0) males and females before changes in expression started. Males of day 2, 4 and 6 clustered together but separate from day 0, suggesting that differential expression occurs in males once feeding starts, but then stabilized. In contrast, expression patterns for the females of day 2 and 4 clustered together before a remarkable change in expression of the day 6 female salivary glands. The female differentially expressed proteins' ranged from TPM 31,945 for a secretory Ixodegrin B-like protein to TPM 0.1 for a translational housekeeping protein.
In the case of the males the highest differentially expressed protein was a ML-domain housekeeping protein The projection by the PCA analysis presented the transformation from a metabolically inactive to metabolically active tick salivary gland, observing the most variation between day 0 males and females and the rest of the male feeding stages (31.2%) with the second largest variation being between the sequential days of male transcriptome (25.2%) and the female's (13.1%) which had a dynamic range of variance in time points sampled (Fig. 7).
Both sexes had similar patterns in terms of overall expression of protein classes however, by applying the filters to the volcano plot data (FDR < 0.05, log2 fold, − 4 to 4 expression) 3258 transcripts from the female and 2086 from the male differentially expressed transcripts were identified. Interestingly, the largest fold changes in terms of the magnitude of the difference of expression values in the group were as predicted by the principle component analysis, between proteins of the unfed and day six transcriptomes especially the housekeeping proteins (Table 3). This was even more evident in the female transcriptome. Most of these major fold changes occur from day two up to day 6 of feeding for the females with 12% for the housekeeping proteins and 20% for the secretory proteins. Similar to the females, the male's largest fold changes of differentially expressed proteins were also between the unfed and day 6 males but had a lot less expression changes (< 10%) during feeding i.e. day 2 versus 4, day 2 versus 6 and day 4 versus 6 for both the housekeeping and secretory proteins (Table 3). This may be expected from the male transcriptome.
By removing the shared transcripts between the male and female differentially expressed transcripts, the males had 968 differentially expressed transcripts with 77% housekeeping, 16% secretory and 7% 'unknown' hits while the females had 2140 of which 4% were 'unknown' , 44% secretory and 52% housekeeping.
These differentially expressed proteins, in both the male and female transcriptomes, with the highest significance in relation to fold change over time, measured as log2 fold change, were of particular interest (Fig. 8). The visual trend in the differentially expressed proteins of the male transcriptome is one of downregulation as opposed to the female's upregulation dominated differentially expressed proteins. The majority of the differentially expressed transcripts were of the secretory class in both sexes. It would also support the observation of the high fold changes from day 2 to 6 for the females. Almost all secretory proteins are differentially expressed to a lesser degree in the males. This included the previously highly expressed cement, evasin, TIL proteins, IGBP and BPTIs. Gluzincin and the 24 kDa proteins displayed some upregulation. The females on the other hand, displayed a range of upregulated differentially expressed proteins with a downregulation of their evasin protein secretion. Those differentially expressed protein families with significant upregulation in the female salivary glands were the BPTIs, Ixodegrin, TIL, lipocalins, cement and mucins. The volcano plot represents the differentially expressed female and male transcripts over the feeding period as a fold change against absolute confidence. Differentially expressed transcripts were double filtered to have FDR value < 0.05 and a log2 fold change larger than 4 and smaller than − 4 (dots in blue). Those that are highly up-or downregulated are further to the right and left sides, while highly significant changes appear higher on the plot.

Discussion
The aim of the current study was the de novo assembly and annotation of a salivary gland transcriptome for R. evertsi evertsi representing various feeding stages that included unfed, early feeding and late feeding of males and females. Sequence data for R. evertsi evertsi have been oddly scarce since it has the widest geographical range of Rhipicephalus in Africa 24,27 , with only 153 nucleotide and 28 protein sequences available in GenBank before the start of the current study (confirmed on 13 June 2020). This study improves on this shortfall by contributing 15,115 annotated proteins. The assembled transcriptome quality in terms of read mapping-based assessment and reference sequence-based assessment, even after rigorous removal of duplicated sequences to render a minimal dataset, indicated it to be representative of the sequenced reads with a comparable level of accuracy and completeness. It is comparable to available salivary gland transcriptomes from other genetically related ticks. Most of the transcripts (95%) could be annotated or at least assigned to a functional class which left 6% with an unknown status contributing ~ 3% (n = 869) of the overall expression in the salivary glands. Functional classes represented those transcripts involved in housekeeping of the cell, secretory proteins with the presence of signal peptides, or of unknown function. Those with an unknown status showed homology to other unknown proteins in the database based on their BLAST E-values. Putative functions were attributed based on previously characterized proteins in publicly available databases. Experimental confirmation of such functions needs to be performed in future. The housekeeping functional class contributed to the majority of the composition of the transcriptome (80%) but with lower expression (51%), while the secretory protein functional class represented only 14% of the transcriptome but 46% of the total coverage. This was comparable to the overall description of the secretory category of other salivary gland transcriptomes from Amblyomma americanum, Ixodes ricinus, R. appendiculatus, R. microplus, R. pulchellus, R. sanguineus and R. zambeziensis (between 12 and 63%) 5,6,16,18-21 . Nonetheless, the transcriptomes displayed a dynamic range of expression with a few transcripts responsible for most of the expression in the salivary glands. The unknown class was not analysed further although it may contain housekeeping proteins. The housekeeping classes is close to the ~ 7000 shared core housekeeping genes previously predicted as part of the core eukaryotic set of orthologous genes 28 , suggesting that the R. evertsi evertsi transcriptomes represent a large portion of the expected housekeeping genes.
Housekeeping functional class. More than 15% of the housekeeping transcripts that could map to KEGG reference pathways were responsible for environmental information processing relating to membrane transport, signalling transduction, signalling molecules and interaction. These processes rely on proteins (receptors) and their chemical signals (ligands) to regulate a cell's internal microenvironment in response to external changes through the movement of molecules or fluids across membranes. For example, the active secretion of saliva from the salivary gland is controlled by dopamine (ligand) which activates two signal transduction pathways for salivary secretion: the cyclic adenosine monophosphate (cAMP)-dependant signal transduction pathway that leads to fluid secretion and the calcium-dependant signalling pathway that activates prostaglan- www.nature.com/scientificreports/ din E2 production which ultimately also leads to protein secretion in tick saliva 29 . Orthologs for the dopamine D1-like receptors were present in this transcriptome 30 . Furthermore, osmoregulation and salivation via the salivary glands is a continuous process controlled by a complex neural and endocrine system 31 that requires concentrating the nutrients from the blood meal and pumping approximately 75% of ingested water and ions back into the host. The major intrinsic protein (MIP) neuropeptide and respective G-protein coupled receptors forms part of osmoregulation 32 . MIP and its predicted G-protein coupled receptor were also identified in the current transcriptome. MIP and its receptors function in the modulation of membrane channels selectively transporting water and ions in and out, as well as between cells assisted by aquaporin. More recently, aquaporin's involvement was extended to inflammatory processes in humans as reviewed 33 . This function could be extended to the complex mechanisms regulating inflammation in the tick-host feeding site and possibly the complexity of molecules needed in tick saliva to regulate host aquaporin signalling pathways during inflammation. Additional research into aquaporin's regulatory mechanisms might be an alternative therapeutic target in the prevention or treatment of inflammation. The majority of transcripts involved in genetic information processing were associated with mRNA translation to proteins (38%). This is the protein synthesis function of the ribosomes in the cytoplasm or endoplasmic reticulum after gene coding DNA has been transcribed to mRNA in the cell nucleus. It is a continuous process evident by the differential expression during feeding and illustrative of how the tick is able to constantly adapt to a changing feeding site and host immune defences. These housekeeping proteins (41%) contribute to the successful continuous feeding by the tick and are supported by key metabolic pathways with the majority of the transcripts involved in amino acid metabolism, carbohydrate metabolism and lipid metabolism.
Secretory functional class. The secretory functional class represented 46% of the overall expression in the transcriptome with the most abundant secretory protein transcripts being the BPTI (15% of the transcripts of the secretory protein class contributing ~ 4% to the secretory class and 2% of overall expression), the lipocalins (11.5% of the transcripts of the secretory protein class contributing ~ 11% to the secretory class and 5% of overall expression) and metalloprotease/Reprolysin families (10.5% of the transcripts of the secretory protein class contributing ~ 2% to the secretory class and 1% of overall expression). This was to be expected since these are major protein families that expanded due to gene duplication in the evolution of blood-feeding tick lineages 2,3 . To put the secretory protein families into functional context a brief description of each is provided with regard to known functions.
Cement formation. The dominantly expressed glycine-rich protein family accounted for 26% of the total expression and 57% in the secretory class alone. Glycine-rich proteins are involved in formation of the cement cone 34 . The dominant expression profiles of glycine-rich proteins have been noted before 16,21,35,36 .
Immunoglobulin scavengers. The male dominant IGBPA transcripts accounted for 2% of the total expression and 5% of the secretory protein class alone. Immunoglobulin-binding proteins have orthologs in all ixodid ticks, but vary in size between species and time of expression during feeding 37,38 . IGBPAs are essential proteins with the purpose of the inactivation of host IgG, otherwise detrimental to tick survival, while aiding its secretion back into the host 39 . This is probably an adaption to their extended periods of feeding and being exposed to the onslaught of the host's immune system as compared to the shorter feeding strategies of soft ticks. Orthologs have been found for at least IGBPA and IGBPC in this R. evertsi evertsi transcriptome.
Cytokine storms and the evasins. Inflammation is driven by host chemokines, referred to as 'the cytokine storm' 40 and play an important role in recruiting immune cells to the feeding site, resulting in inflammation and possible tick rejection 41 . Evasins inhibit chemokine function and a large array of inhibitors has been found in ticks 42 . In R. evertsi evertsi a large number of the evasin A and evasin B class were found in the transcriptome. Orthologs for EVA-P991 and evasin B were detected.
Antimicrobial activities and TIL domain containing proteins. The TIL domain containing proteins (peptidase inhibitors) were described from mosquitoes, hard and soft ticks 43 including the transcriptomes of the nonhematophagous Antricola delacruzi 44 . These transcripts displayed a steady decrease in the female transcriptome. However, TIL protein expression increased in the male transcriptome toward the end of feeding after a relatively constant expression thereof during the preceding days. The function of these proteins suggest a role in tick innate immunity and an antimicrobial function in preventing the tick from being infected after feeding 45 .
Fibrinogenolytic and immunomodulatory metalloproteases. Even though the functionality of many of the metalloproteases are unknown, those isolated and characterized from Ixodes scapularis targeted fibrin and fibrinogen, thereby preventing blood clotting or dissolving formed blood clots 46 , affect inflammation and wound healing by degrading integrin 47 . An ortholog to the fibrinogenolytic metalloprotease was found in the R. evertsi evertsi transcriptome. Other authors reported this protein family to target the innate immune response, reduce pain and cause vasodilation through the degradation of bradykinin 48 . An ortholog for angiotensin-converting enzyme has been found that may target bradykinin 49,50 .
Anti-inflammatory lipocalins. Proteins involved in the regulation of inflammation include proteins that minimize itching by targeting biogenic amines such as the histamine-binding lipocalins 48 www.nature.com/scientificreports/ kotriene-C4, -B4, histamine and serotonin, to the inhibition of blood-clotting through the prevention of platelet aggregation. Orthologs has been found for histamine-binding protein 1 specific to females from R. appendiculatus. These orthologs were also specific to females in the current study. However, given the large number of lipocalins found in the salivary glands other functions may be expected.

Complement inhibitors.
Recently, complement inhibitors have been found for R. appendiculatus that specifically targets complement C5 to control complement-mediated inflammation 52,53 . Complement inhibitor (CirPT1) presents the knottin fold (inhibitor cysteine knot (aka ICK or Knottin)), while R. appendiculatus complement inhibitor (RaCI) presents a unique fold that resembles the disulphide-rich structure of small toxins from snake venom 53 . Orthologs for CirTP1 were present but not RaCI.
Blood clotting inhibitors. The BPTI family are slow tight binding Kunitz proteins preventing blood clotting by inhibiting thrombin. Known thrombin inhibitors are Amblin, Variegin, Boophilin, Americanin, Savignin and Ornithodorin [54][55][56][57][58][59] . Orthologs for Amblin, Boophilin and Rhipilin-1 were present in the R. evertsi evertsi salivary transcriptome. Other proteins that may also target thrombin include serpins such as Aas19 and RmS-15 [60][61][62] , for which orthologs were found in the transcriptome. Longistatin act as plasminogen activator and enolase as plasminogen receptor, leading to formation of plasmin (fibrinolytic enzyme) that result in feedback inhibition of thrombin via a feedback loop of the extrinsic or tissue factor pathway 63,64 . Orthologs for both proteins were present. This would indicate that targeting of the blood-clotting cascade is an important part of the feeding strategy for this tick species.
The basic tail secretory proteins (BTSP). The basic tail secretory proteins are lysine-rich at the C-terminus and were well represented in other transcriptomes 65,66 . One protein was shown to exhibit anticoagulant function for the inhibition of factor Xa 67 . The initial high expression of BTSP and subsequent decrease is similar to what was found for I. ricinus 68 . The involvement of a protein similar to hirudin with potent thrombin inhibiting activity and ixodegrin B, which is only expressed in tick salivary glands in the late phase of feeding is indicative of a greater prevention of platelet aggregation which would suit the purpose during the engorgement phase of feeding from the blood pool. However, they are related to lectins which may also be indicative of a function in the tick innate immunity 68 .
Paralysis toxins. BLASTP 14 analysis of the 18 available holocylotoxin genes in Genbank 69 (the paralysis-causing toxins from Ixodes holocyclus), found no homologs. A PSI-BLAST analysis to identify its novel inhibitory cysteine knot (ICK) toxin fold 70 , as performed previously for holocyclotoxin 12 , retrieved all family members but did not detect any homologs. It was suggested that the ICK fold is related to the 5.3 kDa family 66 . No members of the 5.3 kDa family has been detected in the R. evertsi evertsi transcriptome. However, the small size and high genetic divergence of the holocyclotoxins may prevent identification of toxin orthologs at present. Even so, the availability of a salivary gland transcriptome may in future enable identification of a purified toxin using proteomic methods. Differential expression. The differentially expressed secretory proteins, especially the multi-gene families, increased to 16% expression (compared to the initial overall 10%) and the housekeeping differentially expressed proteins decreased to 80% compared to the initial 87% overall expression. Most female differentially expressed transcripts were upregulated as opposed to those from the males, which were downregulated if assuming that they were upregulated during the initial stages of feeding. This was similar to what has been reported for R. pulchellus and R. zambeziensis 19,21 , but contradictory to R. appendiculatus 16 . Feeding processes between day 2 and 6 (day 3 to 4) seem to be characterized by an increase in differentially expressed transcripts as compared to the earlier and later feeding stages and was similar to expression patterns for nymphs of I. scapularis 71 . This period coincides with the start of the rapid engorgement phase which overlaps with the proliferative and remodelling phase of wound healing from day 3 after attachment 1 . The process of angiogenesis (the formation of new blood vessels and granulation tissue) also occurs during the proliferative phase of wound healing. The saliva of ticks therefore has potent cell proliferation and angiogenesis inhibitors, such as calreticulin 72 and troponin-I 73,74 to prevent wound healing and contraction. Calreticulin transcripts were present within the R. evertsi evertsi transcriptome as well as transcripts with the novel tropomodulin domain. Tropomodulin is a tropomyosin regulatory protein also identified in R. sanguineus, Amblyomma triste, Amblyomma sculptum, Amblyomma maculatum, Hyalomma excavatum, I. scapularis and R. pulchellus 19,35,[75][76][77][78][79] .
The differential expression observed during progression of feeding time resonate with the concept of "sialome switching" where ticks changes their salivary gland protein expression patterns to evade the hosts immune system or as response to specific host species and environmental cues 5,7,80 . As more differential expression studies are performed this is becoming more apparent 18,20,81 . Differential expression as means to increase local concentrations at the feeding site and in response to the changing feeding site during the prolonged feeding of ixodids has also been considered 24 . Male and female expression patterns. Because tick attachment and feeding is not a synchronized process, graphical illustration and analysis of transcripts by means of a principle component analysis and volcano plot allowed visualization of temporal changes in transcript expression over time in both male and female transcriptomes. Gender biased expression in response to blood feeding was confirmed with differential expression analysis confirming nearly half (~ 44%) of the female differentially expressed transcripts belonging to the secre- www.nature.com/scientificreports/ tory class, similar to other reports 5,6 . For both sexes, the glycine-rich cement expression was the most dominantly expressed per day. This group of proteins is exclusive to the ixodid lineage, with the exception of the Ixodes genus, and is important for tick attachment during feeding 1,48,82 . The current study's observation of the increase in spatiotemporal transcript expression for female tick salivary gland glycine-rich transcripts up to day 4 of feeding suggests a feeding induced function for maintaining the integrity of the tick-feeding site as suggested before 36 . The function of the glycine-rich transcript expression as observed in the current and other studies with regard to the male expression patterns that remain relatively constant during the feeding process 16,19,83 , remains to be tested since this presumably does not function in cement cone maintenance. The time point expression profiles of the males and females, as obtained by pooling the salivary glands of males and females of each time point together and mapping each time point's reads to the assembled transcriptome, revealed gender dominated secretory proteins as was found in other transcriptomes 5,6,16,21,[84][85][86][87] . Focusing on transcriptional response during tick feeding and large expression differences, the male dominated IGBPA appear to drastically increase after day 2 of feeding, to a lesser degree so for trypsin inhibitor-like (TIL domain) transcripts, while evasin expression remained relatively constant. The confirmation of male dominated expression of IGBP in the salivary glands conformed to previous studies on ixodid tick expression profiles of these proteins 37,38,88 , suggesting males have a biological support role during feeding of the female ticks. With early female feeding (day 0 to 4) bovine pancreatic trypsin inhibitor (BPTI), evasin, lipocalins and BTSP dominating salivary gland expression. From day 4 to the end of feeding (day 6) which coincides with the rapid engorgement phase, ixodegrin B, Ix 8.9 kDa and a protein similar to hirudin started dominating expression along with lipocalins. They are also the families that are differentially expressed as feeding progress.
Previous studies on the proteome of the spermatophore revealed ML-domain containing proteins, TIL domain proteins and immune proteins hypothesised to be important in sperm motility and mating 89 . The observation of the increase in expression patterns of these protein families from day 4 to day 6 of feeding males might associate these proteins with the concept of male seminal fluid if mating occurred with the rapid engorgement phase of the female (from day 4 onward). These families have also been implicated in tick innate immunity, protease inhibitors, antimicrobial peptides and venom toxins 1,90 . Other studies also noted the expression of serine proteases, cystatins, immune proteins and metalloproteases as potential seminal fluid proteins 19,81,91 . However, these proteins were not annotated as 'seminal proteins' hindering possible orthologous comparisons between species. In the current transcriptome using a NCBI conserved protein domain search, proteins relating to male reproduction were identified. These included transcripts coding for a zonadhesin-like protein, which is a sperm-specific membrane protein associated with cell adhesion consisting of several TIL domains and an uncharacterised spermatogenesis-associated protein 20 from the conserved YyaL, SSP411 protein family, containing thoiredoxin and six-hairpin glycosidase-like domains. In humans and rats it is expressed in the testis in an age-dependent manner with SSP411 mRNA increasing during spermatogenesis 92 . A vesicle-associated membrane protein from the MSP (Major sperm protein) domain important for sperm motility also formed part of the male expressed transcriptome 93 . Notably, these proteins were present in the male transcriptome but not differentially expressed.

Conclusions
The advances in next generation sequencing form the baseline towards functional genomic research especially in the case of a non-model organism, like R. evertsi evertsi, where genetic information is limited with no reference genome or proteome to annotate putative identified genes with little or no homology to other model-organisms. The current study represents the first de novo assembled transcriptome of the salivary glands from R. evertsi evertsi with time-dependant gender-specific transcript expression. The transcriptome was constructed using Illumina sequenced reads from unfed to representative feeding time phases of both male and female salivary glands. Unique salivary gland expression patterns in secretory protein families were observed for both sexes, the majority with unknown function, highlighting the shortfalls in tick feeding biology. This transcriptome will contribute to the increasing number of known tick transcriptomes that enabled identification of tick proteins, improving our basic understanding of tick biology and salivary gland function evident by the orthologous transcripts and those differentially expressed during male and female feeding. The identification of platelet aggregation inhibitors, blood clotting inhibitors and immune-modulators orthologous to the ancestral tick lineages confirm a common tick ancestor with similar functions than that of R. evertsi evertsi adding to our insight to their evolutionary salivary gland protein biology. Not having biological replicates might influence statistical variation when doing analysis across groups. However, the technology seems robust enough to be able to deduce expression trends in the R. evertsi evertsi transcriptome that is comparable to existing transcriptomes. Including biological replicates will offer stronger support for further statistical data analysis. It should also be kept in mind that trying to elucidate the molecular basis of tick paralysis, as possibly caused by R. evertsi evertsi is likely obscured by the pharmacological intricacies of tick saliva. Nevertheless, a transcriptome of this sort, sampled over time during feeding, contribute data of importance into salivary gland biology and blood feeding physiology. www.nature.com/scientificreports/ Tick feedings. The ARC-OVR R. evertsi evertsi colony was expanded from field collected engorged females from the farm Uitspanning, Amsterdam, Mpumalanga province, South Africa in 2013. This 7-year-old colony is maintained under laboratory conditions as described 94 . Samples for the current study were obtained in 2017. For this study, female ticks were fed to repletion in the presence of males on the back of South African meat merino sheep weighing 92 kg at a ratio of 1.7 ticks/kg bodyweight with defined protocols 94 .

Methods
Sample preparation for RNA isolation. Twenty male and females were removed, respectively, at intervals during feeding to recover material representing unfed (day 0 weighing < 5 mg), early feeding (day 2 weighing 6-15 mg for females and < 5 mg for males), mid feeding (day 4 weighing 16-23 mg for females and ~ 5 mg for males) and late feeding (day 6 weighing > 24 mg for females and ~ 7 mg for males) ticks. Salivary glands were dissected out within two hours of tick removal, cleaned from other internal tissues and placed into at least ten times its volume RNAlater (Qiagen, AMBION, Inc., Austin, Texas), and left overnight at 4 °C before storing at -70 °C. Each day's samples were pooled by gender to render one male (40 glands) and one female (40 glands ). CLC kmer sizes were used in step sizes of 5 starting at 15 up to 60 and an additional assembly using kmer 64 (11 assemblies). CLC assembly parameters were set up as follows: mismatch cost-2, insertion cost-3, deletion cost-3, length fraction-0.9, similarity-0.9, minimum contig length-240, kmer size-variable, bubble size-automatic. The number of assemblies produced in this manner totalled 66 for Trinity, 528 for Minia and 726 for CLC Genomics Workbench. ORFs were extracted using a Perl-script and chimeric and duplicate sequences removed by clustering at 90% identity using CD-HIT v4.6.4 13,97 . The All-Single dataset was mapped against the clustered ORFs using CLC Genomics Workbench and ORFs with RPKM > 5 were selected. Further reduction was performed with BLASTX analysis against the ACARI database and hits with E-values below 0.004 were selected for further analysis. To identify additional duplicates, translated ORFs with the same BLASTP 14 hits were pairwise compared and analysed against the ACARI database using BLASTP 14 analysis to identify ORFs with intact domains with misassembled regions. ORFs were retained that showed intact domains and highest coverage in TPM value that could be unambiguously assigned to evolutionary conserved homologs. This enabled construction of a final transcriptome database with high confidence. The transcriptome quality was measured for accuracy, completeness, contiguity and chimerism using the Benchmarking Universal Single-Copy Orthologs (BUSCO v3) approach 15 .
Transcriptome annotation: identification of tick secretory and house-keeping proteins. Translated protein sequences were analysed against the ACARI protein database using BLASTP 14 and hits with a cut off of E-value ≤ e−4 were retained. The ACARI (mites and ticks) protein database is an in-house curated database of protein sequences derived from Genbank or VectorBase 28,98 and the EuKaryotic Orthologous Groups (KOG) dataset 99 . Where no protein sequences were available, nucleotide, expressed sequence tag (EST) or assembled contigs from the short read archive (SRA) were retrieved from Genbank or VectorBase and ORFs were extracted as described above and annotated as TPA (third party annotation). The ACARI database is curated based on Kyoto Encyclopedia of Genes and Genomes (KEGG) 100 for the annotation for house-keeping pathways along with KEGG Automatic Annotation Server (KAAS) and GhostKOALA to assign KEGG orthology (KO) identifiers to the transcripts 101 . Annotations from Acari genome projects and PSI-BLAST analysis 102 using tick secretory proteins from literature were used to identify secretory protein families. This first pass analysis assigned proteins to house-keeping or secretory classes based on sequence homology. To confirm this, ORFs were submitted to the SignalP 4.1 Server 103 , TMHMM server v.2.0 104 and Phobius 105 . Orthologs to proteins with experimental determined functions were identified using BLASTP 14 analysis followed by multiple alignment using MAFFT v7.311 106 and phylogenetic analysis using IQ-TREE v1.6.12 107 . To identify reciprocal best hits, BLASTP 14 analysis was preformed using pairwise analysis and a final cutoff value of e-6 108 . Results from the analyses were presented using InteractiVenn 109 . www.nature.com/scientificreports/ Male and female expression patterns over time. The single reads of each time point (Day 0, Day 2, Day 4 and Day 6) for male and female salivary glands were mapped against the de novo assembled transcriptome using CLC Genomics Workbench (12.0), using the RNAseq module. This was obtained using default parameters of Mismatch cost = 2, Insertion cost = 3, Deletion cost = 0, Length fraction = 0.8, similarity fraction = 0.9, both strands specific and the maximum number of hits for a read = 10. Each day's transcripts were grouped as being housekeeping or secretory and according to functional class. Expression values (TPM > 0.5) were used to determine sex dependant expression patterns over time.
Differential expression of housekeeping and secretory proteins. Quality filtered sequence reads for each time point were analysed against the de novo assembled transcriptome as reference using CLC Genomics Workbench (12.0). A principle component analysis as a function of time on the unclustered datasets determined the amount of variance in the transcriptomes of males and females during the feeding period using normalized log counts per million (CPM) values in CLC Genomics Workbench (12.0). This incorporated the use of the trimmed mean of M values (TMM) normalization method 110 as used in EdgeR or DESeq2 packages. This method was also suggested for exploratory studies without biological replicates. In such a case, the algorithm shares data between genes with similar expression to estimate technical and biological variability.
To identify differentially expressed genes by male and female ticks across the feeding period a metadata table of male and female reads respectively was constructed to analyse the variance (ANOVA) across groups to test differential expressions due to day on CLC Genomics Workbench (12.0). A volcano plot was used as visual tool to demonstrate structure in the data and to display both the fold-change and statistical significance (the -log10 of the p value) while taking noise level into account when performing gene filtering. Differentially expressed transcripts (FDR value < 0.05) with a log2 fold change larger than 4 and smaller than − 4 were extracted. www.nature.com/scientificreports/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creat iveco mmons .org/licen ses/by/4.0/.