RNA-sequencing of the Nyssomyia neivai sialome: a sand fly-vector from a Brazilian endemic area for tegumentary leishmaniasis and pemphigus foliaceus

Leishmaniasis encompasses a spectrum of diseases caused by a protozoan belonging to the genus Leishmania. The parasite is transmitted by the bite of sand flies, which inoculate the promastigote forms into the host’s skin while acquiring a blood meal. Nyssomyia neivai is one of the main vectors of tegumentary leishmaniasis (TL) in Brazil. Southeastern Brazil is an endemic region for TL but also overlaps with an endemic focus for pemphigus foliaceus (PF), also known as Fogo Selvagem. Salivary proteins of sand flies, specifically maxadilan and LJM11, have been related to pemphigus etiopathogenesis in the New World, being proposed as an environmental trigger for autoimmunity. We present a comprehensive description of the salivary transcriptome of the N. neivai, using deep sequencing achieved by the Illumina protocol. In addition, we highlight the abundances of several N. neivai salivary proteins and use phylogenetic analysis to compare with Old- and New-World sand fly salivary proteins. The collection of protein sequences associated with the salivary glands of N. neivai can be useful for monitoring vector control strategies as biomarkers of N. neivai, as well as driving vector-vaccine design for leishmaniasis. Additionally, this catalog will serve as reference to screen for possible antigenic peptide candidates triggering anti-Desmoglein-1 autoantibodies.

Contigs containing an open reading frame and any similarity to sequences in the chosen databases were selected for further analysis. Reads for each library were mapped on the deducted CDs using blastn 38 with a word size of 25, 1 gap allowed, and 95% identity or better required. Up to five matches were allowed if and only if the scores were the same as the largest score. Mapping of the reads was also included in the Excel spreadsheet. Values of the Reads Per Kilobase of transcript, per Million mapped reads (RPKM) 39 for each coding sequence were also mapped to a spreadsheet. Automated annotation of proteins was based on a vocabulary of nearly 350 words found in matches to various databases: Swiss-Prot, Gene Ontology, KOG, Pfam, SMART, Refseq-invertebrates, and the GenBank Diptera subset. Raw reads were deposited on the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI) under BioProject ID PRJNA359206 and read accession SRR5134059. This Transcriptome Shotgun Assembly project has been deposited at DDBJ/EMBL/GenBank under the accession GFDF00000000.
Phylogenetic analysis. For multiple sequence alignment and phylogenetic analysis, abundant salivary proteins from N. neivai, had their predicted signal peptide signal (SignalP-5.0 server 40 ) removed, and resulting protein sequence entered into a Basic Local Alignment Search tool (BLAST 38 ) against NR and TSA databases. We selected the five most similar homolog sequences (based on the e-value) for each sand fly species. The cutoff to exclude a homologue was an e-value above 1 -10 , except for the N. neivai yellow family of proteins where a homologue from Drosophila was used to root the tree. Multiple sequence alignment and identity/similarity matrix were constructed on MacVector v15.5.3 with MUSCLE 41 using PAM 200 profile. We determined the best method for amino acid substitution using the "Find best protein Models" feature of MEGA7 42 . A score was given to each of 56 amino acid substitution models including the mixing Gamma and invariant sites likelihood. The option with the lowest Bayesian information criterion score was selected to build the tree. Through this feature, it was determined that the best amino acid substitution model for phylogeny as follows: WAG for the ML domain and Maxadilan trees. The model WAG with discrete Gamma distribution was used to model evolutionary rate differences among sites (5 categories (+ G, parameter = 1.6114)) for the Yellow proteins tree. For the SP15 family of proteins, the model WAG with a discrete Gamma distribution was used to model evolutionary rate differences among sites (5 categories (+ G, parameter = 4.0108)). The rate variation model allowed for some sites to be evolutionarily invariable ([+ I], 2.61% sites). For the C-type lectin, the best model was LG + F. A discrete Gamma distribution was used to model evolutionary rate differences among sites (5 categories (+ G, parameter = 3.6037)). For Gaps/Missing data treatment, a partial deletion option was utilized. Finally, the reliability of the trees was tested, by bootstrap method (N = 1000).

Results and discussion
cDNA library of Nyssomyia neivai salivary gland. cDNA library was constructed from salivary glands of N. neivai females dissected up to 3 days after emergence. From this cDNA library 1,302,396 high quality reads were assembled in 1200 contigs (Table 1). Contigs were classified in five categories namely: secreted, housekeeping, transposable elements, viral, and unknown. Remarkably, the secreted proteins category comprised 92.4% of the number of reads, dispersed in 41.2% of the contigs. Most salivary transcriptomes from Phlebotomus and Lutzomyia genera were based in low output cDNA library sequencing; nevertheless, a high abundance of transcripts encoding secreted proteins were also reported [17][18][19][20]23,27,43,44 . Our data further validates the specialization of the salivary gland machinery and the specificity of the material obtained with the sand fly dissection. The housekeeping category had 33.3% of the clusters and only 3.2% of the total sequences. The category of ''unknowns" comprised 24.6% of the clusters and 4.3% of the sequences. Finally, Viral products and transposable elements included less than 1% of the families (0.3% and 0.8% respectively) and less than 0.1% of total sequences. Recently,  The evolutionary history was inferred by using the Maximum Likelihood method. The tree with the highest log likelihood is shown. The tree is drawn to scale, with branch lengths measured in number of substitutions per site. A discrete Gamma distribution was used to model evolutionary rate differences among sites [ Secreted proteins sequences. The putative secreted salivary proteins of N. neivai were classified into 35 main protein families ( Table 2). The most abundant transcripts were within the SP13-15 protein family (35.09%), followed by C-type lectins (15.9%), Maxadilan-like (15.6%), ML domain salivary proteins (5.8%), and the Yellow protein family (5.1%). Previously, the novel families 8-kDa, 6-kDa and 5-kDa that were only described in the N. intermedia sialome 27 , are now also present in N. neivai sialotranscriptome and grouped as Toxin-like peptides.     (Table 2). This salivary family is present among all species of sand flies studied so far. In N. neivai, we have categorized eight full-length members of this family (Table 3). For further analysis we considered the four most abundant members JAV08233.1, JAV08232.1, JAV08238.1 and JAV08231.1 with 91.37% of the SP13-15 family abundance as highlighted in Table 3.
The SP15 family was described for the first time in the sialome of Lu. longipalpis as the SL1 family 45 . This family was then also reported in the Old World sand fly Phlebotomus perniciosus (Newstead 1911), and later named as SP15 family due to 15-kDa salivary protein from Phlebotomus papatasi (Scopoli 1786) (PpSP15: AF335487) 19 . Thus far, SP15-like proteins have only been reported in sand flies and not in any other Dipteran; It has been suggested that SP15-like proteins were derived from an ancestral odorant-binding protein and were closely related to mosquitoes short D7 proteins 16,19 . Alvarenga et al. 46 , demonstrated that SP15 from Phlebotomus duboscqi (Neveu-Lemaire 1906) inhibit anionic surface-mediated reactions suggesting a role in anticoagulation, inhibiting the activation of FXII and FXI, and anti-inflammatory processes.
PpSP15-like proteins were reported as promising anti-Leishmania vaccine candidates. Immunization of mice with P. papatasi SP15 protein conferred partial protection against Leishmania (Leishmania) major (Yakimoff and Schokhor 1914) infection 43 ; furthermore, a DNA vaccine containing the PpSP15 cDNA provided the same protection 43 . ParSP03 (AAX56359), a PpSP15-like protein from Phlebotomus ariasi (Tonnoir 1921), elicited similar delayed type hypersensitivity and humoral immune responses upon DNA vaccination 20 . Recently, BALB/c mice immunized to PsSP19 (HM56964), a protein member of the SP15 family from Phlebotomus sergenti (Parrot 1917), acted as an adjuvant to accelerate the cell-mediated immune response to co-administered Leishmania antigens, providing protection against Leishmania (Leishmania) tropica (Wright 1903) infection 47 .
Phylogenetic analysis comparing selected sequences from sand fly salivary transcriptomes to the N. neivai SP15 family clustered these 4 abundant members closely (Fig. 1A, red asterisks) in a New World sand fly clade next to members of N. intermedia, B. olmeca, Lu. ayacuchensis and to the only SP-15 family protein described in Lu. longipalpis so far, SL1 28 (AAD32197.1) (Fig. 1A). The remaining clades represent Old world VL and TL vectors and the three members of the S. schwetzi SP-15 family. We then aligned N. neivai SP15 proteins to SP15 salivary proteins from other sand fly vectors present in Brazil, namely N. intermedia and Lu. longipalpis (Fig. 1B). N. neivai SP15 family proteins shared a relatively high percentage of identity (41.7 to 98.3%) to N. intermedia and at a lesser extent to Lu. longipalpis SL1 (43.1 to 57.1%) ( Table 4).
Interestingly, all the members identified, but JAV08113.1 (Fig. 2) share the RGD domain in the carboxy region that are common in members of the disintegrin family 48,49 . These RGD containing proteins had been previously observed in other New World sand flies such as Lu. ayacuchensis (LuayaRGD; BAM69127.1) and Lu. longipalpis (LuloRGD; AAD32196). The function and relevance of this family during blood feeding remains to be tested.
C-type lectin family. We have categorized twelve novel full-length transcripts as C-type lectins in the N. neivai sialome, as shown in Table 6. The C-type lectins are the third most abundant salivary family in N. neivai. Of those twelve, six members (JAV08563.1, JAV08583.1, JAV08561.1, JAV08565.1, JAV08584.1, JAV08562.1) corresponded to 89.5% of this family abundance and were further considered for in depth analysis (Table 6). In vertebrates, protein-carbohydrate interactions serve multiple functions in the immune system. C-type lectin family members are components of the innate immune response and work via pathogen neutralization through the activation of the complement pathway and adaptive immune response 50 . The C-type lectin putative domain may function as a Ca 2+ -dependent carbohydrate-binding pocket involved in extracellular matrix organization, pathogen recognition, and cell-to-cell interactions 50 . Homologous salivary proteins with molecular weight of 16.2-16.5 kDa have been identified in New World sand flies. Recently, homologues were also found in Old World sand flies by next generation sequencing of salivary glands from P. kandelakii 24 , as a partial protein, and Sergentomyia schwetzi 25 . The most abundant members of the N. neivai C-type lectins family seem to have a close relationship to N. intermedia homologues (Fig. 3A). Interestingly, JAV08583.1N. neivai protein segregated from the other C-type lectin N. neivai members in a subtree that also encompass members from Lu. ayacuchensis and Lu. longipalpis. The protein sequence alignment comparing Brazilian sand flies depicts a scenario of fast evolution of this family (Fig. 3B), indicated by the large ranges of amino acid identity scores (from 96.4 to 26.7%) across species in pairwise comparisons (Table 7). This may be associated with multiple events of gene duplication and high immune pressure from hosts. The exact role of these proteins in sand flies remains elusive.
Maxadilan-like family. Maxadilan (AAA29288.1) is a 7-kDa peptide present in the salivary gland of the sand fly Lu. longipalpis. Maxadilan was the first molecule to be identified in sand fly saliva 51 , and it is recognized for its powerful vasodilator effect. N. neivai maxadilan-like family corresponds of 11 full length abundantly expressed proteins (Table 8)  Comparative analyses of abundant transcripts from N. neivai maxadilan-like family were able to identify three homologues in N. intermedia (Fig. 4A). Phylogenetic topology shows that a main clade clustered the most abundant N. neivai members (JAV08475.1, JAV08473.1, JAV08474.1) with Linb-9 (AFP99245.1), while the other N. neivai members clustered with Linb-25 from N. intermedia. Maxadilan has its own branch and, JAV08462.1 from N. neivai was the closest relative to Maxadilan with 37.5% identity and 56.2% similarity suggesting that N. neivai JAV08462.1 (Fig. 4B) may have preserved its pharmacological properties 52 . N. neivai JAV08462.1 represents the sixth most abundant member with 6.44% of the maxadilan-like family abundance. Linb-147 (JK846521), a partial sequence from N. intermedia, showed a similar match to maxadilan, provided for only 34% identity and 70% similarity over a stretch of 50 amino acids. This sequence seems to be scarcely present in N. intermedia sialome, with only one transcript identified in its cDNA library, as compared to 30 transcripts of maxadilan present in Lu. longipalpis sialome 27,28 .
Inoculation of maxadilan in experimental animals exacerbates Leishmania infection to the same degree as the whole salivary gland homogenate 55 . This peptide can drive a Th1 response to Th2, up-regulate IL-10 and TGF-β production, and suppress IL-12p40, TNF-α, and NO production 56,57 .
In animal models, mice vaccinated with maxadilan become markedly protected against Leishmania infection, producing not only anti-maxadilan antibodies, but also immune CD4 + T cells specific to maxadilan generating IFN-γ and inducing NO production 55 . Notoriously, immunization against maxadilan also inhibits blood meal acquisition by sand flies, a promising target to block the vector reproductive process 58 . Nonetheless, maxadilan The evolutionary history was inferred by using the Maximum Likelihood method. The tree with the highest log likelihood is shown. A discrete Gamma distribution was used to model evolutionary rate differences among sites [5 categories (+ G, parameter = 3.6037)]. The tree is drawn to scale, with branch lengths measured in number of substitutions per site. All positions with less than 95% site coverage were eliminated. Evolutionary analyses were conducted in MEGA7. (B) Multiple alignments of C-type lectins from Brazilian sand flies using Muscle. Black shading represents identical amino acids, light gray shading represents similar amino acids. ML domain peptide family. We have categorized 10 full length abundant proteins as being part of the ML domain family that mapped to 5.8% of the N. neivai transcriptome (Table 9). We focused on the six most abundant members (JAV08576.1, JAV08588.1, JAV08582.1, JAV08586.1, JAV08585.1, JAV08572.1) for a comparative analysis representing 88.01% of this family representativity ( Table 9). The MD-2-related lipid-recognition (ML)      www.nature.com/scientificreports/ domain is implicated in lipid-mediated membrane binding mechanisms with a role in the execution and regulation of many cellular processes, including cell signaling and membrane trafficking 62 . The ML domain from the SMART database hints that this proteins may be involved in innate immunity or as an antagonist of lipid mediators of hemostasis and inflammation 63 . This family is relatively common in tick sialomes 63 but it had only been described in N. intermedia and B. olmeca sialomes so far 27,29 . N. neivai phylogenetic analysis indicates the presence of 3 subfamilies with the ML domain, with several likely events of gene duplication occurring in N. neivai (Fig. 5A). ML domain salivary proteins present in sand flies seems to be a very divergent family with few conserved amino acids across species (Fig. 5B); however, when comparing the clustered molecules within each of the clades we start to appreciate a higher degree of conservation (Fig. 5C), for example comparing the N. intermedia ML family sequence AFP99241.1 and B. olmeca (ANW11447) that share the same subtree with N. neivai JAV08582.1 in Fig. 5A, we observe 97.8% and 64.9% identity (Fig. 5C), respectively, likely hinting at possibly three independent proteins families within the ML domain.

S K I E F D K A L I Q L L P E T D A C G T ----F I K --C P L K P G T H T I K V P V T I R D K V EWG E F V Y V G L H I N G T S N D ---R L V C AY V G F YA N R -V I P L P H N A C K G ----F Y K E K C P L K A G T H E I T L P L L I K D -V K K G E R L H V G V G I G D Y G T N Q --L F A C AY F E L N A K E T I T I AA G D A C N L ----L T K S R C L I Q P G L H E V K L P L R V K N -V K K G E K L T F S I T I R S A K K E ---P LV C A AV E L T A K T K F S L P E D V C K N Q A L Y K L K E P C P L K P G S Y K I T F P L K I P N -I K L K E WV Y I G L G V F D D K A D T E Y P F V C T Y V E I F T R A I I A L P F G D A C N S ----V V K T K C P L K P G A H K I K L P L R V K D -V K R G E K L T V S V T I R D N K N K ---P I V C A AV E L T A K
Yellow protein family. N. Neivai salivary transcriptome encompasses eight members of the yellow salivary proteins corresponding to 5.1% of the total sialome. From these eight members three of them corresponds to 99.72% of the yellow family abundance (Table 10).
Yellow-related proteins are abundantly expressed in salivary glands of phlebotomies, mainly in Old World sand flies [17][18][19][20][23][24][25]27,43,44 . The Yellow family was the most abundant salivary protein family detected, also by next generation sequencing, on Phlebotomus kandelakii salivary glands (accounting to 31.7% of the mRNA on salivary glands) 24 contrasting with the limited presence of this family in N. neivai (5.1%). Phlebotomine yellow-related proteins are characterized by having the major royal jelly protein domain (MRJP). Originally, MRJP proteins were described from honeybee larval jelly, making up to 90% of the protein content 64 . Sequences related to MRJP proteins were described in Drosophila, where it is related to cuticle pigmentation and, when mutated, it produced a yellow phenotype and thus named Yellow proteins 65 . It was later found that in Diptera they had a dopachrome oxidase function 66,67 .
In bloodsucking Diptera, salivary yellow-related proteins have only been described in sand flies (all sand fly species studied to date) [17][18][19][20][23][24][25]27,43,44 , and Glossina morsitans morsitans (Westwood 1851) 68 . The proteins of this family are immunogenic and host antibody responses to this protein can be a potential marker for sand fly exposure in experimentally bitten mice and dogs, as well as naturally exposed dogs, humans, and foxes 3,69 . Lu. longipalpis proteins, LJM11, LJM111 and LJM17, act as high affinity binders of pro-inflammatory biogenic amines such as serotonin, catecholamines, and histamine, suggesting that the proteins play a role for the reduction of inflammation during sand fly blood-feeding 3,70 , this activity has also been confirmed in salivary yellows from Old World P. orientalis and P. perniciosus 71,72 .    www.nature.com/scientificreports/ A combination of recombinant LJM17 and LJM11 successfully substituted Lu. longipalpis whole salivary gland homogenate in probing sera of individuals for vector exposure 73 . Yellow proteins are also under consideration for anti-Leishmania vector-based vaccines. LJM17 from Lu. longipalpis elicited leishmanicidal Th1 cytokines in immunized dogs 74,75 , and LJM11 protected laboratory animals against L. (L.) infantum (Nicolle 1908), L. (L.) major, and L. (V.) braziliensis 70,76,77 .
In contrast, mice immunized with P. papatasi yellow-related proteins PpSP42 or PpSP44 (AAL11052 and AAL11051, respectively) elicited Th2 cytokines and exacerbated L. (L.) major infection 78 . Other yellow-related proteins from P. papatasi, specifically PPTSP44 (AGE83095.1), induced a strong Th1 response constituting a potential vaccine candidates against leishmaniasis 79 . It remains to be elucidated whether the protection induced by yellow-related proteins is related to particular protein immunogenicity, to sand fly species, or to the vector-Leishmania host combination, as all of these factors can contribute to vaccine efficacy. New approaches using novel vaccine techniques, consisting of a single dose of plasmid, followed by two doses of recombinant Canarypoxvirus expressing Lu. longipalpis yellow-related salivary proteins, are a promising strategy to control Leishmania infection 75 .
The mechanisms through which exposure to sand fly bites may induce an autoimmunity triggering production of IgG autoantibodies against Dsg1 in genetically susceptible individuals remains to be teased out. In fact, IgG antibodies against salivary homogenates from N. neivai correlated positively with IgG anti-Dsg1 in PF patients 10 . An antigenic cross-reactivity between salivary proteins and Dsg1 is the most plausible hypothesis for anti-Dsg1 autoantibodies production following sand fly bites. Nevertheless, both BLAST and PSIBLAST do not detect any highly significant homology between N. neivai, Lu. longipalpis or P. papatasi salivary proteins and Dsg1 14 . Other pathogenic mechanisms that could explain the loss of Dsg1 self-antigen tolerance induced by exposure to sand fly salivary proteins remains to be tested.
The presence of an abundant maxadilan-simile transcripts in N. neivai (JAV08462.1) sialome may explain the antibody reactivity to maxadilan in NSPS patients, where N. neivai but no Lu. longipalpis is present 9 . Further testing with a N. neivai recombinant maxadilan can help confirm this assumption. Moreover, testing the PF sera from NSPS against recombinant yellow proteins identified in N. neivai would also be desirable in our PF casuistic.
Considering that non-homologous salivary proteins from different sand fly species were associated with PF pathogenesis [10][11][12][13][14]81,82 , we may expect that not a single peptide may act as an independent antigen to trigger PF. More than one protein may be involved in the PF pathogenesis considering the shared pharmacological properties and conformational mimotopes of these proteins in distinct biting sand fly species.

Conclusion
Leishmaniasis is still a frequent and neglected disease in Brazil. Our results add valuable data related to New World Phlebotomine salivary proteins, expanding the findings reported in Lu. longipalpis and N. intermedia sialomes. The availability of the identity of the most abundant N. neivai salivary proteins of the three main species of sand flies widely distributed in Brazil will bring new insights into the host-vector-parasite relationship of L. (L.) infantum and L. (V.) braziliensis infections and may point to targets of interest for a vector-based vaccine. We hope the availability of this compilation of N. neivai salivary proteins by abundance can inform researchers on the selection N. neivai candidates for future experiments. Production of distinct abundant N. neivai recombinant proteins can be used to test individual candidates for the etiology of PF as the trigger of anti-Dsg1 autoantibodies, and also used as biomarkers of vector exposure translating into monitoring tools for vector intervention campaigns.

Data availability
All relevant data are within the paper.