A possible role for autoimmunity through molecular mimicry in alphavirus mediated arthritis

Alphaviral infections are foremost in causing debilitating clinical outcomes in humans characterized by rheumatic arthritis like conditions. Though the presence of virus in joints and associated inflammation has been implicated as one of the reasons for the acute and chronic polyarthritis post alphaviral infections, the basis for rheumatic like outcomes is not clear. Through an in silico analysis, we have investigated the possibility of an autoimmune process mediated through molecular mimicry in alphaviral infection induced pathogenicity. Interestingly, sequence alignment of the structural polyproteins belonging to arthritogenic alphaviruses revealed conserved regions which share homology with human proteins implicated in rheumatoid arthritis (RA). These conserved regions were predicted to exhibit binding to HLA class II alleles, showcasing their potential to incite T cell help. Molecular docking of the viral peptide and the corresponding homologous region in the human protein onto HLA-DRB1 revealed strong similarities in their binding patterns. Linear and conformational B cell epitope prediction analyses showed that these potential mimics have high propensity to elicit an efficient B cell response. We thus propose that the origin of polyarthritis post-arthritogenic alphaviral infections may also be mediated through a hitherto unknown autoimmune response due to the presence of cross-reactive epitopes between viral and human proteins.

pulD from Klebsiella sp., nuclear antigen-1 from Epstein-Barr virus and OSP-A from Borrelia sp. may have a possible association with autoimmune diseases like ankylosing spondylitis, Systemic Lupus Erythematosus (SLE) and Lyme arthritis, respectively. A significant sequence similarity between P2-C protein of Coxsackie virus and glutamate decarboxylase of humans has been proposed as a trigger for Type 1 diabetes 18 . The similitude in clinical manifestation of an alphaviral infection induced arthralgia and rheumatoid arthritis at both phenotypic and molecular levels suggests that alphaviral infections could be a causative link for rheumatoid arthritis.
The overarching goal of this study is to explore whether alphaviruses carry peptides that could set off an autoimmune response resulting in rheumatoid arthritis-like symptoms in infected patients. Structural polyproteins of the alphaviruses were scanned for conserved stretches of amino acids that share homology with sequences in human proteins implicated in rheumatoid arthritis. These conserved regions were then evaluated in silico, at both the sequence and structural levels, for their immunogenic potential, that is, their ability to act as T and B cell epitopes. Overall, our results suggest that polyarthritis associated with alphaviral infections may also involve an autoimmune component arising from cross-reactivity between viral epitopes and host proteins.

Results
Arthritogenic alphaviruses such as Chikungunya virus (CHIKV), Ross River virus (RRV), Semliki Forest virus (SFV), Mayaro virus, O'nyong-nyong virus (ONV) and Barmah Forest virus (BFV) are generally associated with rheumatic diseases in humans, primarily characterized by polyarthralgia and polyarthritis. Although most patients recover within a few weeks, the clinical conditions can last from 6 months to more than 3 years 6. However, the molecular mechanisms contributing to this pathology are not well understood. In the present study, we show that an autoimmune process mediated through molecular mimicry is a possible cause of arthritis-like conditions following alphaviral infections. The overall workflow of the study is depicted in Fig. 1.
Multiple sequence alignment reveals conserved regions in the structural polyproteins of arthritogenic alphaviruses. Alphaviral infections share the common feature of inducing RA-like conditions, but what these viruses have in common that leads to such a rheumatic outcome is not known. We hypothesized that the commonality could lie in the structural polyprotein of these viruses. Therefore, we analyzed the structural polyproteins of six arthritogenic alphaviruses for conserved regions. As described in the Methods section, amino acid sequences of the structural polyproteins of CHIKV, RRV, SFV, ONV, BFV and Mayaro virus were retrieved from the UniProt database, and multiple sequence alignment was performed using MUSCLE. The conserved regions identified are shown in Supplementary Fig. 1. Their positions are depicted in Fig. 2, using the structural polyprotein of the Chikungunya virus as a representative model. The structural polyproteins of arthritogenic alphaviruses are about 71% identical at the primary (amino acid sequence) level. A total of 41 conserved regions were identified in the structural polyproteins, with the E1 glycoprotein showing the highest conservation (18 conserved regions), followed by the p62 and capsid proteins with 15 and 8 conserved regions, respectively.
Conserved regions across alphaviruses share homology with human proteins implicated in arthritis. To assess whether autoimmune reactions driven by molecular mimics could contribute to post-alphaviral polyarthralgia, we performed a sequence similarity search between the conserved regions of arthritogenic alphaviruses and the human proteome using the standard BLASTp program. Each of the 41 conserved regions was scanned against the human proteome.
Notably, 24 regions showed varying degrees of homology (52.7% to 100%) with human proteins that have been implicated in arthritis-like conditions in humans (Table 1).
Potential mimic regions harbor HLA binding motifs. Human leukocyte antigen (HLA) molecules are widely distributed and highly polymorphic, and are major governing factors in initiating cellular immune responses. It is thus important that immunogenic peptides bind to multiple HLA alleles to elicit a strong immune response. The 24 conserved regions of the structural polyprotein (refer Table 1) were analyzed for HLA class II binding ability using the ProPred II server. The nonameric peptides generated from each conserved region were scanned against the 51 HLA alleles available in the ProPred II server. The number of nonameric peptides generated from each of the 24 conserved regions and the number of peptides predicted to bind to different alleles are summarized in Table 2. Peptides generated from 19 of the conserved regions were predicted to bind at least one HLA allele, while peptides from the remaining 5 conserved regions were not predicted to bind any of the alleles.
As illustrated in Fig. 3, the allele recognizing the largest number of conserved regions was HLA DRB1_0308, which bound peptides from nine conserved regions. DRB1_0101 bound peptides from eight regions and DRB1_1321 from seven, while DRB1_0102, 0401, 0426, 1101, 1107 and 1120 each bound peptides from six conserved regions.
For an efficient autoimmune response to occur, the immune response elicited by the human mimic peptide should resemble the one elicited by the viral peptide; we therefore analyzed the structure of the HLA class II-peptide complex. Molecular docking analysis was performed to assess the structural similarity in the binding patterns of the viral peptide and its homologous region in the human protein. The ability of one of the potential conserved regions, FPLANVTCM (viral derived), and its corresponding homologous region, KPLANVTLM (human derived), to bind to a class II HLA allele was assessed by docking the peptides into the binding groove of HLA-DRB1, an allele known to predispose individuals to rheumatoid arthritis 19. The peptides were docked using the GalaxyPepDock server, and the docked peptides are shown in Fig. 4. The root mean square deviation between the docked peptides shown in Fig. 4 is 0.86 Å.
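The root mean square deviation quoted above measures how far apart matched atoms of the two docked peptides lie. The following is a minimal sketch of that calculation, not the procedure used in the study: it assumes both peptides are already in the same receptor frame (so no superposition is done), and the coordinates are hypothetical placeholders rather than values from the docked models.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally sized lists of (x, y, z) points.

    No superposition is performed: both peptides are assumed to share the same
    receptor frame, as they would after docking into the same HLA structure.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))

# Hypothetical C-alpha coordinates (in Å) for three residues of each peptide.
viral_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
human_ca = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]

print(f"RMSD = {rmsd(viral_ca, human_ca):.2f} Å")
```

In this toy case every atom pair is offset by exactly 1 Å, so the RMSD is 1 Å; a value below 1 Å, as reported for the two docked peptides, indicates that matched atoms are on average closer than that.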
The docked p-MHC complexes were analyzed for hydrogen bonding interactions between the peptides and HLA-DRB1. The binding interactions of the viral and human peptides with HLA-DRB1 were visualized using LigPlot+. Post-docking analysis revealed significant interactions of both peptides with HLA-DRB1. In the viral peptide-HLA-DRB1 complex, ten hydrogen bonds within a distance of 3.5 Å suggested that the peptide is stable in the binding groove. Similarly, eleven hydrogen bonds within a distance of 3.5 Å were formed between the human peptide and HLA-DRB1. Furthermore, Fig. 5 shows that the amino acids in the conserved regions of both peptides form identical hydrogen bonds with amino acids (Ser51, Asn260, Lys249, Trp235, Asn67, Tyr208) present in the binding pocket of HLA-DRB1.
Potential mimics could also act as B cell epitopes. To further strengthen the case for alphavirus-induced molecular mimicry, we analyzed the 24 conserved regions for the presence of both linear and structural B cell epitopes, as a measure of their ability to stimulate B cells and generate an autoimmune reaction against the homologous region in the human protein, leading to polyarthralgia. Linear B cell epitope prediction was performed using three different prediction algorithms, namely Bepipred linear epitope prediction, Emini surface accessibility, and Kolaskar and Tongaonkar antigenicity (Table 3).
Of the 24 conserved regions analyzed, 5 were predicted to be plausible B cell epitopes by all three algorithms, 6 by two of the three algorithms, and 12 by only one of the algorithms. Only one of the 24 conserved regions was not predicted to be a B cell epitope by any of the three algorithms.
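The breakdown above groups regions by how many of the three algorithms flagged them. A minimal sketch of such a tally is given below; the region labels and the true/false calls are hypothetical placeholders and do not reproduce Table 3.

```python
from collections import Counter

def tally_votes(predictions):
    """Map the number of positive algorithms to the number of regions.

    `predictions` maps a region label to a dict of {algorithm name: bool}.
    """
    return Counter(sum(calls.values()) for calls in predictions.values())

# Hypothetical calls for four placeholder regions (not the paper's Table 3).
example = {
    "region_1": {"Bepipred": True, "ESA": True, "KT": True},
    "region_2": {"Bepipred": True, "ESA": False, "KT": True},
    "region_3": {"Bepipred": False, "ESA": False, "KT": True},
    "region_4": {"Bepipred": False, "ESA": False, "KT": False},
}
votes = tally_votes(example)
for n in (3, 2, 1, 0):
    print(f"predicted by {n} algorithm(s): {votes[n]} region(s)")

# The reported breakdown (5, 6, 12 and 1 regions) accounts for all 24 regions.
assert 5 + 6 + 12 + 1 == 24
```

Counting by the exact number of agreeing algorithms keeps the categories mutually exclusive, which is why the four reported counts sum to the 24 regions analyzed.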
Interestingly, 9 of the 24 conserved regions shared homology with human proteins against which antibodies have been experimentally demonstrated in patients diagnosed with rheumatoid arthritis (highlighted in bold in Table 1).
www.nature.com/scientificreports
To further validate whether the potential mimics could also act as plausible B cell epitopes at a conformational level, the CHIKV structural polyprotein was used as a model to locate conformational B cell epitopes using the ElliPro server from IEDB. As shown in Fig. 6, parts of the peptides MKSDASKFTHEKPEGYYNWHHGAVQ and YSGGRFTIPTGAGKPGDSGRPIFDN (labelled A in Fig. 6), present on the capsid protein, and parts of the peptides RKGKIHIPFPLANVTCMVPKA (labelled B), PTVTYGK (labelled C) and EGLEVTWGNN (labelled D), present on the p62 protein, are exposed at the surface and predicted to be potential conformational B cell epitopes.

Discussion
Viral infection as one of the etiological agents of autoimmune diseases has been discussed for a long time. Several mechanisms have been proposed to explain this phenomenon, including molecular mimicry, bystander activation and viral persistence 20, acting individually or in combination to account for the immunopathology observed at the site of infection and/or at sites of autoimmune disease; normally, the infection precedes the occurrence of inflammation in the target organ 21. The immune response triggered by infection is critical for viral clearance. However, in some instances, immune regulatory mechanisms may go awry, leading to a breakdown of self-tolerance and an immune-mediated attack directed against both viral and self-antigens, as happens in the case of molecular mimicry or shared homology between viral and host epitopes 22,23.
Human infections caused by the alphavirus group share a clinical picture of arthralgia and chronic arthritis that closely resembles rheumatoid arthritis 7,24,25. Further, more severe disease and delayed recovery from alphaviral disease have been reported in patients with pre-existing arthritic conditions 26. Though the cellular components and inflammatory scenario involved in such conditions have been examined 27, knowledge about the actual cause of such persistent symptoms, characterized by articular disease and myalgia, remains sparse. One speculation regarding the basis for these rheumatic-like manifestations is the induction of autoimmunity as a side effect of adaptive immune responses, caused by cross-reactivity between viral and host antigens 24,28; however, there are no systematic studies testing this hypothesis.
In the present work, through an in silico analysis, we have identified conserved regions in the structural polyprotein of alphaviruses. The high sequence identity (71%) suggests that the origin of the common clinical features may lie in the amino acid sequence of the structural polyprotein of these viruses. On comparing these conserved regions with the human proteome, we identified human proteins that share homologous stretches with the structural polyprotein. Notably, autoantibodies specific to some of these proteins have been reported in rheumatoid arthritis (Table 1): serum and synovial fluid from RA patients have been shown to contain autoantibodies specific to some of the proteins identified in our analysis. These proteins are highlighted in bold in Table 1.
Table 2. HLA binding profiles of potential mimics from the alphaviral structural proteins. A total of 51 HLA class II alleles were used for the study.
The peptide SKDVYANTQLVLQRPAAGTVHVPYS has been experimentally tested in mice for induction of polyarthralgia. Mice injected with this peptide recapitulated the pathology seen in chikungunya infection 29. These reports lend further credibility to our findings and strengthen our current premise.
Our results further show that these regions are immunodominant, underscoring their potential to induce an autoimmune reaction. Based on the amino acid sequence analysis, nineteen of the twenty-four conserved regions (Table 2) were predicted to bind HLA class II alleles, suggesting that they could recruit T helper cell responses. Three conserved regions were predicted to be potential B cell epitopes by all the algorithms tested. Since the majority of antigenic epitopes have been shown to have high surface accessibility, we hypothesized that identical amino acids would give rise to similar structural features too. Therefore, we validated the antigenicity of chosen conserved regions at the structural level. Docking the conserved region (FPLANVTCM) from the structural polyprotein and its homologous sequence in the human proteome revealed similar interactions with HLA-DRB1 (Fig. 5). The root mean square deviation between the two docked peptides is 0.86 Å, which suggests that they bind to class II MHC in a similar orientation (Fig. 4). Conformational B cell epitope prediction based on the structure of the CHIKV structural polyprotein concurred with some of the conserved regions we identified. Thus, analysis at both the amino acid and structural levels supported the antigenic potential of conserved regions in the structural polyprotein. In addition, given the presence of significant conserved regions in the alphaviral proteins, we surmise that antibodies for differential diagnosis of alphavirus infections should be directed towards the non-conserved regions of the structural proteins for higher specificity. In summary (Fig. 7), our results computationally tease out a possible mechanism, mediated through molecular mimicry, leading to the development of autoimmunity during alphavirus infections, which could culminate in arthritis-like conditions in infected individuals.

Methods
Viral protein sequences. We carried out the bioinformatic analysis using protein sequences available for the clinical isolates of alphaviruses known to cause polyarthralgia [30][31][32][33] and used as prototype strains in the literature 29. Structural polyproteins of the six arthritogenic alphaviruses: Chikungunya virus (strain S2-African

Table 3. Analysis of conserved regions of alphaviruses sharing homology with human peptides (refer Table 1) for their ability to act as B cell epitopes. Columns: Conserved region, Bepipred, ESA, KT. The algorithms used for prediction are Bepipred, Emini Surface Accessibility Area (ESA), and Kolaskar and Tongaonkar (KT). A tick (✓) indicates a predicted potential B cell epitope and a cross (×) indicates that the region is not a potential B cell epitope.
Multiple sequence alignment. Conserved regions within the structural polyproteins were identified by multiple sequence alignment using the MUSCLE alignment algorithm 35 in MEGA 7.0 36. A stretch of amino acids was scored as a conserved region if it was present in at least four of the six alphaviruses.
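The "at least four of six" criterion can be illustrated with a short sketch. This is one possible reading of the rule, not the authors' procedure: a column counts as conserved when a single residue occurs in at least four sequences, and runs of such columns form a region. The minimum run length (`min_length`) and the toy alignment, including the residues flanking FPLANVTCM, are assumptions made purely for illustration.

```python
from collections import Counter

def conserved_stretches(aligned, min_shared=4, min_length=5):
    """Return (start, end, consensus) for runs of conserved alignment columns.

    A column counts as conserved when one residue (not a gap) occurs in at least
    `min_shared` sequences. `min_length` is an illustrative cutoff, not a value
    taken from the paper. Positions are 0-based and `end` is exclusive.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    stretches, start, consensus = [], None, []
    for col in range(length + 1):  # one extra step flushes a run at the end
        residue = None
        if col < length:
            counts = Counter(seq[col] for seq in aligned if seq[col] != "-")
            if counts:
                top, n = counts.most_common(1)[0]
                if n >= min_shared:
                    residue = top
        if residue is not None:
            if start is None:
                start, consensus = col, []
            consensus.append(residue)
        elif start is not None:
            if col - start >= min_length:
                stretches.append((start, col, "".join(consensus)))
            start = None
    return stretches

# Toy six-sequence alignment; "-" marks a gap. These are not real viral sequences.
toy_alignment = [
    "MAFPLANVTCMK",
    "MSFPLANVTCMR",
    "QAFPLANVTCMK",
    "MAFPLANVSCMT",
    "LTYPLANITCLE",
    "M-FPLANVTCM-",
]
print(conserved_stretches(toy_alignment))
```

In the toy alignment, the first column is conserved on its own but is too short to be reported, while columns 2 to 10 form a single nine-residue stretch with consensus FPLANVTCM.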
Homology search. The BLASTp program (https://blast.ncbi.nlm.nih.gov/Blast.cgi) 37 was used to search for sequence homology between the conserved regions identified in the alphaviruses and the human proteome. The search set was limited to Homo sapiens (taxid: 9606) in the UniProtKB/Swiss-Prot database. Default BLASTp algorithm parameters were used and the results were limited to the top 100 hits.
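As a simplified illustration of what a percentage homology figure expresses, the sketch below computes ungapped percent identity between two equal-length peptides. It is not a substitute for BLASTp, which performs local alignment with a substitution matrix and reports identity over the aligned segment; the function name is hypothetical. The two peptides are the viral and human nonamers discussed in the Results.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions with identical residues in two equal-length peptides."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("peptides must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)

viral = "FPLANVTCM"  # conserved region from the alphavirus structural polyprotein
human = "KPLANVTLM"  # homologous region in the human protein

# The peptides differ at positions 1 and 8, so 7 of 9 residues match.
print(f"{percent_identity(viral, human):.1f}% identity")
```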
The list of homologous proteins obtained from the BLAST search was manually curated using the Open Targets Platform server 38 to determine whether a given protein has been reported to be involved in the pathogenesis of rheumatoid arthritis. The Open Targets Platform server is a repository of human proteins and their involvement in diseases at various levels. Proteins bearing sequences homologous to conserved regions within the structural polyproteins were chosen for further investigation.
Prediction of potential T cell epitopes. The shortlisted conserved regions in the structural polyprotein were explored for their ability to act as T cell epitopes. Peptides from these regions were subjected to HLA class II binding analysis using the ProPred II analysis tool 39. The server uses quantitative matrices to predict the binding of nonamers to HLA class II molecules. Quantitative matrices were chosen for binding prediction because they provide a linear model and are easy to implement. Nonameric peptides generated from the conserved regions that were predicted to be potential binders of HLA class II alleles were selected for further analysis. A frequency distribution of the ability of these peptides to bind to multiple HLA class II alleles was generated.
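The idea of scanning nonamers against a quantitative matrix can be sketched as follows. This is an illustrative linear scoring model only: the matrix values, the threshold and the function names are hypothetical and are not ProPred's actual matrices or cutoffs. The input region is one of the conserved regions named in the Results.

```python
def nonamers(region, k=9):
    """All overlapping k-residue windows of a peptide region."""
    return [region[i:i + k] for i in range(len(region) - k + 1)]

def matrix_score(peptide, matrix):
    """Linear score: sum of per-position residue contributions (0 if unlisted)."""
    return sum(matrix[pos].get(res, 0.0) for pos, res in enumerate(peptide))

# Hypothetical quantitative matrix for a single allele: one dict per position.
toy_matrix = [dict() for _ in range(9)]
toy_matrix[0]["F"] = 1.0
toy_matrix[3]["A"] = 0.5
toy_matrix[5]["V"] = 0.8
toy_matrix[8]["M"] = 0.7
THRESHOLD = 2.5  # hypothetical cutoff separating predicted binders from non-binders

region = "RKGKIHIPFPLANVTCMVPKA"
windows = nonamers(region)
binders = [p for p in windows if matrix_score(p, toy_matrix) >= THRESHOLD]

print(f"{len(windows)} nonamers scanned")
print("predicted binders:", binders)
```

A 21-residue region yields 13 overlapping nonamers. Repeating the scan with one matrix per allele and counting, for each peptide, the alleles whose threshold it passes would give a frequency distribution of the kind described above.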
Peptide-MHC docking was performed using the GalaxyPepDock 40 server to evaluate the similarity in binding patterns between a conserved region in the alphavirus structural polyprotein and its corresponding homologous region in the human protein. GalaxyPepDock works by identifying templates from databases of experimentally resolved structures to predict the structure of the complex, followed by an energy-based optimization to provide structural flexibility. Default parameters available with the tool were used; the back-end data containing the parameters used for docking are not available. The crystal structure of HLA-DRB1 complexed with a Type II collagen peptide (PDB ID: 6BIN) was retrieved from the Protein Data Bank 41. The resident peptide in the crystal structure was removed using PyMol to make the binding groove of HLA-DRB1 available for docking the viral peptide and its human homologue.
Prediction of potential linear B cell epitopes. Shortlisted conserved regions from the structural polyprotein were explored for their ability to act as B cell epitopes. The full-length amino acid sequence of the structural polyprotein of Chikungunya virus (strain S27-African prototype) (UniProt ID: Q8JUX5) was subjected to linear B cell epitope prediction analysis available at the Immune Epitope Database (IEDB) 42. The analysis was performed using three different prediction algorithms, namely Bepipred, Emini surface accessibility, and Kolaskar and Tongaonkar, with default algorithm parameters.
Further, to validate whether the conserved regions act as potential B cell epitopes at the structural level, we employed the ElliPro server 43 to predict epitope regions on the structural polyprotein of the Chikungunya virus (PDB ID: 3J2W). ElliPro accepts a PDB structure as input and uses three different algorithms, based on the protrusion index of the residues, protein shape approximation and clustering of neighboring residues, to predict antibody epitopes. Default parameters of the ElliPro program were used for the prediction analysis.

Data availability
All the viral sequences used for analysis were retrieved from publicly available databases. Criteria used for analyzing and shortlisting peptide sequences are explained in the Methods section. Most of the software tools used for HLA class II-peptide binding prediction and B cell epitope prediction are also available in the public domain and have been referenced accordingly. Protein structures used are from the Protein Data Bank, and the relevant literature has been cited wherever appropriate. The other raw datasets generated and/or analyzed during the study are available from the corresponding author on reasonable request. We declare that we are willing to comply with the journal policy and will make any materials or data required for the review process, and thereafter, available.