Homology between SARS CoV-2 and human proteins

The extremely high contagiousness of SARS CoV-2 indicates that the virus has developed the ability to deceive the innate immune system. The virus could have included in its outer protein domains some motifs that are structurally similar to those that the potential victim's immune system has learned to ignore. The similarity of the primary structures of the viral and human proteins can provoke an autoimmune process. Using the open-access protein database UniProt, we have compared the SARS CoV-2 proteome with those of other organisms. In the SARS CoV-2 spike (S) protein molecule, we have localized more than two dozen hepta- and octamers homologous to human proteins. They are scattered along the entire length of the S protein molecule, and some of them fuse into sequences of considerable length. Except for one, all these n-mers project from the virus particle and therefore can be involved in providing mimicry and misleading the immune system. All hepta- and octamers of the envelope (E) protein that are homologous to human proteins are located in the viral transmembrane domain and form a 28-mer protein E14-41 VNSVLLFLAFVVFLLVTLAILTALRLCA. The involvement of the E protein in provoking an autoimmune response (after the destruction of the virus particle) seems highly likely. Some SARS CoV-2 nonstructural proteins may also be involved in this process, namely ORF3a, ORF7a, ORF7b, ORF8, and ORF9b. It is possible that ORF7b is involved in the dysfunction of olfactory receptors, and the S protein in the dysfunction of taste perception.

The interaction of SARS CoV-2 with the host immune system is largely determined by the structural similarities between viral and host proteins. Studies of SARS CoV-2 are still focused on the S protein 1.
The extremely high contagiousness of the coronavirus SARS CoV-2 indicates that during its evolution the virus developed the ability to deceive the innate immune system. The simplest way to achieve this would be to incorporate into its membrane proteins that share structural similarity with those which the immune system of the potential victim has learnt to ignore. The virus probably borrowed some n-mers from bats or other mammals. Any motif of any mammalian protein was suitable for borrowing, provided that the immune system regarded it as its own.
The knowledge of the homology between the SARS CoV-2 and human proteins would help understand the mechanisms of mimicry at the moment of infection. The SARS CoV-2 proteins may simulate human proteins, mislead the immune system, and slow down its response.
However, mimicry is not the only process that is determined by the protein homology between the virus and host organism. After the inevitable destruction of the virus particle, the proteins or their domains, which were inside the virus until then, come into contact with the immune system. With some structural similarity, a part of the immune response will be directed against the proteins of the host organism, i.e., an autoimmune response will arise.
This study aimed to identify the human proteins which share a significant structural homology with the SARS CoV-2 proteins. We hope this information will be useful to the developers of vaccines against coronavirus. Joshua Lederberg 2 believed that "microbes and their human hosts constitute a superorganism." Accordingly, we treated the concept of "human proteins" as a combination of the human's own proteome and the proteomes of the gut microbiota. We have paid particular attention to the proteins that are involved in the three functions that are almost invariably affected in this disease, namely digestion, olfaction and taste.

Methods
Using the open-access protein database UniProt and our original computer program Ouroboros 3 , we compared the SARS CoV-2 proteome 4 with those of other organisms. We also searched a separate database of 75,777 human proteins 5 . The algorithm we used compares the primary sequences of SARS CoV-2 and human proteins, written in the one-letter amino acid code. We compared proteins by searching consecutively for regions of one protein in the others, which is essentially the standard task of finding a substring in a string. This algorithm is implemented in the standard methods of many programming languages, including Python, in which the main program was coded. The URL to the source code is provided above 3 . When assessing the homology between the viral and human proteins, we took into account the presence of common 7-/8-mers and especially their fusion into longer sequences. For example, two viral 7-mers, one of which is homologous to the human protein A and the other to the protein B, can "overlap" at the ends, forming regions of 8 to 14 amino acid residues in length.
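The substring search and the fusion of overlapping hits described above can be illustrated with a minimal Python sketch. It is not the Ouroboros source code: the function names `shared_kmers` and `merge_hits`, the toy sequences, and the choice to fuse hits that merely abut (which yields the upper bound of 14 residues for two 7-mers) are assumptions made for illustration only.

```python
def shared_kmers(viral, human_proteins, k=7):
    """Find every viral k-mer that occurs verbatim in at least one human protein.

    viral          -- viral protein sequence in one-letter code
    human_proteins -- dict mapping a protein name to its sequence
    Returns a list of (0-based start, k-mer, [matching protein names]).
    """
    hits = []
    for start in range(len(viral) - k + 1):
        kmer = viral[start:start + k]
        # Plain substring test, as in the "substring in a string" task.
        matches = [name for name, seq in human_proteins.items() if kmer in seq]
        if matches:
            hits.append((start, kmer, matches))
    return hits


def merge_hits(viral, hits, k=7):
    """Fuse overlapping or abutting hits into longer homologous regions.

    Hits must be ordered by start position (as returned by shared_kmers).
    Returns a list of (1-based start, 1-based end, region sequence).
    """
    regions = []  # each entry: [0-based start, exclusive end]
    for start, _, _ in hits:
        end = start + k
        if regions and start <= regions[-1][1]:
            # Assumption: hits that overlap or touch are fused.
            regions[-1][1] = max(regions[-1][1], end)
        else:
            regions.append([start, end])
    return [(s + 1, e, viral[s:e]) for s, e in regions]


# Toy data (hypothetical sequences, not real UniProt entries).
toy_viral = "MSPRRARSVASQ"
toy_human = {"A": "GGSPRRARSGG", "B": "TTRARSVASTT"}

toy_hits = shared_kmers(toy_viral, toy_human)
toy_regions = merge_hits(toy_viral, toy_hits)
print(toy_regions)
```

In this toy case, one 7-mer is shared with protein A and another with protein B; because they overlap in the viral sequence, they fuse into a single 10-residue region.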

Results and discussion
Structural proteins. Spike glycoprotein. S protein, 1273 aa.
Hereinafter, regions homologous to human proteins are highlighted in red. Transmembrane tail TM 1214-1237 is underlined.
In the S protein molecule, we localized more than two dozen 7-/8-mers homologous to human proteins (Table 1).
Fragments homologous to human proteins are scattered along the entire length of the S protein molecule, and some of them fuse into sequences of considerable length, namely the 10-mer SPRRARSVAS 680-689 , the 11-mer GLTVLPPLLTD 857-867 and two closely spaced 7-mers, NASVVNI 1173-1179 and EIDRLNE 1182-1188 . The octamer RRARSVAS 682-689 is located at the junction of the S1 and S2 subunits. All these n-mers project from the virus particle and may be involved in the effect of mimicry.
SARS CoV-2 can cause smell and taste dysfunction, as well as muscle injury 6 . The 8-mer DEDDSEPV 1257-1264 , located in the cytoplasmic tail, can be released during the destruction of the virus particle and get involved in orchestrating the immune system's response, directing a part of it to the homologous 8-mer in human unconventional myosin-XVI 1404-1421 . The role of this mechanism in muscle dysfunction in coronavirus infection deserves a special investigation.
The 8-mer RRARSVAS 682-689 is homologous to the amiloride-sensitive sodium channel subunit alpha 201-208 , which is involved in salt taste perception 7 .
With a high degree of probability, it can be argued that the S protein is involved in the process of mimicry. It may also take some part in provoking an autoimmune response.
We have checked the S protein homology across 10 species, specifically primates, bats and some other mammals. The results are presented in the table entitled "Similarity of SARS CoV-2 spike glycoprotein structure with some mammalian proteins" in the electronic attachment. Attention should probably be paid to the homologous regions common to SARS CoV-2, humans, and bats. The data presented so far do not allow us to derive a more general rule.
Envelope small membrane protein. E protein, 75 aa (transmembrane domain is underlined).
In the E protein molecule, we localized seven 7-mers and one 8-mer homologous to human proteins (Table 2).
A fragment of the E 8-38 protein transmembrane domain can be represented as follows: The size of the letters (point size) corresponds to the frequency of the viral 7-/8-mers in the human proteome.
The protein E transmembrane domain contains 7-/8-mers homologous to the proteins of some gut bacteria and even cereals, for example, corn, sorghum, wheat, and barley. The simulation targets may have been the proteins synthesized by the macroorganism itself or by its normal gut microbiota. All protein E 7-/8-mers homologous to proteins of humans, gut bacteria and cereals are located in the transmembrane domain of the virus and form the 28-mer VNSVLLFLAFVVFLLVTLAILTALRLCA 14-41 . A random selection of 28 amino acid residues in a row would require an astronomical number of iterations: $20^{28} \approx 2.7 \times 10^{36}$. The involvement of the E protein in mimicry is hardly possible, but its implication in provoking an autoimmune response (after the destruction of the virus particle) seems very likely.
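The size of the sequence space quoted for a 28-residue stretch can be checked with a few lines of Python. This is only an arithmetic sketch: it assumes 20 standard amino acids drawn independently and with equal probability at each position, and the function name `sequence_space` is hypothetical.

```python
def sequence_space(length, alphabet_size=20):
    """Number of distinct sequences of the given length over the alphabet.

    Assumes every position is chosen independently from alphabet_size
    equally likely residues (a simplification, not a biological model).
    """
    return alphabet_size ** length


n_28mers = sequence_space(28)
print(f"{n_28mers:.1e}")  # 2.7e+36
```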
As a major target, the viral E protein has usually been used for the development of vaccines, specifically against HIV-1 9 , Dengue virus 10 , hepatitis B virus 11 , SARS CoV-2 12 and many other viruses. A deletion of the SARS-CoV E protein reduces pathogenicity and mortality in laboratory animals 13 . In the transmembrane domain of the SARS-CoV E protein, specific critical virulence-determining features have been identified 14 .
Membrane protein. M protein, 222 aa. In the M protein molecule, we localized six 7-mers homologous to human proteins (Table 4).
Outside of the 10-mer, we found only two homologous 7-mers. It is unlikely that the M protein is involved in provoking an autoimmune response (after the destruction of the virus particle).
The N protein is located completely inside the virus particle and cannot be involved in mimicry. All heptamers homologous to human proteins form several rather long fragments, including the 13-mer SKQLQQSMSSADS 404-416 and 10-mer AEGSRGGSQA 173-182 , which increases the likelihood of the protein involvement in provoking an autoimmune response.
Nonstructural proteins. All nonstructural proteins of SARS CoV-2 are located completely inside the virus particle and, by definition, cannot be involved in the process of mimicry. It remains to consider the possibility of their implication in provoking an autoimmune process. In the ORF3a protein molecule, we localized five 7-mers homologous to human proteins (Table 6). The 7-mers are scattered along the entire length of the molecule and do not form long n-mers anywhere. ORF3a does not appear to be involved in provoking an autoimmune response.

ORF7a protein. ORF7a 121 aa.
In the ORF7a protein molecule, we found two 7-mers homologous to human proteins and located in close proximity to each other (Table 7).
It is possible that ORF7a is involved in provoking an autoimmune response.
ORF7b protein. ORF7b protein, 43 aa. In this polypeptide, we found only one 7-mer homologous to the human protein (Table 8).
ORF7b may be involved in provoking an autoimmune response, contributing to olfactory dysfunction.
The primary structure of SARS-CoV-2 ORF8 is close to that of bat RaTG13-CoV 15 . In this polypeptide, there are three 7-mers homologous to human proteins (Table 9). Due to the fusion of two 7-mers into the 10-mer LVFLGIITTV 4-13 , the ORF8 protein can be involved in provoking an autoimmune response.
Some of the 8-mers are found in more than one human protein, some fold into long n-mers, for example EDIQLLKSAYENFNQH 1126-1141 , EVEKGVLPQLEQPY 55-68 and SVEEVLSEARQHL 34-46 .
Replicase polyprotein RPP 1ab. This huge molecule (7096 aa; for the primary structure, see 18 ) contains 210 hepta- and octamers homologous to human proteins. Some of them fold into long (more than 15 aa) n-mers.
The possibility of the involvement of replicases in provoking an autoimmune response is debatable. Enzymes in general, and cell cycle enzymes in particular, are evolutionarily highly conserved. Fragments homologous to human proteins must be released in huge quantities into the gut lumen during the decay of any microorganism that dies there. It is possible that the interaction of replicases with the host's immune system obeys laws different from those that apply to shorter proteins.
ORF6, ORF10, and ORF14. In these polypeptides (61, 38, and 73 aa, respectively), we did not find 7-/8-mers homologous to human proteins. When assessing the role of SARS CoV-2 proteins in mimicry and provoking an autoimmune response in humans, we considered the following parameters: (i) the number of homologous n-mers; (ii) the compactness of their arrangement in the SARS CoV-2 protein molecules; (iii) intradomain localization (external, transmembrane, internal) of the SARS CoV-2 proteins, and (iv) physiological functions that involve the homologous human proteins (Table 12).

Conclusions
Analysis of homology between the SARS CoV-2 and human proteins led us to the following conclusions. Some of the SARS CoV-2 proteins can be implicated in mimicry that can delay the response of innate immunity to the invasion of virus particles into a macroorganism, and in provoking an autoimmune process that directs a part of the immune response to the proteins of a macroorganism (after the destruction of virus particles). Mimicry is probably more characteristic of the spike (S) protein, and the provocation of an autoimmune response seems to be a distinctive feature of the envelope (E) protein. The ORF7b protein may be involved in the impairment of olfactory receptors, and the S protein may be involved in taste perception dysfunction. Drugs aimed at destroying or blocking these and similar regions in proteins of SARS CoV-2 and other viruses can enable the human immune system not to succumb to viral deception and to destroy the invader shortly after its penetration into a macroorganism. It should also be borne in mind that drugs affecting such imitation regions can damage native proteins present in the human body. Destroying or blocking such regions can weaken the autoimmune response. Homology to some proteins of commensal gut bacteria is also shown.
Table 12. Qualitative assessment of the possibility for the SARS CoV-2 proteins to be involved in the processes of mimicry and provoking an autoimmune response.