Introduction

The interaction of SARS CoV-2 with the host immune system is largely determined by the structural similarities between viral and host proteins. Studies of SARS CoV-2 have so far focused mainly on the S protein1.

The extremely high contagiousness of the coronavirus SARS CoV-2 indicates that, during its evolution, the virus developed the ability to deceive the innate immune system. The simplest way to achieve this would be to incorporate into its membrane proteins that are structurally similar to those the immune system of the potential victim has learnt to ignore. The virus probably borrowed some n-mers from bats or other mammals. Any motif of any mammalian protein would have been suitable for borrowing, provided that the immune system regarded it as its own.

Knowledge of the homology between SARS CoV-2 and human proteins would help to understand the mechanisms of mimicry at the moment of infection. SARS CoV-2 proteins may imitate human proteins, mislead the immune system, and slow down its response.

However, mimicry is not the only process determined by the homology between the proteins of the virus and those of the host organism. After the inevitable destruction of the virus particle, the proteins or domains that were previously inside the virus come into contact with the immune system. Given sufficient structural similarity, part of the immune response will be directed against the proteins of the host organism, i.e., an autoimmune response will arise.

This study aimed to identify human proteins that share significant structural homology with SARS CoV-2 proteins. We hope this information will be useful to developers of vaccines against the coronavirus.

Joshua Lederberg2 believed that "microbes and their human hosts constitute a superorganism." Accordingly, we treated "human proteins" as the combination of the human proteome proper and the proteomes of the gut microbiota. We paid particular attention to proteins involved in three functions that are almost invariably affected in this disease: digestion, olfaction, and taste.

Methods

Using the open-access protein database UniProt and our original computer program Ouroboros3, we compared the SARS CoV-2 proteome4 with those of other organisms. We also searched a separate database of 75,777 human proteins5. The algorithm compares the primary sequences of SARS CoV-2 and human proteins, represented in the one-letter amino acid code. Proteins were compared by consecutively searching for fragments of one protein in the others, which is essentially the standard task of finding a substring in a string. This operation is implemented as a standard method in many programming languages, including Python, in which the main program was written. The URL of the source code is given in reference 3.
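The substring search described above can be illustrated with a minimal Python sketch. It is not the Ouroboros source code: the function name `shared_kmers` and the toy sequences are hypothetical, and proteins are assumed to be plain strings in the one-letter code.

```python
def shared_kmers(viral: str, human: str, k: int = 7) -> list[tuple[int, str]]:
    """Return (1-based start, k-mer) for every k-mer of `viral` found in `human`.

    Each window of the viral sequence is tested with Python's built-in
    substring operator (`in`), i.e. the plain "find a substring in a string" task.
    """
    hits = []
    for i in range(len(viral) - k + 1):
        kmer = viral[i:i + k]
        if kmer in human:               # exact substring match
            hits.append((i + 1, kmer))  # report 1-based positions, as in the text
    return hits


# Hypothetical toy sequences, for illustration only
viral = "MKTAYIAKQRQISFVKSHFSRQ"
human = "GGGAYIAKQRQLLL"
print(shared_kmers(viral, human, k=7))  # [(4, 'AYIAKQR'), (5, 'YIAKQRQ')]
```

With `k=8` the same toy pair yields the single octamer AYIAKQRQ at position 4. For a search over tens of thousands of proteins, indexing all human k-mers once in a dictionary would likely be faster than rescanning each protein per window; the plain `in` test is kept here because it mirrors the substring formulation in the text.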

When assessing the homology between viral and human proteins, we took into account the presence of common 7-/8-mers and especially their fusion into longer sequences. For example, two viral 7-mers, one homologous to human protein A and the other to protein B, can overlap (or abut) at their ends, forming regions 8 to 14 amino acid residues long.
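Fusing overlapping or abutting k-mer hits into longer regions is an interval-merging step. The Python sketch below is a hypothetical illustration of that step (the function names and the third hit are invented); positions are 1-based and inclusive, as in the text. It uses the ORF8 10-mer LVFLGIITTV4-13 discussed later, assuming the 10-mer is exactly the union of its two 7-mers, which then must occupy positions 4-10 and 7-13.

```python
def fuse_hits(hits: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Fuse k-mer hits given as 1-based inclusive (start, end) pairs.

    Hits are merged when they overlap or directly abut, so two 7-mers
    yield a fused region between 8 residues (overlap of 6) and
    14 residues (no overlap, end + 1 == next start).
    """
    fused: list[list[int]] = []
    for start, end in sorted(hits):
        if fused and start <= fused[-1][1] + 1:  # overlaps or touches previous region
            fused[-1][1] = max(fused[-1][1], end)
        else:
            fused.append([start, end])
    return [(s, e) for s, e in fused]


def region_length(region: tuple[int, int]) -> int:
    """Number of residues in a 1-based inclusive region."""
    return region[1] - region[0] + 1


# ORF8-style example: 7-mers at 4-10 and 7-13 fuse into the 10-mer 4-13.
# The hit at 30-36 is hypothetical and stays separate.
regions = fuse_hits([(7, 13), (4, 10), (30, 36)])
print(regions)                              # [(4, 13), (30, 36)]
print([region_length(r) for r in regions])  # [10, 7]
```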

Results and discussion

Structural proteins

Spike glycoprotein

S protein, 1273 aa
figure a

Hereinafter, regions homologous to human proteins are highlighted in red. Transmembrane tail TM1214-1237 is underlined.

In the S protein molecule, we localized more than two dozen 7-/8-mers homologous to human proteins (Table 1).

Table 1 Localization of homologous 7-/8-mers in the S protein and human proteins.

Fragments homologous to human proteins are scattered along the entire length of the S protein molecule, and some of them fuse into sequences of considerable length, namely the 10-mer SPRRARSVAS680-689, the 11-mer GLTVLPPLLTD857-867, and the two closely spaced 7-mers NASVVNI1173-1179 and EIDRLNE1182-1188. The octamer RRARSVAS682-689 is located at the junction of the S1 and S2 subunits. All these n-mers project outward from the virus particle and may be involved in mimicry.

SARS CoV-2 can cause smell and taste dysfunction, as well as muscle injury6.

The 8-mer DEDDSEPV1257-1264, located in the cytoplasmic tail, may be released during the destruction of the virus particle and become involved in shaping the immune response, directing part of it against the homologous 8-mer in human unconventional myosin-XVI1404-1421. The role of this mechanism in muscle dysfunction during coronavirus infection deserves a dedicated investigation.

The 8-mer RRARSVAS682-689 is homologous to the amiloride-sensitive sodium channel subunit alpha201-208, which is involved in salt taste perception7.

It is highly probable that the S protein is involved in mimicry. It may also play some part in provoking an autoimmune response.

We checked the homology of the S protein across 10 species, specifically primates, bats, and some other mammals. The results are presented in the table entitled "Similarity of SARS CoV-2 spike glycoprotein structure with some mammalian proteins" in the electronic attachment. Particular attention should probably be paid to the homologous regions common to SARS CoV-2, humans, and bats. The data obtained so far do not allow us to derive a more general rule.
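Finding regions shared by SARS CoV-2, humans, and bats amounts to intersecting k-mer sets. The following Python sketch is a hypothetical illustration (function names and toy sequences are invented, not taken from the electronic attachment): a viral k-mer is kept only if it occurs in at least one protein of every species considered.

```python
def kmer_set(seq: str, k: int) -> set[str]:
    """All distinct k-mers (substrings of length k) of a one-letter sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def common_kmers(virus: str, proteomes: dict[str, list[str]], k: int = 7) -> set[str]:
    """Viral k-mers that occur in at least one protein of *every* listed species."""
    shared = kmer_set(virus, k)
    for proteins in proteomes.values():
        # Union of k-mers over all proteins of this species
        species_kmers = set().union(*(kmer_set(p, k) for p in proteins))
        shared &= species_kmers  # keep only k-mers also present in this species
    return shared


# Hypothetical toy data, for illustration only
virus = "MKTAYIAKQRQISFVKSHFSRQ"
proteomes = {
    "human": ["GGGAYIAKQRQLLL", "PPISFVKSHFW"],
    "bat": ["AYIAKQRQTT"],
}
print(sorted(common_kmers(virus, proteomes)))  # ['AYIAKQR', 'YIAKQRQ']
```

In this toy case ISFVKSH and SFVKSHF are shared with the "human" set only, so the three-way intersection keeps just the two k-mers present in both hosts.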

Envelope small membrane protein

E protein, 75 aa (transmembrane domain8-38 is underlined)
figure b

In the E protein molecule, we localized seven 7-mers and one 8-mer homologous to human proteins (Table 2).

Table 2 Localization of homologous 7-/8-mers in the E protein and human proteins.

A fragment of the E protein transmembrane domain (E8-38) can be represented as follows:

figure c

The size of the letters (point size) corresponds to the frequency of the viral 7-/8-mers in the human proteome.
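The text does not state exactly how the frequency was counted. Assuming it is the number of human proteins that contain a given viral k-mer, it could be computed as in this hypothetical Python sketch (the function name and toy sequences are invented for illustration).

```python
from collections import Counter


def kmer_frequency(viral: str, human_proteins: list[str], k: int = 7) -> Counter:
    """For each k-mer of the viral sequence, count the human proteins containing it."""
    # dict.fromkeys keeps the k-mers unique while preserving their order in the sequence
    viral_kmers = dict.fromkeys(viral[i:i + k] for i in range(len(viral) - k + 1))
    counts: Counter = Counter()
    for kmer in viral_kmers:
        counts[kmer] = sum(kmer in protein for protein in human_proteins)
    return +counts  # unary plus drops k-mers with a zero count


# Hypothetical toy data, for illustration only
viral = "GKLVTRQEL"
human_proteins = ["MGKLVTRQEA", "PPKLVTRQELY", "KLVTRQEP", "NNNN"]
print(kmer_frequency(viral, human_proteins).most_common())
# [('KLVTRQE', 3), ('GKLVTRQ', 1), ('LVTRQEL', 1)]
```

Counts of this kind could then be mapped to letter size in a sequence plot; counting total occurrences instead of proteins would be an equally plausible reading.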

The transmembrane domain of protein E contains 7-/8-mers homologous to proteins of some gut bacteria and even of cereals, for example corn, sorghum, wheat, and barley (Table 3).

Table 3 Localization of some homologous 7-/8-mers in the E protein and the human gut proteome.

The targets of imitation may have been proteins synthesized by the macroorganism itself or by its normal gut microbiota.

All 7-/8-mers of protein E that are homologous to proteins of humans, gut bacteria, and cereals are located in the transmembrane domain of the E protein and together form the 28-mer E14-41. Assembling these 28 amino acid residues in a row by random selection would require an astronomical number of iterations: 20^28 ≈ 2.7 ∙ 10^36.
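The estimate 20^28 ≈ 2.7 ∙ 10^36 is the number of distinct 28-residue sequences over the 20 standard amino acids. The short Python check below reproduces it under the simplifying assumption that every residue is equally likely and independent at each position, which real proteins do not satisfy; it is an order-of-magnitude illustration only.

```python
# Size of the sequence space for a 28-residue peptide: each position can hold
# any of the 20 standard amino acids (assumed equally likely and independent).
n_positions = 28
n_amino_acids = 20
n_sequences = n_amino_acids ** n_positions

print(f"{n_sequences:.1e}")  # 2.7e+36

# Chance that one uniformly random 28-mer equals a given sequence such as E14-41
p_single_draw = 1 / n_sequences
print(f"{p_single_draw:.1e}")  # 3.7e-37
```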

Involvement of the E protein in mimicry is unlikely, since its homologous regions lie within the membrane, but its involvement in provoking an autoimmune response (after the destruction of the virus particle) seems very likely.

The viral envelope (E) protein has commonly been used as a major target in the development of vaccines, specifically against HIV-19, Dengue virus10, hepatitis B virus11, SARS CoV-212, and many other viruses. Deletion of the SARS-CoV E protein reduces pathogenicity and mortality in laboratory animals13. Specific features critical for virulence have been identified in the transmembrane domain of the SARS-CoV E protein14.

Membrane protein

Membrane protein, 222 aa
figure d

In the M protein molecule, we localized six 7-mers homologous to human proteins (Table 4).

Table 4 Localization of homologous 7-mers in the M protein and human proteins.

The N-terminal fragment of the M protein (M1-19) can be represented as follows:

figure e

In protein M, four 7-mers homologous to human proteins are fused into the 10-mer VEELKKLLEQ10-19, whose hydrophilic composition indicates possible contact with the external environment, i.e., with the host's immune system, and hence possible involvement in mimicry.
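The text does not specify how hydrophilicity was judged. As one hypothetical way to quantify it, the Python sketch below computes the grand average of hydropathy (GRAVY) on the Kyte-Doolittle scale, where a negative mean indicates an overall hydrophilic stretch; the choice of scale and the function name are assumptions for illustration.

```python
# Kyte-Doolittle hydropathy values; positive = hydrophobic, negative = hydrophilic
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    return sum(KD[aa] for aa in seq) / len(seq)


print(round(gravy("VEELKKLLEQ"), 2))  # -0.62
```

For VEELKKLLEQ the mean is about -0.62, i.e., mildly hydrophilic overall, even though the stretch still contains three leucines and a valine.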

Outside of the 10-mer, we found only two homologous 7-mers. It is unlikely that the M protein is involved in provoking an autoimmune response (after the destruction of the virus particle).

Nucleoprotein

Nucleoprotein, 419 aa
figure f

In the N protein molecule, we localized eleven 7-mers homologous to human proteins (Table 5).

Table 5 Localization of homologous 7-mers in the N protein and human proteins.

The N protein is located entirely inside the virus particle and cannot be involved in mimicry. All heptamers homologous to human proteins form several rather long fragments, including the 13-mer SKQLQQSMSSADS404-416 and the 10-mer AEGSRGGSQA173-182, which increases the likelihood that the protein is involved in provoking an autoimmune response.

Nonstructural proteins

All nonstructural proteins of SARS CoV-2 are located entirely inside the virus particle and therefore cannot be involved in mimicry. It remains to consider whether they may be implicated in provoking an autoimmune process.

ORF3a protein

ORF3a protein, 275 aa
figure g

In the ORF3a protein molecule, we localized five 7-mers homologous to human proteins (Table 6).

Table 6 Localization of homologous 7-mers in the ORF3a protein and human proteins.

The 7-mers are scattered along the entire length of the molecule and do not fuse into longer n-mers. ORF3a therefore does not appear to be involved in provoking an autoimmune response.

ORF7a protein

ORF7a protein, 121 aa
figure h

In the ORF7a protein molecule, we found two 7-mers homologous to human proteins and located in close proximity to each other (Table 7).

Table 7 Localization of homologous 7-mers in the ORF7a protein and human proteins.

It is possible that ORF7a is involved in provoking an autoimmune response.

ORF7b protein

ORF7b protein, 43 aa
figure i

In this polypeptide, we found only one 7-mer homologous to a human protein (Table 8).

Table 8 Localization of the homologous 7-mer in ORF7b and a human protein.

ORF7b may be involved in provoking an autoimmune response, contributing to olfactory dysfunction.

ORF8 protein

ORF8 protein, 121 aa
figure j

The primary structure of SARS-CoV-2 ORF8 is close to that of bat RaTG13-CoV15. In this polypeptide, there are three 7-mers homologous to human proteins (Table 9).

Table 9 Localization of homologous 7-mers in the ORF8 protein and human proteins.

Owing to the fusion of two 7-mers into the 10-mer LVFLGIITTV4-13, the ORF8 protein may be involved in provoking an autoimmune response.

ORF9b protein

ORF9b protein, 97 aa
figure k

In the ORF9b protein molecule, we localized six 7-/8-mers homologous to human proteins (Table 10).

Table 10 Localization of some homologous 7-/8-mers in the ORF9b protein and human proteins.

Some of these 7-/8-mers merge into the longer n-mers TEELPDEFVV84-93 and LGSPLSLN48-55.

The octamer ELPDEFVV86-93 is homologous to Maestro heat-like repeat-containing protein family member 2B (Fig. 1), which may play a role in sperm capacitation16. Male reproductive dysfunction has been proposed as a likely consequence of COVID-1917.

Figure 1

The SARS CoV-2 S, E, and ORF9b protein molecules contain hepta/octamers homologous to human proteins and to proteins of some nutrients and intestinal commensal bacteria.

After the destruction of the virus particle, ORF9b can take part in provoking an autoimmune response.

Replicase polyprotein RPP 1a

Replicase polyprotein RPP 1a, 4405 aa
figure l

The longest n-mers are underlined.

In the RPP 1a molecule, we localized eleven 8-mers (Table 11) and more than a hundred 7-mers homologous to human proteins.

Table 11 Localization of homologous 8-mers in RPP 1a and human proteins.

Some of the 8-mers occur in more than one human protein, and some fuse into long n-mers, for example EDIQLLKSAYENFNQH1126-1141, EVEKGVLPQLEQPY55-68, and SVEEVLSEARQHL34-46.

In the RPP 1a molecule, the 7-mers SCGNFKV505-511 and AIFYLIT2785-2791 are homologous to the human olfactory receptors 52N2 (residues 190-196) and 2W1 (residues 32-38), respectively. The heptamer LKTLLSL1556-1562 is homologous to the human bitter taste receptor T2R55 (residues 181-187) (Fig. 2).

Figure 2

Some SARS CoV-2 hepta/octamers are homologous to human olfactory and taste receptor proteins. Homology to some proteins of commensal gut bacteria is also shown.

Replicase polyprotein RPP 1ab

This very large molecule (7096 aa; for its primary structure, see reference 18) contains 210 hepta- and octamers homologous to human proteins. Some of them fuse into long n-mers (more than 15 aa).

Whether the replicases are involved in provoking an autoimmune response is debatable. Enzymes in general, and cell cycle enzymes in particular, are highly conserved in evolution. Fragments homologous to human proteins must be released in huge quantities into the gut lumen whenever a microorganism dies and decays there. It is possible that the interaction of replicases with the host's immune system follows rules different from those that apply to shorter proteins.

ORF6, ORF10, and ORF14

In these polypeptides (61, 38, and 73 aa, respectively), we did not find any 7-/8-mers homologous to human proteins.

When assessing the role of SARS CoV-2 proteins in mimicry and in provoking an autoimmune response in humans, we considered the following parameters: (i) the number of homologous n-mers; (ii) the compactness of their arrangement in the SARS CoV-2 protein molecules; (iii) their localization within the domains (external, transmembrane, internal) of the SARS CoV-2 proteins; and (iv) the physiological functions in which the homologous human proteins are involved (Table 12).

Table 12 Qualitative assessment of the possible involvement of SARS CoV-2 proteins in mimicry and in provoking an autoimmune response.

Conclusions

Analysis of the homology between SARS CoV-2 and human proteins led us to the following conclusions. Some SARS CoV-2 proteins may be implicated (i) in mimicry, which can delay the innate immune response to the invasion of virus particles into the macroorganism, and (ii) in provoking an autoimmune process, which directs part of the immune response against the macroorganism's own proteins after the destruction of virus particles. Mimicry is probably more characteristic of the spike (S) protein, whereas provoking an autoimmune response seems to be a distinctive feature of the envelope (E) protein. The ORF7b protein may be involved in the impairment of olfactory receptors, and the S protein in taste dysfunction.

Drugs aimed at destroying or blocking these and similar regions in the proteins of SARS CoV-2 and other viruses could help the human immune system resist viral deception and destroy the invader shortly after it penetrates the macroorganism. Destroying or blocking such regions could also weaken the autoimmune response. It should be borne in mind, however, that drugs targeting such imitating regions may also damage the native proteins of the human body.