Introduction

The spread of COVID-19, a disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has led to the current global pandemic with already millions of people with confirmed infection and more than one million deaths1. According to the World Health Organization (WHO), the mode of infection for COVID-19 is predominantly through respiratory droplets, aerosol transmission due to pathogen-laden viral particles in the air, or close contact with infected people with increased viral loads, especially in the early stages of disease2. The mechanism of human pathogenesis, to a great extent, may simulate that of SARS-CoV (associated with SARS) and Middle East respiratory syndrome coronaVirus (MERS-CoV; associated with MERS) viral infections, including the prolonged persistence of the virus worsening the host immune response3,4,5. Clinical manifestation of COVID-19 ranges from mild respiratory symptoms to severe disease and death6. However, there are now reports suggesting heterogeneous manifestation of the disease affecting multiple organs, including kidney, liver, and brain7. Although age and compromised health history are considered critical prognostic factors, certain patients of younger age and good health have shown severe progression of the disease8.

Patients with COVID-19 may be asymptomatic9,10, but can still transmit infection11. Viral shedding from an infected person may occur, although resolution of symptoms12, and relapse has been reported despite consecutive negative testing13. Currently, there are no approved therapies for COVID-19 respiratory symptoms14,15 or way to screen for disease progression, and system-wide changes in patients.

In this study, we sought to understand and infer changes in coronavirus-infected patients at system levels through their peripheral blood mononuclear cells (PBMC) by performing comparisons with healthy volunteers by applying inferential statistics (IS). The inferences at the multiple levels of the system in patients provide an efficient way of understanding the heterogeneity and mechanism(s) of disease manifestation as a whole (Fig. 1A). These inferences can be derived systematically in a hierarchical fashion from the level of gene signatures to the whole organism, to study the pathophysiology of patients with coronavirus infection. Moreover, the patients’ response to insults from the virus along with other associated disease conditions can be studied.

Fig. 1: System-level analysis and PBMC-based gene signatures show association with SARS and COVID-19.
figure 1

A Schematic showing the identification of PBMC RNA gene signatures associated with disease staging, and hierarchical modeling of genes, pathways, networks, subcellular contents, cells, and disease symptoms using COVID-engine platform. This figure was prepared using Servier Medical Art (https://smart.servier.com) under a Creative Commons Attribution 3.0 Unported License (http://creativecommons.org/licenses/by/3.0/). B Heatmap showing 290 gene signature genes in 10 SARS patients and 4 healthy individuals. Both CoV-Up-gene signature (169 genes) and CoV-Down-gene signature (121 genes) are shown. C, D. CoV-Up-gene signature scores (C) and CoV-Down-gene signature scores (D) and their association with SARS-CoV-2-infected (COVID-19; n = 3) patients and healthy individuals (n = 3; datasets from Xiong et al.17). For C and D statistical significance was not considered due to low sample size. E, F. CoV-Up-gene signature scores (E) and CoV-Down-gene signature scores (F) and their association with acute and recovering SARS-CoV-infected patients (n = 25; 44 samples), bacteria-infected patients (n = 16), and healthy individuals (n = 9; datasets from Lee et al.19). Kruskal–Wallis statistical test with p < 0.001 for E and F. G Acute vs. recovering gene signature from Lee et al.19 predicted acute-like and recovering-like SARS-CoV-infected patients (n = 10) from Raghunathan et al.16 using NTP method and cosine distance measure.

Results

PBMC-based gene expression profiles identify distinct gene signatures in coronavirus-infected patients and healthy volunteers

To perform an integrative and systematic analysis of heterogeneous patients’ responses to the coronavirus infection, we used a limited RNA transcriptome data from PBMC of SARS-CoV patients (n = 10) and healthy volunteers (n = 4) from a published study16 (training data). We identified PBMC genes differentially expressed between patients and healthy volunteers by applying our in-house developed IS pipeline (“COVID-engine”; see “Methods“). We broadly identified 290 differentially expressed genes in patients and healthy volunteers using a supervised Statistical Analysis of Microarrays (SAM) approach (Fig. 1B, Supplementary Fig. 1 and Supplementary Tables 1A, B). Among these 290 genes, 169 (dubbed as CoV-Up-gene signature) were highly expressed, and 121 highly reduced in patients compared to healthy volunteers (dubbed CoV-Down-gene signature; Fig. 1B and Supplementary Table 1B).

We further explored our gene signatures in SARS-CoV-2-infected patients using a very limited sample size of three COVID-19 patients’ and three healthy volunteers’ PBMC from another published study by Xiong et al.17. Irrespective of different diseases (although related) and platforms, we observed that our CoV-Up-gene signature from SARS-CoV was higher in COVID-19 patients’ PBMC than in the healthy volunteers (see “Methods”). In contrast, CoV-Down-gene signature was higher in healthy volunteers and lower in COVID-19 patients. This result suggests that our gene signatures from SARS-CoV may be applicable to COVID-19 patients (Fig. 1C, D and Supplementary Table 1C).

When our CoV-Up-gene signature was analyzed using PBMC samples18,19 (n = 213) from patients infected with bacteria and influenza, we observed a broadly similar pattern (Fig. 1E, Supplementary Fig. 2A and Supplementary Table 1D, E). The reciprocal analyses with CoV-Down-gene signature was higher in PBMC from healthy volunteers than from SARS-CoV and other microbe-infected patients (Fig. 1F, Supplementary Fig. 2B and Supplementary Table 1D, E). Again, we performed enrichment analysis (hypergeometric test) using MSigDB’s C7 immune signature20 and found that 43% (60 out of 138 genes) of the signatures that were derived from PBMC (mostly associated with specific diseases) were significantly (false discovery rate (FDR) < 0.2) enriched for our CoV-Up-gene signature (Supplementary Fig. 3). Overall, the results suggest that CoV-Up-gene signature represents primarily diseased PBMC.

PBMC gene signatures may distinguish disease progression—acute-like vs. recovering-like coronavirus-infected patients

Next, we sought to assess the potential of our CoV-Up-gene signature to stratify patients into those at different progression stages of the disease (progression): acute vs. recovering. For this, we used PBMC transcriptome data from a limited size of 44 samples from the longitudinal collection over the disease course from a published study (validation data; ref. 19). Lee et al. defined acute samples (n = 25) as those that tested positive (using blood) for SARS-CoV during hospitalization or within 10 days of onset of the disease in patients. The samples derived from acute patients were also correlated with disease severity including an increased clinical pulmonary infection score19,21. The remaining samples were labeled as recovering samples (n = 19). Hence, acute vs. recovering samples refer to different stages of the disease that can occur in the same patient as specific samples were collected from the same individual during hospitalization.

Interestingly, disease phase appeared to be associated with our identified CoV expression signatures in samples from Lee et al.19 with recovering patients showing an intermediate signature between acute phase and healthy donors (Fig. 1E). We then applied a gene signature (39 genes with known gene symbols) for distinguishing acute from recovering patients identified by Lee et al.19 back to our training data16 using NTP analysis (see “Methods”)22. This resulted in 7 of the 10 patient samples showing maximal similarity to acute phase (termed acute-like patients), while 3 samples scored similar to recovering patients (termed recovering-like patients; measured as signature-based cosine distance; see “Methods”; Fig. 1G and Supplementary Table 1F). Our results therefore suggest that CoV signatures may distinguish different disease stages.

Gene expression of key cytokines and other genes represented in PBMCs in acute-like vs. recovering-like coronavirus patients

We next analyzed the expression patterns of genes associated with coronavirus infection in PBMCs in predicted acute-like and recovering-like using SARS-CoV-infected patients and healthy volunteers (from training data). We compared the expression levels of cytokines namely IL6 and TNF and other genes such as ACE2 and lactate dehydrogenases in PBMC from infected patients and healthy individuals, which are known to be expressed in circulating monocytes and macrophages after viral infection23,24. Although increased expression level of IL6 in acute-like patients compared to healthy individuals was observed, there was no significant difference between acute-like and recovering-like patients (Fig. 2A). Interestingly, TNF was highly expressed in acute-like patients compared to recovering-like patients and healthy individuals (Fig. 2B). Besides, analysis of ACE2 and subunits of lactate dehydrogenase (LDHA and LDHB; associated with hypoxia) genes showed LDHB was highly expressed in healthy individuals compared to coronavirus-infected patients, and inverse trends were observed for ACE2 and LDHA (Fig. 2C, E). Among these five genes, TNF was the only gene that showed differential expression between acute-like and recovering-like patients. The results show increased expression of these key coronavirus infection associated genes in patient PBMC samples.

Fig. 2: Significantly represented key genes at different progression stages of coronavirus infection.
figure 2

A–E Expression levels of genes: IL6 (A), TNF (B), LDHB (C), ACE2 (D), and LDHA (E) in acute-like and recovering-like SARS patients and healthy individuals. Blue asterisk (*) represents Kruskal–Wallis nominal p value across all three groups and red asterisk (*) represent Welch two-sample t-test p value between patients and healthy individuals. *p < 0.05; **p < 0.001; ***p < 0.0001 and p < 0.08. Multiple testing was not done due to low sample size.

We further examined multiple other candidate genes that act as chemoattractants to monocytes and macrophages, specifically those that interfere with innate and adaptive immunity and viral replication24. Among those genes, we observed CXCL8 (IL8) and CCL13, which are associated with chemoattraction of neutrophils/macrophages (innate immunity), to be highly expressed in acute-like patients, compared to recovering-like patients and healthy individuals (Fig. 3A). On the other hand, OAS2 and IL16 associated with T cells (adaptive immunity) and inhibition of viral replication were highly expressed in recovering-like patients and healthy individuals (Fig. 3B). These results suggest that PBMC from potential acute-like patients may be associated with the activity of innate immunity, whereas PBMC from recovering-like patients may be associated with an adaptive immune profile.

Fig. 3: Highly represented signaling pathways/processes in coronavirus-infected patients.
figure 3

A, B Expression levels of genes related to immune cell chemoattractants CXCL8 and CCL13 (A), genes involved in T cells and suppression of viral replication OAS2 and IL16 (B) in acute-like and recovering-like coronavirus-infected patients and healthy individuals. Blue asterisk (*) represents Kruskal–Wallis nominal p value across all three groups and red asterisk (*) represent Welch two-sample t-test p value between patients and healthy individuals. *p < 0.05; **p < 0.001; ***p < 0.0001 and p < 0.08. Multiple testing was not done due to low sample size. C Enrichment statistical analysis (hypergeometric test using hypeR57) using CoV-Up-gene signature and pathways/processes based on gene sets from MSigDB’s HALLMARKS database20.

Enrichment of TNF-alpha, IL6, and hypoxia-related pathways in PBMC of coronavirus patients

Based on the expression levels of key genes, including TNF, we next set out to explore the functional implications of the CoV-Up-gene signature. To do so, we performed enrichment analysis using the genes in CoV-Up-gene signature and MSigDB’s hallmark gene set database20 (Fig. 3C). This revealed multiple highly ranked pathways involved in cytokine storm and acute infection including TNF signaling, IL6 (IL6-JAK-STAT3) and IL2 (IL2-STAT5) signaling, inflammatory response, and KRAS/MTOR and late responses to estrogen pathways (Fig. 3C and Supplementary Table 2A). This is consistent with clinical manifestations including observations of high IL6 levels in COVID-19 patients25.

Interestingly, outside of the inflammatory response, multiple pathways related to hypoxia, angiogenesis, and oxygen transport (heme/iron) were also implicated, consistent with the oxygen limitation26 experienced during coronavirus infection (Fig. 3C). Of particular interest was the enrichment of complement and coagulation pathways which may explain the high hypercoagulability observed in COVID-19 patients27 and may represent one of the key pathological mechanisms of the virus. Enrichment of the apoptotic pathway may have relevance to cell death of lymphocytes (Fig. 3C) as suggested elsewhere17. These enriched pathways and processes may be linked together and may represent an association with COVID-19 infection in these patients.

Potential role for a network of related pathways representing cytokine storm and innate immune changes in PBMC of coronavirus-infected patients

Given that different pathways were enriched in infected patients, next, we interrogated how these pathways are linked together to convey a network of processes or changes at the cellular level. Hence, we used the REACTOME pathway database28 to connect different but related pathways that were enriched in infected patients using network analysis. Two evident and distinct networks were significantly enriched in CoV-Up-gene signature: (a) interleukins and cytokine signaling (potentially representing cytokine storm) and (b) neutrophils and innate immunity (Fig. 4A and Supplementary Table 2B). Nevertheless, we observed an increased enrichment of a unique network linking granulopoiesis, megakaryocyte differentiation, and platelet activation (Fig. 4A). This may be linked to coagulation system that can activate the innate immune system (e.g. monocytes/macrophages) to produce TNF29. Nevertheless, this requires further understanding. These may suggest innate immune system activation with potential cytokine storm in coronavirus patients.

Fig. 4: Highly represented related networks of molecular, cellular, and development pathways show cytokine (storm) network and innate immunity in coronavirus-infected patients.
figure 4

A REACTOME28 database-based connection of different but related pathways that were enriched (hypergeometric test of FDR < 0.2 and overlap a similar index of 0.5 using hypeR57) in infected patients using network plots. B KEGG60 pathways-based network showing enrichment (hypergeometric test of FDR < 0.2 and overlap a similar index of 0.25 using hypeR57) of different diseases and infection related pathways. For A and B, nodes and edges are of same size and length, respectively. C Enrichment statistical analysis (hypergeometric test of FDR < 0.2 using hypeR57) using CoV-Up-gene signature and pathways/processes based on gene sets from COMPARTMENTS database61.

Kyoto Encyclopedia of Genes and Genomes (KEGG)-based extensive analysis of network of molecular pathways suggested further understanding of functions and utilities at the levels of cells and organisms. Our KEGG analysis with CoV-Up-gene signature overlapped with different infections, including Helicobacter pylori and leishmania (Fig. 4B and Supplementary Table 2C). This suggests that coronavirus infection may share disease conditions with these infections. Interestingly, we also noted enrichment of systemic lupus erythematosus (SLE)-related genes, which is a chronic disease associated with inflammation in connective tissues and affects multiple organs, including the blood-forming system30,31. This suggests that coronavirus infection and SLE may be related to each other with respect to disease symptoms. The prevalence and risk factors of severe COVID-19 in SLE patients remain unclear30,31. The understanding at the systemic level and additional clinical reports are needed to know whether certain COVID-19 disease conditions may be associated with SLE. A more relevant and well-known individual pathway associated with this disease is the renin–angiotensin system associated with ACE2 function (Fig. 4B). There is also an enrichment of RIG-I-like receptor signaling pathway representing potential anti-viral event through pathogen-associated molecular patterns32. Multiple chemokine/cytokine and metabolism pathways were enriched as a part of the CoV-Up-gene signature in the KEGG database (Fig. 4B).

Changes in subcellular regulatory networks associated with PBMC of coronavirus-infected patients

While these genes to network processes provide information related to coronavirus infection, we were interested in investigating the next level in the hierarchy and the potential subcellular interaction networks that may inform viral interaction within host cells. Interestingly, cell–cell adhesion processes, secretory granules, vesicles, and exosomes spanning plasma membrane and lipid complexes and cytoplasm were enriched in CoV-Up-gene signature (Fig. 4C), suggesting that this may indicate the viral infection of monocytes. Furthermore, these were related to the enrichment of fatty acid synthase complex that is known to be involved in the plasma membrane and vesicle formation33 (Fig. 4C and Supplementary Table 2D). The data also suggests potential interaction of the virus with host immune cells through cell–cell adhesion processes, which requires further understanding.

Nonetheless, the host-specific subcellular changes in PBMC are also evident from this analysis. An increased replication and proliferation of potential host cells, mainly involving the innate immune system, may be evident based on the enrichment of genes associated with DNA polymerase processivity factor and proliferating cell nuclear antigen complex. Also, the production of immunoglobin complexes along with NFkB complex was higher in the patient gene signature, again, representing potentially increased immune responses. At the same time, the host’s potential responses to death signals associated with BCL2 complex are also enriched in this analysis. Again, this potentially represents lymphocyte-related apoptosis in connection with an enriched apoptotic pathway in Fig. 3C. Neutrophil-specific S100A8/A9 complexes are also enriched (Fig. 4C). Overall, these results suggest potential PBMC-based subcellular level changes associated with the viral integration in immune cells and associated pathophysiology.

Recovery from coronavirus infection is potentially associated with increased cytolytic activity and IFN-γ but not increased B cell levels

In order to gain insight into the cellular dynamics of the SARS-CoV-2 immune response, we calculated immune signature scores based on the gene markers from Rooney et al.34 (see “Methods”). In this case, we separated the coronavirus patient samples into those with acute-like or recovering-like disease and compared these with healthy control samples. As expected, we observed that the innate immune system involving macrophages and neutrophils were highly active in the acute-like patients, suggesting that they may be the first to encounter the coronavirus, with these decreasing in recovering patients (Fig. 5A and Supplementary Table 2E).

Fig. 5: Immune cells, their activities and HLA types in progression stage-specific coronavirus-infected patients.
figure 5

A, B Immune cell scores using ssGSEA analysis and specific immune signature from Rooney et al. representing specific immune cells and their activities in acute-like (n = 7), and recovering-like (n = 3) SARS patients and control healthy individuals (n = 4) (A), and COVID-19 patients and control healthy individuals (B). FDR for A and B are <0.2. C HLA profiles in acute-like, and recovering-like SARS patients and control healthy individuals. Kruskal–Wallis statistical nominal p values for HLA-DQA1 and HLA-DOB are <0.1 and HLA-F is <0.05. Multiple testing correction was not performed due to small sample size for C (the intention of the study is to infer from publicly available limited data). D. Enrichment analysis (hypergeometric test using hypeR57) of CoV-Up-gene signature using MSigDB’s C7 immune gensets20.

Perhaps most interestingly, a significant (FDR < 0.2) increase in natural killer (NK) cells, cytolytic activity, and plasmacytoid dendritic cells (pDCs) in recovering-like patients compared to acute-like patients was observed (Fig. 5A). It is noteworthy that the absolute levels of CD8+ T cells and co-stimulating helper T cells are not different between recovering-like and acute-like patients (Fig. 5A). This result suggests that the CD8+ T cells are potentially activated (cytolytic) in the recovering patients. Certain results from SARS patients were assessed using PBMC from limited number of COVID-19 samples (n = 3; FDR < 0.2; Fig. 5B and Supplementary Table 2F). No difference in B cells between both types of patients and healthy individuals was found. Intriguingly, higher level of expression of interferon (IFN)-γ type-II gene was found in the recovering-like patients and lower levels in both acute-like patients and healthy volunteers (Fig. 5A). These results suggest that T cell responses may be pivotal in successful response to coronavirus infection, consistent with the recent study by Grifoni et al.35 which found SARS-CoV-2 reactive T cells in 70% of convalescent COVID-19 patients.

Next, we examined the differential expression of major histocompatibility complex (MHC) class-I and class-II HLA that may reflect antigen presentation to and/or activation of CD4+/CD8+ T cells, and whose levels are increased by IFN-γ (Fig. 5C). Among the MHC class-II HLA genes, two of them were lower in acute-like patients compared to recovering-like and/or healthy individuals (borderline significance with nominal p < 0.1 due to low sample size). While most of the MHC class-II HLA genes showed no difference in expression levels between the acute-like and recovering-like patients, HLA-DQA1 showed an increasing trend in the recovering-like patients towards the healthy individuals and low level in acute-like patients. Similarly, MHC class-I HLA gene, HLA-F showed a trend akin to HLA-DQA1 (Fig. 5C and p < 0.05). On the other hand, there was no change in the antigen processing machinery (data not shown). However, all these speculations from limited data warrant further validation, and their functional immunological significance is currently unclear.

To confirm these analyses of immunologic composition, we performed two additional independent analyses. Hypergeometric test-based enrichment analysis of MSigDB’s C7 immune signature20 showed similar conclusions that innate immunity and myeloid (neutrophils and macrophages) cells are upregulated in coronavirus patients (in general), whereas adaptive immunity is downregulated in these patients (Fig. 5D and Supplementary Table 2G). Similarly, analysis using the BioGPS database—gene sets36 demonstrated an increased enrichment of CD33+ myeloid and CD14+ monocytes associated with upregulated genes in our CoV expression signature (Fig. 6A and Supplementary Table 2H). In contrast, CD8+ and CD4+ T cells showed up along with enrichment of CD56+ NK cells and CD19+ B in the CoV-Down-gene signature (Fig. 6B and Supplementary Table 2I). Compared to healthy individuals, these cells were under-represented in CoV-Up-gene signature or lower in coronavirus-infected patients (Fig. 6B).

Fig. 6: PBMC-based gene signatures show association with subset of immune cells in coronavirus-infected patients.
figure 6

A, B Enrichment (hypergeometric test using hypeR57) of subsets of immune cell genes in multiple BioGPS—gene portal system36 using CoV-Up-gene signature (A) and CoV-Down-gene signature (B).

Coronavirus-infected patients’ PBMC reveals genes that overlap with known diseases and disease conditions

Based on information in Supplementary Fig. 3, we reasoned that the disease conditions from coronavirus infection may be similar to other immune-related diseases. To perform this, we applied enrichment analysis to study the overlap of genes between CoV-Up-gene signature and other diseases and disease conditions. We found that CoV-Up-gene signature was enriched for various immune-related diseases, including septicemia, pneumonia, lung disease, arthritis, cystic fibrosis, thalassemia, pre-eclampsia, bacterial infections, asthma, acute coronary syndrome, and others (Fig. 7A and Supplementary Table 2J). The overlap of CoV-Up-gene signature and those genes from selected diseases—septicemia, pneumonia, lung diseases, arthritis, and cystic fibrosis—are shown in Fig. 7B. These results suggest that the diseases and disease conditions due to coronavirus may be complex and highly variable and may affect differently in patients with pre-existing disease conditions as recently reported7, which warrants further systematic investigation.

Fig. 7: PBMC-based gene signature identifies links to known and novel diseases and disease conditions in coronavirus-infected patients.
figure 7

A, B Enrichment analysis (hypergeometric test using hypeR57) of disease-based gene sets from DisGeNET62 (A). Top five diseases and associated PBMC genes (B).

Discussion

The clinical course of COVID-19 patients remains enigmatic, and no treatment options exist with proven efficacy37. The variety of clinical presentations of this disease has alarmed healthcare providers across the globe. The rampant spread of COVID-19 during asymptomatic stage is attributed to the high SARS-CoV-2 viral shedding in the upper respiratory tract38. We reasoned that the blood, in real-time, may reflect changes occurring in immune and other cells and potentially infected tissues in PBMC, thereby acting as a potential remote biosensor of highly complex system-wide changes. There is no systematic study performed to our knowledge that attempts to use PBMC samples to understand the system-wide changes along with the disease symptoms in COVID-19 patients. This type of study will have the strength to distinguish systemic changes during acute and recovering stages of the patient’s infection. This proof-of-concept study, in the future with further refinement and extensive validation, may support the development of personalized prognostic biomarkers (not the focus of the current study) and may provide the opportunity to save patients who are most likely to die of the disease.

In this proof-of-concept study, we performed a comprehensive analysis using publicly available blood cell RNA profiles from SARS and COVID-19 patients and cross-validated with patients with other infections or healthy individuals. Using IS approaches in our COVID-engine platform, we were able to develop a blood cell RNA profile-based gene signatures that are differentially expressed at different stages of infection (acute vs. recovering). In addition, our COVID-engine platform provided hierarchical and comprehensive analysis describing infection-associated changes in genes, pathways, networks, subcellular components, and cells, covering almost the whole system. This “integrated” analysis can help understand which other disease-related symptoms could manifest in COVID-19 patients.

Our results, using limited coronavirus-infected patient samples, represent that the innate immune system associated with increased neutrophils, macrophages, and monocytes with potential cytokine storm (including the expression of IL6, TNF, IL8 (CXCL8) and CCL13) is high in CoV-Up-gene signature and specifically in acute-like patients. Macrophages and monocytes are known to serve as factories for viral replication in other disease conditions24. These changes in immune cells may also be connected to increased neutrophil counts in these patients39. The cytokine storm-related to innate immune changes may be linked to changes in angiogenesis and coagulation, suggesting a potential relationship between inflammation, thromboembolism40, and coagulation41. While there is no change in overall CD8+ T cell population between patients in their acute-like vs. recovering-like stages (that may, however, attribute to the sample size), the change in cytolytic activity, pDCs, and NK cells suggests that the adaptive immunity is a late event represented in patients recuperating from this disease42. Congruently, this is associated with lower innate immunity in recovering-like patients than acute-like patients and associated with increased expression of anti-viral genes OAS2 and IL16.

Similarly, the higher expression levels of MHC Class-I HLA-F gene, which is known to be associated with the interaction between CD4 T cells and NK cells to inactivate human immunodeficiency virus (HIV)43, in recovering-like patients suggests the anti-viral effect in these individuals. Remarkably, the supposedly anti-viral CD4 T and NK cells, along with B cells, are low in coronavirus patients and are associated with low HLA-DR-expressing monocytes in these patients with severe respiratory failure25,44,45. This report corroborates with our results that HLA-DRA gene and all the above three cell types are low in acute-like patients. Specifically, B cell-based adaptive immunity seems to vary among patients and mostly low in coronavirus patients with severe respiratory failure25,46. Our data suggest that this may impact the development of effective vaccines for this infection. In addition, CoV-Up-gene signature was high in methicillin-resistant Staphylococcus aureus (Supplementary Fig. 2), which, similar to coronavirus, colonizes upper respiratory track and causes pneumonia. It is interesting to note that there is no vaccine for S. aureus infection47. While there are more changes in T cells than B cells, it may be interesting to consider T cell therapy for COVID-19 patients48. In our study, there are disease links and potential comorbidities (Fig. 7) that has evidence from recent reports7. Further refinement of our current COVID-engine platform may help to identify hidden/silent pre-existing symptoms and develop an effective personalized COVID-19 treatment strategy using PBMC.

There are a number of questions that arose from this study that could be of relevance in tackling the current pandemic. How does coronavirus infection downregulate the adaptive immune system? Is the dysregulation of the immune system described causally linked with clinical outcomes? Does the dysregulated immune system alert the body to respond, and how? The dysregulated immune system could alert the host response to produce an active adaptive immune response, which typically takes 10–14 days. Our findings may suggest that individuals with activated, appropriate immune responses, especially with increased INF-γ type-II responses and cytolytic activity, which may also serve as biomarkers with further validation, maybe on their way to recovery from symptoms. Moreover, those patients’ incapable of such progression may have multiorgan failures that may be represented in our data.

Although our study may be timely for the current pandemic, there are limitations. We have performed analysis using publicly available small number of coronavirus-based training (n = 10) and test (n = 47) samples from less annotated datasets with limited clinical data, which may be appreciated provided the current global lock-down scenario. Also, the acute-like and recovering-like patients may overlap with the symptomatic and asymptomatic patients, respectively, described in the original publication from where the training dataset was derived16. Further validation was curtailed due to lack of associated clinical data, which is difficult to obtain in the current scenario.

In conclusion, PBMC has information related to infection status, immune states, disease progression, severity, and disease conditions that are likely going to be manifested due to coronavirus infection and COVID-19 disease (Fig. 8).

Fig. 8: COVID-engine workflow.
figure 8

Schematic summarizing the gene signature and their relevance at systems level to disease progression stages, and disease conditions for personalized COVID-19 medicine.

Methods

Data sources

PBMC transcriptomes from SARS, COVID-19, and other patients were obtained from published studies16,17,18,19 with GEO Omnibus identifiers—GSE1739 and GSE6269, and normalized data directly from the original publications. Raw.CEL files from Affymetrix Human HG-Focus Target Array for GSE1739 were obtained from the authors of the original publication16 and robust microarray analysis-based normalization was performed using R-based Bioconductor package—affy49. GEOquery R package50 was used to obtained gene expression and/or phenotypic data from GEO Omnibus for GSE6269.

Overall analysis strategy

COVID-engine refers to a pipeline of different IS methods described below.

Differential PBMC gene expression analysis and validation

PBMC-based differential gene signatures between healthy volunteers and coronavirus-infected patients were selected by performing supervised SAM51 using R-based siggenes package52. Differentially expressed genes with FDR < 0.05 were chosen as described in our previous publications53,54,55. Gene scores for signatures up or downregulated in coronavirus-infected patients in other validation datasets were derived using ssGSEA20 using R scripts from GenePattern platform56.

Enrichment and other IS analysis for systems level understanding of the disease

Briefly, the differentially expressed gene expression signatures from PBMC of patients infected with coronavirus were used to query multiple databases for meta-signatures such as pathways, mechanistic processes, and their associated networks that are connected with different disease manifestations by enrichment analysis-based IS. These gene signatures and meta-signatures were further systematically linked to wire the pathophysiology in patients, again, in a hierarchical fashion, from cells to whole organism level (Fig. 1A).

Hypergeometric gene enrichment analysis for most of the studies were performed using R-based hypeR tool57. Disease conditions-based enrichment analysis was performed using R-based Dose package58. Nearest Template Prediction22 using R-based tool from GenePattern platform56 was used to derive distance between two signatures—acute-like vs. recovering-like patients. Different gene set databases were from downloaded from EnrichR59 and MSigDB20. Immune gene sets were from Rooney et al.34. Additional gene sets and databases used in this study are: REACTOME28, KEGG60, COMPARTMENTS database61, BioGPS—gene portal system36 and DisGeNE62.

Detailed information regarding intermediate data from COVID-engine are provided in Supplementary Tables 1 and 2. The number of samples used in this exploratory study was determined by public availability of transcriptomic datasets and for the need of the hour. Kruskal–Wallis and t-test statistical analysis were performed where appropriate.