A blood transcriptome-based analysis of disease progression, immune regulation, and symptoms in coronavirus-infected patients

COVID-19 patients show heterogeneity in clinical presentation and outcomes that makes pandemic control and strategy difficult; optimizing management requires a systems biology approach of understanding the disease. Here we sought to potentially understand and infer complex disease progression, immune regulation, and symptoms in patients infected with coronaviruses (35 SARS-CoV and 3 SARS-CoV-2 patients and 57 samples) at two different disease progression stages. Further, we compared coronavirus data with healthy individuals (n = 16) and patients with other infections (n = 144; all publicly available data). We applied inferential statistics (the COVID-engine platform) to RNA profiles (from limited number of samples) derived from peripheral blood mononuclear cells (PBMCs). Compared to healthy individuals, a subset of integrated blood-based gene profiles (signatures) distinguished acute-like (mimicking coronavirus-infected patients with prolonged hospitalization) from recovering-like patients. These signatures also hierarchically represented multiple (at the system level) parameters associated with PBMC including dysregulated cytokines, genes, pathways, networks of pathways/concepts, immune status, and cell types. Proof-of-principle observations included PBMC-based increases in cytokine storm-associated IL6, enhanced innate immunity (macrophages and neutrophils), and lower adaptive T and B cell immunity in patients with acute-like disease compared to those with recovery-like disease. Patients in the recovery-like stage showed significantly enhanced TNF, IFN-γ, anti-viral, HLA-DQA1, and HLA-F gene expression and cytolytic activity, and reduced pro-viral gene expression compared to those in the acute-like stage in PBMC. Besides, our analysis revealed overlapping genes associated with potential comorbidities (associated diabetes) and disease-like conditions (associated with thromboembolism, pneumonia, lung disease, and septicemia). Overall, our COVID-engine inferential statistics platform and study involving PBMC-based RNA profiling may help understand complex and variable system-wide responses displayed by coronavirus-infected patients with further validation.


Introduction
The spread of COVID-19, a disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has led to the current global pandemic with already millions of people with confirmed infection and more than one million deaths 1 . According to the World Health Organization (WHO), the mode of infection for COVID-19 is predominantly through respiratory droplets, aerosol transmission due to pathogen-laden viral particles in the air, or close contact with infected people with increased viral loads, especially in the early stages of disease 2 . The mechanism of human pathogenesis, to a great extent, may simulate that of SARS-CoV (associated with SARS) and Middle East respiratory syndrome coronaVirus (MERS-CoV; associated with MERS) viral infections, including the prolonged persistence of the virus worsening the host immune response [3][4][5] . Clinical manifestation of COVID-19 ranges from mild respiratory symptoms to severe disease and death 6 . However, there are now reports suggesting heterogeneous manifestation of the disease affecting multiple organs, including kidney, liver, and brain 7 . Although age and compromised health history are considered critical prognostic factors, certain patients of younger age and good health have shown severe progression of the disease 8 .
Patients with COVID-19 may be asymptomatic 9,10 , but can still transmit infection 11 . Viral shedding from an infected person may occur, although resolution of symptoms 12 , and relapse has been reported despite consecutive negative testing 13 . Currently, there are no approved therapies for COVID-19 respiratory symptoms 14,15 or way to screen for disease progression, and system-wide changes in patients.
In this study, we sought to understand and infer changes in coronavirus-infected patients at system levels through their peripheral blood mononuclear cells (PBMC) by performing comparisons with healthy volunteers by applying inferential statistics (IS). The inferences at the multiple levels of the system in patients provide an efficient way of understanding the heterogeneity and mechanism(s) of disease manifestation as a whole (Fig. 1A). These inferences can be derived systematically in a hierarchical fashion from the level of gene signatures to the whole organism, to study the pathophysiology of patients with coronavirus infection. Moreover, the patients' response to insults from the virus along with other associated disease conditions can be studied.

PBMC-based gene expression profiles identify distinct gene signatures in coronavirus-infected patients and healthy volunteers
To perform an integrative and systematic analysis of heterogeneous patients' responses to the coronavirus infection, we used a limited RNA transcriptome data from PBMC of SARS-CoV patients (n = 10) and healthy volunteers (n = 4) from a published study 16 (training data). We identified PBMC genes differentially expressed between patients and healthy volunteers by applying our in-house developed IS pipeline ("COVID-engine"; see "Methods"). We broadly identified 290 differentially expressed genes in patients and healthy volunteers using a supervised Statistical Analysis of Microarrays (SAM) approach (Fig. 1B, Supplementary Fig. 1 and Supplementary Tables 1A, B). Among these 290 genes, 169 (dubbed as CoV-Up-gene signature) were highly expressed, and 121 highly reduced in patients compared to healthy volunteers (dubbed CoV-Down-gene signature; Fig. 1B and Supplementary Table 1B).
We further explored our gene signatures in SARS-CoV-2-infected patients using a very limited sample size of three COVID-19 patients' and three healthy volunteers' PBMC from another published study by Xiong et al. 17 . Irrespective of different diseases (although related) and platforms, we observed that our CoV-Up-gene signature from SARS-CoV was higher in COVID-19 patients' PBMC than in the healthy volunteers (see "Methods"). In contrast, CoV-Down-gene signature was higher in healthy volunteers and lower in COVID-19 patients. This result suggests that our gene signatures from SARS-CoV may be applicable to COVID-19 patients (Fig. 1C, D and Supplementary Table 1C).
When our CoV-Up-gene signature was analyzed using PBMC samples 18,19 (n = 213) from patients infected with bacteria and influenza, we observed a broadly similar pattern (Fig. 1E, Supplementary Fig. 2A and Supplementary Table 1D, E). The reciprocal analyses with CoV-Down-gene signature was higher in PBMC from healthy volunteers than from SARS-CoV and other microbeinfected patients (Fig. 1F, Supplementary Fig. 2B and Supplementary Table 1D, E). Again, we performed enrichment analysis (hypergeometric test) using MSigDB's C7 immune signature 20 and found that 43% (60 out of 138 genes) of the signatures that were derived from PBMC (mostly associated with specific diseases) were significantly (false discovery rate (FDR) < 0.2) enriched for our CoV-Up-gene signature ( Supplementary Fig. 3). Overall, the results suggest that CoV-Up-gene signature represents primarily diseased PBMC.
PBMC gene signatures may distinguish disease progression-acute-like vs. recovering-like coronavirusinfected patients Next, we sought to assess the potential of our CoV-Upgene signature to stratify patients into those at different progression stages of the disease (progression): acute vs. recovering. For this, we used PBMC transcriptome data from a limited size of 44 samples from the longitudinal collection over the disease course from a published study (validation data; ref. 19 ). Lee et al. defined acute samples (n = 25) as those that tested positive (using blood) for SARS-CoV during hospitalization or within 10 days of onset of the disease in patients. The samples derived from acute patients were also correlated with disease severity including an increased clinical pulmonary infection score 19,21 . The remaining samples were labeled as recovering samples (n = 19). Hence, acute vs. recovering samples refer to different stages of the disease that can occur in the same patient as specific samples were collected from the same individual during hospitalization.
Interestingly, disease phase appeared to be associated with our identified CoV expression signatures in samples from Lee et al. 19 with recovering patients showing an intermediate signature between acute phase and healthy donors (Fig. 1E). We then applied a gene signature (39 genes with known gene symbols) for distinguishing acute from recovering patients identified by Lee et al. 19 back to our training data 16 using NTP analysis (see "Methods") 22 . This resulted in 7 of the 10 patient samples showing maximal similarity to acute phase (termed acute-like patients), while 3 samples scored similar to recovering patients (termed recovering-like patients; measured as signature-based cosine distance; see "Methods"; Fig. 1G and Supplementary Table 1F). Our results therefore suggest that CoV signatures may distinguish different disease stages.
Gene expression of key cytokines and other genes represented in PBMCs in acute-like vs. recovering-like coronavirus patients We next analyzed the expression patterns of genes associated with coronavirus infection in PBMCs in predicted acute-like and recovering-like using SARS-CoVinfected patients and healthy volunteers (from training data). We compared the expression levels of cytokines namely IL6 and TNF and other genes such as ACE2 and lactate dehydrogenases in PBMC from infected patients and healthy individuals, which are known to be expressed in circulating monocytes and macrophages after viral infection 23,24 . Although increased expression level of IL6 in acute-like patients compared to healthy individuals was observed, there was no significant difference between acute-like and recovering-like patients ( Fig. 2A). Interestingly, TNF was highly expressed in acute-like patients compared to recovering-like patients and healthy individuals (Fig. 2B). Besides, analysis of ACE2 and subunits of lactate dehydrogenase (LDHA and LDHB; associated with hypoxia) genes showed LDHB was highly expressed in healthy individuals compared to coronavirus-infected patients, and inverse trends were observed for ACE2 and LDHA (Fig. 2C, E). Among these five genes, TNF was the only gene that showed differential expression between acute-like and recovering-like patients. The results show increased expression of these key coronavirus infection associated genes in patient PBMC samples.
We further examined multiple other candidate genes that act as chemoattractants to monocytes and macrophages, specifically those that interfere with innate and adaptive immunity and viral replication 24 . Among those genes, we observed CXCL8 (IL8) and CCL13, which are associated with chemoattraction of neutrophils/macrophages (innate immunity), to be highly expressed in acute-like patients, compared to recovering-like patients and healthy individuals (Fig. 3A). On the other hand, OAS2 and IL16 associated with T cells (adaptive immunity) and inhibition of viral replication were highly expressed in recovering-like patients and healthy individuals (Fig. 3B). These results suggest that PBMC from potential acute-like patients may be associated with the activity of innate immunity, whereas PBMC from recovering-like patients may be associated with an adaptive immune profile.
Enrichment of TNF-alpha, IL6, and hypoxia-related pathways in PBMC of coronavirus patients Based on the expression levels of key genes, including TNF, we next set out to explore the functional implications of the CoV-Up-gene signature. To do so, we performed enrichment analysis using the genes in CoV-Upgene signature and MSigDB's hallmark gene set database 20 (Fig. 3C). This revealed multiple highly ranked pathways involved in cytokine storm and acute infection including TNF signaling, IL6 (IL6-JAK-STAT3) and IL2 (IL2-STAT5) signaling, inflammatory response, and KRAS/MTOR and late responses to estrogen pathways ( Fig. 3C and Supplementary Table 2A). This is consistent with clinical manifestations including observations of high IL6 levels in COVID-19 patients 25 .
Interestingly, outside of the inflammatory response, multiple pathways related to hypoxia, angiogenesis, and oxygen transport (heme/iron) were also implicated, consistent with the oxygen limitation 26 17 ). For C and D statistical significance was not considered due to low sample size. E, F. CoV-Up-gene signature scores (E) and CoV-Down-gene signature scores (F) and their association with acute and recovering SARS-CoV-infected patients (n = 25; 44 samples), bacteria-infected patients (n = 16), and healthy individuals (n = 9; datasets from Lee et al. 19 ). Kruskal-Wallis statistical test with p < 0.001 for E and F. G Acute vs. recovering gene signature from Lee et al. 19 predicted acute-like and recovering-like SARS-CoV-infected patients (n = 10) from Raghunathan et al. 16 using NTP method and cosine distance measure.
apoptotic pathway may have relevance to cell death of lymphocytes (Fig. 3C) as suggested elsewhere 17 . These enriched pathways and processes may be linked together and may represent an association with COVID-19 infection in these patients.
Potential role for a network of related pathways representing cytokine storm and innate immune changes in PBMC of coronavirus-infected patients Given that different pathways were enriched in infected patients, next, we interrogated how these pathways are linked together to convey a network of processes or changes at the cellular level. Hence, we used the REAC-TOME pathway database 28 to connect different but related pathways that were enriched in infected patients using network analysis. Two evident and distinct networks were significantly enriched in CoV-Up-gene signature: (a) interleukins and cytokine signaling (potentially representing cytokine storm) and (b) neutrophils and innate immunity ( Fig. 4A and Supplementary Table 2B). Nevertheless, we observed an increased enrichment of a unique network linking granulopoiesis, megakaryocyte differentiation, and platelet activation (Fig. 4A). This may be linked to coagulation system that can activate the innate immune system (e.g. monocytes/macrophages) to produce TNF 29 . Nevertheless, this requires further understanding. These may suggest innate immune system activation with potential cytokine storm in coronavirus patients.
Kyoto Encyclopedia of Genes and Genomes (KEGG)based extensive analysis of network of molecular , and LDHA (E) in acute-like and recovering-like SARS patients and healthy individuals. Blue asterisk (*) represents Kruskal-Wallis nominal p value across all three groups and red asterisk (*) represent Welch two-sample t-test p value between patients and healthy individuals. *p < 0.05; **p < 0.001; ***p < 0.0001 and ■ p < 0.08. Multiple testing was not done due to low sample size. Fig. 3 Highly represented signaling pathways/processes in coronavirus-infected patients. A, B Expression levels of genes related to immune cell chemoattractants CXCL8 and CCL13 (A), genes involved in T cells and suppression of viral replication OAS2 and IL16 (B) in acute-like and recovering-like coronavirus-infected patients and healthy individuals. Blue asterisk (*) represents Kruskal-Wallis nominal p value across all three groups and red asterisk (*) represent Welch two-sample t-test p value between patients and healthy individuals. *p < 0.05; **p < 0.001; ***p < 0.0001 and ■ p < 0.08. Multiple testing was not done due to low sample size. C Enrichment statistical analysis (hypergeometric test using hypeR 57 ) using CoV-Up-gene signature and pathways/processes based on gene sets from MSigDB's HALLMARKS database 20 . pathways suggested further understanding of functions and utilities at the levels of cells and organisms. Our KEGG analysis with CoV-Up-gene signature overlapped with different infections, including Helicobacter pylori and leishmania ( Fig. 4B and Supplementary Table 2C). This suggests that coronavirus infection may share disease conditions with these infections. Interestingly, we also noted enrichment of systemic lupus erythematosus (SLE)related genes, which is a chronic disease associated with inflammation in connective tissues and affects multiple organs, including the blood-forming system 30,31 . This suggests that coronavirus infection and SLE may be related to each other with respect to disease symptoms. The prevalence and risk factors of severe COVID-19 in SLE patients remain unclear 30,31 . The understanding at the systemic level and additional clinical reports are needed to know whether certain COVID-19 disease conditions may be associated with SLE. A more relevant and well-known individual pathway associated with this disease is the renin-angiotensin system associated with ACE2 function (Fig. 4B). There is also an enrichment of RIG-I-like receptor signaling pathway representing potential anti-viral event through pathogen-associated molecular patterns 32 . Multiple chemokine/cytokine and metabolism pathways were enriched as a part of the CoV-Up-gene signature in the KEGG database (Fig. 4B).

Changes in subcellular regulatory networks associated with PBMC of coronavirus-infected patients
While these genes to network processes provide information related to coronavirus infection, we were interested in investigating the next level in the hierarchy and the potential subcellular interaction networks that may inform viral interaction within host cells. Interestingly, cell-cell adhesion processes, secretory granules, vesicles, and exosomes spanning plasma membrane and lipid complexes and cytoplasm were enriched in CoV-Up-gene signature (Fig. 4C), suggesting that this may indicate the viral infection of monocytes. Furthermore, these were related to the enrichment of fatty acid synthase complex that is known to be involved in the plasma membrane and vesicle formation 33 (Fig. 4C and Supplementary Table  2D). The data also suggests potential interaction of the virus with host immune cells through cell-cell adhesion processes, which requires further understanding.
Nonetheless, the host-specific subcellular changes in PBMC are also evident from this analysis. An increased replication and proliferation of potential host cells, mainly involving the innate immune system, may be evident based on the enrichment of genes associated with DNA polymerase processivity factor and proliferating cell nuclear antigen complex. Also, the production of immunoglobin complexes along with NFkB complex was higher in the patient gene signature, again, representing potentially increased immune responses. At the same time, the host's potential responses to death signals associated with BCL2 complex are also enriched in this analysis. Again, this potentially represents lymphocyte-related apoptosis in connection with an enriched apoptotic pathway in Fig.  3C. Neutrophil-specific S100A8/A9 complexes are also enriched (Fig. 4C). Overall, these results suggest potential PBMC-based subcellular level changes associated with the viral integration in immune cells and associated pathophysiology.

Recovery from coronavirus infection is potentially associated with increased cytolytic activity and IFN-γ but not increased B cell levels
In order to gain insight into the cellular dynamics of the SARS-CoV-2 immune response, we calculated immune signature scores based on the gene markers from Rooney et al. 34 (see "Methods"). In this case, we separated the coronavirus patient samples into those with acute-like or recovering-like disease and compared these with healthy control samples. As expected, we observed that the innate immune system involving macrophages and neutrophils were highly active in the acute-like patients, suggesting that they may be the first to encounter the coronavirus, with these decreasing in recovering patients ( Fig. 5A and Supplementary Table 2E).
Perhaps most interestingly, a significant (FDR < 0.2) increase in natural killer (NK) cells, cytolytic activity, and plasmacytoid dendritic cells (pDCs) in recovering-like patients compared to acute-like patients was observed (Fig.  5A). It is noteworthy that the absolute levels of CD8 + T cells and co-stimulating helper T cells are not different between recovering-like and acute-like patients (Fig. 5A). This result suggests that the CD8 + T cells are potentially activated (cytolytic) in the recovering patients. Certain results from SARS patients were assessed using PBMC from limited (see figure on previous page) Fig. 4 Highly represented related networks of molecular, cellular, and development pathways show cytokine (storm) network and innate immunity in coronavirus-infected patients. A REACTOME 28 database-based connection of different but related pathways that were enriched (hypergeometric test of FDR < 0.2 and overlap a similar index of 0.5 using hypeR 57 ) in infected patients using network plots. B KEGG 60 pathwaysbased network showing enrichment (hypergeometric test of FDR < 0.2 and overlap a similar index of 0.25 using hypeR 57 ) of different diseases and infection related pathways. For A and B, nodes and edges are of same size and length, respectively. C Enrichment statistical analysis (hypergeometric test of FDR < 0.2 using hypeR 57 ) using CoV-Up-gene signature and pathways/processes based on gene sets from COMPARTMENTS database 61 . number of COVID-19 samples (n = 3; FDR < 0.2; Fig. 5B and Supplementary Table 2F). No difference in B cells between both types of patients and healthy individuals was found. Intriguingly, higher level of expression of interferon (IFN)-γ type-II gene was found in the recovering-like patients and lower levels in both acute-like patients and healthy volunteers (Fig. 5A). These results suggest that T cell responses may be pivotal in successful response to coronavirus infection, consistent with the recent study by Grifoni et al. 35 which found SARS-CoV-2 reactive T cells in 70% of convalescent COVID-19 patients.
Next, we examined the differential expression of major histocompatibility complex (MHC) class-I and class-II HLA that may reflect antigen presentation to and/or activation of CD4 + /CD8 + T cells, and whose levels are increased by IFN-γ (Fig. 5C). Among the MHC class-II HLA genes, two of them were lower in acute-like patients compared to recovering-like and/or healthy individuals (borderline significance with nominal p < 0.1 due to low sample size). While most of the MHC class-II HLA genes showed no difference in expression levels between the acute-like and recovering-like patients, HLA-DQA1 showed an increasing trend in the recovering-like patients towards the healthy individuals and low level in acute-like patients. Similarly, MHC class-I HLA gene, HLA-F showed a trend akin to HLA-DQA1 (Fig. 5C and  p < 0.05). On the other hand, there was no change in the antigen processing machinery (data not shown). However, all these speculations from limited data warrant further validation, and their functional immunological significance is currently unclear.
To confirm these analyses of immunologic composition, we performed two additional independent analyses. Hypergeometric test-based enrichment analysis of MSigDB's C7 immune signature 20 showed similar conclusions that innate immunity and myeloid (neutrophils and macrophages) cells are upregulated in coronavirus patients (in general), whereas adaptive immunity is downregulated in these patients ( Fig. 5D and Supplementary Table 2G). Similarly, analysis using the BioGPS database-gene sets 36 demonstrated an increased enrichment of CD33 + myeloid and CD14 + monocytes associated with upregulated genes in our CoV expression signature ( Fig. 6A and Supplementary Table 2H). In contrast, CD8 + and CD4 + T cells showed up along with enrichment of CD56 + NK cells and CD19 + B in the CoV-Down-gene signature ( Fig. 6B and Supplementary Table  2I). Compared to healthy individuals, these cells were under-represented in CoV-Up-gene signature or lower in coronavirus-infected patients (Fig. 6B).

Coronavirus-infected patients' PBMC reveals genes that overlap with known diseases and disease conditions
Based on information in Supplementary Fig. 3, we reasoned that the disease conditions from coronavirus infection may be similar to other immune-related diseases. To perform this, we applied enrichment analysis to study the overlap of genes between CoV-Up-gene signature and other diseases and disease conditions. We found that CoV-Up-gene signature was enriched for various immune-related diseases, including septicemia, pneumonia, lung disease, arthritis, cystic fibrosis, thalassemia, pre-eclampsia, bacterial infections, asthma, acute coronary syndrome, and others ( Fig. 7A and Supplementary Table 2J). The overlap of CoV-Up-gene signature and those genes from selected diseasessepticemia, pneumonia, lung diseases, arthritis, and cystic fibrosis-are shown in Fig. 7B. These results suggest that the diseases and disease conditions due to coronavirus may be complex and highly variable and may affect differently in patients with pre-existing disease conditions as recently reported 7 , which warrants further systematic investigation.

Discussion
The clinical course of COVID-19 patients remains enigmatic, and no treatment options exist with proven efficacy 37 . The variety of clinical presentations of this disease has alarmed healthcare providers across the globe. The rampant spread of COVID-19 during asymptomatic stage is attributed to the high SARS-CoV-2 viral shedding in the upper respiratory tract 38 . We reasoned that the blood, in real-time, may reflect changes occurring in immune and other cells and potentially infected tissues in PBMC, thereby acting as a potential remote biosensor of highly complex system-wide changes. There is no systematic study performed to our knowledge that attempts to use PBMC samples to understand the system-wide changes along with the disease symptoms in COVID-19 patients. This type of study will have the strength to distinguish systemic changes during acute and recovering stages of the patient's infection. This proof-of-concept study, in the future with further refinement and extensive validation, may support the development of personalized prognostic biomarkers (not the focus of the current study) and may provide the opportunity to save patients who are most likely to die of the disease.
In this proof-of-concept study, we performed a comprehensive analysis using publicly available blood cell RNA profiles from SARS and COVID-19 patients and cross-validated with patients with other infections or healthy individuals. Using IS approaches in our COVIDengine platform, we were able to develop a blood cell RNA profile-based gene signatures that are differentially expressed at different stages of infection (acute vs. recovering). In addition, our COVID-engine platform provided hierarchical and comprehensive analysis describing infection-associated changes in genes, pathways, networks, subcellular components, and cells, covering almost the whole system. This "integrated" analysis can help understand which other disease-related symptoms could manifest in COVID-19 patients.
Our results, using limited coronavirus-infected patient samples, represent that the innate immune system associated with increased neutrophils, macrophages, and monocytes with potential cytokine storm (including the expression of IL6, TNF, IL8 (CXCL8) and CCL13) is high in CoV-Up-gene signature and specifically in acute-like patients. Macrophages and monocytes are known to serve as factories for viral replication in other disease conditions 24 . These changes in immune cells may also be connected to increased neutrophil counts in these patients 39 . The cytokine storm-related to innate immune changes may be linked to changes in angiogenesis and coagulation, suggesting a potential relationship between inflammation, thromboembolism 40 , and coagulation 41 .
While there is no change in overall CD8 + T cell population between patients in their acute-like vs. recoveringlike stages (that may, however, attribute to the sample size), the change in cytolytic activity, pDCs, and NK cells suggests that the adaptive immunity is a late event represented in patients recuperating from this disease 42 . Congruently, this is associated with lower innate immunity in recovering-like patients than acute-like patients and associated with increased expression of anti-viral genes OAS2 and IL16.
Similarly, the higher expression levels of MHC Class-I HLA-F gene, which is known to be associated with the interaction between CD4 T cells and NK cells to inactivate human immunodeficiency virus (HIV) 43 , in recovering-like patients suggests the anti-viral effect in these individuals. Remarkably, the supposedly anti-viral CD4 T and NK cells, along with B cells, are low in coronavirus patients and are associated with low HLA-DRexpressing monocytes in these patients with severe respiratory failure 25,44,45 . This report corroborates with our results that HLA-DRA gene and all the above three cell types are low in acute-like patients. Specifically, B cellbased adaptive immunity seems to vary among patients and mostly low in coronavirus patients with severe respiratory failure 25,46 . Our data suggest that this may impact the development of effective vaccines for this infection. In addition, CoV-Up-gene signature was high in methicillin-resistant Staphylococcus aureus (Supplementary Fig. 2), which, similar to coronavirus, colonizes upper respiratory track and causes pneumonia. It is interesting to note that there is no vaccine for S. aureus infection 47 . While there are more changes in T cells than B cells, it may be interesting to consider T cell therapy for COVID-19 patients 48 . In our study, there are disease links and potential comorbidities (Fig. 7) that has evidence from There are a number of questions that arose from this study that could be of relevance in tackling the current pandemic. How does coronavirus infection downregulate the adaptive immune system? Is the dysregulation of the immune system described causally linked with clinical outcomes? Does the dysregulated immune system alert the body to respond, and how? The dysregulated immune system could alert the host response to produce an active adaptive immune response, which typically takes 10-14 days. Our findings may suggest that individuals with activated, appropriate immune responses, especially with increased INF-γ type-II responses and cytolytic activity, which may also serve as biomarkers with further validation, maybe on their way to recovery from symptoms. Moreover, those patients' incapable of such progression may have multiorgan failures that may be represented in our data.
Although our study may be timely for the current pandemic, there are limitations. We have performed analysis using publicly available small number of coronavirus-based training (n = 10) and test (n = 47) samples from less annotated datasets with limited clinical data, which may be appreciated provided the current global lock-down scenario. Also, the acute-like and recovering-like patients may overlap with the symptomatic and asymptomatic patients, respectively, described in the original publication from where the training dataset was derived 16 . Further validation was curtailed due to lack of associated clinical data, which is difficult to obtain in the current scenario.
In conclusion, PBMC has information related to infection status, immune states, disease progression, severity, and disease conditions that are likely going to be manifested due to coronavirus infection and COVID-19 disease (Fig. 8).

Data sources
PBMC transcriptomes from SARS, COVID-19, and other patients were obtained from published studies [16][17][18][19] with GEO Omnibus identifiers-GSE1739 and GSE6269, and normalized data directly from the original publications. Raw.CEL files from Affymetrix Human HG-Focus Target Array for GSE1739 were obtained from the authors of the original publication 16 and robust microarray analysis-based normalization was performed using R-based Bioconductor package-affy 49 . GEOquery R package 50 was used to obtained gene expression and/or phenotypic data from GEO Omnibus for GSE6269.
Overall analysis strategy COVID-engine refers to a pipeline of different IS methods described below.
Differential PBMC gene expression analysis and validation PBMC-based differential gene signatures between healthy volunteers and coronavirus-infected patients were selected by performing supervised SAM 51 using R-based siggenes package 52 . Differentially expressed genes with FDR < 0.05 were chosen as described in our previous publications [53][54][55] . Gene scores for signatures up or downregulated in coronavirus-infected patients in other validation datasets were derived using ssGSEA 20 using R scripts from GenePattern platform 56 .
Enrichment and other IS analysis for systems level understanding of the disease Briefly, the differentially expressed gene expression signatures from PBMC of patients infected with coronavirus were used to query multiple databases for metasignatures such as pathways, mechanistic processes, and their associated networks that are connected with different disease manifestations by enrichment analysis-based IS. These gene signatures and meta-signatures were further systematically linked to wire the pathophysiology in patients, again, in a hierarchical fashion, from cells to whole organism level (Fig. 1A).
Hypergeometric gene enrichment analysis for most of the studies were performed using R-based hypeR tool 57 . Disease conditions-based enrichment analysis was performed using R-based Dose package 58 . Nearest Template Prediction 22 using R-based tool from GenePattern platform 56 was used to derive distance between two signatures-acute-like vs. recovering-like patients. Different gene set databases were from downloaded from EnrichR 59 and MSigDB 20 . Immune gene sets were from Rooney et al. 34 . Additional gene sets and databases used in this study are: REACTOME 28 , KEGG 60 , COMPARTMENTS database 61 , BioGPS-gene portal system 36 and DisGeNE 62 .
Detailed information regarding intermediate data from COVID-engine are provided in Supplementary Tables 1 and 2. The number of samples used in this exploratory study was determined by public availability of transcriptomic datasets and for the need of the hour. Kruskal-Wallis and t-test statistical analysis were performed where appropriate.