Meta-analysis of host transcriptional responses to SARS-CoV-2 infection reveals their manifestation in human tumors

A deeper understanding of the molecular biology of SARS-CoV-2 infection, including the host response to the virus, is urgently needed. Commonalities exist between the host immune response to viral infections and cancer. Here, we defined transcriptional signatures of SARS-CoV-2 infection involving hundreds of genes common across lung adenocarcinoma cell lines (A549, Calu-3) and normal human bronchial epithelial cells (NHBE), with additional signatures being specific to one or both adenocarcinoma lines. Cross-examining eight transcriptomic databases, we found that host transcriptional responses of lung adenocarcinoma cells to SARS-CoV-2 infection shared broad similarities with host responses to multiple viruses across different model systems and patient samples. Furthermore, these SARS-CoV-2 transcriptional signatures were manifested within specific subsets of human cancer, involving ~ 20% of cases across a wide range of histopathological types. These cancer subsets show immune cell infiltration and inflammation and involve pathways linked to the SARS-CoV-2 response, such as immune checkpoint, IL-6, type II interferon signaling, and NF-κB. The cell line data represented immune responses activated specifically within the cancer cells of the tumor. Common genes and pathways implicated as part of the viral host response point to therapeutic strategies that may apply to both SARS-CoV-2 and cancer.


Transcriptional signatures of the host response to SARS-CoV-2 infection in human bronchial epithelial cells and lung cancer cell lines.
Figure 1 provides an overview of our study's approach, utilizing transcriptional gene signatures of the host response to SARS-CoV-2. Independent biological triplicates of three lung cell lines, namely normal human bronchial epithelium (NHBE), A549 adenocarcinoma, and Calu-3 adenocarcinoma, were mock-treated or infected with SARS-CoV-2 and then profiled for gene expression using RNA-seq 7 . We observed widespread expression changes 24 h post-infection for each cell line, though with notably fewer changes for NHBE than for the adenocarcinoma lines (Supplementary Data S1). At a nominal p value of p < 0.01 (t test), 566 human genes were altered for NHBE (with a predicted false discovery rate, or FDR, of 29%), 5675 genes were altered for A549 (FDR 3%), and 4968 genes were altered for Calu-3 (FDR 4%). Taking the above genes, supervised clustering of the differential patterns identified host responses to infection that were common across multiple cell lines or specific to a single cell line (Fig. 2a). Using a relaxed cutoff for NHBE (one-sided p < 0.05), when combining results with those of A549 and Calu-3, we defined a signature of 308 genes (181 up-regulated, 127 down-regulated) commonly altered in the same direction across all three cell lines (Fig. 2b). We used these data to define four distinct SARS-CoV-2 transcriptional signatures (Fig. 2c): altered in all three cell lines examined (308 genes), altered specifically in A549 (1326 genes), altered specifically in Calu-3 (1327 genes), and altered in the same direction specifically in both A549 and Calu-3 (2963 genes).
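The inclusion rule for the common three cell line signature can be illustrated with a short sketch. This is not the authors' code: it assumes that per-gene t statistics and two-sided p values (infected versus mock) have already been computed for each cell line, and the function name `common_signature_direction` and the input layout are hypothetical. The one-sided NHBE p value is taken as half the two-sided value once the direction of change agrees.

```python
def common_signature_direction(stats, strict=0.01, relaxed=0.05):
    """Return 'up', 'down', or None for one gene.

    stats maps cell line -> (t_statistic, two_sided_p), infected vs mock.
    """
    t_a549, p_a549 = stats["A549"]
    t_calu3, p_calu3 = stats["Calu-3"]
    t_nhbe, p_nhbe = stats["NHBE"]

    # Two-sided p < 0.01 is required in both adenocarcinoma lines.
    if not (p_a549 < strict and p_calu3 < strict):
        return None
    # The direction of change must agree across all three cell lines.
    if len({t_a549 > 0, t_calu3 > 0, t_nhbe > 0}) != 1:
        return None
    # Relaxed one-sided cutoff for NHBE (half the two-sided p value).
    if p_nhbe / 2 >= relaxed:
        return None
    return "up" if t_a549 > 0 else "down"


# Made-up statistics for a single gene.
gene = {"NHBE": (2.5, 0.07), "A549": (6.0, 0.004), "Calu-3": (5.0, 0.007)}
direction = common_signature_direction(gene)
```

Requiring agreement across three independent tests is what allows the relaxed NHBE cutoff: a gene passing by chance in NHBE alone is unlikely to also pass the stricter cutoffs in the other two lines in the same direction.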
Although the A549 cell line has low expression of the viral receptor ACE2 7 , the above SARS-CoV-2 signatures shared across multiple cell lines (i.e., NHBE/A549/Calu-3 and A549/Calu-3) were repeatedly observable in A549 cells transduced with ACE2 and infected with SARS-CoV-2 (Supplementary Figure S1). However, the Calu-3-specific transcriptional signature, but not the above A549-specific signature, was also manifested in A549 over-expressing ACE2 (Supplementary Figure S1). The transcriptional differences between A549 with and without ACE2 presumably relate to the ACE2 receptor, although NHBE primary cells also expressed ACE2 (Supplementary Figure S1), and so other factors involving the Calu-3 and A549 cancer cell lines, as well as very high ACE2 expression, may also be involved. Notably, alternative receptors with a potential role in SARS-CoV-2 entry may also exist 17 .
In several analyses presented below, we used the above transcriptional signatures of cell line response to SARS-CoV-2 infection as a frame of reference for comparisons with other transcriptional datasets from independent studies.
Pathways associated with the host response to SARS-CoV-2 infection include interferon signaling and inflammation. Each of the four SARS-CoV-2 signatures (three cell lines, A549-specific, Calu-3-specific, and A549/Calu-3) represented specific altered pathways or functional gene categories. We searched wikiPathways 18 for enrichment of any of our SARS-CoV-2-associated gene sets (Supplementary Data S2). Out of 417 pathways considered, 47 were significant by one-sided Fisher's exact test with FDR < 1% for at least one of our gene sets (Fig. 3a). Most of these enriched pathways involved the common three cell line signature, with the majority of these pathways also showing enrichment within the Calu-3-specific signature. Enriched pathways within the three cell line signature included "Photodynamic therapy-induced NF-κB survival signaling", "Cytokines and inflammatory response", "Type II interferon signaling", and "VEGFA-VEGFR2 Signaling Pathway". In terms of functional gene categories, significantly enriched Gene Ontology (GO) annotation terms for the three cell line SARS-CoV-2 signature (Fig. 3b; Supplementary Data S3) included "immune response" (involving 44 out of a total of 741 genes with this annotation), "inflammatory response" (29 out of 370), "cytokine activity" (19 out of 176), "growth factor receptor binding" (13 out of 117), and "response to virus" (15 out of 229). A survey of the wikiPathways "Type II interferon signaling" pathway showed many genes that were statistically
Figure 1. Overview of the basic approach of the study. (a) Diagram of the overall analytical approach to score a set of differential expression profiles according to a given gene transcription signature. For a given expression dataset, we score each mRNA profile according to an independently-derived transcriptional signature representing the host response to SARS-CoV-2 infection. The scoring is based on whether the relative differential patterns in the external sample profile, higher versus lower, are broadly similar to the patterns of up- versus down-regulation, respectively, in the SARS-CoV-2 signature. The "t score" signature scoring metric from previous studies is used [34][35][36][37] .
We scored each patient sample expression profile in the COVID-19 patient dataset 8 for each of the four SARS-CoV-2 transcriptional signatures (three cell lines, A549-specific, Calu-3-specific, and A549/Calu-3). The scoring was based on whether the relative differential patterns in the patient sample profile, higher versus lower, were broadly similar to the patterns of up- or down-regulation, respectively, in the in vitro infection dataset. As compared to the non-viral group, SARS-CoV-2 in vitro scores for the common three cell line signature and the Calu-3-associated signatures were higher in a substantial fraction of patient samples in both the COVID-19 and other viral groups (Fig. 4a). The top set of genes correlated positively with SARS-CoV-2 viral load across the 238 patient samples (p < 0.01, Pearson's correlation) significantly overlapped with the genes high in the in vitro three cell line signature (41 genes out of 181, p < 1E−13, one-sided Fisher's exact test) and with the genes high in the Calu-3-specific signature (194 out of 784 genes, p < 1E−69, Fig. 4b; Table S1). The overlapping genes involved pathways related to cytokines and the inflammatory response. Samples from COVID-19 patients with high viral loads of SARS-CoV-2 tended to most strongly manifest the in vitro signatures of infection, while samples from lower viral loads often appeared negative for the signatures (Table S1), based on analysis of three external public datasets (GSE36969, GSE59185 19 , GSE68820 20 ).
Of the 308 genes in our SARS-CoV-2 three cell line signature (from Fig. 2a), 181 (59%) were significantly altered in the same direction (p < 0.05) for at least one of the three SARS datasets in mice. Similarly, in another gene expression profiling dataset of blood and lung samples in mice, the three cell line signature and the Calu-3-specific signature of SARS-CoV-2 infection shared broad similarities with host responses to Toxoplasma gondii, influenza A virus, RSV, acute Burkholderia pseudomallei, Candida albicans, and house dust mite 21 (Fig. 5b; Table S1). In the above dataset, we found more commonalities with the SARS-CoV-2 signatures for the mouse lung samples than for the blood samples. We next examined human samples, using a gene expression dataset of 418 patients from both viral-associated and non-viral nasal lavage samples 22 (Table S1), and the scores for each SARS-CoV-2 signature were significantly higher in the viral group compared to the non-viral group. In particular, the three cell line and Calu-3 signatures appeared markedly elevated in the viral group (p < 1E−11, t test).
Figure 2. (a) Genes altered with p < 0.01 for any cell line are represented as a heat map. Each cell line profile is centered on the average of its corresponding mock control group. (b) Heat map representing a common set of 308 genes up-regulated or down-regulated across all three cell lines and in the same direction of change (one-sided p < 0.05 NHBE, and two-sided p < 0.01 A549 and Calu-3). (c) In the same study noted above, NHBE or A549 cells were infected with other viruses (IAV, RSV, HPIV3) or treated with interferon beta (IFNB). Differential SARS-CoV-2-associated expression patterns common to all three cell lines (NHBE, A549, Calu-3) or found for just one or two cell lines (taken from a) are shown for both the SARS-CoV-2 infection profiles and the additional profiles representing the other infections and treatments. Each treatment profile is centered on the average of its corresponding control group. Patterns of manifestation of SARS-CoV-2 signatures within the other virus or treatment groups are highlighted. (d) In an independent study (GSE148729), three cell lines (H1299 lung squamous, Caco-2 colorectal, and Calu-3 lung adenocarcinoma) were infected with SARS-CoV-1 or SARS-CoV-2 and transcriptionally profiled. The SARS-CoV-2 signatures from part c were examined in this additional dataset. Patterns of manifestation of SARS-CoV-2 signatures within the Calu-3 signatures of the independent dataset are highlighted. p values by t test using log2-transformed data. See also Supplementary Data S1.
Figure panel: "Type II Interferon Signaling" pathway diagram, with genes marked by significance level (p < 0.05, p < 0.01, p < 0.001). Legend: gene signature score (t), from similar to dissimilar.
Using the mRNA profiling dataset from TCGA, we scored each tumor expression profile for each of the four SARS-CoV-2 signatures (three cell lines, A549-specific, Calu-3-specific, and A549/Calu-3). A previous study classified the NSCLC profiles into nine molecular subtypes 16 , three associated with lung squamous cell carcinoma, and six associated with lung adenocarcinoma. Three of the adenocarcinoma subtypes (AD.2, AD.3, and AD.4) express several immune checkpoint genes, including PDL1 and PDL2, corresponding with patterns of greater immune cell infiltration 16 . We found that the common signature of SARS-CoV-2 infection across three cell lines represented a transcriptional program associated with the immune response and immune checkpoint pathway in human lung tumors. Interestingly, scores for all four SARS-CoV-2 signatures had higher levels in normal adjacent lung tissues than in lung tumors (Fig. 6a, Table S2, Supplementary Data S4). Normal adjacent tissues involve inflammation and immune cell infiltration, as well as a collection of cell types that differ from the cancer cell of origin. The common three cell line signature, in the TCGA lung tumor profiles, was uniformly manifested across the SQ.1, AD.2, AD.3, and AD.4 subtypes in particular (Table S2). Scores for the other three signatures related to A549 or Calu-3 were broadly correlated with those of the three cell line signature, though with less distinctive associations according to NSCLC subtype.
As expected, the SARS-CoV-2-infected cell lines did not show signatures of immune cell infiltrates found in the above lung tumor subtypes, as tumors represent a mixture of cancer and non-cancer cells in contrast to cell lines. However, both A549 and Calu-3 showed elevated expression (p < 0.05, t test using log2-transformed data) of immune checkpoint genes CD274 (PDL1) and PDCD1LG2 (PDL2) with SARS-CoV-2 infection (Fig. 6a,b). A survey of immune checkpoint pathway genes 16 also showed up-regulation of TNFSF14 in A549 and Calu-3 in response to infection and TNFRSF14 in Calu-3. Previously, most immune checkpoint-related genes, those presumed to express in either T-cells or the target cells, have been found elevated across the AD.2, AD.3, and AD.4 NSCLC subtypes 16 .
The SARS-CoV-2 transcriptional signatures are manifested across large subsets of human cancers from diverse histopathological types. We sought to determine the relevance of our SARS-CoV-2 signatures to cancer types other than lung. We hypothesized that the transcriptional programs associated with viral infection in vitro could be manifested within well-defined subsets of human tumors. Therefore, we examined the entire TCGA pan-cancer cohort of 10,224 cases involving 32 major types and previously classified into ten major pan-cancer classes that cut across the tissue of origin 14 . These ten pan-cancer classes included a "c3" class (representing ~ 13% of all cancers), strongly associated with the immune response and immune checkpoint pathways, and "c7" and "c8" classes (representing ~ 11% and 9% of cancers, respectively), associated with mesenchymal or stromal cells. The c3, c7, and c8 classes were also associated with hypoxia, NRF2/KEAP1, Wnt, and Notch pathways 14 . Using the mRNA profiling dataset from TCGA, we scored each tumor expression profile for each of the four SARS-CoV-2 signatures (three cell lines, A549-specific, Calu-3-specific, and A549/Calu-3).
We found that both the three cell line and Calu-3-specific signatures associated with SARS-CoV-2 infection manifested in the c3, c7, and c8 human tumors, though more prominently in c3 and c8 (Fig. 7a, Table S2, Supplementary Data S4, S5). Of the 181 genes up-regulated by SARS-CoV-2 across all three cell lines, 150 (83%) were also up-regulated (p < 0.01, t test) in c3 compared to other human tumors, and 111 (61%) were also up-regulated (p < 0.01) in c8 compared to other tumors (Supplementary Data S5). The SARS-CoV-2 signature specific to A549 and Calu-3 but not NHBE also manifested in c7 and c8 tumors. The set of genes both high in the three cell line SARS-CoV-2 signature and high in either c3 or c8 human tumors versus other tumors (p < 0.01, t test) was enriched for a similar set of wikiPathways associated above with SARS-CoV-2 alone (Fig. 3a; Supplementary Data S5). Enriched pathways common to SARS-CoV-2 infection and c3 and c8 human tumors included the NF-κB survival signaling pathway, including NFKB1, NFKB2, REL, and RELB genes, as well as downstream transcriptional targets (Fig. 7b). These findings were specific, as we found that other pathways previously associated with c3 and c8 (e.g., Wnt and Notch) were not associated with SARS-CoV-2 infection.
Figure 6. NSCLC expression profiles from TCGA 16 were probed according to the SARS-CoV-2 transcriptional signatures. Differential SARS-CoV-2-associated expression patterns common to all three cell lines (NHBE, A549, Calu-3) or found for just one or two cell lines (from Fig. 2b,c) are shown for both the in vitro SARS-CoV-2 infection dataset and the NSCLC dataset ("normal adj lung", normal adjacent lung tissue samples in proximity to lung tumor, n = 110). Gene order is the same across both datasets. The ordering of NSCLC profiles is by nine previously-identified molecular subtypes 16 , three associated with lung squamous cell carcinoma (SqCC) and six associated with lung adenocarcinoma (AD). Heat map contrast (bright yellow/blue) is threefold change from control for SARS-CoV-2 dataset and 1 SD from median for NSCLC dataset. Selected patterns of manifestation of SARS-CoV-2 signatures within the human cancers are highlighted. Under the differential expression heat maps, scores for each SARS-CoV-2 signature across the NSCLC profiles are represented (orange-cyan heatmap). Gene expression-based signatures of immune cell infiltrates 38
Figure 7 panel labels: Calu-3 SARS-CoV-2; pan-cancer c3 vs others; pan-cancer c8 vs others. Legend: higher or lower in SARS-CoV-2/pan-cancer c3/pan-cancer c8. Class order: c1, c2, c3, c8, c9, c4, c6, c7, c10, c5.
Figure 7. The SARS-CoV-2 transcriptional signatures are manifested in specific pan-cancer classes involving multiple tissues of origin, the immune response, and NF-κB signaling. (a) RNA-seq profiles of 10,224 cancer cases across 32 major types were previously classified into ten molecular-based pan-cancer "classes" (profiles normalized within their respective cancer types) 14 . Differential SARS-CoV-2-associated expression patterns common to all three cell lines (NHBE, A549, Calu-3) or found for just one or two cell lines (from Fig. 2b,c) are shown for both the in vitro SARS-CoV-2 infection dataset and the pan-cancer dataset. Gene order is the same across both datasets. Human tumor profiles are ordered by pan-cancer class, with cancer type based on tissue of origin and histopathology indicated along the bottom. Heat map contrast (bright yellow/blue) is threefold change from control for SARS-CoV-2 dataset and 1 SD from median for pan-cancer dataset. Selected patterns of manifestation of SARS-CoV-2 signatures within the human cancers are highlighted, involving c3 class (immune-related) and c7 and c8 classes (mesenchymal- or stroma-related). Under the differential expression heat maps, scores for each SARS-CoV-2 signature across the human tumor profiles are represented (orange-cyan heatmap).
Gene expression-based signatures of immune cell infiltrates 38  www.nature.com/scientificreports/

Discussion
In our present study, we have shown that the host transcriptional response to SARS-CoV-2 infection, as identified using cell lines, shares broad similarities with results from multiple independent studies of coronaviruses or other viruses, using other model systems or patient samples. Our results demonstrate how an in vitro model system can be effective in identifying rapid responses within cancer cells that would also be observable in other cellular contexts. In particular, the Calu-3 model showed specific host responses to infection that were not observed in the other two cell lines but were validated in independent datasets. More broadly, the overall similarities in host responses observed among different coronaviruses support some degree of leveraging of what has previously been learned towards our understanding of SARS-CoV-2. In the United States, current guidelines from the Centers for Disease Control and Prevention (CDC) require Biosafety Level (BSL)-3 facilities and practices for experimental studies involving the SARS-CoV-2 virus, which can be rather restrictive in practice. For some proposed studies, coronaviruses with lower Biosafety Levels, such as 229E or NL63, might yield similar results to those of SARS-CoV-2.
The meta-analysis results across the various datasets, as provided in our supplementary data, represent a resource for future investigations, whereby one can identify gene candidates for the study of the host response that appear common to multiple systems or viruses. Our data could also help identify genes that would be specific to coronaviruses or SARS-CoV-2 in particular. Genes that appear involved with COVID-19 in both experimental models and human patient samples may be particularly attractive for further study, and our results allow for homing in on a focused set of genes. However, identifying genes altered specifically in response to SARS-CoV-2 infection and not in response to any other viral infection may be challenging, as asserting a negative is inherently difficult. Our results also indicate that different cell types may respond differently to viral infection. For example, host response patterns observed in lung adenocarcinoma cells did not appear in colon or lung squamous cells.
Our study further revealed that the transcriptional programs initiated in cancer cells in response to SARS-CoV-2 are also at work within ~ 20% of human tumors. This finding reflects known parallels between responses to viral infection and the immune response associated with cancer 10 . These associations would not be exclusive to SARS-CoV-2 but would involve host responses to a broad range of viruses. The viral response signatures manifest within specific and previously-identified cancer subtypes, which strongly indicates that the signatures represent a coordinated transcriptional program that underlies these subtypes. The inherent limitations of cancer cell line models, e.g., in their inability to capture the microenvironmental effects at work within human tumors, are well understood. At the same time, cell lines can reveal molecular properties intrinsic to cancer cells, independent of cellular environment or context. Tumors represent a mixture of cancer and non-cancer cells, which may include immune cells, and distinguishing between the two based on molecular data on bulk tumor samples is inherently difficult. In contrast, the cell line results allow us to de-convolute an immune response program that would be activated specifically within the cancer cells of the tumor. The in vitro viral infection model could help identify candidate genes with roles in cancer cell responses to immune cells and inflammation. For example, we found cancer-specific responses to viral infection to include up-regulation of PDL1 and PDL2 genes, two critical targets in cancer immunotherapy 23 . As cancer represents a collection of molecularly heterogeneous diseases, different cell lines may respond differently to infection, as observed here.
The links identified here between viral infection and cancer suggest opportunities for leveraging knowledge between domains 10 . Pathways identified as part of the host response to viral infection could be relevant for therapeutic targeting, both for certain viral infections and specific cancer subsets. Inflammation and immunity are inherent characteristics of cancer, and both viruses and cancers are associated with dominant Th1 responses 10 , as reflected in our results. The host response to SARS-CoV-2 includes interleukin-6 (IL-6) 7 , which plays an important role in the "cytokine storm," and IL-6 receptor antagonist tocilizumab is currently under evaluation as a treatment for severe COVID-19 24 . IL-6 is a major factor driving T helper 17 (Th17) responses 25 , which, under some circumstances, can interfere with the control of viral infections. Similarly, Th17 responses can either promote or inhibit tumorigenesis, depending on the precise tumor and other factors 26,27 . IL-6 also promotes tumorigenesis by regulating multiple hallmarks of cancer and signaling pathways. As such, blocking IL-6 is under investigation as an anticancer therapy 28 , but our findings suggest that strategies that block Th17 responses might offer additional benefit in the context of COVID-19. Similarly, NF-κB-mediated inflammation, known to be associated with several cancer types 29 , has also been investigated previously as a therapeutic target for SARS coronaviruses 30 , which findings would likely apply to SARS-CoV-2. Our present study has brought together disparate results from multiple systems, diseases, and domains. These results can lend support to current therapeutic strategies under investigation, as well as suggest new ones.

Materials and methods
Derivation of SARS-CoV-2 transcriptional signatures of host response in cell lines. To define transcriptional signatures of the host cell response to SARS-CoV-2 infection, we referred to the GSE147507 dataset 7 . In this dataset, three lung cell lines (NHBE, A549, and Calu-3) were mock-treated or infected with SARS-CoV-2 and then profiled for gene expression using RNA-seq 7 . We used data from the SARS-CoV-2 profiling experiments involving a multiplicity of infection (MOI) of 2. We converted raw gene-level sequencing read counts to reads per million mapped (RPM) values and then log2-transformed them.
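The count normalization can be sketched as follows. This is a minimal illustration rather than the pipeline actually used: the function name `counts_to_log2_rpm` is hypothetical, the counts are made up, and the pseudocount added before the log transform is an assumption (the text does not say how zero counts were handled).

```python
import math


def counts_to_log2_rpm(counts, pseudocount=1.0):
    """Convert one sample's raw gene-level read counts to log2(RPM + pseudocount).

    The pseudocount is an assumption made here to avoid log2(0).
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("sample has no mapped reads")
    return {
        gene: math.log2(n * 1e6 / total + pseudocount)
        for gene, n in counts.items()
    }


# Made-up read counts for one sample.
sample = {"GENE_A": 150, "GENE_B": 50, "GENE_C": 0}
log2_rpm = counts_to_log2_rpm(sample)
```

Scaling by the per-sample total makes profiles with different sequencing depths comparable before any t test is applied to the log2 values.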
Using GSE147507, we defined a common set of 308 genes up-regulated or down-regulated across all three cell lines and in the same direction of change (one-sided p < 0.05 NHBE, and two-sided p < 0.01 A549 and Calu-3, t test using log2-transformed expression). For NHBE, we used a relaxed statistical cutoff, to lower false negatives, as we combined the NHBE results with results from A549 and Calu-3. A gene in the common three cell line signature had to meet multiple criteria for inclusion, which mitigated the relatively high FDR (adjusted for multiple testing) observed when considering NHBE alone. Also, significant patterns of correspondence were observed when examining the common signature across multiple independent datasets. In addition, we evaluated the set of genes differentially expressed for any one of the three cell lines, with p < 0.01 (t test using log2-transformed data) in infection versus mock-treated, to identify patterns specific to one cell line or common across multiple cell lines. We performed this supervised clustering approach 31 as follows: (1) expression values within each cell line were centered on the average of the corresponding control group; (2) each pattern of interest (i.e., genes up-regulated or down-regulated specifically in A549, or genes up-regulated or down-regulated in both A549 and Calu-3 but not NHBE) was represented as a series of 1s and 0s; (3) for each gene, we computed the Pearson's correlation between its expression values and each of the predefined patterns; (4) for each gene, the predefined pattern of interest best correlated with the gene's differential expression pattern was determined; and (5) we sorted the genes by their assigned patterns. The dominant signatures from this analysis included an A549-specific signature, a Calu-3-specific signature, and an A549/Calu-3 common signature.
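The five supervised clustering steps above can be sketched as below. This is an illustrative reading of the procedure, not the authors' implementation. The column layout (two mock and two infected samples per cell line), the expression values, and all names are hypothetical. Using the sign of the correlation to separate up-regulated from down-regulated genes is also an assumption, since the text only says that each pattern is encoded as 1s and 0s.

```python
from statistics import mean


def pearson(x, y):
    """Pearson's correlation; returns 0.0 if either vector is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


# Hypothetical column layout: per cell line, two mock then two infected samples.
LINES = ["NHBE", "A549", "Calu-3"]
LABELS = [(line, grp) for line in LINES for grp in ("mock", "mock", "inf", "inf")]


def center_on_controls(values):
    """Step 1: subtract each cell line's mock average from that line's samples."""
    out = []
    for line in LINES:
        idx = [i for i, (name, _) in enumerate(LABELS) if name == line]
        ctrl = mean(values[i] for i in idx if LABELS[i][1] == "mock")
        out.extend(values[i] - ctrl for i in idx)
    return out


def pattern(infected_lines):
    """Step 2: 1 for infected samples of the named lines, 0 elsewhere."""
    return [1 if grp == "inf" and line in infected_lines else 0
            for line, grp in LABELS]


PATTERNS = {
    "A549-specific": pattern({"A549"}),
    "Calu-3-specific": pattern({"Calu-3"}),
    "A549/Calu-3": pattern({"A549", "Calu-3"}),
}


def assign_pattern(values):
    """Steps 3-4: best pattern by absolute Pearson r; the sign gives direction."""
    centered = center_on_controls(values)
    scores = {name: pearson(centered, p) for name, p in PATTERNS.items()}
    best = max(scores, key=lambda name: abs(scores[name]))
    return best, ("up" if scores[best] > 0 else "down")


expr = {  # made-up log2 expression values, 12 columns per gene
    "g1": [5, 5, 5, 5, 4, 4, 7, 7, 6, 6, 6, 6],
    "g2": [3, 3, 3, 3, 8, 8, 6, 6, 9, 9, 7, 7],
}
assigned = {gene: assign_pattern(vals) for gene, vals in expr.items()}
ordered = sorted(assigned, key=lambda gene: assigned[gene])  # step 5
```

In this toy input, `g1` changes only in infected A549 samples and `g2` drops in both adenocarcinoma lines, so they are assigned to different patterns.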
A subset of genes in the common three cell line signature was also part of the cell line-specific signatures, representing instances where the gene was differentially expressed in all three cell lines, but with the altered expression being particularly prominent within one or two cell lines. Of the 308 genes in the three cell line signature, 47 were included in the Calu-3-specific signature, and 165 were included in the A549/Calu-3 common signature.
Pathway analyses. We searched each of the four SARS-CoV-2 in vitro signatures for enrichment of previously-curated pathways and functional gene groups. We evaluated enrichment of GO annotation terms 32 and wikiPathways 18 within sets of genes up-regulated in response to viral infection, using SigTerms software 33 and one-sided Fisher's exact tests. Gene sets for each wikiPathway were downloaded in July 2019 ("20190710" version). For GO term enrichment analysis, we used all 19510 unique proteins represented in at least one of the seven cancer types profiled as the reference population. For wikiPathways enrichment analysis, we used all 6597 unique proteins represented in at least one wikiPathway as the reference population.
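A one-sided Fisher's exact test for gene set enrichment is the upper tail of a hypergeometric distribution, which can be computed exactly with integer arithmetic. The sketch below is a generic illustration and not the SigTerms implementation. The function name and the toy numbers are hypothetical. In the analysis described above, `population` would correspond to the stated reference populations (19510 for GO terms, 6597 for wikiPathways).

```python
from math import comb


def fisher_one_sided(overlap, set_size, pathway_size, population):
    """P(X >= overlap) when set_size genes are drawn from population genes,
    of which pathway_size belong to the pathway (hypergeometric upper tail)."""
    upper = min(set_size, pathway_size)
    total = comb(population, set_size)
    tail = sum(
        comb(pathway_size, k) * comb(population - pathway_size, set_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy example: 3 of 6 signature genes fall in a 5-gene pathway, 20 genes overall.
p_value = fisher_one_sided(overlap=3, set_size=6, pathway_size=5, population=20)
```

Because Python integers are arbitrary precision, the same function works with populations in the tens of thousands, although very small tail probabilities are still limited by floating-point range at the final division.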

Analysis of external transcriptome datasets.
For multiple viral and human cancer datasets, we scored each external mRNA profile according to the in vitro SARS-CoV-2 transcriptional signatures from the GSE147507 dataset (three cell lines, A549-specific, Calu-3-specific, and A549/Calu-3). The scoring was based on whether the relative differential patterns in the external sample profile, higher versus lower, were broadly similar to the patterns of up- versus down-regulation, respectively, in the in vitro SARS-CoV-2 signature. We based the SARS-CoV-2 signature score on our previously described "t score" metric [34][35][36][37] . We have defined the t score as the two-sided t statistic when comparing, within each external differential expression profile, the average of the SARS-CoV-2-up-regulated genes with the average of the down-regulated genes. For example, the t score for a given sample profile is high when both the up-regulated genes in the signature are high and the down-regulated genes are low. For viral expression datasets, we centered logged expression values (base 2) on the corresponding non-viral group. For the TCGA lung 16 and pan-cancer 14 datasets, logged expression values for each gene were centered on the median and divided by the standard deviation across the sample profiles. For the TCGA pan-cancer dataset 14 , logged expression values for each gene were centered on the median and divided by the standard deviation within their respective cancer types (according to TCGA project). Computational inference of the infiltration levels of specific immune cell types using RNA-seq data, based on published immune signatures 38 , was carried out previously for TCGA datasets 14,16 . RPM values for the COVID-19 patient trachea dataset 8 were quantile normalized before the analysis.
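The t score described above can be sketched for a single sample profile as follows. This is an illustration under stated assumptions, not the published implementation: it uses the equal-variance (Student's) form of the two-sample t statistic, which the text does not specify, and the function name, gene names, and values are hypothetical.

```python
from statistics import mean, variance


def t_score(profile, up_genes, down_genes):
    """Two-sample t statistic comparing signature-up genes with signature-down
    genes within one centered expression profile (pooled variance assumed)."""
    up = [profile[g] for g in up_genes if g in profile]
    down = [profile[g] for g in down_genes if g in profile]
    n1, n2 = len(up), len(down)
    if n1 < 2 or n2 < 2:
        raise ValueError("need at least two genes in each group")
    pooled = ((n1 - 1) * variance(up) + (n2 - 1) * variance(down)) / (n1 + n2 - 2)
    if pooled == 0:
        raise ValueError("zero variance within both gene groups")
    return (mean(up) - mean(down)) / (pooled * (1 / n1 + 1 / n2)) ** 0.5


# Made-up centered log2 values for one external sample.
profile = {"u1": 1.0, "u2": 2.0, "u3": 3.0, "d1": -1.0, "d2": -2.0, "d3": -3.0}
score = t_score(profile, ["u1", "u2", "u3"], ["d1", "d2", "d3"])
```

A positive score means the sample resembles the infection signature (up genes high, down genes low), and a negative score means the pattern is reversed.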
When joining genes from microarray datasets to the GSE147507 dataset, for side-by-side comparisons of the differential patterns using heat maps, there were cases where multiple array probes referred to the same gene. In these cases, we used the probe with either the smallest p value (in either direction, where the dataset involved just two experimental groups) or the highest standard deviation across sample profiles (where multiple experimental groups were involved) to represent the gene.
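The probe selection rule can be sketched as a single pass over the probes. The function name, probe identifiers, and values below are hypothetical, and tie-breaking (the first probe encountered is kept) is an assumption not stated in the text.

```python
from statistics import stdev


def collapse_probes(probe_to_gene, probe_values, probe_pvalues=None):
    """Pick one probe per gene: the smallest p value when p values are given
    (two-group datasets), otherwise the largest standard deviation."""
    best = {}
    for probe, gene in probe_to_gene.items():
        if probe_pvalues is not None:
            key = -probe_pvalues[probe]  # smaller p value means larger key
        else:
            key = stdev(probe_values[probe])
        if gene not in best or key > best[gene][0]:
            best[gene] = (key, probe)
    return {gene: probe for gene, (_, probe) in best.items()}


# Made-up example: two probes map to GENE_X, one to GENE_Y.
probe_to_gene = {"p1": "GENE_X", "p2": "GENE_X", "p3": "GENE_Y"}
probe_values = {"p1": [1.0, 1.0, 1.1], "p2": [0.0, 2.0, 4.0], "p3": [5.0, 5.0, 6.0]}
probe_pvalues = {"p1": 0.001, "p2": 0.2, "p3": 0.5}

by_sd = collapse_probes(probe_to_gene, probe_values)
by_p = collapse_probes(probe_to_gene, probe_values, probe_pvalues)
```

The two criteria can disagree, as they do for `GENE_X` here, which is why the choice depends on whether the dataset has exactly two experimental groups.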

Statistical analysis.
All p values were two-sided unless otherwise specified. We performed all tests using log2-transformed gene expression values. False Discovery Rates (FDRs) were estimated using the method of Storey and Tibshirani 39 . Visualization using heat maps was performed using both JavaTreeview (version 1.1.6r4) 40 and matrix2png (version 1.2.1) 41 . GSEA 42 was carried out using version 4.0.3 of the software, using weighted enrichment statistic and 10,000 gene set permutations. For GSEA, genes were ranked using GSEA's Signal2Noise metric, except for the human trachea COVID-19 dataset, which used correlation with log2 viral load for the gene rankings.
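The idea behind a predicted FDR at a nominal p value cutoff can be sketched as the expected number of false positives divided by the number of genes called significant. The sketch below is a simplification: it fixes the null proportion `pi0` at 1 (a conservative assumption), whereas the Storey and Tibshirani method estimates it from the p value distribution. The function name and the numbers are hypothetical and are not taken from the study.

```python
def predicted_fdr(n_tested, n_called, alpha, pi0=1.0):
    """Approximate FDR at cutoff alpha: expected false positives
    (pi0 * n_tested * alpha) divided by the number of genes called."""
    if n_called == 0:
        return 0.0  # convention used here: no calls, no false discoveries
    return min(1.0, pi0 * n_tested * alpha / n_called)


# Made-up numbers: 10,000 genes tested, 500 called at p < 0.01.
fdr = predicted_fdr(n_tested=10000, n_called=500, alpha=0.01)
```

The same cutoff yields a higher predicted FDR when fewer genes pass it, which is consistent with the pattern reported for NHBE relative to A549 and Calu-3.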

Data availability
All data used in this study are publicly available. We obtained RNA-seq or microarray expression data from experimental models of viral infection or other treatments from the Gene Expression Omnibus (GEO). The COVID-19 trachea patient RNA-seq dataset is available at https://github.com/czbiohub/covid19-transcriptomics-pathogenesis-diagnostics-results and at GEO (GSE156063). TCGA data are available through the Genomic Data Commons (https://gdc.cancer.gov/) and the Broad Institute's Firehose data portal (https://gdac.broadinstitute.org).