Induction of interferon response by high viral loads at early stage infection may protect against severe outcomes in COVID-19 patients

Key elements for viral pathogenesis include viral strains, viral load, co-infection, and host responses. Several studies analyzing these factors in the function of disease severity of have been published; however, no studies have shown how all of these factors interplay within a defined cohort. To address this important question, we sought to understand how these four key components interplay in a cohort of COVID-19 patients. We determined the viral loads and gene expression using high throughput sequencing and various virological methods. We found that viral loads in the upper respiratory tract in COVID-19 patients at an early phase of infection vary widely. While the majority of nasopharyngeal (NP) samples have a viral load lower than the limit of detection of infectious viruses, there are samples with an extraordinary amount of SARS-CoV-2 RNA and a high viral titer. No specific viral factors were identified that are associated with high viral loads. Host gene expression analysis showed that viral loads were strongly correlated with cellular antiviral responses. Interestingly, however, COVID-19 patients who experience mild symptoms have a higher viral load than those with severe complications, indicating that naso-pharyngeal viral load may not be a key factor of the clinical outcomes of COVID-19. The metagenomics analysis revealed that the microflora in the upper respiratory tract of COVID-19 patients with high viral loads were dominated by SARS-CoV-2, with a high degree of dysbiosis. Finally, we found a strong inverse correlation between upregulation of interferon responses and disease severity. Overall our study suggests that a high viral load in the upper respiratory tract may not be a critical factor for severe symptoms; rather, dampened antiviral responses may be a critical factor for a severe outcome from the infection.

3.6, much greater than influenza virus 2 . In addition, a significant number of people infected with SARS-CoV-2 might have not been properly diagnosed nor reported 3 , posing a challenge in preventing community-based spread of SARS-CoV-2.
The spectrum of the clinical outcomes from SARS-CoV-2 infections ranges from lack of symptoms to severe illnesses comprising the acute respiratory distress syndrome and multisystem inflammatory syndrome 4 . Aging, male sex and chronic comorbidities such as diabetes mellitus and arterial hypertension have been reported as host factors associated with the severity of illness and fatality in many cases but not in all 5 .
Outside of host comorbidities, our understanding in COVID-19 pathogenesis is still limited, largely due to the complex interactions among multiple virological, microbiological, and host factors contributing to pathogenic outcomes of COVID-19. Understanding of the viral dynamics of SARS-CoV-2 and host responses driving the pathogenic mechanisms in COVID-19 are evolving rapidly. Innate immune responses including type I interferons are known to be important to control the replication of SARS-CoV-2 [6][7][8][9] . The viral load has been reported as an important factor in efficient induction of host responses, including type 1 interferon responses 10 . However, at the same time, high viral load has been suggested to contribute to mortality. A clinical study found an independent association between viral load and mortality with a 7% increase in hazard for each log transformed viral RNA copy number 11 . In addition, Westblade et al. reported that the hospital admission viral load independently predicts mortality in COVID-19 patients with and without cancer 12 . On the contrary, Kimon et al. presented an inverse correlation between viral loads in NP swabs and clinical outcomes and duration of symptoms 13 . Importantly, no studies have shown all of these factors within a defined cohort to understand their interactions. To address this important question, we sought to understand how four key viral pathological components; (1) viral load, (2) viral strain, (3) host response to the infection and (4) co-infection, interplay with each other in a COVID-19 patient cohort and how they contribute to clinical outcomes.
Our study of patients with SARS-CoV-2 aimed (1) to examine the continuum of viral load in nasopharyngeal (NP) swabs, (2) to determine the meaning of this variety in viral load in relation to the virus dynamics and host responses and, (3) to understand if different viral loads can be related to pathogenic factors for the spectrum of the COVID-19 clinical outcomes.
We found a strong correlation between interferon responses and viral load, which inversely correlates with clinical severity. We postulate an interplay of these three factors where a strong up-regulation of antiviral responses mediated by high viral loads at the initial phase have led to protective immune responses.

Results
Sample collection. NP swab samples collected from in-and out-patients within the area of Louisville (KY, USA) were tested for SARS-CoV-2 with the real-time PCR assay developed by the US CDC 14 (See "Methods and Materials" for details). To understand the distribution of viral RNA loads in clinical NP samples, we first analyzed a cohort of 3,640 samples that were collected between 3/11/2020 and 4/11/2020. A total of 544 samples were determined as positive (Ct values of less than 39 for both N1 and N2) for SARS-CoV-2. The overall positive rate was 14.95% and the median Ct values for viral targets (N1, N2, and N3 probe/primer pairs for the viral Np gene) were between 30 Fig. 1A). The lowest Ct value for viral targets was 11.58 (N2 target) from a sample (sample ID: KY-9A10) with Ct values of 13.3, 12.43, and 33.9 for N1, N3, and Hu RNaseP target, respectively.
A histogram analysis of the Ct values showed that the distribution of viral RNA load is not symmetric around the median; rather, it skewed negatively. Unlike the viral RNA, the distribution of Ct values for Hu RNAseP was close to a normal symmetric distribution, supporting appropriate sample collection for the study (Fig. 1A). This analysis showed that viral RNA load in the upper respiratory tracts varies significantly and while more than 50%  (Table 2). To confirm the viral RNA loads in the selected samples, 24 specimens out of the 34 samples were re-tested independently in two secondary SARS-CoV-2 coronavirus assays using the Luminex ARIES® platform: 1) ARIES® SARS CoV 2 Assay (EUA) targeting the ORF1ab and N, and 2) ARIES® SARS-CoV-2 LDT Assay targeting N1 and N3. These three assays demonstrated consistent results (Supplementary Table 1) and correlation analysis of Ct values between all comparison groups were > 0.9 except one for the IVD_orf and CDC N1 target (r = 0.893). Test performance for individual samples are listed in Supplementary Table 2 (Supplementary Table 2). This analysis showed these three viral RNA load quantitation methods have a similar sensitivity and produced consistent results.
In vitro virological characterization. We then examined if samples with a lower Ct value in fact had a high infectious viral load. To test this, we directly titrated the viral load in the NP swab samples by using   (Fig. 1D).

Genetic characterization of viral genomes found in high viral load samples.
Our analysis clearly demonstrated that some patients had a significantly high viral load than others. To understand if the difference in viral load in the NP swabs is due to viral factors, we sought to determine and compare the viral genome sequences from the samples. A total of 32 samples were subjected to high-throughput sequencing. The total RNA was isolated and sequenced on the Illumina NextSeq 500 platform (See "Methods and Materials" for the sequence analysis and alignment). The total sequence reads are available from the NCBI's SRA (bioproject PRJNA626685). Among the 32 samples, six samples failed quality control checks and the remaining 29 samples showed data with a quality warranting a further analysis. From the NGS data, we were able to obtain the full viral sequences from 18 samples ( Table 2). To understand if a specific viral clade can generate a higher viral load than the others, we categorized the isolates based on viral clades and dCt. Currently there are six different clades of circulating SARS-CoV-2 reported 15 and some clades Table 2. Characterization of 34 samples used in the study. *Patients with a lethal outcome. # COVID-19 patient died, but this patient had DNR/DNI advanced directives. The patient did refuse to be intubated. ‡ Samples were unavailable or removed from the host gene expression analyses due to the quality of the NGS readings. www.nature.com/scientificreports/ including G and GH are known to produce a higher virus titer both in vitro and in patients 16,17 . In our samples, we found four different viral clades in our samples: S, G, GR, and GH. The dominant GH clade was detected in 13 samples, followed by S, G, and S clade detected in 3, 1, and 1 samples, respectively (Fig. 1D). This pattern is consistent with research reported by others 15 , showing a high prevalence of the GH type in the North America.
When we compared the dCt values of the samples based on the virus clade, all clades showed an average dCt between -12.55 (G clade) and -10.2 (GH clade, except an outlier, KY-37A05, with a dCt of 8.03), indicating a similar viral load among the clades.
Clinical correlates. Next, we sought to understand if viral loads in upper respiratory tract is affected by host factors or, further, the disease severity is determined by the viral loads in the early phase of symptomatic infection. To analyze this, we profiled the viral loads in the NP samples (dCt) based on various clinical relates including; sex, age, and clinical outcomes (Fig. 2). Neither sex (Fig. 2a) nor age ( Fig. 2b) showed significant correlation with the dCt. However, we found a strong inverse correlation between the viral load and the COVID-19 disease severity of the patients. Based on the clinical assessments, we classified the COVID severity into three groups: (1) Mild (outpatients, screening samples from long-term care facilities, or an emergency department visit with no hospital admission); (2) Moderate (hospital admission to ward), and (3) Severe (hospital admission to an intensive care unit within first 48 h). According to this grouping, a clear age-disease severity relationship was shown (Fig. 2c), validating our grouping strategy. The Mild group showed the overall highest viral load with an average dCt of − 9.26, and the Severe group showed the least (dCt ave = − 2.73). The difference of dCt between the groups were significant (P = 0.0027, Tukey's multiple comparisons test). There were three cases with a lethal outcome: two cases in the Severe group and one in the Moderate group (KY-9A10). The one patient (74 years old) in the Moderate group had Do-Not-Resuscitate (DNR) and Do-Not-Intubate (DNI) advanced directives. Despite the overall negative correlation between the viral load and disease severity, two cases with a lethal outcome showed a higher viral load in their NP samples, which may indicate a possibility that a high viral load may lead a critical clinical outcome in some patients. To see what host responses are specific to the high viral load samples, we profiled the host gene expressions using RNA-sequencing approaches. We first investigated what genes are expressed in the NP upon infection. www.nature.com/scientificreports/ We compared the Top 50% (Group 1, 2, and 3) against the negative control and found that a total of 60 genes were differentially expressed (35 upregulated and 25 downregulated genes in the Top 50% group, Fig. 3a). Many interferon stimulated genes (e.g., interferon alpha inducible (IFI) genes, HERC6, OAS2, and DDX58) were significantly up-regulated in the infected group compared to the negative group (Table 3 and Supplementary  Table 3 for down-regulated genes).
To understand what host factors are related to the viral loads, we used two approaches: (1) comparison between viral load groups, and (2) correlation studies. A clear comparison was detected when the Top 25% samples (Group 1 and 2) were compared to the Bottom 50% group (Fig. 3a,b). We found that the Top 25% samples showed a significantly high level of expression of genes that are related with type I interferons compared to the Bottom 50% group (Table 4). Type 1 interferon signaling pathway was identified with a high significance from a gene ontology analysis for biological processes using Cluster Profiler (Supplementary Table 4).
This high confidence of the difference between viral load and interferon responses were confirmed in our second approach with a correlation analysis between viral RNA load (dCt) and individual gene expression. In this analysis, a total of 2,155 genes (484 genes with positive correlations and 1,671 with negative correlations) were identified having a significant correlation (Spearman's Rho) at p < 0.01 (Tables 5 and 6). This approach identified several genes that are important for cell death (e.g., RIPK1, , and CFLAR) and RNA metabolism (SRSF7 18 , AFF4 19 , and PNPT1 20 ) in addition to antiviral responses (e.g., DDX58 , IFIT1, and HERC6 21 ) (Fig. 3c). A heatmap analysis with the 50 positively and negatively correlated genes (Fig. 3d) clearly demonstrated that the Top 25% group (Group 1 and 2) clearly showed a distinct gene expression pattern compared to the Bottom 75% (Group 3 and 4). Again, many of the positively correlated genes were interferon response related genes.
The analyses highlighted the interferon responses as the primary response to the high viral load in the upper respiratory tract.
Genes correlated with disease severity. Next, we sought to find host genes that correlate with disease severity. Gene expression profiles were compared between the disease severity groups; Mild, Moderate, www.nature.com/scientificreports/ and Severe. A differential expression analysis comparing three levels of disease severity was performed using DESeq2 22 . The number of differentially expressed genes at an adjusted p value cutoff < 0.05 is displayed in Table 7 for three comparisons. We sought to understand if innate antiviral responses (i.e., interferon responses) could be a determinant for mild symptoms and compared the expression of interferon-related genes between the severity groups. We found that both Mild and Moderate groups have a significantly higher expression level of interferon related genes compared to the Severe group (Table 8). In other words, the Severe group expressed interferon related genes a significantly lesser extent than the Moderate or Mild group.   geal tract was assessed and compared across the three groups. Diversity within individual samples was evaluated using an alpha diversity measure based on the Shannon Index, in order to measure species abundance and evenness across groups (Fig. 4a). Compositional changes between the samples were evaluated for their beta diversity using a JACCARD index. Our data revealed that compared to the negative control, the Mild group had a significantly reduced alpha diversity and distinct microbial composition. (Supplementary Fig. 1).

Microbial taxonomy.
Based on the significant difference in microbial diversity among the groups, we further conducted a metagenomics analysis. Taxa analysis of all kingdoms revealed that compared to all the groups, the negative control had a higher (~ 60%) bacterial microbiome. In particular, the Mild group had a significant reduction of the bacterial microbiome (contains less than 5% bacteria on average) and enhanced microbial dysbiosis marked by a decrease in F/B as compared to the negative control (Fig. 4b,c and Supplemental Table 5).
Microbial family enrichment analysis of all four groups showed that the Mild group did not show any distinguishing microbial feature as compared to other groups (Fig. 4d,e). Comparatively, the negative control and the Severe group showed a healthy nasal microbiome since both show enrichment of bacterial families belonging to the Actinobacteria and Firmicutes phyla 23,24 . A heatmap for the qualitative profile of microbial families clearly demonstrates that the microbiomes of the Mild group were dominated by an expansion of Coronaviridae (p < 0.05 in comparison of negative control) (Fig. 4e). Further, statistical analysis shows there is a significant reduction of 13 bacterial and 4 viral families in the mild group as compared to the negative control ( Fig. 4d and Supplemental Table 6). This decrease of microbial families not only contribute to enhanced dysbiosis but also a decrease of microbial diversity in patients with a higher viral load. Detailed statistical comparison of microbial families between groups is provided in Supplemental Table 6.

Discussion
Viral strain, viral load, host response, and co-infection are considered as key elements of viral pathogenesis. Studies analyzing each of these factors in the function of disease severity in COVID-19 infections have previously been published by others. Kimon et al. showed an inverse relationship between viral load and disease severity 13 . A paper describing the host responses in the function of viral load was published recently, in which IFN response genes were found to be the major differentially expressed genes related to the function of viral load, a finding which our study confirms 10 . However, no studies have shown how all of these factors interplay with each other within a defined cohort. To address this important question, we investigated the interplay of these four key components in our COVID-19 patient cohort, which is comprised of patients who were admitted to the hospital with clear symptoms at a very early stage of the outbreak in the USA.

Viral load in COVID-19 patients.
Our analysis showed that COVID-19 patients have a wide range of viral loads in their upper respiratory tracts (as measured using NP swab samples), from a Ct value of 11 (KY9-A10) to Ct value of 40 (the detection limit), leading to a dynamic range > a 10E8. The sensitivity of the CDC-developed assay was equivalent to the other assay that we tested (the Luminex ARIES® platform), indicating that the detection of the high viral loads was not due to a false positive or an abnormally sensitive assay. The variance in viral load was also confirmed by direct viral titration of the NP samples. Since viral load in COVID-19 patients is known to change over the course of infection, and our study is not longitudinal, the variability that we show here could be due to the course of infection. However, all of the samples in this study were collected when the patients were admitted with symptoms; we can therefore assume that they reflect viral load within a narrow range of time after infection. Consequently, despite the limited scope of our snapshot study, it is reasonable to believe that the viral load in COVID-19 patients might vary significantly at the individual level as well. Considering that more   Upregulated  NFKB1, PTPN1   ADAR, BST2, DDX58, HERC5, HLA-C, HLA-E, IFI16, IFI27,  IFI35, IFI6, IFIH1, IFIT1, IFIT2, IFIT3, IFITM1, IFITM2,  IFITM3, IRF2*, ISG15, MX1, MX2, MYD88, NLRC5, NMI,  OAS1, OAS2, OAS3, OASL, RSAD2, STAT1, STAT2, TNFAIP3,  TRIM21, TRIM25, XAF1   CD14, DHX58, EP300, GBP2, HLA-A, HLA-B, HLA-F, HLA-H,  IFNGR2**, IRF7*, ITCH, PSMB8, RIOK3, RIPK2, RNASEL,  SHFL, TAX1BP1, TLR2, TLR3, TLR4, TLR8, TRIM38, (Table 1), our analysis implies that the majority of SARS-CoV-2-positive patients may carry a very small load of infectious virus when they present symptoms. This could be due to higher shedding in the prodromal phase than in the symptomatic phase.
Relation of viral load to disease severity. Viral load has been regarded as a deterministic factor for the severity of disease outcomes (i.e., more virus, worse outcomes); however, this perception has not been proven www.nature.com/scientificreports/ or studied in depth for COVID-19. Rather, there seems to be an inverse correlation between viral load and disease outcomes, as shown here. Kimon et al. presented a similar finding to ours, showing that viral load in NP swab samples was inversely correlated with clinical outcomes and duration of symptoms 13 . As they highlighted in the paper, these phenomena could simply be due to the difference in timing between virus replication and onset of symptoms. SARS-CoV-2 viral load reaches its peak during the pre-symptomatic stage of the disease (a median of 5.2 days post-infection), and starts to decrease after 7 days post-infection when the symptoms are fully developed 25 . Therefore, the inverse relationship between viral load and disease severity could be due to the collection of samples at different points in the progression of the disease. However, considering that all of our samples were collected at the time of admission-when patients were symptomatic-and our disease severity was determined retrospectively, we believe that the effect of collection time on viral load may be limited. Of note, we had one patient who showed a high viral load and died during the treatment (KY-9E10).

Viral strains.
The existence of such high viral load and the negatively skewed distribution of viral load in the NP samples (Fig. 1a) led us to investigate the mechanism behind the high viral load. First, we investigated whether specific viral clades or specific single nucleotide polymorphisms affected the viral load. Korber  Host responses. We profiled the host gene expression against the viral load with two approaches: (1) comparison between stratified groups and (2) correlation analysis between viral load and individual genes. Both analyses showed a clear correlation between up-regulation of interferon-related genes and high viral load. The difference was clearly noticeable at the 25% viral load level, or dCt < -4.7. Samples within Groups 1 and 2 showed a stastically higher expression of interferon-related genes (Fig. 3a,b). This finding was surprising to us because interferon responses are presumptively inhibitory against viral replications; however, our finding clearly showed a strong statistical association between viral load and IFN responses in the upper respiratory tract. One explanation could be that the detected viral RNA is the product of viral clearance by innate immune systems; however, we showed that the samples indeed have high virus titers as shown in Fig. 1. Rather, a better explanation of our data would be that high viral load might have induced strong IFN responses in the upper respiratory tract.
Dysbiosis. Some reports indicated potential bacterial co-infections associated with severe COVID-19, especially Acinetobacter baumannii 29 . However, no information on the viral loads or interferon responses of the patients were known; hence, it is not clear if the co-infection was associated with strong viral replications or with other factors. Our metagenomics analysis indicates a relationship between dysbiosis within the nasal tract microbiota and COVID-19 severity, consistent with previous studies of the gut and respiratory microbiomes [30][31][32][33][34][35][36][37] . However, without retrospective longitudinal data on the microbiome of our patients, it is difficult to determine whether the initial infection is more likely to occur because of a pre-existing dysbiosis. This dysbiosis could also occur because of antibiotic treatments received by these patients. However, given a difference in the microbiota at different viral loads and levels of disease severity, coupled with consistency within the patient groups, the most likely conclusion is that the dysbiosis is due to SARS-CoV-2 infection, not the other way around. Our study showed no specific bacterial groups that proliferated with a high level of SARS-CoV-2 infections. Rather, we found that high viral load samples (i.e., strong Type 1 interferon responses) have almost no bacterial load in our NP samples, and NP samples with low viral loads (i.e., low in interferon responses) had similar loads of microflora to samples with no SARS-CoV-2 infection. Because our study merely shows a correlation, it is not clear if the up-regulated Type 1 interferon played a role in clearing normal microflora. While Type 1 interferons serve a protective purpose in some bacterial infections, in other cases, they exacerbate the infection, especially for secondary infections after a viral infection 38 . Our microbiome data reveal that patients with a high SARS-CoV-2 load have, (i) a distinct metagenomic composition, (ii) significantly reduced overall diversity (particularly decreased microbial diversity), and (iii) enhanced microbial dysbiosis signified by decreased F/B and loss of beneficial bacteria belonging to the Firmicutes and Actinobacteria families. Overall, our study shows a strong correlation between interferon responses and viral load, which inversely correlates with clinical severity. We postulate an interplay of these three factors where a strong up-regulation of antiviral responses mediated by high viral load at the initial phase leads to protective immune responses. This postulation is supported by evidence in the literature that interferon responses are a protective factor for COVID-19 6,7,9,13,39 . Even though the size of the cohort was small, and the nature of the study was correlation-based, we show that a high viral load in the early symptomatic phase may contribute to a strong antiviral response in the host, which could lead to a mild outcome. However, it is not clear if a lower viral load at early infection can be considered an indicator for severe symptoms, considering that the majority of our patients showed a lower viral load. A follow-up study with a larger cohort could confirm our findings; more precise, traceable longitudinal studies should investigate the viral characteristics that contribute to imbalanced innate immune responses, leading to severe outcomes. Our study also suggests that besides the kinetic changes in viral load over the course of infection, the relationship between symptoms and viral load may also play a role in the massive spread of the virus in the community, where asymptomatic or mildly symptomatic SARS-CoV-2-infected people can actively interact with others. We propose to closely monitor and make use of the Ct value of each sample so as to prioritize the allocation of resources and minimize behaviors associated with higher risks of spreading viruses. Virus isolation. The filtrate (~ 0.5 mL) was added to Vero E6 cells seeded in a T-25 flask one day prior to infection for one hour at 37 °C. Then cells were replenished with fresh media and incubated at 37 °C in an incubator with 5% CO 2 . After a full CPE was monitored (at 4 to 5 DPI), the cell supernatant was harvested, and cellular debris was removed by centrifugation at 3000 rpm.
Virus titration. Vero E6 cells grown in 24-well plates were incubated with viral samples diluted in the cell culture media. Vero E6 cells were seeded in 24-well plates and incubated to be confluent for 24 h. After cell supernatant was removed, tenfold serially diluted virus samples were added to each well (167 µL/well) and incubated for 1 h at 37 °C in a CO 2 incubator. Cells were washed with PBS (1 ml/well) once, overlaid with DMEM with 0.5% methlycelluose and 10% FBS, and further cultured for 5 days. Cells were fixed and plaques by CPE were visualized by counter-staining with crystal violet.
De novo RNA sequencing. Total RNA was isolated from samples with a magnetic beads-based RNA isolation method (Direct-zol-96 MagBead RNA, Zymo Research). A total of 200 µL of sample was used and then RNA was eluted in 25 µL. Libraries were prepared using the Ovation SoLo RNA-Seq Systems (NuGEN) with approximately10 ng of RNA. Libraries were subjected to a depletion of human and yeast rRNA using AnyDeplete Probe Mix (NuGEN) and (NuGEN). A total 1.3 mL of the final library at 1.8 pM, with 1% PhIX spike in was used for sequencing. Sequencing was performed on Illumina NextSeq 500 using the NextSeq 500/550 75 cycle High Output Kit v2.5 (20,024,906).
Sequence pre-processing and assembly. The demultiplexed reads for each sample were assembled into preliminary contigs using MEGAHIT (v1.2.9) 42 . These preliminary contigs were then examined for their similarity to known Betacoronavirus sequences in order to merge them where possible, resulting in a preliminary viral assembly. The raw reads were then mapped back onto the preliminary assemblies using STAR (v2.6) 43 . Each position of the preliminary reference was further examined for locations varying from the actual reads using bcftools mpileup (v1.8) 44 , resulting in a small number of corrections to the preliminary reference. The raw reads were deposited into NCBI's SRA (bioproject PRJNA626685) and the assembled viral genomes were deposited into GenBank (accessions MZ472095-MZ472108).
SNP Analysis. Lastz 45 was used to perform a pairwise alignment comparing the SARS-CoV-2 reference assembly NC_045512.2 against viral sequences of the isolates to identify locations having gaps, SNPs, and insertions/deletions across each pair.
Metagenomics analysis. The sequences were aligned to the human hg38.p12 and SARS-CoV-2 reference NC_045512 reference genomes using STAR (version 2.6) 43 . Sequences not mapping to the host human genome were then analyzed for metagenomics composition using metaphlan2 46 . Functional profiling of metagenomics composition was performed using LEfSe 47 to determine a significant linear discriminant analysis (LDA) score between groups.
Ethic statement. The