A single transcript for the prognosis of disease severity in COVID-19 patients

With many countries strapped for medical resources due to the COVID-19 pandemic, it is highly desirable to allocate the precious resources to those who need them the most. Several markers have been found to be associated with the disease severity in COVID-19 patients. However, the established markers only display modest prognostic power individually and better markers are urgently needed. The aim of this study is to investigate the potential of S100A12, a prominent marker gene for bacterial infection, in the prognosis of disease severity in COVID-19 patients. To ensure the robustness of the association, a total of 1695 samples from 14 independent transcriptome datasets on sepsis, influenza infection and COVID-19 infection were examined. First, it was demonstrated that S100A12 was a marker for sepsis and severity of sepsis. Then, S100A12 was found to be a marker for severe influenza infection, and there was an upward trend of S100A12 expression as the severity level of influenza infection increased. As for COVID-19 infection, it was found that S100A12 expression was elevated in patients with severe and critical COVID-19 infection. More importantly, S100A12 expression at hospital admission was robustly correlated with future quantitative indexes of disease severity and outcome in COVID-19 patients, superior to established prognostic markers including CRP, PCT, d-dimer, ferritin, LDH and fibrinogen. Thus, S100A12 is a valuable novel prognostic marker for COVID-19 severity and deserves more attention.

Since none of the currently proposed prognostic markers have satisfactory performance, it is highly desirable to discover novel factors with better prognostic power. Certain factors may be unique to COVID-19 infection, but some factors are likely universal to all kinds of severe infection. Sepsis is a severe form of infection. Therefore, it is conceivable that certain prognostic markers for sepsis may be transferable to COVID-19 infection. In fact, viral sepsis was proposed as a mechanism for severe COVID-19 infection 37,38 . It has also been found that the most severe cases (including death) of COVID-19 infection indeed all had sepsis 39 . Among COVID-19 patients, septic patients had significantly abnormal immune profile including higher serum IL-6 40 . Many features were similar between bacterial sepsis and SARS-CoV-2 sepsis, although cytokine storm was generally milder in the latter case 41 . In addition, COVID-19 infection is a type of respiratory viral infection. So certain prognostic markers for other types of respiratory viral infection such as influenza infection may also be transferable to COVID-19 infection. The rationale of using datasets encompassing sepsis, influenza infection and COVID-19 infection is that common factors may exist in different types of infection and certain common factors may be transferable from one type of infection to another.
In our previous works on host response to infection, we proposed several genes as the signature for bacterial infection, among which S100A12 was the most prominent marker [42][43][44] . To investigate the potential prognostic power of S100A12 in COVID-19 infection, a three-step approach was applied. First, using RNA-Seq data from three independent studies on sepsis, I demonstrated that S100A12 was a marker for sepsis and severity of sepsis. Then, using microarray data from six independent studies on influenza infection, I demonstrated that S100A12 was a marker for severity of influenza infection. Finally, using RNA-Seq data from five independent studies on COVID-19 infection, I demonstrated that S100A12 was indeed a valuable prognostic marker for COVID-19 severity.

Materials and methods
RNA-Seq datasets for sepsis. All three datasets were downloaded from NCBI gene expression omnibus (GEO, https:// www. ncbi. nlm. nih. gov/ geo/). All three datasets were derived from whole blood. Dataset GSE154918 contains 105 samples, including 40 samples from healthy controls, 12 samples from patients with uncomplicated infection, 39 samples from sepsis patients, and 14 samples from follow-up of sepsis (no reference available yet). Dataset GSE63042 contains 129 samples, including 23 samples from patients with systemic inflammatory response syndrome (SIRS), 24 samples from patients with uncomplicated sepsis (no disease progression), 21 samples from patients with severe sepsis (severe status at day 0 through day 3), 33 samples from patients with septic shock, and 28 samples from patients with sepsis death 45 . Sepsis is essentially SIRS plus infection. In addition, sepsis is highly heterogeneous due to the pathogen types, site of infection and many other factors. Dataset GSE131411 contains 96 samples from 21 septic shock patients and 11 cardiogenic shock patients, where each patient was sampled at three time points 46 . Septic shock is caused by overwhelming systemic inflammation, while cardiogenic shock is caused by heart problem. In total, 328 samples were included in these three RNA-Seq studies on sepsis.
Microarray datasets for respiratory viral infection. Six datasets were downloaded from GEO. All six datasets were derived from whole blood. Dataset GSE27131 contains 21 samples, including 7 samples from healthy controls, 7 samples from patients with severe H1N1 infection at day zero, and 7 samples from patients with severe H1N1 infection at day six 47 . The severe H1N1 infection was defined as having bilateral chest X-ray infiltrates and requiring mechanical ventilators. Dataset GSE21802 contains 40 samples, including 4 samples from healthy controls, 6 samples from patients with mechanical ventilation at the early course of severe H1N1 infection, 6 samples from patients without mechanical ventilation at the early course, 14 samples from patients with mechanical ventilation at the late course, and 10 samples from patients without mechanical ventilation at the late course 48 . All of the H1N1 patients were ICU patients with acute respiratory stress. Dataset GSE40012 contains 150 samples, including 36 samples from healthy controls at day one and day five, 61 samples from patients with severe bacterial pneumonia at day one through day five, 39 samples from patients with severe H1N1 pneumonia at day one through day five, and 14 samples from patients with severe pneumonia caused by mixed bacterial and viral infection at day one through day five 49 . All of the pneumonia patients were ICU patients.
Dataset GSE68310 contains 488 samples from patients with mild influenza infection in a community monitoring study where people were sampled at baseline, day zero through day twenty one of the symptom onset, and the next spring 50 . These individuals had influenza-like illness but did not have severe respiratory disease. Dataset GSE101702 contains 159 samples, including 52 samples from healthy controls, 63 samples from patients with moderate influenza infection, and 44 samples from patients with severe influenza infection 51 . All of the patients had influenza-like illness. Case assignment was done retrospectively. Moderate infection was defined as having emergency department visit but requiring no invasive respiratory support. Severe infection was defined as having significant respiratory failure requiring mechanical ventilation. Dataset GSE111368 contains 200 samples, including 130 samples from healthy controls, 29 samples from patients with H1N1 infection at tier one severity, 22 samples from patients with H1N1 infection at tier two severity, and 19 samples from patients with H1N1 infection at tier three severity 52 . All of the patients had influenza-like illness. Tier 1 severity was defined as having no substantial respiratory compromise and > 93% blood oxygen saturation. Tier 2 was defined as having < 93% blood oxygen saturation and requiring non-invasive oxygen support. Tier 3 was defined as respiratory compromise requiring invasive mechanical ventilation. In total, 1058 samples were included in these six datasets on influenza infection.  57 . The disease severity in this study was based on WHO guideline. In total, 309 samples were included in these four RNA-Seq studies on COVID-19.

Data analysis.
The expression values of S100A12 were extracted from the processed GEO data. For RNA-Seq data, the normalized counts were log transformed. For microarray data, the normalized expression values were also log transformed when necessary. The group comparison for each dataset was done in R (https:// www.rproje ct. org/) using t-test. The figures were also drawn with R. We shall note that the quality control of the data is intrinsic to the original study design (please refer to the original peer-reviewed publications for more details). For example, in the COVID-19 dataset GSE157103, only adult patients 18 years or older were enrolled, all of which had symptoms compatible with COVID-19 infection. Patients with imminent death were excluded. In addition, there were equal number of patients in the ICU group and non-ICU group, the mean age, mean BMI and proportion of male patients were similar in both groups, and so on. Blood collection and RNA-Seq experiment and data processing followed standard protocols, and two samples were excluded due to poor sequencing quality. In addition, we have found that S100A12 expression is not affected by age or sex in our previous data analysis. The S100A12 expression is also not significantly altered in non-infectious illness in our previous work.
In the original publication of the dataset GSE157103, the authors chose hospital free days at day 45 of hospital admission (HFD-45) as the disease severity index for COVID-19 infection for the following reasons, (1) being a single metric, (2) universally applicable to patients with different severity, (3) more suitable for COVID-19 infection, (4) compatible with COVID-19 development. Death was assigned as 0 HFD-45 value to reflect the disease severity.

Results
S100A12 expression is an indicator of severe infection. First, I examined whether S100A12 expression can be a marker for severe infection especially sepsis. The dataset GSE654918 included samples from healthy controls, uncomplicated infection, sepsis and sepsis follow-up (Fig. 1A). It was evident that S100A12 expression was significantly elevated in the uncomplicated infection group compared to healthy controls (p = 1.95e-8), where 11 of the 12 patients had higher S100A12 expression than the highest value in the control group. The S100A12 expression was further elevated in the sepsis group compared to the uncomplicated infection group (p = 0.0074), where 16 of the 39 sepsis patients (41%) had higher S100A12 expression than the highest value in the uncomplicated infection group. In addition, S100A12 expression was not significantly different between the sepsis follow-up group and the uncomplicated infection group (p = 0.26). Thus, S100A12 expression is a clear indicator of severe infection.
The next question is whether S100A12 expression is correlated with sepsis severity. The dataset GSE63042 included samples from SIRS, mild sepsis, severe sepsis, septic shock and sepsis death (Fig. 1B). It was evident that S100A12 expression was significantly elevated in the mild sepsis group compared to the SIRS group (p = 0.027), further validating S100A12 as a marker for infection. The S100A12 expression was further elevated in the severe sepsis group compared to the mild sepsis group (p = 0.042). Additionally, no significant difference was found between severe sepsis and septic shock or septic death (p > 0.05 for both comparison). Therefore, S100A12 expression is an indicator of severity in sepsis.
As a marker of disease severity, the dynamic pattern is also worth of investigation. The dataset GSE131411 included samples from three time points of septic shock and cardiogenic shock (Fig. 1C). It was evident that S100A12 expression was significantly elevated in the septic group compared to the cardiogenic shock group at the first time point (p = 2.99e-7), again validating S100A12 as a marker for infection. Interestingly, the S100A12 expression was significantly decreased from T1 to T2 (p = 0.002) and also from T2 to T3 (p = 0.023) for patients with septic shock. In fact, 19 of the 21 patients with septic shock had this downward trend during the treatment process, while the other two patients had the lowest expression at the first time point (probably within the normal range). Thus, S100A12 expression is an indicator of treatment progress in sepsis. S100A12 expression is an indicator of severe influenza infection. Next, I examined whether S100A12 expression can be a marker for severe respiratory viral infection especially influenza infection. The dataset GSE27131 included samples from healthy controls and patients with severe H1N1 infection (defined as requiring mechanical ventilation) at day 0 and day 6 of ICU admission ( Fig. 2A). It was evident that S100A12 expression was significantly elevated in the H1N1 group compared to the control group at both time point (p = 9.85e-5 and 4.50e-5). In addition, all of the samples in the patient group had higher S100A12 expression than the highest value in the control group. www.nature.com/scientificreports/ observed in the dataset GSE21802 which also included samples from healthy controls and patients with severe H1N1 infection (Fig. 2B). Compared to the control group, patients with mechanical ventilation had significantly elevated S100A12 expression at both time points (p = 0.019 and 0.014), and even patients without mechanical ventilation also had significantly elevated S100A12 expression at both time points (p = 0.0086 and 0.031). The elevation of S100A12 expression in severe influenza infection can be compared to that in severe bacterial infection. The dataset GSE40012 included samples from healthy controls and patients with severe communityacquired pneumonia at day 0 through day 5 of ICU admission (Fig. 2C). Compared to the control group at both time points, the patients with severe bacterial pneumonia had significantly elevated S100A12 expression at day one (p = 5.74e-13 and 1.79e-12). All of the samples in this group had higher S100A12 expression than the highest value in the control groups, again validating S100A12 expression as a prominent marker for bacterial infection. During the next four days, S100A12 expression stayed high (p < 4.15e-8 for all the comparison against the control group). Compared to the control group at both time points, the patients with severe H1N1 pneumonia also had significantly elevated S100A12 expression at day one (p = 2.58e-5 and 3.62e-5). During the next four days, S100A12 expression also stayed high (p < 1.10e-5 for all the comparison against the control group). In addition, the patient group with mixed bacterial and viral infection also had significantly elevated S100A12 expression compared to the control group (p < 0.046 for all the comparison). Thus, the three studies described above demonstrated that S100A12 expression is a marker for severe influenza infection.
Elevation of S100A12 expression and the severity of influenza infection. Then, I further examined how S100A12 expression is elevated at different severity levels of influenza infection. The dataset GSE68310 included samples from a prospective study of community monitoring, likely all with mild influenza infection. Although small fluctuation was observed in the S100A12 expression, it stayed relatively constant throughout the whole year (p = 0.11 between day 0 of influenza infection and the next spring) (Fig. 3A). This is consistent with previous findings where the marker genes for respiratory viral infection are mainly interferon-stimulated genes (ISGs) such as IFI27 and RSAD2. Thus, S100A12 expression is not elevated in mild influenza infection.
As the severity level of influenza infection increases, we can observe higher level of S100A12 expression. The dataset GSE101702 included samples from healthy controls and patients with influenza infection. Patients with moderate flu had significantly elevated S100A12 expression compared to the control group (p = 1.03e-6) (Fig. 3B). In addition, patients with severe flu also had significantly elevated S100A12 expression compared to the moderate flu group (p = 1.91e-13). This demonstrated a step-wise increase of S100A12 expression as the severity level of influenza infection increased.
This trend was further supported by the dataset GSE111368 which included samples from healthy controls and patients with various severity levels of H1N1 infection, some of which had known status of bacterial co-infection. For patients at tier 1 severity level, they did not have significantly different S100A12 expression compared to the controls unless they had confirmed bacterial co-infection (Fig. 3C). In contrast, patients at tier 2 and tier 3 levels had significantly elevated S100A12 expression compared to the controls (p = 0.025, 0.0093 and 0.0029 for the patient groups at tier 2 level, and p = 0.00043 and 2.93e-7 for the patient groups at tier 3 level). This also validated the upward trend of S100A12 expression as the severity of influenza infection increases.

S100A12 expression is an indicator of severe COVID-19 infection. The more direct question is
whether S100A12 expression is elevated in patients with severe COVID-19 infection. The dataset GSE152641 included samples from healthy controls and hospitalized COVID-19 patients (Fig. 4A). It was evident that S100A12 expression was significantly elevated in the COVID-19 infection group compared to the healthy control group (p = 1.02e-7). More specifically, 31 of the 62 hospitalized COVID-19 patients (50%) had S100A12 expression above the highest value in the control group. Another dataset GSE161731 included samples from both hospitalized and non-hospitalized COVID-19 patients (Fig. 4B). It was evident that S100A12 expression was significantly elevated in the hospitalized COVID-19 group compared to the non-hospitalized COVID-19 group (p = 0.00039). Thus, S100A12 expression is elevated in a subgroup of hospitalized COVID-19 patients.
Another dataset GSE152418 included samples from healthy controls and COVID-19 patients with various severity (Fig. 4C). The S100A12 expression was not significantly different between the moderate COVID-19 infection group and the healthy control group (p = 0.51). However, S100A12 expression was significantly elevated in the severe COVID-19 infection group (p = 0.0066) and ICU group (p = 0.0054) compared to the moderate COVID-19 infection group. Similar results were found in another study with similar experimental design 57 . In that study, S100A12 expression was not significantly different between the moderate cases and the healthy controls, but it was significantly elevated in the severe cases compared to the moderate cases (log2FC = 3.58, q = 0.00034). Therefore, S100A12 expression is indeed elevated in patients with severe COVID-19 infection. S100A12 expression is correlated with future disease severity indexes in COVID-19 infection. To go one step further, it will be interesting to find out whether S100A12 expression at hospital admission is correlated with future quantitative indexes of disease severity in COVID-19 patients (assessed retrospectively). The dataset GSE157103 included samples from COVID-19 patients with various severity using hospital free days at day 45 of hospital admission (HFD-45) as a disease severity index (Fig. 5A). It was evident that S100A12 expression was robustly correlated with disease severity as measured by HFD-45 (r = 0.625), which was much better than the established prognostic markers such as CRP, PCT, ferritin, d-dimers, LDH and fibrinogen (r between 0.009 and 0.35, Fig. 5D). In contrast, the expression of ISGs displayed extremely weak correlation with HFD-45 (r = 0.025 for IFI27 and r = 0.11 for RSAD2).
For more detailed examination, the samples were equally divided into four groups based on the S100A12 expression (25 patients in each group). Using the Q1 group as the reference, the Q2 group was only marginally different from www.nature.com/scientificreports/ (p = 0.062), while the other two groups were significantly different from the Q1 group (p = 4.25e-6 for the Q3 group, and p = 2.23e-11 for the Q4 group) (Fig. 5C). More specifically, the Q1 group had only one patient with HFD-45 less than 15 days, while it was 4, 13 and 15 patients in the Q2, Q3, and Q4 groups, respectively (Fig. 5B). This can be considered as having fourfold, 13-fold and 15-fold increased risk of having severe patients in the Q2, Q3, and Q4 groups, respectively. On the other hand, the Q1 group had 21 patients with HFD-45 more than 30 days, while it was 14, 7 and 1 patients in the Q2, Q3, and Q4 groups, respectively (Fig. 5B).
As additional assessment of disease severity, information on ICU admission and mechanical ventilation were included in this study. There were 4 patients admitted to ICU in the Q1 group, while it was 9, 19 and 18 patients in the Q2, Q3 and Q4 groups, respectively (Fig. 5B). In addition, there were only 2 patients using mechanical ventilation in the Q1 group, while it was 7, 13 and 21 patients in the Q2, Q3 and Q4 groups, respectively.
The same study included 26 non-COVID-19 patients. Due to the limited sample size, these patients were only divided into two groups based on the S100A12 expression (H1 and H2 groups). The H1 group had 4 of the 13 patients admitted to ICU, while the H2 group had 12 of the 13 patients admitted to ICU. None of the patients in the H1 group used mechanical ventilation, while 9 of the 13 patients in the H2 group used mechanical ventilation. Thus, S100A12 expression seems to be a more general indicator of disease severity, which deserves further investigation.

Discussion
A variety of biomarkers have been proposed for the prognosis of disease severity in COVID-19 patients, and some of the more accessible ones have been validated in many independent studies. However, all of the reported biomarkers only have modest prognostic power. In the current study, it has been demonstrated that S100A12 expression in the whole blood is a robust marker for severe infection and severe respiratory viral infection. Not surprisingly, it is also a marker for severe COVID-19 and is robustly correlated with future quantitative indexes of COVID-19 severity, much better than the established prognostic biomarkers. Due to limited data availability specifically for COVID-19, relevant data on severe infection and severe respiratory viral infection has been extensively investigated in this study. The intention of this study is to find a biomarker universally applicable to infection in general and more specifically to respiratory viral infection. If the purpose is find biomarkers unique to COVID-19, a lot more data on COVID-19 will be required for the conclusion to be convincing.
I shall further clarify that the aim of this work is to find novel prognostic markers for COVID-19, not diagnostic markers. The field of COVID-19 diagnosis is quite mature. There are dozens of commercial kits available with high sensitivity and specificity. There are also commercial kits available with fast and convenient detection of COVID-19. Additionally, there are commercial kits for simultaneous detection of dozens of common pathogens including COVID-19. As for host response to different types of infection, people have also derived gene panels to differentiate common infection types 53,54 . However, it may not be practical to clinical use because hundreds of genes are required.
The reason for the selection of S100A12 expression in this study is that it stands out to be the most prominent marker for bacterial infection in our previous works. Since most genes including immune-related genes are multifunctional, it is conceivable that the function of S100A12 is not limited to the response to bacterial infection. It has been well-established that the signature of host response to viral infection is ISGs. However, the severity of COVID-19 does not seem to be correlated with the expression of ISGs at all. It is possible that patients with severe COVID-19 may activate pathways involving S100A12 in addition to the initial activation of interferon signaling pathways as a response to the overwhelming infection. The exact role of S100A12 in the response to severe infection will require more in-depth investigation.
It shall be noted that ISGs as the signature for host response to respiratory viral infection was mainly derived from mild infections or even human challenge experiments. Host response to severe respiratory viral infection including COVID-19 could be much more complicated. It's not entirely surprising that "bacterial signature" such as S100A12 is activated in severe respiratory viral infection. It's possible that sometimes viral infection is simply overwhelming for the immune systems of certain infected individuals and interferon response by itself is way too weak to stop virus replication and systemic damage to the human host. More specifically, neutrophils which express high level of S100A12 could be heavily involved in host response to severe COVID-19 even at the early stage of the disease development, including both neutrophil expansion and neutrophil-related gene activation. www.nature.com/scientificreports/