Biomarker candidates for progression and clinical management of COVID-19 associated pneumonia at time of admission

COVID-19 pathophysiology is currently not fully understood, reliable prognostic factors remain elusive, and few specific therapeutic strategies have been proposed. In this scenario, the availability of biomarkers is a priority. MS-based proteomics techniques were used to profile the proteome of 81 plasma samples extracted on four consecutive days from 23 hospitalized patients with COVID-19 associated pneumonia. Samples from 10 subjects who reached a critical condition during their hospital stay and 10 matched non-severe controls were drawn before the administration of any COVID-19 specific treatment and used to identify potential biomarkers of COVID-19 prognosis. Additionally, we compared the proteome of five patients before and after glucocorticoids and tocilizumab treatment, to assess the changes induced by the therapy on our selected candidates. Forty-two proteins were differentially expressed between patients' evolution groups at 10% FDR. Twelve proteins showed lower levels in critical patients (fold-changes 1.20–3.58), of which OAS3 and COG5 showed increased expression after COVID-19 specific therapy. Most of the 30 proteins over-expressed in critical patients (fold-changes 1.17–4.43) were linked to inflammation, coagulation, lipid metabolism, complement or immunoglobulins, and a third of them decreased in expression after treatment. We propose a set of candidate proteins for biomarkers of COVID-19 prognosis at the time of hospital admission. The study design employed is distinct from previous works and aimed to optimize the chances of the candidates being validated in confirmatory studies and, eventually, playing a useful role in clinical practice.

www.nature.com/scientificreports/
The specific pathophysiology of the SARS-CoV-2 infection is not yet fully understood. Different molecular, biological and immunological pathways have been suggested to describe both its aggressiveness and the specific body response to the viral infection [2][3][4] . Unfortunately, none of them has provided an explanation for all the COVID-19 features, and few of them have resulted in the proposal of new therapeutic strategies 5 . Approximately 80% of affected patients suffer from an asymptomatic to mild form of the disease, while the remaining 20% of patients display a more severe form 6 . Although no standardized treatment is currently available, admitted patients presenting a more severe form usually receive therapy based on three management options: glucocorticoids 7,8 ; antiviral drugs, currently remdesivir 9,10 ; and immunosuppressive drugs such as interleukin-6 (IL-6) inhibitors 11,12 . Nevertheless, controversies remain about the efficacy and cost-benefit of these treatments, even after recent evaluation in clinical trials 13,14 . Therefore, research on prognosis and treatment biomarkers seems a priority to identify patients at risk of critical disease evolution at the time of admission, and to provide them with therapies tailored to their clinical presentation.
Studies on SARS-CoV-2 infection using mass spectrometry (MS)-based high-throughput proteomics technologies are now leading the compilation of a large amount of protein data, which is likely to contribute to a complete understanding of the infection. In contrast to the traditional clinical approaches 15,16 , MS allows the detection of proteome changes at a global scale, simultaneously interrogating the expression levels of a high number of proteins according to a specific characteristic of interest, in an agnostic way, and providing broad insights into the protein networks involved in the molecular pathways 17 . Recent results have shown that COVID-19 has a substantial impact on the plasma and serum proteome [18][19][20] . This holds promising potential for the prompt identification of biomarkers for diagnosis, prognosis and/or therapeutic targeting in this rapidly growing pandemic, which requires a quick scientific response. For studies such as these to succeed, a proper definition of the clinical outcome on a homogeneous and accurately designed set of patients is crucial 21 .
This study aims to identify candidate proteins with the potential to be used in clinical practice as prognosis and clinical management biomarkers of COVID-19 associated pneumonia at the time of admission. We conducted an MS-based proteomics experiment on plasma samples from 20 matched COVID-19 hospital patients before any prescription of COVID-19 specific treatment. Additionally, we profiled the plasma proteome of five patients before and after glucocorticoids and tocilizumab treatment, and assessed the changes induced by the therapy on our selected biomarker candidates (Fig. 1).
[Figure 1, workflow labels: + SDS, 60 ºC (SARS-CoV-2 inactivation); S-trap + Lys-C/Trypsin; LC-MS/MS; protein identification and quantification.]

Results
We evaluated data from 13 COVID-19 patients with pneumonia who progressed to a critical condition during their hospital stay, together with 10 matched controls that experienced a favourable disease evolution (Figs. 1, 2). To identify which proteins were responsible for these differences, we performed a proteome-wide differential expression analysis on the 10 case-control sets, where estimations for all time points were averaged within each evolution group (see details in Supplementary Methods). Up to 350 proteins provided quantifications in enough samples to perform such analysis, which identified 42 proteins differentially expressed between the two groups at 10% FDR (Fig. 3, Supplementary Fig. 1).
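The comparison described above, in which time point estimates are averaged within each evolution group before the groups are contrasted, can be illustrated with a minimal Python sketch. It assumes that per-day group estimates are already available on the log2 scale. The function names, the example values and the signed fold-change convention (magnitude of at least 1, negative when expression is higher in non-critical patients) are illustrative assumptions, not the authors' code.

```python
from statistics import mean


def averaged_log2_contrast(critical_by_day, noncritical_by_day):
    """Difference (critical minus non-critical) of day-averaged log2 estimates."""
    return mean(critical_by_day) - mean(noncritical_by_day)


def signed_fold_change(log2_diff):
    """Convert a log2 difference into a signed fold-change with magnitude >= 1.

    Assumed convention: positive means higher in critical patients,
    negative means higher in non-critical patients.
    """
    fc = 2.0 ** abs(log2_diff)
    return fc if log2_diff >= 0 else -fc


# Hypothetical log2 estimates for one protein on days 1 to 4 (not study data).
critical = [11.0, 11.5, 12.0, 11.5]
noncritical = [10.0, 10.5, 11.0, 10.5]

log2_fc = averaged_log2_contrast(critical, noncritical)
fold_change = signed_fold_change(log2_fc)
```

In this toy example the averaged log2 difference is 1, which corresponds to a two-fold higher level in the critical group.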
Among the 30 proteins over-expressed in the critical condition group (FCs from 1.17 to 4.43), we notably found a set of proteins related to the inflammatory cascade and immune modulation, such as Beta … (Table 2).

Discussion
In our MS-based proteomics study, we found a total of 42 proteins differentially expressed in the plasma of critical and non-critical COVID-19 pneumonia patients at the time of admission. Furthermore, large-scale changes in the proteome of critical subjects were observed after their treatment with glucocorticoids and tocilizumab. Among them, two proteins (17%) under-expressed in critical patients underwent a significant increase after therapy, while the expression of 11 proteins (37%) over-expressed in critical patients decreased. Previous studies using a broad range of designs and proteomics technologies have reported several proteins (27-93) as candidates of COVID-19 severity. Their profile of patients' plasma pointed to high specificity of several inflammation and immune modulators, in particular, pro-inflammatory signalling both upstream and downstream of IL-6, metabolic and immune dysregulation, and platelet and coagulation system activation [18][19][20]22,23 . Unfortunately, and despite these promising results, no sensitive biomarkers of COVID-19 prognosis have been successfully applied in clinical practice or trials to date, and none of them has been evaluated for its usefulness in patients' clinical management.
Among the 12 proteins showing over-expression in non-critical patients, two stand out based on the magnitude of the differences observed between evolution groups and the changes induced in their expression by the therapy: Oligoadenylate synthetase 3 (OAS3) and Conserved oligomeric Golgi complex subunit 5 (COG5).
OAS3 is the highest molecular weight isoform among the OAS family, and its expression is activated by Type-1 and Type-3 interferons. Its combined high affinity for dsRNA and capability to produce 2-5As of sufficient length to activate RNase L suggest that OAS3 might be a potent activator of RNase L, providing antiviral activity against RNA viruses. Previous studies identified an impaired type 1 interferon (IFN) response associated with a persistent blood viral load and an exacerbated inflammatory response, so we hypothesize that subjects with high levels of OAS3 might develop a better response to SARS-CoV-2 24 . Additionally, type 1 IFN are crucial for the immediate antiviral response by restricting replication and spread of the viruses. Therefore, an adequate production of IFN leads to an efficient T cell response while a delayed IFN response might cause the T cell exhaustion present in critical COVID-19 subjects 25 26 . This finding suggests that COG5 might have a role in protection from COVID-19 infection via processes related to membrane transport. OAS3 and COG5 are two of the proteins that showed a lower expression in critical patients and whose expression increased after treatment with glucocorticoids and tocilizumab. Taken together, these results suggest a role of these two proteins in COVID-19 severity and underscore their potential as biomarkers not only for prognosis, but also in the decision making of patients' clinical management. On the other hand, and although also over-expressed in non-critical subjects, KRT6A and Immunoglobulin J Chain did not significantly change after treatment and, hence, their value as treatment biomarkers warrants further research.
Table 2. Proteins differentially expressed between critical and non-critical patients at a 10% false discovery rate (FDR). The table also shows the results from the analysis comparing Treated and Not-treated samples for the same proteins. Results are derived from a linear mixed-effects model fitted to each protein independently that included peptide and patient as random effects. Digestion batch, evolution group (critical/non-critical), blood extraction day and the interaction of the two latter were modelled as fixed effects. Comparisons were performed by averaging time point estimations within each evolution group. Differences between treatment status were assessed in an analogous way, using a model that included treatment and batch of samples' digestion as fixed effects. Positive fold-changes indicate over-expression in critical patients or in Treated samples, while negative fold-changes represent over-expression in non-critical patients or in Not-treated samples, respectively. Statistical significance was assessed using a Wald test derived from the models. The Benjamini-Hochberg method was used for control of the FDR (Adjusted p value).
The remaining 30 proteins were found over-expressed in critical condition. Regarding the inflammatory cascade, only Haptoglobin showed an expression decrease after treatment, while the rest of them remained over-expressed. This result could be explained by the short time interval between extraction of pre- and post-treatment samples (within 48-72 h), which might not be enough to observe a substantial decrease in the expression of the inflammatory factors. Confirmation of this observation, however, requires further research.
Three proteins involved in the coagulation process were over-expressed in critical patients. Transferrin is an important clotting regulator, which might be related to thrombotic events in COVID-19 27 . Notably, Transferrin levels dropped after treatment, which may indicate an effect on the coagulation cascade. Kininogen-1 and fibrinogen alpha chain are involved in the stimulation of coagulation processes. Their decreased levels in post-treatment samples suggest an impact of therapy on the regulation of coagulation processes, as several pro-coagulant and anti-coagulant factors are found stimulated. Nevertheless, these observations need further research to draw definitive conclusions.
Higher levels of a set of proteins related to lipid metabolism were also associated with the critical condition, and these proteins also showed an increase in expression after treatment. Although we hypothesize that their association might be related to inflammation, this point requires confirmation in specific studies.
Finally, we identified several proteins with no apparent shared signalling pathway. First, and intriguingly, we identified ARNTL, which is part of the biological clock that helps adaptation to the environment 28 and showed the most extreme over-expression in critical patients. Previous studies have observed an increase of ARNTL levels in response to hypoxic conditions 29 . Interestingly, its expression significantly decreased after treatment, which could indicate potential not only for prognosis but also as a biomarker for clinical management. Other proteins displaying such an expression decrease after treatment and, therefore, with potential value for clinical management were SP9, HGF and OAF.
Regarding immunoglobulins, our results suggest that effector functions of antibodies might be beneficial to control SARS-CoV-2 infection. In non-critical patients, we observed an up-regulation of the classical complement activation pathways, while higher levels of alternative pathways were observed in subjects that reached a critical status 30 . Nevertheless, these associations could highly depend on the immunoglobulin subclasses and the mechanism of complement activation, so these results should be interpreted with caution. It is noteworthy that, after treatment, all complement components significantly increased, pointing to an immunological rather than anti-inflammatory effect of the treatment prescribed.
Figure 3. Volcano plot summarizing the results obtained from the differential expression analysis between critical and non-critical patients. X-axis represents the log2-transformed fold-change (FC). Y-axis shows the minus-log10-transformed p value associated with the protein in the comparison. Positive log2-fold-changes indicate over-expression in critical patients while negative log2-fold-changes represent over-expression in non-critical patients. Results were derived from a linear mixed-effects model fitted to each protein independently that included peptide and patient as random effects. Digestion batch, evolution group (critical/non-critical), blood extraction day and the interaction of the two latter were modelled as fixed effects. Comparisons were performed by averaging time point estimations within each evolution group. Statistical significance was assessed using a Wald test derived from the models. (Figure created …)
Strengths and limitations. Our study has some strengths and limitations. Data collection was carried out at the end of the first COVID-19 incidence peak in Spain (late April 2020), which prevented us from recruiting a higher number of patients.
The small sample size analysed in this study determines the exploratory and provisional nature of its results. Therefore, a confirmatory study is required in a larger and independent set of patients to validate the reliability and applicability of these findings. This is especially true for the comparisons of pre- and post-treatment samples where, in spite of the paired design that enhances the statistical power of these analyses, only five samples per group were available. The exclusion criteria regarding age and previous pathologies also represent a limitation, as they prevent extrapolation of the results to the general population. On the other hand, this patient selection also represents an advantage, as it allows characterization of the specific proteome associated with COVID-19 evolution, using a well-defined clinical outcome, in a highly homogeneous set of SARS-CoV-2 patients, thus increasing the chances of identifying biomarkers of a mild to moderate magnitude. This study was carried out in a prospective cohort of patients with no previous COVID-19 specific treatment, although all patients with an unfavourable evolution ended up receiving some form of such therapies. Up to four blood samples from each patient, taken on consecutive days, were analysed, together with samples extracted before and after treatment with COVID-19 specific therapy for a limited number of patients. This ensured the robustness and consistency of the findings across the follow-up and provided information about the candidates' potential as clinical management biomarkers. This design makes the proteins identified in our study a good set of candidates for biomarkers of COVID-19 evolution, with potential to discriminate, at the time of admission, patients who could benefit from an early and more aggressive treatment. We reported unique candidates, especially three proteins, which had not been previously described as potential biomarkers for COVID-19 severity.
In conclusion, we propose a set of proteins as candidates for biomarkers of prognosis and clinical management of COVID-19 associated pneumonia patients. Since promising new therapies are ongoing, testing and validating the results reported in this work may have a critical impact on driving decision-making in clinical practice (see Supplementary Material for an extended Discussion section).

Material and methods
Patients and samples. Study subjects were selected from a prospective cohort of patients systematically admitted to Hospital Universitari Parc Taulí between 14th and 28th April 2020. Patients had a confirmed diagnosis of COVID-19 based on viral sequence detection by reverse transcription-polymerase chain reaction (RT-PCR) of nasopharyngeal and/or oropharyngeal swab. All patients also showed SARS-CoV-2 pneumonia (defined as peripheral bilobar or bilateral infiltrates). Patients were included in the study before any prescription of COVID-19 specific treatment, such as glucocorticoids, remdesivir or IL-6 inhibitors. However, all patients had started hydroxychloroquine, azithromycin and/or lopinavir/ritonavir treatment as per clinical practice at that time, prior to the time of inclusion. Recruitment stopped on 28th April due to Spain's favourable evolution of the pandemic at that time, and the subsequent drop in the number of COVID-19 hospitalized patients.
Exclusion criteria aimed to homogenize the patient sample to optimize the chances of identifying clinically relevant markers in our proteomic study, and comprised potential confounding factors such as immunomodulatory treatments, active chemotherapy, age over 75 years, chronic renal failure or hemodialysis treatment, previous immunodeficiency, severe chronic obstructive pulmonary disease (FEV1 < 50%) and any opportunistic infection. All patients were remotely monitored to establish their condition status during follow-up as stable or as progression to critical COVID-19. Criteria for critical evolution were defined a priori as clinical features such as respiratory rate ≥ 30 breaths per minute with a PaO2 < 94% while on FiO2 ≥ 0.35, PaO2/FiO2 ratio < 200, or non-invasive mechanical ventilation or orotracheal intubation requirement. All patients who progressed to a critical disease condition (n = 13) were included in our study. For 10 of them, we selected a control (stable) patient matched by age, gender and, when possible, by classical cardiovascular risk factors, for a total of 23 individuals (Fig. 1, Table 1, Supplementary Table 1). According to the clinical management guidelines used at that time, all critical patients ended up under glucocorticoids and tocilizumab treatment. In contrast, none of these therapies was prescribed for any of the matched controls. All the subjects included in this study survived the infection process. This design and the characteristics of the patients selected for the study provide a suitable framework for the identification of prognostic biomarkers in the plasma proteome of COVID-19 patients, which could be used to discriminate patients at high risk of progressing to a critical condition of the disease and, thus, likely to benefit from specific treatments.
All patients systematically had blood samples drawn on four consecutive days to obtain a complete picture of their plasma proteome during the first days after admission to the hospital ward. Sample collection was stopped when the patient progressed to a critical condition or glucocorticoid treatment was prescribed according to clinical practice. Additionally, we obtained blood samples from five patients after glucocorticoids and tocilizumab treatment, which enabled us to profile proteome changes induced by these treatments and to assess these changes specifically in the prognosis biomarker candidates. Plasma samples were centrifuged, diluted, processed and stored until proteomic procedures. Patients' clinical information is described in Table 1 (see further details in Supplementary Methods).

Proteomics analysis.
For the MS-based proteomic analysis, we used a standardized label-free workflow described as follows. In order to control and correct experimental sources of variation, samples were processed in batches that included a complete set of case-control patients. Ten μl of deactivated plasma sample (diluted 1:1 in SDS 8% 0.1 M DTT and heated at 60 °C) were reincubated with 4% SDS 0.05 M DTT for 5 min at 95 °C followed by 30 min at 55 °C. Detergent removal, alkylation of free cysteine thiols with iodoacetamide and sample digestion with Trypsin/LysC were done using S-trap columns (S-Trap mini kit (10 × 100–300 μg), reference K02-mini-10, Protifi) according to the established manufacturer protocol. Digested plasma samples were dried and reconstituted with 1% formic acid, 3% acetonitrile in an aqueous solution. On-line nanoLC-ESI-MS/MS was performed using a Dionex Ultimate 3000 ultrahigh-pressure chromatographic system coupled to an Orbitrap Fusion Lumos Tribrid mass spectrometer (Thermo Scientific). The Advion TriVersa NanoMate (Advion Inc. Biosciences) was used as the nanospray interface. Sample injections (600 ng of protein on column) were carried out in a specific order previously assigned at random. The mass spectrometer was operated in data-dependent acquisition (DDA) mode. Database search was done with Proteome Discoverer v2.3.0.523 (Thermo Scientific) using Sequest HT as the search engine and the Minora Feature Detector node to extract the LC-MS peaks used for peptide and protein quantification. SwissProt Human (released 2020_06) and SwissProt SARS (released 2020_07) databases were used. Peptides with a False Discovery Rate (FDR) < 1% were considered as positive identifications with a high confidence level. Unique peptides (peptides that are not shared between different protein groups) were considered for further quantitative and statistical analysis (see Supplementary Methods for a more detailed description).
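The two peptide-level selection rules stated above (identification at an FDR below 1%, and uniqueness to a single protein group) can be sketched as a simple filter. This is a minimal Python illustration under stated assumptions: the record layout, the field names and the use of a per-peptide q-value to represent the FDR are hypothetical and do not correspond to Proteome Discoverer output columns.

```python
from collections import namedtuple

# Hypothetical record layout; field names are illustrative only.
Peptide = namedtuple("Peptide", ["sequence", "q_value", "protein_groups"])


def select_quantifiable(peptides, fdr_threshold=0.01):
    """Keep confidently identified peptides that map to exactly one protein group."""
    return [
        p for p in peptides
        if p.q_value < fdr_threshold and len(p.protein_groups) == 1
    ]


# Toy input (not study data): one peptide passes both rules,
# one is shared between two protein groups, one fails the 1% threshold.
peptides = [
    Peptide("PEPTIDEA", 0.001, ("P1",)),
    Peptide("PEPTIDEB", 0.005, ("P1", "P2")),
    Peptide("PEPTIDEC", 0.050, ("P3",)),
]
kept = select_quantifiable(peptides)
```

Only the first toy peptide is retained, since the second is shared between protein groups and the third exceeds the threshold.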
For quality control purposes, the contribution of technical and clinical sources to the variability of the data was evaluated using a principal components analysis (PCA). Differences between evolution groups in the proteomic data were assessed for each protein independently, using linear mixed-effects models fitted to the log2-transformed expression values. The models included the patient's disease evolution (critical/non-critical), the time point of blood extraction (day 1-day 4) and the interaction of these two terms. Batch of samples' digestion was considered as a covariate for statistical control, and random intercepts were considered to model the sample's patient of origin and the feature effect (peptide + modification + charge) when needed. Differences between treatment status (glucocorticoids and tocilizumab) were assessed in an analogous way, using a model that included treatment and batch of samples' digestion as fixed effects. A 10% FDR threshold was set for statistical significance. Results were graphically represented using heatmaps and Volcano plots (see further details in Supplementary Methods).
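To illustrate how a 10% FDR threshold is applied across per-protein tests, the following minimal Python sketch implements the Benjamini-Hochberg adjustment in its standard step-up form. The function name and the example p values are illustrative assumptions; the sketch does not reproduce the authors' analysis code or data.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical Wald-test p values for four proteins (not study data).
p_values = [0.001, 0.03, 0.04, 0.5]
adjusted = benjamini_hochberg(p_values)
significant = [q <= 0.10 for q in adjusted]
```

With these toy values the first three proteins pass the 10% threshold and the fourth does not.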
Ethics approval and consent to participate. The Local Ethical Committee of the Parc Taulí Hospital Universitari approved this study (2020/569, 14 April 2020). All patients were verbally informed, and witnessed informed consent was obtained as per regulatory conditions for COVID-19 studies in Spain. All methods were performed following the relevant guidelines and regulations.

Data availability
All data relevant to the study are included in the article or uploaded as supplementary information. All data are available from the authors upon request.