Complex changes in serum protein levels in COVID-19 convalescents

The COVID-19 pandemic, triggered by severe acute respiratory syndrome coronavirus 2, has affected millions of people worldwide. Much research has been dedicated to our understanding of COVID-19 disease heterogeneity and severity, but less is known about recovery-associated changes. To address this gap in knowledge, we quantified the proteome from serum samples from 29 COVID-19 convalescents and 29 age-, race-, and sex-matched healthy controls. Samples were acquired within the first months of the pandemic. Many proteins from pathways known to change during acute COVID-19 illness, such as from the complement cascade, coagulation system, inflammation and adaptive immune system, had returned to levels seen in healthy controls. In comparison, we identified 22 and 15 proteins with significantly elevated and lowered levels, respectively, amongst COVID-19 convalescents compared to healthy controls. Some of the changes were similar to those observed for the acute phase of the disease, i.e. elevated levels of proteins from hemolysis, the adaptive immune system, and inflammation. In contrast, some alterations opposed those in the acute phase, e.g. elevated levels of CETP and APOA1 which function in lipid/cholesterol metabolism, and decreased levels of proteins from the complement cascade (e.g. C1R, C1S, and VWF), the coagulation system (e.g. THBS1 and VWF), and the regulation of the actin cytoskeleton (e.g. PFN1 and CFL1) amongst COVID-19 convalescents. We speculate that some of these shifts might originate from a transient decrease in platelet counts upon recovery from the disease. Finally, we observed race-specific changes, e.g. with respect to immunoglobulins and proteins related to cholesterol metabolism.

We analyzed pooled quality control (QC) samples along with cohort samples (every 6-10 samples). Data was acquired in four sets (batches) listed in Supplementary Information 2. To assess variability of the data, we examined the 334 proteins we discuss in the main text for their variability across the QC samples (labeled 'Upool' in the figure). All data was normalized within and across batches (sets) as described in the Methods.
A. Antibody titer levels amongst COVID-19 convalescents (Titer as 1:X) and Days since Diagnosis show substantial correlation.
Importantly, PZP is the only protein of the 334 total proteins with a sex difference: it has slightly higher levels in healthy women than in healthy men (adjusted p-value < 0.20). As PZP levels are thought to associate with pregnancy, the results suggest that some of the female individuals were pregnant, causing the difference in levels between the two sexes. Further, while not significant, LTF shows a general decline in levels with age of healthy female individuals, concurrent with the role of LTF in breastmilk.

Figure S3. Quality control: expected expression patterns of obesity related proteins
The figure shows the metadata and protein levels for IGFALS and IGF1 for 5 healthy control individuals.
Both proteins have been connected to obesity. The 5 individuals were selected based on similar age (50 to 53 years) but different body mass index (BMI): normal weight (BMI<25) versus obese (BMI>30). Due to the small size of the cohort, there were no additional sets of age-matched individuals from different BMI categories. We found a significant difference in IGFALS and IGF1 levels between normal weight and obese individuals (P-value < 0.05, two-sample t-test). The demographic information and protein level data are provided in Supplementary Information 2.
Clusters: 1 Coagulation cascade; 2 Complement system; 3 Inflammation.

Figure S6. Post-translational modifications
The heatmap shows examples of normalized levels of hexose-modified peptides. Three peptides (two for albumin (ALB) and one for Immunoglobulin heavy constant alpha 1 (IGHA1)) were significantly more modified amongst COVID-19 convalescents compared to healthy controls (adjusted p-value < 0.05). The heatmap shows additional modified peptides with an adjusted p-value < 0.20. For visualization, data was row median centered and row Z-score normalized. While highly heterogeneous, healthy controls more often show low levels of modification than COVID-19 convalescents, in particular amongst women. The complete results are provided in Supplemental Information 3.
The peptide sequence is shown in the key below the heatmap.
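The row-wise scaling mentioned for the heatmap can be sketched as follows. This is only one possible reading of "row median centered and row Z-score normalized": each row is centered on its median and then divided by its sample standard deviation. The caption does not specify the exact procedure, so the order of operations, the use of the sample standard deviation, and the handling of constant rows are assumptions, and the function name `scale_row` is hypothetical.

```python
from statistics import median, stdev


def scale_row(values):
    """Center one heatmap row on its median, then scale by its sample SD.

    Assumption: this is one plausible reading of the caption; a conventional
    Z-score would subtract the row mean instead of the median.
    """
    row_median = median(values)
    row_sd = stdev(values)  # sample standard deviation (n - 1 denominator)
    if row_sd == 0:
        # Assumption: a constant row carries no contrast, so map it to zeros.
        return [0.0 for _ in values]
    return [(v - row_median) / row_sd for v in values]


# Illustrative numbers only, not data from the study.
example_row = [1.0, 2.0, 3.0, 10.0]
scaled = scale_row(example_row)
```

After scaling, each row has a median of zero and unit standard deviation, so peptides with very different absolute intensities share one color scale.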

Figure S1. Quality control: fragment intensity variation for quality control samples

B. The four histograms show the distribution of the Coefficient of Variation (CoV) for the protein levels, grouped into the four technical batches. Almost all proteins (330 of 334) have <50% CoV in at least one set, which is in the expected range for untargeted (shotgun) proteomics. We did not filter for CoV but marked the few proteins of CoV>50% in Figure 3B and in the main text.
C. The plot shows all samples, including the QC samples ('Upool'), after normalization in their distribution across the first two principal components. This figure is analogous to Figure 3A. Healthy controls and COVID-19 convalescents are clearly separated, and QC samples cluster in the center. A minor exception are QC samples from technical batch 3, which locate off center; however, COVID-19 convalescent and healthy control samples from batch 3 cluster correctly with their respective groups, indicating successful data acquisition and normalization.
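The per-batch variability check described for the QC samples can be sketched as follows. This is a minimal illustration, assuming the CoV is the sample standard deviation divided by the mean of linear-scale (not log-transformed) QC levels, expressed in percent; the caption does not state the exact formula. The function names and the example values are hypothetical and are not data from the study.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the CoV in percent: sample standard deviation / mean * 100."""
    m = mean(values)
    if m == 0:
        raise ValueError("CoV is undefined for a mean of zero")
    return 100.0 * stdev(values) / m


def cov_per_batch(qc_levels):
    """Map each batch name to the CoV of one protein's QC-sample levels."""
    return {batch: coefficient_of_variation(levels)
            for batch, levels in qc_levels.items()}


def below_threshold_in_any_batch(qc_levels, threshold=50.0):
    """True if the protein has CoV < threshold (percent) in at least one batch."""
    return any(cov < threshold for cov in cov_per_batch(qc_levels).values())


# Illustrative QC levels for one protein in two batches (made-up numbers).
example_qc = {
    "batch1": [100.0, 110.0, 90.0],
    "batch2": [10.0, 190.0, 100.0],
}
example_covs = cov_per_batch(example_qc)
```

With these made-up values the protein would count as acceptable, because its CoV in the first batch is well below 50% even though the second batch is far above it.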

Figure S2. Quality control: expected expression levels of female-specific proteins
Heatmap showing levels of Pregnancy Zone Protein (PZP) along with proteins with similar patterns as well as Lactotransferrin (LTF).
A. The panel shows the color-coded protein levels for female individuals from both cohorts.
B. The panel shows the significance values for different comparisons and different models. Significance values (adjusted p-values) were transformed as follows: if the observed log10-transformed fold change was positive, we calculated 1-[adjusted p-value]; if negative, we calculated -(1-[adjusted p-value]). Dark colors indicate adjusted p-value < 0.05; light colors adjusted p-value < 0.20. Columns represent select comparisons: the Overall difference between COVID-19 convalescents and healthy controls; the role of sex and race in a multivariate model in which other factors such as age were also considered; and the role of Days since diagnosis and the presence of Symptoms in a univariate model which considered only one factor at a time. Each statistical model was developed separately for the factor in question and the respective sample set: healthy controls (beige), COVID-19 convalescents (brown); log10-transformed ratio of paired protein levels for COVID-19 cases and healthy controls (beige-brown striped). The complete results of the statistical testing are provided in Supplementary Information 2.
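The signed transformation of adjusted p-values used for panel B can be sketched as follows. The rule for positive and negative fold changes and the 0.05 and 0.20 thresholds follow the caption; the treatment of a fold change of exactly zero, the label for non-significant cells, and the function names are assumptions made for illustration.

```python
def signed_significance(log10_fold_change, adjusted_p):
    """Return 1 - p for positive fold changes and -(1 - p) for negative ones."""
    if not 0.0 <= adjusted_p <= 1.0:
        raise ValueError("adjusted p-value must lie in [0, 1]")
    if log10_fold_change > 0:
        return 1.0 - adjusted_p
    if log10_fold_change < 0:
        return -(1.0 - adjusted_p)
    # Assumption: the caption does not cover a fold change of exactly zero.
    return 0.0


def shade(adjusted_p):
    """Classify a heatmap cell by the thresholds given in the caption."""
    if adjusted_p < 0.05:
        return "dark"
    if adjusted_p < 0.20:
        return "light"
    return "none"  # assumption: cells above 0.20 are not highlighted


# Illustrative values only, not results from the study.
up = signed_significance(0.3, 0.01)     # elevated, strongly significant
down = signed_significance(-0.3, 0.15)  # lowered, weakly significant
```

The sign carries the direction of the change and the magnitude carries the significance, so a single diverging color scale can show both.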

Figure S4. Proteins with statistically different levels between COVID-19 convalescents and healthy controls
Shown are the normalized levels of the proteins that differ significantly between the two groups (adjusted p-value < 0.05). All data is available in Supplementary Information 2. Samples are organized according to sex and age. Proteins with a high Coefficient of Variation (>50%) are marked with *. POC: person of color; S.d.: since diagnosis.

Figure S5. Proteins whose levels were similar between COVID-19 convalescents and healthy controls
Heatmap showing proteins from three pathways that are largely similar in healthy controls and COVID-19 convalescents.
A. The panel shows the color-coded protein levels with samples sorted according to sex and age.
B. The panel shows the significance values (associations) for different comparisons and different models. Significance values (adjusted p-values) were transformed as follows: if the observed log10-transformed fold change was positive, we calculated 1-p; if negative, we calculated -(1-p). Dark colors indicate adjusted p-value < 0.05; light colors adjusted p-value < 0.20; grey: no significance. Columns represent select comparisons: the 'Overall' difference between COVID-19 convalescents and healthy controls; and the impact of Days since diagnosis, the presence of Symptoms, Sex, Age, and Race in a multivariate model using the healthy controls (beige) or COVID-19 convalescents (brown). The complete results of the statistical testing are provided in Supplementary Information 2.

Table S1: Demographics of COVID-19 convalescents.
Demographic data per healthy and convalescent subject are provided in Supplementary Information 2.