Distinct SARS-CoV-2 antibody reactivity patterns elicited by natural infection and mRNA vaccination

We analyzed data from two ongoing COVID-19 longitudinal serological surveys in Orange County, CA, between April 2020 and March 2021. A total of 8476 finger stick blood specimens were collected before and after a vaccination campaign. IgG levels were determined using a multiplex antigen microarray containing antigens from SARS-CoV-2, SARS, MERS, common CoV, and influenza. Twenty-six percent of specimens from unvaccinated Orange County residents in December 2020 were SARS-CoV-2 seropositive; out of 852 seropositive individuals, 77 had symptoms and 9 sought medical care. The antibody response was predominantly against nucleocapsid (NP), full-length spike, and the S2 domain of spike. Anti-receptor binding domain (RBD) reactivity was low and not cross-reactive against SARS S1 or SARS RBD. A vaccination campaign at the University of California Irvine Medical Center (UCIMC) started in December 2020, and 6724 healthcare workers were vaccinated within 3 weeks. Seroprevalence increased from 13% pre-vaccination to 79% post-vaccination in January, 93% in February, and 99% in March. mRNA vaccination induced higher antibody levels than natural exposure, especially against the RBD, and cross-reactivity against SARS RBD and S1 was observed. Nucleocapsid protein antibodies can be used among vaccinees to identify prior exposure to SARS-CoV-2. Previously infected individuals developed higher antibody titers to the vaccine than individuals without prior exposure. Hospitalized patients in intensive care with severe disease reached significantly higher antibody levels than mild cases, but lower levels than those induced by the vaccine. These results indicate that mRNA vaccination rapidly induces a much stronger and broader antibody response than SARS-CoV-2 infection.

Next, a similar approach is applied to flag samples for which the overall distribution of the control spots is out of range (upper limit: third quartile + 2*IQR; lower limit: first quartile - 2*IQR). For this, all control spots of a given sample are used. Out-of-range samples are flagged for further visual inspection or reprobing.
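The out-of-range rule above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the function names are hypothetical, each sample is summarized by the median of its control spots (an assumption, since the text does not say how the per-sample distribution is summarized), and quartiles use Python's inclusive interpolation, which may differ slightly from the method used by the authors.

```python
import statistics

def iqr_fences(values, k=2.0):
    """Return (lower, upper) limits: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def flag_samples(control_spots_by_sample, k=2.0):
    """Return the names of samples whose median control-spot signal
    falls outside the limits computed across all samples.

    Assumption: the median is used as the per-sample summary.
    """
    medians = {name: statistics.median(spots)
               for name, spots in control_spots_by_sample.items()}
    lower, upper = iqr_fences(list(medians.values()), k)
    return sorted(name for name, m in medians.items()
                  if m < lower or m > upper)
```

Flagged samples would then go to visual inspection or reprobing, as described above.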
Finally, the printing buffer background reactivity is subtracted from each spot and the samples are normalized.
Step 2: Normalization
Data normalization is performed in two steps. First, the control spots are normalized against the training set using the quantile normalization method. This makes it possible to calculate a normalization factor that is used to rescale the data to match the training set while preserving the diversity of individual reactivities. After the control spots are normalized, their sum is calculated. A rescaling factor is calculated by dividing the sum of the normalized control spots of the training set by the sum of the normalized control spots of each sample. Each spot's reactivity is then multiplied by this factor, resulting in a rescaled data frame. The mean reactivity of the normalized data is then calculated.
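The rescaling step can be illustrated with a short sketch. It takes control-spot values that are assumed to have been normalized already (the quantile normalization itself is not reproduced here), and the function and variable names are hypothetical.

```python
def rescaling_factor(reference_controls, sample_controls):
    """Sum of the training-set control spots divided by the sum of the
    sample's control spots (both assumed already normalized)."""
    sample_total = sum(sample_controls)
    if sample_total == 0:
        raise ValueError("sample control spots sum to zero")
    return sum(reference_controls) / sample_total

def rescale_sample(spots, reference_controls, sample_controls):
    """Multiply every spot reactivity of one sample by its rescaling factor.

    `spots` maps a spot name to its reactivity for a single sample.
    """
    factor = rescaling_factor(reference_controls, sample_controls)
    return {name: value * factor for name, value in spots.items()}
```

For example, a sample whose control spots sum to half of the training-set total receives a factor of 2, so all of its spot reactivities are doubled.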
The prediction models were constructed as follows.
1. Data is pre-processed and normalized as described above.
2. The reference data set was decomposed into a vector using the function 'unmatrix' from the package 'gdata' (version 2.18.0).
3. A mixture model is calculated for the vector using the function 'normalmixEM' from the package 'mixtools' (version 1.2.0).
4. A cutoff is then calculated as 3 standard deviations over the mean of the negative signal curve.
5. A Wilcoxon test was performed for each antigen, comparing the positive controls and the negative controls; antigens with p < 0.05 were considered significant.
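Steps 3 and 4 can be illustrated with a small self-contained sketch. The original analysis used 'normalmixEM' in R; the code below is a plain expectation-maximization fit of a two-component Gaussian mixture written for illustration only, with hypothetical function names and an assumed initialization (lower and upper halves of the sorted data). It treats the component with the lower mean as the negative signal curve.

```python
import math
import statistics

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def fit_two_component_mixture(values, iterations=200):
    """EM fit of a two-component 1-D Gaussian mixture.

    Returns two (weight, mean, sd) tuples sorted by mean.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    # Assumed initialization: lower and upper halves of the sorted data.
    comps = [[0.5, statistics.fmean(part), max(statistics.pstdev(part), 1e-6)]
             for part in (ordered[:half], ordered[half:])]
    for _ in range(iterations):
        # E-step: probability that each value belongs to component 0.
        resp = []
        for x in values:
            p0 = comps[0][0] * normal_pdf(x, comps[0][1], comps[0][2])
            p1 = comps[1][0] * normal_pdf(x, comps[1][1], comps[1][2])
            total = p0 + p1
            resp.append(p0 / total if total > 0 else 0.5)
        # M-step: update weight, mean and sd of each component.
        for k, weights in enumerate((resp, [1.0 - r for r in resp])):
            n_k = sum(weights)
            if n_k < 1e-9:
                continue
            mean = sum(w * x for w, x in zip(weights, values)) / n_k
            var = sum(w * (x - mean) ** 2 for w, x in zip(weights, values)) / n_k
            comps[k] = [n_k / len(values), mean, max(math.sqrt(var), 1e-6)]
    return sorted((tuple(c) for c in comps), key=lambda c: c[1])

def negative_cutoff(values, n_sd=3.0):
    """Cutoff = mean of the lower (negative) component + n_sd * its sd."""
    (_, mean, sd), _ = fit_two_component_mixture(values)
    return mean + n_sd * sd
```

Signals above the returned cutoff would be treated as seropositive for the purpose of antigen selection.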
Following the selection of seropositive antigens (which left 7 antigens as seropositive for IgG and 8 for IgM), an optimal predictive combination of these antigens was selected.
The selection was performed as follows:
1. For every possible combination of the seropositive SARS-CoV-2 antigens, from 1 antigen to all of them (7 for IgG and 8 for IgM), the reference set was randomly divided into a training set and a testing set at a 70%/30% ratio.
2. The coordinates of each candidate were compared in order to select the candidate with the highest sensitivity, given a fixed specificity of 1 (100%).
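The search over antigen combinations can be sketched as below. This is only an illustration under stated assumptions: the score for a combination is the mean signal of the selected antigens, used here as a stand-in for the fitted model, and the 70%/30% split is omitted for brevity. All function names are hypothetical.

```python
from itertools import combinations

def all_antigen_combinations(antigens):
    """Yield every non-empty subset, from single antigens to the full set."""
    for size in range(1, len(antigens) + 1):
        yield from combinations(antigens, size)

def sensitivity_at_full_specificity(scores, labels):
    """Sensitivity when the threshold sits at the highest negative score,
    so that no negative is called positive (specificity = 1)."""
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    positives = [s for s, y in zip(scores, labels) if y == 1]
    threshold = max(negatives)
    return sum(s > threshold for s in positives) / len(positives)

def best_combination(samples, labels, antigens):
    """Return (combination, sensitivity) with the highest sensitivity.

    Assumption: the mean signal over the selected antigens stands in
    for the model score; ties keep the first (smallest) combination.
    """
    best = None
    for combo in all_antigen_combinations(antigens):
        scores = [sum(sample[a] for a in combo) / len(combo)
                  for sample in samples]
        sens = sensitivity_at_full_specificity(scores, labels)
        if best is None or sens > best[1]:
            best = (combo, sens)
    return best
```

With 7 antigens this enumerates 127 combinations, and with 8 antigens 255, so an exhaustive search is inexpensive.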
In addition to the logistic regression model, a Random Forest model was constructed using all reactive antigens.
After data normalization, the prediction models constructed as described above are loaded, and reactivity predictions are made with the Random Forest and logistic regression models for the multi-antigen combinations. In addition to the multi-antigen predictions, a prediction for each single SARS-CoV-2 antigen was made for every sample, for both IgG and IgM. These predictions used the threshold given by the optimal Youden index. Every sample can thus be classified as reactive or not reactive for each single SARS-CoV-2 antigen.
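The single-antigen threshold can be illustrated with a minimal sketch of the Youden index, J = sensitivity + specificity - 1, maximized over candidate thresholds. The function name is hypothetical, and a sample is assumed to be called reactive when its signal is greater than or equal to the threshold.

```python
def youden_threshold(scores, labels):
    """Return (threshold, J) for the threshold that maximizes Youden's J.

    `labels` holds 1 for known positives and 0 for known negatives.
    Ties keep the lowest threshold.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    best_threshold, best_j = None, -1.0
    for t in sorted(set(scores)):
        sensitivity = sum(s >= t for s in positives) / len(positives)
        specificity = sum(s < t for s in negatives) / len(negatives)
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_threshold, best_j = t, j
    return best_threshold, best_j
```

A new sample would then be reported as reactive for that antigen when its normalized signal reaches the returned threshold.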
The report phase consists of the output of single pdf files with the individual subject predictions and their interpretation. The first page of the file gives a brief explanation of the array, as well as some information on the performance of the array with the current settings. The first page also carries a short disclaimer on the scope and limitations of the assay.
The second page consists of a table for all the SARS-CoV-2 antigens with their ROC predictions. These predictions are for a qualitative understanding of one's reactivity and may not directly correlate with the multi antigen prediction.
The multi-antigen prediction, or the sample classification into the three reactive groups, is also presented in a short table displaying the predictions for IgG and IgM separately.
The overall sero-reactivity of the sample to all antigens is depicted in two graphs on the second page, one showing the reactivity for IgG and one for IgM.
On each graph, the individual's reactivity is represented as dots with their standard errors.
For reference, a red line representing the mean reactivity of the positive controls with its confidence interval, as well as a blue line representing the mean reactivity of the negative controls with its confidence interval, are also plotted.
The boxes represent the first quartile, median and third quartile, and the whiskers extend 1.5 times the interquartile range (IQR). A Wilcoxon test was performed for pairwise comparisons. The figure shows that antibody responses against common cold antigens are not significantly different between the two populations. A relatively higher reactivity was observed in the UCIMC group for the influenza antigens.

Supplementary Figure 2. General COVAM analysis pipeline
The general analysis pipeline consists of three main steps: preprocessing, normalization, and statistical prediction analysis. The preprocessing includes steps such as calculating the Signal to Noise Ratio (SNR) and determining whether a sample needs to be further checked or re-assayed (due to the background reactivity levels). Once samples have passed the SNR analysis, the control spots are checked to remove outlier spots that could skew normalization. Then, the distribution of the control spots is analyzed, and low-quality samples (for which the control spots deviate from the expected distribution) are flagged to be re-assayed. The samples are then normalized, and the mean fluorescence intensity is calculated from the average of the 3 replicates in the array. After normalization, a machine learning based algorithm is used to classify each sample as reactive or not reactive to SARS-CoV-2 (using multiple antigens) as well as to individual antigens. Finally, individual reports are generated for each sample (these can be in the form of individual pdf files that may be delivered to the subject).

Supplementary Figure 3. Individualized pdf report models.
After the machine learning classification of each sample, individual pdf files containing the results can be generated. The panels in the figure are representative of a typical negative (non-reactive) result (left panel) and of a typical positive (reactive) result (right panel). The data printed on the reports are the basic reactivity classification for the SARS-CoV-2 antigens (only Reactive and Non-reactive denominations are given), as well as the machine learning (multi-antigen) classification. For the multi-antigen classification, the results from the logistic regression and from the random forest, together with the random forest probabilities, are given. The multi-antigen classification is the main result and is the one used to classify an individual as exposed, or reactive, to SARS-CoV-2, as individual antigens alone have a much lower classification performance. Finally, since the COVAM is composed of antigens from multiple viruses, the reactivity to the entire array is given for both IgG and IgM. This reactivity is given as the normalized mean fluorescence intensities; as a reference, the confidence intervals of a known control set of samples (known positives: red line and red bands; known negatives: blue line and blue bands) are also given. Although these reports give a much more comprehensive view of an individual's reactivity status to SARS-CoV-2, they are intended mainly as guidance, as the COVAM array is not approved by the FDA as a diagnostic test.
Scatterplots can be used to compare antibody reactivities of any 2 antigens on the COVAM array.
(A) There are 920 seropositive specimens from Orange County residents. Antibody reactivities against SARS-CoV-2 NP and SARS NP in this population are well correlated (R² = 0.93). Antibodies against NP from SARS-CoV-2 are cross-reactive against the NP from SARS. (B) Antibody reactivities against SARS-CoV-2 S1 and hCoV-229E S1 are not correlated (R² = 0.009), so antibodies against SARS-CoV-2 S1 do not cross-react against S1 from hCoV-229E. The R² value can be used as a metric to determine cross-reactivity between any 2 antigens.
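As an illustration of the metric, the R² between two antigens can be computed as the squared Pearson correlation of their reactivities across specimens. The sketch below assumes that this is the definition used for the scatterplots (equivalent to the R² of a simple linear fit); the function name is hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length signal lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when one variable is constant")
    return (sxy * sxy) / (sxx * syy)
```

Here `x` and `y` would hold the normalized reactivities of the same specimens against the two antigens being compared; values near 1 suggest cross-reactivity and values near 0 suggest none.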