Global Absence and Targeting of Protective Immune States in Severe COVID-19.

While SARS-CoV-2 infection has pleiotropic and systemic effects in some patients, many others experience milder symptoms. We sought a holistic understanding of the severe/mild distinction in COVID-19 pathology, and its origins. We performed a whole-blood preserving single-cell analysis protocol to integrate contributions from all major cell types including neutrophils, monocytes, platelets, lymphocytes and the contents of serum. Patients with mild COVID-19 disease display a coordinated pattern of interferon-stimulated gene (ISG) expression across every cell population and these cells are systemically absent in patients with severe disease. Severe COVID-19 patients also paradoxically produce very high anti-SARS-CoV-2 antibody titers and have lower viral load as compared to mild disease. Examination of the serum from severe patients demonstrates that they uniquely produce antibodies with multiple patterns of specificity against interferon-stimulated cells and that those antibodies functionally block the production of the mild disease-associated ISG-expressing cells. Overzealous and auto-directed antibody responses pit the immune system against itself in many COVID-19 patients and this defines targets for immunotherapies to allow immune systems to provide viral defense.


Patients, participants, severity score, and clinical data collection
Patients admitted to the Hospital of the University of California with known or presumptive COVID-19 were screened within 3 days of hospitalization.Patients, or a designated surrogate, provided informed consent to participate in the study.This study includes a subset of patient enrolled between April 8 and May 1 in the COMET (COVID-19 Multi-immunophenotyping projects for Effective Therapies; https://www.comet-study.org/)study at UCSF.COMET is a prospective study that aims to describe the relationship between specific immunologic assessments and the clinical courses of COVID-19 in hospitalized patients.Healthy donors (Ctrl) were adults with no prior diagnosis of or recent symptoms consistent with COVID-19.This analysis includes samples from participants who provided informed consent directly, via a surrogate, or otherwise in accordance with protocols approved by the regional ethical research boards and the Declaration of Helsinki.For inpatients, clinical data were abstracted from the electronic medical record into standardized case report forms.We used both a severity score at the time of sampling and at the end of hospitalization (Fig S1A  S1.The study is approved by the Institutional Review board: IRB# 20-30497.

Isolation of blood cells and processing for scRNA-seq:
ScRNA-seq was performed on fresh whole blood in order to preserve granulocytes.Briefly, peripheral blood was collected into EDTA tubes (BD, catalog no.366643).Whole blood was prepared by treatment of 500µL of peripheral blood with RBC lysis buffer (Roche, 11-814-389-001) according to manufacturer's procedures.Cells were then counted and 15.000 cells per individual were directly loaded in the Chromium TM Controller for partitioning single cells into nanoliter-scale Gel Bead-In-Emulsions (GEMs) following manufacturer's procedures (10x genomics).Some samples were pooled together (at 15,000 cells/ sample) prior to GEM partitioning.Single Cell 5' reagent kit v5.1 was used for reverse transcription, cDNA amplification and library construction of the gene expression libraries (10x Genomics) following the detailed protocol provided by 10x Genomics.Libraries were sequenced on an Illumina NovaSeq6000 using 28 cycles for R1 and 98 cycles for R2.All samples were encapsulated, and cDNA was generated within 6 hours after blood draw.

Bulk RNASeq library preparation for Genotyping:
RNA was extracted from aliquots of 250K Peripheral Blood Mononuclear Cells (PBMCs) utilizing the ZYMO Research Quick RNA MagBead kit (R2133) on a Thermofisher KingFisher Flex system following manufacturer's procedures.RNA integrity was inspected with Agilent Fragment Analyzer.Ribosomal and hemoglobin depleted total RNA-sequencing library were created using FastSelect (Qiagen cat#: 335377) and Tecan Universal Plus mRNA-Seq (0520-A01) with adaptations for automation of a Beckmen BioMek FXp system.Libraries were subsequently normalized and pooled for Illumina sequencing using a Labcyte Echo 525 system available at the Center for Advanced Technology at UCSF.The pooled libraries were sequenced on an Illumina NovaSeq S4 flow cell lane with paired end 150bp reads.
Data pre-processing of 10x Genomics Chromium scRNA-seq data: Sequencer-obtained bcl files were demultiplexed into individual sample the mkfastqs command on the Cellranger 3.0.2suite of tools (https://support.10xgenomics.com).Feature-barcode matrices were obtained for each sample by aligning the raw fastqs to GRCh38 reference genome (annotated with Ensembl v85) using the Cellranger count.Raw feature-barcode matrices were loaded into Seurat 3.1.5(2)and genes with fewer than 3 UMIs were dropped from the analyses.Matrices were further filtered to remove events with greater than 20% percent mitochondrial content, events with greater than 50% ribosomal content, or events with fewer than 100 total genes.The cell cycle state of each cell was assessed using a published set of genes associated with various stages of human mitosis (3).

Inter-sample doublet detection
Libraries containing samples pooled prior to loading were processed using Freemuxlet (https://github.com/statgen/popscle), the genotype-free version of Demuxlet(4), to identify clusters of cells belonging to the same patient via SNP concordance.Briefly, the aligned reads from Cellranger were filtered to retain reads overlapping a high-quality list of SNPs obtained from the 1000 Genomes Consortium (1KG) (5).Freemuxlet was run on this filtered bam using the 1KG vcf file as a reference, the input number of samples/pool as a guideline for clustering groups of cells by SNP concordance, and all other default parameters.Cells are classified as singlets arising from a single library, doublets arising from two or more libraries, or as ambiguous cells that cannot be accurately assigned to any existing cluster (due to a lack of sufficient genetic information).Clusters of cells belonging to a unique sample were mapped to patients using their individual Freemuxlet-generated genotype, and ground truth genotypes per patient identified via bulk RNASeq.The pairwise discordance between inferred and ground-truth genotypes was assessed using the bcftools gtcheck command (6).Ambiguous, and doublet events were filtered from the major analysis (see platelet-first analysis).

Data quality control and Normalization:
The filtered count matrices were normalized, and variance stabilized using negative binomial regression via the scTransform method offered by Seurat (7).The effects of mitochondrial content, ribosomal content, and cell cycle state were regressed out of the normalized data to prevent any confounding signal.The normalized matrices were reduced to a lower dimension using Principal Component Analyses (PCA) and the first 30 principal coordinates per sample were subjected to a non-linear dimensionality reduction using Uniform Manifold Approximation and Projection (UMAP).Clusters of cells sharing similar transcriptomic signal were identified using the Louvain algorithm, and clustering resolutions varied between 0. 6 and 1.2 based on the number and variety of cells obtained in the datasets.Clusters were loosely grouped into major cell types (T/NK, B/Plasma, mononuclear phagocytes, Neutrophil, Platelet, and Erythrocytes) using a curated list of 5 genes per cell type (5-gene signature) and the Seurat AddModuleScore function.Briefly, genes in the library are binned into one of 12 bins based on average expression in the dataset.The average expression of the genes in each signature are compared to a background list of randomly selected from the bins and used to generate a score per cell for each signature.

Intra-sample heterotypic doublet detection
All libraries were further processed to identify heterotypic doublets arising from the 10X sample loading.Processed, annotated Seurat objects were processed using the DoubletFinder package (8).Briefly, the cells from the object are modified to generate artificial duplicates, and true doublets in the dataset are identified based on similarity to the artificial doublets in the modified gene space.The prior doublet rate per library was approximated using the information provided in the 10x knowledgebase (https://kb.10xgenomics.com/hc/en-us/articles/360001378811)and this was corrected to account for homotypic doublets using the per-cluster numbers in each dataset.Heterotypic doublets were removed from the major analysis (see platelet-first analysis).

Data integration and Batch correction:
The individual processed objects per library were filtered to remove Erythrocyte contamination.
The raw and log-normalized counts per library were then pruned to retain only genes shared by all libraries.Pruned counts matrices were merged into a single Seurat object and the batch (or library) of origin was stored in the metadata of the object.The log-normalized counts were reduced to a lower dimension using PCA and the individual libraries were aligned in the shared PCA space in a batch-aware manner (Each individual library was considered a batch) using the Harmony algorithm (9).The resulting Harmony components were used to generate a batch corrected UMAP, and to identify clusters of transcriptionally similar cells.Clusters were broadly labeled based on the 5-gene signature (Fig S1 ) using a modified, bootstrapped version of the Seurat AddModuleScore to account for the numerous sequencing batches in our dataset.The modified function ran the AddModuleScore on random subsets of the data (subsampling rate = 0.6) 10 times and averaged the score to provide a stable score per signature.A Seurat object was generated for each broad cell type containing clusters scoring highly for that cell type.Each broad cell-type object was subjected to the same harmony analysis to generate batch-aware log normalized counts that were used for visualization and subtype identification.To visualize the effect of harmonizing our single cell data, we identified the library diversity in the neighborhood of every cell on the plot.The neighborhood of a cell is defined as the collection of n nearest neighbors in the UMAP space (where n = sqrt(total cells)), pruned to retain cells lying within the 90th percentile of all calculated neighbor distances.The diversity is the set of all libraries represented within the neighborhood.

Differential expression tests and cluster marker genes, cluster annotation and volcano plot
Differential gene expression (DGE) tests were performed on log-normalized gene counts using the Poisson test (with a latent batch variable to account for multiple library preparations) provided by the FindMarkers/FindAllMarkers functions in Seurat.Genes with > 0.35 log-fold changes, an adjusted p value of 0.05 (based on Bonferroni correction), at least 25% expressed in tested groups, were regarded as significantly differentially expressed genes (DEGs).Cluster marker genes were identified by applying the DE tests for upregulated genes between cells in one cluster to all other clusters in the dataset.Top ranked genes (by log-fold changes) from each cluster of interest were extracted for further illustration.The exact number and definition of samples used in the analysis are specified in the legend of Figure 1, 2 and 3 and summarized in Table S1.The neutrophils, mononuclear phagocytic cells, T cells, B cells and platelets subtypes were identified by comparing cluster marker genes with public sources referenced in the text.The R package EnrichR were used to generate volcano plot from differential gene expression using FindMarkers function in Seurat.

Platelet First scSeq Analysis
To identify the differential coagulation of platelets, we reintroduced heterotypic doublets to each library and filtered them to extract cells expressing at least 1 UMI of PF4 or PPBP, both plateletspecific marker genes.The raw and log-normalized counts per library were integrated using Harmony and processed as above.Broad cell types were identified using the score generated with the bootstrapped AddModuleScore and the per-sample rate of platelet aggregation with each cell types was inferred to be the fractions of cell counts in this dataset to the fractions of cell counts in the overall analysis.Significance testing was conducted using a non-parametric Kruskal-Wallis test with multiple comparisons.

Monocle analysis
Raw counts from the Individual cell-specific were used to create a monocle3 (10-12) cell_data_set object, and the batch-corrected PCA and UMAP embeddings were imported directly from the Seurat object.Each cell-specific trajectory was inferred by reverse embedding the UMAP coordinates using the DDRTree method.The root cell states for the trajectory in monocytes and neutrophils were chosen based on literature, and for platelet cell based on the signature list defined in Figure S1.Relative pseudotime was obtained through a linear transformation relative to the cells with the lowest and highest pseudotimes ( (1-min_pseudotime)/max_pseudotime ).

Generation of gene expression scores
ISG and Degranulation scores were generated by taking the mean of log-normalized expression for a particular set of genes related to the specific pathway or phenotype.The following gene lists were used to generate the scores-ISG: MT2A, ISG15, LY6E, IFIT1, IFIT2, IFIT3, IFITM1, IFITM3, IFI44L, IFI6, MX1, IFI27; Degranulation: 486 genes from Neutrophils degranulation GO term #GO:0043312.To visualize the distribution of these scores across cells, we binarized the distribution of the score at the 75th percentile and overlaid on the calculated UMAP coordinates.

Correlation plots and heatmap visualization
Correlation coefficients used in variable against variable comparisons were calculated using Spearman's method to avoid assumption of linearity.Significance testing of correlation was performed with the following tests: Spearman's for continuous v. continuous, Kruskal-Wallis or Wilcoxon rank sum test for categorical v. continuous depending on number of categories and Fisher's exact for categorical v. categorical comparisons.Unless otherwise specified, variables on both axes were hierarchically clustered based on the distance matrix computed from the correlation coefficient.

Embedding a low-dimensional representation of patients using PhEMD:
PhEMD was employed to generate a three-dimensional embedding of patients based on their immune cell profiles (13).Briefly, PhEMD first generates a reference map of cell subtypes, then uses Earth Mover's Distance (EMD) to compute pairwise dissimilarities between patients (incorporating patient-to-patient differences in cell fractions of each cell subtype as well as intrinsic dissimilarities between subtypes based on the cell subtype reference map), and finally applies a dimensionality reduction technique to the patient-to-patient distance matrix to generate a final embedding of patients.The Seurat implementation of 3D Uniform Manifold Approximation and Projection (UMAP) was used to map the cell-subtype space using the Harmony batch-corrected components as input and a "min.dist"parameter of 0.4, and cell subtypes (i.e., clusters) were defined as described in the "Data quality control and Normalization" section of Methods (2,9).Dissimilarity between each pair of cell subtypes was defined as the distance between the centroids (in UMAP space) of all cells assigned to the two respective subtypes.PHATE was applied to the EMD patient-to-patient distance matrix to generate the final 3D embedding of patients (14).

Luminex Assay for Antibody Titer
Highly immunogenic linear regions of the SARS-CoV-2 proteome were isolated by ReScan and conjugated to Luminex beads as previously described (15).Briefly, high concentration T7 phage stocks displaying immunodominant epitopes of the S, N and ORF3a proteins were propagated and grown to high (>10 11 PFU/mL) titer then were each conjugated to unique bead IDs according to manufacturer's Antibody Coupling Kit instructions (Luminex).Whole N protein (RayBiotech) beads were conjugated similarly using manufacturer instructions with 5µg of protein per 1 million beads.
For other whole protein Luminex-based beads, MagPlex-Avidin Microspheres (Luminex) were coated with either the S protein RBD (residues 328-533) or the trimeric S protein ectodomain (residues 1-1213).All beads were blocked overnight before use and pooled on day of use.2000-2500 beads per ID were pooled per incubation with patient serum at a final dilution of 1:500 for 1 hour, washed, then stained with an anti-IgG pre-conjugated to phycoerythrin (Thermo Scientific, #12-4998-82) for 30 minutes at 1:2000.Primary incubations were done in PBST supplemented with 2% nonfat milk and secondary incubations were done in PBST.Beads were processed in 96 well format and analyzed on a Luminex LX 200 cytometer.Median Fluorescence Intensity from each set of beads within each bead ID were retrieved directly from the LX200 and log transformed after normalizing to the mean signal across two intra-assay negative controls (glial fibrillary acidic protein (GFAP) and Tubulin phage peptide conjugated beads).

Luminex Assay for Serum Cytokines
Soluble proteins were quantified in EDTA anticoagulated plasma using the Luminex® multiplex platform (Luminex, Austin TX) with custom-developed reagents (R&D Systems, Minneapolis, MN), as described in detail ((16) or single-plex ELISA (R&D Systems, Minneapolis, MN).Analytes quantified using the Luminex® multiplex platform were read on the MAGPIX® instrument and raw data were analyzed using the xPONENT® software.Analytes quantified using single-plex ELISA were read using optical density.Values outside the lower limit of quantification were assigned a value of 1/3 of the lower limit of the standard curve for analytes quantified by Luminex and 1/2 of the lower limit of the standard curve for analytes quantified by ELISA.

O-link Assay for Serum Factors
Circulating proteins were measured in plasma using the multiplexed Proximity Extension Assay (PEA) from Olink Proteomics AB (Uppsala, Sweden).20 μL each of plasma collected from the COMET patient cohort (21 COVID-19 positive, 13 COVID-19 negative, and 14 healthy individuals) were analyzed using the Olink® Target 96 Inflammation panel, which is a set of 92 inflammationrelated protein biomarkers.Plasma for all samples regardless of COVID-19 status were inactivated using a final concentration of 1% (v/v) Triton-X-100 solution over 2 hours.Data from the analyzed protein biomarkers is presented as Normalized Protein eXpression (NPX) values, an arbitrary unit on a log2 scale.

ELISA Method for Serum IFNa Measure
IFN-α levels were quantified from serum by an ELISA (catalog numbers 41115 for IFN-α; PBL Assay Science).ELISA was performed according to the manufacturer's instructions with minor modifications.Briefly, an 8-point standard curve was prepared and diluted in sample buffer.Serum was also diluted by adding 80-90 ul serum to 30 ul sample buffer (depending on availability of serum).Samples were prepared in duplicates, whereas standards were prepared in triplicates.Initial incubation was performed for 20 hours at 4C. Antibody was added and incubated at 4C overnight.HRP was added and incubated 1hr at room temperature, TMB substrate was added for 30min and incubated in the dark at room temp, stop solution was added and samples were read using a SpectraMax M2 Microplate Reader (Molecular Devices) at 450nm.For analysis, a 4parameter logistic fit was applied to OD values of the standards after background subtraction.Samples with ODs below blank samples were considered as 0 pg/ml IFN-α.

PBMC co-culture experiment with patient serum and flow cytometry analysis
PBMCs were isolated from EDTA-anticoagulated whole blood from healthy donors using Polymorphprep (Alere Technologies), and resuspended in culture medium (RPMI 1640 + 10% FBS).

.Supplementary Figure 7 .
Patient symptoms plot: symptom at day of sampling (first day of admission to the hospital) is represented in black, while symptom based on the entire course of hospitalization is in green.In the rest of the manuscript we categorized patient into mild or severe cases based on all the entire course of hospitalization (green).B. Quantification of the batch effect using neighbor diversity score in the global object UMAP before (left) and after (middle) batch correction, along with the neutrophil (right) UMAP plot, as in Fig1B and Fig1D, using the diversity in neighborhood method.C. Dotplot representation of landmark genes expressed by global populations in Fig1B.D. Spearman's correlation comparison between between disease severity and population frequencies calculated from 10X scRNAseq analyses (10X) or complete blood cell counts (CBC).Patients for which CBC counts were unavailable were excluded.Significance was calculated using Spearman's method.* p value<0.05; ** p value <0.05; *** p value<0.005 (n=29) E. Frequency of the global populations in Fig1B among all cells across SARS-CoV-2 status.F and G. Frequencies of the neutrophil subsets among all neutrophils across control, mild/moderate (M/M) and severe individuals at the overall start/late states of the trajectories (F) or at specific early stages of the pseudotime (G).H. Frequency of the LCN2, S100A12, RIBO., NEAT1, G0S2 and SLPI neutrophils among all neutrophils across SARS-CoV-2 status and disease severity.I to L. Volcano plots showing DEG (I and K) and bar plots showing GO term enrichment from these DEG (J and L) between all neutrophils from either SARS-CoV-2 positive vs negative patients (I and J) or mild/moderate vs severe patients (K and L) M to Q. Scores of ISG signature (M to O) and neutrophil degranulation (P and Q) in either all neutrophils across control, mild/moderate an severe patients (P), all neutrophils across SARS-CoV-2 status and disease severity (M and Q) or specific neutrophil subtypes across severity in either all patients (N) or only SARS-CoV-2 negative patients (O).Statistical significance was assessed using a two-way ANOVA test with multiple comparisons for panels E, F and H, and using a Wilcoxon test for panel M. * p.value < 0.05; ** p.value < 0.01; *** p.value < 0.001; **** p.value < 0.0001.Supplementary Figure 2: A. Quantification of the batch effect before and after batch correction using neighbor diversity score in the mononuclear phagocytic cells (MPC) object from UMAP plot in Fig2C, using the diversity in neighborhood method.B. Violin plot of number of unique genes (bottom) and number of unique molecules (top) detected from Single cell sequencing for each cluster identified in the MPC dataset.C. Overlay of previously described blood mononuclear phagocytic cell signature from healthy individual on MPC from UMAP plot in Fig2C.D. Violin plots of canonical genes previously described as expressed by blood MPC for each for each cluster identified in the MPC dataset.E. Frequencies of the MPC subsets among all MPC across control, SARS-CoV-2 negative and SARS-CoV-2 positive individuals.F. UMAP visualization of the 19,289 MPC colored (left) and split by (right) by disease severity.G. Frequencies of the classical monocytes, cycling monocytes, non-classical monocytes and C1Q+ non classical monocytes among all MPC across SARS-CoV-2 negative and SARS-CoV-2 positive individuals split it by disease severity.Statistical significance was assessed using a two-way ANOVA test with multiple comparisons.* p.value < 0.05; ** p.value < 0.01; *** p.value < 0.001; **** p.value < 0.0001.Supplementary Figure 4: A. Dotplot representation of the top DEG between clusters identified in the T and NK cell subset.B. UMAP visualization of 16,708 T and NK cells in the entire dataset showing various subsets, colored distinctly by their identity.C. Overlay of the above UMAP of all T and NK cells, colored by disease severity underlining the lack of batch effects while merging the datasets from all patients.D. Abundance of the Interferon-stimulated-gene (ISG) + subset among all T and NK cells in healthy donors, SARS-CoV-2 negative and SARS-CoV-2 positive patients (top) and in healthy donors and patients with mild/moderate (M/M) and severe disease (bottom).E. ISG signature score between healthy controls, SARS-CoV-2 negative and SARS-CoV-2 positive patients.F. Dotplot representation of the top DEG between clusters identified in the B and plasma cell subset.G. UMAP visualization of 4,380 B and plasma cells isolated from the entire dataset showing various subsets, colored distinctly by their identity.H. Violin plots of canonical genes previously described as expressed by B and plasma cells for each identified cluster.I. Frequencies of the identified clusters among all B and plasma cells in healthy donors and patients with mild/moderate and severe disease.Differences in D. and E. were calculated using Kruskal-Wallis non-parametric ANOVA with multiple comparisons.* p <0.05 and **** p< 0.001.Differences in I. were calculated using a two-way ANOVA test with multiple comparisons.* p.value < 0.05 and **** p.value < 0.0001.ns, non-significant.Quantification of Serum Staining and ISG Generation in the Presence of Patient Serum. A. Gating strategy for PBMCs to identify different subpopulations.B. Geometric MFI of serum staining on CD14+ monocytes treated with IFNa, quantifying data in Figure 5A.C. Geometric MFI of serum staining of CD3+ and CD19+ lymphocytes, following treatment with IFNa, quantifying data in Figure 5A.D. Modulation of intermediate to classical CD14 monocytes transition by mild/moderate (orange) and severe (red) patient serum.Each plot represents a single serum sample.Representative experiment from three independent trials and two different healthy PBMC donors.E. Histograms CD3+ CD19+ lymphocytes from healthy donor cultured with IFNa and serum from heathy donor (blue), mild/moderate (orange) and severe (red) SARS-CoV-2 positive patients.Mild/Moderate (light yellow) or Severe (pink) sera were pre-treated with protein G/A before incubation with PBMC.Each plot represents a single serum sample.Representative experiment from two independent trials and two different healthy PBMC donors.F. Contour plots and histograms of CD14+ monocytes from healthy donor co-cultured with SARS-COV-2 virus for 48h at 0.1 (pink) and 1 (red) MOI.Differences in B. and C. were calculated using a two-way ANOVA test with multiple comparisons.* p.value < 0.05; ** p.value < 0.01; *** p.value <0.001; **** p.value < 0.0001.ns, non-significant.