Meta-analysis of COVID-19 single-cell studies confirms eight key immune responses

Several single-cell RNA sequencing (scRNA-seq) studies analyzing immune response to COVID-19 infection have been recently published. Most of these studies have small sample sizes, which limits the conclusions that can be made with high confidence. By re-analyzing these data in a standardized manner, we validated 8 of the 20 published results across multiple datasets. In particular, we found a consistent decrease in T-cells with increasing COVID-19 infection severity, upregulation of type I Interferon signal pathways, presence of expanded B-cell clones in COVID-19 patients but no consistent trend in T-cell clonal expansion. Overall, our results show that the conclusions drawn from scRNA-seq data analysis of small cohorts of COVID-19 patients need to be treated with some caution.


Results and discussion
For the presented meta-analysis, all COVID-19 scRNA-seq available by 9th Oct 2020 were considered (Supplementary Table S1), while the 9 datasets we could get access to and were extracted from peripheral mononuclear cells (PBMC) or bronchoalveolar lavage fluid (BALF) or nasopharyngeal/bronchial (NB) tissues were included in this study. These 9 scRNA-seq datasets are summarised in Table 1, which also gives each dataset a name and a healthy control dataset. In total, the studies comprise 159 samples and 862,354 cells across 9 different disease conditions. We map the disease stages to standardised terms Healthy, Mild, Moderate, Severe, Post Mild, Convalescent, Late recovery, Asymptomatic and Influenza (Supplementary Table S2). Although such mapping may introduce a certain level of noise, it is essential for meta-analysis (Fig. 1), and we have taken care to consider the detailed descriptions in the respective publications (see Supplementary Data 1). Visualisation of the combined data ( Fig. 2a) reveals strong batch-effects, however after the data integration with Harmony 8 , the cells from different studies largely clustered by the underlying biology (Fig. 2b), and moreover, with minor exceptions, the healthy samples are well-mixed (Fig. 2c).   By integrating the datasets, we are able to refine the annotation of cell subpopulations. For example, monocyte cells have been discussed in a previous publication 11 , however we are able to refine this type to include four CD16 + monocytes subpopulations and two CD14 + monocytes subpopulations ( Supplementary Fig. S5). These cell clusters correlate well with the cell populations reported by Schulte-Schrepping et al. 11 . Also, neutrophils were not annotated in all the reported datasets 4 . However, after annotating neutrophils according to the canonical markers (FCGR3B and CXCR2), Supplementary Fig. S6, we find that they exist in all disease samples (but considerably less in the healthy controls). The resulting cell clusters were further refined using SCCAF 12 all the 5 major cell populations achieved self-projection accuracies above 92% in the Harmony latent space (Supplementary Table S3). The final result includes 37 cell subpopulations, excluding doublets (Fig. 2e, Supplementary  Fig. S7). A more detailed description of these subpopulations is given in the Supplementary Material Note 1.

Scientific Reports
After obtaining consistent cell-type annotations across the datasets, we moved on to validating the published results. Specifically, we analysed (1) the cell-type proportion changes captured in different experiments to infer the immune response upon SARS-CoV-2 infection; (2) the gene expression regulation and pathway activation in COVID-19 patients; (3) the clonal expansion in T cells and B cells. We could reproduce 16 out of 20 (80%) published results in the original datasets, while only 8 out of 20 (40%) results across the datasets ( Table 2). More detail and possible explanations for the limited reproducibility in each case are given in Supplementary Table S4.
We suspect the main reasons for non-reproducibility in the study's own datasets to be the heterogeneity in data processing and analyzing pipelines along with differences in cell-annotation. Non-reproducibility in other datasets could be due to heterogeneity among datasets collected from different small cohorts of COVID-19 patients along with the differences in stage mapping. We also called a result "reproducible" only if it was observed in all the datasets considered for comparison, even though the dataset was collected from a different tissue. This stringency might have also reduced the number of results we considered as reproducible. For more detailed explanations, please refer to the Supplementary Table S4. TNF: tumor necrosis factor; IL: interleukin; HLA: human leukocyte antigen; CD8eff T-cell: CD8 + effector T-cell; Supp: Supplementary; ISG: Interferon stimulated genes.
In addition, Fig. 3a gives an overview of the cell populations' proportions across all the stages (PBMC and BALF specific information given in Supplementary Fig. S8a,b, respectively). We observe that 4 datasets (Lee, Zhang, Wilk and Chua) show a similar proportion of T cells in healthy donors, which is around 30-40%, while the Table 2. Summary of reproducible results across multiple single-cell COVID-19 studies analyzed in this metaanalysis.

Study
Key finding(s) in COVID-19 patients compared to healthy controls Reproduced in the original dataset Reproduced in all relevant datasets Wen et al. 3 Decreased CD4 + and CD8 + T-cells (data not shown) ✓ ✓   www.nature.com/scientificreports/ 10× healthy PBMC reference shows ~ 60% T cells and the Liao dataset of BALF shows only ~ 10% T cells (Fig. 3b). Due to the lack of bronchoalveolar lavage fluid (BALF)-derived healthy samples in the He dataset, we cannot confirm whether this is expected in healthy BALF samples or if it corresponds to the general individual-specific variability, for instance, as explored by Wong et al. 13 . However, we can still find a decrease of T cell proportion from moderate to severe stages in the Liao dataset. An overview could be found in Fig. 3c and Supplementary  Fig. S8c,d. The percentages of monocytes and neutrophils relatively increase from healthy to severe stages. Similar to the T cell proportion distribution, the Lee, Zhang, Wilk, Chua and 10× datasets show ~ 20% monocytes in the healthy samples (Fig. 3d). In the Liao data this percentage is around 80% and it still shows an increase in monocytes from moderate to severe stage. If we use 20% as a reference value for monocytes in healthy controls, monocytes further increase from healthy to severe patients in the Liao dataset. For neutrophils, the 6 datasets consistently illustrate an increasing trend from healthy to severe, though the levels of increase can vary (Fig. 3e). In the He and Chua dataset, some of the samples include 40-70% neutrophils. As mentioned in Table 2 (and  Supplementary Table S4), we observed an increase in the proportion of plasma cells from healthy to severe patients ( Fig. 3f-g, Supplementary Fig. S8e,f, S9h) with no clear pattern in memory B and näive B cells (Fig. 3g, Supplementary Figs. S8e,f, S9b,c).
Although dendritic cells take up only a small portion of the population (~ 2% in healthy PBMCs), they decrease over the disease stages in Lee, Zhang, Chua and Liao datasets (Supplementary Fig. S9e). Plasmacytoid dendritic cells are even fewer than dendritic cells, the Zhang and Wilk datasets show clear increasing trends over the disease stages ( Supplementary Fig. S9f). This conclusion however cannot be derived from other datasets ( Supplementary Fig. S9f).
Furthermore, while looking at the response of gene regulatory pathways to COVID-19 infection, we observed that the log fold changes of the differentially expressed (DE) genes (False Detection Rate < 0.01) between severe and healthy samples correlate with that between moderate and healthy. This indicates that the genes upregulated and downregulated in moderate and severe samples compared to the healthy controls correlate ( Supplementary  Fig. S10b,c). This correlation happens in most of the cell types in all studies (including Zhang, Wilk, Liao and Chua). We also found that the correlation between mild/healthy (Lee dataset) and severe/healthy is lower than moderate/healthy and severe/healthy; nevertheless, this correlation too is positive in all the cell types.
We also identified the genes upregulated in both moderate and severe samples for each study and checked the overlaps between different studies. In CD14 + monocytes, the pathway analysis of 153 consistently upregulated genes across all the three studies ( Supplementary Fig. S10d) show overrepresentation in immune response, response to virus and type I interferon signalling pathway (Supplementary Fig. S10e). Although the upregulation of type I interferon signalling pathway in monocytes has been reported by Wilk et al. 2 , the upregulation is observed mostly in CD14 + monocytes and not in CD16 + monocytes. We find the upregulation of type I interferon signalling pathway (specifically, IFIT, IFI, ISG and OAS genes) to be consistent for many of the cell types, including T cells (CD4 + T, CD8 + T and γδ T), NK cells, DC, pDC, B cells, Plasma cells and Neutrophils (Supplementary Next, we analyzed the T-cell receptor (TCR) repertoire data from Wen et al. 3 , Zhang et al. 4 and Liao et al. 10 where the samples from the first two studies were derived from PBMC, while those from the third one were derived from BALF. Among these, the conditions healthy and convalescent were present in both the PBMC datasets, while the moderate and severe stages were present in the Zhang 4 and Liao 10 . We first pooled T-cell and NK cell data from all the studies together and visualized them using UMAP ( Supplementary Fig. S6: "Lymphoid cells", Supplementary Fig. S15a). As expected, NK-cells mostly had no corresponding TCR data. For the cells with TCR data, we observed clonal expansion of CD8m T-cells, particularly the subpopulations CD8m T(GZMH) and CD8m T(GZMK) and Vγ9Vẟ2 T-cells ( Supplementary Fig. S15a), however, the expansion did not appear to be stage-specific (Supplementary Fig. S15a). We also observed an over-representation of CD8 + effector T-cells in severe patients ( Supplementary Fig. S15a). These observations were confirmed in the bar plots where samples from all the datasets at a specific stage were pooled and normalized together according to the total number of cells at each stage ( Supplementary Fig. S15b). This is also consistent with Zhang et al. 14 . To further check if these observations were consistent across studies, we analyzed the overall clonal expansion trend of T-cells. Contrary to previously published results 3,4 , we didn't observe a consistent trend of increased clonal expansion of T-cells in COVID-19 patients compared to healthy controls ( Supplementary Fig. S15c, Table 2, Supplementary Table S4). Further details of the clonal expansion status of specific T-cell subpopulations are given in Supplementary Material Note 3.1.
Upon visualizing the B-cell receptor (BCR) repertoire data from Wen et al. 3 and Zhang et al. 4 using the UMAP representation, we noticed the presence of clonally expanded plasma cells, mostly in COVID-19 patients (Supplementary Fig. S17a). This observation was confirmed when cells from all the studies at each stage were taken together and normalized by the total number of cells present at each stage ( Supplementary Fig. S18a). Increased clonal expansion of Naïve B(EEF1G) and Memory B(EEF1G) cell subpopulations was also observed in severe COVID-19 patients compared to other B-cell subpopulations in severe patients (Supplementary Fig. S18a). We further analyzed whether the previously reported observations 3,4 of increased B-cell clonal expansion in COVID-19 patients compared to healthy controls can be confirmed across studies and found that this is indeed consistent (Supplementary Fig. S17a). More details about the clonal expansion status of B-cell subpopulations across multiple studies is given in Supplementary Material Note 3.2.
Finally, we checked whether we could validate the results of the recently published Ren et al. 15 in these 9 datasets and found that 3 out of 10 observations are indeed reproducible in most of the datasets, including the  Table S4). Since we did not re-analyze the Ren et al. 15 dataset in a standardized manner with others, there is also a difference in the cell-type annotations complicating any one-to-one comparisons to Ren et al. 15 dataset (Supplementary Fig. S19b).

Conclusions
Although our meta-analysis is able to address some issues arising from limited sample size and lack of standardization in data collection, pre-processing, cell-type annotation and analysis, it has obvious limitations. A part of the reason why we were not always able to confirm all the published conclusions in the author's own dataset may be due to the remapping of the scRNA-seq reads to different genome builds and using different methods. For instance, we applied an additional cut-off of 100 cells per sample for each T-cell subpopulation, which was different in the analysis of Liao et al. 10 (Supplementary Fig. S16a). Nevertheless, the discrepancies that we found, still indicate that the conclusions from early scRNA-seq studies of COVID-19 patients may not always be robust and need to be validated before fully relied upon. The explanation of even larger difficulties in validating most conclusions across datasets, may be the result of inconsistent mapping between the disease stages in different studies and differences in protocols used. The samples used in different studies were collected from patients at different stages and were not annotated on a standardized scale (e.g. the WHO scale 16 ). Therefore, we had to make various assumptions, in particular, we assumed that the healthy, severe, early recovery (convalescent) and asymptomatic stages are comparable. Also, four datasets had their sampling day after symptom onset information missing in the source publications 1,3,6,7 (Supplementary Data 1) which could have contributed to additional heterogeneity between samples across datasets. Other limitations are mentioned in Supplementary Material Note 4. Overall, our results show that the conclusions drawn from scRNA-seq data analysis of small cohorts need to be treated with some caution. Liao dataset. For the Liao dataset 10 , the Cell Ranger mapped results together with the TCR results were downloaded from GEO 19 under the accession number GSE145926. The downloaded scRNA-seq data includes 33,538 features, which is the same as the reference genome used in the above mentioned datasets. The cell type annotation of the dataset was downloaded from GitHub [https:// raw. githu buser conte nt. com/ zhang zlab/ covid_ balf/ master/ all. cell. annot ation. meta. txt]. The unannotated cells were considered as low quality and were excluded from analysis.

Methods
Lee dataset. The Lee dataset 1 was downloaded from GEO under the accession number GSE149689. The expression matrix also includes 33,538 features, which is the same as the reference genome used in the above-mentioned datasets.
Yu dataset. We downloaded the raw fastq files for the Yu dataset 6 from the GSA (Genome Sequence Archive) database under the accession number PRJCA002579. Reads mapping was performed in the same way as on the Wen dataset.  All the datasets are combined together for quality control and downstream analyses. We filtered the cells with higher than 15% of mitochondrial contents. We also excluded cells of fewer than 500 UMIs or fewer than 200 features. Features expressed in fewer than 3 cells were removed in the analysis. We used Scrublet (version 0.2.1) 21 to determine the doublets in the datasets. As Scrublet was designed to deal with 10× data, the result on the Wilk dataset was not used. Cell clusters represented by doublets were removed from the analysis ( Supplementary  Fig. S20). The cell numbers after quality control were listed in Table 1.
Data integration. SCANPY 22 workflow was used to analyze the data. The following steps were performed: data normalization, log-transformation, highly variable genes selection using the 'cellranger' flavor and principal component analysis. Then, we used Harmony (version 0.0.5) 8 to integrate data from different samples. UMAP 23 and Louvain 24 clustering were calculated according to the Harmony corrected latent space.
Cell type annotation. For the eight datasets of the same reference genome (Wen, Liao, Lee, Yu and Jiang datasets) we combined the dataset to annotate all the cell types. To obtain good cellular annotations, we used two approaches: a machine learning-based cell type annotation approach using a reference dataset and manual annotation with previously reported marker genes (as summarized in Supplementary Data 2). Specifically, we first used logistic regression in SCCAF 12 to train a machine learning model using the possible cell type labels according to a reference dataset (the Wilk dataset) 2 . Then, each Louvain cluster was assigned to a cell type label according to the machine learning-annotated label. For the Wilk dataset and Chua dataset, we used the cell type annotations adopted from their publications. For the He dataset, we annotated the cell types according to the marker gene expression. We then checked all these marker genes' expression in the integrated dataset to assure our annotation (Fig. 2f). According to the results, our annotations are highly consistent among the datasets (Supplementary Data 2, Fig. 2b). They are also comparable to the known annotations reported in previous papers (e.g., Differential expression and Gene Ontology analysis. To account for the technical effects such as number of cells or sequencing depth, the hurdle model in MAST 25 was used to model the differential expression of the cells. Only the genes with a false detection rate (FDR) lower than 0.01 was used for volcano plot and later gene ontology analysis.
In the MAST results, the genes with a log fold change value greater than 0 are considered as up-regulated genes, while the rest are the down-regulated genes. And the up-regulated genes in a cell cluster are used for gene ontology analysis. The python package of GProfiler 26 was used to understand the pathway regulation. The gene module scores for HLA class II and ISG signature were calculated for each individual cell using sc.tl.score_genes function of scanpy 22 v1.5.1.
TCR/BCR analysis. The TCR and BCR V(D)J data from the studies Wen et al. 3 , Zhang et al. 4 and Liao et al. 10 (TCR only) were analyzed separately using the python library pyvdj 27 v0.1.2. For TCR V(D)J data, only cells with at least one productive TRA and at least one productive TRB chain were considered for analysis. Similarly, for BCR V(D)J data, only cells with at least one productive IGH and at least one productive IGL or IGK chain were considered for analysis. The UMAP plots were generated using scanpy 22 v1.5.1. The boxplots and barplots were generated using R-package ggplot2 28 v3.3.2, ggpubr 29 v0.4.0.999, rstatix 29,30 v0.6.0.999 and tidyverse 31 v1.3.0.

Data availability
For the studies included in the current manuscript, the raw sequencing data are available in the European Genome-phenome Archive (EGAS00001004481 for Chua et al. 5 ), GEO (GSE149689 for Lee et al. 1 , GSE150728 for Wilk et al., GSE145926 for Liao et al. 10 ) and Genome Sequence Archive (PRJCA002413 for Wen et al. 3 , PRJCA002564 for Zhang et al. 4 , PRJCA002579 for Yu et al. 6 , GSE147143 for He et al. 7 ). Details are listed in Table 1. All source data will be provided with this paper. The merged dataset can be visualised interactively through cellxgene in the Human Cell Atlas Galaxy. EU instance 32 , following instructions in the Supplementary Material Note 5.