Single-cell proteo-genomic reference maps of the hematopoietic system enable the purification and massive profiling of precisely defined cell states

Single-cell genomics technology has transformed our understanding of complex cellular systems. However, excessive cost and a lack of strategies for the purification of newly identified cell types impede their functional characterization and large-scale profiling. Here, we have generated high-content single-cell proteo-genomic reference maps of human blood and bone marrow that quantitatively link the expression of up to 197 surface markers to cellular identities and biological processes across all main hematopoietic cell types in healthy aging and leukemia. These reference maps enable the automatic design of cost-effective high-throughput cytometry schemes that outperform state-of-the-art approaches, accurately reflect complex topologies of cellular systems and permit the purification of precisely defined cell states. The systematic integration of cytometry and proteo-genomic data enables the functional capacities of precisely mapped cell states to be measured at the single-cell level. Our study serves as an accessible resource and paves the way for a data-driven era in cytometry.


In order to identify a sparse, yet maximally informative set of marker genes, we followed the approach for target selection in targeted single-cell transcriptomics developed previously (Schraivogel et al., 2020). In short, we determined the genes that were differentially expressed between cell types and used these as input to train a generalized linear model of cell type identity while applying a LASSO regularization in order to select a maximally sparse set of features. Regularization parameters were determined using 10-fold cross-validation. A total of 257 genes were selected with this method. In addition to these genes, we included 83 cell cycle markers (Kowalczyk et al., 2015), 88 genes corresponding to the Abseq antibodies, and 75 genes with high variability in single-cell datasets of AML patients (Velten

To rule out that the targeted assay biases clustering or cell type annotation, we performed whole transcriptome sequencing (WTA) together with profiling of the same 97 antibodies on a sample from a healthy individual (Young3). For this, we processed 14,378 cells using the whole transcriptome protocol for the BD Rhapsody system. On average, we sequenced ~60,000 reads per cell for the RNA layer (i.e. 7x deeper than with the targeted approach) and 18,000 reads per cell for the antibody layer. After normalizing the counts by the library size, the top 3,000 highly variable genes and all 97 antibodies were used as input for the MOFA dimensionality reduction, unsupervised clustering, and UMAP calculations as described in the methods section. Thereby, 34 distinct clusters were identified (Figure N2a). Subsequently, we utilized the label transfer approach from Seurat v3 (Stuart et al., 2019) to predict the cell type identities using the targeted datasets as reference.
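The LASSO-based target selection described at the start of this note can be illustrated with a minimal sketch. It is not the authors' implementation: it simplifies the problem to a single binary contrast (one cell type versus the rest), assumes standardized expression features, and swaps in a plain proximal gradient solver written in pure Python. The function names, the lambda grid and the step size are hypothetical choices made for illustration only.

```python
import math

def soft_threshold(v, t):
    """Proximal operator of the L1 penalty: shrink v toward zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clip to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def fit_l1_logistic(X, y, lam, n_iter=200, step=0.5):
    """Proximal gradient descent for L1-penalized logistic regression.

    X: list of rows (cells x genes, assumed standardized), y: 0/1 labels.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        resid = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - yi
                 for row, yi in zip(X, y)]
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        # Gradient step followed by soft-thresholding yields exact zeros.
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad)]
        b -= step * sum(resid) / n  # the intercept is not penalized
    return w, b

def log_loss(X, y, w, b):
    """Mean negative log-likelihood on held-out cells."""
    eps = 1e-12
    total = 0.0
    for row, yi in zip(X, y):
        p_hat = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row)))
        total -= yi * math.log(p_hat + eps) + (1 - yi) * math.log(1 - p_hat + eps)
    return total / len(X)

def select_genes_cv(X, y, gene_names, lambdas, n_folds=10):
    """Choose lambda by k-fold cross-validation, refit on all cells,
    and return the genes whose coefficient is nonzero."""
    n = len(X)
    folds = [list(range(k, n, n_folds)) for k in range(n_folds)]

    def cv_loss(lam):
        losses = []
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in range(n) if i not in held_out]
            w, b = fit_l1_logistic([X[i] for i in train_idx],
                                   [y[i] for i in train_idx], lam)
            losses.append(log_loss([X[i] for i in test_idx],
                                   [y[i] for i in test_idx], w, b))
        return sum(losses) / len(losses)

    best_lam = min(lambdas, key=cv_loss)
    w, _ = fit_l1_logistic(X, y, best_lam)
    return best_lam, [g for g, wj in zip(gene_names, w) if wj != 0.0]
```

Calling `select_genes_cv(X, y, gene_names, lambdas=[0.3, 0.1, 0.03])` returns the cross-validated penalty and the retained genes. In practice one would more likely rely on an established package for multi-class L1-regularized regression; the sketch only makes the two ingredients named in the text, the sparsity-inducing penalty and the 10-fold cross-validation, explicit.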
We calculated the mutual overlap of the predicted labels with the unsupervised clusters (Figure N2b), and also projected the whole transcriptome data into the original reference space for comparison (Figure N2c, d; see also Supplementary Note 7). In the majority of cases, a 1:1 correspondence between clusters from the targeted approach and clusters from the whole transcriptome approach was observed. Some cell types were not covered in the WTA approach due to low cellular coverage. Together, these data suggest that our targeted panel resolves cell types as well as the WTA approach at strongly reduced costs.

analysis between the unsupervised clusters from the WTA and the identities predicted from the targeted datasets. Mutual overlap is defined as the product between the % of cells from a targeted cluster that form part of a given WTA cluster, and the % of cells from a WTA cluster that form part of a given
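The mutual overlap defined in the legend above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the legend is cut off, so the second factor is assumed to refer to the share of a WTA cluster that falls into a given targeted (predicted) cluster. Fractions in [0, 1] are used instead of percentages, and the function name `mutual_overlap` is hypothetical.

```python
from collections import Counter

def mutual_overlap(targeted_labels, wta_clusters):
    """Mutual overlap for every co-occurring (targeted label, WTA cluster) pair.

    Both inputs hold one entry per cell, in the same cell order.
    Overlap = (fraction of the targeted cluster found in the WTA cluster)
            * (fraction of the WTA cluster found in the targeted cluster).
    Pairs that share no cells are omitted (their overlap is zero).
    """
    pair_counts = Counter(zip(targeted_labels, wta_clusters))
    targeted_sizes = Counter(targeted_labels)
    wta_sizes = Counter(wta_clusters)
    return {
        (t, w): (n / targeted_sizes[t]) * (n / wta_sizes[w])
        for (t, w), n in pair_counts.items()
    }
```

A value of 1 indicates a perfect 1:1 correspondence between a predicted identity and an unsupervised cluster, while values near 0 indicate that the two groupings share few cells relative to their sizes.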

In order to exclude that staining live, primary cells with 97 antibodies affects gene expression, we performed a control experiment. For this purpose, we used a sample from a healthy donor (Young1) and proceeded as described in the Methods section 'Cell sorting for Abseq'. Half of the sample was then incubated with 97 Abseq antibodies, while the other half was left on ice. Finally, the BD Rhapsody protocol was performed as described (see Methods 'Abseq surface labeling, single-cell capture and library preparation'). In order to compare both samples, only the transcriptomic information was considered. We normalized the datasets, performed unsupervised clustering, and annotated the clusters into major cell types from the bone marrow based on canonical markers (Figure N3a

The datasets established in this study were obtained from cryopreserved bone marrow cells. To evaluate

N5i). They can be further characterized by high mRNA expression of S100A8, CSTA and S100A9.

Besides B cells, CD3+ T cells, which have passed through several differentiation stages in the thymus, relocate to the bone marrow. One can generally distinguish between alpha-beta T cells and gamma-delta T cells, which differ intrinsically in the composition of their T cell receptor (TCR).

Alpha-beta T cells can be further separated into CD4+ and CD8+ T cells, which are either CD45+, CD3+,

Besides CD4+ and CD8+ αβ-T cells, γδ-T cells are apparent in our dataset. Within the γδ-T cell cluster, one subset was highly positive for surface CD226, CD26 and CD94 and negative for CD45RA expression (Figure N5g). In the second subset, CD26, CD226 and CD94 expression is absent or only dim, but TCR-γδ (clone B1) is highly expressed (Figure N5g). Interestingly, TCR-γδ antibody clone B1, which is present in the 97 Ab panel, seems to be less efficient in γδ-T cell detection than clone 11F2,

The second CD3+ TCRαβ-negative cluster consists of putative natural killer T cells (NKT cells), which can be characterized by elevated CD69 and CD314 surface expression, accompanied by dim binding of the TCRγδ antibody (clone B1).

Next to CD3+ T cells, two well-known natural killer (NK) cell clusters, namely CD56dim CD16+ and CD56bright CD16- NK cells, are present in our dataset. Both subsets are readily identified via high surface expression of CD45, CD7, CD94 and CD45RA (Figure N5h). CD56bright CD16- NK cells express higher levels of CD56 and CD335, as well as lower levels of CD16 on their surface. In contrast, CD56dim CD16+ NK cells have higher surface levels of CD16, CD127 and CD152. The latter are thought to be more cytotoxic, which is also reflected in the transcriptional differences between the two subsets.

Compared to other clusters in the dataset, cytotoxic mRNAs are generally highly expressed in these two clusters. In addition, a small cluster located in proximity of HSCs and MPPs was particularly interesting, as it expressed mature NK surface markers like CD16, CD56 and CD7 as well as surface markers specific for immature progenitor cells like CD34 and CD133 (Figure N5a and N5h). A similar phenotype was observed at the mRNA level, as these cells expressed both mature NK mRNAs like NKG7 or KLRK1 and stem- and progenitor-specific mRNAs like CRHBP, CD34 and NPR3. We therefore named this cluster NK cell progenitor.

Besides healthy hematopoietic cells, at least one mesenchymal stromal cell (MSC) cluster is present in our dataset (Figure N5k). MSC cluster 1 was characterized by high surface expression of CD10, CD13, CD26 and CD49a, which was accompanied by typical MSC gene expression like CXCL12 and SPARC. Putative MSC cluster 2 expressed surface markers found on scavenger cells like macrophages, such as CD206, CD141, CD163 and CD16, but also expressed CXCL12, suggesting a mixed composition and some degree of heterogeneity.

All clusters described above were consistently identified in six healthy BM donors. In the reference AML patients (n=3, Figure 1b), we were able to determine three additional cell clusters. Some of these were specific for individual AML samples, whereas others comprised a mix of cells from different AML patients.

Regarding the latter, the cluster annotated as immature blasts is a mixture of cells from all three patients.