Single-cell tracing of the first hematopoietic stem cell generation in human embryos

Tracing the emergence of the first hematopoietic stem cells (HSCs) in human embryos, particularly the scarce and transient precursors thereof, is so far challenging, largely due to technical limitations and material rarity. Here, using single-cell RNA sequencing, we constructed the first genome-scale gene expression landscape covering the entire course of endothelial-to-HSC transition during human embryogenesis. The transcriptomically defined HSC-primed hemogenic endothelial cells (ECs) were captured at Carnegie stage 12-14 in an unbiased way, showing an unambiguous arterial EC feature with the up-regulation of RUNX1, MYB and ANGPT1. Importantly, subcategorizing CD34+CD45− ECs into CD44+ population strikingly enriched hemogenic ECs by over 10-fold. We further mapped the developmental path from arterial ECs via HSC-primed hemogenic ECs to hematopoietic stem progenitor cells, and revealed a distinct expression pattern of genes that were transiently over-represented upon the hemogenic fate choice of arterial ECs, including EMCN, PROCR and RUNX1T1. We also uncovered another temporally and molecularly distinct intra-embryonic hemogenic EC population, which was detected mainly at earlier CS 10 and lacked the arterial feature. Finally, we revealed the cellular components of the putative aortic niche and potential cellular interactions acting on the HSC-primed hemogenic ECs. The cellular and molecular programs and interactions that underlie the generation of the first HSCs from hemogenic ECs in human embryos, together with distinguishing HSC-primed hemogenic ECs from others, will shed light on the strategies for the production of clinically useful HSCs from pluripotent stem cells.


Introduction 34
Hematopoietic stem cells (HSCs) give rise to all blood lineages and permanently maintain the adult 35 hematopoietic system throughout the entire lifespan of an individual via self-renewal and differentiation. 36 Although the ontogeny of HSCs during development has been extensively investigated in animal models 37 including zebrafish and mouse, it is largely unclear in human embryos, given the limited accessibility of human 38 embryonic tissues. By functional assessment in xenotransplantation, human HSCs are reported to be detected 39 sequentially in multiple embryonic sites. Long-term multi-lineage repopulating HSCs are detected firstly in the 40 aorta-gonad-mesonephros (AGM) region at Carnegie stage (CS) 14 (32 days post coitus, dpc), with a frequency 41 of less than one per embryo equivalent, and then in the yolk sac several days later (CS 16, 35-38 dpc), showing 42 an even lower frequency than that in AGM region 1, 2 . The first human HSCs manifest a phenotype of 43 CD34 + CD144 + CD45 + KIT + CD90 + Endoglin + RUNX1 + CD38 -/lo CD45RA -, similar to those in the embryonic liver 44 or cord blood 3-5 , although the enrichment is yet far from being efficient. An evident presence of HSCs in the 45 embryonic liver, the major organ for HSC expansion during embryogenesis, is witnessed only after CS 17 (39-46 42 dpc), generally from 7-8 weeks of gestation 1, 2, 6 . 47 48 Histologically, the first HSCs are supposed to locate within the intra-aortic hematopoietic clusters (IAHCs) 49 predominantly on the ventral wall of dorsal aorta (DA) in human embryos 3 . Given the expression of multiple 50 endothelial markers in emerging HSCs, and the spatiotemporally intimate relationship between them and 51 vascular endothelial cells (ECs), it has been widely proposed that human HSCs are derived from hemogenic 52 ECs, a specified endothelial lineage with blood-forming potential, similar to what has been comprehensively 53 validated in mice 7,8 . Up-to-date, little is known about the developmental dynamics and molecular identity of 54 the HSC-primed hemogenic ECs in human embryos, except for the simultaneous expression of endothelial 55 10 and 3D). HLF, as reported to be expressed in HSCs in multiple mouse embryonic hematopoietic sites and might 180 maintain their quiescence 45,46 , was included in top 10 TFs enriched in HSPC clusters (Figures 3C and S3A). 181 The results further validated our identification of HSPCs in dorsal aorta. 182 183 Three HSPC clusters showed distinct features. The one farthest from the lineage-primed clusters, which should 184 be developmentally the most upstream population within CD45 + CD34 + cells (Figure 3B), manifested higher 185 endothelial (KDR and FLT1) and arterial (HEY1, GJA5 and GJA4) features as compared to the other two HSPC 186 populations, thus was annotated as GJA5 + HSPC ( Figure 3E). The finding suggested arterial-featured 187 hemogenic ECs as the source of the first HSCs in human AGM region. The most significant difference between 188 the other two HSPC clusters was the cell cycle status, and the one with highly proliferative property was 189 consequently annotated as Cycling HSPC ( Figure S3C). The other one was annotated as GFI1B + HSPC given 190 its highest expression of GFI1B among the three HSPC clusters ( Figure 3E). As a direct target of Runx1, Gfi1b 191 is a marker whose expression can be used in enriching mouse AGM HSCs with long-term repopulating 192 capacity 47 . 193 194 To investigate the molecular events during the emergence of HSCs, we pooled together the datasets of relevant 195 cell populations from modified STRT-seq data, including vEC, CXCR4 + aEC and HEC in CS 12 (CH), CS 13 196 and CS 14 of AGM region (Figures 2B and 2E), and the three HSPC populations in CS 15 for further analysis. HSCs ( Figure 3G). The expression patterns of CD44, RUNX1, MYB and ANGPT1 were similar, being seldom 206 expressed in vEC, gradually increased from CXCR4 + aEC via HEC to HSPCs, with obvious expression in all 207 HSPC clusters ( Figure 3G). Of note, HLF, SPN and GFI1B were exclusively expressed in three HSPC clusters 45 208 ( Figure 3G). In accord with their involvement in arterial specification 48 , several genes of NOTCH signaling 209 pathway, including DLL4, JAG1, NOTCH1 and NOTCH4 14 , were most highly expressed in CRCR4 + aEC, and 210 gradually down-regulated along the specification via HEC towards HSPCs ( Figure 3G). We also witnessed the 211 difference in cell cycle status among distinct clusters. CXCR4 + aEC and HEC showed a relatively quiescent 212 status as compared to vEC, with a dramatically increased proliferation in HSPCs ( Figure S3C). These findings 213 suggested that once the hematopoietic fate is acquired, HSPCs may rapidly expand in situ before differentiation. 214 215 Monocle analysis revealed the developmental trajectory of endothelial-to-hematopoietic transition in human 216 AGM region, clearly showing the path from CXCR4 + aEC via HEC, GJA5 + HSPC, to Cycling and GFI1B + 217 HSPC clusters (Figures 3H and S3D). Genes that were significantly differentially expressed along the 218 pseudotime axis were subjected to k-means clustering analysis, resulting in four distinct gene expression 219 patterns (Figures 3I, 3J, S3E and S3F, Table S2). Genes grouped in pattern 1 highly expressed only in CXCR4 + 220 aEC while rapidly decreased in HEC and HSPCs, represented by GJA1, NES, S100A10 and PREX1, were related 221 to the regulation of cytoskeleton organization. Genes in both pattern 2 and pattern 3 were related to angiogenesis, 222 EC migration and cell junction organization, with the former simultaneously sharing the biological terms with 12 pattern 1 ( Figure S3F). Genes in pattern 2 showed gradual decrease in expression accompanied with 224 hematopoietic specification, represented by COL4A1, CXCR4, CLDN5 and PDGFB (Figures 3J and S3E). 225 Genes in pattern 3 maintained or even slightly increased expression level upon the initial step of hemogenic fate 226 choice of arterial ECs, while being down-regulated in HSPCs, represented by GJA5, EMCN, RUNX1T1 and 227 PROCR (Figures 3J and S3E). Some of these genes were validly studied in HSCs. In mice, functional HSCs 228 in E11.5 AGM region are exclusively within Emcn + cells based on CD45 + CD41 + phenotype 51 . PROCR is 229 reported to efficiently enrich HSCs in human fetal liver 52 and its homolog has been proven to be a marker for 230 type I and type II pre-HSCs in mice 53 . RUNX1T1, whose homolog in mice, with other five TFs, could impart 231 multi-lineage potential to lineage-primed hematopoietic progenitors in reprograming research, implying its 232 function in initiating the molecular program towards multi-potential HSPCs 54 . Genes in pattern 4 were those 233 up-regulated in HSPCs, including the ones already highly expressed in HEC as compared to CXCR4 + aEC, such 234 as ANGPT1, MYB, CD44 and RUNX1 (Figures 3J and S3E), and also those specifically expressed in HSPCs, 235 such as SPINK2, PTPRC, AMICA1 and SPN (Figures 3J and S3E). Of note, SPINK2 has been reported to be 236 expressed in HSCs in human umbilical cord blood 31 . 237 238 A distinct hemogenic EC population lacking arterial feature existed prior to HSC-primed hemogenic ECs 239 in human embryos 240 As the HSC-primed hemogenic ECs in human embryos could be detected at CS 12, to explore whether 241 hemogenic ECs existed at stages much earlier, cells from CS 10 body part and CS 11 CH of embryo proper were 242 collected for droplet-based scRNA-seq and analyzed together. The strategy applied to define different 243 populations was the same as that used in analysis for the data from CS 13 DA (Figures 1A-1C). For clearer 244 presentation, populations from CS 10/11 were prefixed with "early" while those from CS 13 were prefixed with 245 "late" (Table S3). Endothelial population (Endo), featured by CDH5 and SOX7 expression, and hematopoietic 246 13 population with RUNX1 expression much adjacent to Endo (Hema) were picked out from CS 10/11 dataset and 247 re-analyzed in searching of hemogenic ECs (Figures S4A and S4B  the primitive EC feature gene ETV2 56 , which was predominantly sampled from CS 10/11, was localized in the 282 center of EC populations, with one direction towards venous feature and the other towards arterial feature 283 specification (Figures 4F and 4G). In contrast to late HEC that was specified from late AEC with the strongest 284 arterial feature, early HEC was segregated from the ECs showing ambiguous arteriovenous feature, indicating 285 their generation did not transit through an arterial stage (Figures 4F and 4G). Therefore, two distinct hemogenic 286 The cellular interactions were represented by the simultaneous expression of heterologous ligand-receptor gene 312 pairs. Regarding mesenchymal and epithelial populations as potential niche components, only ligands expressed 313 by these populations and receptors expressed by HEC were considered by us (Figure 5A). Among a total of 314 128heterologous ligand-receptor pairs detected, cytokine-cytokine receptor interaction, axon guidance, 315 endocytosis, HEDGEHOG signaling pathway, and TGF-beta signaling pathway were the pathways 316 predominantly involved (Figure 5B). Those related to NOTCH signaling pathway, focal adhesion, MAPK 317 signaling pathway, dorso-ventral axis formation and MTOR signaling pathway were also recurrent ( Figure 5B). 318 The ligand-receptor gene pairs exhibited different expression patterns among distinct stromal populations when 319 coupled with HEC (Figures 5C and S5A). Among three sub-aortic mesenchymal populations, DLK1 + Mes and 320 FBLN5 + Mes, paired with HEC, showed relatively similar expression patterns of heterologous ligand-receptor 321 gene pairs, while TWIST2 + Mes showed only a few interactions with HEC ( Figure 5C). DLK1-NOTCH1 were 322 presented in DLK1 + Mes and FBLN5 + Mes, when coupled with HEC (Figures 5D and S5A). More specifically, In the present study with the precious human embryonic tissues, we constructed for the first time a genome-341 scale gene expression landscape for HSC generation in the AGM region of human embryos, in which we focused 342 specifically on the cell populations and molecular events involved in the endothelial-to-hematopoietic transition. 343 We transcriptomically identified the HSC-primed hemogenic ECs, meeting the widely acknowledged criteria 344 for hemogenic ECs that simultaneously express core feature genes of ECs (such as CDH5 and SOX7) and critical 345 hematopoietic TFs (such as RUNX1 and MYB) but yet not specific surface markers of hematopoietic cells 346 (including PTPRC and SPN) 9, 69 (Figures 2C and 2I). The hemogenic ECs exhibited a much more intimate 347 relationship with vascular ECs than with the functionally validated HSPCs showing an immunophenotype of 348 CD45 + CD34 + (Figure 3F), confirming their endothelial identity and origin. The finding is in line with the notion 349 in the mouse embryos that non-hemogenic ECs and hemogenic ECs in AGM region show great similarity in 350 transcriptome 70 (Figures 3F and 3G). Of note, RUNX1 was ranked as the top one DEG when the EC population 351 with an arterial feature in AGM region was further sub-divided in an unsupervised way ( Figure 2F). Together 352 with the specific GO terms and enriched pathways (Figures 2G and S2G), the transcriptomically identified 353 hemogenic ECs here should be recognized as the earliest cell population specified to choose a hematopoietic 354 fate, which has not been uncovered in human embryos. 355

356
We revealed that the HSPCs in CS 15 AGM region showed certain levels of arterial feature but nearly absent of 357 venous feature (Figure 3G), in line with the reports in the mouse embryos that functional type I and type II pre-358 HSCs in the AGM region express evident arterial markers but much lower levels of venous markers 53 . The data 359 suggested an arterial EC origin of emerging HSCs in human, in accord with the apparent arterial feature of the 360 hemogenic ECs we transcriptomically identified in the CS 12-14 AGM region (Figure 3G), which is previously 361 19 unknown in human embryos. Concerning the in vitro differentiation system of human PSCs, the arterial-specific 362 markers DLL4 and CXCR4 can be used to identify hemogenic ECs with lympho-myeloid potential 14 . 363 Interestingly, we found that the hemogenic EC sub-population showed much lower CXCR4 expression than the 364 other one with comparable arterial feature in the AGM (Figure 2F). Thus, albeit CXCR4 is regarded as an 365 arterial gene 17, 71 , CXCR4 + ECs represented a proportion but not all of arterial ECs, showing much less 366 hemogenic characteristics. Therefore, it can be explained that CXCR4 + cells generated from human PSCs in 367 vitro lack direct hematopoietic potential 71 . Our finding emphasizes the importance of discriminating cell identity 368 transcriptomically at single-cell resolution and further suggests that CXCR4 expression might be rapidly down-369 regulated upon the hemogenic specification in the arterial ECs. 370 371 Transcriptionally, the homing receptor CD44 was expressed in a subset of ECs with arterial feature but seldom 372 in venous ECs of AGM region in human embryos (Figures 2B and 2C). Importantly, almost all the cells with 373 hemogenic characteristics were enriched in the immunophenotypic CD44 + EC population, and the expression 374 of CD44 was gradually increased along with hematopoietic specification (Figure 3G). The results are in line 375 with the histological finding that CD44 is expressed in the IAHC cells, as well as the nearby ECs lining the 376 inner layer of the dorsal aorta of human embryos at the stages similar to ours 72 . Therefore, together with CD44 377 labeling, which constituted an average 7.4% of CD34 + CD45 -ECs in AGM region, the hemogenic ECs in human 378 embryos are enriched, for the first time, more than 10-fold as compared to that by pan-EC markers. Combining In an effort to determine the earliest time-point detecting the intra-embryonic hemogenic ECs in human embryos, 386 we unexpectedly revealed two temporally and molecularly distinct hemogenic EC populations, with the earlier 387 one lacking obvious arterial feature (Figure 4C). It has been proven that distinct waves of hematopoiesis occur 388 sequentially in model organisms including mouse and zebrafish embryos, with both transient definitive non-389 HSC hematopoiesis and definitive HSCs generated from hemogenic ECs 8, 73 . We proposed that the previously 390 undefined two distinct intra-embryonic hemogenic EC populations might correspond to the putative two 391 definitive waves of hematopoiesis in human embryos ( Figure 4F). The functional difference between these two 392 transcriptomically identified intra-embryonic hemogenic EC populations in human embryos needs further 393 investigations. It is to be determined that whether two hemogenic EC populations we transcriptomically 394 identified have the common ancestor, such as the immature hemogenic ECs proposed in the PSC differentiation 395 system 14, 17 . Much likely, along the arterial specification path from primordial ECs, two cohorts of the intra-396 embryonic hemogenic ECs are sequentially segregated out, with one before and the other after arterial fate 397 settling. It is extremely pivotal to discriminate the initial steps for different types of hematopoiesis during human 398 embryogenesis, which are presumably related to quite distinct self-renewal and differentiation capacities and 399 physiological functions. More importantly, this will provide crucial clues for the in vitro blood regeneration 400 studies.  Table S5) and template 439 switch oligo (TSO) primer for template switching 79-81 . After reverse transcription and second-strand cDNA 440 synthesis, the cDNAs were amplified by 16 cycles of PCR using ISPCR primer and 3' Anchor primer (see 441   Table S5). Samples were pooled and purified using Agencourt AMPure XP beads (Beckman, A63882). 4 cycles 442 of PCR were performed to introduce index sequence (shown in Table S5) and subsequently, 400 ng cDNAs 443 were fragmented to around 300 bp by Covaris S2. After being incubated with Dynabeads MyOneTM 444 Streptavidin C1 beads (Thermo Fisher, 65002) for 1 hour at room temperature, cDNA Libraries were generated 445 using KAPA Hyper Prep Kit (Kapa Biosystems, kk8505). After adaptor ligation, the libraries were amplified 446 23 by 8 cycles of PCR using QP2 primer and short universal primer (shown in Table S5

Quality control 461
Then quality control was performed to filter low-quality cells. For 10X-derived datasets, only cells with more 462 than 1,000 genes less than 1,000,000 UMIs were retained for downstream analysis. For modified STRT-seq 463 datasets, UMI values for each cell were grouped in an expression matrix, cells were retained when more than 464 2,000 genes and 10,000 UMIs were detected. The Seurat (version 2.3.4) 84 implemented in R (version 3.4) was applied to reduce the dimension of CS 13 10X-468 derived datasets. UMI count matrix was applied a logarithmic transformation with the scale factor 10,000. High 469 variable genes (HVGs) were calculated using FindVariableGenes function with default parameters except for 470 "x.low.cutoff" = 0.0125. PCA was performed using HVGs, and significant PCs were selected to perform 471 dimension reduction and clustering. Cells were projected in 2D space using UMAP with default parameters. 472 Using graph-based clustering function Findcluster with default parameters except for "resolution" = 0.6, we 473 divided the 592 cells into 8 transcriptionally similar clusters ( Figure 1A). For CS 12/13/14 modified STRT-seq 474 data, the Normalization scale factor was set to 100,000. The PCA was performed using HVGs with parameter 475 "x.low.cutoff" = 0.2 , and then UMAP analysis with parameters "n_neighbors" = 10, "min_dist" = 0.3 and 476 clustering with parameter "resolution" =0.8, resulting in 2 endothelial clusters ( Figure 2B) To account for batch differences among CS 10, CS 11 and CS 13 10X-derived datasets, we used the Batch 484 balanced KNN (BBKNN), a batch effect removal tool in Scanpy package. BBKNN actively combats technical 485 artifacts by splitting data into batches and finding a smaller number of neighbors for each cell within each of 486 the groups, which helps create connections between analogous cells in different batches without altering the 487 counts or PCA space. We took the union of the top 2,000 genes with the highest expression and dispersion from 488 both datasets, which were used for PCA. We then aligned the subspaces on the basis of the first 20 PCs and 489 25 selected top 10 neighbors to report for each batch, which generated new PCs that were used for subsequent 490 analysis. 491 492

Differential expression analysis 493
To find DEGs among different clusters, we performed non-parametric Wilcoxon rank sum tests, as implemented 494 in Seurat. DEGs with adjusted P value less than 0.01 were thought to be significant. We applied 495 tl.rank_genes_groups function with default parameters and DEGs were filtered with "min_fold_change" = 0.5, 496 "min_in_group_fraction" = 0.2, as implemented in Scanpy. 497 498

Surface markers and TFs 499
Surface markers and TFs lists (

Correlation analysis 503
Pearson correlation analysis was performed using the top 50 PCs calculated in CS 13 10X-derived dataset 504 ( Figure S1C). For genes positively correlated with RUNX1, we performed Pearson correlation using corr.test 505 function in psych R package with "BH" adjustment ( Figure 2H). 506 507

Monocle 2 analysis 508
Pseudotime trajectories were constructed with the Monocle 2 package (v2.10.1) 86 according to the 509 documentation (http://cole-trapnell-lab.github.io/monocle-release). ordering genes were selected based on PCA 510 loading. The Discriminative Dimensionality Reduction with Trees (DDRTree) method was used to reduce data 511 26 to two dimensions. To investigate the different patterns of gene expression during this developmental path, 512 significant DEGs along pseudotime were identified by Monocle 2's differentialGeneTest function. For 513 identification of major patterns, we filtered out those genes with average normalized expression less than 1 in 514 all involved clusters and then clustered the retained genes into four distinct patterns using k-means clustering. 515 516 Gene functional annotation analysis 517 GO term and KEGG pathway enrichment was done on significant DEGs using clusterProfiler (version 3.10.1) 87 518 with default parameters. 519 520

Cell cycle regression 521
To reduce the variation in cell cycle stages which contribute to the heterogeneity in scRNA data, we performed 522 CellCycleScoring function in Seurat to evaluate the cell-cycle status using the previously reported G1/S and 523 G2/M phase-specific genes 81, 88 . Then using regressout in ScaleData function to remove cell cycle effects. 524 525

Cellular interactions analysis 526
Cellular interactions analysis was performed using CellphoneDB software (version 2.0) 89 with default 527 parameters. Significant ligand-receptor pairs were filtered by P value less than 0.01. Pairs that ligand genes 528 expressed in Mes, Epi, EC clusters and receptor genes expressed in HEC were retained. Directed network 529 ( Figure 5A) visualization was done using Cytoscape (version 3.6.0). 530

DATA AND SOFTWARE AVAILABILITY 532
The scRNA-seq data has been uploaded to GEO, the accession number for the data is pending.