Main

A major obstacle to building a comprehensive human cell atlas1 is obtaining a full range of ‘high-quality’ samples of sufficient size. Given their evolutionary proximity, non-human primates (NHPs) represent the nearest-to-human alternative. Generating an NHP cell atlas (NHPCA) would produce a catalogue of features that could be used to study human physiology, disease and ageing. It would also provide insights into the evolutionary mechanisms underlying differences in body function between NHPs and humans.

NHPs comprise a large and diverse group of species with major ecological, dietary, locomotor and behavioural differences2. Because of their characteristics, including a more frequent reproductive cycle and wide availability, macaques, in particular Macaca fascicularis (also known as cynomolgus, crab-eating or long-tailed monkey), are now used for research purposes worldwide3. Here we used adult M. fascicularis tissues to generate the largest NHP cell transcriptomic dataset thus far. To facilitate exploration of this resource, we have created the NHPCA website, an open and interactive database (https://db.cngb.org/nhpca/).

Generation of an adult monkey cell atlas

We isolated cells/nuclei for 45 different tissue samples from five male and three female 6-year-old monkeys (Fig. 1a, Supplementary Fig. 1 and Supplementary Table 1a). Most tissues were profiled by single-nucleus RNA sequencing (snRNA-seq), which circumvents complications associated with dissociation protocols and allowed us to profile frozen samples, but for some tissues we used single-cell RNA sequencing (scRNA-seq). The lymph node was profiled using both scRNA-seq and snRNA-seq for comparison. All experiments used the DNBelab C4 droplet-based platform for library generation4. After filtering, we retained transcriptomic data for a total of 1,144,706 cells/nuclei (Fig. 1a), with numbers ranging from 84,619 in the cerebellum to 2,694 in the vagina (Supplementary Table 1a). For lymph node, the comparison between scRNA-seq and snRNA-seq identified a similar number of genes and unique molecular identifiers (UMIs) (Supplementary Fig. 2a–f). Likewise, cell cluster integration showed a good match between the two methods, although snRNA-seq was more efficient at capturing less abundant cell types. These results confirm the utility of snRNA-seq for generating large-scale cell atlases5,6.

Fig. 1: Generation of a cell atlas across 45 tissues of adult M. fascicularis monkey.

a, Left, schematic representation of the monkey tissues analysed in this study. The cartoons used to generate this schematic diagram were purchased from BioRender.com. A total of 45 tissues were collected from 3 female and 5 male 6-year-old monkeys. A UMAP visualization is shown of global clustering of all cells from the dataset coloured by tissue (middle), and bar plots show the number of cells/nuclei profiled for each tissue after quality control (right). n = 1,144,706 individual cells/nuclei analysed. b, UMAP visualization of all clusters coloured by major cell types. A total of 113 cell clusters were identified in the dataset. Cell type annotation for all major clusters is provided in the legend to the right; NKT, natural killer T; OPC, oligodendrocyte progenitor cell. Source data


In global visualization of cell clustering using uniform manifold approximation and projection (UMAP), each tissue tended to cluster separately, with those from the same system generally clustering more closely to each other (Fig. 1a and Supplementary Figs. 3–7). On the basis of the expression levels of specific markers (Supplementary Fig. 8), we defined 113 cell clusters in the global UMAP view of all tissues (Fig. 1b and Supplementary Table 1b, c). On average, we detected 1,445 genes and 2,583 UMIs per cell/nucleus (Supplementary Fig. 9). The number of cells for each of these 113 cell types ranged from 76,602 for granule cells in the cerebellum to 21 for oligodendrocytes in the pineal gland (Supplementary Fig. 10). Reassuringly, many of the 113 clusters were largely composed of a cell type belonging to a specific tissue (Fig. 1b and Supplementary Fig. 11a). However, cell types such as endothelial, stromal and various immune cells were shared between different tissues, as expected (Supplementary Fig. 11b). We next generated individual UMAP representations for each tissue and applied unbiased graph-based Seurat clustering, identifying 463 cell clusters across all tissues. A detailed annotation of the cell populations detected in each tissue is provided in Supplementary Figs. 12–15 and Supplementary Table 1d, e. Our M. fascicularis atlas can be searched interactively by tissue, cell type and gene through the NHPCA website.
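For readers unfamiliar with the per-cell summary statistics quoted above (genes and UMIs detected per cell/nucleus), the calculation can be sketched as follows. This is a minimal illustration on a toy count table and is not the pipeline used in the study; the function names `per_cell_qc` and `summarize` and the toy counts are hypothetical.

```python
from statistics import mean


def per_cell_qc(counts):
    """Return (genes detected, total UMIs) for each cell.

    `counts` is a list with one dict per cell, mapping gene -> UMI count.
    A gene counts as detected when it has at least one UMI.
    """
    genes = [sum(1 for c in cell.values() if c > 0) for cell in counts]
    umis = [sum(cell.values()) for cell in counts]
    return genes, umis


def summarize(counts):
    """Average the per-cell metrics across all cells."""
    genes, umis = per_cell_qc(counts)
    return {"mean_genes": mean(genes), "mean_umis": mean(umis)}


# Toy data (hypothetical): three cells, three genes.
toy = [
    {"ALB": 5, "MYH7": 0, "PAX7": 1},
    {"ALB": 0, "MYH7": 3, "PAX7": 0},
    {"ALB": 2, "MYH7": 2, "PAX7": 2},
]
summary = summarize(toy)
print(summary)
```

In practice these metrics would be computed on a sparse cells-by-genes matrix after quality-control filtering; the dictionary representation here is chosen only to keep the sketch dependency-free.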

To demonstrate the potential for cross-species comparisons, we selected a total of 12 NHP tissues overlapping with single-cell mouse (Mouse Cell Atlas, MCA) and human (Human Cell Landscape, HCL) cell atlases7,8 (Supplementary Figs. 16–19 and Supplementary Table 1f). Cell numbers as well as gene and UMI capture rates were higher in NHPCA for all 12 tissues. We observed good correlation of tissue marker genes with both the mouse and human datasets in all cases. Likewise, the number of detected main cell types was roughly comparable in the three species (111 in monkeys, 110 in mice and 106 in humans), but with differences in the proportions. For example, over 80% of liver cells detected in monkeys corresponded to hepatocytes, in line with the normal proportion of 60–80% in this tissue9,10, but only 3% and 6.7% of corresponding cells were hepatocytes in human and mouse liver, respectively. This discrepancy might be related to a bias in cell population capture when using different platforms or the use of nuclei versus whole cells. We performed immunostaining of monkey liver sections for the hepatocyte marker albumin, observing as expected that most cells were positive (Supplementary Fig. 20a). Differentially expressed genes (DEGs) between specific tissue cell populations in the three datasets can be examined using our website. As proof of principle of the application for studying body-wide cell–cell interactions, we examined the distribution of insulin and glucagon receptors throughout the 12 tissues (Supplementary Fig. 20b). Although the patterns were similar, species-specific differences were observed. Additional ligand–receptor interactions in each of the 45 monkey tissues and the comparison between species for the 12 shared tissues can also be explored using our website, and we have provided an option for uploading individual tissue datasets to enable customized comparisons.

Common cell types across tissues

We inspected common cell types populating different tissues throughout the monkey body8,11,12,13. First, we selectively combined and reclustered stromal cells, macrophages (including microglia), endothelial cells and smooth muscle cells from all analysed tissues. Although considerable diversity was observed, many cell clusters grouped together on the basis of tissue origin (Supplementary Fig. 21a–d). We also performed DEG analysis to obtain tissue-specific signatures, identifying substantial heterogeneity (Supplementary Fig. 21e–h and Supplementary Table 2a–d).

Notably, our snRNA-seq data offer the possibility of studying cell populations that cannot be characterized by conventional scRNA-seq analysis, such as myonuclei from multinucleated skeletal muscle fibres. We combined and reclustered cells from tissues in our atlas known to contain skeletal muscle cells. This approach identified distinct populations in the abdominal wall, diaphragm and tongue, whereas nuclei from the oesophagus were more homogenous (Fig. 2a). Myonuclei in the abdominal wall, diaphragm and tongue comprised MYH7+ type I (slow-twitch) and MYH2+ type II (fast-twitch) myofibres14 (Fig. 2b, c and Supplementary Table 2e–g). Differential thresholds of MYH2 and GPD2 expression further subdivided type II myonuclei into type IIa (MYH2high) and type IIb (MYH2lowGPD2+) myonuclei. In line with previous reports, we did not detect type IIb myonuclei in the tongue15. Moreover, type I and type IIa tongue myonuclei clustered in close proximity, which may be related to the tongue being a highly innervated muscle.

Fig. 2: Characterization of monkey skeletal myofibres and mesothelial cells.

a, UMAP visualization of global clustering of skeletal muscle cells. Clusters are coloured by tissue (abdominal wall, diaphragm, oesophagus and tongue). b, UMAP representation of all reclustered skeletal muscle cells coloured by subtype. c, UMAP visualization of specific markers used to identify type I (MYH7), type IIa (MYH2) and type IIb (GPD2) myonuclei, FAPs (LVRN), MTJ nuclei (NAV3 and COL22A1), NMJ nuclei (ETV5 and MUSK) and satellite cells (PAX7), as shown in b. Because of their small proportions, the latter four populations are indicated by a red arrow. d, Stacked bar plots representing the proportions of skeletal muscle nuclei (myonucleus subtypes type I, type IIa and type IIb, MTJ and NMJ nuclei, and satellite cells and FAPs) in the indicated tissues. e, Heat map showing DEGs among the skeletal muscle populations highlighted in d. f, Bubble plots showing DEGs for each of the myonucleus subtypes comparing different tissues. g, UMAP visualization of mesothelial (meso) cells from selected tissues (adrenal gland, bladder, diaphragm, fallopian tube, ovary and visceral adipose tissue). Four different clusters of mesothelial cells belonging to the visceral adipose tissue are indicated by the dashed red line. h, Violin plots showing the differential expression of mesothelial and immune markers in the visceral adipose tissue clusters highlighted by the dashed red line in g. i, UMAP visualization of three different clusters of mesothelial cells from the ovary. Mesothelial cells, surface epithelial (surface epi) cells and progenitor-like epithelial (prog-like epi) cells are highlighted in red, blue and yellow, respectively. j, UMAP visualization of LGR5 expression in ovarian mesothelial cells. k, Violin plots showing DEGs among the three populations of ovarian mesothelial cells highlighted in the UMAP visualization. Source data


In addition, we discriminated, albeit at low proportions, ETV5+ neuromuscular junction (NMJ) nuclei in the diaphragm and NAV3+ myotendinous junction (MTJ) nuclei in both the tongue and diaphragm (Fig. 2b–d). Moreover, we detected PAX7+ nuclei from satellite cells in the tongue and diaphragm, while a small cluster of LVRN+ fibroadipogenic progenitors (FAPs) could be annotated in the diaphragm, abdominal wall and oesophagus. Skeletal muscle nuclei exhibited subtype-specific and tissue-specific gene expression signatures and Gene Ontology (GO) terms (Fig. 2e, f and Supplementary Fig. 22a–c). We also observed substantial myonucleus heterogeneity within the same subtype and tissue (Fig. 2f).
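The marker-based subdivision of myonuclei described above (MYH7+ type I, MYH2high type IIa and MYH2lowGPD2+ type IIb) can be illustrated with a simple rule-based sketch. The thresholds, the order in which the rules are applied and the function name `classify_myonucleus` are assumptions made for illustration; in the study, subtypes were annotated from clustering and marker expression rather than from fixed cut-offs.

```python
def classify_myonucleus(expr, myh2_high=2.0, detect=0.0):
    """Assign a myonucleus subtype from normalized marker expression.

    `expr` maps gene -> expression value for one nucleus.
    `myh2_high` and `detect` are hypothetical thresholds.
    """
    myh7 = expr.get("MYH7", 0.0)
    myh2 = expr.get("MYH2", 0.0)
    gpd2 = expr.get("GPD2", 0.0)

    # Type I (slow-twitch): MYH7 detected and at least as high as MYH2.
    if myh7 > detect and myh7 >= myh2:
        return "type I"
    # Type IIa: high MYH2.
    if myh2 >= myh2_high:
        return "type IIa"
    # Type IIb: MYH2 below the high threshold with GPD2 detected.
    if gpd2 > detect:
        return "type IIb"
    return "unassigned"


# Toy nuclei (hypothetical expression values).
nuclei = [
    {"MYH7": 3.0, "MYH2": 0.1},
    {"MYH2": 3.5, "GPD2": 0.0},
    {"MYH2": 0.5, "GPD2": 1.2},
    {},
]
labels = [classify_myonucleus(n) for n in nuclei]
print(labels)
```

A fixed-threshold rule of this kind is sensitive to normalization and sequencing depth, which is one reason cluster-level annotation is generally preferred for sparse single-nucleus data.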

Next, to study the heterogeneity among adipocytes, we combined and reclustered cells from subcutaneous and visceral adipose tissues, resulting in nine major clusters (Supplementary Fig. 23a–d). We noticed a marked distinction between mature adipocytes and putative adipocyte progenitors, as reflected by differential expression of ADIPOQ and CD34. Subcutaneous mature adipocytes and adipocyte progenitors were enriched for FOS expression. Likewise, SLC11A1 and SPOCK3 marked mature subcutaneous and visceral adipocytes, respectively. Adipocyte progenitors were composed of two populations for visceral tissue (WT1+ITLN1+ and CFDhighWT1lowITLN1−), three populations for subcutaneous tissue (ESR1+, CXCL14+APOD+ and DPP4+) and one population shared by both tissues (NOX4+). These results are consistent with markers described in previous reports16,17,18,19. We validated coexpression of CD34 and NOX4 in a subset of adipocyte progenitors of both subcutaneous and visceral adipose tissue by immunostaining (Supplementary Fig. 24a, b). Pseudotime analysis characterized the trajectory of adipocyte maturation from progenitors in both subcutaneous and visceral adipose tissue (Supplementary Fig. 24c, d). We did not detect substantial proliferation in any of the progenitor populations on the basis of expression of the pan-cycling marker MKI67 (Supplementary Fig. 23c), suggesting that these populations are not transitory.

Finally, we combined and reclustered all tissues that contained mesothelial cells, a type of specialized epithelial cell. Mesothelial cells from the bladder, ovary and fallopian tube were in close proximity, whereas those from other tissues clustered more separately (Fig. 2g). We also detected within-tissue mesothelial cell heterogeneity, in particular for visceral adipose tissue and ovary. In the former, we observed a cluster of immune-like mesothelial cells that, apart from expression of the typical mesothelial markers (MSLN, ITLN1 and PKHD1L1), also expressed high levels of immune cell markers (for example, PTPRC, IL7R and TRAC) (Fig. 2h). This is in agreement with the emerging concept that structural cells have immune properties8,11 and the known immunomodulatory role of the visceral adipose tissue in responses to gut bacteria20. In the ovary, we identified a classical mesothelial population and two close PAX8+ (ref. 21) epithelial-like populations (one mature and one progenitor-like) of mesothelial origin (Fig. 2i–k). Progenitor-like ovarian epithelial cells have previously been reported22. In line with previous work, we observed that they expressed well-known stem cell markers such as LGR5 (ref. 22) and CD44 (ref. 23). Immunostaining for CD44 and single-molecule fluorescence in situ hybridization (smFISH) for LGR5 confirmed their coexpression in a subset of monkey surface epithelial cells (Supplementary Fig. 24e). Pseudotime analysis reconstructed the trajectory from progenitor-like cells to ovarian epithelial cells (Supplementary Fig. 24f). As in adipose tissue, we did not detect substantial proliferation in progenitor-like ovarian epithelial cells on the basis of expression of MKI67.

These findings add substantially to previous studies of common cell type heterogeneity and tissue-specific molecular signatures8,11,12,13. Our dataset provides a new interactive resource for further dissecting these, clarifying the underlying mechanisms and studying interspecies differences.

Wnt signalling components in tissues

A large body-wide cell atlas is ideal for investigating multifaceted cell–cell interactions, including those occurring in cytokine or growth factor-mediated signalling pathways. Apart from having essential roles in embryonic development, Wnt factors control growth and maintenance of numerous tissues throughout life. We thus performed a survey of Wnt pathway24 components throughout the monkey body to thoroughly dissect target cells and potentially identify previously unappreciated populations.

LGR proteins (LGR4, LGR5 and LGR6) act as amplifiers of Wnt signals by inhibiting negative regulators25. Accordingly, LGR5 and LGR6 often mark and regulate cells with homeostatic or adult stem cell function in specific mammalian tissues, whereas LGR4 has a less well-understood function26. We observed expression of LGR5 across multiple monkey tissues, with the highest levels in type I skeletal muscle myonuclei, epithelial cells of the uterus and fallopian tube, oligodendrocyte progenitor cells (OPCs) and kidney tubule cells (Fig. 3a). To the best of our knowledge, with the exception of epithelial cells in the uterus and fallopian tube25, these tissues have not previously been reported to contain substantial numbers of LGR5+ cells in adult mammals. In this regard, it is worth noting that the majority of reports of LGR5+ cells thus far have been in genetically engineered mouse models owing to the lack of specific tools and reagents to study other mammals25. The expression of LGR6 was more restricted (Supplementary Fig. 25a), with higher abundance in cardiomyocytes, thyroid follicular cells, folliculostellate cells of the pituitary gland and, as previously reported, smooth muscle cells27. We also detected LGR5+ or LGR6+ cells in other tissues, including in both previously reported (for example, ovary epithelial cells22, hepatocytes28 and colon enterocytes29) and unreported (for example, LGR5+ cells in bipolar cells of the neurosensory retina) tissues (Supplementary Figs. 26–30 and Supplementary Table 3). In general, expression of LGR5 and LGR6 did not overlap, apart from in fallopian tube epithelial cells and gallbladder smooth muscle cells (Supplementary Fig. 25b). Moreover, there was little overlap between LGR5+ or LGR6+ cells with those expressing MKI67, apart from epithelial cells of the fallopian tube and uterus and basal cells from the salivary gland. In contrast to LGR5 and LGR6, LGR4 was ubiquitously expressed across most tissues (Supplementary Fig. 25c).

Fig. 3: Analysis of LGR5+ cells across all monkey tissues.

a, Top, global UMAP visualization of LGR5 expression across all tissues. Bottom, bubble plot showing the LGR5 expression level and ratio in the indicated cell types. b, Co-embedding of kidney snRNA-seq (blue) and scATAC-seq (red) datasets. c, Integrated kidney snRNA-seq and scATAC-seq data. Cell clusters are coloured by cell type. DCTC, distal convoluted tubule cell; Endo, endothelial cell; LOH, loop of Henle; Myofibro, myofibroblast. d, UMAP visualization of LGR5 across kidney cell types (top) and ArchR track visualization of aggregated scATAC-seq signal at the LGR5 locus in each cell type (bottom). The bar plots on the right indicate the ratio of LGR5+ cells in kidney cells. e, Representative images from smFISH detection of LGR5 (red) and SLC12A3 (yellow) expression in kidney. Scale bars, 50 μm (left and middle) and 20 μm (right). The right panel represents a magnification of the area indicated by the white boxes in the left and middle panels. f, UMAP visualization of muscle cells clustered by tissue (abdominal wall, aorta, bladder, carotid, diaphragm, fallopian tube, heart, oesophagus, ovary, prostate, salivary gland, spermaduct, testis, tongue, uterus and vagina). The dashed lines encompass clusters of cells belonging to a specific muscle type (cardiac, skeletal or smooth muscle). g, UMAP visualizations of LGR5, MYH2 and MYH7 across skeletal muscle cell types. The dashed line in the left panel indicates clusters belonging to the diaphragm; the one in the right panel indicates LGR5+MYH7+ cells. h, Representative images from smFISH detection of LGR5, MYH7 and their coexpression in skeletal myonuclei of the diaphragm. Scale bar, 20 μm. The panel at the bottom is a magnification of the area indicated by the white box in the adjacent panel; scale bar, 40 μm. Source data


In the kidney, LGR5+ cells were mostly enriched in the distal convoluted tubule (DCT) and, to a lesser extent, in the descending and ascending loop of Henle (Fig. 3a). To support this observation, we performed single-cell assay for transposase-accessible chromatin with sequencing (scATAC-seq) of monkey kidney and integrated the results with our snRNA-seq dataset (Fig. 3b, c and Supplementary Fig. 31a, b). The analysis showed peaks of open chromatin at both the LGR5 promoter and a putative enhancer in cell types expressing LGR5 (Fig. 3d). Double smFISH for LGR5 and the DCT cell (DCTC) marker SLC12A3 confirmed coexpression of both genes in a substantial proportion of DCTCs, but showed little or no expression in other cell types (Fig. 3e). To study potential interspecies differences in the Wnt pathway, we merged our monkey kidney data with adult human8,30,31 and mouse7,32,33 kidney snRNA-seq and scRNA-seq datasets. Interestingly, there was lower LGR5 expression in adult human and mouse kidneys, including in DCTCs, than in monkey (Supplementary Fig. 32a–c). The finding in mice is consistent with the low levels of Lgr5 detected in adult mouse kidney using reporter mice or FISH probes34. We also performed a head-to-head comparison of DCTC gene expression, which showed that interspecies differences extend beyond LGR5 (Supplementary Fig. 32d, e and Supplementary Table 4).

In the neocortex, cell cluster integration of available human35 and mouse snRNA-seq datasets with the monkey data indicated differential LGR5 expression patterns between species. LGR5 was highest in OPCs in monkeys and in oligodendrocytes in humans, whereas in mice it was higher in inhibitory neurons than in OPCs and oligodendrocytes (Supplementary Fig. 33a–c). Pseudotime analysis showed high LGR5 abundance along the OPC maturation trajectory towards oligodendrocytes in monkey OPCs (Supplementary Fig. 33d, e). Double immunostaining for the OPC marker PDGFRA and LGR5 confirmed their coexpression in OPCs from monkey neocortex (Supplementary Fig. 33f). We also combined and reclustered all types of muscle cells in our atlas (Fig. 3f). LGR5 was more enriched in MYH7+ slow-twitch myonuclei of the abdominal wall and diaphragm (Fig. 3g), whereas LGR6 was higher in cardiomyocytes and smooth muscle cells (aorta, ovary, carotid and vagina) (Supplementary Fig. 34a). LGR5 and LGR6 expression in slow-twitch skeletal myonuclei and cardiomyocytes, respectively, was validated by smFISH (Fig. 3h and Supplementary Fig. 34b). In mice, Lgr5 is known to be expressed in NMJ myonuclei36 and a subset of satellite cells activated following injury37, but we did not detect enrichment of LGR5 in either cell type in our monkey dataset (Supplementary Figs. 26 and 27). The lack of enrichment in satellite cells is unsurprising given that we did not apply any injury before obtaining the skeletal muscle tissues. Yet, we could detect LGR6 in mouse and human cardiomyocytes using previously reported snRNA-seq datasets38,39 (Supplementary Fig. 34c, d). Similarly, LGR6 was enriched in several monkey pituitary cell populations, with the highest expression in folliculostellate cells, which have been reported to be pituitary gland stem cells40 (Supplementary Fig. 34e). In line with this, these cells also showed expression of other progenitor markers such as SOX2, PAX6, CD44 and CXCR4 (Supplementary Fig. 34f). Moreover, DEGs specific to this LGR6+ population in comparison with other pituitary cells were enriched in GO terms related to development (Supplementary Fig. 34g).

Next, we examined the genes encoding Wnt factors and the R-spondin family (RSPO1–RSPO4) of ligands for LGR proteins25 in a panel of monkey tissues containing cells with high LGR5 and LGR6 (Supplementary Figs. 35a, b and 36–39). RSPO cytokine expression was widely distributed among tissues, but higher levels were found in mesenchymal-like cells (for example, smooth muscle cells of the epididymis, hepatic stellate cells and folliculostellate cells from the pituitary gland) and mesothelial cells (for example, of the diaphragm, fallopian tube and ovary). Of note, RSPO2 expression was high in inhibitory neurons from the neocortex (Supplementary Fig. 38a). The expression of Wnt factors was more limited and in general lower than that of RSPO cytokines, but we noticed high levels of WNT9B in principal cells and principal-like cells from the collecting duct in the kidney (Supplementary Fig. 35a, c), WNT2B in mesothelial cells from the fallopian tube (Supplementary Fig. 37a) and ovary (Supplementary Fig. 38c), and, as expected, WNT2 in endothelial cells from the liver41 (Supplementary Fig. 37c). WNT9B expression was lower in mouse7,32,33 and in particular human8,30,31 kidney snRNA-seq datasets than in monkey (Supplementary Fig. 35e). Supporting the monkey snRNA-seq data, scATAC-seq analysis of the WNT9B locus showed increased enhancer accessibility in monkey principal and principal-like cells (Supplementary Fig. 35d). High levels of WNT9B in these cells may be responsible for inducing LGR5 (a Wnt pathway target) in monkey DCTCs. In fact, Wnt factors are known to act predominantly on neighbouring cells24,42, and cells from the collecting duct and DCT are in closer physical proximity than other nephron structures (Supplementary Fig. 35f). We further analysed Wnt receptors and other co-receptors43 as well as the TCF family of transcription factors bound by β-catenin44 as a resource for exploration (Supplementary Figs. 35a, b and 36–39). Thus, Wnt and other signalling pathways can be explored in monkey tissues and compared between species using our NHPCA website.

Cell type vulnerability to viruses

To examine the utility of our atlas for advancing knowledge of disease pathogenesis, we first mapped the expression of the main viral receptors and co-receptors for a panel of 126 viruses, including respiratory pathogens, across all monkey tissues. As expected, NCAM1 (encoding the rabies virus receptor) was enriched in astrocytes, oligodendrocytes and neurons, in line with knowledge of this virus attacking the central nervous system45. CD46 (encoding the receptor for measles and herpes viruses) was enriched in epithelial cells from the bladder, cells from the female and male reproductive system, and liver endothelial cells (Fig. 4a, Supplementary Fig. 40 and Supplementary Table 5a).

Fig. 4: Global analysis of ACE2 and TMPRSS2 across monkey tissues.

a, Heat map showing the expression of entry receptors and related molecules for a selection of viruses (indicated on the right) in all cell types (indicated at the bottom). Definitions for abbreviations are provided in the Supplementary Note. b, UMAP visualizations of ACE2 (left) and TMPRSS2 (right) expression in all cell types. The bubble plot next to each UMAP plot shows the expression level and ratio of ACE2 and TMPRSS2 in the indicated cell types. c, UMAP projection of ACE2+TMPRSS2+ cells (highlighted in yellow). The bar plots on the right show the ratio of cells expressing both genes. d, Bubble plots showing the ratio and expression levels of ACE2 and TMPRSS2 in gallbladder, kidney, liver and lung in monkeys and humans. The colour of each bubble represents the level of expression, and the size indicates the proportion of expressing cells. e, Left, ArchR track visualization of aggregated scATAC-seq signal at the ACE2 locus in each of the annotated kidney cell types. Predicted binding of human transcription factors based on DNA sequence is shown in the corresponding open chromatin regions of ACE2. Right, bar plots indicating the ratio (%) of ACE2+ cells in each annotated cell type of the monkey kidney. Inset, UMAP visualization of ACE2 in the integrated scATAC-seq and snRNA-seq data from monkey kidney. The red dashed line demarcates the separation of proximal tubule S1 and S3 cells. Asc LOH, ascending loop of Henle cell; AT1, alveolar type 1 cell; CTC, connecting tubule cell; Desc LOH, descending loop of Henle cell; Hep, hepatocyte; Mono, monocyte; PTC, proximal tubule cell; SMC, smooth muscle cell. Source data


Given the current coronavirus disease 2019 (COVID-19) pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2; ref. 46), we focused on the receptor for this virus, ACE2, and the serine protease TMPRSS2 (ref. 47) to assess their expression in monkey tissues. This knowledge offers the major advantage of studying COVID-19 pathogenesis in a species that is often used for modelling the disease48. Although lung is the predominantly affected tissue in humans, other tissues such as the kidney (especially proximal tubule cells) and liver are also affected, and clarifying the mechanisms of tissue targeting would improve understanding of disease course and transmissibility5,49. TMPRSS2 showed broad expression across multiple monkey tissues, whereas ACE2 was more restricted (Fig. 4b, Supplementary Figs. 41 and 42, and Supplementary Table 5b). The highest ACE2 expression was found in the gallbladder (mucous, endothelial, glandular and smooth muscle cells), Sertoli cells from the testis, kidney epithelial cells (mostly proximal tubule cells), the lung (ciliated, club and, in particular, alveolar type 2 (AT2) cells) and the liver (hepatocytes and especially cholangiocytes). ACE2 in these tissues was notably heterogeneous, suggesting that regulatory mechanisms fine-tune its expression levels. Double-positive (ACE2+TMPRSS2+) cells have a higher risk of infection by SARS-CoV-2 (refs. 5,47,50,51), and we noticed the largest numbers of these cells among monkey gallbladder cells, in agreement with reports of patients with COVID-19 developing acute cholecystitis52. Considerable coexpression was also observed in cells from the lung and kidney, with less overlap observed in other cell types such as bladder epithelial cells and pancreatic ductal and islet cells (Fig. 4c). We next performed a comparative analysis of ACE2 and TMPRSS2 expression in monkeys and humans8. Similar patterns were seen in liver in the two species, whereas more distinct patterns were observed in the gallbladder, kidney and lung (Fig. 4d).
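The ratio of double-positive (ACE2+TMPRSS2+) cells per cell type referred to above can be illustrated with a short sketch. It assumes that a cell is scored as positive for a gene when at least one UMI is detected, which is a simplifying assumption and not necessarily the criterion used in the study; the function name `double_positive_ratio` and the toy cells are hypothetical.

```python
from collections import defaultdict


def double_positive_ratio(cells, gene_a="ACE2", gene_b="TMPRSS2"):
    """Fraction of cells per cell type expressing both genes.

    `cells` is an iterable of (cell_type, expr) pairs, where `expr`
    maps gene -> UMI count. A gene is treated as expressed when its
    count is greater than zero (an assumption for this sketch).
    """
    totals = defaultdict(int)
    double = defaultdict(int)
    for cell_type, expr in cells:
        totals[cell_type] += 1
        if expr.get(gene_a, 0) > 0 and expr.get(gene_b, 0) > 0:
            double[cell_type] += 1
    return {ct: double[ct] / totals[ct] for ct in totals}


# Toy data (hypothetical counts, not taken from the atlas).
toy_cells = [
    ("gallbladder mucous cell", {"ACE2": 2, "TMPRSS2": 1}),
    ("gallbladder mucous cell", {"ACE2": 0, "TMPRSS2": 4}),
    ("gallbladder mucous cell", {"ACE2": 1, "TMPRSS2": 3}),
    ("gallbladder mucous cell", {"ACE2": 0, "TMPRSS2": 0}),
    ("hepatocyte", {"ACE2": 1, "TMPRSS2": 0}),
    ("hepatocyte", {"TMPRSS2": 2}),
]
ratios = double_positive_ratio(toy_cells)
print(ratios)
```

Because dropout is common in single-nucleus data, a ratio computed this way is best read as a lower bound on true coexpression.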

For a representative tissue with substantial ACE2 levels and a substantial proportion of ACE2+TMPRSS2+ cells, we looked at integrated snRNA-seq and scATAC-seq data from monkey kidney. This analysis identified discrete peaks of open chromatin in the ACE2 promoter and enhancer regions, with the greatest signal in a population of proximal tubule cells containing the highest proportion of ACE2-expressing cells (Fig. 4e). Motif analysis of these peaks demonstrated enrichment in binding sites for STAT1, STAT3, FOXA1, JUNB and several interferon response factor (IRF) proteins. These transcription factors are targets of tissue-protective and innate immune responses mediated by interleukin (IL)-6, IL-1 and interferons53. In this regard, dysregulation of both IL-6 and IL-1β has been implicated in the pathogenesis of severe COVID-19 disease54. Thus, we investigated the coexpression of their receptors (IL6R, IL1R1 and IL1RAP) with ACE2 in monkey kidney, only observing good correlation with ACE2 expression in proximal tubule cells for IL6R (Supplementary Fig. 43a). This observation suggests a potential link between IL-6, STAT transcription factors and enhanced ACE2 levels that may either facilitate viral reservoirs or exacerbate COVID-19 disease progression owing to increased viral dissemination (Supplementary Fig. 43b). In addition to ACE2 and TMPRSS2, numerous other molecules have been implicated in facilitating SARS-CoV-2 binding to the cell surface or in COVID-19 pathogenesis55. Their expression or coexpression in monkey tissues, other associations and virus–host interactions, as well as interspecies differences, can be studied using our NHPCA website.

Mapping traits and diseases to cell types

We next assessed the potential effect of genetic variation linked to complex human traits and diseases in specific monkey body cell types by applying a large panel of genome-wide association studies (GWAS) to our NHPCA. We linked human single-nucleotide polymorphisms from 163 GWAS taken from the UK Biobank (https://nealelab.github.io/UKBB_ldsc/downloads.html) to orthologous coordinates in the monkey transcriptome to calculate the enrichment of traits across the genes expressed in each cell cluster annotated in our dataset (Fig. 5, Supplementary Fig. 44 and Supplementary Table 6a). As a general trend, we observed enriched heritability for neurological traits such as ‘schizophrenia’ and ‘depression’ in clusters corresponding to neural cells. Alzheimer’s disease traits were enriched in immune cells, in line with the knowledge that immune dysfunction contributes to the pathogenesis of this disease56. In line with expectations, we also observed enrichment of immunological-related traits (‘lymphocyte count’, ‘monocyte count’ and traits related to immune disorders) in myeloid cells and B and T lymphocytes. Likewise, blood-related traits such as ‘mean sphered cell volume’ and ‘red blood cell distribution width’ were enriched in erythroid cells. Notably, we observed enrichment for traits such as ‘body mass index’ or ‘waist–hip ratio’ in lower digestive tract epithelial cells and somatotrope cells from the pituitary gland. Similarly, type 2 diabetes- and cholesterol-related traits showed not only the expected association with pancreatic cells (acinar, ductal and islet cells) and hepatocytes, but also associations with several kidney cell populations57. Our analysis also indicated enrichment of attention deficit and hyperactivity disorder, which often presents with motor abnormalities58, in skeletal muscle type II myonuclei in addition to neural cells (Fig. 5). 
To evaluate differences in target cell specificity among species, we further compared a selected panel of GWAS traits to cell types within the neocortex (our own dataset), heart and kidney in mice33,39, humans30,35,38 and monkey (Supplementary Fig. 45a). Neurological and neuropathological traits were more strongly linked to neurons in humans and monkeys than to those in mice. Notably, migraine had a lower score in human and monkey excitatory neurons than in mice but was more highly enriched in kidney intercalated cells of these two species.

Fig. 5: Association of monkey cell transcriptomic profiles with common human traits and genetic diseases.

The heat map shows the association of selected common human traits and diseases (indicated on the right) with the monkey cell types (indicated at the bottom) annotated in our dataset. The coloured boxes indicate selected enriched patterns. Definitions for abbreviations are provided in the Supplementary Note.

We also generated a correlation map of specific mutant genes causing human diseases (Supplementary Fig. 46 and Supplementary Table 6b). As expected, genes related to retinitis pigmentosa were specifically expressed in monkey photoreceptors, while genes related to porphyria were associated with erythroblasts. In addition, we compared the interspecies distribution of a selection of genes related to human neurological diseases in mouse, human35 and monkey neocortex. As with the GWAS, we observed a generally higher correlation of the expression in specific cell types between humans and monkeys than between either of these species and mice (Supplementary Fig. 45b). However, some genes were linked to different cell types in monkeys and humans. For instance, spinocerebellar ataxia caused by mutations in PLEKHG4 (ref. 59) and ataxia telangiectasia caused by mutations in ATM were enriched in astrocytes and oligodendrocytes60, respectively, in humans, while they were enriched in distinct types of inhibitory neurons in monkeys and mice. Further scrutiny of these and other GWAS datasets and disease-related genes as well as wider interspecies comparisons using our website should provide additional insights.

Discussion

Despite the enormous potential, few NHP tissues have been profiled thus far at the single-cell/nucleus level, and use of different species, experimental conditions and platforms makes comparisons challenging13,61. To address this, we have generated the first version of a large-scale cell transcriptomic atlas for an NHP widely used in research studies, M. fascicularis, and an open, expandable and interactive NHPCA database to facilitate its exploration.

In addition to the study of NHP physiology, our dataset will be valuable for understanding tissues that either have not been profiled at all at the single-cell/nucleus level in humans or lack sufficient cell numbers, enabling interspecies adaptive comparisons and predicting disease susceptibility. With respect to the latter, the observed association between IL-6, STAT transcription factors and ACE2 in the kidney could explain the reported positive effects of tocilizumab, a humanized monoclonal antibody against IL-6R, for the treatment of patients with severe COVID-19 disease62. Although it is currently under debate whether the human kidney is infected by SARS-CoV-2 (ref. 63), this positive feedback loop may exist in other tissues. Notably, we have also shown that the distribution of ACE2 and TMPRSS2 expression across different cell types is not identical between monkeys and humans. This could influence SARS-CoV-2 pathogenesis and may for example explain why drugs such as hydroxychloroquine, despite providing promising results in monkey cell lines in vitro, are not effective in humans64. The analysis of human genetic disease susceptibility confirmed clinical associations between motor symptoms and attention deficit and hyperactivity disorder58 as well as between migraine and the kidney65. Interspecies comparison for a panel of genes showed that differences in target cell susceptibility exist between humans and monkeys, further demonstrating that a cautious approach is required when modelling human diseases in NHPs.

Notably, in the survey for Wnt pathway components, we identified an unexpected enrichment of LGR5+ cells in the monkey DCT in comparison with mice and humans. The maintenance of high levels of LGR5 in DCTCs and of WNT9B in cells from the collecting duct suggests that the monkey DCT could have different properties than in mice and humans, but this remains to be studied. Similarly, LGR5+ cells in the neocortex correspond mainly to OPCs in monkeys, oligodendrocytes (and, to a lesser extent, OPCs) in humans and inhibitory neurons in mice. This is consistent with the knowledge that Wnt activity regulates oligodendrocyte function and OPC to oligodendrocyte differentiation66, but points to interspecies differences in the mode of action. Likewise, the expression of LGR5 in skeletal slow-twitch myofibres and LGR6 in the pituitary gland and heart is intriguing. During development, Wnt activity regulates skeletal myogenesis and myofibre typing, cardiomyocyte progenitor proliferation and pituitary gland growth67,68, but little is known about its role in adults.

Apart from these analyses and comparisons, our NHPCA website provides a platform for interactive comparisons with manually uploaded datasets. When doing this, the type of sequencing platform and use of single-nucleus versus single-cell analysis should be considered, as these factors can influence the number of captured genes as well as the cell populations detected and their relative proportions. In the future, the NHPCA database will be extended with additional omics layers and datasets from disease modelling studies and ageing. It will also be relevant to compare our M. fascicularis atlas with future cell atlases from humans and other non-endangered NHPs. Altogether, this information will be instrumental for advancing knowledge of primates.

Methods

Ethics statement

All experimental protocols in this study were reviewed and approved by the Institutional Animal Care and Use Committee of Huazhen Bioscience (permit no. HZ2019027) and the Institutional Review Board on Ethics Committee of Beijing Genomics Institute (BGI; permit nos BGI-IRB 19125-T2 and BGI-IRB 21136). The study was also implemented in compliance with the US National Institutes of Health Guide for the Care and Use of Laboratory Animals (8th edition, 2011).

Collection of animal tissues

A total of three female and five male cynomolgus monkeys, approximately 6 years old, were obtained from Huazhen Laboratory Animal Breeding Centre (Guangzhou, China). Monkeys were anaesthetized with an injection of ketamine hydrochloride (10 mg per kg) and sodium pentobarbital (40 mg per kg) before being euthanized by exsanguination. Wild-type C57BL/6J male mice, approximately 8 weeks old, were purchased from Guangdong Medical Lab Animal Center. Mice were provided with food and water ad libitum and maintained on a regular 12-h day/12-h night cycle. Ambient temperature was set to 18–23 °C, and relative humidity was set to 40–60%. One mouse was euthanized by neck dislocation. Monkey and mouse tissues were isolated and placed on an ice-cold board for dissection. Each tissue (except for bone marrow, peripheral blood and tissues on which enzymatic digestion was performed) was cut into 5–10 pieces of roughly 50–200 mg each. Samples were transferred to cryogenic vials (Corning, 430488) and then snap frozen in liquid nitrogen and stored in liquid nitrogen until nuclear extraction was performed. Peripheral blood mononuclear cells (PBMCs) from heparinized venous blood and bone marrow cells were isolated using Lymphoprep medium (STEMCELL Technologies, 07851) according to a standard density gradient centrifugation protocol. Cells from these two tissues were resuspended in freezing medium composed of 90% FBS (Thermo Fisher, 1921005PJ) and 10% DMSO (Sigma-Aldrich, D2650) and frozen using a Nalgene Mr. Frosty Cryo 1 °C Freezing Container (Thermo Fisher Scientific, 5100-0001) in a −80 °C freezer for 24 h before being transferred to liquid nitrogen for long-term storage.

Single-nucleus/cell suspension preparation

Single-nucleus isolation was performed as previously described69. In brief, tissues were thawed, minced and transferred to a 1-ml Dounce homogenizer (TIANDZ) with 1 ml of homogenization buffer A containing 250 mM sucrose (Ambion), 10 mg ml–1 BSA (Ambion), 5 mM MgCl2 (Ambion), 0.12 U μl–1 RNasin Plus (Promega, N2115) and 1× cOmplete Protease Inhibitor Cocktail (Roche, 11697498001). Frozen tissues were kept in an ice box and homogenized by 25–50 strokes of the loose pestle (pestle A), after which the mixture was filtered using a 100-µm cell strainer into a 1.5-ml tube (Eppendorf). The mixture was then transferred to a clean 1-ml Dounce homogenizer to which 750 μl of buffer A containing 1% Igepal (Sigma, CA630) was added, and the tissue was further homogenized by 25 strokes of the tight pestle (pestle B). After this, the mixture was filtered through a 40-µm strainer into a 1.5-ml tube and centrifuged at 500g for 5 min at 4 °C to pellet the nuclei. The pellet was resuspended in 1 ml of buffer B containing 320 mM sucrose, 10 mg ml–1 BSA, 3 mM CaCl2, 2 mM magnesium acetate, 0.1 mM EDTA, 10 mM Tris-HCl, 1 mM DTT, 1× cOmplete Protease Inhibitor Cocktail and 0.12 U μl–1 RNasin. This was followed by a centrifugation at 500g for 5 min at 4 °C to pellet the nuclei. Nuclei were then resuspended with cell resuspension buffer at a concentration of 1,000 nuclei per μl for library preparation.

Because of technical limitations in obtaining high-quality nuclei, scRNA-seq was performed for colon, duodenum, spleen, stomach, skin and testis. To do this, cells were obtained from fresh tissue by enzymatic digestion. Tissues were first rinsed with PBS, minced into small pieces by mechanical dissociation and incubated for 1 h in 10 ml DS-LT buffer (0.2 mg ml–1 CaCl2, 5 μM MgCl2, 0.2% BSA and 0.2 mg ml–1 Liberase in HBSS) at 37 °C. After this, the tissue digestion was stopped by adding 3 ml of FBS, followed by filtration through a 100-µm cell strainer and centrifugation at 500g for 5 min at 4 °C. Cells from lymph node and spleen were obtained from fresh tissue by mechanical dissociation. Cells from bone marrow and PBMCs were obtained as described in the ‘Collection of animal tissues’ section. Samples were filtered through a 40-µm cell strainer and centrifuged at 500g for 5 min at 4 °C. Pellets were resuspended in cell resuspension buffer at 1,000 cells per μl for library preparation.

scRNA-seq and snRNA-seq sample preparation

The DNBelab C Series Single-Cell Library Prep Set (MGI, 1000021082) was used as previously described4. In brief, single-nucleus/cell suspensions were used for droplet generation, emulsion breakage, bead collection, reverse transcription and cDNA amplification to generate barcoded libraries. Indexed libraries were constructed according to the manufacturer’s protocol. Concentrations were measured with a Qubit ssDNA Assay Kit (Thermo Fisher Scientific, Q10212). Libraries were sequenced on a DNBSEQ-T1 or DNBSEQ-T7 sequencer at the China National GeneBank (Shenzhen, China) with the following sequencing strategy: 41-bp read length for read 1 and 100-bp read length for read 2.

scATAC-seq sample preparation

scATAC-seq libraries were prepared using the DNBelab C Series Single-Cell ATAC Library Prep Set70 (MGI, 1000021878). In brief, nuclei were extracted from tissue using the same protocol as described above. After Tn5 tagmentation, transposed single-nucleus suspensions were converted to barcoded scATAC-seq libraries through droplet encapsulation, pre-amplification, emulsion breakage, captured bead collection, DNA amplification and purification. Indexed libraries were prepared according to the manufacturer’s protocol. Concentrations were measured with a Qubit ssDNA Assay Kit. Libraries were sequenced on a BGISEQ-500 sequencer at the China National GeneBank (Shenzhen, China) with the following sequencing strategy: 50-bp read length for read 1 and 76-bp read length for read 2.

scRNA-seq and snRNA-seq data processing

Raw data processing

Raw sequencing reads from DNBSEQ-T1 or DNBSEQ-T7 were filtered and demultiplexed using PISA (v0.2; https://github.com/shiquan/PISA). Reads were aligned to the Macaca_fascicularis_5.0 genome using STAR (v2.7.4a)71 and sorted by sambamba (v0.7.0)72. For tissues sequenced with scRNA-seq, reads were assigned to the exons of mRNA in the usual way. For tissues sequenced with snRNA-seq, a custom ‘pre-mRNA’ reference was created so that reads aligning to introns as well as to exons were counted, because of the large amount of unspliced pre-mRNA in the cell nucleus. Thus, each gene’s transcript in snRNA-seq was counted by including exon and intron reads together73. Next, a cell/nucleus versus gene UMI count matrix was generated with PISA.

Ambient RNA removal

Ambient RNA noise was reduced using SoupX (v1.4.8; https://github.com/constantAmateur/SoupX)74 with default settings apart from the contamination fraction (represented as rho). The rho value was automatically parameterized using the autoEstCont function in tissues where rho was lower than 0.05 or higher than 0.2. In other tissues, the rho value was manually set to 0.2 using the setContaminationFraction function if the autoEstCont value was between 0.05 and 0.2.
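The decision rule for the contamination fraction can be written out as a small function. This is a minimal sketch that follows the wording above literally; the function name `choose_rho` and the treatment of the boundary values 0.05 and 0.2 as part of the manual range are assumptions, and the SoupX calls themselves (autoEstCont, setContaminationFraction) are R functions that are not reproduced here.

```python
def choose_rho(auto_estimate, low=0.05, high=0.2, manual=0.2):
    """Pick the contamination fraction (rho) for one tissue.

    auto_estimate is the value that autoEstCont would report.
    Estimates outside [low, high] are kept as they are; estimates
    inside that interval are replaced by the manual value.
    """
    if auto_estimate < low or auto_estimate > high:
        return auto_estimate  # keep the automatic estimate
    return manual             # manual setting for the 0.05-0.2 range


# Hypothetical per-tissue estimates, for illustration only.
estimates = {"tissue_a": 0.03, "tissue_b": 0.10, "tissue_c": 0.30}
applied = {tissue: choose_rho(rho) for tissue, rho in estimates.items()}
```
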

Doublet removal

For each library, we performed doublet removal using DoubletFinder75. DoubletFinder first averages the transcriptional profile of randomly chosen cell pairs to create pseudo-doublets and then predicts doublets according to each real cell’s similarity in gene expression to the pseudo-doublets. Doublet removal was performed with the default parameters of DoubletFinder, and the 5% of cells most similar to the pseudo-doublets were excluded.
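The principle behind this step can be illustrated with a toy example. The sketch below is not the DoubletFinder implementation (which is an R package operating on a principal-component embedding); it is a simplified pure-Python analogue in which each real cell is scored by the fraction of pseudo-doublets among its nearest neighbours, and the top 5% are flagged. All function names, the two-dimensional toy profiles and the choice of k = 5 neighbours are assumptions made for illustration.

```python
import random


def simulate_doublets(cells, n_doublets, rng):
    """Create pseudo-doublets by averaging the profiles of random cell pairs."""
    doublets = []
    for _ in range(n_doublets):
        a, b = rng.sample(cells, 2)
        doublets.append([(x + y) / 2 for x, y in zip(a, b)])
    return doublets


def squared_distance(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))


def doublet_scores(cells, doublets, k=5):
    """Fraction of pseudo-doublets among the k nearest neighbours of each real cell."""
    scores = []
    for i, cell in enumerate(cells):
        neighbours = [(squared_distance(cell, other), 0)
                      for j, other in enumerate(cells) if j != i]
        neighbours += [(squared_distance(cell, d), 1) for d in doublets]
        neighbours.sort(key=lambda pair: pair[0])
        scores.append(sum(flag for _, flag in neighbours[:k]) / k)
    return scores


def flag_top_fraction(scores, fraction=0.05):
    """Indices of the highest-scoring cells (top `fraction` of the library)."""
    n_remove = round(fraction * len(scores))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:n_remove])


# Toy library: two cell types plus one intermediate profile at index 40.
rng = random.Random(0)
cells = [[rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)] for _ in range(20)]
cells += [[10 + rng.uniform(-0.5, 0.5), 10 + rng.uniform(-0.5, 0.5)] for _ in range(20)]
cells.append([5.0, 5.0])

doublets = simulate_doublets(cells, n_doublets=40, rng=rng)
scores = doublet_scores(cells, doublets)
flagged = flag_top_fraction(scores)
singlets = [c for i, c in enumerate(cells) if i not in flagged]
```

In this toy setting the intermediate profile sits where mixed-type pseudo-doublets accumulate, so all of its nearest neighbours are artificial and it receives the maximum score.
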

Cell clustering and cell type identification in scRNA-seq and snRNA-seq data

Cells or nuclei were preprocessed and filtered on the basis of a minimal expression threshold of 500 genes and genes being expressed by at least three cells or nuclei. Cells or nuclei fulfilling these criteria were kept for downstream analysis. In addition, cells or nuclei with more than 10% mitochondrial gene counts were removed. Global clustering of the complete cynomolgus monkey tissue dataset was performed using Scanpy (v1.6.0)76 in a Python environment (v3.6). Filtered data were transformed by ln(counts per million (CPM)/100 + 1). Three thousand highly variable genes were selected according to their average expression and dispersion. The number of UMIs and the percentage of mitochondrial genes were regressed out, and each gene was scaled with default options. Parameters used in each function were manually curated to obtain the optimal clustering of cells. Dimension reduction started with principal-component analysis, and the number of principal components used for UMAP visualization depended on the importance of the embeddings. The Louvain method was then used to detect subgroups of cells. For individual clustering, each tissue dataset was visualized using the Seurat package (v4.0.3)77 in the R environment (v4.0.2). Data from different replicates were normalized using the NormalizeData function with default options, and the top 2,000 most variable genes of each replicate were then calculated by FindVariableFeatures with the vst method. The replicable variable genes across replicates were selected to perform the FindIntegrationAnchors function for batch correction and then used to create an integrated data assay. The standard workflow for clustering and visualization was performed on the basis of the integrated data assay with default parameters according to the guidance of Seurat (https://satijalab.org/seurat/articles/integration_introduction.html).
For kidney data, replicates were aligned to the monkey FM1 data with the FindIntegrationAnchors function using option reference = 1. Finally, each cluster was annotated by extensive literature review and searches for specific gene expression patterns.
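The filtering thresholds and the ln(CPM/100 + 1) transformation described above can be made concrete with a short sketch. It uses plain Python dictionaries rather than Scanpy objects; the function names, the order in which the filters are applied and the `MT-` prefix used to recognize mitochondrial genes are assumptions made for illustration. Note that CPM/100 is the same as counts per 10,000.

```python
import math


def qc_filter(counts, min_genes=500, min_cells=3, max_mito_frac=0.10,
              is_mito=lambda gene: gene.startswith("MT-")):
    """Filter a {cell: {gene: UMI count}} matrix.

    Keeps cells with at least `min_genes` detected genes and at most
    `max_mito_frac` mitochondrial counts, then keeps genes detected in
    at least `min_cells` of the remaining cells.
    """
    kept = {}
    for cell, genes in counts.items():
        detected = sum(1 for v in genes.values() if v > 0)
        total = sum(genes.values())
        mito = sum(v for g, v in genes.items() if is_mito(g))
        if detected >= min_genes and total > 0 and mito / total <= max_mito_frac:
            kept[cell] = genes

    cells_per_gene = {}
    for genes in kept.values():
        for g, v in genes.items():
            if v > 0:
                cells_per_gene[g] = cells_per_gene.get(g, 0) + 1
    good_genes = {g for g, n in cells_per_gene.items() if n >= min_cells}

    return {cell: {g: v for g, v in genes.items() if g in good_genes}
            for cell, genes in kept.items()}


def ln_cpm_transform(cell_counts):
    """Apply ln(CPM/100 + 1) to the counts of one cell."""
    total = sum(cell_counts.values())
    return {g: math.log(v / total * 1e6 / 100 + 1) for g, v in cell_counts.items()}


# Tiny example with relaxed thresholds so that the effect is visible.
toy = {
    "c1": {"G1": 5, "G2": 3, "MT-X": 0},
    "c2": {"G1": 1, "G2": 1, "MT-X": 8},   # 80% mitochondrial counts
    "c3": {"G1": 2, "G2": 0, "MT-X": 0},
    "c4": {"G1": 4, "G2": 2},
}
filtered = qc_filter(toy, min_genes=1, min_cells=3)
normalized = ln_cpm_transform({"A": 1, "B": 3})
```
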

DEGs and GO term enrichment

In the global clustering, we performed DEG analysis using the sc.tl.rank_genes_groups function in Scanpy. In other analyses, we used the FindMarkers or FindAllMarkers function in Seurat. Analysis of DEGs among different cell types within one tissue was performed with the FindAllMarkers function. DEGs were defined as genes with a fold change > 2 and adjusted P < 0.01. GO enrichment analysis was performed using the CompareCluster function of ChIPseeker (v1.22.1)78. Only GO terms with Q value < 0.05 were retained.

Cross-species comparisons

Between-atlas comparisons

For interspecies cell atlas analysis, data were retrieved from the HCL8 and MCA7. The count matrix for each tissue in the three species was preprocessed in three steps: (1) orthologous gene lists were downloaded from Ensembl79 and only genes that were orthologous for all three species were kept; (2) only genes expressed in at least one cell in each of the three species were kept; and (3) gene names for the human and mouse count matrix were converted into orthologues in M. fascicularis. After preprocessing, the count matrices of the three species were integrated and subjected to clustering using the standard integrated pipeline of Seurat with one additional criterion that only cells expressing more than 200 genes were kept. Seurat clusters were then annotated into different cell types using cell-type-specific markers defined in this paper.
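The three preprocessing steps can be sketched as follows. This is an illustrative pure-Python version under stated assumptions: the orthologue tables are taken to be one-to-one and already loaded as dictionaries, the function name `harmonize` is hypothetical, renaming is done before the expression filter for convenience (the resulting gene set is the same), and the 200-gene criterion is applied after restricting to shared genes.

```python
def harmonize(matrices, to_monkey, min_genes_per_cell=200):
    """Bring {species: {cell: {gene: count}}} onto shared monkey gene names.

    to_monkey maps each species to a {gene: monkey orthologue} table
    (an identity table for monkey itself).
    """
    # Step 3: rename genes to their M. fascicularis orthologues.
    renamed = {}
    for species, cells in matrices.items():
        table = to_monkey[species]
        renamed[species] = {
            cell: {table[g]: v for g, v in genes.items() if g in table}
            for cell, genes in cells.items()
        }

    # Step 1: keep genes with an orthologue in every species.
    shared = set.intersection(*(set(t.values()) for t in to_monkey.values()))

    # Step 2: keep genes expressed in at least one cell of every species.
    for cells in renamed.values():
        shared &= {g for genes in cells.values() for g, v in genes.items() if v > 0}

    # Additional criterion: keep cells expressing more than min_genes_per_cell genes.
    result = {}
    for species, cells in renamed.items():
        result[species] = {}
        for cell, genes in cells.items():
            kept = {g: v for g, v in genes.items() if g in shared}
            if sum(1 for v in kept.values() if v > 0) > min_genes_per_cell:
                result[species][cell] = kept
    return result, shared


# Toy example; gene tables and counts are invented for illustration.
tables = {
    "monkey": {"ACE2": "ACE2", "LGR5": "LGR5", "ALB": "ALB"},
    "human": {"ACE2": "ACE2", "LGR5": "LGR5", "ALB": "ALB"},
    "mouse": {"Ace2": "ACE2", "Lgr5": "LGR5"},
}
data = {
    "monkey": {"m1": {"ACE2": 2, "LGR5": 1, "ALB": 3}},
    "human": {"h1": {"ACE2": 1, "LGR5": 0, "ALB": 4}},
    "mouse": {"x1": {"Ace2": 5, "Lgr5": 2}},
}
merged, shared_genes = harmonize(data, tables, min_genes_per_cell=0)
```

In the toy data, ALB is dropped because the mouse table has no orthologue for it, and LGR5 is dropped because no human cell expresses it.
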

Cross-species comparisons for other tissues

To obtain more accurate comparisons, we specifically chose three tissues, namely kidney7,8,30,31,32,33, neocortex35 (mouse neocortex data from our own samples) and heart38,39. Apart from the MCA and HCL kidney data, we downloaded the following data from public databases: human kidney, GSE121862 and GSE151302; mouse kidney (Tabula Muris), GSE107585; human neocortex, GSE97942; human heart, ERP123138; mouse heart, E-MTAB-7869; we also used our own mouse neocortex data (https://db.cngb.org/nhpca/). All data, except those from the MCA, HCL and Tabula Muris, were processed using our pipeline described above in the ‘scRNA-seq and snRNA-seq data processing’ section. Data were integrated using the same preprocessing, clustering and annotation method described above. Clusters with cell numbers lower than 200 were excluded. After annotation, we performed DEG analysis by comparing our dataset and each of the downloaded datasets within the same cell type. We used a critical cut-off in this analysis: fold change > 2 and adjusted P < 0.01. Only DEGs shared by three human datasets or three mouse datasets were considered to be species-specific DEGs.

Common cell analysis

For each common cell type, we extracted cells from all tissues in our dataset according to the cell type annotation presented in Supplementary Figs. 12–15. For the downstream analysis, we excluded common cell clusters from each individual tissue if the cell number of the cell cluster was less than 200. Data from different replicates were integrated following a standard integration pipeline using Seurat. To reduce the influence of ambient RNA and technical differences between snRNA-seq and scRNA-seq, the analysis of tissue-specific DEGs in Fig. 2 and Supplementary Fig. 21 was stringently defined. We first performed DEG analysis by comparing a selected cell type and other cell types within an individual tissue to define selected cell-type-specific genes in each tissue. We computed $P_{i,j,k}$ as the fraction of cells in tissue i expressing gene j in cell population k. A given cell-type-specific gene j in tissue i ($\mathrm{SCSG}_i$) was defined using the following cut-off: log2(fold change) > 2, adjusted P < 0.01 and $(P_{i,j,c_1} - P_{i,j,c_2})/P_{i,j,c_1} > 0.8$ (where $c_1$ represents a given cell type in tissue i and $c_2$ represents other cell types in tissue i). After this, we tested whether $\mathrm{SCSG}_i$ genes were differentially expressed in a given cell type in tissue i as compared to other tissues. Genes were finally determined to be tissue-specific DEGs of a given cell type in tissue i if they met the following conditions: log2(fold change) > 0.5 and adjusted P < 0.01.
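The first cut-off can be expressed as a short predicate. The sketch below assumes that the log2(fold change) and adjusted P value have already been obtained from a DEG test; the function names are hypothetical, and a gene that is not detected at all in the cell type of interest is treated as failing the criterion.

```python
def fraction_expressing(cells, gene):
    """Fraction of cells (each a {gene: count} dict) with a non-zero count."""
    return sum(1 for c in cells if c.get(gene, 0) > 0) / len(cells)


def is_cell_type_specific(p_c1, p_c2, log2_fc, adj_p,
                          min_log2_fc=2.0, max_adj_p=0.01, min_rel_diff=0.8):
    """Apply the cut-off: log2FC > 2, adjusted P < 0.01 and
    (p_c1 - p_c2) / p_c1 > 0.8, where p_c1 and p_c2 are the fractions of
    expressing cells in the cell type of interest and in the other cell
    types of the same tissue."""
    if p_c1 == 0:
        return False
    relative_difference = (p_c1 - p_c2) / p_c1
    return (log2_fc > min_log2_fc
            and adj_p < max_adj_p
            and relative_difference > min_rel_diff)


# Invented values: a gene detected in 90% of the target cells and 10% of the rest.
target = [{"GENE": 3}] * 9 + [{}]
others = [{"GENE": 1}] + [{}] * 9
p1 = fraction_expressing(target, "GENE")
p2 = fraction_expressing(others, "GENE")
specific = is_cell_type_specific(p1, p2, log2_fc=3.0, adj_p=0.001)
```

With these values the relative difference is (0.9 − 0.1)/0.9 ≈ 0.89, which exceeds 0.8, whereas a gene detected in 30% of the other cells would give ≈ 0.67 and be rejected.
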

Pseudotime trajectory analysis

The cell lineage trajectory was inferred using Monocle 2 (ref. 80) according to the tutorial. After the cell trajectory was constructed, DDRtree was used to visualize it in two-dimensional space.

Cell–cell interaction networks

To assess the cellular cross-talk between different cell types in each tissue, we used CellPhoneDB, a public repository of ligand–receptor interactions81. Cell-type‐specific receptor–ligand interactions between cell types were identified on the basis of specific expression of a receptor by one cell type and a ligand by another cell type. The interaction score refers to the mean total of the average expression values for all individual ligand–receptor partners in the corresponding interacting pairs of cell types. Before analysis, cells from the same cell type were aggregated in groups of 20 to make pseudo-cells in each organ. For this analysis, we applied a statistical method to ensure that only receptors or ligands expressed by more than 10% of the cells in the given cluster were considered. The total mean of the average expression values for individual partners in the corresponding interacting pairs of cell types was calculated.
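The scoring logic can be illustrated without the CellPhoneDB package itself. The following sketch is an assumption-laden simplification: cells are {gene: expression} dictionaries, pseudo-cells are formed from consecutive groups of 20 with any incomplete trailing group dropped, the 10% expression filter is applied to whatever cells are passed in, and the score is the mean of the average ligand expression in one cell type and the average receptor expression in the other. Function names are hypothetical, and the permutation-based significance test of CellPhoneDB is not reproduced.

```python
def make_pseudo_cells(cells, group_size=20):
    """Average consecutive groups of `group_size` cells into pseudo-cells."""
    pseudo = []
    for start in range(0, len(cells) - group_size + 1, group_size):
        group = cells[start:start + group_size]
        genes = set().union(*group)
        pseudo.append({g: sum(c.get(g, 0.0) for c in group) / group_size
                       for g in genes})
    return pseudo


def fraction_expressing(cells, gene):
    return sum(1 for c in cells if c.get(gene, 0.0) > 0) / len(cells)


def mean_expression(cells, gene):
    return sum(c.get(gene, 0.0) for c in cells) / len(cells)


def interaction_score(sender, receiver, ligand, receptor, min_fraction=0.10):
    """Mean of average ligand and receptor expression, or None if either
    partner is expressed by 10% or fewer of the cells in its cluster."""
    if fraction_expressing(sender, ligand) <= min_fraction:
        return None
    if fraction_expressing(receiver, receptor) <= min_fraction:
        return None
    return (mean_expression(sender, ligand)
            + mean_expression(receiver, receptor)) / 2


# Invented expression values for an IL6 / IL6R pair.
sender = [{"IL6": 4.0}] * 5 + [{}] * 5
receiver = [{"IL6R": 1.0}] * 10
rare_receiver = [{"IL6R": 1.0}] + [{}] * 19
score = interaction_score(sender, receiver, "IL6", "IL6R")
pseudo = make_pseudo_cells([{"IL6": 2.0}] * 20 + [{"IL6": 4.0}] * 20 + [{"IL6": 9.0}] * 3)
```
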

Association of human GWAS and genetic disease data with monkey cell types

To test the enrichment of genes related to human diseases and traits for each cluster of cells based on global clustering, we applied linkage disequilibrium (LD) score regression analysis as previously described (https://github.com/bulik/ldsc/wiki/LD-Score-Estimation-Tutorial)82. For this, we only considered DEGs with an adjusted P < 0.01 and fold change > 2 in the tested cell types. Then, we converted the genome coordinates of Macaca_fascicularis_5.0 into hg19 genome coordinates using orthologous gene lists downloaded from Ensembl. The summary statistics file for each trait was downloaded from the UK Biobank database or published studies (Supplementary Table 6a). To calculate cell-type-specific LD scores, we first created annotation files for 22 chromosomes in each cell type with script make_annot.py using options --bed-file --bimfile 1000G.EUR.QC.bim --annot-file. Then, the annotation files were used as input to compute LD scores with the ldsc.py script using options --l2 --bfile 1000G.EUR.QC --ld-wind-cm 1 --annot --thin-annot --print-snps. Next, we ran the ldsc.py script with the --h2-cts flag to perform regressions following the standard workflow (https://github.com/bulik/ldsc/wiki/Cell-type-specific-analyses). We report the coefficient P value as a measure of the association of each cell type with the traits. All plots show the −log10-transformed P-value z-score of partitioned LD score regression. The cross-species GWAS analysis was performed on the basis of the integrated Seurat object.
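The per-chromosome annotation and LD score steps can be assembled programmatically. The sketch below only builds the command lines and does not run them; it uses the options named above plus --out, and all file names (the BED file, the per-chromosome suffixes on the 1000G.EUR.QC prefix, the annotation and output prefixes and the SNP list passed to --print-snps) are hypothetical placeholders patterned on the LDSC wiki rather than the paths used in this study. The final --h2-cts regression is left out because its inputs are not specified in the text.

```python
def ldsc_commands(cell_type, bed_file, bfile_prefix="1000G.EUR.QC",
                  snp_list_pattern="hm.{chrom}.snp", chromosomes=range(1, 23)):
    """Build make_annot.py and ldsc.py command lines for one cell type."""
    commands = []
    for chrom in chromosomes:
        annot = f"{cell_type}.{chrom}.annot.gz"   # hypothetical output name
        # Step 1: annotation file for this chromosome.
        commands.append([
            "python", "make_annot.py",
            "--bed-file", bed_file,
            "--bimfile", f"{bfile_prefix}.{chrom}.bim",
            "--annot-file", annot,
        ])
        # Step 2: LD scores computed from the annotation.
        commands.append([
            "python", "ldsc.py", "--l2",
            "--bfile", f"{bfile_prefix}.{chrom}",
            "--ld-wind-cm", "1",
            "--annot", annot,
            "--thin-annot",
            "--out", f"{cell_type}.{chrom}",
            "--print-snps", snp_list_pattern.format(chrom=chrom),
        ])
    return commands


# Hypothetical cell type label and BED file of hg19 intervals for its DEGs.
commands = ldsc_commands("proximal_tubule", "proximal_tubule.hg19.bed")
first_annot_cmd = " ".join(commands[0])
```

Each list can be passed to `subprocess.run` once the placeholder paths are replaced with real files.
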

scATAC-seq data processing

Raw sequencing reads from BGISEQ-500 were filtered, demultiplexed and aligned to the Macaca_fascicularis_5.0 genome using PISA. Fragment files for each library were generated for downstream analysis. The transcription start site enrichment score and fragment number for each nucleus were calculated using ArchR83. Cells with transcription start site enrichment scores lower than 5 or fragment numbers lower than 1,000 were removed. We then calculated the doublet score with the addDoubletScores function in ArchR and filtered using the filterDoublets function with parameter filterRatio = 2. Clustering analysis was performed using ArchR by first identifying a robust set of peak regions followed by iterative latent semantic indexing (LSI) clustering. In brief, we created 500-bp tiles across the genome and determined whether each cell was accessible within each tile. Next, we performed an LSI dimensionality reduction on these tiles with the addIterativeLSI function in ArchR. We then performed Seurat clustering (FindClusters) on the LSI dimensions at a resolution of 0.8. Anchors between the scATAC-seq and scRNA-seq/snRNA-seq datasets were identified and used to transfer cell type labels identified from the scRNA-seq/snRNA-seq data. Data were co-embedded using the TransferData function of Seurat.
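The quality filter and the binary tile representation can be illustrated in a few lines. This sketch is independent of ArchR; the function names are hypothetical, a nucleus is assumed to be retained only when it meets both thresholds, and a tile is assumed to be accessible when either end of a fragment (a Tn5 insertion site) falls inside it.

```python
def passes_qc(tss_enrichment, n_fragments, min_tss=5, min_fragments=1000):
    """Keep a nucleus only if it meets both quality thresholds."""
    return tss_enrichment >= min_tss and n_fragments >= min_fragments


def accessible_tiles(fragments, tile_size=500):
    """Binary accessibility of one nucleus over fixed-width genomic tiles.

    fragments is an iterable of (chromosome, start, end) tuples; the
    result is the set of (chromosome, tile index) pairs hit by at least
    one fragment end.
    """
    tiles = set()
    for chrom, start, end in fragments:
        tiles.add((chrom, start // tile_size))
        tiles.add((chrom, end // tile_size))
    return tiles


# Invented fragments for a single nucleus.
fragments = [("chr1", 100, 450), ("chr1", 480, 1020)]
tiles = accessible_tiles(fragments)
```
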

Transcription factor motif enrichment analysis

To predict the motif footprint in peaks within the ACE2 promoter and enhancer sequences, we extracted the genome sequence in each peak region with Seqkit (v0.7.0)84. Sequences were matched to all Homo sapiens motifs from JASPAR2018 using the matchMotifs function in motifmatchr (v1.8.0) with the default parameters.

Immunofluorescence staining

Staining of monkey liver, subcutaneous and visceral adipose tissue, ovary and neocortex samples was conducted following a standard protocol. In brief, paraffin-embedded sections were deparaffinized, incubated with primary antibody for albumin (1:250 dilution; Abcam, ab207327) in liver, with primary antibody for CD34 (1:50 dilution; BioLegend, 34063) and NOX4 (1:100 dilution; Invitrogen, MA5-32090) in both types of adipose tissue, with primary antibody for CD44 (1:50 dilution; Proteintech, 60224-1-Ig) in ovary, and with primary antibodies for PDGFRα (1:500 dilution; Cell Signaling, 3174S) and LGR5 (1:50 dilution; Abcam, ab273092) in neocortex overnight at 4 °C, followed by incubation with a secondary antibody conjugated to Alexa Fluor 488 (1:250 dilution; Jackson ImmunoResearch, 715-545-150) or Cy3 (1:250 dilution; Jackson ImmunoResearch, 711-165-152) for 30 min at room temperature. Slides were mounted with Slowfade Mountant+DAPI (Life Technologies, S36964) and sealed.

smFISH

smFISH of monkey kidney, diaphragm and heart tissues was performed using RNAscope Fluorescent Multiplex and RNAscope Multiplex Fluorescent v2 (Advanced Cell Diagnostics) according to the manufacturer’s instructions. The following alterations were made: the thickness of the paraffin section was adjusted to 5 μm, the target retrieval boiling time was adjusted to 15 min, and the incubation time with Protease plus at 40 °C was adjusted to 30 min. The following fluorescence channels were used for RNAscope probes: LGR5 (C1), SLC12A3 (C2), LGR6 (C2) and MYH7 (C2). For ovary, LGR5 (C1) probe was used before staining with primary antibody for CD44 (Proteintech, 60224-1-Ig) and subsequent incubation with secondary antibody (Alexa Fluor, Jackson ImmunoResearch) for 30 min at room temperature. Slides were mounted with Slowfade Mountant+DAPI (Life Technologies, S36964) and sealed.

Statistics and reproducibility

For smFISH and immunofluorescence staining experiments, each in situ hybridization probe or antibody staining was repeated with similar results on at least three separate samples and on at least two sections per sample. The experiments were not randomized, and the investigators were not blinded to allocation during experiments and outcome assessment. No statistical methods were used to predetermine sample size.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.