Cells of the adult human heart

Cardiovascular disease is the leading cause of death worldwide. Advanced insights into disease mechanisms and therapeutic strategies require a deeper understanding of the molecular processes involved in the healthy heart. Knowledge of the full repertoire of cardiac cells and their gene expression profiles is a fundamental first step in this endeavour. Here, using state-of-the-art analyses of large-scale single-cell and single-nucleus transcriptomes, we characterize six anatomical adult heart regions. Our results highlight the cellular heterogeneity of cardiomyocytes, pericytes and fibroblasts, and reveal distinct atrial and ventricular subsets of cells with diverse developmental origins and specialized properties. We define the complexity of the cardiac vasculature and its changes along the arterio-venous axis. In the immune compartment, we identify cardiac-resident macrophages with inflammatory and protective transcriptional signatures. Furthermore, analyses of cell-to-cell interactions highlight different networks of macrophages, fibroblasts and cardiomyocytes between atria and ventricles that are distinct from those of skeletal muscle. Our human cardiac cell atlas improves our understanding of the human heart and provides a valuable reference for future studies. Single-cell and single-nucleus RNA sequencing are used to construct a cellular atlas of the human heart that will aid further research into cardiac physiology and disease.

The heart is a complex organ, composed of four morphologically and functionally distinct chambers (Fig. 1a). Deoxygenated blood from the low-pressure right atrium and ventricle is propelled into the lungs. Oxygenated blood enters the left atrium and ventricle, which propels blood across the body at systemic pressure. Chambers are separated by the interatrial and interventricular septa, and unidirectional flow is established by the atrio-ventricular and ventricular-arterial valves. An intrinsic electrophysiological system rapidly propagates electrical impulses from the sinoatrial node to the atrioventricular node, and along Purkinje fibres to the apex where contraction begins. Cardiac anatomical and functional complexity requires exquisite orchestration of heterogeneous cell populations to enable continuous contraction and relaxation across different pressures, strains and biophysical stimuli in each chamber.
The heart is derived from multipotent progenitor cells that comprise two heart fields. Cells of the first heart field primarily populate the left ventricle; second heart field cells populate the right ventricle, and both fields contribute to the atria. Haemodynamics changes in the postnatal period and the distinct gene regulatory networks that operate in each heart field presumably prime gene expression patterns of adult heart cells 1 .
Single-cell and single-nucleus RNA sequencing (scRNA-seq and snRNA-seq, respectively) and multiplex single-molecule fluorescence in situ hybridization (smFISH) enable the identification of anatomical specificities, molecular signatures, intercellular networks and spatial relationships by illuminating the coordinated communication of cardiac cells within their microenvironments 2 .
We present comprehensive transcriptomic data on six distinct cardiac regions, providing, to our knowledge, the largest reference framework so far 3,4 . We incorporate snRNA-seq to ensure high-throughput capture of large cardiomyocytes (length and width approximately 100 and 25 μm) and scRNA-seq to upsample and enrich endothelial and immune cell populations. Using multiplex smFISH imaging, we describe the spatial distribution of selected cell populations and cell-cell co-localizations. We compare cardiac cell and nuclear transcriptomes with those of skeletal muscle and kidney, highlighting cardiac-specific cell signatures. Our study defines the cellular and molecular signatures of the adult healthy heart, and enables functional plasticity in response to varying physiological conditions and diseases.

Cellular landscape of the adult human heart
We isolate single cells, nuclei and CD45 + enriched cells from the left and right ventricular free walls, left and right atrium, the left ventricular apex, and interventricular septum, from 14 adults (Fig. 1a, b, Supplementary Table 1). After processing with 10X Genomics and a generative deep variational autoencoder, the resulting dataset comprises 45,870 cells, 78,023 CD45 + enriched cells and 363,213 nuclei for 11 major cell types: atrial cardiomyocytes, ventricular cardiomyocytes, fibroblasts (FBs), endothelial cells (ECs), pericytes, smooth muscle cells (SMCs), immune cells (myeloid and lymphoid), adipocytes, mesothelial cells and neuronal cells (Fig. 1c, e, Extended Data Figs. 1,2).
The ventricular proportions of ventricular cardiomyocytes and FBs are negatively correlated, whereas pericytes and SMC proportions are positively correlated, indicating a functional organization (Supplementary Table 3). Cell distributions are generally similar in male and female hearts. However, the mean percentages of ventricular cardiomyocytes from left and right ventricles are higher in female hearts (56 ± 9%; mean ± s.d.) and associated with a stronger negative correlation between ventricular cardiomyocytes and FBs (r = −0.8; slope = −0.9) compared to male hearts (47 ± 11%; P = 0.03; ventricular cardiomyocytes-FBs, r = −0.4; slope = −0.3). Differences in the proportions of cardiomyocytes is unexpected given the average smaller female heart mass, and if confirmed might explain higher cardiac stroke volumes in women 5 and lower rates of cardiovascular disease.

Cardiomyocyte heterogeneity
Cardiomyocytes show high-level expression of genes that encode contractile force-generating sarcomere proteins (TTN, MYBPC3 and TNNT2) and calcium-mediated processes (RYR2, PLN and SLC8A1). Consistent with bulk tissue RNA-sequencing (RNA-seq) data 6 , we observe markedly distinct transcriptional signatures in ventricular and atrial cardiomyocytes, reflecting developmental origins and differences in electrophysiological, contractile and secretory processes (Extended Data Fig. 3, Supplementary Table 4). Ventricular cardiomyocytes are enriched in genes encoding sarcomere proteins (MYH7 and MYL2), transcription factors (IRX3, IRX5, IRX6, MASP1 and HEY2), and PRDM16, mutated in left ventricular non-compaction cardiomyopathy 7 . Other abundant transcripts enable tissue integrity despite high ventricular strain: PCDH7 encodes a calcium-dependent strong adhesive molecule 8 ; SMYD2 encodes a lysine methyltransferase that promotes sarcomere formation and stabilization 9 . Atrial cardiomyocytes abundantly express prototypic genes and also ALDH1A2, an enzyme required for retinoic acid synthesis, ROR2, which participates in Wnt signalling during lineage differentiation 10 , and SYNPR, which functions in the mechanosensing of TRP channels by atrial volume receptors 11 .
We identify five ventricular cardiomyocyte (vCM1-vCM5) populations: vCM1 comprise 63.9% of left ventricular cardiomyocytes but only 36.7% of right ventricular cardiomyocytes (Fig. 2a,  Article fluorescence in situ hybridization (smFISH) (Fig. 2c, Extended Data Fig. 3c). Among vCMs, vCM2 has the highest expression of the myosin gene MYH6 and CDH13, a cell surface T-cadherin receptor for cardioprotective adiponectin and low density lipoproteins, both associated with several cardiometabolic traits 13 . vCM3 and vCM4 are present across all ventricular regions. The vCM3 transcriptional profile resembles a prominent right atrium population (aCM3, discussed below) with retinoic-acid-responsive SMC gene enrichment (MYH9, NEXN and CNN1) 14,15 . vCM3 also express stress-response genes including ANKRD1 16 , FHL1 17 (verified by smFISH) (Fig. 2d, Extended Data Fig. 3c), DUSP27 18 , and XIRP1 and XIRP2, interacting with intercalated disc ion channel proteins implicated in cardiomyopathy and arrhythmias 19 . The small population vCM4 is enriched for nuclear-encoded mitochondrial genes (NDUFB11, NDUFA4, COX7C and COX5B) and Gene Ontology terms indicative of a high energetic state (Extended Data Fig. 3f). vCM4 also demonstrates high levels of CRYAB, which encodes a cytoprotective and antioxidant heat shock protein 20 , of sarcomere protein genes and PLN, indicating that these ventricular cardiomyocytes are outfitted to perform higher workload than other ventricular cardiomyocytes.
vCM5 (approximately 1%) comprises cells with high levels of DLC1 and EBF2 21 , regulating brown adipocyte differentiation and perhaps cardiac pacemaker activity, and transcripts identified in neuronal lineages (SOX5, EBF1 and KCNAB1). As EBF1-depleted mice have a profoundly hypoplastic ventricular conduction system 22 , vCM5 may participate in electrophysiology.
We identify five atrial cardiomyocyte populations (aCM1-aCM5) (  Tables 6, 7). HAMP, a master regulator of iron homeostasis, is considerably enriched in more than 50% of right atrium cardiomyocytes versus 3% left atrium cardiomyocytes (verified by smFISH) 23 (Fig. 2g, Extended Data Fig. 3c), indicating energetic differences 6 . HAMP has unknown roles in cardiac biology, but Hamp-null mice have deficits in the electron transport chain and lethal cardiomyopathy 24 . aCM1 shows robust expression of prototypic atrial transcripts, indicative of a basal atrial cardiomyocyte gene program, and lower levels of molecules with neuronal functions (ADGRL2, NFXL1 and ROBO2). aCM2 predominantly expresses HAMP within the right atrium and is enriched for SLIT3, the developmental ligand for cardiac ROBO receptors 25 , ALDH1A2 26 and BRINP3, involved in retinoic acid signalling, and GRXCR2, supporting cilia involved in mechanosensing 27 . aCM3 and vCM3 share similar transcriptional profiles including enrichment of the SMC gene CNN1 (verified by smFISH) (Fig. 2h, Extended Data Fig. 3c). The molecular signatures of aCM2, aCM3 and vCM3 indicate derivation from the second heart field 28 . aCM4 transcripts denote high metabolic activity, similar to vCM4, and have the highest expression of NPPA. aCM5 expresses similar transcripts to vCM5.

Vascular, stromal and mesothelial cells
The vascular compartment includes 17 distinct populations of EC, SMC, pericyte, mesothelial cells with anatomical and arterio-venous specificities (Fig. 3a,    in atria. PC1_vent cells express adhesion molecules (NCAM2 and CD38), and CSPG4, which is involved in microvascular morphogenesis and EC cross-talk 35 (Extended Data Fig. 4d-f). PC3_str co-express pericyte markers and very low levels of pan-EC transcripts. RNA velocity analyses suggest a directionality that indicates PC3_str cells as a transitional state between pericytes and ECs (Extended Data Fig. 4h, i). These observations may relate to bidirectional pericyte or endothelial cell (trans)differentiation, which remains controversial 36 . Vascular SMCs that express MYH11 split into two populations. SMC1_basic cells express transcripts that indicate immaturity, including the stem-cell marker LGR6 37 and proliferation-associated RGS5 38 . SMC2_art cells express considerably higher levels of CNN1, ACTA2 and TAGLN, indicating arterial origin, whereas SMC1_basic cells may be venous-derived 39 (Extended Data Fig. 4d-f).
Cell-cell interaction analyses indicate connections between ECs and mural cells in different vascular segments ( Fig. 3b- Fig. 5) shows co-occurrence of EC5_art and SMC2_art markers and JAG1 and NOTCH2, thereby supporting this interaction between ECs and SMCs. A venous-specific DLL1-NOTCH3 interaction is predicted for EC6_ven and SMC1_basic cells. Notably, many of the venous and arterial EC predicted interactions are shared with capillary ECs, which suggests gradual changes along the arterio-venous axis 39 .
We define a distinct small population as mesothelial cells that enrich for MSLN, WT1 and BNC1 41 but lack EC, FB or mural genes. smFISH confirms this annotation with localization of BNC1 + /CDH5 − cells to the epicardium (Extended Data Fig. 4l-n).

Cardiac fibroblasts
Cells of the FB compartment show enriched expression of DCN, GSN and PDGFRA within seven populations (Fig. 3g) with regional enrichment in ventricles (FB1) and atria (FB2). This is consistent with distinctive functional properties, including stronger profibrotic responses, by atrial FBs 42 . FB1 and FB2 cells express canonical genes and define a basal, chamber-specific FB expression program (Extended Data Fig. 6a, Supplementary Table 11).
FB4 and FB5 cells are less abundant in the right atrium than other regions, whereas FB3 are less abundant in the left ventricle (Extended Data Fig. 6c). FB4 cells express genes responsive to TGFβ signalling (for example, POSTN and TNC) (Fig. 3e). FB5 cells have higher expression of genes involved in the production, remodelling and degradation of extracellular matrix (ECM). By contrast, FB3 cells have lower expression of ECM-related genes but higher expression of cytokine receptors such as OSMR and ILST6 43 (Fig. 3f, Extended Data Fig. 6b, Supplementary Table 12). These distinctive fibroblast gene programs probably govern stress-responsive cardiac remodelling and contribute to homeostasis.
Separate clustering of atrial and ventricular FBs recapitulated the populations described above, including an OSM-signalling population in each chamber (aFB4 and vFB3). In addition, we identify distinct chamber-specific ECM-producing FBs that differ in the expression of collagen isoforms and other ECM-related (aFB2 versus vFB2) (Extended Data Fig. 6g
Predicted cell-cell interactions identify receptor-ligand circuits among immune cells, cardiomyocytes, and FBs. LYVE1 + , monocyte-derived and antigen-producing macrophages are predicted to interact with FB4 via CD74-MIF (Fig. 4b, Extended Data Fig. 8d, e, Supplementary Table 17). Inhibition of this interaction leads to fibrosis 48 and tissue damage 49 . FB4 also enrich for FN1, COL4A1 and TNC, facilitating cellular proliferation in the fetal heart 50 and predicted to interact with different integrins in atrial and ventricular cardiomyocytes. In skeletal muscle (SKM), predicted cell-cell interactions between PRG4 + FBs (analogous to FB4) and cardiomyocytes involve COL1A2, COL6A2 and α10β1 integrins, whereas SKM FBs and monocytes appear to interact via the ICAM1-AREG and CXCR4-CXCL12 chemokine pairs (Fig. 4b, Supplementary Table 18), indicating tissue-specific homeostatic transcriptional circuits.
Using a logistic regression model, we find that lymphoid cells are more similar across heart, SKM and kidney, whereas there is less concordance for myeloid cells (Extended Data Fig. 8f, Supplementary Tables 16,19), probably due to tissue-specific adaptability of myeloid cells 51 . Notably, populations corresponding to cardiac monocyte-derived macrophages, LYVE1 + MP1, DOCK4 + MP1-2 and antigen-presenting macrophages are absent from the SKM and kidney. Cardiac LYVE1 + MP2-3, pro-inflammatory monocytes, classical monocytes and mast cells are more similar to their SKM counterparts, indicating greater similarity of striated muscle and cardiac myeloid populations versus kidney. The transcriptional signature of cardiac LYVE1 + MP2-3 is specific without overlap in SKM and kidney (Fig. 4c).

Conduction system and neuronal cells
Among 3,961 cells expressing prototypic electrophysiologic transcripts (NRXN1, NRXN3 and KCNMB4), we identify six neuronal cell subclusters (Extended Data Fig. 9a-c). NC1 constitutes 75-80% of neuronal cells and exhibits a basal gene program including LGI4, required for glia development and axon myelination 52 . NC2 and NC4 show strong expression of the central nervous system marker, PRKG1 53 , and co-express typical fibroblast and cardiomyocyte genes, respectively. NC3 has overlapping gene expression signatures with ECs. NC5 expresses LGR5, a Wnt signalling, G-protein-coupled receptor and stem-cell marker that promotes cardiomyocyte differentiation in the outflow tract 54 , an arrhythmogenic area 55 . This cluster also expresses the neurodegenerative disease gene PPP2R2B 56 (verified by smFISH) (Extended Data Fig. 9d, Supplementary  Table 20); LSAMP, which guides the development of specific patterns of neuronal connections 57 ; and the lipoprotein transport enzyme LPL that remyelinates damaged neurons 58 . NC6 mimics Schwann cells, expressing MBP, PRX and MPZ, which encode myelin constituents 59 .

Adipocytes
Cardiac adipocytes uniformly express GPAM, FASN and ADIPOQ and at lower levels, LEP 60 (Extended Data Fig. 9e-h). ADIP1 expresses genes for PPAR pathways, metabolism of lipids and lipoproteins, and lipolysis. ADIP2 expresses ECM genes such as LAMA2, IGFBP7 and FBN1, which encodes both the glycoprotein fibrillin1 and asprosin, a white adipose tissue secreted hormone involved in glucose homeostasis (Supplementary Table 21). Given a stromal-related molecular signature,  Supplementary Table 17. c, Gene expression signature for cardiac-specific LYVE1 + macrophages compared against predicted matched populations in skeletal muscle and kidney.

COVID-19 and GWAS disease relevance
Transcripts encoding the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) receptor ACE2 63 are highest in pericytes, followed by FBs and lowest in cardiomyocytes, where expression is twofold higher in ventricular than atrial cardiomyocytes. Among proteases priming viral entry 63 , TMPRSS2 transcripts are absent in pericytes, FBs and cardiomyocytes, whereas CTSB and CTSL are lowly expressed with higher levels in cardiomyocytes. ACE2 expression in pericytes and fibroblasts is depicted by smFISH (Extended Data Fig. 10a-e).
We define cells enriched for genes from 12 cardiovascular genome-wide association studies (GWAS) and involved in SARS-CoV-2 infection using MAGMA 64 (Extended Data Fig. 10f). Atrial fibrillation GWAS signals are associated with transcriptional profiles in vCM3, owing to higher mean expression of CAV1, CAV2 and PRRX1. PR interval GWAS signals are associated with vCM3 and aCM5, with high expression of SCN5A, CAV1, ARHGAP24, MEIS1, TBX5 and TTN. GWAS signals for QRS duration are associated with specific gene expression in NC2 (PRKCA, CEP85L, SLC35F1, SIPA1L1, KLF12 and FADS2). Coronary artery disease and hypertension GWAS signals are associated with transcripts from many cell lineages, particularly SMCs, FBs, and ECs, reflecting the relevance of vascular cells in both disorders.

Discussion
Our analyses of approximately half a million single cells and nuclei from six distinct cardiac regions from fourteen donors considerably expand an emerging reference adult heart cell atlas. By combining single-cell and single-nuclear RNA-seq data with machine learning and in situ imaging techniques, we provide detailed insights across the repertoire of cardiac cells, including cardiomyocytes (excluded by single-cell RNA-seq) and ECs (underrepresented in cardiac snRNA-seq). We quantify the cellular composition highlighting chamber-specific features and differences between male and female donors. Within each cell compartment, we identify and validate prototypic lineage-specific genes, and genes with previously unknown cardiac expression. Our results begin to unravel the molecular underpinnings of cardiac physiology and the cellular response to stress and disease.
Cardiomyocytes are the most prevalent cardiac cells and comprise higher percentages in ventricles than atria, and in female versus male ventricular tissues. Transcriptional differences between atrial and ventricular cardiomyocyte populations indicate different developmental origins, distinctive haemodynamic forces and specialized functions in cardiac chambers. Cellular diversity of FBs reveals ECM-producing and ECM-organizing activities that with other cells support cardiomyocytes across varying biophysical stimuli. The vascular compartment contains several ECs and pericyte populations and two SMC subtypes with distinct anatomical and arterio-vascular characteristics. Arterial and venous ECs are predicted to interact with mural cells via Notch signalling pathways involved in regulating vascular homeostasis and development. Immune cells interact with FBs and cardiomyocytes. In addition to confirming previous findings 65,66 , we show macrophage complexity and infer paracrine circuits for cardiac homeostasis. Cross-tissue analyses delineate cardiac populations distinct from skeletal muscle and kidney.
We illustrate the relevance of cardiac cell atlas by defining cell lineages enriched in cardiovascular GWAS and molecules involved in SARS-CoV-2 infection. High expression of the viral receptor ACE2 in pericytes and its correlation with AGTR1 is consistent with the role of renin-angiotensin-aldosterone system signalling in cardiac haemodynamics 67 .
We recognize limitations associated with cell capture by different data sources and unintended bias from surgical sampling. However, we expect our results will inform studies of other cardiac regions (valves, papillary muscle and conduction system), propel studies with large cohorts to determine the roles of age, gender and ancestry on normal cardiac physiology and provide crucial insights to enable mechanistic understanding of heart disease.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-020-2797-4. Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Data reporting
No statistical methods were used to predetermine sample size. The experiments were not randomized, and investigators were not blinded to allocation during experiments and outcome assessment.

Tissue acquisition and processing
Tissues were acquired from UK and North American donors (D1-7 and 11, H2-7) after circulatory death (DCD) (D2, D4-D7 and D11) and after brain death (DBD) (D1, D3, H2-H7). For UK DCD donors, after a five-minute stand-off and for DBD, the chest is opened, the aorta is cross-clamped and cardiac samples are acquired. For North American DBD donors, the aorta is cross-clamped, cold cardioplegia (Celsior) is administered under pressure via the aorta to arrest beating, the heart is excised, rinsed in cold saline and samples acquired. All donor samples were full-thickness myocardial biopsies from the left and right atrium, left and right ventricles, interventricular septum and apex, with intentional exclusion of large epicardial fat deposits. Samples used for single nuclei isolation were flash-frozen and stored at −80 °C. Single-cell isolation and CD45 + enrichment was carried out on freshly collected samples. Residual tissue after nuclei and cell isolation procedures was formalin-fixed or frozen in OCT for additional studies.
All tissues were stored and transported on ice at all times until freezing or tissue dissociation to minimise any transcriptional degradation. Previous studies on the post-mortem tissue stability of the GTEx consortium on bulk tissues 68 and in single cells 69 suggest only minor changes in tissues within the first 24 h post mortem when stored in cold conditions.

CD45 + cell enrichment
Cell suspension was prepared as described above and subsequently labelled using anti-human CD45 monoclonal antibody-conjugated microbeads according to the manufacturer's protocol (Miltenyi Biotec). In brief, up to 10 7 cells were incubated for 15 min at 4 °C in 80 μl of PBS, BSA, EDTA buffer (1× PBS pH 7.2, 0.5% BSA, 2 mM EDTA) containing 20 μl CD45 microbeads. Cell suspension was washed in PBS, BSA, EDTA buffer once and collected by centrifugation (330g, 10 min, 4 °C). Resuspended cells were applied to MACS LS columns (Miltenyi Biotec). CD45-depleted cell fraction was discarded after three washes with PBS, BSA and EDTA buffer and the CD45 + cell fraction was collected in PBS, BSA and EDTA buffer by removal of the columns from the magnetic field. CD45 + cells were counted and resuspended in PBS, BSA and EDTA buffer to a concentration of at least 2 × 10 6 per ml before further processing using a Chromium Controller (10X Genomics) according to the manufacturer's protocol.

Chromium 10X library preparation
Single cells and nuclei were manually counted by Trypan blue exclusion or automatically using a Countess II (Life Technologies) using at least two separate counts. Cell or nuclei suspension was adjusted to 400-1,000 cells per microlitre and loaded on the Chromium Controller (10X Genomics) with a targeted cell or nuclei recovery of 4,000-10,000 per reaction. 3′ gene expression libraries were prepared according to the manufacturer's instructions of the v2 or v3 Chromium Single Cell Reagent Kits (10X Genomics). Quality control of cDNA and final libraries was done using Bioanalyzer High Sensitivity DNA Analysis (Agilent) or 4200 TapeStation System (Agilent). Libraries were sequenced using HiSeq 4000 (Illumina) at Wellcome Sanger Institute, and NextSeq 500 (Illumina) at Harvard Medical School with a minimum depth of 20,000-30,000 read pairs per cell or nucleus (Supplementary Table 22).

Spatial validation using smFISH with RNAscope probes
During preparation of formalin-fixed paraffin-embedded (FFPE) samples fresh tissue was fixed in neutral-buffered 10% formalin for 18-36 h and subsequently embedded in paraffin blocks. Fixed-frozen tissue samples were fixed in 4% paraformaldehyde (ThermoFisher). Sections were cut at 5-μm thickness using a microtome and placed onto Super-Frost Plus slides (VWR). FFPE tissue slides were automatically stained using BOND RX (Leica) and the RNAscope Multiplex Fluorescent Reagent Kit v2 Assay (ACDBio) according to the manufacturer's protocol. Fixed-frozen tissue slides were processed according to the protocol of RNAscope Multiplex Fluorescent Assay v1 (ACDBio). RNAscope ready-or custom-made target probes were run in parallel to multiplex positive and negative controls (Extended Data Fig. 12b, Supplementary Table 23). All nuclei were DAPI-stained. All FFPE tissue slides were imaged using an Opera Phenix High-Content confocal Screening System (Perkin Elmer) with a 1-μm z-step size and 20× water-immersion objective (NA 0. 16

Haematoxylin and eosin staining
Tissue samples were fresh-frozen in isopentane (ThermoFisher) at −80 °C and embedded in OCT (VWR). Sections were cut at a thickness of 10 μm using a microtome, placed onto SuperFrostPlus slides (VWR) and further processed according to a standard haematoxylin and eosin staining protocol (Extended Data Fig. 12a).

Acquisition of skeletal muscle tissue
Intercostal muscle samples were obtained from between the second and third rib on the left side. This is typically from the deepest layer of muscle (furthest away from the skin). Samples were collected directly into the cold preservation solution.

Single-cell isolation for skeletal muscle
Muscle tissue was washed in 1× PBS, cleaned of any visible fat depositions and finely minced. Then, 2 g of the minced tissue was transferred to digestion buffer 1 (750 U ml −1 collagenase type 2 in 1× PBS) and incubated at 37 °C in a water bath for 90 min. The partially digested tissue was collected by centrifugation (650g, 5 min, 4 °C) and the pellet was resuspended in digestion buffer 2 (100 U ml −1 collagenase type 2, 2 U ml −1 dispase in PBS). After 30 min incubation at 37 °C in a water bath, the digestion was stopped by the addition of 2% FBS. Cells were filtered through a 100-μm and a 40-μm nylon strainer (BD Falcon), collected by centrifugation (650g, 4 °C, 3 min) and washed with 1× PBS, 2% FBS. Subsequently, a 20% Percoll gradient (15,000g, 4 °C, 20 min) was used for cell purification. The layer containing cells was collected, washed in PBS containing 2% FBS, and viable cells were counted by Trypan Blue exclusion using a haemocytometer. Nuclei were profiled using a Chromium Controller (10X Genomics) according to the manufacturer's protocol.
The methods key resources table is in Supplementary Table 24.

Transcriptome mapping
After sequencing, samples were demultiplexed and stored as CRAM files. Each sample was mapped to the human reference genome (GRCh38 v.3.0.0) provided by 10X Genomics, and using the CellRanger suite (v.3.0.1) with default parameters. Single-cell samples were mapped against the reference as it was provided. Single-nuclei samples, the reference for pre-mRNA, was created using the 10X Genomics instructions (https://support.10xgenomics.com/single-cell-gene-expression/ software/pipelines/latest/advanced/references).

Count data processing
After mapping, samples from each data source (single nuclei, single cell and CD45 + cell) were grouped into individual AnnData objects by concatenating the raw_feature_bc_matrix_h5.h5 and adding the appropriate metadata information. For each data source object, the mean of unique molecular identifiers (UMIs) (n_counts) was calculated and used as a threshold for empty droplets.

Doublet detection
After removal of empty droplets, we applied scrublet 73 to assign a doublet score (scrublet_score) to each cell. These cells were clustered and visualized using the UMAP method 74 . In addition, each cell was processed for doublet detection using a percolation method to allow for improved detection of doublets 75 .

Batch alignment using deep variational autoencoder
We built a global manifold by aligning all the data sources and donors in our data. This was done in a three-step procedure: (1) Each source was analysed and annotated separately, aligning only for donors using a pericyte-space linear regression step before batch alignment with bbknn 78 . Differentially expressed genes (DEGs) were calculated using a Wilcoxon rank sum test with Bonferroni-Hochberg adjustment as implemented in the Scanpy framework. (2) To annotate each cluster, we used an integrative approach by searching the top significant DEGs (P < 1 × 10 −5 ) with a logFC >1 against the ToppFun 79 and EnrichR 80 databases. Significant hits on pathways, transcriptional regulation and biological processes were prioritized to annotate a given cluster. Each cellular compartment was labelled under the adata.obs['cell_type'] slot after grouping source-specific cell states. (3) All sources were combined into a single AnnData object under the label adata.obs['cell_sources']. Batches were aligned using the batch_correction function from the scGen variational autoencoder 81 . First we align for adata.obs['cell_ sources'], using adata.obs['cell_type'] as an anchor. Next, we aligned for adata.obs['donor'], using adata.obs['cell_type'] as an anchor. Each batch alignment round was run for 50 epochs.
Manifolds for the adipocytes, vascular and immune cardiac populations, as well as the skeletal muscle analysis, were created using this method and the clustering accuracy was evaluated with SCCAF 82 (Extended Data Fig. 12d).

DEGs
To help with the annotation of the subpopulations of each cell compartment, we calculated the DEGs using the Wilcoxon rank sum test as implemented in the scanpy workflow and recommended by recent benchmarking studies 83 . A gene was considered to be differentially expressed if it has a log 2 -transformed fold change > 1 and a P < 1 × 10 −5 , unless stated otherwise in the analysis section.

Cell-cell interactions
Expression matrices of the populations under study were exported from the AnnData, together with a metadata table that contained the cell-barcodes as indices. We then ran CellPhoneDB as follows: cellphonedb method statistical_analysis meta.tsv counts.tsvcounts-data = gene_name-threads = 60. CellPhoneDB raw predictions were filtered by removing those interactions with a P > 1.0 × 10 −5 . Significant pairs were then submitted for gene set enrichment analysis into ReactomeDB, enrichR and ToppFun for functional classification. The vascular cells were randomly sub-sampled to 39,000 cells before the analysis, and the cardiac repair group (atrial and ventricular cardiomyocytes, FBs, and immune cells) was randomly sub-sampled to 69,295 cells before the analysis.

Estimation of RNA velocity
To calculate the RNA velocity of the single cells and CD45 + enriched single cells, we used the CellRanger output BAM file and the GENCODE v33 GTF (ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/ release_33/gencode.v33.chr_patch_hapl_scaff.annotation.gtf.gz) file together with the velocyto 84 CLI v.0.17.17 to generate a loom file containing the quantification of spliced and unspliced RNA. Next, we built a manifold, cluster the cells and visualize the RNA velocities using scVelo 85 .
Donor effects were aligned as described in step (1) above. For FB and neuronal cells, sources were aligned as described in step (3) above. Leiden clustering and UMAP visualization were performed for identifying subpopulations and visualization 86 . Differentially expressed genes were calculated using the Wilcoxon rank sum test. Genes were ranked by score.

Cross-tissue comparison of cardiac immune populations with skeletal muscle, kidney and blood immune populations
We collected single-cell transcriptome data for adult kidneys from ref. 99 (https://www.kidneycellatlas.org/), and subset all immune cells reported in their study. For the SKM we selected the annotated immune cells from the merged manifold. For the human blood, we used the publicly available 10,000 single PBMC cells dataset provided by 10X Genomics (https://support.10xgenomics.com/single-cell-gene-expression/ datasets/3.0.0/pbmc_10k_v3). As previously described 87 , we trained a logistic regression model on the cardiac immune cells using 80% of the expression data and tested its accuracy on the remaining 20% to produce a model with an accuracy of 0.6862 (Extended Data Fig. 8f, Supplementary Table 16). We then applied this model to predict analogue cardiac immune populations in the adult kidney, SKM and PBMCs. Predictions with a probability less than 0.8 were excluded from downstream comparative analyses.

Gene Ontology enrichment analysis
For the ventricular cardiomyocyte population, we used the R package gProfileR (https://cran.r-project.org/web/packages/gProfileR/index. html) with the score-ranked gene list of vCM4 as input and the set of genes expressed in ventricular cardiomyocytes as background (those genes having a UMI count >1). To perform the Gene Ontology analysis on the vascular cells, the top 500 significant DEGs (P < 1 × 10 −5 ) with a log-transformed fold change > 1 were searched against the Gene Ontology biological process database using ToppFun 79 (Supplementary  Table 25). The top five significantly enriched terms (q < 0.05) for each subpopulation were selected and plotted on a heat map. To perform the pathway analysis on the adipocytes, the top 500 significant DEGs (P < 1 × 10 −5 ) with a log-transformed fold change > 0.5 were searched against ToppFun 79 pathway databases (Supplementary Table 26). The top five significantly enriched pathways (q < 0.05) for each subpopulation were selected and plotted on a heat map.

Gene set score
We use the score_genes function as implemented in scanpy to calculate the enrichment of genes involved in the Oncostatin M pathway. A list of genes was collected upon literature research 88,89 . For gene set enrichment, only highly expressed genes were considered to reduce noise (more than 500 UMIs across all cells). The same analysis was performed for comparison of cardiac immune cells in our study with the observations of previous studies on cardiac-resident macrophages 51 , mouse tissue-remodelling macrophage 45 and yolk sac lineage origin 90 .

Statistics and reproducibility
All analyses were performed using R Software, v.3.6.1. Student's t-tests were used to compare cell type distributions at each site. P < 0.05 was considered statistically significant. Linear regression models (correlations) were obtained using the R linear model function (lm), which estimates statistical likelihood (P value) of a linear relationship. Bonferroni correction was applied for multiple testing.
The depicted RNAscope micrographs in the figures are representative. The micrographs in Figs

GWAS enrichment analysis
We downloaded GWAS summary statistics from broad cvdi, EBI GWAS catalogue and GWAS atlas. We selected traits with well-powered GWAS (n > 5,000 and number of significant loci >10). GWAS datasets are summarized in Supplementary Table 27. Gene expression data of protein-coding genes were mapped onto Entrez gene ids and these gene annotations were used on the human genome assembly hg19/37. We only used gene expression data from nuclei. We implemented the analysis previously described 64 in python and in R. The log-transformed counts (plus one pseudocount) were used to compute average cell type-specific expression profiles. We performed individual magma analyses for each cell type, always conditioning on default gene level covariates (for example, gene length) and average gene expression across all cells. Subsequently, we applied the Benjamini-Hochberg method and selected cell type trait associations with FDR < 10%. These pairs were then subjected to conditional analysis as previously described 64 to define 'independent', 'jointly explained' and 'partially jointly explained' pairs of associations (Supplementary Table 28).

Distributions of dispersed cells and isolated nuclei
The different procedures for obtaining isolated nuclei and dispersed cells resulted in significantly different distributions of cell types (Supplementary Table 29, Extended Data Fig. 2). Notably, 30.1% and 49.2% of isolated nuclei were derived from atrial and ventricular cardiomyocytes in the atrial and ventricular regions, whereas these cells were mostly excluded from preparations of isolated and CD45-selected cells (Supplementary Table 2).
Excluding cardiomyocytes, the distribution of cell types identified from isolated nuclei and dispersed cells remained distinct (Supplementary Table 30). Although 59.0% of dispersed cells were ECs, only 15.7% of nuclei were derived from ECs. By contrast, 64.2% of nuclei were from FBs (31.2%) and pericytes (33.0%), whereas only 17.1% of dispersed cells were FBs (2.3%) and pericytes (14.8%). These differences may reflect sensitivity of EC nuclei to isolation procedures or resistance of pericytes and FBs to cellular enzymatic digestion.
Despite differences in cell distributions between isolated nuclei and dispersed cells, the gene expression profiles of cell lineages were reasonably correlated (r > 0.4 for each cell type). To address the concordance of the genes captured by cells and nuclei, we compared the expression of the major cell type markers from Fig. 1c across the three sources (Extended Data Fig. 1c). As nuclei lack cytoplasmic RNA, the expression of certain genes, especially immune genes NKG7 and C1QA, was lower in nuclei than in cells. Nevertheless, the general trend with respect to marker genes was consistent across the three sources, and the same genes distinguished individual cell types independent of the source.

Further analysis of vascular cells
The PC3_str contained similar contribution of cells and nuclei, and had a scrublet score below the stringent threshold used; nevertheless, the average number of genes and counts in this cluster was higher than average. Thus, despite our stringent quality filtering, we cannot exclude the possibility that there might be doublets in this cluster. EC10_CMC-like and PC4_CMC-like co-express EC or pericyte genes with cardiomyocyte markers and further studies are required to understand whether they represent previously unknown cell states or doublets.
The observations of the arterial and venous SMC are supported by previous studies, which predict that arterial SMCs are more contractile, and venous SMCs are less differentiated 91 .
EC3_cap enrich for transcripts encoding components of AP1 (JUN and FOS), which mediates multiple EC fate decisions including response to VEGF, inflammatory and stress signals, and ATF3, an adaptive-response gene induced by diverse signals [92][93][94] .

Skeletal muscle characterization
We collected intercostal skeletal muscle samples from five healthy individuals, including one donor with matched cardiac tissue, and profiled the transcriptome of 35,665 single cells and 39,597 single nuclei. Analogous to the heart, the combination of cells and nuclei allowed us to capture and resolve major cell lineages, including cardiomyocyte, fibroblasts, endothelial cells, smooth muscle cells, pericytes, myeloid and lymphoid immune cells and satellite cells (Extended Data Fig. 11a,  b, Supplementary Table 31).
Further analysis of the vascular cells of the skeletal muscle identified ten distinct populations. The endothelial cells showed five clusters separated based on their respective vascular beds with signatures similar to the ones we observe in the heart. The EC_cap expresses VWF and RGCC. Venous EC_ven express ACKR1 and PLVAP, whereas arterial EC_art show SEMA3G and HEY1, in line with our heart data (Extended Data Fig. 11c, d, Supplementary Table 32).
The overall distributions of vascular and stromal cell populations in skeletal and cardiac muscle were similar, including the arterial and venous features of ECs; however; skeletal muscle contained a single SMC cluster, potentially related to the smaller size of the dataset. In skeletal muscle, the predicted cell-cell interactions of the EC_art and SMCs included NOTCH1/4-JAG1 as well as JAG1/JAG2/DLL4-NOTCH3, but not JAG1/JAG2/DLL4-NOTCH2, inferred in the heart (Extended Data Fig. 11e, f, Supplementary Table 33).

Cardiac immune cells
Using the logistic regression model, we did not identify any counterpart of the cardiac IL17RA + monocytes in SKM or kidney, possibly owing to the small size of this population.
Naive T cells (CD4+T_naive) identified expressed CCR7 and SELL, indicative of their naive and tissue-resident nature 95 . Memory T cells (CD8+T_tem) expressed BACH2, STAT4 and IL7R, associated with long-term immune memory 96,97 . We further characterized the lymphoid cells using scNym 98 , and trained it using published data 99,100 . The resulting model was applied to our cardiac immune cells and those cells, with a predicted score higher than 0.8 were presumed to be likely candidates for re-annotation. Using this approach, we identified candidates for plasma B cells (109), dendritic cells (645), innate lymphoid cells (89), MAIT T cells (219), T helper cells (80) T regulatory cells (11), T central memory cells (103), γδ T cells (30) and plasmocytoid dendritic cells (27). These annotations can be found in the cardiac immune object annotations under the label 'scNym' at www.heartcellatlas.org.

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Data availability
Data objects with the raw counts matrices and annotation are available via the www.heartcellatlas.org webportal. Raw data are available through the Human Cell Atlas (HCA) Data Coordination Platform (DCP) with accession number: ERP123138 (https://www.ebi.ac.uk/ena/ browser/view/ERP123138). The 10X Genomics Visium data for the heart left ventricle tissue can be accessed at: https://support.10xgenomics. com/spatial-gene-expression/datasets/1.1.0/V1_Human_Heart. GWAS data used in this study can be found in Supplementary Table 27. All of our data can be explored at www.heartcellatlas.org.

Code availability
All code used for this study can be accessed as Jupyter notebooks in the project GitHub repository: https://github.com/cartal/HCA_Heart.

Article
Extended Data Fig. 1 | Expression of the canonical markers. a, UMAP embedding of selected canonical markers shown in Fig. 1c. b, Scaled expression (log 2 -transformed fold change, log 2 FC) of selected canonical markers shown in Fig. 1c. c, Expression (log 2 FC) of marker genes from Fig. 1c in each source highlighting that the same marker genes are used for identification of the same cell types in both cells and nuclei.    Supplementary Table 25. d, SCCAF scores for each batch aligned manifold. For each population, we plotted the true positive (TPR) versus false positive (FPR) learning ratios from the subpopulation in each manifold. Next, we plotted how accurately the manifold represents each learned subpopulation based on the test training set and the CV cross-validation set. The closer the CV value to the test value, the better the manifold is at representing the subpopulations.

Reporting Summary
Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency in reporting. For further information on Nature Research policies, see our Editorial Policies and the Editorial Policy Checklist.

Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.

n/a Confirmed
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly The statistical test(s) used AND whether they are one-or two-sided Only common tests should be described solely by name; describe more complex techniques in the Methods section.
A description of all covariates tested A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals) For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors and reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.

Data
Policy information about availability of data All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: -Accession codes, unique identifiers, or web links for publicly available datasets -A list of figures that have associated raw data -A description of any restrictions on data availability The data is made available through the Human Cell Atlas (HCA) Data Coordination Platform (DCP) and can be accessed here: https://www.ebi.ac.uk/ena/browser/ view/ERP123138. The data can also be accessed and explored through the HCA Heart Project website at www.heartcellatlas.org.