## Main

Single-cell transcriptomic and epigenomic methods have been effective in elucidating the cellular makeup of complex brain tissues from patterns of gene expression and underlying regulatory mechanisms2,3,4,5,6. In the mouse and human neocortex, diverse neuronal and non-neuronal cell types can be defined2,3,5,7 by their distinct transcriptional profiles and regions of accessible chromatin or of DNA methylation (DNAm)4,8, and can be aligned between species3,9,10,11 on the basis of these profiles. Studies such as these have shown the feasibility of quantitatively studying the evolution of cell types, but have limitations: different cortical regions have been profiled in humans and mice; different sets of transcripts have been captured with single-cell and single-nucleus assays; and transcriptomic and epigenomic studies have mostly been carried out independently.

The primary motor cortex (M1, also known as MOp in mice) is an ideal region with which to address questions about cellular evolution in rodents and primates. M1 is essential for fine-motor control and is functionally conserved across mammals1. The layer 5 (L5) region of carnivore and primate M1 contains specialized ‘giganto-cellular’ corticospinal neurons (Betz cells in primates12,13,14,15,16) with distinctive action-potential properties that support a high conduction velocity17,18,19. Some Betz cells synapse directly onto spinal motor neurons, unlike rodent corticospinal neurons, which synapse indirectly via spinal interneurons20. These observations suggest that Betz cells possess species-adapted intrinsic mechanisms to support rapid communication that should be reflected in their molecular signatures. To explore the evolutionary conservation and divergence of M1 cell types and their underlying molecular regulatory mechanisms, we analysed single-nucleus transcriptomic and epigenomic data from mouse, marmoset, macaque and human M1.

## Multi-omic taxonomies of cell types

To characterize the molecular diversity of M1 neurons and non-neuronal cells, we applied single-nucleus transcriptomic assays (plate-based SMART-seq v4 (SSv4) and droplet-based Chromium v3 (Cv3) RNA sequencing) and epigenomic assays (single-nucleus methylcytosine sequencing 2 (snmC-seq2) and single-nucleus chromatin accessibility and messenger RNA expression sequencing (SNARE–seq2)) to isolated M1 samples from human, marmoset and mouse brains (Extended Data Fig. 1a–d); we also applied Cv3 to M1 L5 from macaque brains. Single nuclei were dissociated from all layers combined or from individual layers (in the case of human SSv4 assays), and sorted using the neuronal marker NeuN to enrich cellular input to roughly 90% neurons and 10% non-neuronal cells (Extended Data Fig. 1e). Datasets from mice are reported in a companion paper5. The median detection of neuronal genes in humans was higher when we used SSv4 (7,296 genes) as compared with Cv3 (5,657 genes), partially because of the 20-fold greater read depth, and detection was lower in marmosets (4,211) and mice (5,046) when using Cv3 (Extended Data Fig. 1f–m).

For each species, we defined a diverse set of neuronal and non-neuronal clusters of cell types on the basis of unsupervised clustering of snRNA-seq datasets (Extended Data Fig. 1n–r and Supplementary Tables 12). We organized cell types into hierarchical taxonomies on the basis of transcriptomic similarities (Fig. 1a–c, Extended Data Fig. 2 and Supplementary Table 3). As previously described for temporal cortex (middle temporal gyrus, MTG)3, taxonomies were broadly conserved across species, and neuronal subclasses reflected developmental origins and targets of long-range neuronal projections. Cell-type labels include the dissected layer (if available), major class, subclass marker gene and most-specific marker gene (Supplementary Tables 46). GABAergic (γ-aminobutyric acid-producing) types were uniformly rare (fewer than 4.5% of neurons), whereas glutamatergic and non-neuronal types were more variable in number (0.01–18.4% of neurons and 0.15–56.2% of non-neuronal cells, respectively). Finally, independent clustering of epigenomic data resulted in diverse clusters that were associated one-to-one with RNA clusters or at a slightly higher level in the hierarchy on the basis of shared marker expression.

Single-nucleus sampling provides a relatively unbiased survey of cellular diversity3,21 and enables an estimation of cell-type frequencies. Consistent with histological measurements (reviewed in ref. 22), we identified twice as many GABAergic neurons in human M1 (33%) as in mouse M1 (16%), and an intermediate proportion (23%) in marmosets (Fig. 1d). L2 and L3 intratelencephalic neurons were significantly more common in humans than in marmosets and mice (Fig. 1d)23, while L6 corticothalamic and L5 extratelencephalic neurons, including corticospinal neurons and Betz cells in primate M1, were significantly rarer in primates than in mice.

## Consensus M1 taxonomy across species

We integrated Cv3 datasets across species on the basis of shared patterns of coexpression for GABAergic neurons (Fig. 2 and Extended Data Fig. 3), glutamatergic neurons (Extended Data Fig. 4) and non-neuronal cells (Extended Data Fig. 5). GABAergic nuclei were well mixed across species and segregated into six subclasses (Fig. 2a); 17 to 54 subclass markers were conserved across species (Fig. 2b, c, Extended Data Fig. 3a and Supplementary Tables 7, 8), while most markers had enriched expression in only one species. To establish a consensus taxonomy of cross-species clusters, we over-split the integrated space (Extended Data Fig. 3b) and merged clusters until they included nuclei from all species. We defined 24 GABAergic cell types on the basis of consistent overlap of clusters across species (Fig. 2d–f); these cell types had conserved marker genes (Extended Data Fig. 3c) and high classification accuracy (Extended Data Fig. 3d, e and Supplementary Table 9). Distinct consensus types such as ChC and Sst-Chodl were more robust (mean area under the receiving operating characteristic (AUROC) curve = 0.99 within species, 0.88 across species) than were closely related types such as Sncg and Sst subtypes (mean AUROC = 0.84 within species, 0.50 across species). Most types were enriched in the same layers in humans and mice (Fig. 2g), with notable differences. ChCs were enriched in L2/3 in mice and in all layers in humans, as was seen in MTG3. Sst-Chodl was restricted to L6 in mice and was also found in L1 and L2 in humans, consistent with the reported sparse expression of SST in L1 in human but not mouse cortex24.

More consensus clusters could be resolved by pairwise alignment between humans and marmosets than between either of these primates and mice, particularly for Vip subtypes (Fig. 2h and Extended Data Fig. 3f, g). Genes related to neuronal connectivity and signalling were most informative of cell-type identity (Fig. 2i), and showed similar classification performance when trained and tested in the same species (r values of greater than 0.95) but reduced performance when trained and tested in different species (62% as high in humans and marmosets, and 40% in primates and mice). Therefore, similar genes show selectivity for subsets of cell types across species, yet individual genes often change the specific cell types in which they are expressed.

Glutamatergic neuron subclasses also aligned well across species, with 6–66 conserved markers and many more species-enriched markers (Extended Data Fig. 4a–c and Supplementary Tables 10, 11). We defined a consensus taxonomy of 13 types as above, which was similarly robust to the GABAergic taxonomy (GABAergic AUROC = 0.86; glutamatergic, 0.85; Extended Data Fig. 4i, j and Supplementary Table 9) but had fewer conserved markers (Extended Data Fig. 4h). Human and marmoset consensus types shared more markers (25%) with each other than with mice (16%) for 13 of 14 neuronal subclasses (Fig. 2b and Extended Data Fig. 4b). Moreover, humans and marmosets could be aligned at somewhat higher resolution (Extended Data Fig. 4k), particularly for L5/6 near-projecting and L5 intratelencephalic subclasses.

Non-neuronal consensus types were clearly defined by conserved marker genes, except for rare or immature types that were undersampled in humans and marmosets (Extended Data Fig. 5a–d). The human cortex contains several morphologically distinct astrocyte types25. We reported two transcriptomic clusters in human MTG that corresponded to protoplasmic and interlaminar (ILA) astrocytes3, and we validated these types in M1 by in situ hybridization (ISH; Extended Data Fig. 5f, g). We identified a third type, Astro L1-6 FGFR3 AQP1, that expresses APQ4 and TNC and corresponds to fibrous astrocytes in white matter. Non-neuronal gene expression diverged with evolutionary distance: ILAs (Astro_1) had 560 differentially expressed genes (DEGs) (Wilcox test; false discovery rate (FDR) less than 0.01; log-transformed fold change greater than 2) between humans and mice, and only 221 DEGs between humans and marmosets (Extended Data Fig. 5e).

Primates had a unique oligodendrocyte population (Oligo SLC1A3 LOC103793418 in marmosets and Oligo L2-6 OPALIN MAP6D1 in humans) that was not a distinct cluster in mice (Extended Data Fig. 5c). Surprisingly, this oligodendrocyte population clustered with glutamatergic neurons (Extended Data Fig. 1a, b) and was associated with neuronal transcripts such as NPTX1, OLFM3 and GRIA1 (Extended Data Fig. 5h). This was not an artefact, as fluorescent in situ hybridization (FISH) for markers of this type (SOX10 and ST18) co-localized with neuronal markers in the nuclei of cells that were sparsely distributed across many layers of human and marmoset M1 (Extended Data Fig. 5i). This type may represent an oligodendrocyte population that has phagocytosed parts of neurons and accompanying transcripts, similar to the reported phagocytic function of some oligodendrocyte precursor cells26.

To assess the usage of differential isoforms between humans and mice, we used SSv4 data with full transcript coverage and estimated isoform abundance in cell subclasses. Remarkably, 25% of moderately expressed isoforms showed a more than ninefold change in usage between species, and isoform switching was more common in non-neuronal than in neuronal subclasses (Fig. 2j, Extended Data Fig. 3h and Supplementary Table 12). For example, β2-chimaerin (CHN2) was highly expressed in L5/6 near-projecting cells, and the short isoform was dominant in mice, while longer isoforms were also expressed in humans (Extended Data Fig. 3i).

## Cell-type-specific epigenetic regulation

Epigenomic profiling of M1 cell types can reveal regulatory mechanisms of transcriptomic identity. To profile the accessible chromatin of RNA-defined cell populations from humans and marmosets, we used SNARE–seq2 (refs. 6,27,28; Extended Data Fig. 6a, b and Supplementary Table 13). We defined ‘RNA-level’ clusters by mapping single nuclei to human and marmoset taxonomies (Fig. 1a, b) on the basis of expression similarity; predicted cell-type identities were consistent with independent clustering (Extended Data Fig. 6c–f). Some RNA-level clusters could not be predicted robustly from profiles of accessible chromatin and were iteratively merged (Fig. 3a and Extended Data Fig. 6g–k). Clusters at the level of accessible chromatin had similar coverage across donors, and inferred gene activity was highly correlated with RNA expression (Extended Data Fig. 7a–f). To identify cell-type-specific candidate cis-regulatory elements, we determined differentially accessible regions (DARs) in clusters identified from accessible chromatin (Fig. 3b) and RNA information (Extended Data Fig. 7g, h and Supplementary Table 14). These results highlight the ability of SNARE–seq2 to characterize accessible chromatin at higher cell-type resolution than available from accessible chromatin alone. Distal regulatory elements were linked to marker genes by predicting marker expression on the basis of features of DARs located within 500 kilobases of transcriptional start sites (Fig. 3b, Extended Data Fig. 7i and Supplementary Table 14).

To further characterize the epigenomic landscape of M1 cell types, we profiled DNAm from humans, marmosets and mice29 using snmC-seq2 (ref. 30) (Extended Data Fig. 8, Supplementary Table 15). On the basis of DNAm profiles in CpG (CG methylation, or mCG) and non-CpG (CH methylation, or mCH) sites, we grouped single nuclei into 31 DNAm clusters in humans, 36 in marmosets and 42 in mice (Fig. 3c and Extended Data Fig. 8a, b) that correspond to transcriptomic cell types (Extended Data Fig. 8e–g). Notably, we identified more Vip neuron types in human M1 by using DNAm rather than accessible chromatin, despite profiling only 5% as many nuclei with snmC-seq2. DNAm clusters could be robustly discriminated and had distinct marker genes based on DNAm signatures for neurons (mCH) or non-neuronal cells (mCG) (Extended Data Fig. 8d and Supplementary Table 16). Differentially methylated regions (DMRs) were determined for each cell type versus all other types, and overlapped only partially with DARs (Extended Data Fig. 8c, i, j)5. The intersection of these genomic regions may guide the identification of regulatory elements of marker genes such as KIT, which is expressed in the consensus type Lamp5_2 (Fig. 3d) and corresponds to ‘rosehip’ GABAergic neurons in humans24.

To gain insight into the evolutionary conservation of regulatory processes that define M1 cell types, we focused on neuronal subclasses (Fig. 3e). Subclass DARs (Fig. 3f) and DMRs (Fig. 3g and Extended Data Fig. 8h) had conserved proportions, although fewer DARs and DMRs were detected for rare subclasses owing to reduced statistical power5. DMRs and DARs showed low and variable overlap (median 11%; range 0–32%) across subclasses (Extended Data Fig. 8i, j). Only 5% of human and marmoset subclass DARs were shared between species, compared with 25% of RNA marker genes. To identify transcription factors that may mediate cell subclass identity, we tested for differential activities of transcription-factor-binding sites (TFBSs) in accessible chromatin (Supplementary Table 17) and for significant TFBS enrichments in DMRs (Extended Data Fig. 9 and Supplementary Tables 18, 19). Although many DARs and DMRs were species specific, TFBS enrichments and transcription-factor marker expression were remarkably conserved and distinct between subclasses (Fig. 3h–j and Extended Data Fig. 9). Therefore, evolutionary divergence of expression may be driven partly by genomic relocation of TFBS motifs that are bound by a conserved transcription-factor regulatory network31.

## L4-like neurons in human M1

M1 lacks L4 as defined by a thin band of densely packed ‘granular’ neurons that is present in other cortical areas, such as MTG (Fig. 4a). However, prior studies have identified L4-like neurons in M1 on the basis of synaptic properties in mice32 and cell morphology and lack of SMI-32 labelling33 and expression of RORB34 (an L4 marker) in primates. To address the potential existence of L4-like neurons in human M1 from a transcriptomic perspective, we integrated snRNA-seq data from agranular M1 and granular MTG, where we previously described multiple L4 glutamatergic neuron types3. This alignment revealed a broadly conserved cellular architecture between M1 and MTG (Fig. 4b, c and Extended Data Fig. 10), including M1 neuron types Exc L3 RORB OTOGL and Exc L3-5 RORB LINC01202 that map closely to MTG neurons in deep L3 and L4 (Fig. 4c).

We found transcriptomically similar cell types in similar layers in M1 and MTG across the full cortical depth (Fig. 4d). OTOGL and LINC01202 matched MTG types COL22A1 and ESR1, respectively, whereas there were no matches for MTG L4 types FILIP1L and TWIST2 (Fig. 4e, f). FISH analysis validated that the M1 LINC01202 type was sparser and more widely distributed across L3 and L5 than the MTG ESR1 type, which was restricted to L4 (Fig. 4g, h). By contrast, the M1 OTOGL and MTG COL22A1 types were located in deep L3 and superficial L5 or L4, respectively. Thus, M1 contains cells with L4-like properties, but with less diversity and much sparser representation.

## Core molecular identity of chandelier cells

Canonical features of cell types are likely to be the consequence of conserved transcriptomic and epigenomic features. Focused analysis of Pvalb-expressing GABAergic neurons illustrates the power of these data to predict such gene–function relationships. Cortical Pvalb-expressing neurons—comprising basket cells and ChCs—share fast-spiking electrical properties but have distinctive morphologies (Fig. 5a), including ChCs that target axon initial segments (AISs). To reveal conserved transcriptomic hallmarks of ChCs, we identified 357 DEGs in ChCs versus basket cells in at least one species. Humans and marmosets shared a significantly (P = 0.009; chi-squared test) higher percentage of DEGs (23%) than either species did with mice (average 15%) (Fig. 5b and Supplementary Table 20). Remarkably, only 25 DEGs were conserved across all three species, including UNC5B (which encodes a netrin receptor that may contribute to AIS targeting) and three transcription-factor genes (RORA, TRPS1 and NFIB) (which were among the top 1% of the most highly expressed transcription-factor genes in ChCs) (Fig. 5c).

To determine whether ChCs had enriched epigenomic signatures for RORA and NFIB (TRPS1 lacked motif data), we compared DMRs between ChCs and basket cells. In all species, RORA and NFIB showed gene-body hypomethylation (mCH) in ChCs but not in basket cells (Fig. 5d), consistent with differential expression. To discern whether these transcription factors may preferentially bind to DNA in ChCs, we tested for the enrichment of transcription-factor motifs in hypomethylated (mCG) DMRs and for transcription-factor activity in sites of accessible chromatin genome-wide. We found that the RORA motif was significantly enriched in DMRs in primates (Fig. 5d) and showed high activity in accessible-chromatin sites of ChCs in all species (Fig. 5e and Supplementary Table 14). Moreover, 60 of 357 DEGs contained an ROR-binding motif in DMRs and in regions of accessible chromatin in at least one species, further implicating RORA in contributing to gene regulatory networks that determine the unique attributes of ChCs.

## Specialization of L5 extratelencephalic neurons

Using snRNA–seq, we found that L5 extratelencephalic and intratelencephalic subclasses of neurons could be aligned across humans, macaques, marmosets and mice in M1 (Extended Data Fig. 11a–d), as previously reported for humans and mice in temporal3 and fronto-insular cortex10. L5 extratelencephalic neurons had more than 250 DEGs distinguishing them from L5 intratelencephalic neurons in each species, and fewer DEGs were shared with greater evolutionary distance (Fig. 6a, b and Supplementary Table 21). Interestingly, many primate-specific extratelencephalic-enriched genes (Fig. 6c) showed gradually increasing extratelencephalic specificity in species that are more closely related to humans. To explore this idea of gradual evolutionary change further, we identified 131 genes with increasing L5 extratelencephalic versus intratelencephalic specificity as a function of evolutionary distance from humans (Fig. 6d, Supplementary Table 22). These genes include canonical axon-guidance genes, which may contribute to maintaining connections between spinal motor neurons that are associated with high dexterity in primates20. To investigate whether transcriptomically defined L5 extratelencephalic types include anatomically defined Betz cells, we combined FISH for markers of L5 extratelencephalic subtypes with immunolabelling against SMI-32, a protein enriched in Betz cells and other long-range-projecting neurons in macaques35 (Fig. 6e and Extended Data Fig. 11f, g). Cells consistent with the size and shape of Betz cells were identified in two L5 extratelencephalic clusters (Exc L3-5 FEZF2 ASGR2 and Exc L5 FEZF2 CSN1S1), but they also included neurons with pyramidal morphologies.

Conserved and primate-enriched DEGs included ion-channel subunits (Fig. 6b and Extended Data Fig. 11e). Prior studies have established that membrane properties that depend on HCN channels (low input resistance, RN, and a peak resonance, fR, of around 3–9 Hz) distinguish extratelencephalic from intratelencephalic neurons in mice36. We found that extratelencephalic neurons expressed high levels of genes encoding proteins related to the HCN channel in all species (HCN1 and PEX5L; Fig. 6b), suggesting conserved HCN-related physiological properties. To facilitate cross-species comparisons of primate extratelencephalic/Betz and mouse extratelencephalic neurons, we made patch-clamp recordings from L5 neurons in acute and cultured slice preparations of mouse (using extratelencephalic-specific Thy1–YFP and intratelencephalic-specific Etv1–EGFP lines) and macaque M1 and an area of human premotor cortex containing Betz cells (Fig. 6f, g and Extended Data Fig. 12a). For a subset of recordings, we applied patch–seq analysis to identify transcriptomic cell types (Extended Data Fig. 12b). For mouse M1, 91.4% of neurons in the Thy1–YFP line had extratelencephalic-like physiology, and 99.2% of neurons in the Etv1–EGFP line had non-extratelencephalic-like physiology (Fig. 6h,i). For primate M1, all transcriptomically defined Betz cells (humans, n = 4; macaques, n = 3) had extratelencephalic-like physiology, whereas all transcriptomically defined non-extratelencephalic neurons (humans, n = 2; macaques, n = 3) had non-extratelencephalic-like physiology (Fig. 6h, j). The presence of neurons in human premotor cortex with Betz-like morphology and gene expression is consistent with observations that Betz cells may be distributed across motor-related areas that contribute to the corticospinal tract14.

There were substantial physiological differences between mouse and primate extratelencephalic neurons (Extended Data Fig. 12c–l). The firing rate of primate and mouse non-extratelencephalic neurons decreased to a steady state within the first second of a ten-second depolarizing current injection, whereas the firing rate of mouse extratelencephalic neurons increased moderately over the same time period (Fig. 6k, l and Extended Data Fig. 12d). In primate extratelencephalic/Betz neurons, a distinctive biphasic pattern was characterized by an early cessation of firing followed by a sustained and dramatic increase in firing later in the current injection. Thus, although the acceleration in spike frequency of extratelencephalic neurons was conserved across species, the temporal dynamics and magnitude of the acceleration were distinct in primate extratelencephalic/Betz neurons. Ion-channel-related genes that are differentially expressed between primates and mice are candidates to drive these physiological specializations.

## Discussion

Comparative analysis is a powerful strategy with which to understand brain structure and function. Conservation across species is strong evidence for functional relevance under evolutionary constraints that can help to identify essential molecular and regulatory mechanisms37,38. Conversely, divergence indicates adaption or drift, and may be essential to understand the mechanistic underpinnings of human brain function and susceptibility to human-specific diseases. Our integrated transcriptomic and epigenomic analysis of more than 450,000 nuclei in humans, non-human primates and mice has yielded a multimodal, hierarchical classification of approximately 100 cell types in each species, with distinct expression of marker genes and sites of accessible chromatin. This hierarchical organization is highly conserved, although species variation has limited the resolution of alignment to 45 consensus cell types. These types share a core set of molecular features, including expression of transcription factors and enrichment of TFBSs at epigenomic sites. For example, ChCs express a conserved transcription-factor marker, RORA, which has binding sites that are enriched in regions of accessible chromatin and in hypomethylated regions around other ChC markers.

Some characteristics of consensus types also diverge with evolutionary distance between species. On average, 39% of neuronal subclass markers are shared between humans and marmosets, and 27% of markers between humans or marmosets and mice. The composition of M1 circuits shifts dramatically across species. For example, the ratio of glutamatergic to GABAergic neurons varies from 2:1 in humans to 3:1 in marmosets and 5:1 in mice. The relative proportions of GABAergic subclasses and types are similar across species, suggesting a global increase in GABAergic types. As described previously39, we observed proportionally more L2 and L3 intratelencephalic neurons in humans, representing a selective increase in the number of neurons projecting to other parts of the cortex, presumably to facilitate greater corticocortical communication. Humans and marmosets have proportionally fewer L6 corticothalamic and L5 extratelencephalic neurons (also observed in MTG3), which may reflect dilution of these cells owing to allometric scaling of the neocortex relative to the subcortical targets of these cells in primates. These results suggest evolutionary changes in local and long-range cortical circuit function, and are consistent with developmental shifts in neuronal progenitor pools and changes in the timing of neurogenesis and migration.

We can leverage similarities between cell types across brain regions or species to make inferences about other cellular properties. We identified sparse L4-like cells in M1 that are not aggregated into a distinct layer and are predicted to receive input from thalamic axons. We identified two L5 extratelencephalic clusters that include neurons with Betz morphologies in humans and macaques. Similarly, in a recent study of fronto-insular cortex10, we identified an extratelencephalic type of neuron that included cells with spindle shapes (von Economo neurons). Surprisingly, these two extratelencephalic types include neurons with non-Betz and non-spindle morphologies, suggesting that there may be graded expression differences associated with these divergent morphologies. Alternatively, distinct markers of Betz neurons may be transiently expressed during the development of long-range connectivity and not maintained in adulthood, as observed for some neurons in flies40.

A comparative approach can help to elucidate what is different in humans or can be well modelled in closer, non-human primate relatives. In mice and primates, extratelencephalic neurons have a low input resistance and a characteristic peak resonance that reflect their large size and high expression of genes related to the HCN channel, respectively. However, primate Betz/extratelencephalic neurons have distinctive gene-expression and electrophysiological features—including pauses, bursting and spike-frequency acceleration, which have been seen in cats but not in rodents17,18,41. The selection of an appropriate model organism is particularly relevant when studying Betz cells and other extratelencephalic neuronal types that are selectively vulnerable in amyotrophic lateral sclerosis, some forms of frontotemporal dementia and other neurodegenerative conditions42,43.

## Methods

### Statistics and reproducibility

For multiplex fluorescent in situ hybridization (FISH) and immunofluorescence staining experiments, each ISH probe combination was repeated with similar results on at least two separate individuals per species, and on at least two sections per individual. The experiments were not randomized and the investigators were not blinded to allocation during experiments and outcome assessment. No statistical methods were used to predetermine sample size.

### Ethical compliance

Postmortem adult human brain tissue was collected after obtaining permission from the decedent’s next-of-kin. Postmortem tissue collection was performed in accordance with the provisions of the United States Uniform Anatomical Gift Act of 2006 described in the California Health and Safety Code section 7150 (effective 1 January 2008) and other applicable state and federal laws and regulations. The Western Institutional Review Board reviewed tissue-collection processes and determined that they did not constitute research on human participants that requires assessment by an institutional review board (IRB).

Tissue procurement from a neurosurgical donor was performed outside of the supervision of the Allen Institute at a local hospital, and tissue was provided to the Allen Institute under the authority of the IRB of the participating hospital. A hospital-appointed case coordinator obtained informed consent from the donor before surgery. Tissue specimens were de-identified before receipt by Allen Institute personnel. The specimens collected for this study were apparently non-pathological tissues removed during the normal course of surgery to access underlying pathological tissues. Tissue specimens collected were determined to be non-essential for diagnostic purposes by medical staff, and would have otherwise been discarded.

Mouse experiments were conducted in accordance with the US National Institutes of Health (NIH) Guide for the Care and Use of Laboratory Animals under protocol numbers 0120-09-16, 1115-111-18 or 18-00006, and were approved by the Institutional Animal Care and Use Committee (IACUC) at the University of Washington, the Allen Institute for Brain Science, the Salk Institute, or the Massachusetts Institute of Technology. Marmoset experiments were approved by and in accordance with the Massachusetts Institute of Technology IACUC, protocol number 051705020. Macaque tissue used in this research was obtained from the University of Washington National Primate Resource Center, under a protocol approved by the University of Washington IACUC.

### Postmortem human tissue specimens

Male and female donors 18–68 years of age with no known history of neuropsychiatric or neurological conditions (‘control’ cases) were considered for inclusion in this study (Extended Data Table 1). Routine serological screening for infectious disease (HIV, hepatitis B and hepatitis C) was conducted using donor blood samples, and only those donors who were negative for all three tests were considered for inclusion. Only those specimens with RNA integrity (RIN) values of 7.0 or more were considered for inclusion. Postmortem brain specimens were processed as described3. Briefly, coronal brain slabs were cut at 1 cm intervals and frozen for storage at −80 °C until further use. Putative hand and trunk-lower limb regions of the primary motor cortex were identified, removed from slabs of interest, and subdivided into smaller blocks. One block from each donor was processed for cryosectioning and fluorescent Nissl staining (Neurotrace 500/525, ThermoFisher Scientific). Stained sections were screened for histological hallmarks of primary motor cortex. After verifying that regions of interest contained M1, blocks were processed for nucleus isolation as described below.

### Human RNA-seq, quality control and clustering

#### SMART-seq v4

Nucleus isolation and sorting. Vibratome sections were stained with fluorescent Nissl, allowing microdissection of individual cortical layers (https://doi.org/10.17504/protocols.io.7aehibe). Nucleus isolation was performed as described (https://doi.org/10.17504/protocols.io.ztqf6mw). NeuN staining was carried out using mouse anti-NeuN antibody conjugated to phycoerythrin (PE; EMD Millipore, catalogue number FCMAB317PE) at a dilution of 1:500. Control samples were incubated with mouse IgG1k–PE isotype control (BD Biosciences, 555749; 1:250 dilution). DAPI (4′,6-diamidino-2-phenylindole dihydrochloride; ThermoFisher Scientific, D1306) was applied to nucleus samples at a concentration of 0.1 μg ml−1. Single-nucleus sorting was carried out on either a BD FACSAria II SORP or a BD FACSAria Fusion instrument (BD Biosciences) using a 130 μm nozzle and BD Diva software v8.0. A standard gating strategy based on DAPI and NeuN staining was applied to all samples as described3. Doublet discrimination gates were used to exclude nucleus aggregates.

RNA sequencing. The SMART-Seq v4 ultra low input RNA kit for sequencing (Takara, catalogue number 634894) was used as per the manufacturer’s instructions. Standard controls were processed with each batch of experimental samples as described (https://www.protocols.io/view/smarterv4-0-5x-amplification-for-single-cell-or-si-7d5hi86). After reverse transcription, complementary DNA was amplified with 21 polymerase chain reaction (PCR) cycles. The NexteraXT DNA library preparation kit (Illumina, FC-131-1096) with NexteraXT index kit V2 sets A–D (FC-131-2001, 2002, 2003 or 2004) was used for preparation of sequencing libraries. Libraries were sequenced on an Illumina HiSeq 2500 instrument (Illumina HiSeq 2500 System, Research Resource Identifier (RRID) SCR_016383) using Illumina high output V4 chemistry. The following instrumentation software was used during the data-generation workflow: SoftMax Pro v6.5, VWorks v11.3.0.1195 and v13.1.0.1366, Hamilton Run Time Control v4.4.0.7740, Fragment Analyzer v1.2.0.11, and Mantis Control Software v3.9.7.19.

Quantification of gene expression. Raw read (fastq) files were aligned to the GRCh38 human genome sequence (Genome Reference Consortium, 2011) with the RefSeq transcriptome version GRCh38.p2 (RefSeq, RRID SCR_003496, current as of 13 April 2015) and updated by removing duplicate Entrez gene entries from the gtf reference file for STAR processing. For alignment, Illumina sequencing adapters were clipped from the reads using the fastqMCF program (from ea-utils). After clipping, the paired-end reads were mapped using spliced transcripts alignment to a reference (STAR v2.7.3a, RRID SCR_015899) with default settings. Reads that did not map to the genome were then aligned to synthetic construct (that is, External RNA Controls Consortium, ERCC) sequences and the Escherichia coli genome (version ASM584v2). Quantification was performed using summerizeOverlaps from the R package GenomicAlignments v1.18.0. Expression levels were calculated as counts per million (CPM) of exonic plus intronic reads.

#### 10× Chromium RNA sequencing

Nucleus isolation and sorting. Nucleus isolation for 10× Chromium RNA sequencing was conducted as described (https://doi.org/10.17504/protocols.io.y6rfzd6). After sorting, single-nucleus suspensions were frozen in a solution of 1× phosphate-buffered saline (PBS), 1% bovine serum albumin (BSA), 10% dimethylsulfoxide (DMSO) and 0.5% RNAsin Plus RNase inhibitor (Promega, N2611), and stored at −80 °C. At the time of use, frozen nuclei were thawed at 37 °C and processed for loading on the 10× Chromium instrument as described (https://doi.org/10.17504/protocols.io.nx3dfqn). Samples were processed using the 10× Chromium single-cell 3′ reagent kit v3. 10× chip loading and sample processing were carried out according to the manufacturer’s protocol. Gene expression was quantified using the default 10× Cell Ranger v3 (Cell Ranger, RRID SCR_017344) pipeline, except for substituting of the curated genome annotation used for SMART-seq v4 quantification. Introns were annotated as ‘mRNA’, and intronic reads were included to quantify expression.

#### Quality control of RNA-seq data

Nuclei were included for analysis if they passed all quality-control criteria. For SMART-seq v4, criteria were: more than 30% of cDNA was longer than 400 base pairs; more than 500,000 reads were aligned to exonic or intronic sequences; more than 40% of total reads were aligned; more than 50% of reads were unique; the T/A nucleotide ratio was greater than 0.7. For Cv3, criteria were: more than 500 (non-neuronal nuclei) or more than 1,000 (neuronal nuclei) genes were detected; doublet score was less than 0.3.

#### Clustering of RNA-seq data

Nuclei passing quality-control criteria were grouped into transcriptomic cell types using a reported iterative clustering procedure2,3. Briefly, intronic and exonic read counts were summed, and log2-transformed expression was centred and scaled across nuclei. X and Y chromosomes and mitochondrial genes were excluded to avoid nucleus clustering on the basis of sex or nucleus quality. DEGs were selected; principal components analysis (PCA) reduced dimensionality; and a nearest neighbour graph was built using up to 20 principal components. Clusters were identified with Louvain community detection (or Ward’s hierarchical clustering if there were fewer than 3,000 nuclei), and pairs of clusters were merged if either cluster lacked marker genes. Clustering was applied iteratively to each subcluster until clusters could not be further split.

Cluster robustness was assessed by repeating iterative clustering 100 times for random subsets of 80% of nuclei. A co-clustering matrix was generated that represented the proportion of clustering iterations in which each pair of nuclei was assigned to the same cluster. We defined consensus clusters by iteratively splitting the co-clustering matrix as described2,3. The clustering pipeline is implemented in the R package scrattch.hicat v0.0.22 (RRID SCR_018099), with marker genes defined using the limma v3.38.3 package; the clustering method is provided by the ‘run_consensus_clust’ function (https://github.com/AllenInstitute/scrattch.hicat).

Clusters were curated on the basis of quality-control criteria or the expression of markers of cell classes (GAD1, SLC17A7, SNAP25). Clusters were identified as donor specific if they included fewer nuclei sampled from donors than expected by chance. To confirm exclusion, clusters automatically flagged as outliers or donor specific were manually inspected for expression of broad cell-class marker genes, mitochondrial genes related to quality, and known activity-dependent genes.

### Marmoset sample processing and nuclei isolation

Marmoset experiments were approved by, and in accordance with, the Massachusetts Institute of Technology IACUC, protocol number 051705020. Two adult marmosets (2.3 and 3.1 years old; one male, one female; Extended Data Table 2) were deeply sedated by intramuscular injection of ketamine (20–40 mg kg−1) or alfaxalone (5–10 mg kg−1), followed by intravenous injection of sodium pentobarbital (10–30 mg kg−1). When the pedal withdrawal reflex was eliminated and/or the respiratory rate was diminished, animals were transcardially perfused with ice-cold sucrose–HEPES buffer. Whole brains were rapidly extracted into fresh buffer on ice. Sixteen 2-mm coronal blocking cuts were rapidly made using a custom-designed marmoset brain matrix. Coronal slabs were snap-frozen in liquid nitrogen and stored at −80 °C until use.

As for human samples, marmoset M1 was isolated from thawed slabs using fluorescent Nissl staining (Neurotrace 500/525, ThermoFisher Scientific). Stained sections were screened for histological hallmarks of primary motor cortex. Nuclei were isolated from the dissected regions as described (https://www.protocols.io/view/extraction-of-nuclei-from-brain-tissue-2srged6), and were processed using the 10× Chromium single-cell 3′ reagent kit v3. 10× chip loading and sample processing was done according to the manufacturer’s protocol.

### Marmoset RNA-seq, quality control and clustering

#### RNA-sequencing

Libraries were sequenced on NovaSeq S2 instruments (Illumina). Raw sequencing reads were aligned to calJac3. Mitochondrial sequence was added into the published reference assembly. Human sequences of RNR1 and RNR2 (mitochondrial) and RNA5S (ribosomal) were aligned using gmap to the marmoset genome and added to the calJac3 annotation. Reads that mapped to exons or introns of each assembly were assigned to annotated genes. Libraries were sequenced to a median read depth of 5.95 reads per unique molecular index (UMI). The alignment pipeline can be found at https://github.com/broadinstitute/Drop-seq.

#### Cell filtering

Cell barcodes were filtered to distinguish true nuclei barcodes from empty beads and PCR artefacts by assessing proportions of ribosomal and mitochondrial reads, ratio of intronic/exonic reads (greater than 50% of intronic reads), library size (more than 1,000 UMIs) and sequencing efficiency (true cell barcodes have higher reads per UMI). The resulting digital gene-expression matrix (DGE) from each library was carried forward for clustering.

#### Clustering

Clustering analysis proceeded as in ref. 9. Briefly, independent component analysis (ICA, using the fastICA v1.2-1 package in R; RRID SCR_013110) was performed jointly on all marmoset DGEs after normalization and variable gene selection, as in ref. 44. The first-round clustering resulted in 15 clusters, corresponding to major cell classes (neurons, glia and endothelial cells). Each cluster was curated as in ref. 44 to remove doublets and outliers. Independent components were partitioned to remove those reflecting artefactual signals (for example, those for which cell loading indicated replicate or batch effects). The remaining independent components were used to determine clustering (Louvain community detection algorithm igraph v1.2.6 package in R); for each cluster, nearest neighbour and resolution parameters were set to optimize 1:1 mapping between each independent component and a cluster.

### Mouse snRNA-seq and snATAC-seq

Single nuclei were isolated from mouse primary motor cortex; gene expression and accessible chromatin were quantified using RNA-seq (Cv3 and SSv4) and snATAC-seq; and transcriptomic cell types, dendrograms and accessible-chromatin profiles were defined as described5.

### Integrating and clustering human Cv3 and SSv4 snRNA-seq datasets

To establish a set of human consensus cell types, we performed a separate integration of snRNA-seq technologies on the major cell classes (glutamatergic, GABAergic, and non-neuronal). Broadly, this integration is comprised of 6 steps: (1) subsetting the major cell class from each technology (for example, Cv3 GABAergic and SSv4 GABAergic); (2) finding marker genes for all clusters within each technology; (3) integrating both datasets with Seurat’s standard workflow using marker genes to guide integration (Seurat v3.1.1)45; (4) overclustering the data to a greater number of clusters than were originally identified within a given individual dataset; (5) finding marker genes for all integrated clusters; and (6) merging similar integrated clusters together based on marker genes until all merging criteria were sufficed, resulting in the final human consensus taxonomy.

More specifically, each expression matrix was log2(CPM + 1) transformed then placed into a Seurat object with accompanying metadata. Variable genes were determined by downsampling each expression matrix to a maximum of 300 nuclei per scrattch.hicat-defined cluster (from a previous step; see scrattch.hicat clustering) and running select_markers (scrattch.io v0.1.0) with n set to 20, to generate a list of up to 20 marker genes per cluster. The union of the Cv3 and SSv4 gene lists were then used as input for anchor finding, dimensionality reduction, and Louvain clustering of the full expression matrices. We used 100 dimensions for steps in the workflow, and 100 random starts during clustering. Louvain clustering was performed to overcluster the dataset to identify more integrated clusters than the number of scrattch.hicat-defined clusters. For example, GABAergic neurons had 79 and 37 scrattch.hicat-defined clusters, 225 overclustered integrated clusters, and 72 final human consensus clusters after merging for Cv3 and SSv4 datasets, respectively. To merge the overclustered integrated clusters, up to 20 marker genes were found for each cluster to establish the neighbourhoods of the integrated dataset. Clusters were then merged with their nearest neighbour if there were not a minimum of ten Cv3 and two SSv4 nuclei in a cluster, and a minimum of 4 DEGs that distinguished the query cluster from the nearest neighbour (note: these were the same parameters used to perform the initial scrattch.hicat clustering of each dataset).

### Integrating and clustering

#### Human MTG and M1 SSv4 snRNA-seq datasets

To compare cell types between our M1 human cell type taxonomy and our previously described human MTG taxonomy3, we used Seurat’s standard integration workflow to perform a supervised integration of the M1 and MTG SSv4 datasets. Intronic and exonic reads were summed into a single expression matrix for each dataset, with CPM normalized, and placed into a Seurat object with accompanying metadata. All nuclei from each major cell class were integrated and clustered separately. Up to 100 marker genes for each cluster within each dataset were identified, and the union of these two gene lists was used as input to guide alignment of the two datasets during integration, dimensionality reduction and clustering steps. We used 100 dimensions for all steps in the workflow. To compare laminar positioning in M1 and MTG, we estimated the relative cortical depth from pia for each neuron on the basis of layer dissection and average layer thickness46.

#### Integrating Cv3 snRNA-seq datasets across species

To identify homologous cell types across species, we used Seurat’s SCTransform workflow to perform a separate supervised integration on each cell class across species. Raw expression matrices were reduced to include only those genes with one-to-one orthologues defined in the three species (14,870 genes; downloaded from NCBI Homologene (https://www.ncbi.nlm.nih.gov/homologene) in November 2019; RRID SCR_002924) and placed into Seurat objects with accompanying metadata. To avoid having one species dominate the integrated space and to account for potential differences in each species’ clustering resolution, we downsampled the number of nuclei to have similar numbers across species at the subclass level (for example, Lamp5, Sst, L2/3 IT, L6b, and so on). The species with the largest number of clusters under a given subclass was allowed a maximum of 200 nuclei per cluster. The remaining species then split this theoretical maximum (200 nuclei multiplied by the maximum number of clusters under the subclass) evenly across their clusters. For example, the L2/3 intratelencephalic subclass had eight, four and three clusters for humans, marmosets and mice, respectively. All species were allowed a maximum of 1,600 L2/3 intratelencephalic nuclei in total; or a maximum of 200 human, 400 marmoset and 533 mouse nuclei per cluster. To integrate across species, all Seurat objects were merged and normalized using Seurat’s SCTransform function. To better guide the alignment of cell types from each species, we found up to 100 marker genes for each cluster within a given species. We used the union of these gene lists as input for the integration and dimensionality reduction steps, with 30 dimensions used for integration and 100 for dimensionality reduction and clustering. Clustering the human–marmoset–mouse integrated space provided an additional quality-control mechanism, revealing numerous small, species-specific integrated clusters that contained only low-quality nuclei (low UMIs and genes detected). We excluded 4,836 nuclei from the marmoset dataset that constituted low-quality integrated neuronal clusters.

### Estimation of cell-type homology

To identify homologous groups from different species, we applied a tree-based method (https://github.com/AllenInstitute/BICCN_M1_Evo) and package (https://github.com/huqiwen0313/speciesTree). In brief, the approach consists of four steps: first, metacell clustering; second, hierarchical reconstruction of a metacell tree; third, measurements of species mixing and stability of splits; and fourth, dynamic pruning of the hierarchical tree.

First, to reduce noise in single-cell datasets and to remove species-specific batch effects, we clustered cells into small highly similar groups on the basis of the integrated matrix generated by Seurat, as described in the previous section. These cells were further aggregated into metacells, and the expression values of the metacells were calculated by averaging the gene expression of individual cells that belong to each metacell. Correlation was calculated on the basis of the metacell gene-expression matrix to infer the similarity of each metacell cluster. Then hierarchical clustering was performed on the basis of the metacell gene-expression matrix using Ward’s method. For each node or corresponding branch in the hierarchical tree, we calculated three measurements, and the hierarchical tree was visualized on the basis of these measurements: first, cluster size was visualized as the thickness of branches in the tree; second, species mixing was calculated on the basis of the entropy of the normalized cell distribution and visualized as the colour of each node and branch; third, the stability of each node. The entropy of cells was calculated as: $$H=-\sum _{i}{p}_{i}{\rm{l}}{\rm{o}}{\rm{g}}{p}_{i}$$, where pi is the probability of cells from one species appearing among all the cells in a node. We assessed the node stability by evaluating the agreement between the original hierarchical tree, and a result on a subsampled dataset was calculated on the basis of the optimal subtree in the subsampled hierarchical trees, derived from subsampling 95% of cells in the original dataset. The entire subsampling process was repeated 100 times and the mean stability score for every node in the original tree was calculated. Finally, we recursively searched each node in the tree. If the heuristic criteria (see below) were not met for any node below the upper node, the entire subtree below the upper node was pruned, and all of the cells belonging to this subtree were merged into one homologous group.

To identify robust homologous groups, we applied criteria in two steps to dynamically search the cross-species tree. First, for each node in the tree, we computed the mixing of cells from three species on the basis of the entropy and set it as a tuning parameter. For each integrated tree, we tuned the entropy parameter to make sure that the tree method generated the highest resolution of homologous clusters without losing the ability to identify potential species-specific clusters. Nodes with entropy larger than 2.9 (for inhibitory neurons) or 2.75 (for excitatory neurons) were considered as well mixed nodes. For example, an entropy of 2.9 corresponded to a mixture of humans, marmosets and mice equal to (0.43, 0.37, 0.2) or (0.38, 0.30, 0.32). We recursively searched all of the nodes in the tree until we found the node nearest the leaves of the tree that was well mixed among species, and this node was defined as a well mixed upper node. Second, we further checked the within-species cell composition for the subtrees below the well mixed upper node to determine whether further splits were needed. For the cells on the subtrees below the well mixed upper node, we measured the purity of within-species cell composition by calculating the percentage of cells that fall into a specific subgroup in each individual species. If the purity for any species was larger than 0.8, we went one step further below the well mixed upper node so that its children were selected. Any branches below these nodes (or well mixed upper node if the within-species cell composition criteria were not met) were pruned, and cells from these nodes were merged into the same homologous groups, and the final integrated tree was generated.

As a final curation step, the homologous groups generated by the tree method were merged to be consistent with within-species clusters. We defined consensus types by comparing the overlap of within-species clusters between humans and marmosets, and humans and mice, as described3. For each pair of human and mouse clusters and human and marmoset clusters, the overlap was defined as the sum of the minimum proportion of nuclei in each cluster that overlapped within each leaf of the pruned tree. This approach identified pairs of clusters that consistently co-clustered within one or more leaves. Cluster overlaps varied from 0 to 1, and were visualized as a heat map with human M1 clusters in rows and mouse or marmoset M1 clusters in columns. Cell-type homologies were identified as one-to-one, one-to-many, or many-to many so that they were consistent in all three species. For example, the Vip_2 consensus type could be resolved into multiple homologous types between humans and marmosets but not humans and mice, and the coarser homology was retained. Consensus type names were assigned on the basis of the annotations of member clusters from humans and mice, and avoided specific marker gene names owing to the variability of marker expression across species.

To quantify cell-type alignment between pairs of species, we pruned the hierarchical tree described above on the basis of the stability and mixing of two species. We performed this analysis for human–marmoset, human–mouse and marmoset–mouse, and compared the alignment resolution of each subclass. The pruning criteria were tuned to fit the two-species comparison and to remove bias, and we set the same criteria for all comparisons (entropy cutoff 3.0). Specifically, for each subclass and pairwise species comparison, we calculated the number of leaves in the pruned tree. We repeated this analysis on the 100 subsampled datasets, and calculated the mean and standard deviation of the number of leaves in the pruned trees. For each subclass, we tested for significant differences in the average number of leaves across pairs of species using an ANOVA test followed by post hoc Tukey HSD tests.

### Marker determination for cell-type clusters

NS-Forest v2.1 (RRID SCR_018348) was used to determine the minimum set of marker genes whose combined expression identified cells of a given type with maximum classification accuracy47,48 (https://github.com/JCVenterInstitute/NSForest/releases). Briefly, for each cluster, NS-Forest produces a random-forest model using a one versus all binary classification approach. The top ranked genes from the random forest are then filtered by expression level to retain genes that are expressed in at least 50% of the cells within the target cluster. The selected genes are then reranked by binary score, calculated by first finding median cluster expression values for a given gene and then dividing by the target median cluster expression value. Next, one minus this scaled value is calculated, resulting in 0 for the target cluster and 1 for clusters that have no expression, while negative scaled values are set to 0. These values are then summed and normalized by dividing by the total number of clusters. In the ideal case, where all off-target clusters have no expression, the binary score is 1. Finally, for the top six binary genes, optimal expression level cutoffs are determined and all permutations of genes are evaluated by f-beta score, where the beta is weighted to favour precision. This f-beta score indicates the power of discrimination for a cluster and a given set of marker genes. The gene combination giving the highest f-beta score is selected as the optimal marker gene combination. Marker gene sets for human, mouse and marmoset primary motor cortex are listed in Supplementary Tables 46, and were used to construct the semantic cell-type definitions provided in Supplementary Table 1.

### Calculating DEGs

To identify subclass level DEGs that are conserved and divergent across species, we used the integrated Seurat objects from the species integration step. Seurat objects for each major cell class were downsampled to have up to 200 cells per species cell type. Positive DEGs were then found using Seurat’s FindAllMarkers function, using the ROC test with default parameters (min.pct = 0.1, AUROC threshold = 0.7). We compared each subclass within species to all remaining nuclei in that class, and used the SCT normalized counts to test for differential expression. For example, human Sst nuclei were compared with all other GABAergic human neurons using the ROC test. Venn diagrams were generated using the eulerr package v6.0.0 to visualize the relationships of DEGs across species for a given subclass. Heat maps of DEGs for all subclasses under a given class were generated by downsampling each subclass to 50 random nuclei per species. SCT normalized counts were then scaled and visualized with Seurat’s DoHeatmap function.

To identify ChC DEGs that are enriched over basket cells, we used the integrated Seurat objects from the species integration step. The Pvalb subclass was subset, and species cell types were then designated as either ChCs or basket cells. Positive DEGs were then found using Seurat’s FindAllMarkers function using the ROC test to compare ChCs and basket cells for each species. Venn diagrams were generated using the eulerr package to visualize the relationship of ChC-enriched DEGs across species. Heat maps of conserved DEGs were generated by downsampling the dataset to have 100 randomly selected basket cells and ChCs from each species. SCT normalized counts were then scaled and visualized with Seurat’s DoHeatmap function.

We used the four-species (humans, macaques, marmosets and mice) integrated glutamatergic Seurat object from the species integration step for all L5 extratelencephalic DEG figures. L5 extratelencephalic and L5 intratelencephalic subclasses were downsampled to 200 randomly selected nuclei per species. A ROC test was then performed using Seurat’s FindAllMarkers function between the two subclasses for each species to identify L5 extratelencephalic-specific marker genes. We then used the UpSetR v1.4.0 package to visualize the intersections of the marker genes across all four species as an upset plot. To determine genes that decrease in expression across evolutionary distance in L5 extratelencephalic neurons, we found the log-transformed fold change between L5 extratelencephalic and L5 intratelencephalic for each species across all genes. We then filtered the gene lists to include only those genes that had a trend of decreasing log-transformed fold change (from humans to macaques to marmosets to mice). Lastly, we excluded any gene that did not have a log-transformed fold change of 0.5 or greater in the human comparison. These 131 genes were then used as input for Gene Ontology (GO) analysis with the PANTHER classification system49 for the biological process category, with the organism set to Homo sapiens. All significant GO terms for this gene list were associated with cell–cell adhesion and axon guidance, and are coloured blue in the line graph of their expression enrichment.

### Differential isoform usage in humans and mice

To assess changes of isoform usage between mice and humans, we used SSv4 data with full transcript coverage and estimated isoform abundance in each cell subclasses. To mitigate low read depth in each cell, we aggregated reads from all cells in each subclass. We estimated the relative isoform usage in each subclass by calculating its genic proportion (P), defined as the ratio (R) of isoform expression to the gene expression, where R = (Phuman − Pmouse)/(Phuman + Pmouse). For a common set of transcripts for mice and humans, we used the University of California San Diego (UCSC) browser (RRID SCR_005780) TransMapV5 set of human transcripts (hg38 assembly, Gencode v31 annotations, RRID SCR_014966) mapped to the mouse genome (mm10 assembly) (http://hgdownload.soe.ucsc.edu/gbdb/mm10/transMap/V5/mm10.ensembl.transMapV5.bigPsl). We considered only medium to highly expressed isoforms, which have abundances of greater than 10 transcripts per million (TPM) and P values of greater than 0.2 in either mice or humans, and abundances of greater than 10 TPM in both mice and humans.

To calculate isoform abundance in each cell subclass, we aggregated reads from each subclass; mapped reads to the mouse or human reference genome with STAR 2.7.3a using default parameters; transformed genomic coordinates into transcriptomic coordinates using the STAR parameter –quantMode TranscriptomeSAM; and quantified isoform and gene expression using the RSEM v1.3.3 parameters (RSEM, RRID:SCR_013027) –bam–seed 12345–paired-end–forward-prob 0.5–single-cell-prior–calc-ci.

To estimate statistical significance, we calculated the standard deviation of isoform genic proportion (Phuman and Pmouse) from the RSEM’s 95% confidence intervals of isoform expression; calculated the P-value using the normal distribution for the (Phuman − Pmouse) and the summed (mouse + human) variance; and Bonferroni-adjusted P-values by multiplying nominal P-values by the number of medium to highly expressed isoforms in each subclass.

### Species cluster dendrograms

DEGs for a given species were identified by using Seurat’s FindAllMarkers function with a Wilcox test and comparing each cluster with every other cluster under the same subclass, with logfc.threshold set to 0.7 and min.pct set to 0.5. The union of up to 100 genes per cluster with the highest avg_logFC was used. The average log2 expression of the DEGs was then used as input for the build_dend function from scrattch.hicat to create the dendrograms. This was carried out with both human and marmoset datasets. For mouse dendrogram methods, see ref. 5.

### Multiplex FISH and immunofluorescence

Fresh-frozen human postmortem brain tissues were sectioned at 14–16 μm onto Superfrost Plus glass slides (Fisher Scientific). Sections were dried for 20 min at −20 °C and then vacuum sealed and stored at −80 °C until use. The RNAscope multiplex fluorescent v1 kit was used per the manufacturer’s instructions for fresh-frozen tissue sections (ACD Bio), except that fixation was performed for 60 min in 4% paraformaldehyde in 1× PBS at 4 °C and protease treatment was shortened to 5 min. For combined RNAscope and immunofluorescence, primary antibodies were applied to tissues after completion of FISH staining. Primary antibodies used were mouse anti-GFAP (EMD Millipore, catalogue number MAB360, RRID AB_11212597, 1:500 dilution) and mouse anti-Neurofilament H (SMI-32, BioLegend, catalogue number 801701, RRID AB_2564642, 1:250 dilution). Secondary antibodies were goat anti-mouse IgG (H+L) Alexa Fluor 568 conjugate (ThermoFisher Scientific, catalogue number A-11004, 1:500 dilution), goat anti-mouse IgG (H+L) Alexa Fluor 594 conjugate (ThermoFisher Scientific, catalogue number A-11005, 1:500 dilution) and goat anti-mouse IgG (H+L) Alexa Fluor 647 conjugate (ThermoFisher Scientific, catalogue number A-21235, 1:500 dilution) conjugates (594, 647). Sections were imaged using a 60× oil immersion lens on a Nikon TiE fluorescence microscope equipped with NIS-Elements Advanced Research imaging software (v4.20, RRID SCR_014329). For all RNAscope FISH experiments, positive cells were called by manually counting RNA spots for each gene. Cells were called positive for a gene if they contained three or more RNA spots for that gene. Lipofuscin autofluorescence was distinguished from RNA spot signals on the basis of the larger size of lipofuscin granules and broad fluorescence spectrum of lipofuscin. Staining for each probe combination was repeated with similar results on at least two separate individuals per species and on at least two sections per individual. Experiments examining L5 extratelencephalic neurons in humans were conducted on tissues taken from the dome of the gyrus corresponding to the presumptive trunk-lower limb portion of M1. Images were assessed with FIJI distribution of ImageJ v1.52p and GraphPad Prism v7.04.

### Conservation of gene families

To investigate the conservation and divergence of the coexpression of gene families between primates and mice, we carried out MetaNeighbour analysis50 using gene groups curated by the HUGO Gene Nomenclature Committee (HGNC, RRID SCR_002827) at the European Bioinformatics Institute (https://www.genenames.org; downloaded January 2020) and by the Synaptic Gene Ontology (SynGO, RRID SCR_017330)51 (downloaded February 2020). HGNC annotations were propagated via the provided group hierarchy to ensure the comprehensiveness of parent annotations. Only groups containing five or more genes were included in the analysis.

After splitting data by class, we used MetaNeighbour to compare data at the cluster level using labels from cross-species integration with Seurat. Cross-species comparisons were performed at two levels of the phylogeny: first, between the two primate species, marmosets and humans; and second, between mice and primates. In the first case, the data from the two species were each used as the testing and training sets across two folds of cross-validation, reporting the average performance (AUROC) across folds. In the second case, the primate data were used as an aggregate training set, and performance in mice was reported. Results were compared to average within-species performance.

### Replicability of clusters

MetaNeighbour v1.9.1 (RRID SCR_016727) was used to provide a measure of neuronal subclass and cluster replicability within and across species. For this application, we tested all pairs of species (human–marmoset, marmoset–mouse, human–mouse) as well as testing within each species. After splitting the data by class, we identified highly variable genes using the get_variable_genes function from MetaNeighbour, yielding 928 genes for GABAergic and 763 genes for glutamatergic neuron classes, respectively. These were used as input for the MetaNeighbourUS function, which was run using the fast_version and one_vs_best parameters set to TRUE. Using the one_vs_best parameter means that only the two closest neighbouring clusters are tested for their similarity to the training cluster, with results reported as the AUROC for the closest neighbour over the second closest. AUROCs are plotted in heat maps in Extended Data Figs. 2, 3. Data to reproduce these figures can be found in Supplementary Table 9, and scripts are on GitHub (http://github.com/gillislab/MetaNeighbor).

### SNARE–seq2

#### Sample preparation

Human and marmoset primary motor cortex nuclei were isolated for SNARE–seq2 according to the following protocol: https://doi.org/10.17504/protocols.io.8tvhwn6 (ref. 6). Fluorescence-activated nuclei sorting (FANS) was then performed on a FACSAria Fusion (BD Biosciences, Franklin Lakes, NJ), gating out debris from forward scatter (FSC) and side scatter (SSC) plots and selecting DAPI+ singlets (Extended Data Fig. 6a). Samples were kept on ice until sorting was complete and were used immediately for SNARE–seq2.

#### Data processing

A detailed step-by-step pipeline for processing SNARE–seq2 data is provided elsewhere28. For RNA data, this involved the use of dropEst to extract cell barcodes and STAR (v2.5.2b) to align tagged reads to the genome (GRCh38 version 3.0.0 for humans; GCF 000004665.1 Callithrix jacchus-3.2 for marmosets). For data on accessible chromatin, this involved Snaptools v1.4.7 for alignment to the genome (cellranger-atac-GRCh38-1.1.0 for humans, GCF 000004665.1 Callithrix jacchus-3.2 for marmosets) and to generate snap objects for processing using the R package SnapATAC v2. We generated 84,178 and 9,946 dual-omic single-nucleus RNA and accessible chromatin datasets from human (n = 2) and marmoset (n = 2) M1, respectively.

#### Data analysis

Filtering for RNA quality. For SNARE–seq2 data, quality filtering of cell barcodes and clustering analysis were first performed on transcriptomic (RNA) counts and used to inform subsequent quality filtering and analysis of accessible chromatin. Each cell barcode was tagged by an associated library batch identification code (for example MOP1, MOP2, and so on); RNA read counts associated with dT and n6 adaptor primers were merged; libraries were combined for each sample within each experiment and empty barcodes were removed using the emptyDrops() function of DropletUtils v1.6.1 (ref. 52); mitochondrial transcripts were removed; and doublets were identified using the DoubletDetection v2.5 software53 and removed. All samples were combined across experiments within species, and cell barcodes having greater than 200 and fewer than 7,500 genes detected were kept for downstream analyses. To further remove low-quality datasets, we applied a gene UMI ratio filter (gene.vs.molecule.cell.filter) using Pagoda2 v0.1.0 (https://github.com/hms-dbmi/pagoda2).

Clustering of RNA data. For human SNARE–seq2 RNA data, clustering analysis was first performed using Pagoda2, where counts were normalized to the total number per nucleus and batch variations were corrected by scaling expression of each gene to the dataset-wide average. After variance normalization, the top 6,000 overdispersed genes were used for principal component analysis. Clustering was performed using an approximate k-nearest neighbour graph (with k values between 50 and 500) based on the top 75 principal components, and cluster identities were determined using the infomap community detection algorithm. Major cell types were identified using a common set of broad cell-type marker genes: GAD1/GAD2 (GABAergic neurons), SLC17A7/SATB2 (glutamatergic neurons), PDGFRA (oligodendrocyte progenitor cells), AQP4 (astrocytes), PLP1/MOBP (oligodendrocytes), MRC1 (perivascular macrophages), PTPRC (T cells), PDGFRB (vascular smooth muscle cells), FLT1 (vascular endothelial cells), DCN (vascular fibroblasts) and APBB1IP (microglia) (Extended Data Fig. 6c). Low-quality clusters that showed very low gene/UMI detection rates, low marker gene detection and/or mixed cell-type marker profiles were removed. Oligodendrocytes were overrepresented (54,080 in total), possibly reflecting a deeper subcortical sampling than intended; therefore, to ensure a more balanced distribution of cell types, we capped the number of oligodendrocytes at 5,000 and repeated the PAGODA2 clustering as above. To achieve optimal clustering of the different cell types, we used different k values to identify cluster subpopulations for different cell types (L2/3 glutamatergic neurons, k = 500; all other glutamatergic neurons, astrocytes, oligodendrocytes and OPCs, k = 100; GABAergic neurons, vascular cells and microglia/perivascular macrophages, k = 50). To assess the appropriateness of the chosen k values, clusters were compared against SMARTer clustering of data generated on human M1 through correlation of cluster-averaged scaled gene-expression values using the corrplot v0.84 package (https://github.com/taiyun/corrplot) (Extended Data Fig. 6d). For cluster visualization, UMAP dimensional reduction was performed in Seurat (v3.1.0, RRID SCR_007322) using the top 75 principal components identified using Pagoda2 (RRID SCR_017094). For marmosets, clustering was initially performed using Seurat, where the top 2,000 variable features were selected from the mean variance plot using the ‘vst’ method and used for principal component analysis. UMAP embeddings were generated using the top 75 principal components. To harmonize cellular populations across platforms and modalities, snRNA-seq within-species cluster identities were predicted from both human and marmoset data. We used an iterative nearest-centroid classifier algorithm (see Methods subsection ‘Mapping of samples to reference taxonomies’) to generate probability scores for each SNARE–seq2 nucleus mapping to their respective species’ snRNA-seq reference cluster (Cv3 for marmoset and SMART-Seqv4 for human). Comparing the predicted RNA cluster assignment of each nucleus with Pagoda2-identified clusters showed highly consistent cluster membership using a Jaccard similarity index (Extended Data Fig. 6e), confirming the robustness of these cell identities discovered using different analysis platforms.

Filtering for quality of accessible chromatin data, and peak calling. Initial analysis of corresponding SNARE–seq2 data on chromatin accessibility was performed using SnapATAC v2 software (https://github.com/r3fang/SnapATAC) (https://doi.org/10.1101/615179). Snap objects were generated by combining individual snap files across libraries within each species. Cell barcodes were included for downstream analyses only if cell barcodes passed RNA quality filtering (see above) and showed more than 1,000 read fragments and 500 UMIs. Read fragments were then binned to 5,000-bp windows of the genome, and only those cell barcodes that showed a fraction of binned reads within promoters of greater than 10% (15% for marmosets) and less than 80% were kept for downstream analysis. For peak calling, pseudo bulk aggregates of reads were generated for each of the consensus RNA taxonomies, subclasses and classes using Snaptools. Given that comparable sequencing and sampling depths were achieved (Supplementary Table 14), pseudo bulk aggregates for peak calling included all within-species samples. Peaks were called using MACS2 v2.1.2 software (https://github.com/taoliu/MACS) using the runMACS function in SnapATAC and with the following options ‘–nomodel–shift 100–ext 200–qval 5e-2 –B–SPMR’. Peak counts by cell barcodes were then computed using the ‘createPmat’ function of SnapATAC.

Clustering of accessible chromatin data. The matrices for peak counts were filtered to keep only locations from chromosomes 1–22, X or Y, and processed using Seurat (v3.1.0) and Signac (v0.1.4) software45 (https://satijalab.org). All peaks having at least 100 counts (20 for marmosets) across cells were used for dimensionality reduction using latent semantic indexing (‘RunLSI’ function) and visualized by UMAP using the first 50 dimensions (40 for marmosets).

Calculating gene-activity scores. For a gene-activity matrix from accessibility data, cis co-accessible sites and gene-activity scores were calculated using Cicero software (v1.2.0)54 (https://cole-trapnell-lab.github.io/cicero-release/). The binary peak matrix was used as input, with the expression-family variable set to ‘binomialff’ to make the aggregated input Cicero CDS object using the UMAP coordinates derived from accessible chromatin peaks, and setting 50 cells to aggregate per bin. Co-accessible sites were then identified using the ‘run_cicero’ function using default settings, and modules of cis co-accessible sites were identified using the ‘generate_ccans’ function. Co-accessible sites were annotated to a gene if they fell within a region spanning 10,000 bp upstream and downstream of the gene’s transcription start site (human) or within 5,000 bp of the gene body (marmoset). The Cicero gene activity matrix was then calculated using the ‘build_gene_activity_matrix’ function using a co-accessibility cutoff of 0.25 and added to a separate assay of the Seurat object.

Integrating data on RNA and accessible chromatin. To reconcile the differing resolutions achievable from RNA and accessible chromatin (Extended Data Fig. 6f–k), we carried out an integrative analysis using Seurat. Transfer anchors were identified between the activity and RNA matrices using the ‘FindTransferAnchors’ function. For human data, transfer anchors were generated using an intersected list of variable genes identified from Pagoda2 analysis of RNA clusters (top 2,000 genes) and marker genes for clusters identified from SSv4 data (2,492 genes having β-scores greater than 0.4), together with canonical correlation analysis (CCA) for dimension reduction. For marmoset data, transfer anchors were generated using an intersected list of variable genes identified using Seurat (top 2,000 genes) and DEGs identified between marmoset RNA clusters (Cv3 snRNA-seq data, P < 0.05, top 100 markers per cluster). Imputed RNA expression values were then calculated using the ‘TransferData’ function from the Cicero gene activity matrix using normalized RNA expression values for reference and LSI for dimension reduction. RNA and imputed expression data were merged, and a UMAP co-embedding and shared nearest neighbour (SNN) graph was generated using the top 50 principal components (40 for marmoset) and clusters identified (‘FindClusters’) using a resolution of 4. The resulting integrated clusters were compared with RNA clusters by calculating jaccard similarity scores using scratch.hicat software. Cell populations identified as T cells from Pagoda2 analysis (humans only) and those representing low-quality integrated clusters, showing a mixture of disparate cell types, were removed from these analyses. RNA clusters were assigned to co-embedded clusters on the basis of the highest jaccard similarity score and frequency, and then merged to generate the best matched co-embedded clusters, taking into account cell type and subclass to ensure more accurate merging of ambiguous populations. This produced clusters based on accessible chromatin that directly match the RNA-defined populations (Extended Data Fig. 6k). For RNA cluster and subclass level predictions (Extended Data Fig. 6g), we used the Seurat ‘TransferData’ function to transfer RNA cluster or subclass labels to accessible-chromatin data using the precomputed transfer anchors and LSI dimensionality reduction.

Final peaks of accessible chromatin and gene-activity matrices. A final combined list of peak regions was generated using MACS2, as detailed above, for all cell populations corresponding to RNA consensus taxonomies (more than 100 nuclei), accessibility level, subclass level (more than 50 nuclei) and class level barcode groupings. The final peak by cell barcode count matrix was generated by SnapATAC and used to establish a Seurat object as outlined above, with peak counts, Cicero gene activity scores and RNA expression values for matched cell barcodes contained within different assay slots. To confirm the appropriateness of calling peaks on cell barcode groupings that included both samples, we found that 93% of peak regions called by MACS2 on clusters at the accessible-chromatin level for the H18.30.001 sample overlapped with peak regions called independently for H18.30.002. Clusters at the accessible-chromatin level also showed similar coverage across individual samples (Extended Data Fig. 7a–d), and peak counts were highly correlated across experiments (mean Pearson’s correlation coefficient (r) of 0.99 for humans and 0.98 for marmosets). Furthermore, gene activity estimates based on cis-regulatory interactions predicted from co-accessible promoter and distal peak regions using Cicero54 were highly correlated with RNA expression values (Extended Data Fig. 7e, f). Dimensionality reduction using LSI on peak counts for final visualization (Extended Data Fig. 7a–c) was performed as above.

Dual-omic data. Following quality-control filtering for RNA and accessible chromatin (including limiting the representation of oligodendrocytes for humans) and modality integration, we obtained 84,178 and 9,946 dual-omic single-nucleus RNA and accessible-chromatin datasets from human (n = 2) and marmoset (n = 2) M1, respectively (Extended Data Fig. 6a, b and Supplementary Table 13). On average, 2,242 genes were detected per nucleus for humans and 3,858 genes per nucleus for marmosets, owing to the more than fourfold greater sequencing depth for marmosets (average 17,576 reads per nucleus for humans and 77,816 reads per nucleus for marmosets). In human and marmoset cells, we identified a total of 273,103 and 134,769 regions of accessible chromatin, and an average of 1,527 or 1,322 unique peaks of accessible chromatin per nucleus, respectively.

Analysis of transcription-factor motifs. Jaspar motifs (JASPAR2020, all vertebrate) were used to generate a motif matrix and motif object that was added to the Seurat object using Signac (‘CreateMotifMatrix’, ‘CreateMotifObject’, ‘AddMotifObject’); and GC content, region lengths and dinucleotide base frequencies were calculated using the ‘RegionStats’ function. For motif activity scores, chromVAR v1.8.0 (https://greenleaflab.github.io/chromVAR)55 was carried out according to default parameters (marmosets) or using the Signac ‘RunChromVAR’” function on the peak count matrix (humans). The chromVAR deviation score matrix was then added to a separate assay slot of the Seurat object, and differential activities (or deviation scores) of TFBSs between different populations were assessed using the ‘Find[All]Markers’ function through logistic regression and using the number of peak counts as a latent variable.

To examine non-redundant TFBS families, we downloaded motif collections generated by matrix clustering56 from the JASPAR database (http://jaspar.genereg.net/matrix-clusters/), and used them to generate averaged chromVAR TFBS activities by subclass. Select motif clusters were visualized using ggHeat plotting function (SWNE package v0.5.7, https://github.com/yanwu2014/swne).

Identification of DARs. To compare DARs between cell populations (Fig. 3b and Extended Data Fig. 7g, h), we identified DARs that are significantly enriched within each cell grouping against a selection of background cells having best matched total peak counts. In this way, we identified DARs for each cell population, while taking into account any technical artefacts associated with the total accessibility for each cell. This involved calculating the total peaks in each cell on the basis of the accessibility matrix, estimating the distribution of total peaks (depth distribution) for the cells belonging to the test cluster, and randomly sampling cells (10,000 for humans and 2,000 for marmosets) from the rest of the clusters in a weighted way to select cells that have similar depth distribution to the test cluster. DARs were then identified as significantly enriched in the positive cells over selected background cells using the ‘CalcDiffAccess’ function, chromfunks v0.3.0 (https://github.com/yanwu2014/chromfunks), where P-values were calculated using a Fisher’s exact test on a hypergeometric distribution6, and adjusted P values (or q values) were calculated using the Benjamini–Hochberg method. To compare DAR proportions across subclasses and species, we subsampled subclasses (maximum of 500 for humans and 200 for marmosets) and identified DARs using the ‘CalcDiffAccess’ function as above. AUC values, testing the separation power of a specific DAR among different major clusters, were then calculated from the term frequency–inverse document frequency (TF–IDF) normalized peak by cell matrix using getDifferentialGenes and auc functions from the pagoda2 and pROC v1.16.2 packages. To visualize subsampled subclass DARs, we selected significant human (q < 0.005 and log-transformed fold change > 1) and marmoset (q < 0.05 and log-transformed fold change > 1) DARs passing an upper quantile AUC cutoff. For clusters of accessible chromatin and RNA in humans, we selected up to the top 100 DARs on the basis of log-transformed fold change values (accessible chromatin, q < 0.01 and log-transformed fold change > 1; RNA, q < 0.05 and log-transformed fold change > 1). Averaged accessibility values by cell grouping were then calculated, scaled (trimming values to a minimum of 0 and a maximum of 5), and visualized using the ggHeat plotting function (SWNE package).

To identify conservation of DARs between humans and marmosets, we found that 97% of marmoset DARs could be aligned to the human genome on the basis of at least 10% of matched bases using the LiftOver tool (https://genome.ucsc.edu/cgi-bin/hgLiftOver). DARs in each subclass were considered conserved if they were located within 1 kb of the aligned genomic location based on the overlap of genomic locations between species using the ‘findOverlaps’ function in the GenomicRanges v1.38.0 package.

Linking DARs to marker genes. We identified marker genes for the clusters of accessible chromatin by comparing the gene expression from cells in each cluster with a weighted sampling of background cells from the remaining clusters. Wilcoxon tests were used to calculate the z-scores and adjusted P values for individual genes using ‘getDifferentialGenes’ function from the pagoda2 package. Genes were ranked by calculating AUC values, and DARs for the corresponding clusters were identified using the method described above. For each identified DAR, we assigned it to the nearest gene. The top expressed genes and associated DARs that were located within 500 kb of the gene region in each cluster of accessible chromatin were considered as associated targets. To further identify targets that have a direct link between DARs and gene expression, we trained a random forest regression model to predict changes in gene expression in each cluster of accessible chromatin on the basis of the features extracted from its assigned DARs. The significant targets were then identified by comparing the regression model and a background model generated by random permutation. The union of the top predictive targets and identified marker genes and their associated DARs was selected and visualized.

Correlation analyses. To correlate RNA expression and associated accessible-chromatin activities for clusters at the levels of RNA and accessible chromatin (Extended Data Fig. 7e, f), we generated average scaled expression values and carried out pairwise correlations for marker genes identified from an intersected list of variable genes from Pagoda2 analysis of RNA clusters (top 2,000 genes) and marker genes for clusters identified from SSv4 data (2,492 genes having β-scores of greater than 0.4). For correlation of TFBS activities across species (Fig. 3), chromVAR TFBS activity scores for all Jaspar motifs found to be differentially active across marmoset or human subclasses (P < 0.05) were averaged, scaled for each species separately, and then correlated. Averaged scaled gene-expression values for the corresponding transcription factor were also correlated. Variable genes identified from both human and marmoset SNARE–seq2 RNA data using Seurat FindVariableFeatures function (selection.method = ‘vst’, nfeatures = 3,000) were used to generate averaged scaled expression values and correlated. Correlations between human and marmoset cell subclasses were visualized as boxplots for TFBS activities, expression of transcription factors, and variable genes using the R package ggplot2 v3.3.2 (ref. 57).

Plots and figures. All UMAP, feature, dot, and violin plots were generated using Seurat. Correlation plots were generated using the corrplot package. Connection plots were generated using Cicero and visualized using Gviz v1.30.3 (ref. 58). To generate BigWig files for genome browser tracks, we compiled bam files for each cluster and normalized fragments using trimmed mean of M-values (TMM) to better account for differences in size (total fragments) and signal-to-noise ratios between clusters. For this, inverse scale factors were calculated using EdgeR v3.28.1 (ref. 59) for each cluster on the basis of a subset of fragments that overlap chromosome 22. BigWig files were then generated using deepTools v3.4.2 bamCoverage60 with the following options: (–ignoreDuplicates–minFragmentLength 0–maxFragmentLength 1000–binSize 50–scaleFactor). Genome browser tracks were generated using the Integrative Genomics Viewer (IGV v2.7.0).

### Single-cell methylome data (snmC-seq2)

#### Sequencing and quantification

Library preparation and Illumina sequencing. Single nuclei were isolated from human and marmoset M1 tissue as described above for RNA-seq profiling, and for mouse tissue as detailed in ref. 5. Single nuclei were labelled with an anti-NeuN antibody and isolated by fluorescence-activated cell sorting (FACS), and neurons were enriched (90% NeuN+ nuclei) to increase detection of rare types. Mouse experiments were approved by the Salk IACUC under protocol number 18-00006. Detailed methods for bisulfite conversion and library preparation were previously described for snmC-seq2 (refs. 4,30). The snmC-seq2 libraries generated from mouse brain tissues were sequenced using an Illumina Novaseq 6000 instrument with S4 flowcells and 150 bp paired-end mode. We generated 6,095, 6,090 and 9,876 single-nucleus methylcytosine datasets from M1 of humans (n = 2), marmosets (n = 2), and mice, respectively.

Mapping and feature-count pipeline. We implemented a versatile mapping pipeline (http://cemba-data.rtfd.io) for all the single-cell methylome-based technologies developed by our group4,30,61. The main steps of this pipeline included: first, demultiplexing FASTQ files into single cells; second read-level quality control; third, mapping; fourth BAM file processing and quality control; and fifth, final generation of molecular profiles. The details of the five steps for snmC-seq2 were described previously30. We mapped all of the reads from the three corresponding species onto the human hg19 genome, the marmoset ASM275486v1 genome, and the mouse mm10 genome. After mapping, we calculated the methyl-cytosine counts and total cytosine counts for two sets of genome regions in each cell: the non-overlapping chromosome 100-kb bins of each genome (the methylation levels of which were used for clustering analysis) and the gene-body regions (the methylation levels of which were used for cluster annotation and integration with RNA expression data). On average, 5.5% of human, 5.6% of marmoset and 6.2% of mouse genomes were covered by stringently filtered reads per cell, with 3.4 × 104, 1.8 × 104 and 4.5 × 104 genes detected per cell in the three species.

#### Quality control and preprocessing

Cell filtering. We filtered the cells on the basis of the following main mapping metrics: first, an mCCC rate of less than 0.03 (the mCCC rate reliably estimates the upper bound of the bisulfite non-conversion rate4); second, an overall mCG rate of 0.5; third, an overall mCH rate of less than 0.2; fourth, total final reads of more than 500,000; and fifth, a bismark mapping rate of more than 0.5. Other metrics such as genome coverage, rate of PCR duplicates, and index ratio were also generated and evaluated during filtering. However, after removing outliers with the main metrics 1–5, few additional outliers could be found.

Feature filtering. We filtered 100-kb genomic bin features by removing bins with mean total cytosine base calls of less than 250 or more than 3,000. We also excluded regions that overlap with the ENCODE blacklist62 from further analysis.

Computation and normalization of the methylation rate. For CG and CH methylation, the computation of methylation rate from the methyl-cytosine and total cytosine matrices contains two steps: first, prior estimation for the beta-binomial distribution; and second, posterior rate calculation and normalization per cell.

In step 1, for each cell we calculated the sample mean, m, and variance, v, of the raw methylcytosine rate for each sequence context (CG, CH). The shape parameters (α, β) of the beta distribution were then estimated using the method of moments:

$$a=m(m(1\,\mbox{--}\,m)/v\,\mbox{--}\,1)$$
$$\beta =(1\,\mbox{--}\,m)(m(1\,\mbox{--}\,m)/v\,\mbox{--}\,1)$$

This approach used different priors for different methylation types for each cell and used weaker priors for cells with more information (higher raw variance).

In step 2, we calculated the posterior $$\overline{{\rm{mc}}}$$ = α + mc/α + β + cov, where cov is the total read number and mc is the number of reads supporting methylation. We normalized this rate by the cell’s global mean methylation, m = α/(α + β). Thus, all the posterior $$\overline{{\rm{mc}}}$$ with 0 cov will be a constant 1 after normalization. The resulting normalized mc rate matrix contains no ‘not available’ (NA) values, and features with less cov tend to have a mean value close to 1.

Selection of highly variable features. Highly variable methylation features were selected on the basis of a modified approach using the scanpy v1.4.4 package scanpy.pp.highly_variable_genes function63. In brief, the scanpy.pp.highly_variable_genes function normalized the dispersion of a gene by scaling with the mean and standard deviation of the dispersions for genes falling into a given bin for mean expression of genes. In our modified approach, we reasoned that both the mean methylation level and the mean cov of a feature (100-kb bin or gene) could impact the dispersion of the mc rate. We grouped features that fall into a combined bin of mean and cov, and then normalized the dispersion within each mean–cov group. After dispersion normalization, we selected the top 3,000 features based on normalized dispersion for clustering analysis.

Dimension reduction and combination of different mC types. For each selected feature, mc rates were scaled to unit variance, and zero mean. Principal component analysis (PCA) was then performed on the scaled mc rate matrix. The number of important principal components was selected by inspecting the variance ratio of each principal component using the elbow method. The CH and CG principal components were then concatenated together for further analysis in clustering and manifold learning.

#### Data analysis

Consensus clustering on concatenated principal components. We used a consensus clustering approach based on multiple Leiden-clustering64 over a k-nearest neighbour (KNN) graph to account for the randomness of the Leiden clustering algorithms. After selecting dominant principal components from PCA in both mCH and mCG matrices, we concatenated the principal components together to construct a KNN graph using scanpy.pp.neighbours with Euclidean distance. Given fixed resolution parameters, we repeated the Leiden clustering 300 times on the KNN graph with different random starts, and combined these cluster assignments as a new feature matrix, where each single Leiden result is a feature. We then used the outlier-aware DBSCAN algorithm from the scikit-learn v0.21.3 package (RRID SCR_002577) to perform consensus clustering over the Leiden feature matrix using the hamming distance. Different epsilon parameters of DBSCAN are traversed to generate consensus cluster versions with the number of clusters that range from the minimum to the maximum number of clusters observed in the multiple Leiden runs. Each version contained a few outliers that usually fall into three categories: first, cells located between two clusters that had gradient differences instead of clear borders; second, cells with a low number of reads that potentially lack information in essential features to determine the specific cluster; and third, cells with a high number of reads that were potential doublets. The amount of the first and second types of outliers depends on the resolution parameter and is discussed in the ‘Choice of resolution parameter’ section below. The third type of outliers were very rare after cell filtering. The supervised model evaluation then determined the final consensus cluster version.

Supervised model evaluation of the clustering assignment. For each consensus clustering version, we performed a recursive feature elimination with cross-validation (RFECV)65 process from the scikit-learn package to evaluate clustering reproducibility. We first removed the outliers from this process, and then we held out 10% of the cells as the final testing dataset. For the remaining 90% of the cells, we used tenfold cross-validation to train a multiclass prediction model using the input principal components as features and sklearn.metrics.balanced_accuracy_score66 as an evaluation score. The multiclass prediction model is based on BalancedRandomForestClassifier from the imblearn v0.0 package that accounts for imbalanced classification problems67. After training, we used the 10% testing dataset to test the model performance using the balanced_accuracy_score score. We kept the best model and corresponding clustering assignments as the final clustering version. Finally, we used this prediction model to predict outliers’ cluster assignments, and rescued those with a prediction probability of more than 0.3, otherwise labelling them as outliers.

Choice of resolution parameter. Choosing the resolution parameter of the Leiden algorithm is essential for determining the final number of clusters. We selected the resolution parameter according to three criteria: first, the portion of outliers is less than 0.05 in the final consensus clustering version; second, the ultimate accuracy of the model’s prediction is more than 0.95; and third, the average number of cells per cluster is 30 or more, thereby controlling the cluster size to reach the minimum coverage required for further epigenome analysis such as DMR calls. All three criteria prevented the oversplitting of the clusters; thus, we selected the maximum resolution parameter to meet the criteria using a grid search.

Three levels of iterative clustering analysis. We used an iterative approach to cluster the data into three levels of categories with the consensus clustering procedure described above. In the first level, termed CellClass, clustering analysis is done on all cells. The resulting clusters are then manually merged into three canonical classes, glutamatergic neurons, GABAergic neurons and non-neurons, based on marker genes. The same clustering procedure is then conducted within each CellClass to obtain clusters as the MajorType level. Within each MajorType, we obtain the final clusters as the SubTypes in the same way.

Integrating cell clusters identified from snmC-seq2 and from Cv3. We identified gene markers on the basis of gene-body hypomethylation for each level of clustering of snmC-seq2 data using our in-house analysis utilities (https://github.com/lhqing/cemba_data), and identified gene markers for cell classes and subclasses from Cv3 analysis using scanpy63. We then used Scanorama v1.0 (ref. 68) to integrate the two modalities with the markers identified (Supplementary Table 23). Methylome tracks at the subclass level can be found at http://neomorph.salk.edu/aj2/pages/cross-species-M1/.

Calling CG DMRs. We identified CG DMRs using methylpy v1.4.0 (https://github.com/yupenghe/methylpy) as described69. Briefly, we first called CG differentially methylated sites and then merged them into blocks if they both showed similar sample-specific methylation patterns and were within 250 bp. Normalized relative lengths of DMRs (Fig. 4d) were calculated by summing the lengths of DMRs and the surrounding 250 bp, and dividing by the numbers of cytosines covered in sequencing.

Analysis of enriched TFBS motifs. For each cell subclass (cluster), we analysed enriched TFBS motifs for hypomethylated DMRs compared with the hypomethylated DMRs from other cell subclasses (clusters) using software AME70. DMRs and surrounding 250-bp regions were used in the analysis. Enrichment results are reported as significance (P values) and effect sizes (log2(true positives/false positives).

### Characterization of chandelier cells

#### Morphology

Morphological reconstructions of Pvalb-expressing ChC and basket cells were obtained from human donors using the patch–seq protocol described below for L5 extratelencephalic neurons. Macaque reconstructions were from source data available in Neuromorpho71,72. Mouse cells also appear in ref. 73.

#### Mouse ATAC-seq: data acquisition and analysis

Chandelier cells are rare in mouse cortex and were enriched by isolating individual neurons from transgenically labelled mouse primary visual cortex (VISp). Many of the transgenic mouse lines have previously been characterized by single-cell RNA-seq2. Single-cell suspensions of cortical neurons were generated as described2 and subjected to tagmentation (ATAC-seq)74,75. Mixed libraries containing 60–96 samples were sequenced on an Illumina MiSeq. In total, 4,275 single cells were collected from 36 driver-reporter combinations in 67 mice. After sequencing, raw FASTQ files were aligned to the GRCm38 (mm10) mouse genome using Bowtie v1.1.0 (RRID SCR_005476) as described76. Following alignment, duplicate reads were removed using samtools v1.9 rmdup, which yielded only single copies of uniquely mapped paired reads in BAM format. Quality-control filtering was applied to select samples with more than 10,000 uniquely mapped paired-end fragments, more than 10% of which were longer than 250 base pairs and with more than 25% of their fragments overlapping high-depth cortical DNase-seq peaks from ENCODE77. The resulting dataset contained a total of 2,799 samples.

To increase the cell-type resolution of chromatin-accessibility profiles beyond that provided by driver lines, we used a feature-free method for computation of pairwise distances (Jaccard). Using Jaccard distances, we carried out PCA and t-SNE, followed by Phenograph v1.5.2 (RRID SCR_016919) clustering78. This clustering method grouped cells from class-specific driver lines together, but also segregated them into multiple clusters. Phenograph-defined neighbourhoods were assigned to cell subclasses and clusters by comparing accessibility near transcription start sites (TSS ± 20 kb) to median expression values of scRNA-seq clusters at the cell-type and subclass levels from mouse primary visual cortex79. From this analysis, we assigned a total of 226 samples to Pvalb and 124 samples to Pvalb Vipr2 (ChC) clusters. The sequence data for these samples were grouped together and further processed through the Snap-ATAC pipeline.

Mouse scATAC-seq peak counts for Pvalb and ChC were used to generate a Seurat object as outlined above for human and marmoset SNARE–seq2 data on accessible chromatin. Cicero cis co-accessible sites were identified, gene-activity scores calculated, and motif-enrichment analyses performed as above. Genes used for motif enrichment were ChC markers identified from differential expression analysis between PVALB-positive clusters in mouse Cv3 scRNA-seq data (with an adjusted P value of less than 0.05).

### Patch–seq

#### Participants

The human neurosurgical specimen was obtained from a 61-year-old female patient who underwent deep tumour resection (glioblastoma) from the frontal lobe at a local hospital. The patient provided informed consent and experimental procedures were approved by the hospital institute review board before commencing the study. Post hoc analysis revealed that the neocortical tissue obtained from this patient was from a premotor region near the confluence of the superior frontal gyrus and the precentral gyrus (Fig. 6g). Betz cells are enriched in the primary motor cortex, but they are also present in premotor cortex (area 6; refs. 14,80,81; Allen Institute Human Brain Reference Atlas). These neurons have several histological hallmarks of Betz cells (including gigantocellular somata, horizontally emanating dendrites and abundant rough endoplasmic reticulum14). In addition, as can be seen in the biocytin images in Fig. 6, the recorded neurons possessed large somata with many perisomatic dendrites. Additional histological hallmarks of Betz cells cannot be assessed in biocytin-filled neurons.

All procedures involving macaques and mice were approved by the IACUC at either the University of Washington or the Allen Institute for Brain Science. Macaque M1 tissue was obtained from male (n = 4) and female (n = 5) animals (mean age = 10 ± 2.21 years) designated for euthanasia from the University of Washington National Primate Resource Center, under a protocol approved by the University of Washington UACUC. Mouse M1 tissue was obtained from 4–12-week-old male and female mice from the following transgenic lines: Thy1h–eyfp (B6.Cg-Tg(Thy1–YFP)-HJrs/J; RRID IMSR_JAX:003782); Etv1–egfp (Tg(Etv1–EGFP)BZ192Gsat/Mmucd; RRID MMRRC_011152-UCD) (animals maintained on an outbred Charles River Swiss Webster background (Crl:CFW(SW; RRID IMSR_CRL:024)); and C57BL/6-Tg (Pvalb–tdTomato)15Gfng/J; RRID IMSR_JAX:027395). Mice were provided food and water ad libitum and were maintained on a regular 12-h day/night cycle with no more than five adult animals per cage.

#### Preparation of brain slices

Brain slices were prepared in a similar way for Pvalb–TdTomato mice and macaque and human samples. Upon resection, human neurosurgical tissue was immediately placed in a chilled and oxygenated solution formulated to prevent excitotoxicity and preserve neural function82. This artificial cerebrospinal fluid (NMDG aCSF) consisted of (in mM): 92 N-methyl-d-glucamine (NMDG), 2.5 KCl, 1.25 NaH2PO4, 30 NaHCO3, 20 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid (HEPES), 25 glucose, 2 thiourea, 5 sodium ascorbate, 3 sodium pyruvate, 0.5 CaCl2·4H2O and 10 MgSO4·7H2O. The pH of the NMDG aCSF was titrated to 7.3–7.4 with concentrated hydrochloric acid, and the osmolality was 300–305 mOsmoles per kilogram. The solution was prechilled to 2–4 °C and thoroughly bubbled with carbogen (95% O2/5% CO2) before collection. Macaques were anaesthetized with sevoflurane gas, during which the entire cerebrum was extracted and placed in the protective solution described above. After extraction, macaques were euthanized with sodium-pentobarbital. We dissected the trunk/limb area of the primary motor cortex to prepare brain slices. Pvalb–TdTomato mice were deeply anaesthetized by intraperitoneal administration of Avertin (20 mg kg−1) and were perfused through the heart with NMDG aCSF (bubbled with carbogen).

Brains were sliced at 300-μm thickness on a vibratome using the NMDG protective recovery method and a zirconium ceramic blade83,84. Mouse brains were sectioned coronally, and human and macaque brains were sectioned such that the angle of slicing was perpendicular to the pial surface. After sections were obtained, slices were transferred to a warmed (32–34 °C) initial recovery chamber filled with NMDG aCSF under constant carbogenation. After 12 min, slices were transferred to a chamber containing an aCSF solution consisting of (in mM): 92 NaCl, 2.5 KCl, 1.25 NaH2PO4, 30 NaHCO3, 20 HEPES, 25 glucose, 2 thiourea, 5 sodium ascorbate, 3 sodium pyruvate, 2 CaCl2·4H2O and 2 MgSO4·7H2O, continuously bubbled with 95% O2/5% CO2. Slices were held in this chamber for use in acute recordings or transferred to a six-well plate for long-term culture and viral transduction. Cultured slices were placed on membrane inserts and wells were filled with culture medium consisting of 8.4 g l−1 MEM Eagle medium, 20% heat-inactivated horse serum, 30 mM HEPES, 13 mM d-glucose, 15 mM NaHCO3, 1 mM ascorbic acid, 2 mM MgSO4·7H2O, 1 mM CaCl2.4H2O, 0.5 mM GlutaMAX-I and 1 mg l−1 insulin83. The slice culture medium was carefully adjusted to pH 7.2–7.3, an osmolality of 300–310 mOsmoles per kilogram by addition of pure H2O, sterile-filtered and stored at 4 °C for up to 2 weeks. Culture plates were placed in a humidified 5% CO2 incubator at 35 °C, and the slice culture medium was replaced every two to three days until endpoint analysis. One to three hours after brain slices were plated on cell culture inserts, brain slices were infected by direct application of concentrated AAV viral particles over the slice surface80.

For mouse M1, the extratelencephalic-specific Thy1–YFP-H41,84 and intratelencephalic-specific Etv1–EGFP85 lines preferentially labelled physiologically defined extratelencephalic and non-extratelencephalic neurons, respectively (Fig. 6h, i). Thy1 and Etv1 mice were deeply anaesthetized by intraperitoneal administration of a ketamine (130 mg kg−1) and xylazine (8.8 mg kg−1) mix and were perfused through the heart with chilled (2–4 °C) sodium-free aCSF consisting of (in mM): 210 sucrose, 7 d-glucose, 25 NaHCO3, 2.5 KCl, 1.25 NaH2PO4, 7 MgCl2, 0.5 CaCl2,1.3 sodium ascorbate and 3 sodium pyruvate, bubbled with carbogen (95% O2/5% CO2). Near-coronal slices, 300-μm thick, were generated using a Leica vibratome (VT1200) in the same sodium-free aCSF, and were transferred to warmed (35 °C) holding solution (in mM): 125 NaCl, 2.5 KCl, 1.25 NaH2PO4, 26 NaHCO3, 2 CaCl2, 2 MgCl2, 17 dextrose and 1.3 sodium pyruvate, bubbled with carbogen (95% O2/5% CO2). After 30 min of recovery, the chamber holding the slices was allowed to cool to room temperature.

#### Patch-clamp electrophysiology

Macaque, human and Pvalb–TdTomato mouse brain slices were placed in a submerged, heated (32–34 °C) recording chamber that was continually perfused (at a rate of 3–4 ml min−1) with aCSF under constant carbogenation and containing (in mM) 1): 119 NaCl, 2.5 KCl, 1.25 NaH2PO4, 24 NaHCO3, 12.5 glucose, 2 CaCl2·4H2O and 2 MgSO4·7H2O (pH 7.3–7.4). Slices were viewed with an Olympus BX51WI microscope using infrared differential interference contrast (IR-DIC) optics and a 40× water-immersion objective. The infragranular layers of macaque primary motor cortex and human premotor cortex are heavily myelinated, which makes visualization of neurons under IR-DIC almost impossible. To overcome this challenge, we labelled neurons using various viral constructs in organotypic slice cultures (Extended Data Fig. 12a). We were unable to use some classic histological markers of Betz cells (prominent rough endoplasmic reticulum, conspicuous nucleolus, intensity of anti-Nissl staining) for selection of neurons during patch-clamp experiments. Thus, we used the size of soma (greater than 40 μm in height or width) as the primary criterion, because somatic volume and/or height/width reasonably separates Betz cells from other pyramidal neurons14,86,87. Occasionally in the fluorescent image we observed additional hallmarks of Betz cells, namely large tap-root dendrites88,89 and horizontal dendrites emanating directly from the somatic compartment. In many of these neurons, substantial lipofuscin could be observed. Finally, the size of the biocytin-filled neuron in the example (Fig. 6) is at the upper end of the range in corticospinal neurons in macaque area 4 (20–60 μm)90. The conservative size criterion resulted in soma sizes that are consistent with the more than threefold enhancement of the volume in Betz cells compared with other pyramidal neurons, and match the size range of these neurons in adult macaques86,87.

Patch pipettes (2–6 MΩ) were filled with an internal solution containing (in mM): 110.0 potassium gluconate, 10.0 HEPES, 0.2 EGTA, 4 KCl, 0.3 sodium GTP, 10 phosphocreatine disodium salt hydrate, 1 Mg-ATP, 20 μg ml−1 glycogen, 0.5 U μl−1 RNase inhibitor (Takara, catalogue number 2313A) and 0.5% biocytin (Sigma, B4261), pH 7.3. Fluorescently labelled neurons from Thy1 or Etv1 mice were visualized through a 40× objective using either Dodt contrast with a CCD camera (Hamamatsu) and/or a two-photon imaging/uncaging system from Prairie (Bruker) Technologies. Recordings were made in aCSF containing (in mM): 125 NaCl, 3.0 KCl, 1.25 NaH2PO4, 26 NaHCO3, 2 CaCl2, 1 MgCl2, 17 dextrose and 1.3 sodium pyruvate bubbled with carbogen (95% O2/5% CO2) at 32–35 °C, with synaptic inhibition blocked using 100 μM picrotoxin. Sylgard-coated patch pipettes (3–6 MΩ) were filled with an internal solution containing (in mM): 135 potassium gluconate, 12 KCl, 11 HEPES, 4 MgATP, 0.3 NaGTP, 7 potassium phosphocreatine and 4 sodium phophocreatine (pH 7.42 with KOH) with neurobiotin (0.1–0.2%), Alexa 594 (40 μM) and Oregon Green BAPTA 6F (100 μM).

Whole-cell somatic recordings were acquired using either a Multiclamp 700B amplifier or an AxoClamp 2B amplifier (Molecular Devices), and were digitized using an ITC-18 (HEKA). Data-acquisition software was either MIES (https://github.com/AllenInstitute/MIES/; RRID SCR_016443) or custom software written in Igor Pro. Electrical signals were digitized at 20–50 kHz and filtered at 2–10 kHz. Upon attaining whole-cell current-clamp mode, the pipette capacitance was compensated and the bridge was balanced. Access resistance was monitored throughout the recording and was 8–25 MΩ.

#### Data analysis

Data were analysed using custom analysis software written in Igor Pro (RRID SCR_000325). All measurements were made at resting membrane potential. The input resistance (RN) was measured from a series of 1-s hyperpolarizing steps from −150 pA to +50 pA in +20 pA increments. For neurons with low input resistance (for example, the Betz cells), this current-injection series was scaled by four times or more. The input resistance was calculated from the linear portion of the current/steady-state-voltage relationship generated in response to these current injections. The resonance (fR) was determined from the voltage response to a constant-amplitude sinusoidal current injection (Chirp stimulus). The chirp stimulus increased in frequency either linearly from 1–20 Hz over 20 s, or logarithmically from 0.2–40 Hz over 20 s. The amplitude of the chirp stimulus was adjusted in each cell to produce a peak-to-peak voltage deflection of roughly 10 mV. The impedance amplitude profile (ZAP) was constructed from the ratio of the fast Fourier transform of the voltage response to the fast Fourier transform of the current injection. ZAPs were produced by averaging at least three presentations of the chirp stimulus, and were smoothed using a running median smoothing function. The frequency corresponding to the peak impedance (Zmax) was defined as the resonant frequency. Spike input/output curves were constructed in response to 1-s current injections (50–500 pA in 50-pA steps). For a subset of experiments, this current-injection series was extended to 3 nA in 600-pA steps to probe the full dynamic range of low-RN neurons. Analysis of the acceleration of spike frequency was performed for current injections that produced roughly ten spikes during the 1-s step. The acceleration ratio was defined as the ratio of the second to the last interspike interval. To examine the dynamics of spike timing over longer periods, we also measured spiking in response to current injections with 10-s steps, in which the amplitude of the current was adjusted to produce roughly five spikes in the first second. Properties of action potentials were measured for currents near rheobase. The threshold of action potentials was defined as the voltage at which the first derivative of the voltage response exceeded 20 V s−1. The width of action potentials was measured at half the amplitude between threshold and the peak voltage. The faster after-hyperpolarization was defined relative to threshold. We clustered mouse, macaque and human pyramidal neurons into two broad groups on the basis of their RN and fR values using Ward’s algorithm. Macaque and human extratelencephalilc neurons were grouped for physiological analysis because their intrinsic properties were not substantially different, and because there is evidence that Betz cells can be found in premotor cortex as well as in M180,81.

#### Biocytin histology

We used a horseradish peroxidase (HRP)-based reaction—with diaminobenzidine (DAB) as the chromogen—to visualize filled cells after electrophysiological recording, and DAPI staining to identify cortical layers as described91.

#### Microscopy

Mounted sections were imaged as described91. Briefly, operators captured images on an upright AxioImager Z2 microscope (Zeiss, Germany) equipped with an Axiocam 506 monochrome camera and 0.63× optivar. Two-dimensional tiled overview images were captured with a 20× objective lens (Zeiss Plan NEOFLUAR 20×/0.5) in brightfield transmission and fluorescence channels. Tiled image stacks of individual cells were acquired at higher resolution in the transmission channel only for the purpose of automated and manual reconstruction. Light was transmitted using an oil-immersion condenser (numerical aperture 1.4). High-resolution stacks were captured with a 63× objective lens (Zeiss Plan-Apochromat 63×/1.4 oil or Zeiss LD LCI Plan-Apochromat 63×/1.2 imm corr) at an interval of 0.28 μm (numerical aperture 1.4 NA; mouse specimens) or 0.44 μm (numerical aperture 1.2; human and non-human primate specimens) along the z-axis. Tiled images were stitched in ZEN 2012 SP2 software and exported as single-plane TIFF files.

#### Morphological reconstruction

Reconstructions of the dendrites and the full axon were generated based on a three-dimensional image stack that was run through a Vaa3D-based (v3.475) image processing and reconstruction pipeline as described91.

#### Production and transduction of viral vectors

Recombinant AAV vectors were produced by triple transfection of enhancer plasmids containing inverted terminal repeats (ITRs) along with AAV helper and rep/cap plasmids using the HEK 293T/17 cell line (ATCC, CRL-11268), followed by harvest, purification and concentration of the viral particles. The plasmid supplying the helper function is available from a commercial source (Cell Biolabs). The PHP.eB capsid variant was generated by V. Gradinaru at the California Institute of Technology92, and the DNA plasmid for AAV packaging is available from Addgene (RRID Addgene_103005). Quality control of the packaged AAV was determined by viral titring to determine that an adequate concentration was achieved (more than 5 × 1012 viral genomes per millilitre), and by sequencing the AAV genome to confirm the identity of the viral vector that was packaged. Human and NHP L5 extratelencephalic neurons, including Betz cells, were targeted to cultured slices by transducing the slices with viral vectors that either generically label neurons (AAV–hSyn1–tdTomato), or that enrich for L5 extratelencephalic neurons by expressing reporter transgene under the control of the msCRE4 enhancer79.

#### Processing of patch–seq samples

For a subset of experiments, the nucleus was extracted at the end of the recording and processed for RNA-seq. Before collecting data for these experiments, we thoroughly cleaned all surfaces with RNase Zap. The contents of the pipette were expelled into a PCR tube containing lysis buffer (Takara, 634894). cDNA libraries were produced using the SMART-Seq v4 ultra low input RNA kit for sequencing according to the manufacturer’s instructions. We performed reverse transcription and cDNA amplification for 20 PCR cycles. Sample proceeded through Nextera NT DNA library preparation using Nextera XT Index Kit V2 Set A (FC-131-2001).

#### Isolating of macaque nuclei, RNA-seq and clustering

Tissue was obtained from three macaque animals (aged 3–17 years, male and female; Extended Data Table 2) as above. As described for humans, M1 was isolated from thawed slabs using fluorescent Nissl staining (Neurotrace 500/525, ThermoFisher Scientific). Stained sections were screened for histological hallmarks of primary motor cortex, and L5 was dissected. Nuclei were isolated from the dissected layer; gene expression was quantified with 10× Chromium v3 using the Mmul_10 genome annotation; nuclei that passed quality-control criteria were clustered; and a taxonomy of glutamatergic types was defined. To identify which clusters in our three-species taxonomy aligned with macaque clusters from our L5 dissected Cv3 dataset, we carried out an identical integration workflow on glutamatergic neurons to that used for the three-species integration. Macaque clusters were assigned subclass labels on the basis of their corresponding alignment with subclasses from the other species.

#### Mapping of samples to reference taxonomies

To identify which cell type a given patch–seq nuclei mapped to, we used our previously described nearest-centroid classifier2. Briefly, a centroid classifier was constructed for glutamatergic reference data (human SSv4 or macaque Cv3) using marker genes for each cluster. Patch–seq nuclei were then mapped to the appropriate species reference 100 times, using 80% of randomly sampled marker genes during each iteration. Probabilities for each nucleus mapping to each cluster were computed over the 100 iterations, resulting in a confidence score ranging from 0 to 100. We identified four human patch–seq nuclei that mapped with greater than 85% confidence, and four macaque nuclei that mapped with greater than 93% confidence, to a cluster in the L5 extratelencephalic subclass.

### Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.