A multimodal cell census and atlas of the mammalian primary motor cortex

Here we report the generation of a multimodal cell census and atlas of the mammalian primary motor cortex as the initial product of the BRAIN Initiative Cell Census Network (BICCN). This was achieved by coordinated large-scale analyses of single-cell transcriptomes, chromatin accessibility, DNA methylomes, spatially resolved single-cell transcriptomes, morphological and electrophysiological properties and cellular resolution input–output mapping, integrated through cross-modal computational analysis. Our results advance the collective knowledge and understanding of brain cell-type organization1–5. First, our study reveals a unified molecular genetic landscape of cortical cell types that integrates their transcriptome, open chromatin and DNA methylation maps. Second, cross-species analysis achieves a consensus taxonomy of transcriptomic types and their hierarchical organization that is conserved from mouse to marmoset and human. Third, in situ single-cell transcriptomics provides a spatially resolved cell-type atlas of the motor cortex. Fourth, cross-modal analysis provides compelling evidence for the transcriptomic, epigenomic and gene regulatory basis of neuronal phenotypes such as their physiological and anatomical properties, demonstrating the biological validity and genomic underpinning of neuron types. We further present an extensive genetic toolset for targeting glutamatergic neuron types towards linking their molecular and developmental identity to their circuit function. Together, our results establish a unifying and mechanistic framework of neuronal cell-type organization that integrates multi-layered molecular genetic and spatial information with multi-faceted phenotypic properties.

Unique among body organs, the human brain is a vast network of information processing units, comprising billions of neurons interconnected through trillions of synapses. Diverse neuronal and non-neuronal cells display a wide range of molecular, anatomical, and physiological properties that together shape the network dynamics and computations underlying mental activities and behaviour. Brain networks self-assemble during development, leveraging genomic information shaped by evolution to build a set of stereotyped network scaffolds that are largely identical among individuals; life experiences then customize neural circuits in each individual. An essential step towards understanding the architecture, development, function and diseases of the brain is to discover and map its constituent elements of neurons and other cell types.
The notion of a 'neuron type', with similar properties among its members, as the basic unit of brain circuits has been an important concept for over a century; however, rigorous and quantitative definitions have remained surprisingly elusive [1][2][3][4][5] . Neurons are remarkably complex and heterogeneous, both locally and in their long-range axonal projections, which can span the entire brain and connect to many target regions. Many conventional techniques analyse one neuron at a time, and often study only one or two cellular phenotypes in an incomplete way (for example, missing axonal arbours in distant targets). As a result, despite major advances in past decades, phenotypic analyses of neuron types have remained severely limited in resolution, robustness, comprehensiveness and throughput. Complexities in the relationship between different cellular phenotypes (multi-modal correspondence) have fuelled long-standing debates on neuronal classification 6 .
Single-cell genomics technologies provide unprecedented resolution and throughput to measure the transcriptomic and epigenomic profiles of individual cells and have rapidly influenced many areas of biology including neuroscience, promising to catalyse a transformation from phenotypic description and classification to a mechanistic and explanatory molecular genetic framework for the cellular basis of brain organization. The application of single-cell RNA sequencing (scRNA-seq) to the neocortex and other brain regions has revealed a complex but tractable hierarchical organization of transcriptomic cell types that are consistent overall with knowledge from decades of anatomical, physiological and developmental studies but with an unmatched level of granularity [7][8][9][10][11] . Similarly, single-cell DNA methylation and chromatin accessibility studies have begun to reveal cell-type-specific genome-wide epigenetic landscapes and gene regulatory networks in the brain [12][13][14][15] . Notably, the scalability and high information content of these methods enable comprehensive quantitative analysis and classification of all cell types, which are readily applicable Nature | Vol 598 | 7 October 2021 | 87 to brain tissues across species and provide a quantitative means of comparative analysis 16,17 .
Other recent technological advances provide the resolution and throughput to analyse whole-brain neuronal morphology and comprehensive projection mapping 18,19 . Imaging-based single-cell transcriptomics and its combination with functional imaging, and integration of electrophysiology and single-cell sequencing, enable mapping of the spatial organization and key phenotypic properties of molecularly defined cell types [20][21][22][23][24] . Finally, molecular classification of cell types enables genetic access to specific cell types using transgenic mice [25][26][27] and, more recently, enhancer-based viral vectors [28][29][30][31][32] . All of these methods have been applied to brain tissues in independent studies, but not yet in a coordinated fashion to establish how different modalities correspond with one another, and whether a molecular genetic framework is explanatory for other functionally important cellular phenotypes.
The overarching goal of the BRAIN Initiative Cell Census Network (BICCN) is to leverage these technologies to generate an open-access reference brain cell atlas that integrates molecular, spatial, morphological, connectional and functional data for describing cell types in mouse, human and non-human primate 33 . A key concept is the Brain Cell Census, similar conceptually to a population census, that defines the constituent neuronal and non-neuronal cell types and their proportions, spatial distributions and defining phenotypic characteristics. This cell-type classification, organized as a taxonomy, should aim for consensus across modalities and across mammalian species for conserved types. Beyond the cell census, a Brain Cell Atlas would be embedded in a 3D common coordinate framework (CCF) of the brain 34 , in which the precise location and distribution of all cell types and their multi-modal features are registered and displayed. This spatial framework facilitates integration, interpretation and navigation of various types of information for understanding brain network organization and function.
Here we present the cell census and atlas of cell types in the primary motor cortex (MOp, referred to as M1 in primates) of mouse, marmoset and human (Extended Data Fig. 1, Extended Data Table 1). MOp is important in the control of complex movement and is well conserved across species, with a rich history of anatomical, physiological and functional studies to aid interpretation of this cell-type information 35,36 . We describe a synthesis of eleven companion studies through a coordinated multi-laboratory effort. In these studies, we derive a cross-species consensus molecular taxonomy of cell types using scRNA-seq or single-nucleus RNA sequencing (snRNA-seq), DNA methylation and chromatin accessibility data [37][38][39][40] . In mouse, we map the spatial cellular organization by multiplexed error-robust fluorescence in situ hybridization (MERFISH) 41 , characterize morphological and electrophysiological properties by multimodal profiling using patch clamp recording, biocytin staining and scRNA-seq (Patch-seq) 42,43 , describe the cellular input-output wiring diagrams by anterograde and retrograde tracing 44 , identify glutamatergic neuron axon projection patterns by Epi-retro-seq 45 , Retro-MERFISH 41 and single-neuron complete morphology reconstruction 46 , and describe transgenic driver lines targeting glutamatergic cell types on the basis of marker genes and lineages 47 . Finally, we integrate this information into a cohesive description of cell types in MOp. These datasets are organized by the BRAIN Cell Data Center (BCDC) and made public through the BICCN web portal (https://www.biccn.org). Key concepts and terms are described in Extended Data Table 2, including anatomical terms for input and output brain regions for MOp, and hierarchical cell class, subclass and type definitions. Major findings: • Combined single-cell transcriptomic and epigenomic analysis reveals a unified molecular genetic landscape of cortical cell types that integrates gene expression, chromatin state and DNA methylation. • A combination of single-cell 'omics, MERFISH-based spatially resolved single-cell transcriptomics and Patch-seq generates a census of cell types, including their proportions and spatial distribution across cortical layers and sublayers. • Comparative analysis of mouse, marmoset and human transcriptomic types describes a conserved cross-species taxonomy of cortical cell types with hierarchical organization that reflects developmental origins; the transcriptional similarity of cell type granularity across species varies as a function of evolutionary distance. • We observed highly conserved transcriptomic and epigenomic signatures of cell identity across species, as well as a large set of species-enriched cell-type gene expression profiles that suggests a high degree of evolutionary specialization. • Correspondence among molecular, anatomical and physiological datasets reinforces the transcriptomic classification of neuronal subclasses and distinctive types, demonstrating their biological validity and genomic underpinnings, and also reveals continuously varying properties along these axes for some neuronal subclasses and types. • Anatomical studies yield a cellular-resolution wiring diagram of mouse MOp anchored on major transcriptome-defined projection types, including input-output connectivity at the subpopulation level and output pathways at a genetically defined single-cell level. • Long-range axon projection patterns of individual glutamatergic excitatory neurons exhibit a complex and diverse range of relationships with transcriptomic and epigenetic types (between one-to-one and many-to-many), suggesting another level of regulation in defining single-cell connectional specificity. • Cell-type transcriptional and epigenetic signatures guide the generation of genetic tools for targeting glutamatergic pyramidal neuron types and fate mapping their progenitor types. • Multi-site coordination within BICCN and data archives enabled a high degree of standardization, computational integration and creation of open data resources for community dissemination of data, tools and knowledge.
Article types were grouped into broader subclasses on the basis of shared developmental origins for GABAergic inhibitory neurons (that is, three caudal ganglionic eminence (CGE)-derived subclasses (Lamp5, Sncg and Vip) and three medial ganglionic eminence (MGE)-derived subclasses (Sst Chodl, Sst and Pvalb)), layer and projection pattern in mouse for glutamatergic excitatory neurons (that is, intratelencephalic (IT), extratelencephalic (ET), corticothalamic (CT), near-projecting (NP) and layer 6b (L6b)), and non-neuronal functional subclasses (for example, oligodendrocytes and astrocytes) (Extended Data Table 2). Note that the layer 5 extratelencephalic (L5 ET) neurons have been called pyramidal tract (PT) or subcerebral projection neurons (SCPN) 48,49 ; here we use the name L5 ET to be more accurate across cortical areas and species (Methods). The resolution of this cross-species consensus taxonomy was lower than that derived from each species alone, owing to variation in gene expression across species. The degree of species alignments varied across consensus types (Fig. 1c); some types could be aligned one-to-one (for example, Lamp5_1 and L6 IT_3), whereas others aligned several-to-several (for example, Pvalb_1, L2/3 IT and L5 IT_1). This may reflect over-or under-clustering, limitations in aligning highly similar cell types, or species-specific expansion of cell-type diversity.
We expected that cell types from more recent common ancestors would share more similar gene expression profiles. Indeed, transcriptomic profiles of consensus cell types were more correlated between human and marmoset, and had 25-50% fewer differentially expressed genes than between primate and mouse (Fig. 1d, e). The one exception was the vascular leptomeningeal cell (VLMC) type, which had greater Spearman correlations of overall gene expression (Fig. 1d) between marmoset and mouse. However, this probably reflects that rare non-neuronal cells in human (n = 40 nuclei) were under-sampled compared with marmoset (n = 463) and mouse (n = 2,329), and average expression was not adequately estimated 38 .
Glutamatergic subclasses expressed 50-450 marker genes and, unexpectedly, the majority of markers were species-enriched ( Fig. 1f, g). This evolutionary divergence of marker gene expression may reflect species adaptations or relaxed constraints on genes that can be substituted with others for related cellular functions. Glutamatergic subclasses also had a core set of 5-65 markers that were conserved across all three species (Fig. 1g); these genes are candidates for conserved cell identity and function, and are useful for consistent labelling across species. GABAergic subclasses expressed 50-325 markers in each species, and 18-55 markers were conserved. At a finer level, GABAergic consensus types also expressed conserved markers with similar expression levels across species and relatively type-specific expression (Fig. 1h). Some marker genes also showed evidence for cell-type-specific enhancers located in regions of open chromatin and DNA hypomethylation in both human and mouse (Extended Data Fig. 2b, c).

Spatially resolved cell atlas of mouse MOp
We used MERFISH, a single-cell transcriptome imaging method 50,51 , to identify cell types in situ and map their spatial organization. We selected a panel of 258 genes (254 of which passed quality control) on the basis of prior knowledge of marker genes for major cortical cell types and genes identified using sc/snRNA-seq data, and we imaged approximately 300,000 individual cells across MOp and adjacent areas 41 . Clustering analysis of the MERFISH-derived single-cell expression profiles resulted in a total of 95 cell clusters in MOp (42 GABAergic, 39 glutamatergic and 14 non-neuronal) (Fig. 2a), which showed excellent, essentially one-to-one correspondence to the consensus sc/snRNA-seq taxonomy at subclass level (for example, glutamatergic IT, ET, NP, CT and L6b subclasses, and GABAergic Lamp5, Sncg, Vip, Sst and Pvalb subclasses) and good correspondence at cluster level 41 .
Spatial distribution of the MERFISH clusters showed a complex, laminar pattern in MOp (Fig. 2b). Many glutamatergic clusters showed narrow distributions along cortical depth that subdivided individual cortical layers, although frequently without discrete boundaries 41 . Notably, IT cells, the largest branch of neurons in the MOp, formed a largely continuous gradient of cells with correlated gradual changes between their expression profiles and their cortical depths 41 (Fig. 2c). Many GABAergic clusters also showed laminar distribution, preferentially residing within one or two layers 41 . Among the non-neuronal cell clusters, VLMCs formed the outermost layer of cells of the cortex, whereas mature oligodendrocytes and some astrocytes were enriched in white matter. Other subclasses of non-neuronal cells were largely dispersed across all layers. MERFISH analysis also revealed interesting spatial distribution of cell types along the medial-lateral and anteriorposterior axes 41 . Overall, the neuronal and non-neuronal cell clusters in MOp form a complex spatial organization refining traditionally defined cortical layers.
Integration of retrograde tracing with MERFISH (Retro-MERFISH) identified projection targets of different neuron types in the MOp 41 (Fig. 2d). Retrograde tracers were injected into secondary motor cortex (MOs), primary somatosensory cortex (SSp), and temporal association (TEa) and neighbouring ectorhinal (ECT) and perirhinal (PERI) areas, and retrograde labels were imaged together with the MERFISH gene panel in the MOp (approximately 190,000 cells were imaged). Each of the three target regions received inputs from multiple cell clusters in 15    Article the MOp, primarily from IT cells; each IT cluster projected to multiple regions, with each region receiving input from a different composition of IT clusters 41 (Fig. 2d). Overall, projections of MOp neurons do not follow a simple 'one cell type to one target region' pattern, but rather form a complex multiple-to-multiple network.

Multimodal analysis of cell types with Patch-seq
We used Patch-seq to characterize the electrophysiological and morphological phenotypes and laminar location of t-types. We patched more than 1,300 neurons in MOp of adult mice, recorded their electrophysiological responses to a set of current steps, filled them with biocytin to recover their morphologies (around 50% of cells) and obtained their transcriptomes using Smart-seq2 sequencing 42 . We mapped these cells to the mouse MOp transcriptomic taxonomy 37 (Fig. 1a). Cells were assigned to 77 t-types (Fig. 3a), thereby characterizing the morpho-electric phenotypes of most glutamatergic and GABAergic t-types (examples in Fig. 3b, c).
We found that morpho-electric phenotypes were largely determined by transcriptomic subclasses, with different subclasses having distinct phenotypes. For example, Sst interneurons were often characterized by large membrane time constants, pronounced hyperpolarization sag, and rebound firing after stimulation offset. However, within each subclass, there was substantial variation in morpho-electric properties between t-types. This variation was not random but organized such that transcriptomically similar t-types had more similar morpho-electric properties than distant t-types. For example, excitatory t-types from the IT subclasses with more similar transcriptomes were also located at adjacent cortical depths, suggesting that distance in t-space co-varied with anatomical distance 42 , even within a layer (Fig. 3g), in line with the above MERFISH results (Fig. 2c). Similarly, electrophysiological properties of Sst interneurons varied continuously across the transcriptomic landscape 42 . Thus, within major transcriptomic subclasses, morpho-electric phenotypes and/or soma depth frequently varied smoothly across neighbouring t-types, indicating that transcriptomic neighbourhood relationships in many cases corresponded to similarities in other modalities.
At the level of single t-types, some t-types showed layer-adapting morphologies in different layers (Fig. 3e, f)    stimulation offset. Surprisingly few t-types were entirely homogeneous with regard to the measured morpho-electric properties (Fig. 3d). Patch-seq also enables direct comparison of the morpho-electric properties of homologous cell types across species. Here we analysed the gigantocellular Betz cells found in M1 of primates and large carnivores, which are predicted to be in the L5 ET subclass 38 , as are the mouse corticospinal-projecting L5 ET neurons. We first created a joint embedding of excitatory neurons in mouse, macaque and human, which showed strong homology across all three species for the L5 ET subclass (Fig. 3h). Patch-seq recordings were made from L5 neurons in acute and cultured slice preparations of mouse MOp and macaque M1. We also capitalized on a unique opportunity to record from neurosurgical tissue excised from human premotor cortex-which also contains Betz cells-during surgery to treat epilepsy. To enable visualization of cells in heavily myelinated macaque M1 and human premotor cortex, we used adeno-associated viruses (AAVs) to drive fluorophore expression in glutamatergic neurons in slice culture.
Patch-seq cells in each species that mapped to the L5 ET subclass (Fig. 3h) were all large L5 neurons that sent apical dendrites to the pial surface (Fig. 3i). Macaque and human L5 ET neurons were much larger, with hallmark Betz cell long 'taproot' basal dendrites 52 . Subthreshold membrane properties were relatively well conserved across species. For example, L5 ET neurons in all three species had low input resistances, although they were exceptionally low in macaque and human (Fig. 3j). Conversely, suprathreshold properties of macaque and human Betz ET neurons were highly specialized; they responded to prolonged suprathreshold current injections with biphasic firing in which a pause in firing early in the sweep was followed by a marked increase in firing later (Fig. 3k). Intriguingly, several genes encoding ion channels were enriched in macaque and human L5 ET neurons compared with mouse ( Fig. 3l), and may contribute to the distinctive primate suprathreshold properties. These results indicate that primate Betz cells are homologous to mouse thick-tufted L5 ET neurons, but display species specializations in their morphology, physiology and gene expression.

Multimodal correspondence by Epi-retro-seq
To understand molecular diversity among projection neurons, we developed Epi-retro-seq 45 -which combines retrograde tracing and epigenomic profiling-and applied it to mouse MOp neurons projecting to each of the eight selected brain regions receiving inputs from MOp (Fig. 4a). Th-target regions included two cortical areas, SSp and anterior cingulate area (ACA), and six subcortical areas, striatum (STR), thalamus (TH), superior colliculus (SC), ventral tegmental area and substantia nigra (VTA+SN), pons and medulla (MY).
We obtained methylomes for 2,115 MOp projection neurons. Co-clustering them with MOp neurons collected without enrichment of specific projections, we observed precise agreement among all major cell subclasses (Fig. 4b, c). We observed enrichment of cortico-cortical and cortico-striatal projecting neurons in IT subclasses (L2/3, L4, L5 IT, L6 IT and L6 IT Car3), and cortico-subcortical projecting neurons in L5 ET. Many cortico-thalamic projecting neurons were also observed in L6 CT (Extended Data Fig. 3a). Consistent with the specificity of retrograde labelling, quantitative comparisons with unbiased collection of neurons in MOp suggest at least 30-fold (IT) or 200-fold (ET) enrichment of neurons in the expected subclasses (Methods).
Enrichment of L5 ET neurons with Epi-retro-seq (40.2% versus 5.62% in unbiased profiling of MOp using snmC-seq2) enabled investigation of subtypes of L5 ET neurons known to project to multiple subcortical targets in TH, VTA+SN, pons and MY 48 . The 848 L5 ET neurons were segregated into 6 clusters (Fig. 4d, e). MY-projecting neurons showed clear enrichment for L5 ET cluster 0 (Fig. 4e, Extended Data Fig. 3b), in agreement with scRNA-seq data for anterolateral motor cortex (ALM), part of MOs 9,53 . We used gene body non-CG methylation (mCH) levels to integrate the L5 ET Epi-retro-seq cells with the ALM Retro-seq cells and observed enrichment of MY-projecting cells in the same cluster 45 .
The presence of mCH in gene bodies is strongly anti-correlated with gene expression in neurons, whereas promoter-distal differentially CG-methylated regions (CG-DMRs) are reliable markers of regulatory elements such as enhancers 12 . We identified 511 differentially CH-methylated genes (CH-DMGs) and 58,680 CG-DMRs across the L5 ET clusters (Fig. 4f). We also inferred transcription factors that GFP + NeuN + Inject AAVretro in a selected brain region of INTACT mice Slice brain and dissect MOp Computational analysis FACS and snmC-seq2 Article may contribute to defining the cell clusters by identifying enriched transcription factor-binding DNA sequence motifs within CG-DMRs (Fig. 4g). For example, Ascl1 is a transcription factor whose motif was significantly enriched in the MY-projecting cluster. In addition, 230 hypo-CH-DMGs were identified between the MY-projecting cluster and other projection neurons. One of the most differentially methylated genes is Ptprg (Extended Data Fig. 3c), which encodes the receptor tyrosine phosphatase-γ, which interacts with contactin proteins to mediate neural projection development 54 . Thus, these epigenomic mapping data for projection neurons facilitate the understanding of gene regulation in establishing neuronal identity and connectivity.

Cell-type-targeting tools
Genetic access to specific neural subpopulations and progenitors is necessary for multi-modal analyses to validate t-types, fate-map their developmental trajectories, and study their function in circuit operation 25 . Here we present a genetic toolkit for dissecting and fate-mapping glutamatergic pyramidal neuron (PyN) subpopulations largely on the basis of their developmental genetic programs. Along the lineage progression of neural progenitors during corticogenesis in the embryonic dorsal telencephalon, radial glial progenitors (RGs) generate PyNs either directly or indirectly through intermediate progenitors (IPs) 55 (Fig. 5a, b). Temporal expression of transcription factors gates sequential developmental decisions to shape hierarchically organized PyN subpopulations 47,56 . The LIM-homeodomain protein LHX2 and zinc-finger transcription factor FEZF2 act at multiple stages of neurogenesis 55,57 , and IPs specifically express the T-box transcription factor Tbr2 during indirect neurogenesis 58 . We generated temporally inducible Lhx2-CreER, Fezf2-CreER, Tbr2-CreER, Fezf2-Flp and Tbr2-FlpER driver lines (Fig. 5c) that faithfully recapitulate the spatiotemporal expression of these transcription factors and enable fate-mapping of associated RG and IP pools 47   gradient, consistent with their mRNA expression at this stage 59,60 . These RGs generated PyNs across all cortical layers, suggesting multipotency (Fig. 5d). We also generated 15 Cre and Flp driver lines targeting PyN subpopulations, including the CT, PT and IT subclasses, and subpopulations within these subclasses (Fig. 5b, c). These driver lines precisely recapitulated endogenous expression patterns, highlighted here with three representative lines (Fig. 5e): L2/3 and L5a for IT-Plxnd1 (IT Plxnd1 ), L5b and L6 for ET-Fezf2 (ET Fezf2 ), L6 for CT-Tle4 (CT Tle4 ). Anterograde projection tracing in MOp of adult animals demonstrated that IT Plxnd1 projected to multiple ipsilateral and contralateral cortical areas and to STR/caudate putamen (CP); ET Fezf2 projected robustly to several ipsilateral cortical sites, CP and numerous subcortical targets including TH, MY and corticospinal tract; CT Tle4 projected specifically to a set of thalamic nuclei 47 (Fig. 5f-h).
We further developed a combinatorial method to target PyN subtypes on the basis of their lineage, birth order and anatomical features. For example, the PyN PlxnD1 population localizes to L5a, L3 and L2 and projects to many ipsilateral and contralateral cortical and striatal targets 47 (Fig. 5e, h). Based on the knowledge that most IT PyNs are generated from IPs 61 , we generated PlxnD1-Flp;Tbr2-CreER;Ai65 compound mice in which the inducible Tbr2-CreER allele was used to birth date IT PlxnD1 . Tamoxifen induction at E13.5 and 17.5 selectively labelled L5a and L2 IT PlxnD1 , respectively, across cortical areas (Fig. 5e). To reveal their projection patterns, we bred the PlxnD1-Flp;Tbr2-CreER;dual-tTA mice for tTA-dependent viral tracing in MOp. We found that E13.5-born IT PlxnD1(E13.5) neurons resided in L5a and projected ipsilaterally to multiple cortical areas, contralaterally to homotypic and heterotypic areas, and bilaterally to CP (Fig. 5h). By contrast, E17.5-born IT PlxnD1(E17.5) neurons resided in L2; although they also projected to ipsilateral cortical and striatal targets, and to homotypic contralateral cortex, they extended minimal projections to heterotypic contralateral cortex and CP (Fig. 5i). Together, this set of PyN driver lines provides much-improved specificity, robustness, reliability and coverage, and demonstrates feasibility to target highly specific PyN subtypes.

MOp input-output wiring diagram
A comprehensive cellular resolution input-output MOp wiring diagram was generated by combining classic tracers, genetic viral labelling in Cre driver lines and single-neuron reconstructions with high-resolution, brain-wide imaging, precise 3D registration to CCF and computational analyses 44 .
We first systematically characterized the global inputs and outputs of MOp upper limb (MOp-ul) region using classic anterograde (Phaseolus vulgaris leucoagglutinin (PHAL)) and retrograde (cholera toxin b (CTb)) tract tracing 44 (Fig. 6a). MOp-ul projects to more than 110 grey matter regions and spinal cord, and around 60 structures in the cerebral cortex and TH project back to MOp-ul.
We generated a fine-grained areal and laminar distribution map of multiple MOp-ul projection neuron populations using retrograde tracing 44 (Extended Data Fig. 4a). In parallel with these tracer-labelled, projection-and layer-defined cell populations, we characterized the distribution patterns in MOp-ul of neuronal populations labelled in 28 Cre driver lines, including those from different IT (for example, Cux2, Plxnd1 and Tlx3 driver lines), ET (Rbp4, Sim1 and Fezf2) and CT (Ntsr1 and Tle4) subclasses with distinct laminar distributions 47,62 .
Viral tracers were used to systematically examine MOp-ul cell subclass-specific inputs and outputs 44 (Extended Data Fig. 4b).
Neurons projecting to Cre-defined starter cells were labelled using trans-synaptic rabies viral tracers. Projections from MOp were labelled following AAV-GFP injection into wild-type mice, revealing patterns consistent with PHAL tracing (Fig. 6a). Projections from L2/3 IT, L4 IT, L5 IT, L5 ET and L6 CT cells were mapped following injections of Cre-dependent viral tracers into Cre lines selective for these laminar and projection cell subclasses 63 . Most Cre line anterograde tracing experiments revealed a component of the overall output pathway. This result is consistent with labelling from retrograde injections in various thalamic nuclei (posterior complex (PO), ventral anterior-lateral complex (VAL) and ventral medial nucleus (VM)) and cortical areas such as MOs and SSp.
We systematically characterized axonal projections of more than 300 single MOp excitatory neurons, by combining sparse labelling, high-resolution whole-brain imaging, complete axonal reconstruction and quantitative analysis 44,46 , augmented with publicly available single-cell reconstructions from the Janelia Mouselight project 18 . Additional analysis was also conducted using BARseq 44,64 . This analysis revealed a rich diversity of projection patterns within the IT, ET and CT subclasses (Fig. 6b). Individual L6 neurons display several distinct axonal arborization targets that likely contribute to the composite subpopulation output described for the Ntsr1 and Tle4 diver lines. Individual IT cells across L2-L6 also generate richly diverse axonal trajectories. Confirming and extending previous reports 53 , we characterized detailed axon projections of the MY-projecting and non-MY-projecting L5 ET neurons, revealing complex axon collaterals in TH and midbrain regions 44,46 .

Multimodal characterization of L4 IT neurons in MOp
Traditionally MOp has been considered an agranular cortical area, defined by the lack of a cytoarchitectonic layer 4, which usually contains spiny stellate or star pyramid excitatory neurons. However, previous studies have suggested that L4 neurons similar to those typically found in sensory cortical areas are also present in mouse MOp and macaque M1 65,66 . Here we present multimodal evidence to confirm the presence of L4-like neurons in mouse MOp and primate M1 (Fig. 7).
We performed a joint clustering (Methods) and uniform manifold approximation and projection (UMAP) embedding of all IT neurons (excluding the highly distinct L6 IT Car3 cells) from 11 mouse molecular datasets, including 6 sc/snRNA-seq datasets, and the snmC-seq2, snATAC-seq, Epi-retro-seq, MERFISH and Patch-seq data (Fig. 7a). This resulted in five joint clusters, mostly along a continuous variation axis from L2/3 to L4/5 to L5 to L6 in line with the above MERFISH and Patch-seq results. The joint clustering enabled linkage of the cells independently profiled by each individual modality and cross-correlation of these disparate properties. Consequently, we identified epigenomic peaks linked to cluster-specific marker genes-Cux2 for L2/3 IT and L4/5 IT (1), Rspo1 for L4/5 IT (1), Htr2c for L4/5 IT (2-3), and Rorb for L4/5 IT and L5 IT (Fig. 7b, cluster names from SCF). MERFISH data showed that L4/5 IT and L5 IT cells occupied distinct layers in MOp, and the L4/5 IT type expressed Rspo1 (Fig. 7c), a L4 cell-type marker in sensory cortical areas identified in previous studies 9 . There are fewer Rspo1 + L4 cells in MOp than in the neighbouring SSp. Transcriptomic IT types from mouse corresponded well with those from human and marmoset at subclass level, whereas substantial ambiguities existed at cluster level (Fig. 7d), probably owing to the gene expression variation between rodents and primates (Fig. 1).
We further compared the L4 cells in mouse MOp with those from mouse primary visual cortex (VISp) 9 after co-clustering all the SMART-seq glutamatergic transcriptomes from both regions (Fig. 7e). In UMAP, L4/5 IT cells in MOp occupied a subspace of the L4 IT co-cluster defined by the intersection of marker genes Cux2 and Rorb, suggesting that L4 cells in MOp are similar to a subset of L4 cells in VISp, while the L4 cells in VISp have additional diversity and specificity.
L4 IT cells in MOp also exhibited morphological features characteristic of traditionally defined L4 excitatory neurons. In Patch-seq 42 , cells from the L4/5 IT_1 type had no or minimal apical dendrites without tufts in L1, in contrast to cells from the L2/3 IT, L4/5 IT_2 and L5 IT types, which had tufted apical dendrites (Fig. 7f). We obtained complete morphological reconstructions of excitatory neurons with their somas located in L2, L3 or L4 in MOp or MOs from fMOST imaging of Cux2-CreERT2;Ai166 mice 46  Article L4 (between L2/3 and L5) exhibited two local morphological features typical of L4 neurons from sensory cortices (Fig. 7g). First, the dendrites of the L4 neurons were simple and untufted, whereas those of the L2/3 neurons all had extensive tufts. Second, the local axons of L4 neurons mostly projected upward into L2/3 in addition to collateral projections, whereas the local axons of L2/3 neurons had axon branches projecting downward into L5. These local projection patterns are consistent with the canonical feedforward pathways within a cortical column observed in somatosensory and visual cortices, with the first feedforward step from L4 to L2/3 and the second feedforward step from L2/3 to L5 67 . We also found that the MOp or MOs L4 neurons had intracortical long-range projections similar to the L2/3 neurons 46 (Fig. 6b).

Multimodal characterization of L5 ET neurons in MOp
Previous studies showed that in mouse ALM, L5 ET neurons have two transcriptomically distinct projection types that may be involved in different motor control functions: the TH-projecting type in movement planning and the MY-projecting type in movement initiation 53 . Here we demonstrate that L5 ET neurons in mouse MOp also have MY-projecting and non-MY-projecting types, with distinct gene markers, epigenomic elements, laminar distribution, genetic targeting tools and corresponding types in human and marmoset.
We identified multiple full-morphology reconstructions of MOp L5 ET neurons from fMOST imaging of Fezf2-CreER;Ai166 and Pvalb-T2A-CreERT2;Ai166 transgenic mice, which were clustered into MY-projecting and non-MY projecting morphological types but also exhibited extensive morphological and projectional variability among individual cells 46 (Fig. 8e), although this was not directly linked     Article to t-types. Both groups of cells had thick-tufted dendrites that were similar to each other (Fig. 8e), consistent with the Patch-seq study 42 .

E x c T H E M IS A R L 1 3 B E x c F E Z F 2 P D Z R N 4 E x c F E Z F 2 P IE Z O 2 E x c L 5 F E Z F 2 C S N 1 S 1 E x c L 3 − 5 F E Z F 2 L IN
We used CRISPR-Cas9 gene editing to generate transgenic mice in which Cre or Flp recombinase was targeted to Slco2a1 or Npnt, marker genes for the MY-projecting or non-MY-projecting L5 ET type, respectively (Fig. 8c). Cre-and Flp-dependent tdTomato reporter in Slco2a1-P2A-Cre;Ai14 and Npnt-P2A-FlpO;Ai65F mice labelled cortical L5 neurons, as well as vascular cells in Slco2a1 mice and L2/3 cells in Npnt mice (Fig. 8f). Slco2a1-labelled cells occupy a deeper sub-lamina of L5 than those targeted by Npnt, consistent with the MERFISH result (Fig. 8b). To test the projection specificity of labelled neurons, we injected AAV vectors encoding a Cre-or Flp-dependent EGFP reporter into L5 in the MOp of these mice. GFP-labelled axon terminals were found in MY of Slco2a1 but not Npnt mice, demonstrating cell-type specificity of these new driver lines (Fig. 8f).

An integrated synthesis of MOp cell types
As the conclusion of this series of studies from the BICCN, we present an overview and integrated synthesis of the multimodal census and atlas of cell types in the primary motor cortex of mouse, non-human primate and human (Fig. 9).
This integrated synthesis uses the mouse MOp consensus transcriptomic taxonomy 37 as the anchor (Fig. 9a) because it was derived from the largest datasets and was the reference taxonomy for nearly all the cross-modality and cross-species comparisons. This taxonomy has a hierarchical organization, with major divisions first between neural and non-neural cell types, then between neuronal and non-neuronal types within the neural branch, and finally between GABAergic and glutamatergic types within the neuronal branch.
Correspondence matrices show that the mouse MERFISH-based spatial transcriptomic taxonomy 41 , the transcriptomic-epigenomic integrated mouse molecular taxonomies using either SCF or LIGER 37 and the human and marmoset transcriptomic taxonomies 38 all aligned largely consistently with the mouse consensus transcriptomic taxonomy (Fig. 9e, Extended Data Fig. 5, Supplementary Table 1). The alignments are highly consistent at subclass level, but disagreements exist at individual-cluster level and increase with cross-species comparison (Fig. 9e), suggesting that differential variations exist in different data types and consistency, in particular that across species, may be more appropriately described at an intermediate level of granularity. We developed a standardized nomenclature system to track cell types described in different modalities (Supplementary Table 2).
Through integrative approaches such as Patch-seq 42 , Epi-retro-seq 45 and axon projection mapping 44,46 , we related many t-types or subclasses to cortical neuron types traditionally defined by electrophysiological, morphological and connectional properties (Fig. 9a, b, f), thus bridging the cell-type taxonomy with historical knowledge. We derived the relative proportion of each cell type in mouse MOp using either snRNA-seq or MERFISH data. The MERFISH data 41 also revealed the spatial distribution pattern of each cell type, showing that many glutamatergic or GABAergic neuron types adopt narrow distributions along the cortical-depth direction, often occupying predominantly a single layer or a sublayer, and related types (for example, the L2/3-6 IT excitatory types) display a largely gradual transition across cortical depth or layers (Fig. 9d).
Finally, we demonstrate the potential to elucidate gene regulatory mechanisms by discovering candidate CREs (cCREs) and master transcription factors specific to neuronal subclasses in the combined transcriptomic and epigenomic datasets (Fig. 9c). We found 7,245 distal (more than 1 kbp from the transcription start site) cCRE-gene pairs in MOp neurons that showed a positive correlation between accessibility at 6,280 cCREs and expression level of 2,490 putative target genes (Methods) 37,40 . We grouped these putative enhancers into modules based on accessibility across cell clusters (Extended Data Fig. 6) and identified a large number of enhancer-gene pairs for each subclass of neurons (Extended Data Fig. 5). Similarly, we identified transcription factors showing cell-type specificity supported by both RNA expression and DNA-binding motif enrichment in cell subclasses 37,39 (Methods) (Extended Data Fig. 7).

A cell census and atlas of primary motor cortex
Understanding the principles of brain circuit organization requires a detailed understanding of its basic components. The current effort combines a wide array of single-cell-based techniques to derive a robust and comprehensive molecular cell-type classification and census of the primary motor cortex of mouse, marmoset and human, coupled with a spatial atlas of cell types and an anatomical input-output wiring diagram in mouse. We demonstrate the robustness and validity of this classification through strong correlations across cellular phenotypes, and strong conservation across species. Together these data comprise a cell atlas of the primary motor cortex that encompasses a comprehensive reference catalogue of cell types, their proportions, spatial distributions, anatomical and physiological characteristics, and molecular genetic profiles, registered into a CCF. This cell atlas establishes a foundation for an integrative study of the architecture and function of cortical circuits akin to reference genomes for studying gene function and genome regulatory architecture. Furthermore, it provides a map of the genes that contribute to cellular phenotypes and their epigenetic regulation. These data resources and associated tools enabling genetic access for manipulative experimentation are publicly available. This body of work provides a roadmap for exploring cellular diversity and organization across brain regions, organ systems and species.

Principles of cortical cell-type organization
Substantiating previous studies 9,10 , our multimodal cross-species study of the primary motor cortex suggests that a general principle of cortical cell-type organization is its hierarchical relationship, whereby high-level classes linked by major branches comprise progressively finer subpopulations connected by minor branches. In this scheme, the higher-level classes and subclasses are categorically and concordantly distinct from each other across modalities, are conserved across species, and probably arise from different developmental programs, such as GABAergic neuron derivatives of different zones of the ganglionic eminences or the layer-selective glutamatergic neurons derived sequentially from progenitors of the cortical plate. At the lower branch levels (types or clusters), however, while certain cell types are highly distinct (for example, Pvalb chandelier cells), distinctions and boundaries among many other clusters can be ambiguous and vary among different modalities.
In this context, another important finding, consistent with and building on multiple other studies 9,11,68,69 , is the coexistence of discrete and continuous variations of cell features across modalities at the lower branch level. A compelling example is the continuous and concordant variation of transcriptomic, anatomical and physiological properties along cortical depth within multiple cell populations, including the glutamatergic L2/3-L6 IT and GABAergic Sst and Pvalb subclasses. Although some of the variations may result from technical factors, such as differences in the resolution of measurements across data modalities (with transcriptomics providing the highest granularity at present), a major source of these continuous variations may reflect true biology, supported by the coordinated variation across transcriptomic, spatial, morphological and physiological properties as shown by MERFISH and Patch-seq. Therefore, another emerging principle of cell type organization is the coexistence of discrete and continuous variations that underlie cell-type diversity.
Together, the principles of hierarchical organization comprising discrete classes and types as well as continuum within and across Nature | Vol 598 | 7 October 2021 | 99 subpopulations represent a more nuanced and biologically realistic description of cell-type landscape, with implications in cell classification and census. For example, the multimodal variations at finer granularity may preclude a fully discretized representation of cell types with consistency across cell phenotypes, and may explain some of the discrepancies in estimated numbers of cell types using different approaches. An intriguing question is whether continuous variations of cell features will increase further or become more discretized in the context of neural circuit operation, converging to a set of distinct functional elements from a more continuous cellular landscape. An example of this is regionalization. We identify a MOp-specific input-output wiring diagram-however, transcriptomic cell types are generally shared between MOp and its neighbouring cortical areas 11 . Region-specific connectivity patterns of similar molecular types may be a major factor defining the functional specificity of the primary motor cortex.

Perspectives on cell-type classification
Our findings have major implications for understanding the biological basis of cellular identity towards a more rigorous, quantitative and satisfying definition and classification of cell types. First and foremost, our discovery of the compelling correspondence across molecular genetic, anatomical and physiological features of hierarchically organized cell populations, reflecting developmental origins and mainly conserved across mammalian species, demonstrates the biological validity and genomic underpinning of major cell types. These findings establish a unifying and mechanistic framework of cell-type classification that integrates multi-layered molecular genetic information with multi-faceted phenotypic properties. Thus, single-cell transcriptomics and epigenomics can serve as powerful approaches for establishing a foundational framework of cell types, owing to not only their unparalleled scalability but also to their representation of the underlying molecular genetic programs rooted in development and evolution. Physiological, morphological and connectional characterizations assign functional attributes to cells; their concordance with molecular identities provides strong validation to the molecularly defined cell types, whereas their differential variations reveal additional, probably network-and activity-driven factors that contribute to further refinement of cell types.
While the higher levels of the hierarchy comprise ~around 25 subclasses (16 neuronal and 9 non-neuronal) that are identified with remarkable consistency across multiple species and experimental modalities, many finer levels of cell properties do not neatly segregate into discrete and consistent sets of cell types with perfect correspondence among data modalities. These include aspects of continuous distributions, species specializations and mismatches between molecular and anatomical phenotypes that may result from developmental events no longer represented in the adult. Different methods provide somewhat different granularity of clustering, and thus different numbers of putative cell types. For example, single-cell transcriptomics identifies around 100 clusters representing the terminal leaves of this hierarchically branched organization 37 . Looking ahead, it is important to note that at more refined levels, the number of cell types that can be distinguished will probably change with additional cellular features characterized at greater breadth and depth using new methods and approaches.
Overall, the landscape of cell types appears to be generated from a combination of specification through evolutionarily driven and developmentally regulated genetic mechanisms, and refinement of cellular identities through intercellular interactions within the network in which the cells are embedded. In this scenario, genetic mechanisms drive intrinsic or cell-autonomous determination of cell fate, as well as progressive temporal generation of cell types from common progenitor pools that explain global similarities and continuous features of cellular phenotypes reflecting developmental gradients. Network influences can drive further phenotypic refinement that may not be reflected in the adult genetic signature-for example for axonal projection and synaptic connectivity that may reflect transient or stochastic developmental events, region or circuit-specific and/or activity or plasticity-dependent modification to form and reshape functionally specific circuits. Future studies focusing on these mechanisms and testing of the ensuing hypotheses will enable a deeper understanding of the nature of variability among related cell types in the mammalian brain.

Cell-type conservation and divergence
Evolutionary conservation is strong evidence of functional significance. The demonstrated conservation of cell types from mouse, marmoset, macaque and human suggests that these conserved types have important roles in cortical circuitry and function in mammals and even more distantly related species. We also find that similarity of cell types varies as a function of evolutionary distance, with substantial species differences that represent either adaptive specialization or genetic drift. For the most part, species specializations tend to appear at the finer branches of the hierarchical taxonomy. This result is consistent with a recent hypothesis in which cell types are defined by common evolutionary descent and evolve independently, such that new cell types are generally derived from existing genetic programs and appear as specializations at the finer levels of the taxonomic tree 70 .
A surprising finding across all homologous cell types was the relatively high degree of divergence for genes with cell-type-specific expression in a given species. This observation provides a clear path to identify core conserved genes underlying the canonical identity and features of those cell types. Furthermore, it highlights the need to understand species adaptations superimposed on the conserved program, as many specific cellular phenotypes may vary across species including gene expression, epigenetic regulation, morphology, connectivity and physiological properties. As we illustrated in the Betz cells, there is clear homology across species in the L5 ET subclass, but variation in many measurable properties across species.

Linking model organisms to human biology and disease
Our findings have major implications for the consideration of model organisms to understand human brain function and disease. Despite major investments, animal models of neuropsychiatric disorders have often been characterized by 'loss of translation', fuelling heated debates about the utility of model organisms in the development of treatments for human diseases. Cell census information aligned across species will be highly valuable for making rational choices about the best models for each disease and therapeutic target. For example, the characterization of cell types and their properties shown in Fig. 9 can be used to infer the main characteristics of homologous cell types in humans and other mammalian species, which would be difficult to obtain otherwise. They can also reveal potential limitations of model organisms and the necessity to study human and other primates to understand the specific cell-type features that contribute to human brain function and diseases. This reductionist dissection of the cellular components provides a foundation for understanding the general principles of neural circuit organization and computation that underlie mental activities and brain disorders.

Future directions
The approach we took to generate a cell census and atlas through a systematic dissection of cell types opens up numerous avenues for future work. The MOp census and atlas provides a foundational platform for the broad neuroscience community to accumulate and integrate cell-type information across species. Classification of cell types based on their molecular, spatial and connectional properties in the adult sets the stage for developmental studies to understand the molecular genetic programs underlying cell-type specification, maturation and circuit assembly. The molecular genetic information promises to deliver tools for genetic access to many brain cell types via transgenic Article and enhancer virus strategies. A combination of single-cell transcriptomics and functional measurements may further elucidate the roles of distinct cell types in circuit computation during behaviour, bridging the gap between molecular and functional definition of cell types. The systematic, multi-modal strategy described here can be extended to the whole brain, and major efforts are underway in the BICCN to generate a brain-wide cell census and atlas in the mouse with increasing coverage of human and non-human primates.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-021-03950-0.

Nomenclature of the L5 ET subclass of glutamatergic neurons
In this manuscript we have adopted a nomenclature for major subclasses of cortical glutamatergic excitatory neurons, which have long-range projections both within and outside of the cortex, following a long tradition of naming conventions that often classify neurons based on their projection targets. This nomenclature is based on our de novo transcriptomic taxonomy (Fig. 9) that organizes cell types hierarchically and validates the naming of the primary branches of glutamatergic neurons by their major long-range projection targets. At these levels, glutamatergic neurons are clearly divided into several subclasses, the cortico-cortical and cortico-striatal only projecting IT neurons that are distributed across nearly all layers (L2/3 IT, L4/5 IT, L5 IT, L6 IT and L6 IT Car3), the layer 5 neurons projecting to extratelencephalic targets (L5 ET), the CT-projecting neurons in layer 6 (L6 CT), the NP neurons found in layers 5 and 6, and the L6b neurons whose projection patterns remain largely unknown.
While the IT, CT, NP and L6b neurons have been consistently labelled as such in the field, the L5 ET neurons have not been named consistently in the literature, largely owing to their large variety of projection targets and other phenotypic features that vary depending on cortical areas and species. Here we use the term L5 ET (layer 5 extratelencephalic) to refer to this prominent and distinct subclass of neurons as a standard name that can be accurately used across cortical regions and across species, and we provide our rationale below.
It has long been appreciated that cortical layer 5 contains two distinct populations of neurons that can be distinguished, not only based on the presence or absence of projections to ET targets (ET and IT cells), but also based on their predominant soma locations, dendritic morphologies and intrinsic physiology 48 . Accordingly, various names incorporating these features have been adopted to refer to L5 ET versus L5 IT cells, such as L5b versus L5a, thick-tufted versus thin-tufted and burst-firing versus regular-firing. The most common term used to refer to L5 ET cells residing in motor cortical areas has been PT, which refers to neurons projecting to the pyramidal tract. As accurately stated in Wikipedia, "The pyramidal tracts include both the corticobulbar tract and the corticospinal tract. These are aggregations of efferent nerve fibers from the upper motor neurons that travel from the cerebral cortex and terminate either in the brainstem (corticobulbar) or spinal cord (corticospinal) and are involved in the control of motor functions of the body." Owing to the past wide use of the term PT, we do not take the decision to use L5 ET rather than PT lightly. However, in the face of multiple lines of evidence that have accumulated over the last several years 72,73 and prominently highlighted in this manuscript, it is now clear that PT represents only a subset of L5 ET cells and is thus unable to accurately encompass the entire L5 ET subclass. This realization is informed by comparisons across species and cortical areas, and by single-cell transcriptomics and descriptions of the projections of single neurons, as well as studies linking transcriptional clusters to projection targets.
As noted above, the overall transcriptomic relationships between cortical neurons are well-described by a hierarchical tree that closely matches developmental lineage relationships as neurons become progressively restricted in their adult fates 37,38 (Fig. 9). The cortical excitatory neurons are a major branch, distinct from inhibitory, glial and epithelial cells. Subsequent splitting of the excitatory neurons reveals several major excitatory neuron subclasses-IT, L5 ET, L6 CT, NP and L6b. These major subclasses are conserved across mammalian species 9,10 , as well as across all cortical areas as shown in mouse 11 . It is therefore clear that names are needed that both accurately incorporate and accurately distinguish between neurons in these subclasses, and which are applicable across all cortical areas.
Also as noted above, a widely used alternative to L5 ET is PT. Further, this term is traditionally used along with CT to distinguish between cells with these different projections. The two main observations that make these alternative nomenclatures untenable are: (1) PT refers to motor neurons that project into MY or spinal cord, but in many cortical areas (for example, visual and auditory areas) none of the L5 ET cells are motor neurons; and (2) even in the motor cortex many cells in the L5 ET subclass do not project to the pyramidal tract and instead project solely to the TH (or to TH and other non-PT targets). This is revealed by single-neuron reconstructions 18,46,53 (Figs. 6,8), BARseq 64 , projections from neuron populations with known gene expression and anatomical position in mouse lines 63 , and studies directly linking projections to transcriptomics 9,41 and epigenetics 45 (Figs. 4, 8). The term PT therefore is not inclusive of the entire L5 ET subclass. Furthermore, the L5 CT cells within the L5 ET subclass are largely continuous with PT cells (or 'PT-like' cells), not only genetically but also anatomically 41,42 (Figs. 2, 3), as a majority of L5 ET cells project to multiple targets, typically including both the TH and the PT structures (for example, MY and spinal cord), as well as the midbrain 46 (Figs. 6, 8). Thus, the L5 ET subclass should neither be split into PT and CT, nor should the CT-only cells be omitted by use of the term PT. These facts also inform us that it is important to maintain a distinction between L5 CT (a type of L5 ET) and L6 CT (a major subclass of cortical excitatory neurons that is highly distinct from L5 ET, despite the presence of some L6 CT cells at the bottom of layer 5) 41 . CT can be accurately used as a generic term, but CT neurons do not belong to a single subclass of cortical excitatory neurons.
We recognize that another name that has been used to describe L5 ET cells is subcerebral projection neuron (SCPN) 49 . Given that the telencephalon is equivalent to the cerebrum, ET and subcerebral have the same meaning and the term L5-SCPN would be an accurate and equivalent alternative. But the 'L5' qualifier is crucial in either case to distinguish these cells from the L6 CT subclass. We favour the use of ET because SCPN has not been widely adopted and due to symmetry with the widely used 'IT' nomenclature. Alternatively, given their evidence that "unlike pyramidal tract neurons in the motor cortex, these neurons in the auditory cortex do not project to the spinal cord", Chen et al 64 used the term 'pyramidal tract-like' (PT-l). We also favour L5 ET over L5 PT-l which clings to an inaccurate and now outdated nomenclature.

Integrating 10x v3 snRNA-seq datasets across species
To identify homologous cell types across species, human, marmoset and mouse 10x v3 snRNA-seq datasets were integrated using Seurat's SCTransform workflow. Each major cell class (glutamatergic, GABAergic and non-neuronal cells) was integrated separately across species. Expression matrices were reduced to 14,870 one-to-one orthologues across the three species (NCBI Homologene; 22 November 2019). Nuclei were downsampled to have approximately equivalent numbers at the subclass level across species. Marker genes were identified for each species cluster using Seurat's FindAllMarkers function with test.use set to 'roc', > 0.7 classification power. Markers were used as input to guide alignment and anchor-finding during integration steps. For full methods see ref. 38

Estimation of cell-type homology
To establish a robust cross-species cell type taxonomy, we applied a tree-based clustering method on integrated class-level datasets (https:// github.com/AllenInstitute/BICCN_M1_Evo). The integrated space (from the previously mentioned Seurat integration) was over-clustering into small sets of highly similar nuclei for each class (about 500 clusters per class). Clusters were aggregated into metacells, then hierarchical clustering was performed based on the metacell gene expression matrix using Ward's method. Hierarchical trees were then assessed for cluster size, species mixing and branch stability by subsampling the dataset Article 100 times with 95% of nuclei. Finally, we recursively searched every node of the tree, and if certain heuristic criteria were not sufficient for a node below the upper node, all nodes below the upper node were pruned and nuclei belonging to this subtree were merged into one homologous group. We identified 24 GABAergic, 13 glutamatergic and 8 non-neuronal cross-species consensus clusters that were highly mixed across species and robust. For full methods see ref. 38 . A final dendrogram of consensus cell types was constructed by transforming the raw unique molecular identifier (UMI) counts to log 2 (counts per million (CPM)) normalized counts. Up to 50 marker genes per cross-species cluster were identified by using the scrattch.hicat (v0.0.22) (https:// github.com/AllenInstitute/scrattch.hicat) display_cl and select_markers functions with the following parameters; q1.th = 0.4, q.diff.th = 0.5, de.score.th = 80. Median cross-species cluster log 2 CPM expression of these genes were then used as input for scrattch.hicat's build_dend function. This analysis was bootstrapped 10,000 times with branch colour denoting confidence. Branch robustness was assessed by rebuilding the dendrogram 10,000 times with a random 80% subset of variable genes across clusters and calculating the proportion of iterations that clusters were present on the same branch. Consensus taxonomy agreement in Fig. 9e is determined by selecting maximum frequency leaf match with stacked bars indicating assigned consensus cell types in the centred neighbourhood.

Cross-species differential gene expression and correlations
Expression matrices were subsetted to include one-to-one orthologous genes across all three species. Spearman correlations shown in Fig. 1d were performed by comparing cross-species cluster median log 2 CPM expression of all orthologous genes for each species pair. To calculate the number of differentially expressed genes between each species pair for each cross-species cluster, we used a pseudobulk comparison method 74 from DESeq2 (v1.30.0). For a given cross-species cluster, each sample was split by species and donor, then a Wald test was performed between each species pair. Genes with adjusted P-values < 0.05 and log 2 fold-changes greater than 2 in either direction were counted and reported in Fig. 1e.

Generation of Epi-retro-seq data
We injected retrograde tracer rAAV2-retro-Cre 75 into a target region in INTACT mice 76 , which turned on Cre-dependent GFP expression in the nuclei of MOp neurons projecting to the injected target region. Individual GFP-labelled nuclei of MOp projection neurons were then isolated using fluorescence-activated nucleus sorting (FANS) (box outlines selected cells in Fig. 4a). snmC-seq2 77 was performed to profile the DNA methylation (mC) of each single nucleus.

Evaluation of contamination in Epi-retro-seq
The methods used to evaluate contamination level and potential reasons are described in detail in ref. 45 . Specifically, we quantified the ratio between the number of cells in expected on-target subclasses (for example, L5 ET cluster for ET-projecting neurons) versus in expected off-target subclasses (for example, IT clusters for ET-projecting neurons), denoted as r p , and compared the ratio with the one expected from the unbiased data without enrichment for specific projections, denoted as r u . This provides an estimation of signal-to-noise ratio of each FANS experiment. For IT projections, we used IT subclasses as on-target and L6 CT + inhibitory as off-target, and for ET projections, we used L5 ET as on-target and IT + inhibitory as off-target. For the MOp neurons without enrichment of projections, the expected ratio between cells in IT subclasses and in L6 CT + inhibitory are r u = 2,652:1,775, whereas the expected ratio between cells in L5 ET subclass and in IT + inhibitory are r u = 202:3,434. The fold enrichment in the text was computed by r p /r u for each FANS run separately and averaged across IT or ET targets respectively.
We want to point out that, in addition to this computational method, other methods are available to evaluate and minimize potential contamination in Epi-retro-seq. In cases in which differences in expected results from on-versus off-target populations are unknown, other available methods would need to be used to eliminate cases in which injections might have directly labelled cells outside the intended target region, such as examination of labelling along the injection electrode track.

Integration of L5 ET cells from Epi-retro-seq and 10x snRNA-seq
For snRNA-seq, the 4,515 cells from 10x v3 B dataset labelled as L5 ET by SCF were selected 37 . The read counts were normalized by the total read counts per cell and log transformed. Top 5,000 highly variable genes were identified with Scanpy 78 (v1.8.1) and z-score was scaled across all the cells. For Epi-retro-seq, the posterior methylation levels of 12,261 genes in the 848 L5 ET cells were computed 45 . Top 5,000 highly variable genes were identified with AllCools 79 and z-score was scaled across all the cells. The 1,512 genes as the intersection between the two highly variable gene lists were used in Scanorama 80 (v1.7.1) to integrate the z-scored expression matrix and minus z-scored methylation matrix with sigma equal to 100.

Integrating mouse transcriptomic, spatially resolved transcriptomic, and epigenomic datasets
To integrate IT cell types from different mouse datasets, we first take all cells that are labelled as IT, except for L6_IT_Car3, from the 11 datasets as listed in Fig. 7a. These cell labels are either from dataset-specific analyses 41,45 , or from the integrated clustering of multiple datasets 37 . The integrated clustering and embedding of the 11 datasets are then generated by projecting all datasets into the 10x v2 scRNA-seq dataset using SingleCellFusion 37,79 . Genome browser views of IT and ET cell types (Figs. 7b, 8c) are taken from the corresponding cell types of the brainome portal 37 (https://brainome.ucsd.edu/BICCN_MOp). MERFISH data were analysed using custom Python code, which is available at https://github.com/ZhuangLab/MERlin.

Identification of cCREs
For peak calling in the snATAC-seq data, we extracted all the fragments for each cluster, and then performed peak calling on each aggregate profile using MACS2 81 v2.2.7.1. using Python 3.6 with parameter: "--nomodel --shift −100 --ext 200 --qval 1e-2 -B --SPMR". First, we extended peak summits by 250 bp on either side to a final width of 501 bp. Then, to account for differences in performance of MACS2 based on read depth and/or number of nuclei in individual clusters, we converted MACS2 peak scores (−log 10 (q-value)) to 'score per million' 82 . Next, a union peak set was obtained by applying an iterative overlap peak-merging procedure, which avoids daisy-chaining and still allows for use of fixed-width peaks. Finally, we filtered peaks by choosing a score per million cut-off of 5 as cCREs for downstream analysis.

Predicting enhancer-promoter interactions
First, co-accessible cCREs are identified for all open regions in all neuron types (cell clusters with less than 100 nuclei from snATAC-seq are excluded) using Cicero 83 with the following parameters: aggregation k = 50, window size = 500 kb, distance constraint = 250 kb. In order to find an optimal co-accessibility threshold, we generated a random shuffled cCRE-by-cell matrix as background and calculated co-accessible scores from this shuffled matrix. We fitted the distribution of co-accessibility scores from random shuffled background into a normal distribution model by using the R package fitdistrplus 84 . Next, we tested every co-accessible cCRE pair and set the cut-off at co-accessibility score with an empirically defined significance threshold of FDR < 0.01. The cCREs outside of ±1 kb of transcriptional start sites in GENCODE mm10 (v16) were considered distal. Next, we assigned co-accessibility pairs to three groups: proximal-to-proximal, distal-to-distal and distal-to-proximal. In this study, we focus only on distal-to-proximal pairs. We calculated the Pearson's correlation coefficient (PCC) between gene expression (scRNA SMART-seq) and cCRE accessibility across the joint clusters to examine the relationships between the distal cCREs and target genes as predicted by the co-accessibility pairs. To do so, we first aggregated all nuclei or cells from scRNA-seq and snATAC-seq for every joint cluster to calculate accessibility scores (log 2 CPM) and relative expression levels (log 2 transcripts per million). Then, PCC was calculated for every gene-cCRE pair within a 1-Mbp window centred on the transcriptional start sites for every gene. We also generated a set of background pairs by randomly selecting regions from different chromosomes and shuffling the cluster labels. Finally, we fit a normal distribution model on background and defined a cut-off at PCC score with an empirically defined significance threshold of FDR < 0.01, in order to select significant positively correlated cCRE-gene pairs.

Identification of cis-regulatory modules
We used nonnegative matrix factorization (NMF) to group cCREs into cis-regulatory modules based on their relative accessibility across cell clusters. We adapted NMF (Python package: sklearn v.0.24.2) to decompose the cluster-by-cCRE matrix V (N × M, N rows: cCRE, M columns: cell clusters) into a coefficient matrix H (R × M, R rows: number of modules) and a basis matrix W (N × R), with a given rank R: V ≈ WH.
The basis matrix defines module related accessible cCREs, and the coefficient matrix defines the cell cluster components and their weights in each module. The key issue to decompose the occupancy profile matrix was to find a reasonable value for the rank R (that is, the number of modules). Several criteria have been proposed to decide whether a given rank R decomposes the occupancy profile matrix into meaningful clusters. Here we applied a measurement called sparseness 85 to evaluate the clustering result. Median values were calculated from 100 times for NMF runs at each given rank with a random seed, which will ensure the measurements are stable. Next, we used the coefficient matrix to associate modules with distinct cell clusters. In the coefficient matrix, each row represents a module and each column represents a cell cluster. The values in the matrix indicate the weights of clusters in their corresponding module. The coefficient matrix was then scaled by column (cluster) from 0 to 1. Subsequently, we used a coefficient > 0.1 (~95th percentile of the whole matrix) as a threshold to associate a cluster with a module. Similarly, we associated each module with accessible elements using the basis matrix. For each element and each module, we derived a basis coefficient score, which represents the accessible signal contributed by all clusters in the defined module.

Identification of subclass-selective transcription factors by both RNA expression and motif enrichment
All analyses for this section were at the subclass level. For RNA expression, we used the scSMART-seq dataset and compared each subclass with the rest of the population through a one-tailed Wilcoxon test and FDR correction to select significantly differentially expressed transcription factors (adjusted P-value < 0.05, cluster average fold change > 2). To perform the motif enrichment analysis, we used known motifs from the JASPAR 2020 database 86 and the subclass specific hypo-CG-DMR identified in Yao et al. 37 . The AME software from the MEME suite (v5.1.1) 87 was used to identify significant motif enrichment (adjusted P-value < 10 −3 , odds ratio > 1.3) using default parameters and the same background region set as described 37 . All genes in Extended Data Fig. 7 were both significantly expressed and had their motif enriched in at least one of the subclasses.

Generation and use of new knockin mouse lines
All experimental procedures were approved by the Institutional Animal Care and Use Committees (IACUC) of Cold Spring Harbor Laboratory, University of California Berkeley and Allen Institute, in accordance with NIH guidelines. Mouse knockin driver lines are being deposited to the Jackson Laboratory for wide distribution. Tle4-2A-CreER, Fezf2-2A-CreER, PlexinD1-2A-CreER, PlexinD1-2A-Flp, Tbr2-2A-CreER and dual-tTA mouse lines Driver and reporter mouse lines were generated using a PCR-based cloning. Knockin mouse lines Tle4-2A-CreER, Fezf2-2A-CreER, Plex-inD1-2A-CreER, PlexinD1-2A-Flp and Tbr2-2A-CreER were generated by inserting a 2A-CreER or 2A-Flp cassette in-frame before the STOP codon of the targeted gene. Targeting vectors were generated using a PCR-based cloning approach 27,47 . In brief, for each gene of interest, two partially overlapping BAC clones from the RPCI-23&24 library (made from C57BL/b mice) were chosen from the Mouse Genome Browser. 5′ and 3′ homology arms were PCR amplified (2-5 kb upstream and downstream, respectively) using the BAC DNA as template and cloned into a building vector to flank the 2A-CreERT2 or 2A-Flp expressing cassette as described 27 . These targeting vectors were purified, tested for integrity by enzyme restriction and PCR sequencing. Linearized targeting vectors were electroporated into a 129SVj/B6 hybrid ES cell line (v.6.5). ES cell clones were first screened by PCR and then confirmed by Southern blotting using appropriate probes. DIG-labelled Southern probes were generated by PCR, subcloned and tested on wild-type genomic DNA to verify that they give clear and expected results. Positive v6.5 ES cell clones were used for tetraploid complementation to obtain male heterozygous mice following standard procedures. The F 0 males and subsequent generations were bred with reporter lines (Ai14, Snap25-LSL-EGFP, Ai65) and induced with tamoxifen at the appropriate ages to characterize the resulting genetically targeted recombination patterns. Drivers Tle4-2A-CreER, Fezf2-2A-CreER and PlexinD1-2A-CreER were additionally crossed with reporter Rosa26-CAG-LSL-Flp and Tbr2-2A-CreER;PlexinD1-2A-Flp with reporter dual-tTA, and induced with tamoxifen at the appropriate age to perform anterograde viral tracing, with Flp-or tTA-dependent AAV vector expressing EGFP (AAV8-CAG-fDIO-TVA-EGFP or AAV-TRE-3g-TVA-EGFP), to characterize the resulting axon projection patterns. Last updated by author(s): July 22, 2021 Reporting Summary Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency in reporting. For further information on Nature Research policies, see our Editorial Policies and the Editorial Policy Checklist.

Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.
n/a Confirmed The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly The statistical test(s) used AND whether they are one-or two-sided Only common tests should be described solely by name; describe more complex techniques in the Methods section.
A description of all covariates tested A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals) For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted Our web collection on statistics for biologists contains articles on many of the points above.

Software and code
Policy information about availability of computer code

Data collection
All data for this manuscript was collected in the set of corresponding core companion papers. Please refer to these papers for data generation and quantification software.
• An integrated transcriptomic and epigenomic atlas of mouse primary motor cortex cell types, Yao et al., 2021 • Evolution of cellular diversity in primary motor cortex of human, marmoset monkey, and mouse, Bakken et al., 2021 • Molecular, spatial and projection diversity of neurons in primary motor cortex revealed by in situ single-cell transcriptomics, Zhang et al, 2021 • Phenotypic variation of transcriptomic cell types in mouse motor cortex, Scala et al, 2020 • Cellular Anatomy of the Mouse Primary Motor Cortex, Munoz-Casteneda, 2021 • Genetic dissection of the glutamatergic neuron system in cerebral cortex, Matho et al,2021