## Main

Neurons exhibit extraordinary diversity across molecular, morphological, physiological and connectional features, thus accurate classification and mapping of cell types needs to consider and integrate these distinct yet related cellular properties1,2. Single-cell RNA-sequencing (scRNA-seq) has enabled systematic classification at the transcriptomic level3,4,5, capturing major cell types with known anatomical and functional properties and revealing many potentially new cell types. Classification of cortical neurons using a combination of transcriptomic, electrophysiological and local morphological properties has also been achieved6,7. Brain-wide inter-areal connectivity has been mapped extensively using anterograde and retrograde tracing of projection neuron populations8,9,10,11. However, it remains largely unknown how population-level projection patterns are reflected at the level of single cells, the fundamental units of the circuits. Thus, characterizing single neuron axonal projections through reconstruction of complete morphologies provides ground-truth information not only for cell classification, but also for charting global networks and local circuits.

Single neurons have traditionally been labelled with molecular markers using whole-cell patching, in vivo electroporation12,13,14, sparse transgenic expression15 or sparse viral infection16,17,18,19, followed by manual reconstruction across many consecutive sections. The recent development of high-throughput and high-resolution fluorescent imaging platforms, such as fluorescence micro-optical sectioning tomography (fMOST)20,21 and MouseLight22,23, has enabled the generation of large-scale datasets for neuron reconstructions. Further improvements in brain-wide single-cell labelling methods and computational tools to expedite the laborious reconstruction process are still needed to achieve scalable and complete reconstructions from a comprehensive set of cell types.

As part of the BRAIN Initiative Cell Census Network (BICCN) efforts to characterize brain cell types across multiple modalities, we established a pipeline to label, image, reconstruct and classify single neurons in mice. We report here the largest set of complete single-neuron reconstructions to date. These neurons are labelled by cell subclass or type-selective Cre driver lines, enabling correlation of their morphologies and projection patterns with molecular identities. We also provide a corresponding set of scRNA-seq data from retrogradely labelled neurons (Retro-seq data) to corroborate our anatomical findings. Overall, our study reveals substantial morphological and projection diversity of individual neurons—this diversity is governed by underlying principles that manifest in region- and cell-type-specific manners.

## Results

### Complete neuron reconstruction pipeline

To achieve more efficient, widespread, consistently sparse yet strong labelling, we used TIGRE2.0 transgenic reporter lines that exhibit viral-like transgene-expression levels, coupling them with Cre expression from either driver lines or viral delivery. We used two types of reporter lines: the GFP-expressing Ai139 or Ai140 TIGRE2.0 reporter24, optionally coupled with the Ai82 TIGRE1.0 reporter25; and the TIGRE-MORF reporter26 (also called Ai166) (Extended Data Fig. 1). TIGRE-MORF (Ai166) expresses the MORF gene, which is a farnesylated eGFP (GFPf) preceded by a mononucleotide repeat of 22 guanines (G22–GFPf). The GFPf transgene is translated only when rare stochastic frameshift events occur to delete one guanine, leading to extremely sparse labelling well suited for reconstruction of elaborate axonal arborizations. For this study, we generated 53 high-quality fMOST-imaged brain datasets with sparsely labelled cells in cortical, thalamic, claustral, striatal and other regions, and for cholinergic, noradrenergic and serotonergic neuron types (Extended Data Fig. 2, Supplementary Table 1).

We acquired whole-brain images with sufficient resolution for reconstructing fine-calibre axons using the fMOST imaging platform21. To handle the large imaging datasets generated, we established a standardized image data processing and informatics workflow (Extended Data Fig. 3) for efficient whole-brain morphology reconstruction using Vaa3D, an open-source, cross-platform visualization, reconstruction and analysis system27,28. In parallel, each fMOST dataset was registered to the 3D Allen mouse brain Common Coordinate Framework (CCFv3)29, using a newly developed mBrainAligner program specifically designed for fMOST datasets to handle the challenges of brain shrinkage and deformation (Extended Data Fig. 4a). Following registration of the whole-brain image dataset, all individual neuron reconstructions were also registered to CCFv3 using the source brain’s transformation parameters. Co-registration of reconstructions from different brains to CCFv3 enables digital anatomical delineation, spatial quantification and comparison of each reconstructed morphology and its compartments (for example, soma, dendrites and axon arbors) using a set of analysis tools that we developed. A stringent quality control process was established to ensure the completeness of reconstructed morphologies (Extended Data Fig. 4b–e). An advantage of this platform is the distributed and modularized components and open access of all data and tools that facilitate multi-site collaboration and community engagement.

### Overview of projection neuron types

To extract rules underlying the morphological diversity of long-range projection neurons, we systematically analysed eight subclasses of neurons that were labelled by Cre lines representing specific transcriptomic subclasses4,30,31 (Fig. 1, Supplementary Table 2).

From the Cux2-CreERT2;Ai166 mice which label cortical layer (L)2/3 and L4 intratelencephalic (IT) subclasses, we analysed 100 neurons in somatosensory and motor regions: primary somatosensory cortex (SSp, n = 46), supplemental somatosensory cortex (SSs, n = 14), primary motor cortex (MOp, n = 22) and secondary motor cortex (MOs, n = 18). From the Plxnd1-CreER;Ai166 mice, in which cortical L2/3 and L5 IT subclasses are labelled, we analysed 33 L5 IT neurons in SSp (n = 15), SSs (n = 10), MOp (n = 4) and MOs (n = 4). From the Fezf2-CreER;Ai166 and Pvalb-T2A-CreERT2;Ai166 mice, in which the cortical L5 extratelencephalic (ET) (also known as pyramidal tract (PT)) subclass is labelled, we analysed 197 neurons in SSp (n = 141), SSs (n = 21), MOp (n = 19) and MOs (n = 16).

We investigated a special type of cortical excitatory neurons, the Car3 IT transcriptomic subclass4,31, whose morphology and projection pattern were unknown. This subclass of neurons is located in the deep layers (mostly L6) of all lateral cortical areas and shares the same transcriptomic clusters with neurons from the claustrum (CLA). Mesoscale population anterograde tracing shows that cortical L6 Car3 neurons have a more restricted intracortical projection pattern than CLA neurons, which project widely in cortex32 (Extended Data Fig. 5). We analysed 99 neurons from the Gnb4-IRES2-CreERT2;Ai140;Ai82 mice, including 29 CLA neurons and 70 neurons from multiple lateral cortical areas.

We analysed 701 thalamocortical projection neurons from the Tnnt1-IRES2-CreERT2;Ai140;Ai82 and Vipr2-IRES2-Cre-neo;Ai166 mice, in which the Prkcd_Grin2c transcriptomic subclass is labelled (H.Z. et al., unpublished data). The reconstructed neurons cover 21 of the 44 thalamic regions in CCFv3, which can be broadly divided into two major groups33,34, the ‘core’ or ‘driver’ nuclei (n = 638 cells) and the ‘matrix’ or ‘modulatory’ nuclei (n = 63 cells).

We analysed 280 striatal (caudoputamen (CP)) neurons from the Tnnt1, Vipr2 and Plxnd1 Cre lines. These are the medium spiny neurons (MSNs) with main projections to either globus pallidus, external segment (GPe, n = 180 cells) or substantia nigra, reticular part (SNr, n = 100 cells), which correspond to the two well-known subclasses of MSNs, dopamine receptor D1 (Drd1) neurons projecting to SNr (direct pathway) and dopamine receptor D2 (Drd2) neurons projecting to GPe (indirect pathway)35.

To provide a clearer narrative, here we first summarize major findings derived from the detailed characterizations described in the sections below. We analysed morphological features and rules at multiple levels: projection class, projection type, projection patterns (such as convergent or divergent projection, feedforward or feedback projection, and total number of projection targets), regional difference, topography and individual cell variability (Fig. 1b).

At the higher levels, neurons from the 8 transcriptomic subclasses exhibit highly distinct projection patterns and correspond well to 5 projection classes and 11 projection types. The split of the L5 ET subclass into medulla (MY)-projecting and non-MY-projecting types corresponds to specific transcriptomic and epigenomic types within the L5 ET subclass described in other studies4,36,37. The split of the thalamic Prkcd_Grin2c subclass into core and matrix projection types is also consistent with transcriptomic clusters differentiating these thalamic nuclei (H.Z. et al., unpublished data; see also ref. 38). By contrast, the split of the Car3 subclass into L6 Car3 and CLA projection types is not associated with corresponding transcriptomic clusters.

Beyond these high-level divisions, morphological distinctions among the 11 projection types are reflected in multiple aspects. The average number of projection targets (each target is defined by having total axon length greater than 1 mm)12 is highly characteristic of each projection type, with Car3 (in particular CLA) neurons having the highest number of targets, followed by L5 ET neurons, then by IT and thalamic matrix neurons, and thalamic core and CP neurons having the lowest. This distinction appears to be directly related to the differential projection patterns (divergent, parallel or convergent) among types. A major distinction between the two corticocortical projecting classes is that the CLA and cortical L6 Car3 neurons do not have collateral projections into striatum whereas IT neurons do. In addition, multiple morphological features distinguish among L2/3, L4 and L5 IT types, and between thalamic core and matrix types—some of these features are likely to be associated with the differential roles of these neuron types in mediating feedforward or feedback information flow.

We observe further morphological diversity within each projection type. Regional specificity is seen in all cortical and thalamic projection types containing neurons from different subregions. Topographic correspondence between soma locations and major axon arbors is seen in all projection types with sufficient numbers of reconstructions that allow such examination. Within each projection type that has divergent projection patterns, there is a high degree of variability among individual cells in choosing a subset of projection targets. Such individual variability appears stochastic, while it also allows further clustering of individual cells into target-driven subtypes as we have done in the most highly divergent Car3 and L5 ET projection types.

To directly compare the single-cell and population-level projection patterns, we identified 1,354 single-cell morphologies and 163 mesoscale experiments from the Allen Mouse Brain Connectivity Atlas, matched on the basis of soma or injection site being within the same CCFv3 structure (Supplementary Table 3). In this location-matched dataset, the combined single-cell projection pattern from a given region (and cortical layer) is highly concordant with that of the mesoscale experiments, with a few exceptions (Extended Data Fig. 6). A minimal set of single cells can recapitulate the mesoscale pattern well; however, averaging across all single cells shows a low level of approximation, suggesting highly diverse projection patterns among the single cells.

Overall, this large set of long-range projection neurons displays a wide range of morphological and projectional diversity that can be described at multiple levels, revealing different rules that different neuron types follow. Morphological features at high levels are closely related to the neuron’s molecular identities, whereas those at more refined levels may underlie the specific functional role of each neuron in the circuit it is embedded in.

### Cortical L2/3, L4 and L5 IT neurons

All the cortical L2/3, L4 and L5 IT neurons have their long-range projections confined within cortex and striatum (Extended Data Fig. 7). We compared single neuron and population long-range projections for both L2/3 and L5 IT subclasses (Fig. 2a). All neurons within a type collectively recapitulate the population projection pattern, but each neuron selects a subset of projection targets. Such selection appears random without correlation to each neuron’s soma depth or dendritic morphology. L5 IT neurons have significantly greater numbers of projection targets than L2/3 neurons, and this difference is particularly pronounced in SSp and SSs (Fig. 2a,b). L5 IT neurons also have longer total axon lengths in ipsilateral and contralateral cortex and striatum (Fig. 2b). L4 neurons exhibit a notable regional difference; all but two SSp L4 neurons have only local axons but no long-range projections, whereas nearly all L4 neurons in SSs, MOp and MOs do have axon projections outside of their local area (Extended Data Fig. 7).

We examined both local and distal axon projections to see whether axon termination patterns of single cells recapitulate the overall feedforward or feedback projection patterns apparent at the population level33. Quantitative vertical profiles of local axons (Fig. 2c) show that, in addition to their within-layer collaterals, L2/3 cells also project downward into L5, whereas L5 IT cells have local projections up to L1. This difference is consistent across all cortical areas examined.

Given that each IT neuron projects to only a subset of their potential intracortical targets, we pooled all the distal axon arbors from all the cells within the L2/3 or L5 IT subclass and within a source region to generate a cumulative vertical profile of laminar distribution pattern (Fig. 2d, Extended Data Fig. 8). In SSp and SSs, distal axon terminals of L2/3 cells are concentrated in middle layers (L2/3-5), whereas those of L5 IT cells preferentially target L1, suggesting feedforward and feedback roles for these L2/3 and L5 IT cells, respectively. Notably, in MOp and MOs, distal axon terminals of both L2/3 and L5 IT cells preferentially target L1 and to a lesser extent L2/3, suggesting that L2/3 IT cells in motor cortex may also have a feedback role onto other cortical regions. This cell-type difference between somatosensory and motor cortices is consistent with the differential positions of these regions in the hierarchical cortical network33 (that is, MOp and MOs are higher than SSp and SSs).

Overall, these analyses reveal major projectional differences both among the L2/3, L4 and L5 IT subclasses and among the cortical regions, as well as individual cell-to-cell variations within each subclass and each region.

We also investigated the projection target specificity of transcriptomic cell types using Retro-seq4, in which the transcriptomes of 1,134 retrogradely labelled neurons from SSp, SSs, MOp and MOs were mapped to our established transcriptomic taxonomy across the entire isocortex and hippocampal formation31 (Extended Data Fig. 9, Supplementary Table 4). For each source region, within each of the L2/3, L4, and L5 IT subclasses, neurons labelled from different injection targets were mostly mapped to a common subset of transcriptomic types, with little between-target difference, suggesting that within each IT subclass, projection pattern at single-cell level does not correlate one-to-one with the cell’s transcriptomic type.

### Cortical L5 ET neurons

L5 ET neurons exhibit extensive heterogeneity in their selected subset of projection targets (Supplementary Fig. 1). To search for patterns in this diversity, we clustered all L5 ET neurons (n = 193) together and found that cluster segregation is mainly driven by projection targets in thalamus (TH), midbrain (MB) and MY (Fig. 3). A major division is between neurons mainly projecting to MY and MB structures such as midbrain reticular nucleus (MRN) and superior colliculus, motor related (SCm) (clusters 5–6) and neurons mainly projecting to TH (the other clusters). The MY- and MRN-projecting neurons are further subdivided into those preferentially projecting to zona incerta (ZI) (cluster 5) or SCm (cluster 6). The TH branch is subdivided into those mainly projecting to ventral anterior-lateral complex (VAL) (cluster 1), posterior complex (PO) (clusters 2–4) or ventral posteromedial nucleus (VPM) and ventral posterolateral nucleus (VPL) (clusters 7–8). In addition to PO, cluster 3 neurons also project to mediodorsal nucleus (MD), central lateral nucleus (CL) and parafascicular nucleus (PF). Other brain regions such as CP, GPe, SNr and pontine grey (PG) are shared projection targets among all or most L5 ET neurons.

L5 ET neurons belonging to different clusters are intermingled in the cortical regions they come from (Extended Data Fig. 10a). At the same time, most MOp and MOs neurons are found in the more complex, MY and MRN-projecting clusters 5 and 6, whereas the other clusters with simpler, TH projections mainly contain SSp and SSs neurons, revealing a regional difference and suggesting that medulla projection may be primarily a feature of MOp and MOs neurons (Fig. 3a).

### Cortical and claustral Car3 neurons

All CLA and cortical (CTX) L6 Car3 neurons project extensively in cortex, but they do not project into striatum. Clustering on the 99 Car3 neurons from all regions identified 13 clusters (Fig. 4a, b). The first major division is between CLA and L6 Car3 neurons. CLA neurons have greater total axon lengths and higher numbers of projection targets (Fig. 4c).

CLA neurons often have long-distance projections, predominantly targeting prefrontal, medial and lateral association cortical areas as well as entorhinal cortex, whereas CTX L6 Car3 neurons mostly project to nearby cortical areas or homotypic cortical areas on the contralateral side (Fig. 4b, d). A prominent observation is that both CLA and CTX L6 Car3 clusters are arranged topographically from anterior to posterior parts, based on both soma location and projection targets—each cluster contains a group of neurons that are located close to each other and project to similar cortical target areas (Fig. 4d).

In our single-cell transcriptomic taxonomy of the mouse cortex and hippocampus31, the Car3 subclass is highly distinct from other glutamatergic neuron subclasses (Extended Data Fig. 11a). Single-cell transcriptomes of CLA neurons also mapped exclusively to this subclass. We performed Retro-seq on cells isolated from CLA (n = 240) and cortical areas SSs (n = 11) and temporal association area (TEa)–perirhinal area (PERI)–ectorhinal area (ECT) (n = 35) that were labelled by retrograde tracers injected into far-apart cortical areas (Supplementary Table 4). The Retro-seq cells mapped to the Car3 subclass are concentrated in one of the three clusters (Extended Data Fig. 11b). These results suggest a lack of regional distinction and that these cortical and claustral cells are highly related to each other, possibly having a common developmental origin. In an attempt to reveal subtler transcriptomic differences, we re-clustered all the non-Retro-seq Car3 cells (n = 1,699) from cortex and CLA (Supplementary Table 4), resulting in 8 clusters, and then remapped all the Retro-seq cells to the new clusters (Extended Data Fig. 11c, d). The CLA and CTX L6 Car3 Retro-seq cells projecting to different cortical areas are again distributed across a similar set of clusters, indicating no clear one-to-one correspondence between transcriptomic clusters and projection target specificity.

### Thalamic core and matrix neurons

Single thalamic sensory–motor relay neurons usually have one major axon arbor targeting their corresponding primary sensory or motor cortex. Axons from these nuclei terminate predominantly in L4, consistent with the core-type classification (Extended Data Fig. 12a–e, Supplementary Fig. 2). A small fraction of the core-type thalamic neurons have more than one axon arbor targeting different cortical areas. We analysed morphometric features of 944 axon arbors from 586 neurons located in the sensory thalamic nuclei VPM, VPL, lateral geniculate complex, dorsal part (LGd) and medial geniculate complex (MG) (Extended Data Fig. 13). We identify two major types of axon arbors targeting cortical layer 4 (and lower L2/3), a smaller type 1 and a larger type 2. Thus, core-type neurons can be assigned to either small-arbor or large-arbor subtype.

Outside the sensory–motor relay nuclei, nearly all reconstructed thalamic neurons have a large, diffusely branched axon arbor and/or several arbors projecting to different cortical areas, often with columnar or L5-dominant axon termination patterns. Many (79%) of these cells also have axon branches more than 1 mm long in L1, consistent with the matrix type, but they also exhibit a diverse range of morphological patterns (Extended Data Fig. 12e–h, Supplementary Fig. 3). Some nuclei (for example, lateral posterior nucleus (LP) and mediodorsal nucleus (MD)) further display distinct subdivisions.

A quantitative interareal projection matrix (Extended Data Fig. 12i, Supplementary Tables 2, 3) further demonstrates the distinction between core- and matrix-type neurons, with the core-type neurons predominantly targeting a single cortical area (sometimes with a secondary area) and the matrix-type neurons targeting multiple cortical areas. Within each nucleus, individual neurons show a high (for core cells) or moderate (for matrix cells) degree of consistency with each other and with the population projection pattern for that nucleus.

### Striatal medium spiny neurons

Individual striatal MSNs project to GPe or SNr in a simple point-to-point fashion, each with one major axon arbor. Most SNr-projecting neurons also send minor collaterals to globus pallidus, internal segment (GPi) and/or GPe. The dominant feature of both types of striatal neurons is convergent projection within the main target region, GPe or SNr, consistent with the approximately 20-fold smaller sizes of these regions compared to the dorsal striatum (Extended Data Fig. 14a–c). Between each pair of neurons, the centre-to-centre distance of their axon arbors increases proportionally and the degree of overlap between the axon arbors decreases exponentially, along with the increase of the pair’s soma-to-soma distance (Extended Data Fig. 14c), indicating a regular spatial organization of these neurons’ axon projections. The axon arbor distances between striatal neurons within the same type are substantially smaller and their overlap scores are substantially higher than those of neurons from various thalamic nuclei (Extended Data Fig. 14d). Furthermore, axon arbors in GPe or SNr can be grouped into domains on the basis of the degree of overlap; these domains are arranged topographically and correspond to the topographic localization of the somas in striatum (Extended Data Fig. 14e).

### Topographic organization of axon projection

Single-neuron reconstructions enabled us to investigate the topographic relationship between soma locations and main axon arbors of several projection neuron types for which we have sufficiently large numbers of reconstructions, and soma–axon arbor spatial topographic relationships were found in all cases examined (Extended Data Fig. 15). For neurons in LGd, VPM, and VPM and VPL combined, axon arbor positions along anterior–posterior and lateral–medial dimensions in each of the cortical target areas correspond to soma positions along ventral–dorsal and medial–lateral dimensions in each thalamic nucleus, indicating a three-dimensional rotation of axon projections (Extended Data Fig. 15a–c). This topographic relationship is most clearly seen in VPM and VPL neurons, whereas it is more complex for LGd neurons. There is a similar topographic rotational relationship for SSp L5 ET neurons between their soma locations in SSp and axon arbors in VPM and PO, although the relationship also appears fuzzier (Extended Data Fig. 15d). For striatal neurons, GPe-projecting MSNs maintain the dorsal–ventral, lateral–medial and anterior–posterior orientations between their somas in CP and axon arbors in GPe, whereas SNr-projecting MSNs exhibit a flip between their soma and axon arbor positions in the dorsal–ventral axis (Extended Data Fig. 15e, f), consistent with previous bulk tracing studies39. These single-cell results reveal yet another level of organization of cell-type -specific axon projection patterns.

## Discussion

To fully understand the anatomical diversity and specificity of individual neurons across the mammalian brain, a very large number of neurons will need to be examined. Approaches such as MAPseq and BARseq12,40,41 can quickly survey projection patterns at regional level for many neurons in a high-throughput manner. However, many essential details can be obtained only through complete morphological reconstructions. Collecting such ground-truth data provides a unique opportunity to uncover principles of neuronal diversity and circuit organization.

We systematically examined multiple levels of morphological properties in this large set of complete reconstructions with the goal of deriving organizational rules governing long-range axon projections, incorporating cross-modality relationship between transcriptomic and morphological properties. At the highest level, there is a high degree of concordance between major transcriptomic and projection neuron types. Neurons belonging to different transcriptomic subclasses have highly distinct morphological and projectional properties. Additionally, the medulla-projecting and non-medulla-projecting L5 ET neuron types and the core and matrix thalamocortical projection types also correlate with different molecular types as shown in other studies36,37,38. An exception at this level is the claustral and cortical L6 Car3 neurons which have distinct projection patterns but appear transcriptomically homogeneous.

At the intermediate level, within each projection neuron type, neurons follow region-specific and topographic organizational rules. Our latest transcriptomic study showed that L2/3 IT, L4 IT, L5 IT and L5 ET transcriptomic types are largely shared between somatosensory and motor cortical areas with some continuous gradient variations, and the Car3 transcriptomic types are also shared among all lateral cortical areas and claustrum31. By contrast, here we show for all types that, within each type, neurons from different regions have distinct sets of projection targets that are region-specific. Furthermore, in each type and region examined we observe a topographic relationship between soma locations and axon arbor distributions in a main target region. The most prominent example is the cortical and claustral Car3 neurons whose extensive variation of axon projections is linked to both regional specificity and topography, two closely related factors as these neurons are situated at the lateral part of the cortical sheet along almost the entire anterior–posterior extent.

At the lowest, single-cell level, the degree of similarity or variability between individual neurons within a given type also varies across types. Within-type individual cell variability is high in cortical and claustral neurons, moderate in thalamic matrix neurons, and low in thalamic core neurons and striatal MSNs.

A major question is how morphological and projectional properties compare and correlate with the neurons’ molecular identities. We attempted to address this question with two approaches: using validated driver lines to define the subclass-level molecular identities of reconstructed neurons and using Retro-seq to obtain transcriptomic profiles of neurons projecting to specific targets. Both approaches show that subclasses of neurons have highly distinct morphological and projectional properties; however, within these major types, especially for cortical and claustral neurons, many aspects of morphological diversity cannot be accounted for by currently identified transcriptomic subtypes or clusters in the adult stage. Previous studies showed that L2/3 SSp pyramidal neurons projecting to MOp or SSs have distinct intrinsic and network physiological properties42,43. Even though they may not belong to distinct transcriptomic subtypes, it will be interesting to examine gene-expression differences that might correspond to the differential connectional and physiological properties for these neurons, as found for primary visual cortical neurons projecting differentially to medial or lateral higher visual areas44.

Several mechanisms may explain the origin of the morphological diversity, such as molecular instructions that act transiently during development45, activity-dependent cell interactions, or stochastic processes. It will be informative to develop methods that enable complete reconstruction of morphology and in-depth gene-expression profiling to be conducted on the same cell, and apply them to single cells in both adult stage and during brain development, so that the developmental correlations of molecular and morphological and connectional features can be identified.

## Methods

### Nomenclature and abbreviations in CCFv3 ontology of mouse brain regions mentioned in this study

Isocortex: frontal pole (FRP), primary motor area (MOp), secondary motor area (MOs), primary somatosensory area (SSp), supplemental somatosensory area (SSs), gustatory area (GU), visceral area (VISC), dorsal auditory area (AUDd), primary auditory area (AUDp), posterior auditory area (AUDpo), ventral auditory area (AUDv), primary visual area (VISp), anterolateral visual area (VISal), anteromedial visual area (VISam), lateral visual area (VISl), posterolateral visual area (VISpl), posteromedial visual area (VISpm), laterointermediate area (VISli), postrhinal area (VISpor), anterior cingulate area, dorsal part (ACAd), anterior cingulate area, ventral part (ACAv), prelimbic area (PL), infralimbic area (ILA), orbital area, lateral part (ORBl), orbital area, medial part (ORBm), orbital area, ventrolateral part (ORBvl), agranular insular area, dorsal part (AId), agranular insular area, posterior part (AIp), agranular insular area, ventral part (AIv), retrosplenial area, lateral agranular part (RSPagl), retrosplenial area, dorsal part (RSPd), retrosplenial area, ventral part (RSPv), posterior parietal association area (PTLp), anterior area (VISa), rostrolateral visual area (VISrl), temporal association area (TEa), perirhinal area (PERI), ectorhinal area (ECT).

Olfactory areas (OLF): piriform area (PIR).

Hippocampal formation (HPF): hippocampal region (HIP), fields CA1, CA2, CA3, dentate gyrus (DG), entorhinal area, lateral part (ENTl), entorhinal area, medial part (ENTm), parasubiculum (PAR), postsubiculum (POST), presubiculum (PRE), subiculum (SUB), prosubiculum (ProS).

Cortical subplate (CTXsp): claustrum (CLA), endopiriform nucleus, dorsal part (EPd), endopiriform nucleus, ventral part (EPv), lateral amygdalar nucleus (LA), basolateral amygdalar nucleus (BLA), basomedial amygdalar nucleus (BMA).

Cerebral nuclei (CNU): striatum (STR), caudoputamen (CP), nucleus accumbens (ACB), fundus of striatum (FS), central amygdalar nucleus (CEA), medial amygdalar nucleus (MEA), globus pallidus, external segment (GPe), globus pallidus, internal segment (GPi), bed nuclei of the stria terminalis (BST).

Thalamus (TH): ventral anterior-lateral complex (VAL), ventral medial nucleus (VM), ventral posterolateral nucleus (VPL), ventral posterolateral nucleus, parvicellular part (VPLpc), ventral posteromedial nucleus (VPM), ventral posteromedial nucleus, parvicellular part (VPMpc), posterior triangular thalamic nucleus (PoT), medial geniculate complex, dorsal part (MGd), medial geniculate complex, ventral part (MGv), medial geniculate complex, medial part (MGm), lateral geniculate complex, dorsal part (LGd), lateral posterior nucleus (LP), posterior complex (PO), anteromedial nucleus (AM), interanterodorsal nucleus (IAD), lateral dorsal nucleus (LD), mediodorsal nucleus (MD), submedial nucleus (SMT), paraventricular nucleus (PVT), nucleus of reuniens (RE), central medial nucleus (CM), paracentral nucleus (PCN), central lateral nucleus (CL), parafascicular nucleus (PF), reticular nucleus (RT).

Hypothalamus (HY): subthalamic nucleus (STN), zona incerta (ZI).

Midbrain (MB): substantia nigra, reticular part (SNr), midbrain reticular nucleus (MRN), superior colliculus, motor related (SCm), periaqueductal grey (PAG), anterior pretectal nucleus (APN), red nucleus (RN), pedunculopontine nucleus (PPN), dorsal nucleus raphe (DR).

Pons: parabrachial nucleus (PB), pontine grey (PG), pontine reticular nucleus, caudal part (PRNc), tegmental reticular nucleus (TRN), pontine reticular nucleus (PRNr), locus ceruleus (LC).

Medulla (MY): cuneate nucleus (CU), gigantocellular reticular nucleus (GRN), inferior olivary complex (IO), intermediate reticular nucleus (IRN), magnocellular reticular nucleus (MARN), parvicellular reticular nucleus (PARN), spinal nucleus of the trigeminal, caudal part (SPVC), spinal nucleus of the trigeminal, interpolar part (SPVI), spinal nucleus of the trigeminal, oral part (SPVO), medullary reticular nucleus, dorsal part (MDRNd).

### Animal care and use

Both male and female transgenic mice from at least postnatal day 56 (P56) were used for all experiments. All animals were housed 3–5 per cage and maintained on a 14 h:10 h light:dark cycle, in a humidity- and temperature-controlled room (humidity at ~40%, temperature at ~21 °C) with water and food available ad libitum. All experimental procedures related to the use of mice were conducted with approved protocols in accordance with NIH guidelines, and were approved by the Institutional Animal Care and Use Committee (IACUC) of the Allen Institute for Brain Science.

### Transgenic mice and sparse labelling

All transgenic crosses are listed in Supplementary Tables 1, 4. Data for systematic characterization of the expression pattern of each transgenic mouse line can be found in the Allen Transgenic Characterization database (http://connectivity.brain-map.org/transgenic/search/basic).

Induction of CreERT2 driver lines was done by administration by oral gavage of tamoxifen (50 mg ml−1 in corn oil) at original (0.2 mg g−1 body weight) or reduced dose for 1 d in an adult mouse. The dosage for mice aged P7–P15 is 0.04 ml. Mice can be used for experiments at two or more weeks after tamoxifen dosing. We found optimal tamoxifen doses for sparse labelling in each case using serial two photon tomography (STPT)10,33 to quickly screen for brain-wide transgene expression. The specific dose of tamoxifen to induce sparse labelling in each CreERT2 driver line is shown in Supplementary Table 1.

### fMOST imaging

In summary, a GFP-labelled brain is first embedded in resin. The resin-embedded GFP fluorescence can be recovered through chemical reactivation47 provided by adding Na2CO3 in the imaging water bath. Thus, a line-scanning block-face imaging system can be used to maximize imaging speed. Following imaging of the entire block face, the top 1 µm of tissue is sliced off with a diamond knife, exposing the next face of the block for imaging. For the entire mouse brain, a 15–20 TB dataset containing about 10,000 coronal planes of 0.2–0.3 µm xy resolution and 1 µm z sampling rate is generated within 2 weeks.

All tissue preparation has been described previously48. Following fixation, each intact brain was rinsed 3 times (6 h for two washes and 12 h for the third wash) at 4 °C in a 0.01 M PBS solution (Sigma-Aldrich). Then the brain was subsequently dehydrated via immersion in a graded series of ethanol mixtures (50%, 70% and 95% (vol/vol) ethanol solutions in distilled water) and the absolute ethanol solution three times for 2 h each at 4 °C. After dehydration, the whole brain was impregnated with Lowicryl HM20 Resin Kits (Electron Microscopy Sciences, cat.no. 14340) by sequential immersions in 50, 75, 100 and 100% embedding medium in ethanol, 2 h each for the first three solutions and 72 h for the final solution. Finally, each whole brain was embedded in a gelatin capsule that had been filled with HM20 and polymerized at 50 °C for 24 h.

The whole brain imaging is realized using a fMOST system. The basic structure of the imaging system is the combination of a line-scanning upright epifluorescence microscope with a mechanical sectioning system. This system runs in a line-scanning block-face mode but updated with the principle of chemical sectioning to obtain better image contrast and speed and thus enables high-throughput imaging of the fluorescent-protein-labelled sample (manuscript in preparation). Each time we do a block-face fluorescence imaging across the whole coronal plane (xy axes), then remove the top layer (z axis) using a diamond knife, and then expose next layer, and image again. The thickness of each layer is 1.0 µm. In each layer imaging, we used a strip-scanning (x axis) model combined with a montage in the y axis to cover the whole coronal plane49. The fluorescence, collected using a microscope objective, passes a bandpass filter and is recorded with a TDI-CCD camera. We repeat these procedures across the whole sample volume to obtain the required dataset.

The objective used is a 40× water-immersion lens with numerical aperture 0.8 to provide a designed optical resolution (at 520 nm) of 0.35 μm in the xy axes. The imaging gives a sample voxel of 0.35 × 0.35 × 1.0 μm to provide proper resolution to trace the neural process. The voxel size may vary for difference objectives. Other imaging parameters for GFP imaging include an excitation wavelength of 488 nm, and emission filter with passing band 510–550 nm. The fMOST is a two-colour imaging system. The green channel is used to obtain the complete morphology of neurons, and the red channel is used to obtain the cellular architecture information of propidium iodide (PI) staining20.

### Complete neuronal morphology reconstruction

We used Vaa3D, an open-source, cross-platform visualization and analysis system, for the tasks of reconstructing neuronal morphologies. To efficiently and effectively deal with the mouse whole-brain imaging data, we incorporated several enabling modules into Vaa3D, such as TeraFly, TeraVR and a number of other supporting tools. TeraFly50 supports visualization and annotation of multidimensional imaging data with virtually unlimited scales. A user can flexibly choose to work at a specific region of interest (ROI) with desired level of detail (LoD). TeraVR51 is an annotation tool for immersive neuron reconstruction that has been proved to be critical for achieving precision and efficiency in morphology data production. It creates stereo visualization for image volumes and reconstructions and offers an intuitive interface for the user to interact with such data. Both TeraFly and TeraVR are seamlessly integrated in Vaa3D and can be used combinedly and flexibly. From reconstructions (in SWC file format), morphological quantification statistics is obtained to characterize neurons. A quality control process identifies errors based on morphological indicators and does corrections in a feedback setting. The quality control process then refines the skeleton location with Mean-Shift52 and performs pruning focused on terminal location refinement. When needed, auto-refinement fits the tracing to the centre of fluorescent signals. The whole process ends with SWC resampling and registration. The final reconstruction of each neuron is a valid single tree without breaks, loops, multiple branches from a single point, and so on.

### Registration to CCF

We used mBrainAligner based on BrainAligner53 to perform 3D registration from fMOST images (subject) to the average mouse brain template of CCFv3 (target) (Extended Data Fig. 4a). The main steps are: (1) fMOST images were first downsampled by 64 × 64 × 16 (x × y × z) to roughly match the size of the target brain. (2) The stripe artefacts in fMOST images from diamond-knife cutting and the imaging process were eliminated by using log-space frequency notch filter. (3) The dense outer-contour feature points of target and subject brain (about 1,500 points per brain) were uniformly sampled from the brains’ outer contour obtained using adaptive threshold, and then affine-aligned using a reliable landmark points matching algorithm to ensure the subject brain has the same position, orientation and scale as the target brain. (4) Intensity was normalized by matching the local average intensity of subject image to that of target image in a sliding window manner with patch size 41 × 41 × 41 and stride 1. (5) For the target brain, 1,744 landmarks corresponding to the points of high curvature (corners or junction of different brain compartments) in CCFv3 annotation image were detected via 2.5D Harris corner detector. On the basis of a combination of texture, shape context and deep-learning-derived features, mBrainAligner established the correspondence between target and subject brain by iteratively deforming these target landmarks to fit the subject image, and accomplished the local alignment using the smooth-thin-plate-spline (STPS). (6) Finally, to ensure the accuracy of registration, automatic registration results were examined in the semi-automatic registration module of mBrainAligner, and if necessary, the boundaries of the brain region were further optimized in a manual or semi-automatic way. Once images were aligned, the reconstructed neurons and somas were warped to CCF space using the generated deformation fields.

### Processing single-cell morphology data

Pre-processing of SWC files: SWC files were processed and examined with Vaa3D plugins to ensure topological correctness: sorted single tree with root node as soma. Terminal branches < 10 pixels were pruned to remove artifacts. SWC files were resampled with a step size of 64 (x), 64 (y) and 16 (z) before registration.

Quantification of axon projection patterns: to analyse the distribution and amount of axon in brain-wide targets following registration to the CCFv3, we used a manually curated set of 316 non-overlapping structures at a mid-ontology level that are most closely matched in size or division. Ipsilateral and contralateral sides of brain regions were calculated separately.

Morphological features: axonal and dendritic morphological features, defined according to L-measurement54, were calculated using Vaa3D plugin “global_neuron_feature”. Selected features include Axon global: ‘Overall Width’, ‘Overall Height’, ‘Overall Depth’, ‘Total Length’, ‘Euclidean Distance’, ‘Max Path Distance’, ‘Number of Branches’; Axon local: ‘Total Length’, ‘Number of Branches’; Dendrite: ‘Overall Width’, ‘Overall Height’, ‘Overall Depth’, ‘Total Length’, ‘Max Euclidean Distance’, ‘Max Path Distance’, ‘Number of Branches’, ‘Max Branch Order’.

Local axons were defined as axon arbors within 200 µm from the somata. Local axons and dendrites were rotated based on principal component analysis (PCA) so dimensions were aligned with the largest to smallest spans. Then shifting was performed to localize somata at the origin of coordinates.

### mBrainAnalyzer

The mBrainAnalyzer toolbox (also named neuro_morpho_toolbox), which was developed for analysis of full neuron morphology, includes multiple modules for feature quantification, arbor detection, statistical analysis and visualization. In addition to morphological features (for example, total length, angle of branches, and so on), this toolbox also quantifies projection intensities at branch length level and number of terminal levels. Using the arbor detection module, one can define sub-cellular components of a neuron as the granularity. Analysis and visualization can be performed at both whole-cell and arbor levels.

### Arbor detection and partition

We detected and partitioned a series of neuronal arbors out of each neuron reconstruction using a graph-partition clustering method. First, as a neuron consists of a number of topologically connected reconstruction nodes, the neuron was viewed as a graph, where every reconstruction node (unit) in the neuron was connected with its parent node with an edge specified by the topological connection of the parent-child pair with the edge weight, or ‘similarity’ s, set to be the exponential of the negative 3D Euclidean distance, d, of these two nodes, that is, s = exp(−d). Then, we considered the normalized graph-cut method55 to extract ‘clusters’ of reconstruction nodes so that the within-cluster ‘total similarity’ of nodes would be maximized and cross-cluster total similarity would be minimized. As a result, each such coherent cluster corresponds to one neuron arbor, which was also visually checked to ensure its correctness. Third, to automatically determine the number of such clusters, for a presumed number of clusters, we calculated the normalized score of total cross-cluster similarity divided by the total within-cluster similarity, followed by trial testing a range (from 2 to 8) of such presumed cluster-numbers to determine the optimal number that would minimizes this normalized score. In the final result, the detected arbor that contains the soma is called soma arbor; the remaining arbors are called non-soma arbors.

### Feature quantification of cortical arbors

We divided the cortex into consecutive 100-μm thick coronal slices. Anchor points were evenly sampled along the outer border of each slice, with normal vectors perpendicular to the local cortical surface and pointing to the inside of the brain. Nodes of arbors were assigned to their neighbour anchors and projected onto the surface by corresponding normal vectors. Depth of nodes were determined by the length of projection along normal vectors. We also estimated the area of an anchor by their distance to neighbour anchors and slice thickness. The 2D cortical area of an arbor was determined by the total areas of unique anchors occupied by its nodes. To determine the radius of an arbor, we assigned arbor ‘centre’ as the node that has the shortest average distance to other nodes. Radius was determined by a growing sphere until 70% of segments were inside it. For neurons with tufted apical dendrite, we vertically shifted the arbors, so the top of apical dendrites reached L1. We manually confirmed that all tufted apical dendrites reached L1 in the original image.

### Clustering of cortical axon arbors of core-type thalamic neurons

For local (soma-neighbouring) arbors, the following features were used for clustering: ‘2d_area’, ‘total_length’, ‘radius’, ‘depth_mean’, ‘depth_std’. We performed PCA to reduce the effect of noise. Top principal components (PCs) were selected to recover 95% of variance. We applied uniform manifold approximation and projection (UMAP) dimension reduction using the Python package ‘UMAP’56. The ‘n_neighbors’ parameter was set at 15. k-Means clustering was performed using the UMAP embeddings as input.

### Neuron-beta

We developed the Neuron-beta metric by borrowing the concept of the beta value from the finance field57. For each group, defined by brain areas and/or cortical layers, we calculated the average of mesoscale experiments as M = [m1, , mp], p = number of brain areas. For one single cell S = [s1, , sp], we define the neuron-beta value as:

$${\rm{Neuronbeta}}=\frac{{\rm{Cov}}(M,S)}{{\rm{Var}}(M)}$$

### Clustering of cortical L5 ET neurons

Projection strength is defined as ln(axon length in mm). Strength values for regions with axon length below 1 mm were set as 0. Only non-cortical areas were included. Dimension reduction was performed by PCA followed by 2D UMAP. Top PCs which explained > 90% variance were used as input of UMAP. Hierarchical clustering was performed using UMAP embeddings. Minimum branch length for clusters was manually determined.

### Clustering of cortical L6 Car3 and claustral neurons

Data normalization: morphological features were normalized by the mean and standard variation in a feature-wise manner. Projection pattern features were defined as ln(axon length in mm). For regions with axon length below 1 mm, projection pattern feature values are set as 0. Soma locations were flipped to the same hemisphere.

Similarity metrics: for each feature set, we first calculated the Euclidean distance matrix. Then a ranked k-nearest neighbour (KNN) matrix was created. We then applied the shared nearest neighbor (SNN) approach to measure the similarity between each pair of samples xi and xj. The SNN metric was defined as the maximum average rank among their common neighbours:

$$S({x}_{i},{x}_{j})=\mathop{{\rm{\max }}}\limits_{v\in {\rm{NN}}({x}_{i})\cap {\rm{NN}}({x}_{j})}\{k-\frac{1}{2}[{{\rm{rank}}}_{{\rm{NN}}({x}_{i})}(v)+\frac{1}{2}({{\rm{rank}}}_{{\rm{NN}}({x}_{j})}(v))]\}$$

where NN is nearest neighbour and v is a neuron from the dataset. Similarity scores were set as 0 for pairs with non-overlapping KNN sets and a weighted SNN graph was created.

Co-clustering analysis: the co-clustering matrix for each feature set was calculated by iterative random sampling. During each iteration, 95% of samples were randomly selected to create an SNN graph. We then applied the Fast-greedy community detection algorithm using the Python package python-igraph for clustering assignment. For each pair of samples, the co-clustering score was defined as the times of co-clustering normalized by the iterations of co-occurring. Resampling was performed 1,000 times to reach saturation. The overall co-clustering matrix is a weighted average of the four feature sets. Agglomerative clustering was performed on the co-clustering matrix to get clusters.

Outlier removal: outliers were detected by comparing the Euclidean distance between a sample and the other samples with the same cluster identity. We used overall within-cluster distance as the background distribution. Samples with significantly higher (one-sided Mann–Whitney test) within-cluster distance were filtered out as outliers. Agglomerative clustering was performed for the remaining co-clustering matrix. This process iterated until no new outlier could be detected.

Characterization of cell types: for each feature set, we performed two-sided Mann-Whitney tests: claustrum versus cortical neurons; each cluster versus other clusters. P-values were adjusted by Bonferroni correction.

For anterograde projection mapping, we injected AAV2/1-pCAG-FLEX-EGFP-WPRE-pA into CLA, SSs or SSp of Gnb4-IRES2-Cre or Gnb4-IRES2-CreERT2 mice at P37–P65. Stereotaxic injection procedures were performed as previously described10. For the Gnb4-IRES2-CreERT2 mice, tamoxifen induction was conducted one week after injection at full dose (0.2 mg per g body weight) for 5 consecutive days. Mice survived 3 weeks (or 4 weeks for the tamoxifen-induced mice) after injection, and brains were perfused and collected for TissueCyte imaging.

For retrograde labelling, we injected several different types of retrograde viral tracers, including AAV2-retro-EF1a-dTomato, AAV2-retro-EF1a-Cre58, RVdGdL-Cre, RVdL-FlpO59 or CAV2-Cre60 into specific target regions of defined transgenic mice (Supplementary Table 4). RFP+ or RFP+/GFP+ cells from defined source regions were collected for scRNA-seq using the procedure described below. Stereotaxic injection procedures were performed as described10. Mice were injected at P40 or older, and survived for 16–31 days after injection.

### scRNA-sequencing, clustering and mapping

Cells from transgenic mice or transgenic mice injected with retrograde tracers were collected by microdissection of different cortical regions. Single-cell suspensions were generated and cells were collected using fluorescence activated cell sorting (FACS). FACS gates were selective for cells with fluorescent protein expression from transgenic and/or viral reporters.

Cells were then frozen at −80 °C and were later processed for scRNA-seq using the SMART-Seq v4 method4. After sequencing, raw data was quantified using STAR v2.5.361 and were aligned to both a Ref-Seq transcriptome index for the mm10 genome, and a custom index consisting of transgene sequences. PCR duplicates were masked and removed using the STAR option bamRemoveDuplicates. Only uniquely aligned reads were used for gene quantification. Gene read counts were quantified using the summarizeOverlaps function from R GenomicAlignments package (RRID: SCR_018096)62 using both intronic and exonic reads, and quality control was performed as described4.

Clustering was performed using in-house developed R package scrattch.hicat (available via GitHub: https://github.com/AllenInstitute/scrattch.hicat). The Retro-seq cells where mapped to the most correlated cell types in the Cortex/HPF taxonomy31 based on a set of 5,981 cell-type-specific markers using the map_sampling function from the scrattch.hicat package. Only the SMART-Seq dataset from the reference taxonomy is used for mapping. All the cells from CLA were mapped to the Car3 subclass. However, CLA cells were not included as part of the Cortex/HPF taxonomy. To examine more closely the cell-type diversity, we re-clustered all the original SMART-Seq cells within the Car3 subclass together with the mapped cells from CLA (Supplementary Table 4; CLA is part of CTXsp), resulting in 8 clusters. The cortical and CLA Retro-seq cells previously mapped to the Car3 subclass were then re-mapped to the new clusters, using 277 marker genes that discriminate these 8 clusters.

### Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.