Morphological diversity of single neurons in molecularly defined cell types

Dendritic and axonal morphology reflects the input and output of neurons and is a defining feature of neuronal types1,2, yet our knowledge of its diversity remains limited. Here, to systematically examine complete single-neuron morphologies on a brain-wide scale, we established a pipeline encompassing sparse labelling, whole-brain imaging, reconstruction, registration and analysis. We fully reconstructed 1,741 neurons from cortex, claustrum, thalamus, striatum and other brain regions in mice. We identified 11 major projection neuron types with distinct morphological features and corresponding transcriptomic identities. Extensive projectional diversity was found within each of these major types, on the basis of which some types were clustered into more refined subtypes. This diversity follows a set of generalizable principles that govern long-range axonal projections at different levels, including molecular correspondence, divergent or convergent projection, axon termination pattern, regional specificity, topography, and individual cell variability. Although clear concordance with transcriptomic profiles is evident at the level of major projection type, fine-grained morphological diversity often does not readily correlate with transcriptomic subtypes derived from unsupervised clustering, highlighting the need for single-cell cross-modality studies. Overall, our study demonstrates the crucial need for quantitative description of complete single-cell anatomy in cell-type classification, as single-cell morphological diversity reveals a plethora of ways in which different cell types and their individual members may contribute to the configuration and function of their respective circuits.

Neurons exhibit extraordinary diversity across molecular, morphological, physiological and connectional features, thus accurate classification and mapping of cell types needs to consider and integrate these distinct yet related cellular properties 1,2 . Single-cell RNA-sequencing (scRNA-seq) has enabled systematic classification at the transcriptomic level [3][4][5] , capturing major cell types with known anatomical and functional properties and revealing many potentially new cell types. Classification of cortical neurons using a combination of transcriptomic, electrophysiological and local morphological properties has also been achieved 6,7 . Brain-wide inter-areal connectivity has been mapped extensively using anterograde and retrograde tracing of projection neuron populations [8][9][10][11] . However, it remains largely unknown how population-level projection patterns are reflected at the level of single cells, the fundamental units of the circuits. Thus, characterizing single neuron axonal projections through reconstruction of complete morphologies provides ground-truth information not only for cell classification, but also for charting global networks and local circuits.
Single neurons have traditionally been labelled with molecular markers using whole-cell patching, in vivo electroporation [12][13][14] , sparse transgenic expression 15 or sparse viral infection [16][17][18][19] , followed by manual reconstruction across many consecutive sections. The recent development of highthroughput and high-resolution fluorescent imaging platforms, such as fluorescence micro-optical sectioning tomography (fMOST) 20,21 and MouseLight 22,23 , has enabled the generation of large-scale datasets for neuron reconstructions. Further improvements in brain-wide single-cell labelling methods and computational tools to expedite the laborious reconstruction process are still needed to achieve scalable and complete reconstructions from a comprehensive set of cell types.
As part of the BRAIN Initiative Cell Census Network (BICCN) efforts to characterize brain cell types across multiple modalities, we established a pipeline to label, image, reconstruct and classify single neurons in mice. We report here the largest set of complete single-neuron reconstructions to date. These neurons are labelled by cell subclass or type-selective Cre driver lines, enabling correlation of their morphologies and projection patterns with molecular identities. We also provide a corresponding set of scRNA-seq data from retrogradely labelled neurons (Retro-seq data) to corroborate our anatomical findings. Overall, our study reveals substantial morphological and projection diversity of individual neurons-this diversity is governed by underlying principles that manifest in region-and cell-type-specific manners.

Complete neuron reconstruction pipeline
To achieve more efficient, widespread, consistently sparse yet strong labelling, we used TIGRE2.0 transgenic reporter lines that exhibit viral-like transgene-expression levels, coupling them with Cre expression from either driver lines or viral delivery. We used two types of reporter lines: the GFP-expressing Ai139 or Ai140 TIGRE2.0 reporter 24 , optionally coupled with the Ai82 TIGRE1.0 reporter 25 ; and the TIGRE-MORF reporter 26 (also called Ai166) (Extended Data Fig. 1). TIGRE-MORF (Ai166) expresses the MORF gene, which is a farnesylated eGFP (GFPf) preceded by a mononucleotide repeat of 22 guanines (G 22 -GFPf). The GFPf transgene is translated only when rare stochastic frameshift events occur to delete one guanine, leading to extremely sparse labelling well suited for reconstruction of elaborate axonal arborizations. For this study, we generated 53 high-quality fMOST-imaged brain datasets with sparsely labelled cells in cortical, thalamic, claustral, striatal and other regions, and for cholinergic, noradrenergic and serotonergic neuron types (Extended Data Fig. 2, Supplementary Table 1).
We acquired whole-brain images with sufficient resolution for reconstructing fine-calibre axons using the fMOST imaging platform 21 . To handle the large imaging datasets generated, we established a standardized image data processing and informatics workflow (Extended Data Fig. 3) for efficient whole-brain morphology reconstruction using Vaa3D, an open-source, cross-platform visualization, reconstruction and analysis system 27,28 . In parallel, each fMOST dataset was registered to the 3D Allen mouse brain Common Coordinate Framework (CCFv3) 29 , using a newly developed mBrainAligner program specifically designed for fMOST datasets to handle the challenges of brain shrinkage and deformation (Extended Data Fig. 4a). Following registration of the whole-brain image dataset, all individual neuron reconstructions were also registered to CCFv3 using the source brain's transformation parameters. Co-registration of reconstructions from different brains to CCFv3 enables digital anatomical delineation, spatial quantification and comparison of each reconstructed morphology and its compartments (for example, soma, dendrites and axon arbors) using a set of analysis tools that we developed. A stringent quality control process was established to ensure the completeness of reconstructed morphologies (Extended Data Fig. 4b-e). An advantage of this platform is the distributed and modularized components and open access of all data and tools that facilitate multi-site collaboration and community engagement.

Overview of projection neuron types
To extract rules underlying the morphological diversity of long-range projection neurons, we systematically analysed eight subclasses of neurons that were labelled by Cre lines representing specific transcriptomic subclasses 4,30,31 (Fig. 1, Supplementary Table 2).
We investigated a special type of cortical excitatory neurons, the Car3 IT transcriptomic subclass 4,31 , whose morphology and projection pattern were unknown. This subclass of neurons is located in the deep layers (mostly L6) of all lateral cortical areas and shares the same transcriptomic clusters with neurons from the claustrum (CLA). Mesoscale population anterograde tracing shows that cortical L6 Car3 neurons have a more restricted intracortical projection pattern than CLA neurons, which project widely in cortex 32 (Extended Data Fig. 5). We analysed 99 neurons from the Gnb4-IRES2-CreERT2;Ai140;Ai82 mice, including 29 CLA neurons and 70 neurons from multiple lateral cortical areas.
To provide a clearer narrative, here we first summarize major findings derived from the detailed characterizations described in the sections below. We analysed morphological features and rules at multiple levels: projection class, projection type, projection patterns (such as convergent or divergent projection, feedforward or feedback projection, and total number of projection targets), regional difference, topography and individual cell variability (Fig. 1b).
At the higher levels, neurons from the 8 transcriptomic subclasses exhibit highly distinct projection patterns and correspond well to 5 projection classes and 11 projection types. The split of the L5 ET subclass into medulla (MY)-projecting and non-MY-projecting types corresponds to specific transcriptomic and epigenomic types within the L5 ET subclass described in other studies 4,36,37 . The split of the thalamic Prkcd_Grin2c subclass into core and matrix projection types is also consistent with transcriptomic clusters differentiating these thalamic nuclei (H.Z. et al., unpublished data; see also ref. 38 ). By contrast, the split of the Car3 subclass into L6 Car3 and CLA projection types is not associated with corresponding transcriptomic clusters.
Beyond these high-level divisions, morphological distinctions among the 11 projection types are reflected in multiple aspects. The average number of projection targets (each target is defined by having total axon length greater than 1 mm) 12 is highly characteristic of each projection type, with Car3 (in particular CLA) neurons having the highest number of targets, followed by L5 ET neurons, then by IT and thalamic matrix neurons, and thalamic core and CP neurons having the lowest. This distinction appears to be directly related to the differential Article projection patterns (divergent, parallel or convergent) among types. A major distinction between the two corticocortical projecting classes is that the CLA and cortical L6 Car3 neurons do not have collateral projections into striatum whereas IT neurons do. In addition, multiple morphological features distinguish among L2/3, L4 and L5 IT types, and between thalamic core and matrix types-some of these features are likely to be associated with the differential roles of these neuron types in mediating feedforward or feedback information flow.
We observe further morphological diversity within each projection type. Regional specificity is seen in all cortical and thalamic projection types containing neurons from different subregions. Topographic correspondence between soma locations and major axon arbors is seen in all projection types with sufficient numbers of reconstructions that allow such examination. Within each projection type that has divergent projection patterns, there is a high degree of variability among individual cells in choosing a subset of projection targets. Such individual variability appears stochastic, while it also allows further clustering of individual cells into target-driven subtypes as we have done in the most highly divergent Car3 and L5 ET projection types.
To directly compare the single-cell and population-level projection patterns, we identified 1,354 single-cell morphologies and 163 mesoscale experiments from the Allen Mouse Brain Connectivity Atlas,   matched on the basis of soma or injection site being within the same CCFv3 structure (Supplementary Table 3). In this location-matched dataset, the combined single-cell projection pattern from a given region (and cortical layer) is highly concordant with that of the mesoscale experiments, with a few exceptions (Extended Data Fig. 6). A minimal set of single cells can recapitulate the mesoscale pattern well; however, averaging across all single cells shows a low level of approximation, suggesting highly diverse projection patterns among the single cells.
Overall, this large set of long-range projection neurons displays a wide range of morphological and projectional diversity that can be described at multiple levels, revealing different rules that different neuron types follow. Morphological features at high levels are closely related to the neuron's molecular identities, whereas those at more refined levels may underlie the specific functional role of each neuron in the circuit it is embedded in.

Cortical L2/3, L4 and L5 IT neurons
All the cortical L2/3, L4 and L5 IT neurons have their long-range projections confined within cortex and striatum (Extended Data Fig. 7). We compared single neuron and population long-range projections for both L2/3 and L5 IT subclasses (Fig. 2a). All neurons within a type collectively recapitulate the population projection pattern, but each neuron selects a subset of projection targets. Such selection appears random without correlation to each neuron's soma depth or dendritic morphology. L5 IT neurons have significantly greater numbers of projection targets than L2/3 neurons, and this difference is particularly pronounced in SSp and SSs (Fig. 2a,b). L5 IT neurons also have longer total axon lengths in ipsilateral and contralateral cortex and striatum (Fig. 2b). L4 neurons exhibit a notable regional difference; all but two SSp L4 neurons have only local axons but no long-range projections, whereas nearly all L4 neurons in SSs, MOp and MOs do have axon projections outside of their local area (Extended Data Fig. 7).
We examined both local and distal axon projections to see whether axon termination patterns of single cells recapitulate the overall feedforward or feedback projection patterns apparent at the population level 33 . Quantitative vertical profiles of local axons (Fig. 2c) show that, in addition to their within-layer collaterals, L2/3 cells also project downward into L5, whereas L5 IT cells have local projections up to L1. This difference is consistent across all cortical areas examined.
Given that each IT neuron projects to only a subset of their potential intracortical targets, we pooled all the distal axon arbors from all the cells within the L2/3 or L5 IT subclass and within a source region to generate a cumulative vertical profile of laminar distribution pattern (Fig. 2d, Extended Data Fig. 8). In SSp and SSs, distal axon terminals of L2/3 cells are concentrated in middle layers (L2/3-5), whereas those of L5 IT cells preferentially target L1, suggesting feedforward and feedback roles for these L2/3 and L5 IT cells, respectively. Notably, in MOp and MOs, distal axon terminals of both L2/3 and L5 IT cells preferentially target L1 and to a lesser extent L2/3, suggesting that L2/3 IT cells in motor cortex may also have a feedback role onto other cortical regions. This cell-type difference between somatosensory and motor cortices is consistent with the differential positions of these regions in the hierarchical cortical network 33 (that is, MOp and MOs are higher than SSp and SSs).
Overall, these analyses reveal major projectional differences both among the L2/3, L4 and L5 IT subclasses and among the cortical regions, as well as individual cell-to-cell variations within each subclass and each region.
We also investigated the projection target specificity of transcriptomic cell types using Retro-seq 4 , in which the transcriptomes of 1,134 retrogradely labelled neurons from SSp, SSs, MOp and MOs were mapped to our established transcriptomic taxonomy across the entire isocortex and hippocampal formation 31 (Extended Data Fig. 9, Supplementary Table 4). For each source region, within each of the L2/3, L4, and L5 IT subclasses, neurons labelled from different injection targets were mostly mapped to a common subset of transcriptomic types, with little between-target difference, suggesting that within each IT subclass, projection pattern at single-cell level does not correlate one-to-one with the cell's transcriptomic type.
Cortical L5 ET neurons L5 ET neurons exhibit extensive heterogeneity in their selected subset of projection targets ( Supplementary Fig. 1). To search for patterns in this diversity, we clustered all L5 ET neurons (n = 193) together and found that cluster segregation is mainly driven by projection targets in thalamus (TH), midbrain (MB) and MY (Fig. 3). A major division is between neurons mainly projecting to MY and MB structures such as midbrain reticular nucleus (MRN) and superior colliculus, motor related (SCm) (clusters 5-6) and neurons mainly projecting to TH (the other clusters). The MY-and MRN-projecting neurons are further subdivided into those preferentially projecting to zona incerta (ZI) (cluster 5) or SCm (cluster 6). The TH branch is subdivided into those mainly projecting to ventral anterior-lateral complex (VAL) (cluster 1), posterior complex (PO) (clusters 2-4) or ventral posteromedial nucleus (VPM) and ventral posterolateral nucleus (VPL) (clusters 7-8). In addition to PO, cluster 3 neurons also project to mediodorsal nucleus (MD), central lateral nucleus (CL) and parafascicular nucleus (PF). Other brain regions such as CP, GPe, SNr and pontine grey (PG) are shared projection targets among all or most L5 ET neurons. L5 ET neurons belonging to different clusters are intermingled in the cortical regions they come from (Extended Data Fig. 10a). At the same time, most MOp and MOs neurons are found in the more complex, MY and MRN-projecting clusters 5 and 6, whereas the other clusters with simpler, TH projections mainly contain SSp and SSs neurons, revealing a regional difference and suggesting that medulla projection may be primarily a feature of MOp and MOs neurons (Fig. 3a).

Cortical and claustral Car3 neurons
All CLA and cortical (CTX) L6 Car3 neurons project extensively in cortex, but they do not project into striatum. Clustering on the 99 Car3 neurons from all regions identified 13 clusters (Fig. 4a, b). The first major division is between CLA and L6 Car3 neurons. CLA neurons have greater total axon lengths and higher numbers of projection targets (Fig. 4c).
CLA neurons often have long-distance projections, predominantly targeting prefrontal, medial and lateral association cortical areas as well as entorhinal cortex, whereas CTX L6 Car3 neurons mostly project to nearby cortical areas or homotypic cortical areas on the contralateral side ( Fig. 4b, d). A prominent observation is that both CLA and CTX L6 Car3 clusters are arranged topographically from anterior to posterior parts, based on both soma location and projection targets-each cluster contains a group of neurons that are located close to each other and project to similar cortical target areas (Fig. 4d).
In our single-cell transcriptomic taxonomy of the mouse cortex and hippocampus 31 , the Car3 subclass is highly distinct from other glutamatergic neuron subclasses (Extended Data Fig. 11a). Single-cell transcriptomes of CLA neurons also mapped exclusively to this subclass. We performed Retro-seq on cells isolated from CLA (n = 240) and cortical areas SSs (n = 11) and temporal association area (TEa)-perirhinal area (PERI)-ectorhinal area (ECT) (n = 35) that were labelled by retrograde tracers injected into far-apart cortical areas (Supplementary Table 4). The Retro-seq cells mapped to the Car3 subclass are concentrated in one of the three clusters (Extended Data Fig. 11b). These results suggest a lack of regional distinction and that these cortical and claustral cells are highly related to each other, possibly having a common developmental origin. In an attempt to reveal subtler transcriptomic differences, we re-clustered all the non-Retro-seq Car3 cells (n = 1,699) from cortex and CLA (Supplementary Table 4), resulting in 8 clusters, and then remapped all the Retro-seq cells to the new clusters (Extended Data Fig. 11c, d). The CLA and CTX L6 Car3 Retro-seq cells projecting to different cortical areas are again distributed across a similar set of clusters, indicating no clear one-to-one correspondence between transcriptomic clusters and projection target specificity.

Article
Thalamic core and matrix neurons Single thalamic sensory-motor relay neurons usually have one major axon arbor targeting their corresponding primary sensory or motor cortex. Axons from these nuclei terminate predominantly in L4, consistent with the core-type classification (Extended Data Fig. 12a-e, Supplementary Fig. 2). A small fraction of the core-type thalamic neurons have more than one axon arbor targeting different cortical areas. We analysed morphometric features of 944 axon arbors from 586 neurons located in the sensory thalamic nuclei VPM, VPL, lateral geniculate complex, dorsal part (LGd) and medial geniculate complex (MG) (Extended Data Fig. 13). We identify two major types of axon arbors targeting cortical layer 4 (and lower L2/3), a smaller type 1 and a larger type 2. Thus, core-type neurons can be assigned to either small-arbor or large-arbor subtype.
Outside the sensory-motor relay nuclei, nearly all reconstructed thalamic neurons have a large, diffusely branched axon arbor and/or several arbors projecting to different cortical areas, often with columnar or L5-dominant axon termination patterns. Many (79%) of these cells also have axon branches more than 1 mm long in L1, consistent with the matrix type, but they also exhibit a diverse range of morphological patterns (Extended Data Fig. 12e-h, Supplementary Fig. 3). Some nuclei (for example, lateral posterior nucleus (LP) and mediodorsal nucleus (MD)) further display distinct subdivisions.    A quantitative interareal projection matrix (Extended Data Fig. 12i, Supplementary Tables 2, 3) further demonstrates the distinction between core-and matrix-type neurons, with the core-type neurons predominantly targeting a single cortical area (sometimes with a secondary area) and the matrix-type neurons targeting multiple cortical areas. Within each nucleus, individual neurons show a high (for core cells) or moderate (for matrix cells) degree of consistency with each other and with the population projection pattern for that nucleus.

Striatal medium spiny neurons
Individual striatal MSNs project to GPe or SNr in a simple point-to-point fashion, each with one major axon arbor. Most SNr-projecting neurons also send minor collaterals to globus pallidus, internal segment (GPi) and/or GPe. The dominant feature of both types of striatal neurons is convergent projection within the main target region, GPe or SNr, consistent with the approximately 20-fold smaller sizes of these regions compared to the dorsal striatum (Extended Data Fig. 14a-c). Between each pair of neurons, the centre-to-centre distance of their axon arbors increases proportionally and the degree of overlap between the axon arbors decreases exponentially, along with the increase of the pair's soma-to-soma distance (Extended Data Fig. 14c), indicating a regular spatial organization of these neurons' axon projections. The axon arbor distances between striatal neurons within the same type are substantially smaller and their overlap scores are substantially higher than those of neurons from various thalamic nuclei (Extended Data Fig. 14d). Furthermore, axon arbors in GPe or SNr can be grouped into domains on the basis of the degree of overlap; these domains are arranged topographically and correspond to the topographic localization of the somas in striatum (Extended Data Fig. 14e).

Topographic organization of axon projection
Single-neuron reconstructions enabled us to investigate the topographic relationship between soma locations and main axon arbors of several projection neuron types for which we have sufficiently large numbers of reconstructions, and soma-axon arbor spatial topographic relationships were found in all cases examined (Extended Data Fig. 15). For neurons in LGd, VPM, and VPM and VPL combined, axon arbor positions along anterior-posterior and lateral-medial dimensions in each of the cortical target areas correspond to soma positions along ventraldorsal and medial-lateral dimensions in each thalamic nucleus, indicating a three-dimensional rotation of axon projections (Extended Data Fig. 15a-c). This topographic relationship is most clearly seen in VPM and VPL neurons, whereas it is more complex for LGd neurons. There is a similar topographic rotational relationship for SSp L5 ET neurons between their soma locations in SSp and axon arbors in VPM and PO, although the relationship also appears fuzzier (Extended Data Fig. 15d). For striatal neurons, GPe-projecting MSNs maintain the dorsal-ventral, lateral-medial and anterior-posterior orientations between their somas in CP and axon arbors in GPe, whereas SNr-projecting MSNs exhibit a flip between their soma and axon arbor positions in the dorsal-ventral axis (Extended Data Fig. 15e, f), consistent with previous bulk tracing studies 39 . These single-cell results reveal yet another level of organization of cell-type -specific axon projection patterns.

Discussion
To fully understand the anatomical diversity and specificity of individual neurons across the mammalian brain, a very large number of neurons will need to be examined. Approaches such as MAPseq and BARseq 12,40,41 can quickly survey projection patterns at regional level for many neurons in a high-throughput manner. However, many essential details can a b Article be obtained only through complete morphological reconstructions. Collecting such ground-truth data provides a unique opportunity to uncover principles of neuronal diversity and circuit organization. We systematically examined multiple levels of morphological properties in this large set of complete reconstructions with the goal of deriving organizational rules governing long-range axon projections, incorporating cross-modality relationship between transcriptomic and morphological properties. At the highest level, there is a high degree of concordance between major transcriptomic and projection neuron types. Neurons belonging to different transcriptomic subclasses have highly distinct morphological and projectional properties. Additionally, the medulla-projecting and non-medulla-projecting L5 ET neuron types and the core and matrix thalamocortical projection types also correlate with different molecular types as shown in other studies [36][37][38] . An exception at this level is the claustral and cortical L6 Car3 neurons which have distinct projection patterns but appear transcriptomically homogeneous.
At the intermediate level, within each projection neuron type, neurons follow region-specific and topographic organizational rules. Our latest transcriptomic study showed that L2/3 IT, L4 IT, L5 IT and L5 ET transcriptomic types are largely shared between somatosensory and motor cortical areas with some continuous gradient variations, and the Car3 transcriptomic types are also shared among all lateral cortical areas and claustrum 31 . By contrast, here we show for all types that, within each type, neurons from different regions have distinct sets of projection targets that are region-specific. Furthermore, in each type and region examined we observe a topographic relationship between soma locations and axon arbor distributions in a main target region. The most prominent example is the cortical and claustral Car3 neurons whose extensive variation of axon projections is linked to both regional specificity and topography, two closely related factors as these neurons are situated at the lateral part of the cortical sheet along almost the entire anterior-posterior extent.
At the lowest, single-cell level, the degree of similarity or variability between individual neurons within a given type also varies across types. Within-type individual cell variability is high in cortical and claustral neurons, moderate in thalamic matrix neurons, and low in thalamic core neurons and striatal MSNs.
A major question is how morphological and projectional properties compare and correlate with the neurons' molecular identities. We attempted to address this question with two approaches: using validated driver lines to define the subclass-level molecular identities of reconstructed neurons and using Retro-seq to obtain transcriptomic profiles of neurons projecting to specific targets. Both approaches show that subclasses of neurons have highly distinct morphological and projectional properties; however, within these major types, especially for cortical and claustral neurons, many aspects of morphological diversity cannot be accounted for by currently identified transcriptomic    (6) C6 (12) C9 (7) C7 (4) C 8 (7) C3 (5) C 4 (9) C2 (8) C12 (7) C1 (7) C10 (5) C 11 (10) C13 (9) subtypes or clusters in the adult stage. Previous studies showed that L2/3 SSp pyramidal neurons projecting to MOp or SSs have distinct intrinsic and network physiological properties 42,43 . Even though they may not belong to distinct transcriptomic subtypes, it will be interesting to examine gene-expression differences that might correspond to the differential connectional and physiological properties for these neurons, as found for primary visual cortical neurons projecting differentially to medial or lateral higher visual areas 44 . Several mechanisms may explain the origin of the morphological diversity, such as molecular instructions that act transiently during development 45 , activity-dependent cell interactions, or stochastic processes. It will be informative to develop methods that enable complete reconstruction of morphology and in-depth gene-expression profiling to be conducted on the same cell, and apply them to single cells in both adult stage and during brain development, so that the developmental correlations of molecular and morphological and connectional features can be identified.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-021-03941-1.

Animal care and use
Both male and female transgenic mice from at least postnatal day 56 (P56) were used for all experiments. All animals were housed 3-5 per cage and maintained on a 14 h:10 h light:dark cycle, in a humidity-and temperature-controlled room (humidity at ~40%, temperature at ~21 °C) with water and food available ad libitum. All experimental procedures related to the use of mice were conducted with approved protocols in accordance with NIH guidelines, and were approved by the Institutional Animal Care and Use Committee (IACUC) of the Allen Institute for Brain Science.
Transgenic mice and sparse labelling All transgenic crosses are listed in Supplementary Tables 1, 4. Data for systematic characterization of the expression pattern of each transgenic mouse line can be found in the Allen Transgenic Characterization database (http://connectivity.brain-map.org/transgenic/search/basic).
Induction of CreERT2 driver lines was done by administration by oral gavage of tamoxifen (50 mg ml −1 in corn oil) at original (0.2 mg g −1 body weight) or reduced dose for 1 d in an adult mouse. The dosage for mice aged P7-P15 is 0.04 ml. Mice can be used for experiments at two or more weeks after tamoxifen dosing. We found optimal tamoxifen doses for sparse labelling in each case using serial two photon tomography (STPT) 10,33 to quickly screen for brain-wide transgene expression. The specific dose of tamoxifen to induce sparse labelling in each CreERT2 driver line is shown in Supplementary Table 1.

fMOST imaging
In summary, a GFP-labelled brain is first embedded in resin. The resin-embedded GFP fluorescence can be recovered through chemical reactivation 47 provided by adding Na 2 CO 3 in the imaging water bath. Thus, a line-scanning block-face imaging system can be used to maximize imaging speed. Following imaging of the entire block face, the top 1 µm of tissue is sliced off with a diamond knife, exposing the next face of the block for imaging. For the entire mouse brain, a 15-20 TB dataset containing about 10,000 coronal planes of 0.2-0.3 µm xy resolution and 1 µm z sampling rate is generated within 2 weeks.
All tissue preparation has been described previously 48 . Following fixation, each intact brain was rinsed 3 times (6 h for two washes and 12 h for the third wash) at 4 °C in a 0.01 M PBS solution (Sigma-Aldrich). Then the brain was subsequently dehydrated via immersion in a graded series of ethanol mixtures (50%, 70% and 95% (vol/vol) ethanol solutions in distilled water) and the absolute ethanol solution three times for 2 h each at 4 °C. After dehydration, the whole brain was impregnated with Lowicryl HM20 Resin Kits (Electron Microscopy Sciences, cat.no. 14340) by sequential immersions in 50, 75, 100 and 100% embedding medium in ethanol, 2 h each for the first three solutions and 72 h for the final solution. Finally, each whole brain was embedded in a gelatin capsule that had been filled with HM20 and polymerized at 50 °C for 24 h.
The whole brain imaging is realized using a fMOST system. The basic structure of the imaging system is the combination of a line-scanning upright epifluorescence microscope with a mechanical sectioning system. This system runs in a line-scanning block-face mode but updated with the principle of chemical sectioning to obtain better image contrast and speed and thus enables high-throughput imaging of the fluorescent-protein-labelled sample (manuscript in preparation). Each time we do a block-face fluorescence imaging across the whole coronal plane (xy axes), then remove the top layer (z axis) using a diamond knife, and then expose next layer, and image again. The thickness of each layer is 1.0 µm. In each layer imaging, we used a strip-scanning (x axis) model combined with a montage in the y axis to cover the whole coronal plane 49 . The fluorescence, collected using a microscope objective, passes a bandpass filter and is recorded with a TDI-CCD camera. We repeat these procedures across the whole sample volume to obtain the required dataset.
The objective used is a 40× water-immersion lens with numerical aperture 0.8 to provide a designed optical resolution (at 520 nm) of 0.35 µm in the xy axes. The imaging gives a sample voxel of 0.35 × 0.35 × 1.0 µm to provide proper resolution to trace the neural process. The voxel size may vary for difference objectives. Other imaging parameters for GFP imaging include an excitation wavelength of 488 nm, and emission filter with passing band 510-550 nm. The fMOST is a two-colour imaging system. The green channel is used to obtain the complete morphology of neurons, and the red channel is used to obtain the cellular architecture information of propidium iodide (PI) staining 20 .

Complete neuronal morphology reconstruction
We used Vaa3D, an open-source, cross-platform visualization and analysis system, for the tasks of reconstructing neuronal morphologies. To efficiently and effectively deal with the mouse whole-brain imaging data, we incorporated several enabling modules into Vaa3D, such as Tera-Fly, TeraVR and a number of other supporting tools. TeraFly 50 supports visualization and annotation of multidimensional imaging data with virtually unlimited scales. A user can flexibly choose to work at a specific region of interest (ROI) with desired level of detail (LoD). TeraVR 51 is an annotation tool for immersive neuron reconstruction that has been proved to be critical for achieving precision and efficiency in morphology data production. It creates stereo visualization for image volumes and reconstructions and offers an intuitive interface for the user to interact with such data. Both TeraFly and TeraVR are seamlessly integrated in Vaa3D and can be used combinedly and flexibly. From reconstructions (in SWC file format), morphological quantification statistics is obtained to characterize neurons. A quality control process identifies errors based on morphological indicators and does corrections in a feedback setting. The quality control process then refines the skeleton location with Mean-Shift 52 and performs pruning focused on terminal location refinement. When needed, auto-refinement fits the tracing to the centre of fluorescent signals. The whole process ends with SWC resampling and registration. The final reconstruction of each neuron is a valid single tree without breaks, loops, multiple branches from a single point, and so on.

Registration to CCF
We used mBrainAligner based on BrainAligner 53 to perform 3D registration from fMOST images (subject) to the average mouse brain template of CCFv3 (target) (Extended Data Fig. 4a). The main steps are: (1) fMOST images were first downsampled by 64 × 64 × 16 (x × y × z) to roughly match the size of the target brain. (2) The stripe artefacts in fMOST images from diamond-knife cutting and the imaging process were eliminated by using log-space frequency notch filter. (3) The dense outer-contour feature points of target and subject brain (about 1,500 points per brain) were uniformly sampled from the brains' outer contour obtained using adaptive threshold, and then affine-aligned using a reliable landmark points matching algorithm to ensure the subject brain has the same position, orientation and scale as the target brain. (4) Intensity was normalized by matching the local average intensity of subject image to that of target image in a sliding window manner with patch size 41 × 41 × 41 and stride 1. (5) For the target brain, 1,744 landmarks corresponding to the points of high curvature (corners or junction of different brain compartments) in CCFv3 annotation image were detected via 2.5D Harris corner detector. On the basis of a combination of texture, shape context and deep-learning-derived features, mBrainAligner established the correspondence between target and subject brain by iteratively deforming these target landmarks to fit the subject image, and accomplished the local alignment using the smooth-thin-plate-spline (STPS). (6) Finally, to ensure the accuracy of registration, automatic registration results were examined in the semi-automatic registration module of mBrainAligner, and if necessary, the boundaries of the brain region were further optimized in a manual or semi-automatic way. Once images were aligned, the reconstructed neurons and somas were warped to CCF space using the generated deformation fields.

Processing single-cell morphology data
Pre-processing of SWC files: SWC files were processed and examined with Vaa3D plugins to ensure topological correctness: sorted single tree with root node as soma. Terminal branches < 10 pixels were pruned to remove artifacts. SWC files were resampled with a step size of 64 (x), 64 (y) and 16 (z) before registration.
Quantification of axon projection patterns: to analyse the distribution and amount of axon in brain-wide targets following registration to the CCFv3, we used a manually curated set of 316 non-overlapping structures at a mid-ontology level that are most closely matched in size or division. Ipsilateral and contralateral sides of brain regions were calculated separately.
Morphological features: axonal and dendritic morphological features, defined according to L-measurement 54  Local axons were defined as axon arbors within 200 µm from the somata. Local axons and dendrites were rotated based on principal component analysis (PCA) so dimensions were aligned with the largest to smallest spans. Then shifting was performed to localize somata at the origin of coordinates.

mBrainAnalyzer
The mBrainAnalyzer toolbox (also named neuro_morpho_toolbox), which was developed for analysis of full neuron morphology, includes multiple modules for feature quantification, arbor detection, statistical analysis and visualization. In addition to morphological features (for example, total length, angle of branches, and so on), this toolbox also quantifies projection intensities at branch length level and number of terminal levels. Using the arbor detection module, one can define sub-cellular components of a neuron as the granularity. Analysis and visualization can be performed at both whole-cell and arbor levels.

Arbor detection and partition
We detected and partitioned a series of neuronal arbors out of each neuron reconstruction using a graph-partition clustering method. First, as a neuron consists of a number of topologically connected reconstruction nodes, the neuron was viewed as a graph, where every reconstruction node (unit) in the neuron was connected with its parent node with an edge specified by the topological connection of the parent-child pair with the edge weight, or 'similarity' s, set to be the exponential of the negative 3D Euclidean distance, d, of these two nodes, that is, s = exp(−d). Then, we considered the normalized graph-cut method 55 to extract 'clusters' of reconstruction nodes so that the within-cluster 'total similarity' of nodes would be maximized and cross-cluster total similarity would be minimized. As a result, each such coherent cluster corresponds to one neuron arbor, which was also visually checked to ensure its correctness. Third, to automatically determine the number of such clusters, for a presumed number of clusters, we calculated the normalized score of total cross-cluster similarity divided by the total within-cluster similarity, followed by trial testing a range (from 2 to 8) of such presumed cluster-numbers to determine the optimal number that would minimizes this normalized score. In the final result, the detected arbor that contains the soma is called soma arbor; the remaining arbors are called non-soma arbors.

Feature quantification of cortical arbors
We divided the cortex into consecutive 100-µm thick coronal slices. Anchor points were evenly sampled along the outer border of each slice, with normal vectors perpendicular to the local cortical surface Article and pointing to the inside of the brain. Nodes of arbors were assigned to their neighbour anchors and projected onto the surface by corresponding normal vectors. Depth of nodes were determined by the length of projection along normal vectors. We also estimated the area of an anchor by their distance to neighbour anchors and slice thickness. The 2D cortical area of an arbor was determined by the total areas of unique anchors occupied by its nodes. To determine the radius of an arbor, we assigned arbor 'centre' as the node that has the shortest average distance to other nodes. Radius was determined by a growing sphere until 70% of segments were inside it. For neurons with tufted apical dendrite, we vertically shifted the arbors, so the top of apical dendrites reached L1. We manually confirmed that all tufted apical dendrites reached L1 in the original image.

Clustering of cortical axon arbors of core-type thalamic neurons
For local (soma-neighbouring) arbors, the following features were used for clustering: '2d_area', 'total_length', 'radius', 'depth_mean', 'depth_std'. We performed PCA to reduce the effect of noise. Top principal components (PCs) were selected to recover 95% of variance. We applied uniform manifold approximation and projection (UMAP) dimension reduction using the Python package 'UMAP' 56 . The 'n_neighbors' parameter was set at 15. k-Means clustering was performed using the UMAP embeddings as input.

Neuron-beta
We developed the Neuron-beta metric by borrowing the concept of the beta value from the finance field 57 . For each group, defined by brain areas and/or cortical layers, we calculated the average of mesoscale experiments as M = [m 1 , …, m p ], p = number of brain areas. For one single cell S = [s 1 , …, s p ], we define the neuron-beta value as:

Clustering of cortical L5 ET neurons
Projection strength is defined as ln(axon length in mm). Strength values for regions with axon length below 1 mm were set as 0. Only non-cortical areas were included. Dimension reduction was performed by PCA followed by 2D UMAP. Top PCs which explained > 90% variance were used as input of UMAP. Hierarchical clustering was performed using UMAP embeddings. Minimum branch length for clusters was manually determined.

Clustering of cortical L6 Car3 and claustral neurons
Data normalization: morphological features were normalized by the mean and standard variation in a feature-wise manner. Projection pattern features were defined as ln(axon length in mm). For regions with axon length below 1 mm, projection pattern feature values are set as 0. Soma locations were flipped to the same hemisphere. Similarity metrics: for each feature set, we first calculated the Euclidean distance matrix. Then a ranked k-nearest neighbour (KNN) matrix was created. We then applied the shared nearest neighbor (SNN) approach to measure the similarity between each pair of samples x i and x j . The SNN metric was defined as the maximum average rank among their common neighbours: where NN is nearest neighbour and v is a neuron from the dataset. Similarity scores were set as 0 for pairs with non-overlapping KNN sets and a weighted SNN graph was created. Co-clustering analysis: the co-clustering matrix for each feature set was calculated by iterative random sampling. During each iteration, 95% of samples were randomly selected to create an SNN graph. We then applied the Fast-greedy community detection algorithm using the Python package python-igraph for clustering assignment. For each pair of samples, the co-clustering score was defined as the times of co-clustering normalized by the iterations of co-occurring. Resampling was performed 1,000 times to reach saturation. The overall co-clustering matrix is a weighted average of the four feature sets. Agglomerative clustering was performed on the co-clustering matrix to get clusters.
Outlier removal: outliers were detected by comparing the Euclidean distance between a sample and the other samples with the same cluster identity. We used overall within-cluster distance as the background distribution. Samples with significantly higher (one-sided Mann-Whitney test) within-cluster distance were filtered out as outliers. Agglomerative clustering was performed for the remaining co-clustering matrix. This process iterated until no new outlier could be detected.
Characterization of cell types: for each feature set, we performed two-sided Mann-Whitney tests: claustrum versus cortical neurons; each cluster versus other clusters. P-values were adjusted by Bonferroni correction.

Anterograde tracing and retrograde labelling
For anterograde projection mapping, we injected AAV2/1-pCAG-FLEX-EGFP-WPRE-pA into CLA, SSs or SSp of Gnb4-IRES2-Cre or Gnb4-IRES2-CreERT2 mice at P37-P65. Stereotaxic injection procedures were performed as previously described 10 . For the Gnb4-IRES2-CreERT2 mice, tamoxifen induction was conducted one week after injection at full dose (0.2 mg per g body weight) for 5 consecutive days. Mice survived 3 weeks (or 4 weeks for the tamoxifen-induced mice) after injection, and brains were perfused and collected for TissueCyte imaging.
scRNA-sequencing, clustering and mapping Cells from transgenic mice or transgenic mice injected with retrograde tracers were collected by microdissection of different cortical regions. Single-cell suspensions were generated and cells were collected using fluorescence activated cell sorting (FACS). FACS gates were selective for cells with fluorescent protein expression from transgenic and/or viral reporters.
Cells were then frozen at −80 °C and were later processed for scRNA-seq using the SMART-Seq v4 method 4 . After sequencing, raw data was quantified using STAR v2.5.3 61 and were aligned to both a Ref-Seq transcriptome index for the mm10 genome, and a custom index consisting of transgene sequences. PCR duplicates were masked and removed using the STAR option bamRemoveDuplicates. Only uniquely aligned reads were used for gene quantification. Gene read counts were quantified using the summarizeOverlaps function from R GenomicAlignments package (RRID: SCR_018096) 62 using both intronic and exonic reads, and quality control was performed as described 4 .
Clustering was performed using in-house developed R package scrattch.hicat (available via GitHub: https://github.com/AllenInstitute/ scrattch.hicat). The Retro-seq cells where mapped to the most correlated cell types in the Cortex/HPF taxonomy 31 based on a set of 5,981 cell-type-specific markers using the map_sampling function from the scrattch.hicat package. Only the SMART-Seq dataset from the reference taxonomy is used for mapping. All the cells from CLA were mapped to the Car3 subclass. However, CLA cells were not included as part of the Cortex/HPF taxonomy. To examine more closely the cell-type diversity, we re-clustered all the original SMART-Seq cells within the Car3 subclass together with the mapped cells from CLA (Supplementary Table 4; CLA is part of CTXsp), resulting in 8 clusters. The cortical and CLA Retro-seq cells previously mapped to the Car3 subclass were then re-mapped to the new clusters, using 277 marker genes that discriminate these 8 clusters.

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Data availability
The fMOST image datasets (https://download.brainimagelibrary.org/ biccn/zeng/luo/fMOST/) of all mouse brains used in this study, as well as the original and CCFv3 registered single neuron reconstructions (https://doi.org/10.35077/g. 25), are available at BICCN's Brain Image Library (BIL) at the Pittsburgh Supercomputing Center (www.brainimagelibrary.org). The single-neuron reconstructions, the CCFv3 registered version of these reconstructions, as well as 3D navigation movie gallery of these data are available at SEU-ALLEN Joint Center, Institute for Brain and Intelligence (https://braintell.org/projects/ fullmorpho/). Mesoscale AAV-tracing data (including high resolution images, segmentation, registration to CCFv3 and automated quantification of injection size, location, and distribution across brain structures) are available through the Allen Mouse Brain Connectivity Atlas portal (http://connectivity.brain-map.org/). Expression patterns of transgenic mouse lines can be found in the Allen Transgenic Characterization database (http://connectivity.brain-map.org/transgenic/search/basic). Retro-seq SMART-Seq v4 data are available at the NCBI Gene Expression Omnibus (GEO) under accession GSE181363.

Article
Extended Data Fig. 2 | Sparse, robust and consistent labeling and visualization of the dendritic and axonal arborizations of a wide range of neuronal types. Images shown are 100-µm maximum intensity projection (MIP) images (i.e., projected from 100 consecutive 1-µm image planes). Arrowheads indicate observed terminal boutons at the end of the axon segments. Number of fMOST imaged brains per mouse line and tamoxifen induction conditions are shown in Supplementary Table 1. Scale bars, 100 µm. a, Cortical L2/3/4 IT neurons and their extensive local axon collaterals clearly labeled in a Cux2-CreERT2;Ai166 brain. b, Cortical L5 IT neurons and their local axon collaterals seen in a Plxnd1-CreER;Ai166 brain. Striatal medium spiny neurons (STR MSN) are also sparsely labeled, and their individual axons are clearly seen in substantia nigra, reticular part (SNr). c, Cortical L5 ET neurons and their sparse local axon collaterals seen in a Fezf2-CreER;Ai166 brain. d, Cortical inhibitory basket cells (BC) and translaminar basket cells (t-BC), as well as L5 ET excitatory neurons, seen in a Pvalb-T2A-CreERT2;Ai166 brain. The L5 ET neurons form driving-type axon clusters with large boutons in the thalamus (TH). e, Cortical L6 CT neurons and their characteristic apical dendrites not reaching L1, as well as local axon collaterals and long-range axon projections into thalamus (TH), labeled in a Tle4-CreER;Ai166 brain. f, Cortical L6b neurons and their local axon projections up into L1 seen in a Nxph4-T2A-CreERT2;Ai166 brain. g, Gnb4+ claustral (CLA) and cortical (L6 Car3) neurons with their widely dispersed axon fibers seen in a Gnb4-IRES2-CreERT2;Ai140;Ai82 brain. h, Cortical inhibitory Martinotti cells (MC) and hippocampal CA1 OLM cells labeled in a Sst-Cre;Ai166 brain. i, Thalamic projection neurons (TH PN) with their dense axon terminal clusters in cortex seen in a Tnnt1-IRES2-CreERT2;Ai82;Ai140 brain. j, In a Vipr2-IRES2-Cre-neo;Ai166 brain, axon clusters from projection neurons in visual thalamic nuclei are seen in cortex (CTX), and a cortical chandelier cell (ChC) is fully labeled with its characteristic axonal branches. Vipr2-IRES2-Cre-neo;Ai166 also labels axons consistent with projections from retinal ganglion cells 63 , which are not shown here. k, Noradrenergic neurons labeled in the locus ceruleus (LC), and their long-range axon fibers seen in CTX and hypothalamus (HY) in a Dbh-Cre_ KH212;Ai166 brain. l, Serotonergic neurons labeled in the dorsal raphe (DR), and their long-range axon fibers seen in hippocampus (HIP) and CTX in a Slc6a4-CreERT2_EZ13;Ai166 brain. Overall, it is apparent that these neurons display a remarkable array of dendritic and axonal morphologies. Specifically, in these sparsely labeled brains, cortical IT, ET and CT neurons not only have primary long-range projections but also local axonal branches that are well segregated and clearly identifiable, enabling truly complete reconstruction of the entire local and long-range, cortical and subcortical axonal arborization (a-e). L5 ET neurons form the 'driving' type of synapses in the thalamus 34,64 , which have enlarged and intensely fluorescent boutons (d). L6b subplate neurons extend their local axon collaterals upwards into layer 1 (f). The axons of thalamic projection neurons form either dense or dispersed clusters in the cortex (i, j). On the other hand, claustral, noradrenergic and serotonergic neurons have widely dispersed, thin axons that are nonetheless well labeled (k, l). One can also clearly see individual axons in the substantia nigra from striatal medium spiny neurons (b), as well as dense and fine local axonal branches of a variety of cortical interneurons (e.g., basket cells, Martinotti cells and chandelier cells) (d, h, j). Of note, sparsely labeled neurons were frequently observed in other regions of the brain for all of these crosses but are not described in detail here. Each of these brains contains ~100-1,000 labeled neurons (Supplementary Table 1). Thus, tens of thousands of neurons could be reconstructed from these and newly generated datasets in the coming years. The whole brain image datasets are publicly available as a unique resource for the community. Fig. 3 | Platform and workflow of the brain-wide complete morphology imaging, reconstruction, registration and analysis pipeline. Each fMOST dataset is first converted to a multi-level navigable TeraFly dataset using TeraConverter, the data formatting tool in the Vaa3D-TeraFly program 50 , which allows smooth handling of terabyte-scale datasets. Neuron visualization and reconstruction is then carried out on the TeraFly files. A series of tools, especially those based on the "Virtual Finger" method 65 , were developed within Vaa3D to facilitate semi-automated and manual reconstruction. Further, a virtual reality (VR) environment created within Vaa3D, TeraVR, significantly enhances a user's ability to see the 3D relationships among intertwined axonal segments, improving precision and efficiency of reconstruction 51 . Annotators work in the TeraVR annotation system to reconstruct the full morphology of each neuron. After quality control (QC) and manual correction, Vaa3D's deformable model is used to automatically fit the tracing to the center of fluorescent signals. The final reconstructed morphology is completed as a single tree without breaks, loops, or trifurcations. All these data processing, reconstruction, and workflow control processes are managed using a specific software system for massive scale data production. In parallel, each fMOST dataset is registered to CCFv3 using mBrainAligner, which uses both CLM (Coherent-Landmark-Matching) and LQW (Little-Quick-Warp) modules in brain alignment. Following registration of the whole-brain image dataset to CCFv3, all the reconstructed morphologies from the same brain are also registered for subsequent visualization and quantitative analysis. Registration to CCFv3 enables digital anatomical delineation and spatial quantification of each reconstructed morphology and its compartments (e.g., soma, dendrites, axon arbors). Since neurons are reconstructed from different brains, co-registration to the CCFv3 allows them to be compared and analyzed using a unified framework, mBrainAnalyzer, which automatically detects the arbors of each neuron followed by mapping of these dendritic and axonal arbors onto the standardized CCFv3 space. Morphological features such as length, depth, area, etc., at the whole neuron level are also computed for each arbor-domain for analysis.

Corresponding author(s): Hanchuan Peng, Hongkui Zeng
Last updated by author(s): Aug 15, 2021 Reporting Summary Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency in reporting. For further information on Nature Research policies, see our Editorial Policies and the Editorial Policy Checklist.

Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.

n/a Confirmed
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly The statistical test(s) used AND whether they are one-or two-sided Only common tests should be described solely by name; describe more complex techniques in the Methods section.
A description of all covariates tested A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals) For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors and reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.

Data
Policy information about availability of data All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: -Accession codes, unique identifiers, or web links for publicly available datasets -A list of figures that have associated raw data -A description of any restrictions on data availability Data availability statement is provided in the manuscript.