Learning cellular morphology with neural networks

Reconstruction and annotation of volume electron microscopy data sets of brain tissue is challenging but can reveal invaluable information about neuronal circuits. Significant progress has recently been made in automated neuron reconstruction as well as automated detection of synapses. However, methods for automating the morphological analysis of nanometer-resolution reconstructions are less established, despite the diversity of possible applications. Here, we introduce cellular morphology neural networks (CMNs), based on multi-view projections sampled from automatically reconstructed cellular fragments of arbitrary size and shape. Using unsupervised training, we infer morphology embeddings (Neuron2vec) of neuron reconstructions and train CMNs to identify glia cells in a supervised classification paradigm, which are then used to resolve neuron reconstruction errors. Finally, we demonstrate that CMNs can be used to identify subcellular compartments and the cell types of neuron reconstructions.

A dvances in volume electron microscopy (VEM) have led to increasingly large three-dimensional (3D) images of brain tissue, making manual analysis infeasible 1 . Multibeam scanning electron microscopes 2 and transmission electron microscopes equipped with fast camera arrays can now generate data sets exceeding 100 TB 3 , a development that was fortunately accompanied by substantial progress in neuron reconstruction [4][5][6][7][8][9] and the automatic analysis of synapses [10][11][12][13] . These advances have enabled automatic morphology analyses on the neuron (fragment) scale, which were previously restricted to direct segmentation error detection 5,14 or the use of manual skeletons with data-specific hand-crafted features 11,15,16 . Cell types and other biological properties that can be inferred from the morphology of a neuron are required for the interpretation of a connectome 11 and can also constrain automatic neuron reconstruction itself 17 .
Outside of the field of connectomics, many approaches have been developed for automated 3D shape analysis, including multi-view two-dimensional (2D) projection-based neural network models 18,19 as well as voxel-and point cloud-occupancybased 3D networks 20,21 . Interestingly, projection-based models often appear to outperform 3D architectures, possibly because of higher-resolution input due to decreased model complexity 18 .
Here we present cellular morphology neural networks (CMNs), which use multi-view projections to enable the supervised and unsupervised analysis of cell fragments of arbitrary size while retaining high resolution. First, we demonstrate that CMNs can be used to automate morphology feature extraction itself by inferring low-dimensional embeddings, dubbed Neuron2vec, through unsupervised triplet-loss training 22,23 . Second, we apply CMNs to the supervised classification of glia cells and use these data to demonstrate the feasibility and effectiveness of a simple top-down false merger resolution strategy. Third, we identify neuronal cell types and compartments, outperforming methods with hand-crafted features based on skeleton representations on the same data, and finally perform high-resolution cell surface segmentation to identify dendritic spines.

Results
Cellular morphology learning networks. CMNs are convolutional neural networks (CNNs) optimized for the analysis of multi-channel 2D projections of cell reconstructions, inspired by multi-view CNNs for the classification of objects fitting into projections from one rendering site 18,19 .
Su et al. 18 rendered views of an entire object, potentially sacrificing crucial detail when applied to reconstructions of entire neurons (or very large objects in general), which can have processes as thin as 50 nm extending over millimeters 24 . To address this problem, we used a sampling algorithm that homogeneously probes an entire cell at many locations ("Methods"; Fig. 1a, b) that are then analyzed either individually or in combination by a CMN. The neuron reconstructions were taken from a songbird basal ganglia data set and consisted of flood-filling network (FFN)-created supervoxels 25 (SVs), which were agglomerated to sets of super-SVs (SSVs) 8 , each corresponding to a single neuron. A rectangular field of view (FoV) was chosen for the projection to capture the elongated shape of most neuronal arbors more effectively. Additionally, we incorporated image channels beyond the cell shape (here represented through depth-map projections) of the same rendering perspective. This extension allows the CMN to analyze the geometry and density of objects contained in a cell, such as mitochondria and other organelles (Fig. 1c, d).
We implemented the view rendering engine with OpenGL and the neural network models using the ElektroNN neural network library (www.elektronn.org) and adapted the SyConn pipeline (https://github.com/StructuralNeurobiologyLab/SyConn/; see Supplementary Table 1 for a timing overview). The models were trained using standard loss optimization procedures (Adam or stochastic gradient descent) on various supervised and unsupervised tasks (see Fig. 1e for an application overview), which are described in the following to demonstrate the versatility of the approach.
Neuron2vec embedding. Supervised models often require hardto-obtain manual ground truth, making alternative objective functions and models based on intuitive isomorphisms of the underlying data desirable. We trained a CMN using triplet loss 23 to learn a latent space (embedding; dimension d z = 10) of single renderings based on the similarity of three inputs x ref , x + , and x − . At every location, two renderings (see above; φ = 50°) were generated that served as a similar pair (x ref and x + ). In contrast, a single rendering at a different, randomly sampled location was used as the dissimilar example x − ("Methods"). During training, the views were sampled from 372 cell or cell fragment reconstructions (181.02 mm; 19.95 giga voxel (GV); 32,324 µm 3 ).
We explored the information content of the embedding through inspection of clusters in a 2D t-SNE 26 projection (Fig. 2a) of an example cell reconstruction ( Fig. 2b; colors as in Fig. 2a; see "Methods") and by fitting a k-nearest neighbor classifier (kNN; k = 5; uniform weights) to the Neuron2vec encoding extracted from a set of neurites with cellular compartment ground truth annotations (same as in "Cellular compartment identification"; total path length: 30.16 mm; 3.05 GV; 4947 µm 3 ).
The predictive performance of the kNN classifier on the triplet-CMN (t-CMN) latent vectors of a cellular compartment test set (same as in "Cellular compartment identification"; 20.75 mm; 1.31 GV; 2130 µm 3 ) was particularly low for dendrites (F1-score for dendrite: 0.565, axon: 0.686, soma: 0.942; Supplementary  Fig. 1). We were able to increase the classification performance by sampling the similar view x + from close-by rendering locations during training as well (two and eight additional rendering locations N r = 3 and 9), which had a smoothing effect across the inferred latent vectors of adjacent rendering locations (Supplementary Fig. 2; F1-score for dendrite, axon, soma with N r = 3: 0.751, 0.767, 0.951; N r = 9: 0.804, 0.800, 0.951). Increasing the latent space dimension to d z = 25 had little impact on the model's performance (F1-score for dendrite, axon, soma with N r = 3: 0.728, 0.756, 0.957; N r = 9: 0.797, 0.808, 0.967).
Axons, or thin processes in general, and soma regions could be readily identified based on a color projection of the embedding using principal component analysis (PCA; Fig. 2 and Supplementary Fig. 2), whereas, for example, spiny dendrites showed a heterogeneous and less continuous spectrum (inset Fig. 2b). Interestingly, views did not only group by compartment type but also formed sub-clusters separating views with cell objects (such as mitochondria and synaptic junctions) from those without (first and second column in Fig. 2c), providing an explanation for the low kNN performance.
We next took a supervised end-to-end approach and examined ground truth-based CMN models.

Glia detection and top-down segmentation error correction.
Owing to the dense heavy-metal staining used for VEM data, the acquired images contain all cell types, including astrocytes and other glia types, which are usually not considered for connectomic analysis. Astrocytes have tight surface contacts with neurons to supply them with nutrients and regulate the local environment 27 , making astrocyte-neurite mergers a problem in automated reconstruction pipelines. While it can be difficult for humans to determine from the raw EM data whether a process belongs to a glia cell, it seemed straightforward to solve this classification task using the established multi-view representation due to their distinct shape (Fig. 3a, Supplementary Fig. 3 1 Multi-view generation for a given neurite reconstruction and model architecture. a Flood-filling network reconstruction and agglomeration of a dendrite, with supervoxels individually colored. b Sampled multi-view projection locations indicated as red spheres. c Two-dimensional projection of the whole neurite reconstruction with cell organelles (blue: mitochondria; red: synaptic junctions). d Each location was rendered from two different perspectives, one orthogonal to the first and second principal component (p.c.), the other one rotated by φ around the first p.c., i.e., orthogonal to the first and third p.c. (here: φ = 90°). e Multi-view fingerprints were extracted from the neuron reconstruction and served as input for one unsupervised and three supervised convolutional neural network-based applications, from left to right: Neuron2vec latent vector inference from single views to encode morphology; (multi-) class probability from multi-views to infer cellular compartments and glia fragments; multi-class prediction of many multi-views (Kviews) for morphological cell-type identification; high-resolution semantic segmentation on single views to identify individual spines. Scale bars are 10 µm in a, e and 2 µm in d Can we exploit the excellent glia classification results to reduce the rate of false mergers in a segmentation? The naive solution would be the simple removal of all SVs classified as glia fragments, but this approach could induce large-scale false splits in the case of false positive classifications ( Supplementary Fig. 6a). We therefore developed a more sophisticated splitting heuristic, ensuring that large neuron fragments remained connected in the presence of small misclassifications.
The agglomeration of SVs can be represented as a graph, connecting adjacent SVs (nodes) of the same SSV (neurite) with edges. In this graph, glia predictions were used as node properties to identify connected glia and neuron components with their respective sizes. Sufficiently large (BBD ≥8.0 µm) glia components were removed and added to the glia graph, while small glia components (BBD <8.0 µm) remained in the original SSV graph. The resulting connected components of the graph were stored as individual glia and neuron SSVs, respectively (Fig. 3c, d; see "Methods").
We first evaluated the splitting procedure by assessing how many SVs remained wrongfully assigned and manually labeled the SVs of 12 neurite reconstructions as ground truth (N SV : 616; 5161 μm 3 ). As before, our approach showed a significantly higher performance when classifying large SVs and after the splitting procedure, almost the entire glia share of the original SSVs was removed (average F1-score with volume weights: 0.995, neuron: fraction of about 0.154. It should be noted that the EM data set was prepared so as to preserve the extracellular space, which may change the morphology of the glial processes 28 . While this evaluation already showed that the splitting heuristic successfully removes large volume fractions of glia from neurite SSVs, it is necessary to assess whether it could do so without introducing too many new false splits. Note that the underlying FFN segmentation has already an exceptionally low false merger rate, which was evaluated in depth in Januszewski et al. 8 . Consistent with this, the splitting procedure created only 882 additional SSVs from the 181 affected SSVs in the entire data set. We therefore inspected 180 SSVs (one very large SSV with 600,924 SVs was evaluated separately by sampling, see "Methods") and their resulting connected components (470 SSVs) to estimate how many false mergers were resolved at the cost of introducing new false splits into neuron SSVs, which would require manual fixing. About half (55%; 122 out of 222 neuron SSVs) of the newly created neurite SSVs were affected by a new false split, which severed them off of the originally correct (regarding false splits) neurite. In 7% (16 out of 222 neuron SSVs), the procedure removed only a small neuron component (e.g., parts of the soma-internal Golgi falsely classified as glia fragment). The other 38% (84) of neurite SSVs were correctly recovered meaning that all their false mergers with glia, or with other neurons bridged through a glia cell, were removed without the introduction of a new false split (see Fig. 3d and Supplementary Fig. 6f for examples). These numbers were comparable on the sampled SSVs, which originated from the very large SSV (58% affected by new false splits, 42% correct). While we could not accurately measure the remaining neuron-glia false merger rate after splitting due to the effort involved in inspecting a large part of the data set to estimate the rate of these rare events (inspection of a random sample of 100 SSVs revealed no glia merge errors), any remaining glia merge error is expected to be small (volume F1-score 0.995, Fig. 3e).
We then attempted to reconstruct the 27 astrocytes identified in the data set starting at their somata and assigning each glial fragment to the closest astrocyte soma (see "Methods"), justified by reports that astrocytes establish roughly spherical, largely nonoverlapping territories 29 . While the CMN-based glia identification appears promising in principle (see Supplementary Fig. 7 for automatically extracted glia-blood vessel contacts), our approach likely merged arbors from other glia cells because they reach into the data set from the outside.
Neuron-type classification. Similar to glia cells, many neuronal types can be identified based on their morphology 15 , a feature that was used well before connectivity-based methods 30 and other approaches (reviewed in ref. 31 ). We recently demonstrated on the same songbird data set that a feature-based method with random forest classifiers (RFCs) can be used to identify the four main cell types, excitatory axons (EA), medium spiny neurons (MSN), pallidal-like neurons (GP), and interneurons (INT) 11 . However, manual neurite skeletons and hand-designed feature vectors were required as basis for this classification method.
In contrast to the classification of astrocytes, neuronal-type classification is less successful when using only a local view (i.e., spatially focused) of a neurite (F1-scores for single views: 0.885 and after majority vote: 0.891; training set: 145.98 mm, 17.21 GV, 27,872 µm 3 ; test set: 65.04 mm, 7.31 GV, 11,843 µm 3 ). A simple solution would be to increase the FoV for a single view, capturing the entire extent of a neuron, similar to the proposed object classification method by Su et al. 18 . However, this approach reduces the resolution for a given view size, obscuring potentially important details. Alternatively, local views from different locations can be sampled at random and combined to a global representation (multi-views of size N, further called N-views; Fig. 4a) by the neural network model. This approach does not sacrifice resolution and increased the classification performance substantially (N = 10 views: F1-score of 0.987, and 0.970 on SSV; Fig. 4b). It should be noted that lower-resolution, zoomed out views might still be beneficial or view location information additionally fed into the network but we did not explore this further. Interestingly, a greater number of sample locations did not necessarily increase performance (F1-score for N = 60: 0.984, SSV: 0.957; Fig. 4b), possibly due to increasing model complexity and decreasing diversity of the training data through fewer independent view samples per neuron. This result was consistent with the observation that the N-view F1-scores throughout all Ns were significantly lower without shuffling (views were analyzed in the order they were created; Fig. 4c), which led to spatially correlated N-view sets, i.e., they likely consisted of views from a single compartment type only. For all N, additional models were trained to predict sets of views without cell organelle channels, which, in agreement with our previous results using RFCs 11 , reduced the F1-score on SSV-level substantially (e.g., F1-score reduction of 0.088 for N = 20 with majority vote; Fig. 4b).
Cellular compartment identification. We next attempted to analyze the FFN neuron segmentation by identifying subcellular compartments (axon, dendrite, and soma) at single multi-view locations (Fig. 5a).
Similar to the glia model, a reduction of the original FoV of 8 × 4 × 4 to 2 × 1 × 1 µm 3 reduced the performance (F1-score on validation set of 0.996 with original views vs. 0.913), while a fourfold downsampling of the multi-views had almost no effect (reduction of 0.014, Fig. 5b). Not surprisingly, the performance of the soma class was barely affected by this or any other changes to the input, whereas the discrimination between axons and dendrites was strongly dependent on cell organelle information (F1-score reduction by 0.08 with exclusion; Fig. 5a, b).
The performance of the CMN approaches were directly compared with the skeleton-based RF classification developed by us previously 11  High-resolution semantic segmentation of neurite surfaces. The neuron classification described thus far is restricted in its spatial resolution to the minimum size of at least one view rendering (8 × 4 × 4 µm). This property makes the approach suitable for the identification of neuronal compartments on a larger scale (axons, dendrites, and somata), a situation in which additional spatial context is helpful (Fig. 5b). However, this approach does not allow the semantic segmentation of morphology smaller than a single rendered view. Higher-resolution classification requires a dense analysis of the rendered views. Similar to the approach taken by Boulch et al. 32 , we solved the resulting image-to-surface mapping problem by rendering the cell views with spatially subdivided unique colors 33 , thereby creating an efficient reversible mapping between the 2D view space and the 3D surface (Fig. 6a, b). We then trained a -FCN-VGG 34,35 based model on the identification of dendritic spines (the training set contained five MSN reconstructions; 12.59 mm; 1.01 GV; 1652 µm 3 ), a classification problem that was previously solved on manually traced skeletons, which made the automated identification of spines easy 11,36 . Automatically generated skeletons or surface meshes do not have the advantage that many skeleton endings are dendritic spines, which is likely a result of the implicit knowledge of human annotators. Instead of evaluating the classification performance on sampled locations, as done by us previously 11 (vertex-based evaluation in Supplementary Note 1), we tested the performance on a test set of 182 manually annotated synaptic contacts that were morphologically classified (see "Methods") as a spinous synapse (N = 88) or dendritic shaft synapse (N = 94) and obtained an F1-score of 0.978 (prec. 0.978, recall 0.978, F1-score spine head only 0.977; 2 views per location and k = 20). Interestingly, the classification performance did not improve further with more views (F1-score for k = 20 and 6 views: 0.978), likely because the PCA view alignment along the dendritic process already optimized the coverage. Note that the coverage saturates below 1.0 due to non-surface triangles (see Fig. 6c).

Discussion
We demonstrated that CMNs are a versatile tool to analyze reconstruction fragments or entire cell reconstructions. Our approach adapts to the neurons' sparse coverage of the 3D volume by using distributed 2D multi-view projections instead of dense 3D models that end up processing many empty voxels 5,14,18,20 . While mainly developed for the analysis of neuron reconstructions, our proposed view-sampling method seems generally beneficial for the analysis of 3D objects that span large distances but still require high-resolution representations, as demonstrated by SnapNet which was developed independently for the semantic labeling of point clouds 32 .
By combining the concept of CMNs with unsupervised triplet loss training 22 , we created Neuron2vec embeddings that could serve as the basis for an unbiased morphological comparison of cells and cell types without requiring hand-designed features or allow neuron database queries using example neurites 16 . Additionally, the embedding can be used for visualization purposes, e.g., through coloring (Fig. 2), making morphologically different regions salient. The triplet-loss embedding could be compared to alternative unsupervised training paradigms (e.g., autoencoders 37,38 or generative query networks 39 ), to evaluate whether the excellent supervised training results can be reached. CMNs showed improved performance in cell compartment classification in comparison to the previously used hand-designed features 11 and are likely more generalizable, since the morphology does not need to be parameterized first.
Based on the classification results of the glia CMN, we built a simple glia removal and merge error resolution algorithm, which could split neuron-glia mergers, but introduced additional false splits in about half of the newly created neurite components. However, these errors are considered to be easier to correct 8,40 and more importantly, their location is known, making guided proofreading possible. We would like to note that the other inferred cell types and neuronal compartments could be used in a similar manner or in combination as input to a more powerful graph cut segmentation algorithm, as proposed by Krasowski et al. 17 .
Another application could be to directly evaluate the shapeplausibility of neurite fragments to detect errors 5,14 or to estimate the probability that separate SVs should be combined 41 . Applications requiring high-resolution segmentation (e.g., the localization of reconstruction errors) should especially benefit from the cell surface analysis that we used here for the classification of postsynaptic dendritic morphology with excellent performance Methods EM data and used segmentation. The analyzed EM data set (Area X, adult male zebra finch, >120 days post hatching) was acquired by J.K. through serial block-face scanning electron microscopy as reported previously 8,11 and had an extent of 96 × 98 × 114 µm 3 with an xyz-resolution of 9 × 9 × 20 nm 3 (total of 664 GV; 10,664 × 10,914 × 5701 voxel). The animal experiment was approved by the Regierungspräsidium Karlsruhe and performed in accordance with the laws of the German federal government. Meshes and skeletons were based on a FFN segmentation by M.J. and V.J. 8 , including the over segmentation (all SVs) and the postagglomeration SV graph (defines SSVs) combined with manual reconnects of orphan neurite fragments.
Local scene rendering. We used SV triangle meshes to efficiently render depth maps with PyOpenGL (http://pyopengl.sourceforge.net/) and EGL (for off screen rendering). For rendering, the model view matrix was rotated such that the first axis of the view was parallel to the main principal component of the object and clipped to an extent of 8 × 4 × 4 µm 3 . The rendering had a throughput of about 58 and 25 multi-views/s without and with cell organelles (see Supplementary Table 1) on a single CPU core with GPU acceleration, making it scalable in a cloud or highperformance computing environment.
PCA was applied to a subset (0.125) of the vertices within the clipping box that yielded the axes with highest variance (x, y, and z; decreasing variance) in good approximation. In this system, xy and xz represent the planes with the highest spatial variance, allowing alignment with the elongated neurite structure. By rotating around the x axis, orthographic depth-map projections of size 256 × 128 pixels were rendered and stored as unsigned 8-bit integer. For every SV, rendering locations were obtained by calculating its vertex density ρ in a grid of voxels with size 2 × 2 × 2 µm 3 . In order to not oversample dense regions, the center coordinate of every voxel with ρ > 0 was used to calculate the mean of the vertices within a radius of 1 µm. The resulting set of coordinates, effectively a downsampled point cloud, was stored as the SVs rendering locations.
Rendering cell organelles. Three additional channels were generated next to the 2D depth-map views of the cell shape which contained the rendering of mapped cell organelles. Cell organelles (mitochondrion (MI); synaptic junction (SJ); vesicle cloud (VC)) with a relative overlap with SSVs above or within a certain threshold range (MI: >0.5, VC: >0.5, SJ: 0.2-0.8) were mapped to the SSV. Only objects with a minimum size (number of voxels) were considered during the overlap calculation (SJ: 498; VC: 1584; MI: 2786). Cellular organelle predictions were generated using 3D CNNs with ground truth volume details provided in Supplementary Table 4  Meshes for the associated objects were extracted from a Gaussian-smoothed (σ = 1), distance-transformed (https://ukoethe.github.io/vigra/) binary 3D mask with marching cubes (contour value of 0; scikit-image http://scikit-image.org/).
The cell object meshes were rendered from the same perspective and resolution (256 × 128 pixels) as the corresponding SV views. The location's fingerprint finally consisted of the rotated views, each with four channels. We decreased the average edge length in the skeleton representations to approximately 150 nm by removing skeleton nodes (ignoring branch and end nodes). Nodes of degree 2 were removed and replaced by a single edge if the summed length λ of the adjacent edges was below a threshold (λ = 50 nm) or if the dot product of their edges was >0.8 in combination with λ = 500 nm.
At every node, the cell radius was estimated by the median of the distance to the ten nearest vertices of the mesh. Total path lengths were calculated as the sum of all edges.
Multi-view models for type classification. Multi-views were sampled from the joint set of SV meshes of an entire SSV, and renderings generated at locations as described above.
To overcome RAM limitations, large SSVs (>10 4 SVs) were processed as subgraphs, defined by a breadth-first-search (extending 40 nodes) on the SV graph and starting at each SV. Multi-views were generated from the joint meshes of the 40 SVs at the sampled locations of the source SV.
If not stated otherwise, hyper-parameters were chosen to be: batch size: 20 dropout rate 44 Supplementary Table 1).
Neuron2vec embedding. The architecture of the CMN encoder was used to learn a projection from the single view space ℝ 256×128×4 to a lower-dimensional latent space (embedding) ℝ d (d dimensions) based on the triplet loss described in ref. 23 . Its architecture was identical to the one described in "Multi-view models for type classification," whereas max pooling was removed from conv. L2 and L6, dropout was restricted to conv. L3-L5, the f.c. L2 and the softmax from layer f.c. L3 were removed. The objective function was to keep the distance of the reference x ref to the similar input x + below the distance of x ref to the dissimilar input x − . For the two similar views x ref and x + , we used the two views of the same rendering location (rotated by φ = 50°), while the dissimilar view x − was sampled randomly from a different SSV. The clipping volume was set to 8 × 4 × 8 µm 3 . In order to take strong similarities of adjacent rendering locations into account, the view-sampling during training was adapted to sample similar views also from close-by rendering locations (k-nearest-neighbors with k = 2 and k = 8) instead of only using the rotated view at the same rendering location (Supplementary Fig. 2).
The loss was defined as L α>0 ¼ α þ λ 2 and L α 0 ¼ λ 2 , with α ¼ r þ À r À þ λ 1 , r þ ¼x ref Àx þ 2 , r À ¼x ref Àx À j j 2 , and λ 1 = 0.2 being a parameter to control the margin between data points.x & R d represents the latent space of the triplet net. The second regularization term λ 2 was the mean norm of the reference, similar, and dissimilar view and acted as a counterpart to λ 1 by restricting the latent vectors to be small: λ 2 ¼ 1=3x ref j j 2 þx þ 2 þx À j j 2 . Our implementation was inspired by the one from A. Veit (https://github.com/andreasveit/triplet-network-pytorch).
The PCA was performed on the triplet network latent vectors of the 372 cell or cell fragment reconstructions, which were also used for querying the view-triplets during training. By taking only the first three principal components, every multiview location was assigned an RGB value. The mesh was colored according to the nearest view color of every vertex. . The performance was calculated based on multi-views and measured as F1-score. The classification threshold θ was set to the optimal F1-score on the validation set. The best model was retrained on training and validation set and applied to the whole data set. To remove background structures not connected to the central object of interest, connected component analysis was performed on the 2D multi-view images, followed by masking of the unconnected pixels. The SVs were then classified by calculating the mean of all its multi-view predictions and thresholding with θ. In addition, at least 70% of all multi-view predictions of a SV had to be glia for the assignment of this label. Top-down glia splitting. In order to split glia fragments from neurons, a connected component analysis was applied to the SV graph of every SSV. Glia and neuron connected components (CCs) in the SV graph were computed on subgraphs induced by the classification (glia or neuron) of each SV. The CC size was estimated by calculating the BBD of its combined SVs. Glia CCs with a BBD ≥ 8.0 µm were separated from the SV graph first. The remaining, small glia CCs (BBD < 8.0 µm) were assigned to the neuron class and the BBD was re-evaluated. Neuron CCs with a BBD < 8 µm were removed and added to the glia graph. The purpose of this was to bridge small false glia/neuron predictions and thereby avoid false neuron splits.
The resulting class labels after the splitting procedure were evaluated on 12 randomly drawn SSVs with at least 1 introduced split. SVs were sorted by volume and manually labeled by P.J.S. until a major fraction of the reconstruction volume was covered. The average inspected volume coverage was 0.905 (proportion of inspected SVs weighted by their volume).
To evaluate the splitting performance, 181 SSVs were manually inspected by P.J.S. and J.K. The total number of neuron components and the number of neuron components that were not split into several parts were identified, i.e., those in which all splits preserved the neuron as a single component. Removing, for example, a small fragment from an SSV ( Supplementary Fig. 6d) was not considered a split and the SSV therefore counted as a correct component. In contrast, an axon passing through a falsely merged glia and being split into two components was not added to the number of correct components, but its two components were added to the total number of resulting neuron components. Note that we also counted a component as incorrect in case the removed fragment (predicted as glia) only virtually disconnected the neurite ( Supplementary Fig. 6e), e.g., at the data set boundary, when it could be reasonably assumed that it would continue in a larger data set.
For the evaluation of the large SSV, which was split into 593 SSV (516 neuron and 77 glia components predicted), we inspected a random subset of 50 neuron components.
Reconstructing glial cells. SV graph edges were added between the sample locations of the collection of all splitted glia SV by identifying the k-nearest neighbors (k = 15, maximum distance: 10 µm; weighted by Euclidean distance).
Twenty-seven somata of putative astrocytes were identified in the data set and every glia SV was assigned to its closest soma (shortest path using Dijkstra's algorithm).
Neuron-type classification. The 2-views generated at each rendering location were re-used to construct a set of N-views for every SSV by the following procedure: The collection of all M 2-views of an SSV was split into 2M/N random sets (drawn without replacement) each of size N. If 2M < N, the set was filled by randomly drawing from the existing views.
The model architecture was identical to the model used for glia classification, except for a reduced batch size, a dropout rate of 0.08, and a learning rate schedule defined as exponential decay, with decay rate of 0.98 per 1000 steps. The input shape was (1, 4, N, 256, 128).
We used 402 manually traced (skeletonized) cells to identify their corresponding SSV, which were split into training set (N train : 301; N EA : 177, N MSN : 114, N GP : 6, N INT : 6) and test set (N test : 101; N EA : 60, N MSN : 39, N GP : 3, N INT : 2) with labels, that corresponded to the broad biological classes found in the data set (EA, MSN, GP, INT).
During batch creation while training, the N-views were generated by randomly drawing from the corresponding SSV views. Every batch contained an equal number of SSVs for each class. The classification was performed using argmax on the output of the softmax layer and the majority vote of the corresponding N-view classifications was used for SSV classification.
The support-weighted average F1-score of all classes was evaluated on N-views and on a SSV level after majority vote (carried out on all its N-view predictions).
Subcellular compartment classification. The cellular compartments of 33 neurites were manually annotated and axon, dendrite, and soma views were generated, which were split into a training set (N train : 80,370 views; N dendrite : 10,004; N axon : 41,424; N soma : 28,942) and a validation set (N validation : 9,040; N dendrite : 1,078; N axon : 1,823; N soma : 6139). During training, we applied class weights for loss computation to address imbalances in their frequency (dendrite: 2, axon: 1, soma: 1). Performance was measured with the F1-score of the multi-view classification using argmax on the softmax output. The best model was again retrained on the whole ground truth data for the data set prediction.
Classification of so far unclassified locations within neurons was performed by assigning it the label of the closest classified location (Voronoi partitioning with Euclidean distance).
All SSV skeleton nodes were manually labeled by P.J.S. using KNOSSOS. In order to enable a direct comparison between the RFC and CMN model, the skeleton-node locations were used for the extraction of the hand-designed features. CMN and kNN predictions were mapped to the skeleton nodes using the nearest neighbors on the multi-view locations. As in Dorkenwald et al. 11 , hand-designed features were computed for every skeleton node (context of 4 µm and 8 µm maximal traversed path length from the source node). Only properties of nodes visited during the traversal were considered for the source node statistics.
A total of 23 features were extracted from the collected properties at each node: mean and standard deviation (s.d.) and histogram (10 bins) of the encountered node diameters, mean of node degrees, node density within a box with edge length of 2-times the context-range (either 4 or 8 µm) number of cell organelles and mean and s.d. of their size for mitochondria, synaptic junctions, vesicle clouds. The RFC was trained on the same training data as the CMN to classify each node as axon, dendrite, or soma using argmax on the resulting class probability.
The sliding window majority vote was performed on the cell reconstruction skeletons. Every skeleton node was assigned the majority label found in a set of adjacent node/multi-view predictions, which were collected within a maximum of 12.5 µm traversal length along the skeleton.
High-resolution semantic segmentation of surfaces. The training data were generated by rendering multi-views (5 different perspectives) from the rendering locations of 5 reconstructions (training: 24,248 views; validation: 6062 views) with label-dependent vertex colors. Skeleton nodes were manually annotated as neck, head, shaft, or soma/axon, which were then mapped to the mesh vertices with Voronoi partitioning (Euclidean distance). To smooth label boundaries, each vertex was assigned the majority label of 40 vertices found by a breadth-first search on the vertex graph of the reconstruction. Graph edges were added between vertices with a distance of up to 120 nm. Only rendering locations with annotated skeleton nodes within 2 µm were considered.
To assign pixel labels back to the mesh vertices, an additional view of the faces was rendered using color buffering with a unique ID per face. This allowed to perform a majority vote of all collected labels corresponding to a single vertex as classification. Subsequently, we applied a kNN classification to propagate predicted labels to vertices, which were not covered by the rendered color map.
The synapse test set was generated as a random set of head and shaft synapses collected from four different reconstructions, which were also used to calculate the cell surface fraction captured by the multi-views. The dendritic tree for the per-vertex evaluation was manually annotated on skeleton node level and then propagated to the mesh vertices as described above for the GT generation.
Computing infrastructure. The used parallel computing environment consisted of 18 nodes, each equipped with 20 cores (Intel Xeon CPU E5-2660 v3 @ 2.60GHz), 2 GeForce GTX 980Ti, and 256 GB of RAM. Compute jobs were managed using SGE QSUB or SLURM.

Data availability
The data sets generated and analyzed during the current study are available from the corresponding author on reasonable request. comments; the KNOSSOS team for support; Rangoli Saxena, Mariana Shumliakivska, and Atul T Mohite for code contributions; Marius Killinger and Martin Drawitsch for help with the ELEKTRONN library; and Julia Rogowska and Mariana Shumliakivska for proofreading tracings and ground truth generation. We also thank Christian Guggenberger and his team at the MPCDF facility in Garching, Germany for compute cluster support.

Author contributions
P.J.S. and J.K. performed and designed experiments and wrote the manuscript. M.J. and V.J. contributed the data set FFN segmentation and wrote the manuscript. SD contributed code for the experiments.