The ANTsX ecosystem for quantitative biological and medical imaging

The Advanced Normalization Tools ecosystem, known as ANTsX, consists of multiple open-source software libraries which house top-performing algorithms used worldwide by scientific and research communities for processing and analyzing biological and medical imaging data. The base software library, ANTs, is built upon, and contributes to, the NIH-sponsored Insight Toolkit. Founded in 2008 with the highly regarded Symmetric Normalization image registration framework, the ANTs library has since grown to include additional functionality. Recent enhancements include statistical, visualization, and deep learning capabilities through interfacing with both the R statistical project (ANTsR) and Python (ANTsPy). Additionally, the corresponding deep learning extensions ANTsRNet and ANTsPyNet (built on the popular TensorFlow/Keras libraries) contain several popular network architectures and trained models for specific applications. One such comprehensive application is a deep learning analog for generating cortical thickness data from structural T1-weighted brain MRI, both cross-sectionally and longitudinally. These pipelines significantly improve computational efficiency and provide comparable-to-superior accuracy over multiple criteria relative to the existing ANTs workflows. They simultaneously illustrate the importance of the comprehensive ANTsX approach as a framework for medical image analysis.

www.nature.com/scientificreports/ (e.g., bias correction 14 and denoising 15), joint label fusion 16,17, and brain cortical thickness estimation 18,19 (cf. Table 1). Additionally, ANTs has been integrated into multiple, publicly available workflows such as fMRIprep 20 and the Spinal Cord Toolbox 21. Frequently used ANTs pipelines, such as cortical thickness estimation 19, have been integrated into Docker containers and packaged as Brain Imaging Data Structure (BIDS) 22 and FlyWheel applications (i.e., "gears"). ANTs has also been independently ported for various platforms including Neurodebian 23 (Debian OS), Neuroconductor 24 (the R statistical project), and Nipype 25 (Python). Additionally, other widely used software, such as FreeSurfer 26, has incorporated well-performing and complementary ANTs components 14,15 into its own libraries. According to GitHub, recent unique "clones" have averaged 34 per day with the total number of clones being approximately twice that many. Fifty unique contributors to the ANTs library have made a total of over 4500 commits. Additional insights into usage can be viewed at the ANTs GitHub website.
Over the course of its development, ANTs has been extended to complementary frameworks resulting in the Python- and R-based ANTsPy and ANTsR toolkits, respectively. These ANTs-based packages interface with extremely popular, high-level, open-source programming platforms which have significantly increased the user base of ANTs. The rapidly rising popularity of deep learning motivated further recent enhancement of ANTs and its extensions. Despite the abundance of online innovation and code for deep learning algorithms, much of it is disorganized and lacks the uniformity in structure and external data interfaces which would facilitate greater uptake.
With this in mind, ANTsR spawned the deep learning ANTsRNet package 35, which is a growing Keras/TensorFlow-based library of popular deep learning architectures and applications specifically geared towards medical imaging. Analogously, ANTsPyNet is an additional ANTsX complement to ANTsPy. Both, which we collectively refer to as "ANTsXNet", are co-developed so as to ensure cross-compatibility such that training performed in one library is readily accessible by the other library. In addition to a variety of popular network architectures (which are implemented in both 2-D and 3-D), ANTsXNet contains a host of functionality for medical image analysis that has been developed in-house and collected from other open-source projects. For example, an extremely popular ANTsXNet application is a multi-modal brain extraction tool that uses different variants of the popular U-net 36 architecture for segmenting the brain in multiple modalities. These modalities include conventional T1-weighted structural MRI as well as T2-weighted MRI, FLAIR, fractional anisotropy, and BOLD data. Demographic specialization also includes infant T1-weighted and/or T2-weighted MRI.
Figure 1. An illustration of the tools and applications available as part of the ANTsRNet and ANTsPyNet deep learning toolkits. Both libraries take advantage of ANTs functionality through their respective language interfaces, ANTsR (R) and ANTsPy (Python). Building on the Keras/TensorFlow libraries, both toolkits standardize popular network architectures within the ANTs ecosystem and are cross-compatible. These networks are used to train models and weights for such applications as brain extraction which are then disseminated to the public.
Additionally, we have included other models and weights into our libraries such as a recent BrainAGE estimation model 37, based on > 14,000 individuals; HippMapp3r 38, a hippocampal segmentation tool; the winning entry of the MICCAI 2017 white matter hyperintensity segmentation competition 39; MRI super resolution using deep backprojection networks 40; and NoBrainer, a T1-weighted brain extraction approach based on FreeSurfer (see Fig. 1).
The ANTsXNet cortical thickness pipeline. The most recent ANTsX innovation involves the development of deep learning analogs of our popular ANTs cortical thickness cross-sectional 19 and longitudinal 41 pipelines within the ANTsXNet framework. Figure 2, adapted from our previous work 19, illustrates some of the major changes associated with the single-subject, cross-sectional pipeline. The resulting improvement in efficiency derives primarily from eliminating deformable image registration from the pipeline. This step has historically been used to propagate prior, population-based information (e.g., tissue maps) to individual subjects for such tasks as brain extraction 42 and tissue segmentation 13; that information is now encoded within the neural networks and trained weights. These structural MRI processing pipelines are currently available as open-source within the ANTsXNet libraries. Evaluations using both cross-sectional and longitudinal data are described in subsequent sections and couched within the context of our previous publications 19,41. Related work has been recently reported by external groups 43,44 and provides a context for comparison to motivate the utility of the ANTsX ecosystem.

Results
Cross-sectional performance evaluation. Due to the absence of ground truth, we utilize the evaluation strategy from our previous work 19 where we used cross-validation to build and compare age prediction models from data derived from both the proposed ANTsXNet pipeline and the established ANTs pipeline. Specifically, we use "age" as a well-known and widely available demographic correlate of cortical thickness 45 and quantify the predictive capabilities of corresponding random forest models 34 of the form:

$$\mathrm{AGE} \sim \mathrm{VOLUME} + \mathrm{GENDER} + \sum_{i=1}^{62} T(\mathrm{DKT}_i)$$

with covariates GENDER and VOLUME (i.e., total intracranial volume). $T(\mathrm{DKT}_i)$ is the average thickness value in the $i$th Desikan-Killiany-Tourville (DKT) region 46 (cf. Table 2). Root mean square error (RMSE) between the actual and predicted ages is the quantity used for comparative evaluation. As we have explained previously 19, we find these evaluation measures to be much more useful than other commonly applied criteria as they are closer to assessing the actual utility of these thickness measurements as biomarkers for disease 47 or growth. In recent work 44 the authors employ correlation with FreeSurfer thickness values as the primary evaluation for assessing relative performance with ANTs cortical thickness 19.
Figure 2. Illustration of the ANTsXNet cortical thickness pipeline and the relationship to its traditional ANTs analog. The hash-designated sections denote pipeline steps which have been obviated by the deep learning approach. These include template-based brain extraction, template-based n-tissue segmentation, and joint label fusion for cortical labeling. In our prior work, execution time of the thickness pipeline was dominated by registration. In the deep version of the pipeline, it is dominated by DiReCT. However, we note that registration and DiReCT execute much more quickly than in the past in part due to major improvements in the underlying ITK multi-threading strategy.
This evaluation, unfortunately, is fundamentally flawed in that it is a prime example of a type of circularity analysis 48 whereby data selection is driven by the same criteria used to evaluate performance. Specifically, the underlying DeepSCAN network used for the tissue segmentation step employs training based on FreeSurfer results, which directly influences thickness values because thickness and segmentation are highly correlated and vary characteristically between software packages. Relative performance with ANTs thickness (which does not use FreeSurfer for training) is then assessed by determining correlations with FreeSurfer thickness values. Almost as problematic is their use of repeatability, which they confusingly label as "robustness," as an additional ranking criterion. Repeatability evaluations should be contextualized within considerations such as the bias-variance tradeoff and quantified using relevant metrics, such as the intra-class correlation coefficient, which takes into account both inter- and intra-observer variability.
In addition to the training data listed above, to ensure generalizability, we also compared performance using the SRPB data set 32 comprising over 1600 participants from 12 sites. Note that we recognize that we are processing a portion of the evaluation data through certain components of the proposed deep learning-based pipeline that were used to train the same pipeline components. Although this does not provide evidence for generalizability (which is why we include the much larger SRPB data set), it is still interesting to examine the results since, in this case, the deep learning training can be considered a type of noise reduction on the final results. It should be noted that training did not use age prediction (or any other evaluation or related measure) as a criterion to be optimized during network model training (i.e., circular analysis) 48 .
The results are shown in Fig. 3 where we used cross-validation with 500 permutations per model per data set (including a "combined" set) and an 80/20 training/testing split. The ANTsXNet deep learning pipeline outperformed the classical pipeline 19 in terms of age prediction in all data sets except for IXI. This also includes the cross-validation iteration where all data sets were combined. Additionally, repeatability assessment on the regional cortical thickness values of the MMRR data set yielded ICC values ("average random rater") of 0.99 for both pipelines.
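The repeated-split evaluation described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: a one-variable least-squares line stands in for the random forest, the data are synthetic, and the helper names (rmse, fit_line, repeated_split_rmse) are hypothetical rather than taken from the paper's R scripts.

```python
import math
import random


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x (a stand-in for the random forest)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def repeated_split_rmse(thickness, age, n_permutations=500, train_fraction=0.8, seed=0):
    """Age-prediction RMSE over repeated random training/testing splits."""
    rng = random.Random(seed)
    indices = list(range(len(age)))
    n_train = int(round(train_fraction * len(indices)))
    scores = []
    for _ in range(n_permutations):
        rng.shuffle(indices)
        train, test = indices[:n_train], indices[n_train:]
        a, b = fit_line([thickness[i] for i in train], [age[i] for i in train])
        predicted = [a + b * thickness[i] for i in test]
        scores.append(rmse([age[i] for i in test], predicted))
    return scores


# Synthetic data (not from the paper): thickness decreases with age plus noise.
data_rng = random.Random(42)
ages = [data_rng.uniform(20, 80) for _ in range(100)]
thick = [3.0 - 0.01 * a + data_rng.gauss(0, 0.05) for a in ages]
scores = repeated_split_rmse(thick, ages, n_permutations=50)
```

Comparing the distributions of such scores between two pipelines, rather than a single split, is what makes the comparison robust to the particular choice of test subjects.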
A comparative illustration of regional thickness measurements between the ANTs and ANTsXNet pipelines is provided in Fig. 4 for three different ages spanning the lifespan. Linear models of the form

$$T(\mathrm{DKT}_i) \sim \mathrm{GENDER} + \mathrm{AGE}$$

were created for each of the 62 DKT regions for each pipeline. These models were then used to predict thickness values for each gender at ages of 25 years, 50 years, and 75 years and subsequently plotted relative to the absolute maximum predicted thickness value (ANTs: right entorhinal cortex at 25 years, male). Although there appear to be systematic differences between specific regional predicted thickness values (e.g., $T(\mathrm{ENT})_{\mathrm{ANTs}} > T(\mathrm{ENT})_{\mathrm{ANTsXNet}}$, $T(\mathrm{pORB})_{\mathrm{ANTs}} < T(\mathrm{pORB})_{\mathrm{ANTsXNet}}$), a pairwise t-test showed no statistically significant difference between the predicted thickness values of the two pipelines.
Longitudinal performance evaluation. Given the excellent performance and superior computational efficiency of the proposed ANTsXNet pipeline for cross-sectional data, we evaluated its performance on longitudinal data using the longitudinally-specific evaluation strategy and data we employed with the introduction of the longitudinal version of the ANTs cortical thickness pipeline 41. We also evaluated an ANTsXNet-based pipeline tailored specifically for longitudinal data. In this variant, an SST is generated and processed using the previously described ANTsXNet cross-sectional pipeline which yields tissue spatial priors. These spatial priors are used in our traditional brain segmentation approach 13. The computational efficiency of this variant is also significantly improved, in part, due to the elimination of the costly SST prior generation which uses multiple registrations combined with joint label fusion 16.
The ADNI-1 data used for our longitudinal performance evaluation 41 consists of over 600 subjects (197 cognitive normals, 324 LMCI subjects, and 142 AD subjects) with one or more follow-up image acquisition sessions every 6 months (up to 36 months) for a total of over 2500 images. In addition to the ANTsXNet pipelines ("ANTsXNetCross" and "ANTsXNetLong") for the current evaluation, our previous work included the FreeSurfer 26 cross-sectional ("FSCross") and longitudinal ("FSLong") streams, the ANTs cross-sectional pipeline ("ANTsCross") in addition to two longitudinal ANTs-based variants ("ANTsNative" and "ANTsSST"). Two evaluation measurements, one unsupervised and one supervised, were used to assess comparative performance between all seven pipelines. We add the results of the ANTsXNet pipeline cross-sectional and longitudinal evaluations in relation to these other pipelines to provide a comprehensive overview of relative performance.
First, linear mixed-effects (LME) 31 modeling was used to quantify between-subject and residual variabilities, the ratio of which provides an estimate of the effectiveness of a given biomarker for distinguishing between subpopulations. In order to assess this criterion while accounting for changes that may occur through the passage of time, we used the following Bayesian LME model:

$$
\begin{aligned}
Y^k_{ij} &\sim N(\alpha^k_i + \beta^k_i t, \sigma_k^2) \\
\alpha^k_i &\sim N(\alpha^k_0, \tau_k^2), \qquad \beta^k_i \sim N(\beta^k_0, \rho_k^2) \\
\alpha^k_0, \beta^k_0 &\sim N(0, 10), \qquad \sigma_k, \tau_k, \rho_k \sim \mathrm{Cauchy}^{+}(0, 5)
\end{aligned}
$$

where $Y^k_{ij}$ denotes the $i$th individual's cortical thickness measurement corresponding to the $k$th region of interest at the time point indexed by $j$, and specification of variance priors to half-Cauchy distributions reflects commonly accepted best practice in the context of hierarchical models 49. The ratio of interest, $r^k$, per region of the between-subject variability, $\tau_k$, and residual variability, $\sigma_k$, is

$$r^k = \frac{\tau_k}{\sigma_k}, \qquad k = 1, \ldots, 62$$

where the posterior distribution of $r^k$ was summarized via the posterior median.
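As a small illustration of the final summarization step, the ratio of between-subject to residual variability can be formed draw by draw from posterior samples and then reduced to its median. The sketch below assumes paired posterior draws of the two standard deviations are already available from some sampler; the function name and the numbers are hypothetical and not taken from the study.

```python
import statistics


def variance_ratio_median(tau_draws, sigma_draws):
    """Posterior median of r = tau / sigma computed from paired posterior draws."""
    ratios = [tau / sigma for tau, sigma in zip(tau_draws, sigma_draws)]
    return statistics.median(ratios)


# Made-up posterior draws for a single region (illustrative only).
tau = [0.30, 0.32, 0.28, 0.31, 0.29]
sigma = [0.10, 0.10, 0.10, 0.10, 0.10]
r_median = variance_ratio_median(tau, sigma)
```

Taking the ratio per draw before summarizing, instead of dividing two separately summarized quantities, keeps the joint posterior uncertainty in the reported value.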
Second, the supervised evaluation employed Tukey post-hoc analyses with false discovery rate (FDR) adjustment to test the significance of the LMCI-CN, AD-LMCI, and AD-CN diagnostic contrasts. This is provided by the following LME model:

$$\Delta Y \sim Y_{bl} + \mathrm{AGE}_{bl} + \mathrm{ICV}_{bl} + \mathrm{APOE4} + \mathrm{GENDER} + \mathrm{DIAGNOSIS}_{bl} + \mathrm{VISIT}:\mathrm{DIAGNOSIS}_{bl} + (1 \mid \mathrm{ID}) + (1 \mid \mathrm{SITE})$$

Here, $\Delta Y$ is the change in thickness of the $k$th DKT region from baseline (bl) thickness $Y_{bl}$ with random intercepts for both the individual subject (ID) and the acquisition site. The subject-specific covariates AGE, APOE status, GENDER, DIAGNOSIS, ICV, and VISIT were taken directly from the ADNIMERGE package.
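The text does not state which FDR procedure was applied. As an illustration only, the sketch below implements the widely used Benjamini-Hochberg step-up adjustment, which is an assumption on our part; the function name is hypothetical and the p-values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values are monotone in the rank of the raw p-values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Made-up p-values for four regional contrasts (illustrative only).
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
```

With 62 regions and three diagnostic contrasts, an adjustment of this kind controls the expected proportion of false positives among the regions declared significant.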
Results for all pipelines with respect to the longitudinal evaluation criteria are shown in Figs. 5 and 6. Figure 5(a) provides the 95% confidence intervals of the variance ratio for all 62 regions of the DKT cortical labeling where ANTsSST consistently performs best with ANTsXNetLong also performing well. These quantities are summarized in Fig. 5(b). The second evaluation criterion compares diagnostic differentiation via LMEs. Log p-values are provided in Fig. 6, which demonstrate excellent LMCI-CN and AD-CN differentiation for both deep learning pipelines.

Discussion
The ANTsX software ecosystem provides a comprehensive framework for quantitative biological and medical imaging. Although ANTs, the original core of ANTsX, is still at the forefront of image registration technology, it has moved significantly beyond its image registration origins. This expansion is not confined to technical contributions (of which there are many) but also consists of facilitating access to a wide range of users who can use ANTsX tools (whether through bash, Python, or R scripting) to construct tailored pipelines for their own studies or to take advantage of our pre-fabricated pipelines. And given the open-source nature of the ANTsX software, usage is not limited, for example, to non-commercial use, a common constraint characteristic of other packages such as the FMRIB Software Library (https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/Licence).
One of our most widely used pipelines is the estimation of cortical thickness from neuroimaging. This is understandable given the widespread usage of regional cortical thickness as a biomarker for developmental or pathological trajectories of the brain. In this work, we used this well-vetted ANTs tool to provide training data for producing alternative variants which leverage deep learning for improved computational efficiency and also provide superior performance with respect to previously proposed evaluation measures for both cross-sectional 19 and longitudinal scenarios 41. In addition to providing the tools which generated the original training data for the proposed ANTsXNet pipeline, the ANTsX ecosystem provides a full-featured platform for the additional steps such as preprocessing (ANTsR/ANTsPy); data augmentation (ANTsR/ANTsPy); network construction and training (ANTsRNet/ANTsPyNet); and visualization and statistical analysis of the results (ANTsR/ANTsPy).
Using ANTsX, various steps in the deep learning training process (e.g., data augmentation, preprocessing) can all be performed within the same ecosystem where such important details as header information for image geometry are treated the same. In contrast, related work 44 described and evaluated a similar thickness measurement pipeline. However, due to the lack of a complete processing and analysis framework, training data was generated using the FreeSurfer stream, deep learning-based brain segmentation employed DeepSCAN 50 (in-house software), and cortical thickness estimation 18 was generated using the ANTs toolkit. The interested researcher must ensure the consistency of the input/output interface between packages (a task with which the Nipype development team is quite familiar).
Although potentially advantageous in terms of such issues as computational efficiency and other performance measures, there are a number of limitations associated with the ANTsXNet pipeline that should be mentioned both to guide potential users and possibly motivate future related research. As is the case with many deep learning models, usage is restricted based on training data. For example, much of the publicly available brain data has been anonymized through various defacing protocols. That is certainly the case with the training data used for the ANTsXNet pipeline, which has consequences specific to the brain extraction step that could lead to poor performance. We are currently aware of this issue and have provided a temporary workaround while simultaneously resuming training on whole head data to mitigate this issue. Also, although the ANTsXNet pipeline performs relatively well as assessed across lifespan data, performance might be hampered for specific age ranges (e.g., neonates), whereas the traditional ANTs cortical thickness pipeline is more flexible and might provide better age-targeted performance.
This is the subject of ongoing research. Additionally, application of the ANTsXNet pipeline would be limited with high-resolution acquisitions. Due to the heavy memory requirements associated with deep learning training, the utility of any resolution greater than ~1 mm isotropic would not be leveraged by the existing pipeline. However, there is a potential pipeline variation (akin to the longitudinal variant) that would be worth exploring where Deep Atropos is used only to provide the priors for a subsequent traditional Atropos segmentation on high-resolution data. Although direct evaluation by the principal co-authors of the ANTs toolkit, the similarity in resulting cortical thickness values, as indicated by Fig. 4, and considerations of the training data origins all strongly suggest similarity between Atropos and Deep Atropos output, further evaluation is certainly warranted and would benefit other potential applications.
In terms of additional future work, the recent surge and utility of deep learning in medical image analysis has significantly guided the areas of active ANTsX development. As demonstrated in this work with our widely used cortical thickness pipelines, there are many potential benefits of deep learning analogs to existing ANTs tools as well as the development of new ones. Performance is mostly comparable-to-superior relative to existing pipelines depending on the evaluation metric. Specifically, the ANTsXNet cross-sectional pipeline does well for the age prediction performance framework and in terms of the ICC. Additionally, this pipeline performs relatively well for longitudinal ADNI data for disease differentiation but not so much in terms of the generic variance ratio criterion. However, for such longitudinal-specific studies, the ANTsXNet longitudinal variant performs well for both performance measures. We see possible additional longitudinal extensions incorporating subject ID and months as additional network inputs.

Methods
The original ANTs cortical thickness pipeline. The original ANTs cortical thickness pipeline 19 consists of the following steps:
• Preprocessing: denoising 15
Our recent longitudinal variant 41 incorporates an additional step involving the construction of a single subject template (SST) 12 coupled with the generation of tissue spatial priors of the SST for use with the processing of the individual time points as described above.
Although the resulting thickness maps are conducive to voxel-based 52 and related analyses 53, here we employ the well-known Desikan-Killiany-Tourville (DKT) 46 labeling protocol (31 labels per hemisphere) to parcellate the cortex for averaging thickness values regionally (cf. Table 2). This allows us to 1) be consistent in our evaluation strategy for comparison with our previous work 19,41 and 2) leverage an additional deep learning-based substitution within the proposed pipeline.
Overview of cortical thickness via ANTsXNet. The entire analysis/evaluation framework, from preprocessing to statistical analysis, is made possible through the ANTsX ecosystem and simplified through the open-source R and Python platforms. Preprocessing, image registration, and cortical thickness estimation are all available through the ANTsPy and ANTsR libraries whereas the deep learning steps are performed through networks constructed and trained via ANTsRNet/ANTsPyNet with data augmentation strategies and other utilities built from ANTsR/ANTsPy functionality. The brain extraction, brain segmentation, and DKT parcellation deep learning components were trained using data derived from our previous work 19. Specifically, the IXI 30, MMRR 54, NKI 55, and OASIS 28 data sets, and the corresponding derived data, comprising over 1200 subjects from age 4 to 94, were used for network training. Brain extraction employs a traditional 3-D U-net network 36 with whole brain, template-based data augmentation 35 whereas brain segmentation and DKT parcellation are processed via 3-D U-net networks with attention gating 56 on image octant-based batches. Additional network architecture details are given below. We emphasize that a single model (as opposed to ensemble approaches where multiple models are used to produce the final solution) 39 was created for each of these steps and was used for all the experiments described below.
Implementation. Software, average DKT regional thickness values for all data sets, and the scripts to perform both the analysis and obtain thickness values for a single subject (cross-sectionally or longitudinally) are provided as open-source. Specifically, all the ANTsX libraries are hosted on GitHub (https://github.com/ANTsX). The cross-sectional data and analysis code are available as .csv files and R scripts at the GitHub repository dedicated to this paper (https://github.com/ntustison/PaperANTsX) whereas the longitudinal data and evaluation scripts are organized with the repository associated with our previous work 41 (https://github.com/ntustison/CrossLong).
In Listing 1, we show the ANTsPy/ANTsPyNet code snippet for cross-sectional processing of a single subject, which starts with reading the T1-weighted MRI input image and proceeds through the generation of the Atropos-style six-tissue segmentation and probability images, application of ants.kelly_kapowski (i.e., DiReCT), DKT cortical parcellation, subsequent label propagation through the cortex, and, finally, regional cortical thickness tabulation. The cross-sectional and longitudinal pipelines are encapsulated in the ANTsPyNet functions antspynet.cortical_thickness and antspynet.longitudinal_cortical_thickness, respectively. Note that there are precise, line-by-line R-based analogs available through ANTsR/ANTsRNet. Both the antspynet.deep_atropos and antspynet.desikan_killiany_tourville_labeling functions perform brain extraction using the antspynet.brain_extraction function. Internally, antspynet.brain_extraction contains the requisite code to build the network and assign the appropriate hyperparameters. The model weights are automatically downloaded from the online hosting site https://figshare.com (see the function get_pretrained_network in ANTsPyNet or getPretrainedNetwork in ANTsRNet for links to all models and weights) and loaded to the constructed network. antspynet.brain_extraction performs a quick translation transformation to a specific template (also downloaded automatically) using the centers of intensity mass, a common alignment initialization strategy. This is to ensure proper gross orientation. Following brain extraction, preprocessing for the other two deep learning components includes ants.denoise_image and ants.n4_bias_field_correction and an affine-based reorientation to a version of the MNI template 57.
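As a rough companion to the steps described above, the following is a minimal sketch of single-subject processing, not a reproduction of the paper's listing. It assumes ANTsPy and ANTsPyNet are installed, that antspynet.cortical_thickness returns a dictionary containing a "thickness_image" entry, and that cortical DKT labels fall within the 1000 to 3000 range; the function name regional_thickness_sketch and its argument are hypothetical.

```python
def regional_thickness_sketch(t1_path):
    """Cross-sectional regional thickness for one T1-weighted image.

    Requires ANTsPy and ANTsPyNet; they are imported lazily so that merely
    defining this function does not need either package.
    """
    import ants
    import antspynet

    t1 = ants.image_read(t1_path)

    # Wrapper named in the text; the dictionary key is an assumption.
    result = antspynet.cortical_thickness(t1, verbose=True)
    thickness = result["thickness_image"]

    # DKT parcellation of the same image (returns a label image).
    dkt = antspynet.desikan_killiany_tourville_labeling(
        t1, do_preprocessing=True, verbose=True)

    # Keep only cortical labels (assumed range), then propagate them through
    # the voxels that have non-zero thickness.
    cortical_mask = ants.threshold_image(dkt, 1000, 3000, 1, 0)
    thickness_mask = ants.threshold_image(thickness, 0, 0, 0, 1)
    dkt_propagated = ants.iMath(
        thickness_mask, "PropagateLabelsThroughMask",
        thickness_mask * (cortical_mask * dkt))

    # Per-label statistics (including the mean thickness) as a table.
    return ants.label_stats(thickness, dkt_propagated)
```

A call such as regional_thickness_sketch("subject_T1.nii.gz"), with a hypothetical file name, triggers the automatic download of the pretrained weights on first use, so the initial run takes longer than subsequent ones.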
We recognize the presence of some redundancy due to the repeated application of certain preprocessing steps. Thus, each function has a do_preprocessing option to eliminate this redundancy for knowledgeable users but, for simplicity of presentation, we do not provide this modified pipeline here. It should be noted, however, that the time difference is minimal considering the longer time required by ants.kelly_kapowski. antspynet.deep_atropos returns the segmentation image as well as the posterior probability maps for each tissue type listed previously. antspynet.desikan_killiany_tourville_labeling returns only the segmentation label image which includes not only the 62 cortical labels but the remaining labels as well. The label numbers and corresponding structure names are given in the program description/help. Because the DKT parcellation will, in general, not exactly coincide with the non-zero voxels of the resulting cortical thickness maps, we perform a label propagation step to ensure the entire cortex, and only the non-zero thickness values in the cortex, are included in the tabulated regional values.
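The tabulation rule stated above, averaging only non-zero thickness values within each label, can be illustrated without any imaging library. The sketch below operates on flattened voxel lists; the function name, the label numbers, and the values are hypothetical and serve only to show the rule.

```python
from collections import defaultdict


def regional_mean_thickness(thickness, labels):
    """Mean of the non-zero thickness values within each non-zero label."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, label in zip(thickness, labels):
        # Skip background voxels and voxels where no thickness was estimated.
        if label != 0 and value > 0:
            sums[label] += value
            counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


# Toy flattened "images" (illustrative values only).
toy_thickness = [0.0, 2.0, 3.0, 0.0, 4.0, 2.0]
toy_labels = [1002, 1002, 1002, 0, 1003, 1003]
toy_means = regional_mean_thickness(toy_thickness, toy_labels)
```

Excluding zero-thickness voxels matters because a labeled voxel lying just outside the estimated cortical ribbon would otherwise pull the regional average toward zero.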
As mentioned previously, the longitudinal version, antspynet.longitudinal_cortical_thickness, adds an SST generation step. The SST can either be provided as a program input or be constructed from spatial normalization of all time points to a specified template. antspynet.deep_atropos is applied to the SST, yielding spatial tissue priors which are then used as input to ants.atropos for each time point. ants.kelly_kapowski is applied to the result to generate the desired cortical thickness maps.
Computational time on a CPU-only platform is approximately 1 hour, primarily due to ants.kelly_kapowski processing. Other preprocessing steps, i.e., bias correction and denoising, are on the order of a couple of minutes. This total time should be compared with 4-5 hours using the traditional pipeline employing the quick registration option or 10-15 hours with the more comprehensive registration parameters. As mentioned previously, elimination of the registration-based propagation of prior probability images to individual subjects is the principal source of reduced computational time. For ROI-based analyses, this is in addition to the elimination of the optional generation of a population-specific template. Additionally, the use of antspynet.desikan_killiany_tourville_labeling for cortical labeling (which completes in less than five minutes) eliminates the need for joint label fusion, which requires multiple pairwise registrations for each subject in addition to the fusion algorithm itself.
Training details. Training differed slightly between models and so we provide details for each of these components below. For all training, we used ANTsRNet scripts and custom batch generators. Although the network construction and other functionality is available in both ANTsPyNet and ANTsRNet (as is model weights compatibility), we have not written such custom batch generators for the former (although this is on our to-do list). In terms of hardware, all training was done on a DGX (GPUs: 4X Tesla V100, system memory: 256 GB LRDIMM DDR4).
T1-weighted brain extraction. A whole-image 3-D U-net model 36 was used in conjunction with multiple training sessions employing a Dice loss function followed by categorical cross entropy. Training data was derived from the same multi-site data described previously processed through our registration-based approach 42 . A center-of-mass-based transformation to a standard template was used to standardize such parameters as orientation and voxel size. However, to account for possible different header orientations of input data, a template-based data augmentation scheme was used 35 whereby forward and inverse transforms are used to randomly warp batch images between members of the training population (followed by reorientation to the standard template). A digital random coin flipping for possible histogram matching 58 between source and target images further increased data augmentation. The output of the network is a probabilistic mask of the brain. The architecture consists of four encoding/decoding layers with eight filters at the base layer which doubled every layer. Although not detailed here, training for brain extraction in other modalities was performed similarly.
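To make the loss and the filter schedule described above concrete, the following plain-Python sketch computes a soft Dice loss on flattened masks and the per-level filter counts for a network whose filters double at every level. The smoothing constant and the function names are our assumptions, not values or identifiers from the training scripts.

```python
def soft_dice_loss(predicted, target, smooth=1.0):
    """One minus the Dice overlap between a probabilistic and a binary mask."""
    intersection = sum(p * t for p, t in zip(predicted, target))
    denominator = sum(predicted) + sum(target)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)


def filters_per_level(base_filters, number_of_levels):
    """Filter counts when the number of filters doubles at every level."""
    return [base_filters * 2 ** level for level in range(number_of_levels)]


# Brain extraction configuration from the text: four levels, eight base filters.
brain_extraction_filters = filters_per_level(8, 4)
perfect = soft_dice_loss([1.0, 0.0, 1.0], [1, 0, 1])
disjoint = soft_dice_loss([1.0, 0.0], [0, 1])
```

Because the Dice term depends on overlap instead of per-voxel accuracy, it is less sensitive to the large number of background voxels surrounding the brain, which is one common reason for pairing it with categorical cross entropy.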
Deep atropos. Dealing with 3-D data presents barriers for training that are often unique to medical imaging. Various strategies are employed such as minimizing the number of layers and/or the number of filters at the base layer of the U-net architecture (as we do for brain extraction). However, we found this to be too limiting for capturing certain brain structures such as the cortex. 2-D and 2.5-D approaches are often used with varying levels of success but we also found better performance using full 3-D information. This led us to try randomly selected 3-D patches of various sizes. However, for both the six-tissue segmentations and DKT parcellations, we found that an octant-based patch strategy yielded the desired results. Specifically, after a brain extracted affine normalization to the MNI template, the normalized image is cropped to a size of [160,190,160]. Overlapping octant patches of size [112,112,112] were extracted from each image and trained using a batch size of 12 such octant patches with weighted categorical cross entropy as the loss function. The architecture consists of four encoding/decoding layers with 16 filters at the base layer which doubled every layer.
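The geometry of the octant strategy can be worked out directly from the two sizes quoted above. The sketch below assumes each of the eight patches is anchored at one corner of the cropped volume, which is our reading of "octant" and not something the text spells out; the function name is hypothetical.

```python
from itertools import product


def octant_patch_origins(image_size, patch_size):
    """Start indices of the eight corner-anchored patches of a 3-D volume."""
    per_axis = [(0, full - patch) for full, patch in zip(image_size, patch_size)]
    return list(product(*per_axis))


image_size = (160, 190, 160)
patch_size = (112, 112, 112)
origins = octant_patch_origins(image_size, patch_size)

# Overlap (in voxels) between the two patches along each axis.
overlap = [2 * patch - full for full, patch in zip(image_size, patch_size)]
```

Under this assumption every voxel is covered by at least one patch, and the central region is covered by all eight, so patch predictions can be averaged in the overlap when reassembling the full volume.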
As we point out in our earlier work 19, obtaining proper brain segmentation is perhaps the most critical step to estimating thickness values that have the greatest utility as a potential biomarker. In fact, the first and last authors (NT and BA, respectively) spent much time during the original ANTs pipeline development 19 trying to get the segmentation correct, which required manually looking at many images and adjusting parameters where necessary. This fine-tuning is often omitted or not considered when other groups 44,59,60 use components of our cortical thickness pipeline, which can be potentially problematic 61. Fine-tuning for this particular workflow was also performed by the first and last authors using manual variation of the weights in the weighted categorical cross entropy. Specifically, the weights of each tissue type were altered in order to produce segmentations which most resemble the traditional Atropos segmentations. Ultimately, we settled on a weight vector of (0.05, 1.5, 1, 3, 4, 3, 3) for the background, CSF, GM, WM, deep GM, brain stem, and cerebellum, respectively. Other hyperparameters can be directly inferred from explicit specification in the actual code. As mentioned previously, training data was derived from application of the ANTs Atropos segmentation 13 during the course of our previous work 19. Data augmentation included small affine and deformable perturbations using antspynet.randomly_transform_image_data and random contralateral flips.
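The effect of the class weights can be seen in a single-voxel version of the loss. The sketch below assumes the seven weights are ordered as background followed by the six tissue classes, and it uses a small clipping constant to avoid taking the logarithm of zero; both the ordering and the constant are our assumptions, and the function name is hypothetical.

```python
import math


def weighted_categorical_cross_entropy(y_true, y_pred, weights, eps=1e-7):
    """Weighted cross entropy for one voxel (one-hot y_true, softmax y_pred)."""
    return -sum(w * t * math.log(max(p, eps))
                for w, t, p in zip(weights, y_true, y_pred))


# Weight vector quoted in the text; class ordering is assumed.
weights = (0.05, 1.5, 1, 3, 4, 3, 3)

# The same predicted probability (0.5) for the true class is penalized
# differently depending on which class the voxel belongs to.
uniform_rest = 0.5 / 6
gm_truth = [0, 0, 1, 0, 0, 0, 0]
deep_gm_truth = [0, 0, 0, 0, 1, 0, 0]
gm_pred = [uniform_rest, uniform_rest, 0.5] + [uniform_rest] * 4
deep_gm_pred = [uniform_rest] * 4 + [0.5] + [uniform_rest] * 2
gm_loss = weighted_categorical_cross_entropy(gm_truth, gm_pred, weights)
deep_gm_loss = weighted_categorical_cross_entropy(deep_gm_truth, deep_gm_pred, weights)
```

Under the assumed ordering, an equally uncertain prediction costs four times as much on a deep gray matter voxel as on a cortical gray matter voxel, which is how manual adjustment of the weights steers the network toward segmentations resembling those of Atropos.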
Desikan-Killiany-Tourville parcellation. Preprocessing for the DKT parcellation training was similar to the Deep Atropos training. However, the number of labels and the complexity of the parcellation required deviation from other training steps. First, labeling was split into an inner set and an outer set. Subsequent training was performed separately for both of these sets. For the cortical labels, a set of corresponding input prior probability maps were constructed from the training data (and are also available and automatically downloaded, when needed, from https://figshare.com). Training occurred over multiple sessions where, initially, categorical cross entropy was used and then subsequently refined using a Dice loss function. Whole-brain training was performed on a brain-cropped template size of [96,112,96]. Inner label training was performed similarly to our brain extraction training where the number of filters at the base layer was reduced to eight. Training also occurred over multiple sessions where, initially, categorical cross entropy was used and then subsequently refined using a Dice loss function. Other hyperparameters can be directly inferred from explicit specification in the actual code. Training data was derived from application of joint label fusion 17 during the course of our previous work 19. When calling antspynet.desikan_killiany_tourville_labeling, inner labels are estimated first followed by the outer cortical labels.
Other software. Several R 62 packages were used in preparation of this manuscript including R