Comparisons between multi-component myelin water fraction, T1w/T2w ratio, and diffusion tensor imaging measures in healthy human brain structures

Various MRI techniques, including myelin water imaging, T1w/T2w ratio mapping and diffusion-based imaging, can be used to characterize tissue microstructure. However, surprisingly few studies have examined the degree to which these MRI measures are related within and between various brain regions. Therefore, whole-brain MRI scans were acquired from 31 neurologically-healthy participants to empirically measure and compare myelin water fraction (MWF), T1w/T2w ratio, fractional anisotropy (FA), axial diffusivity (AD), radial diffusivity (RD) and mean diffusivity (MD) in 25 bilateral (10 grey matter; 15 white matter) regions-of-interest (ROIs). Except for RD vs. T1w/T2w and MD vs. T1w/T2w, moderately to highly significant correlations (p < 0.001) were found between each of the other measures across all 25 brain structures [T1w/T2w vs. MWF (Pearson r = 0.33, Spearman ρ = 0.31), FA vs. MWF (r = 0.73, ρ = 0.75), FA vs. T1w/T2w (r = 0.25, ρ = 0.22), MD vs. AD (r = 0.57, ρ = 0.58), MD vs. RD (r = 0.64, ρ = 0.61), AD vs. MWF (r = 0.43, ρ = 0.36), RD vs. MWF (r = −0.49, ρ = −0.62), MD vs. MWF (r = −0.22, ρ = −0.29), RD vs. FA (r = −0.62, ρ = −0.75) and MD vs. FA (r = −0.22, ρ = −0.18)]. However, while the six MRI measures were generally correlated with each other across all structures, there were large intra-ROI and inter-ROI differences (i.e., with no one measure consistently producing the highest or lowest values). This suggests that each quantitative MRI measure provides unique, and potentially complementary, information about underlying brain tissues – with each metric offering unique sensitivity/specificity tradeoffs to different microstructural properties (e.g., myelin content, tissue density, etc.).

wide age range (57 ± 18 years), we found extremely low correlations across WM structures (r = 0.004), low correlations across all structures (r = 0.23), and moderate correlations among subcortical GM structures (r = 0.45) 38,40, and similar correlations were found for relatively young (age 29 ± 11 years), neurologically-healthy participants as well 40. Using a cohort of healthy children, Geeraert and coworkers recently compared quantitative inhomogeneous magnetization transfer (qihMT), myelin volume fraction (MVF) using mcDESPOT, and RD using DTI, finding strong correlations between measures (r = 0.89 for qihMT vs. MVF, r = −0.66 for RD vs. MVF, r = −0.74 for RD vs. MVF) 49. Finally, a very recent study by Ercan et al. compared inhomogeneous magnetization transfer ratio (ihMTR), T2-relaxation based MWF, and DTI metrics (FA, AD, RD, and MD), reporting a wide range of correlations between various measures (r = 0.77 for ihMTR vs. MWF, r = −0.30 for ihMTR vs. RD, r = 0.20 for ihMTR vs. FA, r = −0.19 for ihMTR vs. MD, r = 0.02 for ihMTR vs. AD) 50. However, to the best of our knowledge, no studies to date have directly compared calibrated T1w/T2w ratios with both T2-relaxation-based MWFs (i.e., myelin-specific measures) and diffusion-based FA, AD, RD and MD (i.e., general measures of tissue microstructure) in the same cohort – which is important for validating/replicating recent studies showing that T1w/T2w ratios may not be myelin-specific (particularly in subcortical WM structures), and for testing the notion that T1w/T2w signals might be similar to diffusion-based MRI metrics.
In this work, we therefore used six different MRI measures, including MWF, T1w/T2w, FA, AD, RD and MD to analyze brain tissue properties – both to compare them to each other and to establish normative values within a number of different brain structures. The main goals of the current work were to: (1) verify previous work comparing T1w/T2w ratio vs. MWF in a larger sample population and in more brain structures [38][39][40], and (2) directly compare T1w/T2w ratios to various DTI metrics. In addition to potentially shedding light on the mechanisms underlying T1w/T2w ratio measurements in subcortical structures, this could also have practical implications for future multi-modal imaging studies – both in terms of prospective study planning (e.g., by informing the choices of pulse sequences to be included in study protocols) and in terms of post-hoc data analysis (e.g., by reducing the number of redundant multiple comparisons in multi-modal neuroimaging studies that can potentially arise by analyzing data using different quantitative imaging metrics).

Methods
Participants. Thirty-one healthy volunteers (15 males, 16 females) aged 18-57 years (29.6 ± 10.7 years) were enrolled from the Charles Village and Roland Park communities in Baltimore, Maryland, USA. Written informed consent was obtained from each volunteer, and all experiments were performed in accordance with the relevant guidelines and regulations of the Johns Hopkins Medical Institutions and Johns Hopkins University institutional review boards. All participants were verbally screened to confirm the absence of any current or previous neurological disorder or psychiatric disease. Participant ages (males 28.3 ± 9.9 years; females 30.9 ± 11.5 years) were not significantly different between the two groups (p = 0.47).
Data Acquisition. Participants were scanned using a whole-body 3 T Philips Achieva MRI system equipped with a 32-channel SENSE head coil (Philips Healthcare, Best, The Netherlands). For each participant, all of the GRASE data (T2w images acquired with 32 different TEs) were coregistered to the T1w TFE images and then resliced to 1 mm³ resolution using the "resize_img.m" script (http://www.cs.ucl.ac.uk/staff/G.Ridgway/vbm/resize_img.m, University College London, UK). Then, skull stripping of all the images was performed by generating participant-specific brain masks in SPM12, and refining the masks manually using the ROIEditor Toolbox in MRIStudio. The coregistered and skull-stripped mean images were then normalized to the "JHU_MNI_SS_T1_ss" template 51 in Montreal Neurological Institute (MNI) coordinate space 52 using a 12-parameter affine (linear) transformation with Automated Image Registration (AIR), followed by high-dimensional, non-linear warping with the large deformation diffeomorphic metric mapping (LDDMM) algorithm with alpha values of 0.01, 0.005, and 0.002 53 in MRIStudio's DiffeoMap Toolbox, as previously reported 54. The alpha values constrain the amount of elasticity allowed in each iteration of the deformation, so using three iterations with cascading alpha values allows for increasingly non-linear registrations. The T1w and GRASE images were then spatially normalized to MNI space by applying the linear (affine) and non-linear LDDMM transformations before generating voxel-wise MWF maps using a regularized non-negative least squares approach and an extended phase graph algorithm to compensate for any stimulated echoes due to B1 heterogeneities 12,55. In this way, MWF was calculated based on the ratio of T2w signal between 10-40 ms to the total T2w distribution 19.
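The final step of the MWF calculation (the ratio of signal in the 10-40 ms window to the total T2 distribution) can be illustrated with a minimal sketch. It assumes that the regularized NNLS fit has already produced a T2 distribution for one voxel; the function name, the inclusive treatment of the window boundaries, and the example amplitudes are assumptions made for illustration and are not taken from the study's processing code.

```python
def myelin_water_fraction(t2_times_ms, amplitudes, lower_ms=10.0, upper_ms=40.0):
    """Fraction of the total T2-distribution amplitude that lies in the
    myelin water window [lower_ms, upper_ms] (boundaries assumed inclusive)."""
    if len(t2_times_ms) != len(amplitudes):
        raise ValueError("t2_times_ms and amplitudes must have the same length")
    total = sum(amplitudes)
    if total <= 0:
        # No signal in this voxel: report zero rather than dividing by zero.
        return 0.0
    myelin = sum(a for t, a in zip(t2_times_ms, amplitudes)
                 if lower_ms <= t <= upper_ms)
    return myelin / total


# Hypothetical single-voxel T2 distribution (values invented for illustration).
t2_times_ms = [15.0, 25.0, 60.0, 80.0, 120.0, 2000.0]
amplitudes = [0.05, 0.07, 0.30, 0.40, 0.15, 0.03]

mwf = myelin_water_fraction(t2_times_ms, amplitudes)
print(f"MWF = {100 * mwf:.1f}%")
```

In this toy distribution, the two components at 15 ms and 25 ms fall inside the window, so the MWF is simply their combined share of the total amplitude.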
Whole-brain calibrated T1w/T2w maps were generated for each participant using the T1w TFE image and the T2w GRASE image with TE = 140 ms, as recently recommended for GRASE-based T1w/T2w calculations 40. Bias correction and calibration were performed on all of the images using previously described methods 29, after co-registering T2w GRASE images with T1w TFE images. Intensity calibration was performed in native-space and without skull-stripping because the procedure requires signals from eyeballs and temporal muscles, but binary masks – obtained using T1w images and FSL's brain extraction tool (BET) 56 with a fractional intensity threshold of 0.45 – were then applied to the resulting T1w/T2w ratio maps. The skull-stripped T1w/T2w maps were then spatially normalized to MNI space by applying the same linear (affine) and LDDMM-based deformation described above.
After coregistering each participant's diffusion weighted data with T1w anatomical images, DTI images were preprocessed using CATNAP (Coregistration, Adjustment, and Tensor-solving, a Nicely Automated Program; http://iacl.ece.jhu.edu/~bennett/catnap/, JHU School of Medicine, Baltimore, Maryland, USA) to correct for motion artifacts and coregister the DTI images to the reference images (i.e., the mean b = 0 s/mm² image) using 12-parameter (affine) registration, which additionally corrects for eddy current distortions 13. CATNAP then automatically calculated the six tensor images (dxx, dyy, dzz, dxy, dxz, dyz) and three diagonalized eigenvalue images (λ₁, λ₂, λ₃) using a log-linear minimum mean squared error (LLMMSE) approach 57, which assumes that noise is independently and identically distributed in a Gaussian fashion (as previously supported for SNR values greater than 2:1) 58. After skull-stripping the mean b = 0 s/mm² image and six tensor images, the same linear (affine) and non-linear LDDMM-based approach (described above) was applied to spatially normalize the mean b = 0 s/mm² image to the "JHU_MNI_SS_b0_ss" template in MNI space, and the resulting deformation field was applied to spatially normalize the three eigenvalue images before finally generating voxel-wise fractional anisotropy (FA), axial diffusivity (AD), radial diffusivity (RD), and mean diffusivity (MD) maps using the DTIStudio Toolbox 59 in MRIStudio.
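For readers less familiar with how the four DTI scalars follow from the three eigenvalues, the sketch below uses the standard textbook definitions (AD = λ₁, RD = (λ₂ + λ₃)/2, MD = (λ₁ + λ₂ + λ₃)/3, and the usual normalized-variance form of FA). It is an illustration under those assumptions, not the DTIStudio implementation; the function name and example eigenvalues are hypothetical.

```python
import math


def dti_scalars(l1, l2, l3):
    """Return (FA, AD, RD, MD) for one voxel from its three tensor eigenvalues."""
    # Sort so that l1 is the principal (largest) eigenvalue.
    l1, l2, l3 = sorted((l1, l2, l3), reverse=True)
    md = (l1 + l2 + l3) / 3.0          # mean diffusivity
    ad = l1                            # axial diffusivity
    rd = (l2 + l3) / 2.0               # radial diffusivity
    denom = l1 ** 2 + l2 ** 2 + l3 ** 2
    if denom == 0:
        return 0.0, ad, rd, md         # empty voxel: define FA as 0
    num = (l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2
    fa = math.sqrt(1.5 * num / denom)  # fractional anisotropy, in [0, 1]
    return fa, ad, rd, md


# Hypothetical white-matter-like eigenvalues, in mm^2/s.
fa, ad, rd, md = dti_scalars(1.6e-3, 0.4e-3, 0.4e-3)
print(f"FA = {fa:.3f}, AD = {ad:.2e}, RD = {rd:.2e}, MD = {md:.2e}")
```

Because FA depends only on the relative spread of the eigenvalues, a voxel with three equal eigenvalues yields FA = 0 regardless of how large MD is.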
Region of interest segmentation. After generating the MWF, T1w/T2w, FA, AD, RD and MD maps, region-of-interest (ROI) analyses were performed for each participant using MRIStudio's ROIEditor Toolbox. 3D ROIs, listed in the JHU_MNI_SS ('Eve') atlas, were chosen for 25 brain regions, including 15 WM structures and 10 subcortical grey matter (GM) structures (Fig. 1). For bilateral structures, MWF, T1w/T2w ratio, FA, AD, RD and MD values were averaged over both the left and right hemispheres to yield a mean value for each metric, ROI, and participant (i.e., 6 metrics × 25 ROIs × 31 participants = 4,650 unique data points).

Statistical analysis.
After extracting all of the raw values from each brain region, three types of statistical analyses were performed, and all statistical analyses were performed using MATLAB 2017a (The Mathworks Inc., Natick, MA, USA) and SPSS version 20 (IBM Inc., Armonk, NY, USA).
First, paired-sample t-tests were performed to investigate overall differences between tissue types (i.e., WM vs. subcortical GM) for each MRI metric. In order to achieve this, raw values for each metric were averaged across all 15 WM structures (i.e., to generate a mean WM value) and all 10 subcortical GM structures (i.e., to generate a mean GM value) for each participant. Paired t-tests were then performed using the corresponding WM and GM values from each participant; and due to strong a priori directional hypotheses (e.g., MWF higher in WM than GM, etc.), one-tailed t-tests were used. Given that there are 6 different MRI metrics, the overall type-I error rate (due to multiple comparisons) for each of the paired-sample, one-tailed t-tests was controlled by applying a post-hoc family-wise error (Bonferroni) correction -i.e., requiring a threshold of p < 0.008 (0.05/6) for any given t-test to be deemed statistically significant.
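The paired comparison described above can be sketched as follows. The example computes only the paired t-statistic, its degrees of freedom, and the Bonferroni-corrected threshold; the one-tailed p-value would then be read from a t distribution with n − 1 degrees of freedom in a statistics package. The function name and the per-participant values are hypothetical and were invented for illustration.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(wm_values, gm_values):
    """Paired-sample t-statistic and degrees of freedom for WM minus GM."""
    if len(wm_values) != len(gm_values):
        raise ValueError("each participant needs one WM and one GM value")
    diffs = [w - g for w, g in zip(wm_values, gm_values)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1


# One test per MRI metric, so the family-wise threshold is 0.05 / 6.
N_METRICS = 6
bonferroni_alpha = 0.05 / N_METRICS

# Hypothetical mean MWF (%) per participant in WM and subcortical GM.
wm_mwf = [12.1, 11.4, 13.0, 12.6, 11.9]
gm_mwf = [5.2, 4.8, 6.1, 5.5, 5.0]

t_stat, dof = paired_t_statistic(wm_mwf, gm_mwf)
print(f"t({dof}) = {t_stat:.2f}; significant if one-tailed p < {bonferroni_alpha:.4f}")
```

A positive t-statistic here corresponds to the directional hypothesis that the metric is higher in WM than in GM; for RD and MD the expected sign would be reversed.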
Second, Pearson (linear) and Spearman (rank) correlations were initially used to examine the relationships between the different MRI measures after combining data points across all 25 structures, across only the 15 WM structures, and across the 10 subcortical GM structures – in order to reveal overall trends (i.e., including potentially large differences in tissue properties between structures). Pearson and Spearman correlations were then also performed within each structure separately, to investigate the relationships between different MRI measures without the influence of between-structure differences. Given that there are 15 unique between-method comparisons, the overall type-I error rate (due to multiple comparisons) for each of the Pearson and Spearman correlations was controlled by applying a post-hoc family-wise error (Bonferroni) correction – i.e., requiring a threshold of p < 0.003 (0.05/15) for any given Pearson or Spearman correlation to be deemed statistically significant.
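The correlation analysis above can be illustrated with a dependency-free sketch. It implements the Pearson coefficient directly, obtains the Spearman coefficient as the Pearson correlation of average ranks, and enumerates the 15 unique metric pairs behind the Bonferroni threshold. The helper names and the example data are assumptions for illustration; the study itself used MATLAB and SPSS.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson (linear) correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman (rank) correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


METRICS = ["MWF", "T1w/T2w", "FA", "AD", "RD", "MD"]
pairs = list(combinations(METRICS, 2))      # 15 unique between-metric pairs
bonferroni_alpha = 0.05 / len(pairs)

# Hypothetical monotonic but non-linear relationship between two measures.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 4.0, 9.0, 16.0, 25.0]
print(f"Pearson r = {pearson_r(x, y):.3f}, Spearman rho = {spearman_rho(x, y):.3f}")
print(f"{len(pairs)} comparisons, threshold p < {bonferroni_alpha:.4f}")
```

The toy data show why both coefficients are worth reporting: a perfectly monotonic but curved relationship gives a Spearman coefficient of 1 while the Pearson coefficient stays below 1.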
Finally, in order to facilitate comparisons between measures with different intensity scales and/or units, the raw values of each MRI measure (i.e., MWF, T1w/T2w ratio, FA, AD, RD and MD) were standardized (z-scored) across all 25 structures. However, before calculating the z-scores, values for RD and MD were inverted (i.e., 1/RD and 1/MD, denoted as RD⁻¹ and MD⁻¹), in line with similar recent analyses 49, in order to produce analogous contrasts (i.e., positive z-scores reflecting greater microstructural integrity and negative z-scores reflecting lower microstructural integrity) across measures. Repeated measures ANOVAs (also known as ANOVAs for correlated samples) were then performed independently for each brain region to test for differences (F-statistics) between the mean z-scores of the six quantitative metrics. Given that there are 25 ROIs, the overall type-I error rate (due to multiple comparisons) was controlled by applying a post-hoc family-wise error (Bonferroni) correction – i.e., requiring a threshold of p < 0.002 (0.05/25) for any given F-statistic to be deemed statistically significant.

Results
Figure 1 shows a T1-weighted anatomical image with all 25 ROIs (15 WM structures and 10 subcortical GM structures) overlaid, Fig. 2 shows example MWF, T1w/T2w, FA, AD, RD and MD maps obtained from a representative healthy volunteer, and Table 1 lists the mean, standard deviation (SD), and coefficient of variation (COV; also known as relative standard deviation) between participants for the MWF, T1w/T2w ratio, FA, AD, RD and MD values in each ROI. Interestingly, the average COV across all structures was highest for MWF (COV = 19.5) and RD (COV = 19.4), followed by T1w/T2w ratio (COV = 14.9), […]
Moreover, although measurement variability showed no apparent dependence on tissue type for T1w/T2w ratio, FA and MD measures (p > 0.05, uncorrected), COVs tended to be higher in subcortical GM structures for MWF and AD, and lower for RD (all three p < 0.02, uncorrected) compared to COVs from WM structures (based on two-tailed, two-sample t-tests).

Table 1 column headings: MWF (%); T1w/T2w; FA; AD ×10⁻³ (mm²/s); RD ×10⁻³ (mm²/s); MD ×10⁻³ (mm²/s).

Quantitative values in WM vs. subcortical GM structures. Figure 3 shows box and whisker plots for each MRI metric, broken down by tissue type (i.e., the mean values from the 15 WM structures and 10 GM structures). As expected, the paired-sample, one-tailed t-tests revealed that WM structures had significantly higher MWF (t = 40.07), T1w/T2w (t = 13. […]

Correlations between quantitative MRI measures. Pearson (linear) correlations between every combination of MRI measures are presented for each brain structure in Table 2, and the corresponding Spearman (rank) correlations are presented in Table 3. Based on a comparison of these tables, the Pearson and Spearman correlations were in close agreement – where 34/375 (9.1%) Spearman correlations were found to be uniquely significant and only 2/375 (0.5%) Pearson correlations were found to be uniquely significant. To summarize, both Pearson and Spearman correlations were found to be significant between MD vs. AD for the majority of the 25 brain structures investigated, and the differences between Pearson and Spearman analyses were primarily due to uniquely significant rank (Spearman) correlations between RD vs. AD in 11/25 ROIs, between MD vs. RD in 19/25 ROIs and between RD vs. FA in 3/25 ROIs (Tables 2 and 3). Neither Pearson nor Spearman correlations between different MRI measures were found to be statistically significant in more than 1/25 brain structures after Bonferroni correcting for the number of between-metric comparisons (p < 0.003).
However, both Pearson and Spearman correlations of the data combined across all 31 participants and all 25 structures (Fig. 4, Table 4 […] was statistically significant, the corresponding Pearson correlation was not; and although the Pearson correlation for MD vs. MWF (r = 0.21) was statistically significant, the corresponding Spearman correlation was not. Relationships among WM structures for FA vs. T1w/T2w, AD vs. T1w/T2w, RD vs. T1w/T2w, MD vs. T1w/T2w and RD vs. AD were not significant based on either Pearson or Spearman correlations.

Finally, both Pearson and Spearman correlations of the data combined across all 31 participants and the 10 GM structures (Fig. 6, Table 6 […]

Table 6. Pearson (lower triangle) and Spearman (upper triangle) correlation coefficients with p-values (indicated in parentheses) between different MRI measures across all 10 grey matter structures. Corresponding data plots are shown in Fig. 6. Note: Bold font indicates correlations that were statistically significant after correcting for multiple comparisons using a Bonferroni correction (p < 0.003).
Comparison of z-scored MRI metrics between brain regions. Figure 7 presents the means and 99% confidence intervals (CI) for all six MRI measures (broken down by ROI) as standardized z-scores across all 25 brain structures. Although some of the structures (e.g., ACR, PTR, RLIC, Caud, GCG, PCC) showed reasonably good correspondence between metrics (as indicated by differences in mean z-scores < 1 and many overlapping 99% CIs), several other regions (e.g., BCC, CP, GCC, SCC, SFO, INS, Put) exhibited poor correspondence between metrics (as indicated by differences in mean z-scores > 2 and relatively few overlapping 99% CIs). Furthermore, no consistent relationships between the six metrics were observed. For example, the body of the corpus callosum (BCC) appeared to have an average MWF (z-score ≈ 0.25), high FA (z-score ≈ 1.25) and low MD⁻¹ (z-score ≈ −1.00) compared to other structures; the cerebral peduncles (CP; displayed immediately to the right of the BCC in Fig. 7) appeared to have an extremely high MWF (z-score ≈ 2.50), high FA (z-score ≈ 1.25) and low MD⁻¹ (z-score ≈ −1.00); and the external capsule (EC; displayed immediately to the right of the CP in Fig. 7) appeared to have a slightly lower than average MWF (z-score ≈ −0.75), average FA (z-score ≈ 0.00) and slightly higher than average MD⁻¹ (z-score ≈ 0.75).
These observations (from Fig. 7) were further underscored by the repeated measures ANOVAs, which formally compared the mean z-scores (from all 6 MRI metrics) within each region. The resulting F-statistics (Table 7) revealed that at least one of the MRI measures was significantly different from the others in all 25 ROIs, even after Bonferroni correction – suggesting that the different MRI measures have different sensitivities and/or specificities to the underlying tissue properties in different brain regions.

Discussion
In this cross-sectional study, we have measured and characterized relationships between six quantitative brain MRI measures, including multi-component T2-based MWF, T1w/T2w ratio, and four different DTI measures (i.e., FA, AD, RD and MD) in a cohort of 31 neurologically healthy participants. Our results confirmed that the values of all six methods showed significant differences between WM and subcortical GM structures, where MWF, T1w/T2w, FA and AD values were significantly higher in WM regions and both RD and MD values were significantly higher in subcortical GM regions (Fig. 3). Our work also indicated that these six measures were generally correlated with each other (to varying degrees) across all participants and the 15 WM and/or 10 subcortical GM structures examined in the current study. However, correlations were generally found to be less statistically significant within individual ROIs (except for the three DTI diffusivity metrics, which were significantly inter-related based on Spearman correlations); and z-scoring the values within each metric did not reveal consistent relationships (across brain regions) between many of the metrics. Taken together, this suggests that, while all six measures are indicators of brain microstructure, different metrics are sensitive to distinct combinations of tissue characteristics and/or local artifacts.
Figure 7 (caption, partially recovered): […] Fig. 1, and values for RD and MD were inverted (RD⁻¹ and MD⁻¹) prior to z-scoring, in order to produce analogous contrasts (i.e., positive z-scores reflecting greater microstructural integrity and negative z-scores reflecting lower microstructural integrity) across measures.

Consistent with previous work 48, our results indicate that FA (which measures the directional homogeneity of water diffusion) had the strongest and most significant correlations with MWF (which is thought to be the most myelin-specific measure) after pooling data across all 25 structures (Fig. 4, Table 4), the 15 WM structures (Fig. 5, Table 5) and the 10 GM structures (Fig. 6, Table 6) – with T1w/T2w ratio also showing significant positive correlations with MWF after pooling data across all 25 structures, the 15 WM structures and the 10 GM structures. Specifically, FA and MWF had 53% shared variance among all 25 brain structures (r² = 0.53), 15% shared variance among WM structures (r² = 0.15) and 28% shared variance among GM structures (r² = 0.28), implying that a reasonably large proportion of FA values could conceivably be related to myelin (noting the inherent limitations of correlation vs. causation). On the other hand, T1w/T2w ratios only shared 11% variance with MWF among all brain structures (r² = 0.11), 6% among WM structures (r² = 0.06), and 6% among GM structures (r² = 0.06), suggesting that T1w/T2w ratio measurements are likely dominated by factors other than myelination – a finding that is consistent with other recent work 38,39. Nevertheless, although the shared variance was small between T1w/T2w ratio and MWF, the statistical significance of these correlations is due to the large number of data points included in the correlations (e.g., 31 participants × 25 ROIs = 775 data points), suggesting that myelination contributes to a small but statistically significant portion of T1w/T2w ratio measurements.
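The shared-variance figures quoted above follow directly from squaring the pooled Pearson coefficients. As a small worked check, the sketch below uses the two all-structure r values reported in the abstract (r = 0.73 for FA vs. MWF and r = 0.33 for T1w/T2w vs. MWF); the function and variable names are illustrative only.

```python
def shared_variance(r):
    """Coefficient of determination (r squared) for a Pearson correlation r."""
    return r * r


# Pearson coefficients pooled across all 25 structures, as reported in the abstract.
reported_r = {
    "FA vs. MWF": 0.73,
    "T1w/T2w vs. MWF": 0.33,
}

for label, r in reported_r.items():
    print(f"{label}: r = {r:.2f}, shared variance = {100 * shared_variance(r):.0f}%")
```

Because r is squared, a correlation that looks moderately smaller (0.33 vs. 0.73) corresponds to a roughly five-fold smaller share of explained variance.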
As expected, pooling data across all 25 structures yielded a significant positive correlation for AD vs. MWF and significant negative correlations for RD vs. MWF and MD vs. MWF (based on both Pearson and Spearman correlations); and pooling data across the 15 WM ROIs yielded a significantly positive Spearman correlation for AD vs. MWF and a significantly negative Spearman correlation for RD vs. MWF. However, pooling data across the 10 subcortical GM ROIs yielded significantly negative correlations for RD vs. MWF and MD vs. MWF (based on both Pearson and Spearman correlations), with no significant relationships identified between AD and MWF.
Overall, the relationship between FA vs. MWF (positive correlation) was found to have the largest effect size across all 25 brain structures, and the next largest effect sizes were observed for AD vs. FA (positive correlation), RD vs. FA (negative correlation), MD vs. AD (positive correlation) and MD vs. RD (positive correlation) -all of which had more than 25% shared variance (r 2 > 0.25). Therefore, combined with the many significant within-ROI Spearman correlations between AD, RD and MD (Table 3), it appears -perhaps not surprisingly -as though the four diffusion metrics (particularly the three diffusivity measures) reflect largely similar, albeit not identical, characteristics of the underlying tissues.
The fact that relatively few correlations were found within individual structures likely has two primary causes. The first is that our study sample was composed of neurologically-healthy control participants, who presumably have relatively consistent tissue microstructure within any given brain region (e.g., compared to patient populations). Thus, if there is a small range of values along either (or both) dimension(s), then any true correlations are more likely to be obscured by variance owing to small amounts of measurement error (e.g., due to signal noise limitations, etc.). Moreover, from a statistical standpoint, the measurements within each region have much lower power because the sample size (i.e., number of data points = number of participants = 31) is much smaller than for correlations across several different brain structures (e.g., number of data points = number of participants × number of ROIs = 775 across all 25 brain regions) [see Study Limitations section below for more details].
Nonetheless, as expected, WM structures (on average) had significantly higher MWF, T1w/T2w ratio, FA and AD -and lower RD and MD values -compared to GM structures (Fig. 3). However, by comparing the z-scored values of all six MRI measures across different structures (Fig. 7), there do not appear to be any systematic relationships between the various metrics (i.e., any particular metric being high or low did not consistently predict whether any other metric is high or low across structures), indicating their sensitivities to different underlying factors. For example, the observation of high MWF (z-score ≈ 2.50) and high FA (z-score ≈ 1.25) values in the cerebral peduncles (CP; collectively made up of the corticobulbar, corticopontine, and corticospinal fibers) is consistent with the fact that these structures are both highly myelinated and have well-organized fiber orientations. However, other structures known to have highly uniform fiber orientations (e.g., genu of the corpus callosum; GCC) 60 exhibited relatively high FA (z-score ≈ 1.75) despite a moderate MWF (z-score ≈ 0.25), suggesting that the degree of fiber coherence contributes to FA more in this region than can be explained by myelin alone 41,61 . Interestingly, this same relationship held true for both the body and splenium of the corpus callosum (BCC and SCC) as well, while white matter structures that are known to have complex fiber geometries (e.g., superior corona radiata, superior fronto-occipital fasciculus and superior longitudinal fasciculus; SCR, SFO and SLF) 60 along with many of the subcortical GM ROIs -tended to have lower FA z-scores than MWF z-scores (as might have been expected).
Finally, although the current findings support previous reports that T1w/T2w ratio is not particularly specific to myelin concentration [38][39][40], one of the other objectives of the study was to determine if T1w/T2w values might correlate better with other general measures of tissue microstructure, such as FA, AD, RD and/or MD values. Interestingly, however, the metric most highly correlated with T1w/T2w ratio across all 25 brain structures was MWF (r = 0.33), followed by FA (r = 0.25), and then AD (r = 0.14), corresponding to 11% shared variance with MWF, 6% shared variance with FA, and 2% shared variance with AD overall; and the shared variance between T1w/T2w vs. MWF and T1w/T2w vs. FA falls to 6% and 0% (respectively) across WM structures, and 6% and 8% (respectively) across GM structures. It is also noteworthy that all but 2 out of the 9 between-metric relationships that failed to reach statistical significance with either Pearson or Spearman correlations (Tables 4-6) – i.e., RD vs. AD in white matter (Table 5) and AD vs. MWF in grey matter (Table 6) – involved T1w/T2w ratios. Taken together, our results therefore suggest that: (1) T1w/T2w ratios are sensitive to unique aspects of tissue microstructure that are largely independent of either myelin-based or diffusion-based metrics, and (2) the differences between T1w/T2w ratios and the diffusion-based metrics are particularly striking among white matter regions (see Table 5).

Study Limitations.
As with any scientific experiment, the results of the current study must be interpreted within the context of known limitations.
First, although our sample size is comparable to similar previous studies 29, [38][39][40][48][49][50] , it is worth noting that it is perhaps on the lower end for performing some of these types of comparisons. For example, one-tailed t-tests and/or correlations (since the general directional relationships between MWF, T1w/T2w ratio, FA, AD, RD and MD were already known) with sample sizes of n = 31 allow intermediate correlations with r ≥ 0.35 (or ρ ≥ 0.35) to be detected with 95% confidence and 80% power (i.e., alpha = 0.05 and beta = 0.20) for each comparison. Therefore, especially after correcting for multiple comparisons, statistical significance for within-ROI comparisons (Tables 2 and 3) could only be reached for large correlations with r ≥ 0.5 (or ρ ≥ 0.5). However, for correlations across brain structures (e.g., across 10 GM regions, 15 WM regions, or all 25 regions), it is worth pointing out that much smaller effect sizes become readily detectable (e.g., r ≥ 0.11 [uncorrected] or r ≥ 0.17 [corrected] for 31 participants × 10 regions = 310 data points) with the same statistical confidence and power.
Second, although we employed advanced image-processing steps -e.g., a well-validated LDDMM algorithm for high-dimensional nonlinear spatial normalization, bias correction and intensity calibration for the T1w/T2w ratio calculations, and stimulated echo correction for the MWF maps -we cannot rule out the possibility that small differences in coregistration, partial volume averaging or other effects due to imperfect preprocessing may have contributed some variability between imaging approaches. It is worth noting, however, that the ROI-based analyses used in the current study (i.e., averaging individual signals across entire brain structures) should mitigate these types of misalignment and/or partial volume effects compared to voxel-wise analysis approaches.
Third, MWF estimates in GM structures are likely to be less accurate than in WM structures due to inherently lower contrast-to-noise ratio (CNR), and MWF in the internal capsule may be slightly overestimated due to the use of the NNLS approach 62 and an overlap between intra/extracellular water and myelin water signal in that region.
Fourth, a relatively short TR was used to acquire the GRASE images, thereby imparting a small degree of T1-weighting in otherwise T2w GRASE images used for MWF and T1w/T2w ratio calculations. However, other recent work in our lab has validated that T2w GRASE images can be used in order to calculate T1w/T2w ratios that are comparable to those obtained using FSE-based T2w images 38 .
Finally, given that the current experiment was purely empirical (i.e., measuring and comparing the different MRI values), we have avoided detailed discussion or speculation about the underlying mechanisms related to each imaging modality and what factors might be driving the observed differences between methods, beyond what is already well known. However, more thoroughly understanding the mechanisms underlying these signals is an important topic that has been explored by comparing various quantitative MRI approaches to ex vivo histological preparations -and there have been some very recent advances using optically cleared tissues that will likely open new avenues for better understanding what aspects of tissue microstructure these MRI signals are measuring at the cellular and sub-cellular levels 63 . (2019) 9:2500 | https://doi.org/10.1038/s41598-019-39199-x www.nature.com/scientificreports www.nature.com/scientificreports/

Conclusions
Overall, all six of the evaluated quantitative MRI measures (MWF, T1w/T2w ratio, FA, AD, RD and MD) were found to be correlated with each other to varying degrees. The strongest correlation observed was between MWF and FA, which shared 53% variance across all 25 brain structures. However, the mean values (and z-scores across structures) differed between measures in several brain regions, and these differences can likely be attributed to unique sensitivities of the T1w/T2w ratios and diffusion-based measures to non-myelin factors, including: white matter fiber orientation (e.g., crossing fibers), proton density (e.g., tissue swelling), neural and glial density (e.g., axonal packing), iron/mineral accumulation, and/or local image artifacts. Importantly, the current study verifies previous work showing that calibrated T1w/T2w ratios differ from MWF estimates, and therefore should not be interpreted as myelin-specific measurements in subcortical brain structures; however, in revealing differences between T1w/T2w ratios and FA, AD and MD, it also appears that T1w/T2w ratios provide different/additional information about the underlying tissue characteristics compared to diffusion-based measurements. Given these differences, it stands to reason that a combination of methods may provide complementary information. However, practical considerations such as available hardware and overall scan time are likely to dictate how many (and what types) of pulse sequences can be acquired in any given protocol; and although using multiple imaging modalities may shed additional light on brain development, aging and/or pathology, repeating statistical analyses using different quantitative MRI metrics can decrease statistical power (i.e., if corrections for type-I error due to multiple comparisons are required).
Therefore, information about how the various metrics are related to each other can hopefully be used to inform decisions about what types of data to acquire and analyze in future multi-modal neuroimaging studies (e.g., to maximize sensitivity and/or specificity to certain brain pathologies while minimizing scan time and data redundancy).

Data Availability
Unfortunately, the raw MRI data generated and analyzed in the current study cannot be made publicly available due to Research Ethics Board restrictions governing the storage and use of human neuroimaging data. However, anonymized images could potentially be made available upon special request from the corresponding author (pending the approval of the Research Ethics Board). Nonetheless, we have made our summary data -i.e., all raw MRI metrics (MWF, T1w/T2w ratio, FA, AD, RD, MD), broken down by participant and ROI -available as a supplementary spreadsheet accompanying this article.