Spatial regression analysis of MR diffusion reveals subject-specific white matter changes associated with repetitive head impacts in contact sports

Repetitive head impacts (RHI) are a growing concern due to their possible neurocognitive effects, with research showing a season of RHI produce white matter (WM) changes seen on neuroimaging. We conducted a secondary analysis of diffusion tensor imaging (DTI) data for 28 contact athletes to compare WM changes. We collected pre-season and post-season DTI scans for each subject, approximately 3 months apart. We collected helmet data for the athletes, which we correlated with DTI data. We adapted the SPatial REgression Analysis of DTI (SPREAD) algorithm to conduct subject-specific longitudinal DTI analysis, and developed global inferential tools using functional norms and a novel robust p value combination test. At the individual level, most detected injured regions (93.3%) were associated with decreased FA values. Using meta-analysis techniques to combine injured regions across subjects, we found the combined injured region at the group level occupied the entire WM skeleton, suggesting the WM damage location is subject-specific. Several subject-specific functional summaries of SPREAD-detected WM change, e.g., the \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${L}^{\infty }$$\end{document}L∞ norm, significantly correlated with helmet impact measures, e.g. cumulative unweighted rotational acceleration (adjusted p = 0.0049), time between hits rotational acceleration (adjusted p value 0.0101), and time until DTI rotational acceleration (adjusted p = 0.0084), suggesting RHIs lead to WM changes.

MR imaging acquisition and processing. Diffusion MR imaging, also known as DTI, was performed on all subjects before the start of the football season in August (pre-season) and within 1 week of the season's end in November (post-season). All subjects had two DTI scans approximately three months apart. Imaging was acquired using a single 3 T Siemens Trio scanner (Siemens Healthcare, Erlangen, Germany) using Numaris 4 software version 17B. The matrix head coil had 32 channels, and the scanning parameters were: total acquisition time of 11 min, spin-echo echo planar imaging, bandwidth of 1502 Hz/Px, parallel acceleration technique of generalized autocalibrating partially parallel acquisition, TR/TE = 9100 ms/89 ms, voxel size 2 × 2 × 2 mm, 69 diffusion directions with b = 1200 s/mm 2 and 10 averages of b = 0. The TR was chosen to minimize T1 weighting influences on the diffusion images and maximize the signal to noise ratio. TR times of similar duration have been used in similar studies that assess WM damage in individuals with head trauma 26 . FSL-5.0.9 was used for all preprocessing of the diffusion data (FSL; www.fmrib .ox.ac.uk). Fugue and eddy packets in FSL were used to correct for magnetic susceptibility distortions, motion, eddy currents, and brain extraction 27 . DTIFIT, an FSL packet, was used to create global maps of fractional anisotropy (FA). For time-point comparisons, the FA map for each subject's post-season FA map was non-linearly registered to their pre-season FA map using the FSL function, FNIRT 28 . The R package "oro.nifti" was used to load NIfTI image data 29 . The registered images were noted to have ringing artifacts due to the sharp transitions near edges, which would detrimentally affect later image processing 30 . We developed a simple and computationally efficient method, known as "MountDoom" in our software package, to remove these artifacts before the subsequent analyses ( Supplementary Fig. 4). We first determined the foreground (true brain signals) and the background using DTIFIT, and then removed all voxels in the foreground with Euclidean distance to the background less than or equal to a pre-specified threshold, which was set to 4 mm, and represents two voxels apart in six major directions and a 3D neighborhood with 32 voxels. After this step, we cropped the images to the individual's smallest three-dimensional rectangle, which Scientific RepoRtS | (2020) 10:13606 | https://doi.org/10.1038/s41598-020-70604-y www.nature.com/scientificreports/ contained all the voxels with nonzero FA values to reduce the dimensionality and computational cost. We utilized SPREAD for image analysis.
SPREAD analysis of DTI. The SPREAD model used in the current study is an adaptation of the original model developed by Zhu and colleagues 21 (Fig. 1).
The original model has the capability to incorporate multiple scans collected from multiple subjects at several time points. In the original model, the voxel-wise summary statistic was calculated as the temporal standard deviation of averaged FA or mean diffusivity values for each subject, and each subject's value was summed to get the final summary for one voxel. In our study, only the pre-and post-season scans are available for each subject and we decided to conduct subject-specific SPREAD analysis due to the high-level of heterogeneity among subjects (Fig. 2).
In this special case, the temporal standard deviation of FA values for a voxel is monotonically equivalent to the absolute value of the difference between the pre-and post-scans at this voxel, and we do not average this summary statistic over all subjects (Supplementary Methods). Using this simplification, we streamlined the code, which is mathematically equivalent to the original model. First, we randomly permuted the time labels of each voxel to create permutation samples, represented by k in our spatial regression model. We then fitted FA values, which were calculated by applying spatial regression based on multivariate Nadaraya-Watson kernel to each permutation sample. The test statistics were computed as the voxel-wise absolute difference between the fitted FA values of the pre-and post-scans for each permutation sample. Next, the permutation p values were computed by comparing the original, before permutation, to the absolute differences of FA values from their permuted counterparts. With the large number of hypotheses tested, we used Benjamini-Hochberg (BH) procedure to control the false discovery rate, and Westfall-Young (WY) procedure to control the family-wise error rate [30][31][32] .   www.nature.com/scientificreports/ k = 1000 permutations for each subject. These global summary statistics and p values were used in group-level analysis. We performed group-level inferences based on associating the global functional summary statistics with clinical outcomes and combining voxel wise p value maps by a novel robust meta-analysis method. The SPREAD method is illustrated in Fig. 1.
SPREAD parameter optimization. SPREAD was originally developed for detecting small WM lesions, typically under 5 voxels in size, seen with multiple sclerosis 21 . Because the size of the "abnormal" region after RHI is not known, the parameters used in the original SPREAD procedure might not be appropriate. We designed a simulation study to select the optimal tuning parameters of the SPREAD method for analyzing WM changes occurring in the setting of RHI. Specifically, we studied the performance of SPREAD with the following parameters in a series of simulations involving a pre-post image pair from a single collegiate non-athlete subject: (1) we superimposed artificial "lesions" to different regions of the post-season scan with three sizes: radius of 5 voxels (small), radius of 12 voxels (medium), and radius of 48 voxels (large); (2) various choices of bandwidth, ranging from three to 25 voxels, were used in the Gaussian kernel function; (3) BH and WY were used to obtain the adjusted p value maps; and (4) seven different p value thresholds were used to define statistically significant voxels based on the adjusted p value maps (1e −4 , 5e −4 , 1e −3 , 5 e−3 , 0.01, 0.05, and 0.1) ( Supplementary Fig. 1). We assessed true positive and false positive rates to find the optimal combination of bandwidth, multiple comparison method, and p value threshold for each lesion size.

SPREAD outputs.
From the SPREAD algorithm, we summarized the following outputs: number of significantly changed voxels (NSV), total difference of significant voxels (TDSV), absolute difference of significant voxels (ADSV), and the L p norms of the fitted FA map. TDSV was calculated by taking the sum of the differences for significantly changed voxels between pre-season and post-season scans for a single subject. ADSV was calculated by taking the sum of the absolute difference for significantly changed voxels between pre-season and post-season scans for a single subject. L 1 is the absolute difference between pre and post-season scans of the fitted FA values averaged over all voxels. The L 2 norm is the square root of the average of squared difference of the fitted FA values. The L ∞ norm is the maximum of absolute difference of the fitted FA values. TDSV, ADSV, L 1 , L 2 , L ∞ have the units of FA, which is a unitless scalar between 0 and 1.
In addition, we conducted a meta-analysis based on a novel robust p value combination test based on betadistributions. The objective of this analysis is to utilize p value maps, which is an image that marks voxels showing significant change from the pre and post-season scan comparison. We combine the p value maps of all 28 subjects into a single group-level p value map, and then select significant regions based on this single map (Supplementary Methods).
Head impact exposure. Head impact exposures were estimated using the Head Impact Telemetry system (HITs) (Simbex, Lebanon, NH), which has accelerometers embedded in Riddell Revolution IQ Helmets (Riddell Corporation; Elyria, OH). HITs provides five helmet-based impact measures (HIM) for each impact > 10 g's of linear acceleration: linear acceleration (LA), rotational acceleration (RA), Head Injury Criterion 15 (HIC 15), Gadd Severity Index (GSI), and Helmet Impact Technology severity profiles (HITsp), as well as time of impact and impact location. The 10 g threshold was chosen as it is the lowest threshold to be set by the helmets, and we were looking to record all hits that may lead to white matter damage and therefore went for the high sensitivity approach.
An equipment manager made sure all helmets were fitted to the player and connected to the computer before every contact session. A sideline computer monitored by a research coordinator at every helmeted session collected the data wirelessly. In addition, the research coordinator monitored all equipment during play, and performed daily data scrubbing to remove any recordings that occurred while the helmet was not on the subject's head.
Accelerometer output for an entire season was summarized with six different helmet impact metrics : mean HIMs, peak HIMs, cumulative unweighted (CUW) HIMs, and three cumulative time-weighted metrics that have been previously described in detail 33 : (1) time between hits (TBH) weights HIMs for the current hit based on the magnitudes of the previous hits and the time between all previous hits and the current hit; (2) time until assessment (TUA) weights HIMs for the current hit based on the number of days between the current hit and the post-season DTI scan; (3) a combination of the TBH and TUA algorithms referred to as TBH + TUA.
Subjects undergoing RHI were monitored for concussion using methods described elsewhere in detail 33 . In brief, certified athletic trainers were present at all practices and games monitoring the athletes for concussion. Concussion was defined by the Sport Concussion Assessment Tool 2, which requires at least one of the following: symptoms (e.g. nausea), physical signs (e.g. loss of consciousness), impaired brain function (e.g. impaired memory), or abnormal behavior 34 .
Clinical assessment of concussion-related cognitive impairment. We used the Immediate Post-Concussion and Cognitive Testing (ImPACT) (ImPACT Applications, inc) and the balance error scoring system (BESS) tests as our clinical measurements at pre-season and post-season evaluation for all 28 subjects 35,36 . The ImPACT test is a computerized test that looks at several different measurements of memory, reaction time and visual speed; from the ImPACT test, we measured verbal memory score, visual memory score, visual motor speed, and reaction time 35  and inter-quartile ranges (IQR) were reported; for categorical variables, sample frequencies and percentages were reported. We used a novel robust p value combination test to assess the anatomic location for WM changes between the subjects. The primary statistical analysis is to correlate SPREAD outputs with helmet impact metrics. Considering the uncertainty of normality of the six helmet impact metrics, Spearman's nonparametric correlation test was employed with the Benjamini-Hochberg procedure to control for multiple testing. Paired t-test was used in an auxiliary analysis to compare clinically defined, concussion-related cognitive test scores before and after the football season. We defined statistically significant findings for all testing to be those with (adjusted) p value < 0.05, and all tests were two-tailed. All statistical analyses were performed in R 3.3.0 37 .

Results
Demographics. The characteristics of the 28 subjects recruited in our study are summarized in Table 1.
Among them, three subjects sustained a mid-season concussion (Supplementary Table 1).

SPREAD parameter optimization.
Using our simulation study to optimize the bandwidth parameter for RHI, we found the optimal bandwidth values to be five for small signal (radius of 5 voxels), 10 for medium signal (radius of 12 voxels), and 19 for large signal (radius of 48 voxels) ( Table 2). We discovered that the optimal bandwidth value appears to be positively related to the signal size, i.e. the smaller the signal is, the smaller the optimal bandwidth is, which can help determine optimal bandwidth values based on lesion size created by the disease process. We also noticed that using very large bandwidth values for large signal tends to result in high levels of false positives (Supplementary Fig. 1). Based on these considerations, bandwidths of 3, 5, 10, and 15 were used in the SPREAD algorithm when analyzing the current dataset.
Anatomic distribution of WM changes detected using SPREAD. When SPREAD results from pre-post comparisons for all 28 athletes were combined into a single p value map, we observed the significant changes to be scattered diffusely across the entire white matter region of the brain (Fig. 3).
When viewing all 28-individual p value maps, we saw that the SPREAD detected region of WM injury after RHI was highly heterogeneous among the athletes, and we did not identify a contiguous region with significant WM change that is common for all 28 subjects. Illustrative single subject raw p value maps can be seen in Fig. 2. All 28 raw p value maps can be seen in Supplementary Fig. 2.
Further investigations showed that most of those detected voxels were associated with decreased FA values. For example, using p < 0.002 as the p value threshold, 106,148 voxels were selected by the SPREAD procedure

Relationship between SPREAD detected WM changes and helmet impact metrics.
The number of head hits over a single season of collegiate football ranged from 37 to 1057 hits with an average of 379 hits per athlete (Supplementary Table 2). We performed correlations between the 30 HIM and helmet impact metric combinations, with the 12 SPREAD derived functional norms and output-bandwidth combinations (Table 3;  Supplementary Tables 3 and 4). The most significant correlations with the helmet impact metrics involved using L ∞ norm with a bandwidth of 15. An illustrative example is seen in Fig. 4. In addition, a subset of cumulative time weighted helmet impact metrics is significantly correlated with the L 2 norm with a smaller bandwidth (Supplementary Table 4). All significant correlations in this association analyses were positive, suggesting that WM structural changes were induced by physical impacts, and the stronger the physical impacts, the greater the maximal injury. The specific metrics that showed significant correlations with L ∞ were the cumulative metrics, all weighted metrics and the CUW metric with Spearman's ρ values ranging from 0.4056 to 0.6327. The CUW metric had the strongest correlations to L ∞ when compared to all other metrics; however, these differences were small (Table 3).
Clinical data. We found no significant changes in ImPACT testing scores between pre and post-season testing (Table 4). There was a significant decrease in the BESS test scores with fewer errors seen on average in the post-season evaluation compared to the pre-season evaluation (p = 0.003, Table 4). The improvement in the BESS score is most likely a function inter-rater variability as well as a mild practice effect with repeated exposures to the test 38 .

Discussion
Our study focused on the ability of SPREAD to identify subject-specific WM changes after one football season, and how SPREAD outputs were associated with multiple single-season summaries of helmet data. It yielded three important findings: (1) SPREAD were able to detect subject-specific FA changes after a single football season; (2) the location of injury is quite heterogeneous and diffuse when looking at the athletes' injury profiles individually and as a group, respectively; (3) the helmet impact metrics had strong associations with the SPREAD outputs, and more specifically the cumulative metrics appeared to have the strongest associations with the WM changes seen on DTI.
Longitudinal WM changes on DTI have been shown in athletes in multiple studies 15,23 . We detected longitudinal WM changes for the athletes after a single football season, and the identified voxels were overwhelmingly (> 93%) associated with decreased FA values. A decrease in FA is a sign of less unidirectional diffusion and has been reported by others in the context of concussion. The pathological underpinning of decreased FA is controversial but is thought to reflect myelin damage and/or axonal loss 13,39,40 . SPREAD provided a unique opportunity to visualize the injury distribution among the athletes not allowed by previous methods.
Using the SPREAD algorithm and meta-analysis techniques, we were able to create a single common map of injury among all the athlete data. We showed that the athletes as a group had large diffuse areas of injury throughout the white matter of the brain with no distinct, defined region of injury. This diffuse nature is highlighted in other studies that have used region of interest analysis with the individual studies describing different regions with significant change with minimal consistency among the studies 4,[17][18][19] . In comparison, when assessing the athletes individually we found their injury profiles to be quite heterogeneous. This heterogeneous nature of injury within individuals could be a reason why group-level DTI analysis methods have a difficult time showing significant regions of injury as well as a lack of consistent regions of injury among the studies 4,17-19 . The diffuse Table 3. Helmet impact metric correlations with the L ∞ norm. "c.c" is the Spearman rank correlation coefficient (ρ). Bold values signify significant correlations (adjusted p value < 0.05). LA linear acceleration, RA rotational acceleration, HIC15 head impact criterion 15, GSI Gadd Severity Index, HITsp helmet impact technology severity profile. www.nature.com/scientificreports/ and heterogeneous nature of this injury suggests group-based level analysis may be a sub-optimal approach for studying RHI induced WM changes, which can be minimized with SPREAD. In addition, this pattern of injury may result from the varying location of impacts among athletes, which minimizes the likelihood of producing similar changes between athletes on DTI. To some extent, these findings explained why some earlier studies failed to establish a clear association between RHI and brain structural changes. The changes are spread out among the entire WM region and different subjects could have quite different changes, so traditional region of interest-based methods could not identify a common region of injury 41,42 . The other advantage SPREAD has provided is the ability to localize injury without requiring a predefined map of brain regions from a specific brain atlas, which allows us to visualize and describe longitudinal WM changes on a more individual level. The significant correlations between helmet impact metrics and WM changes suggest a dose-response link between RHI and WM changes. The use of helmet data helps create a stronger link because the helmet data is collected prospectively during the time interval WM changes are occurring and being quantified by DTI. Several studies have shown similar patterns when testing the association between helmet impact metrics and WM changes observed on DTI 17,23,24,26,33 . The initial studies done by Bazarian and Davenport showed these associations using unweighted helmet impact metrics 14,26 . In comparison, the strength of the associations improved when the helmet impact data were weighted 17,33 . These studies suggest that other factors affect the biomechanical impact of a hit such as, time between hits, on the WM 17,33 . Looking at our pattern of significant correlations, we found that both CUW and time-weighted metrics had significant correlations even after correcting for multiple comparisons. The cumulative helmet impact metrics being significant suggests these sub-acute WM changes are a result of the compounding effects of a full season of RHI and not associated with a single extreme hit or the average hit experienced by an athlete. More specifically, that time weighted helmet impact metrics were significant suggest that the concept of neuronal vulnerability is important to understanding WM changes in humans, which has been demonstrated in animal models 43,44 . When comparing CUW and time-based metric correlations, the CUW correlations were stronger; however, the differences in Spearman's ρ were quite small. This finding does not  www.nature.com/scientificreports/ discount the importance of time in understanding the relationship between RHI and WM damage, but suggests our weighting for the time factor could be optimized to the new SPREAD algorithm. Our examination of clinical data revealed no appreciable declines. The only significant pre to post-season difference was for the BESS scores, which improved over time. This suggests that these clinical measures have a poor sensitivity for the sub-acute WM damage seen on neuroimaging, a finding that has been reported by others 45,46 In addition, both tests, especially the BESS test, are subject to training effects, making them less useful for detecting subtle neurologic changes that might be associated with sub-acute WM change 38 . The high sensitivity demonstrated by the SPREAD algorithm and its ability to localize injury provides unique opportunities for further research.
The SPREAD algorithm is uniquely suited to identifying the location of injury. Our development of the improved SPREAD (iSPREAD) method provides a more flexible algorithm for analyzing brain scans compared to the SPREAD method 22,47 . iSPREAD uses nonlinear partial differential equation (PDE) modeling techniques to smooth the DTI images, which has better sensitivity and accuracy than SPREAD for detecting changes in regions with irregular shapes. It also has the capability to fit more sophisticated longitudinal models that involve scans with nonlinear temporal trends collected at many time points. Those advantages do come with much higher computational cost and several more tuning parameters in the PDE solver. In the future, we are committed to simplifying the iSPREAD algorithm and providing clinical researchers several practical combinations of tuning parameters via thorough simulation studies. The revised SPREAD method was implemented as an R package, which is freely available at https ://githu b.com/ygu42 7/iSPRE ADR.
Future efforts to demonstrate the ability or our iSPREAD method to detect neuroimaging changes might focus on using more sophisticated methods of weighting helmet impact data, such as using machine learning. Another extension would be to utilize the helmet impact location data to help better predict and describe the region of change seen on DTI. Clinically, the ability to measure helmet data could allow the development of threshold values identify athletes who need to temporarily refrain from contact before they develop potentially irreversible or long-term WM changes. This could help lead to improved helmet design as well as protocols to minimize the harmful effects of RHI.
This study is not without limitations. Three of our athletes in our analysis sustained a mid-season concussion, which may provide a confounder to our conclusions; however, the association between the WM changes and physical impacts was not a result of any outliers and the analytic methods used (Spearman's rank correlation test) was robust to outliers in the data. RHI can be viewed as a spectrum of injuries, with one end being very mild sub-concussive contacts and the other end being clinically defined concussions. In this sense, our current analysis having both concussed and non-concussed athletes may reflect the full extent to which playing football impacts the health of players. Another limitation of our study was the relatively small number of subjects recruited (n = 28). The fact that we were able to show significant associations between the WM changes and helmet impact data despite the small sample size, suggested that using computational methods that are designed to make personalized longitudinal analyses such as, SPREAD. An investigator may be able to detect useful signals from a population with high level of between-subject variation. Should the utility of SPREAD be validated in a future prospective study with a larger sample size, it may potentially change the current research and clinical practice of mTBI in a profound way.
From this study, we can derive three conclusions. The first conclusion is that sub-acute WM changes can be detected by the SPREAD method at the subject-level. Second, we conclude that qualitatively this injury is highly diffuse and heterogeneous among individuals so traditional ROI-based methods may be under-powered. Thirdly, we have shown a significant dose-response relationship between the amount of head trauma a player experiences and the WM changes seen on DTI.