Common Brain Structure Findings Across Children with Varied Reading Disability Profiles

Dyslexia is a developmental disorder in reading that exhibits varied patterns of expression across children. Here we examined the degree to which different kinds of reading disabilities (defined as profiles or patterns of reading problems) contribute to brain morphology results in Jacobian determinant images that represent local brain shape and volume. A matched-pair brain morphometry approach was used to control for confounding from brain size and research site effects in this retrospective multi-site study of 134 children from eight different research sites. Parietal operculum, corona radiata, and internal capsule differences between cases and controls were consistently observed across children with evidence of classic dyslexia, specific comprehension deficit, and language learning disability. Thus, there can be common brain morphology findings across children with quite varied reading disability profiles that we hypothesize compound the developmental difficulties of children with unique reading disability profiles and reasons for their reading disability.

Reading Disability Profile Effects. Clinician expert-raters classified the reading disability cases according to the three reading disability profiles described above based on evidence of: (1) poor word and non-word phonological decoding (Poor Decoders); (2) relatively specific reading comprehension problems (Poor Comprehenders); and (3) relatively poor decoding, reading comprehension, and verbal comprehension [Generally Poor Readers; (Supplementary Tables 2 and 3 for behavioral profiles and classification criteria)]. These profiles were defined based on the extant literature on reading disability profiles 4,5 and the raters' considerable clinical experience assessing and treating children with reading disability. Random Forest was then used to show that these profiles could be consistently classified as different from controls (94% accuracy) using a machine learning algorithm. These unique patterns (profiles) of reading disability are shown in Fig. 3, where multidimensional scaling was used to examine the distinctiveness of the reading disability profiles and controls. Analysis of variance demonstrated significant differences in multi-dimensional space between the reading disability profiles (Dimension 1: F (2,66) = 27.02, p < 0.001; Dimension 2: F (2,66) = 90.13, p < 0.001). Post-hoc comparisons revealed that all three reading disability profiles exhibited significant differences between each other for both dimensions (ps = 0.01 to p < 0.001).
The multidimensional behavioral space provided a quantitative metric that characterized variance within and between the reading disability profiles, as well as how different each reading disability case was from their  Table 1. Summary of the matched-pair brain morphometry results (p < 0.05 multiple comparison corrected) and contribution from each reading disability profile. *p < 0.05, **p < 0.01, ***p < 0.001; t-scores are for the average difference in Jacobian determinant across each cluster shown in Fig. 1. N = #/# indicates the number of cases and controls in each analysis. These Jacobian determinant effects were not due to case-control differences in Perceptual Reasoning. For example, the association between case-control differences in the left motor region of interest and Perceptual Reasoning differences was r (65) = 0.04, ns.
Scientific RepoRts | 7: 6009 | DOI:10.1038/s41598-017-05691-5 matched-pair control. We examined the extent to which the case-control differences in multidimensional space were associated with case-control differences in Jacobian determinant values. There were no significant effects for either dimension. However, including these behavioral space variables in the voxel-based analyses resulted in non-significant voxel-based effects across the sample when Dimension 1 case-control differences were controlled. In addition, the left SII/SMG region was no longer significant when controlling for Dimension 2 case-control differences in a separate model. This was due to a modest reduction in effect size after controlling for the dimension that largely differentiated Poor Comprehenders from the other reading disability profiles and controls. The subcortical white matter effects were relatively resilient to controlling for case-control differences in behavioral Dimension 2 (Fig. 3). Thus, case-control differences in subcortical morphology appeared to be more likely when there were large case-control differences across reading measures. Corona radiata findings have been reported in diffusion imaging studies of dyslexia [18][19][20] and there have been some questions about the extent to which these effects explain or are additive to arcuate fasciculus findings 21,22 . While it is unclear if the parietal operculum results in this study reflect atypical arcuate fasciculus white matter, it is worth noting that there was no significant relation between the left SII/SMG and left CR/IC Jacobian determinant differences across reading disability profiles (r (65) = 0.13, ns). This observation is consistent with the results of controlling for Dimension 2 on the left SII/SMG case-control differences, but not the left CR/IC differences.
Parietal Operculum/SMG Findings Interpretation. The left parietal operculum has been observed to exhibit atypical activation in functional imaging studies of dyslexia 23,24 . Left secondary somatosensory cortex (SII), in particular, exhibits increased responsiveness when people attend to speech 25 . A similar parietal operculum region (SII to SMG) is thought to support phonological processing by providing a working memory mechanism to "clean up" noisy phonological representations 26 . These findings appear to be consistent with evidence that this region supports mapping orthographic to phonologic representations 27 . They also suggest that atypical orthographic development would limit the benefit from orthographic consistency on speech processing in children with reading disability, as demonstrated in refs 28 and 29.
A role for SII and SMG cortex in mapping orthographic to phonologic representations predicts that these regions are engaged when people read aloud 27 . SII is engaged during speech production tasks 30 and also exhibits gray matter density changes that track with changes in Verbal IQ 31 . Longitudinal changes in left SII gray matter density appeared to predict changes in Verbal IQ because of reading experience 31 . In support of this experience explanation, parietal cortex morphology (Jacobian determinant) was related to individual differences in Figure 1. (A) Children with reading disability had brain images that required expansion to fit to a studyspecific template compared to brain size and research site matched controls from the Dyslexia Data Consortium multi-site dataset (www.dyslexiadata.org). Parietal cortex, corona radiata, and internal capsule exhibited casecontrol differences were observed, in particular. Non-parametric permutation testing was used to correct for multiple comparisons (p < 0.05, red). (B) Histograms show the larger Cohen's d effect sizes for the case-control design (red) compared to a typical unpaired group difference design (gray) across the 480,730 voxels within an inclusive gray and white matter mask. (C) The reason for statistical benefit of the case-control design is demonstrated by the relatively consistent left SII/anterior supramarginal gyrus (shown in A) differences in cases (orange) compared to matched controls (blue), despite the high morphologic variance within and across sites. Adjusted Jacobian determinant refers to the Jacobian or volumetric displacement values after correcting for brain size and research site-based propensity scores 58 .

Figure 2.
Well-matched and randomly matched case-control differences in Jacobian determinant/volumetric displacement. The top row of images shows the average t-score (thresholded with t ≥ 2.87 or the minimum t-score within the permutation corrected results in Fig. 1) across 1000 well matched (red clusters) and randomly matched (yellow clusters) case-control samples. Note that the extent of the effects are consistently smaller when a reading disability case was randomly matched with a control compared to when cases and controls were wellmatched for brain volume and research site, even though the same 134 participants were examined in each set of analyses. The scatterplots show the distribution of 1000 t-scores pooled within the significant left SMG, Left CR/ IC, and Right CR/IC clusters from Fig. 1. Note the consistently higher average t-score for the well-matched casecontrol samples (red) compared to the randomly matched case-control samples (gray).  Table 1 showing that Poor Comprehenders, which largely affect Dimension 2, exhibited pronounced left SII/SMG differences between cases and controls. There were not significant effects after controlling for dimension 1.
reading ability for ~8.4 year old children, but not in the same children when the data were collected a year earlier 32 . Moreover, adults acquiring literacy late in life had increased SMG gray matter volume compared to illiterate adults 33 . The parietal operculum effects in the current study were observed across reading disability cases from 7.33-12.67 years of age, but the differences appeared to be most pronounced among the younger children (Parietal operculum/SMG Jacobian difference * reading disability case age: r (65) = −0.32, p = 0.008).
One way parietal cortex could "clean up" noisy representations is through accumulating evidence to aid in perceptual decision-making 34 . Children with reading disability may not be able to make efficient use of working memory to accumulate sensory evidence for mapping orthographic and phonologic information (Poor Readers and Poor Decoders) or for following the meaning of connected speech or text (Poor Readers and Poor Comprehenders). Working memory measures may have demonstrated impaired phonological working memory across all three reading disability profiles 35-38 . Subcortical Findings Interpretation. The corona radiata and internal capsule differences appear to be consistent with previous subcortical white matter and gray matter results [39][40][41] , which again may relate to diffusion imaging findings linking the CR/IC to reading development 18,40 . Given that the CR/IC effects were particularly evident in the Generally Poor Readers (Table 1) who can have a broad range of behavioral impairments, there is a possibility that these effects reflect perinatal events that affect periventricular white matter development 42 and result in reading disability 43,44 . The results suggest that our multi-site dataset included children who experience motor and perhaps somatosensory impairments that contribute to their risk for reading disability, particularly when there is language impairment 45 . These potential motor and somatosensory effects appear to be independent of spatial impairments as there was no evidence that the subcortical effects were due to case-control differences in Perceptual Reasoning.

Summary.
Matched-pair brain morphometry demonstrated significant differences in brain morphology between reading disability cases and controls. The parietal operculum appears to be a target for tracking the effectiveness of interventions 46 , and may serve as an early marker of reading disability, if present in at-risk children before formal reading instruction 47,48 . In contrast, the subcortical white matter findings within putative descending fiber tracts could be consistent with a procedural learning deficit hypothesis for language disabilities 49 and perhaps a cerebellar deficit hypothesis for dyslexia 50 . We were unable to test these hypotheses because working memory, motor, and somatosensory data were not available in our retrospective dataset. However, investigators can address these hypotheses in prospective studies. Together, these results indicate that there can be common brain morphology findings across children with quite varied reading disability profiles. We hypothesize that atypical development in SMG and CR/IC regions compound reading difficulties in children with unique reading disability profiles that arise from unique neural underpinnings.

Methods
Participants and Behavioral Data. The data included in this study were collected as part of a project to develop methods for retrospective analysis of multi-site data and are part of a Dyslexia Data Consortium database (www.dyslexiadata.org). These existing data were collected from contributing sites where the human subject review boards provided approval to share de-identified data that were collected under the appropriate human subjects guidelines of those institutions and with informed consent. Data de-identification protected the privacy of subjects in accordance with HIPAA regulations 51 . Reception of these "non-human" de-identified data was approved by the Medical University of South Carolina (MUSC) Institutional Review Board (IRB) and the research described here was performed in accordance to MUSC IRB regulations and guidelines.
Data from 247 children (41% female; age range = 9.65, sd = 1.63) were selected from the multi-site dataset based on having T1-weighted images and standardized scores from the: (1) 56,57 . A subset of these cases (n = 134; 43% female; mean age = 9.61, sd = 1.57) was included in brain morphology analyses after identifying (1) cases who could be consistently classified as having unique reading disability profiles (39 inconsistently classified by expert raters) and (2) cases and controls that could be matched according to brain size and research site (57 without matches). Controls were included if they had Word Attack, Word Identification, and Passage Comprehension scores above the 25 th percentile, while reading disability cases were selected if they scored below the 25 th percentile for one of these measures.

Reading Disability Profile Classification. Two clinically certified Speech-Language Pathologists with
considerable experience assessing children with reading disabilities examined the behavioral data to classify reading disability cases into classic dyslexia (Poor Decoders), comprehension-specific (Poor Comprehenders), and multiple language impairment profiles (Generally Poor Readers). Guided by the empirical literature [4][5][6][7][8][9][10][11]58 and input from a Psychologist with extensive clinical experience with specific learning disabilities (SLDs) and 25 years of experience directing NIH-supported research on SLDs, Poor Decoders were classified based on: (1) Verbal Comprehension standard scores that were within the average to above-average range or above the 25 th percentile; (2) Word Attack standard scores that were typically below the 25 th percentile; (3) Word Identification and Passage Comprehension standard scores that were typically consistent with Word Attack scores; and (4) RAN standard scores that were typically below the 25 th percentile. Poor Comprehenders were classified based on: (1) Verbal Comprehension scores that met the same criteria as Poor Decoders; (2) Word Attack scores that were above the 25 th percentile; (3) Word Identification and Passage Comprehension scores that were typically at or below the 25 th percentile; (4) RAN scores that were typically greater than the 16 th percentile; and (5) children were typically older than 7-8 years to consider the effects of developmental and academic exposure on comprehension. Generally Poor Readers were classified based on Verbal Comprehension, Word Attack, Word Identification, and Passage Comprehension scores that were at or below the 25 th percentile. Supplemental Table 2 presents the means for these variables by the expert-rater profiles and their matched controls. Expressive and receptive language and executive function measures were not available to further confirm the Poor Reader and Poor Comprehender classifications, respectively.
Supplementary Table 3 presents the classification rules above that were established based on the reading disabilities literature. The two expert raters met to discuss the classification rules prior to classifying the cases. Across 144 reading disability cases from the original sample of 247 reading disability and control participants, intra-rater reliability for the primary rater (Chronbach's alpha = 0.94) was expectedly higher than inter-rater reliability (Chronbach's alpha = 0.79).
A Random Forest classifier 59 was developed using the consistently classified reading disability cases to demonstrate that the clinical expert classified profiles could be distinguished from each other and from controls based on their behavioral profiles using an automated approach. The analyses were carried out using the R statistics package (R version 3.2.3 with packages: caret 6.0-64, randomForest 4.6-12). The Random Forest was tuned (mtry parameter = 2; number of trees = 500) with a five step parameter search, then classified groups within the training dataset based on a leave-one-out cross validation test (LOOCV). The LOOCV test iteratively trains the Random Forest algorithm using all but one case from the training dataset, then classifies the remaining case to approximate independent train and test data. Classification accuracy was determined by whether the LOOCV classification of each case matched its expert-based label.
Random Forest predicted the expert-rater classifications of Poor Decoders, Poor Comprehenders, Generally Poor Readers, and controls with 94% accuracy using the same behavioral and age variables that were used by the expert-raters. Each variable, perhaps with the exception of age, contributed to classification accuracy (decrease in accuracy with exclusion of the following variable: Word Attack = 48.21%; Passage Comprehension = 39.27%; Verbal Comprehension = 29.66%; Word Identification = 26.16%; RAN = 14.39%; Age = −0.43%). Random Forest misclassified cases were likely to have a Verbal Comprehension or Word Attack score, for example, at a boundary that distinguished the reading disability profiles. These results (1) provide insight into the behavioral variables that were most important for reading disability profile classification and (2) demonstrate that there were unique behavioral profiles in the dataset. As described below, the Random Forest classification probabilities for each reading disability profile and controls were used to perform multidimensional scaling for the 134 cases and controls. This analysis generated two spatial dimensions to confirm that the profiles were clearly different from each other and from controls (Fig. 3).
Imaging Data and Pre-Processing. Bias-field corrected and Rician denoised 60 T1-weighted images from the 8 different research sites (Supplemental Table 1) were rigidly aligned to the MNI T1 template using the SPM12 Coregister function. Gray matter, white matter, and CSF probability in each voxel were estimated using the SPM12 Segment function. The resulting native space gray and white matter images were used to obtain an overall sum of the gray or white matter probabilities as an estimate of total gray or white matter volume. The native space segmented images gray matter, white matter, and CSF images were spatially transformed into a study-specific anatomical space using the SPM12 diffeomorphic (DARTEL) normalization procedure 61 . Jacobian determinant images were calculated from the normalization parameters to create images in template space that represented the amount of volumetric displacement that was required to fit an image to the template. These image pre-processing methods and the SPM default parameters were selected to be aligned with commonly used voxel-based morphometry protocols to facilitate replication 62 . Statistics. Pair Matching. Case-control pairs were identified by (1) creating strata of cases and controls according to smaller or larger brain size (total volume of the gray and white matter segmented images) and research site, and then (2) selecting unique pairs of optimally matched cases and controls within each stratum. Specifically, coarsened exact matching (CEM; R package: cem, version 1.1.17) was used to create the strata for 176 participants that included participants with smaller or larger total brain sizes who were from the same research site. Propensity scores were estimated for each subject with a logistic regression that predicted case-control status using research site and total brain size. A nearest-neighbor algorithm (R-package: MatchIt, version 2.4.21) was used to identify the most similar case-control pairs within each stratum on the basis of the propensity score, the match-quality weight from CEM, and total brain size. Because each case-control pair was required to include participants that did not occur in any other pair, the matching procedure reduced the sample size to 134 subjects (i.e., 67 pairs). Paired-sample t-tests were performed to determine that there were no significant differences in the resultant matched pairs with respect to age, total gray matter volume, total white matter volume, or total brain size (ps > 0.25; Supplemental Table 2). In addition, there was no significant difference in distribution of sex between the cases and controls (24 female cases, 43 male cases, 28 female controls, 39 male controls: Chi-square = 0.28, p = 0.59). There also was no significant difference in propensity score between the reading disability cases and controls (t (132) = −0.33, ns) and no significant case-control differences in propensity scores between the reading disability profiles (F (2,66) = 0.01, ns).
The propensity scores that were calculated for the selection of case-control matches were also used to adjust or residualize the 134 Jacobian determinant images for brain size and research site effects prior to creating the control-case difference images (R-package: AnalyzeFMRI, version 1.1.16). This approach further limited the impact of these potential nuisance variables on the results that can lead to bias amplification, particularly when their effect on a dependent variable is stronger than the predictor variable of interest 63 . The CEM procedures were chosen because we had observed that covarying or residualizing brain volume differences inflated case-control difference image effects when case and control matching was performed without considering brain size, which was confirmed with simulation analyses when a strongly confounded nuisance variable was residualized. Supplemental Fig. 1 shows how residualizing for a highly collinear nuisance variable can bias results and shows that the results of this study were less likely to be affected because we performed an initial selection of cases and controls to match for brain size and research site. As with any regression procedure, our observations suggest that (1) caution should be taken when adjusting for nuisance variables that have high collinearity with variables of interest and that (2) a case-control design is one solution for dealing with this problem.
Paired and Unpaired Comparisons. Statistical analyses were performed using (1) an unpaired group difference approach that is typically used in neuroimaging studies of group differences and (2) the matched-pair case-control approach. The results demonstrated a statistical advantage of the matched-pair approach over the unpaired approach, at least for a sample size of 134 subjects. The FSL randomise function was used to perform these voxel-based comparisons with 10,000 permutations to control for multiple comparisons. Threshold free cluster enhancement (TFCE) was used to optimize sensitivity to focal large effects and spatially broader smaller effects 64 for the unsmoothed Jacobian determinant images that can represent shape and gross volumetric effects 17,65 .
1000 Well-matched and 1000 Random Bootstrap Samples. To ensure that case-control results were not dependent on the sampling of one set of case-control pairs, voxel-based t-test comparisons were performed for 1000 well-matched case-control samples to establish confidence intervals (Fig. 2) for the significant case-control differences (Fig. 1). This was repeated for 1000 randomly paired case-control samples to determine the extent to which the results would be affected when cases and controls are not carefully matched. The key difference between well-matched and randomly matched samples was that a cases were paired with controls across research sites for the randomly matched samples. Neither well-matched nor randomly matched samples exhibited significant case-control differences in total gray and white matter volume and age (all ps > 0.26).
Multi-dimensional Scaling and Profile Effects. Finally, multi-dimensional scaling was performed in R with the Random Forest classification probabilities using the cmdscale function. This procedure provided a Euclidian distance estimate for each participant relative to the others based on their reading disability profile classification probabilities. Two dimensions provided a goodness of fit = 0.87. Inspection of the Random Forest probabilities that suggested Poor Decoders and Generally Poor Readers fell along a continuum of Poor Decoder to Poor Reader, suggesting that a 3 rd dimension was not necessary to differentiate these profiles, as further demonstrated in Fig. 3 by the distribution of Poor Decoders to Generally Poor Readers. The 2 dimensions appeared to characterize the severity of reading skill impairments (phonological decoding and real word decoding) and the severity of comprehension impairment.
The multi-dimensional distance variables for each dimension were subtracted between a case and control to obtain case-control distance differences for each dimension. These two variables were used to determine the extent to which case-control brain structure differences could be attributed to the magnitude of behavioral difference between each case and control for metrics that were sensitive to the distribution of reading disability profiles. This provided a quantitative variable to test profile effects that considered the behavioral profile of both the case and control. Permutation tests (10,000) with FSL randomise (TFCE) were performed to relate difference images to one of the dimension difference variables at a time, to determine the extent to which the effects shown in Fig. 1 were dependent on the magnitude of case-control differences along reading skill and comprehension dimensions.