Deriving and validating biomarkers associated with autism spectrum disorders from a large-scale resting-state database

Resting-state functional magnetic resonance imaging (MRI) has been used to investigate the brain activity related to autism spectrum disorder (ASD). In this study, we applied information from a large-scale dataset, the Autism Brain Imaging Data Exchange (ABIDE), to clinical applications. We recruited 21 patients with ASD and 23 individuals with neurotypical development (TD). We applied ASD biomarkers derived from ABIDE datasets and subsequently investigated the relationship between the MRI biomarkers and indicators from clinical screening questionnaires, the social responsiveness scale (SRS), and the Swanson, Nolan, and Pelham Questionnaire IV. The results indicated that the biomarkers generated from the default mode and executive control networks significantly differed between the participants with ASD and TD. In particular, the biomarkers derived from the default mode network were negatively correlated with the raw scores and model factors of the SRS. In summary, this study transferred the efforts of the global autism research community to clinical applications and identified connectivity-based biomarkers in ASD.

cingulate cortex/precuneus [6][7][8]. Studies using R-fMRI have reported that brain regions within the default mode network (DMN) overlap with brain regions associated with ToM. Thus, using R-fMRI to measure FC may provide quantitative insights into the social-cognitive ability of patients with ASD [9][10][11][12]. For example, Assaf et al. demonstrated that the FC values of specific brain regions were highly correlated with SRS scores in a group of patients with ASD. Weng et al. determined that FC strength between the posterior cingulate cortex and temporal lobe was negatively correlated with social impairment as measured by the Autism Diagnostic Interview-Revised (ADI-R). These studies have highlighted the potential of R-fMRI for investigating the complex cognitive functions underlying ASD.
Since 2012, the Autism Brain Imaging Data Exchange (ABIDE) initiative has collected more than 2000 R-fMRI datasets of patients with ASD and individuals with neurotypical development (TD) across international laboratories [13]. ABIDE allows researchers to investigate the brain mechanisms underlying ASD and to identify ASD-related biomarkers through R-fMRI [14]. This study applied information from the ABIDE initiative to investigate a local cohort. We derived R-fMRI biomarkers from ABIDE and computed the same biomarkers for the local datasets. We then assessed the performance of the biomarkers and the relationships between social responsiveness and functional brain networks in patients with ASD.

Methods and Materials
ABIDE: R-fMRI datasets.
This study included two databases, namely the ABIDE and Kaohsiung Medical University Hospital (KMUH) databases. Table 1 lists the details. We used 1112 ABIDE I datasets from 17 sites and 983 ABIDE II datasets from 16 sites, all obtained online [15]. In total, 2095 ABIDE datasets were used (ASD: 1001 and TD: 1094; 5-64 years). These datasets are anonymized in accordance with HIPAA guidelines.

ABIDE: Preprocessing and RSN10 networks.
The procedure for preprocessing R-fMRI datasets and generating brain FC networks is displayed in Fig. 1(a). Anatomical 3D volumes and R-fMRI 4D volumes were processed in the FMRIB Software Library (FSL) environment. The anatomical volumes were preprocessed using FSL-BET for brain extraction and subsequently normalized to Montreal Neurological Institute (MNI) coordinates. For R-fMRI volumes, timing inconsistencies and temporal image shifts were corrected using the slice timing and image realignment functions in FSL. Subsequently, the volumes were registered to the preprocessed anatomical volumes by using FSL-BBreg and normalized to the MNI space by using the nonlinear registration tool FSL-FNIRT. The voxel size was resampled to 2 × 2 × 2 mm³, and the volumes were smoothed using a Gaussian filter with a full width at half maximum of 6 × 6 × 6 mm³.
Figure 1. Analysis procedure for obtaining masks of the 30 biomarkers from the ABIDE datasets: (a) producing 10 resting-state networks using a 3D T1 volume and a 4D R-fMRI volume; (b) using a two-sample t-test to obtain 30 masks.
The subsequent signal processing involved applying a temporal bandpass filter (0.01-0.08 Hz) to the R-fMRI volumes and regressing out 24 motion parameters obtained from the realignment procedure [16] and the five principal components with the highest variance estimated from the voxel time series of the white matter and cerebrospinal fluid by using CompCor [17]. We subsequently applied dual-regression analysis [18] by using the FSL general linear model (GLM) with the 10 resting-state brain networks (RSN10), "PNAS_Smith09_rsn10.nii.gz," provided by Smith et al. [19,20] as a reference. The dual-regression analysis assessed the FC of each voxel, estimated from the GLM parameters normalized by the residual within-subject noise [18,21]. The procedure generated 10 whole-brain RSN maps for each dataset. The networks (RSN1 to RSN10) correspond respectively to the primary visual, occipital pole, lateral visual, default mode (DMN), cerebellum, sensorimotor, auditory, executive control (ECN), right frontoparietal, and left frontoparietal networks.
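The two stages of the dual-regression step described above can be illustrated with a minimal NumPy sketch. This is not the FSL implementation used in the study; it only shows the general technique under simplifying assumptions. The function name `dual_regression`, the array layouts, and the synthetic data are hypothetical, and the normalization by residual noise is written here as an ordinary GLM t-statistic.

```python
import numpy as np

def dual_regression(data, group_maps):
    """Illustrative dual regression for one participant.

    data:       (T, V) array, time points by voxels.
    group_maps: (K, V) array of template networks (K = 10 for RSN10).
    Returns the network time courses (T, K) and t-statistic maps (K, V).
    """
    # Stage 1 (spatial regression): fit the template maps to every volume
    # to obtain one time course per network.
    maps_c = group_maps - group_maps.mean(axis=1, keepdims=True)
    data_c = data - data.mean(axis=1, keepdims=True)
    timecourses = np.linalg.lstsq(maps_c.T, data_c.T, rcond=None)[0].T

    # Stage 2 (temporal regression): fit the time courses to every voxel.
    X = timecourses - timecourses.mean(axis=0)
    X = X / X.std(axis=0)
    Y = data - data.mean(axis=0)
    beta = np.linalg.lstsq(X, Y, rcond=None)[0]

    # Normalize each GLM estimate by the residual (within-subject) noise.
    # Assumption: an ordinary least-squares t-statistic is used here.
    resid = Y - X @ beta
    dof = X.shape[0] - X.shape[1] - 1
    sigma2 = (resid ** 2).sum(axis=0) / dof
    xtx_inv_diag = np.diag(np.linalg.inv(X.T @ X))
    t_maps = beta / np.sqrt(np.outer(xtx_inv_diag, sigma2))
    return timecourses, t_maps

# Synthetic demonstration with two non-overlapping networks (hypothetical data).
rng = np.random.default_rng(0)
n_time, n_vox = 120, 50
template = np.zeros((2, n_vox))
template[0, :10] = 1.0
template[1, 40:] = 1.0
true_tc = rng.standard_normal((n_time, 2))
data = true_tc @ template + 0.1 * rng.standard_normal((n_time, n_vox))
timecourses, t_maps = dual_regression(data, template)
```

In this toy example, the first row of `t_maps` is large inside the first network and close to zero elsewhere, which is the property the participant-level RSN maps rely on.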

ABIDE: Procedures deriving 30 biomarkers.
The block diagram for generating the 30 R-fMRI biomarkers, R1 to R10 and A1 to A20, is displayed in Fig. 1(b). The biomarker values were calculated based on 30 masks. The masks for R1 to R10, termed R-masks, were generated by identifying voxels with Z values higher than 4 in the RSN10 template (PNAS_Smith09_rsn10.nii.gz), and the masks for A1 to A20, termed A-masks, were created on the basis of the group difference in the ABIDE RSN10 maps. RSN10 maps were available for 2095 ABIDE datasets (ASD: 1001 and TD: 1094). We performed a two-sample t-test on the ABIDE RSN10 FC maps with threshold-free cluster enhancement by using FSL-randomise with 5000 permutations to correct for multiple comparisons. Subsequently, we created the A1-A20 masks from the voxels satisfying two criteria: (1) the FC values differed significantly [family-wise error (FWE)-corrected p < 0.05] between the ASD and TD groups, and (2) the voxels lay inside the corresponding R-masks (ASD > TD: A1 to A10 and ASD < TD: A11 to A20). We then averaged the FC values of the RSN10 maps within each mask to generate 30 biomarkers for each participant, referred to as R1 to R10 and A1 to A20 hereinafter. The procedure is illustrated in Fig. 2.

KMUH: R-fMRI datasets.
For the local cohort, 44 individuals (ASD: 21 and TD: 23; 12-22 years) were recruited from the active follow-up psychiatric clinic at KMUH and the community. All participants had scores of >70 in either the full-scale Wechsler Adult Intelligence Scale or the full-scale Wechsler Intelligence Scale for Children, Fourth Edition. The participants in the ASD group were diagnosed with autistic disorder in early childhood on the basis of the DSM, Fourth Edition, Text Revision symptom criteria and in accordance with the Autism Diagnostic Observation Schedule [22]; their ASD diagnoses were confirmed using the DSM-5 before they were enrolled in this study.
This study was approved by the Institutional Review Board of Kaohsiung Medical University and Kaohsiung Medical University Hospital. Informed consent was obtained from the participants' parents and the participants themselves in accordance with the guidelines of the Institutional Committee on Clinical Investigation. The participants underwent imaging experiments performed using a 3.0 T whole-body MRI system (Siemens, Skyra, Germany), equipped with a 32-channel head coil, at Kaohsiung Veterans General Hospital. We obtained brain structural images and R-fMRI images by using a three-dimensional (3D) magnetization-prepared rapid gradient-echo (MP-RAGE) sequence and a gradient-echo echo planar imaging (EPI) sequence, respectively. The imaging parameters for 3D MP-RAGE were TR = 2000 ms, TE = 2.07 ms, FOV = 256 mm, flip angle = 9°, sagittal slices = 160, matrix size = 256 × 256, voxel size = 1 × 1 × 1 mm³, and TI = 900 ms. The imaging parameters for EPI were TR = 2300 ms, TE = 30 ms, FOV = 194 mm, slice thickness = 3 mm, axial slices = 40, measurements = 150, in-plane resolution = 3.03 × 3.03 mm², and matrix size = 64 × 64. The total scan time of EPI was approximately 5 min.

KMUH: Social Responsiveness Scale and Swanson, Nolan, and Pelham Questionnaire IV.
The parents of the participants from KMUH completed the Chinese versions of the SRS and the Swanson, Nolan, and Pelham Questionnaire IV (SNAP-IV). The SRS is a 65-item scale that measures the severity of autism spectrum symptoms as they occur in natural social settings [5]. We obtained the Chinese version of the SRS from the developer under a license for academic use. The psychometric properties of the Chinese version of the SRS were validated by Taiwanese researchers [23]. The total raw SRS score and five subscores reflecting the factors in the model (viz., social awareness, social cognition, social communication, social motivation, and autistic mannerisms) were derived for analysis. The SNAP-IV comprises 26 items regarding the symptoms of inattention, hyperactivity/impulsivity, and oppositional defiant disorder (ODD). The Chinese version of the SNAP-IV is a reliable, valid instrument for rating the symptoms of inattention, hyperactivity/impulsivity, and ODD in both clinical and community settings. Its psychometric properties for Taiwanese populations have been validated [24]. Three SNAP-IV scores (viz., inattention, hyperactivity/impulsivity, and ODD) for each participant were derived for analysis. Table 2 presents the average SRS and SNAP-IV scores for the KMUH datasets.

Statistical analysis.
RSN10 maps were available for 2095 ABIDE datasets (ASD: 1001 and TD: 1094) and 44 KMUH datasets (ASD: 21 and TD: 23). We calculated the 30 biomarkers for each dataset and obtained two matrices (ABIDE: 2095 × 30 and KMUH: 44 × 30) for further statistical analysis. The differences in biomarkers between the ASD and TD groups were assessed using the t-test.
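The mask construction, mask averaging, and group comparison described above can be sketched as follows. This is a minimal illustration on synthetic arrays, not the study's pipeline: the function names `build_masks` and `biomarker_matrix` are hypothetical, the `p_fwe` and `t_stat` arrays stand in for the output of the permutation test, and the equal-variance form of the t-test is an assumption because the text does not state which variant was used.

```python
import numpy as np
from scipy import stats

def build_masks(template_z, p_fwe, t_stat, z_thresh=4.0, alpha=0.05):
    """Return the R-mask and the two A-masks (ASD > TD, ASD < TD) of one network.

    Assumption: t_stat follows the sign convention ASD minus TD.
    """
    r_mask = template_z > z_thresh
    significant = r_mask & (p_fwe < alpha)
    return r_mask, significant & (t_stat > 0), significant & (t_stat < 0)

def biomarker_matrix(fc_maps, masks, mask_network):
    """Average FC inside each mask.

    fc_maps:      (n_subjects, n_networks, n_voxels) FC maps.
    masks:        list of boolean (n_voxels,) arrays.
    mask_network: mask_network[j] is the index of the network whose map
                  is averaged inside masks[j].
    Returns an (n_subjects, n_masks) matrix.
    """
    columns = [fc_maps[:, net][:, mask].mean(axis=1)
               for mask, net in zip(masks, mask_network)]
    return np.column_stack(columns)

# Synthetic demonstration (all values are made up for illustration).
rng = np.random.default_rng(1)
n_asd, n_td, n_net, n_vox = 20, 20, 3, 100
is_asd = np.arange(n_asd + n_td) < n_asd
fc_maps = rng.standard_normal((n_asd + n_td, n_net, n_vox))
fc_maps[:n_asd, 1, :20] -= 1.0           # weaker FC in ASD within network 1

template_z = np.zeros(n_vox)
template_z[:40] = 6.0                    # template voxels with Z > 4
p_fwe = np.ones(n_vox)
p_fwe[:20] = 0.01                        # stand-in for FWE-corrected p values
t_stat = np.zeros(n_vox)
t_stat[:20] = -5.0                       # ASD < TD

r_mask, a_up, a_down = build_masks(template_z, p_fwe, t_stat)
biomarkers = biomarker_matrix(fc_maps, [r_mask, a_down], [1, 1])

# Two-sample t-test per biomarker (equal variances assumed).
t, p = stats.ttest_ind(biomarkers[is_asd], biomarkers[~is_asd], axis=0)
```

Applied to the real data, the same averaging would yield the 2095 × 30 and 44 × 30 matrices, with one column per mask.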
For the KMUH datasets, we performed a correlation analysis to investigate relationships between the 30 biomarkers and the SRS and SNAP-IV scores, and a receiver operating characteristic analysis to evaluate the classification performance of the biomarkers.

Results
Figure 3(a) shows representative slices of the RSN10 templates (Z > 4) generated from PNAS_Smith09_rsn10.nii.gz. Figure 4 displays the masks of the 30 biomarkers. The R-masks were produced using the RSN10 template (Z > 4), and the A-masks were derived from the group analysis of the ABIDE RSN10 FC maps (FWE-corrected p < 0.05, two-sample t-test). The volumes of the masks are listed in Table 3. The A3, A4, and A18 masks, with volumes ranging from 34 to 44 mL, were the three largest A-masks. They were generated from the lateral visual, default mode, and executive control networks, respectively.
Table 3 lists the mean and standard deviation of the 30 biomarkers of the ABIDE and KMUH datasets. The significance of the difference between the ASD and TD groups was assessed using the t-test. For the ABIDE datasets, the R-biomarkers R3, R4, R5, R6, and R8, derived from the RSN10 template, differed significantly between the ASD and TD groups (p < 0.05). All 20 A-biomarkers also differed significantly between the ASD and TD groups (p < 0.01). However, because the A-masks were themselves derived from group differences in the ABIDE datasets, these statistics are likely overfitted; we therefore considered them strongly biased and excluded them from Table 3. For the KMUH datasets, five biomarkers (viz., R9, A4, A14, A15, and A18) differed significantly between the ASD and TD groups (p < 0.05, t-test). Of these five biomarkers, A4 and A14 were both derived from the DMN, and R9, A15, and A18 were obtained from the right frontoparietal, cerebellum, and executive control networks, respectively.
Table 4 lists Pearson's correlation coefficients between the 30 R-fMRI biomarkers and the nine SRS and SNAP-IV questionnaire metrics in the KMUH datasets. Figure 5 displays the color-coded matrix based on Table 4. The DMN-derived biomarker A4 (TD > ASD) and all five SRS metrics were negatively correlated (r = −0.333 to −0.420). In particular, the false discovery rate-adjusted p values were statistically significant (adjusted p < 0.05) in five cases (viz., A4 versus SRS total, awareness, cognition, social communication, and autistic mannerisms).
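The correlation analysis with false discovery rate adjustment reported for Table 4 can be illustrated with a short sketch. It assumes the Benjamini-Hochberg procedure and an adjustment applied across the metrics tested against a single biomarker; the text does not specify either choice. The helper names `bh_adjust` and `correlate_with_scores` and the synthetic data are hypothetical.

```python
import numpy as np
from scipy import stats

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p values (assumed FDR procedure)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, starting from the largest p value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted

def correlate_with_scores(biomarker, scores):
    """Pearson correlation of one biomarker with each questionnaire metric.

    biomarker: (n_subjects,) array; scores: (n_subjects, n_metrics) array.
    Returns r, raw p values, and FDR-adjusted p values.
    """
    n_metrics = scores.shape[1]
    r = np.empty(n_metrics)
    p = np.empty(n_metrics)
    for j in range(n_metrics):
        r[j], p[j] = stats.pearsonr(biomarker, scores[:, j])
    return r, p, bh_adjust(p)

# Synthetic demonstration: the first metric is built to correlate negatively.
rng = np.random.default_rng(2)
n_subjects = 44
biomarker = rng.standard_normal(n_subjects)
scores = rng.standard_normal((n_subjects, 6))
scores[:, 0] = -2.0 * biomarker + 0.5 * rng.standard_normal(n_subjects)
r, p, p_adj = correlate_with_scores(biomarker, scores)
```

Adjusting across a different family of tests, for example all 30 × 9 biomarker-metric pairs, would only change which p values are passed to `bh_adjust`.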
A4, A15, and A18, obtained from the DMN, cerebellum, and executive control networks, respectively, showed significant effects in the difference tests and correlation analyses. We calculated the receiver operating characteristic (ROC) curves for distinguishing ASD by using the biomarkers (R4, R5, R8, A4, A15, and A18) of the three networks. Figure 6 displays the ROC curves obtained using these six biomarkers. The areas under the curve (AUCs) were 0.590 and 0.745 (p < 0.05) for (R4, A4), 0.588 and 0.646 for (R5, A15), and 0.646 and 0.677 for (R8, A18). Figure 7 presents the masks of the three networks, using different colors to highlight the R-masks and the A-masks. Although R4 and A4 were both derived from the DMN, the AUC of A4 was significantly higher than that of R4 (p < 0.05) [25]. The results indicate that A4, derived from the ABIDE datasets, was an effective indicator for classifying ASD, and that the FC of brain regions in the A4 mask was correlated with cognitive impairments in patients with ASD.
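As an illustration of how an AUC of this kind can be computed from a single biomarker, the sketch below uses the rank-based (Mann-Whitney) formulation. It is not the software used in the study, and the function name `roc_auc` and the synthetic values are hypothetical. Because a biomarker such as A4 is lower in the ASD group, its sign is flipped so that higher scores indicate ASD.

```python
import numpy as np
from scipy.stats import rankdata

def roc_auc(scores, is_positive):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    scores:      higher values should indicate the positive class.
    is_positive: boolean labels (True = ASD in this illustration).
    """
    scores = np.asarray(scores, dtype=float)
    is_positive = np.asarray(is_positive, dtype=bool)
    n_pos = is_positive.sum()
    n_neg = (~is_positive).sum()
    ranks = rankdata(scores)  # ties receive average ranks
    u = ranks[is_positive].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

# Synthetic demonstration: a biomarker that is lower in the ASD group.
rng = np.random.default_rng(3)
is_asd = np.array([True] * 21 + [False] * 23)
a4_like = rng.standard_normal(44)
a4_like[is_asd] -= 2.0
auc = roc_auc(-a4_like, is_asd)  # negate so that higher means more ASD-like
```

Testing whether two AUCs differ, as reported for A4 versus R4, requires a separate procedure for correlated ROC curves and is not covered by this sketch.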

Discussion
Early in the development stage of this study, we collected R-fMRI datasets to create the KMUH cohort and explored the brain regions associated with the symptoms of ASD. We reviewed the literature, implemented pipelines to reconstruct the RSN10 maps, and used the two-sample t-test to evaluate the ASD and TD datasets (n = 44). The results indicated that no brain voxels were statistically significant (FWE-corrected p < 0.05). Meanwhile, ABIDE commenced its open-science project to provide the large-scale R-fMRI ASD database. We subsequently sought approaches to transfer ABIDE information to clinical applications involving a local cohort. We ultimately formulated an approach to extract biomarkers significantly different between ASD and TD from the ABIDE database and then validate the performance of the biomarkers using the KMUH cohort. Finally, we used correlation analysis to examine potential relationships between the biomarkers and social behaviors estimated based on clinical screening questionnaires.
We systematically analyzed the 10 RSNs of the brain in the KMUH dataset by using the RSN10 template. Although more RSNs have been reported in the literature, the RSN10 networks have been consistently reported regardless of variations in acquisition protocols and analysis methods. The benefits of the analysis based on the RSN10 template are multifold. The template is publicly available; thus, researchers can compare results based on it. Smith et al. additionally mapped RSN10 onto behavioral domains on the basis of 7342 BrainMap activation images [26]. The mapping aided the interpretation of the RSN10 components. For example, based on the behavioral mapping of RSN10, the biomarker A8 could be associated with action-inhibition, cognition, emotion, and perception-somesthesis-pain. Finally, RSN10 is now widely used in the R-fMRI research community. Although this study analyzed data using lab-made pipelines based on FSL, we found that the method for producing RSN10 maps was similar to that offered by the open-source analysis project, the Configurable Pipeline for the Analysis of Connectomes (C-PAC) [27]. The pipelines, as well as the 30 masks of this study, are available [28]. The open-science materials and tools, including the RSN10 template, C-PAC, ABIDE, and our pipeline, can be used to replicate the methods of this study.
Table 3. The characteristics of the biomarkers in ABIDE and KMUH datasets. *Significant differences between the biomarkers of TD and ASD (p < 0.05). **Significant differences between the biomarkers of TD and ASD (p < 0.01).
(2019) 9:9043 | https://doi.org/10.1038/s41598-019-45465-9 | www.nature.com/scientificreports
From the statistical results, we identified three sets of biomarkers that may be involved in the symptoms of ASD: (1) ABIDE FC differences: R3, R4, R5, R6, and R8; (2) KMUH FC differences: R9, A4, A14, A15, and A18; and (3) KMUH FC-behavioral score correlation: A4.
We observed that three networks, the DMN (R4, A4), cerebellum (R5, A15), and ECN (R8, A18), appeared frequently in the three sets. The results suggest that these three networks could be the major resting-state networks associated with ASD symptoms. The ABIDE-derived features A4 (DMN), A15 (cerebellum network), and A18 (ECN) reached statistical significance in the difference tests in the KMUH dataset. The AUCs of A4, A15, and A18 were higher than those of R4, R5, and R8. The higher AUCs of the three A-biomarkers imply that the three R-biomarkers were less sensitive than the A-biomarkers for identifying patients with ASD, and that the brain regions indicated by the three A-masks may be the primary sites of ASD-related FC alterations.
Default mode network: A4. The results of this study indicate that the FC of the DMN in the ASD group was weaker than that in the TD group. These results are in agreement with those of previous investigations [9][29][30][31]. The A4 mask includes several brain regions: the mPFC, posterior cingulate cortex, left occipital cortex, and right MTG. The levels of social awareness, social cognition, social communication, social motivation, and autistic mannerisms from the SRS were all negatively correlated with the FC strength of the A4 mask. This finding is consistent with that of Assaf et al., who suggested that FC strength among the mPFC/anterior cingulate cortex (ACC), precuneus, and DMN correlated negatively with the SRS; in particular, weak FC strength of the ACC was correlated with higher levels of autistic mannerisms [9].
Cerebellum network: A15. The A15 biomarker of the cerebellum network was higher in the ASD group than in the TD group. This result suggests a potential role of the cerebellum in social-cognitive behaviors and is consistent with the findings of previous investigations. In large-scale fMRI studies on social cognition and the cerebellum, Van Overwalle et al. found robust clusters associated with social-cognitive studies, and their FC analysis identified the crucial role of the cerebellum in social mentalizing [32][33][34]. Previous fMRI studies have revealed that cerebellar activation in patients with ASD differs from that in TD individuals [35]. A previous study reported that patients with ASD had increased activation of the cerebellothalamic network in a visually guided saccade experiment. Allen et al. identified increased and widespread activation of the cerebellum in patients with ASD compared with TD controls.
Executive control network: A18. The ECN covers parts of the medial-frontal lobe area, including the ACC, dorsolateral prefrontal cortex, superior frontal lobe, and frontal pole [19,38]. Our results indicated that the FC values of the ECN in the ASD group were higher than those in the TD group. The derived A18 mask covers the ACC, lateral frontal gyrus, and frontal pole. The correlation analysis indicated that A18 strength was positively correlated with the levels of hyperactivity/impulsivity and inattention. This network is related to several cognition paradigms, such as action-inhibition, cognition, emotion, and perception-somesthesis-pain [19]. The executive control function of attention engages more complex mental operations when monitoring and resolving conflicts among surrounding stimuli. Fan et al. suggested that attentional deficits contribute to the neuropathological abnormalities in ASD and hypothesized that the attentional network system plays a primary role in the pathophysiology of ASD [39]. Keehn et al. indicated that the orienting network was impaired in children with ASD [40], and the orienting deficit may partly be explained by the ECN.
In summary, this study established an approach for applying information from the large-scale ABIDE database to clinical investigations of local cohorts. We obtained FC biomarkers associated with ASD; they were derived from the DMN, cerebellum network, and ECN. The results indicated that the social responsiveness of the participants was significantly correlated with the biomarkers related to the DMN.

Data Availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.