A multi-site, multi-disorder resting-state magnetic resonance image database

Machine learning classifiers for psychiatric disorders using resting-state functional magnetic resonance imaging (rs-fMRI) have recently attracted attention as a method for directly examining relationships between neural circuits and psychiatric disorders. To develop accurate and generalizable classifiers, we compiled a large-scale, multi-site, multi-disorder neuroimaging database. The database comprises resting-state fMRI and structural images of the brain from 993 patients and 1,421 healthy individuals, as well as demographic information such as age, sex, and clinical rating scales. To harmonize the multi-site data, nine healthy participants (“traveling subjects”) visited the sites from which the above datasets were obtained and underwent neuroimaging with 12 scanners. All participants consented to having their data shared and analyzed at multiple medical and research institutions participating in the project, and 706 patients and 1,122 healthy individuals consented to having their data disclosed. Finally, we have published four datasets: 1) the SRPBS Multi-disorder Connectivity Dataset, 2) the SRPBS Multi-disorder MRI Dataset (restricted), 3) the SRPBS Multi-disorder MRI Dataset (unrestricted), and 4) the SRPBS Traveling Subject MRI Dataset.


Background & Summary
Diagnostic criteria based on psychiatric symptoms, such as the Diagnostic and Statistical Manual of Mental Disorders (DSM) 1 and the International Classification of Diseases (ICD) 2 , are commonly used in clinical practice and research. However, overlapping and heterogeneous clinical presentations of psychiatric disorders have compromised their validity [3][4][5][6] . It has been suggested that a new framework enabling objective diagnosis is needed for research to properly interpret the etiology and pathophysiology of psychiatric disorders and to treat them effectively. To address this issue, the Research Domain Criteria initiative of the U.S. National Institute of Mental Health has analyzed the genes, molecules, cells, neurophysiology, and behavior involved in functional domains associated with specific neural circuits, with the aim of illuminating the biological basis of mental disorders 7 .
As a method for directly examining the relationship between neural circuits and psychiatric disorders, machine learning classifiers using resting-state functional magnetic resonance imaging (rs-fMRI) have recently attracted attention. Functional connectivity (i.e., the degree to which activity patterns are synchronized between particular brain regions, as revealed with rs-fMRI) is associated with a variety of individual characteristics, as well as psychiatric disorders [8][9][10][11][12] . Connectivity-based classifiers have been developed for many psychiatric disorders, including autism spectrum disorder 13 , depression 14 , schizophrenia 15 , and obsessive-compulsive disorder 16 .
For development of accurate classifiers, large-scale datasets are necessary. In particular, generalizable classifiers that can be replicated with data collected at independent sites require data collection from multiple sites. In multi-site data collection, it is necessary to develop a method for canceling out differences between sites and integrating them into a homogeneous dataset, a process known as data harmonization 17 . Multi-site, multi-disorder data collection is also important to investigate the full spectrum of diseases and disease subtypes. In this context, the development of a harmonizable database is an urgent issue.
To address this issue, the DecNef Consortium (https://bicr.atr.jp/decnefpro/) introduced unified data collection protocols covering multiple sites and disorders. In 2013, around 60 neuroscientists, neuropsychiatrists, engineers, computational scientists, and psychologists from eight Japanese universities and institutes formed the consortium as part of the Japanese Strategic Research Program for the Promotion of Brain Science (SRPBS) project for big data applications, machine-learning algorithms, and sophisticated fMRI neurofeedback methods for diagnosing and treating multiple psychiatric disorders. Many of these techniques are related to brain-machine interfaces. SRPBS, a nation-wide research program for brain science, is supported by the Japanese Advanced Research and Development Programs for Medical Innovation (AMED).
Neuroimaging data of 2,414 patients and control participants were collected at eight sites (Table 1). A coherent protocol was designed and used for rs-fMRI (see MRI acquisition in Methods). Table 1 shows the number of samples collected for each disorder at each site. Altogether, 14 scanners from three manufacturers (Siemens, Philips, and GE) were used to produce these neuroimaging data.
To collect a large amount of neuroimaging data associated with psychiatric disorders, images must be acquired from multiple sites because of the limited capacity of a single site. To properly manage heterogeneous, multisite data, it is important to understand what leads to differences in data across sites and to develop a method for harmonizing the data. To this end, 143 sessions were acquired from 9 traveling subjects who visited the consortium sites (12 scanners). Using the traveling-subject dataset in conjunction with the multi-disorder dataset, we demonstrated that site differences are due to biological sampling bias (differences between participant groups) and engineering measurement bias (differences in the properties of the MRI scanners used) 17 .
Showa University. For patients with autism spectrum disorder, a clinical team assessed the developmental history, present illness, life history, and family history and then made clinical diagnoses according to DSM-IV-TR. For patients with schizophrenia, diagnoses were made by 2 experienced psychiatrists, based on the Structured Clinical Interview for DSM-IV Axis I Disorders-Patient Edition (SCID). Typically developed controls were confirmed to have no psychiatric conditions, according to the Japanese version of the MINI.
University of Tokyo. Psychiatric disorders were diagnosed according to DSM-IV criteria. The MINI was used to screen healthy controls for psychiatric disorders.
Osaka University and CiNet. Chronic pain was diagnosed based on the definition by the International Association for the Study of Pain: "pain that extends beyond the expected period of healing or progressive pain due to non-cancer diseases" 23 .
Kyoto University. Schizophrenia spectrum disorder, major depressive disorder/bipolar disorder, and healthy controls were diagnosed using the SCID.
www.nature.com/scientificdata
KPUM. Patients with obsessive-compulsive disorder (OCD) were primarily diagnosed using the SCID. Experienced clinical psychiatrists or psychologists applied the Yale-Brown Obsessive-Compulsive Scale 24 for clinical evaluation of obsessive-compulsive symptoms in patients with OCD.
MRI acquisition. The SRPBS MRI guideline recommended that MRI data be acquired using the following imaging protocol.

1) rs-fMRI
Scan the entire brain, including the cerebellum, and minimize the repetition time (TR). Emphasize the prefrontal regions that are related to psychiatric disorders.
• Coil: 8/12 ch phased array coil (24/32 ch coil is also acceptable)
• Sequence: ep2d_bold (Siemens)
-Please find the corresponding parameters for other vendors.
-Reason: we sometimes find that prefrontal regions are elongated and included in the field of view (FoV) when phase encoding is set at PA and in-plane resolution is set at 3 × 3 mm.
• Slice thickness: 3.2 mm
• Gap: 0.8 mm (25% of slice thickness)
-We recommend setting (Slice Thickness + Gap) at an integer value (4) to prevent reading error of the statistical parametric mapping (SPM). Siemens users can set a slice gap as an integer value corresponding to the percentage of slice thickness: gap = slice thickness (3.2 mm) × 25% = 0.8 mm.

3) Structure image
• Parameters for structure images conform to those of J-ADNI2 (high-speed mode: GRAPPA/No SENSE)
• No specification of slice numbers (in the right-to-left direction), but please include the entire brain with a considerable margin.

4) Instructions to participants and others
• Instructions
-Please relax and look at the fixation point.
-Do not sleep.
-Do not think about anything in particular.
-Do not move your body, especially your head and trunk.
• Display Image
-A black cross within the fovea is displayed as a fixation point with a gray background that minimizes visual stimulation.
• Environment
-Carefully fix the participant's head and trunk.
-The room should be dimly lit.
-Please place headphone-style earmuffs over the earplugs.
-It is advisable to monitor heart rate and respiration rate if possible.
• Debriefing
-Evaluate sleepiness on the Stanford Sleepiness Scale.
-Ask participants to confirm that they followed the instructions (we prepared a unified questionnaire).
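The slice-gap recommendation in the rs-fMRI protocol above is simple arithmetic, and it can be sketched as follows. This is a minimal illustration rather than part of the SRPBS guideline: the function name `gap_from_percent` is hypothetical, and the only values taken from the text are the 3.2 mm slice thickness, the 25% gap, and the resulting 4 mm slice spacing.

```python
import math


def gap_from_percent(slice_thickness_mm: float, gap_percent: int) -> float:
    """Convert a gap given as a percentage of slice thickness into millimetres."""
    return slice_thickness_mm * gap_percent / 100


slice_thickness_mm = 3.2  # from the protocol
gap_percent = 25          # integer percentage of slice thickness, as in the text

gap_mm = gap_from_percent(slice_thickness_mm, gap_percent)
spacing_mm = slice_thickness_mm + gap_mm

# The guideline asks for (slice thickness + gap) to be an integer value.
# Compare with a tolerance because 3.2 and 0.8 are not exact in binary floats.
spacing_is_integer = math.isclose(spacing_mm, round(spacing_mm))

print(f"gap = {gap_mm:.1f} mm, spacing = {spacing_mm:.1f} mm, "
      f"integer spacing: {spacing_is_integer}")
```

The same check can be applied to other candidate thickness and gap combinations before a sequence is finalized.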
Multi-disorder data for Datasets 1-3 were acquired at SRPBS consortium sites. Each participant underwent a single rs-fMRI session, a structural MRI session, and an optional field-map session. Detailed imaging parameters used at each site for fMRI and T1-weighted (T1w) structural MRI are summarized in Tables 5 and 6, respectively.
Data for the Traveling Subject Dataset (Dataset 4) were acquired at sites included in the SRPBS multi-disorder database, as well as three additional sites: Kyoto University, which uses Siemens Skyra scanners, and Yaesu Clinics 1 and 2, which use Philips Achieva scanners. Each participant underwent three 10-min rs-fMRI sessions at each of nine sites, two 10-min sessions at each of two sites (Hiroshima Kajikawa Hospital and Hiroshima University Hospital), and five cycles (morning, afternoon, the following day, the following week, and the following month) consisting of three 10-min sessions at a single site (Advanced Telecommunications Research Institute [ATR]). One participant underwent four rather than five sessions at the ATR site because of poor physical condition. Thus, a total of 143 sessions were conducted. There were two phase-encoding directions (PA and AP), three MRI manufacturers (Siemens, GE, and Philips), four coil channel counts (8, 12, 24, and 32), and seven scanner types (TimTrio, Verio, Skyra, Spectra, MR750W, SignaHDxt, and Achieva). Detailed imaging parameters used at each site for fMRI and T1-weighted (T1w) structural MRI are summarized in Tables 7 and 8, respectively. Please refer to our previous paper 17 for detailed methods.
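The reported total of 143 sessions can be reproduced with a short calculation. The sketch below rests on one assumption that the text does not state explicitly: that a "session" in the total refers to one site visit (or one cycle at the five-cycle site), each of which contains the two or three 10-min acquisitions described above. The variable names are illustrative.

```python
N_SUBJECTS = 9  # traveling subjects

# Visits planned per traveling subject, grouped as in the text.
# Assumption: each visit or cycle counts as one "session" in the reported total.
SESSIONS_PER_SUBJECT = {
    "sites_with_three_acquisitions": 9,  # nine sites, one visit each
    "sites_with_two_acquisitions": 2,    # two sites, one visit each
    "cycles_at_single_site": 5,          # five cycles at one site
}
MISSED_SESSIONS = 1  # one participant completed four of the five cycles

n_scanning_sites = 9 + 2 + 1
planned_sessions = N_SUBJECTS * sum(SESSIONS_PER_SUBJECT.values())
acquired_sessions = planned_sessions - MISSED_SESSIONS

print(f"sites/scanners visited: {n_scanning_sites}")
print(f"planned sessions: {planned_sessions}, acquired: {acquired_sessions}")
```

Under this reading, the site count also matches the 12 scanners mentioned for the traveling-subject acquisitions.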

Preprocessing and calculation of the resting-state functional connectivity matrix (Dataset 1).
rs-fMRI data were preprocessed using SPM8 implemented in MATLAB (R2016b; Mathworks, Natick, MA). The first 10 s of data were discarded to allow for T1 equilibration. Preprocessing steps included slice-timing correction, realignment, co-registration, segmentation of T1-weighted structural images, normalization to Montreal Neurological Institute space, and spatial smoothing with an isotropic Gaussian kernel of 6 mm full width at half maximum (FWHM). For analysis of connectivity matrices, regions of interest (ROIs) were delineated according to 140 regions covering the entire brain, defined anatomically by the digital atlas of the Brainvisa Sulci Atlas and three subregions of the cerebellum (the left and right cerebellum, and the vermis). Blood oxygen level-dependent [...] matter was eroded by one voxel to consider the partial volume effect. These extracted time courses were bandpass filtered (transmission range, 0.008-0.1 Hz) before linear regression, as was done for regional time courses. Then, for each individual, a matrix of 9,730 functional connections between 140 ROIs was calculated by exhaustively evaluating pair-wise temporal Pearson correlations of BOLD signal time courses, while discarding any flagged frames in the previous procedure (scrubbing). We calculated the framewise displacement (FD) and removed volumes with FD >0.5 mm, as proposed in a previous study 25 . Demographic information of Dataset 1 includes the rate of excluded volumes and the number of excluded volumes for each participant. For details about the entire procedure to calculate the matrix, see Yahata et al. 13 .
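The scrubbing and correlation steps can be illustrated with a short NumPy sketch. It is not the authors' SPM8/MATLAB pipeline, and it omits nuisance regression and bandpass filtering. Several details are assumptions made only for illustration: the FD formula (sum of absolute frame-to-frame differences in six motion parameters, with rotations converted to millimetres on a 50 mm sphere), the ordering of the motion parameters, and the synthetic input data. The 140 ROIs, the 0.5 mm threshold, and the 9,730 connections come from the text.

```python
import numpy as np

N_ROIS = 140            # from the text
FD_THRESHOLD_MM = 0.5   # from the text
HEAD_RADIUS_MM = 50.0   # assumption: common convention for rotation-to-mm conversion


def framewise_displacement(motion_params, radius_mm=HEAD_RADIUS_MM):
    """FD per volume from a (T, 6) array: 3 translations (mm), then 3 rotations (rad)."""
    params = np.array(motion_params, dtype=float)  # copy, so the input is untouched
    params[:, 3:] *= radius_mm                     # rotation -> arc length in mm
    fd = np.abs(np.diff(params, axis=0)).sum(axis=1)
    return np.concatenate(([0.0], fd))             # first volume has no predecessor


def connectivity_vector(roi_timecourses, fd, threshold_mm=FD_THRESHOLD_MM):
    """Pearson correlations for all ROI pairs, computed on low-motion volumes only."""
    keep = fd <= threshold_mm                      # scrubbing mask
    corr = np.corrcoef(roi_timecourses[keep], rowvar=False)
    upper = np.triu_indices(corr.shape[0], k=1)    # each unordered ROI pair once
    return corr[upper]


# Synthetic stand-in data (not real fMRI): 240 volumes, 140 ROIs.
rng = np.random.default_rng(0)
timecourses = rng.standard_normal((240, N_ROIS))
motion = np.cumsum(rng.normal(scale=0.02, size=(240, 6)), axis=0)
motion[:, 3:] *= 0.01      # keep rotations small (radians)
motion[100:, 0] += 1.0     # one abrupt 1 mm shift, which should be scrubbed

fd = framewise_displacement(motion)
fc = connectivity_vector(timecourses, fd)

print("volumes removed:", int((fd > FD_THRESHOLD_MM).sum()))
print("connections:", fc.shape[0])  # 140 * 139 / 2 = 9730
```

Taking only the upper triangle of the symmetric correlation matrix is what yields 140 × 139 / 2 = 9,730 unique connections per participant.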

Face-masking of structural MRI data (Datasets 2 and 3). To prevent identification of individual participants via reconstruction of the facial surface from structural MRI data, we performed a face-masking calculation. We removed signals of volumes covering the facial surface. The code, which is available on our GitHub project (https://github.com/bicr-resource/deface), removes the subject's face from the structural MRI image (NIfTI format). The code is written in MATLAB and internally uses SPM8 and mri_deface. Finally, a report file of face-masking results is generated for each participant (Fig. 2). The report contains a 3D reconstruction of the surface of defaced T1w images, two sagittal slices at x = 0 before and after face-masking, and five sagittal slices showing the detected brain (colored yellow) and removed voxels (colored cyan) at different x-coordinates (x = {−60, −30, 0, 30, 60} mm in native space). We used this report to assess the quality of the face-masking process (see Technical Validation).

Data Records
All datasets are available on Synapse (Synapse ID: syn22317076) [18][19][20][21] . A summary of data records for all datasets is presented in Table 9. Full datasets are also available on the DecNef Project Brain Data Repository website (https://bicr-resource.atr.jp/decnefpro/). Please see the Usage Notes section for more information on requirements to access datasets.
Dataset 1 comprises the resting-state functional connectivity matrix data (.mat), with dimensions (number of participants) × (number of functional connections = 9,730), calculated for each site 18 ; participant demographic information (.csv); and a readme file with MRI parameters (.txt) for each site. See Table 2 for the number of participants at each site.
Dataset 3 comprises brain imaging data in NIfTI format from 1,410 participants, including those in Dataset 2 who consented to having their data shared publicly 20 . Thus, the data structure is the same as that of Dataset 2.
Dataset 4 comprises brain imaging data from 143 sessions in NIfTI format 21 . Dataset 4 is a BIDS-validated dataset 26 . The "sourcedata" folder contains 143 folders labeled with participant ID numbers (e.g., sub-001). Each participant folder contains folders labeled "func," which contain rs-fMRI EPI images (.nii.gz); "anat," which contain defaced T1-weighted structural images (.nii.gz); and "fmap," which contain field-map images (.nii.gz). Dataset 4 also contains "participants.tsv," which includes demographic data (BIDS ID, subject ID, site, number of repetitions, phase encoding, MRI manufacturer, coil, scanner, session ID, sex, age). "dataset_description.json" includes information [...]

Fig. 2 Face-masking process (panel labels: Normal, Successful). We removed signals of volumes covering the facial surface. This process generates a report file of the face-masking results for each participant, containing a 3D reconstruction of the surface of defaced T1w images.
In all datasets, demographic information includes unique participant ID numbers, sex, and age. Datasets 1-3 additionally include participant handedness, diagnosis, and auxiliary demographic information.
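The folder layout described for Dataset 4 can be traversed with a few lines of Python. The following is a minimal sketch under stated assumptions: the helper name `index_sourcedata` is hypothetical, the file names in the mock tree are placeholders rather than actual file names from the dataset, and only the "sourcedata", "sub-*", "func", "anat", and "fmap" folder names and the .nii.gz extension come from the text.

```python
import tempfile
from pathlib import Path

MODALITY_DIRS = ("func", "anat", "fmap")  # folder names given in the text


def index_sourcedata(dataset_root):
    """Map each participant folder to the .nii.gz files found per modality folder."""
    index = {}
    for sub_dir in sorted(Path(dataset_root, "sourcedata").glob("sub-*")):
        if not sub_dir.is_dir():
            continue
        index[sub_dir.name] = {
            modality: (
                sorted(p.name for p in (sub_dir / modality).glob("*.nii.gz"))
                if (sub_dir / modality).is_dir()
                else []  # modality folder absent for this participant
            )
            for modality in MODALITY_DIRS
        }
    return index


# Build a tiny mock tree so the sketch runs without the real data.
# The file names below are hypothetical placeholders.
with tempfile.TemporaryDirectory() as tmp:
    for modality, file_name in [
        ("func", "sub-001_task-rest_bold.nii.gz"),
        ("anat", "sub-001_T1w.nii.gz"),
    ]:
        folder = Path(tmp, "sourcedata", "sub-001", modality)
        folder.mkdir(parents=True)
        (folder / file_name).touch()
    mock_index = index_sourcedata(tmp)

print(mock_index)
```

An index of this kind makes it easy to spot participant folders that lack a field map or contain an unexpected number of functional images.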

Quality assessment of fMRI and structural MRI data (Datasets 2-4). For Datasets 2-4, consistent with the MRI data, we calculated quality metrics by applying MRIQC (Esteban et al., 2017). We plotted the spatial and temporal measures listed below for Datasets 2-4 and the open dataset (ABIDEII, http://fcon_1000.projects.nitrc.org/indi/abide/abide_II.html) for comparisons (Figs. 3 and 4). For detailed descriptions of those metrics and related references, please visit the PCP Quality Assessment Protocol website (http://preprocessed-connectomes-project.org/quality-assessment-protocol/). We used the Raincloud plot in the ptitprince Python package (https://github.com/RainCloudPlots/RainCloudPlots) 27 for the strip plot, box plot, and violin plot for each dataset.
Quality assessment of structural MRI data face-masking (Datasets 2 and 3). We manually evaluated the quality of face-masking processing. Three people (one neuroimaging researcher and two clinical psychologists) independently evaluated the accuracy of face-masking by visually checking the structural MRI data. In the evaluation, "1" is successful, "2" is normal, and "3" is doubtful (e.g., part of the brain is missing or part of the face is present). See example images for "successful," "normal," and "doubtful" in Fig. 2. Numbers of participants evaluated as "successful," "normal," and "doubtful" were 1,561, 60, and 6 in Dataset 2 and 1,344, 60, and 6 in Dataset 3, respectively.
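The face-masking rating counts can be tallied to check them against the dataset sizes. This is a small illustrative sketch that assumes each participant received exactly one final rating; the counts are taken from the text, and the variable names are arbitrary.

```python
# Face-masking rating counts as reported in the text.
RATING_COUNTS = {
    "Dataset 2": {"successful": 1561, "normal": 60, "doubtful": 6},
    "Dataset 3": {"successful": 1344, "normal": 60, "doubtful": 6},
}

summary = {}
for dataset, counts in RATING_COUNTS.items():
    total = sum(counts.values())
    summary[dataset] = {
        "total": total,
        "doubtful_percent": 100 * counts["doubtful"] / total,
    }
    print(f"{dataset}: {total} participants rated, "
          f"{summary[dataset]['doubtful_percent']:.2f}% doubtful")
```

The Dataset 3 total of 1,410 agrees with the participant count given for that dataset under Data Records, and fewer than 1% of images in either dataset were rated doubtful.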

Usage Notes
Interested parties can visit the Synapse project site (Synapse ID: syn22317076) or the DecNef Project Brain Data Repository site (https://bicr.atr.jp/decnefpro/data) to apply for access to datasets.
Before submitting access requests, applicants should read the privacy policy for each dataset. Datasets may not be used for commercial purposes. Datasets may not be copied or redistributed. If applicants publish manuscripts using these datasets, applicants agree to acknowledge the DecNef Project Brain Data Repository as the data source and to include language similar to the following: "Data used in the preparation of this work were obtained from the DecNef Project Brain Data Repository (https://bicr-resource.atr.jp/srpbsopen/), collected as part of the Japanese Strategic Research Program for the Promotion of Brain Science (SRPBS) supported by the Japanese Advanced Research and Development Programs for Medical Innovation (AMED)." Note that we periodically collect the number of applicants and the amount of data downloaded from the database for reports to the DecNef Project, but not private information. Note that parties wishing to use these data must review and agree to these terms, including those who access shared copies of the data. In the event that data need to be shared with collaborators, those persons must also register with the respective data repositories and agree to all terms. Each dataset has a different application process, as described below.
From Synapse. Applicants need a valid Synapse account. Access is controlled by separate conditions for use, so please check the Synapse project wiki for the terms of use for each dataset.
1) The SRPBS Multi-disorder Connectivity Dataset 18
Applicants download the "Application Form for Data Usage" from the Synapse project site (https://doi.org/10.7303/syn22317078), and email the completed form to decnef-db-admin@atr.jp. We will add download privileges to the applicant's Synapse account.
2) The SRPBS Multi-disorder MRI Dataset (restricted) 19
Applicants download the "Application Form for Data Usage" from the Synapse project site (https://doi.org/10.7303/syn22317079), and email the completed form to decnef-db-admin@atr.jp. We will add download privileges to the applicant's Synapse account.