A longitudinal resource for studying connectome development and its psychiatric associations during childhood

Most psychiatric disorders are chronic, associated with high levels of disability and distress, and present during pediatric development. Scientific innovation increasingly allows researchers to probe brain-behavior relationships in the developing human. As a result, ambitions to (1) establish normative pediatric brain development trajectories akin to growth curves, (2) characterize reliable metrics for distinguishing illness, and (3) develop clinically useful tools to assist in the diagnosis and management of mental health and learning disorders have gained significant momentum. To this end, the NKI-Rockland Sample initiative was created to probe lifespan development as a large-scale multimodal dataset. The NKI-Rockland Sample Longitudinal Discovery of Brain Development Trajectories substudy (N = 369) is a 24- to 30-month multi-cohort longitudinal pediatric investigation (ages 6.0–17.0 at enrollment) carried out in a community-ascertained sample. Data include psychiatric diagnostic, medical, behavioral, and cognitive phenotyping, as well as multimodal brain imaging (resting fMRI, diffusion MRI, morphometric MRI, arterial spin labeling), genetics, and actigraphy. Herein, we present the rationale, design, and implementation of the Longitudinal Discovery of Brain Development Trajectories protocol.

The present Data Descriptor is intended to provide essential details regarding our project and protocol plans. We discuss lessons learned in recruitment, retention, data collection, and data distribution, which help to provide insights into the challenges that investigators face when trying to attain recruitment and design ideals (e.g., community-ascertained recruitment strategy, multi-cohort design) in imaging studies. We provide basic descriptions of the cohort including demographics, symptom heterogeneity, and attrition. We provide demonstrations of correlative analyses leveraging neurobehavioral, psychological, and developmental phenotyping. Finally, with respect to neuroimaging, we present motion features, structural and functional results as a function of demography, and profiles of differential scanning parameters.

Recruitment strategy. The target enrollment for the Longitudinal Discovery of Brain Development
Trajectories project was 384, with 369 recruited between December 2013 and November 2017. The last follow-up characterization visit was completed in June 2019. To best leverage statistical power and generalizability, a community-ascertained recruitment strategy was deployed to approximate a community-representative enrollment for the local region.
To accomplish recruitment and community engagement, the NKI-RS program staff targeted multiple educational and recruitment outreach opportunities in the region (Rockland County) to foster a community-collaborative approach. Primary referral pathways are summarized in Supplementary Table 1 for all participants. Successful partnerships with community members came predominantly from study recruitment booths at community events including local street fairs and the NKI Neuroscience Education Day (a.k.a. NKI Brain Day). The NKI Brain Day was developed in a collaborative educational effort with the NYU Neuroscience Department while leveraging and adapting materials from the Dana Foundation 43 . The overall goal was to provide a hands-on educational neuroscience experience for families. Activities included education about neuroimaging with a mock scan exposure, lectures, facility tours, a laboratory with models and animal brain specimens, optical illusions and other brain games, as well as a neuroscience art area. All components were led by study staff and provided parallel opportunities to familiarize potential participants with the program and facility. As the program grew, word of mouth became a prominent referral source. Regionally-targeted InfoUSA ® -based mailers, email blasts, and appeals for word-of-mouth recommendations facilitated representation from areas that were relatively under-represented.
Given the scope and potential clinical applicability of the collected information, feedback was provided to all participant families by a licensed clinician. The feedback included pertinent physical measures, imaging findings and laboratory results (complete blood count, comprehensive metabolic panel, thyroid stimulating hormone, lipid profile, lead level) along with results from the semi-structured research diagnostic assessment and normed behavioral, cognitive, mood and anxiety measures. Notably, feedback was consistently presented as 'screening' with limited comprehensive clinical applicability in recruitment efforts. Mental health clinics, developmental pediatricians, neurologists, clinical psychologists, psychiatrists, and specialized school staff were not targeted as referral resources given concerns of generalizability of the sample; however, 5% of enrolled participant families learned of the study from a clinical provider (Supplementary Table 1).
Retention strategy. Several procedures and protocol adjustments were implemented to promote retention.
These included: • Birthday cards were sent to participants to help maintain a positive connection and track address changes.
• Enrolled participants were invited to educational talks and events designed to promote an understanding of the complexities of the brain and the need for brain imaging studies; preliminary findings and results were relayed as well. The scientific importance of the longitudinal dataset was emphasized to encourage retention. • Written and verbal participant satisfaction was ascertained at all protocol visits to ensure the experience was regarded as positive and worthwhile at the participant and community level. • Based on participant feedback, the baseline characterization protocol was compressed from two six-hour days to a brief consent visit easily arranged after school hours followed by a one day, 8-hour intensive protocol. This change took effect in mid-September 2015. Protocol changes are detailed in Supplementary Fig. 1. In addition to decreasing the number of full site days, use of a brief consent visit limited the duration of participant fasting for phlebotomy because participants did not complete consent procedures while fasting. • Retest MRI scans were permitted after any study visit rather than only the baseline characterization visit.
Participants who failed to present for the midpoint visit were not withdrawn but asked to complete the final protocol visit. A maximal 3-month follow-up window was provided to facilitate scheduling of protocol visits. • The initial protocol called for two longitudinal follow-up timepoints at 15 and 30 months following the baseline (referred to as: 0/15/30 track) to minimize possible season-related variation between participants. However, over the course of the study, the long duration between longitudinal visits became a source of concern regarding retention, leading to the creation of a complementary track with 12 months between consecutive visits rather than 15 (0/12/24 track). In addition to adding flexibility, the 0/12/24 month track helped families anchor annual longitudinal follow-up visits during a standard school break and shortened the study enrollment period. • The availability of visits on non-school days was maximized in response to participant feedback; 87.6% of visits occurred on non-school days including 60.1% of visits on weekends, 23% during winter/spring/summer breaks, 4.1% on national holidays, and 0.4% on teacher conference days.
Participant procedures. Screening. To determine eligibility and ensure safety, legal guardians of potential participants completed a pre-screening phone interview with an intake coordinator. The screening interview  www.nature.com/scientificdata www.nature.com/scientificdata/ inquired about the participant's health history and assessed for study inclusion criteria (Table 1). Individuals meeting study criteria were invited to participate.
Medications. Participants with a history of antidepressant, mood-stabilizing, or neuroleptic medication use were excluded at enrollment (Table 1). Participants taking stimulant medications were asked to discontinue their medication during protocol visit days, as stimulant medications can affect cognitive and behavioral testing as well as functional brain mapping. Participants who chose not to discontinue medications, or whose physicians required that medication not be interrupted, were enrolled and completed all study procedures.
IRB approval. The study was approved by the Nathan Kline Institute for Psychiatric Research Institutional Review Board. Prior to conducting the research, written informed consent was obtained from participants' legal guardians and written assent was obtained from the participants. Participants who became adults in the longitudinal follow-up provided written consent once becoming 18 years old. experimental design. Participants had 5 study visits. As highlighted in Retention Strategy, the baseline characterization included MRI and consisted of either two 6-hour visits if enrolled prior to mid-September 2015 or a brief consent visit followed by an 8-hour characterization visit if enrolled after. Mid-point and final visits were a single 8-hour characterization including MRI. The MRI retest visit was ideally within 3 weeks of either the baseline, mid-point, or final visit. All assessments are listed in Table 2. Participants were assigned to complete assessments in one of three characterization schedules. An example is provided in Table 3 Clinician-administered assessments. Clinician-administered assessments were obtained by a multidisciplinary team of psychologists, social workers, research nurses, and psychiatrists. All clinician-administered assessments were conducted by, or under supervision of, licensed clinicians. All cognitive assessments were administered by trained research staff under the supervision of a neuropsychologist. Participant responses were first scored by the administering rater with verification of scoring by another trained staff member. Separate psychiatric diagnostic and neuropsychology multidisciplinary team meetings occurred weekly to review administration, reconcile coding disparities, maintain consistent coding conventions, and minimize rater drift. Psychiatric diagnostic meetings generated consensus research diagnostic conclusions based on the results of the semi-structured research diagnostic summary combined with all other available information collected during the protocol (i.e., the best estimate method). Individuals for whom consensus and KSADS/SCID standard diagnostic summary were in agreement were coded as having no additional consensus diagnosis in the dataset to indicate the KSADS/SCID standard diagnostic summary was confirmed. In instances of discrepancy between consensus and KSADS/SCID standard diagnostic summary, the consensus diagnostic changes were specified.
Baseline visit only. Wechsler Abbreviated Scale of Intelligence, Second Edition (WASI-II). All participants were administered the WASI-II during baseline assessment only 44 . Four subtests: Vocabulary, Similarities, Block Design and Matrix Reasoning measure verbal comprehension and perceptual reasoning. The WASI-II subtest scores can be combined to estimate Full Scale Intelligence Quotient (FSIQ).
Wechsler Individual Achievement Test (WIAT-IIA), Second Edition Abbreviated. Three subtests were included: Word Reading ranging from phonological skills and letter recognition to word recognition 45 ; Numerical Operations ranging from counting and number recognition to complex calculations involving equations, fractions, decimals, etc.; and Spelling ranging from single and blended sound dictation to word dictation.
Repeating at each longitudinal time point. Kiddie-Schedule for Affective Disorders and Schizophrenia (KSADS) for School-Aged Children Present and Lifetime Version (PL). All participants and their guardians were administered the KSADS-PL at baseline 28 . If they were under 18 throughout the study, they were administered KSADS-PL at mid-point and final visits. The KSADS-PL is a semi-structured DSM IV-based research diagnostic assessment.
Structured Clinical Interview for DSM Disorders (SCID-I/NP). Participants who turned 18 while participating were administered the SCID at mid-point and/or final visits 46 , if applicable, instead of the KSADS. The SCID is a semi-structured DSM IV-based research diagnostic assessment. Unlike the KSADS, the guardian was not an informant on the SCID.
Delis-Kaplan Executive Function System (D-KEFS). The full DKEFS battery was administered to participants who were 8 years and older 47 . The full battery was introduced June 2014 and shortened in January 2017 to include the following subtests: Verbal Fluency, Trails, Design Fluency, Color-Word Interference, and Tower, (see Supplementary Fig. 1).
Rey Auditory Verbal Learning Test (RAVLT). The RAVLT, a 15-item word list repeated over 5 learning trials and recalled after an immediate and twenty-minute delay 48 , was administered to participants 8 years and older to characterize verbal learning and memory.
Digit Span. The WISC-R version of digit span was administered to all participants to characterize short term and working memory via verbal recitation of short strings of serially presented numbers 49 , recalled either in the same order as presented aurally by the examiner or in backwards order.

Behavior and Phenotypic Characterization Questionnaires
The High-Functioning Autism Spectrum Screening Questionnaire (ASSQ, 6-17) † ✓ ✓ Continued www.nature.com/scientificdata www.nature.com/scientificdata/ Magnetic resonance imaging (MRI). All MRI scanning took place on the 3.0 T Siemens TIM Trio located at the Nathan Kline Institute. A 32-channel head coil was used for all acquisitions in our sample. Details regarding the imaging protocols are presented in Table 4. In each imaging visit, participants were asked to complete two morphometric sequences (T1-weighted, T2-FLAIR), a Diffusion Tensor Imaging (DTI) sequence, six fMRI runs, and one arterial spin labeling (ASL) sequence.
Functional MRI. For the six fMRI runs, three were resting state protocols, where the participants were asked to keep their eyes fixated on a crosshair, stay still, and not think of anything in particular. Two of these 10-minute resting state sequences used a multiband imaging protocol 50 , one prioritizing temporal resolution (TR = 645 ms/3 mm isotropic voxels) and the other prioritizing spatial resolution (TR = 1400 ms/2 mm isotropic voxels). The third resting state sequence lasted 5 minutes and used typical single band acquisition parameters (TR = 2500 ms/3 mm isotropic voxels); it is included for reference. Two 2.5-minute visual checkerboard stimulation runs were completed by participants, one with each of the multiband sequences. The checkerboard task consisted of a flickering checkerboard (8 Hz) in block design. Each of the 3 checkerboard blocks lasted 20 seconds, with 20-s resting periods between each stimulation block. A final breath-hold task was performed with a multiband sequence (TR = 1400) for 4 minutes and 30 seconds. In this task, participants were given the following cues: "Rest, " "Get Ready, " "Breathe In, " "Breathe Out, " "Deep Breath and Hold. " Participants were then shown 6 circles of decreasing size, which indicated how much longer they needed to hold their breath. The "Rest" cue lasted 10 seconds; the "Get Ready, " "Breathe In, " "Breathe Out, " and "Deep Breath and Hold" cues each lasted  www.nature.com/scientificdata www.nature.com/scientificdata/ 2.5 seconds; and the 6 circles each lasted 3 seconds (holding breath for 18 seconds in between). This cycle was repeated 7 times during the run.
Diffusion Tensor Imaging. DTI scans consisted of 128 diffusion directions with a b-value equal to 1500 s/mm 2 and 9 b-0 images, with 2 mm isotropic voxels, data collected in the AP direction, and with a multiband acceleration factor equal to 4. Acquisition time was 5 minutes and 43 seconds. During the scans participants listened to music.
Morphometric Imaging. Morphometric imaging consisted of T1-weighted and a T2-FLAIR scan. T1-weighted images were collected with 1 mm 3 isotropic voxels using an MPRAGE sequence. Each run took 4 minutes and 18 seconds to collect. A clinical T2-FLAIR was collected for radiological reading, with a slice thickness of 3 mm and in plane resolution of 1 mm 2 . Each run had a duration of 3 minutes and 2 seconds.
Arterial Spin Labeling. A pseudo-Continuous Arterial Spin Labeling (pCASL) sequence was run for each participant. As in the fMRI resting state sequences, participants were asked to keep their eyes fixated on a crosshair for 5 minutes and 15 seconds. The labeling plane offset was 80 mm with a postlabeling delay (PLD) of 1s.  Table 3. Sample baseline characterization schedule. *During the cognitive testing blocks, breaks were provided between tasks at rater discretion as deemed necessary. www.nature.com/scientificdata www.nature.com/scientificdata/ Physical assessment and laboratory analysis. Basic physical measurements (height, weight, and waist/hip circumference), cardiovascular measurements (heart rate and blood pressure), and grip strength were obtained by trained research staff under the supervision of team physicians and nurses at each of the three main protocol time points. Serum laboratory analysis was conducted at each main protocol time point: complete blood count, comprehensive metabolic profile, lipid profile, lead level, and thyroid function testing. Fasting was encouraged but changed to optional in July 2015, with fasting status coded for each phlebotomy encounter ( Supplementary Fig. 1). Urine Toxicology was obtained for participants aged 11 and older. Of the 326 baseline completers, 157 (48.2%) provided a blood sample. Of the 247 midpoint completers, 117 (47.4%) completed phlebotomy. Of the 177 final visit completers, 95 (53.7%) completed phlebotomy. Study cohort laboratory test results are displayed in Supplementary Fig. 3, along with clinical laboratory-determined healthy ranges. A genetics sample was obtained once during any protocol visit and sent to the NIMH Genetics Repository for further processing, analysis, and sharing. Of the 326 baseline completers, 162 (49.7%) submitted a genetics sample. Finally, actigraphy was obtained for up to four weeks prior to the characterization visit using the Phillips Respironics Actiwatch 2 system (see Actigraphy section below).
Data distribution and use. Development of an End User Inquiry Mechanism. As of 12/31/2020, 229 collaborative research sites have obtained access to the enhanced NKI-RS dataset. An inquiry response mechanism to ensure "End Users" appropriately understand and access data was developed and has been retained after study completion. Without an inquiry mechanism, the capacity to leverage the dataset may be impacted, directly threatening relevance and utilization. Accordingly, an End User Response Panel-composed of program investigators with expertise in phenotyping, clinical characterization, database management, and imaging-was developed. Since implementation, the Panel has received and jointly responded to 232 unique inquiry threads (2017: 83; 2018: 62; 2019: 41; 2020: 46) to rocklandsample.enduser@nki.rfmh.org.
Summarizing lessons learned. Throughout program implementation, critical appraisal of strategic approaches in recruitment, retention, characterization, and dissemination to the scientific community was performed in an ongoing manner through weekly program leadership meetings. While large-scale research implementation is based on study-specific factors, some key considerations that may support similar efforts are listed below.
Participant engagement. The NKI-Rockland Initiative initially relied on InfoUSA-guided mailings to the community as a primary recruitment tool (yield typically: 1-3% per 10,000 flyers mailed). However, such efforts were of limited value in recruiting individuals under 18.0 years old. Accordingly, promotion of community and scientific partnership became a core tenet of recruitment and retention in this longitudinal cohort. A multifaceted approach was employed as follows: 1) efforts were made to form relationships with community agencies in promoting the advancement of the regional area as a benchmark for longitudinal pediatric developmental neuroscience; 2) educational initiatives were brought to schools and anchored at NKI in the form of NKI Brain Day; and 3) active and past participants, who were invited to community outreach and educational events, often acted as impromptu ambassadors to other non-enrolled program attendees. Participant word of mouth became a primary referral pathway, particularly as the study progressed. Consistent with our goals of representativeness, the resultant sample was found to broadly represent Rockland County and its surrounding areas (see Supplementary Table 2).  www.nature.com/scientificdata www.nature.com/scientificdata/ Balancing Experimental Needs Against Participant Burden. The initial cross-sectional design of the NKI-RS initiative applied a two-day on-site characterization battery that facilitated active proctoring of all questionnaires in a group-psychometric testing area. Participant feedback following direct translation of the two-day NKI-RS characterization to the longitudinal cohort suggested a need for consolidation to a one-day intensive characterization. Families cited the preference for weekend visits and minimal disruption to schooling as the study was not of clinical benefit to most participants. Accordingly, a brief consent visit was arranged and typically scheduled after school hours. Families were also given the option of completing questionnaires that did not assess clinically-defined safety items at home via the COllaborative Informatics and Neuroimaging Suite (COINS) web-based entry platform 51,52 . Participants and families were coached on minimizing distraction during home-based questionnaire entry. The remaining procedures from the initial two onsite 6-hour study visits were then consolidated to a single 8-hour visit. With this change, 87.6% of characterization visits were completed on non-school days with ample breaks and staggering of procedures to decrease data loss.

Selection of phenotypic measures for population samples.
A substantial portion of the dimensional mental health assessments in the NKI-Rockland phenotypic battery is focused on detecting and quantifying psychiatric symptomatology (e.g., Conners ADHD Scale, Children's Depression Inventory). Although reflective of the practices of the larger field, the reliance on measures that assess only the presence of deficits can be limiting as they fail to differentiate behaviors among unaffected individuals. Recent efforts by our team 53 and others [54][55][56][57] have called this practice into question as it is akin to grouping all with an IQ above 100 together. The distributions resulting from such tools tend to be truncated and thus suboptimal for dimensional analysis in genetics, imaging, and epidemiologic studies. The inclusion of both the Conners and SWAN for the assessment of ADHD-the former focused on detection of symptoms and the latter on that of strengths as well as weaknesses-allows interested users to explore this issue, which is not unique to the NKI-RS phenotypic battery. When possible, attempts were made to phenotype shared developmental constructs that extend across the larger population, e.g., Child Behavior Checklist 27 , Early Adolescent Temperament Questionnaire 58 , Adult Temperament Questionnaire 59 , Penn Computerized Neurocognitive Battery 60 , Vineland Adaptive Behavior Scales 61 , and NEO Five Factor Inventory 62 . However, given a lack of assessments emphasizing strength in psychiatric symptom www.nature.com/scientificdata www.nature.com/scientificdata/ characterization, standard clinical measures remained a central aspect of phenotyping. As the psychometric instruments producing dimensional distributions in non-clinical samples continue to become available, we encourage use of such measures in phenotyping investigations of community samples.
Actigraphy. Collection of actigraphy data was complicated by rates of unit damage and loss; 5 out of 28 units remained viable at the end of the project period. From a practical perspective, the number of units included in the budget at the onset of the project substantially underestimated real needs once the number of participants and visits per participant involved in the design were considered. Delays in visits at times caused stresses on our system due to the limited number of actigraphy units. The result of these challenges is that actigraphy represents one of the most notable sources of missing data in our sample. Future efforts would benefit from more careful estimation of the number of devices required to support the various contingencies that can arise in such a design. Recent actigraph price reductions increase feasibility of obtaining actigraphy data in large community studies. Participant incentives should be considered for consistent actigraph usage and return of functioning devices.
Multicohort Design. As highlighted previously, an initial structured multicohort design transitioned to an unstructured approach due to greater challenges of recruiting adolescents. While maintaining many strengths of a multicohort design, an unstructured design trades further flexibility in opportunistic recruitment for sample non-uniformity that may decrease (1) precision in delineation of age and cohort effects and (2) accuracy of mapping developmental trajectories 7 . Pubertal-age participants (enrollment age 11 and up) and their families were more likely to cite conflicting demands (e.g., school activities, schoolwork, peer social networks, and summer camps) that often required high levels of commitment from the participant. Younger participants were more likely to complete all protocol visits (Age less than 11 = 60%, Age 11 and up = 48%, p = 0.03). Together, these observations may support the possibility that decreased assessment burden spread over more visits of shorter duration may facilitate improved adolescent recruitment and retention.
Retention. As discussed in the Retention Strategy section, several changes were made to provide protocol flexibility: (1) addition of a 12/24-month follow-up pathway; (2) consolidation of the baseline 2-day on-site characterization to a one-day intensive visit; (3) obtaining the retest MRI scans after any of the three time point visits; (4) providing a 3-month window for return visits; and (5) retaining participants who missed the midpoint visit. While these adaptations facilitated a larger and more complete dataset, they have several implications. (1) The 12/24 month follow-up pathway may be prone to seasonal developmental effects (e.g., probing a participant's psychiatric symptoms only during their summer break); this may be a relevant consideration when pooling data from the 15/30-month follow-up pathway or directly considering seasonal effects. (2) Though other work by our group 53 has supported use of more frequent, shorter visits in clinical populations, this option was not provided given the larger non-clinical population. This approach also avoided confounds with variable characterization protocols. Participants who completed home-based assessments may have experienced conflicting attentional demands. (3) The ability to complete the retest at a later longitudinal visit inherently shifts the age distribution while likely improving data quality, with older participants suffering less motion artifact. (4) The 3-month visit window augments time effects in the sample as either a confounding variable or one for scientific exploration. Data privacy. During the consent process, all participants provide informed consent for their data to be shared via IRB-approved protocols. Given the sensitive nature of the information provided during NKI-RS participation, a Certificate of Confidentiality was obtained from the Department of Health and Human Services (HHS). The certificate helps to protect the privacy of human subjects by allowing the research team to refuse to disclose names or other identifying characteristics of study participants in response to legal demands (https:// humansubjects.nih.gov/coc/index).
With respect to data-sharing via the International Neuroimaging Data-sharing Initiative (INDI), informed consent for sharing was obtained prior to participation in the study. Given the critical priority of protection of participant privacy, no protected health information is ever released via the INDI. For imaging data, face information was removed from anatomical scans, and protected/identifying information was removed from all image headers and databases. Additionally, random 7-digit INDI identifiers were assigned to all datasets. Recognizing the risks associated with sharing phenotypic data, which is high dimensional and personal in nature, we developed two release versions: NKI-Rockland Lite. This release version contains only de-identified imaging data and limited phenotyping (age, sex, and handedness). No psychiatric, cognitive, or behavioral information is included. The NKI-Rockland Lite is immediately available without data usage agreement requirements. To access the NKI-RS Lite release, please visit the Study Website.
NKI-Rockland full phenotypic release. This release contains the full high-dimensional phenotypic protocol. Phenotypic data includes comprehensive psychiatric and behavioral assessment questionnaires (item-level information included), neuropsychological testing, cognitive/behavioral performance measures, basic laboratory measures, respiratory and cardiac recordings obtained during imaging, and actigraphy (for a minimum 1 night of sleep). Users of the NKI-Rockland Full Phenotypic Release complete the following agreements and obtain appropriate institutional approval prior to accessing the data (expected completion time <5 minutes). The entire phenotypic dataset will be made available to approved users via the COINS databasing system, which was the primary support for the NKI-Rockland data collection; also Longitudinal Online Research and Imaging System (LORIS) database, in which a curated version of the data has been created 64 .

Data acknowledgment. A guiding principle for data-sharing in the NKI-Rockland Sample initiative is to
provide researchers with the greatest flexibility to carry out scientific inquiries. In this regard: 1. We do not require authorship on any manuscripts generated using NKI-Rockland datasets; citation as a data-source is sufficient. 2. We do not require review of manuscripts generated using NKI-Rockland datasets; we do strongly encourage careful documentation of selection criteria applied to the NKI-RS in choosing datasets for analysis. 3. We do not require specification of data analyses prior to data usage. Distribution for use. The primary website for all NKI-Rockland Initiatives, including the present, is located at: http://fcon_1000.projects.nitrc.org/indi/enhanced/index.html. Phenotypic data may be accessed through either an NKI-RS-dedicated instance of the Longitudinal Online Research and Imaging System (LORIS) located at https://data.rocklandsample.rfmh.org/ and supported by the NKI-RS research program staff or through the COINS Data Exchange (https://coins.trendscenter.org/). Except for age, sex, and handedness, which are publicly available with the imaging, NKI-RS phenotypic data are protected by a Data Usage Agreement (DUA). Investigators must complete the DUA (found at http://fcon_1000.projects.nitrc.org/indi/enhanced/data/DUA.pdf ) and have it approved by an authorized institutional official before receiving access. The intent of the DUA is to ensure that data users (1) agree to protect participant confidentiality when handling the high dimensional NKI-RS phenotypic data (which includes single item responses) and (2) agree to take the necessary measures to prevent breaches of privacy. The DUA does not place any constraints on the range of analyses that can be carried out using shared data, nor does it include requirements for co-authorship by the originators of the NKI-RS.

Imaging data. All imaging data can be accessed through the 1,000 Functional Connectomes Project and its
International Neuroimaging Data-sharing Initiative (FCP/INDI) based at http://fcon_1000.projects.nitrc.org/indi/ enhanced/neurodata.html. This website provides an easy-to-use interface with point-and-click download of datasets that have been previously compressed; it also provides directions for users who are interested in direct download of the data from an Amazon Simple Storage Service (S3) bucket. A CSV file with the complete list of S3 links is provided in the study website. However, if a more specific list of S3 links is required (e.g., sex = female & age <10), it can be generated though LORIS. Imaging data is stored in the Brain Imaging Data Structure (BIDS) format, which is an increasingly popular approach to describing MRI data in a standard format. All data are labeled with the participant's unique identifier.
Partial and missing data. Some participants may not be able to successfully complete all components of the NKI-RS protocol due to a variety of factors (e.g., participants experiencing claustrophobia may not be able to stay in the scanner for the full session). To prevent data loss, when possible, we included a mock MRI scanner experience during the first visit. Overall, we attempt to collect as much of the data as possible within the allotted data collection intervals and log data losses when they occur. Data license. NKI-Rockland Sample data are distributed using the Creative Commons-Attribution-Noncommercial license, which is described at: https://creativecommons.org/licenses/by-nc/4.0/legalcode. For the high-dimensional phenotypic data, all terms specified by the DUA must be complied with.

Technical Validation
Sample composition. Geographic Table 2), however, communities in closer geographic proximity to NKI were more likely to enroll. A zip code region within Rockland county (10952, 10977) was notably underrepresented. Many residents in this region are members of a religious community that is found at a much higher rate in Rockland County than in the US population (Supplementary Table 2).
Diagnosis, Age and Sex. With respect to diagnosis in the 326 baseline completers, though fewer females enrolled (Fig. 2a), numbers with no diagnosis (n = 160) were comparable between the sexes (male = 78, female = 82, p = 0.752) indicating that boys in the sample were more likely to have at least one diagnosis (male = 98, female = 56, p < 0.001) (Fig. 2b). Much of this difference was driven by high rates (25%) of ADHD diagnosis at baseline (male = 54, female = 25), which were, as expected, more frequent in boys (p < 0.001).
Rates of pediatric ADHD in our sample were above other epidemiologic samples of 9.4% 66 . However, rates of other diagnoses were generally consistent with established community prevalence studies 67 . Study-specific www.nature.com/scientificdata www.nature.com/scientificdata/ criteria excluded participants with history of psychiatric hospitalization, current suicidal/homicidal ideation as well as antidepressant, neuroleptic, or mood stabilizer use. As a result, psychiatric symptom severity at clinical thresholds warranting treatment was generally identified and excluded. However, use of ADHD medication was not exclusionary allowing enrollment of participants with clinically significant ADHD symptoms and families seeking refined assessment and guidance. We believe this was a major factor in selective elevated rates of ADHD in the sample.
Substance and Medication use. Among the 326 baseline completers, 160 were aged 11 and up at baseline, with 158/160 completing substance use measures. 25,9,18, and 1 reported past alcohol, tobacco, marijuana, and other illicit substance use, respectively. All 160 submitted urine toxicology samples; 8 (7 for THC and 1 for amphetamine) had a positive urine toxicology for an illicit substance not readily explained by medications (i.e., stimulant medication in context of amphetamine positive urine). Rates of substance-related diagnosis (which was limited to cannabis abuse) are represented in Fig. 2b. With respect to psychotropic medication use, many medication classes were excluded (see Table 1). 37 of the 351 guardians reported their child took any medication, 15 for behavioral health indications and 25 for medical indications which was most commonly asthma (n = 11). The majority of participants who took behavioral health medications did so in treatment of ADHD (n = 12; 11 took stimulants and 2 used non-stimulants).
Quality assessment. Consistent with policies established through our prior data generation and sharing initiatives, all imaging datasets collected through the NKI-RS are made available to users regardless of data quality given a lack of consensus on what constitutes "good" or "poor" quality data. Also, "lower quality" datasets can facilitate the development of artifact correction techniques and help in evaluating the impact of such real-world confounds on reliability and reproducibility.
Phenotypic data. Raw data were evaluated for technical errors prior to each release by reviewing scatterplots by age and boxplots comparing past releases to each progressive release. Additionally, group level data were inspected to ensure the observed distributions and inter-relationships were sensible. In keeping with the rationale for imaging data, phenotypic data are not 'cleaned' of outliers, unless data are identified in cursory reviews as containing overt technical errors. Data cleaning and review is expected to be incorporated into each independent research analysis. Figures 3-6 highlight expected performance on several core characterization measures. Figures 3 and 4 depict examples of assessments with symmetric (near normal) distributions (e.g., IQ, academic   www.nature.com/scientificdata www.nature.com/scientificdata/ achievement testing, and SWAN) and truncated (e.g., CBCL, Conners) distributions. Shapiro-Wilk normality tests displayed in Fig. 4 highlight relative differences between symptom-oriented (CBCL and Conners) attentional measures and the more normally distributed SWAN that emphasizes attentional strength in addition to symptom-associated weakness (see Selection of Phenotypic Measures for Population Samples section). This is further demonstrated by stronger subscale correlation trends between SWAN and Conners for participants with SWAN > = 0 (indicating more weakness as a proxy for symptomatology) over those with SWAN < 0. Figure 5 demonstrates anticipated improvement in both raw speed and errors with age for the Color Naming, Inhibition, and Switching conditions of the D-KEFS Color Word Interference Test. As expected, based on prior D-KEFS norming studies, standardized speed and error scores do not demonstrate prominent age effects aside from relatively under-recruited age bands. Figure 6 shows expected age-relationships in physical maturation including age-related improvements in motor speed/coordination (grooved pegboard) and strength (grip strength). Muscle bulking during pubertal onset as measured by Tanner stage coincides with increasing strength for boys relative to girls (Fig. 6d) without differences in speed/coordination (Fig. 6c) [68][69][70][71] .
To further facilitate the evaluation of phenotypic data, we plotted correlations among a broad sampling of measures (see Fig. 7). Statistical relationships observed after false discovery rate-based correction for multiple comparisons revealed a wealth of associations that are in general alignment with the broader psychiatric literature. For example: (1) age correlated with increased physical strength 72 , body mass 68 , and attention; (2) general measures of internalizing and externalizing symptoms exhibited high correlations with one another 73-77 ; (3) academic achievement correlated with intellectual aptitude 78 and with socioeconomic status 79,80 but was inversely correlated with most internalizing and externalizing symptom measures 81-83 -notably, ADHD traits; Fig. 4 Distribution of attention ratings. Total CBCL Attention subdomain distribution compared to total SWAN distribution (left column). Distributions of Conners Inattention compared to SWAN Inattention (middle column) and Conners Hyperactivity compared to SWAN Hyperactivity (right column). The distribution of each scale was tested for normality using the Shapiro-Wilk test. Finally, Pearson (r) and Spearman (p) correlations between the SWAN and Conners are represented for both the hyperactivity and inattention subscales for all participants, those with SWAN > = 0, and participants with SWAN <0.
As expected 85 , child and parent measures assessing similar symptom domains were often poorly correlated, even for companion measures such as the Youth Self Report (YSR) and Child Behavior Checklist (CBCL) (Supplementary Fig. 4). Correlation strength between the YSR and CBCL was lower than prior published data obtained in clinically referred samples 27 . However, this difference narrowed in a post hoc subgroup analysis of participants with a baseline KSADS-PL diagnosis ( Supplementary Fig. 4b), arguing that YSR and CBCL are more closely correlated when psychiatric symptoms are present. Similarly, of the 121 possible intercorrelations between the YSR and CBCL syndrome scales, only two (YSR Attention Problems-CBCL Thought Problems and YSR Rule Breaking-CBCL Rule Breaking) showed statistical significance for individuals with no diagnosis at www.nature.com/scientificdata www.nature.com/scientificdata/ baseline ( Supplementary Fig. 4c). This observation is consistent with the illness orientation of clinical measures, such as YSR and CBCL and may support use of different tools for phenotyping non-clinical populations.
Structural MR imaging. Measurements of T1-weighted structural images were estimated using Mindboggle, an automated morphometric analysis package that draws upon Advanced Normalization Tools (ANTS) and FreeSurfer to generate consensus outputs and an enriched set of measures for analysis [86][87][88][89][90] . Brain morphometry was calculated for a total of 796 scans from 127 females and 152 males. Data from 6 females and 5 males were excluded due to insufficient quality for FreeSurfer to complete. Development curves for intracranial, gray matter, white matter, cerebral spinal fluid, and ventricular volume, mean cortical thickness, and surface area are shown in Fig. 8. Data from all participants were included in these plots, independent of data quality. By using the test-retest data, reliability was calculated for each of the structural measures. To visually inspect the quality of T1-weighted images, an instance of the Braindr web application 91 was created. This instance contained only the images collected for the sample presented in this paper. Within Braindr, a rater has a binary choice for each image. They can either choose to pass (score = 1) or fail (score = 0) an image depending on the quality. We asked 11 raters to cast their vote based on the general quality of the image but to specifically examine if the border areas between white and gray matter are blurry or not. For each structural scan, four slices were shown to the raters (two axial and two sagittal), an approach previously used by our group 92 . Therefore, each structural scan received 44 pass/fail votes. Slices were randomly presented to the raters. A subset of images used for training the raters is shown in Supplementary Fig. 5. For each image, the average score across raters was calculated. From a total of N = 585 structural scans, 455 images obtained a score greater than 0.5 (77.8%). We also compared the Braindr scores with age and Euler number, which has been shown to  www.nature.com/scientificdata www.nature.com/scientificdata/ be a good index of structural image data quality 92,93 (see Fig. 9a). We found a strong age effect on the quality of structural images. For the younger participants (age < = 10 y.o.), 59.2% of the data passed quality control criteria (Braindr score > = 0.5), and for older participants (age > = 16 y.o.), 94.2% of the data passed quality control. The Receiver Operating Characteristic (ROC) curve shows that the Euler number can be reliably used as a classifier to identify passed (braindr > = 0.5) or failed (braindr < 0.5) images, with 0.91 specificity and 0.91 sensitivity. For our data, the Euler number with optimal sensitivity and specificity that identifies passable quality structural data was >−158. Of note, this Euler number might be specific to our dataset and the generalizability of the Euler number cutoff should be evaluated in future works.
Diffusion tensor imaging scans. Quality control for diffusion MRI scans was calculated through the QSIprep toolbox 94 . Specifically, the mean Framewise Displacement (FD, see Fig. 10) and number of bad slices were calculated. The mean FD was 0.882 (median = 0.7181) and the mean number of bad slices was 28.57 (median = 0, range = [0, 1000]). Of note, 50.5% of the scans had zero bad slices.
Functional scans. To measure the relationship between age and sex with head motion, we calculated the FD 95 for each functional scan (Fig. 10). The Euler number for the structural scans and mean FD for the diffusion scans were also included in the analysis. There is a high correlation between FD and age (r > −0.387) for all functional scans across study participants. This high correlation holds for each sex, males and females. We also found high correlations in FD among the different functional scans (r: 0.496-0.767), as well as between the functional scans and the Euler number (r: 0.469-0.674). This high level of correlation between quality measures for different functional scans, as well as between functional and structural scans, has also been observed in other studies 92,96,97 .
During the collection of fMRI, physiological data (cardiac and respiratory) were obtained during scanning. Both physiological signals are available along with the imaging data. For each participant, the mean respiration rate was calculated by identifying the peak of the Fourier spectrum of the signal. The mean respiration rate was found to be negatively correlated with age (r = −0.0340), with a 18.4% reduction in rate being observed when comparing mean respiratory rates between children < = 9 years old (mean = 0.347 Hz) and > = 17 years old (mean = 0.283 Hz), see Fig. 11. www.nature.com/scientificdata www.nature.com/scientificdata/

Usage Notes
Considering distributional properties when using phenotypic measures. As highlighted in prior sections, several illness-focused phenotypic measures in the NKI-Rockland Sample have limited variation among individuals without symptoms. We encourage consideration of distributional characteristics of measures when formulating questions using these data. At times, it may be more sensible to limit the inclusion of participants to those with elevated levels of symptomatology (i.e., clinical, subclinical), as indicated by either general score (e.g., CBCL Total Score) or something more specific (e.g., MASC Total Score), or to treat participants without any elevations in clinical variables as a unified group. Clinical advisors are recommended members of research teams interrogating clinical phenotypic data, so they can provide guidance on relevant symptom thresholds and clinical significance of findings.
Scaled/standardized vs. raw scores. Individuals less familiar with psychiatric and cognitive assessment tools are not always aware of the transformations applied to data to account for age-and/or sex-effects. A classic example is IQ, the components of which can be present in standardized (i.e., a score of 100 represents the average IQ of the population and the standard deviation of scores is 15), scaled (e.g., average is 10; standard deviation is 3), and raw (e.g., number of items correct, number of points earned) forms. Additionally, many of the questionnaires, such as the CBCL, are transformed into a T score based on age and sex. As demonstrated under Technical Validation, the transformed measures for the various phenotypic tools minimize age-effects (and sex-effects, when considered); this is important because usage of the raw scores without some form of correction can result in observations being driven by age rather than other sources of individual variation.

Cross-Initiative harmonization.
A growing number of open initiatives focused on mapping brain, cognitive and behavioral development are emerging in the community. This will create opportunities to use datasets such as the one presented here either in aggregate analysis, or as one of multiple used to establish independent replications of findings. Although promising, significant hurdles in figuring out how to put the pieces together need to be overcome before such visions can be recognized. While growing attention is being given to issues related to batch effects in imaging and differences in sampling strategies (e.g., clinically referred, community self-referred, community ascertained, community representative, enriched, etc.), differences in behavioral phenotyping protocol are often underappreciated. For example, the Adolescent Brain Cognitive Development (ABCD) 98 ; the Brazilian High Risk Cohort 29 ; the Chinese Color Nest Project (CCNP) 99,100 ; the NIH Human Connectome Project Development (HCP-D) 101,102 ; the Child Mind Institute Healthy Brain Network (HBN) 103,104 ; the Pediatric Imaging, Neurocognition, and Genetics (PING) data repository 105 ; and the NKI-Rockland Sample (NKI-RS) 16 only share one common behavioral symptom measure-the Child Behavior Checklist 27 -which is not included in the Philadelphia Neurodevelopmental Cohort (PNC) 30 . In large part, this reflects differences in goals of the resources and differences in the timing of their launches (e.g., the NIH Toolbox 106 was not widely available when projects, such as the NKI-RS and PNC, started). Moving forward, greater focus on harmonization of samples using common data elements, and the development of methods for mapping similar though distinct tools www.nature.com/scientificdata www.nature.com/scientificdata/ used in different initiative samples to one another, will be crucial to facilitate cross-initiative sample aggregation and finding replication in independent datasets. organization of neuroimaging data. Detailed instructions on how to access and download neuroimaging data are described on the study website. Briefly, neuroimaging data are organized following the Brain Imaging Data Structure (BIDS) specification 107 . Within each participant's folder, there are subfolders identifying the imaging sessions. Depending on participation, each participant folder may have a baseline (BAS), midpoint (Follow-up 1; FLU1), final (Follow-up 2; FLU2), and retest (TRT) subfolders. Since BIDS makes provisions for phenotypic data collected during scanning (physiological, event-related), these data are also included in each session folder in addition to the NifTI MRI series. DICOMs are not included.
Handling Motion-Parameter inflation by respiration with multiband imaging. As has recently been noted by Power et al. 108 and Fair et al. 109 , multiband functional imaging is directly affected by respiration. This artifact is most notable in the motion estimation parameters. Respiration-induced motion affects scans differently depending on scan parameters and has been observed in other large multisite neuroimaging studies that use multiband fMRI sequences, including the Human Connectome Project (HCP) and Derivatives (Lifespan 101 and Disease 110 studies) and the Adolescent Brain Cognitive Development (ABCD) 98 studies. None of these studies has evaluated the impact of sequences with different sampling rates on respiration-induced head motion. The resting state scans of this study with a TR = 1400 ms have a much higher average framewise displacement (FD) than scans with a TR = 645 ms and TR = 2500 ms, with median FD 1400 = 0.258, FD 645 = 0.144 and FD 2500 = 0.209, respectively (see Fig. 12). The motion artifact caused by respiration is clearly noticed in the power spectrum plots of the motion parameters (Supplementary Figs. [6][7][8], in which there is an increase in power in the 0.25-0.40 Hz range in the phase encoding direction (Anterior-Posterior). Due to aliasing, this artifact is not clearly noticed in the single-band fMRI data. To reduce this artificial respiration induced head motion, we implemented the filter approach on the motion parameters proposed by Fair. A notch filter with a center frequency at 0.36 and a width of 0.07 Hz was applied. However, this filter is not applicable to functional data with TR = 1400 since the filter design includes the nyquist frequency of that sampling rate. Therefore, we applied a lowpass filter with a similar cutoff frequency, 0.31 Hz. For the single band data (TR = 2500 ms), this filter is also not applicable (Nyquist frequency = 0.2 Hz). We then attempted the technique suggested by Gratton et al. 111 and applied a low-pass filter with a cutoff frequency at 0.1 Hz. Motion parameters (FD) were recalculated after applying each of the filters. By applying the 0.1 Hz filter on the motion parameters, FD results are more similar across sequences. However, this method can potentially be overly aggressive by moving frames with real high motion below a specified threshold, and as such we currently recommend against its usage. Further work is still needed to compare motion estimates across different fMRI sequence parameters and evaluate preprocessing techniques to reduce these effects and www.nature.com/scientificdata www.nature.com/scientificdata/ their consequences in their final data analysis. These different preprocessing techniques include but are not limited to using external physiological signal correction methods (e.g., RETROICOR 112  Preprocessed and derived neuroimaging data. In the current data release only raw neuroimaging data is being provided. Our group and others are working on discovering the optimal preprocessing strategy, focused on addressing motion induced respiration artifacts on the fMRI data mentioned above. Another problem with providing preprocessed data is that there is no one size fits all solution. For example, for some analyses, it might be optimal to perform global signal regression (e.g., reduction in distance-dependent correlation 117 ), but for others, the contrary is true (e.g., dynamic connectivity analysis 115,118,119 . There is also the issue of difference in results due to preprocessing toolboxes that can have a large effect on the final analysis 120 . Addressing all these issues/ variables are beyond the scope of this data release paper. For future data releases, our group will be providing all the NKI-RS (not just of this substudy), including different preprocessing strategies, derivatives (for structural, diffusion, and functional MRI), and quality control metrics. We also encourage users to "push back" processed and derived data. Users can reach out to our group for instructions on how to upload these data to the International Neuroimaging Data Initiative (INDI) 121 for sharing via its AWS bucket.

Code availability
No custom codes or algorithms were used to generate or process the data presented in this manuscript.