Using multiple short epochs optimises the stability of infant EEG connectivity parameters

Atypicalities in connectivity between brain regions have been implicated in a range of neurocognitive disorders. We require metrics to assess stable individual differences in connectivity in the developing brain, while facing the challenge of limited data quality and quantity. Here, we examine how varying core processing parameters can optimise the test–retest reliability of EEG connectivity measures in infants. EEG was recorded twice with a 1-week interval between sessions in 10-month-olds. EEG alpha connectivity was measured across different epoch lengths and numbers, with the phase lag index (PLI) and debiased weighted PLI (dbWPLI), for both whole-head connectivity and graph theory metrics. We calculated intra-class correlations between sessions for infants with sufficient data for both sessions (N’s = 19–41, depending on the segmentation method). Reliability for the whole brain dbWPLI was higher across many short epochs, whereas reliability for the whole brain PLI was higher across fewer long epochs. However, the PLI is confounded by the number of available segments. Reliability was higher for whole brain connectivity than graph theory metrics. Thus, segmenting available data into a high number of short epochs and calculating the dbWPLI is most appropriate for characterising connectivity in populations with limited availability of EEG data.

www.nature.com/scientificreports/

and duration of epochs and explored the pattern of ICCs for varying epoch numbers and lengths. This allows us to address the practical question of how data should be prepared for connectivity analysis.

Material and methods
Participants. This study was part of a larger investigation that focussed on the test-retest reliability of behavioural, eye tracking, and EEG measures across 2 sessions separated by a 1-week delay (mean 7.8, range 2-20 days for the included infants). A delay of 1 week was selected to minimise the effects of repetition on infant attention and responses 50 and to encompass a degree of developmental stability. Shorter intervals may lead to data loss (see section Attrition rates in Supplementary Information). Longer intervals may encompass significant developmental change, confounding interpretation. The study was conducted at the Kinder Kennis Centrum at Utrecht University, The Netherlands, where a team of trained and experienced researchers and research assistants collected the data. The medical ethical committee of the University Medical Center Utrecht approved the study (application number: ), and all methods were carried out in accordance with the relevant guidelines and regulations. Families with infants aged around 10 months were invited in writing to participate in the study (home addresses were shared with the research centre by the communal register of the cities within the Utrecht province). Upon arrival at the lab, legal guardians of the infants (parents/caregivers) received information about the procedure of the study and gave signed informed consent. After the session had finished, they received 30 euros and a toy for the participating infant as an incentive. The session was repeated after 1 week. EEG data for the first session were available for 73 infants, and for the second session for 64 infants (the remaining 9 families did not want to return for a second session). EEG data and participants are identical to those reported in the study by Van der Velde et al. 48 .
After data cleaning, different subsamples of the data were used for the analyses in order to include the maximal number of participants with specific amounts of data available. First, we selected the alpha frequency band based on visual inspection of data from the first session in the 73 infants (35 males, M Age = 302 days, SD Age = 13, range 272-344 days). Second, we included 3 different subsamples for analyses including long epochs, short epochs, and constant amounts of data (see "Selection of epoch lengths and numbers" and Fig. 1 for an overview of the methods).

Experimental procedure. The EEG task consisted of the presentation of naturalistic dynamic videos: 5 vignettes of women singing Dutch nursery rhymes (recorded in The Netherlands after 51 ), and 6 vignettes of moving toys 51 (60 s duration each). Videos were presented 3 times as part of a larger EEG battery, resulting in a total duration of 6 min. Infants were seated in a high chair in front of the stimulus screen, with their parents sitting behind them. A curtain separated the participants and stimulus screen from the experimenter and recording screen to avoid the infants being distracted by the experimenter.
The EEG signal was recorded with a 32-electrode Biosemi ActiveTwo system at a sampling rate of 2048 Hz (a layout can be found in the Supplementary Information online). The Common Mode Sense (CMS) and Driven Right Leg (DRL) electrodes were used as the active ground signal. Two external electrodes on the left and right mastoid and one electrode under the eye were recorded as well. The EEG session was recorded with a video camera.

EEG data cleaning and segmenting. Raw EEG data were preprocessed using Matlab (versions 2015a and 2017a, Natick, MA, USA), and Fieldtrip (a toolbox for MEG/EEG data processing, available at https://www.fieldtriptoolbox.org, 52 ). First, data were down-sampled to 512 Hz, and filters were applied to decrease the influence of high-frequency noise, slow wave drifts, and line noise (band-pass filter 0.1-70 Hz, and notch filter at 50 Hz). Next, independent component analysis (ICA) was performed to correct for eye movement and blink artefacts. Artefacts caused by flat lines, jumps in the signal, muscles, clipping, or excessive noise were manually removed from the continuous data. Channels were removed from the data if artefacts affected more than 50% of the signal across the session. After data cleaning, the data were re-referenced to the average reference. This resulted in clean data segments of different lengths.
Next, we segmented the clean data segments into epochs of 1, 2, 3, 4, 5, and 6-s duration. We focussed on EEG connectivity in the alpha frequency band because this band displayed the highest test-retest reliability in the previous study, is characterised by a high signal-to-noise ratio, is less affected by muscle artefacts than other frequency bands, and is often the frequency band of interest in developmental studies 20,21,27,48,53,54 . Since alpha peaks typically occur at lower frequencies in younger participants, we selected our alpha band based on visual inspection of the power spectra calculated across the epochs from the first session for all 73 participants 21,53,55 . We observed a clear peak around 6-8 Hz (see Supplementary Information online), and selected these frequencies as the alpha band (consistent with ranges used in other studies in infants 21,51,56-58 ).
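The segmentation step can be sketched as follows. This is a minimal illustration, assuming non-overlapping epochs and assuming that any remainder shorter than one epoch is discarded (the text does not state how remainders were handled). The function name `segment` and the 7.5-s example segment are hypothetical and are not taken from the study scripts.

```python
def segment(n_samples, sfreq, epoch_sec):
    """Return (start, stop) sample ranges of non-overlapping epochs.

    Assumption: samples left over after the last full epoch are dropped.
    """
    step = int(round(epoch_sec * sfreq))
    return [(start, start + step)
            for start in range(0, n_samples - step + 1, step)]


sfreq = 512                      # Hz, sampling rate after down-sampling
n_samples = int(7.5 * sfreq)     # hypothetical 7.5-s clean data segment

# Number of epochs obtained for each epoch length (1 to 6 s)
counts = {sec: len(segment(n_samples, sfreq, sec)) for sec in range(1, 7)}
print(counts)  # {1: 7, 2: 3, 3: 2, 4: 1, 5: 1, 6: 1}
```

The example shows why longer epochs yield far fewer usable epochs from the same clean segment, which is the trade-off the subsample selection has to manage.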

Selection of epoch lengths and numbers.
In order to examine the biases towards epoch number, epoch length, and total data amounts, we selected different subsamples of the data for our calculation of EEG connectivity values. We took 3 approaches to selecting epochs and examining the reliability of subsamples: (1) low numbers of longer epochs: values across 20-60 epochs of 1-5 s duration each, with epochs randomly selected across each session 44,48 ; (2) high numbers of shorter epochs: values across 30-150 epochs of 1 and 2 s duration each, with randomly selected epochs as in approach 1 21,27 ; and (3) constant total amount of data: values across 120 1-s epochs, 60 2-s epochs, 40 3-s epochs, and 20 6-s epochs (where 20 randomly selected 6-s epochs were segmented into 1-, 2-, and 3-s epochs to ensure that values for the different segmenting methods were calculated across the same data 21,45 ). Only infants with artefact-free data across all 32 electrodes were included in these analyses, since connectivity metrics are influenced by the numbers of nodes and edges included in the networks 59 .

Figure 1. Overview of the methods. Clean EEG data were segmented in different epoch lengths. After randomly selecting different numbers of epochs, connectivity matrices were calculated with the PLI and dbWPLI methods, and averaged across 6-8 Hz. Finally, connectivity metrics were derived from the matrices. Reliability was calculated with the intra-class correlation (ICC) for the extracted connectivity metrics from different methods from both sessions.

The PLI is defined as

$$\mathrm{PLI} = \left| E\{\operatorname{sgn}(\Im\{X\})\} \right| \qquad (1)$$

where $\Im\{X\}$ is the imaginary component of the cross-spectrum, and $E\{\cdot\}$ is the expected value operator 44 . We used in-house scripts to calculate Vinck's PLI and dbWPLI values, which were identical to the ones used in 21,27 . PLI and dbWPLI-based connectivity matrices were averaged across the alpha frequency band (6-8 Hz).
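The two connectivity estimators can be sketched for a single electrode pair and frequency as follows. This is not the authors' in-house script; it is a minimal illustration that assumes the per-epoch cross-spectrum values are already available as complex numbers, and that uses the pairwise-product form of the debiased WPLI estimator. The function names `pli` and `dbwpli` and the example values are hypothetical.

```python
def pli(cross_spectra):
    """Phase lag index: absolute mean sign of Im{X} across epochs."""
    signs = [(x.imag > 0) - (x.imag < 0) for x in cross_spectra]
    return abs(sum(signs)) / len(signs)


def dbwpli(cross_spectra):
    """Debiased (squared) WPLI estimate across epochs.

    Uses sums over all pairs of different epochs j != k:
    sum(Im X_j * Im X_k) / sum(|Im X_j * Im X_k|).
    """
    im = [x.imag for x in cross_spectra]
    sum_im = sum(im)
    sum_abs = sum(abs(v) for v in im)
    sum_sq = sum(v * v for v in im)
    denom = sum_abs ** 2 - sum_sq
    return (sum_im ** 2 - sum_sq) / denom if denom else 0.0


# Hypothetical per-epoch cross-spectra for one electrode pair
consistent_lag = [0.2 + 1.0j, -0.1 + 0.8j, 0.3 + 1.2j, 0.0 + 0.9j]
alternating_lag = [1j, -1j, 1j, -1j]

print(pli(consistent_lag), dbwpli(consistent_lag))
print(pli(alternating_lag), dbwpli(alternating_lag))
```

With only a few epochs and no consistent lag, the debiased estimate can be negative (here -1/3 for the alternating example), which is one reason absolute dbWPLI values are taken before computing graph metrics.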
The matrices were subsequently used to calculate the network characteristics of interest: (a) whole brain connectivity, (b) the normalised weighted clustering coefficient, (c) the normalised weighted path length, and (d) the small-worldness index. Whole brain connectivity was defined as the average (PLI or dbWPLI) connectivity across all possible electrode pairs.
Three further network characteristics were based on graph theory and calculated using Matlab functions and the Brain Connectivity Toolbox (BCT, available at https://sites.google.com/site/bctnet/) 60 for the PLI values and absolute dbWPLI values 45 . Graph theory assumes that nodes (here, EEG sensors) are connected by edges with different values representing the strength of these connections (e.g., PLI or dbWPLI values) 60,61 . We computed weighted values rather than binary connectivity values, since thresholds for binary matrices are often arbitrarily chosen, and weak connections also provide information on the network 43 .
The normalised weighted clustering coefficient ($C^{w}_{\mathrm{norm}}$) is a local metric reflecting functional segregation, and measures the average clustered connectivity around individual nodes 62,63 . We first calculated the average weighted clustering coefficient $C^{w}$ across all 32 nodes (here, EEG channels) after rescaling the connection weights 62,63 . We then computed $C^{w}_{\mathrm{norm}}$ by dividing the observed clustering coefficient $C^{w}$ from the weighted connectivity matrix by the average clustering coefficient $C^{w}_{\mathrm{rand}}$ from 1,000 surrogate matrices 20 :

$$C^{w}_{\mathrm{norm}} = \frac{C^{w}}{C^{w}_{\mathrm{rand}}}$$

The normalised weighted path length ($L^{w}_{\mathrm{norm}}$) is a global metric reflecting functional integration, and is measured as the average shortest path (sequence of edges) between two nodes 62 . We first calculated the observed weighted characteristic path length $L^{w}$ as the average shortest path length between nodes after inverting the weights 62 . $L^{w}_{\mathrm{norm}}$ was then calculated as $L^{w}$ divided by the average characteristic path length $L^{w}_{\mathrm{rand}}$ across 1,000 surrogate connectivity matrices:

$$L^{w}_{\mathrm{norm}} = \frac{L^{w}}{L^{w}_{\mathrm{rand}}}$$

Finally, the small-worldness index (SWI) reflects the efficiency of the functional organisation of the network or graph, and is measured as the ratio between the normalised weighted clustering coefficient and the normalised weighted path length 64 :

$$\mathrm{SWI} = \frac{C^{w}_{\mathrm{norm}}}{L^{w}_{\mathrm{norm}}}$$

The result of these processing steps is 1 value for each of the 4 network characteristics (whole brain connectivity, normalised weighted clustering coefficient, normalised weighted path length, and small-worldness index), for both connectivity measures (PLI and dbWPLI), for each session (test and re-test), for each of the 3 approaches, for each individual infant.
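The network characteristics described above can be sketched in plain Python. This is an illustration only, not the Brain Connectivity Toolbox code used in the study. It assumes a symmetric matrix of non-negative weights already scaled to the range 0-1, a clustering coefficient based on the geometric mean of triangle weights, and shortest paths computed on inverted weights. The generation of surrogate matrices is not shown because the text does not specify it, and all function names and example matrices are hypothetical.

```python
def whole_brain(W):
    """Average connectivity across all unique electrode pairs."""
    n = len(W)
    vals = [W[i][j] for i in range(n) for j in range(i + 1, n)]
    return sum(vals) / len(vals)


def mean_clustering(W):
    """Mean weighted clustering coefficient across nodes."""
    n = len(W)
    R = [[w ** (1.0 / 3.0) for w in row] for row in W]  # cube-root weights
    coefs = []
    for i in range(n):
        k = sum(1 for j in range(n) if j != i and W[i][j] > 0)  # degree
        if k < 2:
            coefs.append(0.0)
            continue
        tri = sum(R[i][j] * R[j][h] * R[h][i]
                  for j in range(n) for h in range(n)
                  if i != j and j != h and h != i)
        coefs.append(tri / (k * (k - 1)))
    return sum(coefs) / n


def char_path_length(W):
    """Mean shortest path length, with edge length = 1 / weight."""
    n = len(W)
    inf = float("inf")
    D = [[0.0 if i == j else (1.0 / W[i][j] if W[i][j] > 0 else inf)
          for j in range(n)] for i in range(n)]
    for m in range(n):                      # Floyd-Warshall relaxation
        for i in range(n):
            for j in range(n):
                if D[i][m] + D[m][j] < D[i][j]:
                    D[i][j] = D[i][m] + D[m][j]
    paths = [D[i][j] for i in range(n) for j in range(n) if i != j]
    return sum(paths) / len(paths)


def small_worldness(c, c_rand, l, l_rand):
    """Ratio of normalised clustering to normalised path length."""
    return (c / c_rand) / (l / l_rand)


# Hypothetical 3-node networks
triangle = [[0, 0.5, 0.5], [0.5, 0, 0.5], [0.5, 0.5, 0]]
shortcut = [[0, 1.0, 0.25], [1.0, 0, 1.0], [0.25, 1.0, 0]]
print(whole_brain(triangle), mean_clustering(triangle))
print(char_path_length(triangle), char_path_length(shortcut))
```

In the `shortcut` example the weak direct edge (weight 0.25, length 4) is bypassed by the two strong edges (total length 2), which illustrates how the path length depends on the whole matrix rather than on single edges.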

Statistical analyses.
Test-retest reliability between the two sessions was calculated across participants using the intra-class correlation or ICC(3,1) (also called ICC(C-1)) with the following formula:

$$\mathrm{ICC}(3,1) = \frac{MS_R - MS_E}{MS_R + (k - 1)\,MS_E}$$

where $MS_R$ is the between-object variance (here, between participants), $MS_E$ is the error variability or mean squared error, and $k$ is the number of measurements per participant. The ICC(3,1) is a two-way fixed model ICC for single scores measuring consistency 65-67 , and has been used in previous test-retest reliability studies of EEG connectivity 38,45,48,49 . For ease of reading, we use the term ICC to refer to ICC(3,1) here. We adopted the following convention to interpret the reliability values: poor-ICC < 0.40; fair-0.40 ≤ ICC ≤ 0.59; good-0.60 ≤ ICC ≤ 0.74; and excellent-ICC ≥ 0.75 35,38,45,49 . Negative ICC values were set to 0 42 . P values reflect whether the ICC value is significantly different from the null hypothesis. To further clarify, we are describing the pattern of ICC values, rather than statistically comparing ICC values with each other. Reliability of these measures not only depends on ICC values but also on the stability of the EEG measure and the aspect of connectivity being measured. Statistically comparing ICC values would falsely suggest that reliability differences depend on the

For conciseness, we only report ICC values for whole brain connectivity across low numbers of longer epochs and high numbers of shorter epochs, and for graph metrics across a constant total amount of data, which were based on different subsamples of the complete sample (see Table 1, Supplementary Tables S1-S3 online for original ICC values reported in the main manuscript, and Supplementary Tables S4-S9 online for reliability of graph metrics for low numbers of longer epochs, and for high numbers of shorter epochs).
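The reliability calculation can be sketched as follows. This is a minimal illustration, assuming the standard two-way ANOVA decomposition for a consistency-type single-score ICC, with negative values set to 0 and the interpretation thresholds given above. The function names `icc_3_1` and `interpret` and the example scores are hypothetical, and the sketch does not compute P values or confidence intervals.

```python
def icc_3_1(scores):
    """ICC(3,1) for a list of per-participant score lists (one per session).

    Assumes the scores are not all identical (non-zero denominator).
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # participants
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # sessions
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ms_r = ss_rows / (n - 1)
    ms_e = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    icc = (ms_r - ms_e) / (ms_r + (k - 1) * ms_e)
    return max(icc, 0.0)                                     # negatives -> 0


def interpret(icc):
    """Map an ICC value onto the reliability categories used in the text."""
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"


# Hypothetical connectivity values: [session 1, session 2] per infant
shifted = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]    # same ranking, constant offset
reversed_rank = [[1.0, 3.0], [2.0, 2.0], [3.0, 1.0]]

print(icc_3_1(shifted), interpret(icc_3_1(shifted)))
print(icc_3_1(reversed_rank), interpret(icc_3_1(reversed_rank)))
```

Because ICC(3,1) measures consistency, a constant shift between sessions (the `shifted` example) does not lower the value, whereas a reversal of the ranking between sessions gives a negative ICC that is set to 0.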

Results and discussion
Reliability of whole brain connectivity across low numbers of longer epochs. Figure 2 displays ICC values and their 95% confidence intervals across low numbers of longer epochs (N = 19). For the PLI-based whole brain connectivity, ICC values ranged from 0 to 0.87 (Fig. 2a). For the dbWPLI-based whole brain connectivity, ICC values ranged from 0 to 0.85 (Fig. 2b). ICC values generally increased with increasing epoch numbers and lengths. Reliabilities were within the poor range for 20 and 30 1- and 2-s epochs (PLI: 0 ≤ ICC ≤ 0.14; dbWPLI: 0 ≤ ICC ≤ 0.24), and in the good and excellent ranges for 50 and 60 4- and 5-s epochs (PLI: 0.60 ≤ ICC ≤ 0.87; dbWPLI: 0.62 ≤ ICC ≤ 0.85).
These findings suggest that (as might be expected) test-retest reliability in infants across a period of 1 week is higher when more data are included. M/EEG studies in adults found similar ICC values for connectivity in the good and excellent range. Whole brain connectivity based on PLI estimates from four 4-s epochs exhibited an ICC value of 0.61 for 8-10 Hz in an eyes-closed resting state paradigm assessed over a 2-year period 49 . Use of 12 4-s epochs for a whole brain PLI-based connectivity estimate showed excellent reliability with an ICC value of 0.79 for the same paradigm. The dbWPLI-based whole brain connectivity estimates were also highly reliable, displaying an ICC value of 0.80 38 . In the infants, we observed similar values for 4-s epochs only when calculated across at least 50 epochs for both the PLI- and dbWPLI-based measures. Thus, more epochs are needed for reliable EEG connectivity estimates in infant studies than in adult studies. This moreover demonstrates that EEG methods typically applied in adults may not always be suitable for infant studies. Increased levels of noise in infant EEG data compared to adult EEG data are likely to play an important role in this difference.
Another possibility is that for infants a longer time of measurement is required to measure connectivity states that are stable across 1 week. Neuroimaging studies examining transient states of brain connectivity during rest and tasks suggest that the duration of brain states decreases and the number of transitions between brain states increases with development between childhood and adulthood (in EEG 68,69 , and fMRI studies 70-72 ). If transient connectivity states exist for longer periods in infants compared to adults, then more time would be needed to pick up on these slower states compared to faster transient connectivity states in adults. In addition, developmental changes in connectivity strengths (both functional and structural) may also play a role here 70,73,74 .

One factor to take into account is the difference in the number of infants included in the sample. The requirement of a minimum of 60 epochs of 5-s duration significantly decreased the sample size from 60 to 19 infants in the present study. Smaller samples are less likely to detect a true large-sized effect than large samples 75 .
Another possible explanation for this discrepancy is that we used different pre-processing steps to calculate PLI- and dbWPLI-based connectivity measures. In our previous study, we derived the connectivity measures from instantaneous phase lags from a Hilbert transformation 46 , whereas we estimated phase lags from Fourier coefficients across epochs in the current study 44 . The Hilbert transform estimates instantaneous phases, but these estimates are more accurate for narrow band-pass filtered data compared to broad band-pass filtered data. Analyses across a broader frequency range would however include alpha peaks of more participants compared to analyses across a narrow frequency range. The method of Vinck et al. 44 allows for the calculation of phase lag indices from the Fourier coefficients, and can be reliably calculated across a broader range of frequencies including the alpha peaks of different individuals as in the current study. The Fourier method thus may be more appropriate in research with developmental populations or a heterogeneous sample with high variability between individuals in alpha peaks 53,58,76,77 . Finally, use of the Fourier coefficients to estimate connectivity has previously led to replicable results in young infants 21,27 . These findings do suggest that when researchers want to estimate PLI-based connectivity for 20 5-s epochs, calculations from the narrow-band Hilbert transformed data are more reliable than calculations from the Fourier coefficients in homogeneous samples.
Reliability of whole brain connectivity across high numbers of shorter epochs. Results for the reliability analyses across high numbers of shorter epochs are depicted in Fig. 3 (N = 22). Again, ICC values increased with increasing numbers of epochs from poor reliability for 30 … 120, and 150 epochs, respectively. Excellent reliability values were reached for dbWPLI-based connectivity across 90 and 120 1-s epochs, and for PLI-based connectivity across 120 1-s epochs. Across 120 1-s epochs, the ICC for dbWPLI-based connectivity was slightly higher than the ICC for PLI-based connectivity (ICC = 0.82 for the dbWPLI versus ICC = 0.79 for the PLI). These findings demonstrate that connectivity estimates with good to excellent reliability can be achieved for 1- and 2-s epochs when calculated with the dbWPLI across at least 90 epochs, and with the PLI across at least 90 1-s and 150 2-s epochs. Consistent with the simulations from Vinck et al., the PLI and dbWPLI estimates show poor reliability when calculated across 30 1- or 2-s epochs 44 .
These results further suggest that reliability is higher for the 1-s compared to the 2-s epochs, and higher for the dbWPLI- than PLI-based whole brain connectivity. Two factors and their robustness to noise come to mind when explaining these findings. First, the assumption of stationarity of the signal for Fourier transform analysis may be violated for the different epoch lengths. The Fourier transform assumes that the EEG signal can be decomposed into sines and cosines with a constant mean, variance, and covariance over time. This is more likely to hold true during shorter epochs of 1-s duration compared to epochs of 2-s duration, resulting in a more reliable estimate for shorter epochs 45,78 . Conversely, estimates across longer epochs such as 5 s are even more likely to show violations of stationarity. Indeed, we found lower ICC values for 20 5-s epochs than in our previous study where we derived our dbWPLI- and PLI-based estimates from Hilbert transformed data with instantaneous phase information instead of phase information from Fourier transformed data. Noise in the infant data will furthermore increase the non-stationarity of the signal, and thus amplify the effects of non-stationarity on the connectivity estimates across longer epochs.
Second, differences in reliability between the dbWPLI- and PLI-based estimates may arise from differences in robustness to noise. The dbWPLI weights the phase lag consistency such that phase differences near 0° or 180° angles contribute less to the final connectivity estimate than phase differences near 90° or 270° angles. Spurious connectivity values that may arise from noise with small phase differences are thus largely down-weighted 44 . The PLI in contrast does not apply these weights and is therefore less robust to noise artefacts. As expected for infant data with high noise levels 21,79 , the dbWPLI provides a more robust connectivity estimate than the PLI for these high numbers of shorter epochs when derived from Fourier coefficients.
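The effect of the weighting can be illustrated with a small numerical example. For simplicity, this sketch uses the plain weighted PLI (without the debiasing step) next to the PLI, and the imaginary parts are made-up values: three epochs with a clear phase lag and two noisy epochs with phase differences close to 0°. The function names `pli_from_imag` and `wpli_from_imag` are hypothetical.

```python
def pli_from_imag(imag_parts):
    """PLI: every epoch counts equally, only the sign of Im{X} matters."""
    signs = [(v > 0) - (v < 0) for v in imag_parts]
    return abs(sum(signs)) / len(signs)


def wpli_from_imag(imag_parts):
    """Weighted PLI: each epoch is weighted by the magnitude of Im{X}."""
    return abs(sum(imag_parts)) / sum(abs(v) for v in imag_parts)


# Three epochs with a clear lag, two near-zero-lag epochs with flipped sign
imag_parts = [1.0, 1.0, 1.0, -0.01, -0.01]

print(pli_from_imag(imag_parts))   # sign flips from noise pull the PLI down
print(wpli_from_imag(imag_parts))  # small-magnitude epochs barely contribute
```

Here the two near-zero epochs reduce the PLI to 0.2, whereas the weighted estimate stays close to 1 because those epochs carry almost no weight.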

Reliability of network characteristics across a constant amount of data. Comparisons of the ICCs for different connectivity metrics across a constant amount of data are presented in Fig. 4 (N = 41). Across all segmentation and calculation methods, ICCs for whole brain connectivity were higher than ICCs for the other network characteristics (whole brain: 0.43 ≤ ICC ≤ 0.86; graph metrics: 0 ≤ ICC ≤ 0.59). ICCs for the normalised weighted clustering coefficient (0.23 ≤ ICC ≤ 0.57) were higher than those for the normalised weighted path length (0 ≤ ICC ≤ 0.44) and the small-worldness index (0 ≤ ICC ≤ 0.40). For the dbWPLI-based metrics, the highest ICC for whole brain connectivity was found across 60 2-s epochs (ICC = 0.68), whereas ICCs for the other metrics were highest across 120 1-s epochs (ICC = 0.59 for the normalised weighted clustering coefficient, ICC = 0.44 for the normalised weighted path length, and ICC = 0.40 for the SWI) compared to the other segmenting methods. For the PLI-based metrics, the highest ICC for whole brain connectivity was calculated across 60 2-s epochs (ICC = 0.58) compared to the other segmenting methods; for the normalised weighted clustering coefficient across 120 1-s epochs (ICC = 0.44); for the normalised weighted path length across 40 3-s epochs (ICC = 0.20); and for the small-worldness index across 20 6-s epochs (ICC = 0.25).
The current findings suggest that segmenting 2 min of EEG data into 1- or 2-s epochs provides more reliable dbWPLI-based connectivity metrics than segmenting into 3- or 6-s epochs. This was consistent with previous studies examining EEG connectivity in infants and adults 21,27,44,45 . Possibly, the debiasing and weighting methods are less robust to noise for low numbers compared to high numbers of epochs due to the normalisation or debiasing step that depends on the number of epochs 44 . Findings for the PLI-based connectivity metrics were however less consistent across segmentation methods, where the most reliable segmentation method varied with the connectivity metric of interest.
Furthermore, we found that whole brain connectivity was a more reliable metric than graph theory metrics (with the exception of the normalised clustering coefficient derived with the dbWPLI across 120 1-s epochs). Overall, the normalised weighted clustering coefficient showed more reliable estimates than the normalised weighted path length and the small worldness index. The observed pattern of reliabilities between connectivity metrics has been reproduced by several test-retest reliability studies in adults 35,36,38,42,49 . This pattern of increased reliability for first-order graph metrics compared to second-order metrics may arise from differences in variances in connectivity matrices where second-order graph theory metrics are more sensitive to variability in the connectivity matrices than first-order graph theory metrics 35 . Furthermore, it is possible that graph theory metrics cannot be reliably measured within these data segments, and more data (longer than 2 min in total) is needed to reliably measure graph metrics 42,80 .
Our previous study using the PLI across 20 5-s epochs showed a similar pattern between metrics: ICC = 0.84 for normalised clustering, ICC = 0.84 for the normalised path length, and ICC = 0.67 for the small-worldness index 48 . As discussed in the previous section, the difference in ICC values between the previous and current study likely arises from the use of instantaneous phase differences estimated with the Hilbert transform in the previous study, versus phase differences estimated across the epochs with the Fourier transform in the current study.
We are currently unable to make comparisons with our previous findings for the graph metrics based on the dbWPLI. In our previous study, we found that inter-subject variability was higher, and that 95% confidence intervals were wider for dbWPLI-based than PLI-based whole brain connectivity. As a result, dbWPLI-based network characteristics were not included in further graph theory analyses. The current findings and previous simulations 44 suggest that 20 epochs may have been too few to calculate reliable dbWPLI-based network characteristics in infants.

Conclusions
The current study demonstrates that EEG connectivity can be reliably estimated in young infants. Overall, reliability of EEG network characteristics increases with increasing total amounts of data. However, optimal epoch numbers and lengths for high test-retest reliability vary with the calculation method used to estimate EEG connectivity: smaller numbers of longer epochs for PLI-based measures, and higher numbers of shorter epochs for dbWPLI-based measures. When choosing an EEG connectivity method in developmental research, several other factors need to be considered along with test-retest reliability. First, the quality of the EEG can have an impact on the reliability of EEG measures. For EEG data with lower noise levels and abundant lengths of artefact-free data, calculation of PLI-based whole brain connectivity from Hilbert transformed data across 20 5-s epochs would provide more reliable measures. For EEG data with higher noise levels and limited lengths of artefact-free data, dbWPLI-based whole brain connectivity from Fourier transformed data across more than 90 1-s or 60 2-s epochs would provide a reliable estimate of brain connectivity. The latter would be more appropriate in studies with vulnerable populations such as atypically developing young infants or individuals with neurodevelopmental disorders. Increased heterogeneity within such populations may also play a role.
Second, researchers should take into account the aspects of brain connectivity they aim to measure. Different EEG measures may be sensitive to different features of brain connectivity. Reliability estimates are influenced by both measurement error, and the stability of the process being measured over the selected timescale. Thus, one critical element to consider may be the timescale over which a particular measure of connectivity is stable. Within the present study, we examined reliability in infants tested twice with an average of a 1-week interval. Selection of this interval does lead to the possibility that there are true developmental changes in brain connectivity during the testing epoch. However, any decrease in interval may decrease the amount of artefact free data available, as infants may recognise repetition of the stimulus protocol and become less attentive (consistent with observations in the current study also). In a previous infant EEG study on event-related potentials, ICC values slightly increased when only including infants tested at intervals of 7 days or more, consistent with this possibility 34 . Of note, infant studies and longitudinal studies during early development often focus on age groups with a narrow range, commonly around 1-2 weeks. Measures that are stable over this interval are therefore necessary for data pooling. However, measures sensitive to more transient states of connectivity would appear unreliable in such an analysis, but this should not be taken as reflecting measurement noise. Some moment-to-moment fluctuations in connectivity may reflect shifts between cognitive states and may thus not be stable over time; researchers interested in individual differences in these states may need to derive higher level descriptions of their behaviour that do reflect persistent attributes, such as their intra-individual variability 71,72,81,82 .
Researchers interested in a specific aspect of connectivity may wish to explore its reliability over several time intervals to dissociate measurement accuracy and developmental stability of different brain systems. Finally, excellent test-retest reliability should be interpreted with caution. First, according to the paradox of reliability, excellently reliable and robust measures are unsuitable for correlational research: high test-retest reliability comes with low variability between individuals 83,84 . Excellently reliable measures that are stable over time reflect static constructs that are also likely stable in these individuals. The highly reliable construct however might not be the most relevant feature for brain-behaviour correlations (e.g. in fMRI research 85 ). Thus, there is a dissociation between optimal test-retest reliability and their utility in predicting behaviour. This should especially be considered in the context of predictive biomarker research where the field is shifting from a categorical approach to a dimensional approach 83,86 . Second, high test-retest reliability values may be artificially increased by confounding factors that are stable themselves: such as head size, volume conduction, and measurement noise. It is possible that increased stable noise levels artificially increase the reliability of measures that are less robust to EEG noise (as in fMRI studies 87 ). Thus, coupling the assessment of reliability with the assessment of robustness to time-invariant covariates (noise) is critical.
One limitation of this study is that only one age group was included in the current analyses. Reliability values and conclusions may differ for EEG data collected in toddlers or children compared to the data from 10-month-old infants in the current study. In addition, it is possible that conclusions vary between EEG data collected during the social and non-social dynamic videos 51 . Finally, we did not statistically compare the ICC values, but only tested whether the ICC values were different from the null hypothesis. Although methods exist to compare correlations, comparisons for ICC values are less straightforward as ICC values also depend on other factors such as stability of the EEG measure, measurement error, and the number and length of epochs. Here, we aimed to characterise the different comparison levels and explore the profile of EEG connectivity metrics.
Future research could consider reliability across different age groups and dynamic stimuli. Examining the reliability and the stability of brain connectivity at different age groups will further clarify whether early individual variability in brain connectivity persists into childhood and whether this is associated with later stable traits, for example restricted and repetitive behaviours in autism spectrum disorders 21,27 .

Data availability
Data is available upon formal request from the YOUth Cohort Study, please see https://www.uu.nl/en/research/youth-cohort-study/data-access.