No evidence for a relationship between social closeness and similarity in resting-state functional brain connectivity in schoolchildren

Previous research suggests that the proximity of individuals in a social network predicts how similarly their brains respond to naturalistic stimuli. However, the relationship between social connectedness and brain connectivity in the absence of external stimuli has not been examined. To investigate whether neural homophily between friends exists at rest, we collected resting-state functional magnetic resonance imaging (fMRI) data from 68 school-aged girls, along with social network information from all pupils in their year groups (total 5,066 social dyads). Participants were asked to rate the amount of time they voluntarily spent with each person in their year group, and directed social network matrices and community structure were then determined from these data. No statistically significant relationships between social distance, community homogeneity and similarity of global-level resting-state connectivity were observed. Nor were we able to predict social distance using a regularised regression technique (i.e. elastic net regression) based on the local-level similarities in resting-state whole-brain connectivity between participants. Although neural homophily between friends exists when viewing naturalistic stimuli, this finding did not extend to functional connectivity at rest in our population. Instead, resting-state connectivity may be less susceptible to the influences of a person's social environment.

by the number of total simulations. All p values were > .05, indicating that the fMRI cohort data were appropriately matched to the total cohorts for all demographic characteristics.

Study power
The fMRI sample affords a total of 767 dyads. We conducted mixed-effects modelling on the dyad data to examine potential neural homophily after accounting for the data dependency. Currently, there is no software available to perform a statistical power analysis with the mixed-effects model we specified to test our hypothesis with the dyadic data (see section on Dyadic similarities in functional connectivity as a function of social proximity for linear mixed effects model specification); as such, we created an in-house simulation code to perform a sensitivity power analysis to examine what effect size could have been detected in the current data. The simulation computed the statistical power to detect a certain effect size (i.e. similarity) given the social distance information, the structure of dyads (i.e. how the 68 students were paired), and the magnitudes of random effects. Social distance information and structure of dyads were directly taken from the data.
Magnitudes of random effects were taken from the estimates in the main mixed-effects modelling analysis we conducted. We ran this simulation (1,000 replications) repeatedly, systematically changing the effect size to estimate a power curve relating the true effect size to statistical power. The simulation results showed that our sample size is sufficient to detect very small effects of social distance on similarity. Specifically, the study was powered to detect a change in z-standardised similarity score of at least .073 per unit change in social distance at a power of 80%, and a change of at least .094 at a power of 95%. As previous neuroimaging work identified differences of -.2 and -.23 in z-standardised similarity scores between social distances 1 and 2 and between social distances 2 and 3, respectively 1 , we consider our sample to be sufficiently powered to detect meaningful changes in similarity score between our social distance units.
It is worth noting that this relatively high statistical power was due to the fact that the data from these 767 dyads did not exhibit much dependency. This is not surprising. Unlike typical nested data where observations are nested within participants, in this type of data of social similarity, correlation values are posited to be nested within pairs of participants (see the model specification in the LME Model Specification section; see also Chen et al. 2 ). However, there is little obvious conceptual reason to believe that correlation values between a certain participant and other participants are more similar to each other than the correlation values of different participants. In other words, in this type of dyad similarity analysis, statistical power tends to be influenced more by the number of dyads than the number of participants.

Social network characterisation
Social network data were processed as follows. Roster-and-rating data (5-point scale) were binarised using a threshold of 4 (i.e. only instances in which students spent "more than some" or "most" of the time with another student were included). This threshold was selected to mitigate the central tendency bias often reported with Likert-type questionnaires 3 . Any non-mutual connections were then removed (i.e. if subj i gave subj j a rating of 4 or greater but subj j gave subj i a rating below 4, the connection was discarded). This yielded an unweighted (binary), undirected (reciprocal) adjacency matrix for each cohort, from which social network graphs were derived.
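The binarisation and reciprocity filter described above can be sketched as follows. This is a minimal illustration with made-up ratings; the data layout (`ratings[i][j]`) and the function name `mutual_ties` are assumptions for the example, not the authors' code.

```python
# Hypothetical ratings: ratings[i][j] is the 1-5 rating student i gave student j.
ratings = {
    "a": {"b": 5, "c": 4, "d": 1},
    "b": {"a": 4, "c": 2, "d": 3},
    "c": {"a": 3, "b": 5, "d": 4},
    "d": {"a": 1, "b": 2, "c": 5},
}

THRESHOLD = 4  # ratings of 4 or 5 count as a tie


def mutual_ties(ratings, threshold=THRESHOLD):
    """Return undirected edges for which both students gave a rating >= threshold."""
    edges = set()
    for i, given in ratings.items():
        for j, score in given.items():
            # Keep the tie only if the rating in the opposite direction also passes.
            if score >= threshold and ratings.get(j, {}).get(i, 0) >= threshold:
                edges.add(frozenset((i, j)))
    return edges


edges = mutual_ties(ratings)
```

In this toy example only the a-b and c-d ties survive, because the a-to-c and c-to-b ratings are not reciprocated at the threshold.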

Alternative Social Distance Metric
To ensure that discarding lower ratings of social connectedness did not drive the effects we observed in our analysis, we repeated the analysis using the magnitude of the direct tie between participants to indicate social proximity. Participants' ratings corresponded to how much time they voluntarily spent with every member of their year group. Because subject i rated subject j and subject j also rated subject i, each dyad had two ratings (corresponding to incoming and outgoing connections for each dyad member). To obtain the social proximity rating for a dyad, the two ratings were added together. The maximum rating a dyad could achieve was 10, corresponding to both students saying they voluntarily spent "most" of their time with the other dyad member. The minimum rating was 2, corresponding to both members saying they voluntarily spent no time with the other member of the dyad. Only data from scanned participants were used in this analysis. Second-degree connections via other students in the year group were not considered.
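As a small worked illustration of this alternative metric, the sketch below sums the two directed ratings of each dyad. The ratings and the helper name `dyad_proximity` are hypothetical.

```python
from itertools import combinations

# Hypothetical 1-5 ratings: ratings[i][j] is the rating student i gave student j.
ratings = {
    "a": {"b": 5, "c": 1},
    "b": {"a": 4, "c": 3},
    "c": {"a": 1, "b": 2},
}


def dyad_proximity(ratings, i, j):
    """Sum the outgoing and incoming ratings of a dyad (possible range 2 to 10)."""
    return ratings[i][j] + ratings[j][i]


# One proximity score per unordered pair of students.
proximity = {
    (i, j): dyad_proximity(ratings, i, j)
    for i, j in combinations(sorted(ratings), 2)
}
```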

Social Network Metrics
Each student cohort was described in terms of its network characteristics, in particular, its network diameter, modularity, mean path length, reciprocity and density. Network diameter is the length of the longest geodesic distance between two nodes in the network, i.e. the number of edges between subj i and subj j when these individuals are the farthest from each other in the network.
Modularity ($Q$) is a measure of how easily a network segregates into smaller subnetworks and is defined by the equation:

$$Q = \frac{1}{2m}\sum_{ij}\left[A_{ij} - \frac{k_i k_j}{2m}\right]\delta(c_i, c_j)$$

where $m$ is the number of edges in the network, $A_{ij}$ is the element of the adjacency matrix in row $i$ and column $j$, $k_i$ is the degree (number of edges associated with a node) of node $i$, $c_i$ is the community to which node $i$ belongs, and $\delta(c_i, c_j)$ is 1 if $c_i = c_j$ and 0 otherwise.
Mean path length is the mean shortest path length (number of edges separating a pair of nodes) between all pairs of nodes in the network. Reciprocity defines the proportion of connections in a directed graph that are mutual connections. It is otherwise defined as the probability that the counterpart (j to i) of a directed edge (i to j) is included in the graph. Graph density is the ratio of actual connections (edges) to possible connections in the graph; larger values denote more densely connected networks.
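The network metrics defined above can be computed on a toy graph as in the following sketch. It assumes the standard Newman-Girvan form of modularity, ignores unreachable pairs when computing path lengths, and uses a made-up friendship graph; none of the names come from the authors' pipeline.

```python
from collections import deque
from itertools import combinations

# Hypothetical undirected friendship graph: two triangles joined by one tie.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"),
         ("d", "e"), ("e", "f"), ("d", "f")]
nodes = sorted({n for e in edges for n in e})
adj = {n: set() for n in nodes}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)


def shortest_paths(source):
    """Breadth-first search: geodesic distance from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


all_dist = {u: shortest_paths(u) for u in nodes}
# Geodesics for connected pairs only (assumption: unreachable pairs are skipped).
geodesics = [all_dist[u][v] for u, v in combinations(nodes, 2) if v in all_dist[u]]
diameter = max(geodesics)
mean_path_length = sum(geodesics) / len(geodesics)
density = len(edges) / (len(nodes) * (len(nodes) - 1) / 2)


def modularity(communities):
    """Q = (1/2m) * sum_ij [A_ij - k_i*k_j/(2m)] over pairs in the same community."""
    m = len(edges)
    q = 0.0
    for i in nodes:
        for j in nodes:
            if communities[i] == communities[j]:
                a_ij = 1 if j in adj[i] else 0
                q += a_ij - len(adj[i]) * len(adj[j]) / (2 * m)
    return q / (2 * m)


def reciprocity(directed_edges):
    """Proportion of directed edges whose reverse edge is also present."""
    edge_set = set(directed_edges)
    return sum((v, u) in edge_set for u, v in edge_set) / len(edge_set)


q = modularity({"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1})
```

Reciprocity is only informative for the directed (pre-filter) network, which is why it takes its own edge list rather than the mutual-tie graph.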

Functional MRI Data Acquisition and Analysis
All participants were imaged using a 32-channel head coil. A structural T1-weighted image was acquired using an MPRAGE sequence 5 . During resting-state data acquisition, participants were asked to lie still with eyes open and look at a blank screen in front of the scanner. Instructions were given to relax and think of nothing in particular.
FMRI data processing was carried out using FEAT (FMRI Expert Analysis Tool, version 6.0 6 ) part of the FMRIB Software Library (FSL; Oxford, United Kingdom) 7,8 . Registration of functional images to high resolution structural and Montreal Neurological Institute (MNI-152) standard space images was carried out using FLIRT 9,10 . Registration from high resolution structural to standard space was then further refined using FNIRT nonlinear registration 11,12 .
The following pre-statistics processing was applied: motion correction using MCFLIRT 10 ; non-brain removal using BET 13 ; multiplicative mean intensity normalization of the volume at each time point; and high-pass temporal filtering (Gaussian-weighted least-squares straight line fitting, with sigma=50.0s). Independent components analysis (ICA)-based exploratory data analysis was carried out using MELODIC 14 in order to investigate the possible presence of unexpected artefacts. FIX (FMRIB's ICA-based Xnoiseifier) 15,16 was used to auto-classify ICA components into "good" and "bad" components following hand-classification and training using a sample of 10 subjects' data (4 from cohort 1-fMRI and 3 each from cohorts 2-fMRI and 3-fMRI, all randomly selected). "Bad" components were removed from the data and the cleaned data were registered to standard space using warping parameters determined by FEAT. These data were then used for further analysis.

Resting-state network analysis
Default mode network (DMN) and frontoparietal networks (FPNs) were extracted from 17 . Z-statistic images of independent components analysis (ICA) maps were then binarised using a threshold of z=4 and warped to 1 mm MNI-152 standard space. Our threshold of z=4 is larger than that reported by 17 , who adopted a threshold of z=3 for presentation of resting-state networks. Our larger z threshold was chosen to increase the probability that brain nodes overlapping DMN and FPN spatial maps were truly part of these networks. The salience network was derived using the online meta-analysis tool Neurosynth.org 18 (accessed January 2019; seed positioned in the anterior insula [x=36, y=18, z=4]). The salience network was binarised using a threshold of 0.3, based on results from previous research 19 . The overlap between each resting-state network and the whole-brain parcellation was determined by applying a binarised mask of the resting-state network over whole-brain parcellation data. Nodes with at least 50 voxels overlapping the resting-state network were used in the analysis.
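The node-selection rule above (at least 50 overlapping voxels) can be illustrated with the following sketch. It assumes the parcellation and the binarised network mask have been flattened to voxel lists in the same order, with label 0 meaning background; the function name and toy data are hypothetical.

```python
from collections import Counter


def nodes_in_network(parcel_labels, network_mask, min_voxels=50):
    """Return parcellation nodes with at least min_voxels voxels inside the mask.

    parcel_labels: node label per voxel (0 = background), flattened to a list.
    network_mask: 0/1 per voxel for the binarised network, in the same order.
    """
    overlap = Counter(
        label
        for label, inside in zip(parcel_labels, network_mask)
        if inside and label != 0
    )
    return sorted(node for node, count in overlap.items() if count >= min_voxels)


# Toy data: node 1 has 60 voxels in the mask, node 2 has 49, node 3 has none.
parcel_labels = [1] * 80 + [2] * 80 + [3] * 40
network_mask = [1] * 60 + [0] * 20 + [1] * 49 + [0] * 31 + [0] * 40
selected = nodes_in_network(parcel_labels, network_mask)
```

Only node 1 is retained here, since node 2 falls one voxel short of the cut-off.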
A graphical depiction of the analysis is provided in Fig. S8. Functional resting-state network architecture develops throughout the lifespan and can differ between adolescents and adults 20 . As we used resting-state network maps derived from mostly adult data, we wanted to ensure that these network maps were appropriate for use in our sample. We first ran independent components analysis (ICA) on resting-state fMRI data from our three student cohorts using MELODIC

LME model specification
The LME model used in the current analysis is specified as:

$$Y_{ij} = \beta_{00} + \beta_{01} \cdot \mathrm{SocialDistance}_{ij} + u_i + u_j + e_{ij}$$

where $Y_{ij}$ is the similarity in brain function between students $i$ and $j$, $\beta_{00}$ is the intercept, and $\beta_{01}$ represents the relationship between social distance and brain similarity; $u_i$ and $u_j$ are crossed random effects (i.e. student-specific effects) of students $i$ and $j$, respectively, and $e_{ij}$ are residuals. We posited a Gaussian distribution for the random effects and residuals. No distribution assumption was made for social distance. Where social network data were described using community affiliation, $\mathrm{SocialDistance}_{ij}$ in the above equation was replaced with $\mathrm{Community}_{ij}$ (i.e. whether or not two students belong to the same friendship community, as determined by the Louvain method).
We also tested a model with random slopes (i.e. random slopes of the subjects were added to the model above). This model failed to converge in most of the analyses. The omission of random slopes could make the statistical test less conservative 21,22 . However, as our results all showed nonsignificant effects in the model without random slopes, we decided not to pursue the model with random slopes any further.
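To make the data structure implied by this model concrete, the sketch below generates dyadic similarity scores from an intercept, a social-distance slope, two student-specific random effects and a residual. It covers only the data-generating step that a simulation-based power analysis would repeat, not the model fitting. The parameter values and the randomly drawn social distances are hypothetical; in the study, social distance and dyad structure were taken from the observed data.

```python
import random
from itertools import combinations


def simulate_dyads(n_students, slope, sd_student, sd_residual, intercept=0.0, seed=0):
    """Simulate one similarity score per dyad with crossed student effects."""
    rng = random.Random(seed)
    # One Gaussian random effect per student, shared by all dyads they belong to.
    u = [rng.gauss(0.0, sd_student) for _ in range(n_students)]
    rows = []
    for i, j in combinations(range(n_students), 2):
        distance = rng.randint(1, 4)  # hypothetical social distance for the dyad
        residual = rng.gauss(0.0, sd_residual)
        similarity = intercept + slope * distance + u[i] + u[j] + residual
        rows.append((i, j, distance, similarity))
    return rows


rows = simulate_dyads(n_students=10, slope=-0.1, sd_student=0.2, sd_residual=1.0)
```

Because each student's effect enters every dyad that student belongs to, observations sharing a student are correlated, which is the dependency the crossed random effects are meant to absorb.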

Brain network characterisation
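For graph theory analysis of functional connectivity, modularity was defined as:

$$Q^{*} = \frac{1}{v^{+}}\sum_{ij}\left(w^{+}_{ij} - e^{+}_{ij}\right)\delta(M_i, M_j) - \frac{1}{v^{+} + v^{-}}\sum_{ij}\left(w^{-}_{ij} - e^{-}_{ij}\right)\delta(M_i, M_j)$$

where the first sum captures modularity from positively weighted connections and the second sum captures modularity from negatively weighted connections, $v^{+}$ is the sum of all positive connection weights in the network, $v^{-}$ is the sum of all negative connection weights in the network, $w^{+}_{ij}$ are the present within-module positive connection weights, $w^{-}_{ij}$ are the present within-module negative connection weights, $e^{+}_{ij}$ are the chance-expected within-module positive connection weights, $e^{-}_{ij}$ are the chance-expected within-module negative connection weights and $\delta(M_i, M_j)$ is 1 if nodes $i$ and $j$ are in the same module and 0 otherwise.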
The absolute difference in brain modularity within a dyad (i.e. between subj i and subj j) was calculated for every dyad in a social network. Differences were then standardised within each cohort to have a mean of 0 and standard deviation of 1.
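The dyad-level difference score and its within-cohort standardisation might look like the following sketch. The modularity values are invented, and the use of the population standard deviation (`pstdev`) is an assumption, since the text does not say which estimator was used.

```python
from itertools import combinations
from statistics import mean, pstdev


def standardised_modularity_differences(modularity_by_subject):
    """Absolute modularity difference per dyad, z-scored within one cohort."""
    dyads = list(combinations(sorted(modularity_by_subject), 2))
    diffs = [
        abs(modularity_by_subject[i] - modularity_by_subject[j]) for i, j in dyads
    ]
    mu = mean(diffs)
    sigma = pstdev(diffs)  # assumption: population SD of the cohort's dyads
    return {dyad: (d - mu) / sigma for dyad, d in zip(dyads, diffs)}


# Hypothetical brain modularity values for three participants in one cohort.
z = standardised_modularity_differences({"s1": 0.30, "s2": 0.35, "s3": 0.50})
```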
Nodal strength (a node-level measure of centrality, the importance of a node in its network) and diversity (a node-level measure of integration that takes into account the strength of a node within its own module) were also calculated using asymmetric values for positively and negatively weighted connections, as previously described by Rubinov & Sporns 23 . Strength was defined as:

$$s^{*}_i = s'^{+}_i - \left(\frac{s^{-}_i}{s^{+}_i + s^{-}_i}\right) s'^{-}_i$$

where $s'^{+}_i$ is the normalised sum of positive connection weights associated with node $i$ (and $s'^{-}_i$ the corresponding normalised sum of negative connection weights), and $s^{+}_i$ and $s^{-}_i$ are the raw sums of positive and negative connection weights, respectively, associated with node $i$.
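Diversity was defined as:

$$h^{\pm}_i = -\frac{1}{\log m}\sum_{u \in M} p^{\pm}_i(u)\,\log p^{\pm}_i(u)$$

where $p^{\pm}_i(u) = s^{\pm}_i(u) / s^{\pm}_i$, $s^{\pm}_i(u)$ is the strength of node $i$ within module $u$ (the total weight of connections of $i$ to all nodes in $u$) and $m$ is the number of modules in modularity partition $M$.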
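A rough sketch of the asymmetric strength and diversity measures for a single node is given below. It assumes that the normalised strength is the raw sum divided by the number of other nodes, and that diversity is the normalised Shannon entropy of a node's strength across modules; both are readings of the weight-conserving measures cited above, and the function names and weights are made up.

```python
import math


def signed_sums(weights):
    """Raw sums of positive and (absolute) negative connection weights of a node."""
    s_pos = sum(w for w in weights if w > 0)
    s_neg = sum(-w for w in weights if w < 0)
    return s_pos, s_neg


def asymmetric_strength(weights):
    """Positive strength minus negative strength scaled by its share of all weight."""
    s_pos, s_neg = signed_sums(weights)
    if s_pos + s_neg == 0:
        return 0.0
    scale = len(weights)  # assumed normalisation: number of other nodes (n - 1)
    return s_pos / scale - (s_neg / (s_pos + s_neg)) * (s_neg / scale)


def diversity(weights, modules, n_modules, positive=True):
    """Normalised entropy of a node's positive (or negative) strength over modules."""
    s_pos, s_neg = signed_sums(weights)
    total = s_pos if positive else s_neg
    if total == 0:
        return 0.0
    per_module = {}
    for w, module in zip(weights, modules):
        signed = w if positive else -w
        if signed > 0:
            per_module[module] = per_module.get(module, 0.0) + signed
    # Modules with zero strength contribute nothing to the entropy.
    entropy = -sum((s / total) * math.log(s / total) for s in per_module.values())
    return entropy / math.log(n_modules)


# Weights from one node to four other nodes, which sit in modules 1, 1, 2, 2.
strength = asymmetric_strength([0.6, 0.2, -0.2, 0.0])
spread = diversity([0.4, 0.0, 0.4, -0.2], [1, 1, 2, 2], n_modules=2)
```

A node whose positive weight is split evenly across both modules reaches the maximum diversity of 1, whereas a node connected to a single module scores 0.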
Nodal strength and diversity data from each participant were compared with every other participant from the fMRI cohort using a Pearson's correlation. Prior to further analysis, correlation strengths were standardised within each cohort to have a mean of 0 and a standard deviation of 1.

Data-driven predictive model of social proximity from neural similarity
To examine whether brain functional connectivity encodes social distance, we employed regularised regression techniques to predict social distance between two students based on similarities in their functional brain connectivity of all pairs of nodes. Specifically, we computed the absolute difference in connection strength for each edge in the 272 node brain network (i.e. each node-to-node connection) for every student dyad in the fMRI cohort. This yielded * (  24 . Combining the two approaches, elastic-net regression allows for adjustment of the lasso-to-ridge ratio (α), providing greater opportunity for better model fits 25 .
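The edge-wise difference features described above can be sketched as follows, assuming each participant's connectivity is stored as a symmetric node-by-node matrix so that only the upper triangle is needed. The small matrices and function names are illustrative only.

```python
from itertools import combinations


def n_unique_edges(n_nodes):
    """Number of unique node pairs in an undirected network: n * (n - 1) / 2."""
    return n_nodes * (n_nodes - 1) // 2


def upper_triangle(matrix):
    """Connection strengths for each unique node pair (row < column)."""
    return [matrix[r][c] for r, c in combinations(range(len(matrix)), 2)]


def edge_differences(conn_a, conn_b):
    """Absolute difference in connection strength for every edge of a dyad."""
    return [
        abs(a - b) for a, b in zip(upper_triangle(conn_a), upper_triangle(conn_b))
    ]


# Hypothetical 3-node connectivity matrices for the two members of one dyad.
conn_a = [[1.0, 0.2, 0.5], [0.2, 1.0, -0.1], [0.5, -0.1, 1.0]]
conn_b = [[1.0, 0.4, 0.5], [0.4, 1.0, 0.3], [0.5, 0.3, 1.0]]
features = edge_differences(conn_a, conn_b)
```

Each dyad contributes one such feature vector, and the dyad's social distance is the value to be predicted.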
Elastic-net regressions of connectivity and social distance data were conducted in R using the glmnet package 26 . Three regression models were trained using data from two fMRI cohorts each (see Table S1). One fMRI cohort's data were withheld during training so that the performance of the regression model could be evaluated using a previously unseen set of data. The best-performing regression model for each training set was determined by optimising the tuning parameters λ and α (see Table S1). λ is a tuning parameter for the shrinkage penalty used to adjust the regression coefficients in the elastic-net regression. When λ = 0, the penalty term has no effect but as λ tends toward infinity, the shrinkage penalty grows, and the regression coefficient estimates approach zero. The optimal value of λ for each regression model was determined using a 10-fold nested cross-validation within the training data. The largest value of λ such that the cross-validation error was within one standard error of the minimum was selected and the model was re-fit using all available observations.
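The one-standard-error rule for choosing λ can be illustrated with a short sketch. The analysis itself was run with glmnet in R; the Python function below only re-expresses the selection rule, and the candidate λ values and cross-validation errors are invented.

```python
def lambda_one_se(lambdas, cv_mean, cv_se):
    """Largest lambda whose mean CV error is within one SE of the minimum error."""
    best = min(range(len(lambdas)), key=lambda k: cv_mean[k])
    cutoff = cv_mean[best] + cv_se[best]
    return max(lam for lam, err in zip(lambdas, cv_mean) if err <= cutoff)


# Hypothetical cross-validation results for four candidate penalties.
lambdas = [1.0, 0.5, 0.1, 0.01]
cv_mean = [0.90, 0.62, 0.60, 0.61]
cv_se = [0.05, 0.04, 0.03, 0.03]
chosen = lambda_one_se(lambdas, cv_mean, cv_se)
```

Here the minimum error occurs at λ = 0.1, but λ = 0.5 is chosen because its error lies within one standard error of that minimum and it shrinks the coefficients more strongly.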
The tuning parameter α dictates the ratio of ridge to lasso in the elastic-net regression. α values
After obtaining the predicted social distance scores from the elastic-net regression models, we evaluated the accuracy of the predictive models by examining the relationship between the predicted social distance and the observed (actual) social distance. As before, there is dependency in the data structure, owing to the involvement of each student in multiple dyads, which could inflate the test statistics used to assess the statistical significance of the accuracy scores. To account for the dyadic nature of the data, we again used an LME model to obtain a p value, with the observed social distance as the dependent variable, the predicted social distance as the independent variable, and students i and j included as crossed random effects.
Finally, to evaluate the overall predictability of social distance using similarity in resting-state connectivity, we conducted a meta-analysis of LME models from the three elastic-net regressions (models 1, 2 and 3). A positive beta weight with 95% confidence interval excluding zero would indicate that prediction of social distance from neural similarity is feasible.
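How three beta weights could be pooled is sketched below using fixed-effect inverse-variance weighting. This pooling method is an assumption made for illustration (the text does not state which meta-analytic model was used), and the beta weights and standard errors are invented.

```python
import math


def pool_fixed_effect(betas, standard_errors):
    """Inverse-variance weighted mean of the betas and its 95% confidence interval."""
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)


# Hypothetical beta weights (and standard errors) from three LME models.
pooled, (ci_low, ci_high) = pool_fixed_effect(
    betas=[0.05, -0.10, 0.02], standard_errors=[0.08, 0.09, 0.10]
)
# Prediction would be considered feasible only if the interval excluded zero
# with a positive pooled beta.
prediction_feasible = ci_low > 0
```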

Supplementary Results: Prediction of social distance based on node-to-node neural similarities
Using elastic-net regression, we sought to determine whether similarities in node-to-node connectivity within the brain could predict social distance between students. Data were split into training and testing sets such that each cohort was used to train two models and test the predictive validity of a third model. Assignments of training and testing sets for the three models are provided in Table S1. Optimal parameters (i.e. α and λ values) for each model were determined using training data; these parameters were then used to predict social distance in the testing set (to which the model was naïve).
Root mean squared error (RMSE) quantifies how much a set of predicted values differs from their observed counterparts; it is the square root of the mean squared prediction error and is comparable to the standard deviation of the prediction errors. Lower values indicate smaller errors.
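For reference, RMSE can be computed as in this minimal sketch; the observed and predicted values are made up.

```python
import math


def rmse(observed, predicted):
    """Square root of the mean squared difference between observed and predicted."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical observed and predicted social distances for three dyads.
error = rmse([1, 2, 3], [1, 2, 5])
```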
RMSEs were 0.60, 0.65 and 0.82 for models 1, 2 and 3, respectively. Correlations for the observed vs predicted distance with the p-values of the beta weights obtained from the LMEs between each pair of participants are presented in Fig. S13a. Meta-analysis of LME models showed poor predictive power of models to classify social distance of dyads based on whole-brain functional connectivity (Fig. S13b). Our meta-analysis results suggest that our predictive models will not extrapolate well to predict social distance in previously unseen social networks.

Figure S1. Meta-analysis of brain similarity as a function of social distance (a) and community affiliation (b) for nominated friendships.
Figure S2. Meta-analysis of graph metric similarity as a function of social distance (a) and community affiliation (b) for nominated friendships.
Figure S3. Meta-analysis of brain similarity as a function of social distance (a) and community affiliation (b) for roster-and-rating method, threshold 5 (I spend "most of my time" with this person).
Figure S4. Meta-analysis of graph metric similarity as a function of social distance (a) and community affiliation (b) for roster-and-rating method, threshold 5 (I spend "most of my time" with this person).
Figure S5. Meta-analysis of brain similarity as a function of social distance (a) and community affiliation (b) for directed (non-mutual) friendship networks.
Figure S6. Meta-analysis of graph metrics as a function of social distance (a) and community affiliation (b) for directed (non-mutual) friendship networks.
Figure S7. Meta-analysis of brain similarity as a function of social proximity determined by the magnitude of direct friendship ratings between dyad members.
Figure S8. Processing pipeline for determining similarity of functional connectivity between participants' resting-state networks.
Figure S9. Pearson's correlation matrices for node-to-node brain correlations within individual participants in cohort 1-fMRI.
Figure S10. Pearson's correlation matrices for node-to-node brain correlations within individual participants in cohort 2-fMRI.
Figure S11. Pearson's correlation matrices for node-to-node brain correlations within individual participants in cohort 3-fMRI.
Figure S12. Processing pipeline for graph metric analysis of resting-state data. Community detection was performed on each participant's brain data to determine modules (communities) from which further graph metrics could be derived. Brain modularity, nodal strength and nodal diversity were determined for each participant and compared between all participants in the MRI cohorts.
Figure S13. Performance of regression models predicting social distance from similarity in whole-brain functional connectivity. a) Predicted vs. observed social distance in testing data sets for models 1, 2 and 3. Blue dotted line represents the line of perfect accuracy (predicted = observed). Pearson's correlations were negative for models 1 and 2, indicating that the models performed worse than chance. P values are from LME with crossed random effects, taking into account clustering of dyad members. b) Meta-analysis of LME models for regression model of predictive performance.