Physical characteristics, not psychological state or trait characteristics, predict motion during resting state fMRI

Head motion (HM) during fMRI acquisition can significantly affect measures of brain activity or connectivity, even after correction with preprocessing methods. Moreover, any systematic relationship between HM and variables of interest can introduce systematic bias. There is a large and growing interest in identifying neural biomarkers for psychiatric disorders using resting state fMRI (rsfMRI). However, the relationship between HM and different psychiatric symptom domains is not well understood. The aim of this investigation was to determine whether psychiatric symptoms and other characteristics of the individual predict HM during rsfMRI. This study used a sample of n = 464 participants (174 male) from the Tulsa 1000, a naturalistic longitudinal study recruiting subjects with different levels of severity of mood/anxiety/substance use disorders based on the dimensional NIMH Research Domain Criteria framework. Based on a machine learning (ML) pipeline with nested cross-validation to avoid overfitting, a stacked model with 15 anthropometric (e.g., body mass index, BMI) and demographic (age and sex) variables identified BMI and weight as the most important variables and explained 10.9 percent of the HM variance (95% CI: 9.9–11.8). In comparison, ML models with 105 self-report measures of state and trait psychological characteristics identified nicotine and alcohol use variables as well as impulsivity/inhibitory control variables but explained only 5 percent of the HM variance (95% CI: 3.5–6.4). A combined ML model using all 120 variables did not perform significantly better than the model using only the 15 physical variables (combined model 95% CI: 10.2–12.4). Taken together, after accounting for physical variables, state or trait psychological characteristics do not provide additional power to predict motion during rsfMRI.

Table S1. Characteristics of 464 subjects recruited for this study.
Table S2. A list of 120 psychological (self-report, non-drug-related and drug-related) and physical (age and body composition) variables with both univariate correlation (Pearson r and FDR-corrected P-value) with motion (log transformed) and variable importance (VI) in the stacked model.
Table S3. Pearson correlation coefficients between variables used in the mediation analysis.
Table S4. Path coefficients of the mediation analyses.
Figure S1. Scatterplots between two measures of motion, ENORM and FD.
Figure S2. (a) The process of repeated nested CV; (b) the inner loop of repeated nested CV.
Figure S3. Correlation matrix between the 120 variables.
Figure S4. Percent of variance explained for motion with all variables using different methods.
Figure S5. Percent of variance explained for motion with different sets of variables using the combined (stacked) method.
Figure S6. Partial dependence plots for the variables with the highest importance among anthropometric (a, BMI) and self-report (b, NicDepen) variables.
Figure S7. Path diagram of the final mediation model.

Supplementary methods

a. Machine learning methods
Selection of algorithms. In this work, we implemented 6 algorithms: elastic net (ENET), principal component regression (PCR), partial least squares (PLS), support vector regression (SVR), random forest (RF), and conditional inference forest (CF). The first three methods assume linear relationships between each predictor/feature and the response variable; ENET performs feature selection while PCR and PLS do not. PCR and PLS both assign weights to all features via orthogonal components, but the weights are determined in an unsupervised manner in PCR and a supervised manner in PLS; thus the former tends to have larger bias but lower variance in prediction than the latter (James et al., 2013). The other three methods relax the linearity assumption between features and the response variable by different means (various kernel functions in SVR, and recursive binary splits in RF and CF). Although CF (Hothorn, Hornik, & Zeileis, 2006) corrects the bias of RF in favor of variables with more distinct values (Strobl, Boulesteix, Zeileis, & Hothorn, 2007) in terms of variable importance, it is not clear how different the two methods are in prediction accuracy. For this reason, we included both forest methods.
Repeated nested cross-validation. It is well known that using the same dataset to build a prediction model and to evaluate its performance overestimates model performance (Varma and Simon, 2006). Given two datasets, one can be used to train a prediction model and the other to evaluate its performance. Because we had only one dataset, one possibility was to split the data into a training set for model building and a validation set for model evaluation. Alternatively, the whole dataset could be partitioned into K parts, where one part served as a validation set and the remaining parts were combined to serve as a training set, and the process iterated so that each part served as a validation set exactly once (outer loop). For a given machine learning method, a training set could be partitioned into J parts and the optimal tuning parameters determined by J-fold cross-validation (inner loop). The trained model was then applied to make predictions in the corresponding validation set. Iterating across the K parts, the K sets of predicted values were combined and compared with the observed values to evaluate model performance. The whole procedure is known as nested or double cross-validation (Stone, 1974; Varma and Simon, 2006), and was repeated with different partition indices to improve the stability of the results (Filzmoser et al., 2009). We repeated nested cross-validation (CV) 20 times, with 10-fold CV for both the outer (K = 10) and inner (J = 10) loops.
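The outer/inner structure can be sketched as follows. This is an illustrative Python/scikit-learn version of a single repetition, not the authors' pipeline; here GridSearchCV plays the role of the inner loop, and a single elastic net stands in for the full set of learners.

```python
# Minimal sketch of one repetition of nested CV on simulated data:
# GridSearchCV supplies the inner tuning loop; the outer KFold yields
# held-out predictions that are pooled to score the model.
import numpy as np
from sklearn.model_selection import KFold, GridSearchCV
from sklearn.linear_model import ElasticNet
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 8))
y = X @ rng.normal(size=8) + rng.normal(scale=0.5, size=120)

outer = KFold(n_splits=10, shuffle=True, random_state=1)        # K = 10 outer folds
grid = {"alpha": [0.01, 0.1, 1.0]}                              # hypothetical grid
pooled_pred = np.empty_like(y)

for train_idx, test_idx in outer.split(X):
    inner = GridSearchCV(ElasticNet(l1_ratio=0.5), grid, cv=10)  # J = 10 inner folds
    inner.fit(X[train_idx], y[train_idx])                        # tune on training folds only
    pooled_pred[test_idx] = inner.predict(X[test_idx])           # predict the held-out fold

outer_r2 = r2_score(y, pooled_pred)   # performance estimate untouched by tuning
```

Repeating this loop with different random partitions (20 times in the study) and averaging the resulting performance estimates gives the repeated nested CV described above.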

Stacked regression/Super learner (the inner loop)
In the inner loop, we applied the 6 machine learning methods and then combined their predictions by stack ensemble. Suppose a training set is expressed as a matrix [y X], concatenated in columns from a response vector y and a predictor matrix X, with N(K − 1)/K observations (N being the total sample size).
Step 1: Building base learners. Each base learner was built by J-fold CV, which consisted of data partition and parameter optimization.
Data partition: a training set was partitioned into J non-overlapping parts, where (J − 1) parts were used to train a model with hyper-parameter θ_g, which was then used to predict the held-out part and to compute the corresponding model performance metric (R² here); this process was repeated so that this step gave J values of R². Optimization: for a grid of hyper-parameter combinations, denoted θ_g, g = 1, …, G, the same partition indices of the J-fold CV were applied to each θ_g. Note that θ_g was either a vector (two parameters for ENET) or a scalar (other methods). The optimal hyper-parameter θ_opt was determined by the "one-standard-error" (1-SE) rule: among the θ_g's whose mean R² (averaged across the J held-out sets) fell within one SE of the maximal mean R², the θ_g that corresponded to the most parsimonious model was θ_opt. The SE of the metric is defined as SE = s/√J, where s is the standard deviation of the J R² values. Once θ_opt was identified, the predicted values of the held-out sets at θ_opt, denoted ŷ(θ_opt), were extracted for the stack ensemble.
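The 1-SE selection step can be illustrated numerically. The values below are invented for illustration, and "most parsimonious" is taken to mean the largest penalty, as would hold for an elastic net's λ.

```python
# One-standard-error rule, sketched with numpy: among candidate settings whose
# mean R^2 lies within one SE of the best mean R^2, pick the most parsimonious
# (here, the largest penalty). All numbers are illustrative.
import numpy as np

alphas = np.array([0.01, 0.1, 1.0, 10.0])   # candidate penalties, weakest to strongest
r2_folds = np.array([                        # J = 5 inner-fold R^2 values per candidate
    [0.60, 0.62, 0.58, 0.61, 0.59],
    [0.61, 0.63, 0.60, 0.62, 0.60],
    [0.61, 0.62, 0.60, 0.61, 0.60],
    [0.40, 0.42, 0.39, 0.41, 0.40],
])

mean_r2 = r2_folds.mean(axis=1)
J = r2_folds.shape[1]
se_best = r2_folds[mean_r2.argmax()].std(ddof=1) / np.sqrt(J)   # SE = s / sqrt(J)

within_1se = mean_r2 >= mean_r2.max() - se_best
alpha_opt = alphas[within_1se].max()    # most parsimonious admissible setting
```

Here the third candidate's mean R² is slightly below the maximum but within one SE of it, so the heavier penalty wins over the nominal best.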
Step 2: Building a stack ensemble model. The predicted values at the optimal hyper-parameter θ_opt from the base learners (Step 1) were then combined with the observed response values into a matrix with (L + 1) columns, where L is the number of base learners. The observed response could be regressed on the L prediction columns to obtain a stack ensemble model M_stack. Instead, we chose to average the predicted values of the base learners weighted by their mean R² values from Step 1.
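A minimal sketch of the weighted-average stacking actually chosen follows. The numbers are illustrative placeholders; in the real pipeline the predictions come from Step 1's held-out sets and the weights from the inner-loop R² values.

```python
# Sketch of the stacking step used here: a weighted average of base-learner
# predictions with weights proportional to each learner's mean inner-loop R^2,
# rather than a regression on the prediction matrix. Numbers are illustrative.
import numpy as np

held_out_preds = {                      # y-hat(theta_opt) per base learner
    "ENET": np.array([1.0, 2.1, 2.9]),
    "RF":   np.array([1.2, 1.9, 3.1]),
    "SVR":  np.array([0.9, 2.0, 3.0]),
}
mean_r2 = {"ENET": 0.10, "RF": 0.12, "SVR": 0.08}   # mean inner-loop R^2 values

w = np.array([mean_r2[m] for m in held_out_preds])
w = w / w.sum()                                      # normalize weights to sum to 1
P = np.stack([held_out_preds[m] for m in held_out_preds])
stacked = w @ P                                      # weighted-average ensemble prediction
```

Learners that generalize better in the inner loop thus pull the ensemble prediction toward themselves, without fitting any additional stacking regression.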
Hyper-parameter selection. Elastic net had two tuning parameters, α and λ. Both PCR and PLS had one parameter ("ncomp", the number of components). Both RF and CF were tuned on a single parameter, "mtry", the number of predictors randomly selected to grow a tree; the number of trees was set at 500 and not tuned. Although in theory SVR may have three parameters (cost, ε, and σ), the scale parameter ("sigma") of the radial basis function was estimated by the midpoint of the 10th and 90th percentiles of the Euclidean distances between all training points (Caputo et al., 2002); also, the cost and ε parameters are related, and Kuhn suggests using cost ("C") as the only tuning parameter. To alleviate the computational burden, we used random search (Bergstra and Bengio, 2012) with an adaptive resampling algorithm (Kuhn, 2014) for no more than 15 parameter combinations, so the parameter combinations and values actually used varied across replications and training sets. Below, we present the parameter values of a training set of one replicate as an example (the asterisks indicate the optimal parameters):
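The σ heuristic can be sketched as follows. This is our reading of the midpoint rule described above, written in plain Python on simulated data; the original analysis used an R implementation, whose details (e.g., whether squared distances are used) may differ.

```python
# Sketch of the RBF scale heuristic described above: fix "sigma" at the
# midpoint of the 10th and 90th percentiles of pairwise Euclidean distances
# between training points, so only cost C remains to be tuned.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 5))          # 50 hypothetical training points

# pairwise Euclidean distances between all training points
dists = np.array([np.linalg.norm(a - b) for a, b in combinations(X, 2)])
lo, hi = np.percentile(dists, [10, 90])
sigma = 0.5 * (lo + hi)               # fixed scale; cost "C" is then the only tuned parameter
```

Fixing σ this way collapses the SVR search space to one dimension, which is what makes the 15-combination random-search budget feasible.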

b. Mediation Analysis
We conducted path analysis (PA) modeling to explore potential mediation effects of respiratory measures (amplitude and rate) on the relations between motion and the two most important predictors (weight and nicotine dependence). Respiration rate and amplitude were measured using a GE MR Bellows placed around the abdomen. Respiration rate was calculated by counting the number of peaks and dividing by total scan time. Respiration amplitude was calculated as the mean peak-to-trough difference; however, as this measure depends on belt length rather than air flow, amplitude is in arbitrary units and is not equivalent to volume. Because weight and nicotine dependence were measured before scanning (pre-scan variables), while respiratory measures and motion were collected during scanning, it was logical to use weight and nicotine dependence as independent variables, motion (log-transformed) as the dependent variable, and respiratory measures as potential mediators. These variables were first standardized to zero mean and unit standard deviation before entering the PA models. Due to 7 missing values in both respiratory amplitude and respiratory rate, we applied full-information maximum likelihood estimation with robust (Huber–White) standard errors. We began with a saturated model and removed non-significant paths one at a time. The final model was chosen to minimize the sample-size-adjusted BIC statistic. This analysis was conducted using the R lavaan package, version 0.5-23.1097 (Rosseel, 2012).
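The product-of-coefficients logic behind this mediation check can be sketched outside lavaan. The version below uses simulated data and ordinary least squares only; variable names and effect sizes are placeholders, and the real analysis additionally used FIML and robust standard errors.

```python
# Illustrative product-of-coefficients mediation check (not the lavaan fit):
# regress mediator on predictor (path a), outcome on predictor and mediator
# (paths c' and b); the indirect effect is a*b. Data are simulated.
import numpy as np

rng = np.random.default_rng(3)
n = 464
weight = rng.normal(size=n)                        # standardized predictor
resp_rate = 0.1 * weight + rng.normal(size=n)      # weak a path (assumed)
motion = 0.3 * weight + 0.2 * resp_rate + rng.normal(scale=0.5, size=n)

def ols(y, *cols):
    """OLS slopes (intercept dropped) via least squares."""
    X = np.column_stack([np.ones(n), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0][1:]

(a,) = ols(resp_rate, weight)                  # predictor -> mediator
c_prime, b = ols(motion, weight, resp_rate)    # direct and mediator paths
indirect = a * b                               # mediated (indirect) effect
(c_total,) = ols(motion, weight)               # total effect = c' + a*b
```

The small change from c_total to c_prime mirrors the comparison reported below between path coefficients with and without the respiratory variables in the model.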
The final model contained 4 paths (from weight, nicotine dependence, and respiratory rate to motion, and from nicotine dependence to respiratory amplitude) and 1 covariance between respiratory amplitude and rate. The goodness-of-fit statistics included χ² = 4.85 (p = 0.90), CFI = 1.00, TLI = 1.04, RMSEA = 0.00 (90% CI: 0, 0.02), and SRMR = 0.025. The path coefficients from weight to motion (0.293) and from nicotine dependence to motion (0.174) were only slightly different from the path coefficients estimated in a model without respiratory variables (0.292 and 0.183, respectively). The small changes (0.3% and 4.9%, respectively) suggest negligible mediation effects of respiration.

Table S1. Characteristics of 464 subjects recruited for this study. BMI: Body Mass Index, PHQ-9: Patient Health Questionnaire, OASIS: Overall Anxiety Severity and Impairment Scale, DAST: Drug Abuse Screening Test.

Table S2. 120 psychological (self-report, non-drug-related and drug-related) and physical (age and body composition) variables with both univariate correlation (Pearson r and FDR-corrected P-value) with motion (log transformed) and variable importance (VI) in the stacked model. The right three columns represent the correlation and VI after regressing out the effect of physical variables (residualized (Res) r, p, and VI). None of the state and trait mood/anxiety/trauma-related measures showed any significant correlation with the level of motion inside the scanner. Nicotine dependence is the only variable that survives FDR correction in the univariate correlation analysis with motion after regressing out the effect of physical variables. Variables within each group (physical, drug-related, and non-drug-related) are listed in alphabetical order.

Figure S1.
Scatterplots between two measures of motion, Euclidean Norm (ENORM, average Euclidean norm of the six motion parameters) and Framewise Displacement (FD, sum of the absolute values of the six motion parameters), without and with natural log transformation (r = 0.993 and 0.995, respectively) (n = 464).

Figure S2(a). The process of repeated nested CV. This figure illustrates the process of nested CV repeated 20 times, indexed by r. In the outer loop, the original data of N subjects were partitioned into K parts indexed by k. For each iterate, (K − 1) parts served as a training set to optimize tuning parameters in the inner loop (see Figure S2(b)). The stack model M_stack^(r,k) from the inner loop was then used to predict the held-out set of the outer loop. Predicted values ŷ were combined across held-out sets to evaluate the performance of the stack models.

Figure S2(b). The inner loop of repeated nested CV. In the inner loop, each training set from the outer loop was further partitioned into J parts indexed by j. A model with a tuning parameter combination θ_g was trained on (J − 1) parts, and this model was then applied to predict the held-out set. The predicted values were compared to the observed values by R-square. Iterating across the J folds led to J values of R-square. The process was repeated for all parameter combinations, and the optimal tuning parameter combination (θ_opt) was determined by the "one-SE" rule. This optimal parameter combination was applied to the whole training set and gave the predicted values ŷ(θ_opt). This process was then repeated for all machine learning methods. The predicted values across machine learning methods were then combined with the observed values to build a stack model M_stack.

Figure S5. Percent of variance explained for motion with different sets of variables using the combined (stacked) method.
Bdy: Anthropometrics, Dmo: Demographics, SR: Self-reports, Rsd: Residualized after regressing out the Bdy and Dmo variables, Drg: Drug-related (including alcohol and nicotine) self-reports, NDSR: Non-drug-related self-reports, and Nic: Nicotine-related self-reports. Error bars represent 95% confidence intervals.

Figure S6. Partial dependence plots for the variables with the highest importance among anthropometric (a, BMI) and self-report (b, NicDepen) variables. Individual models vary but agree in direction where coefficients are non-zero. The COMB model appears as a weighted average of the others and lies between the most extreme individual models. ENET: Elastic net, PCR: Principal component regression, PLS: Partial least squares, RF: Random forest, CF: Conditional forest, SVM: Support vector machine, COMB: Combined (stacked) method.

Figure S7. Path diagram of the final mediation model. The values indicate path coefficients (straight arrows) and covariance (arc) for standardized variables. RRate: respiratory rate, RAmp: respiratory amplitude, and NicDep: PROMIS Nicotine Dependency score.