Long-term leisure-time physical activity and other health habits as predictors of objectively monitored late-life physical activity – A 40-year twin study

Moderate-to-vigorous physical activity (MVPA) in old age is an important indicator of good health and of the functional capacity that enables independent living. In our prospective twin cohort study of 616 individuals, we investigated whether long-term physical activity assessed three times, in 1975, 1982 and 1990 (mean age 48 years in 1990), and other self-reported health habits predict MVPA measured objectively with a hip-worn triaxial accelerometer (at least 10 hours per day for at least 4 days) 25 years later (mean age 73 years). Low leisure-time physical activity at younger age, higher relative weight, smoking, low socioeconomic status, and health problems predicted low MVPA in old age in individual-based analyses (altogether explaining 20.3% of the variation in MVPA). However, quantitative trait modeling indicated that shared genetic factors explained 82% of the correlation between baseline and follow-up physical activity. Pairwise analyses within monozygotic twin pairs showed that only baseline smoking was a statistically significant predictor of later-life MVPA. The results imply that younger-age physical activity is associated with later-life MVPA, but that shared genetic factors underlie this association. Of the other predictors, mid-life smoking predicted less physical activity at older age independently of genetic factors.

Daily step count by 1990 baseline covariates
Table S2. Multivariate models for LT-mMET and the other baseline predictors of moderate-to-vigorous physical activity and daily step count
Table S3. Daily step count in twin pairs discordant for different baseline characteristics
Table S4. Intraclass correlations and their 95% confidence intervals for monozygotic and dizygotic twins
Table S5. Cross-twin cross-trait correlation matrix with standard deviations on diagonal for monozygotic twins
Table S6. Cross-twin cross-trait correlation matrix with standard deviations on diagonal for dizygotic twins
Table S7. Model fit and model comparison statistics
Table S8. Standardized estimates of genetic (g²) and environmental (e²) components, genetic correlations (rG) and regression coefficients (β) for midlife MET factor as risk factor of the outcome variables in best fitting models
Figure S1. Bivariate genetic model for baseline MET factor and follow-up physical activity variable
This supplementary material has been provided by the authors to give readers additional information about their work.

Baseline predictor variables
LT-mMET = Leisure-time mean MET value (in MET-hours per day), estimating the mean volume of physical activity during the three baseline survey years (from participants with complete data on physical activity in 1975, 1981 and 1990)
METf = MET factor indicating leisure-time MET during the baseline years, from participants with leisure-time physical activity data from at least one of the baseline questionnaires in 1975, 1981 and 1990
BMI = Body mass index

Variable transformations
For quantitative trait analysis the variables were transformed as follows. The MET variables (1975, 1981, 1990) were transformed by taking their cube roots prior to modelling their variability as a factor. Daily step count (Steps) and mean daily time of standing (Standing) were rescaled by dividing the observed values by 1 000, mean daily time of light physical activity (LPA) by 5 000 and mean daily time of sedentary behavior (lying and sitting, SB) by 10 000. A logarithmic transformation was used for the value of the most intensive 10-minute period during the monitoring week (Peak-10min MET), and a square-root transformation was used for mean daily time of moderate-to-vigorous physical activity (MVPA).
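The transformations listed above can be sketched in a few lines of Python. This is only an illustration: the dictionary keys are hypothetical names, and the natural logarithm is an assumption, since the text does not state the base of the logarithm.

```python
import math


def transform_for_modelling(raw):
    """Apply the variable transformations described in the text.

    `raw` is a hypothetical dict of untransformed values; the key names
    are illustrative and are not taken from the original analysis.
    """
    out = {}
    # Cube root of each baseline MET variable (sign-safe form).
    for key in ("met_1975", "met_1981", "met_1990"):
        out[key] = math.copysign(abs(raw[key]) ** (1.0 / 3.0), raw[key])
    # Rescaling by a constant divisor.
    out["steps"] = raw["steps"] / 1_000
    out["standing"] = raw["standing"] / 1_000
    out["lpa"] = raw["lpa"] / 5_000
    out["sb"] = raw["sb"] / 10_000
    # Logarithm (natural log assumed here) and square root.
    out["peak10_met"] = math.log(raw["peak10_met"])
    out["mvpa"] = math.sqrt(raw["mvpa"])
    return out
```

Rescaling by a constant changes only the units of a variable, which is a common way to keep the variances of jointly modelled variables on a comparable numerical scale.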
Quantitative trait models for the MET factor (METf) included only continuous variables and the analysis was conducted based on the maximum likelihood (ML) estimator. As smoking is a categorical variable, models including this variable were based on the weighted least squares estimator (WLS). Conventional model fit statistics were available for the ML estimator, but as the WLS is not based on maximization of the likelihood, likelihood-based indices are not available (including e.g. the information criteria). Standard errors and confidence intervals in all quantitative trait models were based on 10 000 bootstrap draws.
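As a rough illustration of bootstrap-based confidence intervals, the sketch below resamples whole twin pairs with replacement and takes percentiles of the re-estimated statistic. Both choices are assumptions: the text states only that 10 000 bootstrap draws were used, not the resampling unit or the interval type. The function and argument names are hypothetical.

```python
import random


def bootstrap_ci(pairs, statistic, n_draws=10_000, level=0.95, seed=0):
    """Percentile bootstrap interval for `statistic`, resampling twin pairs.

    `pairs` is a list of per-pair records and `statistic` is any function
    mapping such a list to a number. Illustrative sketch only.
    """
    rng = random.Random(seed)
    n = len(pairs)
    estimates = []
    for _ in range(n_draws):
        # Draw n pairs with replacement, keeping each pair intact.
        sample = [pairs[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(sample))
    estimates.sort()
    alpha = (1.0 - level) / 2.0
    lower = estimates[int(alpha * (n_draws - 1))]
    upper = estimates[int((1.0 - alpha) * (n_draws - 1))]
    return lower, upper


def mean_of_first_twin(sample):
    """Example statistic: mean of the first twin's value in each pair."""
    return sum(first for first, _ in sample) / len(sample)
```

Resampling pairs rather than individuals keeps the within-pair dependence intact, which is why it is a natural choice for twin data.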

Intraclass correlations
An introduction to quantitative trait genetic analyses is provided elsewhere (see e.g. Neale, M.C. and Cardon, L.R. Methodology for Genetic Studies of Twins and Families. Kluwer: Dordrecht, The Netherlands [1992], and Lynch, M. and Walsh, B. Genetic Analysis of Quantitative Traits. Sinauer Associates: Sunderland, MA [1998]); in the following we focus on the particulars of the present models. Table S4 shows the intraclass correlations (ICC) for monozygotic and dizygotic twins for the pooled data and for the sex groups. The ratio of the monozygotic to the dizygotic correlation can be used to assess which combinations of variance components to include in quantitative trait models (see e.g. Sham, P. Statistics in Human Genetics. Wiley: London, GB [1998]). Briefly, when the correlation ratio is exactly two, only the additive genetic variance (A) and unique environmental variance (E) can be modelled. In this case the common environmental (C) and dominance (D) effects are exactly zero, and including these components may lead to convergence problems in estimation. When the ratio falls below two, the common environmental effect becomes non-zero; when the ratio exceeds two, the dominance genetic effect becomes non-zero. Generally, the ICCs were similar between the genders.
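The ratio rule described above can be written as a small helper. This is a minimal sketch: the function name is hypothetical, and the tolerance `tol` around the value two is an illustrative assumption, since estimated correlations are never exactly in a 2:1 ratio.

```python
def suggest_components(icc_mz, icc_dz, tol=0.05):
    """Suggest variance components from the MZ/DZ intraclass correlation ratio.

    Returns "AE" when the ratio is (approximately) two, "ACE" when it is
    below two and "ADE" when it is above two. `tol` is an assumed margin.
    """
    if icc_dz <= 0:
        raise ValueError("ratio is undefined for a non-positive DZ correlation")
    ratio = icc_mz / icc_dz
    if abs(ratio - 2.0) <= tol:
        return "AE"
    return "ACE" if ratio < 2.0 else "ADE"
```

For example, `suggest_components(0.6, 0.4)` points to a model with a common environmental component, because the ratio of 1.5 is below two.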

Bivariate models
Cross-twin cross-trait correlations are shown in Table S5 for monozygotic and Table S6 for dizygotic twins. In univariate investigations we found no significant differences between the twin pairs. Our univariate model investigations indicated that neither the C nor the D component contributed significantly to the phenotypes. We therefore decided to model only the A and E components and their correlations in the bivariate quantitative trait models.
The quantitative trait model can be used to test whether one variable is a direct risk factor for an outcome or whether the risk attributed to the outcome is mediated via genes or environmental factors. The conceptual model for testing the mediating mechanism is shown in Fig. S1. As the baseline model for the test we specified the mediation model by estimating the parameters a12 and e12 while constraining β to zero. Sub-models of the mediation model include only either the additive genetic or the unique environmental parameter. The direct risk factor model is specified by estimating the regression parameter β from the model in Fig. S1 while constraining a12 and e12 to zero, and comparing model fit to the mediation model. It is also possible that there is no statistically meaningful relationship between the variables after accounting for the variability explained by the variance components. This can be tested by setting both the dashed and dotted effects in Fig. S1 to zero and comparing model fit to the mediation model. In all models, the corresponding parameters for the two twins of a pair, including genetic effects, environmental effects, factor loadings (λ2, λ3), residual variances (r1, ..., r3), the path coefficient (β) and means, were constrained to be equal. The model in Fig. S1 is usually specified via the Cholesky decomposition model (see e.g. Neale, M.C. and Cardon, L.R. Methodology for Genetic Studies of Twins and Families. Kluwer: Dordrecht, The Netherlands [1992]). However, we used the equivalent correlated factors model to obtain direct estimates of genetic and environmental correlations (see Loehlin, J.C. The Cholesky approach: A cautionary note. Behav. Genet. 26, 65-69 [1996]).
Table S7 shows model fit indices for models examining the nature of the association between the MET factor and the various physical activity variables. Within nested models the choice of the best fitting model was based on the sequential likelihood ratio test (LRT). If more than one candidate model remained after the LRT, the selection was based on the information criteria, residual-based criteria and parsimony, so that the simplest model was chosen as the best fitting model. For step count, the genetic mediation and direct risk factor models showed a non-significant worsening in model fit. However, both the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) favoured the direct risk factor model. All LRTs were non-significant for LPA and hence we preferred the model with the fewest parameters, i.e. no association between the factor and LPA. For Peak-10min MET only the LRT for the genetic mediation model was non-significant, indicating that the association between these two variables seems to have a largely shared genetic origin. A similar result was observed for MVPA based on the LRT and the information criteria. Based on AIC, the other non-significant model, the direct risk factor (DRF) model, would be 4.19×10⁻¹³ times as probable as the genetic mediation model to minimize the information loss, indicating a significantly worse model fit. We therefore prefer the genetic mediation model.
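The likelihood ratio test used for nested model comparisons can be sketched as follows. This is a generic illustration rather than the software used in the analysis: the function names are hypothetical, and the chi-square tail probability is computed with standard closed forms for integer degrees of freedom so that only the standard library is needed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2-1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite sum of half-integer gamma terms.
    total = math.erfc(math.sqrt(half))
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (k - 0.5) / math.gamma(k + 0.5)
    return total


def likelihood_ratio_test(ll_full, ll_restricted, df):
    """Return (test statistic, p-value) for a restricted model nested in a full one.

    `df` is the number of parameters constrained in the restricted model.
    """
    stat = 2.0 * (ll_full - ll_restricted)
    return stat, chi2_sf(stat, df)
```

A non-significant p-value means that the restricted model does not fit significantly worse than the fuller model, so the more parsimonious model can be retained.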

Bivariate model fit and parameter estimates
For standing the likelihood ratio test indicates that none of the candidate models has a significantly worse fit to the data, and based on parsimony we conclude that there is no significant association between the variables. For sedentary behaviour only the no-association model has a significantly worse fit to the data. However, the mediation models and the direct risk factor model have very similar values for the information criteria. The direct risk factor model has the lowest observed AIC and BIC values. However, the mediation models for rG and rE are each 0.61 times as likely to minimize information loss, which does not indicate significantly worse explanatory power. Hence, there is no clear evidence to favour either the mediation models or the direct risk factor model.
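Statements of the form "X times as likely to minimize information loss" correspond to the standard relative likelihood of two models under AIC, exp((AIC_min − AIC_i)/2). A minimal sketch follows; the function name is hypothetical and the AIC values in the example are made up, not taken from Table S7.

```python
import math


def relative_likelihood(aic_best, aic_other):
    """Relative likelihood of `aic_other` versus the lowest-AIC model."""
    return math.exp((aic_best - aic_other) / 2.0)


# Hypothetical AIC values: a difference of one AIC unit.
ratio = relative_likelihood(100.0, 101.0)
```

Under this formula a difference of about one AIC unit gives a relative likelihood of about 0.61, whereas a difference of several tens of units gives a value that is effectively zero.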
Parameter estimates from the models of Table S7 are shown in Table S8. Approximately half of the variation in standing and LPA was explained by genetic factors and the remaining half by environmental factors, although there was no significant relationship to the genetic or environmental components of the MET factor. The MET factor was a direct risk factor for the sedentary behaviour and step count outcomes, with standardized regression coefficients of -0.16 and 0.29, respectively. The MET factor had a statistically significant genetic association with MVPA and Peak-10min MET, with a genetic correlation of approximately 0.59.
The cross-trait correlation between the baseline MET factor and the follow-up physical activity variables was decomposed into genetic and residual parts based on the model in which we estimated both the genetic and environmental correlations. For MVPA the estimated cross-trait correlation was 0.35 (95% CI: 0.25, 0.43), with an approximate contribution from genetic factors of 82% (53, 100). For Peak-10min MET the estimated correlation was 0.34 (0.25, 0.43), with an approximate contribution from genetic factors of 98% (68, 100).
Abbreviations: rG: genetic correlation, rE: environmental correlation, DRF: direct risk factor, LRT: likelihood ratio test, LL: log-likelihood, RMSEA: root mean square error of approximation, SRMR: standardized root mean square residual, CFI: comparative fit index, TLI: Tucker-Lewis index, AIC: Akaike information criterion, BIC: Bayesian information criterion.
Table S8. Standardized estimates of genetic (g²) and environmental (e²) components, genetic correlations (rG) and regression coefficients (β) for baseline MET factor as risk factor of the outcome variables in best fitting models
Note. Unstandardized risk coefficient and its 95% confidence interval in the direct risk factor model for SB: -0.37 (-0.60, -0.14) and Steps: 3.35 (2.23, 4.54). Confidence intervals based on 10 000 bootstrap draws. g² indicates the broad sense heritability estimate.

Figure S1. Bivariate genetic model for baseline MET factor (METf) and follow-up physical activity variable (PA)
By modelling the effects shown with dashed lines, the parameters of the gene-environment mediation model can be estimated. The direct risk factor effect is shown by the dotted line.