Introduction

Many large-scale, collaborative genome-wide association studies (GWAS) have successfully identified genetic determinants that explain part of the pathophysiological mechanisms underlying a wide range of traits. Despite these efforts and increased sample sizes, the explained variability of many traits remains relatively small, and only a small proportion of the familial heritability can be explained by the candidate variants found [1,2,3].

In addition to genetics, environmental factors and gene–environment interactions may contribute to this unexplained trait heritability [3, 4]. Recently, genome-wide gene–environment interaction studies have been conducted to further explore the potential mechanisms underlying an array of diseases or disease traits of interest [5,6,7,8,9]. Thus far, these collaborative efforts have largely focused on a single environmental factor, such as smoking [7, 10,11,12,13], physical activity [8, 14], alcohol intake [15], educational attainment [6], and others [5, 16]. By accounting for the environmental risk factor, these efforts have identified novel loci beyond those identified by the traditional main effects-only GWAS. However, multiple environmental factors may simultaneously modify the genetic effects of loci [17]. In addition, single lifestyle variables may not capture the spectrum of relevant environmental variation, resulting in biased effect estimation and false-negative results due to reduced statistical power.

Lifestyle factors, such as smoking, physical inactivity and alcohol consumption, all contribute independently to the risk of developing cardiovascular diseases, and composite lifestyle risk scores (LRS) have been used previously to assess the combined effect of multiple lifestyle factors on cardiovascular disease development [18,19,20]. However, when applying LRS methodology to large collaborative consortium settings, challenges arise as not all lifestyle components in the LRS are available in all participating cohorts and/or may not be measured using the same instrument. If ignored, significant measurement error and potential heterogeneity may be introduced with reduced statistical power and potential bias. In the present study, we explore different approaches for incorporating cohort-wide missingness of individual lifestyle components with meta-analysis of genome-wide gene–environment interaction on systolic blood pressure in four European-ancestry (EA) cohorts.

Methods

Participating cohorts and subject inclusion

In this study, we included data from four cohorts: the Atherosclerosis Risk in Communities Study (ARIC), the Framingham Heart Study (FHS), the Hypertension Genetic Epidemiology Network (HyperGEN), and the Netherlands Epidemiology of Obesity Study (NEO). For cohorts with data collected from multiple visits, we chose a single visit that could maximize sample size with non-missing data. We included a total of 24,048 EA individuals aged 18–80 and with non-missing genotype and phenotype information, including age, sex, systolic blood pressure (SBP), antihypertensive medications, body mass index (BMI), and the four lifestyle factors (smoking status, alcohol consumption, education level, and physical activity). The individual level data for ARIC and FHS are available on dbGAP (dbGAP ID: phs000280.v7.p1 for ARIC and phs000007.v2.p1 for FHS), and HyperGEN and NEO are in-house data.

Phenotype and covariates

Resting SBP (mmHg) was calculated by taking the average of all available BP readings at the same clinical visit, and further adjusted by adding 15 mmHg for subjects with antihypertensive medication use [21]. SBP values that were more than six standard deviations away from the mean were winsorized to exactly six standard deviations from the mean, in order to reduce the potential influence of outliers. Other covariates included age, sex, field center (if appropriate), and principal components to account for population stratification.
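
The phenotype preparation described above can be sketched as follows. This is a minimal illustration rather than the cohorts' actual pipeline: the function names are hypothetical, and it assumes that the mean and standard deviation used for winsorizing are computed within a cohort on the medication-adjusted values, using the sample standard deviation.

```python
from statistics import mean, stdev

def mean_sbp(readings):
    """Average all available SBP readings (mmHg) from one clinic visit."""
    return mean(readings)

def adjust_for_medication(sbp, on_medication, offset=15.0):
    """Add 15 mmHg for participants using antihypertensive medication."""
    return sbp + offset if on_medication else sbp

def winsorize(values, n_sd=6.0):
    """Pull values beyond n_sd standard deviations back to that boundary.

    Assumption: mean and SD are taken from the un-winsorized values.
    """
    m = mean(values)
    s = stdev(values)
    lo, hi = m - n_sd * s, m + n_sd * s
    return [min(max(v, lo), hi) for v in values]
```

Values inside the six-standard-deviation band are returned unchanged, so only extreme outliers are affected.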

Genotyping and Quality Control (QC)

Genotyping was performed separately within each cohort using Affymetrix (Santa Clara, CA, USA) or Illumina (San Diego, CA, USA) genotyping arrays (Supplementary Table S1). Each cohort performed imputations with IMPUTE2 [22] or MaCH [23], using the cosmopolitan reference panel from the 1000 Genomes Project Phase 1 Integrated Release Version 3 Haplotypes (2010–11 data freeze, 2012-03-14 haplotypes) [24]. SNPs were excluded if they were non-autosomal, had a minor allele frequency < 1%, or had low imputation quality (r2 < 0.1). We applied further quality control filters centrally during the meta-analysis.

Lifestyle Risk Score

We considered four lifestyle factors: smoking status (never/former/current smoker), current alcohol intake (abstinence/modest/heavy), educational attainment beyond high school (none/some college/college degree), and physical activity (inactive/active). Current alcohol intake included three groups (abstinence: 0 drinks/week; modest: 1–7 drinks/week; heavy: >7 drinks/week). We classified participants as “college degree” if they completed at least a 4-year college degree, as “some college” if they received any education beyond high school, including vocational school, but did not complete a 4-year college degree, and as “none” if they received no education beyond high school [6]. Physical activity was expressed in metabolic equivalents (MET; 1 MET = 1 kcal/kg/h). Inactive individuals were defined as those with <225 MET-minutes per week of moderate-to-vigorous leisure-time or commuting physical activity, or those in the lowest quartile (25%) of the physical activity distribution within their cohort. The detailed definitions of active and inactive physical activity followed a previous study on gene–physical activity interaction [14].

We constructed the LRS in two steps. First, each lifestyle factor, treated as an individual lifestyle component, was categorized into no risk (value of 0), low risk (value of 1), and high risk (value of 2) based on its presumed effect on BP or cardiovascular health, except physical activity, which only had no-risk and low-risk categories [17]. A higher risk value indicates a category more strongly associated with unfavorable cardiovascular health outcomes. Note that we categorized modest alcohol intake as no risk and abstinence as low risk because moderate alcohol consumption has consistently been associated with a decreased risk of type 2 diabetes [25] and coronary artery disease [26] compared with abstention or excessive consumption [27]. Table 1 details the LRS component definitions.

Table 1 Definition of LRS component, with no risk as the value of 0, low risk as the value of 1 and high risk as the value of 2.

Second, the “Complete” Quantitative LRS (QLRS-C) was calculated by summing up all four components, ranging from 0 to 7. We also calculated the “Partially Missing” Quantitative LRS (QLRS-M) using only 2–3 components pre-selected for each cohort by design to simulate real cohort-level missingness, as described in Table 2. For example, for ARIC, we included three lifestyle components (smoking, education, and physical activity) when constructing QLRS-M. QLRS-M ranges from 0 to 4 or 5, depending on the inclusion of lifestyle components for each cohort.

Table 2 Components included in the calculation of “Partially Missing” Quantitative LRS (QLRS-M) for each cohort (by design).

After constructing the Quantitative LRS, we further created Dichotomous LRS for the “Complete” (DLRS-C) and the “Partially Missing” (DLRS-M) summary scores. We gave a value of 0 (unexposed group) if the corresponding Quantitative LRS < 2 and a value of 1 (exposed group) if Quantitative LRS ≥ 2 (i.e., at least one risk component classified as high risk or at least two components classified as low risk). These dichotomized LRS measures are used to define exposed and unexposed strata in our analyses.
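
The two-step construction can be illustrated with a short sketch. Because Table 1 is not reproduced here, the exact level-to-score mapping below is an assumption made for illustration (only the coding of modest alcohol intake as 0 and abstinence as 1 is stated explicitly in the text), and the function names are hypothetical.

```python
# Component scores: 0 = no risk, 1 = low risk, 2 = high risk.
# Assumed mapping for illustration; the authoritative definition is Table 1.
RISK_SCORES = {
    "smoking": {"never": 0, "former": 1, "current": 2},
    "alcohol": {"modest": 0, "abstinence": 1, "heavy": 2},
    "education": {"college degree": 0, "some college": 1, "none": 2},
    "physical_activity": {"active": 0, "inactive": 1},
}

# Components kept for ARIC when building QLRS-M, as described in the text.
ARIC_COMPONENTS = ("smoking", "education", "physical_activity")

def quantitative_lrs(participant, components=None):
    """Sum the risk scores of the selected components (all four by default)."""
    if components is None:
        components = tuple(RISK_SCORES)
    return sum(RISK_SCORES[c][participant[c]] for c in components)

def dichotomous_lrs(qlrs, threshold=2):
    """Return 1 (exposed) if the quantitative score is >= 2, else 0 (unexposed)."""
    return 1 if qlrs >= threshold else 0

participant = {
    "smoking": "former",
    "alcohol": "abstinence",
    "education": "college degree",
    "physical_activity": "active",
}
qlrs_c = quantitative_lrs(participant)                   # all four components
qlrs_m = quantitative_lrs(participant, ARIC_COMPONENTS)  # alcohol omitted
```

Under this assumed coding, the example participant is exposed according to DLRS-C (score 2) but unexposed according to DLRS-M (score 1), because the omitted alcohol component carried one of the two risk points.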

Note that participants in cohorts with partially missing lifestyle components have an LRS equal to or lower than their “true” score, that is, the score we would have obtained had we observed all lifestyle components. This leads to potential misclassification when dichotomizing the LRS into exposed and unexposed groups. However, a participant can only be misclassified as unexposed, never as exposed, so the resulting heterogeneity is confined to the unexposed group.
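
The one-directional nature of this misclassification can be checked by brute force. The sketch below enumerates every possible combination of component scores and every subset of two or three retained components, and records which (true status, observed status) disagreements occur. It relies only on the score ranges and the threshold of 2 given above; the function names are hypothetical.

```python
from itertools import combinations, product

# Maximum score per component (physical activity has no high-risk level).
MAX_SCORE = {"smoking": 2, "alcohol": 2, "education": 2, "physical_activity": 1}
THRESHOLD = 2

def dichotomize(score):
    """1 = exposed, 0 = unexposed."""
    return int(score >= THRESHOLD)

def misclassification_directions():
    """Return the set of (true_status, observed_status) pairs that disagree."""
    names = list(MAX_SCORE)
    seen = set()
    for scores in product(*(range(MAX_SCORE[n] + 1) for n in names)):
        full = dict(zip(names, scores))
        true_status = dichotomize(sum(full.values()))
        for k in (2, 3):  # QLRS-M uses 2 or 3 of the 4 components
            for kept in combinations(names, k):
                observed = dichotomize(sum(full[n] for n in kept))
                if observed != true_status:
                    seen.add((true_status, observed))
    return seen
```

The only disagreement that can arise is a truly exposed participant observed as unexposed, since dropping non-negative components can never raise the sum.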

Statistical analysis

Overview

We conducted a two-stage analysis procedure. In Stage 1, each cohort performed LRS-stratified genome-wide association analysis on SBP using the main effect model (E(Y) = β0 + βG SNP + βC C), where Y is SBP, SNP is the imputed additive dosage value of the genetic variant, and C is the vector of covariates. This model was run in the DLRS-C exposed and DLRS-C unexposed strata separately, and then repeated in the DLRS-M exposed and DLRS-M unexposed strata. In Stage 2, we performed meta-analysis within each stratum, and then evaluated the joint effects of main and interaction effects by calculating the p values for the 2 degrees of freedom (2 df) joint test. Under Stage 2, we considered four different meta-analysis approaches for handling cohort-level missingness of lifestyle components (Naïve, Safe, Complete and Moderator Approaches). We evaluated the performance of the four approaches under four scenarios that create various patterns of missingness among the cohorts.
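
One common way to form a 2 df joint test from stratified results is to sum the squared Z statistics of the two strata and compare the sum with a chi-square distribution with 2 degrees of freedom. The sketch below illustrates that idea; whether it matches the exact computation used in this study is an assumption, as is the independence of the two strata. The function name is hypothetical.

```python
import math

def joint_2df_p(beta_unexposed, se_unexposed, beta_exposed, se_exposed):
    """p value of a 2 df joint test from stratum-specific meta-analysis results.

    Assumes the two stratum estimates are independent, so the sum of their
    squared Z statistics follows a chi-square distribution with 2 df under
    the null hypothesis of no genetic effect in either stratum.
    """
    z_unexposed = beta_unexposed / se_unexposed
    z_exposed = beta_exposed / se_exposed
    chi2 = z_unexposed ** 2 + z_exposed ** 2
    # For 2 df, the chi-square survival function has the closed form exp(-x/2).
    return math.exp(-chi2 / 2.0)
```

A variant with a strong effect in only one stratum can still reach a small joint p value, which is how the joint test picks up loci whose effects differ by exposure.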

Stage 1: Cohort-specific stratified analysis and QC of association results

For Stage 1, each cohort performed four genome-wide association analyses on SBP using the main effect model: two strata (exposed/unexposed) × two LRS (DLRS-C/DLRS-M). Association analyses were implemented using either ProbABEL [28] for studies with unrelated samples or MMAP (https://mmap.github.io/) for studies with family relatedness. Relatedness in families was accounted for using a kinship matrix in the linear mixed model. Each cohort provided robust estimates of the stratum-specific genetic main effect and the corresponding robust standard errors for all four analyses. Cohort-specific details are presented in Supplementary Table S1.

We centrally performed extensive quality control (QC) on each set of cohort-specific association results, which contained ~8–9 million variants, using the R package EasyQC [29]. We restricted the analyses to SNPs with an imputation quality score ≥ 0.5 and a product of imputation quality and minor allele count ≥ 20. Details of central QC can be found in Supplementary Note I.
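
The two central filters can be written as a single predicate. This is only a sketch of the stated thresholds, not the EasyQC configuration itself; it assumes the minor allele count is approximated as 2 × N × MAF within the stratum, and the function and parameter names are hypothetical.

```python
def passes_central_qc(info, maf, n, min_info=0.5, min_info_mac=20.0):
    """Apply the two central QC thresholds to one variant in one stratum.

    info: imputation quality score
    maf:  allele frequency (folded to the minor allele below)
    n:    number of participants in the stratum
    Assumption: minor allele count = 2 * n * minor allele frequency.
    """
    minor_af = min(maf, 1.0 - maf)
    mac = 2.0 * n * minor_af
    return info >= min_info and info * mac >= min_info_mac
```

Because the second filter depends on the stratum sample size, the same variant can pass in a large stratum and fail in a small one.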

Since QC and filtering were performed separately within each stratum, the set of variants remaining in each stratum differed slightly. Thus, we further harmonized the variants between the exposed and unexposed strata within each LRS, to ensure that the set of variants was identical between strata. After QC, the number of variants in each association result was between 5.3 M and 8.2 M.
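
The harmonization step amounts to keeping only the variants present in both strata. A minimal sketch, assuming each stratum's results are held in a dictionary keyed by a variant identifier (the identifiers and the function name below are hypothetical):

```python
def harmonize_strata(exposed, unexposed):
    """Keep only variants that survived QC in both strata.

    exposed, unexposed: dicts mapping variant ID -> association result.
    Returns two dicts with identical key sets.
    """
    shared = exposed.keys() & unexposed.keys()
    return (
        {v: exposed[v] for v in shared},
        {v: unexposed[v] for v in shared},
    )

exposed_results = {"var_a": (0.12, 0.05), "var_b": (0.03, 0.04)}
unexposed_results = {"var_b": (0.01, 0.03), "var_c": (-0.02, 0.06)}
exp_h, unexp_h = harmonize_strata(exposed_results, unexposed_results)
```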

Stage 2: Meta-analysis

After obtaining cohort-specific GWAS results using DLRS-C and DLRS-M, we first performed meta-analyses within each stratum (exposed/unexposed) using the results obtained from analyses using DLRS-C, and considered this set of meta-analyzed results as a “benchmark”, as there is no missing lifestyle component in each cohort’s LRS construction.

Then, to mimic the real-life situation in which some cohorts provide GWAS association results obtained from analyses using DLRS-C (referred to as “Complete” results) while the others can only provide results using DLRS-M (referred to as “Partially Missing” results), we further performed the meta-analyses using a mixture of results obtained from cohort-specific analyses conducted with DLRS-C and DLRS-M. We considered four scenarios with different cohort mixture patterns by changing each cohort’s contribution of lifestyle components, in order to better utilize the data. The setting of each scenario is presented in Table 3. For example, Scenario 1 uses “Complete” results from ARIC, and “Partially Missing” results from HyperGEN, FHS and NEO.

Table 3 Setting of Scenarios 1–4 using a mixture of results obtained from cohort-specific genome-wide association analyses conducted with DLRS-C and DLRS-M, and the inclusion of association results in the meta-analysis using Naïve, Safe, Complete and Moderator Approaches.

As mentioned in the LRS section, cohort-level missingness in lifestyle components will cause misclassification when dichotomizing LRS into exposed and unexposed groups, hence leading to heterogeneity in the unexposed group only. To account for this heterogeneity, we considered four different meta-analysis approaches of utilizing “Complete” and “Partially Missing” results under various scenarios discussed above.

  (1) Naïve Approach. This approach simply takes all association results contributed by each participating cohort, regardless of whether their LRS includes all lifestyle components, for both exposed and unexposed groups.

  (2) Safe Approach. Since heterogeneity only occurs in the unexposed group, it is “safe” to take association results only from cohorts with LRS-C for the unexposed group analysis, while including results from all cohorts, regardless of missing LRS components, for the exposed group analysis.

  (3) Complete Approach. This approach only uses association results from cohorts with “Complete” LRS data in the meta-analysis, for both exposed and unexposed groups.

  (4) Moderator Approach. This approach uses all the data contributed by the cohorts, regardless of their missingness in lifestyle components. It uses the framework of meta-regression and includes moderator terms in the design matrix that indicate which LRS components are missing in each cohort, thereby accounting for missingness during the meta-analysis. Technical details of this approach are available in the Supplementary Method.

Table 3 also shows the inclusion of association results in the meta-analysis using each of the approaches described above under Scenarios 1–4. Note that the different scenarios serve to illustrate different patterns of missingness across the cohorts, and that cohort inclusion varies only for the Safe and Complete Approaches, since both the Naïve and Moderator Approaches use all the cohort-contributed association results in the meta-analysis. Here we take Scenario 1 as an example. For the Naïve Approach, we analyze exposed and unexposed groups separately using “Complete” results from ARIC and “Partially Missing” results from HyperGEN, FHS and NEO, without differentiating between “Complete” and “Partially Missing”. For the Safe Approach, we include ARIC results alone for the unexposed group and ignore the other cohorts’ “Partially Missing” results; for the exposed group, we analyze all four cohorts using “Complete” results from ARIC and “Partially Missing” results from HyperGEN, FHS and NEO. For the Complete Approach, we analyze exposed and unexposed groups separately, but only use “Complete” results from ARIC with no other cohorts included. For the Moderator Approach, we take “Complete” results from ARIC and “Partially Missing” results from HyperGEN, FHS and NEO for both exposed and unexposed groups as input to the meta-regression.
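
The inclusion rules for the four approaches can be summarized in a small function. This sketch encodes only which cohorts enter each stratum-specific analysis, as described above for Scenario 1; it does not implement the meta-regression of the Moderator Approach. The function name and the lower-case approach labels are hypothetical.

```python
def included_cohorts(approach, stratum, complete):
    """List the cohorts whose results enter the analysis of one stratum.

    approach: "naive", "safe", "complete" or "moderator"
    stratum:  "exposed" or "unexposed"
    complete: dict mapping cohort name -> True if it provides "Complete"
              results, False if it provides "Partially Missing" results
    """
    everyone = sorted(complete)
    complete_only = sorted(c for c, is_complete in complete.items() if is_complete)
    if approach in ("naive", "moderator"):
        return everyone
    if approach == "complete":
        return complete_only
    if approach == "safe":
        # Misclassification affects only the unexposed group.
        return everyone if stratum == "exposed" else complete_only
    raise ValueError(f"unknown approach: {approach}")

# Scenario 1: "Complete" results from ARIC only.
scenario_1 = {"ARIC": True, "FHS": False, "HyperGEN": False, "NEO": False}
```

For example, `included_cohorts("safe", "unexposed", scenario_1)` returns only ARIC, whereas the exposed stratum under the same approach uses all four cohorts.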

For the “benchmark” meta-analysis and the first three approaches (Naïve, Safe and Complete), we used METAL [30] to perform meta-analyses within each stratum and EasyStrata [31] to calculate the two degree of freedom joint p values. For the Moderator Approach, we used the Moderator Web App and R code developed by Dr. RJ Waken (https://rjwaken.shinyapps.io/missing_lrs_meta/).
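
For readers unfamiliar with stratum-specific meta-analysis, the following sketch shows a fixed-effect, inverse-variance weighted combination of cohort estimates for a single variant. METAL offers more than one weighting scheme and the text does not state which was used, so the inverse-variance scheme here is an assumption; the function name is hypothetical.

```python
import math

def ivw_meta(estimates):
    """Fixed-effect inverse-variance weighted meta-analysis for one variant.

    estimates: list of (beta, se) pairs, one per contributing cohort.
    Returns the pooled beta and its standard error.
    """
    weights = [1.0 / (se * se) for _, se in estimates]
    total = sum(weights)
    pooled_beta = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled_beta, pooled_se
```

Dropping cohorts, as the Safe and Complete Approaches do for some strata, reduces the total weight and therefore increases the pooled standard error.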

Results

Sample characteristics

Sample characteristics are presented in Supplementary Tables S2, S3, and S4. ARIC had the largest sample size (N = 9426) and HyperGEN had the fewest participants (N = 1249). All cohorts had similar distributions of age, sex and BMI, except that FHS and HyperGEN had a wider age range than ARIC and NEO (Supplementary Table S2). As shown in Supplementary Tables S3 and S4, the exposed group had a slightly higher SBP level than the unexposed group in all four cohorts when exposure was defined using DLRS-C. However, the difference in SBP levels between exposed and unexposed groups was smaller when we defined exposure groups using DLRS-M. The proportion of subjects in the exposed group was smaller when using DLRS-M compared to DLRS-C, indicating potential misclassification.

Results comparison between approaches

Since Scenarios 2–4 presented patterns similar to Scenario 1, both in the comparison with the “benchmark” results and in the within-scenario comparison between approaches, we focus on Scenario 1 in the remainder of this subsection. The detailed comparison results for Scenarios 2–4 are available in Supplementary Figs. S1–S9.

Figure 1 presents the results of the four meta-analysis approaches compared to the “benchmark” results. Among variants that reached the genome-wide significance level (p value < 5 × 10−8), we observed that the Complete Approach yielded much larger p values than the “benchmark” results, and thus has lower statistical power owing to its smaller sample size. The Naïve Approach was able to detect the same set of genome-wide significant variants as the “benchmark” results, but with slightly smaller p values. The Safe and Moderator Approaches led to slightly larger p values than the “benchmark” results. The Q–Q plot (Fig. 2) also shows that the Complete Approach obtained the most deflated p values among the four approaches (λComplete vs benchmark = 0.972). The Safe Approach and Moderator Approach yielded similar, slightly conservative results (λSafe vs benchmark = λModerator vs benchmark = 0.985), while the results of the Naïve Approach were slightly inflated (λNaive vs benchmark = 1.004).

Fig. 1: Scatterplots of comparison of four approaches to “benchmark” results in terms of –log10 (p value).

Each plot shows SNPs with p value < 10−6 for either of the two approaches being compared in the plot. SNPs reaching genome-wide significance (p value < 5 × 10−8) in the “benchmark” results are marked as triangles.

Fig. 2: Q–Q plot of different approaches compared to “benchmark” results for Scenario 1.

Circle: the Complete Approach; triangle: the Moderator Approach; square: the Naïve Approach; plus sign: the Safe Approach. λNaive = 1.004, λSafe = λModerator = 0.985, λComplete = 0.972. The points for the Safe Approach and the Moderator Approach overlap completely. (Scenario 1: “Complete” results from ARIC and “Partially Missing” results from HyperGEN, FHS, and NEO.)

Figure 3 shows the pairwise comparison among the four meta-analysis approaches. The Safe and Moderator Approaches yielded similar but slightly larger p values than the Naïve Approach, and the degree of similarity increased with significance. Notably, the results of the Safe Approach and the Moderator Approach were almost identical, but the number of variants included in the analyses for the Moderator Approach (5,258,666 variants) was much smaller than for the Safe Approach (8,181,669 variants), because the Moderator Approach analysis was restricted to SNPs with association results present in all four cohorts.

Fig. 3: Scatterplots of comparison between four approaches in terms of –log10 (p value).

Each plot shows SNPs with p value < 10−6 for any of the two approaches being compared in the plot.

Result comparison between scenarios for Safe Approach

Here we further evaluated the performance of the same meta-analysis approach under different scenarios. Since we were generally concerned with false-positive results, we focused our attention on the non-inflated Safe Approach. Figure 4 presents the scatterplot of association results between the “benchmark” and the Safe Approach for each of the four scenarios, for variants with p value < 1 × 10−6 in at least one of the compared results. We observed that for SNPs reaching genome-wide significance (p value < 5 × 10−8) in the “benchmark” results, the points for Scenarios 3 and 4 lay almost along the diagonal line, while the points for Scenarios 1 and 2 deviated somewhat from the diagonal. This indicates that the Safe Approach identified positive signals more accurately under Scenarios 3 and 4 than under Scenarios 1 and 2.

Fig. 4: Scatterplots of comparison of four scenarios to “benchmark” results in terms of –log10 (p value) for Safe Approach.

Each plot shows SNPs with p value < 10−6 for either of the two sets of results being compared in the plot. SNPs reaching genome-wide significance (p value < 5 × 10−8) in the “benchmark” results are marked as triangles.

The Q–Q plot (Supplementary Fig. S10) shows that when p values were large (>10−5), Scenario 4, which had less missingness, produced a p value distribution closer to the “benchmark” results (λscenario 4 vs benchmark = 0.991) than Scenario 1 (λscenario 1 vs benchmark = 0.983) and Scenario 3 (λscenario 3 vs benchmark = 0.984). Although Scenario 2 seemed to perform very well for large p values (λscenario 2 vs benchmark = 0.994), it provided substantially deflated results toward the tail, in the genome-wide significance range. Meanwhile, Scenarios 3 and 4 had similar p value distributions, and both were very close to the “benchmark” distribution when p values were small. The p values of Scenario 1 were closer to the diagonal line than those of Scenario 2 when p values were small; this may be due to the sample size of the cohort with “Complete” results in Scenario 1 (ARIC, N = 9426), which was greater than that in Scenario 2 (FHS, N = 7638).

In general, we consider that Scenario 4 performed better than Scenario 3, which in turn performed better than Scenarios 1 and 2. This meets our expectation, as Scenario 4 had the smallest proportion of cohorts using “Partially Missing” results and was therefore expected to bring the most comprehensive information into the meta-analysis.

Discussion

In this study, we evaluated four different strategies for handling the cohort-level missingness of individual lifestyle components in the meta-analysis of gene–lifestyle interaction using LRS-stratified summary statistics from participating cohorts. We aimed to find the best way to leverage the available data while appropriately handling the heterogeneity due to missing data in the LRS, and to further improve the power to identify novel loci for the trait of interest. Using only the data contributed by cohorts without missingness in any lifestyle component (the Complete Approach) has lower statistical power due to the smaller sample size, while freely meta-analyzing all the association results contributed by the cohorts, even those with missing components in the LRS (the Naïve Approach), yields slightly inflated results. The Safe Approach and Moderator Approach are both slightly conservative and their p values are almost identical to each other. We also observed that, as expected, the more cohorts with non-missing lifestyle components we used in the meta-analysis, the more accurate the results. This result confirms our primary hypothesis.

A risk score is a commonly used approach to evaluate the combined effects of risk factors, and it may play an important role in personalized medicine. In the past, the scientific community has proposed several well-known risk scores. For example, the Framingham Risk Score [32] is a sex-specific score used to estimate the 10-year cardiovascular risk, and the diabetes risk score [33] is a screening tool for identifying subjects at high risk for type 2 diabetes. The LRS has also become popular as interest has grown in the clinical implications of the joint effects of individual lifestyle factors on a specific trait, disease, or time-to-event outcome. Meanwhile, the genetic risk score (GRS) has become a widely used tool to improve identification of persons who are at risk for common complex diseases [34, 35].

Some prior studies have combined GRS and LRS to explore their joint behavior on the risk of CVD [19] and colorectal cancer [36]. Specifically, these studies divided study samples into subgroups based on the combination of GRS level and LRS level, and found that, within and across genetic risk groups, adherence to an unfavorable lifestyle was associated with increased risk of disease, and there was no interaction effect between genetic risk and lifestyle risk. This might seem discouraging as to whether genetic information could add much to risk prediction studies using LRS. However, it is important to note that the GRS was calculated based on variants reported from previous standard genome-wide significant analyses without taking their potential modification effects into consideration; variants whose effects may differ by level of LRS might therefore be missed by standard GWAS screening. Moreover, an LRS may have a different modification effect on each variant, so instead of looking at the aggregated GRS only, interaction with one variant at a time should also be evaluated. Our study looked into the combination of genetic and lifestyle information by performing meta-analysis of gene-by-lifestyle interaction in order to find novel loci for complex disease traits, and those potential novel loci may provide additional information for computing a GRS, which could increase the power of previous studies.

Handling missing data in the aggregation of risk factors is challenging, yet important and worth exploring in further detail. Owing to the properties of genetic architecture, a GRS can be computed using imputed or proxy SNPs when the originally reported variants are not available, based on widely available reference panels such as the 1000 Genomes Project [37]. Thus, it is more flexible than an LRS in terms of dealing with missingness. Several methods have been proposed to impute phenotypes using the correlation structure between phenotypes, family structure or information from other cohorts [38,39,40], but these methods rarely deal with the case in which one phenotype is completely unavailable for all the individuals in one particular cohort contributing to a large meta-analysis, which is what we encountered in our study. In a meta-analysis of summary statistics, a previous study [41] dealt with missingness by restricting the study sample to cohorts with at least three out of five lifestyle behaviors available, which greatly reduced sample size and thus power while leaving the issue of heterogeneity unresolved. Our study proposes making the best use of the available data gathered from cohorts to obtain accurate combined effects of risk factors, thereby providing a novel perspective for LRS-based meta-analysis in future research.

Our study examined the Moderator Approach, which is a novel method of accounting for missingness via meta-regression in the gene-by-environment interaction field. Instead of performing stratum-specific meta-analyses and then evaluating the interaction, this approach can achieve the final goal in one step via meta-regression, with the results of both exposure groups as input. However, due to the meta-regression setting, the Moderator Approach requires that the number of cohorts with GWAS results available for a SNP (4 in our study) be greater than the number of predictors divided by two (which is [one main effect + one interaction effect + four missingness effects]/2 = 3 in our study). Therefore, it restricted the analyses to the SNPs present in the GWAS results of all four cohorts, thereby eliminating a large number of SNPs from the analyses and possibly missing true positive signals. In addition, the design matrix of the meta-regression model in the Moderator Approach should be treated with caution because, under some patterns of missingness, the design matrix suffers from multicollinearity and the least squares estimates cannot be obtained. Since the Safe Approach provides results almost identical to those of the Moderator Approach but imposes no restriction on the missingness pattern or on the number of cohorts and predictors, we recommend using the Safe Approach to handle missingness during meta-analysis. Future work could further investigate the Moderator Approach and evaluate the performance of the Safe Approach and Moderator Approach in large-scale meta-analyses.
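
The counting requirement stated above can be made concrete with a short check. The sketch assumes that "divided by two" reflects the two strata (exposed and unexposed) that each cohort contributes as separate observations to the meta-regression; that reading, and the function name, are assumptions. The check covers only the counting condition and says nothing about multicollinearity in the design matrix.

```python
def moderator_feasible(n_cohorts, n_missing_terms=4, n_strata=2):
    """Check the counting condition for the meta-regression at one SNP.

    Predictors: one main effect + one interaction effect + one moderator
    term per lifestyle component that may be missing.
    Condition from the text: n_cohorts > n_predictors / 2.
    """
    n_predictors = 2 + n_missing_terms
    return n_cohorts > n_predictors / n_strata
```

With six predictors, a SNP observed in all four cohorts satisfies the condition (4 > 3), whereas a SNP observed in only three cohorts does not (3 is not greater than 3), which is why the Moderator Approach analysis was limited to SNPs present in all four cohorts.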

Although our general suggestion is to use the Safe Approach, we also provide some implementation suggestions for each of the other approaches. Since the Naïve Approach can be strongly affected by sample misclassification and produces inflated results, it is only applicable when misclassification is of little concern and researchers are more concerned about type II error than type I error. The Complete Approach is underpowered due to the reduction in sample size, so it can be an option if the sample size of the cohorts with “Complete” results is large enough to provide adequate power. The Moderator Approach uses all the available data but was restricted to SNPs present in all four cohorts in our study, so it is also a good choice when the variants available from the cohorts’ association results are consistent.

In gene-by-lifestyle interaction analysis, the power depends on the effect size of the genetic variant, the effect size of the interaction and the minor allele frequency. These factors may lead to differences in power when the approaches are applied to other traits, owing to their potentially different underlying genetic mechanisms. Our work mainly focuses on the inclusion strategies for participating cohorts based on their availability of the risk factors of interest, and uses BP as an illustrative phenotype to demonstrate the concept. Therefore, we do not expect the observed pattern to change in the analysis of other heritable phenotypes. Similarly, we believe that our conclusions may be generalizable to other types of risk scores or environmental variables, where the misclassification issue may occur and affect the performance of the meta-analysis. On the other hand, researchers may be interested in treating the LRS as a continuous variable rather than performing LRS-stratified analyses. In this case, a stringent assumption of a linear interaction effect is made, and the Naïve, Safe and Complete Approaches, which all rely on the fact that misclassification only exists in the unexposed group, will not work. However, the meta-regression framework has the potential to deal with a continuous LRS by incorporating moderator terms indicating the missingness. Further investigations are needed in this regard.

Our study has several important strengths. To our knowledge, this is the first study to explore how to deal with cohort-level missingness in individual lifestyle components in order to improve the power for identifying novel genetic loci for complex disease traits through collaborative meta-analysis. Our study performed thorough comparisons between four meta-analysis approaches via various cohort mixture scenarios, thus providing comprehensive information for investigators to refer to.

Although this study has several strengths as an innovative work on dealing with missingness in gene-by-lifestyle interaction, it has some limitations. Our empirical evidence serves as a first step in exploring the effect of missingness in lifestyle factors and in generating hypotheses; however, systematic experiments are still needed to disentangle the underlying mechanisms, as many uncertain factors cannot be controlled due to the nature of empirical data. For example, it remains unclear whether a stronger effect in the unexposed group may lead to inflated or deflated results through missingness and the resulting misclassification in the unexposed group. Also, although we considered various settings, we were not able to cover every possible pattern, such as alternative missingness assignments in the LRS-M calculation or other cohort mixture scenarios with LRS-C or LRS-M. This kind of design may lose some flexibility and consequently fail to capture all the information during the comparison. Moreover, our study mainly evaluated the performance of the different approaches in terms of joint effects instead of focusing on the interaction effect. We did not manage to capture a clear pattern when comparing the interaction effect between different meta-analysis approaches, due to the small sample size of our study. Comparing the interaction effect itself among the approaches, by incorporating more cohorts, is a worthwhile next step.

In summary, we evaluated four approaches of incorporating the cohort-level missingness of lifestyle components in the meta-analysis of gene-by-lifestyle interaction. Based on our results, we generally recommend using the Safe Approach since it is straightforward to implement and yields non-inflated results. Handling this missingness of individual lifestyle components appropriately can efficiently increase statistical power of gene-by-lifestyle interaction meta-analysis for identifying novel loci of complex traits.