Introduction

Lean body mass consists primarily of skeletal muscle, is an important contributor to physical strength, mobility, stamina, and balance1,2,3, and has been a very recent focus of an effort to define “sarcopenia” (loss of muscle tissue) for clinical care and drug development4. The determinants of adult skeletal muscle mass have not been well characterized. It is known, for example, that exercise produces increases in muscle mass, and there is some evidence that protein intake is directly associated with lean mass5. Heavier people have increased muscle mass6, which may be due to the loading effect of increased fat mass or may reflect a common genetic background between muscle and adipose tissue7. With aging, there is a progressive loss of skeletal muscle mass, and a concurrent increase in fatty infiltration and fibrosis of muscle. This loss of muscle mass may reach a critical point at which functional impairment and even disability occur8. In fact, the annual healthcare costs of sarcopenia in the United States are estimated to be in excess of 18 billion dollars9.

As estimated from family and twin studies, lean mass is a highly heritable phenotype with heritability estimates of 0.52–0.6010, 11. While there have been previous studies related to the genetic background of BMI and fat mass, few studies have searched for genes associated with lean mass12,13,14,15, 16, 17. To date, no single-nucleotide polymorphisms (SNPs) have been found to be associated at a genome-wide significance level with lean mass (p-values < 5 × 10−8)18, 19. A copy number variation located in the GREM1 gene was reported to be associated with lean body mass in a genome-wide association study (GWAS) of 1627 Chinese20. Guo et al.19 identified in 1627 Chinese and replicated in 2286 European ancestry individuals, a locus near CNTF and GLYAT genes at 11q12.1 in a bivariate GWAS for bone-size phenotypes and appendicular lean mass. Most recently in a study of Japanese women, the PRDM16 gene was suggested to be associated with lean mass21.

With the advent of relatively simple, inexpensive methods of measuring the fat and lean compartments of the body using dual energy X-ray absorptiometry (DXA) and bioelectrical impedance analysis (BIA), multiple cohort studies have accumulated phenotypic information on body composition that permits large-scale GWAS to be performed. While whole body lean mass incorporates all of the non-fat soft tissue including the internal organs, appendicular lean mass, estimated by DXA and BIA, may be a better reflection of skeletal muscle mass22,23,24. To identify genetic loci associated with whole body and appendicular lean mass, we performed a large-scale GWAS meta-analysis in over 100,000 participants from 53 studies yielding sufficient power to identify common variants with small to moderate effect sizes.

Results

GWAS meta-analyses for discovery and replication

Descriptions and characteristics of the study populations in the discovery stage and the replication stage are shown in Supplementary Tables 4 and 5 and Supplementary Note 2.8. The age of the participants ranged from 18 to 100 years. In the GWAS discovery set, comprising 38,292 participants for whole body lean mass and 28,330 participants for appendicular lean mass, a substantial excess of low p-values compared to the null distribution was observed after genomic control adjustment of the individual studies prior to meta-analysis: λGC = 1.076 and λGC = 1.075 for whole body and appendicular lean mass, respectively (Supplementary Fig. 1).

Meta-analyses were conducted using the METAL package (www.sph.umich.edu/csg/abecasis/metal/). We used the inverse variance weighting and fixed-effect model approach. Supplementary Table 1 shows the genome-wide significant (GWS) and suggestive (sGWS) results in the discovery set. For whole body lean mass, we observed one GWS result in/near HSD17B11 and 12 sGWS results (in/near VCAN, ADAMTSL3, IRS1, FTO (two SNPs), MOV10, HMCN1, RHOC, FRK, AKR1B1, CALCR, and KLF12). For appendicular lean mass, one result was GWS (intronic SNP in PKIB) and seven were sGWS (in/near VCAN, ADAMTSL3, HSD17B11, IRS1, FRK, TXN, and CTNNA3).

We selected 21 associations (13 for whole body lean mass and 8 for appendicular lean mass; a total of 16 discovery SNPs with 5 SNPs overlapping between the two phenotypes) (Supplementary Table 1) to conduct a replication study in a set of 33 cohorts comprising up to 48,125 participants of European descent for whole body lean mass and 43,258 participants for appendicular lean mass. Both in silico replication and de novo genotyping for replication were conducted. Table 1 shows the results for successfully replicated SNPs in participants of European ancestry, including the discovery phase, replication phase, and the combined results.

Table 1 Results for the successfully replicated SNPs in discovery, replication and combined sample

For whole body lean mass, joint analysis of the discovery and replication cohorts successfully replicated five SNPs in/near HSD17B11, VCAN, ADAMTSL3, IRS1, and FTO (p-values between 1.4 × 10−8 and 1.5 × 10−11 and lower than discovery p-values). Three of these five SNPs (located in/near VCAN, ADAMTSL3, and IRS1) were also successfully replicated (p-values between 5 × 10−8 and 2.9 × 10−10) for appendicular lean mass. None of the eight replicated associations (five for whole body and three for appendicular lean mass) had significant heterogeneity at α = 0.00625 (0.05/8, Bonferroni-corrected for eight tests). Only mild heterogeneity was indicated in two whole body lean mass SNPs when using an uncorrected threshold of α = 0.05: FTO (p = 0.018, I² = 34%) and HSD17B11 (p = 0.04, I² = 31%). For appendicular lean mass, p-values for heterogeneity were >0.05 for all three replicated SNPs.

Supplementary Table 1 shows the results for all participants including those of non-European descent as well. Results were similar showing low heterogeneity using “transethnic meta-analysis” (MANTRA)25 for inclusion of both cohorts of European ancestry and replication cohorts with Asians or African Americans. Heterogeneity probability values were below 0.5 for all replicated SNPs both for whole body and appendicular lean mass. Furthermore, in this combined analysis, except for the VCAN locus, log10 Bayes’ factors were >6.0 and p-values were smaller than those found in European-only ancestry analysis.

Additional analyses stratified by sex failed to identify significant sex-specific associations or evidence of an interaction between SNPs and sex (Supplementary Table 2 and Supplementary Note 1). Similarly, we also found no evidence for heterogeneity between measurement techniques (BIA vs DXA; Supplementary Table 3 and Supplementary Note 2). Finally, we failed to replicate previously reported candidate genes for lean mass (Supplementary Note 2.4).

Association with other anthropometric phenotypes

We looked for associations between the lead SNPs in the five replicated loci and other reported anthropometric phenotypes from the GIANT Consortium (Supplementary Table 10)26,27,28,29. There were no significant associations (p < 0.05) between the SNP in HSD17B11 and any reported phenotypes. The allele associated with greater lean mass was associated with lower values of various anthropometric traits for the IRS1 (hip circumference (HC), waist circumference (WC)), VCAN (hip ratio, WC adjusted for BMI, waist to hip ratio adjusted for BMI), and the ADAMTSL3 locus (height, hip, and WC with the association becoming more significant for hip and waist adjusted for BMI). The replicated FTO SNP was very significantly associated with BMI (p = 2.7 × 10−144) and significantly associated with HC (p = 9.3 × 10−81) and WC (p = 1.3 × 10−96) in the same direction as the lean mass association (i.e., the higher lean mass allele was associated with higher values of anthropometric traits).

Annotation and enrichment analysis of regulatory elements

Among the five replicated SNPs, rs9991501 (HSD17B11 locus) and rs2287926 (VCAN locus) are missense SNPs. rs2287926 was predicted as possibly damaging to the protein structure and/or function by PolyPhen-230. Since the remaining GWAS SNPs are non-coding, to estimate whether these SNPs are located in regulatory elements in specific human tissue/cell types, we performed a tissue-specific regulatory-element enrichment analysis using experimental epigenetic evidence including DNAse hypersensitive sites, histone modifications, and transcription factor-binding sites in human cell lines and tissues from the ENCODE Project and the Epigenetic Roadmap Project. As shown in Table 2, 86 SNPs were in high LD (r² ≥ 0.8) with the GWAS lead SNP rs2943656 at the IRS1 locus. The SNPs in this locus that were in high LD were enriched in enhancers estimated by ChromHMM31 (permutation p-values <0.05; after multiple testing corrections), especially enriched in fat and brain tissues, but not in skeletal muscle, smooth muscle, blood and gastrointestinal tract tissues in the ENCODE and Roadmap projects. Although the IRS1 locus was not specifically enriched with skeletal muscle tissue enhancers, the lead SNP rs2943656 itself at the IRS1 locus was actually located within a histone mark-identified promoter in an adult skeletal muscle sample and a histone mark-identified enhancer in several muscle samples (such as muscle satellite cultured cells, fetal skeletal muscle, skeletal muscle myoblasts, and skeletal muscle myotubes (Supplementary Figs. 6 and 7)). Based on the position weight matrices (PWMs) score from ChIP-seq and other sequencing resources, rs2943656 was found to possibly alter regulatory motifs, including Irf, Foxo, Sox, and Zfp105 in skeletal muscle tissues with a PWM score p-value <4⁻⁷ (≈6.1 × 10−5)32.

Table 2 Tissue-specific regulatory-element enrichment analyses of the GWAS loci (GWAS SNPs and SNPs in LD with the GWAS SNPs)

For the ADAMTSL3 locus, the GWAS lead SNP rs4842924 was located in a histone mark-identified enhancer in smooth muscle tissues (Supplementary Figs. 11 and 12). The promoter/enhancer enrichment analysis in the ADAMTSL3 locus showed significant enrichments in skeletal muscle, smooth muscle, fat and brain tissues, but not in blood and gastrointestinal tract tissues. At the FTO locus, a few SNPs in high LD with the GWAS lead SNP, rs9936385, were located within a group of enhancers that are not muscle tissue-specific (Supplementary Fig. 13); therefore, no significant enrichment was found in any tissues listed in Table 2.

Expression quantitative trait loci

We queried existing cis-expression quantitative trait loci (eQTL) analyses on the five replicated GWAS SNPs rs2943656, rs9991501, rs2287926, rs4842924, and rs9936385 with transcripts within 2 Mb of the SNP position in skeletal muscle tissues as well as in subcutaneous adipose, omental adipose, liver tissue, lymphocytes, and primary osteoblasts (obtained from bone biopsies). SNP rs9936385 was associated with FTO expression in skeletal muscle tissues in the FUSION samples (p = 4.4 × 10−11) (Supplementary Table 10). However, in sequential conditional analysis, upon addition of the lead FTO eQTL SNP (rs11649091, association p-value with FTO gene expression = 5.1 × 10−15), rs9936385 was no longer significantly associated (p = 0.16), whereas rs11649091 remained significantly associated with FTO gene expression (p = 1.9 × 10−5). SNP rs11649091 could not be imputed using our HapMap imputation references; thus, we did not have association results between rs11649091 and lean mass in the present study. The T allele of rs9936385, associated with reduced lean mass in the present study, was significantly associated with lower FTO gene expression levels in the GTEx project (Supplementary Table 9 and Supplementary Fig. 14). For rs9936385, we also examined IRX3 and IRX5 expression in skeletal muscle tissues, as recent reports have implicated GWAS SNPs associated with obesity in intron 2 of the FTO gene as being associated with IRX3 and IRX5 gene expression in brain33 and adipose tissue34. SNP rs9936385 was not significantly associated with IRX3 and IRX5 gene expression in skeletal muscle tissues. For missense SNP rs9991501 in the HSD17B11 locus, a significant association with HSD17B11 gene expression was found in skeletal muscle from the GTEx project (p = 1.4 × 10−4). There were no significant GWAS SNPs in strong LD with this one; thus, conditional analyses were not performed.

As shown in Supplementary Table 10, for GWAS lead SNP rs2943656 in the IRS1 locus, significant eQTLs with the IRS1 gene expression in omental (p = 4 × 10−7) and subcutaneous fat tissue (p = 6.44 × 10−6) were found.

Finally, we found no evidence for differential expression of our five replicated genes in young vs old muscle biopsies (see Supplementary Note 2.7 for methods and results).

Discussion

In this first large-scale GWA meta-analysis study for lean mass that included most of the cohorts worldwide with lean mass phenotypes, we identified and successfully replicated five GWS loci (in/near HSD17B11, VCAN, ADAMTSL3, IRS1, and FTO genes) for whole body lean mass and three of these (in/near VCAN, IRS1, and ADAMTSL3 genes) for appendicular lean mass, both important for sarcopenia diagnosis.

This study contributes to a better understanding of the biology underlying inter-individual variation in muscle mass, since lean body mass consists primarily of muscle mass (especially in the extremities). Genetic determinants of lean body mass cannot be studied specifically by using anthropometric measures such as height, waist circumference, hip circumferences, or BMI, as evidenced by our finding of associations between genetic loci and lean mass that were not observed in results from the GIANT consortium26,27,28,29. Four novel GWS loci for lean mass phenotypes harboring ADAMTSL3, VCAN, HSD17B11, and IRS1 genes have biologic effects supporting their role in skeletal muscle. Although the functional involvement of the ADAMTSL3 gene (a disintegrin-like and metalloprotease domain with thrombospondin type I motifs-like 3) remains unknown, it has been shown to be consistently associated with adult height in large human samples26, including individuals of African ancestry35. The gene is expressed ubiquitously, including in skeletal muscle but the lead GWAS SNP in this locus was not significantly associated with expression of ADAMTSL3 in any of the tissues examined. One might hypothesize that our observed association between this gene and lean mass could be a reflection of an allometric relationship between muscle mass/size and body size36. Since our analyses were adjusted for height, it is plausible that variation in ADAMTSL3 is associated with muscle mass directly. In fact, in a recent study that identified classes of potentially regulatory genomic elements that are enriched in GWAS loci, the height phenotype was almost exclusively enriched in DNAse sites in muscle37. In our study, we observed enrichment of regulatory genomic elements in this locus in both skeletal muscle and smooth muscle, which may not fully support a unique role of this gene in only skeletal muscle tissue.

Versican (VCAN) plays a role in intercellular signaling and in connecting cells with the extracellular matrix, which also is important for the skeletal muscle. Interestingly, VCAN facilitates chondrocyte differentiation and regulates joint morphogenesis38, and therefore has a wider role in the musculoskeletal health. The GWAS SNP in the HSD17B11 locus was significantly associated with the HSD17B11 gene expression in skeletal muscle. Hydroxysteroid (17-beta) dehydrogenase 11 (HSD17B11) functions in steroid biosynthesis, and most relevant to muscle tissue, contributes to androgen catabolic processes. Although not much is known about how variants in this gene associate with sex-hormone related human phenotypes, androgen metabolism is certainly a driver of muscle tissue anabolism and catabolism.

As for the FTO locus, variants in the FTO gene, which is known to regulate postnatal growth in mice39, have been found to be associated with adiposity/obesity40 and related traits, such as BMI41, metabolic syndrome/type 2 diabetes and even with menarche42. Apart from its role in adiposity, in two recent candidate gene studies, SNPs in FTO were found to associate both with DXA-derived fat and lean mass43, and in one, the association for lean mass was only slightly attenuated after fat mass adjustment44, similar to our results using fat mass-adjusted lean mass. Also, FTO knockout mice have been shown to have not only reduced fat mass, but also decrease in lean mass45. Recently, obesity associated non-coding sequences in the FTO gene were found to be functionally connected to the nearby gene, IRX3, by directly interacting with the promoters of this gene33, suggesting obesity SNPs located inside the FTO gene may regulate gene expression other than FTO. Our GWAS lead SNP rs9936385 in the FTO locus was in LD with the obesity GWAS SNP and located in the same haplotype. A denser fine-mapping study using sequencing of the FTO locus with a better resolution will be helpful in narrowing down the FTO region to identify potential causal variant(s). The actual function of the variants and the underlying mechanisms of FTO’s involvement in skeletal muscle biology still need to be further elucidated by in vitro and animal experiments.

The other body composition-related gene that was successfully replicated is the insulin receptor substrate 1 (IRS1), which belongs to the insulin signaling pathway and participates in growth hormone and adipocytokine signaling pathways. Besides being overexpressed in adipocytes, IRS1 is also highly expressed in skeletal muscle13. IRS1 polymorphisms have been associated with fasting insulin-related traits46, adiposity13, and serum triglycerides/HDL cholesterol47. Our GWAS lead SNP rs2943656 was associated with IRS1 gene expression in skeletal muscle obtained from the GTEx project. Interestingly, another SNP, rs2943650, near IRS1 in high LD with our lead SNP (r² = 0.854) was previously found to be associated with percent body fat in the opposite direction from the association that we find with lean mass13. The body fat percentage decreasing allele in this study was associated with lower IRS1 expression in omental and subcutaneous fat13. Another study reported an association between a SNP in high LD (rs2943641) with IRS-1 protein expression and insulin-induced phosphatidylinositol 3-OH kinase activity in skeletal muscle13. Also, rs2943656 was associated with obesity traits in the GIANT Consortium; the allele associated with greater lean mass was inversely associated with obesity traits. Further understanding of the potential functional effects of these variants in the IRS1 locus is needed to determine whether they have pleiotropic and opposite effects on fat and lean tissue. The same is true for variants in the FTO locus for which the allele that was associated with greater lean mass was previously reported to be associated with greater fat mass.

From the current eQTL data analysis, we have no definitive evidence that the non-coding GWAS SNPs (or variants in high LD) functionally influence gene expression of IRS1, HSD17B11, and FTO in skeletal muscle. Use of a larger reference panel for imputation in the GWAS sample, and analysis of larger tissue expression data sets coupled with conditional analysis will help reveal if underlying functional associations exist. It should be emphasized that with the current data we are not able to determine which SNPs may be functional and it is possible that the identified lean mass variants are not driving the associations with expression. Using ENCODE and Epigenetic Roadmap data, we found that the GWAS SNPs (or SNPs in LD with these SNPs) were significantly enriched in the predicted gene regulatory regions (in our case, enhancers) not only in skeletal muscle, but also in smooth muscle, fat and brain tissues. With the complexity of lean mass phenotypes, we cannot rule out the possibility that these genes are involved in regulating lean mass biology via tissues other than skeletal muscle.

Using results from the overall meta-analysis, the percent variance explained by the successfully replicated SNPs was 0.23% and 0.16% for whole body and appendicular lean mass, respectively. Estimates were slightly higher when we used individual level data from the Framingham Study cohorts (percent variance explained of 0.97% and 0.55% for whole body and appendicular lean mass, respectively). This relatively small percentage of explained variance is not dissimilar to other body composition measures such as bone mineral density of the femoral neck, for which 63 SNPs explained 5.8% of the variance48. To estimate the proportion of variance for lean mass explained by all genotyped SNPs across the genome in Framingham Study participants, we applied a GREML model implemented in the GCTA package49 with the assumption that all 550K genotyped SNPs captured ≥80% of the common sequence variance in the Framingham Study participants who are Caucasians of European ancestry. We estimated that the proportion of whole body lean mass and appendicular lean mass variance explained by all genotyped SNPs together was 43.3% (SE: 2.7%) and 44.2% (SE: 2.6%), respectively (after adjustment for age, age², sex, height and fat mass), suggesting most of the heritability of lean mass was not detected in the current study due to the limitations of study design (only common variants in this study). This percent variance is similar to the 45% of the variance explained for height using this method49.

Because of the substantial sexual dimorphism in body composition with men having higher muscle mass compared with women24, we performed both sex-combined and sex-stratified analyses. We examined potential sex differences in the genetic associations. Lean mass is a highly heritable trait and the heritability is of similar magnitude in both genders24, 50. No formal interaction test between SNP and sex was significant for any of these SNPs. Thus, our findings do not support any substantial sex-specific genetic influence for any of the successfully replicated lean mass SNPs. We cannot rule out the possibility that the reported GWS SNPs are false-positive findings, although the chance of such false positives is extremely low due to the robustness of replication in our well-powered study. There are also other limitations to our study. Because lean mass is correlated with fat mass, we adjusted for fat mass to focus our search on genes contributing to lean mass independent of those regulating fat mass. A potential limitation with this strategy of adjusting for fat mass is that the power to identify genetic signals with a similar impact on lean and fat mass will be reduced. Nevertheless, the FTO signal was found to be significantly associated with lean mass after fat adjustment and the direction of this association was the same as the association with fat mass. Since androgens have a major impact on muscle mass, it is a limitation of the present study that the X chromosome, harboring the androgen receptor gene, was not included in the present meta-analysis. Another potential weakness of this study is our decision to meta-analyze body composition results using two different techniques (BIA and DXA). Nevertheless, the two methods are highly correlated (r = 0.83 for Framingham cohort participants), and by combining them power to detect GWS loci was greatly enhanced.

In conclusion, in this first large-scale meta-analysis of GWAS, five GWS variants in or near the HSD17B11, VCAN, ADAMTSL3, IRS1, and FTO genes were found to be robustly associated with lean body mass. Three of these loci were found to be significantly enriched in enhancers and promoters in muscle cells, suggesting that our signals have a potential functional role in muscle. Our findings shed light on pathophysiological mechanisms underlying lean mass variation and potential complex interrelations between the genetic architecture of muscle mass, fat mass, body height and metabolic disease.

Methods

Study summary

We focused on two phenotypes: (1) whole body lean mass; (2) appendicular lean mass. For these two phenotypes, we performed a genome-wide meta-analysis of the discovery cohorts (Stage I), then meta-analyzed the genotyped discovery SNPs in replication cohorts (Stage II), followed by a combined analysis with discovery and replication cohorts (Supplementary Fig. 1). The total sample size for the combined analysis was just over 100,000 from 53 studies. Our primary results are based on the ~85,000 individuals of European descent in 47 studies, as was pre-specified. Because whole body and appendicular lean mass are correlated with fat mass and height, analyses were adjusted for these potential confounders in addition to sex, age, age², and other study-specific covariates to focus our search on genes contributing to lean mass independent of those of body height and fat mass.

Study population

The Stage I Discovery sample comprised 38,292 individuals of European ancestry drawn from 20 cohorts with a variety of epidemiological designs and participant characteristics (Supplementary Tables 4–6 and Supplementary Note 2.8). Whole body lean mass was measured using DXA (10 cohorts, n = 21,074) and BIA (10 cohorts, n = 17,218). Of the 20 cohorts, 15 consisted of male and female subjects, while 2 had male and 3 had female subjects only. In total, the cohorts included 22,705 women and 15,587 men. Appendicular lean mass was estimated in 28,330 subjects from a subset of 15 cohorts (9 using DXA and 6 using BIA).

Subjects from 33 additional studies were used for replication with a total sample size of 63,475 individuals. Of these 63,475, the majority was of European ancestry (n = 47,227 in 27 cohorts), and the remaining 16,248 were of African American, South Asian, or Korean ancestry (Supplementary Table 4). All these 33 cohorts had data for whole body lean mass. Among them, 16 studies had DXA measurements (n = 23,718) and 17 studies had BIA measurements (n = 39,757). Twenty-five cohorts had data for appendicular lean mass in a total of 45,090 individuals (16 cohorts with DXA (n = 23,718) and 9 with BIA (n = 21,372)). Of these, 42,360 individuals (23 cohorts) were of European ancestry and the remaining 2730 of African American and Korean descent. Our a priori aim was to perform replication in cohorts with European subjects only and to explore if adding non-European cohorts would increase power or show evidence of heterogeneity due to ethnicity. The Stage II Replication included cohorts with existing GWAS data that were unavailable at the time of the Stage I Discovery, and cohorts who agreed to undergo de novo genotyping. All studies were approved by their institutional ethics review committees and all participants provided written informed consent.

Lean mass measurements

Lean mass was measured in all cohorts using either DXA or BIA. DXA provides body composition as three materials based on specific X-ray attenuation properties: bone mineral, lipid (triglycerides, phospholipid membranes, etc.), and lipid-free soft tissue. For each pixel on the DXA scan, these three materials are quantified. For the cohorts with DXA measures, the phenotype used for these analyses was the lipid-free soft tissue compartment that is referred to as lean mass, and is the sum of body water, protein, glycerol, and soft tissue mineral mass. Two lean mass phenotypes were used: whole body lean mass and appendicular lean mass. The latter was obtained by considering only pixels in the arms and legs collectively, which has been demonstrated to be a valid measure of skeletal muscle mass51.

Some of the cohorts estimated body composition using BIA, which relies on the geometrical relationship between impedance (Z), length (L), and volume (V) of an electrical conductor. Adapted to the human body, V corresponds to the volume of fat-free mass (FFM) and L to the height of the subject. Z is composed of the pure resistance (R) of the conductor, the FFM, and the reactance (Xc), produced by the capacitance of cellular membranes, tissue interfaces and non-ionic tissues: Z² = R² + Xc². A variety of BIA machines were used by the various cohorts (summarized in Supplementary Table 5), and in some cohorts, the specific resistance and reactance measures were not available because the manufacturers provided only summary output on FFM. For BIA cohorts with specific resistance and reactance measures, we used the validated equation from Kyle et al.52 with an R² of 0.95 between BIA and DXA to calculate the appendicular lean mass.
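As a small worked illustration of the impedance relationship above, the following Python sketch computes the magnitude of Z from resistance and reactance. The function name and the example values are hypothetical and are not taken from any cohort; the prediction equation of Kyle et al. is not reproduced here.

```python
import math


def impedance_magnitude(resistance_ohm, reactance_ohm):
    """Return Z from Z^2 = R^2 + Xc^2 (all quantities in ohms)."""
    return math.hypot(resistance_ohm, reactance_ohm)


# Illustrative values only (not study data): R = 500 ohm, Xc = 60 ohm.
z = impedance_magnitude(500.0, 60.0)
print(f"Z = {z:.1f} ohm")
```

Because reactance is usually much smaller than resistance in this kind of measurement, Z is expected to lie only slightly above R.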

Stage 1: genome-wide association analyses in discovery cohorts

Genotyping and imputation

Genome-wide genotyping was done by each study on a variety of platforms following standard manufacturer protocols. Quality control was performed independently for each study. To facilitate meta-analysis, each group performed genotype imputation with IMPUTE53 or MACH54 software using HapMap Phase II release 22 reference panels (CEU or CHB/JPT as appropriate). Overall imputation quality scores for each SNP were obtained from IMPUTE (“proper_info”) or MACH (“rsq_hat”). Details on the genotyping platform used, genotype quality control procedures and software for imputation employed for each study are presented in Supplementary Table 6.

Study-specific genome-wide association analyses with lean mass

In each study, a multiple linear regression model with additive genetic effect was applied to test for phenotype–genotype association using ~2.0 to 2.5 million genotyped and/or imputed autosomal SNPs. Other covariates adjusted in the model included ancestral genetic background, sex, age, age², height, fat mass measured by the body composition device (kg) and study-specific covariates when appropriate such as clinical center for multi-center cohorts. Adjustment for ancestral background was done within cohorts using principal component analyses as necessary. Furthermore, for family-based studies, including the Framingham Study, ERF, UK-Twins, Old Order Amish Study and the Indiana cohort, familial relatedness was taken into account in the statistical analysis within their cohorts by: (1) linear mixed-effects models that specified fixed genotypic and covariate effects and a random polygenic effect to account for familial correlations (the R Kinship package; http://cran.r-project.org/web/packages/) in the Framingham Study; (2) the GenABEL package55 in the ERF and UK-Twins cohorts; (3) the Mixed Model Analysis for Pedigrees (MMAP) program (http://edn.som.umaryland.edu/mmap/index.php) in the Amish cohort; and (4) GWAF, an R package for genome-wide association analyses with family data, in the Indiana cohort56.
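The additive model described above can be sketched in a few lines of Python. This is only a minimal illustration for unrelated individuals, assuming ordinary least squares on an allele dosage (0 to 2) plus covariates; it is not the software used by the cohorts, it returns coefficients only (no standard errors or p-values), and all function names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes a is square and non-singular (no collinear covariates).
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - rest) / m[i][i]
    return x


def fit_additive_model(dosages, covariates, phenotype):
    """Least-squares fit of phenotype ~ intercept + dosage + covariates.

    dosages: allele dosage per individual (0..2, may be fractional if imputed).
    covariates: one list per individual (e.g. sex, age, age^2, height, fat mass).
    Returns [intercept, beta_snp, beta_cov1, ...].
    """
    design = [[1.0, d] + list(c) for d, c in zip(dosages, covariates)]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, phenotype)) for i in range(p)]
    return solve_linear_system(xtx, xty)
```

The coefficient at index 1 is the per-allele effect on lean mass after adjustment for the listed covariates. Family-based cohorts need the mixed-model approaches named in the text, which this sketch does not cover.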

Meta-analyses

Meta-analyses were conducted using the METAL package (www.sph.umich.edu/csg/abecasis/metal/). We used the inverse variance weighting and fixed-effect model approach. Prior to meta-analysis, we filtered out SNPs with low minor allele frequency (MAF < 1%) and poor imputation quality (proper_info < 0.4 for IMPUTE and rsq_hat < 0.3 for MACH) and applied genomic control correction where the genomic control parameter lambda (λGC) was >1.0.
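The filtering, genomic control, and inverse-variance pooling steps can be illustrated with the Python sketch below. It is a simplified stand-in for what METAL does, not its implementation. The function names are hypothetical, and two choices are assumptions rather than details from the text: λGC is estimated as the median squared z-score divided by the median of a 1-df chi-square distribution, and the correction is applied by inflating standard errors by the square root of λGC.

```python
import math
import statistics

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df


def keep_snp(maf, imputation_quality, software):
    """Apply the MAF and imputation-quality filters described in the text."""
    min_quality = {"IMPUTE": 0.4, "MACH": 0.3}[software]
    return maf >= 0.01 and imputation_quality >= min_quality


def genomic_control_lambda(z_scores):
    """Estimate lambda_GC as median(z^2) / median of chi-square(1 df)."""
    return statistics.median([z * z for z in z_scores]) / CHI2_1DF_MEDIAN


def apply_genomic_control(se, lam):
    """Inflate a standard error by sqrt(lambda) only when lambda > 1."""
    return se * math.sqrt(lam) if lam > 1.0 else se


def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed-effect estimate across studies.

    Returns (pooled beta, pooled standard error, two-sided p-value).
    """
    weights = [1.0 / (se * se) for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    p_value = math.erfc(abs(beta / se) / math.sqrt(2.0))
    return beta, se, p_value
```

In this scheme, studies with smaller standard errors (typically the larger cohorts) contribute more to the pooled estimate.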

We used quantile–quantile (Q–Q) plots of observed vs. expected –log10 (p-value) to examine the genome-wide distribution of p-values for signs of excessive false-positive results. We generated Manhattan plots to report genome-wide p-values, regional plots for genomic regions within 100 Kb of top hits, and forest plots for meta-analyses and study-specific results of the most significant SNP associations. A threshold of p < 5 × 10−8 was pre-specified as being genome-wide significant (GWS), while a threshold of p < 2.3 × 10−6 was used to select SNPs for a replication study (suggestive genome-wide significant, sGWS).

Stage 2: replication

In each GWS or sGWS locus, we selected the lead SNP with the lowest p-value for replication. In addition, GWS or sGWS SNPs that had low linkage disequilibrium with the lead SNPs (LD < 0.5) were also selected for replication. Both in silico replication and de novo genotyping for replication were conducted. In silico replication was done in 24 cohorts with GWAS SNP chip genotyping that did not have data available at the time of the initial discovery efforts (Supplementary Table 7). De novo replication genotyping was done using: KBioScience Allele-Specific Polymorphism (KASP) SNP genotyping system (in OPRA, PEAK25, AGES, CAIFOS, DOPS cohorts), TaqMan (METSIM), Illumina OmniExpress + Illumina Metabochip (PIVUS and ULSAM), or Sequenom’s iPLEX (WHI) (Supplementary Table 8). Samples and SNPs that did not meet the quality control criteria defined by each individual study were excluded. Minimum genotyping quality-control criteria were defined as: SNP call rate > 90% and Hardy–Weinberg equilibrium p > 1 × 10−4.
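The minimum quality-control criteria above can be expressed as a short Python check. The text does not state which Hardy–Weinberg test each study used, so the 1-df chi-square goodness-of-fit test below is an assumption chosen for simplicity (exact tests are often preferred for rare alleles); the function names are hypothetical.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) test of Hardy-Weinberg equilibrium from genotype counts."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic SNP: the test is not informative
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    observed = [n_aa, n_ab, n_bb]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_minimum_qc(call_rate, hwe_p):
    """Minimum criteria from the text: call rate > 90% and HWE p > 1e-4."""
    return call_rate > 0.90 and hwe_p > 1e-4
```

A SNP with genotype counts of 25, 50 and 25 matches Hardy–Weinberg proportions exactly and passes, whereas a SNP with no heterozygotes among 100 samples (50, 0, 50) fails the equilibrium criterion.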

Meta-analysis of replication and discovery studies

In the replication stage, we meta-analyzed results from: (1) individuals of European descent only (Rep-EUR); and (2) all replication cohorts with multiple ethnicities (Rep-All). Likewise, we meta-analyzed results from discovery cohorts and European-descent-only replication cohorts (“Combined EUR”), and from discovery cohorts and all replication cohorts (“Combined All”). To investigate and account for potential heterogeneity in allelic effects between studies, we also performed “trans-ethnic meta-analysis” using MANTRA25 in the replication sample that included all ethnic groups (“Rep-All”) and in the combined analysis of the discovery and all ethnic groups in the replication sample (“Combined All”).

Replication was considered successful if: (1) the association p-value in the cumulative meta-analysis (Combined EUR) was genome-wide significant (p < 5 × 10−8) and less than the discovery meta-analysis p-value; or (2) the association p-value in the meta-analysis of replication cohorts only (Rep-EUR) was less than p = 0.0024 (a Bonferroni-adjusted threshold of p = 0.05/21, since a total of 21 tests were performed for whole body and appendicular lean mass in Rep-EUR cohorts during replication). Using the METAL package, we also estimated I² to quantify heterogeneity and p-values to assess statistical significance for a total of eight associations that were replicated in the cumulative meta-analysis (Combined EUR; five SNPs for whole body and three for appendicular lean mass).
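The two replication criteria and the heterogeneity statistic can be written out as a short sketch. The function names are hypothetical, and I² is computed here from Cochran's Q in the usual way, I² = max(0, (Q − df)/Q) × 100%, which is assumed to match the statistic reported by METAL.

```python
GWS = 5e-8
N_TESTS = 21
BONFERRONI = 0.05 / N_TESTS        # approximately 0.0024

def is_replicated(p_discovery, p_combined_eur, p_rep_eur):
    """Return True if either replication criterion is met."""
    criterion_1 = p_combined_eur < GWS and p_combined_eur < p_discovery
    criterion_2 = p_rep_eur < BONFERRONI
    return criterion_1 or criterion_2

def i_squared(betas, ses):
    """Percentage of variation across studies attributable to heterogeneity."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))  # Cochran's Q
    df = len(betas) - 1
    if q <= 0 or df < 1:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0
```

Note that criterion (1) requires the combined p-value to improve on the discovery p-value, so a locus that is genome-wide significant only because of the discovery cohorts does not qualify.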

To estimate the phenotypic variance explained by the genotyped SNPs in the Framingham Heart Study (FHS), we used a restricted maximum likelihood model implemented in the GCTA (Genome-wide Complex Trait Analysis) tool package57, 58 and adjusted for the same set of covariates included in our GWAS.

Finally, we examined associations between lean mass and all imputed SNPs in or near five genes (THRH, GLYAT, GREM1, CNTF, and PRDM16, including 60 kb upstream and downstream of each gene), as these genes have been implicated in lean mass in previous association studies18,19,20.
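Selecting SNPs in or near a candidate gene amounts to a positional window test, sketched below. The data layout, function names, and the example gene and coordinates used in any call are hypothetical placeholders, not values from this study.

```python
FLANK_BP = 60_000   # 60 kb upstream and downstream

def in_gene_window(snp_chrom, snp_pos, gene_chrom, gene_start, gene_end,
                   flank=FLANK_BP):
    """True if the SNP lies within the gene body extended by the flank."""
    return (snp_chrom == gene_chrom
            and gene_start - flank <= snp_pos <= gene_end + flank)

def snps_near_gene(snps, gene, flank=FLANK_BP):
    """Filter (rsid, chrom, pos) tuples to those inside the gene window.

    `gene` is a (chrom, start, end) tuple.
    """
    chrom, start, end = gene
    return [rsid for rsid, s_chrom, s_pos in snps
            if in_gene_window(s_chrom, s_pos, chrom, start, end, flank)]
```

For a placeholder gene spanning positions 1,000,000 to 1,050,000, the window covers positions 940,000 to 1,110,000 on the same chromosome.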

Annotation and enrichment analysis of regulatory elements

We predicted the function of coding variants using PolyPhen-2. For all variants, we annotated potential regulatory functions of our replicated GWAS SNPs and loci based on experimental epigenetic evidence, including DNase hypersensitive sites, histone modifications, and transcription factor-binding sites in human cell lines and tissues from the ENCODE Project and the Epigenetic Roadmap Project. We first selected SNPs in high LD (r² ≥ 0.8) with GWAS lead SNPs based on the approach of Trynka et al.59 We then identified potential enhancers and promoters in the GWAS loci (GWAS SNPs and SNPs in LD with the GWAS SNPs) across 127 healthy human tissues/normal cell lines available in the ENCODE Project and the Epigenetic Roadmap Project from the HaploReg4 web browser60 using ChromHMM31. To evaluate whether replicated GWAS loci were enriched with regulatory elements in skeletal muscle tissue, we performed a hypergeometric test. Specifically, we tested whether estimated tissue-specific promoters and enhancers in a GWAS locus were enriched in eight relevant skeletal muscle tissues/cell lines vs. non-skeletal muscle tissues (119 tissues/cell lines). A minimum-p-value permutation approach was used to correct for multiple testing, and permutation p-values < 0.05 were considered statistically significant. In addition, we performed enrichment analyses in smooth muscle tissues/cells, fat tissue, brain, blood cells, and gastrointestinal tract tissues. The eight skeletal muscle relevant tissues/cells were excluded when conducting enrichment analyses for other tissue types. Detailed information on tissue types and chromatin state estimation is provided in the Supplementary Materials.
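The hypergeometric enrichment test can be illustrated as follows. The exact formulation used in the analysis is described in the Supplementary Materials; the sketch below assumes one plausible setup, in which the 127 tissues/cell lines form the population, the 8 skeletal muscle tissues are the "successes", and the tissues in which a locus carries a promoter or enhancer state are the "draws". The function name is hypothetical.

```python
from math import comb

def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    total = comb(population, draws)
    favourable = sum(comb(successes, i) * comb(population - successes, draws - i)
                     for i in range(k, upper + 1))
    return favourable / total

# Hypothetical example: a locus is marked as an enhancer in 20 of 127 tissues,
# 6 of which are among the 8 skeletal muscle tissues.
p_enrichment = hypergeom_upper_tail(6, 127, 8, 20)
```

Under this setup, a small upper-tail probability indicates that the regulatory annotation is concentrated in skeletal muscle more than expected by chance; the permutation step described above would then adjust for testing multiple loci and tissue groups.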

cis-eQTL

We conducted cis-eQTL analyses for the lead SNPs at the five replicated GWS loci (rs2943656, rs9991501, rs2287926, rs4842924, and rs9936385), testing expression of genes within 2 Mb of each SNP position. A linear regression model was applied to examine associations between SNP and gene expression. The eQTL analyses were performed in five studies with available human skeletal muscle tissues: GTEx61, STRRIDE62, 63, a study with chest wall muscle biopsies from patients who underwent thoracic surgery for lung and cardiac diseases64, the Finland-United States Investigation of NIDDM Genetics (FUSION) Study65, and a study of Pima Indians66. In addition, eQTL analyses were conducted in studies with other human tissues, including subcutaneous adipose67, 68, omental adipose67, 68, liver tissue67, lymphocytes69, and primary osteoblasts70 (obtained from bone biopsies). These five GWAS SNPs were either genotyped or imputed in each sample. The detailed methods are described in the Supplementary Materials. Multiple testing was corrected for using the false discovery rate (FDR q-value < 0.05), accounting for all SNP-gene expression pairs across tissues and studies.
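The FDR correction across all SNP-gene expression pairs can be sketched as below. The text does not specify which FDR procedure was used; the Benjamini–Hochberg step-up adjustment is assumed here for illustration, and the function name is hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])   # indices by ascending p
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues

def significant_pairs(pairs, pvalues, threshold=0.05):
    """Return the SNP-gene pairs whose q-value is below the threshold."""
    return [pair for pair, q in zip(pairs, bh_qvalues(pvalues)) if q < threshold]
```

Because the adjustment depends on the total number of tests, pooling all SNP-gene pairs across tissues and studies, as described above, is more conservative than correcting within each tissue separately.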

Data availability

All relevant data are available from the authors and summary level results are available on dbGaP.