The genetic architecture of appendicular lean mass characterized by association analysis in the UK Biobank study

Appendicular lean mass (ALM) is a heritable trait associated with loss of lean muscle mass and strength, or sarcopenia, but its genetic determinants are largely unknown. Here we conducted a genome-wide association study (GWAS) with 450,243 UK Biobank participants to uncover its genetic architecture. A total of 1059 conditionally independent variants from 799 loci were identified at the genome-wide significance level (p < 5 × 10−9), all of which were also significant at p < 5 × 10–5 in both sexes. These variants explained ~15.5% of the phenotypic variance, accounting for more than one quarter of the total ~50% GWAS-attributable heritability. There was no difference in genetic effect between sexes or among different age strata. Heritability was enriched in certain functional categories, such as conserved and coding regions, and in tissues related to the musculoskeletal system. Polygenic risk score prediction well distinguished participants with high and low ALM. The findings are important not only for lean mass but also for other complex diseases, such as type 2 diabetes, as ALM is shown to be a protective factor for type 2 diabetes.

1. In relation to the publication mentioned above, it would be interesting to see age-specific effects (how do the effect of the discovered SNPs compare in middle-aged vs elderly?). 2. Are the authors aware of the issues with the currently released versions of the exome sequencing data from the UK Biobank [systematic under-marking of duplicate reads in the SBP pipeline and under-calling of variants in primary assembly regions from which the alternative contigs are derived for the FE exome dataset.]? How was this handled? 3. "Genes involved in multiple gene sets are likely to act as hub genes and may play a central regulatory role." While this would be true in an ideal world, the way GO and other gene sets are defined are highly bias towards more studied genes, generating a vicious circle. Also the fact that the 2nd and 3rd most often appearing gene show no significant gene-level score, supporting no link with the disease, may simply represent study-bias. 4. "It is also negatively correlated with body fat (rg=−0.17, 9.44E-6), suggesting an opposite developmental direction towards lean and fat mass." -I'm not sure I agree with the interpretation. Any SNP that is responsible for general body size (height or weight) will show an opposite effect on lean and fat mass, because it is associated with something that is related to the sum of these two measures. 5. As a proof of concept and support of the bioimpedance based measure, could the derived PRS be correlated with the DXA-measured lean mass (in up to 5K samples)? A score that explains 10% of trait variance must have an easily detectable correlation in 4k samples. 6. I wonder whether the authors see any relationship between (self-reported or preferably actimeter-measured) physical activity and lean mass? If so, should it be corrected for? 7. While the authors corrected for height and height^2, I was wondering how similar their trait residuals are to a lean body mass index (=lean mass/(height^2)) measure?
Minor points 1. "Phenotypic outliers were monitored by the Tukey method." -You should be more specific, have those values been removed? Did you use k=1. 5? In this study, Pei et al. perform a GWAS of appendicular lean mass in ~450,000 participants of the UK Biobank (UKB) of European ancestry. The authors detect 717 variants associated with ALM. They then assess the predictive accuracy of a resulting polygenic score and perform a Mendelian randomisation to test if variation in ALM are causal of certain diseases like type 2 diabetes. Overall, I found the paper well written and clear in its message. I noted below a few comments, which, if addressed, would improve the quality of the manuscript.
Major comments: Ancestry. As far I can judge, they authors have determined ancestry based on self-report, which may be unreliable in many instances. In this cases, method like mixed-models GWAS implemented in BOLT may be unsuccessful to fully correct for stratification. I'd recommend that the authors, at least use projected PCs to call ancestry. I think this has been done centrally by the UKB so the authors could request that information.
Replication. The authors propose to assess the replication or replicability of their results by looking up the association within sexes. I don't think this is a proper replication approach and therefore I'd recommend that the authors remove it. They could for example either perform a look up in summary statistics from previous GWAS or they could look up in non-European ancestry participants the UKB. In particular within South-Asian ancestry participant who show the smallest Fst (differentiation) with European ancestry participants.
Collider bias. Many of the genes/loci highlighted by the authors seems to overlap with those of height. I'm wondering to which extent this could be due to collider biases given that the authors have adjusted their analyses for height (and height squared), which is an heritable trait. Can the author perform an analysis unadjusted for height and show correlation of effect sizes? Also, check how many of these 717 SNPs remain genome-wide significant when not adjusting for height? Variance explained. The authors calculate the variance explained by the genome-wide significant SNPs by simply summing up the 2p(1-p)b^2, where p is the minor allele frequency (MAF) and b is the estimated SNP effect. Because of estimation error (although small here given and sample size is quite large) and most importantly because of winner's curse this estimator of the variance explained is necessarily inflated. The ideal case would be to re-estimate the variance explained in an independent sample. Or at the very least perform a correction for Winner's curse.
MR. The authors applied a test for heterogeneity (HEIDI) to exclude the possibility that their finding are driven by pleiotropy and not by causality. Line 231: SMR is not a SNP prioritisation method but a GENE prioritisation method.
Line 270: I think that "computational" would sound better here than "running".
Line 273-274: That looks like a trick to make LDpred run faster, right? Please clarify and justify why you've pre-selected these SNPs. In principle LDpred could run without.
Line 303: Please rephrase as the sentence does not make sense.
Line 351: Did you use clumping to identify these loci? Please clarify.
Line 371: Are these 302 SNPs the most associated one vs the 259? The conclusion is a bit rushed otherwise.
Line 381-388: Your set of SNPs is ascertained to have limited differences between sexes. Therefore you lack power to identify sex-specific effects. That should be acknowledged somewhere (discussion probably).
Line 387: please report these effects here.
Line 393: "As expected…" please justify why this relation expected? Do we have evidence of negative selection for this trait?
Line 400: ER (enrichment ratio) would be better than EA.
Line 421: rs400596 is an eQTL for PAM in blood. Do you also have the strength of association with expression of PAM in skeletal muscle? Please report it as that would add to your demonstration.
Line 424: its potential function relevance Line 433: Please replace "are not able to derive" with "could not be calculated". Do you know why? Is it because there were not many I suppose.
Line 438-443: r2>0.3 is too lenient for rare variants. The main challenge with imputation is allele frequency. So Are the 33 variants common or rare? If the former then, this is expected.
Line 556: Your analyses were adjusted for AFM, then you find a negative correlation with fat mass. This looks like collider to me.
Line 593: single variant do not have heritability. You could instead speak about variance explained by the variant.
Line 797: Could you please specify what curve has been fitted?

Reviewer #1
Comment #1 "Study participants. Why the authors select self-reported white individuals? The authors carried out the genetic association analysis using linear mixed models to control for population structure. Furthermore, it is not clear for me why the authors discarded individuals genotyped but no imputed. In total, how many individuals are removed from the initial dataset?" Response: Thank you very much for your constructive comments.
Although mixed-model analysis is effective in correcting for population structure, careful data quality control remains critical. Therefore, we only analyzed self-reported white individuals, as recommended by the BOLT-LMM paper [1], as a key first measure to control potential population stratification. Many previous studies also focused on self-reported white individuals, such as [2,3].
In the UKB analysis, a total of 968 participants were identified with unusually high heterozygosity and/or missing rates [4]. These genotyped participants were not imputed and therefore were excluded from our association analysis.
In total, 487,378 participants were both phenotyped and imputed, of whom, 37,135 were removed. The final number of participants is 450,243. Please refer to Page 6 Lines 97-100. Therefore, to avoid collider bias, we decided not to use height and height square as covariates in this revised manuscript. We noticed that none of the several recent GWAS of lean mass used height as covariate [5][6][7]. We have re-done all follow-up analyses. In brief, a total of 1,059 variants from 799 loci were identified, explaining 15.5% phenotypic variance. Response: The PCs were centrally calculated and released by the UKB. The number of PC determined "Caucasian" participants was 409,616, a sample size that was ~40,000 less than that of self-reported white participants. Though the sample of Caucasian participants was more homogeneous, the reduced sample size may cause a certain loss of statistical power for association testing. In contrast, even with a modest level of population stratification in the self-reported white population, the linear mixed model was still able to correct for it. Indeed, the analysis of self-reported white participants was recommended by the authors of BOLT-LMM, and was performed on 23 UKB traits in the BOLT-LMM publication [1]. We also noticed that multiple previous studies analyzed self-reported white participants, such as [2,3].
In this revised manuscript, we have performed additional association analysis in 400,879 PC determined Caucasian participants. The attenuation ratio (AR), which measures the relative contribution of confounding factors and is defined as (intercept-1)/(mean chi^2−1), is 0.13. This measure is equal to that from the self-reported white populations, implying that including self-reported white participants may not introduce additional population stratification. In the 400,879 PC determined Caucasian participants, of the 1,059 lead primary SNPs (as detected in the self-reported white population), up to 980 (92.5%) remain significant at the 5×10 -9 level. P-values for the remaining 79 SNPs range from 5.09×10 -9 to 4.56×10 -6 . The weaker signals are likely due to the reduced sample size instead of population stratification.
Please refer to Page 19 Lines 364-366. With due respect, we would like to argue that our GWAS replication was indeed rigorous and conservative. Essentially, our study may be performed based on a two-stage design, with the first stage involving GWAS in one sex group (e.g., male), and the second stage replicating top hits in the other sex group (e.g., female). As we assumed a maximal number of 1,000 independent loci for the lean mass trait, we selected the top 1,000 hits from the first stage (male group) for replication in the second stage (in female group). Since we only tested 1,000 independent SNPs in the second stage, a Bonferroni-corrected significance level of 5×10 -5 (0.05/1000) was conservative to declare a successful replication. Indeed, in our original analysis with height adjusted, the number of independent loci with p<5×10 -5 was smaller than 1,000 in both sex groups, which means that had we followed a two-stage design all the loci would have passed the significance threshold of 5×10 -5 and been replicated.
In this revised analysis without height adjustment, the number of independent loci with p<5×10 -5 was 1,988 and 1,713 in female and male populations, respectively, almost twice the presumed number of 1,000. This may inflate type I error rate for the Tier 3 variants (i.e., whose p values were in the range of 5×10 -5 > p >5×10 -9  Response: We agree that the SNP effect may be over-estimated due to the winner's curse effect. However, as stated by Zhong and Prentice [8], the winner's curse bias becomes negligible as the sample size becomes large. As you have pointed out, given the huge sample size (total N=450,000), the magnitude of the inflation was expected to be very minor in the present study.
In this revised manuscript, we have corrected for the winner's curse effect by a simple FDR based method [9]. The explained phenotypic variance is 17.8% and 15.5% before and after the FDR correction, respectively. We have also estimated the heritability explained by all the identified loci. Specifically, we removed all variants in the identified loci (lead SNP + 500 kb flanking region to either side) from the original data, and estimated the total heritability using BOLT-REML. We then subtracted the estimated heritability from another estimation based on all variants, which produced the heritability explained by the identified loci. As a result, the identified loci collectively explained a heritability of 0.279 and 0.288 in females and males, respectively.
Therefore, there may still exist a portion of ~10% heritability in the identified loci, whose contribution to the phenotypic variance is yet to be defined. Response: This negative correlation could be either due to phenotypic constraint between the two measures, or due to collider bias as pointed out by reviewer #2, since it was fat mass adjusted lean mass that was used in our association analysis. In this revised manuscript, we re-interpreted the results with caution. Please refer to Page 28 Lines 579-582. Response: We did not examine the relationship between them. While we acknowledge that physical activity does have a significant effect on lean mass [11], our requested data did not include any physical activity related items. In this revised manuscript, we acknowledged this as a significant limitation in the discussion section. Please check Page 32 Lines 660-662.
Comment #8: "While the authors corrected for height and height^2, I was wondering how similar their trait residuals are to a lean body mass index (=lean mass/(height^2)) measure?" Response: In our opinion, the adjustment of height and height^2 is more accurate than lean body mass index in correcting for the effect of height, because lean mass may not have a perfect relationship with height^2. In response to reviewer #2, we have noticed that including height into covariates has caused collider bias. Therefore, we instead analyzed appendicular lean mass without adjustment of height. We also noticed that several previous GWAS studies did not adjust for height either [5][6][7]. Response: Yes k=1.5 was used to monitor the outlier. However, these outliers were not removed, because neither fat mass nor lean mass followed a normal distribution. Instead, both distributions were right skewed (Supplementary Figure 1), which means an excess number of outliers were present at the right tail of the distribution. When using the Tukey (k=1.5) method, up to 8,787 male subjects and 10,948 female subjects were classified as outliers. But they were not necessarily extreme observations given such a huge sample size. For example, the range to define normal male lean mass was [18.25, 38.65] kilograms. Any values outside this range would be considered as outliers, which was obviously over conservative. Excluding these outliers may reduce statistical power in subsequent association test. Therefore, we did not exclude these extreme outliers in order to maximize the power for association test.
The most adverse effect caused by extreme outliers is that they may bias the estimation of effect size. However, in our analysis the raw lean mass was already transformed into a standard normal distribution variable that was later used in association test. As a matter of fact, no outliers were observed in the transformed variable that was used for effect size estimation and hence the potential adverse effect of outliers is unlikely to exist in our analysis. score across all the 1,059 lead variants is as high as 0.97, demonstrating high imputation certainty.
These loci may represent potential sex specificity loci pending further replication. In response to your concern, we downloaded the GWAS summary statistics of the 23 traits released by the BOLT-LMM paper and estimated AR in these traits using the same commend. We got inflated AR values for all traits than the ones reported in the BOLT-LMM paper. For example, for height trait, we got an AR of 0.116, which is higher than the reported one 0.078.
Upon communicating with the corresponding author Dr. Po-Ru Loh of the BOLT-LMM paper, he recommended a more sophisticated commend that was used in their study, For our lean mass trait, the refined AR was estimated to be 0.078 (s.e 0.008), which is comparable to that of most of the 23 traits in the BOLT-LMM study. Therefore, we are confident that the inflated estimation was due to different commends of the LDSC software instead of uncorrected population stratification.
In this revised manuscript, we have updated our results.