Unveiling genetic variants for age-related sarcopenia by conducting a genome-wide association study on Korean cohorts

Sarcopenia is an age-related disorder characterised by a progressive decrease in skeletal muscle mass. As the genetic biomarkers for sarcopenia are not yet well characterised, this study aimed to investigate the genetic variations related to sarcopenia in a relatively aged cohort, using genome-wide association study (GWAS) meta-analyses of lean body mass (LBM) in 6961 subjects. Two Korean cohorts were analysed, and subgroup GWAS was conducted for appendicular skeletal muscle mass (ASM) and skeletal muscle index. The effects of significant single nucleotide polymorphisms (SNPs) on gene expression were also investigated using multiple expression quantitative trait loci datasets, differentially expressed gene analysis, and gene ontology analyses. Novel genetic biomarkers were identified for LBM (rs1187118; rs3768582) and ASM (rs6772958). Their related genes, including RPS10, NUDT3, NCF2, SMG7, and ARPC5, were differently expressed in skeletal muscle tissue, while GPD1L was not. Furthermore, the ‘mRNA destabilisation’ biological process was enriched for sarcopenia. Our study identified RPS10, NUDT3, and GPD1L as significant genetic biomarkers for sarcopenia. These genetic loci were related to lipid and energy metabolism, suggesting that genes involved in metabolic dysregulation may lead to the pathogenesis of age-related sarcopenia.


Results
Characteristics of the study participants. A total of 7753 eligible subjects were included in this study (2518 subjects from the VHSMC cohort and 5235 from the KARE cohorts). However, 792 were excluded due to the exclusion criteria ( Fig. 1), leaving a remainder of 6961 participants (1781 subjects from the VHSMC cohort and 5180 from the KARE cohort) that were included in analyses. The mean age of the VHSMC cohort was higher than that of the KARE cohort (69.10 ± 7.83 years vs 62.79 ± 8.33 years, P < 0.001, Table 1). No significant difference was observed in mean height (1.59 ± 0.08 m in VHSMC vs 1.59 ± 0.09 m in KARE, P = 1.000) between the two cohorts. The mean weight (63.24 ± 10.51 kg in VHSMC vs 62.64 ± 10.37 kg in KARE, P = 0.037) and BMI (24.74 ± 3.21 kg/m 2 in VHSMC vs 24.53 ± 3.15 kg/m 2 in KARE, P = 0.016) were statistically different between the cohorts. The LBM of the VHSMC cohort was lower than that of the KARE cohort (40.10 ± 7.83 kg vs 42.03 ± 8.27 kg, respectively, P < 0.001), whereas the body fat mass (BFM) of the VHSMC cohort was higher than that of the KARE cohort (20.60 ± 6.23 kg vs 18.28 ± 5.85 kg, respectively, P < 0.001). Descriptive statistics for subgroups according to sex are also presented in Table 1. The mean values of SMI and ASM, which could only be calculated for the VHSMC cohort, were 6.77 ± 1.00 kg/m 2 and 17.49 ± 4.06 kg, respectively.
GWAS meta-analysis of lean body mass and body fat mass. A total of 2,360,975 SNPs were used for the GWAS meta-analysis of LBM and BFM. Quantile-quantile (Q-Q) and Manhattan plots for LBM are shown in Fig. 2. The Q-Q plot revealed no evidence of test statistic inflation (variance inflation factor [VIF] = 1.044). The top ten variants for LBM are listed in Table 2; two of which were genome-wide significant loci. The most significant variant was rs1187118 (effect = 0.720, standard error [SE] = 0.117, P = 1.09 × 10 −9 , HetPVal = 0.199) near Glutamate Metabotropic Receptor 4 (GRM4) and High Mobility Group AT-Hook 1 (HMGA1), followed by rs3768582 (effect = 0.554, SE = 0.100, P = 4.09 × 10 −8 , HetPVal = 0.537) near Neutrophil Cytosolic Factor 2 (NCF2). The remaining eight variants are presented as candidate loci in Table 2. The Q-Q and Manhattan plots for BFM are shown in Supplementary Fig. S1. The Q-Q plot revealed no evidence of test statistic inflation (VIF = 1.037). www.nature.com/scientificreports/ The GWAS meta-analysis for BFM showed no genome-wide significant loci and the variant with the smallest P-value was rs1592269 (effect = 0.753, SE = 0.148, P = 3.43 × 10 −7 , HetPVal = 0.379) near GRM4 and HMGA1.
The top ten candidate loci associated with BFM with P-values < 1.00 × 10 −5 are listed in Supplementary Table S1.
As the GWAS results for the LBM and BFM phenotypes exhibited similar loci (GRM4 and HMGA1), linkage disequilibrium (LD) analysis was performed. A high (r 2 = 0.935) LD between rs1187118 and rs1592269 was observed, indicating a relatively high dependency.

GWAS of appendicular skeletal muscle and skeletal muscle index.
A total of 2,804,834 SNPs were used for GWAS analyses of ASM and SMI, using only the VHSMC cohort. The Q-Q and Manhattan plots for ASM are shown in Fig. 3; the Q-Q plot did not exhibit evidence of test statistic inflation (VIF = 1.031). The top ten variants for ASM are listed in Table 3
In addition, gene ontology (GO) analyses of biological processes revealed that the term 'mRNA destabilisation (GO: 0061157)' (FDR-adjusted P = 0.090) was enriched, which is involved in skeletal muscles related genes. The term contains a pathway of alpha-ketoglutarate-dependent dioxygenase FTO (U6 small nuclear RNA [2′-O-methyladenosine-N(6)-]-demethylase FTO), which is involved in the regulation of fat mass, adipogenesis, and body weight. Thus, it contributes to the regulation of body size and body fat accumulation 23 .

Discussion
This study discovered novel genetic biomarkers of LBM (rs1187118) and ASM (rs6772958) from the VHSMC and KARE cohorts, which comprise relatively aged (mean age: 69.10 vs. 62.79, respectively) Koreans. Their related genes for LBM, such as RPS10, NUDT3, NCF2, SMG7, and ARPC5, were expressed in skeletal muscle tissue. In addition, in the biological process, the term 'mRNA destabilisation (GO: 0061157)' (FDR-adjusted P = 0.090) was enriched for sarcopenia. This process contains alpha-ketoglutarate-dependent dioxygenase FTO. These results suggest that the pathogenesis of sarcopenia requires further investigation using a metabolic pathway linked to mRNA. www.nature.com/scientificreports/ The aetiology of sarcopenia is complex and includes oxidative stress, inflammation, inadequate diets, a sedentary lifestyle, and genetic factors 13 . A previous study on genetic markers for sarcopenia identified the loci near FTO, ESR1, NOS3, KLF5, and HLA-DQA1 to be associated with physical phenotypes, such as low handgrip strength and decreased LBM [24][25][26] . Nonetheless, these identified loci can only explain a small portion of phenotypic variations; thus, additional genetic loci should be identified. A recent large meta-analysis of the Cohorts for Heart and Ageing Research in Genome Epidemiology (CHARGE) Consortium and various other cohorts identified only a few loci, such as FTO and VCAN for LBM 22 . Therefore, assuming that identifying genetic variants for sarcopenia is challenging, we conducted GWAS analysis on a cohort comprising elderly subjects. The findings revealed that several genetic variants related to metabolism could be of importance in determining the pathogenesis of sarcopenia. Previous sarcopenia GWAS for European descendants showed association with FTO 22,27,28 and several loci, including TGFA and HLA-DRB1 29 .
Our meta-analysis for both LBM and BFM showed significant differences in the intergenic area of GRM4 and HMGA1, with a high LD between rs1187118 and rs1592269. HMGA1 is overexpressed in adipose tissue, impairs adipogenesis, and prevents diet-induced obesity, and insulin resistance 30 . The top loci for LBM and BFM www.nature.com/scientificreports/ were similar, and those of ASM and SMI were similar since the parameter of LBM was calculated from body weight minus BFM, and SMI was calculated from ASM/height 2 . Hence, it would be useful to calculate the correlation and genetic correlation for each parameter. In VHSMC cohorts, the correlation and genetic correlation were 0.078 and 0.078 between LBM and BFM, respectively, whereas those between SMI and ASM were 0.948 and 0.948, respectively. In KARE cohorts, the correlation was − 0.02 between LBM and BFM with the genetic correlation being 0.349. The eQTL analysis for muscle mass using GTEx datasets showed that RPS10, NUDT3, NCF2, SMG7, and ARPC5 were differentially expressed in the muscle tissue for sarcopenia. However, this finding requires further validation. As the regional locations of HMGA1, RPS10, and SIMM29 were in the upper stream of NUDT3, and may represent a regulatory function for the association of NUDT3 with sarcopenia, further focus should be directed towards NUDT3. A previous study by Singh et al. suggested that NUDT3 was a candidate targetlocus, and emphasised the need for real-world validation using transcriptome-wide association study (TWAS) approaches that combine GWAS and eQTL summary data 24 . In the current study, NUDT3 was found to be related to LBM in an elderly cohort. NUDT3 belongs to the MutT or Nudix protein families, which act as homeostatic checkpoints at important stages in inositol phosphate metabolic pathways. These pathways, such as phosphatidyl-1d-myo-inositol and glycerophospholipid metabolism, from the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database (https:// www. kegg. jp/ pathw ay/ map00 564) 31 may, therefore, be related to LBM. For these reasons, it is necessary to understand the metabolic aspects of sarcopenia. A study into DEGs in skeletal muscle tissues from patients with cachexia 32 showed that NCF2 was identified from signal pathways related to inflammation. These findings are consistent with the findings of the present study. SMG7 encodes a protein that is essential for nonsense-mediated mRNA decay, which is related to body height and BMI-adjusted waist circumference from a GWAS catalogue (https:// www. ebi. ac. uk/ gwas/). As SMG7 is linked with telomerase reverse transcriptase (TERT), sarcopenia may be related to muscle cell senescence via microRNA-195 33 . In addition, ARPC5 encodes the actin related protein2/3 complex, which exhibited a negative fold change in expression related to the cytoskeleton in muscle tissue 34 . As these findings may represent secondary changes, or may be postulated from bioinformatics analysis, further studies are needed.
Additionally, the present study found that GPD1L is a significant genetic marker for ASM and SMI in the VHSMC cohort. Although this is a novel finding for GWAS using ASM as a parameter for sarcopenia, it requires further validation by studies from several cohorts. A tissue-based study into rat muscle identified GPD1L as a candidate locus for sarcopenia 35 . These findings were also observed in a previous study that investigated the sarcopenic muscle tissue of elderly women 36 , in which GPD1L was found to be downregulated via cytoplasmic www.nature.com/scientificreports/ energy metabolism. In addition, a systemic genetic approach identified that GPD1L and its molecular mechanism for obesity in human adipose tissue were associated with energy metabolism 37 . GPD1L expression was found to be negatively correlated with microRNA-210 (miR-210) levels, and was consistently downregulated in obese subjects 37 . They hypothesised that the decreased miR-210 levels increased GPD1L, thus inhibiting hypoxic transcription factor-1α (HIF-1α) activity. A previous study into the circulating miRNAs in plasma revealed that miR-210 is significantly downregulated in elderly patients with sarcopenia, compared to patients without sarcopenia 38 . Combined with results of previous studies 37, 38 , the findings presented here suggest that GPD1L could be a genetic biomarker for sarcopenia, based on both miR-210 and HIF-1α pathways. Hence, an additional biomarker for sarcopenia may be postulated from this metabolic research. Recent studies into plasma biomarkers for sarcopenia have identified higher levels of amino acids and lower levels of phosphatidylcholines (PCs) and lysophosphatidylcholine (lysoPC) 39,40 . The association between GPD1L and PCs or lysoPC and sarcopenia may involve (1) dysregulation of GPD1L related to decreased PCs and lysoPC from previous lipid biomarkers 39,40 , www.nature.com/scientificreports/ or (2) an increase in the glycerol-3 phosphate pathway inducing changes in glycolysis via GPD1L. However, the results of the present study can only be used to suggest a genetic hypothesis; thus, further follow-up studies are needed. Analysis of the enriched biological processes identified via GO analysis of the cohorts revealed that alphaketoglutarate-dependent dioxygenase FTO is related to sarcopenia. This finding is consistent with those of a previous study on the influences of FTO and muscle phenotypes 27 . In addition, alpha-ketoglutarate is a component of the tricarboxylic acid cycle, which is related to the HIF-1α pathway. This evidence suggests that a simultaneous understanding of both genes and gene-metabolic pathways is necessary to understand the pathogenesis of sarcopenia.
One of the primary strengths of this study is the utilisation of a relatively elderly cohort sample, which provides a better sarcopenic phenotype. Here, NUDT3 and RPS10 were replicated using a real cohort, which was an approach suggested by a previous study using the TWAS of muscle tissue 24 . Furthermore, our study focused on East Asian subjects, which have not been fully evaluated, unlike other ethnic groups. In this regard, we conducted phenome-wide association studies (pheWAS) using the "Common metabolic disease knowledge portal" (https:// hugea mp. org), indicating that SNPs such as rs1187118, rs3768582, and rs6772958 are related to metabolic conditions such as waist-hip ratio, lipid metabolism, and body fat percentage in the European population (Supplementary Table S3).
Nevertheless, certain limitations were noted in this study. First, although novel signals for LBM and ASM were discovered with genome-wide significance, our results were based on bioinformatics analysis and, therefore, must be replicated in other Asian cohorts or multi-ethnic samples. A large number of samples for phenotypes, such as ASM and SMI, will improve the study's validity. Hence, further studies, including replication or metaanalysis, are needed in other cohorts of the Asian population. Moreover, the number of SNPs (2,804,834) in the VHSMC cohort was limited as we set the imputation accuracy to 0.9. These points should be considered in the interpretation of the results. Second, the difference in ageing biology between sexes further hinders the identification of meaningful biomarkers for age-related conditions. Although GWAS analysis was conducted according to sex, the results did not show significant loci with genome-wide significance. It is expected that a metabolite-GWAS, considering sex as a factor, could help address this problem. Third, bioinformatics analysis revealed that genetic variants and metabolic pathways were related to sarcopenia, however, the causality of this hypothesis requires further investigation. Moreover, previous studies on genetic variants in sarcopenia have shown that these variants may be associated with the effects of genetic, metabolic, and environmental factors 22,27,28 . Fourth, we used bioelectrical impedance analysis (BIA) for LBM and BFM examinations, as it is a non-invasive method for measuring body composition. However, dual-energy X-ray absorptiometry (DXA) is the standard method for muscle mass. BIA and DXA have different limitations for studies using body composition measurements. A previous study that compared these two methods found that BIA overestimated ASM compared to DXA 41 . In addition, BIA devices differed in the two cohorts (InBody 3.0 for KARE cohort, InBody770 for VHSMC cohort), which may be a confounding factor. In a technical review of BIA for people with high body fat, InBody 3.0 tended to be lower, with a difference of about 2% in an extreme case (unpublished data). A previous study showed that different BIA devices were reliable by high intraclass correlation coefficients and low standard errors 42 . Since the focus of our study was on muscle mass rather than fat mass, and we analysed each cohort using different PCs, differences associated with the BIA device between the two cohorts would not significantly influence the LBM values and analysis results presented in this study. However, it is necessary to consider these when interpreting research results.
In conclusion, sarcopenia can result in adverse outcomes, such as an increased risk of falls, a decreased quality of life, and mortality. Thus, it is necessary to identify a biomarker for this condition. Here, the loci near genes such as RPS10, NCF2, SMG7, ARPC5, and NUDT3 were identified to be significant biomarkers for LBM. In addition, the loci near GPD1L were identified as significant biomarkers for ASM and SMI, which serve as novel Table 3. Results of GWAS for appendicular skeletal muscle (leading SNPs, top 10). Chr, chromosome; SNP, single nucleotide polymorphism; MAF, minor allele frequency; SE, standard error; Mapped Genes from ANNOVAR; GWAS, genome-wide association study. a VHSMC, Veterans Health Service Medical Center; b Kref, Korean reference data; c GnomAD, Genome Aggregation Database (east Asian). www.nature.com/scientificreports/ index for sarcopenia. These genes are related to metabolism pathways, such as glycerophospholipid pathways, energy metabolic pathway, the inositol phosphate and HIF-1α pathways, and alpha-ketoglutarate-dependent dioxygenase FTO. Further studies are required to evaluate the aetiology of sarcopenia.

Methods
Study subjects. Schematic plots of the analytical study design are shown in Fig. 1. Data were obtained from two cohorts: the VHSMC (n = 2518) and KARE (Ansan/Ansung study: from Korean Genome and Epidemiology Cohort, n = 5235) cohorts. Each cohort has its own distinct characteristics. The VHSMC cohort is a hospitalbased elderly cohort that includes many patients with various diseases. The KARE cohort is a nationwide representative cohort for genome research in Korea; it is a longitudinal cohort of the Ansan and Ansung communities in Korea. This study included subjects from the KARE cohort and VHSMC cohort consisting of micro array data. Patients who had functional declines or limitations, or who had chronic diseases that may affect primary Muscle mass measurement. BIA measurements were performed using InBody 770 (Biospace Co., LTD, Seoul, Korea) in the VHSMC cohort and using InBody 3.0 (Biospace Co., LTD, Seoul, Korea) in the KARE cohort. Each subject stood on the footplate and held both of the hand electrodes. The screen automatically displayed measurements of LBM (kg), skeletal muscle mass (kg), BFM (kg), and body fat percentage (%). LBM and BFM data were available for both cohorts and were used as initial phenotypes for analysis. Subgroup analysis was conducted using ASM or SMI, which were derived from BIA; these data were available only for the VHSMC cohort. The parameters were defined according to the consensus of the AWGS 2019 11 .
Genotyping and imputation. Genomic DNA was separated from venous blood samples, and 100 ng DNA was genotyped using Korea Biobank Array Affymetrix Axiom 1.1 (Affymetrix, Santa Clara, CA), which was designed by the Korean National Institute of Health 43 . Genotypes were identified with a K-medoid clustering-based algorithm to minimise the batch effect 44 . The PLINK (version 1.9, Boston, MA) 45 and ONETOOL 46 software packages were used for quality control procedures and association analyses. Samples matching any of the following criteria were excluded: (1) sex inconsistencies or (2) a call rate of up to 97%. SNPs were filtered if the call rate was lower than the Hardy-Weinberg equilibrium (HWE) test (P < 1 × 10 −5 ). The genotype imputation was conducted using the Michigan imputation server (https:// imput ation server. sph. umich. edu). Only  Statistical analyses. Baseline characteristics of the study population are presented herein as means with standard deviation (SD) for continuous variables and numbers, and as proportions for categorical variables. Genome-wide analyses were conducted using a linear model; PLINK was used within each cohort. Age, sex, and ten principal component scores were included as covariates. Meta-analyses of the VHSMC and KARE cohorts were performed using the METAL software (http:// csg. sph. umich. edu/ abeca sis/ meta). Cochran's Q-test for heterogeneity was conducted; its P-value was marked with 'HetPVal' 50 , where HetPVal < 0.05 indicates heterogeneity between two datasets 51 . The dense regional association result of each GWAS was plotted using the LocusZoom software 52 . The threshold for statistical significance in this model was P < 5.0 × 10 −8 , which is conventionally considered to reflect genome-wide significance.
Functional annotation analyses. Expression Quantitative trait (eQTL) studies were performed using the Genotype-Tissue Expression (GTEx) dataset (https:// gtexp ortal. org/ home/), which provides a variety of human tissues from donors using the densely genotyped data to assess genetic variations within their genomes. Genes related to metabolites were analysed using KEGG pathway analysis 31 . Associated genes were further investigated for DEGs in the skeletal muscles of subjects 19 to 28 and 65 to 76 years of age from the Gene Expression Omnibus (GEO) dataset (GSE38718) 53 . In addition, biological process, cellular component, and molecular function GO analyses were performed using gene set enrichment analysis. The Benjamini-Hochberg false discovery rate (FDR)-adjusted 0.1 significance level was applied for multiple hypothesis test corrections 54 .
Ethics declarations. The institutional review boards of the Veterans Health Service Medical Center approved this study protocol and informed consent waiver (IRB No. 2020-02-015 for VHSMC cohort and IRB No. 2021-05-005 for KARE cohort) since this study was performed in retrospective manner, and the study was conducted in compliance with the Helsinki Declaration.
Consent to participate. Informed consent waiver was approved by the institutional review boards of the Veterans Health Service Medical Center since this study was performed in a retrospective manner and anonymised and de-identified data were used for the analyses. The KARE cohort and VHSMC cohort obtained the informed consents from participants.

Data availability
The data supporting the findings of this study are available upon reasonable request.