Integrative analyses of genomic and metabolomic data reveal genetic mechanisms associated with carcass merit traits in beef cattle

Improvement of carcass merit traits is a priority for the beef industry. Discovering DNA variants and genes associated with variation in these traits and understanding biological functions/processes underlying their associations are of paramount importance for more effective genetic improvement of carcass merit traits in beef cattle. This study integrates 10,488,742 imputed whole genome DNA variants, 31 plasma metabolites, and animal phenotypes to identify genes and biological functions/processes that are associated with carcass merit traits including hot carcass weight (HCW), rib eye area (REA), average backfat thickness (AFAT), lean meat yield (LMY), and carcass marbling score (CMAR) in a population of 493 crossbred beef cattle. Regression analyses were performed to identify plasma metabolites associated with the carcass merit traits, and the results showed that 4 (3-hydroxybutyric acid, acetic acid, citric acid, and choline), 6 (creatinine, l-glutamine, succinic acid, pyruvic acid, l-lactic acid, and 3-hydroxybutyric acid), 4 (fumaric acid, methanol, d-glucose, and glycerol), 2 (l-lactic acid and creatinine), and 5 (succinic acid, fumaric acid, lysine, glycine, and choline) plasma metabolites were significantly associated with HCW, REA, AFAT, LMY, and CMAR (P-value < 0.1), respectively. Combining the results of metabolome-genome wide association studies using the 10,488,742 imputed SNPs, 103, 160, 83, 43, and 109 candidate genes were identified as significantly associated with HCW, REA, AFAT, LMY, and CMAR (P-value < 1 × 10–5), respectively. By applying functional enrichment analyses for candidate genes of each trait, 26, 24, 26, 24, and 28 significant cellular and molecular functions were predicted for HCW, REA, AFAT, LMY, and CMAR, respectively. Among the five topmost significantly enriched biological functions for carcass merit traits, molecular transport and small molecule biochemistry were two top biological functions associated with all carcass merit traits. Lipid metabolism was the most significant biological function for LMY and CMAR and it was also the second and fourth highest biological function for REA and HCW, respectively. Candidate genes and enriched biological functions identified by the integrative analyses of metabolites with phenotypic traits and DNA variants could help interpret the results of previous genome-wide association studies for carcass merit traits. Our integrative study also revealed additional potential novel genes associated with these economically important traits. Therefore, our study improves understanding of the molecular and biological functions/processes that influence carcass merit traits, which could help develop strategies to enhance genomic prediction of carcass merit traits with incorporation of metabolomic data. Similarly, this information could guide management practices, such as nutritional interventions, with the purpose of boosting specific carcass merit traits.


Results
Metabolites associated with carcass merit traits. The results of regression analyses showed 15 out of 31 analyzed metabolites were associated with one or more than one of the carcass merit traits (Table 1). At P-values less than 0.05, 3 (3-hydroxybutyric acid, acetic acid, and citric acid), 3 (creatinine, l-glutamine, and succinic acid), 1 (methanol), 1 (l-lactic acid), and 3 (succinic acid, lysine, and glycine) metabolites were identified as associated with HCW, REA, AFAT, LMY, and CMAR, respectively. However, some metabolites with a P-value from 0.05 to 0.1 explained more than 1% of the phenotypic variance of the associated carcass merit traits, thus, a relatively relaxed threshold of P-value < 0.1 was chosen to include more metabolites that may be potentially associated with carcass merit traits. For HCW, at P-values less than 0.1, 3-hydroxybutyric acid, acetic acid, citric acid, and choline were the associated metabolites, accounting for 1.92%, 1.69%, 1.48%, and 1.09% of the phenotypic variance in HCW, respectively. Creatinine, l-glutamine, succinic acid, pyruvic acid, l-lactic acid, and 3-hydroxybutyric acid were significantly associated with REA, and these six metabolites accounted for 1.87%, 1.79%, 1.60%, 1.56%, 1.04%, and 0.80% of phenotypic variance, respectively. AFAT was associated with fumaric acid, methanol, d-glucose, and glycerol, and these four metabolites explained 2.40%, 1.71%, 1.67%, and 1.39% of the phenotypic variance, respectively. l-lactic acid and creatinine were associated with LMY and accounted for 2.71% and 1.42% of phenotypic variance, respectively. Five metabolites, including succinic acid, fumaric acid, lysine, glycine, and choline, were associated with CMAR and each respectively accounted 1.76%, 1.36%, 1.22%, 1.20%, and 1.05% of the phenotypic variance in CMAR, respectively. Most of these metabolites were mainly associated with a single trait. However, a few metabolites were associated with more than one trait. For example, 3-hydroxybutyric acid was associated with both HCW and REA. l-lactic acid and creatinine were both associated with REA and LMY. Choline was associated with HCW and CMAR, and fumaric acid was associated with both AFAT and CMAR ( Table 1). The additive genetic variance and heritability estimates of the 15 metabolites associated with the carcass merit traits were low to moderate (Table S1 in Supplementary file 1).

Significant SNPs and candidate genes associated with metabolites. Genomic inflation factors
for all association analyses ranged from 0.95 to 1.01 (Table S2 in Supplementary file 1), a value around 1 indicates that there is no population stratification, and the statistical models are well fitted. Summarized results of the mGWAS for the 15 metabolites (identified as associated with the carcass merit traits, P-value < 0.1) are presented in Table 2. Manhattan plots and QQ plots are provided in Figs. S3-S17 in Supplementary file 4. The average of phenotypic variance of the metabolites explained by a single SNP was 5.13% with a range of 3.57-10.95%. Details of significant SNPs for the 15 metabolites are provided in Supplementary file 2. The number of genes associated with the 15 metabolites varied from 3 (fumaric acid) to 53 (succinic acid) and a full list of candidate genes for each of the 15 metabolites is provided in Supplementary file 3.
Through integrating the metabolite and carcass merit trait regression analyses and the mGWAS results, a total of 103, 160, 83, 43, and 109 candidate genes were found to be associated with HCW, REA, AFAT, LMY, and CMAR, respectively ( Table 3). As for metabolites, some candidate genes identified through the mGWAS were associated with multiple carcass traits (  (Tables S4-S8 in Supplementary file 1). Interestingly, 75% of the biological functions were commonly associated with all the five carcass merit traits in this study (Table S9 in Supplementary file 1 and Fig. S2 in Supplementary file 4). Some of the major functions common across the traits included molecular transport, small molecule biochemistry, lipid metabolism, cell-to-cell signaling and interaction, carbohydrate metabolism, cellular assembly and organization. Additionally, the five topmost significantly enriched biological functions for each trait and the candidate genes involved are presented in Table 4. Among the top five significant enriched functions, molecular transport and small molecule biochemistry were commonly associated with all carcass merit traits. Lipid metabolism was the most significant biological function for LMY and CMAR, and it was the second and fourth highest biological function for REA and HCW, respectively. Cell-to-cell signaling and interaction was one of the top significant biological functions associated with HCW, REA, AFAT, and CMAR. Carbohydrate metabolism was among the top significant biological functions associated with both HCW and CMAR. Further investigation within some of the biological functions revealed molecular/metabolic processes related to the carcass merit traits. For HCW, within the molecular transport function, 11 genes (AGTR1, CHKA,  CPT1A, DDX5, IGHMBP2, IL21R, LRP5, NTRK2, PVALB, SULT1E1, and TRAF3) were involved in concentration of corticosterone and lipid, and quantity of steroid and steroid hormone (Fig. 1a). For CMAR, within the lipid metabolism function, 17 candidate genes (AGTR1, AQP9, CCDC80, CHKA, CPT1A, DAB1, DDX5,  IGHMBP2, LRP5, PLA2G2A, PLA2G2E, PLA2G5, PTGS1, PVALB, SLC10A1, SPRED2, and SSPN) were involved in multiple metabolic processes related to fatty acid and lipid metabolism (Fig. 1b), such as synthesis of fatty acid and release of lipid. Additionally, within the carbohydrate metabolism function for CMAR, 12 candidate genes (AGTR1, AQP9, CHKA, CPT1A, GYG1, LRP5, NKX3-2, PDCL, PLA2G2A, PLA2G2E, PLA2G5, and TREH) were involved in carbohydrate metabolic processes such as carbohydrate biosynthesis (Fig. 1c). It is worth noting that 8 candidate genes (AGTR1, AQP9, CHKA, CPT1A, LRP5, PLA2G2A, PLA2G2E, and PLA2G5) associated with CMAR were involved in both lipid and carbohydrate metabolism. Table 1. A summary of metabolites associated with carcass merit traits in a multibreed population of beef cattle. 1 HCW hot carcass weight in kg, REA rib eye area in cm 2 , AFAT average backfat thickness in mm, LMY lean meat yield in %, CMAR carcass marbling score from 100 (trace marbling) to 499 (more marbling). 2 The unit of metabolite concentration is µM. 3 The significance level of regression analysis is P-value < 0.1. 4 b regression coefficient. 5 V m /V P : the proportion of phenotypic variance of carcass merit traits explained by associated metabolites (%).  [15][16][17] . However, improving understanding of the biology involved is hampered by the limited knowledge of how these metabolites are associated with different economically important traits in different livestock species. Carcass merit traits are of fundamental interest to every beef producer and everyone involved in the beef industry. However, these traits are relatively expensive to measure using ultrasound technologies on individual live animals, which is a limitation for selection and improvement of these traits. Since blood metabolites are easily measurable/quantifiable even on live animals, we speculate that identification of genetic/biological associations between metabolite concentrations and beef cattle carcass merit traits could potentially enhance genetic prediction and selection for these traits in beef cattle. In addition, identification of blood metabolite biomarkers associated with carcass traits at an earlier stage would have a more practical application for genetic selection and for sorting animals into different finishing groups for more uniform carcass outputs. Therefore, we collected the blood samples at the start of the feedlot test instead of close to slaughter and examined the associations between 31 plasma metabolites and 5 carcass merit traits. We further explored the potential biological linkage between these metabolites and carcass merit traits. Our results showed that several metabolites were associated with the carcass merit traits studied. However, individual metabolites, despite being significantly associated with the trait, only accounted for 0.80-2.71% of the total phenotypic variance of carcass merit traits. This relatively small percentage of phenotypic variance reflected the complex nature of these traits, which we believe are regulated by multiple metabolic pathways involving many metabolites with each having only a small contribution/effect. It is also possible that due to the limited number of metabolites we profiled in the current study, we were not able to identify those metabolites with major influences on the traits studied. Additionally, this study used a more relaxed threshold (P-value < 0.1) to identify metabolites potentially associated with carcass merit traits, therefore, validation in independent beef cattle populations or further studies considering a wider range of metabolites is warranted. It is also worthwhile to analyze metabolites on samples collected at different developmental stages to see whether and how the associations between the metabolites and the carcass traits may change. Furthermore, Table 2. A summary of significant SNPs, the number of putative QTLs, and the number of candidate genes for metabolites associated with carcass merit traits in a multibreed population of beef cattle. 1 The unit of metabolite concentration is µM. 2 The P-value range (minimum to maximum) of significant SNPs, the significance level is P-value < 1 × 10 -5 . 3 β range: the range of allele substitution effect of each significant SNP. 4 V SNP /V P range: the range metabolite phenotypic variance explained by each significant SNP (%). 5 V SNP /V P mean: the average of metabolite phenotypic variance explained by each significant SNP (%). 6 No. of QTL: the number of putative QTLs identified for each metabolite. 7 No. of gene: the number of candidate gene identified for each metabolite.
Metabolite 1 P-value range 2 β range 3 V SNP /V P range (%) 4 V SNP /V P mean (%) 5  www.nature.com/scientificreports/ we observed that a majority of the significant metabolites were only associated with one trait. However, some metabolites in the current study were associated with two traits, indicating potential biological relationships between these traits. For example, in this study, we observed that 3-hydroxybutyric acid was associated with both HCW and REA, and beef cattle with high HCW and REA had lower concentration of 3-hydroxybutyric acid, indicating that animals with high HCW and REA may have better carbohydrate metabolism. Additionally, 3-hydroxybutyrate is the main representative of ketone bodies and one important function of ketone bodies is to provide acetoacetyl-CoA and acetyl-CoA for the synthesis of cholesterol, fatty acids, and complex lipids 20 . Thus, a lower concentration of 3-hydroxybutyric acid may lead to reduced lipid synthesis in animals with high HCW and REA. Interestingly, creatinine, the final catabolite of muscle energy metabolism 21 , was positively associated with both REA and LMY in the current study (Table 1), and these two carcass merit traits measure muscle Table 3. Metabolites and their candidate genes associated with carcass merit traits in a multibreed population of beef cattle. 1 HCW hot carcass weight in kg, REA rib eye area in cm 2 , AFAT average backfat thickness in mm, LMY lean meat yield in %, CMAR carcass marbling score from 100 (trace marbling) to 499 (more marbling). 2 The unit of metabolite concentration is µM.  Hydroxybutyric acid  RNASE1, RNASE6, RNASE4, ANG2, CDH13, TRAF3, AMN, CDC42BPB, PGM2, SULT1E1, CSN1S1,  CSN2, HSTN, FRAS1, ANXA3, LOX, SRFBP1, NTRK2, COL12A1   Acetic acid   PFN2, AGTR1, S100Z, CRHBP, AGGF1, LBH, YPEL5, ATAD2B, KLHL29, OR12K5, OR1B1, OR1L1,  OR1L3, OR10W4, OR5B17, IZUMO2, MYH14, ZNF814, SNORA70, TMEM132E, HMGCLL1, GFRAL,  TRAM2, TMEM14A, GSTA2, GPR139, UMOD, PDILT, ACSM5, ACSM2B, ACSM1, UQCRC2, PDZD9,  MOSMO, VWA3A, IL21R, GTF3C1, KATNIP, ARMH3, HPS6, LDB1, PPRC1, SNORD22, DUSP5,  SMC3, RBM20, KLRC1, APBB2, MAN2A1   Citric acid  SERPINE3, INTS6, ZNF667, ZNF583, USP32, CA4, ZNHIT3, MYO19, TRAF3, AMN, CDC42BPB www.nature.com/scientificreports/ development and the proportion of lean meat in a carcass respectively. In line with our results, Hanset et al. 22 previously observed higher concentrations of plasma creatinine in double muscled bulls as compared to conventional or normal muscled bulls, and Patel et al. 23 proposed creatinine in serum as a promising biomarker for human muscle mass. These previous studies and the results from our study demonstrate that creatinine is a potential indicator trait or biomarker for muscle related traits in beef cattle. Finally, the heritability of metabolites estimated in this study could be used as reference information. Large standard errors for these estimates were observed due to the sample size used and future research utilising a larger sample size is warranted.
Candidate genes, enriched molecular functions and gene networks for carcass merit traits. Generally, identification of SNPs and genes associated with carcass merit traits mainly relies on association studies between DNA variants and the traits. For example, Wang et al. 24 performed GWAS based on 7.8 million imputed whole genome sequence variants for carcass merit traits using Canadian beef cattle and they identified hundreds of candidate genes associated with carcass merit traits. However, the knowledge about the Table 4. Five topmost significantly enriched biological functions for carcass merit traits, and genes involved in functions. 1 HCW hot carcass weight in kg, REA rib eye area in cm 2 , AFAT average backfat thickness in mm, LMY lean meat yield in %, CMAR carcass marbling score from 100 (trace marbling) to 499 (more marbling). 2 The P-value range (minimum to maximum) of significant biological functions, the significance level is P-value < 0.05.  24 for HCW, REA, AFAT, LMY, and CMAR, respectively. Of note, some of the candidate genes were also reported in different cattle breeds or populations in other studies. For example, ST18 was associated with AFAT in the current study and it was associated with the metabolite d-glucose. Medeiros de Oliveira Silva et al. 25 also identified ST18 as candidate gene for backfat through GWAS in a Nelore cattle population. Additionally, by integrating metabolomic data, this study added more information to some previously identified associations between genes and carcass merit traits. For example, Wang et al. 24 reported that UMODL1, L3HYPDH, JKAMP, and LUZP2 were candidate genes associated with REA and LMY, but the potential mechanism of how these genes could influence the two traits remained unclear. Our study showed that these same genes (UMODL1, L3HYPDH, JKAMP, and LUZP2) were associated with the concentration of creatinine www.nature.com/scientificreports/ which is a metabolite associated with REA and LMY. These results indicated that these genes may be associated with the synthesis or degradation of creatinine in animals and thereby influence the related traits. Similarly, HLTF, GYG1, RYR2, RBM47, and APBB2 were reported to be associated with REA and CMAR by Wang et al. 24 .
Our results showed these genes were associated with succinic acid which was negatively associated with both REA and CMAR. Both examples represent one of situations that genes may influence different traits by regulating the same metabolites, and the mechanism of how these genetic variants affect the concentration of metabolites still needs more studies. According to our results, we would like to highlight that some genes could affect the same carcass merit traits by influencing different metabolites. For instance, AMN was associated with both 3-hydroxybutyric acid and citric acid, and both metabolites were identified as associated with HCW. Therefore, information obtained via analyzing metabolites could improve the understanding of genetic effects on these phenotypes. In our companion paper by Li et al. 26 , similar findings were also observed. These two studies indicate that metabolites play important roles in the variation of both feed efficiency and carcass merit traits. Integration of metabolomic and genomic data could help identify functional or causal SNPs or genes, and interpret the biological meaning of the candidate genes identified in GWAS. In addition, these two studies investigated the associations between different omics levels, which could shed light on the interrelationship between different omics layers and potential molecular mechanisms underlying these traits. Therefore, our findings have broadened our knowledge on the genetic and molecular mechanisms of these traits. Based on what we have learned from these two studies, we recommend applying such multi-omics analyses to study other important traits in beef cattle.
In addition to adding more information to known associations, incorporating metabolomic data can help us identify additional novel associations as metabolites represent a level close to the final phenotypes (i.e., carcass merit traits). In the current study, some additional candidate genes were reported to be associated with carcass merit traits. Therefore, we expect that including the candidate gene SNPs in the DNA marker panel or increasing the weight applied to such SNPs could either improve accuracy of genomic prediction of the traits or decrease the DNA marker density used in genomic prediction while retaining accuracy. A preliminary attempt of this latter option was done by Melzer et al. 27 for the prediction of three traditional milk traits in dairy cows. Melzer et al. 27 applied regression methods to identify important milk metabolites and then those SNPs with significant genetic effects on important metabolites were identified and used to predict milk traits. Compared with the classical approach that uses all SNPs (40,317) in prediction, this metabolite approach could achieve similar prediction precision with less than 1% of the total amount of SNPs. Fontanesi 28 suggested integration of metabolomic data would be useful if the heritability of a trait is low, if a trait is hard to be precisely and directly measured on the animals, or if the prediction accuracy was limited by the small number of phenotyped animals in the training populations. Since carcass merit traits are expressed at later stages of animal production and are usually measured by sacrificing potential breeding stock, these traits are more suitable for DNA marker based genomic prediction, and incorporating metabolomic data into the genomic prediction has the potential to enhance the genomic prediction accuracy. In addition to metabolites, the information carried by other omics data, such as RNA and protein, also helps to prioritize SNPs associated with complex traits, and can further contribute to improving genomic prediction accuracy of these traits. For example, Fang et al 29 applied an extended genomic best linear unbiased prediction (GBLUP) model called genomic feature BLUP (GFBLUP) that included a separate random effect for the joint action of SNPs within genomic features which were obtained from RNA differential expression analyses. Compared to GBLUP, the accuracy of genomic prediction for mastitis and milk production traits with GFBLUP was marginally improved (3.2 to 3.9%) in within-breed prediction but significantly increased (164.4%) in across-breed prediction. Theoretically, the genomic features could be defined from various sources of biological knowledge (e.g., metabolomics) and the GFBLUP model could be applied to other complex traits. Therefore, it would be of interest to investigate the accuracy of prediction for carcass merit traits with and without utilizing multi-omics data.
In order to further investigate biological functions associated with carcass merit traits, five topmost significant biological functions associated with each trait were identified in the current study. These top five biological functions showed substantial overlap with the top five biological functions identified by previous studies 24,30,31 , which indicated those overlapping top biological functions potentially have important biological meaning for the carcass merit traits in beef cattle. Since our carcass data were collected from animals that were finished for meat production, genes involved in these overlapping top biological functions, such as lipid metabolism and carbohydrate metabolism, likely play a more important role in determining the carcass merit traits. Therefore, the identification of top and other enriched biological functions and their corresponding genes will not only improve our understanding of the underlying biology but also help prioritize candidate genes and related putative causal SNPs for future studies. Additionally, construction of gene networks for biological functions could help us elucidate complicated connections among genes and disentangle the potential relationships among genes, biological functions and phenotypes. For example, molecular transport was identified as a top enriched biological function associated with all carcass merit traits and its network of HCW as an example showed that some of the associated genes were involved in concentration of lipid and corticosterone, and quantity of steroid and steroid hormones (Fig. 1a). In beef cattle production, more than 30 commercially-available steroid hormone implants are marketed in the U.S. and the effects of steroid hormone implants on improving carcass leanness, increasing average daily gain, and altering dry matter intake has been reviewed by Smith and Johnson 32 . Thus, those genes linked to the functions of steroid and steroid hormones in the network may consequently influence final muscle mass in the carcass. For those genes involved in the concentration of lipid, they may influence fats in the carcass by regulating breakdown or storage of fats. Additionally, we would like to highlight the network of lipid metabolism for CMAR (Fig. 1b) because lipid metabolism was the most significant biological function associated with this trait. In this network, some genes were involved in fatty acid metabolism including fatty acid synthesis, release and concentration. The phenotypic and genetic correlations between fatty acid composition and marbling have been reported in different beef cattle populations [33][34][35][36] . Our results provide further insight into www.nature.com/scientificreports/ the potential molecular and genetic background accounting for genetic correlations between marbling and fatty acid composition in beef cattle, further indicating that the selection for fatty acid composition or concentration could influence marbling in beef cattle as previously proposed 36 .

Conclusion
In this study, genomic, metabolomic, and phenotypic data were integrated to investigate biological functions/ processes related to carcass merit traits in beef cattle. Plasma metabolites associated with HCW, REA, AFAT, LMY, and CMAR were identified and individual metabolites were found to account for a small proportion of the total phenotypic variance of the carcass merit traits. Several candidate genes as associated with carcass merit traits were identified through mGWAS along with regression analyses. These genes are involved in multiple biological functions that are related to the associated carcass merit traits. Additionally, the results of our integrative analyses could help interpret previous results from DNA marker based GWAS of the carcass merit traits and revealed functional genes associated with these economically important traits. Therefore, our integrative study has provided insights into relationships between genes, metabolites and carcass merit traits, which could potentially lead to improvement of genomic prediction accuracy via incorporating metabolomic related data.

Material and methods
Animal population, metabolomic and phenotypic data collection. All animals in this study were cared for according to the guidelines of the Canadian Council on Animal Care (2009) and the experimental procedures were approved by the University of Alberta Animal Care and Use Committee (AUP00000777). In total, 493 bulls (n = 93), heifers (n = 125) or steers (n = 275) from different herds including Charolais (n = 73), Hereford-Angus crosses (n = 191), and Beefbooster composite breed (predominantly Charolais-based, n = 229) were used. Among these animals, 277 animals from two herds had implants, while 216 animals from the other three herds had no implant. The effect of the factor of implant was examined using statistical analysis, and its effect has been captured under the herd variable through subsequent phenotypic value adjustment. Animals were born between 2002 to 2011 and initially measured for feed intake using the GrowSafe system (GrowSafe Systems Ltd., Airdrie, Alberta, Canada) at the feedlot test station under multiple projects, which were described previously [37][38][39][40] . The animals from the same herd of a particular year were tested in the same feedlot and diet. Blood samples were collected from all animals by jugular venipuncture in the early morning on the first day of feedlot feeding test and immediately frozen at − 80 °C for storage. Plasma metabolites were quantified using nuclear magnetic resonance (NMR) spectroscopy as described by Li et al. 18 . Briefly, blood samples were thawed at room temperature and centrifuged at 10,000 rpm for 10 min to separate the plasma. Plasma was then filtered through 3 kDa molecular weight cut-off filters (Merck Millipore Ltd. 3KDA filter tubes; Darmstadt, Germany) to exclude macromolecules, including lipids and proteins. After filtration samples with a low volume were diluted with high-performance liquid chromatography (HPLC) water to 570 μl to ensure an adequate volume for NMR acquisition. Samples were further prepared in a 5 mm NMR tube (New Era Enterprises Inc., NJ, USA) that contained a total of 700 μl including 570 μl filtered serum, 60 μl DSS and 70 μl D2O. Spectra were acquired on a 500 MHz VNMRS spectrometer equipped with a 5 mm cold probe (Agilent Technologies, CA, USA). Metabolites were identified and quantified with a targeted profiling approach using the Profiler and Library Manager modules in the same software which contained 304 total metabolites as references. Each spectrum was reviewed by a separate analyst and a final review was performed on all of the spectra before exporting concentration results. Concentration measurements were adjusted to report metabolite concentrations (µM). In total, 33 metabolites were initially identified and quantified using NMR. However, two of them were excluded due to missing values, resulting in 31 metabolites for further analyses. In order to collect carcass data, animals were then slaughtered after the feedlot tests at either a commercial plant or the Lacombe Research and Development Centre (LRDC) abattoir when a majority of them reached > 8 mm backfat as predicted from ultrasound measurement. The processes of carcass data collection were previously described 39,[41][42][43][44][45] . In summary, hot carcass weight (HCW) in kg was obtained by summing up the weight of each side of the carcass that was split during dressing, about 45 min post-mortem. Average backfat thickness (AFAT) in mm, rib eye area (REA) in square centimeters, and carcass marbling score (CMAR) at the grading site between the 12th and 13th ribs was assessed by trained personnel. Carcass marbling score was measured as a continuous variable from 100 (trace marbling or less) to 499 (abundant or more marbling) to reflect the amount of fat deposit interspersed between the muscle fibers (i.e., intramuscular fat) of the longissimus thoracis. Lean meat yield (LMY) was calculated as LMY, % = 57.96 + (0.202 × REA, cm 2 ) − (0.027 × HCW, kg) − (0.703 × AFAT, mm) as an estimate of saleable meat in the carcass 37 .
Animal genotyping, SNP imputation and quality control. DNA was extracted from the blood samples using DNeasy Blood & Tissue Kit (QIAGEN, Ontario, Canada) and then genotyped using the Illumina BovineSNP50 v2 BeadChip (Illumina Inc., CA, USA). For the SNP imputation, a step-wise procedure was applied as described in our previous studies 24,40 using Beagle 5.1 software 46 . Briefly, we first imputed from the 50 K SNPs to the AffyHD panel of 444,558 SNPs with 4,247 animals of mixed beef breeds in the reference population. We then imputed from the imputed AffyHD panel to the whole genome sequence variants with the reference population of 3,093 Bos taurus animals from the 1000 Bull Genomes Project 47 (run 7). Finally, 53,258,178 variants (SNPs and indels) on 29 autosomes were obtained after the imputation with average imputation accuracy of 0.97 across the DNA variants with a standard deviation (SD) of 0.08, which was assessed through a fivefold cross-validation as described in our previous studies 24,40 . For each SNP, post-imputation quality control was then performed to filter out the imputed variant genotypes if one of the following conditions was met: (1)

Regression analyses between carcass merit traits and metabolites and metabolome-genome wide association studies.
Phenotypic values of carcass merit traits and metabolites were adjusted for factors including animal type (bull, heifer, steer), birth year, herd, feedlot pen, animal age at slaughter, and breeding composition using linear regression models. The breed composition of each animal was predicted based on their 50 K SNPs using ADMIXTURE software to account for population stratification 48,49 . The value of K = 6 was chosen because it had the smallest cross-validation error and yielded the most accurate breed composition prediction based on prior knowledge of breed composition on a subset of animals. Residuals of metabolomic and phenotypic data beyond 3 standard deviations from the mean of residuals were considered as outliers and excluded from further analyses. The descriptive statistics of phenotypic data on carcass merit traits and metabolites are shown in Table S11 in Supplementary file 1. In order to determine relationships between carcass merit traits and metabolites, regression analyses were conducted between the adjusted carcass merit traits and the 31 adjusted metabolites using R statistical software. A carcass merit trait and a metabolite were considered to be significantly associated when the regression analyses have a P-value < 0.1. For the metabolome-genome wide association studies (mGWAS), the adjusted values of metabolites that were significantly associated with the carcass merit traits were the response variable in the single SNP-based mixed linear model association (mlma) as implemented in GCTA software 50 . The linear mixed model can be described as follows: where y ij is the adjusted metabolite value of the i th animal with the j th SNP (i.e. the ij th animal), b j is the allele substitution effect of the j th SNP, x ij is the j th SNP genotype of animal i coded as 0, 1, 2 for genotypes A 1 A 1 , A 1 A 2 , and A 2 A 2 , respectively, a ij is the additive polygenic effect of the ij th animal ∼ N(0, Gσ 2 a ) , and e ij is the random residual effect ∼ N(0, Iσ 2 e ). The genomic relationship matrix G was derived based on total filtered SNP markers (i.e. 10,488,742 SNPs) as described by Yang et al. 51 , which is essentially the same as the second Van-Raden's formulation 52 . The SNP allele substitution effect was estimated and the significance test of the SNP allele substitution effect was conducted via a generalized least square F-test as implemented in the GCTA package. The SNPs with P-value < 1 × 10 -5 were considered to be significantly associated with the metabolite according to the recommendation of The Wellcome Trust Case Control Consortium 53 . The phenotypic variance of the metabolite explained by each significant SNP was calculated by 2pqβ 2 S 2 * 100% , where p and q denote the SNP allele frequency of A 1 and A 2 , respectively; β is the SNP allele substitution effect; 2pqβ 2 is the additive variance of the SNP, and S 2 is the phenotypic variance of the metabolite.

Identification of candidate gene and functional enrichment analyses for carcass merit traits.
A 140-kbp window (70-kbp upstream and 70-kbp downstream) of each significant SNP was used to map candidate genes based on ARS-UCD 1.2 bovine genome assembly from the Ensembl BioMart database (accessed in February 2021). The 70-kbp was the chromosomal length within which a high linkage disequilibrium phase correlation ( r 2 > 0.77) was maintained across a sample of Canadian beef cattle breeds 54 .
The Entrez gene IDs were used as gene identifiers and small nucleolar RNA and microRNA were excluded from gene functional enrichment analyses. Bovine genes were changed to known human orthologous genes from Ensembl, whereas for those genes without human orthologs their bovine gene IDs were maintained in the gene list. Then candidate genes of all metabolites associated with the carcass merit traits (HCW, REA, AFAT, LMY or CMAR) as identified in the regression analyses between carcass merit traits and metabolites were combined and imported into the Ingenuity Pathway Analysis software (accessed in February 2021) (IPA; www. Ingen uity. com). Significantly enriched molecular and cellular biological functions and gene networks (P-value < 0.05) for each carcass merit trait were inferred and gene-sub-biological function/process interactions within the most significant molecular and cellular functions were predicted in the IPA.

Data availability
The dataset supporting the results of this article are included within the article and its supplementary files. Whole genome sequence datasets generated and/or analyzed during the current study for imputation are available from the NCBI SRA database under BioProjects PRJNA176557 and PRJNA256210. The original genotype and phenotype data sets are available for non-commercial purposes from GP or CL following the execution of a materials transfer agreement.