GWAS and eQTL analysis identifies a SNP associated with both residual feed intake and GFRA2 expression in beef cattle

Residual feed intake (RFI), a measure of feed efficiency, is an important economic and environmental trait in beef production. Selection of low RFI (feed efficient) cattle could maintain levels of production, while decreasing feed costs and methane emissions. However, RFI is a difficult and expensive trait to measure. Identification of single nucleotide polymorphisms (SNPs) associated with RFI may enable rapid, cost effective genomic selection of feed efficient cattle. Genome-wide association studies (GWAS) were conducted in multiple breeds followed by meta-analysis to identify genetic variants associated with RFI and component traits (average daily gain (ADG) and feed intake (FI)) in Irish beef cattle (n = 1492). Expression quantitative trait loci (eQTL) analysis was conducted to identify functional effects of GWAS-identified variants. Twenty-four SNPs were associated (P < 5 × 10−5) with RFI, ADG or FI. The variant rs43555985 exhibited strongest association for RFI (P = 8.28E-06). An eQTL was identified between this variant and GFRA2 (P = 0.0038) where the allele negatively correlated with RFI was associated with increased GFRA2 expression in liver. GFRA2 influences basal metabolic rates, suggesting a mechanism by which genetic variation may contribute to RFI. This study identified SNPs that may be useful both for genomic selection of RFI and for understanding the biology of feed efficiency.

Feed can account for more than 75% of variable costs of beef enterprises 1 . Consequently, selection of cattle that efficiently convert feed to carcass growth would improve farm profits due to reducing expenditure on feed while maintaining protein output 2 . Moreover, there is pressure on the agricultural industry to reduce methane emissions and improve its environmental footprint, while simultaneously increasing beef output to meet the growing demand for protein worldwide 3 . Selection for feed efficient cattle could increase beef output while concurrently decreasing methane production, as it has been suggested that low residual feed intake (RFI) (feed efficient) animals emit less methane than their high RFI counterparts 4 .
RFI is a measure of feed efficiency, defined as the difference between actual and predicted feed intake (FI) 5 . RFI has been shown to be moderately heritable, with an estimated heritability of 0.33 2,6 , making it an ideal trait for selection as any genetic gain will be maintained and propagated through the cattle herd 6 . However, calculation of RFI is currently impeded by both the expense and logistics associated with its measurement, involving recording of both FI and body weight gain for each individual animal for a period of 70 days 7 . Identification of genetic markers for RFI and component traits, such as FI and average daily gain (ADG), and their incorporation into genomic assisted breeding programmes would enable more rapid and cost effective selection of feed efficient cattle 8 . Indeed, RFI has been incorporated into the Australian dairy industry's genomic breeding programme 9 . Unlike the situation that predominates for dairy production systems worldwide, effective identification SCIeNTIFIC RePoRTS | (2018) 8:14301 | DOI: 10.1038/s41598-018-32374-6 concentrates and 3 kg fresh weight of hay, while steers were offered 8 kg concentrates and 5 kg fresh weight of hay. Hay was offered in order to maintain healthy rumen function and to reflect an Irish commercial high concentrate based dietary regimen. Refused feed was weighed weekly and subtracted from total feed offered in order to calculate total feed consumed. Dry matter intake was then calculated in order to determine FI, which was used for calculation of RFI. Cattle were weighed at the beginning and end of the test period, and every 21 days during the test period. ADG was calculated as the coefficient of linear regression of body weight on time, computed in the software package R 34 . Mid-test metabolic bodyweight (body weight 0. 75 , MBW) was calculated as body weight 0. 75 in the middle of the RFI measurement period, which was estimated from the intercept and slope of the regression line after fitting a linear regression through all MBW observations. RFI was calculated for each animal as the difference between actual and predicted FI. Predicted FI for each animal was computed by regressing FI on MBW and ADG. Calculation of predicted FI was calculated for each contemporary group individually.
Genotyping. DNA was isolated for genotyping from one of two tissue types sourced from 429 cattle described previously. Muscle was used when blood was unavailable. Blood samples were obtained by jugular venipuncture at the end of the RFI measurement period, as per Fitzsimons et al. 4 , and stored at −80 °C prior to use 4 . Muscle samples were obtained via biopsy of the M. longissimus dorsi following the RFI measurement period, as per Kelly et al. 35 , and stored at −80 °C before DNA extraction. DNA from blood samples was extracted using the Maxwell 16 Blood DNA kit (Promega, Madison, WI, USA) as per manufacturer's instructions. DNA was extracted from muscle samples using a phenol-chloroform extraction method. Briefly, 0.1 g of frozen muscle tissue was immersed in 1 mL of Trizol and homogenized using a Precellys 24 homogeniser. 200 µl chloroform was added to the homogenate, which was then centrifuged at room temperature for five minutes at 16,000 g. After centrifugation the aqueous phase was transferred to a new tube. Two volumes of ice cold ethanol were added to the aqueous phase and this mixture was centrifuged at 16,000 g for 15 minutes at 4 °C resulting in the formation of a DNA pellet. The supernatant was removed and the pellet was washed by the addition of 1 ml 70% ethanol and centrifuged at 16,000 g for 5 minutes at 4 °C. Washing was carried out twice. Following washing, any remaining supernatant was removed and the pellet was left to air-dry. The DNA was then re-suspended in 150 µl RNase/ Dnase free H 2 O.
Once DNA was isolated, samples were analysed for quality and quantity using a Nanodrop spectrophotometer. DNA of sufficient quality for genotyping was available for 422 samples. All DNA samples were normalised to a concentration of 50 ng/µl for genotyping analysis. Genotyping was carried out on the IDBv3 chip 13 by Weatherby's Scientific Ltd. (Johnstown, Naas, Co. Kildare, Ireland). The ICBF provided a further genotypes for 1,262 cattle that had been genotyped on the IDBv3 chip by Weatherby's Scientific.
In addition to the 1,684 animals genotyped directly on the IDBv3 chip, 338 cattle were genotyped on the Illumina Bovine HD genotyping chip. These 338 cattle were imputed to IDBv3 density using Fimpute version 2.2 36 . The reference population used for Fimpute was 50,000 Irish cattle with genotyped parents. Imputation of all 338 cattle was conducted across breed type to reflect the Irish national cattle population.
Once genotyping and imputation were complete the study consisted of 2,022 animals with genotypic data for all IDBv3 markers. This genetic data was uploaded to the SNP Variation Suite (SVS) environment (Golden Helix, Version 7.7.6).
Preparation of files for analysis. Quality control (QC) was carried out on genotypes imported into the SVS environment. SNPs were removed from analysis if they had a call rate of less than 0.80 or a minor allele frequency of less than 0.05. Cattle were removed from analysis if they had a call rate of less than 0.95. Following QC, 2,008 cattle and 44,338 markers remained for analysis. LD pruning was carried out at r 2 threshold of 0.5 and 7,841 markers were discarded following pruning 37 . The remaining 36,496 SNPs that passed all QC measures were acceptable for further analysis (Supplementary Table S3).
The collated phenotypic data were merged with the genotype data, creating a dataset containing 1,822 cattle eligible for analysis. A genomic kinship matrix was computed from the population, which was included as a covariate in the GWAS in order to account for relatedness. From this dataset, cattle from five beef breeds were analysed (n = 1492). The breeds included in the analysis were AA, BB, CH, LM and SI (n = 102, 177, 387, 537 and 289, respectively).
Genome-wide association studies. GWAS were carried out in the SVS environment of Golden Helix using a mixed linear model method, EMMAX 38 , for each breed individually. GWAS resulted in the generation of summary statistics for each trait of interest, i.e. RFI, ADG and FI, for each breed (AA, BB, CH, LI, and SI).

Meta-Analysis.
Following initial breed specific GWAS, meta-analyses were carried out for each trait using the software package METAL 39 . METAL combines P-value and direction of effect from each GWAS to conduct Z-score method meta-analysis. METAL analysis results in two outputs, the Z-score for each SNP and a P-value for each SNP. A large positive Z-score results in a small P-value providing evidence that the allele positively associated with the trait under test. Conversely, a large negative Z-score results in a small P-value, showing an allele is negatively associated with the trait 39 . A P-value of less than 5 × 10 −5 was used to denote genome-wide significance as per recent GWAS studies 22 . (DAVID, version 6.8) 40 was used for functional annotation. From the meta-analysis, a list of candidate genes was generated using Ensembl's Variant Effect Predictor. The list contained the nearest gene within a 500 kb window to each nominally significant SNP (P < 0.05). DAVID assigned genes to pathways as per the Kyoto Encyclopaedia of Genes and Genomes (KEGG), and determined enrichment of pathways using Fisher's exact test 41 . In order to account for multiple testing, a Benjamini-Hochberg correction was applied 42 . Pathways were deemed to be significant if they obtained a corrected P-value of <0.05. Pathways specifically addressing human diseases and disorders were not included in further analysis of DAVID identified pathways, as these were not relevant to this study. eQTL analysis. Samples for eQTL analysis were obtained from CH and Holstein-Friesian cattle that had been genotyped as part of the current study and for which RNA-Seq data were available within our group. The RFI range of the CH and Holstein-Friesian cattle were 1.48 to −0.98 and 1.48 to −1.41 respectively. RNA-Seq raw read counts were collated from liver and muscle tissue analyses carried out by our group in published studies 29,43 and studies in preparation (Higgins et al., McKenna et al.) related to feed efficiency traits. Forty-two liver samples and 39 muscle samples were brought forward to eQTL analysis. For eQTL identification, liver and muscle samples were analysed separately.

Validation of internationally identified RFI SNPs in
Raw read counts were filtered and genes with more than 10 instances of zero expression were removed from analysis, resulting in 14,588 and 14,309 genes with expression in the liver and muscle, respectively, remaining for eQTL analysis. Filtered raw read counts were normalised using DESeq2′s variance stabilizing transformation (VST) command 44 . Covariates included in DESeq2 were batch, RFI status (i.e. high or low RFI) and breed. VST normalised counts were brought forward for eQTL analysis.
eQTL analysis was carried out using the R package Matrix eQTL 45 . Only SNPs that reached genome-wide significance after meta-analysis (n = 24) and their nearest gene were considered for eQTL analysis. If a nearest gene was greater than 100 kilobases away from a SNP, this combination was not included in eQTL analysis. As part of eQTL analysis RFI, breed and sex were included as covariates. SNPs with a MAF of less than 0.1 or with known functions, i.e. missense mutations, were excluded, as were genes that were not expressed in either tissue of interest, i.e. liver or muscle. This resulted in 11 SNPs in the analysis. A Bonferroni correction was applied to account for the 11 SNPs. If an eQTL reached a P-value of 0.0045 or less, it was considered significant after multiple test correction.  Table 1). The SNP most associated with RFI was rs43555985, located at chromosome 8 position 69,658,202, a non-coding region 53.4 kb upstream from GFRA2. Two variants were within the start-stop coordinates of genes; intronic variant rs110418027 in SMC1B and 3′ untranslated region (UTR) variant rs43691372 in DIS3. The remaining four SNPs are located in the non-coding region of the Bovine genome and their distance to nearest gene is specified in Table 1. The per breed GWAS results for each of these RFI associated variants are illustrated in Table 2 and Supplementary Table S4. Functional annotation of genes containing or near to nominally significant SNPs for RFI using DAVID did not identify any enriched pathways that survived Benjamini-Hochberg correction 42 .

GWAS and meta-analysis for RFI. GWAS results generated for RFI by meta-analysis are plotted in
GWAS and meta-analysis for ADG. GWAS results for ADG are illustrated in Fig. 2. A total of 14 SNPs reached genome-wide significance for ADG. The most significantly associated SNP was rs386023985 which is located at chromosome 19 position 48,916,589, 7.9 kb upstream from the ERN1 gene. One missense variant, rs136457441 in RPL26, was associated with ADG. One associated variant was synonymous, rs382426807 in STAT5A. Five intronic variants were associated with ADG in the genes: CSFRA2, ITFG1, TBC1D16, TLL1 and BCAS3. The remaining 5 associated SNPs were located upstream or downstream of genes as indicated in Table 1. Individual breed GWAS results for the SNPs associated with ADG following meta-analysis are outlined in Table 2 and Supplementary Table S4.
Functional annotation of genes nearest to nominally significant SNPs for ADG identified 7 pathways that were significantly enriched following Benjamini-Hochberg correction 42 (Table 3). The thyroid hormone signalling pathway was the most enriched pathway (corrected P = 0.01).
GWAS and meta-analysis for FI. GWAS results for FI are plotted in Fig. 3. Three SNPs reached genome-wide significance for FI (Table 1). Individual breed GWAS results for these variants are presented in Table 2 and Supplementary Table S2. The SNP most associated with FI was IDBV32000008978 located at chromosome 20 position 67,944,737. This is a synonymous variant in ADAMTS16. The other two were an intronic variant in HNF1B and a variant located 58.9 kb upstream of MAP3K7CL. Functional annotation of the FI SNP results identified two pathways that were significantly enriched after correction; axon guidance and the thyroid hormone signalling pathway ( Table 3).

Validation of internationally identified SNPs in an Irish cattle population. Of the 102 interna-
tionally identified RFI SNPs included on the custom IDBv3 genotyping chip, 71 passed all QC measures and were included in the GWAS for RFI in the current study. Two of these SNPs, rs29014641 and rs109500421, were nominally significant in our study but did not survive multiple test correction for this subset of SNPs. This subset of SNPs was not exhaustive for RFI and did not include all variants within quantitative trait loci (QTLs) as identified by Nkrumah, et al. 18 , for example. However, a post-hoc search for genotyped SNPs within those regions which found that no genetic variants within these QTLs reached genome-wide significance following meta-analysis. eQTL analysis of SNPs identified as significant from meta-analysis. Table 4 contains results of eQTL analysis. One cis-eQTL was detected in liver, between rs43555985, the top associated SNP from the RFI GWAS, and GFRA2 (P = 0.0038; survives multiple test correction). eQTL analysis indicated that the minor allele of rs4355985 is associated with increased expression of GFRA2 (Fig. 4). The same minor allele is associated with lower RFI in the GWAS. The effect of GFRA2 expression on RFI is presented in Supplementary Fig. S5 on a per genotype basis.

Discussion
Despite the economic and environmental benefits of RFI, the trait or indeed any measure of feed efficiency, is not widely adopted within breeding programmes for beef cattle due to the difficulty and expense associated with measuring feed intake 10 . The identification of robust genetic markers of RFI applicable to several breeds, as well as crossbred cattle, would enable the traits inclusion in genomic breeding programmes. This study sought to identify SNP I.D. SNPs associated with RFI that could be applicable to Irish beef production enterprises as well as uncovering novel markers of potential use to international beef producers. To unravel the underlying biology causing phenotypic variation in feed efficiency related traits, we carried out eQTL analysis of GWAS-identified variants to study their effect on local gene expression. rs43555985 was associated with RFI and is an eQTL of the GFRA2 gene in liver tissue. The minor allele of this SNP was associated lower RFI within each of the individual breed GWAS and following meta-analysis. The   minor allele of rs43555985 was also associated with increased expression of GFRA2 following eQTL analysis. GFRA2 is a cell-surface receptor that facilitates binding of a member of the glial cell-derived neurotrophic factor family. GFRA2 knock-out mice are unable to digest food correctly, have impaired salivary secretion and gut motility and exhibit a slower growth rate than wild-type mice while having an increased basal metabolic rate 46 . If increased GFRA2 expression is associated with improved feed efficiency, the mechanism may involve lowering metabolic rates. Increased metabolic rate leads to increased energy requirements to carry out biological processes and to maintain physiological homeostasis, resulting in less consumed energy being used for growth 47 . It has been illustrated previously that high-RFI lambs have a higher basal metabolic rate than their low-RFI (more desirable) counterparts and low-RFI heifers exhibited lower metabolic rates than their high-RFI (inefficient) counterparts 48,49 . Further investigation and validation of rs43555985 prior to inclusion in genomic breeding programmes is required. Furthermore, rs43555985 is also located 79.9 kb upstream from XPO7. Post-hoc eQTL analysis for this gene illustrated that there is no statistically significant relationship between rs43555985 genotype and XPO7 expression.

Meta-analysis P-value
The second most statistically significant SNP for RFI, rs41638273, maps to a region of chromosome 2 that contains the SLC40A1 gene. This region is also the site of a QTL for RFI which contains the myostatin gene 50 . Specific mutations in the myostatin gene have been linked with increased muscle growth traits 51 . Improved feed efficiency was associated with double muscled Angus steers by Cafe et al. 52 when compared to lesser muscled  Table 3. Significant KEGG pathways identified for each trait in a multi-breed population of beef cattle following meta-analysis of GWAS results. B-H P-value: Benjamini-Hochberg corrected P-value; ADG: average daily gain; FI: feed intake. Pathways were designated as significant if they reached Benjamini-Hochberg corrected P < 0.05. counterparts 52 . DIS3, a gene nearby to a variant associated with RFI in the current study, encodes a protein involved in RNA metabolism 53 and has been linked with feed conversion efficiency in pigs 54 . The minor allele of rs386023985 was negatively associated with ADG following GWAS meta-analysis in the current study. Similarly, this variant was negatively associated with ADG within each individual breed GWAS conducted, this SNP also reached genome-wide significance within the LM individual breed GWAS. rs386023985 has not previously been associated with ADG or other growth traits in cattle. The gene nearest to rs386023985 is ERN1 (IRE1), a sensor of metabolic stress, is involved in the unfolded protein response 55 .
Copy number variation in RPL26, a ribosomal protein gene, has been linked to RFI divergence in Holstein cows 56 . A variant identified in this study, rs136457441, is a missense variant in RPL26 causing an isoleucine to threonine change at amino acid position 67 in the RPL26 protein. This variant, associated with ADG following meta-analysis in the current study and reached genome-wide significance for ADG within the CH individual breed GWAS, is located in exon 3 of RPL26. rs136457441 has been designated as tolerated by the Sorting Tolerant from Intolerant (SIFT) algorithm, which predicts whether amino acid substitutions effect protein function 57 . Further investigation into the functional effect of this mutation is required to elucidate its biological role in ADG.  Table 4. Results from eQTL analysis of genome-wide significant SNPs in liver and muscle. SNP: Single nucleotide polymorphism; ADG: average daily gain; FI: feed intake; RFI: residual feed intake, *eQTLs were designated as significant if they reached P < 0.0045. An exonic variant associated with ADG is rs382426807, a synonymous variant in STAT5A. This gene encodes a transcription factor that can be activated as part of the somatotropic axis, which is the pathway involved in the secretion of growth hormone and skeletal muscle growth 58 . STAT5A has been associated with increased live weight gain in Polish Black-and-White bulls 59 and increased expression of the growth hormone receptor, an upstream activator of STAT5A, has been previously demonstrated in efficient beef heifers by Kelly et al. 60 .
Five variants associated with ADG were located in introns of the following genes: TLL1, CSF2RA, ITFG1, TBC1D16 and BCAS3. TLL1 encodes a member of the tolloid family metalloproteases that have been previously implicated in the cleavage and development of myostatin in humans 61 . Myostatin in its normal state negatively regulates muscle growth. The production of aberrant myostatin protein isoforms results in the development of the double muscle phenotype 51 .
CSF2RA encodes for a granulocyte/macrophage colony stimulating factor 62 . ITFG1, the gene within which rs110780286 is located, is involved in T-cell differentiation and may induce the production of anti-inflammatory cytokines 63 . It has been previously illustrated that immune genes and immune pathways are associated with variation in feed efficiency and ADG in cattle 64,65 . Several groups have suggested that the immune system plays a key role in weight gain and feed efficiency in cattle. For example, Reynolds et al. 64 found that steers with higher ADG have lower immunity related gene expression 64 and it has been demonstrated that cattle with poor feed efficiency had increased activation of their immune system 65 . It is possible that cattle with poor feed efficiency and low ADG are experiencing chronic inflammation which results in poor feed efficiency which has been suggested previously by Alexandre et al. 65 following analysis of beef cattle divergent in RFI and by Mani et al. 66 upon investigation of inflammation in RFI divergent pigs 65,66 .
rs41595251 is associated with ADG and is a variant located upstream from OPRM1, the µ-opioid receptor gene, on chromosome 9. OPRM1 has been associated with increased food intake in humans 67 . ADG-associated SNP rs136789347 is nearby to OR5M10 which encodes for an olfactory receptor in humans 68 . Olfactory receptors have been suggested as one method by which the endocannabinoid system stimulates the feeding drive in mice 69 . rs41614223 is located downstream from the transcriptional start site of NKAIN2, which produces a Sodium-Potassium ATPase involved in action potential generation in neurons 70 . Each of these genes, OPRM1, OR5M10 and NKAIN2 have a neurological function. There is evidence from bovine studies 71,72 that there is significant neurological control of food consumption. The association of these neurological genes with ADG in this cohort of beef cattle may further indicate that feeding behaviour in cattle may also be subject to some degree of neurological control 73 . Further investigation is required to investigate the role neurological systems play in modulating the development of divergent RFI and related traits in cattle.
rs41592667 is upstream of FRK which encodes for tyrosine-protein kinase FRK. Gene sets enriched for cell cycle-related genes, similar to FRK, have previously been shown to be associated with feed intake and feed efficiency in beef cattle 74 . TBC1D16, a gene which encodes for a GTPase and contains the intronic variant rs109252082, has been associated with growth rate in pigs 75 . Despite these SNPs not being associated with feed efficiency or component traits prior to the current study, they are near to, or within, genes that have been associated with feed efficiency related traits previously.
HNF1B, nearby to a variant associated with FI, has previously been identified as differentially expressed in Holstein cattle divergent for RFI 76 , and this gene is a target of miR-802, which has been identified as upregulated in high RFI cattle 77 . The silencing of HNF1B in mice leads to impaired insulin sensitivity 78 . However, previous work by our group has shown that RFI divergent beef cattle have similar levels of insulin sensitivity and it is unlikely that insulin sensitivity plays a role in RFI divergence 28 . Further work is required to understand the contribution of HNF1B to the development of divergence in FI. In this study the variant IDBV332000008978 was associated with FI. This variant is a synonymous variant within the ADAMTS16 gene which is a member of ADAMTS protease family and has previously been identified as associated with FCR in pigs 79 .
Following functional gene set enrichment analysis using DAVID, the thyroid hormone signalling pathway was found to be most enriched for ADG. Thyroid hormones play a key role in the regulation of basal metabolism in mammals 80 , although, it has been demonstrated previously that the levels of thyroid hormones are not related to RFI status in heifers 81 . However, in a study of dairy cattle is was reported that low levels of thyroid hormones are associated with lower RFI 82 . Further investigation into the role of the thyroid hormone signalling pathway is warranted to further elucidate the role this biological mechanism plays in the divergence of RFI in cattle. The retrograde endocannabinoid signalling pathway was also found to reach significance level in the list of nominal significant genes for ADG. It has been demonstrated that the endocannabinoid system plays a role in inducing food intake and modulating energy expenditure and feed intake in mice 83,84 . It is possible that alterations in genes in the retrograde endocannabinoid pathway may also stimulate or inhibit feeding behaviours in cattle which may impact on feed efficiency. It has been observed previously that low RFI cattle have fewer daily feeding events and have a lower eating rate than high RFI cattle 85 . Focal adhesion was another pathway found to be enriched for nominally significant ADG associated SNPs. Focal adhesion is a pathway involved in cell motility, proliferation and survival. This pathway is dependent upon focal adhesion kinase 86 . PTK2, the gene encoding for focal adhesion kinase has been previously noted as downregulated in high-RFI animals from a population of dairy cattle 87 .

Conclusion
In this study we illustrate genome-wide associations between SNPs and RFI and its component traits in beef cattle. In total, we identified 24 SNPs as reaching statistical significance for RFI, ADG and FI in a multi-breed cohort of beef cattle. Several of the SNPs identified in this study are located nearby or within genes related to immune function, muscle growth and development, and neurological pathways. The identification of a novel eQTL for RFI at GFRA2 also represents an insight into the biology of feed efficiency.
Due to the small sample size of our individual breed GWAS, which we used meta-analysis to overcome, all identified SNPs and the eQTL must be validated, both in larger Irish and international populations before incorporation into genomic assisted beef cattle breeding programmes. Furthermore, validation is required in larger reference populations to account for the LD and genetic heterogeneity which exists between breeds of cattle.
An additional method which may have been employed to increase sample size could have been single step GWAS (ssGWAS) [88][89][90] . ssGWAS incorporates genotypes, phenotypes and pedigree information to calculate genomic estimated breeding values for animals with or without genotypes 88 .
It is important to ensure that the SNPs influence these traits and have no negative impact on other economically important production traits. SNPs with a validated desirable effect can be included in Irish and international genomic assisted breeding programmes to facilitate the rapid and cost effective selection of more feed efficient beef cattle.