Genome-wide association study for performance traits in chickens using genotype by sequencing approach

Pértille, Fábio; Moreira, Gabriel Costa Monteiro; Zanella, Ricardo; Nunes, José de Ribamar da Silva; Boschiero, Clarissa; Rovadoscki, Gregori Alberto; Mourão, Gerson Barreto; Ledur, Mônica Corrêa; Coutinho, Luiz Lehmann

doi:10.1038/srep41748

Article
Open access
Published: 09 February 2017

Fábio Pértille¹,
Gabriel Costa Monteiro Moreira¹,
Ricardo Zanella²,
José de Ribamar da Silva Nunes¹,
Clarissa Boschiero¹,
Gregori Alberto Rovadoscki¹,
Gerson Barreto Mourão¹,
Mônica Corrêa Ledur³ &
…
Luiz Lehmann Coutinho¹

Scientific Reports volume 7, Article number: 41748 (2017)

Abstract

Performance traits are economically important and are targets for selection in breeding programs, especially in the poultry industry. To identify regions of the chicken genome associated with performance traits, different genomic approaches have been applied in recent years. The aim of this study was to apply the CornellGBS approach (134,528 SNPs generated with the PstI restriction enzyme) to genome-wide association studies (GWAS) in an outbred F₂ chicken population. We validated 91.7% of these 134,528 SNPs after imputation of missing genotypes. Of those, 20 SNPs were associated with feed conversion, one was associated with body weight at 35 days of age (P < 7.86E-07) and 93 were suggestively associated with a variety of performance traits (P < 1.57E-05). The majority of these SNPs (86.2%) overlapped with previously mapped QTL for the same performance traits, and some of the SNPs also indicated novel potential QTL regions. The results of this study suggest future searches for candidate genes and QTL refinement, as well as the potential use of the SNPs described here in breeding programs.

Introduction

Production efficiency in the poultry industry is constantly improving as a result of selection for growth rate, feed efficiency and carcass traits in broilers, and for egg production and quality traits in layers^1,2. Understanding the genomic information of the loci controlling those traits is important for improving the selection efficiency of breeding programs¹.

Several studies have been conducted in chickens using markers randomly distributed in the genome (microsatellites), which have allowed the identification of several QTLs for production traits^{3,4,5,6,7,8,9,10,11,12,13}. Subsequently, some studies focused on identifying SNPs in functional and positional candidate genes and on testing their association in target QTL regions^{14,15,16,17,18,19}. With the advent of next-generation sequencing (NGS), it became possible to identify a global SNP profile and to perform genome-wide association studies (GWAS) to find novel QTL regions^{20,21,22,23,24,25,26} and to refine previously published regions^{27,28,29,30,31}.

Although the high-throughput data generated by NGS have facilitated the identification of SNPs in several populations, the use of this method for GWAS is still limited by the high cost of generating data for a large number of individuals. To address this cost problem, SNP panels were designed for use in GWAS^32,33. However, some important regions of the genome are inaccessible to sequence capture approaches³⁴, mainly because these approaches are based on predesigned SNP profiles. To overcome those limitations and to obtain a unique SNP profile, we used the PstI-derived SNP dataset from the optimized CornellGBS approach. This dataset originated from SNP calling on a reduced representation of the sequenced genome (~5%) obtained with the PstI restriction enzyme³⁵. The SNP dataset is reliable and reproducible, showing a unique profile of SNPs enriched in microchromosomes³⁵, which have a gene density 2-4 times higher than that of macrochromosomes^36,37.

The aim of this study was to identify genetic markers using the PstI-derived SNP dataset and to use that information to conduct a GWAS for performance traits in chickens. In addition, we performed a linkage disequilibrium (LD) analysis in the parental, F₁ and F₂ generations to better understand the segregation of haplotype blocks and the population structure for the associated and suggestively associated SNPs identified. Finally, we compared the locations of these SNPs with known QTLs in order to validate and refine known QTL regions.

Results

Animals and Phenotypes

The descriptive statistics for the eight performance traits analyzed are shown in Table 1. Detailed descriptions of these animals and traits are provided elsewhere^7,16. The large variability is expected since the animals are from a broiler × layer F₂ population.

Table 1 Means, standard deviations (SD), maximum (max) and minimum (min) values for performance traits of 444 individuals from the F₂ population.

Genotypes

In our previous work³⁵, using a minimum taxon call rate of 90%, we identified 67,096 SNPs from 462 chickens using the GBS approach. In this study, however, different filter parameters were applied: we reduced the locus call rate filtering criterion to 70%. This parameter is the minimum proportion of individuals that must have a genotype call at a locus for that locus to be included in the output. The reduction had minimal impact on the sample call rate (the proportion of non-missing genotypes per individual) and a large impact on the number of SNPs. The sample call rate decreased from 99.96% ± 0.04% to 99.90% ± 0.1% and the number of SNPs increased from 67,096 to 134,528. This allowed us to capture more SNPs, but the number of missing genotypes increased (for details, see the Methods section). To overcome this, we imputed the missing genotypes using Beagle 4.1 software³⁸. This approach resulted in a panel of 134,528 PstI-derived SNPs genotyped in all animals.
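As a hedged illustration of the two call rates discussed above, the sketch below filters a toy genotype matrix by locus call rate and then reports the per-sample call rate. The genotype encoding (0/1/2 allele counts, `None` for a missing call), the function names and the toy data are assumptions made for illustration; this is not the pipeline used in the study.

```python
from typing import List, Optional, Tuple

# Assumed encoding: 0, 1 or 2 copies of the alternate allele; None = missing call.
Genotype = Optional[int]
Matrix = List[List[Genotype]]  # rows = samples, columns = loci


def locus_call_rates(matrix: Matrix) -> List[float]:
    """Fraction of samples with a non-missing call, per locus (column)."""
    n_samples = len(matrix)
    n_loci = len(matrix[0])
    return [
        sum(row[j] is not None for row in matrix) / n_samples
        for j in range(n_loci)
    ]


def filter_loci(matrix: Matrix, min_call_rate: float) -> Tuple[Matrix, List[int]]:
    """Keep only loci whose call rate is at least min_call_rate."""
    rates = locus_call_rates(matrix)
    keep = [j for j, rate in enumerate(rates) if rate >= min_call_rate]
    return [[row[j] for j in keep] for row in matrix], keep


def sample_call_rates(matrix: Matrix) -> List[float]:
    """Fraction of loci with a non-missing call, per sample (row)."""
    return [sum(g is not None for g in row) / len(row) for row in matrix]


# Toy data: 4 samples x 3 loci, with locus call rates of 1.0, 0.75 and 0.25.
toy = [
    [0, 1, None],
    [1, None, None],
    [2, 1, 0],
    [0, 2, None],
]
strict, kept_strict = filter_loci(toy, 0.90)    # stricter threshold
relaxed, kept_relaxed = filter_loci(toy, 0.70)  # relaxed threshold keeps more loci
```

Lowering the threshold retains more loci at the price of more missing genotypes per retained locus, which is the trade-off that the imputation step is meant to offset.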

SNPs validation

To validate the method, the 134,528 SNP chromosomal positions obtained with CornellGBS, before and after imputation, were compared with the 600 K Affymetrix® HD genotyping array dataset, since both sets were obtained from the same animals (5 individuals from the F₂-7810 family). The genotype concordance of the SNPs with matching chromosomal positions in the two methods is shown in Table 2. On average, 91.80% and 91.66% of the SNPs had concordant genotypes between the CornellGBS and 600 K datasets before and after imputation, respectively. The accuracy of the heterozygous genotypes was slightly lower after the imputation. Reduced representation methods, such as CornellGBS, have limitations in calling heterozygous markers³⁹. In our study, 82.14% and 82.30% of heterozygous SNPs and 97.97% and 97.65% of homozygous SNPs were validated before and after imputation, respectively.
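A genotype concordance check of this kind can be sketched as follows. This is a minimal illustration under stated assumptions: genotypes are two-letter strings keyed by (chromosome, position), only positions present in both datasets are compared, and each SNP is classed as homozygous or heterozygous by its array genotype. The source does not say which platform defined the classes, and the data below are invented.

```python
from typing import Dict, Tuple

Key = Tuple[str, int]  # (chromosome, position)


def concordance(gbs: Dict[Key, str], array: Dict[Key, str]) -> Dict[str, float]:
    """Overall, homozygous and heterozygous concordance between two call sets."""
    counts = {"hom": [0, 0], "het": [0, 0]}  # [matches, total] per class
    for key in gbs.keys() & array.keys():
        # Sort alleles so that "AG" and "GA" count as the same genotype.
        ref = "".join(sorted(array[key]))
        call = "".join(sorted(gbs[key]))
        kind = "hom" if ref[0] == ref[1] else "het"
        counts[kind][1] += 1
        counts[kind][0] += int(call == ref)

    def frac(matches: int, total: int) -> float:
        return matches / total if total else float("nan")

    matches = counts["hom"][0] + counts["het"][0]
    total = counts["hom"][1] + counts["het"][1]
    return {
        "overall": frac(matches, total),
        "hom": frac(*counts["hom"]),
        "het": frac(*counts["het"]),
    }


# Invented example: four shared positions, one discordant homozygous call.
gbs_calls = {("1", 100): "AA", ("1", 200): "AG", ("2", 50): "GG",
             ("2", 75): "CT", ("3", 10): "TT"}
array_calls = {("1", 100): "AA", ("1", 200): "GA", ("2", 50): "GG",
               ("2", 75): "CC", ("4", 5): "AA"}
result = concordance(gbs_calls, array_calls)
```

Running the same function on the calls before and after imputation would give the paired percentages of the kind reported in Table 2.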

Table 2 Assessment of genotype concordance between 134,528 PstI-derived filtered SNPs before and after imputation and genotyped SNPs dataset from 600 K Affymetrix® HD genotyping array from five F₂ individuals (F₂-7810 family); and genotype validation percentages for homozygous and heterozygous SNPs.

Homozygous and heterozygous SNPs

Out of 62 million possible genotypes (462 samples × 134,528 sites), the average frequency of heterozygous SNPs was 25.32% (±5.6%) before imputation and increased to 27.70% (±5.2%) after imputation. The average heterozygosity observed per chicken ranged from 8.30% to 44.69% before imputation and from 11.38% to 44.67% after imputation. The proportion of heterozygous SNPs remained virtually unchanged before and after imputation among the lines/generations (Table 3).
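The genotype count and a per-individual heterozygosity can be reproduced with a few lines. In this sketch the 0/1/2 encoding with `None` for missing calls and the choice of denominator (called genotypes only) are assumptions, and the two toy rows are invented.

```python
N_SAMPLES = 462
N_SITES = 134_528
total_genotypes = N_SAMPLES * N_SITES  # the "62 million" quoted in the text


def heterozygosity(row):
    """Share of called genotypes that are heterozygous (coded 1)."""
    called = [g for g in row if g is not None]
    if not called:
        return float("nan")
    return sum(g == 1 for g in called) / len(called)


# One invented individual before and after imputation of its missing calls.
before = [0, 1, None, 2, 1, None, 0, 0]
after = [0, 1, 1, 2, 1, 0, 0, 0]
het_before = heterozygosity(before)
het_after = heterozygosity(after)
```

Because imputation fills previously missing calls, the heterozygous share can shift even when no existing call changes.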

Table 3 SNP heterozygosity of genotyped populations (parental, F₁ and F₂ generations) after and before imputation.

Principal component analyses

From the imputed genotypes, we conducted a principal component analysis (PCA) based on covariance, using Tassel v.5.2.26³⁹, to check the F₂ population structure. The plot was useful for visualizing the internal structure explained by the variance in the PstI-derived dataset of 134,528 SNPs using eigenvector-based multivariate analysis. Each individual falls within its expected group, consistent with our F₂ population structure composed of five F₂ dam-based families (Fig. 1).

Descriptive Statistics of Heritability

The genetic and residual variances for each trait and their genomic heritabilities are shown in Table 4. As expected^40,41, heritabilities ranging from moderate to high were observed for feed intake and body weight traits, respectively. Low heritabilities were observed for the traits evaluated over a short period (between 35 and 41 days), such as feed conversion and feed efficiency, because they are complex traits influenced by several environmental factors⁴².
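For readers who want to connect the variance components to the heritability column, the usual definition is the genetic variance divided by the sum of the genetic and residual variances. The sketch below applies that definition; the function name and the example variances are hypothetical and are not values from Table 4.

```python
def genomic_heritability(var_genetic: float, var_residual: float) -> float:
    """h2 = var_genetic / (var_genetic + var_residual)."""
    if var_genetic < 0 or var_residual < 0:
        raise ValueError("variance components must be non-negative")
    total = var_genetic + var_residual
    if total == 0:
        raise ValueError("total variance must be positive")
    return var_genetic / total


# Hypothetical variance components, for illustration only.
h2_example = genomic_heritability(30.0, 70.0)
```

A trait with a large residual variance relative to its genetic variance, as described for the short-period traits, yields a low value under this definition.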

Table 4 Genetic and residual variances, and genomic heritability for each trait analyzed in this study.

Genome-wide association study

Twenty significant SNPs (P < 7.86E-07) were associated with feed conversion adjusted to body weight at 35 days (adj35) and one significant SNP was associated with body weight at 35 days of age (Fig. 2). In addition, 92 suggestive (P < 1.57E-05) SNPs were associated with feed conversion adj35, feed intake adj35, feed efficiency adj35, birth weight, and body weight at 35 and 41 days of age (see Supplementary Spreadsheet S1 for the effects of associated SNPs; Manhattan and QQ plots are available in Supplementary Fig. S1).
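This section does not state how the two thresholds were derived. A common convention in GWAS sets the significant threshold at 0.05/n and the suggestive threshold at 1/n for n independent tests, and the two reported values are consistent with that convention for n of roughly 63,600. The sketch below checks this arithmetic and classifies p-values against the reported thresholds; the interpretation of n is an inference, not a statement from the source, and the function name is hypothetical.

```python
SIGNIFICANT = 7.86e-07  # reported genome-wide significance threshold
SUGGESTIVE = 1.57e-05   # reported suggestive threshold

# Number of independent tests implied by each threshold under the assumed
# 0.05/n and 1/n convention. The two estimates agree to within about 0.2%.
implied_n_significant = 0.05 / SIGNIFICANT
implied_n_suggestive = 1.0 / SUGGESTIVE


def classify(p_value: float) -> str:
    """Label a p-value using the two thresholds reported in the text."""
    if p_value < SIGNIFICANT:
        return "significant"
    if p_value < SUGGESTIVE:
        return "suggestive"
    return "not significant"


labels = [classify(p) for p in (5e-07, 1e-05, 1e-03)]
```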

**Figure 2: SNPs associated with feed conversion adj35.**

Linkage disequilibrium analysis

Seventeen haplotype blocks were generated from the associated and suggestively associated SNPs in the F₂ population (see Fig. 3, Supplementary Fig. S2 for haplotype details and Supplementary Table S1 for the SNPs' Mendelian descriptions). We noticed a standard block pattern between the SNPs that matched the F₂ population structure (Fig. 3). We then checked the genotype frequencies of the blocks formed by the LD analysis to determine whether the blocks were fixed in the parental lines. For each haplotype block, we checked the origin of the variation (fixed or variable) and the frequency of the F₂ haplotypes in the parental lines (see Supplementary Table S2 and Supplementary Fig. S2 for a more detailed description of frequencies). We also determined the advantageous haplotype for each trait in the F₂ generation (Table 5). This information enabled us to identify which parental line (TT or CC) contributed the genotypic variation observed in the F₂ for each block. In all blocks with r² > 0.56, the most frequent haplotype agreed with the advantageous phenotype in the F₂ individuals (Fig. 3; Supplementary Fig. S2 and Supplementary Table S2), and this advantageous haplotype (lower feed conversion and higher values of the other evaluated traits) was fixed in one of the parental lines, except in blocks 2 and 13. This information is also available for each genome-wide suggestive and/or associated SNP in Supplementary Spreadsheet S1, together with the number of genotype observations obtained per SNP.
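The r² statistic quoted above measures pairwise LD between two biallelic SNPs. The sketch below computes it from phased haplotypes using the standard formula r² = D² / (p_A(1 − p_A) p_B(1 − p_B)), with D = p_AB − p_A p_B. It assumes phased 0/1 allele codes and invented data; the study itself used Haploview 4.2 on its genotype data, which this sketch does not reproduce.

```python
from typing import List, Tuple


def r_squared(haplotypes: List[Tuple[int, int]]) -> float:
    """Pairwise LD (r^2) from phased haplotypes coded (allele_A, allele_B) as 0/1."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")  # at least one SNP is monomorphic
    return d * d / denom


# Invented haplotype samples.
perfect_ld = r_squared([(1, 1), (1, 1), (0, 0), (0, 0)])
no_ld = r_squared([(1, 1), (1, 0), (0, 1), (0, 0)])
partial_ld = r_squared([(1, 1), (1, 1), (1, 0), (0, 0)])
```

Values near 1 indicate that the two SNPs carry largely redundant information, which is why clustered GBS SNPs are grouped into blocks and summarized by a tag SNP.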

**Figure 3: Haplotype blocks obtained by the solid spine of LD and family structure using Haploview 4.2.**

Table 5 TagSNP significance levels from MLM analyses for each of the 17 blocks obtained by the solid spine of LD in the F₂ population using Haploview 4.2.

QTL overlapping SNPs

Through Animal QTLdb, we selected all 1,458 known QTLs⁴³ mapped for body weight, feed efficiency, feed conversion and growth, evaluated in different chicken lines and at different ages. Of those, 253 QTLs overlapped with 81 of the 94 SNPs associated or suggestively associated with performance traits in the GWAS of this study: 206 QTLs associated with body weight, 39 with postnatal growth, 4 with feed intake, 3 with feed conversion, and 1 with feed efficiency. The complete list of QTLs that overlapped with these regions is given in Supplementary Table S3, and a graphical representation of the distribution of suggestive and significant SNPs in relation to the QTLs is shown in Fig. 4.
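The overlap test amounts to checking whether a SNP position falls inside a QTL interval on the same chromosome. The sketch below shows one way to do this; the dictionary layout, the QTL identifiers and the coordinates are hypothetical and do not come from Animal QTLdb. The last lines reproduce the 86.2% figure from the counts given in the text.

```python
from typing import Dict, List, Tuple


def overlapping_qtls(snp: Tuple[str, int], qtls: List[Dict]) -> List[str]:
    """Return the IDs of QTL intervals that contain the SNP position."""
    chrom, pos = snp
    return [
        q["id"]
        for q in qtls
        if q["chrom"] == chrom and q["start"] <= pos <= q["end"]
    ]


# Hypothetical QTL intervals (coordinates in base pairs).
toy_qtls = [
    {"id": "qtl_a", "chrom": "4", "start": 1_000_000, "end": 5_000_000},
    {"id": "qtl_b", "chrom": "4", "start": 4_000_000, "end": 9_000_000},
    {"id": "qtl_c", "chrom": "1", "start": 0, "end": 2_000_000},
]
hits = overlapping_qtls(("4", 4_500_000), toy_qtls)
misses = overlapping_qtls(("27", 100), toy_qtls)

# Share of SNPs overlapping at least one known QTL, from the counts in the text.
overlap_percent = round(100 * 81 / 94, 1)
```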

**Figure 4: Karyotype of the QTLs (from Animal QTLdb) distribution regions of the chicken genome overlapping suggestive and significant SNPs associated with performance traits (black marks).**

Discussion

To better understand the control of complex traits in a segregating F₂ population, our research group has focused on genetic association and linkage analyses using different approaches, namely candidate genes^14,15,16,17 and QTL mapping^3,4,5,6,7, respectively, and more recently NGS approaches^27,28. Here we present the first study using a higher density of SNPs for GWAS in this F₂ population. To that end, we optimized a method called CornellGBS in chickens³⁵ to move beyond pre-designed panels, since we sought a method for efficiently genotyping a SNP dataset specific to our population.

CornellGBS is a widely employed method for genotyping large genomes of model and non-model species that explores important regions of the genome³⁴, such as microchromosomes^36,37, as previously mentioned. This is due to the high coverage of tags (contigs) in the genome reduced by restriction enzyme cleavage, which depends on the number of sequenced individuals and provides a specific SNP profile³⁵. The CornellGBS technique was originally developed for inbred populations and is known for its generally low sequencing coverage, which can cause a significant loss of SNPs, mainly heterozygous ones³⁹. For our outbred population, we used a reasonable level of multiplexing (~48 animals per lane of an Illumina flowcell) to maintain reasonable sequencing coverage per individual (~5X). We also reduced the locus call rate and used imputation to increase the number of SNPs genotyped. A reduced locus call rate was also applied in a recent study that used the same PstI restriction enzyme to cleave the cattle genome⁴⁴. Furthermore, it has been reported that combining GBS with imputation of missing internal SNPs in haplotype blocks can reduce costs by allowing further relaxation of the filtering criteria or sequencing coverage without losing SNP calls³⁴. Using this strategy, we doubled the number of SNPs and successfully imputed all missing genotypes (increasing the individual call rate to 100%); the validation ratio remained > 90%, and the percentage of heterozygous genotypes in our population increased by approximately 2% after imputation.

The use of the GBS SNP panel for GWAS in our outbred F₂ cross resulted in 20 SNPs associated (P < 7.86E-07) with feed conversion adj35, one SNP associated with body weight at 35 days (BW35) and another 93 SNPs suggestively associated (P < 1.57E-05) with different performance traits (Table 1). Additionally, we noticed that all the evaluated traits showed an upward deviation from the theoretical quantiles (Fig. 2 and Supplementary Fig. S1) in the comparison of expected and observed p-value distributions, indicating the existence of QTLs. These results corroborated the Manhattan plot peaks of associated SNPs, indicating that part of the phenotypic variation in these traits was significantly explained by the genetic component⁴⁵. Interestingly, we detected associations for several new QTLs located in microchromosomes (GGA11-28). This was only possible because of the distribution of the SNPs used. Of our set of SNPs, 38.93% are located in large chromosomes (GGA1-5), 14.15% in intermediate-size chromosomes (GGA6-10), and the largest share, 46.90%, in microchromosomes (GGA11-28), confirming the microchromosome enrichment mentioned before³⁵. Feed conversion, for example, had a high number of significant SNPs (P < 7.86E-07), mainly located in microchromosomes (GGA8, 10, 14, 18, 23, 26, and 27). However, for this trait, the SNP peaks observed in the Manhattan plot for large and intermediate-size chromosomes (Fig. 2a) were not as well defined as is usually observed for QTL peaks⁴⁶. We believe that this is explained by the SNP profile used in this study, which has a lower density of SNPs on large chromosomes than on microchromosomes³⁵. Moreover, feed intake is a complex trait subject to a high residual effect and controlled by several genes of small effect, which requires a large sample size to detect associations^47,48. This small effect has also been attributed to the short period used to measure this trait (between 35 and 41 days of age), which impairs the animals' adaptation to the new environmental conditions⁴. Accordingly, reliable QTLs for this trait were detected in studies that used a larger sample size (1,534 individuals) and a longer feed intake evaluation period (~4 weeks)^49,50 than those used in this study.

The GBS strategy can result in clusters of SNPs located next to each other³⁵. To better define the QTL regions, we performed an LD analysis. The block pattern between the SNPs matched our F₂ population structure⁵¹ (Fig. 3). This allowed us to identify opportunities for genetic selection in the lines in which the genotype was not fixed, taking into account the different phenotypic abilities of the CC layer line and the TT broiler line^5,16,51. Examples are blocks 2 and 13 (see Fig. 3(d) for block numbers), which had variable genotypes in both the CC and the TT lines, with the favorable genotype being the most frequent in the paternal line in both cases (see Supplementary Fig. S2 and Fig. 3).

It is also important to check whether these SNPs lie within previously published QTL regions. In past years, many studies identified QTLs associated with performance traits in different chicken populations^{3,4,5,6,8,9,10,11,12,13,17,52,53} (1,458 QTLs described in the Animal QTLdb), aiming to map loci that control these traits. Recently, to better understand these loci, studies have also applied GWAS to performance traits in chickens^13,26,54. The overlap of single SNP positions obtained by GWAS with QTL regions can confirm interesting genomic regions to explore. Of the 94 genome-wide associated and suggestively associated SNPs for the performance traits analyzed in this study, most were distributed across mapped QTL regions in the chicken genome (Fig. 4). Only 13 SNPs did not overlap with previously mapped QTL regions. Of these 13 SNPs, one was located on chromosome 1, one on chromosome 8 (GGA1 and 8), one on the Z sex chromosome (GGAZ) and 10 on microchromosomes (GGA17, 18, 20, 25, 27 and 28), which confirms the microchromosome enrichment profile obtained by this approach³⁵ and suggests novel QTLs to be explored in these regions. It is also important to mention that most of these 13 SNPs (those located on GGA1, 8, 18, 20, 25 and 27) were associated (P < 7.86E-07) or suggestively associated (P < 1.57E-05) with feed conversion adj35, one with feed efficiency adj35 (GGA17), and two with body weight at 41 days (GGA27) (see Supplementary Spreadsheet S1 for details). The genes in or near which these SNPs are located are mainly related to cell cycle and metabolic pathways (according to the Reactome pathways - http://www.reactome.org/PathwayBrowser), and the SNPs were within introns or upstream or downstream of these genes (see Supplementary Table S4 for functional annotation).

Despite the importance of the overlap test performed here, previous QTL mapping studies usually had large confidence intervals (>1 Mbp), often encompassing several genes, which makes the selection of candidate genes difficult⁵⁵. Therefore, we also checked the overlap of these SNPs only with QTLs mapped for the same traits in the same F₂ population^4,5,14,17 used in this study. From 23 different QTL intervals, we identified 12 SNPs overlapping with seven of them (see the QTLs in bold in Supplementary Tables S3 and S5 for the QTL list). SNPs located near the QTL regions, or in flanking regions, are also worth mentioning (see Supplementary Fig. S3). On GGA1, for example, one SNP (marker 6, see Supplementary Spreadsheet S1) associated with feed intake (P = 3.83E-07) overlapped with one QTL mapped for the same trait⁵⁰ in another population, and also with 6 QTLs mapped for body weight at different ages⁵⁶ and one for feed efficiency⁵⁷. On GGA4, a well-studied chromosome in chickens^{4,9,11,12,14,16,20,23,24,25,54,58,59,60,61,62,63}, we identified three SNPs composing haplotype 3 (markers 21–23), of which one was associated with BW35 (P < 7.86E-07) and suggestively associated with BW41 (P < 1.57E-05), and two were suggestively associated with BW41 (P < 1.57E-05). These three SNPs overlapped with one QTL region previously mapped in this same population for these same traits⁴ (QTL_IDs from ChickenQTLdb = 7157; 7162 and 7185). The boundary SNPs of haplotypes 2 and 3 (Fig. 3(d)) are separated by a short distance (less than 4 Mbp), but these QTLs are not linked, although they affect the same traits (BW35 and BW41) (see Supplementary Spreadsheet S1) in our F₂ population. Also worth mentioning is haplotype 2, which overlapped with QTLs mapped for BW at different ages^{52,53,64,65,66,67} and for growth^13,52,67 traits in different populations.

To the best of our knowledge, this is the first application of CornellGBS PstI-derived SNPs to a GWAS in chickens. We presented a strategy, changing the filtering criteria followed by genotype imputation, to increase the number of reliable SNPs available for analysis. We found 13 SNPs indicating new regions associated with performance traits, mainly in microchromosomes, that have not been previously reported. We improved the available information about loci controlling performance traits and refined these regions to reveal novel candidate regions to be explored. Finally, by demonstrating that GBS is a valid strategy for QTL mapping in a species that has a genome sequence and SNP panel available, we can argue for the validity of GBS in species without genomic resources.

Methods

All experimental protocols employed in the present study that relate to animal experimentation were performed in accordance with the resolution number 010/2012 approved by the Embrapa Swine and Poultry Ethics Committee on Animal Utilization to ensure compliance with international guidelines for animal welfare.