SNP discovery and genotyping using Genotyping-by-Sequencing in Pekin ducks

Zhu, Feng; Cui, Qian-Qian; Hou, Zhuo-Cheng

doi:10.1038/srep36223

Download PDF

Article
Open access
Published: 15 November 2016

SNP discovery and genotyping using Genotyping-by-Sequencing in Pekin ducks

Feng Zhu¹,
Qian-Qian Cui¹ &
Zhuo-Cheng Hou¹

Scientific Reports volume 6, Article number: 36223 (2016) Cite this article

3783 Accesses
19 Citations
Metrics details

Subjects

Abstract

Genomic selection and genome-wide association studies need thousands to millions of SNPs. However, many non-model species do not have reference chips for detecting variation. Our goal was to develop and validate an inexpensive but effective method for detecting SNP variation. Genotyping by sequencing (GBS) can be a highly efficient strategy for genome-wide SNP detection, as an alternative to microarray chips. Here, we developed a GBS protocol for ducks and tested it to genotype 49 Pekin ducks. A total of 169,209 SNPs were identified from all animals, with a mean of 55,920 SNPs per individual. The average SNP density reached 1156 SNPs/MB. In this study, the first application of GBS to ducks, we demonstrate the power and simplicity of this method. GBS can be used for genetic studies in to provide an effective method for genome-wide SNP discovery.

PHARP: a pig haplotype reference panel for genotype imputation

Article Open access 25 July 2022

Development of a multi-species SNP array for serrasalmid fish Colossoma macropomum and Piaractus mesopotamicus

Article Open access 29 September 2021

A highly flexible and repeatable genotyping method for aquaculture studies based on target amplicon sequencing using next-generation sequencing technology

Article Open access 06 May 2019

Introduction

The domestic duck is an economically important agriculture animal and is consumed worldwide, especially in Asia¹. Meanwhile, duck is also a suitable material for population genetics and evolutionary studies^2,3. However, there is a limitation for population genetics studies and genomic selection due to the lack of a duck-specific DNA chip platform. Reduced-representation methods using restriction enzymes for the digestion reduce the genome complexity and are suitable for assaying SNPs from large numbers of samples with high reproducibility and low per-sample cost⁴. Genotyping by sequencing (GBS) is one such highly efficient strategy for genome-wide SNP detection and this approach has been successfully applied to aquatic, plants, and animals like chicken, pig and cattle^{5,6,7,8,9,10,11,12}. In this study, we developed a GBS strategy (Fig. 1) and applied it for SNP detection in the domestic Pekin duck, and evaluated the results by PCR-RFLP. Our theoretic analysis and experimental data showed this is a low cost and effective method for discovering SNPs in animal genomes for which chip microarrays are not yet available.

Results

The selection of restriction enzyme

We used 11 commonly used restriction enzymes to conduct an in silico digestion study. The results of simulated digestion are illustrated in Fig. 2A. Tag number is one important index for evaluating enzyme digestion performance. MseI digestion was predicted to achieve more tags than other enzymes. The smooth tag size distribution curve for MseI supported its choice as a good candidate for this GBS study (Fig. 2B). Aside from the tag number, the genome-wide distribution of tags is another characteristic for enzyme selection. Theoretic results also suggest that MseI is better than other enzymes and has an even distribution pattern across the genome (Fig. 2C, Supplementary Figs S1 and S2). From consideration of tags on repeat regions and degenerative sites, MseI also achieved the highest tag number (Supplementary Table S1). Based on these analyses, MseI is the best candidate enzyme, from the eleven considered, for a GBS study in ducks. Regarding reads length and sequencing depth, 500 bp tag size would be suitable for this study. There are 211,898 tags whose length ranged from 400–500 bp in the in silico study (Fig. 2B).

SNP discovery

A total of 544 million clean reads (63.25 Gb) were generated and 96.12% (523 million reads) of these were mapped to the duck genome with an average mapping rate of 96.25% (Supplementary Table S2). In total, about 13% of the genome was covered with tags, compared with 6.9% coverage predicted from in silico digestion. In total, 49,413 of GBS fragments were detected; tag length ranged from 39830 to 4230 bp, with median length 465 bp. Individual data information is shown in Supplementary Table S3. A total of 169,209 high-confidence SNPs were retained from all samples, with a mean of 55,920 SNPs identified for each individual (40,897 to 63,927) (Supplementary Table S4). Among the called genotypes, the number of SNPs was counted at 10-kb window size along the pseudo-chromosome displayed in Fig. 3. The SNP density reached 1156 SNPs/Mb, with an average of 41.23 SNPs identified for each fragment. The mean of the SNP missing rate was 5.64% and only 4 samples had a missing rate greater than 20% (Supplementary Table S4).

SNP validation

To validate results, we performed an in silico study with 50 randomly selected SNPs. After considering chromosomal distribution of SNPs and suitable enzyme digestion loci, we chose 24 SNPs to perform PCR-RFLP analysis. These SNPs were randomly distributed in the duck genome with approximately one selected site per chromosome. The results of PCR-RFLP assay are illustrated in Supplementary Table S5. A total of 982 sites were identified in PCR-RFLP and 280 SNPs were found, of which 90% (251/280) of identified SNPs were concordant with GBS’s results. Moreover, 94% (921/982) of genotypes were consistent with the GBS results. All the SNPs found using GBS were successfully validated by PCR-RFLP. Although randomicity and incomplete sequence coverage could lead to inconsistent results between GBS and PCR-RFLP, the PCR-RFLP assay results showed that the SNP library obtained by GBS were highly credible.

Discussion

Genomic selection and genome-wide association studies need 10⁵ to 10⁶ SNPs. However, many non-model species do not have reference chips for detecting variation. To solve this issue, one option is to generate more sequencing data as sequencing costs continue to fall quickly. The GBS method has great potential for application to the genomes of agricultural animals, as an alternatives to chip platforms. Pertille and coworkers sequenced 462 chicken using GBS method and identified 67,096 SNP with a 4.66% coverage of whole genome¹¹. A pig sequencing experiment detected putative SNPs with an average density of 0.33 SNPs/10 Kb⁹. Additionally 63,797 SNPs were identified in a cattle study¹². In this study, about 0.5-1X genome coverage data were obtained and the SNP density was found to be 1156 SNPs/Mb. Compared to other agricultural animals, our results showed excellent performance and high coverage for digested tags in ducks. The performance was slightly lower than that of the chicken study due to the population size of our study and the quality of the genome. A few individuals with relatively high missing rate (>20%) can be rescued using imputation methods when the sample size is large^13,14, especially in the designing genomic selection study.

We observed that the ratio of genome coverage was a mean of 13%, higher than the 6.9%, predicted coverage. Two possibilities might lead to this result. Firstly, the duck reference genome still has many gaps. In the current duck reference genome, there are more than 70,000 contigs/scaffold. Therefore, the real number of predicted fragments cannot represent the real data until the quality of the reference genome is improved. Another reason is that shorter fragments were used in sequencing, even though a narrow selection range was set using a Pippin system. In practice, higher genome coverage will obtain more SNPs, but with reduced sequencing depth in some loci with the same sequencing data. In summary, we genotyped 49 Pekin ducks using GBS and identified 169,209 confident SNPs. We have demonstrated that GBS is a highly effective method for accessible and low cost genome-wide genotyping.

Methods

Ethics statement

All experiments were performed according to regulations and guidelines established by the Animal Care and Use Committee of China Agricultural University (permit number: DK996). All protocols and procedures were approved by the Beijing Administration Committee of Laboratory Animals under the leadership of the Beijing Association for Science and Technology (permit number: SYXK 2007–0023). Blood was extracted from the wing-vein via vacuum piping, with 75% alcohol/cotton ball for disinfection. All efforts were made to minimize animal suffering during the study.

Samples preparation

Forty-nine Pekin ducks from the same flock were randomly selected at the Beijing Jinxing Golden Star Duck Centre. Birds were fed ad libitum from 0 to 6 weeks. A blood sample was collected from each individual.

DNA extract, library construction, Sequencing

Genomics DNA was extracted from blood using the standard phenol/chloroform method. An in silico digestion of the duck reference genome (BGI 1.0, Ensemble 82) was performed to choose the appropriate restriction endonuclease using R package SimRAD¹⁵. According to the results of simulated digestion, 100 ug genomic DNA was digested with restriction endonuclease MseI, which recognizes a 4-bp sequence (TTAA) and creates a 2-bp overhang (Supplementary Table S6). Then a set of variable barcode adapters that recognize Mse1-compatible sequences were ligated to the digested DNA fragments. The ligation mixture was purified using Ampure XP beads. Fragments ranging from 550 to 580 bp, including adapter sequences, were purified with gel extraction. Next, the restriction fragments were enriched by PCR amplification with adapter-specific primers. The quality evaluation was performed by ABI StepOne Plus. The data of 2 × 125 bp pair-end reads were generated by the Illumina HiSeq2500.

Mapping

The raw reads that had <20 sequence quality score and <50-bp of sequence length were removed, and then barcode sequences were eliminated. The clean sequences were aligned to the duck reference genome (BGI 1.0, Ensemble 82) using Burrows-Wheeler Aligner (BWA) with the default parameters¹⁶. Read grouping and removal of PCR duplicates were done using Picard (http://picard.sourceforge.net.). The data were deposited in the NCBI sequence read archive (SRP068685).

Mutation detection

The genome analysis toolkit (GATK) was used to perform local realignment of reads to correct misalignments, and then to detect the SNPs and call the genotypes (-stand_call_conf 20 -stand_emit_conf 20, other parameters were default)¹⁷. Two criteria were used to identify the SNPs: 1. the missing rate of each locus could not be more than 0.2; 2. the mapping depth of each locus per sample should be more than 4. The information of tags calling si illustrated in Supplementary Table S3

PCR-RFLP

Restriction enzymes for the PCR-RFLP assay were selected using information from REBASE (http://rebase.neb.com)¹⁸. Primers for PCR were designed using Primer-Blast (http://www.ncbi.nlm.nih.gov/tools/primer-blast/)¹⁹. Conditions for the PCR were as follows: 94 °C for 5 min; 34 cycles of 94 °C for 30 s, 60–62 °C for 30 s and 72 °C for 1 min. This wasfollowed by a further 10 min extension at 72 °C. The restriction assay was performed at 37 °C for 2 h.

Additional Information

How to cite this article: Zhu, F. et al. SNP discovery and genotyping using Genotyping-by-Sequencing in Pekin ducks. Sci. Rep. 6, 36223; doi: 10.1038/srep36223 (2016).

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

Zhu, F. et al. De novo transcriptome assembly and identification of genes associated with feed conversion ratio and breast muscle yield in domestic ducks. Anim Genet 46, 636–645, doi: 10.1111/age.12361 (2015).
Article CAS PubMed Google Scholar
Hou, Z. C. et al. Genetic structure of Eurasian and North American mallard ducks based on mtDNA data. Anim Genet 43, 352–355, doi: 10.1111/j.1365-2052.2011.02248.x (2012).
Article PubMed Google Scholar
Li, H. F. et al. Origin and genetic diversity of Chinese domestic ducks. Mol Phylogenet Evol 57, 634–640, doi: 10.1016/j.ympev.2010.07.011 (2010).
Article CAS PubMed Google Scholar
Davey, J. W. et al. Genome-wide genetic marker discovery and genotyping using next-generation sequencing. Nat Rev Genet 12, 499–510, doi: 10.1038/nrg3012 (2011).
Article ADS CAS PubMed Google Scholar
Lee, J. et al. Genotyping-by-sequencing map permits identification of clubroot resistance QTLs and revision of the reference genome assembly in cabbage (Brassica oleracea L.). DNA Res, doi: 10.1093/dnares/dsv034 (2015).
Hess, J. E. et al. Use of genotyping by sequencing data to develop a high-throughput and multifunctional SNP panel for conservation applications in Pacific lamprey. Mol Ecol Resour 15, 187–202, doi: 10.1111/1755-0998.12283 (2015).
Article CAS PubMed Google Scholar
Poland, J. A., Brown, P. J., Sorrells, M. E. & Jannink, J. L. Development of high-density genetic maps for barley and wheat using a novel two-enzyme genotyping-by-sequencing approach. PLoS One 7, e32253, doi: 10.1371/journal.pone.0032253 (2012).
Article ADS CAS PubMed PubMed Central Google Scholar
Wang, Z. et al. Genetic diversity and population structure of six Chinese indigenous pig breeds in the Taihu Lake region revealed by sequencing data. Anim Genet 46, 697–701, doi: 10.1111/age.12349 (2015).
Article CAS PubMed PubMed Central Google Scholar
Chen, Q. et al. Genotyping by genome reducing and sequencing for outbred animals. PLoS One 8, e67500, doi: 10.1371/journal.pone.0067500 (2013).
Article ADS CAS PubMed PubMed Central Google Scholar
Li, C. et al. SNP discovery in wild and domesticated populations of blue catfish, Ictalurus furcatus, using genotyping-by-sequencing and subsequent SNP validation. Mol Ecol Resour 14, 1261–1270, doi: 10.1111/1755-0998.12272 (2014).
Article CAS PubMed Google Scholar
Pertille, F. et al. High-throughput and Cost-effective Chicken Genotyping Using Next-Generation Sequencing. Scientific reports 6, 26929, doi: 10.1038/srep26929 (2016).
Article ADS CAS PubMed PubMed Central Google Scholar
De Donato, M., Peters, S. O., Mitchell, S. E., Hussain, T. & Imumorin, I. G. Genotyping-by-sequencing (GBS): a novel, efficient and cost-effective genotyping method for cattle using next-generation sequencing. PLoS One 8, e62137, doi: 10.1371/journal.pone.0062137 (2013).
Article ADS CAS PubMed PubMed Central Google Scholar
Davies, R. W., Flint, J., Myers, S. & Mott, R. Rapid genotype imputation from sequence without reference panels. Nature genetics 48, 965–969, doi: 10.1038/ng.3594 (2016).
Article CAS PubMed PubMed Central Google Scholar
Chan, A. W., Hamblin, M. T. & Jannink, J. L. Evaluating Imputation Algorithms for Low-Depth Genotyping-By-Sequencing (GBS) Data. PLoS One 11, e0160733, doi: 10.1371/journal.pone.0160733 (2016).
Article CAS PubMed PubMed Central Google Scholar
Lepais, O. & Weir, J. T. SimRAD: an R package for simulation-based prediction of the number of loci expected in RADseq and similar genotyping by sequencing approaches. Mol Ecol Resour 14, 1314–1321, doi: 10.1111/1755-0998.12273 (2014).
Article CAS PubMed Google Scholar
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760, doi: 10.1093/bioinformatics/btp324 (2009).
Article CAS PubMed PubMed Central Google Scholar
McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 20, 1297–1303, doi: 10.1101/gr.107524.110 (2010).
Article CAS PubMed PubMed Central Google Scholar
Roberts, R. J., Vincze, T., Posfai, J. & Macelis, D. REBASE–a database for DNA restriction and modification: enzymes, genes and genomes. Nucleic acids research 43, D298–D299, doi: 10.1093/nar/gku1046 (2015).
Article CAS PubMed Google Scholar
Ye, J. et al. Primer-BLAST: a tool to design target-specific primers for polymerase chain reaction. BMC Bioinformatics 13, 134, doi: 10.1186/1471-2105-13-134 (2012).
Article CAS PubMed PubMed Central Google Scholar

Download references

Acknowledgements

The work was supported by the National Scientific Supporting Projects of China (2015BAD03B06), Beijing Municipal Science & Technology Commission (Z141100002314011) and the Program for Changjiang Scholar and Innovation Research Team in University (IRT1191).

Author information

Authors and Affiliations

Department of Animal Genetics and Breeding, National Engineering Laboratory for Animal Breeding and MOA Key Laboratory of Animal Genetics and Breeding, China Agricultural University, 100193, Beijing, China
Feng Zhu, Qian-Qian Cui & Zhuo-Cheng Hou

Authors

Feng Zhu
View author publications
You can also search for this author in PubMed Google Scholar
Qian-Qian Cui
View author publications
You can also search for this author in PubMed Google Scholar
Zhuo-Cheng Hou
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

Z.-C.H. and F.Z. conceived and designed the experiments. F.Z. and Q.-Q.C. performed the experiments. F.Z. analyzed the data. F.Z. and Z.-C.H. wrote the paper.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Electronic supplementary material

Supplementary Information

Rights and permissions

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

Reprints and permissions

About this article

Cite this article

Zhu, F., Cui, QQ. & Hou, ZC. SNP discovery and genotyping using Genotyping-by-Sequencing in Pekin ducks. Sci Rep 6, 36223 (2016). https://doi.org/10.1038/srep36223

Download citation

Received: 19 July 2016
Accepted: 11 October 2016
Published: 15 November 2016
DOI: https://doi.org/10.1038/srep36223

This article is cited by

Facilitating population genomics of non-model organisms through optimized experimental design for reduced representation sequencing
- Henrik Christiansen
- Franz M. Heindler
- Isa Schön
BMC Genomics (2021)
Genome-wide association study reveals novel loci associated with feeding behavior in Pekin ducks
- Guang-Sheng Li
- Feng Zhu
- Zhuo-Cheng Hou
BMC Genomics (2021)
Genomic divergence reveals unique populations among Indian Yaks
- Jayakumar Sivalingam
- M. R. Vineeth
- D. Ravikumar
Scientific Reports (2020)
Genome-wide association study reveals novel loci associated with body size and carcass yields in Pekin ducks
- Meng-Ting Deng
- Feng Zhu
- Zhuo-Cheng Hou
BMC Genomics (2019)
Evaluation of genotyping by sequencing for population genetics of sibling and hybridizing birds: an example using Syrian and Great Spotted Woodpeckers
- Artur Gurgul
- Anna Miksza-Cybulska
- Łukasz Kajtoch
Journal of Ornithology (2019)

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.