Abstract
With the rise of high-throughput sequencing, traditional genotyping arrays are gradually being replaced. Against this trend, Illumina has introduced an exome genotyping array that provides an alternative to sequencing and is especially suited to large-scale genome-wide association studies (GWASs). The exome genotyping array targets the exome plus rare single-nucleotide polymorphisms (SNPs), a feature that makes it substantially more challenging to process than previous genotyping arrays that targeted common SNPs. Researchers have struggled to generate a reliable protocol for processing exome genotyping array data. The Vanderbilt Epidemiology Center, in cooperation with Vanderbilt Technologies for Advanced Genomics Analysis and Research Design (VANGARD), has developed a thorough exome chip–processing protocol. The protocol was developed during the processing of several large exome genotyping array–based studies, which together included over 60,000 participants. The protocol described herein contains detailed clustering techniques and robust quality control procedures, and it can benefit future exome genotyping array–based GWASs.
Acknowledgements
Development of this protocol is supported by Cancer Center Support Grant (CCSG) P30 CA068485 and by grant R01CA158473. We thank M. Bjoring for editorial support.
Author information
Contributions
Y.G. wrote the manuscript and designed the protocol with J.L.; S.Z., J.H., H.W., Q.S. and X.Z. contributed to script writing and generated the figures and tables; Y.S. and D.C.S. provided intellectual contributions to the overall design of the protocol.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Integrated supplementary information
Supplementary Figure 1 Example clusters relevant to Step 34A of the Procedure.
Cartesian plot of the same two SNPs shown in Figure 6. The x-axis denotes the normalized intensity of the A allele. The y-axis denotes the normalized intensity of the B allele.
Supplementary information
Supplementary Figure 1
Example clusters relevant to Step 34A of the Procedure. (PDF 965 kb)
Supplementary Table 1
Marker distributions of major SNP arrays. (XLSX 9 kb)
Supplementary Table 2
SNPs that do not match the HG19 plus strand after performing Steps 24 and 25. (XLSX 12 kb)
Supplementary Table 3
HapMap trio samples used in the example study. (XLSX 15 kb)
Supplementary Table 4
Supplementary script list. (XLSX 10 kb)
Supplementary Table 5
Resource files used in the protocol. (XLSX 9 kb)
Supplementary Table 6
AIMs on the exome chip. (XLSX 40 kb)
Supplementary Table 7
Identical and triallelic SNPs on the exome chip. (XLSX 57 kb)
Supplementary Table 8
SNPs with different alleles between the exome chip and the 1000 Genomes Project. (XLSX 21 kb)
Cite this article
Guo, Y., He, J., Zhao, S. et al. Illumina human exome genotyping array clustering and quality control. Nat Protoc 9, 2643–2662 (2014). https://doi.org/10.1038/nprot.2014.174
DOI: https://doi.org/10.1038/nprot.2014.174