With abundant sequencing data, falling prices and mature genotyping platforms, researchers have more options than ever to explore the connections between genes and phenotype.
Key to these studies are single nucleotide polymorphisms (SNPs). Millions of human SNPs have been discovered in recent years—over 6 million from The International HapMap Project alone in its first 3 years. This effort has enabled researchers “to look intelligently at the genome,” says Stephen Chanock, chief of the laboratory of translational genomics and director of the core genotyping facility at the National Cancer Institute in Bethesda, Maryland, USA. Companies have also responded, offering genotyping tools to accommodate varying sample throughputs, multiplexing capabilities and chemistries.
Musing on the blistering pace of genotyping innovation over the past 7 years, Chanock, who is also senior author of one of three recent prostate cancer gene-association studies, says, “It really is like a revolution, a dynamic process that continues on.”
What is in a SNP?
The most common form of genetic variation between individuals, SNPs occur once every 1,000 bases or so (Box 1). Millions of these variants are indexed in the National Center for Biotechnology Information's dbSNP database, covering organisms from Anopheles gambiae to Zea mays.
Not just any base difference between two individuals is a SNP; a variation is only called a polymorphism if it occurs in 1% or more of the population. “If the polymorphism is stable, then it is a SNP,” explains Richard Leach, vice president for scientific services at deCODE Genetics in Reykjavik, Iceland. “If it arises de novo in an individual and isn't propagated in the population, then it is a mutation.”
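The 1% rule can be written out as a short check. This is a minimal sketch, not a standard tool: the function name, the allele-count inputs and the "rare variant" label are illustrative assumptions, and only the 1% threshold comes from the text above.

```python
def classify_variant(minor_allele_count: int, chromosomes_sampled: int,
                     threshold: float = 0.01) -> str:
    """Label a variant using the 1% population-frequency rule.

    The threshold default (0.01) is the 1% cutoff quoted in the text;
    everything else here is a hypothetical illustration.
    """
    if chromosomes_sampled <= 0:
        raise ValueError("chromosomes_sampled must be positive")
    frequency = minor_allele_count / chromosomes_sampled
    # At or above the cutoff the variant counts as a polymorphism (a SNP).
    return "polymorphism" if frequency >= threshold else "rare variant"


# 30 copies of the minor allele among 2,000 sampled chromosomes is 1.5%.
print(classify_variant(30, 2000))
# A single copy among 2,000 chromosomes is 0.05%.
print(classify_variant(1, 2000))
```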
Most SNPs occur outside protein-coding regions and thus are phenotypically silent—the equivalent of mile markers on the side of the highway; others ('nonsynonymous SNPs') affect protein sequence. Both types of SNPs can serve as landmarks in the search for genes associated with disease, drug response and complex phenotypes.
Battle of the chips
The most efficient way to link a SNP with phenotype is the so-called genome-wide association study, in which hundreds of thousands or even millions of polymorphisms are scanned per sample. The tools of the trade are DNA microarrays, and researchers have mostly aligned behind two competing technologies from San Diego–based Illumina and Affymetrix of Santa Clara, California, USA.
Affymetrix's Genome-Wide Human SNP Array 6.0 includes probes for 906,600 SNPs and 946,000 non-polymorphic copy-number probes; Illumina's High Density Human 1M-Duo chip probes more than 1 million polymorphic genomic features on each of two samples, all of which may also be used for copy-number analysis (Box 2).
Despite their similarity in format, size and application, the two products differ substantially. For one thing, the Illumina arrays use 50-mer oligos, one per SNP, whereas Affymetrix uses 25-mers, with about 4–6 replicate probes per allele. In addition, whereas Illumina's Infinium assay, which runs on its 1M-Duo chip, uses single-base extension with a labeled base to call the SNP, Affymetrix's calls are based exclusively on differential hybridization.
But the most important distinction involves the two platforms' SNP-selection strategies. Illumina's probes are based almost entirely on haplotype-tagging 'tagSNPs' identified by the International HapMap Consortium. Only about half of the SNP probes on the Affymetrix array are tagSNPs, however; the rest are 'unbiased' SNPs chosen to cover the genome while accommodating sequence constraints imposed by the assay itself. Affymetrix's protocol includes a “complexity reduction” step involving selection of relatively small (200–1,100 bp) restriction fragments before hybridization. Effectively, only SNPs located within these fragments can be monitored, though the company says the assay still provides 90% genomic coverage, at least in Caucasian and Asian populations.
“There is a certain amount of bias in the selection and amplification,” says Jessica Tonani, Affymetrix's associate director of DNA product marketing. “But the purpose is to cover all the common haplotypes, and with our current design, we are able to sample one, and often more than one, tag for each common haplotype.”
Researchers definitely have their favorite platforms, whether governed by convenience, price or content. But Stacey Gabriel, director of the genetic analysis platform at the Broad Institute in Cambridge, Massachusetts, USA, whose facility uses both platforms and who was involved in the development of the Affymetrix 6.0 array, says the question of content is largely overblown. “You can make a big deal about SNP-selection strategy, but ultimately that is not what predicts success,” she says, “especially now, when we live in a world where we genotype 1 million SNPs.”
Instead, she says, success in genome-wide association studies is governed by statistical power, which comes from increasing sample numbers. Although detecting strong associations requires relatively few samples, it may take thousands of samples to tease out lower-penetrance effects. Typically, for cost reasons, that is accomplished by performing multistage studies. In Chanock's prostate cancer study, for instance, researchers scanned half a million SNPs in 1,150 affected individuals and 1,150 normal controls, followed by a subset analysis of 27,000 markers in another 8,000 individuals.
Gabriel says her facility can process “about 2,000 whole-genome samples per week.” Though the Broad Institute has invested in both Affymetrix and Illumina platforms, it has historically been a larger-volume user of Affymetrix chips, a decision that, she says, was driven largely by “the desire to maximize the number of samples that could be successfully scanned for a given budget.”
deCODE, which processes as many as 10,000 samples per month, favors Illumina arrays, says Leach, citing their “higher call rate” and “better information content.”
Kimberly Doheny, assistant director of the Center for Inherited Disease Research (CIDR) at Johns Hopkins University School of Medicine in Baltimore, Maryland, USA, also prefers Illumina, though she uses both platforms.
That decision dates back to 2003, she explains, when her lab compared a 10,000 SNP array from Affymetrix to a 6,000 SNP product from Illumina. “When we compared the two, Illumina was a lot cheaper and a lot more flexible. We could do custom and off-the-shelf products with the same equipment, and use the same chemistry.”
Doheny's lab processed some 70,000 samples for CIDR in 2007, she says, including both off-the-shelf Illumina genome-wide association study arrays and custom Illumina products called iSelect arrays, which are physically identical to Infinium arrays and can include anywhere from 6,000 to 60,000 SNPs per sample, with 12 samples per chip. This throughput clearly places Doheny's lab at the higher end of the multiplexing spectrum. But even for the many biologists who are not considering genome-wide association studies, SNP technology, in a variety of low-multiplexing flavors, has also made an impressive difference.
One tube, one SNP
SNP technology development “has been a godsend to those of us with smaller budgets in wildlife genetics,” says Jim Seeb of the School of Aquatic and Fishery Sciences at the University of Washington, Seattle, USA. To look at a handful of SNPs, Seeb uses the PCR-based TaqMan chemistry from Applied Biosystems of Foster City, California, USA, in his research into the migration of Pacific salmon, genotyping fish by the thousands to help manage the American and US-Canadian treaty fisheries.
TaqMan probes are designed to hybridize to a specific SNP allele, with a different 5′ fluorophore color for each allele. As a specific color or both colors light up during amplification, the genotype at the particular SNP can be easily determined.
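The two-color readout described above can be sketched as a simple decision rule. This is only an illustration under stated assumptions: the function name, the `min_signal` and `het_ratio` cutoffs and the arbitrary signal units are hypothetical, and real instrument software typically clusters many samples rather than applying fixed thresholds.

```python
def call_genotype(dye1: float, dye2: float,
                  min_signal: float = 1.0, het_ratio: float = 0.5) -> str:
    """Call a genotype from two end-point fluorescence signals.

    dye1 / dye2: non-negative signals for the allele 1 and allele 2 probes.
    min_signal and het_ratio are hypothetical cutoffs (min_signal > 0).
    """
    # Neither color lit up: the reaction failed or the sample is missing.
    if max(dye1, dye2) < min_signal:
        return "no call"
    weaker, stronger = sorted((dye1, dye2))
    # Both colors comparably bright: both alleles are present.
    if weaker / stronger >= het_ratio:
        return "heterozygous"
    # One color dominates: the sample is homozygous for that allele.
    return "homozygous allele 1" if dye1 > dye2 else "homozygous allele 2"


for signals in [(5.0, 0.3), (0.2, 4.0), (3.0, 2.8), (0.1, 0.2)]:
    print(signals, call_genotype(*signals))
```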
TaqMan is currently a singleplex reaction: one tube, one SNP. It can be multiplexed to 3 or 4 SNPs per reaction with additional fluorescent colors, but according to Phoebe White, senior director of genotyping applications for Applied Biosystems, it's not likely to multiplex further, as that would require new chemistries and more sophisticated readers.
Workflow enhancements have emerged, though. In November, Applied Biosystems announced a collaboration with Woburn, Massachusetts, USA–based BioTrove to develop an integrated platform for high-throughput genotyping based on BioTrove's OpenArray architecture, which is capable of 3,072 33-nl PCR reactions on a single microscope slide.
Seeb's lab, which used to handle close to 1,500 384-well PCR plates per month at a cost of nearly $250,000 per year, has reduced its costs thanks to its acquisition of a Biomark system from Fluidigm of South San Francisco, California, USA. Using the Biomark along with Fluidigm's 48.48 Dynamic Array 'integrated fluidic circuit' consumable, Seeb's lab can run 2,304 TaqMan reactions—the equivalent of six 384-well plates—simultaneously, using nanoliter reagent and sample volumes. Spending on TaqMan reagents is down about 98%, he says.
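The plate equivalence quoted above follows from simple arithmetic, sketched here on the assumption that the "48.48" name denotes 48 samples crossed with 48 assays on one chip (the variable names are illustrative).

```python
# Assumption: "48.48" means 48 samples x 48 assays per chip.
SAMPLES_PER_CHIP = 48
ASSAYS_PER_CHIP = 48
WELLS_PER_PLATE = 384

reactions_per_chip = SAMPLES_PER_CHIP * ASSAYS_PER_CHIP
equivalent_plates = reactions_per_chip / WELLS_PER_PLATE

print(reactions_per_chip)  # 2304
print(equivalent_plates)   # 6.0
```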
Raymond Miller, assistant research professor in genetics and head of the SNP research facility at Washington University in St. Louis, uses another singleplex SNP assay in his lab, one of several genotyping facilities on campus.
Developed at Washington University in St. Louis by Pui-Yan Kwok, who is now at the University of California, San Francisco, and commercialized by PerkinElmer of Waltham, Massachusetts, USA, as the Acycloprime-FP SNP Detection system, fluorescence polarization-template-directed dye incorporation (FP-TDI) is a single-base extension technology, sometimes called mini-sequencing.
“The selling point of the technology is it is extremely flexible in design, [requiring] three plain vanilla primers,” says Miller. Two primers are used to amplify the SNP-containing sequence; the third hybridizes one nucleotide upstream of the SNP.
After amplification, the third primer is added, along with fluorescent nucleotide terminators corresponding to the two alleles and a polymerase.
Detection is based on the different fluorescence polarization properties of the incorporated and unincorporated nucleotides. “The free dye in solution is a small molecule and tumbles quickly,” Miller explains. “When you shine polarized fluorescent light on it, that causes the light to become unpolarized, which the machine can detect with filters. If the dye gets incorporated, that's a much bigger molecule, so the light comes back as largely polarized.”
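The readout Miller describes is usually summarized as a polarization value computed from the emission intensities measured parallel and perpendicular to the excitation plane. The sketch below uses that standard definition, expressed in millipolarization (mP) units; the genotype-calling function, its `cutoff_mp` value and the example intensities are hypothetical illustrations, not the commercial system's algorithm.

```python
def polarization_mp(parallel: float, perpendicular: float,
                    g_factor: float = 1.0) -> float:
    """Fluorescence polarization in mP: 1000 * (I_par - G*I_perp) / (I_par + G*I_perp)."""
    denominator = parallel + g_factor * perpendicular
    if denominator <= 0:
        raise ValueError("total intensity must be positive")
    return 1000.0 * (parallel - g_factor * perpendicular) / denominator


def call_fp_genotype(mp_allele1: float, mp_allele2: float,
                     cutoff_mp: float = 100.0) -> str:
    """Call a genotype from the mP values of the two allele-specific dyes.

    cutoff_mp is a hypothetical threshold: a dye above it is treated as
    incorporated (slow tumbling, light stays polarized).
    """
    allele1 = mp_allele1 >= cutoff_mp
    allele2 = mp_allele2 >= cutoff_mp
    if allele1 and allele2:
        return "heterozygous"
    if allele1:
        return "homozygous allele 1"
    if allele2:
        return "homozygous allele 2"
    return "no call"


# Free dye depolarizes the light, so the two intensities are nearly equal.
free_dye = polarization_mp(100.0, 100.0)
# Incorporated dye keeps most of the emission in the parallel channel.
incorporated = polarization_mp(150.0, 50.0)
print(free_dye, incorporated, call_fp_genotype(incorporated, free_dye))
```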
“What FP-TDI is very good at is detecting a small number of SNPs with a fair-sized number of samples,” says Miller, whose lab is set up to run sixteen 384-well plates' worth of reactions per day, using a PerkinElmer EnVision fluorescence polarization reader. “Our typical user is doing pilot studies,” he says. “They are looking for an association, but using a limited number of candidate genes.”
But Miller says demand for his facility's services has fallen off lately, as researchers avail themselves of other, more multiplexed genotyping services on campus, especially for genome-wide association studies.
Through the golden gate...
In addition to genome-wide association studies, Johns Hopkins' Doheny also uses a completely different Illumina chemistry for genotypes at the lower end of the multiplexing spectrum—the GoldenGate assay.
The assay requires three oligonucleotides, two of which are specific for the two SNP alleles; the third is a 'locus-specific oligo', which is tagged with a nucleic acid barcode to identify the reaction. Once the allele- and locus-specific oligos have hybridized to the genomic DNA, they are linked using DNA polymerase and ligase, PCR-amplified using fluorescently labeled oligos, and bound to one of 1,536 beads (each complementary to one of the barcodes in the locus-specific oligo) for genotype calling.
“The bead defines the assay, and the color defines the base call,” explains Carsten Rosenow, senior marketing manager for DNA analysis products at Illumina.
GoldenGate's applications dovetail with its unique combination of SNP multiplexing level and sample throughput, and include both validating the results of genome-wide association studies and 'candidate-gene studies', in which one or more particular genes are specifically tested for association with some phenotype.
Another lower-level multiplexing option is the iPLEX Gold assay from San Diego–based Sequenom, which typically runs a 36-plex format, according to chief scientific officer Charles Cantor. Sequenom's MassARRAY mass spectrometer, which processes the reactions, can accommodate two 384-position matrix-assisted laser desorption/ionization (MALDI) target plates at once and handle about 10 plates per day, he says, meaning users can process in excess of 138,000 SNPs daily.
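The daily figure follows from the numbers Cantor cites, as this short calculation shows. The variable names are illustrative, and the sketch assumes every position on every plate carries a full 36-plex reaction.

```python
SNPS_PER_REACTION = 36    # typical iPLEX Gold multiplexing level
POSITIONS_PER_PLATE = 384 # one MALDI target plate
PLATES_PER_DAY = 10       # approximate daily capacity quoted above

genotypes_per_plate = SNPS_PER_REACTION * POSITIONS_PER_PLATE
genotypes_per_day = genotypes_per_plate * PLATES_PER_DAY

print(genotypes_per_plate)  # 13824
print(genotypes_per_day)    # 138240
```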
Like FP-TDI, iPLEX is a single-base-extension assay. After PCR across the SNP and annealing of a third primer, which binds one position upstream of the SNP, a pool of 4 terminator bases is added, one of which is enzymatically incorporated depending on the SNP. Genotype calls are based on the mass of the resulting product. “With terminator nucleotides that differ in mass by at least 12 daltons,” Cantor says, “you don't need a high-resolution instrument. This is like shooting fish in a pond as far as mass spec is concerned.”
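Calling a base from product mass amounts to finding the terminator whose expected product mass lies closest to the observed peak. The following is a rough sketch under loud assumptions: the terminator and primer masses are made-up placeholder values chosen only to be at least 12 daltons apart, the product mass is simplified to primer plus terminator, and the 6-dalton tolerance (half the minimum spacing) is a hypothetical choice.

```python
# Placeholder masses in daltons, NOT real reagent values; they are spaced
# at least 12 Da apart to mirror the separation Cantor describes.
HYPOTHETICAL_TERMINATOR_MASSES = {"A": 300.0, "C": 312.0, "G": 330.0, "T": 345.0}


def call_base_by_mass(observed_mass, primer_mass,
                      terminator_masses=HYPOTHETICAL_TERMINATOR_MASSES,
                      tolerance=6.0):
    """Return the base whose extension product best matches a peak, or None.

    Simplification: expected product mass = primer mass + terminator mass.
    """
    best_base, best_error = None, float("inf")
    for base, terminator_mass in terminator_masses.items():
        error = abs(observed_mass - (primer_mass + terminator_mass))
        if error < best_error:
            best_base, best_error = base, error
    # Reject peaks that do not sit close to any expected product mass.
    return best_base if best_error <= tolerance else None


# A heterozygous sample would show two peaks; each is called separately.
for peak in (5312.5, 5344.0, 5320.5):
    print(peak, call_base_by_mass(peak, primer_mass=5000.0))
```

Because neighboring products are separated by at least twice the tolerance, a peak can match at most one base, which is the sense in which a low-resolution instrument suffices.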
According to Cantor, Sequenom positions iPLEX as the technology of choice for such second-tier applications as validating hits after (first-tier) genome-wide association studies. That is because arrays typically are too expensive to run on many, many samples, whereas singleplex technologies like TaqMan are too cumbersome to use for many SNPs. “What we increasingly find is that most users use an array and follow up with the Sequenom platform because that's the most cost-effective way,” he says. “Arrays are not flexible, whereas Sequenom is very flexible.”
The Broad Institute has four Sequenom systems to complement its Affymetrix and Illumina instrumentation. “Each [platform] is dedicated to certain things,” says Gabriel. “For instance, Sequenom is very well suited in our hands for very highly targeted genotyping experiments.”
“For us, over 500 is kind of a breakpoint,” she explains. “It's more cost-effective to do Illumina [over 500], and below that, it is more effective to do Sequenom. You balance cost and throughput.”
Other technical details must also be weighed when selecting a genotyping platform, says Panos Deloukas, senior investigator and head of genotyping at the Wellcome Trust Sanger Institute in Hinxton, Cambridge, UK.
TaqMan and iPLEX Gold, for instance, require an initial amplification step. “Because they start with PCR amplification, they do suffer from the intrinsic failure rate of PCR. That can vary from lab to lab, but a 3% failure rate is quite the norm,” he says. That means call rates can suffer somewhat with these approaches.
But he also notes that although chip-based assays can boast call rates above 99%, they are relatively cumbersome and expensive to optimize and retool. “This is the price you pay to operate at one end of the spectrum versus the other,” Deloukas says.
Still, with so many options available, researchers can always find the right tool to meet their needs. And that will surely lead to ever more advances making their way into the genetics literature.
Chanock says it could not be a better time to be in genomics: “I get up every morning and can't wait to get to work and see what's going on.” See Table.