Original Research Article

Molecular Psychiatry (2005) 10, 384–392. doi:10.1038/sj.mp.4001589 Published online 28 September 2004

Association analysis of mild mental impairment using DNA pooling to screen 432 brain-expressed single-nucleotide polymorphisms

L M Butcher1, E Meaburn1, P S Dale2, P Sham1, L C Schalkwyk1, I W Craig1 and R Plomin1

  1. Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, London, UK
  2. Communication Science & Disorders, University of Missouri-Columbia, USA

Correspondence: LM Butcher, Social, Genetic and Developmental Psychiatry Centre, Box Number P082, Institute of Psychiatry, De Crespigny Park, London, SE5 8AF, UK. E-mail: l.butcher@iop.kcl.ac.uk

Received 17 June 2004; Revised 2 August 2004; Accepted 11 August 2004; Published online 28 September 2004.



We hypothesize that mild mental impairment (MMI) represents the low extreme of the same quantitative trait loci (QTLs) that operate throughout the distribution of intelligence. To detect QTLs of small effect size, we employed a direct association strategy by genotyping 432 presumably functional nonsynonymous single-nucleotide polymorphisms (nsSNPs) identified from public databases on DNA pools of 288 cases and 1025 controls. In total, 288 MMI cases were identified by in-home administration of McCarthy Scales of Children's Abilities to 836 twin pairs selected from a community sample of more than 14 000 children previously screened for nonverbal cognitive delay using parentally administered tests. Controls were selected from the community sample representing the full range of nonverbal intelligence. SNPs showing at least 7% allele frequency differences between case and control DNA pools were tested for their association with the full range of nonverbal intelligence using five DNA subpools, each representing quintiles of the normal quantitative trait scores from the 1025 controls. SNPs showing linear associations in the expected direction across quintiles using pooled DNA were individually genotyped for the 288 cases and 1025 controls and analyzed using standard statistical methods. One SNP (rs1136141) in HSPA8 met these criteria, yielding a significant (P=0.036) allelic frequency difference between cases and controls for individual genotyping and a significant (P=0.013) correlation within the control group that accounts for 0.5% of the variance. The present SNP strategy combined with DNA pooling and large samples represents a step towards identifying QTLs of small effect size associated with complex traits in the postgenomic era when all functional polymorphisms will be known.


SNPs, DNA pooling, QTLs, mild mental impairment

Mental retardation is included as a symptom for more than 200 mapped and identified gene mutations1 and many chromosomal causes of mental retardation are also known,2 including microdeletions.3 These single-gene mutations and chromosomal anomalies often cause severe mental retardation.4 Although severe mental retardation has drastic consequences for the affected individual, mild mental impairment (MMI) has a larger cumulative effect on society because many more individuals are affected. MMI is defined in terms of low cognitive performance, in contrast to traditional diagnoses of mental retardation, which also add criteria about adaptive behavior.5 Even though most MMI individuals can live independently and hold a job, the prevalence of MMI is an issue of increasing concern as society continues to become more technologically dependent.6

Two large family studies suggest that MMI is familial.7, 8 The first twin study, based on the present sample, showed that MMI is substantially heritable.9 The twin study also indicated genetic links between MMI and the normal range of variation in intelligence, a finding compatible with the quantitative trait locus (QTL) hypothesis that MMI is caused by the same multiple genes that operate throughout the distribution.10

Molecular genetic analyses of MMI have not been previously reported. We used a case–control design based on presumably functional (ie, affecting a gene's protein product) single-nucleotide polymorphisms (SNPs), which greatly increases power to detect QTLs of small effect because the hypothesis of direct association between the SNP and MMI can be tested.11 In contrast, indirect association involves nonfunctional polymorphisms that are in linkage disequilibrium with a functional QTL that is associated with the disorder; the power to detect the indirect association between the polymorphism and the trait diminishes rapidly as a function of the distance between the polymorphism and the QTL.12, 13 Another advantage is that, when a direct association is detected, it is a reasonable starting assumption that the functional QTL is the SNP itself, rather than embarking on the difficult task of identifying the functional QTL as in the case of indirect association or linkage.

Eventually, all functional polymorphisms will be identified and will be used to scan the genome for direct association, an approach called sequence-based association.14 In the meantime, a step in this direction is to use all currently available putative functional polymorphisms in the hope that some affect MMI. One class of polymorphisms likely to be functional is nonsynonymous SNPs (nsSNPs), SNPs located within the coding region of a gene that result in a change in amino-acid sequence. Unlike polymorphisms in promoter regions and other regulatory regions for which functionality is difficult to demonstrate, coding sequence variants that result in an amino-acid substitution are likely to produce at least subtle differences in protein structure and function even though such functionality is difficult to prove definitively.15

In order to use available nsSNPs for genome scans of complex disorders, we searched public databases for nsSNPs that met our criteria for utility for QTL analysis of common complex disorders. PicSNP16 is a listing of SNPs from dbSNP that are nonsynonymous according to a systematic analysis of the assembled human genome sequence. NCBI's dbSNP17 – the primary SNP repository – on the other hand, combines different observations of the same polymorphism, along with available frequency data and physical map information. Many entries have a 'function-class' attribute, of which one possible value is 'coding-nonsynon' denoting the SNP class as nonsynonymous. By cross-referencing PicSNP and dbSNP databases, we were able to create a marker-set of potentially informative nsSNPs for our study on MMI. We identified 432 SNPs that met a series of criteria for use in QTL mapping. These criteria stipulated that the SNP should be found in a sample of at least 40 white Caucasian chromosomes with a minor allele frequency no less than 10%, and in proven genes that are brain expressed.

DNA pooling greatly expedites genotyping large numbers of SNPs for large samples by pooling DNA for individuals in each group such as cases and controls.18, 19 We used DNA pooling to screen for case–control differences for the 432 SNPs for 288 MMI cases and 1025 representative controls. Triplicate DNA pools were constructed for the cases and for the controls in order to ensure the replicability of DNA pools. Allele frequency estimates from the triplicate pools of cases and of controls were averaged. SNPs that yielded at least 7% allele frequency differences (a nominally significant difference at P<0.05) in the case–control comparisons were then tested for their association with the normal range of variation in intelligence using five DNA subpools of approximately 205 individuals representing quintiles of the normal distribution derived from the 1025 controls. SNPs selected on the basis of their case–control difference using pooled DNA that also showed QTL association throughout the normal range of intelligence as indicated by pooled DNA for quintiles were individually genotyped to confirm their association with MMI using standard statistical techniques.


Materials and methods

Case and control selection

Case and control children were selected from a community sample of more than 14 000 children in over 7000 families for whom nonverbal cognitive data (see below) were available in the Twins Early Development Study (TEDS; see Trouton et al.20). Identification and assessment methods used in the TEDS have been previously reported.9, 21 Briefly, the community sample was screened at the age of 4 years using the Parent Report of Children's Abilities (PARCA22, 23), which assesses nonverbal and verbal abilities by both parental report and parentally administered tests. PARCA scores were used to screen for potential cases (524 pairs) and controls (312 pairs), who were then assessed in their homes on the nine nonverbal scales of the McCarthy Scales of Children's Abilities (MSCA; see McCarthy24) and a test battery of nine diverse language tasks, administered separately to each twin by a different tester. The composite of the nine nonverbal scales was used to index MMI. The nine nonverbal scales of the MSCA were standardized to zero mean and unit variance and summed to create a nonverbal score. This nonverbal score was used to select cases and controls. Nonverbal MMI cases were selected as scoring in the bottom 15% of the control distribution (for details, see Viding et al25), yielding 644 individuals in 438 families. Only one member of a twin pair was selected, the twin with the lower nonverbal score. Cases were also excluded on the basis of three additional criteria: DNA not available, severe medical problem, not of white origin. These criteria resulted in 288 MMI cases used in the present study. In all, 1025 controls representing the full range of nonverbal intelligence were selected based on PARCA scores at 4 years of age with the same exclusionary criteria except that controls were also excluded if their co-twin was a case, so that case and control groups were genetically independent.

SNP selection

As indicated earlier, PicSNP (http://plaza.umin.ac.jp/~hchan) is a web-based catalogue of nsSNPs that have been derived from build 101 (December 2001) of the draft human genome sequence, as made available by the National Center for Biotechnology Information (NCBI; http://www.ncbi.nlm.nih.gov/). A total of 12 664 html files were downloaded from PicSNP containing a total of 17 833 unique SNP identifiers (rs numbers). The only practical way of joining PicSNP with dbSNP (http://www.ncbi.nlm.nih.gov/SN) was to download the entire contents of dbSNP. The dbSNP data set we downloaded was based on build 101 (December 2001) of the draft human genome sequence and is available from the file transfer protocol server in several formats (ftp://ftp.ncbi.nlm.nih.gov/snp). The XML version is explicit, human-readable and straightforward, but bulky. The database occupies 1 Gb of disk space in gzip-compressed form. It can be filtered using simple Perl filter scripts with the XML::Parser module by Cooper and Wall (http://search.cpan.org/author/). Example scripts as well as the SNP lists (including primer information) can be found on our web site (http://sgdp.iop.kcl.ac.uk/oleo).

nsSNPs were selected using the following eight criteria: exclusion of nsSNPs with no heterozygosity value, exclusion of nsSNPs with uncertain function class, exclusion of nsSNPs not identified in a white Caucasian population, exclusion of nsSNPs with minor allele frequency less than 10% (although rarer allele frequencies could be important, our design would not have sufficient power to detect them as QTLs), exclusion of nsSNPs found in samples smaller than 20 individuals, exclusion of nsSNPs mapped to more than one position, exclusion of nsSNPs not found in known genes, and exclusion of nsSNPs in genes not expressed in human brain.
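The eight exclusion criteria above amount to a simple record filter. The following is a minimal Python sketch, not the Perl scripts used in the study. The `SnpRecord` fields, the `"Caucasian"` population label and the function name are hypothetical stand-ins for whatever a parsed dbSNP entry provides; only the `coding-nonsynon` function class, the 10% minor allele frequency and the 20-individual sample size come from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SnpRecord:
    # All field names are hypothetical; they are not dbSNP's own schema.
    rs_id: str
    heterozygosity: Optional[float]     # None when no value is reported
    function_class: Optional[str]       # e.g. "coding-nonsynon"
    population: Optional[str]           # population in which the SNP was observed
    minor_allele_freq: Optional[float]
    sample_size: int                    # individuals in the frequency sample
    map_positions: int                  # number of genomic positions mapped
    in_known_gene: bool
    brain_expressed: bool


def passes_selection_criteria(snp: SnpRecord) -> bool:
    """Return True only if none of the eight exclusion criteria applies."""
    return (
        snp.heterozygosity is not None               # 1. heterozygosity value present
        and snp.function_class == "coding-nonsynon"  # 2. function class is certain
        and snp.population == "Caucasian"            # 3. seen in a white Caucasian sample
        and snp.minor_allele_freq is not None
        and snp.minor_allele_freq >= 0.10            # 4. minor allele frequency >= 10%
        and snp.sample_size >= 20                    # 5. at least 20 individuals
        and snp.map_positions == 1                   # 6. maps to exactly one position
        and snp.in_known_gene                        # 7. lies in a known gene
        and snp.brain_expressed                      # 8. gene is expressed in brain
    )
```

Applying `passes_selection_criteria` to every parsed record and keeping those that return `True` would give the candidate marker set.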

The final list contained 432 putative nsSNPs. The PCR primers were picked using Primer326 and have annealing temperatures predicted by Primer3 (50 mM salt) of 58±1°C. The SNaPshot™ primers were designed using our own experimental software (http://sgdp.iop.kcl.ac.uk/cgi-) and have annealing temperatures between 50 and 60°C. The distribution of nsSNPs across each chromosome was consistent with gene density relative to that particular chromosome.

DNA pool construction

Genomic DNA for each individual, extracted from buccal swabs,27 was diluted in TE (0.01 M Tris, 0.001 M EDTA) based on prior spectrophotometry readings to approximately 40 ng/μl. These dilutions were diluted further to 25 ng/μl using fluorimetry (employing PicoGreen® dsDNA quantitation reagent; Cambridge Bioscience, UK) and then diluted again, resulting in standardized concentrations of 10±0.5 ng/μl.

DNA pools for cases and controls were constructed using a novel approach that we have called 'DNA subpooling'. Rather than constructing a single pool of 288 cases and another single pool of 1025 controls, we built up towards the case and control pools by creating smaller subpools that were compared genetically using a multiplex of 12 highly polymorphic microsatellite markers in order to check the quality of pool construction. In addition, the use of subpools makes it possible to conduct pooling experiments that extract more information from the data. For example, the pool of 288 cases was built from four subpools that subdivided the MMI sample by gender and by comorbidity with low language so that it would be possible to use these subpools to examine associations by gender and by comorbidity status. Each subpool was constructed in triplicate to form three technical replicates for use in analysis of the subpools or the main screening pool. The four subpools were combined to create the main screening pool for cases. Similarly, for the control pool of 1025 individuals, 10 subpools were created by quintile (based on nonverbal intelligence scores) and then gender. The subpools were combined to create the main control DNA pool.

Genotyping procedure

PCR amplification was performed in a final volume of 11 μl, containing approximately 10 ng pooled genomic DNA, 2.5 mM MgCl2, 10 mM dNTPs, 3 pM of each PCR primer and 1.6 U Taq. PCR was performed on an MJ Research thermal cycler with a touchdown protocol (95°C for 5 min, 29 cycles of 95°C for 45 s, 62°C for 45 s (-0.4°C per cycle), 72°C for 45 s, followed by a final incubation step of 72°C for 5 min). Genotyping was performed using SNaPshot™ (Applied Biosystems) and following the standard protocol provided by the manufacturer. Estimation of relative allele frequency in pools was achieved by measuring peak heights generated in GeneScan® Analysis Software and visualized in Genotyper® Software.

Experimental procedure

Case and control DNA pools were genotyped for 432 SNPs in triplicate. SNPs showing a minor allele frequency less than 10% in our samples were rejected from the subsequent analysis. SNPs yielding allele frequency differences between MMI case and control DNA pools of at least 7% (which guarantees a nominal χ² level of significance of P<0.05 with our sample sizes) were selected for follow-up testing within the control group using DNA pools representing quintiles of the control group. SNPs showing linear associations across the quintiles in the same direction as the case–control analysis were individually genotyped (KBiosciences, UK) to confirm previous stages of replication and to test their putative association with MMI using standard statistical methods.
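The link between a 7% allele frequency difference and nominal significance can be illustrated with a 1-df test on allele counts. The sketch below is an illustration under stated assumptions, not the analysis code used in the study. It assumes that each individual contributes two chromosomes, that pooled frequency estimates can be treated as if they were allele counts (which ignores pooling measurement error), and that no continuity correction is applied. The function name is hypothetical.

```python
import math


def allele_chi2(freq_cases, freq_controls, n_cases, n_controls):
    """Pearson chi-square (1 df) for an allele frequency difference.

    Frequencies are for the same allele in each group; sample sizes are
    numbers of individuals, each contributing two chromosomes.
    """
    chrom_cases = 2 * n_cases
    chrom_controls = 2 * n_controls
    pooled = (chrom_cases * freq_cases + chrom_controls * freq_controls) / (
        chrom_cases + chrom_controls
    )
    variance = pooled * (1 - pooled) * (1 / chrom_cases + 1 / chrom_controls)
    chi2 = (freq_cases - freq_controls) ** 2 / variance
    # For 1 df, the chi-square upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# A 7% difference centred on 0.5, the least favourable frequency for power,
# with the sample sizes of this study (288 cases, 1025 controls).
chi2, p_value = allele_chi2(0.535, 0.465, 288, 1025)
print(f"chi2 = {chi2:.2f}, P = {p_value:.4f}")
```

Because the binomial variance is largest at a frequency of 0.5, a 7% difference that is significant there is significant at any other allele frequency with the same sample sizes.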

For individually genotyped SNPs, pooled DNA estimates of relative allele frequency for cases and controls, as well as quintile subpools, were corrected by a factor K28 to allow us to validate DNA pooling estimates of allele frequency with individual genotyping results. K is the ratio of the peak heights A and B observed in heterozygotes, K=A/B, and is applied using the equation Â=A/(A+KB), where A and B are the peak heights of alleles A and B, respectively, in the pool, and Â is the corrected frequency of allele A. The application of K to estimate absolute allele frequencies of DNA pools is important as uncorrected estimates are subject to measurement bias arising from a number of sources (the relative contributions of which are still largely unknown). For instance, heterozygotes should theoretically yield biallelic measurements that are of equal intensity, as they have an equal number of copies of the two alleles. However, departures from this expectation can occur due to unequal amplification,29 differential efficiencies in the incorporation of the ddNTPs30 using the SNaPshot™ method, and unequal emission energies for the different fluorescent dyes.18 Such differences in amplification of the two alleles can bias absolute estimates of allelic frequencies from DNA pools. For SNPs that were individually genotyped, we calculated K by individually genotyping 7–19 known heterozygous individuals. (See Le Hellard et al31 for a more detailed account of the estimation of K and its use in correcting allele frequency estimates based on pooled DNA.) For any given SNP, each pool is adjusted by the same constant, K; therefore, any change in the difference between two groups is, for the most part, negligible. For this reason we did not correct at the case–control stage or the quintile stage because we were interested in assessing the relative difference between pools.
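The K correction can be written out in a few lines. This is a minimal sketch assuming that K is taken as the mean A/B peak-height ratio across the known heterozygotes; the text gives K=A/B but does not state how the ratios from the 7–19 heterozygotes were combined, so the averaging step and both function names are assumptions.

```python
from statistics import mean


def estimate_k(heterozygote_peaks):
    """Estimate K from (peak A, peak B) pairs measured in known heterozygotes.

    Assumption: K is the mean of the per-individual A/B ratios.
    """
    return mean(peak_a / peak_b for peak_a, peak_b in heterozygote_peaks)


def corrected_frequency(peak_a, peak_b, k):
    """Corrected frequency of allele A in a pool: A / (A + K * B)."""
    return peak_a / (peak_a + k * peak_b)


# Hypothetical peak heights: allele A amplifies 25% more strongly than B.
k = estimate_k([(500, 400), (625, 500)])
print(corrected_frequency(500, 400, k))
```

With K=1 the expression reduces to the uncorrected estimate A/(A+B), so the correction only matters when the two alleles yield unequal signal in heterozygotes.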



Results

Verification of quality of subpool construction

In order to determine whether there were any systematic differences between subpool triplicates due to pool construction, we compared the ratio of peak heights for each of the 12 microsatellites genotyped as a multiplex. Examination of the overlaid triplicate allele image patterns for each of the 14 subpools (four case subpools and 10 control subpools; see above) across each of the 12 microsatellites (total of 168 allele image patterns) revealed no systematic difference due to pool construction between each of the subpool triplicates for any of the 14 subpools.

MMI case–control analysis

Of the 432 nsSNPs examined, 21 were either monomorphic or had a minor allele too rare to be detected given our sample sizes. A total of 14 SNP assays failed repeated PCR attempts and were omitted from further analysis. Two SNP assays were omitted from the outset due to primer design complications. In all, 52 SNPs showed minor allele frequency estimates less than 10% in either the cases or controls and were removed from further analysis. Thus, 343 of the original 432 SNPs were used in the following analyses.

Allele frequencies as indexed by the peak heights were highly correlated across the three technical replicate DNA pools for cases and controls, indicating the reliability of DNA pooling. For cases, the correlations among the three replicate DNA pools were 0.992, 0.989, and 0.990; within controls, the correlations were 0.991, 0.991, and 0.992. The mean allelic frequency difference between the replicate pools was 0.017 for cases and 0.017 also for the controls.

Figure 1 shows a scatter plot comparing allelic frequencies for cases and controls for the 343 SNPs (inside dashed line). The correlation (r=0.967) is very high, which serves as an index analogous to genomic control indicating the comparability of cases and controls. The mean allele frequency absolute difference between cases and controls was 0.031, which is larger than the average difference of 0.017 between replicate pools of cases and controls. The dotted lines indicate ±7% allelic frequency differences between cases and controls. All 30 SNPs falling outside these boundaries were nominally significant using χ² analysis (P<0.05, 1 df).

Figure 1.

Scatter plot of SNP common allele frequencies, averaged over three technical replicates, for MMI cases and controls. SNPs falling outside the dashed lines (indicating a minor allele frequency less than 10%) were omitted from subsequent analysis. Dotted lines indicate ±7% boundaries for allele frequency estimates between cases and controls. All SNPs outside these boundaries showed significant allelic differences using a χ² test (P<0.05).


QTL hypothesis testing using quintile DNA subpools

The 30 nsSNPs showing at least 7% relative allele frequency differences between cases and controls were further tested against the QTL hypothesis by genotyping subpools representing quantitative trait score quintiles within the control group. As explained later, a 7% difference between cases and controls is nominally significant (P=0.05) but does not protect against multiple testing; however, the essence of our design is to screen liberally for differences between cases and controls and then to test a greatly reduced number of SNPs nominated in this way for their association within the control sample. Of these 30 SNPs, six showed good fits of data across the quintiles in the same direction as the case–control outcome and had a common allele frequency in the cases that was more extreme than that of quintile 1: rs1136141, rs4236, rs4760, rs760482, rs8345, and rs917012. For example, for rs1136141, the common allele frequencies were 0.83 for quintile 1 (low scores) and 0.76 for quintile 5 (high scores), which corresponds to the direction of the results found for MMI cases (lowest scores; 0.87) and controls (0.80). Figure 2 shows the pooled DNA results for the quintiles superimposed on the case–control results for these six SNPs. The results for the quintiles and the regression line of best fit across the five data points generally show, as expected, that quintile 1 is not as severe as cases, because individuals in the lowest 16% of the distribution were excluded from the control group. The average of the allelic frequency estimates for the quintiles is, as expected, generally similar to the allelic frequency of the controls.
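The two directional checks applied at the quintile stage can be sketched as follows. This is an illustration only. The frequencies for quintiles 1 and 5, cases and controls are those reported for rs1136141; the values for quintiles 2–4 are hypothetical placeholders, since they are not given in the text. The sketch does not assess goodness of fit, for which no numerical threshold is stated, and the function names are assumptions.

```python
def least_squares_slope(values):
    """Slope of the least-squares line through (1, v1), (2, v2), ..."""
    n = len(values)
    xs = range(1, n + 1)
    x_mean = sum(xs) / n
    y_mean = sum(values) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx


def consistent_with_qtl(case_freq, control_freq, quintile_freqs):
    """Check direction across quintiles and extremity of cases.

    quintile_freqs runs from quintile 1 (low scores) to quintile 5 (high
    scores). Cases sit below quintile 1, so an allele that is more common
    in cases should become less common as scores rise (negative slope).
    """
    slope = least_squares_slope(quintile_freqs)
    case_minus_control = case_freq - control_freq
    same_direction = slope * case_minus_control < 0
    if case_minus_control > 0:
        cases_more_extreme = case_freq > quintile_freqs[0]
    else:
        cases_more_extreme = case_freq < quintile_freqs[0]
    return same_direction and cases_more_extreme


# Quintiles 2-4 below are hypothetical; the other values are from the text.
rs1136141_quintiles = [0.83, 0.81, 0.80, 0.78, 0.76]
print(consistent_with_qtl(0.87, 0.80, rs1136141_quintiles))
```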

Figure 2.

(a–f) Quintile results for pooled DNA for six SNPs that yielded differences in the same direction as the case–control data. The case–control differences are superimposed on the quintile results. The line of best fit for the five quintiles is shown as a solid line.


Individual genotyping

It should be emphasized that DNA pools were used merely to screen SNPs as possible QTLs for MMI in case–control and quintile comparisons that would be tested using data from individual genotyping. The individual genotyping data provide robust statistical tests of association for the case–control and quintile comparisons. Of the six SNPs selected for individual genotyping based on their preliminary results using DNA pools, one SNP (rs1136141) retained a significant case–control difference using individual genotyping data (χ²=4.53, P=0.033). Using DNA pools to estimate allele frequency, rs1136141 yielded a case–control difference of 0.069 (χ²=14.3, P=0.0002). Table 1 indicates the allele and genotype distribution for SNP rs1136141 in cases and controls using individual genotyping data.

The difference between cases and controls for the common allele (G) was 0.035, half the difference found between case and control pools (0.069), which probably reflects DNA pooling error as seen in the average absolute case–control difference of 0.031 mentioned earlier in relation to Figure 1. Case–control differences were also consistently smaller for individual genotyping compared to DNA pooling estimates in the other five SNPs as shown in Table 2. None of these other SNPs yielded significant χ² differences using individual genotyping data, although two were in the expected direction (rs8345, P=0.150; rs917012, P=0.206). Although rs8345 and rs917012 showed case–control differences of similar magnitude to rs1136141, these differences were not significant because the allele frequencies for these two SNPs are less extreme than those for rs1136141.

We confirmed the status of rs1136141 as a QTL within the control group. Table 3 and Figure 3 show the nonverbal intelligence scores for the three genotypic groups. The results show a linear relationship between number of G alleles and lower scores. Using an additive genetic model (AA=0, AG=1, GG=2), the correlation between the nonverbal intelligence score and rs1136141 genotype was -0.070 (P=0.013, one-tailed), the negative sign indicating lower scores for the G allele. The effect size for this SNP is estimated as 0.5% (r²=0.005). The remaining five SNPs were not significant in these individual genotyping analyses.
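The additive model amounts to coding each genotype by its number of G alleles and correlating that count with the trait score. The following is a minimal sketch with hypothetical function names and toy data; it is not the analysis code used in the study.

```python
import math

# Additive coding from the text: number of G alleles per genotype.
G_ALLELE_COUNT = {"AA": 0, "AG": 1, "GG": 2}


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def additive_association(genotypes, scores):
    """Return (r, r squared) for genotype coded 0/1/2 against trait score."""
    coded = [G_ALLELE_COUNT[g] for g in genotypes]
    r = pearson_r(coded, scores)
    return r, r * r


# Toy data (hypothetical): scores fall by 0.5 for each additional G allele.
r, r_squared = additive_association(
    ["AA", "AG", "GG", "AA", "AG", "GG"],
    [1.0, 0.5, 0.0, 1.0, 0.5, 0.0],
)
print(r, r_squared)
```

Squaring the reported correlation of -0.070 gives 0.0049, which is the 0.5% of variance quoted above.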

Figure 3.

rs1136141: mean standardized nonverbal intelligence scores by genotype for individuals in the control sample. The error bars represent ±1 SE of the mean. Using an additive genetic model, the correlation between genotype and quantitative trait score is significant (r=-0.070, P=0.013, one-tailed). The standardized scores are greater than zero on average because the lowest scoring individuals were removed as cases.


Table 2 also supports the validity of the DNA pools as compared to individual genotyping, especially with the K correction. The mean difference between the common allele frequency estimate for DNA pools and individual genotyping was 0.020 for the control group and 0.056 for the much smaller group of cases.



Discussion

The hypothesis that MMI represents the extreme of normal variation rather than a 'broken brain' has far-ranging implications for neuroscience as well as for diagnosing, treating, and preventing mental impairment. This hypothesis underlies the QTL approach to molecular genetic research on common disorders.10 Rather than assuming that MMI is due to a concatenation of rare single-gene or chromosomal causes, the QTL hypothesis predicts that MMI is caused by the same multiple genes that operate throughout the distribution. Stated more starkly, there is no distinct disorder of MMI, just the low end of the normal bell-shaped distribution of cognitive ability.

Employing a three-stage strategy of (1) DNA pooling in a case–control design, (2) DNA pooling of quintiles within the control group, and (3) individual genotyping, we identified a possible QTL for MMI in the heat–shock cognate protein 8 gene (HSPA8). HSPA8 is located on the long arm of chromosome 11, consists of nine exons, and spans approximately 5 kb. HSPA8 is primarily a house-keeping gene, involved in maintaining nascent polypeptides in a semifolded state to enable them to pass through mitochondrial and endoplasmic reticulum membranes.32 HSPA8 is believed to play a minor role alongside other heat–shock protein family members (such as HSP60, HSP90, and GRP78). This suggests that polymorphisms in HSPA8 may result in subtle changes in cellular efficiency or gene expression. Although rs1136141 was selected as a putative nsSNP, we now know that rs1136141 occurs in the untranslated region (UTR) of HSPA8. (This issue of misclassified nsSNPs is discussed later.) Nonetheless, rs1136141 may be functional because SNPs in UTRs are often involved in regulating protein synthesis. It is of course possible that rs1136141 is not the functional QTL responsible for the association with MMI but rather that it is in linkage disequilibrium with a functional QTL. This may be the reason why the G allele of rs1136141 contributes such a small relative risk (1.35) to MMI in the case–control comparisons and why it has such a small association (r=0.07) with IQ scores throughout the normal distribution of controls. However, because the distribution of QTL effect sizes is not known for any complex trait, it is also possible that most QTLs will account for less than 1% of the variance.

Few previous studies have had the statistical power to break this '1% QTL barrier'.33 The present study indicates that the barrier can be broken using DNA pooling and large sample sizes. Regardless of whether rs1136141 is actually the functional QTL, its replicated association with MMI and with normal variation throughout the distribution of nonverbal intelligence scores was significant even though it accounts for only 0.5% of the variance. It should be noted that attempts to replicate effects of this magnitude in an unselected sample require samples of 1000 to reach 80% power (P<0.01, two-tailed).34

The obvious advantage of DNA pooling is that it makes it possible to screen for QTL associations of small effect size using large samples of cases and controls for large numbers of markers. In our study, triplicate technical DNA replicate pools indicated that the average allelic frequency difference between the replicate pools was 0.017. We found that we were able to detect reliable allelic differences between pooled DNA for cases and controls when the difference was well above this average difference, on the order of 0.07, which is nominally significant (P<0.05) given our sample sizes. The χ² analysis with one degree of freedom yields nominal significance levels of P<0.05 for differences as low as 0.03 when the minor allele frequency is 10%. When we verified the case–control results from DNA pooling using individual genotyping, we found that the median allelic frequency difference between cases and controls was roughly halved.

These calculations indicate that we are near the limits of sensitivity of DNA pooling for detecting allele frequency differences between groups such as cases and controls, despite our considerable efforts to obtain accurate quantifications of each individual's DNA prior to pooling. In our current research, we are creating independent subpools to reduce the standard error of the mean rather than using technical replicates of DNA pools of the same individuals.19, 35

It should also be noted that in the case–control screening we used a nominal level of significance (P<0.05) rather than correcting for multiple testing of 343 SNPs as an attempt to balance false positive and false negative results in the search for QTLs of small effect size. Although 17 SNPs would be expected to show case–control differences with P<0.05 on the basis of chance alone (and we found 30 significant), the second stage of the study using quintiles from the control groups was designed to exclude false positive findings.
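The expected number of chance findings follows directly from the number of tests and the nominal threshold. The sketch below reproduces that arithmetic and, as an additional illustration, computes a binomial tail probability under the simplifying assumption that the 343 tests are independent. That assumption is ours, not a claim made in the study, and the function names are hypothetical.

```python
from math import comb


def expected_false_positives(n_tests, alpha):
    """Expected number of tests reaching P < alpha by chance alone."""
    return n_tests * alpha


def prob_at_least(k, n_tests, alpha):
    """P(at least k chance hits) if the tests were independent (binomial)."""
    return sum(
        comb(n_tests, i) * alpha**i * (1 - alpha) ** (n_tests - i)
        for i in range(k, n_tests + 1)
    )


print(expected_false_positives(343, 0.05))  # about 17 of the 343 SNPs
print(prob_at_least(30, 343, 0.05))         # chance of 30 or more such hits
```

Because neighbouring SNPs and shared pools can induce correlated test results, the tail probability should be read as a rough guide only; the quintile stage, rather than this calculation, is what the design relies on to exclude false positives.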

Concerning the issue of the accuracy of DNA pooling as compared to individual genotyping, we have replicated results from other studies28, 31 showing that K correction improves the accuracy of DNA pooling. K corrects for the preferential allele amplification for DNA pools that occurs for the majority of SNaPshot™ assays. However, because analyses of pooled DNA from cases and controls involve relative comparisons rather than estimates of absolute allelic frequencies, this correction procedure is less important at the screening level of analysis. If we treated the case–control study as a more definitive analysis, then it may have been advantageous to correct every SNP with K. The problem with this is that it incurs increased genotyping costs and labor. Another practical difficulty in calculating K is that numerous heterozygotes need to be identified a priori for each SNP assayed, which would be daunting in our study of 432 SNPs, let alone in future studies involving tens of thousands of SNPs.

Eventually the problem of unequal allelic amplification will be understood to the point that K will be predictable, thus obviating the need to locate and genotype heterozygotes. At present, though, not enough is known about unequal amplification of alleles and further research is warranted. Until then the genomic community may benefit from a public database with K data for all SNPs using a variety of genotyping platforms.

Nonsynonymous SNPs

Not only rs1136141 but most SNPs chosen as nonsynonymous on the basis of information available at the beginning of 2002 are now known not to be nonsynonymous due to improvements in the accuracy of genome annotation. When SNPs were harvested from PicSNP, they were assumed to be nonsynonymous in function based on the latest build of NCBI's draft human genome sequence. Now fewer than a quarter of our SNP set remain nsSNPs. The continued honing of each genome assembly is gradually reducing these problems and providing user-friendly and publicly accessible Internet sites that will make it possible to conduct studies using true nsSNPs. An important priority of sequence-based research14 is the classification of nsSNPs based on their impact on the structure and function of proteins such as folding, interaction sites, solubility, and stability.15 It is likely that these data will soon be available on public databases, but currently the identification of nsSNPs is a reasonable starting point for a SNP-based screen founded on direct association.

As we indicated earlier, nsSNPs are not the only source of potentially functional QTLs. Unlike polymorphisms in promoters and other regulatory elements, nsSNPs are at least now straightforward to identify. However, QTLs associated with complex traits are not limited to genes or promoter regions of genes. For example, noncoding RNA sequences called microRNA act as genes by producing RNA molecules that regulate gene expression directly, rather than being translated into amino-acid sequences.36 We look forward to the day when all functional polymorphisms of any kind are available for use in direct association studies. In the meantime, we have successfully used pooled DNA on the Affymetrix GeneChip® Mapping 10K Xba 131 array.37 Although these microarrays do not focus on functional polymorphisms, over 40% of the SNPs on the microarray are within the 2% of DNA that encompasses genes and thus provide sufficiently dense markers of genes for systematic screens for indirect association.



  1. Inlow JK & Restifo LL. Molecular and comparative genetics of mental retardation. Genetics 2004; 166: 835–881.
  2. Plomin R, DeFries JC, McClearn GE & McGuffin P. Behavioral Genetics, 4th edn. Worth Publishers: New York, 2001.
  3. Knight SJL, Regan R, Nicod A, Horsley SW, Kearney L, Homfray T et al. Subtle chromosomal rearrangements in children with unexplained mental retardation. Lancet 1999; 354: 1676–1681.
  4. Winnepenninckx B, Rooms L & Kooy RF. Mental retardation: a review of the genetic causes. Br J Dev Disab 2003; 49: 29–44.
  5. Plomin R. Genetic research on general cognitive ability as a model for mild mental retardation. Int Rev Psychiatry 1999; 11: 34–36.
  6. Gottfredson LS. G, jobs and life. In: Nyborg H (ed). The Scientific Study of General Intelligence. Pergamon: Amsterdam, 2003; 293–342.
  7. Nichols PL. Familial mental retardation. Behav Genet 1984; 14: 161–170.
  8. Reed EW & Reed SC. Mental Retardation: a Family Study. Saunders: Philadelphia, 1965.
  9. Spinath F, Harlaar N, Ronald A & Plomin R. Substantial genetic influence on mild mental impairment in early childhood. Am J Ment Retard 2004; 109: 34–43.
  10. Plomin R, Owen MJ & McGuffin P. The genetic basis of complex human behaviors. Science 1994; 264: 1733–1739.
  11. Carlson CS, Eberle MA, Kruglyak L & Nickerson DA. Mapping complex disease loci in whole-genome association studies. Nature 2004; 429: 446–452.
  12. Kruglyak L. Prospects for whole-genome linkage disequilibrium mapping of common disease genes. Nat Genet 1999; 22: 139–144.
  13. Reich DE, Cargill M, Bolk S, Ireland J, Sabeti PC, Richter DJ et al. Linkage disequilibrium in the human genome. Nature 2001; 411: 199–204.
  14. Botstein D & Risch N. Discovering genotypes underlying human phenotypes: past successes for Mendelian disease, future approaches for complex disease. Nat Genet 2003; 33: 228–237.
  15. Sunyaev S, Ramensky V & Bork P. Towards a structural basis of human non-synonymous single nucleotide polymorphisms. Trends Genet 2000; 16: 198–200.
  16. Chang H & Fujita T. PicSNP: a browsable catalog of nonsynonymous single nucleotide polymorphisms in the human genome. Biochem Biophys Res Commun 2001; 287: 288–291.
  17. Sherry ST, Ward M & Sirotkin K. dbSNP-database for single nucleotide polymorphisms and other classes of minor genetic variation. Genome Res 1999; 9: 677–679.
  18. Norton N, Williams NM, Williams HJ, Spurlock G, Kirov G, Morris DW et al. Universal, robust, highly quantitative SNP allele frequency measurement in DNA pools. Hum Genet 2002; 110: 471–478.
  19. Sham P, Bader JS, Craig I, O'Donovan M & Owen M. DNA pooling: a tool for large-scale association studies. Nat Rev Genet 2002; 3: 862–871.
  20. Trouton A, Spinath FM & Plomin R. Twins Early Development Study (TEDS): a multivariate, longitudinal genetic investigation of language, cognition and behaviour problems in childhood. Twin Res 2002; 5: 444–448.
  21. Colledge E, Bishop DVM, Dale P, Koeppen-Schomerus G, Price TS, Happé F et al. The structure of language abilities at 4 Years: a twin study. Dev Psychol 2002; 38: 749–757.
  22. Oliver B, Dale PS, Saudino K, Petrill SA, Pike A & Plomin R. The validity of parent-based assessment of non-verbal cognitive abilities of three-year olds. Early Child Dev Care 2002; 172: 337–348.
  23. Saudino KJ, Dale PS, Oliver B, Petrill SA, Richardson V, Rutter M et al. The validity of parent-based assessment of the cognitive abilities of two-year-olds. Br J Dev Psychol 1998; 16: 349–363.
  24. McCarthy D. McCarthy Scales of Children's Abilities. The Psychological Corporation: New York, 1972.
  25. Viding E, Price TS, Spinath FM, Bishop DV, Dale PS & Plomin R. Genetic and environmental mediation of the relationship between language and nonverbal impairment in 4-year-old twins. J Speech, Lang Hear Res 2003; 46: 1271–1282.
  26. Rozen S & Skaletsky HJ. Primer3 on the WWW for general users and for biologist programmers. In: Krawetz S, Misener S (eds). Bioinformatics Methods in Molecular Biology. Humana Press: New Jersey, 2000; 365–386.
  27. Freeman B, Smith N, Curtis C, Huckett L, Mill J & Craig I. DNA from buccal swabs recruited by mail: evaluation of storage effects on long-term stability and suitability for multiplex polymerase chain reaction genotyping. Behav Genet 2003; 33: 67–72.
  28. Hoogendoorn B, Norton N, Kirov G, Williams N, Hamshere ML, Spurlock G et al. Cheap, accurate and rapid allele frequency estimation of single nucleotide polymorphisms by primer extension and DHPLC in DNA pools. Hum Genet 2000; 107: 488–493.
  29. Liu Q, Thorland EC & Sommer SS. Inhibition of PCR amplification by a point mutation downstream of a primer. Biotechniques 1997; 22: 292–300.
  30. Barnard R, Futo V, Pecheniuk N, Slattery M & Walsh T. PCR bias toward the wild-type k-ras and p53 sequences: implications for PCR detection of mutations and cancer diagnosis. Biotechniques 1998; 25: 684–691.
  31. Le Hellard S, Ballereau SJ, Visscher PM, Torrance HS, Pinson J, Morris SW et al. SNP genotyping on pooled DNAs: comparison of genotyping technologies and a semi automated method for data storage and analysis. Nucleic Acids Res 2002; 30: e74.
  32. Tavaria M, Gabriele T, Anderson RL, Mirault ME, Baker E, Sutherland G et al. Localization of the gene encoding the human heat shock cognate protein, HSP73, to chromosome 11. Genomics 1995; 29: 266–268.
  33. Plomin R, DeFries JC, Craig IW & McGuffin P. Behavioral genetics. In: Plomin R, DeFries JC, Craig IW, McGuffin P (eds). Behavioral Genetics in the Postgenomic Era. American Psychological Association: Washington, DC, 2003; 3–15.
  34. Cohen J. Statistical Power Analysis for the Behavioral Sciences, 2nd edn. Lawrence Erlbaum Associates: Hillsdale, New Jersey, 1988.
  35. Zou G & Zhao H. The impacts of errors in individual genotyping and DNA pooling on association studies. Genet Epidemiol 2004; 26: 1–10.
  36. Eddy SR. Non-coding RNA genes and the modern RNA world. Nat Rev Genet 2001; 2: 919–929.
  37. Butcher LM, Meaburn E, Liu L, Hill L, Al-Chalabi A, Plomin R et al. Genotyping pooled DNA on microarrays: a systematic genome screen of thousands of SNPs in large samples to detect QTLs for complex traits. Behav Genet 2004; 34: 549–555.


We thank all the parents and twins who have contributed time and effort to the Twins Early Development Study (TEDS) for making this study possible. We are also grateful to KBiosciences, UK for their proficient individual genotyping efforts. This work was supported in part by UK Medical Research Council grant G9424799.


