Original Article

Molecular Psychiatry (2008) 13, 729–740; doi:10.1038/sj.mp.4002063; published online 7 August 2007

Quantitative trait locus association scan of early reading disability and ability using pooled DNA and 100K SNP microarrays in a sample of 5760 children

E L Meaburn1, N Harlaar1, I W Craig1, L C Schalkwyk1 and R Plomin1

1Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, King's College, London, UK

Correspondence: Dr EL Meaburn, Social Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, King's College London, Box Number P082, De Crespigny Park, London SE5 8AF, UK. E-mail: e.meaburn@iop.kcl.ac.uk

Received 30 March 2007; Revised 14 June 2007; Accepted 27 June 2007; Published online 7 August 2007.

Abstract

Quantitative genetic research suggests that reading disability is the quantitative extreme of the same genetic and environmental factors responsible for normal variation in reading ability. This finding warrants a quantitative trait locus (QTL) strategy that compares low versus high extremes of the normal distribution of reading in the search for QTLs associated with variation throughout the distribution. A low reading ability group (N=755) and a high reading group (N=747) were selected from a representative UK sample of 7-year-olds assessed on two measures of reading that we have shown to be highly heritable and highly genetically correlated. The low and high reading ability groups were each divided into 10 independent DNA pools and the 20 pools were assayed on 100K single nucleotide polymorphism (SNP) microarrays to screen for the largest allele frequency differences between the low and high reading ability groups. Seventy-five of these nominated SNPs were individually genotyped in an independent sample of low (N=425) and high (N=425) reading ability children selected from a second sample of 4258 7-year-olds. Nine of the seventy-five SNPs were nominally significant (P<0.05) in the predicted direction. These 9 SNPs and 14 other SNPs showing low versus high allele frequency differences in the predicted direction were genotyped in the rest of the second sample to test the QTL hypothesis. Ten SNPs yielded nominally significant linear associations in the expected direction across the distribution of reading ability. However, none of these SNP associations accounted for more than 0.5% of the variance of reading ability, despite 99% power to detect them. We conclude that QTL effect sizes, even for highly heritable common disorders and quantitative traits such as early reading disability and ability, might be much smaller than previously considered.

Keywords:

dyslexia, learning disability, DNA pooling, quantitative trait, allelic association, twins

Introduction

Early reading disability, or developmental dyslexia, is by far the most frequently diagnosed form of childhood learning disability1, 2 and shows strong stability into adolescence.3, 4 A twin study of US twins between ages 7 and 18 has shown that reading disabilities are partly due to genetic factors.5 Reading disability and ability in the early school years are also highly heritable, as shown in a study of 3000 pairs of UK 7-year-old twins (the sample used in the present study), with heritabilities of about 60% both for poor readers and for individual differences in reading ability for the entire sample.6 These quantitative genetic analyses indicated that reading disability represents the low extreme of a quantitative trait of reading ability, which supports the hypothesis of multiple susceptibility loci of small effect size, known as quantitative trait loci (QTLs).7 The QTL hypothesis predicts that genes associated with reading disability are also associated with individual differences in reading ability throughout the normal distribution, including the high end of the distribution.8

QTL linkage research on reading disability indicating consistent linkage at 6p has led to a focus on two candidate genes at 6p22, although other less consistent QTL linkages are also being explored.9, 10, 11, 12 Association analyses indicate that the effect size of the two candidate genes at 6p22 is small, yielding significant results only when severely affected individuals are compared to controls.13, 14 Because QTL linkage is likely only to detect QTLs of relatively large effect size, one interpretation of these results is that the high heritability of reading disability and ability is due to QTLs of small effect size that slip under the radar of QTL linkage studies.

Allelic association is more powerful than QTL linkage for detecting QTLs of small effect size,15, 16 although very large samples are needed to detect such QTLs.17, 18, 19 Genome-wide association scans are now possible using single nucleotide polymorphism (SNP) microarrays,20, 21 although many issues remain to be resolved, such as gene- versus genome-centred approaches, common versus rare variants, sample size and design.22, 23, 24, 25 However, microarrays are expensive, which makes them impractical for genotyping the very large samples needed to detect QTLs of small effect size. One economical strategy for screening large samples is to pool DNA for groups such as cases and controls for a disorder or low and high groups for a quantitative trait.26, 27, 28 We have combined the strengths of microarrays and DNA pooling in a method we call SNP microarrays and pools (SNP-MaP). Pooled DNA can be allelotyped reliably on microarrays.29, 30, 31, 32, 33 We have used the SNP-MaP method with a 10K microarray to identify QTLs associated with mild mental impairment in a multistage design that includes confirmation by individual genotyping of SNPs nominated in the SNP-MaP scan.34

In the present study, we apply the SNP-MaP method using 100K SNP microarrays in a three-stage QTL association scan of early reading disability and ability in a representative UK sample of 5760 7-year-old children assessed on two measures of reading that we have shown to be highly heritable and highly genetically correlated.6, 7 In the first stage we used pooled DNA to screen for the largest SNP allele frequency differences for more than 100000 SNPs comparing low (N=755) and high (N=747) reading groups. Although many more SNPs are needed for a complete genome-wide association scan,35, 36 we used the Affymetrix GeneChip Human Mapping 100K Array Set because the first stage of this study was conducted before the Human Mapping 500K Array Set was available. In Stage 2, we individually genotyped nominated SNPs in an independent sample of low and high reading ability children to confirm and replicate the allele frequency differences observed in the pooling screen (Stage 1). In Stage 3, to test the QTL hypothesis directly, SNPs that yielded reliable allele frequency differences in the predicted direction in Stage 2 were tested for association across the entire distribution of reading abilities in an unselected sample of 4258 children. The goal of this three-stage design was to balance false-positive and false-negative results in the search for QTL associations of small effect size.

Materials and methods

Sample

The sampling frame for this study was the Twins Early Development Study (TEDS), a large-scale longitudinal study of behavioural development in a representative sample of twins born in 1994, 1995 and 1996, who have been followed from infancy through adolescence.37

Stage 1: SNP microarrays and pooling (SNP-MaP) screen of low versus high groups. Stage 1 of the study was based on a subset of 3043 children (one member of each twin pair) from the 1994 and 1995 TEDS cohort for whom DNA and reading data at 7 years of age were available. This sample represented the entire distribution of reading scores.

High and low reading ability individuals were selected using a 25th percentile cutoff of the distribution of a composite reading measure described below, yielding 755 children low in reading ability and 747 children high in reading ability. DNA pools were constructed from the low and high groups (see Laboratory methods section). The choice of percentage cutoffs for selection is guided by quantitative genetic research in TEDS showing that heritability is high for early reading regardless of severity of cutoffs6 and by statistical genetic simulations that show that quartile cutoffs balance the power obtained in DNA pooling studies from using extreme cutoffs and from using large samples.38

Stage 2: Testing SNPs nominated in SNP-MaP by individual genotyping low and high groups. Stage 2 of the study was based on a larger subset of 4258 children from the 1994, 1995 and 1996 TEDS cohort for whom DNA and reading data at 7 years of age were available (reading data became available for the 1996 cohort subsequent to the selection of individuals for Stage 1). As in Stage 1, this sample represents the entire distribution of reading scores. Low and high reading ability children were selected using a 10th percentile cutoff of the composite reading measure, with 425 individuals in each group. The low and high groups in Stage 2 are independent of the low and high groups selected in Stage 1 (that is, individuals are not included in both Stages 1 and 2, although it should be noted that an individual's co-twin may be present in the other stage). See Figure 1 for an overview of the samples.

Figure 1.

Illustration of the samples used in the study and their selection. Sample 1 consisted of low (N=755) and high (N=747) reading ability individuals selected from a foundation sample of 3043 individuals that represent the entire distribution of reading scores. Sample 2 is independent of sample 1 and consisted of low (N=425) and high (N=425) reading ability individuals selected from a second foundation sample of 4258 individuals that represent the entire distribution of reading scores. Sample 3 consisted of 3408 individuals that remained following selection of sample 2 individuals from the second foundation sample of 4258.

Stage 3: Testing SNPs nominated in SNP-MaP by individual genotyping an unselected sample. Stage 3 of the study individually genotyped the rest of the sample from Stage 2 to represent the entire distribution of reading abilities (N=3408, see Figure 1).

To summarize, a total of 4258 children were included in Stages 2 and 3 and represent the entire distribution of reading abilities (3408 plus 425 ‘lows’ and 425 ‘highs’). In conjunction with the independent sample of 1502 low and high reading ability children in Stage 1, a total of 5760 individuals were used in this study.

Measures

At 7 years of age, reading ability was assessed using the Test of Word Reading Efficiency (TOWRE)39 and a teacher assessment of reading based on UK National Curriculum criteria.40 These measures have been shown to be highly reliable, valid and correlated (r=0.69) across the distribution and for reading disabled children at 7 years.41 Heritabilities are substantial for both measures (0.63 and 0.74, respectively) and the genetic correlation between them is 0.79, indicating that the same genes largely affect both measures.42 Together, these results suggest that a composite based on these two measures combines the precision of a test of the key reading processes of fluency and accuracy (TOWRE) and the breadth of reading ability as assessed by year-long teacher assessments using UK National Curriculum criteria.

The TOWRE is a standardized measure of fluency and accuracy in word reading skills, and was administered by telephone.41 The TOWRE includes two timed subtests: sight word efficiency (SWE), which assesses the ability to read aloud real words; and phonemic decoding efficiency (PDE), which assesses the ability to read aloud pronounceable printed non-words. Each subtest consists of a list of words, carefully graded for difficulty, printed on a single page. The child is given 45s to read as many words as possible. The raw score, the total number of words read correctly on each subtest, is thus a measure of both accuracy and fluency. Because the SWE and PDE raw scores are highly correlated (r=0.81), a total TOWRE score, calculated by standardizing and summing the subtest scores, was used.

In the last half of the first year of primary school when most children are 7 years old in the United Kingdom, each child's teacher assessed the child's year-long reading performance using the UK National Curriculum criteria for Key Stage 1, which is designed for children age 5 through their first year of primary school at age 7.40 Teacher assessments were obtained by postal questionnaires.

A composite of the TOWRE and NC teacher assessments was used for our QTL analyses for reasons discussed above. The TOWRE and NC teacher assessment scores were standardized and summed so that they were weighted equally in the composite, and were age and sex regressed. In line with current theory about reading disability,43 we did not attempt to adjust reading scores for intelligence. We used standard exclusions in TEDS including major medical or perinatal problems, hearing difficulties, autism spectrum disorder and English not being the first spoken language.
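The construction of the composite can be sketched as follows. This is an illustration only, not the authors' code: the function names and the toy data are hypothetical, and the order of operations (standardize each measure, sum, then take residuals from a least-squares regression on age and sex) is an assumption, because the text does not state it.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize values to mean 0 and s.d. 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def residualize(y, x1, x2):
    """Residuals of y after least-squares regression on x1 and x2 (with intercept)."""
    my, m1, m2 = mean(y), mean(x1), mean(x2)
    yc = [v - my for v in y]
    a = [v - m1 for v in x1]
    b = [v - m2 for v in x2]
    saa = sum(u * u for u in a)
    sbb = sum(u * u for u in b)
    sab = sum(u * v for u, v in zip(a, b))
    say = sum(u * v for u, v in zip(a, yc))
    sby = sum(u * v for u, v in zip(b, yc))
    det = saa * sbb - sab ** 2  # zero only if the two covariates are collinear
    b1 = (sbb * say - sab * sby) / det
    b2 = (saa * sby - sab * say) / det
    return [yi - b1 * ai - b2 * bi for yi, ai, bi in zip(yc, a, b)]


def composite(towre, teacher, age, sex):
    """Equal-weight composite of two measures, adjusted for age and sex."""
    summed = [t + n for t, n in zip(zscores(towre), zscores(teacher))]
    return zscores(residualize(summed, age, sex))


# Hypothetical toy data for six children (not values from the study).
towre = [52, 61, 40, 75, 58, 66]
teacher = [2, 3, 1, 4, 2, 3]
age = [7.0, 7.2, 6.9, 7.4, 7.1, 7.3]
sex = [0, 1, 0, 1, 1, 0]
scores = composite(towre, teacher, age, sex)
print([round(s, 2) for s in scores])
```

Standardizing before summing is what gives the two measures equal weight; without it, the measure with the larger raw variance would dominate the composite.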

Laboratory methods

Stage 1: SNP-MaP screen of low versus high groups. DNA was extracted from buccal swabs44 and DNA pools constructed as described in our previous publications.29, 30, 31, 34 Briefly, each sample was quantified using a spectrophotometer (260nm) and diluted to a target concentration of 50ng/μl. Each sample was then quantified once by fluorimetry (employing PicoGreen dsDNA quantitation reagent; Invitrogen, Molecular Probes, CA, USA) before being diluted further to 25ng/μl and quantified in triplicate. Samples that were accurately quantified (±5%) were accepted for pooling. For both the low (N=755) and high (N=747) reading ability groups, 80ng of each individual's DNA was randomly assigned to 1 of 10 DNA pools, yielding a total of 20 independent pools with 75 individuals on average in each pool. The reason for including multiple independent pools is to allow the use of parametric statistics that includes variance between pools when comparing low and high groups, which also includes technical variation due to pooling and to microarrays. We have previously shown that DNA pools constructed in this way and allelotyped on 100K microarrays yielded allele frequency estimates that correlated 0.97 on average with each other (reliability) and the average allele frequency estimates from the DNA pools correlated 0.97 for 26 SNPs that were genotyped individually on the samples used to construct the pools (validity).31

Each of the 20 DNA pools was allelotyped using the GeneChip Mapping 100K microarray set in accordance with the standard protocol for individual DNA samples.45 The Affymetrix 100K microarray set includes 58960 SNPs using XbaI restriction endonuclease and 57244 SNPs using HindIII restriction endonuclease. Each microarray was scanned using the GeneChip scanner 3000 and GeneChip operating software (GCOS) version 1.1.1 with patch 5. CEL files were generated and subsequently saved and transported as Cabinet (CAB) files using Data Transfer Tool version 1.1 to a workstation that contained GCOS software version 1.2. Using GCOS 1.2, new .cel files were generated and analysed using GeneChip DNA Analysis Software (GDAS) version 3.0.

The raw probe intensity data were exported and DSSNP and relative allele scores (RAS) were generated using the RAS score algorithm as implemented in a script in R that is freely available to download (http://sgdp.iop.kcl.ac.uk/oleo/affy).31 An average of RAS 1 (sense strand) and RAS 2 (antisense strand) values (RASav) was used as a quantitative index of allele frequency. RASav values with a DSSNP value <0.04 were excluded from analysis.

Stages 2 and 3: Testing SNPs nominated in SNP-MaP by individual genotyping

SNPs that were nominated for Stages 2 and 3 were individually genotyped at two facilities: Kbiosciences, UK, which used both competitive allele-specific PCR (KASPar) and TaqMan genotyping assays, and our laboratory at the Institute of Psychiatry, which used the SNPlex Genotyping System 48-Plex.46

In the representative sample of 4258 individuals (one member of a twin pair) in Stage 3, power to detect additive genetic effects explaining 1.0, 0.5 and 0.1% of the total variance of reading scores is 100, 99 and 66%, respectively, uncorrected for multiple testing (P=0.05, one-tailed).
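These power figures can be approximated with a short calculation. The sketch below assumes a Fisher z approximation for the correlation between an additive genotype score and the reading score; the text does not say which method was used, so this is one plausible reconstruction and the function name is hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def power_for_variance_explained(r2, n, alpha=0.05):
    """Approximate one-tailed power to detect a correlation whose square is r2.

    Uses the Fisher z transformation (an assumption, not the authors' stated method).
    """
    z = NormalDist()
    noncentrality = atanh(sqrt(r2)) * sqrt(n - 3)
    return z.cdf(noncentrality - z.inv_cdf(1 - alpha))


for percent in (1.0, 0.5, 0.1):
    power = power_for_variance_explained(percent / 100, n=4258)
    print(f"{percent}% of variance: power = {power:.3f}")
```

Under this approximation the power is close to 66% for an effect of 0.1% of the variance and above 99% for the two larger effects, in line with the values quoted above.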

Results

Stage 1: SNP-MaP screen of low versus high groups

SNP-MaP allele frequencies for the 20 DNA pools were calculated. To increase the reliability of SNP-MaP allele frequency estimates, we required a minimum of eight (of 10) SNP-MaP allele frequency estimates for both high and low groups, a criterion met by 107116 SNPs on the 100K microarray set.

The average SNP-MaP allele frequency estimates for the low and high groups were calculated for each SNP and these allele frequency estimates were correlated between the low and high groups. As shown in Figure 2, allele frequency estimates were highly correlated between the low and high groups (Pearson correlation coefficient, r=0.994), which can be viewed as a genome-wide genomic control. Figure 3 indicates that the differences between the low and high groups were correspondingly small, with a mean absolute difference of 0.025 ranging from 0.000 to 0.308.

Figure 2.

Scatterplot of SNP-MaP allele frequency estimates (RASav scores) averaged across DNA pools for low versus high reading groups for 107116 SNPs. A large correlation exists despite some scatter due to the huge number of entries on the diagonal.

Figure 3.

Distribution of absolute differences between low and high SNP-MaP allele frequencies averaged across DNA pools within each group for 107116 SNPs. The inset magnifies differences greater than or equal to 0.10.

A Student's t-test, which tests for mean allele frequency differences between groups as a function of the variance of the 10 pools within each group, was performed for each of 107116 SNPs. Although the goal of Stage 1 (SNP-MaP) was to screen for SNPs that would then be tested for association in Stage 2 rather than to test for statistical significance, 1083 SNPs reached an α level of 0.01 when 1071 would be expected on the basis of chance alone.
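The per-SNP test can be illustrated with a pooled-variance Student's t statistic computed from the pool-level allele frequency estimates. This is a sketch under stated assumptions: the pool values are invented for illustration, the function name is hypothetical, and the critical value applies only when all 10 pools are present in each group (18 degrees of freedom).

```python
from math import sqrt
from statistics import mean, variance

# Two-tailed critical value of Student's t for 18 d.f. at alpha = 0.01.
T_CRIT_DF18_P01 = 2.878


def pooled_t(low, high):
    """Student's t for the difference in mean allele frequency between two groups of pools."""
    n1, n2 = len(low), len(high)
    pooled_var = ((n1 - 1) * variance(low) + (n2 - 1) * variance(high)) / (n1 + n2 - 2)
    return (mean(low) - mean(high)) / sqrt(pooled_var * (1 / n1 + 1 / n2))


# Hypothetical allele frequency estimates for one SNP in 10 low and 10 high pools.
low_pools = [0.41, 0.38, 0.43, 0.40, 0.39, 0.42, 0.37, 0.44, 0.40, 0.41]
high_pools = [0.50, 0.52, 0.47, 0.51, 0.49, 0.53, 0.48, 0.50, 0.52, 0.49]

t = pooled_t(low_pools, high_pools)
print(f"t = {t:.2f}, exceeds critical value: {abs(t) > T_CRIT_DF18_P01}")

# Number of SNPs expected to reach alpha = 0.01 by chance among 107116 tests.
expected_by_chance = 107116 * 0.01
print(f"expected by chance: {expected_by_chance:.0f}")
```

Because the error term is the variance between pools, the test absorbs both sampling variation and technical variation from pooling and hybridization.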

The mean absolute difference between the low and high SNP-MaP allele frequency estimates for these 1083 SNPs was 0.079 (ranging from 0.008 to 0.308). High t results can be found for very small mean absolute differences when variance of the 10 DNA pools within each group is even smaller. This occurs especially for low-frequency alleles. For this reason and because power is substantially reduced for low-frequency alleles, SNPs were removed from further analysis if the average SNP-MaP minor allele frequency estimate in one or both groups was less than 10%. This reduced the number of SNPs from 1083 to 780. For the 780 SNPs, the mean absolute difference between the high and low SNP-MaP allele frequency estimates was 0.089 (ranging from 0.028 to 0.308). Finally, the number of nominated SNPs was winnowed further on the basis of power to detect allele frequency differences. The average s.e.m. across the 10 independent DNA pools within each group can be used to estimate power. For the 107116 SNPs for which we had allele frequency estimates for a minimum of 8 of 10 DNA pools, the average s.e.m. was 0.024 for the low group and 0.025 for the high group. Using an s.e.m. of 0.025 to estimate power, our design with 10 DNA pools with 75 individuals in each group provides 80% power (P<0.01, two-tailed) to detect SNP-MaP allele frequency differences of 0.085 between the low and high groups, and 99% power to detect differences of 0.12.47 For this reason, we focus on SNPs whose low–high SNP-MaP allele frequency difference is greater than 0.09, which left 313 SNPs. The 313 SNPs were further reduced to 302 because 11 SNPs were located on the X chromosome and the mixed gender pools complicate the interpretation of these SNPs. The scatterplot of these 302 SNPs is shown in Figure 4. Details about the 302 SNPs can be found in Supplementary Table S1.
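The successive filters described above can be summarized as a single predicate applied to each SNP. The sketch below is illustrative only: the record layout, field names and example SNPs are hypothetical, and the thresholds are those given in the text (P of 0.01 or less, minor allele frequency of at least 10% in both groups, an allele frequency difference greater than 0.09, and autosomal location).

```python
from dataclasses import dataclass


@dataclass
class SnpResult:
    snp_id: str        # hypothetical identifier
    chromosome: str
    freq_low: float    # mean pooled allele frequency estimate, low group
    freq_high: float   # mean pooled allele frequency estimate, high group
    p_value: float     # from the t-test across pools


def minor_allele_freq(freq):
    """Frequency of the rarer allele given the frequency of one allele."""
    return min(freq, 1.0 - freq)


def passes_stage1(snp, alpha=0.01, min_maf=0.10, min_diff=0.09):
    """Apply the Stage 1 nomination criteria in the order given in the text."""
    return (
        snp.p_value <= alpha
        and minor_allele_freq(snp.freq_low) >= min_maf
        and minor_allele_freq(snp.freq_high) >= min_maf
        and abs(snp.freq_low - snp.freq_high) > min_diff
        and snp.chromosome != "X"
    )


# Invented examples, one per outcome.
candidates = [
    SnpResult("snp_a", "5", 0.40, 0.52, 0.004),   # passes all criteria
    SnpResult("snp_b", "5", 0.05, 0.16, 0.002),   # minor allele too rare in the low group
    SnpResult("snp_c", "X", 0.30, 0.45, 0.001),   # X-linked
    SnpResult("snp_d", "12", 0.30, 0.35, 0.008),  # difference too small
    SnpResult("snp_e", "2", 0.60, 0.75, 0.030),   # not significant at 0.01
]
nominated = [s.snp_id for s in candidates if passes_stage1(s)]
print(nominated)
```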

Figure 4.

Scatterplot for low versus high groups for 302 autosomal SNPs with t-test P≤0.01, minor allele frequency ≥0.10 and an allele frequency difference >0.09 (for which power exceeds 80%).

Stage 2: Testing SNPs nominated in SNP-MaP by individual genotyping low and high groups

It should be noted that Stage 2 requires replication of low–high differences seen in Stage 1 in an independent sample, and that these differences are in the same direction as Stage 1. Although all 302 candidate SNPs should be tested in Stage 2 in our low and high reading ability sample of 850 children (425 ‘lows’ and 425 ‘highs’), funds were only available to test 75 SNPs. A variety of criteria was used to select a subset of 75 SNPs during the course of the project, but they were in essence arbitrarily chosen and cover the range of SNP-MaP low–high allele frequency differences. Because of this, it is reasonable to assume that these 75 SNPs indicate how well the 302 SNPs as a group would fare had they all been genotyped. Individual genotyping results for the 75 SNPs can be found in Supplementary Table S2.

Of the 75 SNPs, nine (12%) showed significant differences between the low and high groups using a nominal one-tailed α level of 0.05. These SNPs were included in Stage 3. An additional 14 SNPs, although not reaching nominal significance, showed a moderate allele frequency difference (≥0.030) between the low and high groups in the expected direction. To decrease the likelihood of false-negative findings, we also included these SNPs in Stage 3.

Stage 3: Testing SNPs nominated in SNP-MaP by individual genotyping an unselected sample

The 23 SNPs nominated in Stage 2 were individually genotyped across the remaining unselected sample of 3408 children to test the QTL hypothesis directly by assessing the extent to which the SNPs are associated with reading ability throughout the distribution.

Genotypic scores were created for each of the 23 SNPs using an additive genotypic model: AA=0, AB=1, BB=2, where A is the allele whose frequency was reported in the SNP-MaP screening stage (that is, the RASav value for the SNP). These additive genotypic scores were then correlated with individual reading scores for the entire sample of 4258 individuals. This correlation provides a simple and powerful test of effect size and the direction of the effect for an additive genetic model. Because only those SNPs that replicate the previous results in the same direction are considered as significant, one-tailed tests of significance were used.
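The additive coding and the correlation test can be sketched as follows. The genotypes and reading scores are invented for illustration and the function names are hypothetical; only the coding scheme (AA=0, AB=1, BB=2) comes from the text.

```python
from math import sqrt
from statistics import mean

ADDITIVE_CODES = {"AA": 0, "AB": 1, "BB": 2}


def additive_score(genotype):
    """Additive genotypic score: AA=0, AB=1, BB=2."""
    return ADDITIVE_CODES[genotype]


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical genotypes and standardized reading scores for eight children.
genotypes = ["AA", "AB", "BB", "AB", "AA", "BB", "AB", "AB"]
reading = [-0.6, 0.1, 0.7, -0.1, -0.3, 0.4, 0.2, 0.0]

scores = [additive_score(g) for g in genotypes]
r = pearson_r(scores, reading)
print(f"r = {r:.3f}, variance explained = {100 * r ** 2:.1f}%")
```

The sign of r gives the direction of the effect, which is what makes a one-tailed test appropriate when the direction is fixed in advance by the pooling screen, and the square of r gives the proportion of variance explained.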

With 23 tests and an α of 0.05, only one significant result would be expected on the basis of chance alone. As shown in Table 1, 10 of the 23 SNPs were significantly associated with individual differences in reading ability throughout the distribution in the same direction as indicated by the SNP-MaP analyses based on pooled DNA from low versus high groups.


The significant correlations are small with an average correlation across the 10 significant SNPs of 0.038; the largest correlation is 0.049. Squaring these correlations (r2) to estimate effect size indicates that these associations account for only 0.078–0.240% of the variance of reading scores. The average effect size is 0.15% and the sum of the effect sizes of the 10 SNPs is 1.5%.

Figure 5 illustrates the results from Table 1 for the 10 significant SNPs in terms of standardized mean quantitative trait reading scores with ±1 s.e.m. for the three SNP genotypes. For all 10 SNPs, the non-overlapping s.e.m.s indicate that the mean reading scores of the two homozygous genotypes differ significantly. The mean reading scores of the heterozygous genotypes are in between the mean reading scores of the homozygous genotypes, indicating intralocus additivity. However, the heterozygous genotype is significantly different from the homozygous genotypes only for rs2192595, which shows the largest correlation (Table 1). Because the reading score is standardized, it provides another way of examining the effect size. For example, the average Z-score difference between the homozygotes for the 10 SNPs is 0.139 and ranges from 0.078 to 0.228, indicating that the homozygotes differ by up to 0.228 of a s.d. in their reading scores. This finding indicates that even with very modest correlations such as these, genotypic selection for homozygotes could yield groups that differ importantly in reading. This point takes on greater significance when the SNPs are combined in a ‘SNP set’.

Figure 5.

Standardized composite reading scores for the three genotypes for the 10 significant SNPs. N of each genotypic group is indicated at the top. Error bars show mean±1 s.e.m.

SNP set

The additive genotypic values for the 10 SNPs are uncorrelated because the SNPs are not in linkage disequilibrium with each other. This permits the creation of a composite ‘SNP set’ that aggregates the small effects of each SNP and can be useful in studies that are not sufficiently large to provide the power needed to analyse each SNP separately. As described above, additive genotypic values were coded 0, 1, or 2 for each SNP, with 0 conferring lowest reading ability and 2 conferring highest reading ability. SNP genotypes for the 10 significant associations were summed to produce SNP-set scores from 0 through 20. Only individuals with complete data for all 10 SNPs were included (N=1759), although analyses were also conducted using a missing data option that substituted the population mean for missing SNPs (N=4258). The SNP-set scores were normally distributed (see Figure 6).
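The two ways of scoring the SNP set described above, complete data only and mean substitution for missing genotypes, can be sketched as follows. The genotype vectors and population means are invented for illustration, and the function names are hypothetical.

```python
from math import isclose


def snp_set_score(genotypes):
    """Sum of additive scores (0, 1 or 2 per SNP); None if any SNP is missing."""
    if any(g is None for g in genotypes):
        return None
    return sum(genotypes)


def snp_set_score_mean_substituted(genotypes, population_means):
    """Sum of additive scores, replacing each missing SNP by its population mean."""
    return sum(m if g is None else g for g, m in zip(genotypes, population_means))


# Hypothetical additive scores for 10 SNPs in two children (None = missing genotype).
complete = [1, 2, 0, 1, 1, 2, 1, 0, 1, 1]
partial = [1, 2, None, 1, 1, 2, 1, 0, 1, 1]
# Hypothetical per-SNP mean additive scores in the population.
population_means = [1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 1.1, 0.9, 1.0, 1.0]

print(snp_set_score(complete))
print(snp_set_score(partial))
print(snp_set_score_mean_substituted(partial, population_means))
```

Mean substitution retains every individual at the cost of pulling scores towards the centre of the distribution, which is one reason a correlation based on substituted scores may be attenuated relative to the complete-data analysis.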

Figure 6.

The frequency distribution of ‘SNP-set’ scores for the 10 SNPs significantly associated with reading scores. Only individuals with complete data for all 10 SNPs were included.

The correlation between the SNP-set scores and reading scores is 0.105 (P=0.000005). Using the missing data option, the correlation was lower (r=0.087, P=0.000000002), presumably because of the loss of information, although the significance is greater because of the much larger sample size. Figure 7 plots the standardized reading score against the SNP-set scores for the sample with complete data. It can be seen that the association is linear, which indicates additivity; that is, the SNPs do not appear to interact epistatically in their effect on reading. The standardized reading score difference between the SNP-set scores of 6 and 15 is 0.54 s.d. This difference suggests that genotypic selection based on SNP-set scores could be effective for selecting groups with a genetic liability for reading disability or for reading ability in early childhood, even when the associations of the individual SNPs show weak effect sizes.

Figure 7.

Standardized composite reading scores for SNP-set genotypic scores. N of each genotypic group is indicated. Samples with a SNP-set score of 3, 4 and 18 were not included due to their low N. Error bars show mean±1 s.e.m.

10 SNPs associated with early reading

We have shown 10 SNPs to be reproducibly associated with reading ability in early childhood. None of the 10 SNPs are located in any of the previously identified reading disability regions. It is noteworthy that 7 of the 10 SNPs reside in genes (intronic or 5′-UTR), with the remaining three SNPs located in intergenic regions. None of the SNPs has obvious functional effects. Interestingly, one of the seven genes, TIAM1, the T-cell lymphoma invasion and metastasis 1 gene on chromosome 21q22.11, also contains one of the 302 SNPs (rs723469), which was selected at Stage 1 of the study but which was not one of the 75 SNPs selected for individual genotyping at Stage 2.

To assess whether SNPs that show a significant association in Stage 3 could be predicted by assessing P-values of nearby SNPs in Stage 1, we retrospectively examined the P-values of SNPs located within 1 LDU in either direction of the 10 significant SNPs based upon a recent LD map of the genome.48 Although the coverage of the 100K microarray limits the usefulness of this approach for some SNPs, in the six instances where coverage is good (>5 SNPs in the 2-LDU region), there were other nominally significant SNPs (P<0.05) within the 2-LDU region for four SNPs (rs10485609, rs1842129, rs1323381, rs4754752).

Discussion

Using pooled DNA for low and high reading groups to screen 107116 SNPs for associations with reading, 302 SNPs were nominated for the second stage of QTL testing, which involved individually genotyping an independent sample of low and high reading ability individuals. For financial reasons, 75 of these 302 SNPs were individually genotyped. Of these 75 SNPs, 23 were also genotyped across the entire unselected sample of 4258 individuals. Of these 23 SNPs, 10 were associated with individual differences in reading ability at the nominal α level of 0.05 in the direction expected from the screening stage. Until these SNPs are replicated, caution is in order because the 10 SNPs were only nominally significant; that is, they would not survive correction for multiple testing such as false discovery rates. In addition, although Stage 1 was designed for screening rather than significance testing, only a few more SNPs were detected at Stage 1 than expected by chance.

Replication of these 10 SNPs will be difficult because the effect sizes of the SNPs are so small: the average effect size is 0.15% of the total variance of the reading scores and the largest effect size is only 0.24%. A sample of more than 4000 individuals is needed to reach 80% power (P<0.05, one-tailed) to detect an effect size of 0.15%. When effect sizes are so small, multiple SNPs can be aggregated in a composite ‘SNP set’, which could be replicated and used in studies with much smaller samples. For example, the ‘SNP set’ of 10 SNPs has an effect size of 1% of the total variance in early reading, or about 2% of the genetic variance if the heritability of early reading is about 50%. An effect size of 1% would require a sample of about 600 to reach 80% power to detect its association with reading (P<0.05, one-tailed). It should be noted that if the 75 SNPs we selected for individual genotyping were a random selection of the 302 SNPs nominated from Stage 1, we might expect to find a SNP set of 40 rather than 10 SNPs, accounting for 4% rather than 1% of the variance.
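The sample sizes quoted in this paragraph can be approximated with the same kind of calculation used for power. The sketch below assumes a Fisher z approximation for a one-tailed test of a correlation; this is an assumption about the method, not a statement of how the figures were derived, and the function name is hypothetical.

```python
from math import atanh, ceil, sqrt
from statistics import NormalDist


def n_for_power(r2, power=0.80, alpha=0.05):
    """Approximate N needed to detect a correlation whose square is r2 (one-tailed).

    Uses the Fisher z transformation (an assumption, not the authors' stated method).
    """
    z = NormalDist()
    z_total = z.inv_cdf(1 - alpha) + z.inv_cdf(power)
    return ceil((z_total / atanh(sqrt(r2))) ** 2 + 3)


print(n_for_power(0.0015))  # single SNP explaining 0.15% of the variance
print(n_for_power(0.01))    # SNP set explaining 1% of the variance

# Extrapolation from the 75 genotyped SNPs to all 302 nominated SNPs,
# assuming the 75 were effectively a random subset.
expected_set_size = 10 * 302 / 75
print(round(expected_set_size))
```

Under this approximation, an effect of 0.15% requires a little over 4000 individuals and an effect of 1% requires roughly 600, consistent with the figures in the text.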

The finding of more general significance is that no associations greater than 0.5% were detected even though the sample of 4000 provided 99% power to detect them. Genome-wide association scans will identify ‘low-hanging fruit’—QTLs with large effect sizes—as demonstrated in research on macular degeneration49 and inflammatory bowel disease.50 The possibility of harvesting such low-hanging fruit warrants the use of genome-wide association scans. However, we predict that for common disorders and quantitative traits, such scans will largely exclude QTLs of large effect size. If the largest QTL effects are as small as 0.2% of the variance, winnowing the wheat from the chaff will be difficult, requiring extremely large samples, multiple-stage designs and replication in independent samples. Nonetheless, the substantial heritability of most common disorders and quantitative traits means we must do what it takes to find the genetic variation responsible.
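The power claim can be checked in the same approximate way. The sketch below again assumes the Fisher z approximation for a one-tailed test of a correlation, which may differ from the method the authors used, and the function name is hypothetical. The final call applies a Bonferroni correction across all 107116 SNPs purely as an assumed illustration of why effects near 0.2% become hard to detect once multiple testing is taken into account.

```python
import math
from statistics import NormalDist


def power_one_tailed(variance_explained: float, n: int,
                     alpha: float = 0.05) -> float:
    """Approximate power to detect a correlation with r squared = variance_explained."""
    r = math.sqrt(variance_explained)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    # Expected test statistic under the alternative (Fisher z scale)
    expected_z = math.atanh(r) * math.sqrt(n - 3)
    return NormalDist().cdf(expected_z - z_crit)


n = 4000
print("Power for 0.5% at nominal alpha:", power_one_tailed(0.005, n))
print("Power for 0.2% at nominal alpha:", power_one_tailed(0.002, n))
# Assumed illustration: Bonferroni correction across the whole 100K array
print("Power for 0.2% at alpha/107116:",
      power_one_tailed(0.002, n, alpha=0.05 / 107116))
```

With these assumptions, power for a 0.5% effect exceeds 0.99 at the nominal threshold, whereas power for a 0.2% effect drops sharply under the array-wide correction.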

An important limitation of this study is that only the 100K SNP microarray set was available commercially when we began the screening stage of the project, whereas many more SNPs are needed to provide a genome-wide association scan. Although 1590 of the 107116 SNPs on the 100K array are located on the 6p21–p22 and 15q21 regions that have shown replicated QTL linkage for reading disability, only seven of the SNPs in these regions were among the 302 SNPs nominated by the SNP-MaP screening (Stage 1), none of which passed the selection criteria for individual genotyping in Stage 2. This lack of replication may be a consequence of the SNPs assayed in this study; as mentioned, 100K SNPs cannot be considered truly genome-wide and we may have missed the signal from these regions. In addition, associations to 6p21–p22 are strongest for severely affected individuals with reading disability and our sample is community based, and so the lack of replication may also be confounded by this.

It is noteworthy that the three SNPs with the lowest minor allele frequencies (rs1323381, rs2192595, rs1320490) showed the largest effect for the genotype with the lowest frequency. This could signal false-positive results for SNPs with lower minor allele frequencies. Because power to detect SNP associations is reduced for SNPs with low minor allele frequencies, we excluded SNPs with minor allele frequencies of less than 10%. The minor allele frequencies for these three SNPs indicate that they are not rare alleles: 15.6, 16.9 and 18.2%, respectively. Nonetheless, the sample sizes for the homozygous genotypes with the minor allele are relatively small (for example, 73 for the TT genotype of rs1323381, as indicated in Figure 5). However, because the least common genotype has the largest standard error, its effect on our test for an additive effect across the three genotypes is weighted accordingly. It can be seen in Figure 5 that the plots of genotypic values for the 10 significant SNPs are reasonably linear, especially when the standard error bars are taken into account.
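One common way to express an additive test is to regress the trait score on the number of minor alleles (0, 1 or 2) carried by each individual; this is offered as a sketch under that assumption and is not necessarily the exact test used in the study. Because every individual contributes one data point, a rare genotype group with few individuals has correspondingly less influence on the fitted slope. The function name and the toy data are hypothetical.

```python
from statistics import fmean


def additive_fit(allele_counts, scores):
    """Least-squares fit of trait score on minor-allele count.

    Returns (slope, intercept, r_squared); r_squared is the proportion
    of trait variance accounted for by the additive genotype code.
    """
    mx, my = fmean(allele_counts), fmean(scores)
    sxx = sum((x - mx) ** 2 for x in allele_counts)
    syy = sum((y - my) ** 2 for y in scores)
    sxy = sum((x - mx) * (y - my) for x, y in zip(allele_counts, scores))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept, (sxy * sxy) / (sxx * syy)


# Hypothetical toy data: standardized reading scores for eight individuals
counts = [0, 0, 0, 0, 1, 1, 1, 2]
scores = [0.3, -0.1, 0.2, 0.0, -0.2, 0.1, -0.3, -0.6]

slope, intercept, r_squared = additive_fit(counts, scores)
print("Slope per minor allele:", slope)
print("Variance explained:", r_squared)
```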

The study may be limited in eight other ways: (1) In Stage 3, SNPs were screened using an additive model because quantitative genetic research suggests that most genetic influence is additive.51 As shown in Figure 5, the 10 SNPs show a linear relationship between additive genotypic values and reading scores. (2) The SNP-MaP screen did not use K-corrected allele frequency estimates. K-correction improves the accuracy of absolute estimates of allele frequency for pooled DNA,52 but K values were not available when we conducted the SNP-MaP stage of our study. However, K-correction does not have much effect on relative estimates of allele frequency, which is the relevant issue when comparing pooled DNA for groups.31 In the present study, of the 302 SNPs we selected in the SNP-MaP stage, all but four would have been selected if we had used K-corrected estimates. (3) Only 75 of the 302 SNPs nominated by SNP-MaP were individually genotyped in the sample of 4258 children. The 75 SNPs essentially represented a random sample of the 302 SNPs, so that their results are likely to be representative of the results for all 302 SNPs. With additional funds it will be possible to investigate the remaining 227 SNPs, including the SNP rs1001646, which shows the largest allele frequency difference of 0.308 (see Supplementary Table S1) but was not selected in the sample of 75 SNPs. (4) SNPs with minor allele frequencies less than 10% were excluded in the SNP-MaP screen for reasons explained earlier. It seems reasonable to begin by testing the common variant/common disease QTL hypothesis;53 with additional funds it will be possible to investigate less common alleles. (5) SNPs nominated on the basis of low versus high allele frequency differences in Stage 2 were required in Stage 3 to show associations with individual differences throughout the normal distribution because of our interest in identifying QTLs.
The 13 SNPs that did not show QTL associations in Stage 3 might nonetheless be followed up in low–high comparisons. (6) Although no individuals from the DNA pools in Stage 1 were also included in Stage 2, genetically related individuals (co-twins) were included in Stages 2 and 3, which means that Stages 2 and 3 are not completely independent genetically of Stage 1. The reason for this design decision was to use the largest possible sample size in the search for QTLs of small effect size. (7) More sophisticated ways of analysing pooled DNA are available (for example, GenePool).54 However, these methods were not available when we completed our SNP-MaP screen. In addition, our approach, based on comparing group mean allele frequency differences as a function of variance across DNA pools, is central to these newer analytic strategies. (8) The SNP-MaP screening did not discriminate SNPs in known regions of copy number variants.55, 56 The 100K microarray does not detect copy number variants; the Affymetrix 5.0 microarray contains 420K non-polymorphic probes to detect copy number variants. However, if copy number variants were important, we suggest that they would add noise to the SNP-MaP screening, thus increasing false negatives rather than false positives. Nevertheless, the SNP associations identified could be due to copy number variants (or other functional DNA within LD of the SNP such as DNA for noncoding RNA). We examined the 10 SNPs in relation to known regions of copy number variants and found that three are in known CNV regions.56 This is more than would be expected by chance: 1250 of the SNPs on the 100K array are in known CNV regions (~1%).
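The statement that three of 10 SNPs in known CNV regions exceeds chance expectation can be illustrated with a binomial tail probability. This is a sketch under the assumption that the 10 SNPs behave like independent draws from the 107116 SNPs on the array, of which 1250 lie in known CNV regions; the binomial model and the function name are choices made for this illustration, not taken from the original analysis.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float) -> float:
    """Probability of observing k or more successes in n independent trials."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


p_cnv = 1250 / 107116  # proportion of array SNPs in known CNV regions (from the text)

print("Expected CNV-region SNPs among 10:", 10 * p_cnv)
print("P(3 or more of 10 in CNV regions):", binomial_tail(10, 3, p_cnv))
```

Under this model the expected count among 10 SNPs is roughly 0.1, so observing three is unlikely by chance alone, although the small number of SNPs warrants caution.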

Our ongoing research differs in two ways that attenuate some of these limitations. First, we are now using the Affymetrix 5.0 500K SNP microarray, which moves closer to a genome-wide association scan and can detect copy number variants. The second difference addresses the frustration of being able to genotype only 75 of the 302 SNPs nominated by SNP-MaP because of financial constraints. Indeed, it would be useful to test hundreds of other SNPs suggested by the SNP-MaP screening using the 100K microarray and even more SNPs when the 500K microarray is used. We will address this issue by adding a second SNP-MaP stage to winnow the 500K SNPs to a manageable number of SNPs before confirmation with individual genotyping. This second SNP-MaP stage will include low–high comparisons for all 500K SNPs on a sample independent from the first SNP-MaP stage.


References

  1. Shaywitz SE. Dyslexia. New Engl J Med 1998; 338: 307–312.
  2. Snowling M. Dyslexia, 2nd edn. Blackwell: Oxford, 2000.
  3. Jacobson C. How persistent is reading disability? Individual growth curves in reading. Dyslexia 1999; 5: 78–93.
  4. Shaywitz SE, Fletcher JM, Holahan JM, Shneider AE, Marchione KE, Stuebing KK et al. Persistence of dyslexia: the Connecticut Longitudinal Study at adolescence. Pediatrics 1999; 104: 1351–1359.
  5. DeFries JC, Knopik VS, Wadsworth SJ. Colorado Twin Study of reading disability. In: Duane DD (ed). Reading and Attention Disorders: Neurobiological Correlates. York Press: Baltimore Maryland, 1999, pp 17–41.
  6. Harlaar N, Spinath FM, Dale PS, Plomin R. Genetic influences on early word recognition abilities and disabilities: a study of 7-year-old twins. J Child Psychol Psychiatry 2005; 46: 373–384.
  7. Plomin R, Kovas Y. Generalist genes and learning disabilities. Psychol Bull 2005; 131: 592–617.
  8. Plomin R, Owen MJ, McGuffin P. The genetic basis of complex human behaviors. Science 1994; 264: 1733–1739.
  9. Paracchini S, Scerri T, Monaco AP. The genetic lexicon of dyslexia. Annu Rev Genomics Hum Genet 2007 [E-pub ahead of print].
  10. Fisher SE, Francks C. Genes, cognition and dyslexia: learning to read the genome. Trends Cogn Sci 2006; 10: 250–257.
  11. Williams J, O'Donovan MC. The genetics of developmental dyslexia. Eur J Hum Genet 2006; 14: 681–689.
  12. McGrath LM, Smith SD, Pennington BF. Breakthroughs in the search for dyslexia candidate genes. Trends Mol Med 2006; 12: 333–341.
  13. Schumacher J, Hoffmann P, Schmael C, Schulte-Korne G, Nothen MM. Genetics of dyslexia: the evolving landscape. J Med Genet 2007; 44: 289–297.
  14. Deffenbacher KE, Kenyon JB, Hoover DM, Olson RK, Pennington BF, DeFries JC et al. Refinement of the 6p21.3 quantitative trait locus influencing dyslexia: linkage and association analyses. Hum Genet 2004; 115: 128–138.
  15. Risch NJ. Searching for genetic determinants in the new millennium. Nature 2000; 405: 847–856.
  16. Sham PC, Cherny SS, Purcell S, Hewitt JK. Power of linkage versus association analysis of quantitative traits, by use of variance-components models, for sibship data. Am J Hum Genet 2000; 66: 1616–1630.
  17. Zondervan KT, Cardon LR. The complex interplay among factors that influence allelic association. Nat Rev Genet 2004; 5: 89–100.
  18. Cardon LR, Bell JI. Association study designs for complex diseases. Nat Rev Genet 2001; 2: 91–99.
  19. Ioannidis JP, Trikalinos TA, Khoury MJ. Implications of small effect sizes of individual genetic variants on the design and interpretation of genetic association studies of complex diseases. Am J Epidemiol 2006; 164: 609–614.
  20. Syvanen AC. Toward genome-wide SNP genotyping. Nat Genet 2005; 37(Suppl): S5–S10.
  21. Hirschhorn JN, Daly MJ. Genome-wide association studies for common diseases and complex traits. Nat Rev Genet 2005; 6: 95–108.
  22. Carlson CS, Eberle MA, Kruglyak L, Nickerson DA. Mapping complex disease loci in whole-genome association studies. Nature 2004; 429: 446–452.
  23. Newton-Cheh C, Hirschhorn JN. Genetic association studies of complex traits: design and analysis issues. Mutat Res 2005; 573: 54–69.
  24. Thomas DC, Haile RW, Duggan D. Recent developments in genomewide association scans: a workshop summary and review. Am J Hum Genet 2005; 77: 337–345.
  25. Wang WY, Barratt BJ, Clayton DG, Todd JA. Genome-wide association studies: theoretical and practical concerns. Nat Rev Genet 2005; 6: 109–118.
  26. Knight J, Sham P. Design and analysis of association studies using pooled DNA from large twin samples. Behav Genet 2006; 36: 665–677.
  27. Darvasi A, Soller M. Selective DNA pooling for determination of linkage between a molecular marker and a quantitative trait locus. Genetics 1994; 138: 1365–1373.
  28. Norton N, Williams NM, O'Donovan MC, Owen MJ. DNA pooling as a tool for large-scale association studies in complex traits. Ann Med 2004; 36: 146–152.
  29. Butcher LM, Meaburn E, Liu L, Fernandes C, Hill L, Al Chalabi A et al. Genotyping pooled DNA on microarrays: a systematic genome screen of thousands of SNPs in large samples to detect QTLs for complex traits. Behav Genet 2004; 34: 549–555.
  30. Meaburn E, Butcher LM, Liu L, Fernandes C, Hansen V, Al Chalabi A et al. Genotyping DNA pools on microarrays: tackling the QTL problem of large samples and large numbers of SNPs. BMC Genomics 2005; 6: 52.
  31. Meaburn E, Butcher LM, Schalkwyk LC, Plomin R. Genotyping pooled DNA using 100K SNP microarrays: a step towards genomewide association scans. Nucleic Acids Res 2006; 34: e27.
  32. Pearson JV, Huentelman MJ, Halperin RF, Tembe WD, Melquist S, Homer N et al. Identification of the genetic basis for complex disorders by use of pooling-based genomewide single-nucleotide-polymorphism association studies. Am J Hum Genet 2007; 80: 126–139.
  33. Kirov G, Nikolov I, Georgieva L, Moskvina V, Owen MJ, O'Donovan MC. Pooled DNA genotyping on Affymetrix SNP genotyping arrays. BMC Genomics 2006; 7: 27.
  34. Butcher LM, Meaburn E, Knight J, Sham PC, Schalkwyk LC, Craig IW et al. SNPs, microarrays, and pooled DNA: identification of four loci associated with mild mental impairment in a sample of 6000 children. Hum Mol Genet 2005; 14: 1315–1325.
  35. Evans DM, Cardon LR, Morris AP. Genotype prediction using a dense map of SNPs. Genet Epidemiol 2004; 27: 375–384.
  36. Ke X, Hunt S, Tapper W, Lawrence R, Stavrides G, Ghori J et al. The impact of SNP density on fine-scale patterns of linkage disequilibrium. Hum Mol Genet 2004; 13: 577–588.
  37. Oliver B, Plomin R. Twins' Early Development Study (TEDS): a multivariate, longitudinal genetic investigation of language, cognition and behavior problems from childhood through adolescence. Twin Res Hum Genet 2007; 10: 96–105.
  38. Sham P, Bader JS, Craig I, O'Donovan M, Owen M. DNA Pooling: a tool for large-scale association studies. Nat Rev Genet 2002; 3: 862–871.
  39. Torgesen JK, Wagner RK, Rashotte CA. Test of Word Reading Efficiency. Pro-Ed: Austin, TX, 1999.
  40. Qualifications and Curriculum Authority. QCA Key Stage 1: Assessment and reporting arrangements. Great Britain Qualifications and Curriculum Authority 1999.
  41. Dale PS, Harlaar N, Plomin R. Telephone testing and teacher assessment of reading skills in 7-year-olds: I. Substantial correspondence for a sample of 5808 children and for extremes. Reading and Writing: An Interdisciplinary Journal 2005; 18: 385–400.
  42. Harlaar N, Dale PS, Plomin R. Correspondence between telephone testing and teacher assessments of reading skills in a sample of 7-year-old twins: II. Strong genetic overlap. Reading and Writing: An Interdisciplinary Journal 2005; 18: 401–423.
  43. Lyon JR, Fletcher JM, Shaywitz BA, Shaywitz SE, Torgesen JK, Wood FB et al. Rethinking special education for a new century. In: Finn CE, Rotherham RAJ, Hokanson CR (eds). Rethinking Learning Disabilities, Thomas B. Fordham Foundation and Progressive Policy Institute: Washington, DC, 2001, pp 259–287.
  44. Freeman B, Smith N, Curtis C, Huckett L, Mill J, Craig I. DNA from buccal swabs recruited by mail: evaluation of storage effects on long-term stability and suitability for multiplex polymerase chain reaction genotyping. Behav Genet 2003; 33: 67–72.
  45. Matsuzaki H, Dong S, Loi H, Di X, Liu G, Hubbell E et al. Genotyping over 100 000 SNPs on a pair of oligonucleotide arrays. Nat Methods 2004; 1: 109–111.
  46. Tobler AR, Short S, Andersen MR, Paner TM, Briggs JC, Lambert SM et al. The SNPlex genotyping system: a flexible and scalable platform for SNP genotyping. J Biomol Tech 2005; 16: 398–406.
  47. Cohen J. Statistical Power Analysis for the Behavioral Sciences. Academic Press: New York, 2004.
  48. Lau W, Kuo TY, Tapper W, Cox S, Collins A. Exploiting large scale computing to construct high resolution linkage disequilibrium maps of the human genome. Bioinformatics 2007; 23: 517–519.
  49. Klein RJ, Zeiss C, Chew EY, Tsai JY, Sackler RS, Haynes C et al. Complement factor H polymorphism in age-related macular degeneration. Science 2005; 308: 385–389.
  50. Duerr RH, Taylor KD, Brant SR, Rioux JD, Silverberg MS, Daly MJ et al. A genome-wide association study identifies IL23R as an inflammatory bowel disease gene. Science 2006; 314: 1461–1463.
  51. Plomin R, DeFries JC, Craig IW, McGuffin P. Behavioral genetics. In: Plomin R, DeFries JC, Craig IW, McGuffin P (eds). Behavioral Genetics in the Postgenomic Era. American Psychological Association: Washington, DC, 2003, pp 3–15.
  52. Simpson CL, Knight J, Butcher LM, Hansen VK, Meaburn E, Schalkwyk LC et al. A central resource for accurate allele frequency estimation from pooled DNA genotyped on DNA microarrays. Nucleic Acids Res 2005; 33: e25.
  53. Cargill M, Daley GQ. Mining for SNPs: putting the common variants--common disease hypothesis to the test. Pharmacogenomics 2000; 1: 27–37.
  54. Craig DW, Huentelman MJ, Hu-Lince D, Zismann VL, Kruer MC, Lee AM et al. Identification of disease causing loci using an array-based genotyping approach on pooled DNA. BMC Genomics 2005; 6: 138.
  55. Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD et al. Global variation in copy number in the human genome. Nature 2006; 444: 444–454.
  56. Wong KK, deLeeuw RJ, Dosanjh NS, Kimm LR, Cheng Z, Horsman DE et al. A comprehensive analysis of common copy-number variations in the human genome. Am J Hum Genet 2007; 80: 91–104.

Acknowledgements

We are indebted to the parents of the twins in the Twins Early Development Study (TEDS) for making the study possible. We thank Charles Curtis for constructing the DNA pools used in this study. TEDS is supported by a programme grant from the UK Medical Research Council (Grant no. G0500079); the association scan of reading disability and ability is supported by a grant from the US National Institute of Child Health and Human Development (Grant no. HD49861). The study was approved by the Institute of Psychiatry/South London and Maudsley Research Ethics Committee and appropriate informed consent was obtained from the parents and teachers of the children.

Supplementary Information accompanies the paper on the Molecular Psychiatry website (http://www.nature.com/mp)
