Introduction

Developmental dyslexia (DD) is characterized by deficiencies in reading and writing with phonological difficulties persisting throughout life.1 Previous studies have demonstrated genetic components for DD with at least nine loci (DYX1-9) identified, and perturbed gene expression or human loss of function phenotypes have revealed roles for candidate genes in neuronal migration, axon guidance and more recently in ciliary biogenesis and function.2, 3, 4, 5, 6 Recently, a study with over 900 cases analyzed the effect of common polymorphisms on DD, word reading and spelling using samples from eight European countries.7 Nominal association was found for variants in single populations; however, no variant reached a significant association in meta-analysis of all samples. These results suggest the need for larger sample sizes for association analyses of common variants. We hypothesised that susceptibility genes can harbour unreported single-nucleotide polymorphisms (SNPs) and novel variants in significant association with DD. Consequently, we embarked on an alternative approach using next-generation sequencing (NGS) of CYP19A1, DCDC2, DIP2A, DYX1C1, GCFC2 (also known as C2orf3), KIAA0319, MRPL19, PCNT, PRMT2, ROBO1 and S100B in 100 unrelated DD cases from Finland. The first seven genes were selected because they harbour variants with replicated genetic association; PCNT, DIP2A, S100B and PRMT2 were selected as they are contained within a deletion on chromosome 21q22.3 that co-segregates with DD in a Dutch family.8

Materials and Methods

Ethical permissions were obtained from the regional committees in Helsinki, Finland (412/E9/04), Marburg and Würzburg, Germany (115/00), and Stockholm, Sweden (2008/153-32).

DNA from 100 DD cases (discovery sample) was divided into 10 pools of 10 individuals each, and the regions of interest (ROI) were captured using selector probes (Halo Genomics, Uppsala, Sweden). The ROI included all RefSeq (release 41) annotated exons as well as 3′-UTRs, 5′-UTRs and 50 base pairs of intronic sequence flanking the exons of the 11 target genes. The captured sequences were sequenced on SOLiD4 and 5500xl instruments (Life Technologies, Carlsbad, CA, USA). We chose the CRISP algorithm9 for the subsequent variant prediction, and the parameters were set to allow detection of heterozygous variants from single individuals in each pool. For detailed material and method descriptions, see Supplementary Materials and Methods.
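To illustrate what detecting a heterozygous variant from a single individual in a pool demands, the following minimal sketch computes the expected variant allele fraction in a pool of 10 diploid individuals and the binomial probability of observing at least a given number of variant reads. This is not the CRISP algorithm; the read depth and read-count threshold in the usage lines are hypothetical values chosen for illustration, and sequencing error is ignored.

```python
from math import comb


def expected_alt_fraction(carriers: int, pool_size: int, ploidy: int = 2) -> float:
    """Expected fraction of variant alleles in a pool.

    Assumes each carrier is heterozygous (one variant allele) and that all
    individuals contribute equal amounts of DNA to the pool.
    """
    return carriers / (pool_size * ploidy)


def prob_at_least(min_alt_reads: int, depth: int, fraction: float) -> float:
    """Binomial probability of seeing at least `min_alt_reads` variant reads
    among `depth` reads when the true variant fraction is `fraction`."""
    return sum(
        comb(depth, i) * fraction**i * (1 - fraction) ** (depth - i)
        for i in range(min_alt_reads, depth + 1)
    )


# One heterozygous carrier among 10 pooled individuals: 1 of 20 chromosomes.
fraction = expected_alt_fraction(carriers=1, pool_size=10)

# Hypothetical depth (100 reads) and threshold (3 variant reads).
p_detect = prob_at_least(min_alt_reads=3, depth=100, fraction=fraction)
print(f"expected variant fraction = {fraction:.2f}, P(>=3 variant reads) = {p_detect:.3f}")
```

Under these assumptions a single carrier contributes a variant fraction of 1/20, which is why sufficient per-pool sequencing depth matters for this design.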

Results and Discussion

We identified 282 SNPs and novel variants in the ROI by SOLiD4 sequencing (Table 1). We also sequenced the available DNA pools (9 out of 10) on a 5500xl platform to increase the total sequencing coverage and depth, and 291 SNPs and novel variants were detected (Table 1). A total of 236 variants were detected by both sequencing methods (Table 1). For validation of individual variants by genotyping, we selected all (i) coding variants; (ii) novel variants present in neither dbSNP130 nor the 1000 Genomes database (Pilots 1,2,3 release March 2010); (iii) 3′- and 5′-UTR SNPs and (iv) intronic SNPs predicted by both the CRISP and FreeBayes algorithms, making a total of 111 variants. Of those, we validated 92 variants in case-controls and families from Finland (including the discovery sample) and 89 variants in families from Germany (Supplementary Tables 1 and 2). The extended case-control sample set (169 affected and 194 unaffected) included Finnish case-controls combined with unrelated affected and unaffected individuals from families (Table 2). The German family sample (419 affected and 118 unaffected individuals) was stratified based on observed spelling ability (Table 2).
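The overlap between the two platforms can be made explicit with simple set arithmetic on the counts reported above. The sketch below is only a worked calculation from those three numbers; the function name is hypothetical, and because the 5500xl run covered 9 of the 10 pools, the platform-specific counts are not strictly comparable.

```python
def platform_overlap(n_a: int, n_b: int, n_shared: int) -> dict:
    """Split two call sets into platform-specific, shared and combined counts."""
    return {
        "a_only": n_a - n_shared,
        "b_only": n_b - n_shared,
        "shared": n_shared,
        "union": n_a + n_b - n_shared,
    }


# Counts reported in the text: 282 (SOLiD4), 291 (5500xl), 236 in common.
counts = platform_overlap(n_a=282, n_b=291, n_shared=236)
print(counts)
```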

Table 1 Total number of predicted SNVs after targeted sequencing
Table 2 Sample sets used in the study

As we sequenced selected genes, we corrected transmission disequilibrium test (TDT) and allelic association P-values for the number of independent markers for each gene (Supplementary Materials and Methods). Neither TDT in Finnish families nor allelic association test in the extended case-control set revealed variants in significant association with DD. TDT in the German families yielded significant association (P-value=0.0003; corrected P=0.002, 7 independent markers) with spelling for the A allele of a nonsynonymous variant (rs2274305, p.Ser221Gly) in DCDC2 (Table 3) using the group with the most severe spelling deficiency (2.5 s.d. below expected spelling score, 72 affected). This confirms and strengthens previous results.10 Power analyses11 (alpha=0.05) show 98% power to detect association with the rs2274305 SNP in the total German sample and 59% in the 72 trios belonging to the 2.5 s.d. group. The rs2274305 variant is predicted to be possibly damaging (Polyphen2)12 and is located within the doublecortin domain, a structure important for the neuronal migration function of DCDC2.13 Furthermore, we found a 3′-UTR variant (rs9722) in S100B in significant association with spelling for the groups defined as 2 s.d. (T allele, corrected P=0.016, 1 marker, power=69% at alpha=0.05) and 1.5 s.d. (T allele, corrected P=0.016, 1 marker, power=73% at alpha=0.05) below anticipated spelling score (Table 3). The association disappeared in the 2.5 s.d. group. This is likely an effect of reduced power because of the limited number of cases (n=72). Sequence variants in the 3′-UTR may affect transcript stability, for example, by creating or disrupting target sites for microRNAs (miRNAs).14, 15, 16 We therefore searched for putative miRNA target sites overlapping rs9722 in TargetScan predictions.17 Indeed, rs9722 was located in, or adjacent to, multiple predicted miRNA target sites (Supplementary Table 2), a result that warrants further analyses.
None of the five validated novel variants, distributed in PCNT and DIP2A, showed association with DD and their low number makes tests for excess of rare variants in cases versus controls, for example gene burden tests, unsuitable.
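For readers unfamiliar with the TDT and the per-gene correction described above, the following minimal sketch shows the classical McNemar-type TDT statistic and a Bonferroni-style correction for the number of independent markers. The exact correction procedure is given in the Supplementary Materials and Methods; the multiplication used here is an assumption that is merely consistent with the reported values (0.0003 × 7 ≈ 0.002). The transmission counts in the usage lines are hypothetical and are not data from this study.

```python
from math import erfc, sqrt


def tdt(transmitted: int, untransmitted: int) -> tuple:
    """Classical TDT for one allele.

    `transmitted` and `untransmitted` count how often heterozygous parents
    did or did not pass the allele to an affected child. Returns the
    chi-square statistic (1 degree of freedom) and its P-value.
    """
    b, c = transmitted, untransmitted
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 d.f.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


def correct_for_markers(p_value: float, n_independent: int) -> float:
    """Bonferroni-style correction (assumed form), capped at 1."""
    return min(1.0, p_value * n_independent)


# Hypothetical transmission counts, for illustration only.
chi2, p = tdt(transmitted=30, untransmitted=10)
print(f"chi2 = {chi2:.1f}, P = {p:.4f}")

# Reported nominal P-value for rs2274305 with 7 independent markers in DCDC2.
print(f"corrected P = {correct_for_markers(0.0003, 7):.3f}")
```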

Table 3 Top ranking TDT results by using validated SNPs in the German family sample set

Tests combining case-controls and related individuals can increase power. To explore this option, we analysed genotypes of the 92 validated variants for the Finnish family (156 affected, 258 unaffected) and case-control samples (92 affected, 67 unaffected) with MQLS (a more powerful quasi-likelihood score test), which is capable of case-control association testing with related individuals.18 None of the variants showed a significant association with DD. It is plausible that the association in the families, at least in part, is driven by polymorphisms in other genes.19, 20

We conclude that sample sets based on related individuals with severe phenotypes can increase the likelihood of finding genetic association. Our results strengthen a previous association signal in DCDC2 and suggest S100B as a new candidate gene for DD.