Reading and writing are crucial life skills but roughly one in ten children are affected by dyslexia, which can persist into adulthood. Family studies of dyslexia suggest heritability up to 70%, yet few convincing genetic markers have been found. Here we performed a genome-wide association study of 51,800 adults self-reporting a dyslexia diagnosis and 1,087,070 controls and identified 42 independent genome-wide significant loci: 15 in genes linked to cognitive ability/educational attainment, and 27 new and potentially more specific to dyslexia. We validated 23 loci (13 new) in independent cohorts of Chinese and European ancestry. Genetic etiology of dyslexia was similar between sexes, and genetic covariance with many traits was found, including ambidexterity, but not neuroanatomical measures of language-related circuitry. Dyslexia polygenic scores explained up to 6% of variance in reading traits, and might in future contribute to earlier identification and remediation of dyslexia.
The ability to read is crucial for success at school and access to employment, information and health and social services, and is related to attained socioeconomic status1. Dyslexia is a neurodevelopmental disorder characterized by severe reading difficulties, present in 5–17.5% of the population, depending on diagnostic criteria2,3. It often involves impaired phonological processing (the decoding of sound units, or phonemes, within words) and frequently co-occurs with psychiatric and other developmental disorders4, especially attention-deficit hyperactivity disorder (ADHD)5,6 and speech and language disorders7,8. Dyslexia may represent the low extreme of a continuum of reading ability, a complex multifactorial trait with heritability estimates ranging from 40% to 80%9,10. Identifying genetic risk factors not only aids increased understanding of the biological mechanisms, but may also expand diagnostic capabilities, facilitating earlier identification of individuals prone to dyslexia and co-occurring disorders for specific support.
Previous genome-wide investigations of dyslexia have been limited to linkage analyses of affected families11 or modest (n < 2,300 cases) association studies of diagnosed children and adolescents12. Candidate genes from linkage studies show inconsistent replication, and genome-wide association studies (GWAS) have not found significant associations, although LOC388780 and VEPH1 were supported in gene-based tests12. Larger cohorts are vital for increasing sensitivity to detect new genetic associations of small effect. Here, we present the largest dyslexia GWAS to date, with 51,800 adults self-reporting a dyslexia diagnosis and 1,087,070 controls, all of whom are research participants with the personal genetics company 23andMe, Inc. We validate our association discoveries in independent cohorts, provide functional annotations of significant variants (mainly single-nucleotide polymorphisms (SNPs)) and potential causal genes, and estimates of SNP-based heritability. Lastly, we investigate genetic correlations with reading and related skills, health, socioeconomic, and psychiatric measures, and evaluate the evidence for previously implicated dyslexia candidate genes in our well-powered results.
The full dataset included 51,800 (21,513 males, 30,287 females) participants responding ‘yes’ to the question ‘Have you been diagnosed with dyslexia?’ (cases) and 1,087,070 (446,054 males, 641,016 females) participants responding ‘no’ (controls). Participants were aged 18 years or over (mean ages of cases and controls were 49.6 years (s.d. 16.2) and 51.7 years (s.d. 16.6), respectively). We identified 42 independent genome-wide significant associated loci (P < 5 × 10−8) and 64 loci with suggestive significance (P < 1 × 10−6) (Fig. 1 and Supplementary Table 1). Genomic inflation was moderate (λGC = 1.18) and consistent with polygenicity (see Q–Q plot, Extended Data Fig. 1). We also performed sex-specific GWAS and age-specific GWAS (younger or older than 55 years) because dyslexia prevalence was higher in our younger (5.34% in 20- to 30-year-olds) than older (3.23% in 80- to 90-year-olds) participants. These subsample analyses showed high consistency with the main GWAS (of the full sample). Genetic correlation estimated by linkage disequilibrium (LD) score regression (LDSC) was 0.91 (95% confidence intervals (CI): 0.86–0.96; P = 8.26 × 10−253) in males and females, and 0.97 (95% CI: 0.91–1.02; P = 2.32 × 10−268) between younger and older adults.
Of the 17 genome-wide significant variants in the female GWAS (Extended Data Fig. 2), all but four (rs61190714, rs4387605, rs12031924 and rs57892111) were significant in the main GWAS and, of these four, three were in LD with an SNP that approached significance (P < 3.3 × 10−7 or smaller) in the main analysis. Intergenic SNP rs57892111 (located between TFAP2B and PKHD1 on chromosome 6p) was not among the significant or suggestive SNPs of the main analysis, and so may represent a female-specific variant. There is no evidence from existing GWAS that this SNP is associated with any other human trait. Of the six genome-wide significant variants in the male GWAS (Extended Data Fig. 3), all were significant in the main GWAS.
In the main GWAS, all significant variants were autosomal, except rs5904158 at Xq27.3 (for regional association plots, see Supplementary Fig. 1). A total of 17 index variants were in high LD with published (genome-wide significant) associated SNPs in the NHGRI GWAS Catalog13 (15 were associated with cognitive/educational traits; Supplementary Tables 1 and 2). Thus, a total of 27 associated loci showed no evidence of published genome-wide associations with traits expected to overlap with dyslexia (for example, educational attainment, cognitive ability) and were considered new (Table 1).
Of 38 associated loci (the 4 remaining were tagged by indels unavailable in validation cohorts), 3 (rs13082684, rs34349354 and rs11393101) were significant at a Bonferroni-corrected level (P < 0.05/38) in the GenLang consortium GWAS meta-analysis of reading (n = 33,959) and spelling (n = 18,514) ability14. At P < 0.05, 18 were associated in GenLang, 3 in the NeuroDys case-control GWAS12 (n = 2,274 cases), and 5 in the Chinese Reading Study (CRS) of reading accuracy and fluency (n = 2,270; Supplementary Note) (Table 1 and Supplementary Tables 3–6).
Gene-based tests identified 173 significantly associated genes (Supplementary Table 7) but no significantly enriched biological pathways (Supplementary Table 8). We estimated the LDSC liability-scale SNP-based heritability of dyslexia to be h2SNP = 0.152 (standard error = 0.006) using the 23andMe sample prevalence of 5%, and h2SNP = 0.189 (standard error = 0.008) using a 10% prevalence of dyslexia, which is more typical of the general population2,3.
Fine-mapping and functional annotations
Within the credible variant set (Supplementary Table 1), missense variants were the most common (55%) of the coding variants; Extended Data Figure 4 summarizes all predicted variant effects. Predicted deleterious variants by SIFT (Sorting Intolerant From Tolerant) score were identified in R3HCC1L, SH2B3, CCDC171, C1orf87, LOXL4, DLAT, ALG9 and SORT1. Within the credible variant set, no genes were especially intolerant to functional variation (smallest LoFtool (Loss-of-Function) percentile was 0.39). For the 42 associated loci, the most probable gene targets of each were estimated by the Overall V2G (Variant-to-Gene) score from OpenTargets (Supplementary Table 9). Two index variants (missense variant rs12737449 (C1orf87) and rs3735260 (AUTS2)) could be causal because they had combined annotation dependent depletion (CADD) scores suggestive of deleteriousness to gene function according to Kircher et al.15 (Supplementary Table 10). The AUTS2 variant RegulomeDB rank of 2b indicated a regulatory role; its chromatin state supported location at an active transcription start site16,17.
Of the 173 significant genes from genome-wide gene-based tests in MAGMA (see Supplementary Table 11 for their functions), 129 could be functionally annotated (Supplementary Table 12). Protein-coding and noncoding sequences are actively conserved in approximately three-quarters of these genes, 63% are more intolerant to variation than average and 33% are intolerant to loss-of-function mutations. Gene property analysis for general tissues and 13 brain tissues confirmed the importance of the brain and specific brain regions (Supplementary Tables 13 and 14). Levels of brain expression for 125 of the 173 significant genes from gene-based tests could be mapped in FUMA and are shown in Supplementary Table 15. A total of 20 genes showed high general brain expression levels and, of these, 3 (PPP1R1B, NPM1 and WASF3) were located near significant SNP associations. Of the 12 brain regions assessed, gene expression was generally highest in the cerebellar hemisphere, cerebellum, and cerebral cortex, consistent with the results of gene property analysis.
SNP-based heritability of dyslexia partitioned by functional annotation showed significant enrichment for conserved regions and H3K4me1 clusters (Supplementary Table 16 and Extended Data Fig. 5). There was enrichment in genes expressed in the frontal cortex, cortex and anterior cingulate cortex (P < 4.17 × 10−3) (Supplementary Table 17 and Extended Data Fig. 6), but not for brain cell type (Supplementary Table 18 and Extended Data Fig. 7). Enrichment was seen in enhancer and promoter regions, identified by the presence of H3K4me1 and H3K4me3 chromatin marks, respectively, in multiple central nervous system (CNS) tissues (Supplementary Tables 19 and 20 and Extended Data Figs. 8 and 9). Reading, an offshoot of spoken language, is a uniquely human trait, but there was no enrichment for a range of annotations related to human evolution spanning the last 30 million to 50,000 years18 (Supplementary Table 21).
Genetic correlations and LDSC
Genetic correlations were estimated for 98 traits (Fig. 2 and Supplementary Table 22), including reading and spelling measures, from GenLang (Fig. 3), and brain subcortical structure volumes, total cortical surface area and thickness from the Enhancing Neuro Imaging Genetics through Meta-Analysis (ENIGMA) consortium. A total of 63 traits showed genetic correlations with dyslexia at the Bonferroni-corrected significance threshold (P < 0.05/98; Fig. 2). Genetic correlations (rg) with quantitative reading and spelling measures ranged from −0.70 to −0.75 (lowest 95% CI of −0.60, highest 95% CI of −0.86), and were −0.62 (95% CI: −0.50, −0.74) and −0.45 (95% CI: −0.26, −0.64) with phoneme awareness and nonword repetition measures, respectively. The childhood/adolescent performance (nonverbal) intelligence quotient (IQ) rg was lower (−0.19; 95% CI: −0.08, −0.30) than that for adult verbal-numerical reasoning19 (−0.50; 95% CI: −0.45, −0.55) but similar to that for childhood IQ20 (−0.32; 95% CIs: −0.21, −0.43) and educational attainment21 (−0.22; 95% CI: −0.15, −0.29). Traits showing positive rg included jobs involving heavy manual work21 (0.40; (95% CI: 0.34, 0.45)), work-related/vocational qualifications21 (0.50; 95% CI: 0.41, 0.59), ADHD22 (0.53; 95% CI: 0.29, 0.77), equal use of right and left hands21 (0.38; 95% CI: 0.19, 0.57) and pain measures21 (average = 0.31; 95% CI: 0.21, 0.41). Of the 11 ENIGMA measures tested, only intracranial volume was significantly correlated with dyslexia (rg = −0.14; 95% CI: −0.06, −0.22). Targeted investigation of 80 structural neuroimaging measures from UK Biobank, including surface-based morphometry and diffusion-weighted imaging for brain circuitry linked to language, were nonsignificant at a Bonferroni-corrected significance level for number of independent traits. Phenotype independence was estimated by spectral decomposition of the phenotypic correlation matrix implied by the bivariate LDSC intercept from GWAS summary statistics of these traits, using the PhenoSpD toolkit23 (Supplementary Table 23).
Polygenic score analyses
Dyslexia polygenic scores (PGS) based on the 23andMe dyslexia GWAS were computed in four independent cohorts and, overall, higher PGS were associated with lower reading and spelling accuracy (Supplementary Table 24). In two Australian population-based samples (1,647 adolescents, 1,163 adults), the dyslexia PGS explained up to 3.6% of variance in the reading and spelling measures, being most predictive of lower performance on tests of nonword reading, an index of phonological decoding. Dyslexia PGS did not correlate with scores on tests of nonword repetition (considered a marker of phonological short-term memory). In developmental cohorts enriched for reading difficulties, the dyslexia PGS explained 3.7% (UKdys; n = 930) and 5.6% (CLDRC; n = 717) of variance in word recognition tests.
Analyses of dyslexia associations from the literature
Of 75 previously reported dyslexia associations, none showed genome-wide significance in our analyses (Supplementary Table 25). Of these targeted variants, 19 (in ATP2C2, CMIP, CNTNAP2, DCDC2, DIP2A, DYX1C1, FOXP2, KIAA0319L and PCNT) showed association surviving Bonferroni correction that accounted for LD (P < 0.05/68.7). In gene-based tests of 14 candidate genes from the literature24,25, association at a Bonferroni level (P < 0.05/14) was seen for KIAA0319L (P = 1.84 × 10−4) and ROBO1 (P = 1.53 × 10−3) (Supplementary Table 26). The CNTNAP2 association approached corrected replication-level significance (P = 0.004). Targeted gene set analysis of three pathways previously implicated in dyslexia (Supplementary Table 27) showed replication-level support (P = 2.00 × 10−3) for the axon guidance pathway (comprising 216 genes).
In the largest GWAS of dyslexia to date (>50,000 self-reported diagnoses), we identified 42 significant independent loci. Of these, 27 represent new associations that have not been uncovered in GWAS of related cognitive traits; 12 of the new associations were validated in the GenLang consortium GWAS meta-analysis of reading/spelling in English and other European languages14, and 1 in a Chinese language cohort. Of the significant SNPs, 36% overlapped with variants from general cognitive ability GWAS, consistent with twin studies that find that genetic variation in reading disability is explained by general and reading-specific cognitive ability10. Similar to other complex traits, and consistent with high polygenicity, each significant locus showed small effects (odds ratios (ORs) ranging from 1.04 to 1.12). Our estimated SNP-based heritability of 19% (assuming a 10% dyslexia population prevalence) was equal to that reported in a smaller GWAS12, but lower than heritability estimates from twin studies (40–80%)26,27. This difference may be due partly to effects of rare and structural variants28, which have been implicated in reading and related traits29,30.
Whereas AUTS2 has been implicated in autism31, intellectual disability32 and dyslexia33, the variant we uncovered (rs3735260) represents the strongest AUTS2 SNP association with a neurodevelopmental trait to date. Amongst our findings were other known neurodevelopmental genes, such as TANC2 (implicated in language delay and intellectual disability34,35) and, especially, GGNBP2 (linked to neurodevelopmental delay36 and autism37) with variant rs34349354 supported in all our validation cohorts. However, rs34349354 is also associated with cognitive performance38, and based on expression quantitative trait loci (eQTL) evidence is more likely linked to ZNHIT3, colocalizing with molecular QTLs (opentargets.org). Notably, none of the more established candidate genes for dyslexia approached genome-wide significance in our results.
Like other human complex traits, partitioning of SNP-based heritability revealed enrichment in conserved regions39. We further observed enrichment in the histone mark H3K4me1 (which has also been reported for ASD40), and at H3K4me1 and H3K4me3 clusters in the CNS (marking enhancers and promoters, respectively). Since reading/writing systems are built on our capacities for spoken language, it is plausible that evolutionary changes on the human lineage helped shape the underlying genetic architecture41. However, we did not find enrichment of significant associations for curated annotations spanning different periods of hominin prehistory.
Our self-reported dyslexia diagnosis binary trait showed strong negative genetic correlations with quantitative reading and spelling measures, supporting the validity of this measure in the 23andMe cohort, and suggesting that reading skills and disorder are not qualitatively distinct. The positive genetic correlation between hearing difficulties and dyslexia is consistent with genetic correlations reported for childhood reading skill42, suggesting that hearing problems at an early age could affect acquisition of phonological processing skills.
Dyslexia showed moderately negative genetic correlations with adult verbal-numerical reasoning, but there was a lack of a strong genetic correlation of dyslexia with (nonverbal) performance IQ. This would be consistent with phenotypic observations that individuals with dyslexia are disadvantaged on verbal IQ tests43. Educational attainment correlations were also not strong, which might reflect school adjustments and other support that counteract disadvantage in academic learning.
There was little evidence of common genetic variation in dyslexia being related to interindividual differences in subcortical volumes, or structural connectivity and morphometry for brain regions implicated in language processing in adults. Thus, the phenotypic correlations previously reported between dyslexia and aspects of neuroanatomy may in large part reflect environmental shaping of the brain, perhaps through the process of reading itself44. Left-handedness and ambidexterity show small genetic overlap with each other45 yet are both phenotypically linked to neurodevelopmental disorders/cognitive abilities46,47. We report a significant genetic correlation between dyslexia and self-reported equal hand use, but not left-handedness, supporting theories linking ambidexterity and dyslexia48.
Dyslexia and ADHD5,6 often co-occur (24% reporting ADHD in our cases versus 9% in controls), and we show a moderate genetic correlation between the two, potentially reflecting shared endophenotypes like deficits in working memory and attention49. Although we did not find significant genetic correlations between dyslexia and ASD, the GWAS for the latter encompassed diverse neurodevelopmental phenotypes, including subgroups with varying educational attainment and IQ40. Genetic correlations with pain-related traits suggest that individuals with dyslexia may have a lower threshold for pain perception. Links between pain and other neurodevelopmental disorders have been reported50,51.
Dyslexia polygenic scores were correlated with lower achievement on reading and spelling tests in population-based and reading-disorder enriched samples, especially for nonword reading, a measure of phonological decoding that is typically impaired in dyslexia. Polygenic scores could become a valuable tool to help identify children with a propensity for dyslexia, enabling learning support before development of reading skills. However, a limitation of our study is the potential for collider bias arising from sample selection (that is, people without dyslexia and from higher socioeconomic positions), which we were unable to quantify; thus, care should be taken in future research when using polygenic scores based on many variants52.
In summary, we report 42 new independent genome-wide significant loci associated with dyslexia, 27 of which have not been associated with cognitive-educational traits and should be prioritized for follow up as dyslexia candidates. Functional annotation of the variants highlights the importance of conserved and enhancer regions of the genome for this trait. Dyslexia shows positive genetic correlations with ADHD, vocational qualifications, physical occupations, ambidexterity and pain perception, and negative correlations with academic qualifications and cognitive ability; family-based methods are needed to dissociate pleiotropic and causal effects.
Participants were drawn from the customer base of 23andMe, Inc., a consumer genetics company. Participants provided informed consent and participated in the research online, under a protocol approved by the external AAHRPP-accredited IRB, Ethical and Independent Review Services (www.eandireview.com). They included 51,800 (21,513 male, 30,287 female) participants who responded ‘yes’ to the question ‘Have you been diagnosed with dyslexia?’ (cases) and 1,087,070 (446,054 male, 641,016 female) participants who responded ‘no’ (controls). Age ranged from 18 to 110 years, with the prevalence of dyslexia higher for younger participants (5.34% in those aged 20–30 years) than older participants (3.23% in those aged 80–90 years). The negative linear relationship between dyslexia prevalence and participant age was expected given that screening for specific learning difficulties has only become commonplace in more recent decades. Moreover, this aligns with findings from the subsample (4.3%) of participants who reported age of diagnosis: younger participants were diagnosed at an earlier age (for example, 9.7 years (±4.7) for 20- to 30-year-olds) than older participants (for example, 22.4 years (±17.8) for 80- to 90-year-olds). The prevalence of dyslexia in our sample was similar for women (4.51%) and men (4.6%), although the slightly higher prevalence in males in this very large sample was statistically significant (P < 8.7 × 10−6). Such a prevalence lies at the lower end of the range typically reported in the US population3 and might represent the more severe cases of dyslexia given that a formal diagnosis was required; additionally, people with dyslexia might opt out of survey research that requires reading, further restricting the sample range.
Genotyping and imputation
DNA was extracted from saliva samples and genotyped on one of five genotyping platforms by the National Genetics Institute (NGI). In the present analysis, only participants with European ancestry were included. Details about the genotyping arrays, quality control of samples and ancestry derivation can be found in Fontanillas et al.53 and the Supplementary Note. Phased genotypes were imputed to a combined reference panel of the 1000 Genomes Phase 3 haplotypes (May 2015) and the UK10K imputation reference panel using Minimac3 (see Das et al.54).
Association analysis was performed on genotyped and imputed SNP dosage data using logistic regression and assuming an additive model of allelic effects. For X-chromosome analysis, male genotypes were treated as homozygous diploid. Covariates included age, age squared, gender, the first five ancestry principal components and genotype platform. SNP significance was evaluated by a likelihood ratio test, and genome-wide significance was determined as P < 5 × 10−8 (suggestive significance level as P < 1 × 10−6). Only reliably imputed SNPs (r2 > 0.80) and those with minor allele frequency (MAF) > 0.01 are presented (n = 7,995,923). We define associated regions by first identifying all variants with P < 5 × 10−8, then grouping these variants into regions separated by gaps of at least 250 kb. Index variants are the variants with smallest P value within each associated region. We use the same approach for regions with suggestive associations, but by first identifying all variants with P < 10−5. Subsidiary genome-wide association analysis of separate male (n = 21,513 cases, 446,054 controls) and female (n = 30,287 cases, 641,016 controls) groups, and younger (below 55 years; n = 30,763 cases, 582,276 controls) and older (55 and above; n = 21,037 cases, 504,794 controls) groups was performed. The latter was to check whether reliability of diagnosis (assumed to be higher in the younger sample whose recall of diagnosis should be better and who would have been exposed to greater levels of dyslexia screening) affected the GWAS signal.
We also looked to independently validate our genome-wide significant variants within (1) a published GWAS meta-analysis of 2,274 dyslexia cases from nine European countries representing six different languages (NeuroDys) by Gialluisi et al.55; (2) a population sample (Chinese Reading Study; CRS) of children measured on quantitative traits of reading accuracy and reading fluency (n = 2,270; described in the Supplementary Note), and; (3) within the GenLang quantitative trait GWAS meta-analysis of word reading (up to n = 33,959) and spelling (up to n = 18,514) skills measured in cohorts of children and adolescents from Europe, the United States and Australia, and representing seven European languages, of which English was the most common14.
The GWAS results were used to calculate gene-based P values for association with dyslexia by performing the gene analysis in MAGMA v.1.08 (ref. 56) through the FUMA interface57 using standard settings. In total, 19,039 genes were tested, and P values were judged based on a Bonferroni-corrected significance threshold of P < 2.63 × 10−6. We also performed gene set analyses for association of biological pathways (all available gene ontology (GO) terms and curated gene sets from the Molecular Signatures Database (MsigDB)58,59) with dyslexia in MAGMA through the FUMA interface. The total number of pathways tested was 15,486, and P values were judged based on a Bonferroni-corrected significance threshold of P < 3.23 × 10−6.
Genome-wide significant variants and nearby gene(s) were annotated using external reference data and evaluated for functional or regulatory impact. A 99% credible set of potentially causal variants for SNPs in significant regions was based on approximate Bayes factor (ABFs)60 assuming a prior variance of 0.1, and using the method of Maller et al.61 to define these sets. Variant effect prediction of these was done in ENSEMBL (release 104)62. For genome-wide significant variants, we considered: gene context (whether a variant is intergenic or located within a specific functional region within a gene locus); deleteriousness (Combined Annotation Dependent Depletion (CADD) score); functionality (RegulomeDB (RDB) category); chromatin state (minimum and common 15-core chromatin state); and SNP-trait associations reported in the NHGRI GWAS Catalog13.
For each variant, the most probable gene target was identified using the Open Target Genetics portal63, which draws on evidence from QTL and chromatin interaction experiments, functional predictions and distance from a gene’s transcription start site. For genome-wide significant genes, we considered: loss-of-function intolerance (probability of loss-of-function Intolerance (pLI) score); variation intolerance (residual variation intolerance score, RVIS); variation intolerance in noncoding regions (noncoding RVIS, ncRVIS); evolutionary constraint of noncoding regions (noncoding genomic evolutionary rate profiling (ncGERP) score); evolutionary constraint of protein-coding regions (protein-coding genomic evolutionary rate profiling (pcGERP) score); deleteriousness across noncoding regions (noncoding CADD (ncCADD) score); combined functionality of variants in noncoding regions (noncoding genome-wide annotation of variants (ncGWAVA) score); and expression in 12 brain tissues (amygdala, anterior cingulate cortex, caudate basal ganglia, cerebellar hemisphere, cerebellum, cortex, frontal cortex, hippocampus, hypothalamus, nucleus accumbens basal ganglia, putamen basal ganglia and substantia nigra). All annotations were obtained through FUMA57 except RVIS, ncGERP, pcGERP, ncCADD and ncGWAVA, which were taken from Petrovski et al.64. Details of each annotation including original sources are in the Supplementary Note.
We partitioned SNP heritability of dyslexia using stratified LDSC, as described by Finucane et al.39, to determine whether SNPs that share the greatest proportion of the heritability are also clustered in specific functional categories in the genome. Overall, we performed 266 different tests, which would give a very conservative Bonferroni-corrected significance level of 1.88 × 10−4, but because there will be overlap among annotation groups, we also report corrections to significance within different classes of annotation, each of which we now describe. Partitioning was performed for the 24 main functional annotations defined by Finucane et al.39. LD scores, regression weights and allele frequencies are from European ancestry samples and were retrieved from https://alkesgroup.broadinstitute.org/LDSCORE. Heritability estimates were considered statistically significant if the P value surpassed an α level of 2.08 × 10−3, derived by Bonferroni correction based on 24 tests.
We also estimated the enrichment for heritability of dyslexia for tissue-specific annotations, while controlling for the annotations in the baseline model, including gene expression in three brain cell types, gene expression in 12 brain regions, and chromatin marks H3K4me1 and H3K4me3 in multiple tissues (108 and 114, respectively) since these marks are enriched at enhancers65 and promoters66, respectively. Enrichment is the proportion of SNP heritability divided by the proportion of SNPs. For the brain cell types, we estimated enrichment for heritability of dyslexia for genes expressed in neurons, astrocytes, and oligodendrocytes using data from Cahoy et al.67. Enrichments were considered statistically significant if the P value surpassed an α level of 0.017, derived by Bonferroni correction based on three tests. The gene expression data used to estimate the enrichment of heritability in genes expressed in certain brain regions was from the GTEx database68, and the Bonferroni-derived α level for enrichment was 4.17 × 10−3 (based on 12 tests). Chromatin annotations include data from the Roadmap Epigenomics consortium17 and EN-TEx69,70. For H3K4me1, the Bonferroni-derived α level for enrichment was 4.63 × 10−4 (based on 108 tests) and, for H3K4me3, the Bonferroni-derived α level for enrichment was 4.39 × 10−4 (based on 114 tests).
Although reading and writing is a human cultural invention, it builds on fundamental pathways involved in language processing. Therefore, we investigated whether annotations related to human evolution were significantly enriched for heritability of dyslexia by applying an evolutionary analysis pipeline adapted from Tilot et al.18. These analyses capture a range of periods in an evolutionary timeframe on the lineage that led to humans, from approximately 30 million years ago to 50,000 years ago.
Enrichment of heritability was estimated in adult brain human gained enhancers (HGEs)71, fetal brain HGEs72, ancient selective sweep regions73, Neanderthal-introgressed SNPs74 and Neanderthal-depleted regions75 (see Supplementary Note for a description of each annotation); and controlled for using the baselineLD v.2 model from Gazal et al.76. Heritability enrichment in human adult and fetal HGEs were additionally controlled for adult and fetal brain active regulatory elements from the Roadmap Epigenomics resource17. Active regulatory elements were defined using chromHMM16. Enrichment P values were judged by an α level of 10−2, derived by Bonferroni correction based on five tests.
Genetic correlations within the 23andMe GWAS of dyslexia
Genetic correlations of dyslexia with other traits
We present the pairwise genetic correlation of dyslexia with 98 traits. Summary statistics for most of these traits are publicly available through LD Hub77,78,79—a centralized database and web interface that automates the LDSC regression analysis pipeline. A selection of brain magnetic resonance imaging measures obtained from the ENIGMA-3 consortium80,81,82,83, and measures of reading and spelling accuracy, and performance IQ from the GenLang Consortium14 were analyzed locally using LDSC. Word reading accuracy in GenLang was measured by the number of correct words read aloud from a list in a time restricted or unrestricted fashion. Examples of tools that include this measure are Test of Word Reading Efficiency (TOWRE), the British Ability Scales (BAS) and the Wide Range Achievement Test (WRAT). Spelling accuracy in GenLang was measured by the number of words correctly spelled orally or in writing. The words were dictated as single words or in a sentence. Examples of tools that include this measure are the BAS, WRAT and Wechsler Objective Reading Dimensions (WORD). Performance IQ in GenLang was based on subtests of IQ tests that did not depend on verbal cues, as included for example in the BAS and Wechsler Intelligence Scale for Children (WISC). Trait descriptions and summary statistic sources are in Supplementary Table 22. Bonferroni correction for multiple testing derived an adjusted critical P value of 5.1 × 10−4 from 98 independent tests.
Genetic correlations were further estimated in a targeted analysis of structural brain magnetic resonance imaging measures from UK Biobank, which were more comprehensive than those currently available from ENIGMA, along with further advantages such as hemisphere-specific data and greater homogeneity in cohort and scanning procedures. GWAS summary statistics from brain imaging-derived phenotypes for 33,000 participants were downloaded from the Oxford Brain Imaging Genetics Server84. Structural brain imaging traits encompassed both diffusion tensor imaging and surface-based morphometric phenotypes85 where selected tracts or regions of interest had a known link to language. For diffusion tensor imaging, fractional anisotropy values derived from both tract-based-spatial statistics and probabilistic tractography were used for available tracts spanning the extended language network86. For surface-based morphometric (cortical volume, surface area and thickness) GWAS, summary statistics for regions of interest derived from the Desikan-Killiany atlas (white surface) were used, again selected for their relevance in language processing, based on previous literature87,88,89,90. To correct for multiple testing, phenotypic correlations between the UK Biobank imaging indices were derived and analyzed by PhenoSpD23 to obtain the number of independent variables (36.08) to use for Bonferroni correction (adjusted critical P value of 1.39 × 10−3).
Polygenic score analyses
Dyslexia polygenic scores were based on increasingly larger numbers of SNPs corresponding to their association P values from the 23andMe GWAS (P < 5 × 10−8, P < 1 × 10−5, P < 0.001, P < 0.01, P < 0.05, P < 0.1, P < 0.5, 1). They were calculated in four independent cohorts. Two were general population cohorts from Australia: n = 1,640 (772 families) adolescents/young adults (Brisbane adolescents)91; n = 1,165 (966 families) older adults (Brisbane adults)25. The other two were family-based samples selected for dyslexia: one from the United Kingdom (UKdys), n = 930 (595 families); the other from the United States (Colorado Learning Disabilities Research Center, CLDRC), n = 717 (336 families)92. In the Australian samples, polygenic scores were calculated on 1000 Genomes Phase 3 (v.20101123) imputed genetic data using PLINK93. Only reliably imputed SNPs (R2 > 0.80) and those with a minor allele frequency >0.01 were included, and the default clumping procedure was used where index SNPs formed a clump with other SNPs in LD (R2 > 0.1) and within a 250 kb distance. In the UKdys and CLDRC samples, polygenic scores were calculated on Haplotype Reference Consortium imputed genetic data using PRSice94, with the same imputation quality and MAF exclusions for the base (23andMe GWAS) sample, and clumping parameters.
Polygenic scores were then used as predictors in linear models of quantitative trait outcomes (Australia: word, nonword (phonetic), irregular word (lexical) reading and spelling tests from an extended version of the Components of Reading Examination95, and two nonword repetition tests which are sensitive to developmental language disorders—Dollaghan and Campbell96, Gathercole and Baddeley97; UKdys and CLDRC: word recognition). All quantitative traits were preadjusted for sex, age and ancestry principal components (10 principal components in UKdys and CLDR; 20 principal components in Australian samples). Further adjustments were made for imputation run (separate runs for different genotyping arrays) in the Australian samples, and for nonverbal IQ in all samples (except for the Australian adults), and for hearing difficulties in the Australian older adults. Because the cohorts included related family members (twins or siblings), linear mixed models (lme) were specified in RStudio98, with family membership modeled as a random effect and the dyslexia polygenic score as a fixed effect. Where monozygotic twins were present, their trait scores were averaged and they were used as a single case.
Evaluation of candidates from previous literature
We used the results of the 23andMe dyslexia GWAS to assess variants, genes and biological pathways previously associated with or implicated in dyslexia and/or variation in reading and spelling ability in past association studies, linkage analyses and other studies.
Previously reported variants
We assessed 75 previously reported variants within our summary statistics, adopting a replication/validation significance threshold of P < 7.28 × 10−4, derived by Bonferroni correction based on 68.7 independent tests derived through matrix spectral decomposition, taking into account LD (see Doust et al.25 for details on how these variants were selected). The sources for each variant are provided in Supplementary Table 26.
Dyslexia candidate genes
We evaluated gene-based results from MAGMA v.1.08 (ref. 56) for overrepresentation of genome-wide significant variants from the 23andMe dyslexia GWAS within the loci of 14 candidate genes from earlier literature: CMIP, CNTNAP2, CYP19A1, DCDC2, DIP2A, DYX1C1, GCFC2, KIAA0319, KIAA0319L, MRPL19, PCNT, PRMT2, S100B and ROBO1. The rationale for this selection is detailed by Luciano et al.24 and Doust et al.5. The critical P value, based on Bonferroni correction for 14 tests, was 3.57 × 10−3.
Candidate dyslexia gene sets
We performed a gene set analysis in MAGMA to test for overrepresentation of genome-wide significant variants within (1) a set of transcriptional targets of FOXP2, a highly conserved transcription factor linked to speech and language impairment99; and (2) two biological pathways previously suggested to play a role in dyslexia susceptibility100,101—axon guidance (GO:0007411: ‘chemotaxis process that directs the migration of an axon growth cone to a specific target site’; 216 genes) and neuron migration (GO:0001764: ‘movement of an immature neuron from germinal zones to specific positions where they will reside as they mature’; 145 genes). An adjusted critical P value of 0.017 was derived using Bonferroni correction based on three independent tests.
Participants provided informed consent and participated in the research online, under a protocol approved by the external AAHRPP-accredited IRB, Ethical and Independent Review Services. Participants were included in the analysis on the basis of consent status as checked at the time data analyses were initiated.
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
The full summary statistics for each dyslexia GWAS presented in this paper will be made available through 23andMe website (https://research.23andme.com/dataset-access/) to qualified researchers under an agreement with 23andMe that protects the privacy of the 23andMe participants. The top 10,000 associated SNPs from the main GWAS can be downloaded from https://doi.org/10.7488/ds/3465.
Ritchie, S. J. & Bates, T. C. Enduring links from childhood mathematics and reading achievement to adult socioeconomic status. Psychol. Sci. 24, 1301–1308 (2013).
Shaywitz, S. E., Shaywitz, B. A., Fletcher, J. M. & Escobar, M. D. Prevalence of reading disability in boys and girls: results of the Connecticut Longitudinal Study. JAMA 264, 998–1002 (1990).
Katusic, S. K., Colligan, R. C., Barbaresi, W. J., Schaid, D. J. & Jacobsen, S. J. Incidence of reading disability in a population-based birth cohort, 1976–1982, Rochester, Minn. Mayo Clin. Proc. 76, 1081–1092 (2001).
Carroll, J. M., Maughan, B., Goodman, R. & Meltzer, H. Literacy difficulties and psychiatric disorders: evidence for comorbidity. J. Child Psychol. Psychiatry 46, 524–532 (2005).
Margari, L. et al. Neuropsychopathological comorbidities in learning disorders. BMC Neurol. 13, 198 (2013).
Willcutt, E. G., Pennington, B. F. & DeFries, J. C. Twin study of the etiology of comorbidity between reading disability and attention-deficit/hyperactivity disorder. Am. J. Med. Genet. 96, 293–301 (2000).
McArthur, G. M., Hogben, J. H., Edwards, V. T., Heath, S. M. & Mengler, E. D. On the ‘specifics’ of specific reading disability and specific language impairment. J. Child Psychol. Psychiatry 41, 869–874 (2000).
Catts, H. W., Fey, M. E., Tomblin, J. B. & Zhang, X. A longitudinal investigation of reading outcomes in children with language impairments. J. Speech Lang. Hear. Res. 45, 1142–1157 (2002).
Bates, T. C. et al. Genetic and environmental bases of reading and spelling: a unified genetic dual route model. Read. Writ. 20, 147–171 (2007).
Haworth, C. M. A. et al. Generalist genes and learning disabilities: a multivariate genetic analysis of low performance in reading, mathematics, language and general cognitive ability in a sample of 8000 12-year-old twins. J. Child Psychol. Psychiatry 50, 1318–1325 (2009).
Fisher, S. E. & DeFries, J. C. Developmental dyslexia: genetic dissection of a complex cognitive trait. Nat. Rev. Neurosci. 3, 767–780 (2002).
Gialluisi, A. et al. Genome-wide association study reveals new insights into the heritability and genetic correlates of developmental dyslexia. Mol. Psychiatry 26, 3004–3017 (2021).
Buniello, A. et al. The NHGRI-EBI GWAS catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 47, D1005–D1012 (2018).
Eising, E. et al. Genome-wide analyses of individual differences in quantitatively assessed reading- and language-related skills in up to 34,000 people. Proc. Natl Acad. Sci. USA 119, e2202764119 (2022).
Kircher, M. et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat. Genet. 46, 310–315 (2014).
Ernst, J. & Kellis, M. ChromHMM: automating chromatin-state discovery and characterization. Nat. Methods 9, 215–216 (2012).
Kundaje, A. et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
Tilot, A. K. et al. The evolutionary history of common genetic variants influencing human cortical surface area. Cerebral Cortex 31, 1873–1887 (2020).
Sniekers, S. et al. Genome-wide association meta-analysis of 78,308 individuals identifies new loci and genes influencing human intelligence. Nat. Genet. 49, 1107–1112 (2017).
Benyamin, B. et al. Childhood intelligence is heritable, highly polygenic and associated with FNBP1L. Mol. Psychiatry 19, 253–258 (2014).
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
Middeldorp, C. M. et al. A genome-wide association meta-analysis of attention-deficit/hyperactivity disorder symptoms in population-based pediatric cohorts. J. Am. Acad. Child Adolesc. Psychiatry 55, 896–905.e6 (2016).
Zheng, J. et al. PhenoSpD: an integrated toolkit for phenotypic correlation estimation and multiple testing correction using GWAS summary statistics. Gigascience 7, giy090 (2018).
Luciano, M., Gow, A. J., Pattie, A., Bates, T. C. & Deary, I. J. The influence of dyslexia candidate genes on reading skill in old age. Behav. Genet. 48, 351–360 (2018).
Doust, C. et al. The association of dyslexia and developmental speech and language disorder candidate genes with reading and language abilities in adults. Twin Res. Hum. Genet. 23, 23–32 (2020).
Davis, C. J., Knopik, V. S., Olson, R. K., Wadsworth, S. J. & DeFries, J. C. Genetics and environmental influences on rapid naming and reading ability. Ann. Dyslexia 51, 231–247 (2001).
Gayán, J. & Olson, R. K. Genetic and environmental influences on orthographic and phonological skills in children with reading disabilities. Dev. Neuropsychol. 20, 483–507 (2001).
Hannula-Jouppi, K. et al. The axon guidance receptor gene ROBO1 is a candidate gene for developmental dyslexia. PLoS Genet. 1, e50 (2005).
Ganna, A. et al. Ultra-rare disruptive and damaging mutations influence educational attainment in the general population. Nat. Neurosci. 19, 1563–1565 (2016).
Gialluisi, A. et al. Investigating the effects of copy number variants on reading and language performance. J. Neurodev. Disord. 8, 17–17 (2016).
Oksenberg, N., Stevison, L., Wall, J. D. & Ahituv, N. Function and regulation of AUTS2, a gene implicated in autism and human evolution. PLoS Genet. 9, e1003221 (2013).
Beunders, G. et al. Two male adults with pathogenic AUTS2 variants, including a two-base pair deletion, further delineate the AUTS2 syndrome. Eur. J. Human Genet. 23, 803–807 (2015).
Girirajan, S. et al. Relative burden of large CNVs on a range of neurodevelopmental phenotypes. PLoS Genet. 7, e1002334 (2011).
Wessel, K. et al. 17q23.2q23.3 de novo duplication in association with speech and language disorder, learning difficulties, incoordination, motor skill impairment, and behavioral disturbances: a case report. BMC Med. Genet. 18, 119 (2017).
Guo, H. et al. Disruptive mutations in TANC2 define a neurodevelopmental syndrome associated with psychiatric disorders. Nat. Commun. 10, 4679 (2019).
Pasmant, E. et al. Characterization of a 7.6-Mb germline deletion encompassing the NF1 locus and about a hundred genes in an NF1 contiguous gene syndrome patient. Eur. J. Hum. Genet. 16, 1459–1466 (2008).
Takata, A. et al. Integrative analyses of de novo mutations provide deeper biological insights into autism spectrum disorder. Cell Reports 22, 734–747 (2018).
Lee, J. J. et al. Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals. Nat. Genet. 50, 1112–1121 (2018).
Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
Grove, J. et al. Identification of common genetic risk variants for autism spectrum disorder. Nat. Genet. 51, 431–444 (2019).
Mozzi, A. et al. The evolutionary history of genes involved in spoken and written language: beyond FOXP2. Sci. Rep. 6, 22157 (2016).
Schmitz, J., Abbondanza, F. & Paracchini, S. Genome-wide association study and polygenic risk score analysis for hearing measures in children. Am. J. Med. Genet. B Neuropsychiatr. Genet. 186, 318–328 (2021).
Vellutino, F. Alternative conceptualizations of dyslexia: evidence in support of a verbal-deficit hypothesis. Harvard Educ. Rev. 47, 334–354 (2012).
Dehaene, S., Cohen, L., Morais, J. & Kolinsky, R. Illiterate to literate: behavioural and cerebral changes induced by reading acquisition. Nat. Rev. Neurosci. 16, 234–244 (2015).
Cuellar-Partida, G. et al. Genome-wide association study identifies 48 common genetic variants associated with handedness. Nat. Hum. Behav. 5, 59–70 (2021).
Papadatou-Pastou, M. et al. Human handedness: a meta-analysis. Psychol. Bull. 146, 481–524 (2020).
Peters, M., Reimers, S. & Manning, J. T. Hand preference for writing and associations with selected demographic and behavioral variables in 255,100 subjects: the BBC internet study. Brain Cogn. 62, 177–189 (2006).
Brandler, W. M. & Paracchini, S. The genetic relationship between handedness and neurodevelopmental disorders. Trends Mol. Med. 20, 83–90 (2014).
Willcutt, E. G., Pennington, B. F., Olson, R. K., Chhabildas, N. & Hulslander, J. Neuropsychological analyses of comorbidity between reading disability and attention deficit hyperactivity disorder: in search of the common deficit. Dev. Neuropsychol. 27, 35–78 (2005).
Gu, X. et al. Heightened brain response to pain anticipation in high-functioning adults with autism spectrum disorder. Eur. J. Neurosci. 47, 592–601 (2018).
Whitney, D. G. & Shapiro, D. N. National prevalence of pain among children and adolescents with autism spectrum disorders. JAMA Pediatr. 173, 1203–1205 (2019).
Munafò, M. R., Tilling, K., Taylor, A. E., Evans, D. M. & Davey Smith, G. Collider scope: when selection bias can substantially influence observed associations. Int. J. Epidemiol. 47, 226–235 (2018).
Fontanillas, P. et al. Disease risk scores for skin cancers. Nat. Commun. 12, 160 (2021).
Das, S. et al. Next-generation genotype imputation service and methods. Nat. Genet. 48, 1284–1287 (2016).
Gialluisi, A. et al. Genome-wide association scan identifies new variants associated with a cognitive predictor of dyslexia. Transl. Psychiatry 9, 77 (2019).
de Leeuw, C. A., Mooij, J. M., Heskes, T. & Posthuma, D. MAGMA: generalized gene-set analysis of GWAS data. PLoS Comput. Biol. 11, e1004219 (2015).
Watanabe, K., Taskesen, E., van Bochoven, A. & Posthuma, D. Functional mapping and annotation of genetic associations with FUMA. Nat. Commun. 8, 1826 (2017).
Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).
Liberzon, A. et al. Molecular signatures database (MSigDB) 3.0. Bioinformatics 27, 1739–1740 (2011).
Wakefield, J. A Bayesian measure of the probability of false discovery in genetic epidemiology studies. Am. J. Human Genet. 81, 208–227 (2007).
Maller, J. B. et al. Bayesian refinement of association signals for 14 loci in 3 common diseases. Nat. Genet. 44, 1294–1301 (2012).
Howe, K. L. et al. Ensembl 2021. Nucleic Acids Res. 49, D884–D891 (2020).
Carvalho-Silva, D. et al. Open Targets Platform: new developments and updates two years on. Nucleic Acids Res. 47, D1056–D1065 (2018).
Petrovski, S. et al. The intolerance of regulatory sequence to genetic variation predicts gene dosage sensitivity. PLoS Genet. 11, e1005492 (2015).
Rada-Iglesias, A. Is H3K4me1 at enhancers correlative or causative? Nat. Genet. 50, 4–5 (2018).
Heintzman, N. D. et al. Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome. Nat. Genet. 39, 311–318 (2007).
Cahoy, J. D. et al. A transcriptome database for astrocytes, neurons, and oligodendrocytes: a new resource for understanding brain development and function. J. Neurosci. 28, 264 (2008).
The GTEx Consortium. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348, 648 (2015).
Finucane, H. K. et al. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nat. Genet. 50, 621–629 (2018).
Dunham, I. et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
Vermunt, M. W. et al. Epigenomic annotation of gene regulatory alterations during evolution of the primate brain. Nat. Neurosci. 19, 494–503 (2016).
Reilly, S. K. et al. Evolutionary genomics. Evolutionary changes in promoter and enhancer activity during human corticogenesis. Science 347, 1155–1159 (2015).
Peyrégne, S., Boyle, M. J., Dannemann, M. & Prüfer, K. Detecting ancient positive selection in humans using extended lineage sorting. Genome Res. 27, 1563–1572 (2017).
Simonti, C. N. et al. The phenotypic legacy of admixture between modern humans and Neandertals. Science 351, 737–741 (2016).
Vernot, B. et al. Excavating Neandertal and Denisovan DNA from the genomes of Melanesian individuals. Science 352, 235–239 (2016).
Gazal, S. et al. Linkage disequilibrium-dependent architecture of human complex traits shows action of negative selection. Nat. Genet. 49, 1421–1427 (2017).
Bulik-Sullivan, B. K. et al. LD score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
Bulik-Sullivan, B. K. et al. An atlas of genetic correlations across human diseases and traits. Nat. Genet. 47, 1236–1241 (2015).
Zheng, J. et al. LD Hub: a centralized database and web interface to perform LD score regression that maximizes the potential of summary level GWAS data for SNP heritability and genetic correlation analysis. Bioinformatics 33, 272–279 (2016).
Grasby, K. L. et al. The genetic architecture of the human cerebral cortex. Science 367, eaay6690 (2020).
Satizabal, C. L. et al. Genetic architecture of subcortical brain structures in 38,851 individuals. Nat. Genet. 51, 1624–1636 (2019).
Hibar, D. P. et al. Novel genetic loci associated with hippocampal volume. Nat. Commun. 8, 13624 (2017).
Adams, H. H. et al. Novel genetic loci underlying human intracranial volume identified through genome-wide association. Nat. Neurosci. 19, 1569–1582 (2016).
Smith, S. M. et al. An expanded set of genome-wide association studies of brain imaging phenotypes in UK Biobank. Nat. Neurosci. 24, 737–745 (2021).
Alfaro-Almagro, F. et al. Image processing and Quality Control for the first 10,000 brain imaging datasets from UK Biobank. Neuroimage 166, 400–424 (2018).
Forkel, S. J. & Catani, M. The Oxford Handbook of Neurolinguistics: Diffusion Imaging Methods in Language Sciences (Oxford Univ. Press, Oxford, 2019).
Price, C. J. The anatomy of language: a review of 100 fMRI studies published in 2009. Ann. N. Y. Acad. Sci. 1191, 62–88 (2010).
Richardson, F. M. & Price, C. J. Structural MRI studies of language function in the undamaged brain. Brain Struct. Funct. 213, 511–523 (2009).
Perdue, M. V., Mednick, J., Pugh, K. R. & Landi, N. Gray matter structure is associated with reading skill in typically developing young readers. Cereb. Cortex 30, 5449–5459 (2020).
Roehrich-Gascon, D., Small, S. L. & Tremblay, P. Structural correlates of spoken language abilities: a surface-based region-of interest morphometry study. Brain Lang. 149, 46–54 (2015).
Luciano, M. et al. A genome-wide association study for reading and language abilities in two population cohorts. Genes Brain Behav. 12, 645–652 (2013).
Gialluisi, A. et al. Genome-wide screening for DNA variants associated with reading and language traits. Genes Brain Behav. 13, 686–701 (2014).
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Human Genet. 81, 559–575 (2007).
Euesden, J., Lewis, C. M. & O’Reilly, P. F. PRSice: polygenic risk score software. Bioinformatics 31, 1466–1468 (2015).
Bates, T. C. et al. Behaviour genetic analyses of reading and spelling: a component processes approach. Aust. J. Psychol. 56, 115–126 (2004).
Dollaghan, C. & Campbell, T. F. Nonword repetition and child language impairment. J. Speech Lang. Hear. Res. 41, 1136–1146 (1998).
Gathercole, S. E., Willis, C. S., Baddeley, A. D. & Emslie, H. The Children’s Test of Nonword Repetition: a test of phonological working memory. Memory 2, 103–127 (1994).
RStudio Team. RStudio: Integrated Development for R. (Boston, MA, 2020).
Ayub, Q. et al. FOXP2 Targets show evidence of positive selection in European populations. Am. J. Human Genet. 92, 696–706 (2013).
Poelmans, G., Buitelaar, J. K., Pauls, D. L. & Franke, B. A theoretical molecular network for dyslexia: integrating available genetic findings. Mol. Psychiatry 16, 365–382 (2011).
Guidi, L. G. et al. The neuronal migration hypothesis of dyslexia: a critical evaluation 30 years on. Eur. J. Neurosci. 48, 3212–3233 (2018).
We thank the research participants and employees of 23andMe Inc, the GenLang Consortium, the Brisbane Adults Reading Study, and the CRS. E.E., G.A., B.M., B.S.P., C.F. and S.E.F. are supported by the Max Planck Society (Germany). The CRS was supported by grants from the National Natural Science Foundation of China (Grant No. 61807023), Funds for Humanities and Social Sciences Research of the Ministry of Education (Grant No. 19YJC190023 and 17XJC190010) and General Project of Shaanxi Natural Science Basic Research Program (2018JQ8015) (Grant No. 2018JQ8015 and 2021JQ-309). S.P. is funded by the Royal Society. Acknowledgements for the GenLang Consortium appear in the Supplementary Note.
P.F., A.A. and the 23andMe Research Team are employed by and hold stock or stock options in 23andMe, Inc. The remaining authors declare no competing interests.
Peer review information
Nature Genetics thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
a-c, Quantile-quantile (Q-Q) plots of observed versus expected P values for associations of single nucleotide polymorphisms with self-reported dyslexia diagnosis in a genome-wide association analysis for all participants (n = 51,800 cases, 1,087,070 controls) (a), female participants (n = 30,287 cases, 641,016 controls) (b), and male participants (n = 21,513 cases, 446,054 controls) (c). The solid red line represents the distribution of P values under the null hypothesis, and the dashed red line represent 95% confidence intervals. The black circles represent the observed distribution of P values.
The y-axis represents the -log10 P value for association of single nucleotide polymorphisms with self-reported dyslexia diagnosis from 30,287 female individuals and 641,016 female controls. The threshold for genome-wide significance (P < 5 × 10−8) is represented by a horizontal grey line. Genome-wide significant variants in the 17 genome-wide significant loci are red. Variants located within a distance of 250 kb of each other are considered as one locus.
The y-axis represents the -log10 P value for association of single nucleotide polymorphisms with self-reported dyslexia diagnosis from 21,513 male individuals and 446,054 male controls. The threshold for genome-wide significance (P < 5 × 10−8) is represented by a horizontal grey line. Genome-wide significant variants in the 6 genome-wide significant loci are red. Variants located within a distance of 250 kb of each other are considered as one locus.
Extended Data Fig. 4 Variant effect predictor summary for the credible set of variants significantly associated with dyslexia.
Summary information is output from the online variant effect predictor in ENSEMBL (release 104). All our variants were present in the 1000 Genomes reference panel so are considered existing, and no pre-filtering (for example, on MAF; consequence type) was done.
The 24 major functional annotations were defined by Finucane et al.39. Enrichment is the proportion of h2/proportion of SNPs. The horizontal dotted line indicates no enrichment (where proportion of h2/proportion of SNPs = 1). Error bars represent standard errors of the enrichment estimates. Asterisks indicate enrichment estimates are significant based on a Bonferroni-derived P value of < 2.08 × 10−3 (for 24 tests). Exact values of enrichment statistic, standard error, and P value can be found in Supplementary Table 16.
The -log10 P value of the enrichment estimates for heritability of dyslexia for genes expressed in 12 brain regions. The horizontal dotted line indicates significance after Bonferroni correction for 12 tests (P < 4.17 × 10−3).
The -log10 P value of the enrichment estimates for heritability of dyslexia for brain cell types. The horizontal dotted line indicates significance after Bonferroni correction for three tests (P < 1.67 × 10−2).
The -log10 P value of the enrichment estimates for heritability of dyslexia for variants located within H3K4me1 peaks of different tissues. Central nervous systems tissues are represented in dark green and other tissues are represented in light green. The vertical dotted line indicates significance after Bonferroni correction for 114 tests (P < 4.39 × 10−4).
The -log10 P value of the enrichment estimates for heritability of dyslexia for variants located within H3K4me3 peaks of different tissues. Central nervous systems tissues are represented in dark blue and other tissues are represented in light blue. The vertical dotted line indicates significance after Bonferroni correction for 114 tests (P < 4.39 × 10−4).
About this article
Cite this article
Doust, C., Fontanillas, P., Eising, E. et al. Discovery of 42 genome-wide significant loci associated with dyslexia. Nat Genet 54, 1621–1629 (2022). https://doi.org/10.1038/s41588-022-01192-y