Genome-wide association scan identifies new variants associated with a cognitive predictor of dyslexia

Developmental dyslexia (DD) is one of the most prevalent learning disorders, with high impact on school and psychosocial development and high comorbidity with conditions like attention-deficit hyperactivity disorder (ADHD), depression, and anxiety. DD is characterized by deficits in different cognitive skills, including word reading, spelling, rapid naming, and phonology. To investigate the genetic basis of DD, we conducted a genome-wide association study (GWAS) of these skills within one of the largest studies available, including nine cohorts of reading-impaired and typically developing children of European ancestry (N = 2562–3468). We observed a genome-wide significant effect (p < 1 × 10−8) on rapid automatized naming of letters (RANlet) for variants on 18q12.2, within MIR924HG (micro-RNA 924 host gene; rs17663182 p = 4.73 × 10−9), and a suggestive association on 8q12.3 within NKAIN3 (encoding a cation transporter; rs16928927, p = 2.25 × 10−8). rs17663182 (18q12.2) also showed genome-wide significant multivariate associations with RAN measures (p = 1.15 × 10−8) and with all the cognitive traits tested (p = 3.07 × 10−8), suggesting (relational) pleiotropic effects of this variant. A polygenic risk score (PRS) analysis revealed significant genetic overlaps of some of the DD-related traits with educational attainment (EDUyears) and ADHD. Reading and spelling abilities were positively associated with EDUyears (p ~ [10−5–10−7]) and negatively associated with ADHD PRS (p ~ [10−8−10−17]). This corroborates a long-standing hypothesis on the partly shared genetic etiology of DD and ADHD, at the genome-wide level. Our findings suggest new candidate DD susceptibility genes and provide new insights into the genetics of dyslexia and its comorbities.


Phoneme Awareness (PA)
Phoneme awareness is defined as the ability to manipulate the smallest pronounced unit in a word (i.e., the phoneme). In all countries except for UK and Colorado, this was assessed through a phoneme deletion test. A typical task of this kind consists of pronouncing a sound sequence after deleting a specified sound (e.g. say "/gulst/without/l/"). Language-specific tasks were constructed with comparable difficulty levels in all countries, to account for different orthography consistency 1 . In the UK dataset, PA was assessed through phoneme deletion and substitution in single words and spoonerism tasks (i.e. swapping the first sounds of two words, e.g. from spoon, dog to doon, spog). In all the countries mentioned above, raw scores were grade-normed and standardized as before. In Colorado, PA was assessed through i) a classical phoneme deletion task and ii) a phoneme segmentation and transposition task, i.e. taking the first phoneme of a word, putting it at the end and adding the sound "/ay/" (e.g. rope becomes ope-ray). Then an average score of these tests was computed, which was ageadjusted and standardized against a matching control population 3 .

Digit Span (DigSpan)
Digit Span was tested in all countries except the UK through a classical WISC (Wechsler Intelligence Scale for Children) digit span task 4,5 . Such a task consisted of reciting a sequence of digits visually presented by recalling them in the same (forward) and reverse (backward) order. Raw scores (sum of forward and backward test scores) were then converted into scaled scores (mean = 10, SD = 3) and finally into z-standardized scores based on national norms within each country 1,2 . However, in this case no grade-normalization was carried out 1 .

Rapid automatized naming (RAN)
RAN tests were administered in all countries except the UK. Children were asked to name as quickly and as accurately as possible a matrix of digits (RANdig), letters (RANlet), and objects/pictures (RANpic) that was visually presented. The resulting raw score (i.e., the number of items correctly named per minute) was then grade-normed and standardized as above in all the European countries 1,2 , while it was age-adjusted and then standardized against a control population in Colorado 3 .

Phenotypic outlier detection
To check for the presence of phenotypic outliers in our datasets, we defined them within each dataset as subjects showing phenotypic z-scores at least 4 standard deviations (SDs) below or above the mean for the majority of the traits available within each dataset (i.e., in at least 3 out of 4 traits in the UK dataset and in at least 5 out of 8 traits in all the other datasets). None of the subjects tested met this condition in any dataset.

Genotype quality control (QC) and imputation
Quality control of genotyped data was conducted in PLINK 1.90b3s 6 . Some of the QC filters applied differed between the sibling-based datasets analysed (i.e., UK and Colorado) and the cohorts made up of unrelated subjects (AGS, Finland, France, Hungary, and the Netherlands), as reported in detail in Table S2. Prior to imputation, genotypes were aligned to the 1000 genomes phase I v3 reference panel (June 2014 release) 7 using SHAPEIT v2 (r837) 8 Table S1a. Cross-trait correlations (Pearson's r coefficient) of the eight cognitive skills analysed in the present study. These coefficients are based on the seven datasets made up of unrelated subjects (AGS, Finland, France, Hungary, and the Netherlands; see Table 1 in the main text and Table S2 below), while for the sibling-based datasets (Colorado and UK) cross-trait correlations can be found in Gialluisi et al. 3 (see Table   S4a, c).  Table S1b. Latent variables computed through a Principal Component Analysis (PCA) of the cross-trait correlation matrix (see Table 1a above), performed in MatSpd 10 . Here, PC coefficients of each trait on the latent variables computed for the unrotated matrix are reported. We report in bold those latent variables (VAR1 to VAR5) which were considered in the correction for multiple phenotypic traits tested, as computed by MatSpd (see main text). Overall, these variables explained 90% of the total shared variance in the cognitive traits analysed.  Table S2. Main genotype QC statistics and information for each dataset involved in the study. a A subset of the German sample (N=195) was genotyped on the Illumina 317k chip and shared a low number of SNPs with the Human CoreExome array. These samples were thus QCed separately and merged with the rest of the AGS samples only after imputation. b SNPs in these datasets were preliminarily filtered through Illumina GenomeStudio software before producing hard-call genotype data, as described by Gialluisi et al. 3 . c Since this dataset was genotyped on two different platforms, only variants which were shared between the two arrays were used in the following genotype QC and imputation.

Power and sample size estimation analysis
We performed a sample-size estimation analysis using the Genetic Power Calculator 11 .
Specifically, we computed the sample size (N) required to detect genome-wide significant associations with continuous traits, assuming a 50:50 dominance-to-additive QTL ratio, and a QTL with MAF = 5% and in perfect LD with the causative SNP, which could explain 1% of the total phenotypic variance (i.e. R 2 = 0.01). Since some of our cohorts were made up of unrelated subjects, while others were sibling-based, we performed the analyses in both alternative settings. Assuming a dataset of unrelated subjects, a sample size N=4,280 was required to have a 80% power to detect genome-wide significant associations (p < 5x10 -8 ), and N=4,646 to detect associations surviving our correction for multiple traits tested (p < 1x10 -8 , see Table S1b above and main text). On the other hand, under the assumption of a sibling-based cohort (with sibling correlation of 0.5), the sample sizes required were N=2,136 and N=2,319, respectively. Assuming a QTL with effect size R 2 = 0.005, the samples sizes required to reach a power of 80% were N=8,582 (for association p < 5x10 -8 ) and N=9,316 (for p < 1x10 -8 ) in the unrelated subjects settings, and N=4,283 and N=4,649 in the sib-pairs settings. Due to the mixed nature of our cohorts -with some datasets made up of unrelated subjects and others made up of sibling pairs or, rarely, trios -it is likely that the sample size required to have a 80% power in our GWAS meta-analysis is in the middle between the values reported above for the different settings.

Further analyses of top association signals
The analyses explained in this section were only conducted on datasets with RANlet measures available (see Table S4) and required the preliminary adjustment of phenotypic traits for genetic population structure in each dataset. This was carried out differently in the datasets including only unrelated subjects (AGS, Finland, France, Hungary, and The Netherlands) and in the sibling-based dataset (Colorado). In the former group, we regressed the phenotypic traits against the first ten MDS components (previously used as covariates in the GWAS). In the latter case, we adjusted the traits for a GRM through the polygenic() function of the GenABEL package 12 .

Permutation-based correlation test and effect size estimation
To assess the robustness of the most significant associations detected (with RANlet), we carried out a permutation-based test on the top-associated SNPs at 18q12.2 (rs17663182) and

Further tests of pleiotropy
In addition to the genome-wide multivariate association test across all the DD-related traits analyzed, for the two most significant univariate (and multivariate) association signals we carried out a specific multivariate association test on RAN traits only, through TATES 15 . This software combines the p-values obtained in univariate genetic association analysis on multiple (correlated) phenotypes, to produce one multivariate association p-value per SNP, while correcting for their correlations. For this analysis, we used as input the association pvalues of the top hits with RANlet, RANdig and RANpic, as well as their correlation matrix (i.e. last three columns and last three rows of Table S1 above).
Moreover, we tested the top association signals for horizontal (independent) pleiotropic effects on traits other than RANlet, namely WRead, WSpell, NWRead, PA, DigSpan, RANdig and RANpic. To this end, we first regressed these traits, which had previously been adjusted for genetic population structure, against the RANlet score in R, separately for each dataset. Then we tested the residuals of these traits for association with rs17663182 and rs16928927 dosages in PLINK. Finally, we combined the results of the association tests in different datasets through an inverse-variance fixed-effect pooled analysis in METAL v25-03-2011 16 , which allowed us to directly detect concordance of allelic trends across datasets for all the SNPs tested.

Test for independent genetic effects in 18q12.2 and 8q12.3
We tested for the presence of genetic effects independent from the local top hits in 18q12. combined the association statistics that were produced for each dataset using METAL (as described above).

SNP×SNP interaction analysis
To investigate potential epistatic effects of rs17663182 and rs16928927 on RANlet, we carried out a two-SNP interaction analysis in R. Since rs16928927 was not available in the Finnish dataset, this analysis was conducted only in the AGS, France, Hungary, Netherlands, and Colorado datasets. The analysis consisted of two steps: first, we regressed RANlet scores adjusted for genetic population structure against the allelic dosages of the SNPs rs17663182 and rs16928927. Then we regressed the RANlet residual scores against a single interaction term of the two SNPs and computed the fraction of phenotypic variance (R 2 ) explained by this term. were also included as covariates. The resulting files were combined meta-analytically using a fixed-effect, inverse-variance-weighted model as implemented in METAL 16 , both in the discovery and in the replication datasets, and finally a meta-analysis across the discovery and replication data sets was carried out.

Imaging genetic assessment
Our choice of investigating subcortical brain volumes in relation to our GWAS top hits was determined by two factors, namely i) the increasing evidence implicating subcortical structures in reading and language abilities (as reviewed in 20-22 ), and ii) the large sample size of the imaging genetics GWAS 17 , which maximized the power to detect significant genetic effects. Previous reviews suggested a potential involvement of some these structures in learning-and language-related abilities, like putamen, thalamus, hippocampus and globus pallidus 20,21 , and in reading-related phonological processes 22 . However, since these brain structures are highly interconnected among themselves and with the cortical regions implicated in the above mentioned processes 21 , we decided to include in our assessment all the subcortical structures which were tested so far 17 .
For this analysis, we computed a Bonferroni-corrected significance threshold α = 7.1×10 -4 , taking into account two SNPs, five independent latent traits tested in our study (computed in