Association between genes regulating neural pathways for quantitative traits of speech and language disorders

Speech sound disorders (SSD) manifest as difficulties in phonological memory and awareness, oral motor function, language, vocabulary, reading, and spelling. Families enriched for SSD are rare, and typically display a cluster of deficits. We conducted a genome-wide association study (GWAS) in 435 children from 148 families in the Cleveland Family Speech and Reading study (CFSRS), examining 16 variables representing 6 domains. Replication was conducted using the Avon Longitudinal Study of Parents and Children (ALSPAC). We identified 18 significant loci (combined p < 10⁻⁸) that we pursued bioinformatically. We prioritized 5 novel gene regions with likely functional repercussions on neural pathways, including those which colocalized with differentially methylated regions in our sample. Polygenic risk scores for receptive language, expressive vocabulary, phonological awareness, phonological memory, spelling, and reading decoding were associated with increasing clinical severity. In summary, neural-genetic influence on SSD is primarily multigenic and acts on genomic regulatory elements, similar to other neurodevelopmental disorders.

Nonsense word repetition (NSW 5 ) requires children to repeat 15 non-words, a task which requires encoding of unfamiliar phonological sequences; deficits in encoding would result in inaccurate word repetition 5 and can discriminate children and adults with resolved SSD from those who never had SSD 6 . While this task is normally given to individuals ages 4 years through adulthood, the children with CAS generally could not perform this task in preschool, so the first available assessment starting at age 7 was used for this analysis. The Multisyllabic Word Repetition task (MSW) requires children to accurately sequence phonemes by repeating real multisyllabic words. Target words include aluminum, thermometer, and sympathize 5 . The test is scored by determining the percentage of words repeated correctly. In addition, we scored the percentage of phonemes repeated correctly; these tests are the MSW-PPC and NSW-PPC.

Phonological Awareness Measure
The Elision subtest of the Comprehensive Test of Phonological Processing (CTOPP) 7 requires the individual to repeat a word and then say the word with a sound or syllable deleted. For example, "Say the word tulip. Now say it without the 'tu'." The Elision subtest is a measure of phonological awareness, that is, knowledge of the sound system of oral language. Children with poor phonological awareness skills demonstrate weakness in single word reading. In addition, we gave an earlier version of the Elision task, from the Comprehensive Test of Phonological Processing-Experimental Version (Wagner, Torgesen, & Rashotte, 1994, personal communication), which we combined with this CTOPP subtest to create a single variable for analysis. We transformed the CTOPP Elision subtest into a z-score prior to merging with the original Elision.

Rapid Automatized Naming
The Rapid Color Naming (RAN) task requires the individual to name colored squares as quickly as possible. The total number of seconds to name all the colors is the individual's score. This test measures the subject's ability to retrieve phonological information from long-term memory and is predictive of reading decoding skills. We had an older version of this task 8 as well as the CTOPP subtest. CTOPP colors is a scaled score with a mean of 10 and standard deviation of 3, whereas RAN colors is a z-score with larger values representing worse performance. Prior to merging the two, we transformed the CTOPP score into a z-score and flipped (negated) the RAN score so that the two measures were oriented in the same direction.
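The harmonization described above can be sketched in Python. This is a minimal illustration, not the study's actual code: the function names are hypothetical, and it assumes the CTOPP z-score is computed from the stated norms (mean 10, SD 3) rather than from sample moments.

```python
CTOPP_MEAN = 10.0  # scaled-score mean stated in the text
CTOPP_SD = 3.0     # scaled-score standard deviation stated in the text


def ctopp_scaled_to_z(scaled: float) -> float:
    """Convert a CTOPP scaled score to a z-score (assumes norm-based scaling)."""
    return (scaled - CTOPP_MEAN) / CTOPP_SD


def flip_ran_z(ran_z: float) -> float:
    """Negate a RAN z-score so that larger values mean better performance."""
    return -ran_z


if __name__ == "__main__":
    # A CTOPP scaled score one SD above the norm maps to z = 1.0.
    print(ctopp_scaled_to_z(13.0))
    # A slow RAN performance (z = 1.5, worse) becomes -1.5 after flipping.
    print(flip_ran_z(1.5))
```

After both transforms, higher values indicate better rapid naming on either instrument, which is what allows the two to be merged into one variable.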

Spoken Language Measures
Two language assessments were combined: the Clinical Evaluation of Language Fundamentals-Revised (CELF-R) 9 and the Test of Language Development-Primary-Second Edition (TOLD-P2) 10 . For preschool-age children, the equivalent subtests on the Clinical Evaluation of Language Fundamentals-Preschool were given. Participants enrolled in the study after the release of a newer version of the tests were given the new version. Both the CELF-R and TOLD-P2 provide standard scores for receptive and expressive language skills, with a population mean of 100 and standard deviation of 15. We merged the CELF-R receptive score with the TOLD-P2 listening quotient, and the CELF-R expressive score with the TOLD-P2 speaking quotient. In both cases, preference was given to the CELF-R if both tests were given at the same time point.
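The merging rule for the two language tests (use the CELF-R when both are available at a time point, otherwise fall back to the TOLD-P2) can be sketched as follows. The function name and the use of `None` for a test that was not given are illustrative assumptions, not the study's actual implementation.

```python
from typing import Optional


def merge_language_score(celf_r: Optional[float],
                         told_p2: Optional[float]) -> Optional[float]:
    """Merge two standard scores (mean 100, SD 15) measured at one time point.

    The CELF-R score is preferred whenever it is present; the TOLD-P2
    quotient is used only when the CELF-R is missing. Returns None when
    neither test was given.
    """
    # Compare against None explicitly so a score is never dropped by truthiness.
    return celf_r if celf_r is not None else told_p2


if __name__ == "__main__":
    print(merge_language_score(95.0, 102.0))   # both given: CELF-R is used
    print(merge_language_score(None, 102.0))   # only TOLD-P2 given
    print(merge_language_score(None, None))    # neither given
```

Because both instruments share the same scale, no rescaling is needed before this merge, unlike the Elision and rapid naming variables.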

Reading Measures
The Woodcock Reading Mastery Test-Revised, Word Attack subtest (WRMT-AT 11 ) evaluates phonetic decoding skills by requiring examinees to read a list of 45 non-words; the test includes non-words such as ip, din, ceisminadolt, and gnouthe. The Woodcock Reading Mastery Test-Revised, Word Identification subtest (WRMT-ID 11 ) assesses single word reading ability by requiring examinees to read a list of 106 real words. This task is given to individuals ages 5 through adulthood. The Wechsler Individual Achievement Test 12 has 2 subtests. The Reading Comprehension subtest (WIAT-RC) consists of printed passages that the individual reads and then answers orally presented questions on the passage. It assesses the individual's ability to recognize details and make inferences concerning what has been read. The Reading Comprehension subtest provides information on the child's ability to use picture cues, recognize detail, sequence events, identify cause-effect relationships, make inferences, and compare and contrast characters, objects, or events from the passage. The Listening Comprehension subtest (WIAT-LC) consists of orally presented passages that are sometimes paired with pictures. The child's understanding of the passage is examined by answering questions about the passage. The Listening Comprehension subtest, similar to the Reading Comprehension subtest, assesses the child's ability to use picture cues, recognize detail, sequence events, identify cause-effect relationships, make inferences, and compare and contrast characters, objects, or events from the passage.

ALSPAC Study Communication measures
We worked with the ALSPAC team to find the most equivalent measures to match those in the CFSRS. These are summarized in Supplementary Table 3. The multisyllabic word repetition (MWR) task consisted of repeating "buttercup" and "dinosaur" 5 times each; the total correct score was used for analysis. This task was given at age 5. A nonsense word repetition task 13 (CNrep) was given at age 5 (CNrep5), and a shortened version of this task was given at age 8 (CNrep8). Two reading tasks were given. At age 7, both the Wechsler Objective Reading Dimensions single word reading task (WORD) and the Neale Analysis of Reading Ability (NARA) were given. The NARA has a reading comprehension (NARA-C) and a reading accuracy (NARA-A) subtest. A non-word reading task was designed specifically for ALSPAC, so we refer to it as ALSPACread. The spelling tasks were based on work by Nunes et al. 14 , so we refer to these tasks as ALSPACspell7 and ALSPACspell9 for the tests given at ages 7 and 9, respectively. The Wechsler Objective Language Dimensions (WOLD) is a test of expressive language ability in children, and has several subtests that were used for language and vocabulary assessment:

Supplementary Figure 3. Colocalization figures using LocusFocus. Each figure has 3 subfigures, A-C. A: Visualization plot depicts GWAS results for the associated speech communication phenotype as filled circles (corresponding y-axis on left) and eQTLs for the highlighted gene in GTEx tissues (spanning brain and skeletal muscle), as well as eQTLs for specified PsychENCODE genes as lines (corresponding y-axis on right). B: Heat map shows Simple Sum (SS) colocalization results for each gene/tissue combination. Cell colors for gene-tissue pairs are based on strength of colocalization, as -log10(SS p-value) for that gene-tissue pair. Strength of colocalization is coloured from green (low -log10P) to red (high -log10P). C: Colocalization results for PsychENCODE eQTLs in specified genes in the region. Significant colocalization is determined by the suggested Simple Sum colocalization threshold (-log10(0.05/# of tests)).
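The Simple Sum threshold quoted in these captions is a Bonferroni-style cutoff, -log10(0.05 / number of tests). A minimal sketch of that arithmetic follows. The function names are hypothetical, the test counts in the usage lines are illustrative assumptions rather than values reported in the captions, and treating "significant" as strictly exceeding the threshold is also an assumption.

```python
import math


def simple_sum_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Return the -log10 significance threshold for n_tests comparisons."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return -math.log10(alpha / n_tests)


def is_significant(neg_log10_p: float, n_tests: int, alpha: float = 0.05) -> bool:
    """True when a -log10(SS p-value) exceeds the corrected threshold."""
    return neg_log10_p > simple_sum_threshold(n_tests, alpha)


if __name__ == "__main__":
    # Illustrative test counts (assumed, not taken from the captions).
    for n in (1, 2, 5, 30):
        print(n, round(simple_sum_threshold(n), 2))
    # A -log10(p) of 2.65 falls short of the threshold for 30 assumed tests.
    print(is_significant(2.65, 30))
```

The threshold grows with the number of gene/tissue combinations tested in a region, which is why the cutoff differs from one region to the next.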

Supplementary Figure 3a. Colocalization Plots for IFI16 region
Colocalization Plots for the IFI16 region (panels A-C). The highlighted gene for which GTEx eQTLs are displayed in subfigure A is IFI16 and the GWAS results are for NWR. Significant colocalization is determined by the suggested Simple Sum colocalization threshold of 1.6 (-log10(0.05/# of tests)). PsychENCODE eQTL SS colocalization is not significant for the IFI16 and DARC genes. For GTEx tissues there are no significant SS colocalizations seen for any gene/tissue combination in the region.

Supplementary Figure 3c. Colocalization Plots for DACT1 region
Colocalization Plots for the DACT1 region (panels A-C). The highlighted gene for which GTEx eQTLs are displayed in subfigure A is DACT1 and the GWAS results are for MSW. Significant colocalization is determined by the suggested Simple Sum colocalization threshold of 1.3 (-log10(0.05/# of tests)). PsychENCODE eQTL SS colocalization is not significant for the DACT1 gene. For GTEx tissues there are no significant SS colocalizations seen for any gene/tissue combination in the region.

Supplementary Figure 3d. Colocalization Plots for SETD3 region
Colocalization Plots for the SETD3 region (panels A-C). The highlighted gene for which GTEx eQTLs are displayed in subfigure A is SETD3 and the GWAS results are for WRMT-AT. Significant colocalization is determined by the suggested Simple Sum colocalization threshold of 2.0 (-log10(0.05/# of tests)). PsychENCODE eQTL SS colocalization is significant for the SETD3 gene. For GTEx tissues, significant SS colocalization is seen for SETD3 in brain_nucleus_accumbens_basal_ganglia and skeletal muscle, and for CCNK in cerebellar hemisphere and cerebellum.

Supplementary Figure 3e. Colocalization Plots for MON1B region
Colocalization Plots for the MON1B region (panels A-C). The highlighted gene for which GTEx eQTLs are displayed in subfigure A is SYCE1L and the GWAS results are for MSW. Significant colocalization is determined by the suggested Simple Sum colocalization threshold of 2.78 (-log10(0.05/# of tests)). PsychENCODE eQTL SS colocalization is significant for the SYCE1L gene and borderline significant for the MON1B gene, at 2.65. For GTEx tissues significant SS colocalization is seen for SYCE1L across all