Genetic architecture of childhood speech disorder: a review

Severe speech disorders lead to poor literacy, reduced academic attainment and negative psychosocial outcomes. As early as the 1950s, the familial nature of speech disorders was recognized, implying a genetic basis, but the molecular genetic basis remained unknown. In 2001, investigation of a large three-generation family with a severe speech disorder, known as childhood apraxia of speech (CAS), revealed the first causative gene, FOXP2. A long hiatus then followed for CAS candidate genes, but in the past three years, genetic analysis of cohorts ascertained for CAS has revealed over 30 causative genes. A total of 36 pathogenic variants have been identified from 122 cases across 3 cohorts in this nascent field. All variants identified to date lie in coding regions, with no apparent benefit at this stage for whole genome sequencing (WGS) over whole exome sequencing (WES) in identifying monogenic conditions associated with CAS. Hence current findings suggest that a remarkable one in three children have a genetic variant that explains their CAS, with significant genetic heterogeneity emerging. Around half of the candidate genes identified are currently supported by medium (6 genes) to strong (9 genes) evidence for the association between the gene and CAS. Despite genetic heterogeneity, many implicated proteins functionally converge on pathways involved in chromatin modification or transcriptional regulation, opening the door to precision diagnosis and therapies. Most of the new candidate genes for CAS are associated with previously described neurodevelopmental conditions that include intellectual disability, autism and epilepsy, thereby broadening the phenotypic spectrum to a distinctly milder presentation defined by primary speech disorder in the setting of normal intellect. Insights into the genetic bases of CAS, a severe, rare speech disorder, are yet to translate to understanding the heritability of more common, typically milder forms of speech or language impairment such as stuttering or phonological disorder.
These disorders likely follow complex inheritance with polygenic contributions in many cases, rather than the monogenic patterns that underlie one third of patients with CAS. Clinical genetic testing should now be implemented for individuals with CAS, given its high diagnostic rate, which parallels that of many other neurodevelopmental disorders where this testing is already standard of care. The shared mechanisms implicated by gene discovery for CAS highlight potential new targets for future precision therapies.


INTRODUCTION
Speech acquisition is a biologically driven, inexorable developmental process in most infants. Yet up to 5% of children develop common speech disorders including stuttering, articulation and phonological impairments [Table 1]. These common conditions are highly tractable and tend to resolve, with or without intervention, by 7 years of age [1,2]. By contrast, 1 in 1000 children follow a severely disrupted developmental path to an intractable speech disorder known as childhood apraxia of speech (CAS) [3]. In these individuals, early development is often marked by hypotonia, feeding difficulties, limited babbling, delayed onset of first words, and marked difficulty in acquiring speech, which is unintelligible in the preschool years, when a diagnosis is usually made [4]. The condition was first described in 1957 by the pioneering British speech therapist Muriel Morley, who identified a childhood speech presentation akin to the speech apraxia seen in adults following lesions to Broca's area, with the crux of the diagnosis being difficulty accurately producing sound sequences [5].
Since the original description of CAS, there has been ongoing debate over the defining diagnostic features of the condition [6].
In 2007, the American Speech-Language-Hearing Association supported an expert-based consensus which defined the three diagnostic features of CAS (Table 1) [7]. Whilst the condition is largely framed as a 'motor' speech disorder resulting from movement planning or programming deficits, language and literacy impairments also occur in over 90% of individuals [8][9][10]. Furthermore, neuroimaging points to perturbation of linguistic as well as motor pathways in affected individuals [11].
Recently, the CAS phenotype has increasingly been associated with commonly occurring neurodevelopmental comorbidities, including motor and cognitive impairments, attention deficit hyperactivity disorder, seizures and autism spectrum disorders [8][9][10]. Similar to the presentation of these neurodevelopmental disorders (NDDs), speech and language disorders rarely occur in isolation, and rather are found in a broader context of perturbed neurodevelopment.
Until recently, understanding of the aetiology of CAS was limited. Parents of children with CAS would embark on a diagnostic odyssey to investigate the chronic and striking nature of the condition. Early studies implicated copy number variants (CNVs), including chromosomal aneuploidies involving multiple genes, and single nucleotide variants (SNVs) in individual genes, in CAS.
A specific neurogenetic basis for CAS was first identified in 2001, with the seminal discovery that pathogenic missense SNVs in FOXP2 [12], a transcriptional repressor, were associated with CAS, initially inherited in a large multiplex family, but subsequently also found to arise de novo in sporadic cases. Functionally related transcription factors and downstream targets of FOXP2 were subsequently investigated, namely CNTNAP2 (MIM: 604569), FOXP1 (MIM: 605515) and TBR1 (MIM: 606053). Although these genes have been associated with intellectual disability syndromes and ASD, they have not explained cases ascertained for primary or isolated speech or language disorder [13][14][15]. The next most promising candidate gene for CAS was GRIN2A, which is also associated with the epilepsy-aphasia syndromes, now termed developmental and/or epileptic encephalopathy with spike-wave activation in sleep [16], and including Landau-Kleffner syndrome [17][18][19]. Yet again, as for CNTNAP2, FOXP1 and TBR1, pathogenic variants in GRIN2A have not been identified in cohorts ascertained for a primary diagnosis of speech or language disorder in the absence of epilepsy.
Most recently, advances in massively parallel sequencing technologies and bioinformatic algorithms have allowed rapid identification of genes not previously implicated in speech dysfunction. Here we review the rapidly unfolding Mendelian genetic bases for CAS. Specifically, we have reviewed data from gene discovery studies that applied exome or genome sequencing to cohorts ascertained for the primary speech disorder CAS [Search strategy box below].

SEARCH STRATEGY AND SELECTION CRITERIA
We searched PubMed for articles published between Jan 1, 2001, and March 15, 2023, using the search terms "childhood apraxia of speech", "dyspraxia", "speech", "exome sequencing" and "genome sequencing". There were no language restrictions. We selected articles that had ascertained cohorts with CAS and applied next generation sequencing approaches and analysis to report novel genes associated with CAS. We also searched for articles describing the function and implications of pathogenic variants in the genes identified, and for literature on other neurodevelopmental disorders associated with these genes. The final reference list was generated based on the relevance to the topics covered in this review.

DISCUSSION/ANALYSIS OF RECENT LITERATURE
Three CAS gene discovery cohort studies were identified, each relatively small given the rarity of the disorder, but growing in cohort size over time: n = 19 probands, Eising et al., 2019; n = 33 probands, Hildebrand et al., 2020; and n = 70 probands, Kaspi et al., 2023. In the first study, 8/19 (~42%) probands were found to have a pathogenic or likely pathogenic gene variant via genome sequencing [8]. In the second study, 11/34 (~32%) probands had highly plausible pathogenic variants identified by a combination of exome and genome sequencing, and chromosomal microarray analysis [9]. In the third study, 18/70 (~26%) probands had a high confidence pathogenic variant detected via genome sequencing or chromosomal microarray analysis [10]. There was no apparent benefit at this stage for WGS over WES in identifying monogenic conditions associated with CAS. The overall clinical genetic diagnostic yield across the three cohorts was 30% (36/122 probands) (see Fig. 1a).
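The yield arithmetic quoted above can be checked with a short sketch. The per-cohort counts and the pooled figure are taken as quoted in the text; the pooled figure is not re-derived by summing the cohorts. The Wilson score interval is an illustrative addition to convey how much uncertainty a cohort of this size carries, and is not an analysis reported in the studies reviewed. The names `cohorts`, `POOLED`, `yield_pct` and `wilson_interval` are hypothetical.

```python
from math import sqrt

# Counts as quoted in the text: (probands with a pathogenic finding, probands tested).
cohorts = {
    "Eising et al., 2019": (8, 19),
    "Hildebrand et al., 2020": (11, 34),
    "Kaspi et al., 2023": (18, 70),
}

# Pooled figure as quoted in the text (assumption: used as-is, not summed from above).
POOLED = (36, 122)


def yield_pct(solved, tested):
    """Diagnostic yield as a percentage of probands tested."""
    return 100.0 * solved / tested


def wilson_interval(solved, tested, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives about 95% coverage)."""
    p = solved / tested
    denom = 1 + z**2 / tested
    centre = (p + z**2 / (2 * tested)) / denom
    half = z * sqrt(p * (1 - p) / tested + z**2 / (4 * tested**2)) / denom
    return centre - half, centre + half


for study, (solved, tested) in cohorts.items():
    print(f"{study}: {solved}/{tested} = {yield_pct(solved, tested):.0f}%")

low, high = wilson_interval(*POOLED)
print(f"Pooled: {POOLED[0]}/{POOLED[1]} = {yield_pct(*POOLED):.0f}% "
      f"(approx. 95% interval {100 * low:.0f}% to {100 * high:.0f}%)")
```

The per-cohort yields round to the 42%, 32% and 26% quoted in the text, and the pooled figure rounds to 30%. Under these assumptions the interval around the pooled yield spans roughly 22% to 38%, which illustrates why larger cohorts are needed before the one-in-three estimate can be treated as precise.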
These studies provided the first neurobiological insights into the mechanisms of speech disorders, including the key finding that pathogenic variants are enriched in genes involved in transcriptional regulation and chromatin remodelling in the developing brain (Table 2). Importantly, these genes also showed significant clustering within a module of genes highly coexpressed in the human embryonic brain, in regions known to subserve speech function [8]. Hence the speech disorders field now has the first evidence that CAS is a neurodevelopmental disorder due to dysregulation of genes expressed in white-matter tracts critical for development of speech [8,9,27].
Unlike FOXP2, which had no disease association prior to being linked to CAS, many newer genes associated with CAS were already implicated in other NDDs such as intellectual disability (ID), ASD and epilepsy (see Table 2 for outline of known associated conditions). These findings [8][9][10] align with the well-established genetic overlap between other neurodevelopmental phenotypes [28] and indicate that CAS can be added to these overlapping profiles [see Fig. 1b].
In some ways, the association of CAS with genes known to cause ID, ASD and epilepsy is not surprising given that these neurodevelopmental phenotypes have long been associated with speech and language pathology; however, a primary speech phenotype of CAS had been considered separate from the larger group of NDDs. Now genetic findings show this distinction may not be valid. Furthermore, recent studies of individuals with FOXP2 variants show that they also experience broader, subtle, neurodevelopmental phenotypes beyond speech dysfunction [29]. Thus, whilst FOXP2 remains the most 'speech specific' gene to be identified [29], there may not exist a 'pure' speech apraxia gene. As such, speech and neurodevelopmental phenotypes should be considered as existing across a phenotypic spectrum rather than as categorical diagnoses, mirroring findings in genetic understanding of other diseases, such as epilepsy.
Finally, any comparison of next generation sequencing findings is currently strongly biased: tens of thousands of probands with ID, ASD and epilepsy have been reported in the literature, compared to just over 120 probands with CAS. Nevertheless, the published CAS cohort studies have shown a comparably high genetic diagnostic yield for individuals with these speech phenotypes, which is surprising given that they are arguably milder than ID, ASD and epilepsy. This suggests clinical genetic testing is also warranted for children with CAS, given that genome-wide testing is increasingly routine and often funded for children with other NDDs. Routine clinical genetic testing will also be important for gene confirmation. Although many of the gene variants reported to date are de novo and predicted pathogenic according to ACMG guidelines, most have been found only in individual probands, and identification of the same gene in unrelated patients with the same phenotype will confirm that gene's contribution. As outlined below, unrelated patients have been identified for three of the candidate genes for CAS across the small CAS-ascertained cohort studies alone.
Although genetic heterogeneity is a feature of gene discovery findings in CAS cohorts, pathogenic variants in a handful of genes, namely SETBP1, SETD1A and DDX3X, each account for multiple cases across the cohorts studied to date [8][9][10]. SETBP1 stands out as being particularly intriguing, with pathogenic loss-of-function (LoF) variants detected in all three cohorts [8][9][10]. With emerging evidence for CAS in SETBP1 haploinsufficiency disorder, a speech and language study of 28 individuals with SETBP1 LoF variants then confirmed the diagnosis of CAS, seen in 80% of individuals studied, as a core part of the phenotype [30]. When comparing children's performance across developmental domains, it was also clear that communication was most impaired relative to social skills, daily living skills, motor abilities and adaptive functioning, supporting SETBP1 having a central role in speech and language development [30]. Further, studies of common genetic variants suggest SETBP1 may also be important for communication abilities in the general population. Associations between single nucleotide polymorphisms (SNPs) in SETBP1 and scores on a test examining syntactic complexity were reported in a genome-wide association study of language disorder in a geographically isolated Russian cohort aged 3-18 years [31]. SNPs in SETBP1 have also been associated with phonological working memory in a reading-impaired cohort [32].
In addition to evidence for the strength of association between CAS and the candidate genes across the three CAS-ascertained gene discovery cohorts discussed here, Table 2 further outlines the strength of independent evidence currently found to support the candidate genes. At this time, nine of the candidate genes have a high level of independent supporting evidence (FOXP2, KAT6A, MKL2/MRTFB, SETBP1, CDK13, EBF3, MEIS2, RBFOX3, SHANK3), six have a medium level (CHD3, SETD1A, WDR5, DDX3X, ZNF142, BRPF1) and the remainder have low levels of independent evidence. We expect that expanded clinical genetic testing will reveal additional cases for many of the other candidate genes implicated [8][9][10].
Alternative genetic mechanisms for CAS
If high impact de novo sequence variants and CNVs account for about one third of individuals with CAS, the question that follows is what genes or mechanisms account for the remaining unsolved cases. Whole genome sequencing has not been completed for all cases studied, meaning non-coding variants have not been routinely interrogated and may account for some undiagnosed cases. Mosaicism, increasingly implicated in neurodevelopmental diseases such as intellectual disability, epilepsy and autism [33], may be low level and brain-limited and may underpin CAS in some individuals where it may be limited to key networks; however, detection would require sequencing of brain tissue, which is generally inaccessible.
From existing data, there is evidence that the cohort of CAS individuals with an identified pathogenic de novo gene variant is enriched for individuals with cognitive impairment and co-morbid language and motor diagnoses, compared to those without a genetic diagnosis (Fig. 1c). These data suggest that different genetic mechanisms may apply to those cases with CAS currently without a specific single gene diagnosis.
It is likely that inherited variants will account for a sizeable portion of CAS, but elucidation of these variants will require large cohorts coupled with deep phenotyping of family members. Interestingly, many families report a family history of speech difficulties, which might be explained by inherited variants that exhibit variable expressivity and phenotypic heterogeneity due to variability in the genetic background, similar to multi-hit models in other neurodevelopmental disorders [34].
The fact that many children with CAS exhibit comorbidities with ASD and ADHD suggests an additional genetic overlap with these conditions, which are mostly attributed to polygenic aetiology, as is the case for ASD [35]. A limitation in this nascent field of speech genetics is the lack of available population-based cohorts with both high quality genetic and phenotyping data. There is a concerted effort to address this issue via the GenLang consortium (https://www.genlang.org); yet, to date, the cohorts in GenLang typically include language and literacy data, with recent fruitful GWAS publications identifying loci associated with language and literacy traits [36][37][38], but do not include speech-specific data.
One remaining challenge for the field is the lack of clinical variables which robustly predict who will have a monogenic cause or polygenic contributions (accepting that for monogenic diseases there may be modifier genetic contributions). From existing data, there is some evidence that individuals with any degree of cognitive impairment, or those more likely to have co-morbid language and motor diagnoses, have a greater likelihood of monogenic disease (Fig. 1c). However, larger cohorts of individuals with CAS are clearly required to confirm these findings and to provide adequate power for accurate genetic diagnostic prediction models.

CONCLUSIONS
After almost two decades with only one established gene for CAS, over 30 new genes of relevance have been identified in the past three years. Critically, about one third of children sequenced have received a molecular genetic diagnosis for their CAS, supporting implementation in clinical testing alongside other neurodevelopmental disorders. Around half of the candidate genes identified are currently supported by medium to strong evidence for the association between the gene and CAS.
Almost all genes identified were previously described to cause neurodevelopmental conditions including ID, ASD and epilepsy. Hence the phenotypic spectrum of these conditions has been expanded to include individuals with a distinctly milder presentation of primary speech disorder. Whilst there is genetic heterogeneity, the genes coalesce on a small number of biological pathways, largely involved in chromatin remodelling or transcriptional regulation, providing new targets for precision medicines. Although genetic diagnoses in CAS to date have largely been de novo high impact variants in neurodevelopmental genes, the genetic architecture of CAS is likely to also encompass polygenic inheritance of common variants and rare inherited variants with incomplete penetrance and variable expressivity.

Fig. 1 Genetic causes for childhood apraxia of speech over time. a Genetic causes identified for CAS over time. WGS, whole genome sequencing; WES, whole exome sequencing. More patients have undergone WGS to date. The yield to date has been similar for WGS and WES. No variants have been reported in non-coding regions to date. b CAS candidate genes and co-occurring neurodevelopmental phenotypes. An association between a gene and a phenotype was denoted if the association was higher than the prevalence in the general population. c Overlap in phenotype in children with CAS, comparing those with a pathogenic variant and those with no known cause. Phenotypic features of CAS cohorts with (n = 29) and without (n = 74) pathogenic genetic variants, based on data from Hildebrand et al. (2020) and Kaspi et al. (2023). Authors were approached for any updated diagnoses for study participants. Confirmation of diagnoses for cognitive impairment, ASD, receptive and expressive language impairment, and gross and fine motor impairment was based on psychometric testing/clinical report. Seizure diagnosis is based on parent report.

Table 1
Key: focuses on neurodevelopmental forms of speech disorder, not structural (e.g. cleft lip or palate, malocclusion of mandible and maxilla) or acquired (e.g. brain tumour, stroke, traumatic brain injury) forms.
*Some children have phonological delay as opposed to disorder. This is a delay in understanding/use of the speech sounds of one's language to convey meaning. A child persists in the use of developmental error patterns as seen in the phonology of younger children, e.g. a 6 year old using the phonological process of stopping fricatives, substituting a 'b' for 'f' (bish for fish), which should have resolved at age 4 years. Vowels and prosody are unaffected.

Table 2. Genes causally linked to CAS and other neurodevelopmental phenotypes between 2001 and 2023.