De novo variants underlying monogenic syndromes with intellectual disability in a neurodevelopmental cohort from India

The contribution of de novo variants to intellectual disability (ID) is well established in several cohorts reported from the developed world. However, both the genetic landscape of these disorders and the appropriate testing strategies for identifying de novo variants remain largely unknown in low- and middle-income countries such as India. In this study, we delineate the clinical and genotypic spectrum of 54 families (55 individuals) with syndromic ID harboring rare de novo variants. We also emphasize the effectiveness of singleton exome sequencing as a valuable tool for diagnosing these disorders in resource-limited settings. Overall, 46 distinct disorders were identified, encompassing 46 genes, with 51 single-nucleotide variants and/or indels and two copy-number variants. Disease-causing variants were identified in CREBBP, TSC2, KMT2D, MECP2, IDS, NIPBL, NSD1, RIT1, SOX10, BRWD3, FOXG1, BCL11A, KDM6B, KDM5C, SETD5, QRICH1, DCX, SMARCD1, ASXL1, ASXL3, AKT3, FBN2, TCF12, WASF1, BRAF, SMARCA4, SMARCA2, TUBG1, KMT2A, CTNNB1, DLG4, MEIS2, GATAD2B, FBXW7, ANKRD11, ARID1B, DYNC1H1, HIVEP2, NEXMIF, ZBTB18, SETD1B, DYRK1A, SRCAP, CASK, L1CAM, and KRAS. Twenty-four of these monogenic disorders have not been previously reported in the Indian population. Notably, 39 of the 53 (74%) disease-causing variants are novel. These variants were identified mainly in genes encoding transcriptional and chromatin regulators, serine/threonine kinases, lysosomal enzymes, molecular motors, synaptic proteins, neuronal migration machinery, adhesion molecules, structural proteins and signaling molecules.


Introduction
Intellectual disability (ID) is defined as a deficit in cognitive functioning and adaptive behavior that originates before the age of 18 years; it has a worldwide prevalence of ~2−3% [1]. The etiology of ID includes both acquired and genetic causes [2,3]. The genetic etiology of ID is highly heterogeneous, encompassing a wide spectrum of variation, including structural variants (SVs), copy-number variants (CNVs), small insertions/deletions (indels), and single-nucleotide variants (SNVs), identified across more than 1000 genes [4][5][6][7]. Monogenic disorders account for 30−50% of cases of ID, around 20% are due to disease-causing large or small CNVs, and the cause of ~50% of cases remains unknown to date [4,5,8,9].
Disorders in which ID is accompanied by other systemic or behavioral abnormalities are referred to as syndromic ID (sID). These disorders follow all inheritance patterns and show extreme genetic heterogeneity. Over the last two decades, rapidly advancing next-generation sequencing (NGS) platforms, mainly exome and genome sequencing (ES/GS), have enabled the rapid and efficient diagnosis and discovery of several ID syndromes [10].
De novo variants are now a well-recognized cause of severe early-onset genetic diseases, including sIDs. Although the genetic basis of inherited sIDs could be identified through studies of large family pedigrees, the sporadic forms remained largely unexplained until recently. NGS approaches, particularly proband-parent trio ES/GS, are effective for understanding the distribution of variation and detecting all types of de novo events throughout the genome, from SNVs and indels to CNVs and large SVs, as well as for determining their parental origin and whether they arose in the germline or postzygotically [4,8,9,11].
The challenges faced by low- and middle-income countries (LMICs) in diagnosing rare diseases with evolving genomic technologies include the lack of coherent national policies, limited numbers of trained professionals, lagging research infrastructure, and economic and cultural barriers [12,13]. The burden posed by de novo variants associated with ID syndromes remains incompletely understood in several LMICs, including India. To address this gap, we present the clinical and genetic spectrum of 54 families with de novo variants underlying sID. We also highlight the utility of proband-only ES followed by segregation analysis as a first-tier test for identifying de novo variants in resource-limited settings.

Materials and Methods
We evaluated and recruited 530 families with heterogeneous neurodevelopmental disorders (NDDs) in an ongoing single-center study from October 2019 to December 2022. The clinical characteristics of the affected individuals were recorded through detailed clinical examination and coded using Human Phenotype Ontology (HPO) terms. Informed consent for genetic testing and for publication of data and clinical photographs was obtained from the families. The informed consent procedures were approved by the Institutional Ethics Committee, Kasturba Medical College and Kasturba Hospital, India, in accordance with the Declaration of Helsinki.
Europe PMC Funders Author Manuscripts
Genomic DNA was extracted from peripheral blood samples of the proband, parents and siblings (as required) using the QIAamp DNA Blood Mini Kit (QIAGEN, Valencia, CA; cat # 51106). Depending on the clinical phenotype, the testing strategy followed either an exome-first approach or a sequential approach in which a targeted test or chromosomal microarray (CMA) was followed by ES. NGS data processing, quality assessment, variant calling, annotation and analysis were performed as described earlier [14]. Sanger validation and segregation analysis were carried out in all families tested by singleton ES or Mendeliome, whereas only Sanger validation was carried out in families diagnosed by trio ES. CNV analysis from exome data was performed for individuals in whom no clinically relevant SNVs/indels were detected on ES. A detailed description of the Mendeliome, CNV and ES analyses is provided in the supplementary material.
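The core logic of calling a candidate de novo variant from trio data can be sketched as follows. This is a minimal illustration, not the pipeline used in this study (which is described in [14] and the supplementary material). The `TrioCall` record, the genotype encoding (number of alternate alleles), and the read-depth and allele-balance thresholds are all hypothetical choices made for the example.

```python
from dataclasses import dataclass

# Hypothetical genotype encoding: number of alternate alleles (0, 1 or 2).
HOM_REF, HET, HOM_ALT = 0, 1, 2


@dataclass
class TrioCall:
    """Genotype calls for one variant site in a proband-parent trio (illustrative)."""
    variant: str            # hypothetical variant identifier
    proband_gt: int
    mother_gt: int
    father_gt: int
    proband_alt_reads: int
    proband_depth: int
    min_parent_depth: int   # lower of the two parental read depths


def is_candidate_de_novo(call: TrioCall,
                         min_depth: int = 10,
                         min_allele_balance: float = 0.3) -> bool:
    """Flag sites where the proband is heterozygous and both parents are reference.

    The default thresholds are assumptions for illustration, not study values.
    """
    if call.proband_depth < min_depth or call.min_parent_depth < min_depth:
        return False  # too little coverage to trust the calls, especially parental reference calls
    allele_balance = call.proband_alt_reads / call.proband_depth
    return (
        call.proband_gt == HET
        and call.mother_gt == HOM_REF
        and call.father_gt == HOM_REF
        and allele_balance >= min_allele_balance
    )


calls = [
    TrioCall("var1", HET, HOM_REF, HOM_REF, 18, 40, 35),  # absent in both parents
    TrioCall("var2", HET, HET, HOM_REF, 20, 42, 30),      # inherited from the mother
    TrioCall("var3", HET, HOM_REF, HOM_REF, 2, 38, 33),   # low allele balance
]
candidates = [c.variant for c in calls if is_candidate_de_novo(c)]
print(candidates)
```

Only `var1` passes the filter here. With singleton ES or Mendeliome, parental genotypes are not available from sequencing, so the same comparison is instead made on the Sanger segregation results. A low allele balance can reflect a postzygotic (mosaic) variant as well as an artefact, so in practice such calls merit review rather than silent exclusion.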

Results
Of the 530 families recruited, 196 families (211 affected individuals) had a syndromic presentation characterized by major or minor morphologic anomalies and neurologic, cognitive, behavioral or sensory impairments. A molecular diagnosis was achieved in 104 of these 196 families (53%). Of these, 59 families (57%) carried de novo variants, six families (6%) had inherited variants underlying an autosomal dominant or an X-linked disorder, 16 families (15%) had biallelic variants underlying an autosomal recessive disorder, and 23 families (22%) were diagnosed with a chromosomal aberration. Consanguinity was noted in 25 of the 104 diagnosed families (24%). Within these 25 consanguineous families, 14 (56%) had biallelic variants underlying an autosomal recessive disorder, two (8%) had inherited variants underlying X-linked recessive disorders, six (24%) carried de novo variants causing autosomal dominant and X-linked dominant disorders, and three (12%) had CNVs.
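The proportions in this paragraph follow directly from the reported family counts. The short check below uses only the counts stated in the text and confirms that each set of categories adds up to its total before converting the counts to whole-number percentages; the category labels are our own shorthand.

```python
# Family counts as stated in the Results (104 molecularly diagnosed families).
DIAGNOSED = 104
BY_CATEGORY = {
    "de novo": 59,
    "inherited AD/X-linked": 6,
    "biallelic AR": 16,
    "chromosomal aberration": 23,
}
# Breakdown of the 25 consanguineous families among the diagnosed families.
CONSANGUINEOUS = {
    "biallelic AR": 14,
    "inherited X-linked recessive": 2,
    "de novo": 6,
    "CNV": 3,
}


def percentages(counts: dict[str, int], total: int) -> dict[str, int]:
    """Express each count as a percentage of `total`, rounded to a whole number."""
    if sum(counts.values()) != total:
        raise ValueError("categories should add up to the stated total")
    return {name: round(100 * n / total) for name, n in counts.items()}


print(round(100 * DIAGNOSED / 196))  # diagnostic yield among the 196 syndromic families
print(percentages(BY_CATEGORY, DIAGNOSED))
print(percentages(CONSANGUINEOUS, 25))
```

With these counts the script prints 53, then 57, 6, 15 and 22 for the four diagnostic categories, and 56, 8, 24 and 12 for the consanguineous subset.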
Of the 59 families with de novo variants, five families (five affected individuals), representing a novel disease-gene association, phenotypic expansion, and multiple genetic diagnoses, have been published previously [15][16][17]. The present cohort therefore consists of 55 individuals from 54 families diagnosed using targeted Sanger sequencing, Mendeliome, singleton ES, or trio ES (Fig. 1A and Table 1). Thirty-one affected individuals were male (56%) and 24 were female (44%). Consanguinity was noted in six families (11%). Ages ranged from newborn to 14 years. In addition to ID, the clinical findings in the 55 diagnosed individuals included global developmental delay, dysmorphism, malformations, seizures, autism spectrum disorder, hypotonia, and sensory dysfunction (Fig. 1C). A total of 53 disease-causing de novo variants underlying 46 distinct ID syndromes were identified in the current cohort (Table 1). Of these, 51 were SNVs and/or indels and two were CNVs (Fig. 1B). Notably, 39 (74%) of them were novel. The SNVs and indels were classified according to the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) standards and guidelines for the interpretation of sequence variants [18]: 30 variants were classified as pathogenic (59%) and 21 as likely pathogenic (41%). The two CNVs were classified as pathogenic according to the ACMG and ClinGen standards and guidelines for CNVs (Table 1) [19]. Additional details on the genetic testing performed, the disease-causing variants, and ClinVar submission IDs are provided in Supplementary Table S1.
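The proportions reported for the present cohort can likewise be recomputed from the stated counts. The sketch below assumes that the ACMG/AMP class percentages refer to the 51 SNVs/indels (the two CNVs were classified separately under the CNV standards) and that the novelty rate refers to all 53 variants; the constant names are our own.

```python
def pct(part: int, whole: int) -> int:
    """Return `part` as a percentage of `whole`, rounded to the nearest integer."""
    return round(100 * part / whole)


# Counts as stated for the present cohort (54 families, 55 individuals).
INDIVIDUALS, MALES, FEMALES = 55, 31, 24
FAMILIES, CONSANGUINEOUS_FAMILIES = 54, 6
SNV_INDELS, PATHOGENIC, LIKELY_PATHOGENIC = 51, 30, 21  # ACMG/AMP classes
ALL_VARIANTS, NOVEL = 53, 39                            # 51 SNVs/indels + 2 CNVs

# Sanity checks: the subgroups should partition their totals.
assert MALES + FEMALES == INDIVIDUALS
assert PATHOGENIC + LIKELY_PATHOGENIC == SNV_INDELS

summary = {
    "male": pct(MALES, INDIVIDUALS),
    "female": pct(FEMALES, INDIVIDUALS),
    "consanguineous": pct(CONSANGUINEOUS_FAMILIES, FAMILIES),
    "pathogenic": pct(PATHOGENIC, SNV_INDELS),
    "likely pathogenic": pct(LIKELY_PATHOGENIC, SNV_INDELS),
    "novel": pct(NOVEL, ALL_VARIANTS),
}
print(summary)
```

This yields 56% male and 44% female, 11% consanguineous families, 59% pathogenic and 41% likely pathogenic SNVs/indels, and 74% novel variants.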

Discussion
Several cohorts of individuals with disease-causing de novo variants underlying sIDs have been reported in the last two decades. However, most of these cohorts originated from Caucasian outbred populations of the developed world [20][21][22]. The spectrum and burden of variants contributing to sID in LMICs remain largely uninvestigated. The current study elucidates the clinical and genotypic spectrum of 54 families with de novo variants underlying sID in an Indian cohort with neurodevelopmental disorders. Although several sIDs are clinically recognizable and less challenging to diagnose than isolated ID, they often show considerable phenotypic variability, especially across populations. This variability could reflect differences in genetic background, environmental factors, or a combination of both. In addition to ID, the most commonly observed comorbidities were global developmental delay, dysmorphism, malformations, seizures, autism spectrum disorder, hypotonia and sensory dysfunction. To the best of our knowledge, 24 of the 46 disorders observed in this cohort are reported here for the first time in the Indian population.
The genetics of monogenic sID is extremely heterogeneous and follows all inheritance patterns. Rare de novo variants are reported to be causative in 40-70% of individuals with ID [9,21,22]. Despite recently published large cohort studies, the precise burden of de novo variants remains largely unknown. In the present study, 57% (59 families) of the 104 molecularly diagnosed families with sID carried de novo variants for autosomal dominant or X-linked disorders. Thirty-nine of the 53 (74%) disease-causing variants identified in the present cohort were novel, expanding the list of disease-causing variants in sID genes. All of these variants were submitted to ClinVar to make them available to the medical genetics community worldwide.
Until recently, CMA was recommended as a first-tier test for investigating undiagnosed ID and congenital anomalies, with diagnostic yields of 16-28% [23,24]. However, it is now recommended that ES/GS, which can identify the genetic etiology in 28-68% of individuals, be strongly considered as a first- or second-tier test [25,26]. Previous studies have mainly used trio ES as a first-tier test to identify de novo variants in sIDs, resulting in high diagnostic yields of 50-70% [9,21,22,27]. In the current study, 72% of the de novo variants were identified using singleton ES or Mendeliome. This may be attributed to the additional phenotypic clues in sID that aid the interpretation of singleton ES data, and it reaffirms findings from previous Indian studies highlighting the effectiveness of deep phenotyping and singleton ES in diagnosing clinically heterogeneous NDDs [13,28,29]. However, in families with isolated ID, variant prioritization often warrants familial testing through trio ES and/or testing of additional affected and unaffected family members. Moreover, the overall diagnostic yield for monogenic de novo variants increased from 55% (SNVs/indels alone) to 57% with a combined approach that detects both SNVs/indels and CNVs from the exome data. These results are in line with previous studies highlighting the value of incorporating exome-based CNV analysis algorithms to increase the diagnostic yield of NDDs [30,31].
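The increase from 55% to 57% can be reproduced with simple arithmetic. This sketch assumes that the denominator is the 104 molecularly diagnosed families and that the two de novo CNVs were found in two separate families, so that 57 families were diagnosed through SNVs/indels alone; both are our reading of the Results rather than figures stated explicitly.

```python
DIAGNOSED_FAMILIES = 104
DE_NOVO_FAMILIES = 59      # all families with de novo variants
DE_NOVO_CNV_FAMILIES = 2   # assumption: the two de novo CNVs were in two separate families

snv_indel_families = DE_NOVO_FAMILIES - DE_NOVO_CNV_FAMILIES
yield_snv_indel = 100 * snv_indel_families / DIAGNOSED_FAMILIES  # about 54.8%
yield_combined = 100 * DE_NOVO_FAMILIES / DIAGNOSED_FAMILIES     # about 56.7%

print(f"SNVs/indels only: {yield_snv_indel:.0f}%")
print(f"SNVs/indels + CNVs: {yield_combined:.0f}%")
```

Rounded to whole numbers, the two yields are 55% and 57%, matching the figures in the text.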
Previous studies of de novo variants in syndromic and nonsyndromic ID cohorts showed that genes encoding transcriptional and chromatin regulators were the most commonly mutated, compared with genes regulating other neuronal processes (synaptic maintenance and signaling) or fundamental cellular processes (translation, cell cycle control and energy metabolism) [27,32,33]. In this study, we classified the genes carrying disease-causing de novo variants by function, and our results were similar to those reported by Taskiran et al. (2021). Transcriptional and chromatin regulators represented the largest class of ID-associated genes, followed by signaling molecules, serine/threonine kinases, neuronal migration machinery, structural proteins, molecular motors, synaptic proteins, adhesion molecules, enzymes, microtubule formation, and cell proliferation (Supplementary Fig. 1).
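A functional classification of this kind amounts to mapping each gene to a category and counting genes per category. The sketch below illustrates the tally for a small subset of genes from this cohort; the gene-to-category assignments are our own illustrative choices based on well-known gene functions and are not taken from the study's classification in Supplementary Fig. 1.

```python
from collections import Counter

# Illustrative gene-to-category assignments for a subset of cohort genes.
# These are assumptions for the example, not the study's Supplementary Fig. 1.
GENE_CATEGORY = {
    "CREBBP": "transcriptional/chromatin regulator",
    "KMT2D": "transcriptional/chromatin regulator",
    "ARID1B": "transcriptional/chromatin regulator",
    "BRAF": "serine/threonine kinase",
    "AKT3": "serine/threonine kinase",
    "KRAS": "signaling molecule",
    "DCX": "neuronal migration",
    "DYNC1H1": "molecular motor",
    "DLG4": "synaptic protein",
    "L1CAM": "adhesion molecule",
    "IDS": "enzyme",
    "TUBG1": "microtubule formation",
    "FBN2": "structural protein",
}


def tally_categories(genes: list[str]) -> list[tuple[str, int]]:
    """Count genes per functional category, largest class first.

    Genes without an assignment are counted as "unclassified".
    """
    counts = Counter(GENE_CATEGORY.get(gene, "unclassified") for gene in genes)
    return counts.most_common()


for category, n in tally_categories(list(GENE_CATEGORY)):
    print(f"{category}: {n}")
```

In this toy subset, transcriptional and chromatin regulators form the largest class; `Counter.most_common` lists categories with equal counts in the order they were first seen.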
Approximately 80% of all de novo germline single-nucleotide variants are known to arise on the paternal allele [11,34]. Advanced paternal age at conception has therefore been established as the major factor linked to an increased number of de novo variants, a subset of which may underlie developmental disorders [8,11,35]. However, in our cohort, paternal age at conception was not highest in families with de novo variants: the median was 39 years in families with autosomal recessive disorders, 29 years in families with inherited autosomal dominant/X-linked disorders, and 35 years in families with de novo variants.
Consanguinity and inbreeding are widely prevalent in specific communities and geographic regions of India [14,36]. Although the high rate of parental consanguinity and inbreeding is expected to increase the occurrence of rare autosomal recessive disorders, including those causing sIDs, we observed de novo variants in 24% of our consanguineous families with sID [37]. Previously, Kahrizi et al. and Mercan et al. reported de novo variants in approximately 17% and 28% of individuals with ID born to consanguineous parents, respectively, highlighting the substantial occurrence of de novo events within highly inbred populations [38,39].
Our study has a few limitations. Although we report a high rate of identifying disease-causing de novo variants underlying sIDs using singleton ES in known disease-causing genes, only one proband with a novel disease-gene association could be ascertained [15]. This may reflect our inability to perform trio ES or GS in undiagnosed families owing to resource limitations. In addition, individuals who remain undiagnosed may need further investigation, such as exploring variants beyond the exonic regions, somatic alterations, and digenic or oligogenic etiologies.
Knowledge of rare genetic disorders, their diagnosis with rapidly emerging genomic techniques, genetic counseling, prenatal testing, early intervention, and management in LMICs is improving owing to better infrastructure, cost-effective genomic testing, and the availability of manpower and trained professionals [40,41]. We herein consolidate the phenotypic and genotypic spectrum of de novo variants underlying monogenic sIDs, highlighting singleton ES as an excellent tool for diagnosing heterogeneous sIDs in LMICs such as India. Our study reiterates that de novo variants are likely to be the most frequent cause of sIDs, even in populations with consanguinity and endogamy.

Fig. 1. Cohort details. A Types of genetic testing employed. B Disease-causing de novo variants identified. C Spectrum of the clinical findings observed in individuals with sID. GDD, global developmental delay; ASD, autism spectrum disorder.