First comprehensive TSC1/TSC2 mutational analysis in Mexican patients with Tuberous Sclerosis Complex reveals numerous novel pathogenic variants

The aim of this study was to improve knowledge of the mutational spectrum causing tuberous sclerosis complex (TSC) in a sample of Mexican patients, given the limited information available regarding this disease in Mexico and Latin America. Four different molecular techniques were implemented to identify variants ranging from single nucleotide changes to large rearrangements in the TSC1 and TSC2 genes of 66 unrelated patients of Mexican descent who clinically fulfilled the criteria for a definitive TSC diagnosis. The mutation detection rate was 94%, TSC2 pathogenic variants (PV) prevailed over TSC1 PV (77% vs. 23%) and a recurrent mutation site (hotspot) was observed in TSC1 exon 15. Interestingly, 40% of the identified mutations had not been previously reported. The wide range of novel PV made it difficult to establish any genotype-phenotype correlation, but most of the PV were associated with neurological involvement (intellectual disability and epilepsy). Our 3D protein modeling of two variants classified as likely pathogenic demonstrated that they could alter the structure and function of the hamartin (TSC1) or tuberin (TSC2) proteins. Molecular analyses of parents and first-degree affected family members of the index cases enabled us to distinguish familial (18%) from sporadic (82%) cases and to identify one case of apparent gonadal mosaicism.

Mutational analyses of TSC1 and TSC2. SSCP and SS. Genomic DNA samples derived from all 66 cases were initially subjected to mutational analyses of TSC1 and TSC2 by a SSCP assay followed by SS confirmation in 61 cases and direct SS in the remaining five cases (Fig. 1). Both assays included all coding and non-coding (20 bp at the exon-intron boundaries) regions of the TSC1 (NM_000368.4) and TSC2 (NM_000548.3) genes. These analyses identified a clear disease-causing PV in 40/61 cases studied by SSCP/SS and in two of five cases studied by direct SS (Table 1). Three other variants were classified as likely pathogenic variants (LPV) according to guidelines of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG/AMP) 20 . The two LPV in TSC1 were c.737+3A>G and p.(Leu112_Leu113delinsLysGluVal) from cases ET75 and ET201, respectively, and the single in-frame LPV in TSC2: p.(His1746_Arg1751dup) from case ET171. From these three cases, solely in case ET75, analysis of the proband's paternity and maternity (criterion PS2) 20 using 15 short tandem repeat markers (13 of them belong to the CODIS system) could be performed and confirmed parentage, but this ACMG/AMP criterion was not enough to re-classify the LPV as pathogenic (Table 2). Therefore, our analysis identified a PV or LPV in 42/61 cases studied by SSCP/SS and in three of five cases studied by direct SS (Fig. 1).
Next-generation sequencing. Finally, an NGS study examining TSC1 and TSC2 coding exons and intron-exon boundaries (150 bp) was carried out in the remaining 15 cases (Fig. 1). The median depth of coverage was 639× (range 86×-1940×) with a 99.9% width of coverage. A customized bioinformatic analysis enabled us to identify a PV in 10 cases; we also found one case (ET243) with a missense variant p.(Trp1060Ser) in TSC2 that was classified as an LPV 20 and one case (ET81) with an intronic variant c.3815-21G>A in TSC2 that was classified [PP3, PP4] 20 as a variant of uncertain significance or VUS (Tables 1,2). All NGS-identified variants were confirmed by SS in the index cases and their available parents. As the missense TSC2 p.(Trp1060Ser) LPV from case ET243 was not reported in the main genotype databases and we did not find it in 212 alleles of healthy and ethnically matched individuals assessed by an allele-specific PCR assay (data not shown), we were able to re-classify it as a pathogenic variant (IIIa) [PM2, PS2, PS4, PP3, PP4] 20 . In the remaining three cases (ET44, ET61 and ET223), no mutation was identified (NMI; lacking any LPV, VUS or pathogenic genotype) by the implemented molecular technologies (Fig. 1).
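The two coverage figures quoted above (median depth and width of coverage) can be illustrated with a minimal Python sketch. It assumes a hypothetical list of per-base read depths over the targeted region and an assumed minimum-depth threshold of 1×; neither the toy data nor the function name comes from the pipeline used in this study.

```python
from statistics import median

def coverage_summary(depths, min_depth=1):
    """Return (median depth, fraction of bases covered at >= min_depth).

    `depths` is a hypothetical list of per-base read depths over the
    targeted region; `min_depth` is an assumed threshold, not a value
    taken from the study.
    """
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return median(depths), covered / len(depths)

# Toy per-base depths (illustrative values only).
toy_depths = [0, 90, 400, 650, 700, 1900, 640, 630]
med, width = coverage_summary(toy_depths)
print(f"median depth: {med}x, width of coverage: {width:.1%}")
```

In this sketch the width of coverage is simply the share of targeted bases that reach the chosen depth, so raising `min_depth` gives a stricter figure.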
To summarize, we were able to identify a PV or LPV in 62 cases and we could not identify a PV or LPV in four cases, although one of them (case ET81) was found to harbor a VUS in TSC2 (c.3815-21G>A) (Figs. 2a and 3). Of the identified changes, 56 (90%) corresponded to small variants (SV) such as point mutations, deletions, small insertions/deletions (InDels) and duplications, while six were large deletions (10%). Far more of the identified changes were found in TSC2 (N = 48) than TSC1 (N = 14, Fig. 2b). The mutational proportions for TSC1 and TSC2 are shown in Fig. 2d,e. Eight intronic variants were identified across both genes; five affected canonical splice sites (in TSC2) and three affected intronic splicing enhancer sequences (TSC1: c.737+3A>G; TSC2: c.481+5G>T and c.5160+5G>T).
Based on our review of the literature and public databases, including the Leiden Open Variation Database (LOVD, www.lovd.nl/), dbSNP (https://www.ncbi.nlm.nih.gov/projects/SNP), Exome Aggregation Consortium (http://exac.broadinstitute.org), Genome Aggregation Database (gnomad.broadinstitute.org/), ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/), and Human Genome Mutation Database (http://www.hgmd.cf.ac.uk/), we determined that 25 of the 62 (40%) PV or LPV identified herein (six in TSC1 and 19 in TSC2) had not been previously reported. All of them have been submitted to LOVD (Table 1). Of these 25 novel variants, 24 were considered pathogenic and one was an LPV 20 .
Direct molecular screening in parents (when available) of the 62 cases with a PV/LPV showed that the pathogenic allele was absent from both parents for 33 patients (de novo cases). However, in 12 cases with one or more clinically affected family members, we confirmed the same PV in the available affected cases (familial cases, see Table 1). We suspect gonadal mosaicism in familial case ET28, as we identified a novel heterozygous PV in TSC2, c.3624G>A or p.(Trp1208*), in two affected siblings but failed to find this allele in the peripheral blood leukocyte DNA of either clinically healthy parent. This argument was further strengthened when we confirmed the proband's paternity and maternity by DNA profiling (data not shown). In 15 cases, we could not analyze the father's DNA but there was no reported family history of TSC, so we designated these as suspected de novo cases.
In the remaining two cases, the mother's and father's DNA samples were not available for testing (Table 1).
The modeled hamartin WT and MUT 3D structures showed that the amino acid residues surrounding the insertion/deletion region have a hydrophobic character in the WT protein, and the insertion of Lys112Glu113Val114 (two of which are ionizable) could alter the stability of this hydrophobic region. Previous work showed that the incorporation of negatively charged residues in proteins with hydrophobic clusters can provoke a significant structural alteration, and that such residues are therefore usually excluded from hydrophobic pockets 21 . Our modeling of tuberin revealed that the six duplicated amino acid residues (HisIleLysArgLeuArg at positions 1752-1757) drastically altered the secondary structure of the C-terminal end region of the MUT protein compared to the WT protein (Fig. 4f). The mutated region was found to lie in close contact with the GAP domain, suggesting that the inserted amino acids could significantly alter the GAP domain contacts. Notably, the inserted amino acids are located close to Arg1743 in the primary sequence, and a previous report showed that the Pro1743 mutation can abolish the GAP activity of tuberin 22 . Hence, this region seems to be critical for the correct function of tuberin.
Clinical manifestations in patients with novel genetic variants. We identified novel PV in 24 TSC cases and had detailed TSC clinical information for 22 of them (see Supplementary Table S1). We were not able to identify a clear phenotype-genotype correlation since each variant was unique. However, if we exclude the single neonatal patient ET200, all of the remaining cases showed neurological involvement (N = 21/21), including intellectual disability/developmental delay (N = 20/21), epilepsy (N = 21/21) and/or behavioral abnormalities (N = 8/21); only one case (familial, ET173; having a PV in TSC1) presented epilepsy without intellectual disability (Supplementary Table S1).
The presence of cardiac rhabdomyoma was observed in eight of the 22 above-described patients (36%), one with a PV in TSC1 and the remaining seven with alterations in TSC2. In four of those cases, the rhabdomyoma presented complete regression (ET107, ET238, ET159, ET87), while the remaining four cases did not require medical or surgical management. In a single case (ET200), the rhabdomyoma was detected prenatally. Renal angiomyolipomas were identified by ultrasound in five cases (5/22; 23%), only one of which harbored a PV in TSC1. Case ET171 (harboring an LPV) was the only patient in our series that died during the study period; this occurred due to bronchopneumonia at 1 year 9 months of age. Variable expressivity could be corroborated in five out of six familial cases that had detailed TSC clinical information available and harbored a previously unreported PV (Supplementary Table S1). In the putative gonadal mosaicism case (ET28), the index case displayed a mild intellectual disability and epilepsy, while the brother reportedly exhibited psychotic episodes with moderate intellectual disability. In three cases with a novel PV, the parents showed multiple dental pits (mothers of ET117 and ET122, and father of ET243) or hypopigmented macules and learning disability (mother of ET201, who had an LPV). However, SS analysis led us to exclude minimal expression of the TSC phenotype.

Discussion
The clinical characterization of early-stage TSC has proven challenging due to the variable expressivity of the disease and the absence of any clear genotype-phenotype correlation. Most of the cases examined herein were diagnosed before 10 years of age (N = 51/66; 77%); this was similar to a previous study with a larger sample size (N = 197/243; 81%) 23 performed at two different hospitals in Boston, and there was no statistically significant difference in the age of diagnosis between the two studies (P = 0.48, Fisher's exact test, 2-tailed). However, as only four of our TSC cases were diagnosed in the first 6 months of life, it could be useful for clinicians in Mexico to monitor specific clinical signs that have recently been reported to be useful for an earlier TSC diagnosis (before 6 months) 1,24 .
The emergence and routine implementation of new molecular techniques, such as MLPA and NGS, have revolutionized TSC diagnosis and increased the mutation detection rate to ~80-96% 12,14,15,25-29 . In this study, a PV or LPV was identified in 62 (94%) of the 66 included cases that fulfilled definitive TSC diagnosis criteria. Most of the 62 PV/LPV were present in TSC2 (77% compared to 23% in TSC1) and there was a greater proportion of SV (90%) compared to CNV (10%). These data agree with the findings of multiple previous studies in other populations, which showed that the causative mutation rate was 77-85% in TSC2 vs. 15-23% in TSC1, and that the mutations were 87-94% SV compared to only 6-13% large deletions 12,14,25,26 .
In terms of the genetic distributions of SV and CNV found in this study, four of the 13 SV found in TSC1 (N = 4/13; 31%) were located in exon 15, which agrees with the proportion reported in LOVD and various other publications (9.5-34%) 5,8,10,14,25 . This apparent accumulation of variants could be because exon 15 is the largest coding exon (559 bp) in TSC1. Four other SV were identified in exons 17 and 18, which form part of the coiled-coil domain; together, these three exons (15, 17 and 18) presented the highest mutation frequency in TSC1 (62%). Similarly, Hung et al. 31 found that up to 89% of the identified PV localized to this region in Taiwanese TSC families. In the tuberin-binding domain (exons 10-13), in contrast, no PV was identified in our patients. There is debate as to whether this domain is a mutation region: some studies showed it to be a low-frequency mutation site 25,31-35 , while others found the opposite 5,14,15,36 . In TSC2, the GTPase-activating protein (GAP) binding domain (exons 35-39) contained 10 out of the 43 total SV (23%) identified herein, and the remaining SV were distributed throughout the gene.
In this study, recurrent PV were observed in both TSC1 and TSC2. In TSC1, the frameshift p.(Lys630Glnfs*22) PV, which was previously reported as one of the most common mutations in that gene 10 , was seen in two of the 13 SV (15%) identified at this locus. In TSC2, the missense variants, p.(Ala614Asp) and p.(Pro1675Leu), were identified in two cases each (N = 2/43; 4.7%), while an in-frame microdeletion p.(His1746_Arg1751del) was seen in three cases (N = 3/43; 7%). The latter is the most frequently reported TSC2 variant in the literature (N = 5/182, 2.8%; N = 4/98, 4.1%; N = 9/158, 5.7%) 25,36,37 and could therefore be considered a potential hotspot. In this context, it is notable that we observed a novel microduplication affecting the same nucleotides and amino acids p.(His1746_Arg1751dup) in patient ET171. Our detailed analysis revealed that the microdeletion involved the CCG motif located three nucleotides upstream of the 5' breakpoint and the microduplication involved the GTA motif located four nucleotides downstream of the 3' breakpoint. These motifs are thought to favor replication slippage and are overrepresented in the close vicinity of microdeletions and/or microduplications 38 . Also, the ACTTAC motif located downstream of the 3' breakpoint near the donor splice site, may promote secondary structure formation at the DNA level, increasing the potential for microdeletions and microinsertions 38 . Therefore, this region is prone to microdeletions (113 reported patients in LOVD: TSC2_00149) and microduplication (one case reported herein) due to its particular DNA architecture and could be considered a TSC2 hotspot.
Regarding TSC inheritance, sporadic cases (~85%) are found more often than familial ones 25 . We describe a similar proportion herein: 54 cases (82%) lacked any family history of TSC and 12 cases were familial (18%). When we examine only the de novo cases, which were defined as patients for whom the molecular study ruled out the presence of a PV in either parent (N = 34), there were approximately four times more PV in TSC2 than in TSC1 (28 vs. 6 cases, respectively). In contrast, the familial cases showed similar proportions of PV in TSC1 versus TSC2 (5 vs. 7 cases; no significant difference, P = 0.1235 by Fisher's exact test). These findings are comparable to previous reports that 67-85% of TSC cases were found to be caused by de novo germline mutations, mostly located in TSC2 14,15,25,39,40 (two to ten times more often than in TSC1 14 ). The familial cases showed no difference in the mutation frequency between TSC1 and TSC2 14,25,36 , but Dabora et al. 25 pointed out that the reported frequencies of TSC1 and TSC2 mutations in familial cases could be biased by the small number of families studied. Germline mosaicism was suggested in one of the familial cases (ET28) and even though germline mosaicism is rarely seen in TSC (6%) and we found a somewhat lower rate (1/66 cases; 1.5%), a conservative 2-3% recurrence risk should be advised for apparently sporadic TSC families 41 .
Our search of the literature and public databases for previous reports of the 62 mutations found in TSC1/TSC2 allowed us to determine that 25 (40%) of the PV/LPV found in the present work were novel, which was a higher proportion than those found in previous studies (38%, 29%, 22%) using Greek and Malaysian populations 15,26,32 .
This is expected since this disease presents high allelic and locus heterogeneity, and emphasizes the importance of implementing multiple and diverse molecular techniques to evaluate coding and non-coding regions in both genes, and to discriminate SV from CNV. Our results are similar to those of Yu et al. 42 , who found a high percentage (54%) of new TSC variants but included a very limited number of cases (N = 11).
The molecular algorithms for detecting mutations in TSC1 and TSC2 by combining direct SS, NGS and MLPA techniques have been shown to achieve a very high mutation detection power 15,26,32 . Here, although we used a combined molecular methodology, there were three cases (ET44, ET61 and ET223) that fulfilled the criteria for a definitive TSC diagnosis but in whom no mutation was identified (NMI; 4.5%). Our NMI cases could have mutations in regions not covered by the SS and NGS techniques (promoters, regulatory regions and deep intronic mutations affecting splicing and branch point sites), mosaicism at a very low allelic frequency that could not be detected by the implemented bioinformatic algorithm and/or epigenetic modifications leading to transcriptional silencing 11 .
The protein modeling of the two missense LPV (cases ET201 and ET171) showed that these changes could induce potential structural alterations in important functional regions of the hamartin and tuberin proteins. In hamartin, the insertion of Lys112Glu113Val114 occurred at a potentially hydrophobic region. The residues were predicted to be buried in relatively rare hydrophobic cavities and would not be compatible with the hydrophobic interior of proteins 43,44 , and consequently would be likely to alter the structure and function of the protein. In tuberin, the introduction of HisIleLysArgLeuArg at C-terminal positions 1752-1757 appears very likely to alter the GAP domain. This region is important, since it regulates the GTP-binding domain and hydrolyzes Ras superfamily proteins that contribute to regulating cell growth, proliferation and differentiation 5 . Moreover, it has been demonstrated that the C-terminal region of tuberin contains various important zones, including amino acids that are relevant for calmodulin binding (amino acids 1740-1757), a region that overlaps with estrogen receptor-α (amino acids 734-1807) and a nuclear localization signal (amino acids 1743-1755). All these regions are close to the amino acids that are inserted in our case, and their functions could potentially be affected.
Here, the intronic c.3815-21G>A variant was classified as a VUS. It was previously reported in human subject databases [e.g., dbSNP (rs778201014) and ExAC] at very low allelic frequencies (total AF = 0.0001279, Latino AF = 0.0009600) and with no homozygotes. At present, the actual effect of this variant is unknown. Caminsky et al. 45 pointed out that the acceptor site (3′) of human consensus splice site sequences extends 26 nucleotides upstream from the exon boundary. The VUS identified herein is at the −21 position, prompting us to hypothesize that this genetic variant could have a deleterious impact on spliceosome recognition. Further functional studies are needed to corroborate the role of this VUS and the two missense LPV described above.
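The positional argument above can be made explicit with a short Python sketch that reads the intronic offset from a variant description and checks whether it falls inside the 26-nucleotide acceptor-site window cited in the text. The regular expression is an illustrative assumption that covers only simple intronic substitutions of the form c.<position>±<offset>N>N; it is not a full HGVS parser, and the function names are hypothetical.

```python
import re

# Simple pattern for intronic substitutions such as "c.3815-21G>A".
# This is an illustrative subset of HGVS, not a full parser.
INTRONIC_SUB = re.compile(r"^c\.(\d+)([+-])(\d+)([ACGT])>([ACGT])$")

ACCEPTOR_WINDOW = 26  # nucleotides upstream of the exon, as cited in the text

def intronic_offset(variant):
    """Return the signed intronic offset, e.g. -21 for 'c.3815-21G>A'."""
    match = INTRONIC_SUB.match(variant)
    if match is None:
        raise ValueError(f"not a simple intronic substitution: {variant}")
    sign = -1 if match.group(2) == "-" else 1
    return sign * int(match.group(3))

def in_acceptor_window(variant, window=ACCEPTOR_WINDOW):
    """True if the variant lies within `window` nt upstream of an exon."""
    offset = intronic_offset(variant)
    return -window <= offset < 0

print(in_acceptor_window("c.3815-21G>A"))
```

A negative offset counts back from the first nucleotide of the downstream exon, so the check concerns only acceptor-side positions; donor-side variants such as c.737+3A>G have a positive offset and fall outside it. Lying inside the window says nothing about the functional effect, which still requires the functional studies mentioned above.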
Table 2. Molecular information and in silico evaluation of the three LPV and one VUS. Symbols: (1) value < −2.5 is deleterious; (2) value close to 1 indicates a high 'security' of the prediction; (a) protein modeling was performed for these variants; † classified according to ACMG/AMP criteria 20 .
Scientific Reports | (2020) 10:6589 | https://doi.org/10.1038/s41598-020-62759-5 | www.nature.com/scientificreports
To date, it has proven difficult to establish any genotype-phenotype correlation in TSC syndrome. Some authors have proposed that TSC2 mutations are associated with a more severe phenotype (early age of seizure onset, lower cognition index and the presence of subependymal nodules, SEGA, cardiac rhabdomyomas and/or renal angiomyolipomas) 14-16 . However, in other studies, the occurrence of tubers, seizures (P = 0.595) and (sub)cortical tubers (P = 0.299) did not differ between cases with a TSC1 or TSC2 mutation 14,16 . We were unable to determine a genotype-phenotype correlation from our cases that harbored novel PV, as all these genetic variants occurred only in one family. Most of these patients showed seizures and intellectual disability (N = 20/21; 95%) regardless of whether they harbored a PV in TSC1 or TSC2; however, this feature could be biased because the study population was drawn from a tertiary referral hospital, where most of the cases show a severe condition. We found that cardiac rhabdomyomas and renal angiomyolipomas were more common in patients with a PV in TSC2 than in TSC1 (7:1 and 4:1, respectively); in this, our results are similar to those of other published studies 26,34,42 . Even though cardiac rhabdomyomas are the most common prenatal cardiac tumor related to TSC (50-86% of cases), the absence of other manifestations at this age makes it difficult to establish a definitive diagnosis 46 .
Of the cases studied herein, only one case [ET200, with a novel PV in TSC2 (c.1258-2A>G)] had prenatal detection of rhabdomyoma; however, the presence of hypomelanotic macules at neonatal age allowed for a definitive diagnosis of TSC. None of our patients presented any cardiac complication, which is consistent with the report that most of the rhabdomyomas in TSC (>60%) are asymptomatic 46 .
The NMI cases generally showed milder phenotypes (low severity and prevalence of seizures, less serious brain findings on imaging studies and better intellectual capacity) compared to those cases with a PV in TSC2 47 . In our NMI cases ET61 and ET223, epilepsy was reported at 2 and 9 years of age, respectively, but absent at 17 and 10 years of age, respectively. Two of the three NMI cases (ET61 and ET44) did not exhibit intellectual disability, whereas the third (ET223) had a clinically severe cognitive affliction. Finally, the three NMI cases had cardiac rhabdomyomas at 17, 6 and 10 years of age, respectively. This is relevant given that the majority of TSC patients were found to have partial (50%) or complete (18%) rhabdomyoma involution upon follow-up echocardiography 46 .

Conclusion
Our combined molecular screening using SSCP/SS/MLPA/NGS reached a mutation detection rate of 94% and revealed a clear predominance of TSC2 mutations and a majority of sporadic cases. Due to the great allelic and locus heterogeneity that exists in TSC and the large number of novel variants, it remains difficult to identify any genotype-phenotype correlation. This genetic study, however, enabled us to provide accurate genetic counseling, such as ruling out minimal expression in first-degree relatives and defining familial versus sporadic cases. Our 3-D protein modeling results showed that the two missense LPV could alter the protein structure and function, but in vitro assays are needed to determine the real effects of these variants on the activities of hamartin and tuberin. Regarding the three cases with NMI, additional analyses are needed to rule out the presence of mosaicism or epigenetic TSC1/TSC2 modifications.
The fact that 40% were novel variants supports the importance of studying the genetics of different TSC populations in order to expand our knowledge of the genetic spectrum of this disease, both worldwide and in countries such as Mexico, where molecular studies are limited and little work has been done on this disease. Therefore, this work represents the first TSC molecular screening performed in our country.

Methods
Genomic DNA extraction and PCR. Written informed consent for participation in the study was obtained from a parent and/or legal guardian of each patient. The study was conducted in accordance with the Declaration of Helsinki and Institutional Review Board (Comité de Ética en Investigación, Instituto Nacional de Pediatría, México) approval was obtained (protocol reference number 060/2014). Total peripheral blood leukocytes or buccal swab cells were obtained from the 66 cases, their available parents and first-degree affected family members. Genomic DNA was obtained with a commercially available kit using a silica-based approach (QIAamp; Qiagen, Victoria, Australia) according to the manufacturer's protocol. Specific primers were designed to enable PCR amplification of coding regions and intron-exon boundaries (±20 base pairs) of the TSC1 (NG_012386.1, NM_000368.4) and TSC2 (NG_005895.1, NM_000548.3) genes. Primer sequences and amplification conditions are available upon request.
Single-strand conformation polymorphism. All TSC1 and TSC2 PCR fragments were subjected to SSCP analysis. Briefly, 9 µL of denaturing solution (0.05% w/v bromophenol blue, 0.25% xylene-cyanol, 1.17 M sucrose and 5 M urea) was mixed with 5 µL of PCR product, heated for 10 min at 94 °C and chilled on ice. Samples (2.5-25 ng) were loaded on a 1X polyacrylamide gel prepared according to the manufacturer's protocol (MDE, Lonza, Rockland, USA). Electrophoresis was performed at 25 W for 5 h; the temperature was kept constant (4 °C) through cold-water circulation. The gel was stained with silver nitrate solution according to the manufacturer's protocol (Silver Stain Plus kit; Bio-Rad).
[...] were subjected to automated bidirectional Sanger sequencing (performed by Macrogen, USA). The obtained electropherograms were aligned to reference TSC1 and TSC2 gene sequences (NG_012386.1 and NG_005895.1, respectively) and subsequently analyzed with the CodonCode Aligner software (CodonCode Corporation, Dedham, MA, USA) to detect small variants (point mutations and small insertions, deletions or duplications). In addition, the clinically relevant variants identified by NGS of coding and exon-intron boundary (±50 base pairs) sequences were confirmed by SS for index cases and first-degree relatives.
The Mutalyzer nomenclature module tool (http://www.mutalyzer.nl) was used to validate the sequence variant nomenclature of all the TSC1 and TSC2 variants reported herein according to the guidelines of the Human Genome Variation Society. The novel variants have been submitted to LOVD v.3.0. (for accession numbers, see Table 1).

Multiplex ligation-dependent probe amplification (MLPA). Copy number variants (CNV) in TSC1 and TSC2 were assessed with the MLPA technique using SALSA MLPA P124-C1 probemix for TSC1 and P337-A2 for TSC2 (MRC-Holland Amsterdam, The Netherlands). Amplified products were subsequently analyzed by electrophoresis on an Applied Biosystems 3500 Genetic Analyzer (Thermo Fisher Scientific, USA). Comparative data analysis was performed with the Coffalyser.Net (v.140701.0000) software (MRC-Holland Amsterdam, The Netherlands).
Next-generation sequencing and data analysis. DNA libraries were prepared with KAPA Hyper Prep (Kapa Biosystems, Inc. Wilmington, MA, USA), following the manufacturer's protocol. TSC1 and TSC2 exons and intron boundaries (±150 bp) were captured by hybridization with 125-mer probes for 30 nucleotides with 50x tiling (designed by Twist Bioscience, San Francisco, CA, USA) for the hg38 reference genome. Captured DNA was sequenced on the Illumina HiSeq 2 × 150 platform for an 800x mean coverage, as performed by Admera Health Company (South Plainfield, NJ, USA). The raw sequencing data were evaluated for quality with the FastQC program (Version 0.11.8) 48 . Adapters and low-quality reads (Phred value <20) were excluded with the Trimmomatic v 0.35 software 49 . Filtered reads were aligned with Bowtie2 50 against human genome version GRCh38, and optical and PCR duplicates were removed by SAMtools 51 . Single nucleotide variants were detected with the GATK 52 and FreeBayes 53 programs and subsequently annotated with GATK 52 .
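To make the quality threshold concrete: a Phred score Q corresponds to a base-call error probability of 10^(-Q/10), so Q = 20 means an expected error rate of 1%. The following Python sketch decodes a FASTQ quality string and applies a mean-quality cutoff. It assumes Phred+33 encoding and a mean-quality rule purely for illustration; the Trimmomatic options actually used are not stated in the text, and the function names are hypothetical.

```python
def phred_scores(quality_string, offset=33):
    """Decode an ASCII quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]

def error_probability(q):
    """Convert a Phred score Q to an error probability, P = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def passes_quality(quality_string, min_mean_q=20):
    """True if the read's mean Phred score is at least `min_mean_q`.

    The mean-quality rule is an illustrative assumption; the trimming
    options actually used in the study are not stated in the text.
    """
    scores = phred_scores(quality_string)
    if not scores:
        return False
    return sum(scores) / len(scores) >= min_mean_q

# Toy reads (hypothetical quality strings): 'I' encodes Q40, '#' encodes Q2.
print(passes_quality("IIIIIIII"))
print(passes_quality("########"))
print(error_probability(20))
```

Dedicated trimmers typically work on sliding windows or per-base cutoffs rather than a whole-read mean, so this sketch conveys only the meaning of the threshold, not the behavior of any specific tool.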
Once the responsible TSC genotypes were determined as described above, the Fisher's exact test was used to compare the proportions of different TSC1 and TSC2 gene variants (Fig. 2e).
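A two-sided Fisher's exact test for a 2×2 table can be sketched with the Python standard library alone. The version below sums the hypergeometric probabilities of all tables with the same margins that are no more probable than the observed one, which is one common definition of the two-sided P value; the software and options used in this study are not stated, so this is an assumption. The usage line applies it to the counts given in the Discussion (de novo cases: 6 TSC1 vs. 28 TSC2; familial cases: 5 TSC1 vs. 7 TSC2).

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    total = comb(n, row1)

    def prob(k):
        # Probability that the top-left cell equals k given fixed margins.
        return comb(col1, k) * comb(n - col1, row1 - k) / total

    p_obs = prob(a)
    lo = max(0, row1 - (n - col1))
    hi = min(col1, row1)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in (prob(k) for k in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))

# De novo (6 TSC1, 28 TSC2) versus familial (5 TSC1, 7 TSC2) cases.
p_value = fisher_exact_two_sided(6, 28, 5, 7)
print(f"P = {p_value:.4f}")
```

Under this definition the result for these counts should agree with the P = 0.1235 reported in the Discussion. Other two-sided conventions, such as doubling the one-sided tail, can give slightly different values.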

Data availability
Twenty-five pathogenic or likely pathogenic variants, not previously reported, were submitted and are available at the Leiden Open Variation Database; LOVD (www.lovd.nl/TSC1 and www.lovd.nl/TSC2).