Rare genetic susceptibility variants assessment in autism spectrum disorder: detection rate and practical use

Autism spectrum disorder (ASD) is a neurodevelopmental disorder with a strong genetic component whose knowledge evolves quickly. Next-generation sequencing is the only effective technology to deal with the high genetic heterogeneity of ASD in a clinical setting. However, rigorous criteria to classify rare genetic variants conferring ASD susceptibility are currently lacking. We have performed whole-exome sequencing to identify both nucleotide variants and copy number variants (CNVs) in 253 ASD patients, including 68 patients with intellectual disability (ID) and 90 diagnosed as Asperger syndrome. Using explicit criteria to classify both susceptibility genes and susceptibility variants we prioritized 217 genes belonging to the following categories: syndromic genes, genes with an excess of de novo protein truncating variants and genes targeted by rare CNVs. We obtained a susceptibility variant detection rate of 19.7% (95% CI: [15–25.2%]). The rate for CNVs was 7.1% (95% CI: [4.3–11%]) and 12.6% (95% CI: [8.8–17.4%]) for nucleotide variants. The highest rate (30.1%, 95% CI: [20.2–43.2%]) was obtained in the ASD + ID subgroup. A strong contributor for at risk nucleotide variants was the recently identified set of genes (n = 81) harboring an excess of de novo protein truncating variants. Since there is currently no evidence that the genes targeted here are necessary and sufficient to cause ASD, we recommend to avoid the term “causative of ASD” when delivering the information about a variant to a family and to use instead the term “genetic susceptibility factor contributing to ASD”.


Introduction
Autism spectrum disorder (ASD), whose prevalence is estimated to be around 1 per 100 children, encompass a wide range of phenotypic manifestations ranging from severe behavioral impairment, often associated with intellectual disability (ID), to mild difficulties in social interaction. The fact that, clinically, autism is not a single disease entity is mirrored by the extreme genetic heterogeneity underlying this condition. Chromosomal microarray analysis is usually used as the first-tier genetic test for individuals with autism with a diagnostic yield of 7-9% 1,2 . However, such analysis cannot detect single nucleotide variations and small Indels but only loss or gain of genomic DNA material. In light of recent advances in ASD genetics, especially regarding the importance of de novo truncating variants, it is estimated that rare variants within several hundreds of genes may contribute to ASD susceptibility, each of them being present only in a very small proportion of cases 3 . In this context, nextgeneration sequencing is the only reasonable and costeffective approach to search for variants in these genes in a diagnostic perspective.
The aims of the present study were threefold: (i) to establish an accurate list of susceptibility genes from literature data and to provide guidelines to categorize rare susceptibility variants (ii) based on these criteria, to estimate the whole exome sequencing (WES) detection rate of rare susceptibility variants in a sample of ASD patients typical of those attending genetic consultations and (iii) to propose some recommendations for susceptibility factors assessment and interpretation in clinical practice.

Patient recruitment
During the 2009-2017 period, 679 unrelated subjects received a diagnosis of ASD or Asperger syndrome by clinicians of our local expert center. This study was carried out on a subset of 253 cases, including all patients from whom parental DNA was available (n = 159) and 94 randomly selected patients.
Clinical evaluation was mainly based on ADOS-2. In addition, ADI-R and CARS, were also used for 76 and 145 patients, respectively. All diagnoses were made according to DSM-IV-TR criteria. Accordingly, the diagnosis of Asperger syndrome included the absence of language impairment, as documented during clinical examination. IQ was assessed in subjects with age >5 years with the Wechsler Intelligence Scale for Children (WISC) or the Raven Progressive matrix. PsychoEducational Profil third edition (PEP-3) and Vineland Adaptive Behavior Scales (VABS-2) assessment, when available, were used to refine the classification of cases with heterogeneous cognitive functions or mild cognitive impairment. Subjects with age <5 years were classified in the ASD subgroup, without further specification.
Following national recommendations, diagnostic assessment included a genetic consultation. Every patient was received by a senior clinical geneticist for thorough questioning about development, previous medical records and family history, followed by a complete physical examination.
This genetic study was approved by our legal ethics committee. Parents or legal representative of the children signed an informed consent for genetic analyses.

Molecular genetics
All included patients had previously been negatively screened for Fragile X expansion and 54 had been tested by array Comparative Genomic Hybridization (aCGH) Agilent 180K (Santa Clara, United-States), yielding to the prior identification (subsequently confirmed by exome sequencing) of four CNVs conferring susceptibility for ASD. For each proband, DNA was extracted from fresh blood samples. DNA was also obtained from affected siblings and from parents when available.
Exome sequencing was performed on every proband. Exomes were captured using the Agilent SureSelect All Exons V5-UTR or V6 Kit (Santa Clara, United-States). Sequencing was performed on an Illumina HiSeq4000 (Illumina, San Diego, CA, USA) at the CNRGH (Centre National de Recherche en Génomique Humaine, Evry, France) with paired end mode, 150 base pairs (bp) reads. Nucleotide variants and copy number variants (CNVs) were analyzed through a bioinformatics pipeline to perform variant calling, quality check, annotation and CNVs detection (see Sup Info). When possible, the segregation of retained variants was examined by Sanger sequencing for nucleotide variants and by ddPCR or QMPSF for CNVs. Parenthood was checked by polymorphic microsatellites for patients carrying de novo mutations.

List of genes and interpretation of variants
To establish a panel of genes firmly contributing to ASD susceptibility, we retained three sets of genes: (i) genes showing a statistically significant excess of de novo protein truncating variants (PTVs), (i.e. nonsense, canonical splice site variants and frameshift indels) in large published cohorts. Since these genes have been identified in several, sometimes redundant recent trio studies 1,4-10 , we reanalyzed (see Sup Info) all de novo events reported in each of these studies (Sup Table 1). Depending on the level of statistical evidence for the enrichment in de novo PTVs, we ranked these genes into two categories: class A (false discovery rate (FDR) <5%) meaning definitely involved and class B (5% ≤ FDR ≤ 10%), meaning probably involved in ASD susceptibility. This led to a list of 81 genes (Sup Table 2). For these genes, we retained as susceptibility factor all rare PTVs found in patients, whatever the inheritance (i.e. de novo occurrence or not).
(ii) syndromic ASD-related genes, i.e. genes causing monogenic diseases for which ASD can be part of the phenotype (n = 107). This set of genes was carefully curated from the Simon Foundation Autism Research Initiative (Sfari) database and from literature data. For each syndromic gene, the mode of inheritance was specified and rare non-synonymous variants were prioritized. For these genes, pathogenicity of rare variants (MAF < 1% in GnomAD) was evaluated using the guidelines from the American College of Medical Genetics-Association for Molecular Pathology (ACMG-AMP). Only variants considered as likely pathogenic or pathogenic (class 4 or 5) were retained as advised by the guidelines. For our analysis, we considered these variants as probable (class 4) or definite (class 5) ASD-susceptibility variants and then evaluated whether patients carrying them harbored or not the full syndromic presentation typical of each gene.
(iii) genes intersected by a burden of rare CNVs, listed in the Sfari database (n = 85).
Combining these three sets yielded a high confidence list of 217 ASD susceptibility genes (Sup Table 3). Note that some of these genes were simultaneously present in several sets. (Fig. 1).
Besides this main panel and despite the fact that the interpretation was more problematic, we also examined rare missense variants in genes carrying a burden of de novo and transmitted mutations identified by TADA analyses 4,11 as well as in genes recently reported as carrying an excess of de novo missense mutations in ASD and developmental disorders (DD) 1,12 . This analysis was conducted on 89 genes (61 already included in the main list and 28 additional genes). For this analysis, only de novo missense variants predicted as damaging by three in silico prediction tools were considered.
Finally, for patients with ID or additional sensory impairment, we examined all OMIM referenced genes.
While CNV detection was performed on all genes of the main list, variant interpretation depended on the gene subsets. We retained all previously associated CNVs (i.e. duplications or deletions) within the set of large recurrent CNVs associated with various DDs including ASD 13 . In contrast, we used a conservative approach to interpret the remaining CNVs, and only retained rare deletions targeting Loss of Function (LoF) intolerant genes as probable ASD susceptibility variants.

Results
Among the 253 ASD probands (81.4% males), 68 were categorized in the ASD+ID subgroup while 90 were classified as having Asperger syndrome. The average age at inclusion was 8.5 ± 5.6 years (range 3-41). Positive family history (among first, second or third-degree relatives) for ASD, DD or ID was reported for 27.7% of probands. Demographic and phenotypic characteristics of the sample are displayed in Table 1.
As shown in Table 2, ASD-contributing CNVs were present in 18/253 cases (7.1%, 95% CI: [4.3-11%]) while ASD-contributing nucleotide variants (Table 3) (Table 4). In the whole sample, two CNVs (NRXN1 E1-3 del and 15q11 BP 1-2 del) were recurrent and four genes were found to carry nucleotide variants more than once: CHD8, SHANK3, NRXN1, and KMT2A. Regarding nucleotide susceptibility variants, the contribution of the set of genes previously shown as enriched in de novo PTVs in large ASD cohorts should be highlighted. Ten genes carrying ASD-susceptibility variants belonged to this category vs seven syndromic genes and nine genes belonging simultaneously to the two categories. Overall, the high frequency of de novo events is noteworthy: we identified 21/28 de novo nucleotide variants and 6/14 de novo CNVs among 42 patients from whom parental DNA was available. Regarding the mode of inheritance of ASD contributing variants, no autosomal recessive inheritance was encountered in this mainly outbred population.
Finally, variants in two genes not included in our list, but which cause nevertheless part of the phenotype, were found in two patients presenting ASD, ID, and sensory deficits. A patient with hearing impairment carried compound heterozygous c.2485C > T, p.Gln829*/c.5665T > C,  Genotype/phenotype correlations Consistency with the previously described phenotypes was checked for syndromic genes and was fair to good for most genes (DNMT3A, ASXL3, EBF3, NR2F1, DYRK1A, DEAF1, ANKRD11, SHANK3). (See Sup Info) but in some cases unusual phenotypes were encountered: Patients carrying MECP2 or KMT2A mutations did not harbor the classical features of Rett or Wiedemann-Steiner syndrome, respectively. A MECP2 mutation (not present in the International Rett database) was identified in a girl who presented with mild ASD. She had no developmental delay (walk at 15 months, no language delay) and was enrolled in an ordinary school at the age of 14. She demonstrated no ID and no dysmorphic features. KMT2A mutations were identified in two patients. One of them carrying the p.Gln3192Pro de novo mutation was the first child of a couple with epilepsy and mild cognitive impairment. He had a global developmental delay with a sitting age at 1 year but a normal walking age at 17 months, a language delay and a motor instability. Clinical evaluation showed a normal growth, an hyperlaxity and a diffuse hypertrichosis but no hairy elbow or morphological features typical of Wiedemann-Steiner Syndrome. The second one was the first child of a couple without medical history. He carried a c.10835 + 1G > A splice mutation already described in 3 related ID patients, not diagnosed as Wiedemann-Steiner. This variant leads to the in frame deletion of exon 28 and likely disrupts the stabilizing interaction between the C terminal KMT2A  Likewise, although DYNC1H1 missense or truncating variations are usually associated with a severe phenotype including ID, neuronal migration defects and epileptic encephalopathy 15 , our patient had an unremarkable ASD phenotype without neurological features. CHD2 truncating mutations are mostly associated with pediatric refractory epilepsy 16 , but our patient had only experienced one tonic-clonic seizure. In the same vein and in line with recent reports underlining the phenotypic heterogeneity of SCN2A 17 and GRIN2B 18 mutation carriers, none of the patient carrying de novo truncating mutations on these genes had epilepsy, which is a feature generally associated with these mutations.
It should also be noted that besides these wellcharacterized syndromic genes, another patient carrying the RIMS1 splice variant received a diagnosis of Dravet syndrome but bore no mutation on genes causing this phenotype. Interestingly, the RIM1a protein encoded by the mice ortholog of RIMS1 has been characterized as a presynaptic active zone protein mainly expressed in neurons from the cerebellum 19 and controlling epileptogenesis following pharmacologically induced status epilepticus 20 . This suggests that the RIMS1 truncating variant identified here, which affects a minor transcript expressed in the cerebellum (https://gtexportal.org), could be associated with a peculiar neurological phenotype.

Discussion
In this report, the rate of identified CNVs, largely based on well-known recurrent CNVs, was similar to that obtained in several recent studies 1,2 . It is notable that four of our patients carry the 15q11.2 recurrent deletion which is known to have a mild effect on developmental disorder. Hence, it is very probable that additional factors are involved in this subset. Regarding nucleotide variants, we used rigorous criteria to classify both ASD susceptibility genes and variants. As a result, we retained only 217 genes as compared for example with the 880 genes present in the Sfari database (october 2018 release), often with minimal evidence of involvement in ASD susceptibility. We nevertheless obtained a detection rate of 12.6% for nucleotide variants. Combining CNVs and nucleotide variants, we reached a higher detection rate (19.7%) than the one obtained in 105 ASD patients previously analyzed by WES and aCGH (15.8%) 2 . This is partly due to the contribution of the set of genes with an excess of de novo PTVs in ASD patients, which were identified in the recent years thanks to the study of more than 6000 ASD trios 1,4-10 . Clearly, this huge effort has obvious consequences for clinical practice. The finding of a particularly high rate of de novo PTVs in this set of genes (14/20 PTVs = 70% of the total number of PTVs  in patients for whom parental DNA was available) reinforces the conclusion that this set is truly enriched in de novo PTVs in ASD subjects. As seen in Sup Table  2, the gene which is the most frequently hit by de novo PTVs in ASD patients from the literature is CHD8 21 .
Interestingly, among the three CHD8 PTVs identified in this study, two were de novo events while the third was inherited from a parent whose sole phenotypic feature was macrocephaly. More generally, the presence of 30% of PTVs inherited from parents with no ASD diagnosis underlines the incomplete penetrance and/or variable expressivity of these variants. Due to the stringent criteria used to classify variants, it should be stressed that our detection rate is likely to be underestimated. First, concerning CNVs, partial duplications intersecting genes were conservatively considered of uncertain significance (Sup Table 4). However, it is well known that tandem intragenic duplications often disrupt reading frames and can lead to haploinsufficiency. In a recent analysis of all structural variation detected in the gnomAD dataset, the authors showed a significant correlation between constraint metrics reflecting intolerance to Lof mutations and the depletion of partial genic duplications at the gene level 22 . However, despite these findings, the individual interpretation of partial duplications remains challenging and will require additional investigations. Second, since all missense variants impacting genes exhibiting a statistically significant excess of de novo missense mutations in ASD 1,12 were inherited, we decided to consider them as variants of unknown significance (VUS) (Sup Table 5) even when they were absent from gnomAD.
The detection rate was particularly high in the ASD+ID subgroup. It was in the same range as those recently reported in ID or DD patients (25-39%) [23][24][25] , or in a cohort of 163 ID/DD patients "with reported ASD or autistic features" (25.8%) 26 . Of note, all these studies used larger panels of genes than ours. In our study, although ASD+ID patients were more likely hit, susceptibility variants were also found with a high frequency in the Asperger group. Several genes carried susceptibility variants in both ASD+ID and Asperger subjects. For example, in accordance with a previous report outlining that cognitive ability of CHD8 patients ranges from profoundly disabled to age-appropriate 21 , of the three patients carrying CHD8 PTVs, one had ID associated to ASD while two others received a diagnosis of Asperger syndrome. However, macrocephaly, a shared feature among CHD8 patients, was present in all three patients. Likewise, NRXN1 exon 1-3 deletions were present in patients with each clinical presentation (ASD, ASD+ID, Asperger).
The genotype/phenotype correlation for syndromic genes was generally good, but in some cases was not typical. As already outlined 23,25 , it is noteworthy that the phenotypic spectrum of mutations in several "syndromic genes" for which a specific phenotype has previously been described is broader than initially thought. Clearly, for a large set of genes, prioritizing by double entries (genes with both an excess of de novo PTVs and syndromic genes) allows the identification of ASD susceptibility factors in more patients than prioritizing upon a single criteria based on strict phenotypic concordance with described syndromic features.
In summary, the strategy we advocate here is an efficient and cost-effective manner to deal with the detection of both rare nucleotide variants and CNVs in a clinical setting. Of course, exome data should be periodically reanalyzed and the list of genes updated. Due to the extreme rarity of variants studied here, it is currently difficult to establish for each of them if they are fully causative of an ASD phenotype. Most likely, only a subset of them are necessary and sufficient to cause ASD. For this reason, we recommend to avoid the term "causative of ASD" when delivering the information about these variants to a family and to use instead the term "genetic susceptibility factor contributing to ASD phenotype". At the present stage, with the exception of pathogenic variants in syndromic genes causing well-defined monogenic diseases, using these results for prenatal diagnosis is premature.