Introduction

Schizophrenia is a major mental disorder characterized by a wide spectrum of symptoms, including delusions, hallucinations, disturbance of thinking processes and deterioration of social behaviors; its prevalence is 1.1% in the US adult population.1, 2 It is considered a neurodevelopmental disorder characterized by social and cognitive developmental abnormalities, often with mild motor signs.3, 4 The average age of onset of schizophrenia is 18 years in men and 25 years in women.5 However, there is a rare, very early-onset form of the disease referred to as childhood-onset schizophrenia (COS), defined by onset at 13 years or younger. This rare type of schizophrenia is clinically and neurobiologically continuous with the adult-onset disorder.6 COS patients have a high rate of comorbid developmental disorders such as autism spectrum disorder (ASD), motor developmental disorders and learning disorders.6 A longitudinal study of COS patients conducted at the Child Psychiatry Branch of the National Institute of Mental Health (NIMH) reported the prevalence of COS to be ~1/40 000, with a gradual onset and an outcome resembling that of adult cases.7 Because the prevalence of the disease is very low, little is known about the genetic architecture of COS.

Studies of family-based association and copy-number variations (CNVs) have identified some genetic variations underlying COS.6, 8, 9 One study of structural variations (ie, microduplications and microdeletions) found that the overall frequency of CNVs is higher in COS patients than in adult-onset patients and population controls.6, 10, 11

In schizophrenia, our group previously found potentially deleterious de novo single-base-pair variants and showed that the rate of de novo variants in schizophrenia patients is higher than expected.12, 13 The aim of the current study was to measure the rate of de novo variants in COS and to uncover candidate genes that may be specific to COS, by analyzing rare de novo variants in sporadic COS trios.

Materials and methods

Samples and clinical characteristics

Seventeen sporadic COS cases (11 males and 6 females) and their unaffected parents were recruited for this study. Clinical information is included in the Supplementary Information. The mean age of onset was 9.8 years (range, 6–12 years). Six of the patients were also diagnosed with ASD. We had previously examined these cases for CNVs, and the results were published in Ahn et al.11 (Supplementary Table 1). Informed consent was obtained from all participants.

Whole-exome sequencing

The exome capture of all 51 individuals in the COS trios was performed using the SureSelectXT Human All Exon V4 kit (Agilent Technologies Inc., Mississauga, ON, Canada). We prepared the samples in two batches. The first batch, consisting of 13 COS trios (39 samples), was captured and sequenced on the Illumina HiSeq 2000 platform at the McGill University and Genome Quebec Innovation Centre (Montreal, QC, Canada). The second batch, consisting of 4 COS trios (12 samples), was sequenced on the Illumina HiSeq 2000 platform at the Université de Montréal’s Beaulieu-Saucier Pharmacogenomics Centre at the Montreal Heart Institute (Montreal, Canada).

Exome data analysis

The sequenced reads of all samples from the Illumina HiSeq 2000 were aligned to the reference genome (GRCh37/hg19) using the Burrows-Wheeler Aligner (BWA).14 The aligned reads were converted to binary (BAM) format with SAMtools to facilitate further analysis.15 The efficiency of capture was calculated for each sample as the percentage of the exome covered by at least 20 reads (20×). On average, more than 90% of the target was covered at a depth of at least 20× (Supplementary Figure 1). The quality of coverage was assessed as the ratio of the total number of reads mapped to the corresponding regions of the reference genome to the total number of uniquely mapped reads. Next, variant calling was performed using the Genome Analysis Toolkit (GATK).16 Variants were called from the sequenced reads available within the covered region of each sample. This process identified single-nucleotide variants and small insertions or deletions at different levels of stringency based on their quality scores. The variants identified were annotated with ANNOVAR17 to report their gene and chromosomal position, together with their minor allele frequencies from publicly available databases.
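The capture-efficiency metric described above can be illustrated with a minimal Python sketch. It computes the fraction of target bases covered by at least 20 reads. The function name `fraction_covered` and the depth values are hypothetical and serve only as an illustration; they are not part of the pipeline used in this study, in which per-base depths would be derived from the aligned reads.

```python
from typing import Sequence


def fraction_covered(depths: Sequence[int], min_depth: int = 20) -> float:
    """Return the fraction of target bases whose read depth is at least min_depth."""
    if not depths:
        raise ValueError("no target positions supplied")
    covered = sum(1 for depth in depths if depth >= min_depth)
    return covered / len(depths)


# Hypothetical per-base read depths for a tiny 10-base target (illustration only).
example_depths = [35, 42, 18, 27, 51, 20, 19, 64, 33, 25]
print(f"{fraction_covered(example_depths):.0%} of target bases covered at >= 20x")
```

A sample would meet the threshold reported above when this fraction exceeds 0.90 over the whole capture target.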

De novo variant identification

We used an in-house program to segregate all the variants in the COS trios. Our selection included: (1) all truncating variants, (2) splice site-disrupting variants, (3) missense variants, and (4) synonymous variants. De novo variants are those that occur in the gametes before fertilization or immediately after fertilization.18 As part of the filter, we prioritized de novo variants by excluding every variant that was also present in the exome data of either parent of the proband. We determined the allele frequency of variants from those reported in the Exome Variant Server (EVS; NHLBI GO Exome Sequencing Project (ESP), Seattle, WA, USA; http://evs.gs.washington.edu/EVS/).
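The in-house program is not described in further detail, so the following Python sketch is only an assumed reconstruction of the trio filter outlined above: a proband variant is kept when it is absent from both parents and does not exceed a frequency cutoff in a reference database. The `Variant` type, the function `candidate_de_novo`, the `max_frequency` cutoff and all example calls are hypothetical.

```python
from typing import Dict, List, NamedTuple, Set


class Variant(NamedTuple):
    """Minimal variant key: chromosome, 1-based position, reference and alternate allele."""
    chrom: str
    pos: int
    ref: str
    alt: str


def candidate_de_novo(
    proband: Set[Variant],
    father: Set[Variant],
    mother: Set[Variant],
    evs_frequency: Dict[Variant, float],
    max_frequency: float = 0.0,  # assumed cutoff; the text does not state one
) -> List[Variant]:
    """Return proband variants absent from both parents and rare in the reference set."""
    inherited = father | mother
    return sorted(
        v for v in proband
        if v not in inherited and evs_frequency.get(v, 0.0) <= max_frequency
    )


# Hypothetical calls for a single trio (illustration only).
proband = {
    Variant("1", 1000, "A", "G"),
    Variant("2", 2000, "C", "T"),
    Variant("3", 3000, "G", "A"),
}
father = {Variant("2", 2000, "C", "T")}  # transmitted to the proband
mother: Set[Variant] = set()
evs_frequency = {Variant("3", 3000, "G", "A"): 0.02}  # already seen in the reference database

print(candidate_de_novo(proband, father, mother, evs_frequency))
```

Candidates that survive such a filter still require independent confirmation, because a variant missed in a parent through low coverage would otherwise be mistaken for a de novo event.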

Validation of candidate variants

To validate the candidate variants, we used polymerase chain reactions (PCR) and Sanger sequencing. Primers were designed for each of the potential de novo variants identified from COS cases and controls using Primer3.19 Sanger sequencing was performed for every member of each family. The Sanger sequencing results were analyzed using Mutation Surveyor v.4.0 (SoftGenetics, Pennsylvania, USA).

Results

In our 17 COS trios, we identified 20 exonic de novo variants (14 novel missense variants, 2 novel deletions and 4 synonymous variants), as shown in Table 1. Among the 14 missense variants, one (RYR2 p.(Glu746Tyr)) had two nucleotide substitutions in the same codon and was therefore termed a ‘delins’ (Supplementary Figure 3). In contrast to what we observed in one of our previous reports on adult schizophrenia,13 no nonsense variants were identified in this study. Variants are available in the ClinVar database at http://www.ncbi.nlm.nih.gov/clinvar/ (Accession: SCV000223956–SCV000223976). The total number of quality read bases for each individual was calculated from the sequenced reads obtained with the SureSelect Human All Exon V4 kit probes that overlapped regions of the Consensus CDS (CCDS). The de novo mutation rate was calculated as the total number of de novo variants divided by the total number of callable bases. The observed de novo mutation rate in COS probands was 1.93 × 10⁻⁸ per base pair (Figure 1), based on the total number of callable bases for the 17 COS probands (518 Mb) and the 20 observed de novo variants (including small insertions and deletions). The de novo mutation rate in COS was slightly higher than that observed in recent studies, but not significantly so; the binomial P-value was 0.25 (95% CI=1.17 × 10⁻⁸–2.98 × 10⁻⁸; Table 2a). We also found that the ratio of non-synonymous (n=15) to synonymous (n=4) de novo variants in the COS cases was increased, although not significantly, when compared with the neutral expectation (ratio of 2.23:1) reported in previous findings20 (binomial P-value=0.46; Table 2b).
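The arithmetic behind these figures can be sketched in Python as follows. Two points are assumptions rather than statements from the text. First, 20 variants over 518 Mb yields the reported rate only if each callable base is counted on both parental copies, so the factor `ploidy=2` is inferred. Second, the interval is computed here as an exact Poisson interval and the ratio test as a one-sided binomial tail; the text does not specify which procedures were used, so the output need not match the reported values exactly. All function names are hypothetical.

```python
import math
from typing import Tuple


def de_novo_rate(n_variants: int, callable_bases: float, ploidy: int = 2) -> float:
    """De novo events per base pair, counting each callable base `ploidy` times."""
    return n_variants / (ploidy * callable_bases)


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total


def _solve_lambda(k: int, target: float) -> float:
    """Bisection for lam such that poisson_cdf(k, lam) == target (the CDF decreases in lam)."""
    lo, hi = 0.0, 10.0 * (k + 1)
    for _ in range(200):
        mid = (lo + hi) / 2
        if poisson_cdf(k, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def poisson_exact_ci(k: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Exact two-sided confidence interval for a Poisson count k."""
    lower = 0.0 if k == 0 else _solve_lambda(k - 1, 1 - alpha / 2)
    upper = _solve_lambda(k, alpha / 2)
    return lower, upper


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Counts taken from the text; the factor of 2 below is an assumption (see lead-in).
n_variants, n_probands, callable_bases = 20, 17, 518e6
denominator = 2 * callable_bases
low, high = poisson_exact_ci(n_variants)

print(f"rate per base pair: {de_novo_rate(n_variants, callable_bases):.2e}")
print(f"95% interval: {low / denominator:.2e} to {high / denominator:.2e}")
print(f"variants per exome: {n_variants / n_probands:.2f}")

# Non-synonymous share expected under the neutral 2.23:1 ratio, against 15 of 19 observed.
expected_share = 2.23 / (2.23 + 1)
print(f"one-sided binomial tail: {binomial_upper_tail(15, 19, expected_share):.2f}")
```

Under the diploid assumption the point estimate agrees with the reported 1.93 × 10⁻⁸ per base pair; without it, the same counts give twice that value.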

Table 1 List of de novo variants in our study
Figure 1

Dot plot showing the distribution of de novo variants in the 17 COS probands. The x axis represents the de novo mutation rate per individual. The red line marks the coding de novo mutation rate reported in the literature, taken here as a reasonably accurate reference value.

Table 2a De novo mutation rate comparison between our COS cohort and recent studies
Table 2b Non-synonymous to synonymous ratio comparisons between de novo and inherited variants in our study

We also investigated the possible effect of missense variants in all genes, based on the predictions of PolyPhen-2 (Polymorphism Phenotyping-2).21 We compared the PolyPhen-2 scores of de novo variants found in COS cases with those of three other groups: de novo variants found in adult-onset schizophrenia cases previously reported by our group,13 de novo variants found in published controls22, 23 and private inherited variants found in COS cases. Private inherited variants are rare variants found exclusively in a given family. The comparison showed that the de novo variants were on average more severe than private inherited variants (two-tailed t-test P-value=0.03; Figure 2a), severity being defined by the predicted impact on protein function. We also found that published controls had a lower variant severity profile than the disease cohorts, although not significantly (two-tailed t-test P-value=0.29; Figure 2a). The severity profiles of schizophrenia and COS were similar. Overall, our analysis showed an increase in highly disruptive amino acid changes among de novo events in comparison with inherited variants, and thus suggested negative selection pressure on missense variants.

Figure 2

Violin plot showing the distribution of functional severity, predicted by bioinformatics scores, for the missense de novo variants and private inherited variants in COS probands. Also shown is the comparison with de novo variants in schizophrenia reported by our group13 and in controls from recent studies.22, 23 The median is indicated by the white dot and the colored area shows the kernel density of the data. (a) PolyPhen-2 score, which is variant based; a higher score indicates greater severity. (b) Residual Variation Intolerance Score (RVIS), which is gene specific; a lower score indicates greater intolerance to variation.

Furthermore, to explore the pathogenic potential of our de novo variants, we used the Residual Variation Intolerance Score (RVIS).24 RVIS is a gene-based, genome-wide scoring system that assesses the functional variation of human genes based on the single-nucleotide variants in EVS. The RVIS percentile indicates whether a gene is ‘tolerant’ to variants or not (ie, whether it can be mutated without leading to disease). This score is significantly correlated with genes known to cause Mendelian diseases.24 We compared the RVIS percentile of genes harboring de novo variants in COS cases with that of the genes mutated in the three groups used in the comparison of PolyPhen-2 scores. Interestingly, we found that de novo variants in COS occurred in more intolerant genes than the private inherited variants did (two-tailed t-test P-value=0.0375); the same trend relative to the de novo variants in published controls was not significant (two-tailed t-test P-value=0.1366; Figure 2b). The RVIS percentile distributions of schizophrenia and COS showed a very similar trend. Overall, we observed that de novo variants occurred more frequently in genes functionally intolerant to variation, regardless of case–control status.

A likelihood analysis of PolyPhen-2 scores and RVIS percentiles was also performed using EVS variants to assess the pathogenicity of random variants in a simulation (Supplementary Methods). We calculated the mean value of both the PolyPhen-2 scores and the RVIS percentiles using permutations of a subset of randomly selected genes. We found that the mean PolyPhen-2 score of our identified de novo variants was significantly higher than that of the random selection (P-value=0.040). Furthermore, when applying the same method to the RVIS percentile, we found that the genes carrying de novo variants had a higher global intolerance to variation (P-value=0.010). Thus, this analysis further supported the notion that the de novo variants in COS were pathogenic and occurred in genes intolerant to variation, and not merely by chance (for more details see Supplementary Information and Supplementary Figure 2).
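The exact simulation is given in the Supplementary Methods and is not reproduced here. As a generic illustration of the resampling idea only, the Python sketch below draws random subsets of a background score pool, each the same size as the observed set, and reports how often their mean is at least as extreme as the observed mean. The function `empirical_p_value`, its parameters, the add-one correction and all scores are hypothetical assumptions, not the procedure used in this study.

```python
import random
from typing import Sequence


def empirical_p_value(
    observed: Sequence[float],
    background: Sequence[float],
    n_draws: int = 2000,
    higher_is_extreme: bool = True,
    seed: int = 0,
) -> float:
    """Share of random background subsets whose mean is at least as extreme as observed.

    Set higher_is_extreme=True for scores where high means severe (PolyPhen-2 style)
    and False for scores where low means intolerant (RVIS-percentile style).
    """
    rng = random.Random(seed)
    size = len(observed)
    observed_mean = sum(observed) / size
    hits = 0
    for _ in range(n_draws):
        draw = rng.sample(list(background), size)
        draw_mean = sum(draw) / size
        if higher_is_extreme:
            hits += draw_mean >= observed_mean
        else:
            hits += draw_mean <= observed_mean
    # Add-one correction so that the estimate is never exactly zero.
    return (hits + 1) / (n_draws + 1)


# Hypothetical scores (illustration only): a uniform background and a severe observed set.
background_scores = [i / 100 for i in range(101)]
observed_scores = [0.95, 0.99, 0.88, 0.97, 0.91]
print(empirical_p_value(observed_scores, background_scores))
```

For a score in which lower values are more extreme, the same function would be called with `higher_is_extreme=False`.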

Discussion

In this study, the coding de novo mutation rate in COS was 1.17 per exome, which is consistent with the results published in de novo mutation studies of other psychiatric diseases13, 25, 26, 27, 28, 29 (Supplementary Table 2). The missense variants identified in our COS subjects tended to be more severe than private inherited variants. This result is consistent with the view that de novo variants are more penetrant than inherited variants, as they have not been subjected to natural selection.30 De novo variants are now thought to explain part of the heritability of complex neurodevelopmental disorders such as ASD, schizophrenia and intellectual disability.13, 22, 28, 31 Here we show that the same might be true of another related neurodevelopmental disorder, COS. As COS is an early-onset disease, we could hypothesize that de novo variants, which are enriched for more severe and deleterious effects (according to the PolyPhen-2 and RVIS results), have a greater role in COS than in the adult form of schizophrenia. However, the sample size of the current study does not allow us to draw such a conclusion, and a new study in a larger cohort is therefore needed.

Among the 20 de novo variants identified, 6 were in genes previously implicated in schizophrenia or other neuropsychiatric disorders. One such gene, GPR153, which carries a de novo missense variant in one of our subjects, has also been reported to carry a de novo missense variant in a schizophrenia patient.22 Interestingly, the same proband (COS885) has a possibly disease-related CNV in 1q21.3 (Supplementary Information). If we exclude this patient from our study, the de novo mutation rate is 1.84 × 10⁻⁸ per base pair, still similar to that seen in autism and schizophrenia. Similarly, a de novo missense variant in STAC2, which codes for SH3 and cysteine-rich domain-containing protein 2, has previously been reported in a patient with schizophrenia.25 In our study, we identified a synonymous variant in the same exon of this gene, suggesting a possible association with the disease.

On the basis of the literature and the severity scores, some of the genes with de novo variants seem to be good candidate COS-predisposing genes. One of these is SEZ6, a closely related homolog of SEZ6L2, a candidate gene in autism.32 Polymorphisms in SEZ6L have been associated with autism and bipolar disorder.33 In addition, a de novo variant in SEZ6 has been identified in a patient with intellectual disability.31 As COS patients show comorbid autism and language disorders, SEZ6 is a potential candidate gene for COS.

Another gene harboring de novo variants, RYR2, codes for a calcium channel receptor and has been associated with autism.34 Interestingly, one of the COS patients has two de novo nucleotide substitutions in the same codon of this gene (Supplementary Figure 4). This type of rare event may be termed a ‘delins’, and was explained in an autism study as de novo nucleotide substitutions that occur during allelic gene conversion events.35 In addition, recurrent de novo RYR2 and RYR3 variants have been found in patients with infantile spasms and Lennox–Gastaut syndrome, which are classical epileptic encephalopathies.23 Furthermore, RYR2 plays a major role in maintaining calcium homeostasis and presynaptic function in hippocampal neurons.36 Taken together, this evidence suggests that the variant identified in RYR2 may affect synaptic organization and connectivity during early brain development. Hence, RYR2 is a good candidate gene for COS.

We also identified a potentially damaging missense variant in GTF2IRD1, a gene previously associated with Williams–Beuren syndrome, a neurodevelopmental disorder.37 In addition, a deletion of the GTF2 transcription factor family was observed in patients with cognitive behavioral abnormalities, including language delay and non-social anxiety.37 Similar clinical symptoms are observed in patients with COS (Supplementary Table 1). Since schizophrenia is believed to be a neurodevelopmental disorder, GTF2IRD1 is a potential candidate gene.

TTBK1, one of the genes harboring a missense de novo variant in a COS patient, is brain specific and involved in neuronal and cognitive dysfunction.38 Moreover, changes in the expression of TTBK1 induce pathological effects in the hippocampal neurons of patients with Alzheimer’s disease.39 TTBK1 therefore appears to be an attractive candidate gene for COS because of its expression in the hippocampus, a region whose dysfunction is thought to be involved in psychiatric diseases.40

Several genes of the neuronal migration pathway have previously been implicated in schizophrenia. Among those genes, we identified a small deletion in ITGA6 in one of the COS patients. Genes such as ITGA3 and LAMA2, with de novo variants reported in schizophrenia, are known to have major roles in fetal development of the brain.29 ITGA6, which belongs to the same integrin gene family as ITGA3, could also be involved in a pathway relating to neuronal migration.41

In conclusion, this study provides a list of interesting genes, such as SEZ6, RYR2, GPR153, GTF2IRD1, TTBK1 and ITGA6, that are candidates for involvement in the etiology of childhood-onset schizophrenia. Although larger COS cohorts will be required to replicate these findings, it will be worthwhile to investigate further whether these candidate genes are also involved in biological pathways associated with the disease, especially those active in the early development of the central nervous system.