Exome sequencing of case-unaffected-parents trios reveals recessive and de novo genetic variants in sporadic ALS

The contribution of genetic variants to sporadic amyotrophic lateral sclerosis (ALS) remains largely unknown. Either recessive or de novo variants could result in an apparently sporadic occurrence of ALS. In an attempt to find such variants we sequenced the exomes of 44 ALS-unaffected-parents trios. Rare and potentially damaging compound heterozygous variants were found in 27% of ALS patients, homozygous recessive variants in 14% and coding de novo variants in 27%. In 20% of patients more than one of the above variants was present. Genes with recessive variants were enriched in nucleotide binding capacity, ATPase activity, and the dynein heavy chain. Genes with de novo variants were enriched in transcription regulation and cell cycle processes. This trio study indicates that rare private recessive variants could be a mechanism underlying some cases of sporadic ALS, and that de novo mutations are also likely to play a part in the disease.

An intensive search for genetic disorders that could underlie amyotrophic lateral sclerosis (ALS) has uncovered pathogenetic variants in about 10% of sporadic ALS (SALS) and 60% of familial ALS (FALS) patients 1 . While this represents remarkable progress in only a few years, a major question is whether most SALS arises from environmental factors, genetic predisposition, or some combination of the two. Attempts have been made to look for environmental factors or gene-environment interactions underlying ALS in, for example, pesticide exposure 2 , but despite work from many research groups no convincing environmental factor for ALS has been found. Furthermore, numerous genome-wide association studies (GWAS) have revealed no reproducible findings of common variants that would lead to ALS susceptibility in a substantial proportion of patients 3 . There could be a number of reasons for such negative results in GWAS, one being a mismatch of exposure to environmental factors and the presence of susceptibility genes. While there may be an environmental contribution to SALS, the genetic contribution could come mostly from rare variants, which still allows for strong gene-environment interactions.
If SALS has a strong genetic component, consideration needs to be given to the genetic mechanisms that could be responsible for the sporadic occurrence of most cases of ALS. One form of inheritance that can give rise to an apparently sporadic condition, especially in small families, is that of recessive variants 4 . Homozygous variants in ALS have already been described in SOD1, OPTN, and FUS 1 , and other rare variants could be responsible for further SALS cases 4 . Recessive inheritance due to rare compound heterozygous variants is another genetic mechanism that can give rise to a sporadic disorder, since both rare variants are unlikely to be reproduced in the next generation. It has often been pointed out that with the demographic shift to smaller families, a disease with a recessive inheritance or with a low penetrance will seem sporadic in a large number of cases 5 .
De novo mutation, in which the pathogenetic variant arises for the first time in the offspring of normal parents, is a further mechanism that can give rise to an apparently sporadic disorder. De novo mutations in FUS 6 , ERBB4 7 and ATXN2 8 have previously been suggested to be associated with ALS.
A powerful method of looking for recessive and de novo variants underlying a sporadic disorder is the use of case-unaffected-parents trios. Large numbers of these trios are difficult to collect in ALS, since it is unusual to have access to living parents of ALS patients, with the average age of disease onset being in the early 60s. In 2011, a genome-wide copy number analysis of 12 SALS trios found a number of de novo copy number variants (CNVs) in the SALS offspring; 11 of these CNVs involved genes, some of which were in pathways suspected in the pathogenesis of ALS 9 . More recently, exome sequencing of 47 SALS trios brought to light de novo single nucleotide variants in genes that may be involved in the pathogenesis of ALS 10 .
In an attempt to uncover rare recessive and de novo variants that could underlie SALS, we therefore sequenced the exomes of 44 Australian case-unaffected-parents trios.
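The three inheritance patterns described above can be told apart from trio genotypes alone. The following is a minimal illustrative sketch, not the pipeline used in this study: genotypes are assumed to be coded as alternate-allele counts (0, 1, 2), and the function names and the gene labels in the usage example are hypothetical.

```python
from collections import defaultdict

# Genotypes are alternate-allele counts: 0 = homozygous reference,
# 1 = heterozygous, 2 = homozygous alternate (an assumed encoding).

def classify_variant(child, mother, father):
    """Label a single variant from the genotypes of the trio."""
    if child == 1 and mother == 0 and father == 0:
        return "de_novo"                 # absent from both parents
    if child == 2 and mother == 1 and father == 1:
        return "homozygous_recessive"    # one copy from each carrier parent
    if child == 1 and mother == 1 and father == 0:
        return "het_maternal"            # candidate half of a compound het
    if child == 1 and mother == 0 and father == 1:
        return "het_paternal"            # candidate half of a compound het
    return "other"

def compound_het_genes(variants):
    """Return genes in which the child carries at least one maternal-only
    and at least one paternal-only heterozygous variant.

    `variants` is an iterable of (gene, child, mother, father) tuples.
    """
    origins = defaultdict(set)
    for gene, child, mother, father in variants:
        label = classify_variant(child, mother, father)
        if label in ("het_maternal", "het_paternal"):
            origins[gene].add(label)
    return sorted(gene for gene, seen in origins.items() if len(seen) == 2)

# Hypothetical calls for one trio: (gene, child, mother, father)
example = [
    ("GENE_A", 1, 1, 0),
    ("GENE_A", 1, 0, 1),
    ("GENE_B", 1, 1, 0),
    ("GENE_C", 1, 0, 0),
]
print(compound_het_genes(example))
print(classify_variant(1, 0, 0))
```

Requiring one variant from each parent is what places the two variants on opposite chromosomes, which is why parental DNA is needed to call a compound heterozygote with confidence.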

Results
ALS offspring patients and unaffected parents. White blood cell DNA samples were available from 44 trios (see Table 1 for the number of recessive and de novo variants found in each ALS patient, and Supplementary Table S1 online for further clinical details of the patients and all ages). Thirty-seven of the offspring had classical sporadic ALS (SALS) with upper and lower motor neuron signs, three had sporadic progressive muscular atrophy (SPMA), two had sporadic progressive bulbar palsy (SPBP), one had sporadic primary lateral sclerosis (SPLS), and one had sporadic frontotemporal degeneration with motor neuron disease (SFTD-MND).
The average age of disease onset of our ALS trio (ALS TRIO) offspring was 46.1 y (SD 9.1 y, range 26-63 y). In comparison, the average age of disease onset for the 828 SALS patients in the Australian MND DNA Bank was 61.9 y (SD 11.5 y, range 26-99 y), a significant difference on unpaired two-tailed t-testing (p < 0.0001).
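The comparison of onset ages can be approximated from the summary statistics alone. The sketch below assumes a pooled-variance (Student's) unpaired t-test, which the text does not specify, and uses a normal approximation for the two-tailed p-value, which is reasonable only because the degrees of freedom are large. The function name is hypothetical.

```python
import math

def pooled_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Unpaired t statistic from group means, SDs and sizes,
    assuming equal variances (pooled estimate)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / se
    # Two-tailed p-value from the standard normal distribution;
    # an approximation to the t distribution for large df.
    p_approx = math.erfc(abs(t) / math.sqrt(2))
    return t, df, p_approx

# Onset age: 44 trio offspring versus 828 MND DNA Bank SALS patients
t, df, p = pooled_t_from_summary(46.1, 9.1, 44, 61.9, 11.5, 828)
print(f"t = {t:.2f}, df = {df}, approximate p = {p:.2e}")
```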
The average age of fathers at the birth of ALS TRIO offspring was 29.4 y (SD 5.1 y, range 22-42 y) and that of the 689 Australian MND Bank fathers at ALS offspring birth was 31.4 y (SD 6.9 y, range 15-67 y), a non-significant difference on t-testing (p = 0.06). The average age of mothers at the birth of ALS TRIO offspring was 26.3 y (SD 4.0 y, range 20-38 y) and that of the 689 Australian MND Bank mothers at ALS offspring birth was 28.2 y (SD 6.1 y, range 13-50 y), also a non-significant difference on t-testing (p = 0.05).
pathways see Figure 1), 49 recessive and compound heterozygous variants remained for validation and further analysis. Full lists of the coding recessive and compound heterozygous variants detected are shown in Supplementary Dataset 1 and Supplementary Dataset 2 respectively online. We validated 28 compound heterozygous variants in 19 different genes (Table 2), which involved 12 (27%) ALS TRIO patients (Table 1). In 6 (14%) ALS TRIO patients 9 homozygous recessive variants in 9 different genes were found (Tables 1 and 2). The deleterious nature of these recessive variants can be judged from their average SIFT score of 0.0058 and average PolyPhen2 score of 0.0008. A quarter of the genes with these recessive variants have significantly increased expression in the spinal cord compared to non-central nervous system tissues 11 (Table 2).
De novo variants. Eighty-one de novo variants passed manual review in IGV and 54 were validated with Sanger sequencing (Figure 1). See Supplementary Table S4 online for the complete list of de novo variants. Seventeen of the de novo variants were coding, involving 12 (27%) ALS TRIO patients (Tables 1 and 3). Of these 17 variants, 15 were missense (10 identified as deleterious or damaging using SIFT, PolyPhen and Condel), one splice site, and one nonsense. Twenty-four percent of the genes in which we found de novo variants have significantly increased expression in the spinal cord compared to non-central nervous system tissues 11 (Table 3). Although two coding de novo variants were found in five ALS TRIO patients, the distribution of de novo variants followed a Poisson distribution (see Supplementary Fig. S1 online), indicating that multiple de novo alleles in any one individual are unlikely to contribute to ALS risk.
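To illustrate what a Poisson expectation looks like for counts of this size, the sketch below computes the expected number of patients carrying 0, 1, 2, or 3 or more variants for a given total. The rate used here (17 coding de novo variants across 44 patients) is an assumption chosen for illustration; the supplementary figure may be based on a different tally, and the function names are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """Probability of observing exactly k events at mean rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def expected_counts(n_individuals, n_variants, max_k=3):
    """Expected number of individuals with 0, 1, ..., max_k-1 variants,
    followed by the expected number with max_k or more."""
    lam = n_variants / n_individuals
    counts = [n_individuals * poisson_pmf(k, lam) for k in range(max_k)]
    counts.append(n_individuals - sum(counts))  # upper tail
    return lam, counts

lam, counts = expected_counts(44, 17)
labels = ["0", "1", "2", "3+"]
for label, count in zip(labels, counts):
    print(f"{label} variants: {count:.1f} patients expected")
```

A formal comparison with observed tallies would need a goodness-of-fit test, which is not attempted here.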
Relation of variants found to known ALS variants. The frequency of variants in ALS susceptibility genes and the frequency of known ALS susceptibility variants were assessed in our cohort using ALSoD 12 . No increased burden of coding variants in known ALS genes was found in this cohort. All of the coding variants have been previously identified, and the alternate allele frequencies of these variants are similar to those in the NHLBI ESP and the 1000 Genomes Project, with the exception of a few non-synonymous variants that had elevated frequencies (see Supplementary Table S5 for a list of these). Given our limited sample size, however, we were unable to determine whether this enrichment was statistically significant.
Homozygous segments. No statistically significant enrichment of homozygous segments was found by size, burden, or genomic location in ALS TRIO patients versus controls.
Our DAVID analyses showed no involvement in any functional pathways of genes containing either recessive or de novo variants. The previous ALS trio exome study of Chesi et al., on the other hand, which used the same DAVID analysis, reported that chromatin regulator genes were significantly enriched 10 . When we combined our and the Chesi et al. de novo variants, and submitted them to functional annotation analysis, genes related to transcription regulation became more significantly enriched than in our previous analysis (p = 0.000032, FDR = 0.0018). These 15 enriched genes comprised six from our ALS TRIO list (LIMD1, FOXN3, GTF2H4, MLL3, SND1 and TRRAP) and nine from the Chesi et al. list (CNOT1, ELL, FOXA1, FOXK1, HDAC10, SRCAP, SS18L1, ZNF410, ZNF778). This combined analysis therefore gives further weight to the suggestion that disturbances by de novo variants of transcription regulation genes may be a pathogenetic mechanism in ALS. On the other hand, genes related to chromatin modification were not significant in the combined de novo analysis, with a high false discovery rate (FDR = 0.27).

Discussion
Due to the late age of onset of ALS, and the possibility of incomplete penetrance, it is difficult to assess whether SALS is truly sporadic. For example, multiple system atrophy was once thought to be sporadic, but recently-identified compound heterozygous and recessive mutations in COQ2 segregate with the disease, and heterozygous mutations in the same gene predispose individuals to this disease 13 . Additionally, it has been reported that ALS patients harbour a greater number of rare homozygous segments than controls, and that these segments are longer and contain more genes 4 . This suggested further evidence for a recessive cause for apparently sporadic ALS, though our finding of no excess homozygosity in ALS patients does not support this hypothesis of long runs of homozygosity containing rare ALS susceptibility variants. This does not necessarily mean that recessive inheritance can be ruled out, just that long runs of homozygosity were not found in our cohort.
With our dataset of case-parent trios we tested the hypothesis that rare, recessive-acting variants could contribute to disease susceptibility. Indeed, a number of promising candidate genes with recessive or compound heterozygous variants were identified. For example, in one family we identified two extremely rare variants in ABCA2 that are highly conserved and are predicted to be damaging. ABCA2 encodes an ATP-binding cassette transporter and plays a role in intracellular sterol trafficking. It is highly expressed in the brain and regulates low-density lipoprotein metabolism in neuronal cells 14 . Dysregulation of ABCA2 is associated with amyloid beta deposition in Alzheimer's disease 15 , and ABCA2 null mice accumulate more gangliosides and sphingomyelin in neuronal tissue compared to wild-type mice 16 .
We identified a recessive variant in RAB25 in one ALS TRIO patient. This gene encodes a protein involved in membrane trafficking and has nucleotide binding capacity. A meta-analysis of genome-wide association studies showed that a common variant at the SYT11/RAB25 locus is associated with Parkinson's disease in Caucasians 17 , suggesting a role for this gene in neurodegenerative diseases.
CACNA1H encodes a protein in the voltage-dependent calcium channel complex, and we identified two extremely rare damaging variants inherited as a compound heterozygote in one ALS TRIO patient. Dysregulation of calcium homeostasis in spinal and motor neurons has been previously demonstrated in mouse models of ALS 18 . This leads to altered excitability of motor neurons with modified synaptic activity and neuronal excitotoxicity 19 . Of interest, in presymptomatic ALS patients cortical hyperexcitability appears to be an early feature 20 . Our results give more weight to the idea that variants in voltage-dependent calcium channel genes play a role in ALS susceptibility. Functional annotation analysis of the recessive variants showed enrichment for genes that are involved in the dynein heavy chain (DNAH10, DNAH2 and DNAH9). This is of interest since defects in axonal transport have long been suspected to play a part in ALS 21 . A group of five genes, comprising the three dynein-related genes above, as well as ABCA2 and ATP8B3, were enriched for ATPase activity. Na,K-ATPase has been suggested to be involved in mutant-SOD1 ALS 22 , but data on the activity of other forms of ATPase in ALS are sparse, despite the fact that altered energy metabolism is a possible mechanism in ALS 23 . Finally, the above five genes, as well as a further three (CNGA4, MYO3B and RAB25) are enriched for nucleotide binding activity. Of note, caution needs to be exercised in attributing importance to the variants in DNAH10 and MYO3B since exome sequencing frequently finds variants in these genes 24 .
Recent studies of individual SALS patients and their parents have identified de novo variants in ALS-associated genes such as FUS 25 and CREST 26 . Other sporadic disorders such as autism spectrum disorder demonstrate a similar pattern of recurrent de novo variants 27 . Interestingly, we identified a novel de novo initiator codon variant in CHRM1, a gene that also harbored a de novo missense variant in a previous ALS exome trio study 10 . This gene encodes a cholinergic receptor and is predicted to be involved in diseases of motor neurons and frontotemporal dementia, which is related to ALS. CHRM1 is predominantly expressed in the parasympathetic nervous system and influences the effects of acetylcholine in the central and peripheral nervous systems. In patients with Alzheimer's disease, loss of CHRM1 exacerbates cognitive decline 28 and increases amyloid pathology 29 . In spinal cord injuries significantly reduced gene expression of muscarinic cholinergic receptors intensifies motor dysfunction 30 . Our results further support the hypothesis that damaging variants in CHRM1 contribute to neurodegenerative disorders such as ALS.
Of note, CHRM1 was the only gene in which de novo variants (in different regions of the gene) were found in both our and the previous ALS trio exome study of Chesi et al. 10 , with the two studies containing a total of 91 ALS patients. This implies that, if de novo mutations do play a major part in ALS, large numbers of private mutations in different genes are likely to be responsible for the disease.
We identified a novel coding de novo variant in ITPR2. Common variations in this gene have been associated with ALS, with the expression of ITPR2 being increased in the peripheral blood of ALS patients 31 . ITPR2 is highly expressed in motor neurons where it encodes a calcium channel on the endoplasmic reticulum, the latter a site of great interest in ALS 32 . Dysfunction of ITPR2 with increased intracellular calcium may lead to motor neuron cell death 31 and overexpression of murine ITPR2 in the SOD1 G93A ALS mouse model damages cells by increasing the release of neuronal calcium 33 . In neuronal cell lines, oxidative stress leads to calcium dysregulation by upregulating ITPR2 expression, which increases calcium release into the nucleus 34 . The association of ITPR2 with ALS has not been replicated in other genome-wide association 35 or single nucleotide variant studies, though these only assayed common, and not rare, variants. Our results, on the other hand, suggest that rare variants in this gene may contribute to the pathogenesis of ALS.
Functional annotation analysis of our de novo variants showed seven that are related to transcription regulation (LIMD1, FOXN3, GTF2H4, MLL3, STK36, SND1 and TRRAP), which is in keeping with the findings of abnormal RNA transcription and processing in ALS 36 . Caution, though, needs to be exercised in attributing significance to the de novo variant found in STK36 since this gene frequently contains variants in exome sequencing 24 . The finding of de novo variants in three genes related to the cell cycle (ANAPC7, FOXN3 and PSMB7) is in accord with suggestions that cell cycle abnormalities underlie some instances of ALS 37 .
It has been suggested that ALS may be caused by variants in a number of genes within one individual, the so-called oligogenic hypothesis 38 . Our findings support this hypothesis, since nine ALS TRIO patients had more than one gene with either a recessive or de novo rare variant. The Poisson distribution for novel de novo coding variants in our study suggests that these variants alone are unlikely to be involved in an oligogenic process. However, 75% of our ALS patients who had de novo variants had concurrent recessive or other de novo variants; only 3 patients had a single de novo variant, and in these it is quite possible that other recessive or de novo variants outside of the exome sequencing targets could play contributory roles.
Although evidence for an oligogenic mechanism for ALS was present in the present study, we looked only at single nucleotide variants. Other genetic abnormalities, such as copy number variants, DNA methylation 39 , or somatic mutations 40 could interact with the variants we found in our ALS TRIO patients to confer further susceptibility to disease. For example, when 12 of the present ALS TRIO patients had genome-wide CNVs analysed with microarrays in a previous study 40 , de novo CNVs were found in 11 of them (Table 1). CNVs that overlapped with genes or promoters were found in eight of these patients, including three with multiple CNVs.
ALS trio studies are uncommon, and without access to parent DNA we do not know how many mutation-carrying parents of ALS patients never develop the disease, or develop it at a much older age than their offspring. ALS-associated variants were found in our study in four ALS TRIO patients as well as in an unaffected parent; one of these was in SOD1, two in C9orf72, and one in TDP-43 41 . Either environmental or modifying genetic variations could be responsible for this difference in phenotype between parent and offspring. Unaffected mutation-bearing parents could also carry a protective genetic variant elsewhere in their genomes. Our finding that all four of the above ALS TRIO patients had additional single nucleotide or copy number variants suggests that other genetic variants may be needed for the ALS phenotype to appear in some patients who have apparently single gene mutations.
Limitations of the present study are: (1) Our ALS patients had a younger average age of onset than is usual for this disease, so they could represent a different subgroup where genetic variants are more common than in most sporadic ALS. (2) A parent could present with the onset of ALS much later in life (a not uncommon clinical scenario), so we cannot be sure that the ALS in our trios was truly of an isolated/sporadic nature. (3) We did not analyse the whole genome, so potentially significant recessive or de novo variants in intronic or intergenic regions could not be detected. (4) Further assessment of compound heterozygote and de novo variant frequency in ALS will only be able to be undertaken once larger numbers of ALS trios become available, which will require an international collaborative effort. A number of groups are presently undertaking exome sequencing on large numbers of individual ALS patients, so whether the variants we found are truly private mutations or are more common will soon be known. Not having parental DNA, however, means these studies will not be able to determine whether the variants are actually recessive or de novo in nature. (5) Exome capture is inherently biased towards the creation of false positives. For this reason we imposed several quality control steps in an attempt to filter out false positives. For the de novo variants we used Polymutt software that takes into account the parental genotypes when calling a de novo variant in the offspring. Of the variants that did not validate, 14 were due to poor Sanger data quality in one or both of the parents and four were due to poor Sanger data quality in the offspring. Seven variants were actually homozygous reference in the child and two variants were present (but undercalled from exome data) in the parents, representing true false positives. Although there are slightly more de novo variants than would be expected from a true exome (<1 per family), most of these were non-coding or silent mutations. There were only 17 coding (15 missense) de novo variants out of the 44 families; this number is in line with de novo coding events in other exome studies 42 . (6) Because of our relatively small number of trios, we did not have sufficient numbers to undertake a rare variant transmission disequilibrium test (TDT) that would yield adequate statistical power 43,44 .
As is common in other genomics-based studies, we expect that the variants we found will be a springboard for other researchers to develop model systems to further explore their functionality. We consider all our 28 recessive and 17 de novo variants to be strong candidates for a role in ALS, since our vigorous in silico analyses ensured that we reported only validated variants that are rare and involved in processes or metabolic pathways implicated in ALS. Complex model systems will be needed to test the functionality of these variants, since testing has to take into account the probability that multiple variants are acting together, and that exposure to environmental toxins, such as heavy metals 45 and neurotoxic amino acids 46 , is also playing a part in the disease. Future studies using a combination of whole genome nucleotide sequences, structural variations, and epigenetic differences, using multiple tissues to look for somatic mutations, and obtaining DNA from multiple generations, are likely to be needed to uncover all the variants comprising the genetic contribution to sporadic ALS.
In conclusion, our exome sequencing of ALS-unaffected-parents trios has uncovered rare homozygous, compound heterozygous, and de novo variants that are likely to play a role in the pathogenesis of this disease. Most of these appear to be private variations, which implies that we will be unlikely to find any more mutations (such as those in C9orf72) that are common to large numbers of sporadic ALS patients. The implications of this study are four-fold: firstly, there are no previously published ALS trio exome studies showing the widespread occurrence of potentially deleterious compound heterozygous variants. Secondly, only one previous ALS trio study has demonstrated de novo variants, and our study confirmed these do occur, though in different genes (apart from one shared between the two studies), indicating that most are likely to be rare or private variants. Thirdly, we validated extremely rare, highly conserved, deleterious recessive mutations in our sporadic ALS patients. Hidden recessive inheritance in ALS has been hypothesised for many years, and we have now been able to show the importance of this mode of inheritance that could explain the sporadic nature of some ALS, possibly in combination with other recessive or de novo variants. Finally, our findings give the best evidence so far that oligogenic variants underlie much of sporadic ALS.

Methods
Ethics statement. The study protocol was approved by the Sydney South West Area Health Service Human Research Ethics Committee. Informed written consent was obtained from each individual for their DNA to be used for research purposes. All methods were carried out in accordance with the approved guidelines and regulations.
SALS patients and unaffected parents. Individuals selected for study were patients with ALS who had donated blood samples to the Australian Motor Neuron Disease DNA Bank, and whose ALS-unaffected parents had also given blood samples to the Bank. The diagnosis of ALS was made by a neurologist using standard criteria. For the purpose of this study, patients were considered to have "sporadic" ALS if they had no history of ALS in any family member at the time of blood sampling, even if an ALS-associated mutation was found in that patient and their family member. All ALS offspring in this study are referred to as "ALS Trio" (ALS TRIO) patients.
ALSoD analysis. Data from the ALS online database (http://alsod.iop.kcl.ac.uk/) were downloaded, and variant calls from the exomes were intersected with the previously identified ALS susceptibility variants. The affected and unaffected carrier frequencies were calculated using GEMINI, a framework for exploring genome variation.
Homozygous segments. Thirty of 44 ALS probands were genotyped on the Illumina Human Omni Express Bead Chip. Control data were drawn from the 379 European descent 1000 Genomes individuals that were genotyped on the Illumina Omni 2.5 Bead Chip. Data for both sets were imported into the whole genome analysis tool PLINK (v.1.07) 47 and standard quality control procedures were applied. Samples were excluded if they had call rates < 95%, and single nucleotide polymorphisms (SNPs) were excluded if they had minor allele frequencies < 0.01, Hardy-Weinberg equilibrium p-values < 0.0001, or non-random missingness in cases versus controls. The two datasets were combined and the intersection of the two marker lists was used for a total of 667,708 SNPs genome-wide. SNPs were pruned based on linkage disequilibrium using a "light" pruning scheme 48 , where SNPs with r² > 0.9 in a 50 SNP window were removed, leaving 307,288 SNPs. In addition, none of the cases were outliers from the 1000 Genomes European ancestry populations based on PLINK multidimensional scaling analysis. Runs of homozygosity (segments > 2 Mb) were identified from the autosomal chromosomes in PLINK 4 and burden and association analyses were performed as previously described 4 , with the exception that the gene list was taken from the UCSC Table Browser hg19 RefSeq genes. Briefly, homozygous segments were coded as copy number variants and analyzed using the PLINK rare copy number variant burden and association analysis. p-values were generated from 100,000 case/control permutations and statistical significance was set at a genome-wide corrected p-value < 0.05.
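The idea of a run of homozygosity can be shown with a deliberately simplified sketch. It scans one chromosome for maximal stretches of consecutive homozygous SNP calls spanning more than 2 Mb. This is not PLINK's sliding-window algorithm, which also tolerates a limited number of heterozygous or missing calls; here any heterozygous or missing call ends the run. The function name and the genotype encoding are assumptions.

```python
def runs_of_homozygosity(positions, genotypes, min_length_bp=2_000_000):
    """Return (start, end) base-pair intervals of consecutive homozygous
    calls longer than min_length_bp on a single chromosome.

    positions: SNP coordinates in ascending order.
    genotypes: alternate-allele counts (0 or 2 = homozygous,
               1 = heterozygous, None = missing).
    """
    runs = []
    start = None   # first SNP of the current run
    last = None    # most recent homozygous SNP in the current run
    for pos, gt in zip(positions, genotypes):
        if gt in (0, 2):
            if start is None:
                start = pos
            last = pos
        else:
            # A heterozygous or missing call closes the current run.
            if start is not None and last - start > min_length_bp:
                runs.append((start, last))
            start = None
    if start is not None and last - start > min_length_bp:
        runs.append((start, last))
    return runs

# Hypothetical SNP positions and calls
positions = [0, 1_000_000, 2_500_000, 3_000_000, 4_000_000, 7_000_000]
genotypes = [0, 2, 2, 1, 0, 0]
print(runs_of_homozygosity(positions, genotypes))
```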

Functional implications. To predict the functional implications of the identified variants, lists of de novo and recessive variants were generated from the exome sequencing data and submitted to the Database for Annotation, Visualization and Integrated Discovery (DAVID 6.7) 49 . The NimbleGen SeqCap EZ Human Exome gene list was used as a background. Potential functional enrichments and pathway analysis were explored, with p-values < 0.05 and false discovery rates < 0.1 selected as significant. Pathway analysis was also undertaken on the combined de novo variant findings in our and in the Chesi et al. ALS trio exome study 10 .
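The logic of an enrichment test with a p-value and a false discovery rate threshold can be sketched as follows. This is a plain hypergeometric upper-tail test with a Benjamini-Hochberg adjustment, not DAVID's own procedure (DAVID reports a modified Fisher exact score), and all counts in the usage example are hypothetical rather than taken from this study.

```python
from math import comb

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from a background of N genes,
    of which K carry the annotation of interest."""
    total = comb(N, n)
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical example: 7 of 17 candidate genes carry an annotation
# found in 2,000 of 20,000 background genes.
p = hypergeom_upper_tail(k=7, n=17, K=2000, N=20000)
fdr = benjamini_hochberg([p, 0.04, 0.03, 0.5])
significant = [pv < 0.05 and q < 0.1 for pv, q in zip([p, 0.04, 0.03, 0.5], fdr)]
print(p, fdr, significant)
```

The choice of background matters: using the capture kit's gene list, as described above, avoids counting genes that could never have been observed.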