Introduction

Niemann-Pick type C (NPC) is a lethal autosomal recessive, neurodegenerative disorder with a clinical incidence of 1:104,000.1,2,3 This was considered to be a minimal estimate because of incomplete ascertainment of atypical phenotypes or limitations of current diagnostic testing. NPC is caused by disruption of either NPC1 or NPC2; mutations of NPC1 account for 95% of patients.1,2 Loss of function of either NPC1 or NPC2 results in the accumulation of unesterified cholesterol and glycosphingolipids within the late endosome/lysosome of all cells. Although the clinical presentation and progression of NPC is a continuous spectrum, patients can be classified into four general categories based on age at neurological onset. These categories are early infantile, late infantile, juvenile onset, and adolescent/adult onset.1 In the early infantile, late infantile, and juvenile forms of the disease patients may initially present with neonatal cholestasis or hepatosplenomegaly. A small subset of patients with NPC die of systemic liver disease, usually during the neonatal period.1 However, in the majority of NPC patients the liver disease frequently resolves, but neurological signs and symptoms follow.1,2 Neurological symptoms are insidious and heterogeneous in nature, often initially manifesting in a nonspecific manner (e.g., clumsiness or difficulty with school work) but commonly progress to include variable degrees of cerebellar ataxia, vertical supranuclear gaze palsy, gelastic cataplexy, seizures, and dementia. These neurological manifestations are invariably progressive4,5 and ultimately result in death.

The current diagnosis of NPC is based on filipin staining of unesterified cholesterol in cultured fibroblasts or molecular testing. Filipin staining requires a skin biopsy, is performed in only a few specialized diagnostic laboratories worldwide, and is not always conclusive. Molecular testing of NPC1 and NPC2 is also available; however, in practice it also has weaknesses. It remains inconclusive in 12–15% of cases because of unknown pathogenicity of the changes, the lack of study of allele segregation, and the existence of one (possibly two) unidentified mutant alleles. Combined with the frequently nonspecific and insidious nature of the neurological disease onset, the difficulty of diagnosis contributes to a diagnostic delay on the order of 4–5 years2 for the late infantile and juvenile forms of the disease. The diagnostic delay in the adolescent/adult-onset form is likely greater, but the full extent of that delay cannot be determined because of a limited number of reported cases. A sensitive blood-based diagnostic test, which detects elevated oxysterols, was recently developed; this test could be used to screen potential patients economically and rapidly.6

A number of therapies for NPC are actively being developed. Miglustat, a glycosphingolipid synthesis inhibitor, although not approved in the United States for treatment of NPC1, has been approved in the European Union and other countries for the treatment of NPC. 2-Hydroxypropyl-β-cyclodextrin has shown significant promise in both mouse and feline models of NPC1 (C. Vite, personal communication) and is currently in a phase I/II trial (NCT01747135) at the National Institutes of Health (NIH). The development of 2-hydroxypropyl-β-cyclodextrin for NPC1 has been reviewed by Ottinger et al.7 Other potential therapies under development include histone deacetylase inhibitors,8,9,10 heat shock protein 70 (unpublished data F.M.P.), and δ-tocopherol.11 Given the rapid development of potential therapeutic interventions, it is critical that the incidence of NPC and its full clinical spectrum be fully defined.

An increasing number of patients with adult-onset NPC are being reported.1,12,13,14 Psychiatric symptoms can be prominent,12,13,14,15 although affected adults without neurological manifestations have also been reported.16,17,18 The full phenotypic spectrum of adult-onset NPC disease has yet to be delineated. This led us to question whether the incidence of NPC might be greater than previous clinical estimates because of incomplete ascertainment. To estimate the incidence of NPC in a manner that is independent of clinical recognition of cases, we sought to determine a pathogenic carrier frequency of NPC1 and NPC2 variants using data from four independent, massively parallel exome sequencing projects, or next-generation sequencing projects. Our data indicate that the classical incidence of NPC likely occurs at the clinically predicted rate of ~1:90,000 and suggest that there may be a late-onset phenotype or variant form with an incidence potentially as high as 1:19,000–36,000.

Materials and Methods

We recently reported the determination of the pathogenic allele frequency of the 7-dehydrocholesterol reductase gene.19 We used a similar approach for the determination of the variant frequency in NPC.

Data sets

Four large, independent, massively parallel exome sequencing projects, or next-generation sequencing projects, were used. These data sets are the National Heart, Lung, and Blood Institute GO Exome Sequencing Project (ESP),20 version 3 release of the 1000 Genomes Project,21 ClinSeq,22 and a database from an NIH interinstitute collaboration on autism (primary investigators: F.D.P., J.E.B.-W., E Tierney, and A Thurm). ESP contributed a maximum number of 13,006 chromosomes, the 1000 Genomes Project contributed 2,184 chromosomes, ClinSeq contributed 1,902 chromosomes, and the NIH interinstitute collaboration on autism project contributed 662 chromosomes. Thus, a maximum total of 17,754 chromosomes were analyzed, and this number was used as the denominator in total frequency calculations. None of these data sets included patients evaluated for NPC, nor did we identify any individuals with two pathogenic mutations, so we considered them to be unbiased with respect to variation in NPC1 and NPC2.

Determination of variant calls and annotation

Variant calls were downloaded for regions overlapping NPC1 and NPC2, by Perl script, for every base of the coding exons plus or minus five base pairs of exon sequence, when available. Mutations were annotated using SNPnexus23 with Refseq annotations24; pathogenicity predictions were performed using PolyPhen-2,25 SIFT,26 and Mutation assessor.27 Intronic variations detected within five bases of intron–exon boundaries were analyzed by MaxEntScan.28 Untranslated region variations were excluded from the analysis of these data sets.

Determination of pathogenicity of the variant call

Determination of the pathogenicity of a variant allele was a multistep process that used both bioinformatic tools and manual curation. We began by comparing the variants found in the data sets against the professional version of the Human Gene Mutation Database (HGMD)29 and the existing database of 78 patients with NPC1 in the NIH Cohort (primary investigator: F.D.P.) to determine which variants had been previously identified in patients known to have NPC. Because inclusion in HGMD does not require identification in a patient, the primary literature was reviewed to determine the nature and manner in which the variations were detected. Variants were mapped onto known protein tertiary structures as part of the bioinformatic approach, identifying variable to conserved residues and possible interactions ( Figures 1 and 2 ). The variant NPC2 protein was modeled using I-TASSER.30

Figure 1

Mapping of the coding variants onto the known structure of NPC2. Probably damaging mutations are labeled with red circles. The human NPC2 structural model (from positions 20 to 149) was created using Modeller based on the bovine NPC2 structure (PDB:2HKA).37 The human NPC2 ribbon is colored according to evolutionary conservation using ConSurf server.38,39 Cholesterol sulfate (from PDB:2HKA)37 is shown in sticks. β strands are labeled (A–G).

Figure 2

Mapping human N-terminal domain (NTD) NPC1 mutants. Probably and possibly damaging mutations are labeled with red circles. The human NTD NPC1 (PDB:3GKI)40 ribbon was colored according to evolutionary conservation using the ConSurf server.39,40 Cholesterol is shown in sticks. None of the NTD NPC1 mutants is located at residues that interact with cholesterol.

Single coding nucleotide variants were interrogated in silico by three different predictive software packages: PolyPhen-2,25 SIFT,26 and Mutation assessor.27 PolyPhen-2 provides a predicted assignment of “benign,” “possibly damaging,” or “probably damaging,” as well as a false discovery rate for each single coding nucleotide variant call. For the determination of the pathogenicity of a single coding nucleotide variant, PolyPhen-2 calls of possibly damaging or probably damaging were considered pathogenic. SIFT uses the same terminology as PolyPhen-2, and the same approach was used. Mutation assessor has four predictive categories: nonfunctional (low), nonfunctional (neutral), functional (medium), and functional (high). Mutation assessor predictions of functional (medium) and functional (high) were considered pathogenic. When the three predictive algorithms were discrepant and no published data supporting pathogenicity were available, we accepted the prediction of two of the three programs. Potential splice variants were processed in MaxEntScan28 to provide a predictive determination of the variant’s effect on splicing; these were reported as “strongly negative,” “negative,” or “neutral.” Potential splice variants that were classified as negative or strongly negative were considered to be pathogenic. All pathogenic variants were assumed to be fully penetrant.
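The two-of-three consensus rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline used in the study: the function name `is_predicted_pathogenic` and the exact label strings are assumptions chosen to mirror the wording of this section, and the sketch covers only the in silico vote, not the literature review or manual curation.

```python
# Labels treated as pathogenic, following the criteria in the text.
# The exact strings are assumptions; real tool output may be formatted differently.
POLYPHEN2_PATHOGENIC = {"possibly damaging", "probably damaging"}
SIFT_PATHOGENIC = {"possibly damaging", "probably damaging"}  # text: same terminology as PolyPhen-2
MUTATION_ASSESSOR_PATHOGENIC = {"functional (medium)", "functional (high)"}


def is_predicted_pathogenic(polyphen2: str, sift: str, mutation_assessor: str) -> bool:
    """Return True when at least two of the three predictors call the variant pathogenic."""
    votes = [
        polyphen2.lower() in POLYPHEN2_PATHOGENIC,
        sift.lower() in SIFT_PATHOGENIC,
        mutation_assessor.lower() in MUTATION_ASSESSOR_PATHOGENIC,
    ]
    return sum(votes) >= 2


# Hypothetical calls for illustration only.
print(is_predicted_pathogenic("possibly damaging", "possibly damaging", "neutral"))  # True
print(is_predicted_pathogenic("benign", "benign", "functional (high)"))  # False
```

A simple majority vote keeps the rule symmetric across the three tools; any weighting of one predictor over another would be a separate design decision that the text does not describe.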

Determination of predicted disease incidence

Once potential pathogenic variants were identified and a carrier frequency determined, the predicted disease incidence was calculated assuming a Hardy-Weinberg equilibrium. For this estimate we assumed that all pathogenic variants were fully penetrant. The Hardy-Weinberg equilibrium model also assumes that allelic variation is at equilibrium and thus is not undergoing active selective pressure. Given that NPC1 is a receptor for filoviruses and is associated with body mass, an assumption of neutral selection may not be correct. However, Al-Daghri et al.31 concluded that selective pressure on NPC1 in humans is weak to neutral. We made the assumption that allelic frequencies were consistent across different ethnic groups represented in our data set. The potential error in making this assumption is greatest for the ESP cohort, given that it includes a large number of individuals of either European or African descent. We evaluated our data for reduction of heterozygosity resulting from ethnic difference (the Wahlund effect) by determining a weighted frequency; however, only negligible changes were observed for any of the NPC1 or NPC2 pathogenic alleles (data not shown). Given the negligible effect, the weighted frequencies were not applied to carrier frequency calculations.
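As a worked illustration of the calculation described above, the sketch below converts a count of pathogenic alleles into a predicted incidence. The text does not spell out the arithmetic, so the formula here is an inference: it treats the reported carrier rate (pathogenic alleles divided by chromosomes analyzed) as 2q and takes the incidence of affected individuals as q squared. The function name is hypothetical.

```python
def predicted_incidence_denominator(pathogenic_alleles: int, chromosomes: int) -> float:
    """Return N such that the predicted incidence is 1 in N.

    Assumption (inferred, not stated in the text): the carrier rate is
    pathogenic_alleles / chromosomes, the allele frequency q is half of that,
    and affected individuals occur at frequency q**2 under Hardy-Weinberg equilibrium.
    """
    carrier_rate = pathogenic_alleles / chromosomes
    q = carrier_rate / 2
    return 1 / q**2


# Allele counts and denominator taken from the Results section.
for label, alleles in [("NPC1, curated", 117), ("NPC2, curated", 21)]:
    n = predicted_incidence_denominator(alleles, 17_754)
    print(f"{label}: 1 in {n:,.0f}")
```

Under this assumption the curated allele counts give values that agree, to within rounding, with the incidences of 1/92,104 and 1/2,858,998 reported in the Results.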

Cloning and sequence analysis of the c.441+1G>A variant discovered in NPC2

Two heterozygous Epstein-Barr virus–transformed lymphoblast cell lines for the c.441+1G>A variant, NIMH 42 and NIMH 77, were identified in the NIH interinstitute collaboration on autism. These two lines and one control line were grown under standard growth conditions in 15% fetal bovine serum in RPMI (Life Technologies, Foster City, CA) for 3 days. Cell pellets were isolated and messenger RNA isolated per the manufacturer’s protocol (Qiagen, Gaithersburg, MD). Forward primer NPC2-F3 5′-GGTGGAGTGGCAACTTCAGG-3′ and reverse primer NPC2-R2 5′-CACTGGATACCATTGGAGAGC-3′ were used to reverse transcribe the messenger RNA using the Superscript III One-step RT-PCR System (Life Technologies). The complementary DNA was visualized on a 1.5% agarose gel. One band was observed for Control and two for NIMH 42 and NIMH 77. All bands were gel purified and cloned into the TOPO TA Cloning Kit for Sequencing (Life Technologies). Isolated colonies were grown overnight in Luria broth-ampicillin, and plasmid DNA was isolated (Qiagen). Sequencing was performed on a 3500xL Genetic Analyzer (Life Technologies) using a BigDye sequencing kit, per the manufacturer’s protocol.

Results

Analysis of exomic sequence data from 17,754 chromosomes compared with the human reference sequence for NPC1 and NPC2 led to the identification of 16,455 and 271 nonsynonymous sequence variants in NPC1 and NPC2, respectively. The 16,455 variants identified in NPC1 comprised 147 distinct variants that included 129 coding single-nucleotide base variants, 9 splice-site changes, and 9 insertions/deletions (indels; Supplementary Table S1 online). The 271 nonsynonymous variants identified in NPC2 included 14 distinct changes consisting of 12 coding single-nucleotide base variants and 2 splice-site changes ( Table 1 ).

Table 1 The 14 distinct variants detected in NPC2

The HGMD29 was queried to establish which observed variants in this data set might be pathogenic. For NPC1 (Supplementary Table S1 online) and NPC2 ( Table 1 ), 33 (32 pathogenic variants and 1 benign variant) of 147 variants (22.4%) and 5 of 14 variants (35.7%), respectively, had previously been reported in HGMD. One additional novel NPC1 variant, c.2524T>C (p.F842L), was present in the NIH cohort (Supplementary Table S1 online). The combination of PolyPhen-2, SIFT, and Mutation assessor classified 53 NPC1 and 8 NPC2 coding nucleotide variants as pathogenic based on our criteria. Of the predicted pathogenic variants, 27 (51%) and 6 (75%) have not been reported in HGMD for NPC1 and NPC2, respectively. PolyPhen-2 also calculates a false discovery rate. For predicted NPC1 and NPC2 variants, the average false discovery rate for a prediction of probably or possibly damaging was 0.04% and 0.03%, respectively. These low mean false discovery rates had a negligible effect on the carrier incidence estimate and thus were not applied to either NPC1 or NPC2 carrier frequency calculations.

For NPC1 and NPC2, two of nine and one of two potential splice mutations, respectively, were predicted to be pathogenic. Of the nine indels identified in NPC1, a two–base pair deletion, c.2020_2021del, was observed 319 times only in the ESP data set and thus was removed as a technical artifact unique to the ESP data set. The eight other NPC1 indels result in a frameshift and thus were considered pathogenic. No indels were identified in NPC2.

Based on the above analysis, for NPC1 we initially considered the 68 distinct variants meeting the criteria of pathogenic (54 identified by predictive software to be pathogenic, 4 indicated by the predictive software as benign but known to be pathogenic, the 2 splice variants, and the 8 indels). This accounted for 371 pathogenic alleles, with an estimated carrier rate of 2.09% (371/17,754) and a predicted NPC incidence of 1/9,160. Given the order of magnitude difference between this number and clinical estimates, this prediction is likely a significant overestimation. Thus we applied manual curation to the NPC1 data set. Four variants—c.665A>G (p.N222S), c.1532C>T (p.T511M), c.2882A>G (p.N961S), and c.3598A>G (p.S1200G)—accounted for 254 of the 371 predicted pathogenic alleles (68%). Allelic frequencies for these four alleles were 0.400, 0.287, 0.389, and 0.355%, respectively. Given that their individual allelic frequencies exceed by more than a factor of 10 the allelic frequency of p.I1061T (0.028%), the most commonly reported mutant allele in patients with mutations in NPC1 (Supplementary Table S1 online), it is not plausible that these alleles are associated with classical NPC disease. Excluding these four high-frequency variants based on this assertion left 117 pathogenic alleles, or a 0.659% (117/17,754) carrier rate. This carrier rate predicts an incidence of NPC attributable to NPC1 of 1/92,104.
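The frequency-based curation step above can be illustrated with a short sketch. It flags variants whose allelic frequency exceeds ten times that of p.I1061T and then recomputes the carrier rate after removing their alleles. The frequencies and allele totals are those reported in the text; the function name and data layout are assumptions made for illustration.

```python
# Allelic frequencies in percent, as reported in the text.
REFERENCE_FREQ = 0.028  # p.I1061T, the most commonly reported mutant allele
CANDIDATES = {
    "p.N222S": 0.400,
    "p.T511M": 0.287,
    "p.N961S": 0.389,
    "p.S1200G": 0.355,
}


def implausibly_common(freqs: dict[str, float], reference: float, factor: float = 10.0) -> list[str]:
    """Return variants whose frequency exceeds `factor` times the reference frequency."""
    return [name for name, freq in freqs.items() if freq > factor * reference]


flagged = implausibly_common(CANDIDATES, REFERENCE_FREQ)
remaining_alleles = 371 - 254  # predicted pathogenic alleles minus those from the four variants
carrier_rate_percent = 100 * remaining_alleles / 17_754
print(flagged)
print(remaining_alleles, round(carrier_rate_percent, 3))
```

All four variants clear the tenfold threshold, p.T511M only narrowly (0.287% against 0.28%), which is one reason the text goes on to review each of them individually rather than relying on the threshold alone.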

We further evaluated the decision to exclude the four high-frequency alleles based on lack of an association with classical NPC disease. Although all three predictive packages indicate both p.N222S and p.N961S to be nonpathogenic, these two variants have been reported in “visceral-only” or adult-onset NPC1 cases. The p.N222S variant was reported in combination with a p.I1061T mutation in a single patient (35 years old) with adult-onset NPC with variant filipin staining.32 This patient initially presented with visceral disease (hepatosplenomegaly) and later manifested ataxia at 44 years of age. We identified a p.N222S variant in combination with c.1402T>G (p.C468G) in teenage sisters diagnosed based on splenomegaly. The second allele in this sibling pair, p.C468G, is predicted by PolyPhen-2 to be probably damaging. Pathological analysis of the spleen in the older sibling was suggestive of Niemann-Pick disease, but filipin staining was inconclusive. Neurological symptoms were absent and signs were very minor, with deep tendon hyperreflexia and minor auditory brainstem response abnormalities noted upon evaluation at 15 and 13 years of age, respectively. The NIH severity score4 for both was 1. Plasma oxysterol concentrations were consistent with a diagnosis of NPC in these two subjects. Mapping of p.N222S to the known tertiary structure provided no additional evidence for the pathogenicity of this residue ( Figure 2 ). The p.N961S (c.2882A>G) variant has been reported in a compound heterozygous state with p.S666N (c.1997G>A, with a PolyPhen-2 prediction of probably damaging) in an adult case with subclinical hepatosplenomegaly and lymphadenopathy noted on autopsy following death caused by acute pulmonary embolism and myocardial infarction.16 Although no neurological symptoms were reported, brain pathology was notable for distended neurons with increased lipofuscin granules. Assuming one or both of these variants are pathogenic, fully penetrant, and associated with late-onset NPC disease, the total disease incidence of NPC1 would range from 1/19,077 to 1/36,420.
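The late-onset range quoted above can be approximated with the sketch below. Two things in it are assumptions rather than details from the text: the allele counts for p.N222S and p.N961S are back-calculated from their reported percentages, and the incidence formula (allele frequency taken as half the carrier rate, incidence as its square) is inferred from the reported figures. Because of the back-calculation, the output is expected to differ slightly from the published values.

```python
CHROMOSOMES = 17_754
CLASSICAL_ALLELES = 117  # curated NPC1 pathogenic alleles from the text

# Allele counts back-calculated from the reported percentages (an assumption).
N222S = round(0.400 / 100 * CHROMOSOMES)
N961S = round(0.389 / 100 * CHROMOSOMES)


def one_in(alleles: int, chromosomes: int = CHROMOSOMES) -> float:
    """Incidence denominator, assuming allele frequency is half the carrier rate."""
    q = (alleles / chromosomes) / 2
    return 1 / q**2


scenarios = {
    "p.N222S only": CLASSICAL_ALLELES + N222S,
    "p.N961S only": CLASSICAL_ALLELES + N961S,
    "both variants": CLASSICAL_ALLELES + N222S + N961S,
}
for name, alleles in scenarios.items():
    print(f"{name}: about 1 in {one_in(alleles):,.0f}")
```

The three scenarios bracket the reported range: adding both variants gives the highest incidence, and adding only the rarer of the two gives the lowest.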

Although predicted to be probably damaging by PolyPhen-2, neither p.T511M nor p.S1200G has been reported in NPC1 patients. Millat et al.33 reported p.T511M as a novel nonpathological coding single-nucleotide variant. The p.S1200G variant was reported in an “NPC uncertain” case in the recent ZOOM study.14 This subject, patient 5 in the study, was a compound heterozygote for p.V664M, a known NPC1 mutation, but plasma cholestane-3β,5α,6β-triol testing6 was negative. Current data do not support classification of either p.T511M or p.S1200G variants as pathogenic alleles.

Sequence analysis of NPC2 ( Table 1 ) identified 151 potential pathogenic alleles, yielding a pathogenic carrier frequency of 0.85% (151/17,754). Again, the predicted disease incidence (1/55,297) did not seem to be realistic unless one proposed an extreme degree of underascertainment. Thus we similarly applied manual curation to the NPC2 data set. Review of the NPC2 data identified two high-frequency variants that dominated the frequency calculation: c.441+1G>A and c.88G>A (p.V30M); both variants are reported in the HGMD. The splice variant c.441+1G>A was predicted to be “strongly negative” by MaxEntScan. Molecular analysis of independent cell lines revealed multiple splicing events. The most prominent errant splicing event results in the insertion of 16 bases, which leads to the alteration of the terminal 4 amino acids and the addition of 86 amino acids to the protein (Supplementary Figure S1 online). Multiple lines of evidence strongly indicate that this errant splicing results in a functional protein. First, the variant has not been reported in association with a patient with NPC. Second, modeling of the variant protein using I-TASSER30 found no alterations to the cholesterol binding pocket or stability of the protein (data not shown). Finally, Huang et al.34 demonstrated that an NPC2 fusion protein with mCherry fused to the carboxy-terminal end of the protein is fully functional and is able to correct the NPC cellular phenotype in Npc2−/− mouse embryonic fibroblasts. As such, we excluded c.441+1G>A as a pathogenic allele. The p.V30M variant, with an allelic frequency of 0.197%, is predicted to be possibly damaging by PolyPhen-2 and SIFT but is considered nonpathogenic by Mutation assessor. The one reported subject with NPC with the p.V30M variant was classified as having a phenotypic NPC variant, a second mutation was not identified, and near normal levels of cholesterol esterification were reported in skin fibroblasts.35 Inclusion of the p.V30M allele predicts a disease incidence of NPC attributable to NPC2 of 1/402,400 and that NPC2 should account for 18.6% of patients with NPC. This latter prediction conflicts with clinical data indicating that NPC2 accounts for only 2–5% of all patients with NPC.1,2 Sequence alignment and structural analyses demonstrate that the p.V30 residue is not evolutionarily conserved and is present in a structurally variable region of the NPC2 protein well away from its binding pocket ( Figure 1 ). Furthermore, p.V30M occurs at a higher frequency than any known pathogenic NPC2 allele; this, coupled with the lack of evidence supporting functional importance and ultimately the lack of any clinical correlation, has led us to exclude p.V30M as a pathogenic allele. We are, therefore, left with 21 pathogenic alleles (0.118% carrier frequency) and a predicted disease incidence for NPC2 of 1/2,858,998 conceptions.

Based on the above analysis of both NPC1 and NPC2, the combined incidence is predicted to be 1/89,229, or 1.12 cases per 100,000 conceptions, and the fraction of NPC2 cases is predicted to be 3.1%. The predicted number of cases is slightly more than the 0.96 cases per 100,000 conceptions reported by Vanier2 when she accounted for prenatal cases, and the fraction of NPC2 cases is consistent with prior clinical observation of 2–5%.1,2,3
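The combined figures in this paragraph follow from adding the two gene-specific incidences. The sketch below shows that arithmetic using only the incidences reported in the text; the helper name `combine` is hypothetical, and treating the two incidences as simply additive is an assumption of the sketch.

```python
def combine(one_in_a: float, one_in_b: float) -> tuple[float, float]:
    """Add two incidences given as '1 in N'; return (combined N, share attributable to b)."""
    rate_a, rate_b = 1 / one_in_a, 1 / one_in_b
    total = rate_a + rate_b
    return 1 / total, rate_b / total


# Curated NPC1 and NPC2 incidences from the text.
combined_n, npc2_share = combine(92_104, 2_858_998)
print(f"combined: 1 in {combined_n:,.0f}")
print(f"per 100,000 conceptions: {100_000 / combined_n:.2f}")
print(f"NPC2 share: {npc2_share:.1%}")

# Same calculation with p.V30M counted as pathogenic (NPC2 incidence 1/402,400).
_, share_with_v30m = combine(92_104, 402_400)
print(f"NPC2 share with p.V30M: {share_with_v30m:.1%}")
```

Running the same helper with and without p.V30M makes the curation argument concrete: the NPC2 share moves from about 3% to nearly 19%, well outside the 2–5% seen clinically.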

Discussion

The impact of NPC1 variation on human health may be significant. Work by multiple groups has demonstrated that c.644A>G (p.H215R) is associated with obesity.36 In this analysis of NPC1 variants, we identified the p.H215R variant in almost a third of the NPC1 alleles. Our work now demonstrates that two relatively common NPC1 variants, with a combined carrier frequency approaching 0.8%, may contribute, in a compound heterozygous state, to a late-onset NPC1 phenotype for which the phenotypic spectrum and clinical significance remain to be defined. This late-onset NPC1 phenotype may represent a milder manifestation of NPC1 deficiency with predominantly visceral manifestations. The degree to which this late-onset NPC1 phenotype is associated with high-frequency NPC1 alleles and the adult-onset NPC1 phenotype that includes significant neurological and psychological symptoms also remains to be defined.

Failure to ascertain certain alleles in patients, such as the p.V30M in NPC2 or the p.T511M and p.S1200G in NPC1, could be due to prenatal lethality; however, because NPC is an autosomal recessive disorder, it is difficult to hypothesize a plausible mechanism, such as a dominant inhibitory function, by which these alleles would uniquely result in prenatal lethality.

Based on clinical case reports, one needs to consider the possibility that p.N222S and p.N961S may be pathogenic, with allelic frequencies of 0.400 and 0.389%, respectively. The evidence for clinical relevance is strongest for p.N222S, which has been observed in two independent cases with similar visceral and delayed neurological manifestations, variant filipin staining in fibroblasts, and positive plasma oxysterol testing in siblings. Assuming pathogenicity is related to a compound heterozygous state and full penetrance, the combined frequency of p.N222S with another pathogenic NPC1 mutation would be 1/35,667. Although only limited data are available, if one includes p.N961S based on a single report with no supporting diagnostic testing, the incidence of a late-onset variant of NPC1 disease would increase to 1/19,077. Another possible explanation for these high predicted incidences is that some individuals harboring these variants, either in combination with another pathogenic allele or in the homozygous state, may be asymptomatic or manifest only subclinical signs.

Leveraging existing “whole-exome sequence” data, we estimated the disease incidence of NPC using both bioinformatic tools and manual curation. With respect to classical NPC disease, we estimate that the incidences of NPC1 and NPC2 are on the order of 1/92,000 and 1/2,900,000, respectively, with a combined incidence of ~1/89,000. These estimates are in agreement with previous clinical estimates. Thus, our data do not support significant underascertainment of classical NPC cases. Concurrence with clinical data also suggests that we are not missing a significant number of alleles, such as large indels or intronic mutations that are not detected by whole-exome sequencing. However, our data suggest that there may be significant underascertainment of a late-onset NPC1 phenotype. This late-onset phenotype may present as visceral-only or neurologically mild NPC1 and have a potential incidence of 1/19,000–1/36,000. Further work is necessary to fully delineate this late-onset NPC1 phenotype, but this study suggests that NPC should be considered in individuals with visceral lipidosis or unexplained neurological and psychiatric symptoms.

Disclosure

L. G. B. is an uncompensated adviser to the Illumina Corporation and receives royalties from the Genentech Corporation. The other authors declare no conflict of interest.