Introduction

Narcolepsy is a sleep disorder characterized by excessive daytime sleepiness, cataplexy (sudden loss of muscle tone in response to strong emotions), and pathological manifestations of rapid eye movement (REM) sleep, such as hypnagogic hallucinations, sleep paralysis and sleep-onset REM sleep. Patients usually develop narcolepsy around adolescence. It affects men and women equally, and its prevalence is 0.16–0.18% in Japan and 0.02–0.06% in the United States and Europe.1, 2 The relative risk of narcolepsy in first-degree family members of patients with narcolepsy is 10- to 40-fold higher than that in the general population.1

Multiple factors are associated with the development of narcolepsy, including genetic variations at several loci. Initially, it was discovered that narcolepsy was closely associated with a human leukocyte antigen (HLA) class II allele, HLA-DQB1*06:02.3, 4, 5, 6, 7 Almost all narcoleptic patients carry HLA-DQB1*06:02. This HLA allele is thought to be a requisite for the development of narcolepsy, but it cannot fully explain the onset of narcolepsy because 10–40% of individuals in the general population carry it as well. Genome-wide association studies have identified several narcolepsy susceptibility loci: a locus near CPT1B (carnitine palmitoyltransferase 1B), which may be involved in a new pathogenic mechanism related to fatty acid oxidation,8 as well as TRA@ (T-cell receptor alpha) and P2RY11 (purinergic receptor P2Y, G-protein coupled, 11).9, 10, 11, 12 These discoveries provided evidence that narcolepsy is caused by the immune system attacking brain cells.

The destruction of hypothalamic neurons that produce the wake-promoting neuropeptide hypocretin (orexin) causes narcolepsy.13, 14 The amount of hypocretin-1 in the cerebrospinal fluid of narcoleptic patients is low or undetectable.15, 16 However, mutations and polymorphisms in the prepro-hypocretin and hypocretin-receptor genes do not contribute considerably to the onset of narcolepsy, except in rare cases.17, 18, 19, 20

Essential hypersomnia (EHS) is another type of sleep disorder that is characterized by excessive daytime sleepiness. However, EHS patients do not exhibit cataplexy. Both genetic and environmental factors are considered to contribute to the development of EHS.15, 21 Diagnosis of EHS is made based on the following criteria for central nervous system (CNS) hypersomnias: (i) recurrent daytime sleep episodes that occur almost every day over a period of at least 6 months; (ii) absence of cataplexy; and (iii) the hypersomnia is not better explained by another sleep disorder, medical or neurological disorder, mental disorder, medication use or substance use disorder.22, 23, 24, 25 If the criteria of the International Classification of Sleep Disorders, 2nd edition (ICSD-2; AASM2005) were applied to EHS patients, they would be classified into one of two CNS hypersomnias: narcolepsy without cataplexy or a part of idiopathic hypersomnia without long sleep time. Previous studies have reported that EHS and narcolepsy share several susceptibility genes.22, 23, 24, 25, 26 Of note, approximately 30–50% of EHS patients carry HLA-DQB1*06:02.22, 23 In addition, EHS is associated with narcolepsy susceptibility single-nucleotide polymorphisms (SNPs) in CPT1B and TRA@,22, 26 and it has been reported that EHS and narcolepsy share partially similar pathogenic mechanisms.25

In this study, we evaluated the contribution of common variants to narcolepsy onset. The resulting estimates will be useful for designing future strategies to identify the as yet unidentified genetic factors involved in narcolepsy onset.27, 28 We also assessed the extent of genetic overlap between narcolepsy and EHS, which can enable phenotypic classification based on the genetic architecture. In addition, the reliability of our results was assessed by including individuals with panic disorder and autism.

Materials and methods

Subjects

The participants in this study were all Japanese and included 426 narcoleptic patients,29 46 EHS patients with HLA-DQB1*06:02, 125 EHS patients without HLA-DQB1*06:02,30 432 individuals with panic disorder,31 246 individuals with autism32 and 2072 healthy individuals. All narcoleptic patients carried HLA-DQB1*06:02. Among the 171 EHS patients, 46 individuals possessed HLA-DQB1*06:02 (EHS with HLA-DQB1*06:02), while the remaining 125 EHS patients did not (EHS without HLA-DQB1*06:02).22 Each patient with narcolepsy or EHS had received a diagnosis at the Neuropsychiatric Research Institute by specialists for sleep disorders. Data from individuals with neuropsychiatric disorders, including panic disorder and autism, were obtained in previous studies31, 32, 33, 34 and were included for analysis in this study.

Data from healthy individuals genotyped in previous studies31, 32, 33, 34, 35 were also included for analysis in this study as healthy controls. The control subjects did not have a history of narcolepsy or CNS hypersomnia. Age and gender were not matched between the cases and the controls. Ethical approval was obtained from the local institutional review boards of all participating organizations. All individuals provided written informed consent for participation in the study.

Genotyping and quality control

All samples were genotyped for 906 622 SNPs using the Affymetrix Genome-Wide SNP Array 6.0 (Affymetrix, Santa Clara, CA, USA) (http://www.affymetrix.com). Genotype calling was performed with the Birdseed algorithm in Birdsuite (http://www.broadinstitute.org/).36 Quality control procedures were performed using PLINK (http://pngu.mgh.harvard.edu/~purcell/plink/) (Supplementary Figure 1).37 Samples with overall call rates lower than 97% were excluded. Samples from individuals who had reported a family relationship with other participants, or who had a mean probability of identity-by-descent (PI_HAT) value greater than 0.125 as calculated by PLINK, were excluded until only one sample remained from each set of reported or estimated family members. Outliers identified by principal component analysis using EIGENSOFT (http://www.hsph.harvard.edu/alkes-price/software/)38 were also excluded to eliminate population stratification. In the principal component analysis, data from 91 JPT (Japanese in Tokyo, Japan), 90 CHB (Han Chinese in Beijing, China), 180 CEU (Utah residents with Northern and Western European ancestry) and 180 YRI (Yoruba in Ibadan, Nigeria) obtained from the HapMap Project (http://hapmap.ncbi.nlm.nih.gov/)39 were also included. Data from the HapMap populations and the present sample sets were combined after the quality control steps described above were performed. SNPs were excluded if (1) the minor allele frequencies (MAFs) were less than 0.05; (2) P-values from the Hardy–Weinberg Equilibrium (HWE) test for either the patient groups or the healthy control groups were less than 0.001; (3) SNP call rates were less than 99%; or (4) the SNPs were located within the HLA region (chr6: 27 539 703–35 377 701; hg18) or on sex chromosomes.
After the quality control steps, 476 446 SNPs in 393 narcoleptic patients, 38 EHS patients with HLA-DQB1*06:02, 119 EHS patients without HLA-DQB1*06:02, 376 individuals with panic disorder, 213 individuals with autism and 1582 healthy individuals were analyzed in this study (Supplementary Figure 1).
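The SNP-level exclusion criteria above can be illustrated with a minimal Python sketch. It is not the PLINK procedure used in this study: the function name, the genotype encoding (allele counts of 0, 1 or 2 per sample, with None for a missing call) and the chi-square approximation to the HWE test are assumptions made for illustration, and the positional filters (HLA region, sex chromosomes) are omitted.

```python
import math


def snp_passes_qc(genotypes, maf_min=0.05, call_rate_min=0.99, hwe_p_min=0.001):
    """Return True if one SNP passes the call-rate, MAF and HWE filters.

    genotypes: one entry per sample, the count (0, 1 or 2) of one allele,
    or None for a missing call. This encoding is hypothetical.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return False
    # (3) SNP call rate must be at least 99%
    if len(called) / len(genotypes) < call_rate_min:
        return False
    n = len(called)
    q = sum(called) / (2 * n)  # frequency of the counted allele
    p = 1 - q
    # (1) minor allele frequency must be at least 0.05
    if min(p, q) < maf_min:
        return False
    # (2) HWE: 1-df chi-square test of observed vs expected genotype counts
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (called.count(0), called.count(1), called.count(2))
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    hwe_p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
    return hwe_p >= hwe_p_min
```

In the study the HWE criterion was applied to the patient groups and the healthy control groups separately, so a function like this would be called once per group with the other thresholds relaxed as needed.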

Statistical analysis

Polygenic risks were calculated using SNPs with r²<0.25 within a 200-kb window in order to exclude secondary associations due to linkage disequilibrium. The model of logistic regression analysis was

$$\log\frac{P(\mathrm{case})}{1-P(\mathrm{case})} = \beta_0 + \beta_1 x_1$$

where x1 is the collective polygenic risk score (0–1) for each individual. In its simplest form, the collective polygenic score for an individual is the sum of the number of risk alleles that the individual carries across the SNPs. In practice, when the risk alleles were summed, each risk allele was weighted by its chi-square value. To calculate this collective polygenic score, the ‘--score’ function in PLINK was utilized. Nagelkerke’s pseudo R² was utilized to estimate the ability of polygenic risks to explain the disease onset. As narcolepsy is closely associated with HLA-DQB1*06:02, the HLA region (chr6: 27 539 703–35 377 701; hg18), which is linked to HLA risk,3, 4, 5, 6, 7 was excluded to eliminate the HLA region linkage complexity. Instead, SNP rs7744293, which is in high linkage disequilibrium with HLA-DQB1*06:02 (r²=~0.8), was included in the logistic regression model as a surrogate marker so that the estimation of polygenic risks of narcolepsy reflects HLA-DQB1*06:02 effects,

$$\log\frac{P(\mathrm{case})}{1-P(\mathrm{case})} = \beta_0 + \beta_1 x_1 + \beta_2 x_2$$

where x1 is the collective polygenic risk score (0–1) for each individual and x2 is 0 or 1 according to the absence or presence of the risk allele of SNP rs7744293, which serves as a surrogate for positivity of the HLA-DQB1*06:02 allele. In addition, to assess the risks explained by the reported susceptibility SNPs in the Japanese population, that is, those in CPT1B, TRA@ and P2RY11, SNPs located within 500 kb upstream and downstream of the reported SNPs were included.8, 9, 10 The logistic regression analysis was conducted using the PredictABEL package (http://www.genabel.org/) provided in R (http://www.r-project.org/).40, 41 In this assessment, we did not control for the difference in case–control ratios between the sample set in this study and the Japanese population. In addition, GCTA (http://cnsgenomics.com/software/gcta/)42 was utilized to reinforce the reliability of our analysis. For the GCTA analysis, a total of 1975 samples, consisting of 393 narcoleptic patients and 1582 healthy individuals, were analyzed. We also estimated the required sample size using the power.prop.test3 function provided in R (http://aoki2.si.gunma-u.ac.jp/R/power_prop_test2.html). All statistical procedures were conducted using PLINK (http://pngu.mgh.harvard.edu/~purcell/plink/) and R (http://www.r-project.org/).37, 41
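As an illustration of the two quantities described above, the following Python sketch computes a weighted polygenic score and Nagelkerke's pseudo R². It is a sketch under stated assumptions, not the PLINK and PredictABEL pipeline used in this study: the function names are hypothetical, the score is scaled to 0–1 by dividing by twice the sum of the weights (one possible normalization), and the fitted case probabilities are assumed to come from a logistic regression fitted elsewhere.

```python
import math


def polygenic_score(risk_allele_counts, weights):
    """Weighted fraction of risk alleles carried by one individual (0-1).

    risk_allele_counts: 0, 1 or 2 per SNP; weights: one weight per SNP
    (for example the chi-square value from the discovery stage).
    """
    weighted_sum = sum(w * c for w, c in zip(weights, risk_allele_counts))
    return weighted_sum / (2 * sum(weights))


def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes y under probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1 - pi)
               for yi, pi in zip(y, p))


def nagelkerke_r2(y, fitted_p):
    """Nagelkerke's pseudo R2 from case status y (0/1) and fitted probabilities."""
    n = len(y)
    case_fraction = sum(y) / n
    ll_null = log_likelihood(y, [case_fraction] * n)  # intercept-only model
    ll_full = log_likelihood(y, fitted_p)             # model with predictors
    cox_snell = 1 - math.exp(2 * (ll_null - ll_full) / n)
    max_cox_snell = 1 - math.exp(2 * ll_null / n)     # upper bound of Cox-Snell
    return cox_snell / max_cox_snell
```

Dividing the Cox–Snell value by its maximum is what allows the statistic to reach 1 for a perfectly fitting model, which is why it is convenient for reporting the proportion of risk explained.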

The study design is shown in Supplementary Figure 2. A permutation analysis was conducted in order to randomize the assignment of samples to the two stages, the discovery stage and the test stage. The case–control status of each sample was preserved in this permutation analysis. The permutation was repeated 1000 times. In addition, within each permutation, the same healthy controls were used in the test stage across the different diseases.
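The stage assignment described above can be sketched in Python as follows. This is an illustration only: the function name, the sample identifiers and the 50% discovery fraction are assumptions (the actual proportions are given in Supplementary Figure 2, which is not reproduced here). Cases and controls are shuffled separately so that case–control status is never altered.

```python
import random


def split_discovery_test(cases, controls, discovery_fraction=0.5, seed=None):
    """Randomly assign samples to discovery and test stages, keeping status."""
    rng = random.Random(seed)

    def split(sample_ids):
        shuffled = list(sample_ids)
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * discovery_fraction)
        return shuffled[:cut], shuffled[cut:]

    discovery_cases, test_cases = split(cases)
    discovery_controls, test_controls = split(controls)
    return {
        "discovery": (discovery_cases, discovery_controls),
        "test": (test_cases, test_controls),
    }
```

Calling the function 1000 times with different seeds would give 1000 random assignments; the test-stage controls returned by one call could then be reused for every disease group evaluated within that permutation.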

Results

A total of 3347 samples were genotyped using Affymetrix Genome-Wide SNP Array 6.0 and the Birdseed algorithm. Samples were excluded based on overall call rate and family relationships. SNPs were excluded based on MAF, HWE, SNP call rates, their locations and linkage disequilibria, the HLA region and sex chromosomes. In the present study, 476 446 SNPs in narcoleptic patients (n=393), EHS patients with HLA-DQB1*06:02 (n=38), EHS patients without HLA-DQB1*06:02 (n=119), individuals with panic disorder (n=376), individuals with autism (n=213) and healthy controls (n=1582) were analyzed after the quality control procedures (Supplementary Figure 1).

One thousand permutations were performed in the discovery stage for genome-wide association studies of narcolepsy. The average lambda value of the 1000 permutations was 1.004, suggesting no population stratification. For each permutation, the narcolepsy risk effects of the SNPs were defined.
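For readers unfamiliar with the lambda value, a minimal Python sketch of the usual genomic inflation factor follows. It assumes the common definition (median of the observed 1-df association chi-square statistics divided by the median of the chi-square distribution with 1 degree of freedom); the study's own computation is not shown in the text, and the function name is hypothetical.

```python
import statistics

# Median of the chi-square distribution with 1 degree of freedom
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(chi2_stats):
    """Genomic inflation factor (lambda) from per-SNP 1-df chi-square statistics."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN
```

A value close to 1, such as the 1.004 reported above, indicates that the test statistics are not systematically inflated.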

Polygenic risks were computed for each set of narcoleptic patients and healthy controls in the test stage at seven different P-value thresholds (P<0.001, 0.01, 0.1, 0.2, 0.3, 0.4 and 0.5). As a result of the 1000-permutation analysis, narcolepsy polygenic risks were estimated to be 58.1% (PHLA-DQB1*06:02=2.30 × 10−48, Pwhole genome without HLA-DQB1*06:02=6.73 × 10−2) (Table 1 and Figure 1). In addition, the estimate obtained by GCTA was 59.7% (standard error=7.6%, P<0.05). Polygenic risks of narcolepsy other than the HLA region were estimated to be 1.3% (Pwhole genome without HLA-DQB1*06:02=2.43 × 10−2) (Table 2 and Figure 2). As the P-value threshold was relaxed and more SNPs were included in the analysis, the total risk continued to increase up to the threshold of P<0.5. This suggested that small-effect SNPs contributed to the onset of narcolepsy. In addition, the proportion of risk explained by the reported susceptibility SNPs in the Japanese population was calculated. Using only SNPs located within 500 kb upstream and downstream of the reported narcolepsy susceptibility genes CPT1B, TRA@ and P2RY11, the result was 0.8% (Pwhole genome without HLA-DQB1*06:02=9.74 × 10−2) (Table 2 and Figure 2).

Table 1 Phenotypic variance explained by polygenic risks for narcolepsy, including HLA-DQB1*06:02 effects
Figure 1 Phenotypic variance explained by polygenic risks for narcolepsy, including HLA-DQB1*06:02 effects. Nagelkerke’s pseudo R² was utilized to estimate the ability of polygenic risks to explain the disease onset. A full color version of this figure is available at the Journal of Human Genetics journal online.

Table 2 Phenotypic variance explained by polygenic risks for narcolepsy in each disease
Figure 2 Phenotypic variance explained by polygenic risks for narcolepsy, excluding HLA-DQB1*06:02 effects. Nagelkerke’s pseudo R² was utilized to estimate the ability of polygenic risks to explain the disease onset. A full color version of this figure is available at the Journal of Human Genetics journal online.

Next, we aimed to assess the ability of polygenic risks for narcolepsy to explain the onset of EHS with and without HLA-DQB1*06:02. Overall, no statistically significant estimates of narcolepsy polygenic effects on EHS either with or without HLA-DQB1*06:02 were observed when the HLA effects were excluded (Table 2 and Figure 2). However, EHS patients with HLA-DQB1*06:02 tended to have higher polygenic risks for narcolepsy than EHS patients without HLA-DQB1*06:02 (EHS with HLA-DQB1*06:02: 1.4%, Pwhole genome without HLA-DQB1*06:02=1.56 × 10−1; EHS without HLA-DQB1*06:02: 0.4%, Pwhole genome without HLA-DQB1*06:02=3.06 × 10−1) (Table 2 and Figure 2). We found a significant similarity between narcolepsy and EHS with HLA-DQB1*06:02 when HLA-DQB1*06:02 effects were included (narcolepsy: 58.1%, PHLA-DQB1*06:02=2.30 × 10−48, Pwhole genome without HLA-DQB1*06:02=6.73 × 10−2; EHS with HLA-DQB1*06:02: 40.4%, PHLA-DQB1*06:02=7.02 × 10−14, Pwhole genome without HLA-DQB1*06:02=1.34 × 10−1) (Table 1 and Figure 1).

To test disease specificity, polygenic analyses were extended to other neuropsychiatric diseases, including panic disorder and autism. Polygenic risks for narcolepsy did not explain the phenotypic variance of these neuropsychiatric diseases (panic disorder: 0.1%, Pwhole genome without HLA-DQB1*06:02=5.20 × 10−1, autism: 0.2%, Pwhole genome without HLA-DQB1*06:02=4.75 × 10−1) (Table 2 and Figure 2).

We also assessed the possibility that the healthy controls in the present study led to spurious estimations. All healthy controls were divided into pseudo-case and pseudo-control groups, and the polygenic risks of these groups were estimated using the same methodology, in addition to shuffling the 'pseudo' case–control status. The results showed that there was no polygenic contribution among healthy controls (0.2%, P=5.10 × 10−1).

We found that small-effect SNPs contributed to the onset of narcolepsy; thus, the sample sizes required to detect the remaining small-effect susceptibility SNPs were estimated (Supplementary Figure 3). A total of 42 000 narcoleptic patients would be required to detect targeted SNPs with an odds ratio of 1.1, assuming a risk allele frequency of 0.2 in the controls, a case/control ratio of 0.25 and an α-level of 5.00 × 10−8.
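The kind of calculation behind such an estimate can be sketched in Python with the standard normal-approximation formula for comparing two proportions with unequal group sizes. This is not the R function used in this study, and no attempt is made to reproduce the figure of 42 000: the statistical power and the unit of observation are not stated in the text, so both are left as explicit parameters, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def risk_allele_freq_in_cases(freq_controls, odds_ratio):
    """Risk allele frequency in cases implied by the control frequency and the OR."""
    odds = odds_ratio * freq_controls / (1 - freq_controls)
    return odds / (1 + odds)


def cases_needed(freq_controls, odds_ratio, alpha, power, controls_per_case):
    """Number of case observations for a two-sided two-proportion z-test.

    controls_per_case: ratio of control to case observations
    (4 corresponds to a case/control ratio of 0.25).
    """
    p0 = freq_controls
    p1 = risk_allele_freq_in_cases(p0, odds_ratio)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Unpooled variance of the difference, per case observation
    variance = p1 * (1 - p1) + p0 * (1 - p0) / controls_per_case
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p0) ** 2)
```

For example, `cases_needed(0.2, 1.1, 5e-8, 0.8, 4)` evaluates the settings above under an assumed power of 80%. Because the required number scales with the inverse square of the frequency difference, small changes in the odds ratio change the result substantially.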

Discussion

In this study, a polygenic analysis of narcolepsy and EHS was performed. We evaluated the extent of genetic factors for narcolepsy, both including and excluding HLA-DQB1*06:02 effects. We utilized 426 narcoleptic patients,29 46 EHS patients with HLA-DQB1*06:02 and 125 EHS patients without HLA-DQB1*06:02,30 who were derived from previous reports of our team. Polygenic risks of narcolepsy were calculated to be approximately 58.1% (PHLA-DQB1*06:02=2.30 × 10−48, Pwhole genome without HLA-DQB1*06:02=6.73 × 10−2) (Table 1 and Figure 1) and 1.3% excluding HLA effects (Pwhole genome without HLA-DQB1*06:02=2.43 × 10−2) (Table 2 and Figure 2). In addition, the estimate obtained by GCTA was 59.7% (standard error=7.6%, P<0.05), showing concordance between the different methodologies. HLA-DQB1*06:02 is thought to be essential in narcolepsy because almost all narcoleptic patients possess this allele. However, the allele itself cannot fully explain the onset of narcolepsy, because 10–40% of individuals in the general population carry this allele whereas the prevalence of the disease is around 0.1%.1 More than 50% of the first-degree relatives of individuals with narcolepsy possess this allele, but the risk of narcolepsy in these first-degree relatives is only 1–2%.43 In addition, two-thirds of monozygotic twins are discordant for the development of narcolepsy.1 These reports show that the onset of narcolepsy is associated with other genetic and environmental factors. In this study, the effects of genetic factors were evaluated, including the effects of genes located outside the HLA region.

Several genetic factors other than the HLA region have been reported to be related to narcolepsy in the Japanese population; these include the reported SNPs at CPT1B, TRA@ and P2RY11. In this study, the extent to which these factors explain polygenic risks of narcolepsy was calculated (Table 2 and Figure 2). SNPs located within 500 kb upstream and downstream of the reported SNPs were included in the analysis. Polygenic risks were estimated to be 0.8% (Pwhole genome without HLA-DQB1*06:02=9.74 × 10−2). The difference between polygenic risks explained by regions other than HLA (1.3%) and risks explained by CPT1B, TRA@ and P2RY11 (0.8%) was approximately 0.5%, suggesting the presence of some other genetic factors for narcolepsy susceptibility that have not yet been discovered.

EHS patients with and without HLA-DQB1*06:02 were evaluated to determine whether polygenic risks for narcolepsy contribute to the onset of EHS. No statistically significant polygenic risk factors for narcolepsy were observed in EHS patients either with or without HLA-DQB1*06:02. However, EHS patients with HLA-DQB1*06:02 tended to have higher polygenic risks for narcolepsy than EHS patients without HLA-DQB1*06:02 (Table 2 and Figure 2): in total, approximately 40.4% of polygenic risks in EHS patients with HLA-DQB1*06:02 could be explained by a genetic background of narcolepsy (Table 1 and Figure 1). These results suggest that EHS with HLA-DQB1*06:02 is more similar to narcolepsy than EHS without HLA-DQB1*06:02, which is concordant with previous studies.22, 26 In the reported association between EHS and TRA@, only EHS with HLA-DQB1*06:02 showed a significant association, while EHS without HLA-DQB1*06:02 did not.26 The definition of EHS with and without HLA-DQB1*06:02 did not follow that of ICSD-2. However, the classification of EHS with and without HLA-DQB1*06:02 seems to reflect the genetic features of the disease. Thus, genetic variants discovered to be associated with narcolepsy should be tested for an association with EHS with HLA-DQB1*06:02. Combined analyses for narcolepsy and EHS with HLA-DQB1*06:02 may be performed as appropriate.

Other neuropsychiatric disorders, that is, panic disorder and autism, were also evaluated in this study. Polygenic risks for narcolepsy could not explain the onset of these neuropsychiatric disorders, suggesting that narcolepsy does not share many genetic factors with these neuropsychiatric disorders. In future studies, comparisons between narcolepsy and several autoimmune diseases are needed as narcolepsy is strongly associated with HLA.

In studies of polygenic risk, it is essential to assess whether the data of healthy control samples affected the estimations of polygenic risk. Thus, the genetic features among the control samples were validated. Control samples were divided into pseudo-case and pseudo-control groups; and the same polygenic analysis was performed. There was no significant risk contribution among the healthy controls, indicating that the healthy controls were unbiased and usable for this analysis.

Next, the sample sizes needed to identify the remaining narcoleptic genetic factors were estimated (Supplementary Figure 3). In the future, an effort should be made to collect data from approximately 42 000 narcoleptic patients. At the same time, the sample size for the healthy controls must be four times larger than that of the patients. Therefore, it will be necessary to build a consortium of both patients and healthy controls.

Limitations of this study include assumptions inherent to the methods used. This polygenic analysis assumed an additive effect of each SNP across the genome. Dominance and epistatic effects were not taken into consideration for the estimations even though susceptibility SNPs with dominant effects have been reported in mammals44, 45, 46 and epistatic effects have been reported in humans.47 Because of this limitation, the genetic backgrounds that contributed to the development of these diseases were not fully described in this study.

In this study, we demonstrated the existence of as yet undiscovered genetic risk factors for narcolepsy. Also, a nominal polygenic similarity between narcolepsy and EHS with HLA-DQB1*06:02 was found. However, much effort, such as drastically increasing the sample sizes, is needed to identify the remaining genetic factors of narcolepsy. Therefore, collaboration is necessary to collect sufficient samples from both patients and healthy controls for further studies.