Introduction

Circadian rhythms of ~24 h are observed in behavioral and physiological processes such as the sleep–wake cycle. The suprachiasmatic nucleus in the hypothalamus is the principal circadian pacemaker in mammals. At the molecular level, circadian rhythms are generated by a transcriptional feedback loop of circadian clock genes [1, 2], including PER1/2/3, CRY1/2, CKIε/δ, BMAL1, and CLOCK. Chronotype (morningness–eveningness) is a behavioral manifestation of the internal biological clock. Family and twin studies have reported that variation in chronotype is heritable [3,4,5], and recent genome-wide association studies (GWAS) have identified loci associated with chronotype [6,7,8,9]. These loci include well-established circadian rhythm-related genes such as PER1, PER2, PER3, CRY1, BMAL1, FBXL3, and RGS16.

When individuals exhibit an extreme advance or delay in sleep onset, they have difficulty adapting to the conventions of social life. Circadian rhythm sleep–wake disorder (CRSWD) is defined as an alteration of the circadian time-keeping system or its entrainment mechanisms, or a misalignment between the endogenous circadian rhythm and the external environment. CRSWD is a group of sleep disorders, including advanced sleep–wake phase disorder (ASWPD) and delayed sleep–wake phase disorder (DSWPD), in which the sleep–wake rhythm is persistently or recurrently disrupted. ASWPD is characterized by a stable advance of the major sleep episode, such that habitual sleep onset and offset typically occur two or more hours earlier than the required or desired times. In several families, ASWPD is a highly penetrant, autosomal dominant trait, and missense mutations in CKIδ and PER2 that cause familial ASWPD have been identified [10, 11]. Patients with DSWPD fall asleep two or more hours later than the average person and have marked difficulty waking up in the morning. Approximately 40% of individuals with DSWPD have family members with a similar phenotype [12]. Several studies have reported significant associations between polymorphisms in PER3 and DSWPD and/or diurnal preference [13,14,15,16]. In addition, a rare variant in CRY1 that causes exon skipping is present in a hereditary form of DSWPD [17]. Because the variants associated with chronotype and/or CRSWD are located in or near genes with known roles in circadian rhythms, this study aimed to identify DSWPD-associated variants in known genes of the circadian clock system. Circadian rhythm-related genes were selected as candidates using the KEGG pathway database and previous studies. We used whole-exome variant data from the Exome Aggregation Consortium (ExAC) [18, 19].
Although the ExAC dataset includes disease-associated variants, data from individuals affected by severe pediatric diseases have been removed, which suggests that the dataset is suitable for identifying variants associated with common diseases. We therefore extracted candidate nonsense and missense variants by integrating the genetic variation data with in silico assessment. We then assessed associations between the selected candidate variants and DSWPD, and identified a low-frequency missense variant in PER2 that was associated with DSWPD. Because this variant might also be associated with other sleep disorders, we examined several of them. Non-24-h sleep–wake rhythm disorder (N24SWD) is also a CRSWD and is characterized by a gradual delay in the sleep–wake schedule, resulting in cyclic shifting of the sleep–wake phase. N24SWD may be caused by an extremely prolonged endogenous circadian period, and DSWPD may predispose individuals to N24SWD. We also studied the variant in patients with idiopathic hypersomnia, who are reportedly more frequently evening types and thus more alert in the evening than in the morning [20]. A previous study reported aberrant dynamics in the circadian oscillation of clock genes in dermal fibroblasts from patients with idiopathic hypersomnia [21]. We included patients with narcolepsy type 1 and narcolepsy type 2 as examples of hypersomnia without circadian rhythm abnormalities. However, some patients with narcolepsy type 2 may resemble patients with idiopathic hypersomnia, because the clinical manifestations of the two disorders overlap [22]. We performed an association study to examine whether the DSWPD-associated variant affects the development of these sleep disorders.

Methods

Subjects

The initial sample set comprised 236 patients with DSWPD and 1436 controls [16]. The subsequent sample set contained 77 patients with N24SWD, 780 patients with central disorders of hypersomnolence (narcolepsy type 1: n = 343; narcolepsy type 2: n = 215; idiopathic hypersomnia: n = 222), and 3539 controls from data provided by the Integrative Japanese Genome Variation Database (iJGVD) (https://ijgvd.megabank.tohoku.ac.jp/). All subjects gave written informed consent and were unrelated Japanese individuals living in Japan. Patients with these sleep disorders were diagnosed according to the International Classification of Sleep Disorders, third edition (ICSD-3). This study was approved by the Human Genome, Gene Analysis Research Ethics Committee of the University of Tokyo, the National Center of Neurology and Psychiatry Ethics Committee, and the Research Ethics Committee of Tokyo Metropolitan Institute of Medical Science. All methods in the present study were performed in accordance with the relevant guidelines and regulations.

Study flow

Supplementary Fig. 1 summarizes how we narrowed down the candidate variants. Fifty-four genes related to circadian rhythms were selected using KEGG and previous studies [6, 7, 15, 16, 23, 24]; the KEGG search term was “circadian rhythm” (http://www.genome.jp/kegg/). From these genes, we extracted 68 nonsense and missense variants with a minor allele frequency (MAF) > 0.25% in the East Asian population of the ExAC data (http://exac.broadinstitute.org/). Rarer variants (MAF < 0.25%) were not considered because the statistical power of our sample size to detect them was estimated to be low [25, 26]. Next, we kept the 35 variants with MAF < 1% in the European (non-Finnish) samples of the ExAC data, because previous large-scale European GWAS should already have detected most true-positive variants with MAF > 1% and an odds ratio (OR) > 1.5. Variants assessed in previous association studies of chronotype and CRSWD were also excluded [6, 7, 15, 16, 23, 24]. Finally, we selected six variants that were predicted to be damaging by PolyPhen-2 and deleterious by SIFT [27, 28]. The CADD scores of all six variants were greater than 20, confirming that they were also predicted to be deleterious [29]. We confirmed that all six variants were registered in other genetic variation databases of the Japanese population, such as iJGVD and the Human Genetic Variation Database (HGVD) (http://www.hgvd.genome.med.kyoto-u.ac.jp/) [30, 31]. The six candidate variants were screened in 55 patients with DSWPD using the TaqMan PCR assay or direct sequencing. When at least two patients carried a given variant in the screening, we genotyped that variant in 236 patients with DSWPD and 1436 controls and tested its association with DSWPD. Variants carried by only one of the 55 patients were excluded; for such a variant, the allele frequency in the patient group would be estimated at 0.9% (= 1/110).
We calculated the statistical power of a study of 236 patients and 1436 controls at a patient allele frequency of 0.9% [32] and found that this sample size made it difficult to achieve 80% power. We therefore required that at least two patients be identified as carriers in the screening. If a variant showed a significant association with DSWPD, it was also analyzed in N24SWD, narcolepsy type 1, narcolepsy type 2, and idiopathic hypersomnia to test whether it might also affect the development of these sleep disorders. Frequencies were compared between two groups using the Fisher exact test. When any of the four cells of the 2 × 2 table was zero, the Woolf–Haldane correction (adding 0.5 to every cell) was used to calculate the OR.
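The candidate-selection workflow described in this section can be sketched as a sequence of filters followed by the two-carrier screening rule. This is a minimal illustration under stated assumptions, not the pipeline actually used: the per-variant record and its field names are hypothetical (they are not ExAC column names), MAFs are expressed as fractions, carriers are assumed to be heterozygous, and the CADD > 20 check, which the text reports as a confirmation, is applied here as a filter for simplicity.

```python
from dataclasses import dataclass


# Hypothetical per-variant record; field names are illustrative, not ExAC columns.
@dataclass
class Variant:
    variant_id: str
    gene: str
    consequence: str          # "missense" or "nonsense" (anything else is dropped)
    maf_east_asian: float     # MAF in ExAC East Asian samples, as a fraction
    maf_nfe: float            # MAF in ExAC non-Finnish European samples, as a fraction
    polyphen: str             # PolyPhen-2 label, e.g. "probably damaging"
    sift: str                 # SIFT label, e.g. "deleterious"
    cadd_phred: float         # scaled CADD score
    previously_tested: bool   # assessed in earlier chronotype/CRSWD studies


def select_candidates(variants, circadian_genes):
    """Return the variants that pass every filter, in input order."""
    kept = []
    for v in variants:
        if v.gene not in circadian_genes:
            continue
        if v.consequence not in ("missense", "nonsense"):
            continue
        if not v.maf_east_asian > 0.0025:   # keep MAF > 0.25% in East Asians
            continue
        if not v.maf_nfe < 0.01:            # keep MAF < 1% in non-Finnish Europeans
            continue
        if v.previously_tested:             # drop variants already studied
            continue
        # Accepts "probably damaging" or "possibly damaging" (an assumption).
        if "damaging" not in v.polyphen or v.sift != "deleterious":
            continue
        if not v.cadd_phred > 20:           # reported as a confirmation; used as a filter here
            continue
        kept.append(v)
    return kept


def advance_to_genotyping(carriers_in_screen, min_carriers=2):
    """Screening rule: genotype the full sample only if >= 2 screened patients carry the variant."""
    return carriers_in_screen >= min_carriers


def screen_allele_frequency(carriers, n_patients=55):
    """Allele frequency implied by heterozygous carriers among n_patients (two alleles each)."""
    return carriers / (2 * n_patients)
```

For example, `screen_allele_frequency(1)` returns 1/110, which is the 0.9% patient allele frequency used in the power calculation. Keeping each filter as a separate `if` makes it easy to count how many variants drop out at each step, mirroring the counts (68, 35, 6) reported in the text.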

Results

We selected 54 genes related to circadian rhythms. From these genes, six missense variants were chosen as candidate DSWPD-associated variants through the filtering process (see Methods and Supplementary Fig. 1). All six were rare or low-frequency variants (Table 1). In the screening of 55 patients with DSWPD (Table 1), four patients carried a missense variant in PER2 (g.239157708C > T; p.Val1205Met; rs76355956). The minor allele frequency (MAF) of p.Val1205Met was 1.3% in the East Asian population of the ExAC data, and the amino acid substitution was predicted to be damaging (PolyPhen-2: probably damaging; SIFT: deleterious; scaled CADD score: 25.4). We then tested the association between p.Val1205Met and DSWPD in the larger sample set (236 patients and 1436 controls) (Table 2) and found that p.Val1205Met was significantly associated with DSWPD (P = 0.026, OR = 2.32; MAF 2.5% in patients versus 1.1% in controls). We next tested whether p.Val1205Met was associated with N24SWD, idiopathic hypersomnia, narcolepsy type 2, or narcolepsy type 1 (Table 3). The MAF was 1.1% in the controls, whereas none of the patients with N24SWD carried the minor allele (P = 0.42, OR = 0.29). The variant was significantly associated with idiopathic hypersomnia (P = 0.038, OR = 2.07, MAF 2.3%). In narcolepsy type 2, the frequency of the risk allele was higher than in controls, although the difference did not reach the 0.05 significance level (P = 0.16, OR = 1.70, MAF 1.9%). No association was observed between narcolepsy type 1 and the variant (P = 0.44, OR = 0.66, MAF 0.7%), and the OR was in the opposite direction from that seen in DSWPD. We also conducted an association study using the patients with DSWPD and the iJGVD control data, which likewise detected a significant association (P = 0.013, OR = 2.34) (Supplementary Table 1).
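The comparisons reported here rely on the Fisher exact test and, when a cell is zero, the Woolf–Haldane-corrected OR described in Methods. The dependency-free sketch below shows both, assuming a 2 × 2 table of allele counts laid out as [[minor in cases, major in cases], [minor in controls, major in controls]]. The function names are hypothetical; in practice a library routine such as `scipy.stats.fisher_exact` would normally be used, and the actual counts are those in Tables 2 and 3, which are not reproduced here.

```python
import math


def _log_table_prob(x, row1, col1, n):
    """Log hypergeometric probability of top-left cell x given fixed margins."""
    def log_comb(k, r):
        return math.lgamma(k + 1) - math.lgamma(r + 1) - math.lgamma(k - r + 1)
    return log_comb(row1, x) + log_comb(n - row1, col1 - x) - log_comb(n, col1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the observed margins that are
    no more probable than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    log_obs = _log_table_prob(a, row1, col1, n)
    p = 0.0
    for x in range(max(0, row1 + col1 - n), min(row1, col1) + 1):
        log_px = _log_table_prob(x, row1, col1, n)
        if log_px <= log_obs + 1e-7:  # tolerance so tied tables are not missed
            p += math.exp(log_px)
    return min(p, 1.0)


def odds_ratio(a, b, c, d):
    """Allelic OR (a*d)/(b*c); adds 0.5 to every cell if any cell is zero."""
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return (a * d) / (b * c)
```

As a consistency check that needs only the frequencies reported above, the allelic odds ratio for DSWPD is (0.025/0.975)/(0.011/0.989) ≈ 2.31, in line with the reported OR of 2.32 once rounding of the MAFs is taken into account.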

Table 1 Screening for six candidate variants in 55 patients with DSWPD
Table 2 Association between rs76355956 (p.Val1205Met) in PER2 and DSWPD
Table 3 Association between rs76355956 (p.Val1205Met) in PER2 and other sleep disorders

Discussion

In the present study, we found that a low-frequency missense variant (p.Val1205Met) in PER2 is associated with DSWPD. Val1205 is located within the CRY-binding domain of PER2 [33]. The crystal structures of the mouse CRY1–PER2 and CRY2–PER2 complexes revealed that Lys485 of mouse CRY1 (mCRY1) and Lys503 of mouse CRY2 (mCRY2) form a backbone hydrogen bond with Val1207 of mouse PER2 (mPER2), which corresponds to Val1205 of human PER2 [34]. In addition, mutations in mCRY1 (K485D/E) and in mCRY2 (K503R) reduce mCRY–mPER2 interactions [35, 36], suggesting that Lys485 of mCRY1 and Lys503 of mCRY2 are critical residues for binding to mPER2. Because Val1207 of mPER2 has not yet been subjected to mutational analysis, further study is needed to assess the functional effect of the PER2 substitution.

The PER2 variant p.Val1205Met was significantly associated with idiopathic hypersomnia, and the minor allele increased disease risk (Table 3). A similar tendency was observed in the narcolepsy type 2 group, although the association did not reach statistical significance. The MAF in the narcolepsy type 1 group was lower than that in the controls. Narcolepsy type 1 is caused by a loss of the neuropeptide orexin (hypocretin), which is produced by neurons in the lateral hypothalamus, and GWAS of narcolepsy type 1 have not found significant associations with genes that influence circadian rhythms [37,38,39]. Taken together, these findings suggest that circadian rhythm abnormalities are unlikely to affect the development of narcolepsy type 1. The cause and pathogenesis of idiopathic hypersomnia and narcolepsy type 2 remain largely unknown, and most patients with these disorders have normal orexin levels in the cerebrospinal fluid. A meta-analysis of the idiopathic hypersomnia and narcolepsy type 2 groups yielded P = 0.021 and an OR of 1.89 (95% confidence interval = 1.12–3.17). In addition, a previous study reported a significant association between central disorders of hypersomnolence and a polymorphism in an intron of CRY1 [40]. These results suggest that a disrupted circadian clock system may be involved in the pathogenesis of these two disorders. In contrast, p.Val1205Met was not associated with N24SWD. The pathogenesis of DSWPD may differ from that of N24SWD. Alternatively, because only 77 patients with N24SWD were analyzed in the current study, our sample size may have been too small to detect a true association. A further analysis with a larger sample size is required to clarify this point.
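The text does not state which method was used to combine the idiopathic hypersomnia and narcolepsy type 2 results. One common approach, shown here purely as an assumption, is an inverse-variance fixed-effect meta-analysis of log ORs with Woolf standard errors; a Mantel–Haenszel pooled OR is another frequent choice and can give slightly different numbers. The function names are hypothetical, and this formula treats the input estimates as independent.

```python
import math
from statistics import NormalDist


def log_or_and_se(a, b, c, d):
    """Log OR and Woolf standard error for [[a, b], [c, d]]; 0.5 added to all cells if any is 0."""
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return math.log((a * d) / (b * c)), math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)


def fixed_effect_meta(log_ors, ses, z_crit=1.96):
    """Inverse-variance fixed-effect pooling of log ORs.

    Returns (pooled OR, lower CI, upper CI, two-sided P).
    Assumes the input estimates are independent.
    """
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    se_pooled = math.sqrt(1.0 / sum(weights))
    z = pooled / se_pooled
    p = 2.0 * NormalDist().cdf(-abs(z))  # two-sided normal P-value
    return (math.exp(pooled),
            math.exp(pooled - z_crit * se_pooled),
            math.exp(pooled + z_crit * se_pooled),
            p)
```

Each disorder's 2 × 2 allele table would be passed through `log_or_and_se`, and the resulting pairs fed to `fixed_effect_meta`; the reported values (P = 0.021, OR = 1.89) are not claimed to be reproduced by this sketch.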

A GWAS using samples from the UK Biobank reported a significant association between chronotype and a missense variant in PER2 (g.239161957C > T; p.Val903Ile; rs35333999) [7]. This variant was also associated with circadian period [41]; its minor allele tends to be associated with an evening chronotype and a longer intrinsic circadian period. The minor allele frequency is 4.4% in European populations, whereas the variant is non-polymorphic in East Asian populations, so it was unnecessary to control for the effect of p.Val903Ile in our analysis. By contrast, the minor allele frequency of p.Val1205Met, identified in the present study, was 1.3% in East Asian populations, 1.1% in our healthy Japanese controls, and less than 0.1% in European populations. This suggests that p.Val1205Met does not contribute to a large fraction of cases in European populations.

A missense variant in FBXL13, p.Gly656Val, was also detected in the first screening (Table 1). Because only one patient with DSWPD carried it, this variant was excluded from further association analysis according to the workflow of the present study. Nevertheless, the variant may be associated with DSWPD, and a further study with a larger sample size will be needed to test this possibility.

In this association study of DSWPD, we identified p.Val1205Met in the CRY-binding domain of PER2, a gene that plays an essential role in circadian rhythms, without performing sequencing. The identified variant was a low-frequency variant with an OR of more than 2; rare and low-frequency variants are expected to have larger effects on the phenotype [42,43,44]. Detecting rare and low-frequency variants involved in common diseases generally requires whole-exome or whole-genome sequencing in large samples. We acknowledge that whole-genome or whole-exome sequencing is a reliable way to identify rare and unknown variants, and that genome-wide analyses offer opportunities to identify susceptibility loci without any prior information. A limitation of the present strategy is that it cannot identify variants located in genes that have not been reported to be related to circadian rhythms. However, the strategy substantially narrows down the candidate variants, which reduces the multiple-testing burden. Because the statistical power to detect rare or low-frequency disease-associated variants tends to be lower than for common variants, we consider appropriately narrowing down candidate variants to be an important approach when large-scale association studies cannot be performed. In addition, now that an enormous amount of sequencing data has accumulated, using public databases to select candidate variants is sometimes more efficient than performing sequencing. Here we used the ExAC database and stringent criteria to narrow down candidate variants, and we proposed a new strategy that effectively reduces the cost and time of DNA sequencing. Although only nonsense and missense variants were analyzed in this study, structural variations, indels, and splice site mutations also have functional effects and often play an important role in the etiology of diseases.
Future studies should include such genetic variations in the first screening. The Genome Aggregation Database (gnomAD), which has a larger sample size than ExAC and includes whole-genome sequence data, is now publicly available, although it was not yet available when we selected candidate variants at the start of this study. Applying the same criteria to the gnomAD data should allow analyses with greater detection power. We hope that this new strategy using public data will be adopted and lead to advances in future studies of common diseases.
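The Discussion argues that narrowing down candidate variants eases the multiple-testing problem. The short sketch below illustrates why, using two standard formulas: the Bonferroni per-test threshold (alpha divided by the number of tests) and the family-wise error rate for independent tests, 1 − (1 − alpha)^m. The counts 6 and 68 are borrowed from the filtering steps in Methods purely for illustration; the sketch does not describe a correction applied in this study, and the function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


def family_wise_error_rate(alpha, n_tests):
    """Probability of at least one false positive among n independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Illustrative comparison of a small and a larger candidate set.
for m in (6, 68):
    print(f"{m} tests: Bonferroni threshold {bonferroni_threshold(0.05, m):.5f}, "
          f"FWER without correction {family_wise_error_rate(0.05, m):.3f}")
```

The fewer variants carried forward, the less stringent the per-test threshold needs to be, which matters most when rare or low-frequency variants already limit statistical power.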