Introduction

The phenomenon of circadian rhythm is manifested as the approximately 24-h cycle in many biochemical, physiological, and behavioral processes. It is a cardinal feature of all eukaryotic, and has also been observed in some prokaryotic, organisms (Dvornyk et al. 2003). The self-sustained, internal timing system is controlled endogenously but entrained by external light level. Disruption in the circadian system has been implicated in a wide range of clinical conditions including sleep disorders (Toh et al. 2001), psychiatric disorders such as seasonal affective disorder and depression (Wirz-Justice 1998; Bunney and Bunney 2000), response to cocaine (Andretic et al. 1999; Abarca et al. 2002), dementia in old age (Sloan et al. 1996), immune dysfunctions and cancer (Bovbjerg 2003; Sephton and Spiegel 2003), cardiovascular disorders (Hastings et al. 2003), and neurological diseases such as multiple sclerosis and headaches (Turek et al. 2001).

In mammals, the master clock for regulating circadian rhythm is contained within two small, bilaterally paired nuclei called the suprachiasmatic nuclei (SCN) located in the anterior hypothalamus. In addition, there are “slave” oscillators in brain areas outside the SCN and throughout the body. The genes that play a direct role in generating circadian rhythms are known generically as “clock genes.” They include Bmal1, Clock, Timeless, Cryptochrome, and Period.

The mammalian Clock gene was first identified in mice when animals lacking the gene were found to lack circadian rhythm when kept in constant darkness without light entrainment (Vitaterna et al. 1994). Microarray analyses of liver RNA from CLOCK-mutant mice revealed aberrant expression patterns for more than 100 genes (Oishi et al. 2003), indicating that CLOCK is involved in regulating the circadian transcription of these genes, whose functions include roles in the cell cycle, lipid and protein metabolism, and immunity.

Following the identification of the mouse gene, the human Clock gene (hClock) was also cloned (Steeves et al. 1999). Although the circadian system has been implicated in many different diseases, to date there has been only one report of extensive screening of the hClock gene for sequence alterations (Iwase et al. 2002). In this study, we present the results of denaturing high-performance liquid chromatography (dHPLC) screening of all the exons identified for the gene, including all the coding regions, the 5′ and 3′ untranslated regions (UTRs), and adjacent intronic regions.

Materials and methods

DNA extraction

Peripheral blood samples from 70 unrelated, healthy, male Chinese Singaporeans (21.0±0.9 years) were obtained with informed consent, and genomic DNA was isolated with the Blood and Cell Culture DNA Midi Kit (QIAGEN GmbH, Hilden, Germany).

Primer design and PCR conditions

Based on the 5,801-bp mRNA sequence of the hClock gene (reference sequence: NM_004898), three segments of 5′ UTR, 20 segments of protein-coding regions, and one segment of 3′ UTR were mapped out in the hClock gene, which spans 114 kb of genomic sequence (reference sequence: NT_022853). To amplify these exons and their flanking regions, a total of 35 pairs of PCR primers were designed using the Primer3 program (http://www-genome.wi.mit.edu/cgi-bin/primer/primer3_www.cgi). Primers were selected with respect to the length and optimal dHPLC performance of the PCR products. For amplicons with more than one melting domain, GC-clamps were added to some primers to keep the difference between the melting temperatures of the domains below 5°C. PCR reactions were carried out in a volume of 50 μl containing 1.5–2.0 mM MgCl2, 0.2 mM of each dNTP, 10 pmol of each primer, 50 ng of genomic DNA, and 1 U of AmpliTaq Gold DNA polymerase (Applied Biosystems, Foster City, Calif., USA). Amplification was carried out at 95°C for 5 min, followed by 35 cycles of 95°C for 30 s, 53–63°C for 30 s, and 72°C for 45 s, and a final extension step at 72°C for 7 min. The primer sequences and the specific amplification and dHPLC screening temperatures are summarized in Table 1.

Table 1 Conditions for PCR and dHPLC analysis. SNP single nucleotide polymorphism

dHPLC analysis

DNA samples from the 70 subjects (140 chromosomes) were amplified and subjected to dHPLC analysis on a WAVE Nucleic Acid Fragment Analysis System equipped with a 3500HT DNASep Cartridge (Transgenomic, Omaha, Neb., USA). Amplified products were first checked on the system under non-denaturing conditions at 50°C to ensure specific and sufficient amplification of the fragments. For heteroduplex formation, equal volumes of PCR products from two different individuals were mixed, heated at 95°C for 10 min, and cooled to 25°C at a ramping rate of 1°C/min. Melting profiles of the fragments were analyzed using WAVE-MAKER software (Transgenomic). The temperature at which 70–80% of wild-type DNA was double helical was selected for screening. Eight to fifteen microliters of the mixed PCR products were injected into a preheated column and eluted at a flow rate of 0.9 ml/min with a linear acetonitrile gradient consisting of buffer A (0.1 M triethylammonium acetate, TEAA) and buffer B (0.1 M TEAA with 25% acetonitrile). The gradient slope was an increase of 2% of buffer B per minute. Each melting domain was screened at temperatures that spanned 1–2°C above and below the predicted melting temperature to ensure detection of all possible sequence alterations.

DNA sequencing

Samples with dHPLC elution profiles indicating the presence of heteroduplexes were amplified separately, and direct sequencing was performed with both forward and reverse primers on a MegaBACE 1000 DNA sequencer (Amersham Biosciences, Sunnyvale, Calif., USA) using the DYEnamic ET dye terminator kit (Amersham Biosciences).

SNP genotyping

Genotypes of the identified sequence variants were confirmed in the 70 subjects. Substitution polymorphisms were genotyped by polymerase chain reaction followed by restriction enzyme cleavage (PCR-RFLP) or by direct sequencing. All insertion/deletion polymorphisms were genotyped by direct DNA sequencing. For single nucleotide substitutions that did not fall within any restriction enzyme recognition site, mismatched PCR primers were designed to introduce restriction sites that distinguish between the alleles.
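The mismatched-primer approach can be illustrated with a minimal Python sketch. The primer, the downstream sequence, and the choice of EcoRI (recognition site GAATTC, cut after the first G) are hypothetical and are not taken from Table 1. The sketch assumes a forward primer that ends immediately 5′ of the SNP and carries one mismatch, so that only the C allele completes the recognition site.

```python
def digest(seq, site, cut_offset):
    """Return fragment lengths after cutting seq at every occurrence of site.

    cut_offset is the number of bases of the site that stay on the
    left-hand fragment (1 for EcoRI, which cuts G^AATTC).
    """
    seq = seq.upper()
    cuts = []
    start = 0
    while True:
        i = seq.find(site, start)
        if i == -1:
            break
        cuts.append(i + cut_offset)
        start = i + 1
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Hypothetical sequences, for illustration only.
# The primer ends in GAATT; the genomic template is assumed to read GAGTT
# there, so the third base of that pentamer is the engineered mismatch.
FORWARD_PRIMER = "ACGTTGCAAGCTAGCGAATT"
DOWNSTREAM = "GGCATCGTAGCTAGGCTAACGGATCCGTTAGC"

amplicon_c = FORWARD_PRIMER + "C" + DOWNSTREAM  # C allele: GAATTC is created
amplicon_t = FORWARD_PRIMER + "T" + DOWNSTREAM  # T allele: GAATTT, no site

ECORI_SITE = "GAATTC"
ECORI_CUT_OFFSET = 1

fragments_c = digest(amplicon_c, ECORI_SITE, ECORI_CUT_OFFSET)
fragments_t = digest(amplicon_t, ECORI_SITE, ECORI_CUT_OFFSET)
print("C allele fragments:", fragments_c)
print("T allele fragments:", fragments_t)
```

In this toy design the C allele is cut into two fragments while the T allele stays intact, so a heterozygote would be expected to show all three bands on a gel. Because the cut falls inside the primer region, the size difference is small, which is a general limitation of this kind of design.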

Statistical analysis

For the identified SNPs, conformity to Hardy-Weinberg equilibrium was assessed using the chi-square test, and a P-value of less than 0.05 was considered a significant deviation from equilibrium. Pair-wise linkage disequilibrium analysis of all possible combinations was performed using the EM algorithm (Excoffier and Slatkin 1995). All statistical analyses were performed using the S-Plus statistical software.
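The Hardy-Weinberg test described above can be sketched in a few lines of Python. This is an illustrative reimplementation and not the S-Plus code used in the study. It assumes a biallelic SNP and one degree of freedom, and the genotype counts are invented for the example.

```python
from math import erfc, sqrt


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium.

    Takes the observed counts of the three genotypes of a biallelic SNP
    and returns (chi-square statistic, P-value) for 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # For 1 degree of freedom, P(X > x) equals erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


# Invented genotype counts for 70 individuals (not data from this study).
chi2, p_value = hwe_chi_square(20, 36, 14)
verdict = "deviates from" if p_value < 0.05 else "is consistent with"
print(f"chi2 = {chi2:.3f}, P = {p_value:.3f}: genotype distribution {verdict} HWE")
```

The chi-square approximation becomes unreliable when an expected genotype count is very small, as happens for rare alleles in a sample of this size. An exact test is a common alternative in that situation.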

Results and discussion

dHPLC screening using genomic DNA from 70 unrelated Chinese individuals revealed aberrant chromatographic profiles in 12 different amplicons. Subsequent DNA sequencing of these amplicons identified a total of 15 sequence changes, comprising 12 single-base substitutions and three single-base insertions/deletions. Among them, one was in the 5′ genomic region, one in the 5′ UTR, one in a protein-coding region, seven in introns, and five in the 3′ UTR. A graphic representation of all the identified SNPs in relation to the exons and introns of the hClock gene is shown in Fig. 1, and the exact positions with the corresponding base changes are shown in Table 2.

Fig. 1

Schematic diagram of the location of single nucleotide polymorphisms (SNPs) identified in the hClock gene. (Figure not drawn to scale.) For SNPs with minor allele frequency of >10%, those that are in complete or nearly complete linkage disequilibrium are joined by dashed lines. SNPs 1–15 Exact location as indicated in Table 1, 5′UTR-a the first 5′UTR cDNA segment in the genomic sequence, 5′UTR-b the second 5′UTR cDNA segment in the genomic sequence, 5′UTR-c the third 5′UTR cDNA segment in the genomic sequence, E2–21 exons 2–21

Table 2 Frequencies of the 15 SNPs (allele corresponding to the reference sequence is listed first)

Most of the SNPs were in intronic or 3′ non-coding regions. Five variants were in the 3′ UTR, a well-conserved region in which the first 1,300 nucleotides downstream of the stop codon were reported to be 80% identical to the mouse ortholog (Steeves et al. 1999). Three of these variants were found in this conserved stretch, one of them (3776G/A) in a segment reported to be 100% conserved between the murine and human sequences (Steeves et al. 1999).

The only coding variant identified in this study, 2102T/C, was a silent polymorphism previously identified by mutation screening in individuals suffering from sleep disorders (Iwase et al. 2002). Another polymorphism reported in the literature, 3092T/C (also known as 3111T/C), was reportedly associated with an evening-type diurnal preference in Caucasians (Katzberg et al. 1998), but the association was not replicated (Robilliard et al. 2002). A study of this SNP in major depression also found no association (Desan et al. 2000).

Five of the identified variations are novel: −15A/G, IVSa+80C/T, IVS10+140C/−, IVS11+2021A/G, and 3701−/T. The remaining ten are listed in the dbSNP database (http://www.ncbi.nlm.nih.gov/SNP/). However, three SNPs listed in dbSNP were not detected in this study: IVS7+79A/G (rs1522112) in intron 7, 1476G/A (rs1056478) in exon 13, and 1963A/G (rs3762836) in exon 17. Two missense mutations in exon 17 reported in a study of sleep disorder patients (Iwase et al. 2002) were also not detected in our population. This could reflect population stratification, whereby the SNPs are population-specific or geographically restricted. It would therefore be necessary to validate SNP data in each population when selecting markers from SNP databases for genetic screening. It is also possible that the SNPs not detected in our study are present but extremely rare in this population, with frequencies too low to be detected in our sample of 70 individuals. As it has been postulated that common rather than rare allelic variants might contribute to genetic susceptibility to complex diseases, such rare alleles might not be useful in association studies of common disorders.
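The sample-size argument can be made concrete with a short calculation. The Python sketch below is an illustration added here and not an analysis from the study. It assumes that chromosomes are sampled independently and that the screening detects every carried copy of a variant, so that the chance of seeing an allele of frequency f at least once among 2N chromosomes is 1 − (1 − f)^(2N).

```python
from math import ceil, log


def detection_probability(allele_freq, n_individuals):
    """Probability that an allele appears at least once in the sample."""
    n_chromosomes = 2 * n_individuals
    return 1.0 - (1.0 - allele_freq) ** n_chromosomes


def individuals_needed(allele_freq, power=0.95):
    """Smallest number of diploid individuals giving the requested power."""
    n_chromosomes = ceil(log(1.0 - power) / log(1.0 - allele_freq))
    return ceil(n_chromosomes / 2)


# Illustrative allele frequencies; 70 is the sample size used in this study.
for freq in (0.001, 0.01, 0.05, 0.10):
    prob = detection_probability(freq, 70)
    print(f"allele frequency {freq:.3f}: detection probability {prob:.3f}")
```

Under these assumptions, an allele with a frequency of 1% would be observed in roughly three out of four samples of 70 individuals, whereas much rarer alleles would usually be missed.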

To detect homozygotes that might have escaped dHPLC detection, genotypes of these 15 SNPs in the 70 subjects were confirmed by PCR-RFLP or direct DNA sequencing. Except for −15A/G and IVS11+2021A/G, whose minor allele frequencies were lower than 1%, genotype frequencies of all the SNPs were found to be in Hardy-Weinberg equilibrium (P>0.05).

As shown in Table 2, eight SNPs were found to be of high frequency in the local Chinese population (heterozygosity >0.40), with another five having frequencies in the middle range (0.1–0.4). Pair-wise linkage disequilibrium analysis was performed for these SNPs. Based on the strength of D′, r, and their P-values, all but one of the eight high-frequency SNPs were in almost complete linkage disequilibrium (D′ 0.874–1.0, P<0.001). There was also significant linkage disequilibrium between the remaining polymorphism (SNP8) and each of the other seven (D′ 0.444–0.844, P=0.003–0.047). The five SNPs with minor allele frequencies between 1% and 10% also appeared to form another LD block (D′ 0.891–1.0, P<0.001), although the EM algorithm and derivation might not be applicable for allele frequencies within this range. The two rarest SNPs were excluded from the analysis.
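For readers unfamiliar with the pair-wise analysis, the Python sketch below shows how haplotype frequencies for two biallelic SNPs can be estimated from unphased genotypes by EM and then converted to D′ and r². It is a simplified two-locus illustration and not the implementation used in this study. The genotype table is invented, and reporting r² rather than r is a choice made for the example.

```python
def em_two_locus(counts, n_iter=200, tol=1e-10):
    """Estimate two-locus haplotype frequencies by EM.

    counts[i][j] is the number of individuals carrying i copies of allele A
    at locus 1 and j copies of allele B at locus 2 (i, j in 0, 1, 2).
    Only double heterozygotes (i = j = 1) have an ambiguous phase.
    """
    n_dh = counts[1][1]
    total = 2 * sum(sum(row) for row in counts)
    # Haplotype copies that can be assigned without ambiguity.
    base = {
        "AB": 2 * counts[2][2] + counts[2][1] + counts[1][2],
        "Ab": 2 * counts[2][0] + counts[2][1] + counts[1][0],
        "aB": 2 * counts[0][2] + counts[1][2] + counts[0][1],
        "ab": 2 * counts[0][0] + counts[1][0] + counts[0][1],
    }
    freq = {h: 0.25 for h in base}
    for _ in range(n_iter):
        # E step: expected share of double heterozygotes that are AB/ab.
        cis = freq["AB"] * freq["ab"]
        trans = freq["Ab"] * freq["aB"]
        w = cis / (cis + trans) if (cis + trans) > 0 else 0.5
        # M step: update haplotype frequencies from expected counts.
        new = {
            "AB": (base["AB"] + n_dh * w) / total,
            "ab": (base["ab"] + n_dh * w) / total,
            "Ab": (base["Ab"] + n_dh * (1 - w)) / total,
            "aB": (base["aB"] + n_dh * (1 - w)) / total,
        }
        change = max(abs(new[h] - freq[h]) for h in freq)
        freq = new
        if change < tol:
            break
    return freq


def ld_stats(freq):
    """Return (D', r^2) from two-locus haplotype frequencies."""
    p_a = freq["AB"] + freq["Ab"]
    p_b = freq["AB"] + freq["aB"]
    d = freq["AB"] - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d_prime, r2


# Invented genotype table for 70 individuals (not data from this study).
example_counts = [
    [18, 2, 0],   # 0 copies of A; 0, 1, 2 copies of B
    [3, 28, 2],   # 1 copy of A
    [0, 2, 15],   # 2 copies of A
]
haplotypes = em_two_locus(example_counts)
d_prime, r2 = ld_stats(haplotypes)
print(f"D' = {d_prime:.3f}, r^2 = {r2:.3f}")
```

Because D′ is scaled by its maximum attainable value, it can reach 1 whenever one of the four haplotypes is absent, which happens easily by chance when an allele is rare. This is one reason why estimates for low-frequency SNPs call for caution.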

In summary, using highly sensitive and reproducible dHPLC screening, we have identified 15 sequence variations in the hClock gene. All but one occurred in introns, UTRs, or the 5′ genomic region. It appears that although the coding sequence of the hClock gene is relatively stable, variations in the non-coding regions are quite common in the local Chinese population. The spectrum of SNPs in local Chinese appears to differ slightly from that of other populations, suggesting the necessity of validating SNPs in the target population when selecting markers from the NCBI SNP database for genetic screening. Besides the well-documented functional effects of coding SNPs, non-coding SNPs could affect gene splicing as well as transcription and translation efficiency. Although the functional significance of these SNPs is still unknown, our results provide a basis for future genetic analyses of the hClock gene and of disorders associated with dysfunction of the circadian system. The identification of the SNP LD groups in the hClock gene will be useful in disease gene mapping and in association studies aimed at identifying susceptibility variants.