Introduction

Chronotype is a behavioural manifestation of our internal timing system, the circadian clock. Individual variation within our biological clock drives our morning or evening preferences, thereby making us into ‘morning larks’ or ‘night owls’. Chronotype is influenced by many factors, including age, sex, social constraints and environmental factors, among others1. Chronotype has been associated with sleep disorders, cognitive and physical performance, chronic metabolic and neurologic disease, cancer and premature aging2, in particular when there is desynchrony between internal chronotype and external environment increasing disease risk3. Despite the importance of circadian rhythms to human health and their fundamental role demonstrated in model organisms4,5, little is known about the biological mechanisms underlying inter-individual variation in human chronotype or how it impacts our health and physiology.

Genes that encode molecular components of the core circadian clock (PER2, PER3) or regulate the pace of the clock (CSNK1D) are disrupted in Advanced Sleep Phase Syndrome (ASPS) and Delayed Sleep Phase Syndrome, both of which are monogenic circadian rhythm disorders causing extreme advance or delay in sleep onset6. ASPS mutations shorten circadian period in humans and mice7,8, linking the change in pace of the clock with sleep timing preference. More detailed biochemical and functional characterization of these mutations has greatly enhanced our understanding of the mechanisms regulating the circadian clock. Emerging evidence suggests that subjects with ASPS may be at increased risk for chronic disease, such as cardio-metabolic disease9 or migraine10.

In addition to monogenic sleep phase disorders, pronounced inter-individual variation in chronotype exists within the general population5, and epidemiologic associations with adverse health outcomes have been reported2,11. Chronotype is heritable as estimated by twin and family studies (12–42%)12,13,14 but its genetic basis has not yet been well defined. Candidate gene association studies have reported variation associated with morningness or eveningness preference in the CLOCK, PER1, PER2 and PER3 genes15; however, these studies have often had limited reproducibility, suffering from small sample sizes, heterogeneity in chronotype assessment and inadequate correction for population structure. Recently, a genome-wide association study (GWAS) for self-reported habitual bedtime identified variation in NPSR112, but again robust replication of this finding has not been reported. Nonetheless, these studies suggest that novel genetic loci for chronotype, like for other complex traits, may be identified by GWAS provided that sufficiently large cohorts are used.

To define the spectrum of genetic variation contributing to variation in human circadian phenotype, and identify associative or causal links between chronotype and other health indices, we perform the largest GWAS of self-reported chronotype to date, within the UK Biobank cohort (n=100,420), a unique resource with an extensive set of individual life history parameters. Self-reported chronotype has been validated in previous studies, and correlates significantly with objectively measured physiological rhythms16. Our work identifies several novel genetic loci that associate significantly with chronotype, and importantly reveals a significant genetic correlation between chronotype and schizophrenia risk, BMI and educational attainment.

Results

Twelve genome-wide significant association signals

Variation in chronotype associated significantly with age, sex, sleep duration, depression and psychiatric medication use, with ‘eveningness’ being associated with younger age, being male, having a longer sleep duration, being more likely to be depressed or using psychiatric medication (Supplementary Table 1). These characteristics together explained 1.4% of variation in chronotype.

Two parallel primary GWAS analyses of genotyped and imputed single nucleotide polymorphisms (SNPs) were performed using regression models adjusting for age, sex, 10 principal components of ancestry and genotyping array: an ordinal score of chronotype based on four categories from ‘definite morning’ to ‘definite evening’ treated as a continuous trait, using the whole population (n=100,420) and a binary variable of chronotype extremes (8,724 definite evening-type cases versus 26,948 definite morning-type controls), to enrich for rarer variants expected to have stronger effects. In total, 12 genome-wide significant loci were identified (Figs 1 and 2; Table 1; and Supplementary Fig. 1; P<5 × 10−8) of which three surpassed genome-wide significance in both analyses (Table 1). Association was observed near PER2, an ASPS gene, and three other association signals were found in or near genes with a well-known role in circadian rhythms (APH1A, RGS16 and FBXL13), consistent with the hypothesis that circadian clock biology contributes to variation in chronotype. Conditional analyses at the 12 loci implicated one suggestive secondary association signal, a missense variant (V903I) in the core circadian clock gene PER2 (P=8.43 × 10−8) predicted to be damaging (Polyphen 0.984, CADD scaled 16.21; Supplementary Table 2), confirming that core circadian clock genes disrupted in ASPS harbour common variants that contribute to variation in chronotype. Together, in the discovery sample, the 12 loci explain 4.3% of variance in chronotype. Credible set analyses17 highlight a limited number of potential causal variants at each locus (Table 1).

Figure 1: Manhattan and Q–Q plots for genome-wide association analysis of both continuous and extreme chronotype.

Manhattan plots (a,c). Red line is genome-wide significant (5 × 10−8) and blue line is suggestive (1 × 10−6). Q–Q plots (b,d). Nearest gene name is annotated. Heritability estimates were calculated using BOLT-REML and lambda inflation values using GenABEL in R.

Figure 2: Regional association plots for genome-wide significant chronotype loci.

(a–i) show loci associated with continuous chronotype, (j–l) show loci associated with extreme chronotype. Genes within the region are shown in the lower panel. The blue line indicates the recombination rate. Filled circles show the −log10 P value for each SNP, with the lead SNP shown in purple. Additional SNPs in the locus are coloured according to correlation (r2) with the lead SNP (estimated by LocusZoom based on the CEU HapMap haplotypes). *chr7 rs372229746 is not in the reference panel, therefore LD data is unavailable for this SNP.

Table 1 Genome-wide significant loci associated with chronotype in subjects of European ancestry in the UK Biobank.

Robustness of the self-reported chronotype trait and genetic loci identified here was further validated by an independent GWAS of extreme chronotype from Hu et al.18 In total, 8 of 15 reported loci replicated in our study, and all 15 showed a consistent direction of effect in our study. Three additional loci attain genome-wide significance in meta-analysis of both studies using publicly available results for the 15 SNPs from Hu et al. (near genes PER3, VIP and TOX3: Supplementary Table 3).

No evidence of association was observed for previously reported SNPs from other candidate gene or GWA studies (Supplementary Table 4). The PER3 VNTR (rs57875989) previously associated with chronotype19 was not directly genotyped or imputed in this study; nonetheless, a suggestive association signal was observed encompassing this region of PER3 (lead SNP: rs7545893 P=6.5 × 10−8; 33 kb from PER3 VNTR) and largely independent from the lead 23andMe PER3 region SNP (r2=0.186 in 1KG CEU18; Supplementary Fig. 2).

Secondary analyses were performed on the 12 lead SNPs within the chronotype loci, including (1) separate comparison of effects on morningness and eveningness, (2) sex-specific analysis, (3) pair-wise genetic interaction analysis and (4) regression models including additional covariates. Comparison of case extremes (8,724 evening or 26,948 morning) to the collapsed middle group (n=64,748) revealed three loci (LINC01128, APH1A, FAT1) with stronger effects in the eveningness case–control analysis as opposed to morningness analysis. A rare variant (0.2%) at the LINC01128 locus (rs141175086C) exhibited the most striking protective effect for eveningness (odds ratio (OR)=0.22 (0.10–0.50), P=1.7 × 10−5) but only a small risk effect for morningness (OR=1.30 (0.97–1.75), P=0.08; Supplementary Table 5). No significant sex-specific effects (Supplementary Table 6) or epistasis between loci (Supplementary Table 7) were detected. Similarly, sensitivity analyses adjusting for factors known to be associated with chronotype, including sleep duration and disorders, depression and psychiatric medication use did not significantly alter the effect estimates or strength of the associations (Supplementary Table 8).

Candidate causal genes at these loci are highlighted in Supplementary Note 1. The 12 loci encompass 72 candidate genes enriched in pathways for circadian rhythms (Padj=0.014), mental disorders (Padj=0.001), sleep disorders (Padj=0.005), the spliceosome (Padj=0.020) and Alzheimer’s disease (Padj=0.030) among others (Supplementary Table 9). In addition, four loci are located in or near genes with a well-known role in circadian rhythms (PER2, APH1A, RGS16 and FBXL13); however, whether these genes are responsible for the association signals observed remains to be established. The remaining eight loci offer the potential of novel biological insights into circadian rhythms (Supplementary Note 1). Several candidate causal genes have been implicated in circadian rhythms. TNRC6B controls circadian behaviour in flies20 and is bound by known circadian transcription factors in mouse liver21. MCL1 has rhythmically expressed mRNA in liver22, disrupts circadian rhythms in an RNAi screen using a human osteosarcoma cell line23, and is bound by known circadian transcription factors in mouse liver21. HTR6 is a G-protein-coupled receptor known to regulate the sleep–wake cycle24,25,26.

Fine-mapping, sequencing and experimental studies are necessary to identify the causal gene(s) and variant(s) at each locus to understand mechanisms by which DNA variation influences variation in chronotype. However, clues may emerge from exploration of bioinformatic annotations of candidate regulatory variants and ENCODE analyses of chromatin states and bound proteins27. For example, rare variant rs141175086 is predicted to disrupt a binding site for the known circadian transcription factor DEC1 in an enhancer element within or upstream of previously uncharacterized lincRNAs (LOC643837, LINC01128).

Pathway analyses

Heritability of chronotype, captured by genome-wide genotypes in this study, was estimated to be 19.4% (continuous) and 37.7% (extreme) using GCTA28. Heritability partitioning of continuous chronotype GWAS by tissue and functional category using LD-score regression29 identified enrichment in the central nervous system (enrichment 2.63, P=1.91 × 10−6) and adrenal/pancreatic tissues (enrichment 3.63, P=1.34 × 10−8; Fig. 3a and Supplementary Table 10). Regions of the genome annotated as highly conserved across mammals30 (enrichment 14.33, P=1.75 × 10−9), and in regions of histone 3 lysine 4 monomethylation that mark active/poised enhancer elements (enrichment 1.30, P=0.0017; Fig. 3a and Supplementary Table 10) were significantly enriched, supporting a key role of circadian rhythms throughout mammalian evolution.

Figure 3: Overall genetic architecture of chronotype across tissues, functional categories and cross-trait genetic correlation.

(a) Enrichment estimates for the main annotations and tissues of LDSC. Error bars represent 95% confidence intervals around the estimate. Categories are sorted by P value, with boxes indicating annotations or tissues that pass the multiple testing significance threshold. (b) Chronotype regression estimates of genetic correlation with the summary statistics from 19 publicly available genome-wide association studies of psychiatric and metabolic disorders, immune diseases and other traits of natural variation. The horizontal axis indicates the phenotype compared with categorical chronotype and the vertical axis indicates genetic correlation. Error bars are standard errors. ADHD, attention deficit hyperactivity disorder; BMI, body mass index; CNS, central nervous system; CTCF, CCCTC-binding transcription factor; DHS, DNase hypersensitivity; GI, gastrointestinal; T2D, type 2 diabetes; TFBS, transcription factor binding site; Tss, transcription start site; UTR, untranslated region.

Gene-based analysis31 identified 23 genes significantly associated with chronotype (P<2.8 × 10−6, Supplementary Table 11; Supplementary Fig. 3). Pathway analysis32 shows a significant enrichment in this gene set for genes previously implicated in Alzheimer’s disease (Padj=0.0176) and dementia (Padj=0.0192), eye abnormalities (Padj=0.0176) and eye diseases (Padj=0.0253), chromosomal deletions (Padj=0.0253), brain diseases (Padj=0.0253), central nervous system diseases (Padj=0.0253) and mental disorders (Padj=0.0365). In support, integrative analysis of signals with P<1 × 10−5 using DEPICT33, a tool that uses predicted gene functions to prioritize genes, gene sets and tissues, showed suggestive enrichment in gene sets associated with ‘fear response’ and ‘behavioural defence response’ (False Discovery Rate<0.20), and central nervous and hemic/immune system tissues (Supplementary Table 12). In total, pathway analyses link the genetics of chronotype to central nervous system function and neurological disorders including dementia and affective disorders.

Genetic links with schizophrenia and educational attainment

Given that circadian rhythms play a fundamental role in human physiology, a key question is the extent to which the genetics of chronotype is shared with other behavioural or disease states, and importantly whether genetic relationships between chronotype and other traits are causal. To address this, we tested for genetic correlation of chronotype with GWAS variants for 19 phenotypes spanning a range of cognitive, neuro-psychiatric, anthropometric, cardio-metabolic and auto-immune traits using LD score regression on chronotype GWAS and publicly available GWAS for each trait34. Genetic correlations suggested that tendency towards an evening chronotype is related to greater years of education (rg (s.e.) 0.161 (0.041), P=8.96 × 10−5) and increased schizophrenia risk (rg (s.e.) 0.112 (0.034), P=0.0011; Fig. 3b; Supplementary Table 10). Genetic correlations also suggested that a morning chronotype may share underlying biology with increased BMI (rg (s.e.) −0.0851 (0.0281), P=0.0025; Fig. 3b; Supplementary Table 10).
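As a rough check on the genetic correlations reported above, a two-sided P value can be recovered from each rg estimate and its standard error under a normal approximation. The sketch below is illustrative only: the function name is hypothetical, and the published P values come from LD score regression itself, so values recomputed from the rounded estimates will differ slightly.

```python
import math


def two_sided_p(estimate: float, se: float) -> float:
    """Two-sided P value for estimate/se under a standard normal approximation."""
    z = estimate / se
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# rg and s.e. values as quoted in the text (rounded).
reported = {
    "years of education": (0.161, 0.041),
    "schizophrenia": (0.112, 0.034),
    "BMI": (-0.0851, 0.0281),
}

for trait, (rg, se) in reported.items():
    print(f"{trait}: z = {rg / se:.2f}, P = {two_sided_p(rg, se):.2e}")
```

The recomputed values land close to the reported ones, which suggests the quoted P values are ordinary two-sided tests of rg against zero.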

Mendelian randomization analyses

To explore whether the relationship between chronotype and traits with significant genetic correlations might be causal, we tested for association of a risk score of genome-wide significant chronotype SNPs from 23andMe18 with years of education, schizophrenia and BMI. SNPs can be used as instrumental variables to test for a causal relationship between two traits, and because genotypes are assigned randomly at meiosis, genetic association is not biased by the confounding or reverse causation possible in observational epidemiology35,36. Since individuals do not know their genotype, any phenotypic misclassification will be random with respect to genotype. In the UK Biobank, a significant association was observed between a chronotype genetic risk score of SNPs related to eveningness and increased educational attainment (P=0.0167), but not schizophrenia (P=0.101) or BMI (P=0.285; Supplementary Table 13). Further instrumental variable analyses suggested that for each increase in ‘eveningness’ category, educational attainment increased by 7.5 months (P=0.021) (Fig. 4; Supplementary Fig. 4; Supplementary Table 13). We then tested for reverse causation by assessing whether variation in education, schizophrenia or BMI might cause variation in chronotype by testing for association of risk scores for each of these traits obtained from prior large-scale GWAS studies with chronotype. No significant associations were observed (Supplementary Table 13).

Figure 4: Mendelian Randomization.

Under the assumptions of instrumental variable Mendelian randomization analyses70, our results show that having an evening chronotype results in higher educational attainment. In this analysis, for the chronotype risk score (comprised of 15 SNPs from the 23andMe GWAS of chronotype, weighted by effect size), the β coefficient for the association with chronotype was regressed on the β-coefficient for the association with the main educational attainment trait in the UK Biobank (n=68,718) using TSLS. TSLS, two-stage-least-squares method.

Discussion

In this largest GWAS of chronotype to date, we report the discovery of 12 genetic loci associated with chronotype, and pathway analysis suggests key roles of genes in the nervous and ocular systems. Further, we demonstrate shared biology of chronotype with schizophrenia, and possibly BMI, with a putative causal link to educational attainment.

Several lines of evidence support these association signals as true positives that may help to uncover new aspects of circadian biology in humans. First, we detect signals in or near known circadian genes at 4 of the 12 loci, including near and in PER2, a clock gene previously associated with ASPS6. Second, three of these signals have been observed in an independent GWAS18 suggesting independent validation of our findings. Third, novel associated loci include candidate central circadian clock genes with rhythmic expression in the SCN or circadian behavioural phenotypes in model organisms. Fourth, genes under association peaks are enriched for central nervous system and ocular processes, both important for generation of circadian rhythms. Additional replication to confirm chronotype genetic associations and functional follow-up will be necessary to identify causal genes and circuits disrupted by causal variants at these loci.

Our study also defines the genetic architecture of self-reported chronotype, revealing heritability estimates consistent with previous literature12,13,14, despite using a different questionnaire instrument than previous studies16. The 12 genome-wide significant loci appear to explain a large fraction of chronotype variance (4.3%) but this may be overestimated due to winner’s curse, or may reflect lower polygenicity of chronotype than seen for other complex traits, since variation in a limited number of biological processes (light-sensing, core circadian clock and limited downstream effectors) may be causal. Significant enrichment of heritability in highly conserved regions is consistent with the strong conservation of circadian rhythms throughout evolution37 and may aid in fine mapping of causal variants and creation of faithful animal models for future experimental studies. Similarly, enrichment of heritability in activating enhancer sites and borderline enrichment in transcriptional start sites is consistent with the role of the circadian molecular clock in fine tuning of transcriptional regulation23.

The association signals at loci identified by our study when combined with signals from 23andMe cover genes identified in GWAS for restless legs syndrome and Mendelian and model organism studies of narcolepsy, suggesting overlap with other sleep traits. Genetic variants in the region of TOX3 have been previously associated with restless legs syndrome in a GWAS38. Although the chronotype-associated variant (rs12927162) is not in linkage disequilibrium with the lead restless legs syndrome variant (rs3104767; r2<0.001 in 1KG CEU population), it does suggest that TOX3 may have a broader role in basic sleep/circadian physiology. In addition, rare forms of severe early onset narcolepsy in humans39 and familial narcolepsy in canines40 are caused by mutations in HCRTR2, again suggesting shared underlying biology.

Chronotype has previously been associated with many behaviours and diseases, such as cardiovascular disease, type 2 diabetes, metabolic disorders, risk-taking behaviour, cancer, psychiatric disorders and even creativity1,3. Comparing the genetic architecture of chronotype captured in this study with an initial series of select phenotypes with publicly available GWAS data, we identified significant genetic overlap between chronotype and schizophrenia, educational attainment and possibly BMI. Previous literature links evening chronotype with schizophrenia41,42,43, consistent with our findings. These studies also demonstrate severe circadian sleep/wake disruptions in people with schizophrenia, indicating that this relationship may be bidirectional. However, our Mendelian randomization analyses did not support causal relationships between these two. It is possible that even with our large sample size, we are underpowered to rule out an effect of schizophrenia on chronotype.

We detect a surprising putative genetic link between morning chronotype and higher BMI. Previous observational studies have shown association of evening chronotype with higher BMI, poorer dietary habits and decreased inhibitions44,45,46,47. Consistently, we noted an association between evening chronotype and BMI (beta=1.003 BMI units/chronotype; P=1 × 10−4; r=0.011). Our genetic correlation analyses suggest the intriguing possibility that some underlying pathways contributing to morning chronotype might also contribute to increased BMI. However, independent replication and further large studies are required to fully understand the genetic and causal relationships between chronotype and BMI.

Until now, it has been difficult to discern causal relationships between chronotype and other traits because of potential biases in observational studies, for example, due to confounding or reverse causality, which are unlikely to affect genetic studies48. Our work suggests that tendency to eveningness chronotype is potentially causally related to increased educational attainment, but replication of these findings, and more comprehensive assessment of potential sources of bias will require future investigation. Previous studies have reported that night owls earn a larger mean income than their earlier rising counterparts49. Another study, performed at a top-ranked business school, demonstrated higher GMAT scores in evening types even within a high achieving group50. However, it is possible these results are impacted by misclassification in our self-reported measurement of chronotype. Although the question clearly asks for preference, participants might have been influenced by the reality of their working lives. Those from more deprived socioeconomic positions might have occupations that are more restrictive in terms of working hours and hence less able to ‘adhere’ to their preference. If this results in a relationship between socioeconomic position and misclassification then socioeconomic position would confound any observational associations. However, since participants are extremely unlikely to know their genotype for the variants we have identified, any misclassification of chronotype by genotype will be random with the expectation that the genetic correlation and Mendelian randomization studies would be biased towards the null.

Our study is well-powered to detect genetic variants associated with chronotype, with previous studies demonstrating the power of a sample size >100,000 for detecting genetic effects51. The study uses a single harmonized question across a large cohort, which is in contrast with previous studies that needed to harmonize data across several cohorts with varying measures of chronotype. Our measure of chronotype is based on self-identification, and may reflect timing preference more so than objective measures of chronotype; since it does not take weekday and weekend behaviour into account, any misclassification may be related to occupation and/or socioeconomic position. However, as noted above, for our genetic correlation and Mendelian randomization analyses this would be expected to bias findings towards the null. Our cohort is aged 40 to 69 years and of European ancestry, which reduces the likelihood of bias due to population structure, but means we cannot necessarily assume our results generalize to other groups. That said, the distribution of chronotype is consistent with that found in previous studies52,53,54.

In summary, in a large-scale GWAS of chronotype, we identified 12 new genetic loci that implicate known components of the circadian clock machinery and point to previously unstudied genetic variants and candidate genes that might modulate core circadian rhythms or light-sensing pathways. Furthermore, genome-wide analysis suggests that chronotype shares underlying genetic pathways with educational attainment, schizophrenia and possibly BMI, and that evening chronotype might be causally related to higher educational attainment. This work should advance biological understanding of the molecular processes underlying circadian rhythms, and open avenues for future research in the potential of modulating circadian biology to aid prevention and treatment of associated diseases.

Methods

Population and study design

Study participants were from the UK Biobank study, described in detail elsewhere55. In brief, the UK Biobank is a prospective study of >500,000 people living in the United Kingdom. All people in the National Health Service registry who were aged 40–69 years and living <25 miles from a study centre were invited to participate between 2006 and 2010. In total, 503,325 participants were recruited from over 9.2 million mailed invitations. Self-reported baseline data was collected by questionnaire and anthropometric assessments were performed. For the current analysis, individuals of non-white ethnicity were excluded to avoid confounding effects.

Chronotype and covariate measures

Study subjects self-reported chronotype, sleep duration, depression, medication use, age, and sex on a touch-screen questionnaire. Chronotype was derived from responses to a chronotype question that participants answered, along with other study questions, on a touch-screen computer at each assessment centre. The question was taken from the Morningness–Eveningness questionnaire; it is the question from that questionnaire that explains the highest fraction of variance in preferences in sleep–wake timing and is an accepted measure of chronotype54. The question asks: ‘Do you consider yourself to be…’ with response options ‘Definitely a ‘morning’ person’, ‘More a ‘morning’ than ‘evening’ person’, ‘More an ‘evening’ than a ‘morning’ person’, ‘Definitely an ‘evening’ person’, ‘Do not know’, ‘Prefer not to answer’. This question specifically does not ask about actual sleeping pattern, nor does it distinguish between weekday and weekend behaviour, and it was assessed at the time of exam, which crosses days of the week and seasons across participants. In all, 498,450 subjects answered this question, but only the 153,000 with genetic data were considered for this analysis. Subjects who responded ‘Do not know’ or ‘Prefer not to answer’ were set to missing. Chronotype was treated both as a continuous trait, with chronotype coded 1–4, where 1 represents definite morning chronotype, and a dichotomous trait, with definite morning responders set to control (n=26,948) and definite evening responders set to case (n=8,724). Depression was reported in answer to the question ‘How often did you feel down, depressed or hopeless mood in last 2 weeks?’ (cases, n=4,279). Subjects with self-reported shift work (n=22,165) or sleep medication use (n=4,575) were excluded.
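The two codings of the chronotype question described above can be sketched as follows. This is a minimal illustration, not the study's pipeline: the response strings, dictionary and function names are hypothetical, and the actual UK Biobank field encoding may differ.

```python
from typing import Optional

# Hypothetical response labels; the real UK Biobank field coding may differ.
ORDINAL_CODES = {
    "Definitely a 'morning' person": 1,
    "More a 'morning' than 'evening' person": 2,
    "More an 'evening' than a 'morning' person": 3,
    "Definitely an 'evening' person": 4,
}


def continuous_chronotype(response: str) -> Optional[int]:
    """Ordinal score 1-4 (1 = definite morning); other answers are set to missing."""
    return ORDINAL_CODES.get(response)


def extreme_chronotype(response: str) -> Optional[int]:
    """Binary trait: definite evening = case (1), definite morning = control (0)."""
    score = continuous_chronotype(response)
    if score == 4:
        return 1
    if score == 1:
        return 0
    return None  # intermediate and missing responses are excluded


print(continuous_chronotype("Do not know"))
print(extreme_chronotype("Definitely an 'evening' person"))
```

Keeping missing values as `None` in both codings means that 'Do not know' and 'Prefer not to answer' drop out of either analysis, while the two intermediate categories drop out of the extremes analysis only.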

Genotyping and quality control

Of the 500,000 subjects with phenotype data in the UK Biobank, 153,000 are currently genotyped. Genotyping was performed by the UK Biobank, and genotyping, quality control and imputation procedures are described in detail elsewhere56. In brief, blood, saliva and urine were collected from participants, and DNA was extracted from the buffy coat samples. Participant DNA was genotyped on two arrays, UK BiLEVE and UKB Axiom, with >95% common content, and genotypes for 800,000 SNPs were imputed to the UK10K reference panel. Genotypes were called using Affymetrix Power Tools software. Sample and SNP quality control were performed. Samples were removed for high missingness or heterozygosity (480 samples), short runs of homozygosity (8 samples), related individuals (1,856 samples) and sex mismatches (191 samples). Genotypes for 152,736 samples passed sample quality control (99.9% of total samples). SNPs were excluded if they did not pass quality control filters across all 33 genotyping batches, with a missingness threshold of 0.90. Batch effects were identified through frequency and Hardy–Weinberg equilibrium tests (P value <10−12). Before imputation, 806,466 SNPs passed quality control in at least one batch (>99% of the array content). Population structure was captured by principal component analysis on the samples using a subset of high quality (missingness <1.5%), high-frequency SNPs (>2.5%) (100,000 SNPs), which identified the sub-sample of European descent. Imputation of autosomal SNPs was performed to a merged reference panel of the Phase 3 1000 Genomes Project and the UK10K using IMPUTE3 (ref. 57). Data was prephased using SHAPEIT3 (ref. 58). In total, 73,355,677 SNPs, short indels and large structural variants were imputed. Post-imputation quality control was performed as previously outlined and an info score cutoff of 0.1 was applied.
For GWAS, we further excluded SNPs with minor allele frequency (MAF) <0.00016, a threshold that represents a minimum 50 counts of each genotype, a conservative threshold. In total, 100,420 samples of European descent with high-quality genotyping and complete phenotype/covariate data were used for these analyses. Genotyping quality of two significant rare SNPs (rs1144566 and rs35333999) was verified by examination of genotyping intensity cluster plots (Supplementary Fig. 5). In addition, for two significant imputed rare SNPs we checked Information Quality Scores (info) and found these to be above the standard threshold of 0.40 used to indicate good imputation quality59 (rs141175086 info=0.48 and rs148750727 info=0.88). Considering the size of the genotyped UK Biobank cohort (N≈150,000), an information measure of 0.4 on a sample of 150,000 individuals indicates that the amount of data at the imputed SNP is roughly equivalent to perfectly observed genotype data in a sample of N≈60,000.
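The arithmetic behind the imputation-quality and allele-frequency thresholds above can be sketched as follows. The function names are hypothetical, and the effective-sample-size rule (info multiplied by N) is the approximation stated in the text rather than an exact result.

```python
def effective_sample_size(info: float, n: int) -> float:
    """Approximate number of perfectly genotyped samples equivalent to an imputed SNP."""
    return info * n


def expected_minor_allele_count(maf: float, n: int) -> float:
    """Expected minor allele copies among n diploid samples (2n chromosomes)."""
    return 2 * n * maf


# Threshold case from the text: info = 0.4 in roughly 150,000 samples.
print(effective_sample_size(0.4, 150_000))

# The two imputed rare SNPs highlighted in the text.
for snp, info in {"rs141175086": 0.48, "rs148750727": 0.88}.items():
    print(snp, round(effective_sample_size(info, 150_000)))

# MAF cutoff of 0.00016, assuming the roughly 153,000 genotyped samples.
print(expected_minor_allele_count(0.00016, 153_000))
```

With about 153,000 genotyped samples, the MAF cutoff corresponds to roughly 49 expected minor allele copies, in line with the figure of about 50 quoted above.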

Statistical analysis

Genetic association analysis was performed in SNPTEST60 with the ‘expected’ method using an additive genetic model adjusted for age, sex, 10 principal components of ancestry and genotyping array. Genome-wide association analysis was performed separately for continuous chronotype and ‘extreme’ chronotype with a genome-wide significance threshold of 5 × 10−8. Follow-up analyses on genome-wide significant loci included sex interaction testing using a linear regression model including a sex*SNP interaction term, performed in R61; conditional analysis using SNPTEST conditioning on the lead signal in each locus ±500 kb; and covariate sensitivity analysis individually adjusting for sleep duration, sleep disorders, insomnia and depression/psychiatric medication use. Heritability was calculated using BOLT-REML62. Post-GWAS analysis of LD Score Regression (LDSC)29,34,63,64 was conducted using all UK Biobank SNPs also found in HapMap3 and included publicly available data from 19 published genome-wide association studies, with a P value threshold of 0.0026 after Bonferroni correction for all 19 tests performed. Gene-based testing was performed using VEGAS31 on GWAS summary statistics from SNPs and samples passing rigorous quality control, and gene-set enrichment of genes significant after Bonferroni correction was performed using WebGestalt32. Given that gene-based tests like VEGAS are sensitive to missing data and may show inflation and low power if data for rare variants is missing65, we note low missingness rates (>80% of SNPs had over 99.5% genotyping call rate), and for <5% of SNPs that may have failed in a subset of 33 batches, imputation was used to infer missing genotypes. Furthermore, our gene-based testing included only single SNP association results for variants with over 50 minor allele counts. A Q–Q plot of inflation-adjusted gene-based results for 17,791 genes is shown in Supplementary Fig. 3.
Pathway-based analysis to identify enrichment in biological processes, gene sets and tissues suggested by the top loci was performed in DEPICT33 for all SNPs present in 1KG phase 3 (ref. 66).
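The multiple-testing thresholds quoted in this section follow from a Bonferroni correction, sketched below. The family-wise error rate of 0.05 is an assumption, since the text does not state it explicitly; the function name is hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


ALPHA = 0.05  # assumed family-wise error rate; not stated in the text

# 19 genetic correlation tests in the LD score regression analysis.
print(f"{bonferroni_threshold(ALPHA, 19):.4f}")

# 17,791 genes in the gene-based analysis.
print(f"{bonferroni_threshold(ALPHA, 17_791):.2e}")
```

Under that assumption, the two results match the thresholds given in the text: 0.0026 for the genetic correlations and 2.8 × 10−6 for the gene-based tests.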

For Mendelian randomization analyses, the weighted genetic risk score was calculated by summing the products of the chronotype risk allele count for 15 SNPs multiplied by the scaled chronotype effect reported by 23andMe18 (that is, using weights from a study independent of our own). The instrumental variable analyses were performed in R40 using the two-stage-least-squares method (tsls function in the sem package). The risk scores for education, schizophrenia and BMI were constructed using the genome-wide significant (GWS) SNPs and weights from previously published GWAS67,68,69 and tested on chronotype using the summary statistics from our reported GWAS using the GTX package in R.
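The risk score and two-stage least-squares steps described above can be illustrated with a small standard-library sketch. It is not the analysis code used in the study (which was run in R): all function names and the toy data are hypothetical, a single instrument is assumed, and only the point estimate is computed, because standard errors from a manual second stage are not valid.

```python
from statistics import mean


def weighted_risk_score(allele_counts, weights):
    """Sum of risk allele counts (0, 1 or 2 per SNP) times per-SNP effect weights."""
    return sum(c * w for c, w in zip(allele_counts, weights))


def ols(x, y):
    """Simple linear regression of y on x; returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def tsls(instrument, exposure, outcome):
    """Two-stage least squares with one instrument (point estimate only)."""
    # Stage 1: predict the exposure from the instrument (the risk score).
    intercept, slope = ols(instrument, exposure)
    fitted = [intercept + slope * z for z in instrument]
    # Stage 2: regress the outcome on the predicted exposure.
    _, causal_effect = ols(fitted, outcome)
    return causal_effect


# Toy data: the confounder u is uncorrelated with the instrument z.
z = [0, 1, 2, 3, 4]                               # risk score
u = [1, -1, 0, -1, 1]                             # unobserved confounder
x = [zi + ui for zi, ui in zip(z, u)]             # exposure
y = [2 * xi + 3 * ui for xi, ui in zip(x, u)]     # outcome; true effect is 2

print(tsls(z, x, y))   # instrumental variable estimate
print(ols(x, y)[1])    # naive regression slope, distorted by the confounder
```

In the toy data the naive slope is inflated by the confounder while the instrumental variable estimate recovers the true effect, which is the rationale for using genotype-based scores as instruments.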

Additional information

How to cite this article: Lane, J. M. et al. Genome-wide association analysis identifies novel loci for chronotype in 100,420 individuals from the UK Biobank. Nat. Commun. 7:10889 doi: 10.1038/ncomms10889 (2016).