GWAS of 89,283 individuals identifies genetic variants associated with self-reporting of being a morning person

Circadian rhythms are a nearly universal feature of living organisms and affect almost every biological process. Our innate preference for mornings or evenings is determined by the phase of our circadian rhythms. We conduct a genome-wide association analysis of self-reported morningness, followed by analyses of biological pathways and related phenotypes. We identify 15 significantly associated loci, including seven near established circadian genes (rs12736689 near RGS16, P = 7.0 × 10⁻¹⁸; rs9479402 near VIP, P = 3.9 × 10⁻¹¹; rs55694368 near PER2, P = 2.6 × 10⁻⁹; rs35833281 near HCRTR2, P = 3.7 × 10⁻⁹; rs11545787 near RASD1, P = 1.4 × 10⁻⁸; rs11121022 near PER3, P = 2.0 × 10⁻⁸; rs9565309 near FBXL3, P = 3.5 × 10⁻⁸). Circadian and phototransduction pathways are enriched in our results. Morningness is associated with insomnia and other sleep phenotypes, and with body mass index and depression, but we did not find evidence for a causal relationship in our Mendelian randomization analysis. Our findings reinforce current understanding of circadian biology and will guide future studies.

A morning person prefers to rise and rest early, whereas a night person would choose a cycle later in the day. Chronobiology, the study of such differences (or chronotypes), began with Kleitman 1 suggesting their existence and Horne and Ostberg 2 designing a questionnaire for their definition. Morningness is governed by a circadian rhythm mediated by the suprachiasmatic nucleus (SCN) in the hypothalamus. The SCN is a network of cellular oscillators that are synchronized in response to light input received from the human retina 3. Differences in circadian rhythm have been associated with medically relevant traits such as sleep 4, obesity 5 and depression 6.
Most genetic studies of circadian rhythm have been conducted on model organisms, beginning with the discovery of the first circadian clock genes, per in Drosophila and CLOCK in mice (Supplementary Table 1). Human linkage studies have implicated PER2 in familial advanced sleep phase syndrome 7 and candidate gene studies 8,9 have found others. However, study sizes have been small and findings are not robust 10. Furthermore, few genome-wide association studies (GWAS) have been successful in identifying significant associations [11][12][13]. We analysed genetic associations of self-reported morningness using the 23andMe cohort (n = 89,283) and identified a total of 15 genome-wide significant loci, seven of them close to well-established circadian genes such as PER2. We performed pathway analyses and found both circadian and phototransduction pathways enriched in our results. In addition, we observed significant associations between morningness and body mass index (BMI) and depression in our cohort but found no evidence to support a causal relationship in a Mendelian randomization (MR) analysis.

Results
Descriptions of GWAS study and cohort. We conducted a GWAS of self-reported morningness in the 23andMe participant cohort 14, across a total of ~8 million genotyped or imputed polymorphic sites. Morningness was defined by combining the highly concordant responses (Cohen's kappa = 0.95, P < 1.0 × 10⁻²⁰⁰) to two web-based survey questions that ask if the individual is naturally a morning or night person (Supplementary Table 2). Among 135,447 who answered at least one survey, 75.5% were scored as morning or night persons. Individuals who provided neutral (n = 32,842) or discordant responses (n = 309) were removed (Supplementary Table 9). We did not find differences in age, gender or principal components (PCs; all P > 0.01) when comparing individuals who provided discordant responses versus individuals who gave concordant responses (n = 12,442). We included individuals of European ancestry who had consented for research, and related individuals were removed from analysis (Methods section). Morningness is significantly associated with gender (P = 4.4 × 10⁻⁷⁷), with a prevalence of 39.7% in males and 48.4% in females. Its prevalence increases with age (P < 1.0 × 10⁻²⁰⁰): 24.2% of those under 30 years old prefer mornings compared with 63.1% of those over 60. This age trend is consistent with previously reported observations 15. Table 1 (together with Supplementary Table 2) shows the marginal association between morningness and other sleep phenotypes, BMI and depression (defined in Supplementary Table 3). Morning persons are significantly less likely to have insomnia (12.9 versus 18.3%, odds ratio (OR) = 0.66, P = 2.4 × 10⁻⁷⁴). They are also less likely to require >8 h of sleep per day (OR = 0.67, P = 1.1 × 10⁻⁷²), to sleep soundly (OR = 0.74, P = 8.5 × 10⁻⁵⁰), to sweat while sleeping (OR = 0.8, P = 1.0 × 10⁻²³) and to sleep walk (OR = 0.77, P = 4.7 × 10⁻¹⁰).
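As a rough check on how the insomnia prevalences and the odds ratio quoted above relate, the crude (unadjusted) odds ratio can be computed directly from the two percentages. This is only a sketch: the reported OR may come from a covariate-adjusted model, and the function name `odds_ratio` is ours, not part of the study's pipeline.

```python
def odds_ratio(p_group_a: float, p_group_b: float) -> float:
    """Crude odds ratio of group A versus group B from two prevalences.

    Both prevalences must lie strictly between 0 and 1.
    """
    odds_a = p_group_a / (1.0 - p_group_a)
    odds_b = p_group_b / (1.0 - p_group_b)
    return odds_a / odds_b


# Insomnia prevalences quoted in the text: 12.9% versus 18.3%.
insomnia_or = odds_ratio(0.129, 0.183)
print(round(insomnia_or, 2))
```

The crude value agrees with the reported OR to two decimal places, which suggests that covariate adjustment changes this particular estimate very little.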
Morningness is also associated with lower prevalence of depression (OR = 0.64, P = 1.1 × 10⁻¹²⁸, Supplementary Table 11). Morning persons are less prevalent in extreme BMI groups, namely the underweight (≤18.5) and the obese (≥30) groups (Table 1, Supplementary Fig. 2). However, we found that after adjusting for age and sex, the prevalence of morning persons decreases monotonically across increasing BMI categories (Supplementary Table 11). We included age, sex and the first five PCs in a logistic regression model and computed likelihood ratio tests for association of each genotyped or imputed marker with morningness. Association test results were adjusted for a genomic inflation factor of 1.21 (Supplementary Data 1). For an equivalent study of 1,000 cases and 1,000 controls, the genomic inflation factor (known as λ1,000 (ref. 16)) would be 1.005. The Manhattan plot (Fig. 1) shows 15 morningness-associated regions with genome-wide significance (P < 5 × 10⁻⁸). Table 2 categorizes their index single nucleotide polymorphisms (SNPs) by nearby genes. We used Haploreg 17, a web-based computational tool to explore chromatin states, conservation and regulatory motif alterations using public databases, to understand the possible functional roles of these index SNPs (Supplementary Table 16 and Supplementary Data 2).
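The rescaled inflation factor quoted above can be illustrated with the commonly used conversion λ1,000 = 1 + (λ − 1) × (1/n_cases + 1/n_controls) / (1/1,000 + 1/1,000). The text does not state the formula or the exact split between morning and night persons, so both are assumptions here; the counts below are an illustrative split that sums to 89,283.

```python
def lambda_1000(lam: float, n_cases: int, n_controls: int) -> float:
    """Rescale a genomic inflation factor to 1,000 cases and 1,000 controls."""
    study_term = 1.0 / n_cases + 1.0 / n_controls
    reference_term = 1.0 / 1000 + 1.0 / 1000
    return 1.0 + (lam - 1.0) * study_term / reference_term


# Assumed split of the 89,283 participants (not given in the text).
n_morning, n_night = 38_937, 50_346
print(round(lambda_1000(1.21, n_morning, n_night), 3))
```

Under this assumed split the rescaled value rounds to the 1.005 reported in the text; any roughly balanced split of the cohort gives a similar result.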
Genetic association analyses. Seven loci are near well-established circadian genes. rs12736689 (P = 7.0 × 10⁻¹⁸) is in strong linkage disequilibrium (LD) (r² = 0.89) with the nonsynonymous variant rs1144566 (H137R) of nearby gene RGS16 (Supplementary Fig. 3), a G protein signalling regulator that inactivates G protein alpha subunits. RGS16 knock-out mice were shown to have a longer circadian period 18. rs9479402 (P = 3.9 × 10⁻¹¹) is 54 kb upstream of VIP (Supplementary Fig. 4), a key neuropeptide in the SCN (ref. 19). Its intracerebroventricular administration was found to prolong rapid eye movement sleep in rabbits 20. rs55694368 (P = 2.6 × 10⁻⁹) is 120 kb upstream of PER2 (Supplementary Fig. 5), which has been associated with human familial advanced sleep phase syndrome 7. This SNP is located in a DNAse hypersensitive site (DHS) for five cell types, including pancreas adenocarcinoma, B-lymphocyte (GM12891 and GM12892), medulloblastoma and CD4+ cells (Supplementary Table 16B), and alters five regulatory motifs (see details in Supplementary Tables 16 and 17). rs35833281 (P = 3.7 × 10⁻⁹) is 18 kb downstream of HCRTR2, or orexin receptor type 2 (Supplementary Fig. 6), and alters eight regulatory motifs (Supplementary Table 16). Mutations in HCRTR2 have been linked to narcolepsy in dogs and humans 21,22. rs35833281 is in partial LD with two SNPs on HCRTR2 (r² = 0.25 for rs2653349 and r² = 0.31 for rs3122169) that were suggested to associate with cluster headache and narcolepsy 23. These SNPs were also associated with morningness, but less significantly (P = 3.6 × 10⁻⁷ for rs2653349 and P = 1.8 × 10⁻⁶ for rs3122169). rs11545787 (P = 1.4 × 10⁻⁸) lies near RASD1 (Supplementary Fig. 7), a G protein signaling activator 24. It is a promoter histone mark for six cell types (H1, umbilical vein endothelial, B-lymphocyte, lung fibroblasts, skeletal muscle myoblasts and epidermal keratinocyte) and is in a DHS for seven cell types (skeletal muscle myoblasts, fibroblast, hepatocytes, medulloblastoma, epidermal melanocytes, pancreatic islets and fibroblasts) (Supplementary Table 16). Deletion of RASD1 has been shown to result in a reduction of photic entrainment in mice 25. rs11121022 (P = 2.0 × 10⁻⁸), known to alter three regulatory motifs, is 8 kb downstream of PER3 (Supplementary Fig. 8), which affects the sensitivity of the circadian system to light 26 and is involved in sleep/wake activity 27. Variation in PER3 has also been associated with delayed sleep syndrome and extreme diurnal preference 28. A recent smaller study 13 identified another SNP (rs228697) as significantly associated with diurnal preference; however, this SNP is much less significant in our GWAS (P = 5.3 × 10⁻⁵) and is in low LD with our index SNP rs11121022 (r² = 0.08). rs9565309 (P = 3.5 × 10⁻⁸), located in a DHS for 16 cell types (Supplementary Table 16, Supplementary Data 2), is an intronic variant of CLN5 and is ~2 kb downstream of FBXL3 (Supplementary Fig. 9), part of the F-box protein family, which ubiquitinates the light-sensitive cryptochrome proteins CRY1 and CRY2 and mediates their degradation 29. Mutant FBXL3 mice were shown to have an extended circadian period 30.
We found four additional SNPs linked to genes that are plausibly circadian, based on a literature review of reported connections between the genes and circadian rhythms. rs1595824 (P = 1.2 × 10⁻¹⁰) is an intronic variant of PLCL1 (Supplementary Fig. 10), which is expressed predominantly in the central nervous system and binds to the γ-aminobutyric acid (GABA) type A receptor. rs12965577 (P = 2.1 × 10⁻⁸) is an intronic variant of NOL4 (Supplementary Fig. 13), one of 20 genes with the most significant changes in expression in mice with a knock-in mutation in the α1 subunit of the GABA(A) receptor 31. As most SCN neuropeptides are colocalized with GABA (ref. 32) and most SCN neurons have GABAergic synapses 33, it is possible that PLCL1 and NOL4 have circadian roles. rs34714364 (P = 2.0 × 10⁻¹⁰), an enhancer histone mark known to alter 11 regulatory motifs, is a synonymous variant of gene CA14 and is 3 kb away from APH1A (Supplementary Fig. 11). APH1A encodes a component of the γ-secretase complex, which cleaves the β-amyloid precursor protein 34 and is regulated by orexin and the sleep-wake cycle 35. This relationship between γ-secretase and the sleep-wake cycle suggests a circadian role for APH1A, but this region has many genes and further work is needed to verify this hypothesis. rs3972456 (P = 6.0 × 10⁻⁹), located in a DHS for eight cell types and known to alter three regulatory motifs, is an intronic variant of FAM185A and is 16 kb away from FBXL13 (Supplementary Fig. 12). FBXL13 also encodes a protein-ubiquitin ligase and may have a circadian role similar to FBXL3. The relationship of the remaining loci to circadian rhythm is less clear. rs12927162 (P = 1.6 × 10⁻¹²) is 104 kb upstream of TOX3 (Supplementary Fig. 14), a gene associated with restless leg syndrome 36. The regional plot around rs12927162 shows that the next best SNP only has a P value of 10⁻⁶.
This SNP alters a POU2F2 motif, but we found no other functional annotation, and additional work is needed to verify this association. Notably, this SNP is not in LD (r² = 1.2 × 10⁻⁴) with the reported SNP rs3104767 for restless leg syndrome 36 or with SNPs rs3803662 and rs4784227 for breast cancer 37,38 (Supplementary Table 12), and none of these SNPs has a strong association with morningness (P > 0.01). rs10493596 (P = 8.0 × 10⁻¹²) is 21 kb upstream of AK5 (Supplementary Fig. 15), a gene that regulates adenine nucleotide metabolism and is expressed only in the brain 39. A further index SNP (Supplementary Fig. 16), located in a DHS for three cell types and altering four motifs, is 192 kb downstream of DLX5 and 118 kb upstream of SHFM1, a region linked to split hand/foot malformation. rs6582618 (P = 1.5 × 10⁻⁸) is 2 kb upstream of ALG10B (Supplementary Fig. 17), a gene with a role in regulation of cardiac rhythms 40.
For the above significant loci, we performed stepwise conditional analyses to identify potential additional associated variants within 200 kb of the index SNPs. We iteratively added new SNPs into the model until no SNP had P < 1.0 × 10⁻⁵. We identified one new SNP (Supplementary Table 6) for each of the loci close to VIP (rs62436127, P = 1.6 × 10⁻⁶), APH1A (rs10888576, P = 5.0 × 10⁻⁶) and PER2 (rs114769095, P = 9.7 × 10⁻⁶). Accounting for the ~15,000 total SNPs that we included in our conditional analysis, the secondary hit around VIP is significant (P < 3.3 × 10⁻⁶) but the other two are not.
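The multiple-testing threshold in the conditional analysis is consistent with a Bonferroni correction of a 0.05 significance level over roughly 15,000 SNPs. The sketch below makes that arithmetic explicit; the 0.05 level and the Bonferroni form are our assumptions, inferred from the quoted threshold rather than stated in the text.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Secondary signals from the conditional analysis, as quoted in the text.
secondary_hits = {
    "rs62436127 (VIP)": 1.6e-6,
    "rs10888576 (APH1A)": 5.0e-6,
    "rs114769095 (PER2)": 9.7e-6,
}

threshold = bonferroni_threshold(0.05, 15_000)  # about 3.3e-6
significant = {snp: p < threshold for snp, p in secondary_hits.items()}
for snp, passed in significant.items():
    print(snp, "significant" if passed else "not significant")
```

Only the VIP secondary signal falls below the threshold, matching the conclusion in the text.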
We tested for interaction between these SNPs and age, gender, BMI, alcohol abuse, nicotine abuse and current caffeine use (see Supplementary Table 1 for definitions). First, we added each covariate into the null model of morning person versus age, sex and five PCs. Effects of BMI (OR = 0.97 kg⁻¹ m⁻², P = 1.0 × 10⁻¹²⁵) and nicotine abuse (OR = 0.71, P = 3.9 × 10⁻⁴¹) were significant (Supplementary Table 7A). We then added each SNP into each new null model. Effect sizes were not substantially altered, though P values generally became less significant, consistent with the degree of reduction in sample size for these covariates (Supplementary Table 7B). We also added interaction terms (Supplementary Table 7C) for the significant SNPs and covariates to each model and found none that would be significant after accounting for multiple testing. In addition, we estimated SNP effects in three age groups (<45, 45-60 and >60) and found them consistent across these groups (P > 0.01, Supplementary Table 7D). We also estimated that 21% (95% confidence interval (CI): 13%, 29%) of the variance of the liability of morningness can be explained by genotyped SNPs, using Genome-wide Complex Trait Analysis (GCTA) (ref. 41) on a random subset of 10,000 samples due to computational constraints. Finally, we included the 'neutral' responders, defined a chronotype phenotype to describe morning, neutral and night persons, and performed a GWAS on it using a linear model with adjustment for age, sex and top five PCs. The results are largely similar to our morning-person GWAS. Detailed comparison (Supplementary Fig. 18) shows that in the chronotype GWAS the loci near FBXL3, RASD1 and NOL4 were no longer genome-wide significant. Two additional loci reached genome-wide significance, at rs2975734 in MSRA (Supplementary Fig. 19) and rs9357620 in PHACTR1 (Supplementary Fig. 20, Supplementary Table 10). MSRA has been related to circadian rhythms in Drosophila 42.
PHACTR1 has not been reported to relate to circadian rhythms but has known associations with myocardial infarction 43 .
Pathway analyses. We used MAGENTA (ref. 44) to evaluate whether any biological pathways were enriched in our GWAS results (Table 3). The top three pathways are circadian related and share four genes. PER2 (gene-based P value = 1.6 × 10⁻⁸), in the KEGG circadian rhythm pathway, and FBXL3 (P = 9.4 × 10⁻⁸), in the REACTOME circadian clock pathway, have strong effects and were implicated in our GWAS. Other circadian genes also contribute to the enrichment of circadian pathways, but less significantly (Table 3). The BH4-related pathway (gene set P = 3.1 × 10⁻³) has a major role in the biosynthesis of melatonin, serotonin and dopamine, which are important hormones involved in circadian rhythm regulation and brain function. The phospholipase C (PLC) β-mediated events pathway (P = 4.3 × 10⁻³) includes GNAO1 (P = 6.2 × 10⁻⁴), GNAI3 (P = 5.5 × 10⁻³), GNAT1 (P = 1.0 × 10⁻²) and many other G protein-related genes involved in visual phototransduction. GNAT1 is related to night blindness 45 and GNAI3 is known to interact with RGS16 (ref. 46). Interestingly, RGS16 is close to our GWAS top hit. This pathway also includes PRKAR2A (P = 1.4 × 10⁻³) and PRKACG (P = 0.047), which relate to cAMP-dependent protein kinase A, known to regulate critical processes in the circadian negative feedback loops 47. Notably, except for the KEGG circadian rhythm pathway, which has a false discovery rate of 0.06, all other associated pathways have a false discovery rate >0.2, meaning that the statistical evidence for the association is not strong.

MR analyses.
We used an MR approach to find evidence in support of a causal relationship of morningness with BMI. We first calculated a morningness genetic risk score by summing the risk alleles of the seven circadian-related SNPs weighted by their effects, then regressed morningness or BMI against this instrument variable while adjusting for covariates (age, sex and top five PCs), and finally estimated the ratio of the covariate-adjusted genetic effect for BMI to that for morningness (Methods section). Morningness is highly correlated (F statistic = 19.0, P = 2.1 × 10⁻⁸⁰ in the linear regression model and P = 1.5 × 10⁻⁷⁹ in the logistic regression) with the genetic risk, but BMI (P = 0.43) is not (Table 5). We further estimated the transferred genetic effect, that is, the effect of a genetically elevated chance of being a morning person on BMI, as −0.34 kg m⁻² (95% CI: (−0.99, 0.96), P = 0.91) per unit increase of probability of being a morning person. Similarly, we found that depression is not significantly correlated with morningness genetic risk (P = 0.10). We estimated a non-significant transferred genetic effect of morningness on depression: the probability of depression decreases by 0.07 (95% CI: (−0.10, 0.11), P = 0.18) per unit increase of probability of being a morning person. Thus, we did not find evidence for morningness to be protective against depression or high BMI. Notably, the power of the MR analysis is governed by the strength of the correlation between morningness and its genetic risk as well as the magnitude of the transferred genetic effect of morningness on BMI or depression. We ran simulations (Methods section) to assess the power of our MR analysis and found that our current sample sizes, though large by conventional standards, only lead to moderate power in our MR analysis of morningness and BMI and depression (Supplementary Table 8). If the observed correlation is entirely causal, our analysis has only ~40% power. Our reported lack of statistical evidence in our MR analysis could be due to constrained study power.

Table footnotes: BMI, body mass index; CI, confidence interval; MR, Mendelian randomization; OR, odds ratio.
*Effect size is OR for binary phenotypes and slope (unit increase) for continuous phenotypes in regression analysis. In MR analysis, it is the transferrable genetic effect, which is the ratio of two genetic effects estimated by regressions. The genetic effect is the average difference of prevalence for binary phenotypes and is the average slope for continuous phenotypes.
†The morningness genetic risk is calculated by the sum of the risk alleles of the seven genome-wide significant loci that are close to well-known circadian genes, weighted by their effect size estimated in our morning person GWAS (Table 1).
‡The BMI genetic risk is calculated by the sum of a set of 28 reported BMI associated alleles (Supplementary Table 3) weighted by the unit change of BMI per additional copy of the associated allele 53.
We also conducted an MR analysis of BMI on morningness. We retrieved the morningness GWAS results for a set of 28 previously reported BMI-associated SNPs (ref. 53) and found that rs1558902, an intronic variant of FTO, had some evidence for association with morningness (P = 6.0 × 10⁻⁶, Supplementary Table 5). We then calculated a BMI genetic risk with this set of SNPs using the previously reported effect sizes. It is highly correlated with BMI (F statistic = 47.4, P < 1.0 × 10⁻²⁰⁰) but we found it to be uncorrelated with morningness (P = 0.26) and found no support for a causal relationship (transferred genetic effect = 0.0029, 95% CI: (−0.0059, 0.006), P = 0.35). Our power calculation (Supplementary Table 8) shows that this MR analysis is well powered (~80%) to show evidence of a causal relationship between BMI and morningness, assuming the observed correlation is entirely causal.

Discussion
We identified many loci significantly associated with morningness but were unable to find clear genetic associations in our GWAS analysis of related sleep phenotypes, such as insomnia, sleep apnoea, sleep needed, sleeping soundly and sweating while sleeping. These sleep phenotypes may be more genetically heterogeneous and our current sample sizes, while large by most standards, may still be too small for discoveries. It is also possible that environmental factors mediate the association between morningness and these sleep phenotypes. These other phenotypes may also be more subject to self-reporting bias. We assessed morningness with simple questions and did not consider light exposure, season, geography and other factors; it is possible that better results would be obtained from more detailed surveys (such as the standard Horne-Ostberg questionnaire 2). We also considered the effect of smoking, drinking and caffeine consumption in our analysis, but with limited thoroughness for these phenotypes. More detailed phenotyping would be desirable for future studies, though GWAS typically do not adjust for such factors. An analysis including more refined estimates of these covariates would yield more accurate estimates of effect sizes and could reveal information about mechanism, if some associations with sleep phenotypes are mediated by these other behaviours.
For known circadian genes such as DEC1, DEC2, BMAL1, CRY1 and CRY2, we did not find signals that were genome-wide significant. Specifically, within 100 kb windows of each gene, we had 1,712 SNPs for DEC1 with a minimum P value of 0.01; 689 SNPs for DEC2 with a minimum P value of 7.8 × 10⁻⁴; 835 SNPs for BMAL1 with a minimum P value of 1.0 × 10⁻⁶; 804 SNPs for CRY1 with a minimum P value of 4.4 × 10⁻⁶; and 504 SNPs for CRY2 with a minimum P value of 8.9 × 10⁻⁶. Some of these genes may have a less important role in morningness, or may not have genetic variation that could be identified by GWAS. However, the associations in BMAL1, CRY1 and CRY2 are suggestive and additional data may confirm signals in these genes.
Another large-scale genetic study of chronotype 54 using the UK Biobank has recently been completed, with results largely consistent with our own. Specifically, that study reports genome-wide significant loci at RGS16, AK5, PER2 and HCRTR2, as well as near FBXL13 and APH1A. Further work will be needed to assess replication of other loci not genome-wide significant in both studies.
Our MR analysis did not provide evidence for a causal relationship of morningness on BMI or depression. We have checked MR assumptions (Supplementary Tables 13-15). The F statistic is 19.0 in the linear regression of morning person. Since morningness is binary, we calculated a generalized coefficient of determination for logistic regression: Nagelkerke's R² = 0.0056. Without a direct translation between the R² and the F statistic, we assumed that this small R² could indicate that our instrument is weak and that our MR analysis could be underpowered. We verified that PCs are associated with both risk and outcomes (Supplementary Table 13A), so we have adjusted for them in our MR analysis (Supplementary Table 13A). In addition, we found that the preference for sweet foods (effect = 0.147, P = 7.1 × 10⁻³, Supplementary Table 14) is moderately associated with morning-person genetic risk (OR = 0.15, P = 7.1 × 10⁻³), BMI (OR = 1.009 kg⁻¹ m⁻², P = 1.5 × 10⁻¹⁰, Supplementary Table 14A) and depression (OR = 1.24, P = 6.9 × 10⁻²⁷). We hence included the preference for sweet foods in our MR analysis but found no changes in our conclusion (P > 0.01).
In addition, we checked the assumptions of our MR analysis of BMI on morningness (Supplementary Tables 13-15). We found that PCs are associated with BMI risk and morning person (Supplementary Table 13B). We also found that current caffeine use (Supplementary Table 14B) is associated with BMI genetic risk (effect = 10.5 kg⁻¹ m⁻², P = 1.3 × 10⁻⁶) and morning person (effect = 18.2, P = 4.5 × 10⁻¹¹). Adjusting for PCs and current caffeine use did not change our result (Supplementary Table 15B). Our MR analysis could not rule out canalization or developmental compensation, by which individuals adapt in response to genetic change in a way that reduces the expected effect of the change 55. Our analysis also did not test for non-linear relationships between the phenotypes.
Among BMI risk SNPs, we found an FTO variant strongly correlated with morningness (Supplementary Table 5). Our MR analysis using a BMI genetic risk score as an instrument variable did not find evidence to support a more general effect of BMI genetic risk on morningness. There may be pleiotropy specifically at the FTO locus, instead of a more general causal effect of BMI. Moreover, their strong association may reflect effects of other factors, such as environment, socioeconomics, personality or other genetic variables through independent mechanisms.

Methods
23andMe cohort. Participants in the 23andMe cohort were customers of 23andMe, Inc., a personal genetics company, who had been genotyped as part of the 23andMe Personal Genome Services. DNA extraction and genotyping were performed on saliva samples by the National Genetics Institute (NGI), a Clinical Laboratory Improvement Amendments (CLIA)-certified clinical laboratory and subsidiary of the Laboratory Corporation of America. Samples were genotyped on one of two platforms. About 35% of the participants were genotyped on the Illumina HumanHap550+ BeadChip platform, which included SNPs from the standard HumanHap550 panel augmented with a custom set of ~25,000 SNPs selected by 23andMe. Two slightly different versions of this platform were used, as previously described 14. The remaining 65% of participants were genotyped on the Illumina HumanOmniExpress+ BeadChip. This platform has a base set of 730,000 SNPs augmented with ~250,000 SNPs to obtain a superset of the HumanHap550+ content, as well as a custom set of ~30,000 SNPs. Every sample that did not reach a 98.5% call rate for SNPs on the standard platforms was reanalyzed. Individuals whose analyses repeatedly failed were contacted by 23andMe customer service to provide additional samples.
We collected phenotypes by inviting participants to log in to www.23andme.com to answer surveys, which are either comprehensive surveys with multiple questions on a subject or quick questions. We defined phenotypes by combining the answers to questions on the same subject. For example, as shown in Supplementary Table 2, our morning person phenotype definition combines the answers to two questions that ask if the participant is naturally a night person or morning person. For each question, we classified the answer as night person, morning person or missing. If one answer was missing, we used the other answer as the phenotype value, and if one answer was morning person but the other was night person, we treated the phenotype value as missing. Similarly, we used appropriate combination rules to derive other phenotypes from multiple survey questions (see more in Supplementary Table 2).
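The combination rule for the two morningness questions can be sketched as a small function. The labels `"morning"` and `"night"`, the use of `None` for a missing answer, and the function name are illustrative choices of ours, not the study's actual encoding.

```python
from typing import Optional

MORNING = "morning"
NIGHT = "night"


def combine_answers(first: Optional[str], second: Optional[str]) -> Optional[str]:
    """Combine two per-question classifications into one phenotype value.

    None stands for a missing answer. If one answer is missing the other is
    used; if the two answers disagree the phenotype is treated as missing.
    """
    if first is None:
        return second
    if second is None:
        return first
    return first if first == second else None


print(combine_answers(MORNING, None))   # one answer missing: use the other
print(combine_answers(MORNING, NIGHT))  # discordant: treated as missing
```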
The study protocol and consent form were approved by the external Association for the Accreditation of Human Research Protection Programs-accredited Institutional Review Board, Ethical & Independent Review Services. For a small number of participants (n = 167) under the age of 18 years, consent was provided by a parent, guardian or legally authorized adult.
GWAS analysis for 23andMe European samples. For our standard GWAS, we restricted participants to a set of individuals who have >97% European ancestry, as determined through an analysis of local ancestry via comparison to the three HapMap 2 populations 56. A maximal set of unrelated individuals was chosen for the analysis using a segmental identity-by-descent estimation algorithm 57. Individuals were defined as related if they shared >700 cM identity-by-descent, including regions where the two individuals share either one or both genomic segments identical-by-descent. This level of relatedness (roughly 20% of the genome) corresponds approximately to the minimal expected sharing between first cousins in an outbred population.
Participant genotype data were imputed against the August 2010 release of 1,000 Genomes reference haplotypes 58. First, we used Beagle 59 (version 3.3.1) to phase batches of 8,000-9,000 individuals across chromosomal segments of no more than 10,000 genotyped SNPs, with overlaps of 200 SNPs. We excluded SNPs with minor allele frequency <0.001, Hardy-Weinberg equilibrium P < 10⁻²⁰, call rate <95%, or with large allele frequency discrepancies compared with the 1,000 Genomes reference data. We identified the discrepancies by computing a 2 × 2 table of allele counts for the European 1,000 Genomes samples and 2,000 randomly sampled 23andMe customers with European ancestry, and excluded SNPs with χ² test P value < 10⁻¹⁵. We then assembled fully phased chromosomes by matching the phase of haplotypes across the overlapping segments. We imputed each batch against the European subset of 1,000 Genomes haplotypes using Minimac (2011-10-27) 60, using five rounds and 200 states for parameter estimation.
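The allele-frequency discrepancy filter can be sketched as a Pearson χ² test on a 2 × 2 table of allele counts (dataset by allele). This is a minimal illustration, assuming no continuity correction; the function names and the example counts are hypothetical. For one degree of freedom, the upper-tail P value equals erfc(√(χ²/2)), which the standard library can evaluate without underflowing at the thresholds used here.

```python
import math


def chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a margin is empty, so the table carries no information
    return n * (a * d - b * c) ** 2 / denom


def chi2_pvalue_1df(stat: float) -> float:
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def is_discrepant(ref_counts, cohort_counts, threshold: float = 1e-15) -> bool:
    """Flag a SNP whose allele counts differ sharply between two datasets."""
    stat = chi2_2x2(ref_counts[0], ref_counts[1], cohort_counts[0], cohort_counts[1])
    return chi2_pvalue_1df(stat) < threshold


# Hypothetical allele counts (allele A, allele B) for one SNP.
print(is_discrepant((379, 379), (2000, 2000)))  # identical frequencies
print(is_discrepant((700, 58), (1000, 3000)))   # very different frequencies
```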
For the non-pseudoautosomal region of the X chromosome, males and females were phased together in segments, treating the males as already phased; the pseudoautosomal regions were phased separately. We assembled fully phased X chromosomes, representing males as homozygous pseudo-diploids for the non-pseudoautosomal region. We then imputed males and females together using Minimac, as with the autosomes.
For morning and night person comparisons, we computed association test results by logistic regression assuming additive allelic effects. For tests using imputed data, we used the imputed dosages rather than best-guess genotypes. We used the covariates age, gender and the top five PCs to account for residual population structure. The GWAS association test P values were computed using a likelihood ratio test. Results for the X chromosome were computed similarly, with men coded as if they were homozygous diploid for the observed allele.
Imputed results were computed for 7,381,496 SNPs having an average imputation r² > 0.5 and a minimum within-batch r² > 0.3, after removing SNPs with evidence of a strong batch effect (P < 10⁻⁵⁰), measured by ANOVA of dosages versus batches. For genotyped SNPs, we identified 854,959 SNPs with a minor allele frequency >0.1%, call rate >90%, Hardy-Weinberg P > 10⁻²⁰ in European 23andMe participants and P > 10⁻⁵⁰ for an effect of genotyping date on allele frequency. To create a single merged result set, for 806,041 SNPs with both imputed and genotyped results passing these quality filters, we selected the imputed result. After applying these filters and removing a small number of results that did not converge, we were left with association test results for 7,427,422 SNPs.
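The SNP counts above can be reconciled with simple set arithmetic: the merged set is the union of the imputed and genotyped sets, so SNPs present in both are counted once. The sketch below assumes that non-convergence is the only further removal, which the text implies but does not state outright; the variable names are ours.

```python
imputed = 7_381_496      # SNPs passing imputation quality filters
genotyped = 854_959      # SNPs passing genotyping quality filters
in_both = 806_041        # SNPs with both kinds of result (imputed result kept)
reported_final = 7_427_422

# Size of the union of the two sets.
merged = imputed + genotyped - in_both

# Implied number of results dropped afterwards (assumed: non-convergence only).
implied_dropped = merged - reported_final

print(merged, implied_dropped)
```

Under that assumption, the "small number of results that did not converge" would be a few thousand out of more than seven million.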
Pathway analysis of morningness. We first downloaded a database of canonical pathways of 1,320 biologically defined gene sets 61, then used gene set enrichment analysis 61, implemented in MAGENTA (ref. 44), on our morningness GWAS results. MAGENTA first assigns SNPs to a gene if they lie within 110 kb upstream and 40 kb downstream of transcript boundaries. The most significant SNP in this gene is then adjusted for confounders (gene size, SNP density and LD) in a regression framework to obtain a score for each gene. Genes are then ranked according to their scores, and a gene set enrichment analysis-based approach is used to test whether predefined sets of functionally related genes are enriched for genes associated with morning person more than would be expected by chance. MAGENTA counts the number of genes (enrichment score) with scores ranking above the 95th percentile. To evaluate the significance of each pathway, MAGENTA randomly samples 10,000 gene sets from the genome that are of identical size to the pathway and compares the observed enrichment score to the resampled enrichment scores. To adjust for multiple testing, it estimates the false discovery rate by comparing the observed normalized enrichment score to all resampled normalized enrichment scores.
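The resampling step described above can be sketched as follows. This is not MAGENTA's implementation: it skips the SNP-to-gene assignment and the confounder correction, takes precomputed gene scores as input (here higher means more strongly associated, for example −log10 of a corrected P value), and uses the common (hits + 1)/(resamples + 1) convention for the empirical P value. All names are hypothetical.

```python
import random


def enrichment_pvalue(gene_scores, gene_set, percentile=0.95,
                      n_resamples=10_000, seed=0):
    """Empirical enrichment P value for one gene set.

    gene_scores: dict mapping gene name -> score (higher = stronger signal).
    gene_set: iterable of gene names forming the pathway.
    Returns (observed enrichment score, empirical P value).
    """
    rng = random.Random(seed)
    genes = list(gene_scores)
    ranked = sorted(gene_scores.values())
    # Score at (approximately) the requested percentile of all genes.
    cutoff = ranked[int(percentile * (len(ranked) - 1))]

    def count_above(subset):
        return sum(gene_scores[g] > cutoff for g in subset)

    members = [g for g in gene_set if g in gene_scores]
    observed = count_above(members)

    # Compare against random gene sets of identical size.
    hits = sum(
        count_above(rng.sample(genes, len(members))) >= observed
        for _ in range(n_resamples)
    )
    return observed, (hits + 1) / (n_resamples + 1)


# Toy example: 1,000 genes whose scores equal their index.
toy_scores = {f"g{i}": float(i) for i in range(1000)}
top_set = [f"g{i}" for i in range(980, 1000)]
print(enrichment_pvalue(toy_scores, top_set, n_resamples=200))
```

With 10,000 resamples the smallest attainable P value under this convention is about 10⁻⁴, which bounds how strong a pathway-level signal such a test can report.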
Relationship of morningness and other phenotypes. For binary phenotypes such as insomnia and sleep apnoea, we used a logistic regression to estimate the effect of morningness after adjusting for age, sex and top five PCs. For the continuous phenotype BMI, we used a linear regression model instead. Morningness is part of sleep and its aetiology intertwines with other sleep phenotypes, so it is difficult to dissect the causal relationship. Because BMI and depression are not directly involved in sleep, we can treat the genetic risk of being a morning person as an instrument variable for causal inference. We calculated a morningness genetic risk by averaging the genotypes of the seven SNPs that are close to known circadian genes, weighted by their log odds ratios. We then carried out an MR analysis to evaluate the causal role of morningness. The transferred genetic effect of morningness is estimated by dividing the genetic risk effect on the phenotype by that on the morning-person phenotype. For a continuous phenotype (for example, BMI), this genetic risk effect is the change of mean per unit increase of risk, estimated by a linear regression. For a binary phenotype (for example, depression and morningness), this genetic risk effect is the change of probability per unit increase of risk, estimated by a logistic regression model 62,63. The 95% CI of the transferred genetic effect is estimated using the bootstrap, where we resampled the cohort 1,000 times to obtain resampled transferred genetic effects. We then calculated a two-sided P value by comparing the observed transferred genetic effect with the resampled transferred genetic effects.
For the analysis of the transferred genetic effect of BMI on morningness, we calculated a BMI risk by averaging the genotypes of a total of 28 previously reported BMI loci weighted by their reported effect sizes 53. We then estimated the transferred genetic effect as the ratio of the genetic risk effect on morningness to that on BMI. The confidence interval and statistical inference were obtained by bootstrapping.
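The ratio estimator and bootstrap described above can be sketched in a few lines. This is a simplified illustration, not the study's code: it uses simple least-squares slopes for both regressions, omits the covariates (age, sex, PCs) and the logistic model for binary traits, and reports a percentile bootstrap interval, which is one common choice that the text does not specify. All function names are hypothetical.

```python
import random


def slope(x, y):
    """Least-squares slope of y regressed on x (intercept included)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def genetic_risk(genotypes, weights):
    """Weighted sum of risk-allele counts for one individual."""
    return sum(g * w for g, w in zip(genotypes, weights))


def transferred_effect(risk, exposure, outcome):
    """Ratio of the genetic effect on the outcome to that on the exposure."""
    return slope(risk, outcome) / slope(risk, exposure)


def bootstrap_ci(risk, exposure, outcome, n_boot=1000, seed=0):
    """Percentile 95% interval from resampling individuals with replacement."""
    rng = random.Random(seed)
    n = len(risk)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        r = [risk[i] for i in idx]
        if len(set(r)) < 2:
            continue  # degenerate resample: the instrument has no variance
        e = [exposure[i] for i in idx]
        o = [outcome[i] for i in idx]
        estimates.append(transferred_effect(r, e, o))
    estimates.sort()
    last = len(estimates) - 1
    return estimates[int(0.025 * last)], estimates[int(0.975 * last)]
```

Because the instrument appears in both the numerator and the denominator, rescaling it (for example, using a weighted average instead of a weighted sum of genotypes) leaves the ratio unchanged.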
In addition, we performed simulations to evaluate the power of our MR analysis 64. For the analysis of the effect of morningness on depression, we kept the genetic risk, age, sex, top five PCs and morning person as observed, and then simulated depression from a Bernoulli distribution with expectation calculated from a logistic regression model. In that model, we included morningness, age, sex and top five PCs as predictors. Their effects were estimated by regressing our observed depression phenotypes against these predictors, except that for morningness we specified its transferred genetic effect using hypothetical values. Similarly, we simulated BMI using a linear regression with morningness and the other covariates as predictors, with the effects of the covariates estimated from our data and the transferred genetic effect of morningness hypothetically chosen. The simulation to evaluate the causal role of BMI on morningness was conducted in a similar fashion.