Genome-wide association study identifies 143 loci associated with 25 hydroxyvitamin D concentration

Vitamin D deficiency is a candidate risk factor for a range of adverse health outcomes. In a genome-wide association study of 25 hydroxyvitamin D (25OHD) concentration in 417,580 Europeans we identify 143 independent loci in 112 1-Mb regions, providing insights into the physiology of vitamin D and implicating genes involved in lipid and lipoprotein metabolism, dermal tissue properties, and the sulphonation and glucuronidation of 25OHD. Mendelian randomization models find no robust evidence that 25OHD concentration has causal effects on candidate phenotypes (e.g. BMI, psychiatric disorders), but many phenotypes have (direct or indirect) causal effects on 25OHD concentration, clarifying the epidemiological relationship between 25OHD status and the health outcomes examined in this study.


Supplement intake groups
Supplement intake information was obtained from two UKB data fields -104670 (indicating whether individuals replied to the survey on supplement intake) and 20084 (summarising which supplements were taken). Fields 104670 and 20084 were assessed at 5 instances. Using information from all timepoints, four groups were defined, namely: (1) none (individuals who always reported not taking supplements; N = 90,992), (2) other (individuals who ever reported taking other supplements, but never vitamin D; N = 76,863), (3) vitamin D (individuals who ever reported taking vitamin D, and may also have taken other supplements; N = 11,635), and (4) missing (individuals with no information on supplement intake in any of the five timepoints; N = 238,090).

Principal components
Principal components (PCs) were calculated with UKB genotype calls, which were LD pruned (LD r 2 < 0.01) and had long-range LD regions removed by the UKB team. A total of 137,102 variants with MAF > 0.01, HWE P-value < 10 -6 and genotype missingness < 0.05 in unrelated Europeans were used to calculate the PCs with flashPCA 1 . PCs were calculated for unrelated Europeans and projected onto the complete set Europeans.

SUNLIGHT GWAS summary statistics
The SUNLIGHT data were downloaded from https://drive.google.com/drive/folders/0BzYDtCo_doHJRFRKR0ltZHZWZjQ (file 25HydroxyVitaminD_QC.METAANALYSIS1.txt). In total, 79,366 individuals from 31 GWAS cohorts of European descent in Europe, Canada and USA were included in that study. The SUNLIGHT consortium performed genome-wide analyses within each cohort using a uniform analysis plan. Specifically, additive genetic models were fitted using linear regression on natural-log transformed 25OHD. Month of sample collection, sex, age, body mass index, and principal components (PCs) were included as covariates. Then, a fixed-effect inverse variance weighted meta-analysis was conducted using the METAL 2 software package, with control for the population structure. SNPs with a MAF ≤ 0.05, imputation info score ≤ 0.8, HWE ≤ 1× 10 −6 , and less than two studies or 10,000 individuals contributing to each reported SNP association were removed. After quality control, the SUNLIGHT consortium made summary statistics on 2,579,297 SNPs available for download.

Imputation of the SUNLIGHT summary statistics to 1KGP using ImpG
Since individual-level genotypes are not available from the SUNLIGHT consortium, we imputed the SUNLIGHT summary statistics to 1000 Genome Project (1KGP) using ImpG 3 . The haplotype reference panel files (EUR) and SNP mapping files were obtained from 1KGP phase 1 (release v3). Since information on allele frequency was not available in the SUNLIGHT summary statistics, we imputed the allele frequency info using the 1000G imputed ARIC data. We removed SNPs that had mismatched alleles in the two cohorts. After imputation, we further removed SNPs with imputation accuracy metric (r2pred score) < 0.6, and were left with 7,667,712 SNPs. We extracted 6,912,294 SNPs that were available both in the imputed SUNLIGHT and the UKB summary statistics. We further checked the consistency of allele frequencies and the heterogeneity of effect sizes of the SNPs in both cohorts.

Sample size based meta-analysis
Since the effect size estimates (betas) and their standard errors (se) are not in entirely consistent units across the two cohorts, we adopted a sample size-based approach 2 to perform the metaanalysis. This approach converts the direction of effect and P-value reported in each cohort into signed z-scores, which are combined across studies into a weighted sum (based on sample size) for each allele. However, since we have SNPs with very small P-values, which are recognized as zero by the METAL 2 software and cannot be converted to z-score, we computed the z-score using the betas and se, and computed the meta-analysis using the same equations and tests in R as implemented in the METAL package.

UKBR replication sample
From the UKB sample we identified 5,402 samples (Supplementary Figure 7, green dots) not included in our analysis but who had values of PC1 (PC1 < 0.003) just outside the boundary used for White European (Supplementary Figure 7, blue dots). From these we identified 2,384 individuals not related to each other or to anyone in the main analysis (i.e. all GRM off-diagonals <|0.05|. From these we randomly selected 1,632 individuals to match the size of the QIMR sample.

Supplementary Note 1. Heritability and SNP-based heritability
Our UKB sample included a set of 58,738 individuals related with coefficient of relationship (r) > 0.2 to at least one other person in the set ("all relatives"), from whom we estimate the heritability of 25OHD to be 0.32 (s.e. = 0.01) (Supplementary Figure 2, Supplementary Data 1). Among these, a set of 1 st degree relatives (pairs with 0.4 < r < 0.6; N = 44,783) and set of 2 nd degree relatives (0.2 < r < 0.3; N = 15,771) were identified. Heritability estimated from all relatives was 0.32 (s.e. = 0.01). Consistent with some confounding of genetic and shared environment factors between closer family relatives, the heritability estimated from 1 st degree relatives (0.33, s.e. = 0.01) was higher than that estimated from 2 nd degree relatives alone (0.29, s.e. = 0.04), but the difference was not significant (P = 0.33; Z-test difference between two estimates, H0:difference = 0) The SNP-based heritability estimate (0.13, 95% CI 0.12-0.14) is higher than estimate from the previous published meta-analysis (SUNLIGHT consortium 0.075%, 95% 0.039-0.11), which may reflect reduced noise in a single cohort study (Supplementary Figure 8) as previously been demonstrated for height and BMI 4 .
We also estimated SNP-based heritability using three summary-data-based methods: SBayesR 5 , SBayesS 6 , and LDSC 7 . As expected 8 , estimates from GWAS summary statistics were slightly lower than GREML estimates (Supplementary Figure 2, Supplementary Data 1). SBayesR, a Bayesian method that models SNP effects based on a mixture of normal distributions and a point mass at zero, gave a SNP-based heritability estimate of 0.13 (s.e. = 1.4 x 10 -3 ). SBayesS is another Bayesian method to estimate SNP-based heritability, which assumes a zero-normal mixture distribution for the SNP effect, with effect size-MAF relationship incorporated as a free parameter (S). The genetic architecture described by the SBayesS p estimate is 0.8% which implies ~9,000 of common SNPs affect variation in 25OHD, and S parameter of -0.78 (which represents the relationship between MAF and effect size, and which is zero under a neutral model) which shows evidence of negative selection. These results can be compared to the average of 44 complex traits which had mean SNPbased heritability of 0.18. Of these, the most polygenic traits (body fat percentage, educational attainment and schizophrenia) had about 5% (~55,000) SNPs with nonzero effects, while the least polygenic traits (prostate cancer, age at menopause and male pattern baldness) were affected by about 0.1-0.3% (1,000-3,000) common SNPs. Across these traits the median S parameter was = -0.578. The genetic architecture of vitamin D has previously been described as oligogenic, this remains an appropriate descriptor relative to these other complex traits and acknowledges that the architecture includes some variants of comparatively large effect (maximum ~1.6% for rs1352846, Supplementary Data 3).
Heritability estimates obtained with LDSC (0.10, s.e. = 0.01) were comparable, but less accurate (larger standard errors), than those obtained with SBayesS. This is likely due to the more conservative method (Jackknife) that LDSC uses to estimate the variance of the estimate In summer, mean 25OHD levels were always higher than in winter (58.0 vs 41.  Table 1). On the other hand, the standard deviation of 25OHD levels measured in summer was only higher than winter when assessment was made on the observed scale (19.8 vs 19.3 in nmol•L -1 ; 0.9 vs. 1 in units of RINT(25OHD) applied to the total sample; Supplementary Figure 1c). Regardless of the method used for the analysis, the SNP-based heritability for 25OHD levels assessed in summer was always higher than in winter. The genetic correlation between the seasons is (0.80, s.e.= 0.11). While this is not significantly different from one, there may be genetic heterogeneity in 250HD levels between summer and winter, as investigated in GxE analysis.

Supplementary Note 2. 25OHD as a binary trait
While there is some debate about the clinical threshold for vitamin D deficiency (25 nmol•L -1 or 30 nmol•L -1 serum concentration) 9,10 , we chose the more conservative threshold of 25 nmol•L -1 . In the UKB sample 12% of participants are vitamin D deficient by this definition. To investigate a hypothesis of non-linear SNP effects on 25OHD we compared our reported results with analyses treating 25OHD as a binary trait. Here, we used 25OHD residuals after regression on known covariates and labelled those with the lowest 12% of values as deficient. We conducted fastGWA analyses on deficient cases (labelled 0 so that the sign of effect sizes aligned with those from the quantitative trait; N = 50,110) with these combinations of non-deficient controls (labelled 1).
1. Screened controls: Top 88% in the distribution (i.e. all non-deficient samples, using all observations; N = 367,470; two groups with unequal sample sizes). 2. Super-screened controls: Top 12% (i.e. a subsample of the non-deficient sample with the highest 12% 25OHD concentration, N = 50,110; two groups with equal sample sizes). 3. Down-sampled controls: Random 12% that are not in the bottom 12% (N = 50,110; two groups with equal sample sizes).
From theory (see Rmarkdown file available cnsgenomics.com) the comparative power of these analyses is shown in Supplementary Figure 9. For a locus that explains 0.015% of the variance we have 99% power to detect with 25OHD as a quantitative trait (N = 417,580), 27% power to detect with the screened control data, 83% to detect with the super-screened controls (despite reduced N), and for comparison 0.03% power for the down-sampled controls. Using the estimated variance explained by each of the 142 autosomal loci, calculated as where $ % is the allele frequency estimated in the UK Biobank, and * + , is the SNP effect estimate then the relative power assuming linearity of effects is 1 : 0.64 : 0.36 : 0.21 for quantitative trait: super-screened controls; screened controls: down-sampled screened controls. Although the ratios of relative power are not expected to translate directly to ratios of genome-wide significant results, we note that before clumping (which doesn't account for different loci having different LD etc.) we identify 18,337 GWS loci for the quantitative trait and the ratios were 1 : 0.65 : 0.23 : 0.13. After application of COJO to identify independent loci the ratios are: 1 : 0.63 : 0.26 : 0.11 (reflecting 142 : 90 : 37 : 15 COJO SNPs).
Comparing the best powered analysis of the quantitative trait vs. the super-screened binary trait we find a linear relationship in effect size estimates (Supplementary Figure 10). These results follow expectation assuming that the binary trait is truncated from a normal distribution.
We note that if only 25OHD deficiency is associated with adverse outcomes, then this finding has important public health implications -interventions designed to increase the 25OHD concentration in the non-deficient population are unlikely to prevent adverse outcomes associated with vitamin D deficiency 11 . Finally, while common variants can influence 25OHD concentrations, it is important to acknowledge that rare variants that abolish key vitamin D-related functions (i.e. homozygous mutations that severely disrupt the function of key enzymes or of the vitamin D receptor) are associated with clinically prominent adverse phenotypes 12 (e.g. bone defects related to impaired calcification). Although a threshold effect is likely to be present, where profoundly lowered vitamin D levels do increase risk of fracture, our mendelian randomization results strongly suggest that increasing levels of vitamin D in the (non-deficient) general population is unlikely to decrease risk of fracture. As the genetic architecture of the broad range of vitamin D-related phenotypes is improved, issues related to potential threshold effects (e.g. disease-specific thresholds for clinical deficiency) should be should be re-examined.

Supplementary Note 3. SUNLIGHT and UKB comparisons
Summary statistics from the SUNLIGHT consortium 13 were available for 2,579,297 SNPs and the genetic correlation estimate with UKB results was not significantly different from 1 (1̂3 = 0.92, s.e. = 0.06). Meta-analysis with our UKB GWAS results after imputation 3 Methods, Supplementary Data 2). Of these, 91 were within 1-Mb regions also identified in the UKB alone as GWS. Given that the meta-analysis only increased the total number of significant loci by seven, and given our preference not to include BMI as a covariate, we continued with the UKB-only results for our downstream analyses. Moreover, the only SNP that was significant in the meta-analysis but showed little evidence of association in the UKB (P = 0.17) alone was the known BMI-associated SNP rs1558902, which may be indicative of the ascertainment criteria of some of the cohorts contributing to the SUNLIGHT consortium. It is notable, that random draws of ~80K people from the UKB, the same size approximately as from the SUNLIGHT consortium meta-analysis, identified ~20 independent COJO GWS loci (Supplementary Figure 8a). This compares to 14 COJO SUNLIGHT consortium loci, but this increased to 25 when the SUNLIGHT consortium summary statistics were imputed from 2.6M to 7.7M SNPs. Here, we find an approximately linear relationship between sample size and GWS discovery of 3.7 loci/10K people (Supplementary Figure 8b).

Supplementary Note 4. Summary-data-based Mendelian randomization (SMR) analyses
To identify 25OHD SNP associations with statistical evidence consistent with a causal/pleiotropic association via gene expression we used summary-data-based Mendelian randomization (SMR) 15 using the 15,504 gene probes with significant cis-eQTLs identified from whole blood eQTLGen data 16 . After Bonferroni correction, we found 112 significantly-associated gene expression probes (PSMR < 3.2 x10 -6 , i.e., 0.05 / 5, with 5 = 15,504, being the total number of probes tested in SMR analysis; (Supplementary Data 9; full details of the SMR analyses can be found on https://cnsgenomics.com). These genes were in 30 broad regions on 16 chromosomes (4 regions on chromosomes 1 and 4; 3 regions on chromosome 11; 2 regions on chromosomes 2, 3, 14, and 19; and 1 region on chromosomes 6, 7, 8, 9, 12, 15, 17 and 20). Genes identified in these analyses included those that encode proteins of interest to vitamin D-related pathways, such as CYP2R1, DHCR7, NADSYN1, DGAT2 and HSD17B11. SMR also confirmed genes of interest discussed for the COJO analyses including SDR42E1, DGAT2 and POU2F3. Of the 112 loci identified by SMR, 24 passed the HEIDI test (PHEIDI > 0.05) and have the strongest statistical evidence for a causal role for vitamin D levels (the HEIDI test conservatively excludes results where the gene expression colocalization may be more complex than via the same set of causal SNPs). Of interest, SMR identified a CYP24A1 intron eQTL variant (rs912505) -this enzyme catalyses additional hydroxylation of 1,25OHD, which leads to the degradation of the active form on vitamin D. SMR also identified eQTL variants related to the expression of (a) AKR1A1; a member of the aldo-keto reductase family 1, (b) two hydroxysteroidrelated enzymes; HSD17B13 and HSD3B7, and (c) HAL; an enzyme expressed in the skin that generates UVB-absorbing molecules.
We then used SMR to explore eQTLs in other relevant tissues (sun-exposed and non-sun-exposed skin, liver, and sixteen brain regions) using data from the GTExV7 16 , PsychENCODE 17 and foetal brain tissue 18 (Supplementary Data 9). Of interest, we found significant SMR associations in many tissues for RP11-660L16.2I, which is an antisense non-coding RNA located in the bi-directional promoter between DHCR7 and NADSYN1 19 . For both sun-exposed and non-sun-exposed skin, one of the top eQTL variants selected by SMR was rs3819817, within HAL. Overall, the SMR-related findings add weight to the hypothesis that these particular eQTL variants may be causally related to 25OHD concentrations. Plots for all significant SMR results can be found on https://cnsgenomics.com), with an example in Supplementary Figure 4.

Supplementary Note 5. Simulation to investigate consequences of sample overlap in GSMR analyses
We had previously shown that Summary-data-based Mendelian randomization (SMR) test statistics are not inflated in the presence of complete sample overlap even when the genetic effects are large 20 . Here, we investigate the consequences of sample overlap for GSMR analysis.
We performed simulations to investigate whether the GSMR estimate of * 67 would be biased due to overlapping samples, under three scenarios: 1. Null model: no causal relationship between exposure x and outcome y (* 67 = 0) without horizontal pleiotropy; 2. Null model with pleiotropy: No causal relationship, between exposure x and outcome y (* 67 = 0), but in the context of horizontal pleiotropy model (i.e. shared causal variants between x and outcome y). 3. Causal model without pleiotropy: As 1) but with a true causal relationship between exposure x and outcome y. . We then randomly sampled 300 SNPs (; 7 ) from even chromosomes as causal variants of outcome (N), using the same method to simulate N with proportion of variance in N explained by ; (F =7 -) being 0.2. We performed GWAS analyses to estimate * =6 and * =7 (effect of ; on N). Simulating phenotypes and GWASes were conducted in the same set of individuals. We randomly sampled another 20,000 unrelated individuals from UKB as LD reference sample for GSMR analysis. The simulation was replicated 1,000 times. 2. Null model plus pleiotropy: We simulated both < and N using the same method as above, except to simulate horizontal pleiotropy, we randomly sampled one-third of causal variants (i.e. 100 SNPs) of < and N to be shared between the traits.
3) Causal model: We simulated and standardized < as above. N was simulated using the regression model, N = <* 67 + ? 67 , where * 67 is the causal effect of < on N with F 67 -(proportion of variance in N explained by <) being 0.1 , and ? 67~A (0, var(<* 67 )( / J LO M − 1)). N was then standardised. Given the standardisation the value of * 67 used in the simulation is arbitrary, but because both < and N were standardised, EQ* R 67 S = TF 67 -= 0.3162.
Results over 1,000 replicates are presented in Supplementary Table 7. Estimates of * 67 do not suffer bias from overlapping samples.

NB:
The GSMR P-value distribution deviates from uniform under pleiotropy models. As demonstrated above, * R 67 was not biased in either model. If the top-associated SNPs include pleiotropic SNP then the theoretical var(* R 67 ) (mean of var(* R 67 ) estimated by GSMR) would be slightly smaller than empirical var(* R 67 ). Therefore, there is an inflation of small P-values is expected under the pleiotropy model.

Supplementary Figure 2. Heritability and SNP-based heritability estimates of 25OHD levels estimated with different methods. (A)
Heritability and SNP-based heritability estimates of 25 hydroxyvitamin D (25OHD) obtained with different methods. Estimates including the words "winter" or "summer" in the label were estimated based on data from the moths Dec -Apr and Jun -Oct, respectively. Most methods were applied with (blue bars) and without (grey bars) BMI fitted as a covariate. Error bars represent the 95% confidence interval of the estimates. The sample size used in each analysis is listed on the right. All estimates are provided in Supplementary Data 1. (B) Genetic correlation estimates between selected pairs of traits.
Bivariate GREML was conducted between 25OHD and BMI, but only the 25OHD heritability estimates are plotted. Genetic correlation (rg) estimates and respective standard errors (s.e.) were also obtained from the bivariate GREML analyses. Estimates of genetic correlation between 25OHD and BMI obtained with LDSC regression were -0.17 (s.e. = 0.03) using the UKB data and -0.12 (s.e. = 0.03) using the published BMI meta-analysed results, which excluded the UKB 21 . Bivariate GREML SNP-based (co)heritability analysis estimated a genetic correlation of -0.19 (s.e. = 0.03) and correlation of residual effects of -0.21 (s.e. = 0.05) with BMI.
Abbreviations: GRM, genomic relationship matrix; single GRM analysis contrasts to the linkage disequilibrium minor allele frequency stratified (LDMS) in which the linear mixed model has 10 genetic random effects with 10 GRM constructed from SNPs allocated to 10 bins based on their minor allele frequency and LD score annotation; Biv, bivariate; rg, genetic correlation.

Supplementary Figure 4. Example SMR plot
The above figure shows the results of the SMR+HEIDI analysis that combines the data from GWAS and eQTL studies. Full details of the SMR analyses can be found on https://cnsgenomics.com. Amongst the eQTLs tested, rs2465654 is an eQTL for POU2F3 in blood and liver (of tissues tested) and generates significant SMR associations to 25OHD. Notably, rs2465654 is not an eQTL in GTEX skin tissue, despite POU2F3 having massively higher expression in skin compared to other tissues, and where it plays a critical role in keratinocyte proliferation and differentiation. The first track shows -log10(P-value) of SNPs (grey dots) from the fastGWA (ref) GWAS of 25OHD. Each red rhombus indicates the -log10(P-value) from the SMR tests for associations of gene expression with 25OHD concentration. A solid rhombus represents a probe not rejected by the HEIDI filtering. The yellow rhombus denotes the SNP-25OHD association of the SNP that is the top cis-eQTL (rs2465654). The second track shows -log10(P-value) of the SNP association with gene expression probe ENSG00000137709 (tagging POU2F3). The third track displays information for 25 chromatin states (indicated by the colours on the right bar) of 127 samples from Roadmap Epigenomics Mapping Consortium (REMC) 22 for different tissues and cell types (indicated by the colours on the left bar), which communicates potential tissue-specific roles for the SNP. The bottom track shows the genes underlying the genomic region.

Supplementary Figure 5. Genetic correlations between 25OHD and selected phenotypes
Dots and lines represent the genetic correlation estimates (1̂3) and respective 95% confidence intervals between variants associated with 25 hydroxyvitamin D (25OHD) and selected phenotypes. Genetic correlations were estimated from bivariate LDSC regression 23 using GWAS summary statistics from 25OHD and selected phenotypes. Correlations are shown for 25OHD GWAS analyses that (1) make no adjustment for body mass index (BMI), (2) include BMI as a covariate, or (3) condition on genetic correlates of BMI (via mtCOJO) 24 . Negative genetic correlations indicate the variants associated with increased 25OHD concentration were associated with a smaller value/reduced risk for the phenotypes of interest. Non-significant (correlation P > 0.05) 1̂3 are presented in grey. Nominally significant (correlation P < 0.05) 1̂3 are presented in yellow. 1̂3 that remained significant after Bonferonni correction for multiple testing (correlation P < 0.003) are presented in red. All 1̂3 and respective standard errors and P-values can be found in tabular form in Supplementary Data 10. For additional information about the summary statistic related to these phenotypes, see the original publications as provided in Supplementary Data 10.

Supplementary Figure 6. Distribution of -log10 P-values from 20 25OHD vQTLs without GWS GxE association with season of blood draw.
The median linear regression P-value in the GxE analysis is 0.03, which suggests that at least half of these 20 vQTL are unlikely to have SNP*season interaction. GWS: genome-wide significant.

Supplementary Figure 7. UKBR replication sample
UK Biobank (UKB) samples projected into principal components (PC) 1 and 2. Blue dots represent European samples. Red dots represent non-European samples. Green dots represent 5,402 samples just outside the boundary used for White European, from which 1,632 individuals unrelated to each other, or to anyone in the main analysis, were selected as the UKB replication (UKBR) sample.

Supplementary Figure 8. Number of COJO GWS loci from random draws of samples from the UKB sample (a)
Five random draws of 80K samples (the same size as the SUNLIGHT consortium) identify ~20 GWS loci. SUNLIGHT identifies 14 COJO GWS loci (which increased to 25 when the SUNLIGHT consortium summary statistics were imputed from 2.6M to 7.7M SNPs). (b) There is an approximately linear trend of 3.7 number loci per 10,000 individuals. GWS: genome-wide significant. COJO: conditional and joint analysis (see Methods)

Supplementary Figure 9. Statistical power of different study designs to estimate genetic variance
Statistical power to detect the effect of a locus depending on the percentage of variance it explains (x-axis) and the study design used. Specifically, four study designs are compared, namely 25OHD assessed as a (1) quantitative trait (QT; black line); (2) binary trait with super screened controls (blue line); (3) binary trait with screened controls (red line); and (4) binary trait with down-sampled controls (green line). The super-screened control design uses the same proportion of cases and controls (i.e. top 12% observations are controls and bottom 12% are cases). The screened control design uses the all controls (i.e. top 88% observations as controls and bottom 12% are cases). The down-sampled design uses the same proportion of cases and controls, but controls are randomly sampled from the bottom 88%.

Supplementary Figure 10. Effect size estimates of genome-wide significant loci in a quantitative trait and a super-screened control design
Effect sizes of genome-wide significant (GWS) loci identified with two study designs: 25OHD as a (1) quantitative trait, and (2) super-screened control design (i.e. same proportion of cases and controls, with cases and controls drawn from the bottom and top of the distribution, respectively).
NB These graphs show more positive associations than negative but this reflects, in part, the chance choice of alleles (Supplementary Figure 11). Also, we note that the beta values are on different scales for the quantitative and super-screened controls. As expected from theory the correlation in effect size between super-screened controls and the quantitative trait is higher than between screened controls and the quantitative trait.

Supplementary Figure 11. Frequency and effect size of the trait-increasing allele for 142 independent associations
For the 142 autosomal loci (identified with COJO) we compared the allele frequency of the trait increasing allele (x-axis) with its effect size (y-axis).

Supplementary Table 1. 25OHD distribution and association with potential confounders
Associations with potential confounders were tested with linear regression (25OHD dependent variable for continuous and categorical independent variables). RINT and log transformations presented below were applied to the whole sample (i.e. before stratifying in summer and winter cohorts). For association analyses, RINT was re-applied after stratifying the whole cohort by season. Abbreviations: beta, effect size estimate; IQR, interquartile range; N/A, not presented for categorical covariates; r 2 , variance explained; RINT, Rank Inverse Normal Transformation; SD, standard deviation.

Supplementary Table 2. Number of effective SNPs estimated from SbayesS and SBayesR
In SBayesR, SNPs with small, medium and large effect sizes are the SNPs whose effect size follows a normal distribution with mean zero and variance of 0.01, 0.

Supplementary Table 6. Conditioned GWAS results
Genome-wide association study (GWAS) results before and after conditioning on the most significant common variant in CYP2R1 (identified by GCTA-COJO) 25