Investigation of common, low-frequency and rare genome-wide variation in anorexia nervosa

Anorexia nervosa (AN) is a complex neuropsychiatric disorder presenting with dangerously low body weight and a deep and persistent fear of gaining weight. To date, only one genome-wide significant locus associated with AN has been identified. We performed an exome-chip-based genome-wide association study (GWAS) in 2158 cases from nine populations of European origin and 15 485 ancestrally matched controls. Unlike previous studies, this GWAS also probed association in low-frequency and rare variants. Sixteen independent variants were taken forward for in silico and de novo replication (11 common and 5 rare). No findings reached genome-wide significance. Two notable common variants were identified: rs10791286, an intronic variant in OPCML (P = 9.89 × 10⁻⁶), and rs7700147, an intergenic variant (P = 2.93 × 10⁻⁵). No low-frequency variant associations were identified at genome-wide significance, although the study was well powered to detect low-frequency variants with large effect sizes, suggesting that there may be no AN loci in this genomic search space with large effect sizes.


INTRODUCTION
Family studies of anorexia nervosa (AN) have consistently shown that first-degree relatives of AN sufferers have an increased risk of AN, compared with relatives of unaffected individuals. [1][2][3][4] Twin studies have estimated the heritability of AN at 56%, 5 with the majority of remaining variance in liability attributed to non-shared environmental factors (38%). 5 Three genome-wide association studies (GWAS) of AN have been conducted to date. The first comprised 1033 AN cases collected as part of the Price Foundation Genetic Study of Anorexia Nervosa and 3733 pediatric controls from the Children's Hospital of Philadelphia. 6 This study focused on common variation and identified 11 suggestive variants (P < 1 × 10⁻⁵). None reached genome-wide significance in the primary analysis, although one variant (rs4479806) approached genome-wide significance in an associated secondary analysis. The second study (comprising 2907 cases and 14 860 controls) was carried out by the Genetic Consortium for AN, as part of the Wellcome Trust Case Control Consortium 3 (WTCCC3) effort. 7 This study identified two suggestively associated variants (P < 1 × 10⁻⁵). Notably, signals at P < 1 × 10⁻⁵ were significantly more likely to have the same direction of effect in the replication as in the discovery cohorts (P = 4 × 10⁻⁶), which implies that true signals exist within this data set, but that the study was underpowered for detection. Recently, a third study meta-analyzed samples from both of these studies, as well as some novel cases, comprising a total of 3495 cases and 10 982 controls. To our knowledge, this study identified the first genome-wide significant locus for AN (index variant rs4622308, P = 4.3 × 10⁻⁹). 8 These studies focused on common variation. Here, we conducted, to our knowledge, the first association study that also considered low-frequency (minor allele frequency (MAF) < 5%) and rare exonic variants in addition to common variation.

Sample collections
We conducted a GWAS across nine discovery data sets (the majority overlapping with the Genetic Consortium for AN/Wellcome Trust Case Control Consortium 3 (WTCCC3) samples), resulting in a total of 2158 cases and 15 485 ancestrally matched controls (Table 1 and Figure 1). All AN cases were female. AN diagnosis was made via semistructured or structured interview, or population assessment strategy, using Diagnostic and Statistical Manual of Mental Disorders (DSM)-IV criteria for AN. The amenorrhea criterion was not applied, as this has been shown not to be diagnostically relevant 9 and has since been dropped from DSM-5. 10,11 All cases met criteria for lifetime AN.
Exclusion criteria included confounding medical diagnoses, for example, psychotic conditions, developmental delay or medical or neurological conditions causing weight loss.
Ancestry-matched controls were selected for each AN case set. Both male and female controls were used (Table 1). These were obtained either from existing collaborations, or through genotyping repository (dbGaP) access. Each site obtained ethical approval from the local ethics committee, and all participants provided written informed consent in accordance with the Declaration of Helsinki.  (Illumina), 20 at the Wellcome Trust Sanger Institute. Where possible, controls were selected from existing studies with genotyping platforms matching those of the cases. Three control cohorts had been genotyped on the 'Infinium HumanExome-12 BeadChip Kit' (Table 1). To ameliorate potential confounding due to chip effects, 21 chip-type quality control (QC) was carried out, and ~14 000 single-nucleotide polymorphisms (SNPs) were removed.

Quality control
Genotypes were called using the GenCall 22 and Zcall 23 algorithms. At each of these genotype-calling stages, QC was performed for each population and for cases and controls separately (Supplementary Table 1). The final number of SNPs included in the analyses is given in Table 2.

Controlling for population stratification
In order to account for population stratification, a principal components analysis was carried out for each cohort separately using the smartpca software. 24 Population outliers were identified by merging each population with central European 1000 Genomes data. 25 Variance explained by each PC was plotted for each population. In order to be both conservative and consistent across populations, the first 10 principal components were included as covariates in the association testing.

Association testing
Unbalanced case-control ratios can lead to anticonservative P-value estimates. 26 This study includes a number of unbalanced strata (Table 1). The likelihood ratio test has been shown to have a low type-I error rate across both balanced and unbalanced cohorts, 26 and was chosen as the association test for this study.
A lower cutoff of minor allele count of 5 and MAF of 0.1% was used. Association testing was performed for each cohort separately using SNPtest. 27 In the cohorts with mixed sex controls (all except Italy and Norway), sex was also included as a covariate.
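The frequency cutoffs above can be illustrated with a minimal sketch. It assumes per-variant genotype counts are available and that both cutoffs are inclusive lower bounds; the function names are hypothetical and are not part of SNPtest.

```python
def minor_allele_stats(n_hom_ref, n_het, n_hom_alt):
    """Return (minor allele count, minor allele frequency) from genotype counts."""
    n_alleles = 2 * (n_hom_ref + n_het + n_hom_alt)
    if n_alleles == 0:
        raise ValueError("no called genotypes")
    alt_count = n_het + 2 * n_hom_alt
    # The minor allele is whichever of the two alleles is rarer.
    mac = min(alt_count, n_alleles - alt_count)
    return mac, mac / n_alleles


def passes_frequency_filter(n_hom_ref, n_het, n_hom_alt,
                            min_mac=5, min_maf=0.001):
    """Keep a variant only if it meets both lower cutoffs.

    Treating the cutoffs as inclusive (>=) is an assumption of this sketch.
    """
    mac, maf = minor_allele_stats(n_hom_ref, n_het, n_hom_alt)
    return mac >= min_mac and maf >= min_maf
```

A variant with 10 heterozygotes among 1000 genotyped individuals (MAC 10, MAF 0.5%) would be kept, whereas one with only two minor alleles would be dropped.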
The standard genome-wide significance threshold of P ≤ 5 × 10⁻⁸ was applied.

Meta-analysis
Summary statistics across cohorts were meta-analyzed using an inverse variance-based test in METAL. 28 In order to test the heterogeneity of the results, Cochran's Q and the I² statistic were computed.
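As an illustration of this step, the following is a minimal sketch of a fixed-effect inverse-variance meta-analysis with Cochran's Q and I², written from the standard textbook formulas. It is not METAL's code; the function name and the input format (per-cohort effect estimates and standard errors) are assumptions.

```python
import math


def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse-variance meta-analysis of per-cohort effects.

    betas: per-cohort effect estimates (e.g. log odds ratios)
    ses:   matching standard errors
    """
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    # Two-sided P-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I^2: share of variation attributed to heterogeneity, floored at 0.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"beta": beta, "se": se, "z": z, "p": p, "q": q, "i2": i2}
```

Under these formulas, cohorts with smaller standard errors contribute more to the pooled estimate, and I² is reported as a percentage.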

Assigning variants to genes
Variants associated at P ≤ 1 × 10⁻⁴ were assigned to genes using Ensembl (release 83; Ensembl Genome Browser). 29,30 For each variant, all predicted consequences (for example, missense, non-synonymous, and so on) and associated gene transcripts were downloaded and compared. Each variant was associated with only one predicted consequence and one Ensembl gene ID (Ensembl Genome Browser). 29

Cluster plot checking
Cluster plots were created for all SNPs reaching P ≤ 1 × 10⁻⁴ in any analysis (cohort-specific or meta-analysis) using ScatterShot. 31 SNPs were visually inspected for each cohort, and for cases and controls separately. In instances where multiple cohorts were merged (for example, UK cases), cluster plots were checked separately for each original cohort.

Burden testing
The potential aggregation of rare variants in cases compared with controls was investigated using a gene-based approach. Burden tests were carried out using the Zeggini-Morris burden test 32 as implemented in rvtests (Rvtests -Genome Analysis Wiki).
All SNPs with MAF between 0.1 and 5% were included; similar to the single-point analysis, a lower bound of minor allele count = 5 was used. A list of genes and locations was obtained from the UCSC genome browser (Table Browser: www.genome.ucsc.edu). All genes with at least two qualifying variants in at least two populations were used, resulting in a total of 9083 genes.
Burden tests were carried out for each population individually, and the results were meta-analyzed using Stouffer's method, weighted according to effective sample size. 33 The genome-wide significance threshold for burden testing was computed in a similar manner to that for single-point analysis, using Bonferroni correction for the number of genes tested. This results in a genome-wide significance threshold of 5.5 × 10⁻⁶.
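The threshold and the combination step can be sketched as follows. The Bonferroni calculation uses the 9083 genes reported above with an assumed family-wise alpha of 0.05. The effective sample size formula, 4/(1/N_cases + 1/N_controls), and the square-root weighting are common conventions assumed here; the source does not state which definitions were used, and all function names are hypothetical.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def effective_sample_size(n_cases, n_controls):
    """Assumed convention: N_eff = 4 / (1/N_cases + 1/N_controls)."""
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


def stouffer_weighted(z_scores, n_eff):
    """Combine signed per-population Z-scores, weighting by sqrt(N_eff)."""
    weights = [math.sqrt(n) for n in n_eff]
    z = sum(w * zi for w, zi in zip(weights, z_scores))
    z /= math.sqrt(sum(w ** 2 for w in weights))
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided P-value
    return z, p


# 0.05 / 9083 genes gives approximately 5.5e-6.
gene_threshold = bonferroni_threshold(0.05, 9083)
```

With these definitions, a strongly unbalanced population contributes less weight than its total sample size alone would suggest.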

Pathway analysis
One of the key motivations for studying complex psychiatric disorders such as AN is the desire to uncover the biological pathways underlying disease development. Pathway analysis was performed using summary statistics from the meta-analysis for the full data set.
Four pathway databases were used: the Kyoto Encyclopedia of Genes and Genomes (KEGG), 34,35 the Reactome pathway database (REACTOME), 36 PANTHER pathway (PANTHER) 37,38 and the Gene Ontology database (GO). 39,40 These were curated to remove redundancy, resulting in a total set of 1836 pathways.
The analysis was run once on a merged set of 235 KEGG, 34,35 REACTOME 36 and PANTHER 37,38 pathways, and once for the 1601 GO pathways. 39,40 Pathway analysis was carried out using MAGMA. 41 MAGMA was selected for its ability to deal robustly with linkage disequilibrium (LD) between markers, correct for gene length and deal accurately with rare variants. MAGMA was first used to annotate SNPs to genes. This annotation was carried out twice. In the first analysis, variants were assigned only to the gene they were in, resulting in 68.73% of the variants being assigned to 13 400 genes. In the second analysis, variants were assigned allowing a 20 kb window in both directions from the gene. This procedure included 75.44% of variants across 18 118 genes.
SNP P-values were used to create gene scores. The European panel of the 1000 Genomes project was used as a reference set to estimate LD between SNPs. The analysis also requires the sample size of the study to be specified; because of the unbalanced nature of the study, the effective sample sizes were given here.
Gene P-values were calculated using MAGMA. 41 The top 10% of SNPs per gene were used. Significance was defined using a false discovery rate of 5%. 42 There is a risk when assigning SNPs to genes using MAGMA that some highly associated SNPs might be assigned to multiple overlapping genes, and thus distort pathway results. SNP-gene assignments were checked for all pathways that reached false discovery rate-corrected significance. No instances of SNPs being assigned to multiple genes were found across these pathways.
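For readers unfamiliar with false discovery rate control, a minimal sketch follows. It assumes the Benjamini-Hochberg step-up procedure; the text above cites a reference for its method without naming the procedure, so this choice and the function name are assumptions rather than a description of the authors' pipeline.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values (q-values), in input order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonic q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


def significant_at_fdr(p_values, fdr=0.05):
    """Flag which tests have q < fdr (hypothetical helper)."""
    return [q < fdr for q in benjamini_hochberg(p_values)]
```

Applied to a list of pathway P-values, `significant_at_fdr` would flag the pathways with q < 0.05.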

Replication
SNPs reaching P < 1 × 10⁻⁴ in the discovery stage were prioritized for replication. In total, 16 SNPs were selected. Replication was carried out using two data sets: one existing in silico data set and one set for de novo genotyping. The in silico data set came from an existing GWAS of AN, 7 genotyped on the Illumina HumanHap610 platform. This data set included 1033 cases and 3733 controls. All cases included in this study were female. Controls were both male and female. The de novo replication cohort consisted of 266 self-volunteered female UK cases, collected through the charity Charlotte's Helix (www.charlotteshelix.net). All participants were adults and had been diagnosed with AN by their clinician. In addition, all participants completed an online questionnaire based on the Structured Clinical Interview 43 for the Diagnostic and Statistical Manual of Mental Disorders-IV Module H. The Structured Clinical Interview has been used extensively in epidemiological investigations. The Structured Clinical Interview eating disorder module was modified to capture information on lifetime history of eating disorders including AN, and includes questions on body mass index, age of onset, and experience of eating disorders. DNA from the saliva samples was extracted using standard protocols and was quantified using pico-green. Samples were genotyped on the Infinium HumanExome 12 Beadchip, genotypes were called using the GenCall and Zcall algorithms, and stringent QC was performed pre- and post-call. In all, 1500 ancestry-matched controls (55% female) were obtained from the UK Household Longitudinal Study.
De novo genotyping was performed using the iPLEX Assay and the MassARRAY System (Agena Bioscience, San Diego, CA, USA; formerly Sequenom). Sample and SNP QC were carried out within each replication data set, using an 80% sample call rate and a 90% SNP call rate threshold, and a Hardy-Weinberg equilibrium threshold of 10⁻⁴. Five samples and one SNP were removed using these criteria.
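The SNP-level criteria can be illustrated with a short sketch. The text does not say which Hardy-Weinberg test was applied; a one-degree-of-freedom chi-square goodness-of-fit test is assumed here, and the function names and the exact direction of each comparison are likewise assumptions.

```python
import math


def hwe_chi_square_p(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no called genotypes")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic: no departure can be tested
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))


def snp_passes_qc(n_aa, n_ab, n_bb, n_missing,
                  min_call_rate=0.90, hwe_threshold=1e-4):
    """Apply the SNP call rate and Hardy-Weinberg thresholds to one SNP."""
    n_called = n_aa + n_ab + n_bb
    call_rate = n_called / (n_called + n_missing)
    return (call_rate >= min_call_rate
            and hwe_chi_square_p(n_aa, n_ab, n_bb) >= hwe_threshold)
```

An exact test is often preferred for rare variants, where the chi-square approximation is poor.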
Post-QC, 15 SNPs and 261 de novo cases remained. The de novo replication analysis therefore included 15 SNPs, 261 cases and 1500 controls. Genotypes for 12/16 SNPs were available in the in silico replication cohort, across 1033 in silico cases and 3733 controls.

Expression analysis
Gene expression data were obtained from the Genotype-Tissue Expression (GTEx) project web portal, data release version 6 (dbGaP Accession phs000424.v6.p1). [44][45][46]

Power
The sample sizes used in this study are small in the context of other psychiatric phenotypes. Power to identify genome-wide significant signals was calculated using Quanto. 47,48 This study is adequately powered to detect low-frequency alleles with large effect sizes and common alleles with substantial effect sizes (80% power to detect common alleles with odds ratio (OR) > 1.5; low-frequency alleles with OR > 2, Supplementary Figure 1).

Data availability
Genotypes of European cases included in this study are publicly available through the European Genome-Phenome Archive (EGA), under accession number EGAS00001000913, data set EGAD00010001043, with the exception of German and Dutch genotypes. Genotypes for cases from the United States of America may be obtained through dbGaP. Summary statistics are available for download from the PGC website (https://www.med.unc.edu/pgc/results-and-downloads).

GWAS and replication meta-analyses
Association testing was performed separately for each of the nine discovery cohorts within this study (2158 cases, 15 485 controls), and the results were meta-analyzed. No inflation was seen in the QQ plot (Figure 2b). Six variants were identified with P < 1 × 10⁻⁵, and nine additional variants with P < 1 × 10⁻⁴ (Figure 2a and Supplementary Table 5). Of these, one variant approached genome-wide significance (exm860538/rs199965409, P = 9.97 × 10⁻⁸), although this variant is polymorphic only in the Finnish population within these data sets, in the Exome Aggregation Consortium 49 and in the 1000 Genomes project panel data. 25 Variants with P < 1 × 10⁻⁴ were taken forward for replication.
In total, 16 independent variants were selected for follow-up in one in silico cohort (1033 cases, 3733 controls) and one de novo genotyping cohort (261 cases, 1500 controls). Of these, five were low-frequency (MAF ~1%) and 11 were common variants.
Twelve signals passed QC and were polymorphic in the de novo genotyping cohort, of which four were nominally significant (Supplementary Table 6; P < 0.05, minimum P = 0.001). Eight of the twelve SNPs had the same direction of effect as in the discovery GWAS, including three of the four nominally significant variants.
Ten of the sixteen variants were present in the in silico cohort, of which seven had the same direction of effect as in the discovery cohort, and one of these seven was associated with P = 0.02 (Supplementary Table 7). On the basis of the number of SNPs taken forward for replication, we would not expect to see any variants reaching P < 0.05 by chance. We also see a higher concordance in direction of effect between discovery and replication cohorts (7/10 in the in silico analysis, 8/12 in the de novo analysis) than might be expected by chance; however, the number of SNPs tested was too small to achieve statistical significance (P = 0.17, P = 0.19, one-sided binomial test).
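The direction-of-effect comparison can be reproduced with a short sketch of a one-sided binomial (sign) test. It assumes a null probability of 0.5 that a replication effect matches the discovery direction; the function name is hypothetical.

```python
from math import comb


def one_sided_binomial_p(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the upper-tail sign-test P-value."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))


in_silico_p = one_sided_binomial_p(7, 10)   # 176/1024, about 0.17
de_novo_p = one_sided_binomial_p(8, 12)     # 794/4096, about 0.19
```

Both values agree with the P-values quoted above for 7/10 and 8/12 concordant SNPs.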
Five SNPs had the same direction of effect across the meta-analyzed discovery cohort and both replication cohorts. No SNPs reached genome-wide significance in the final global meta-analysis. Two variants were associated with the same direction of effect across discovery and replication cohorts, and reached P < 0.05 in at least one replication cohort (Table 3). rs10791286 was associated with risk for AN across all discovery and replication cohorts (Figure 3a, global P = 9.89 × 10⁻⁶, OR 0.84, 95% confidence interval 0.78-0.91). It resides in intron one of the opioid-binding protein/cell adhesion molecule-like (OPCML) gene. Data from the CommonMind Consortium project indicate that this variant is an eQTL for OPCML in the dorsolateral prefrontal cortex, and is associated with reduced expression (P = 0.014 after correction for multiple testing). 50 OPCML has a role in opioid binding and opioid receptor function 51,52 and is expressed in a range of neuronal tissues, primarily the cerebellum and cerebellar hemispheres. [44][45][46] OPCML has previous associations with body mass index, 53 waist-hip ratio, 54 visceral fat distribution 55 and alcohol dependence, 56 among other phenotypes. The variant itself has no previously reported associations with any phenotype. rs7700147 was associated with AN across all discovery and replication cohorts (Figure 3b, global P = 2.93 × 10⁻⁵, OR 1.2, 95% confidence interval 1.1-1.3). It is an intergenic variant and has no previous associations.

Burden testing
Burden testing allows the contribution of multiple low-frequency variants to be aggregated across discrete units (for example, genes). Three genes were identified with P < 1 × 10⁻⁴, although none reached genome-wide significance (Table 4). A further five genes reached P < 1 × 10⁻⁴, but passed inclusion thresholds in one population only (Table 4), and as such are likely to be false positives.

Biological pathways associated with AN
Allowing a 20 kb window for SNP-to-gene assignment identified two pathways significant at q < 0.05: 'Phospholipase activator' and 'GTP-rho binding' (Table 5).
Using the strictest assignment method of SNPs to genes for the full data set, no pathways were significant after multiple-testing correction. The highest ranking pathway was 'Calcium ion import' (q-value = 0.069).

DISCUSSION
To our knowledge, this work constitutes the first examination of low-frequency (< 1% MAF) and rare exonic variation in AN in the context of a genome-wide scan. No replicating low-frequency or rare variant associations were identified, although this study was well powered to detect low-frequency variants with large effect sizes (Supplementary Figure 1). Although polymorphic only in the Finnish population, rs199965409 approached genome-wide significance. It is a non-synonymous variant with a MAF of 0.5% in the Finnish population. 65,66 The variant is within the WDR11 gene, which is associated with hypogonadotropic hypogonadism 14 with or without anosmia. [67][68][69] The clinical features of the disease, such as delayed sexual maturation, 68,70,71 suggest that it may be misdiagnosed as, or comorbid with, AN, which may explain its association in the analysis.
Two notable, but common-frequency, signals were identified with consistent direction of effect across discovery and replication cohorts (rs10791286 and rs7700147). These variants had been removed from the first Genetic Consortium for AN/Wellcome Trust Case Control Consortium 3 (WTCCC3) AN GWAS because of poor cluster plots; therefore, we were not able to compare effect sizes between studies. Burden tests to investigate an aggregation of rare variants within genes yielded three potentially interesting genes, which require further replication.
Studying rare variation presents a range of challenges. The sample sizes required to identify rare variants with modest effect sizes are substantially larger than for common variants. Further, the MAF spectra seen across trans-European populations differ more for rare variants than for common variants, especially when considering genetically distant populations such as Finland and Italy. 25 This can reduce the power to detect a signal and achieve replication. There are also many technical challenges to consider when conducting a rare variant study; for example, the inflation seen in association tests at low minor allele count 26 and the increased error rate of calling algorithms when applied to rare variants. 22,23,72 We mitigated the latter challenge by comprehensively examining cluster plots of > 10 000 variants that surpassed a P-value threshold of P < 1 × 10⁻⁴ in any analysis.
Of the genes potentially implicated through the single-point and burden test analyses, three have associations with metabolic and anthropometric phenotypes (OPCML, C6orf10 and FAM96A). OPCML has previously been associated with waist-to-hip ratio, while C6orf10 has associations with childhood obesity. 58 FAM96A has been shown to be associated with metabolic phenotypes such as low-density lipoprotein and cholesterol levels. The associations of these three genes with metabolic and obesity-related phenotypes may indicate some role for metabolic processes in AN development, although pathway analysis did not corroborate this observation. A growing body of evidence suggests involvement of metabolic processes in AN development, including appetite-satiety pathways, gut motility and gastric-emptying times. [73][74][75][76][77][78][79] For example, application of the LD Score regression method revealed significant negative genetic correlations between AN and body mass index, insulin, glucose, and lipid phenotypes and significant positive genetic correlations between AN and HDL cholesterol phenotypes. 1,80

Notably, the previous association of C6orf10 with childhood obesity 58 is particularly interesting for a number of reasons. First, appetite and satiety dysregulation have been shown to be central to the development of childhood obesity. 81,82 In particular, reduced satiety responsiveness (experiencing an urge to eat despite internal 'full' signals) and heightened responsiveness to food have a role in increased adiposity. Aberrant responses to satiety signals and reduced responsiveness to food are also operative in AN, suggesting shared biological dysregulation between the two conditions. 83,84 Children with increased adiposity are at higher risk of eating disorders 85 as they are more likely to engage in high-risk behaviors such as repeated and excessive dieting and erratic or overly rigid eating patterns. [85][86][87][88][89] These children are also at higher risk of being bullied about their weight, which may increase weight and shape concerns, body dissatisfaction and a host of related risk factors for AN development. [85][86][87][88][89][90][91][92]

The most significant pathway analysis association was with phospholipase activator pathways, which act to catalyze the hydrolysis of glycerophospholipids (GO:0016004 phospholipase activator activity). Phospholipase has a central role in the serotonin-triggered metabolism of arachidonic acid in the brain, [93][94][95] which is a common target for mood stabilizers 94,95 such as lithium, carbamazepine (Tegretol), valproate and lamotrigine (Lamictal). 96 These mood stabilizers have been shown to have varying efficacy in treating AN. [97][98][99] Lithium has been used in treatment of AN (with varying success), 97-99 while carbamazepine and valproate have been successfully used in individuals with complex comorbid eating disorder phenotypes. [100][101][102][103][104] Finally, lamotrigine has been shown to significantly improve eating disorder and mood symptoms in individuals with binge-eating and purging behaviors. 105

The second pathway identified as significantly associated with AN was GTP-rho binding. This pathway has a role in brain development, and is regulated by autism-susceptibility candidate gene 2 (AUTS2). 106 This finding is consistent with the comorbidity between AN and autism. 107 Moreover, individuals with AN may be socially withdrawn 107 and exhibit elevated levels of autistic traits associated with lower social functioning. [107][108][109] AUTS2 has also been well studied as a candidate gene for alcohol abuse, 110 which is commonly comorbid with eating disorders. 111 There is also a well-established link between GTP-rho activation and cognition. 112 Mice with altered expression of genes regulating Rho-GTPases have been shown to have altered exploratory and anxiety-related behavior, decreased sociability and memory formation, and decreased body weight, among others. 112 These findings are in line with some of the comorbidities and intermediate phenotypes noted in AN, for example, the high comorbidity with anxiety-related disorders. 113

There is substantial evidence for the involvement of chromatin-modulating genes in the development of autism, 114-119 schizophrenia 120-124 and body mass index changes. 114 Given the comorbidity of these disorders with AN, and the potential overlap with autism indicated in the pathway analysis results, we tested for enrichment of chromatin-modulating genes in these results. We obtained a list of 340 genes involved in modifying chromatin accessibility and/or modifying histone marks from existing literature; of these, 30 reached nominal significance in our burden test, substantially more than expected by chance (binomial test, P = 0.0026). Moreover, one of the variants identified in the global meta-analysis (exm540361) lies near a gene included in this list (UHRF1BP1). Together, these results may indicate a role for chromatin-modifying genes in AN, although more work will be needed to investigate this further.
A number of limitations should be borne in mind when evaluating these results. First, the sample size of this study is small. Psychiatric disorders in general require very large sample sizes in order to identify reliable genome-wide significant signals. 125 The current study was powered to detect common variants with substantial OR, and rare variants conferring substantial increases in disease risk (OR > 2). To our knowledge, this was the first time a study has specifically investigated the role of rare variation in AN, and the lack of replicating low-frequency findings may indicate that little advancement can be made in this particular genomic search space.
We did not see any overlap between the pathways identified here and those identified in the recent PGC pathway analysis; 126 however, this may reflect the relatively small sample size of this study, as well as different pathway analysis methodologies used.
In this study we only examined female AN cases of European origin. It has been suggested that the genetics underlying AN development may be easier to assess in an all-male study, 3 as there may be a greater genetic risk required to induce trait expression. The higher relative risk in male subjects may also reflect this. 3 To date, this has not been possible because of the lower prevalence of the disorder in men, resulting in substantially smaller sample sizes. Moreover, if AN is heterogeneous between populations, in order to fully understand the genetic etiology of the disorder, it will be necessary to expand collection to include more diverse samples. Efforts are already underway in a number of Asian populations such as Taiwan, Japan, Korea and China, as well as some South American populations such as Argentina and Brazil.
A caveat to this study is that controls were not screened for AN, and that both male and female controls were used. Given the population prevalence of AN across populations of European descent, ~80 female and ~10 male controls would be expected to have AN diagnoses. Given the low rate of treatment seeking in AN, 127 it would not be possible to confidently screen population-based or previously existing control cohorts for AN.
The underlying biological etiology of AN is complex and has not been elucidated yet. Here we have identified a number of variants that warrant follow-up in larger sample sizes, and which point to a role for metabolic, appetite-related and obesity-related effects, in line with a growing body of evidence for metabolic involvement in AN development. Substantially increased sample sizes and detailed phenotyping to reduce heterogeneity will be necessary to empower the characterization of the genetic architecture of AN.

CONFLICT OF INTEREST
GB has received grant funding and consultancy fees from Eli Lilly. DD is speaker, consultant or on advisory boards of various pharmaceutical companies, including AstraZeneca, Boehringer, Bristol Myers Squibb, Eli Lilly, Genesis Pharma, GlaxoSmithKline, Janssen, Lundbeck, Organon, Sanofi, UniPharma and Wyeth, and he has unrestricted grants from Lilly and AstraZeneca as director of the Sleep Research Unit of Eginition Hospital (National and Kapodistrian University of Athens, Greece). AK is on the Shire Canada BED Advisory Board. JK is a member of SAB of AssurexHealth Inc (unpaid). ML has received lecture honoraria from Lundbeck, AstraZeneca and Biophausia Sweden, and served as scientific consultant for EPID Research Oy. There exists no other equity ownership, profit-sharing agreements, royalties, or patents. PS is scientific advisor to Pfizer, Inc. JT received an honorarium for speaking at a diabetic conference for Lilly and royalties from a published book. The remaining authors declare no conflicts of interest.

ACKNOWLEDGMENTS
Charlotte's Helix: We thank all the probands and parents from the parent-led group, Charlotte's Helix. This charity was set up by Charlotte Bevan, after her daughter was diagnosed with anorexia nervosa. The charity is deeply committed to supporting biological work, particularly genetics, to help understand anorexia and has set up a database of patients' names and details. Charlotte's Helix has been collaborating closely with the King's College London (KCL) team since 2013 by providing the database of probands, some funding and publicizing the scientific projects through regular blogs on its website and social media.
We gratefully acknowledge the participation of NIHR BRC South London and the Maudsley NHS Foundation Trust (SLaM) BioResource volunteers, and thank the NIHR BRC SLaM BioResource centre and staff for their contribution.
dbGaP: We obtained the High Density SNP Association Analysis of Melanoma: Case-Control and Outcomes Investigation data set through dbGaP (dbGaP Study Accession: phs000187.v1.p1). Research support to collect data and develop an application to support this project was provided by 3P50CA093459, 5P50CA097007, 5R01ES011740 and 5R01CA133996.
TEENAGE (TEENs of Attica: Genes and Environment): This work was funded by the Wellcome Trust (098051) and has been co-financed by the European Union (European Social Fund-ESF) and Greek national funds through the Operational Program 'Education and Lifelong Learning' of the National Strategic Reference Framework (NSRF)-Research Funding Program: Heracleitus II. Investing in knowledge society through the European Social Fund. We thank all study participants and their families as well as all volunteers for their contribution in this study. We thank the following staff from the Sample Management and Genotyping Facilities at the Wellcome Trust Sanger Institute for sample preparation, QC and genotyping: Dave Jones, Doug Simpkin, Emma Gray, Hannah Blackburn and Sarah Edkins.
UKHLS (The UK Household Longitudinal Study): The UK Household Longitudinal Study (UKHLS) is led by the Institute for Social and Economic Research at the University of Essex and is funded by the Economic and Social Research Council. The survey was conducted by NatCen, and the genome-wide scan data were analyzed and deposited by the Wellcome Trust Sanger Institute. Information on how to access the data can be found on the Understanding Society website https://www.understandingsociety.ac.uk/.
Exome Aggregation Consortium: We thank the Exome Aggregation Consortium and the groups that provided exome variant data for comparison. A full list of contributing groups can be found at http://exac.broadinstitute.org/about.
CommonMind Consortium: CommonMind Consortium Data were used in this manuscript. These data were generated as part of the CommonMind Consortium supported by funding from Takeda Pharmaceuticals Company Limited, F Hoffman-La Roche and NIH grants R01MH085542, R01MH093725, P50MH066392, P50MH080405, R01MH097276, RO1-MH-075916, P50M096891, P50MH084053S1, R37MH057881 and R37MH057881S1, HHSN271201300031C, AG02219, AG05138 and MH06692. Brain tissue for the study was obtained from the following brain bank collections: the Mount Sinai NIH Brain and Tissue Repository, the University of Pennsylvania Alzheimer's Disease Core Center, the University of Pittsburgh NeuroBioBank and Brain and Tissue Repositories and the NIMH Human Brain Collection Core.