Large genome-wide association study identifies three novel risk variants for restless legs syndrome

Restless legs syndrome (RLS) is a common neurological sensorimotor disorder often described as an unpleasant sensation associated with an urge to move the legs. Here we report findings from a meta-analysis of genome-wide association studies of RLS including 480,982 Caucasians (cases = 10,257) and a follow-up sample of 24,977 (cases = 6,651). We confirm 19 of the 20 previously reported RLS sequence variants at 19 loci and report three novel RLS associations: rs112716420-G (OR = 1.25, P = 1.5 × 10⁻¹⁸), rs10068599-T (OR = 1.09, P = 6.9 × 10⁻¹⁰) and rs10769894-A (OR = 0.90, P = 9.4 × 10⁻¹⁴). At four of the 22 RLS loci, cis-eQTL analysis indicates a causal impact on gene expression. Using a polygenic risk score for RLS, we extended prior epidemiological findings implicating obesity, smoking and high alcohol intake as risk factors for RLS. To improve our understanding, with the purpose of seeking better treatments, more genetic studies yielding deeper insights into the disease biology are needed.

Restless legs syndrome (RLS) is a common sensorimotor disorder that is known to impact quality of life and health 1,2 . The prevalence ranges from 5 to 18.8% in European populations [3][4][5] with approximately 2 to 3% of the general population thought to benefit from medical treatments that ameliorate symptoms [5][6][7] . RLS symptoms include uncomfortable sensations predominantly localized in the legs that are experienced as pain in at least one-third of subjects, which elicit a strong urge to move for symptomatic relief. The symptoms increase in the evening and at night. Consequently, the onset and maintenance of sleep are negatively impacted in most RLS patients, which in turn, is thought to impair daytime cognition and mental well-being 8 . The majority of RLS patients experience involuntary leg movements at transitions to sleep, and during sleep (periodic leg movements in sleep (PLMS)). Many also have social activities and work productivity interrupted by RLS symptoms 2 .
One of the underlying pathophysiological mechanisms of RLS involves impaired re-uptake of synaptic dopamine and reduced D2 receptor density, explaining why the disorder can sometimes be treated with dopamine-based therapies 9 . It is hypothesized that the re-uptake of synaptic dopamine is affected by brain iron level 9 . Supporting this, in RLS patients low brain iron has been found in the substantia nigra and the striatum, whose roles in regulating reward, motivation, and movement are well established [10][11][12] .
Moreover, a variety of modifiable health and lifestyle risk factors that accompany or aggravate RLS have been reported, including obesity, smoking, high alcohol intake, and sedentary lifestyle 3,13 . The prevalence is greater in individuals with reduced iron reserves 14 . Even though iron supplementation can be effective in relieving symptoms, especially in patients with iron deficiency, there are currently limited treatment options for RLS 15,16 , which also appears to be underdiagnosed 17 . Existing treatments address symptoms rather than the underlying cause of the disease. A fundamental reason for this is our relatively limited knowledge of the pathogenesis of the disorder. One way to increase our understanding of RLS is to expand knowledge of the genetic architecture of the disorder, which is complex and polygenic in nature 6 . Genome-wide association studies (GWAS) of European ancestry populations have yielded 20 single nucleotide polymorphisms (SNPs) in 19 loci that associate with RLS 6,[18][19][20][21][22][23][24] .
The aim of the present study was to search for additional RLS-associated loci that might provide new insights into the disease pathophysiology and be useful in the discovery of new drugs or repurposing of existing drugs for RLS treatment. To this end, a meta-analysis of GWAS of RLS including 480,982 adults of European ancestry (recruited from Iceland, Denmark, United Kingdom (UK), the Netherlands and the United States (USA)) was conducted. Following this, novel findings were tested for replication in two additional case-control sets of European ancestry, the EU-RLS-GENE and RBC-Omics cohorts. Subsequently, all cohorts were meta-analyzed. Finally, to search for traits associated with RLS, we calculated polygenic risk scores for RLS (RLS-PRS) for the UK Biobank subjects and tested associations between RLS-PRS and 12,075 traits (binary and quantitative). The UK Biobank is one of the largest and most widely used resources for studying health and well-being. The biobank sample is population-based, and the 500,000 volunteer participants provide health information to approved researchers by allowing the UK Biobank to link to existing health records, such as those from general practice and hospitals 25,26 . This study confirms 19 of the 20 previously reported RLS sequence variants at 19 loci and identifies three novel RLS-associated variants. Cis-eQTL analysis indicates a potential causal impact on gene expression at four of the 22 RLS loci. Finally, investigating traits associated with polygenic risk score for RLS, this study confirms and adds to prior epidemiological findings by implicating among other factors obesity, smoking and high alcohol intake as lifestyle risk factors for RLS.

Results
Genome-wide association study: discovery and replication. The discovery meta-analysis confirmed 19 of the 20 previously reported RLS variants 6 (Fig. 1 and Supplementary Tables 1-3). The remaining SNP, rs12962305-T, had an effect size that was significantly smaller than in previously reported meta-analyses (Table 1). The P-values of association with five sequence variants, at loci not previously associated with RLS, were below 5 × 10⁻⁸ in the discovery sample and were tested in a follow-up sample, including the EU-RLS-GENE cohort (6228 cases and 10,992 controls) and the RBC-Omics cohort (423 cases and 7,334 controls) (Supplementary Table 1 and Supplementary Figs. 1-5 for regional association plots).
Fig. 1 Manhattan plot displaying results from the RLS discovery meta-analysis for N = 480,982 independent biological samples. Variants labeled orange are previously reported variants. Variants labeled blue and green are novel variants (five) that were tested in a follow-up sample. Of the five novel variants, three were confirmed (green diamond shape) in the follow-up analysis and met the genome-wide significance threshold 27,28 , whereas two did not.
Table 1 footnotes: The combined analysis comprises both the discovery sample as well as the two replication samples. c Represents significant P-value for replication samples after multiple testing.
Three of the tested variants surpassed genome-wide significance in the meta-analysis of all samples 27,28 (Table 1). The novel RLS-associated sequence variants are: rs10068599-T in an intron of RANBP17 on 5q35.1 (OR = 1.09, P = 6.9 × 10⁻¹⁰, 95% CI: 1.06-1.12), rs112716420-G in close proximity of MICALL2 on 7p22.3 (OR = 1.25, P = 1.5 × 10⁻¹⁸, 95% CI: 1.19-1.31) and rs10769894-A near LMO1 and STK33 on 11p15.4 (OR = 0.90, P = 9.4 × 10⁻¹⁴, 95% CI: 0.88-0.93) (Table 1).
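As a rough consistency check on the odds ratios, confidence intervals and P-values reported for the three novel variants, the standard error of the log odds ratio can be backed out from the 95% CI and converted into a Wald z-statistic and two-sided P-value. The sketch below is illustrative only: the function name `z_and_p_from_or_ci` is hypothetical, the normal (Wald) approximation is an assumption, and because the published values are rounded, the recomputed P-values should only be expected to agree in order of magnitude.

```python
import math

Z_975 = 1.959964  # standard normal quantile behind a two-sided 95% interval


def z_and_p_from_or_ci(odds_ratio, ci_low, ci_high):
    """Back out SE, z and a two-sided P from an OR and its 95% CI.

    Assumes a symmetric Wald interval on the log-odds scale.
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_975)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p


if __name__ == "__main__":
    # (OR, lower 95% CI, upper 95% CI) as reported in the text.
    reported = {
        "rs10068599-T": (1.09, 1.06, 1.12),
        "rs112716420-G": (1.25, 1.19, 1.31),
        "rs10769894-A": (0.90, 0.88, 0.93),
    }
    for variant, (or_, lo, hi) in reported.items():
        se, z, p = z_and_p_from_or_ci(or_, lo, hi)
        print(f"{variant}: SE={se:.4f}, z={z:.2f}, P~{p:.1e}")
```

Working on the log-odds scale is the usual choice here because the sampling distribution of log(OR) is approximately normal, whereas that of the OR itself is skewed.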
Cis-co-localization analysis of RLS variants using GTEx. To identify RLS variants that act as cis-expression quantitative trait loci (cis-eQTL) and share the same signal as the top eQTL of the respective gene and tissue, we performed stepwise pairwise co-localization analysis. We investigated cis-eQTL of RLS variants in 54 tissues reported in the GTEx database. Of the 23 tested RLS variants (20 previously reported and three novel), we found cis-eQTL data for 11 impacting 17 genes (Supplementary Tables 4 and 5). Of the 11 with data, 10 strongly associate with cis-gene expression (P < 3.3 × 10⁻⁶, Supplementary Table 6). Six of these 10 variants are in LD (r² > 0.3) with the top eQTL for the respective gene (Supplementary Table 4). To ascertain that RLS variants and top eQTLs share the same signal, we further evaluated these six variants by two-way approximate conditional analysis, as implemented in COJO 29 . Therein, conditional analysis using RLS effect sizes showed that four RLS variants and eQTLs share the same signal (Supplementary Table 5). Additionally, conditional analysis using GTEx effect sizes also confirmed these as the same associated signals (Supplementary Table 6). Hence, four RLS variants (rs10068599-T, rs10653756-CACAG, rs12450895-A, and rs3784709-T) co-localize with top eQTLs for five genes (RANBP17; CASC16; HOXB2; and MAP2K5 and SKOR1, respectively) (Fig. 2) (for all RLS-associated variants see Supplementary Fig. 2). rs10068599-T is associated with a lower expression of RANBP17 in brain subcortical regions, mainly in the basal ganglia, and in the liver, thyroid and left ventricle of the heart. rs3784709-T is associated with a lower expression of SKOR1 in pituitary, pancreas, and mammary tissues, while the variant is also associated with a lower expression of MAP2K5 in the left ventricle of the heart. Moreover, rs10653756-CACAG appears to be associated with a specific effect on CASC16 expression in testes. rs12450895-A lowers the expression of HOXB2 in suprapubic skin, fibroblast cells, and the omentum (visceral adipose tissue) (Fig. 2).
Genetic risk and LD regression analysis. We used RLS-PRS to predict RLS clinical cases (N = 1916 with the ICD10:G25.8 diagnostic code) in UK Biobank data. The analysis showed that RLS-PRS explains 0.97% of the phenotypic variance (Supplementary Fig. 7). One SD increase in RLS-PRS increases the odds of RLS 1.40-fold over that in population controls (P = 4.4 × 10⁻⁴⁶, OR = 1.40, 95% CI: 1.35-1.45). Area under the curve and receiver operating characteristic curve analyses show that the risk for RLS increases for ascending quartiles (Supplementary Table 7 and Supplementary Fig. 8). RLS-PRS was used to identify traits associated with the score in the UK Biobank. Our analysis showed that higher RLS-PRS burden is negatively associated with educational attainment (P = 2.  Tables 8 and 9). Results from LD score regression 30 and PRS-association analysis are consistent with each other (Supplementary Tables 10 and 11). The gene-set enrichment/pathway analysis using MAGMA 31 on a molecular signature database 32 resource did not reveal any significant associations after correction for multiple testing (Supplementary Table 12).
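To make the per-SD odds ratio for RLS-PRS concrete, the following minimal sketch shows how a polygenic score is formed as a weighted sum of effect-allele dosages, how scores are standardized, and how an odds ratio of 1.40 per SD translates into relative odds at a given standardized score. All dosages and weights are made-up illustrative values, the function names are hypothetical, and the multiplicative scaling assumes a logistic model that is linear in the score; in the study itself the weights were derived with LDpred.

```python
import statistics


def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0 to 2) for one individual."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def standardize(scores):
    """Convert raw scores to z-scores using the sample mean and population SD."""
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)
    return [(s - mean) / sd for s in scores]


def relative_odds(or_per_sd, z):
    """Odds relative to an average individual, assuming log-odds linear in z."""
    return or_per_sd ** z


if __name__ == "__main__":
    weights = [0.10, -0.20, 0.05]               # hypothetical per-allele weights
    cohort = [[0, 1, 2], [2, 0, 1], [1, 1, 1]]  # hypothetical dosages
    raw = [polygenic_score(person, weights) for person in cohort]
    for z in standardize(raw):
        print(f"z={z:+.2f}, relative odds={relative_odds(1.40, z):.2f}")
```

Under these assumptions, an individual two SDs above the mean would have 1.40² times the odds of an average individual, which is how a modest per-SD effect accumulates toward the tails of the score distribution.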

Discussion
Several sequence variants have been shown to associate with RLS, although causal variants at the associated loci and their functional relevance remain largely unknown. In a previous meta-analysis of RLS, 20 sequence variants at 19 loci were associated with RLS 6 . Here, we confirm associations with 19 of the 20 variants and report three novel associations, bringing the number of RLS-associated variants to 23 at 22 loci. The three novel variants are rs112716420-G, rs10068599-T, and rs10769894-A.
The known protein-coding genes closest to rs112716420-G on chromosome 7 are MICALL2 and UNCX. Variants in these genes are associated with red blood cell count and volume (i.e., hematocrit values), hemoglobin concentration and glomerular filtration rate [33][34][35] . rs112716420-G, however, does not associate significantly with these phenotypes in our samples. Hence, it does not appear that rs112716420-G impacts iron homeostasis, which is thought to be involved in the pathogenesis of RLS 11 . It is known that peripheral iron deficiency affects brain iron availability, although the specific mechanisms explaining how iron moves between the periphery and the nervous system remain unclear 9 . Moreover, the homeobox-containing transcription factor Uncx4.1 has been found to be expressed in glutamatergic, GABAergic and dopaminergic neurons in the mouse midbrain 36 .
rs10068599-T is in an intron of RANBP17 (Ran-binding protein 17) on chromosome 5, which is a protein-coding gene of the exportin family. The cis-gene expression analysis showed that rs10068599-T lowers the expression of RANBP17 mainly in the basal ganglia and in the cerebral cortex. Previous studies have found that variants in RANBP17 are associated with visceral fat 37 , body mass index (BMI) 38 , high-density lipoprotein (HDL) cholesterol 39 , smoking status 40 and alcohol consumption 41 .
The closest protein-coding gene to rs10769894-A on chromosome 11 is LMO1. This gene encodes the protein rhombotin-1, which is normally expressed in neural lineage cells 42,43 . Variants in LMO1 have been associated with BMI 44 and neuroblastoma and T-cell leukemia 45,46 , which is of interest since the strongest genetic predictor for RLS is a variant in MEIS1 that affects cancers such as leukemia and neuroblastoma [47][48][49] .
By integrating association statistics with gene expression data, we identified potential causal variants and genes affected at four of the 22 loci. As mentioned, the variant rs10068599-T lowers the expression of RANBP17 in brain subcortical regions. rs3784709-T lowers the expression of SKOR1 in pituitary, pancreas and mammary tissues. MEIS1 is considered an upstream activator of SKOR1 50 , while rs12450895-A lowers the expression of HOXB2 in adipose tissue and skin. Finally, we found that rs10653756-CACAG affects the expression of CASC16 in testis. Hence, these variants may exert their causal effects through their impact on gene expression.
Our analysis showed that RLS-PRS, the aggregated genetic predisposition for RLS, correlates negatively with years of education and performance on cognitive tests but positively with neuroticism score. The RLS-PRS also correlates negatively with age at first birth and positively with several anthropometric measures, including whole body fat, percentage fat in trunk, legs and arms and waist-to-hip ratio. These findings extend prior epidemiological studies 3 and both confirm and extend those of Schormair et al. 6 who searched for diseases and other traits associating with RLS-PRS. RLS has consistently been associated with modifiable lifestyles broadly considered to be unhealthy. In a prospective cohort study including 55,540 US adults, for example, RLS prevalence was lower among individuals who had a normal body weight, who were physically active, who were non-smokers, and who had an alcohol intake below the medium amount 13 .
RLS is a complex polygenic sensorimotor disorder strongly influenced by lifestyle. This study increases the number of known independent RLS-associated variants to 23 at 22 loci, and cis-eQTL analysis highlights genes at four of the loci, giving more insights into RLS etiology. Future studies investigating the effect of drugs targeting the implicated physiological pathways and behavioral lifestyle changes on RLS as a therapeutic regime may provide valuable knowledge on the pathophysiology and the most prudent treatment modalities for RLS.

Methods
[...] 54 , 408,565 subjects from the UK Biobank (UK) (1916 cases) 55 , 2363 subjects from the Donor InSight-III cohort (The Netherlands) (565 cases) 56 and 1417 subjects from the Department of Neurology and Program in Sleep at Emory University (Emory cohort) (US) (696 cases) (Fig. 3).
We used clinical diagnosis or questionnaire data to assess RLS status in the participants, either applying questions based on the International RLS Study Group (IRLSSG) diagnostic criteria for RLS 57,58 or the Cambridge-Hopkins RLS questionnaire (CH-RLSq), which is also based on these criteria. Definite and probable RLS cases were combined into one group 59,60 (questionnaires are displayed in "Questionnaires used to assess RLS" on page 4 in Supplementary material). For subjects in the UK Biobank, the clinical diagnostic code ICD10: G25.8 was used to determine affection status, whereas for the Emory cohort, gold standard diagnosis derived from face-to-face clinical evaluations by RLS specialists was used, and controls were defined as those lacking symptoms and signs associated with RLS.
Discovery meta-analysis. In total, we tested 15,838,848 sequence variants (1000 Genomes phase 3 panel markers) for association with RLS (for a more detailed description of the included cohorts, see section "Cohorts included in the discovery meta-analysis" on page 2 in Supplementary material, and for a detailed description of the methods, see section "Genotyping, imputation, and association analysis of cohorts included in the discovery meta-analysis" on page 7). The GWAS results from the six cohorts (Iceland, Denmark, UK INTERVAL, UK Biobank, US Emory, and the Netherlands) were combined using a fixed-effect inverse-variance model 61 , i.e., based on the effect estimates and standard errors, allowing different allele frequencies (of genotypes) in each population. Moreover, to test for heterogeneity in the effects of the markers across the populations, we used a likelihood ratio test (Cochran's Q) and evaluated the resulting test statistics.
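The fixed-effect inverse-variance combination and the heterogeneity statistic described above can be sketched for a single variant as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function name `fixed_effect_meta` and the per-cohort estimates are hypothetical, inputs are log odds ratios with their standard errors, and the I² summary is an addition for illustration. The heterogeneity P-value, which would come from a chi-square distribution with k − 1 degrees of freedom, is omitted to keep the example within the standard library.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance fixed-effect meta-analysis for one variant.

    betas: per-cohort effect estimates (log odds ratios)
    ses:   per-cohort standard errors
    Returns the pooled effect, its SE, Cochran's Q, its degrees of
    freedom, and I^2 (share of variation attributed to heterogeneity).
    """
    if len(betas) != len(ses) or len(betas) < 2:
        raise ValueError("need matching lists with at least two cohorts")
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    q = sum(w * (b - pooled_beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {"beta": pooled_beta, "se": pooled_se, "q": q, "df": df, "i2": i_squared}


if __name__ == "__main__":
    # Hypothetical log-OR estimates from three cohorts.
    result = fixed_effect_meta([0.08, 0.11, 0.09], [0.03, 0.05, 0.02])
    print({key: round(value, 4) for key, value in result.items()})
```

Because each cohort contributes only an effect estimate and a standard error, allele frequencies are free to differ between populations, which matches the rationale given in the text.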
Before conducting the meta-analysis, variants in each dataset were mapped to NCBI Genome Reference Consortium Build 38 (GRCh38) positions and matched to the Icelandic variants based on position and alleles. We included variants that were properly imputed in all datasets and which had a minor allele frequency >0.1% in more than one cohort. For the suggestive associations, we used the conventional genome-wide P-value threshold of P < 5 × 10⁻⁸ to find lead associations and to test those for replication. To claim a novel genome-wide association, the sequence variants used in the meta-analysis (N = 15,838,848) were split into five classes based on their genome annotation, and the weighted significance threshold for each class was used 28 .
Replication of novel variants. Novel variants identified in the discovery phase of our study were tested for association in two replication datasets consisting of subjects of European ancestry, the EU-RLS-GENE consortium 6 (6228 cases and 10,992 controls) and the RBC-Omics cohort (423 cases and 7334 controls) 62 . In both replication tests, analyses were adjusted for age, sex, and the first 10 principal components of ancestry in a logistic regression model (for a more detailed description of the included cohorts, see section "Cohorts used for follow-up/ replication analysis" on page 6 in Supplementary material) (Fig. 3).
Gene expression. We assessed cis-eQTL effects of the variants associated with RLS. RNA sequencing data from 54 human tissues were obtained from the Genotype-Tissue Expression (GTEx) portal 63 . We tested all genes in a 1 Mb window centered on each of the 23 variants. In total, 15,153 tests were performed, and a Bonferroni-corrected threshold was applied to the P-values. Therefore, P < 0.05/15,153 = 3.3 × 10⁻⁶ was considered statistically significant.
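The Bonferroni arithmetic behind the cis-eQTL significance threshold can be reproduced in a few lines. The helper names below are hypothetical and the example P-values are made up; only the family-wise alpha of 0.05 and the count of 15,153 tests come from the text.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold controlling family-wise error at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def is_significant(p_values, alpha, n_tests):
    """Flag each P-value that falls strictly below the Bonferroni threshold."""
    threshold = bonferroni_threshold(alpha, n_tests)
    return [p < threshold for p in p_values]


if __name__ == "__main__":
    threshold = bonferroni_threshold(0.05, 15153)
    print(f"threshold = {threshold:.2e}")
    # Hypothetical eQTL P-values for illustration.
    print(is_significant([1e-9, 2e-6, 5e-6, 0.01], 0.05, 15153))
```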
Genetic risk. To assess the impact conferred by the confluence of common RLS variants, we calculated an RLS-PRS for each of the 500,000 UK Biobank subjects. The RLS-PRSs were calculated using summary statistics from a subset of the RLS-GWAS meta-analysis (UK participants from the INTERVAL and the UK Biobank excluded). Briefly, to generate the RLS-PRS for the UK Biobank sample we used 630,000 informative SNPs across the genome and constructed locus allele-specific weightings by applying LDpred to the summary data from the subset meta-analysis GWAS 64 . By constructing individual weightings, we were able to calculate an aggregated score of genetic susceptibility for RLS in all included individuals.
Displaying eQTL variants. We found cis-eQTL data for 11 of the 23 RLS variants impacting 17 genes. Figure 2 displays the four variants that are significantly associated with cis-gene expression in at least one tissue tested, are in linkage disequilibrium (LD) (r² > 0.30) with the top eQTL variant of the respective gene, and share the same causal signal with it (as confirmed through approximate conditional analysis) (results for the remaining variants are displayed in Supplementary Fig. 6). Cis-eQTL effect estimates (normalized) are provided and those sharing same causal signal (COJO conditional analysis, results from this are displayed in Supplementary

Data availability
The data used in the present study are derived from whole blood samples that have been genotyped. For this study, summary statistics from different RLS GWASs were collected and combined in a meta-analysis. The RLS meta-analysis summary statistics will be made available at https://www.decode.com/summarydata/. Data are available upon request. For access to data included in the meta-analysis, please contact the authors in charge of the respective cohorts.

Code availability
Statistical code is available upon request from the corresponding author. No custom code was used.