Genome-wide gene-based analyses of weight loss interventions identify a potential role for NKX6.3 in metabolism

Hundreds of genetic variants have been associated with Body Mass Index (BMI) through genome-wide association studies (GWAS) using observational cohorts. However, the genetic contribution to efficient weight loss in response to dietary intervention remains unknown. We perform a GWAS in two large low-caloric diet intervention cohorts of obese participants. Two loci close to NKX6.3/MIR486 and RBSG4 are identified in the Canadian discovery cohort (n = 1166) and replicated in the DiOGenes cohort (n = 789). Modulation of HGTX (NKX6.3 ortholog) levels in Drosophila melanogaster leads to significantly altered triglyceride levels. Additional tissue-specific experiments demonstrate an action through the oenocytes, fly hepatocyte-like cells that regulate lipid metabolism. Our results identify genetic variants associated with the efficacy of weight loss in obese subjects and identify a role for NKX6.3 in lipid metabolism, and thereby possibly weight control.

A full description of the statistics including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals) For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted Give P values as exact values whenever suitable.

For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated

Clearly defined error bars
State explicitly what error bars represent (e.g. SD, SE, CI) Our web collection on statistics for biologists may be useful.

Software and code
Policy information about availability of computer code Data collection Genotype data were generated using HumanCoreExome-12 v1.1 with 264,909 tag SNP marker and 244,593 exome-focused markers. They were processed with the Illumina TM platform following Infinium® HD Assay Ultra, Manual according to manufacturer's instructions. Genotypes were called with the GenomeStudio Software (Illumina). Quality control procedure followed recommendations from the GenABEL package67 and excluded SNPs with call rate < 95%, violating Hardy-Weinberg equilibrium (FDR<20%), low minor allele frequency < 1%. Subjects were excluded if they had low call rate (<95%), abnormally high autosomal heterozygosity (FDR<1%), an XXY karyotype, or gender inconsistencies between genotype data and clinical records. For subjects with high identity-by-state (IBS>95%), only the one having the highest call rate was kept. Principal component analyses (PCA) were performed independently on each cohort to discard subjects that were outliers in term of genetic structure. Subjects from both cohorts were all of European ancestry and the two cohorts had similar genetic structure ( Figure S1). Upon all genetic QCs, 1166 Ottawa and 789 DiOGenes subjects were kept for subsequent analyses. Genotype imputation was then performed using SHAPEIT68 and IMPUTE269 based on the European reference panel from the 1000 Genome project70 (March 2012 release, phase 1 version 3). Imputation post-filtering removed SNPs with reference allele frequency less than 1% and INFO score < 0.8. Upon such filtering, data for 4.9M imputed SNPs were available for both datasets.
Functional analyses in Fly Fly strains. Fly stocks were maintained on standard diet with agar, sugar and yeast and were raised in 25°C incubator at a 12/12 dark and night cycle. Actin-Gal4 and TubGal80ts was from Bloomington. UAS-AnkIR (GD25945), UAS-HGTXIR1(GD12608), UAS-HGTXIR2 (KK109732) UAS-CG3209 IR2 (KK10281) were from the VDRC. Triglyceride assay: 5 male flies were weighted and homogenised in 200 μl PBST (PBS + 0.05% Tween 20) on ice, then sonicated for 10s nature research | reporting summary April 2018 using a probe sonicator on ice. After sonication, 800μl ice-cold PBST was added and mixed thoroughly. 50 μl of the mixture was used to determine the triglycerides using the Roche triglycerides kit (11730711216) under the manufacturer's instructions, and 10 μl of the mixture was used to determine to protein using Bradford protein assay kit (Sigma). Triglycerides were normalized to protein level. Glycogen assay: Glycogen was determined by measuring the glucose degraded from glycogen using amyoglucosidases. 5 male flies were homogenized and dissolved in 1M KOH solutions. After twice 95% ethanol extraction, the pellet was resuspended 1ml amyoglucosidase reaction buffer (0.3 mg/ml in 0.25 M acetate buffer, pH 4.75) and incubated in the 37°C shaking incubator overnight and a glucose assay was performed. Trehalose assay: Trehalose was determined by measuring the glucose degraded from trehalose using trehalosase. 20 male flies' heads were homogenized in 200 μl ice-cold TB buffer (5 mM Tris pH 6.6, 136 mM NaCl, 2.7 mM KCL), then centrifuged at 4 ºC for 5 min at maximum speed and collect the supernatant. 20 ul of supernatant was used to measure glucose and protein, respectively. 50 μl of supernatant was mixed with 50 μl TE solution (10 μl trehalose in 1.5 ml TB buffer). Incubated at 37°C for overnight. Centrifuged at 4 °C for max speed for 3 min and then 20 ul of supernatant was used to measure glucose assay. Glucose assay: 50 μl of mixture was used to determine the [glucose] using Thermo Infinite Glucose kit (TR15421) as the manufacturer's instructions. Body weight Body weight were measured by analytic balance Food intake assay Food intake was measured by CAFE assay. Food intake was measured over 24 hours. Empty vials were used for evaporation controls. All food intake experiments were set up at Zeitgeber time 6-8, and food intake was recorded exactly, 24 hours after start of food loading. Starvation assay Male flies were put into DAM2 tubes with 2% agar and the death was monitored per hour. qPCR 1μg of mRNA was reversely transcribed into cDNA using SuperScript® III First-Strand Synthesis System (Invitrogen). All primers used for qPCR that have been prescreened for efficiency and specificity. RT-PCR was performed using SensimixTM probe kit (Bioline). The program is following: 95°C 10 min, 40 cycles of 95°C, 15s; 55°C, 15s; 72°C, 15s. The reactions were run on Light Cycle® 480 (Roche). The gene expression was normalized to the reference gene rp49.

Data analysis
Single-SNP analyses were performed using BOLT-LMM24, a Bayesian linear mixed effect model to adjust for population structure and cryptic relatedness between individuals. The rate of weight loss was adjusted for sex, age and starting BMI. Results from the two cohorts, were subsequently meta-analyzed using Genome-Wide Association Meta Analysis (GWAMA) software71 with random-effect modeling and a double genomic-control (GC) correction72 (GC correction at study-level and also at meta-analysis level). Gene-based analysis was performed using the VEGAS (versatile gene-based association study) approach20. This method integrates single-SNP p-values at the gene-level and accounts for LD patterns through Monte-Carlo simulations. Analysis was made with the following settings: European ancestry population from the 1000 Genome project as reference population for LD patterns, set-based test using the top 80% SNPs and a gene block size set to 20kb. Such setting assigns to each gene, its neighboring SNPs (within 20kb) and discard the 20% SNPs having the least significant p-value. For regions with closely located genes (e.g. gene clusters), potentially the same SNPs could be assign to different genes, leading to different "gene-locus" having similar p-values. This is not an issue as the goal of such analyses is only to highlight loci associated with the trait of interest. Adjustment for multiple-testing was performed using the Benjamini-Hochberg correction73. Genome-wide significance threshold was set to FDR < 5%. Prioritization of GWA signals was performed using a Bayesian framework to model the joint likelihood of association p-values with largescale epigenomic annotations. Such risk variance inference was performed using the RiVIERA-beta framework21 and with 450 epigenomic annotations (including histone marks, DNase I hypersensitivity, transcription factor binding, and localization within exons). The goal of this framework is to infer for each input SNP the posterior probability of disease given its association p-value and overlap in functional annotations. Epigenomic annotations were retrieved from Pickrell et al.74. Analysis of fly phenotypes was performed using one-way ANOVA with Bonferroni adjustment for multiple comparisons.
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers upon request. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.

Data
Policy information about availability of data All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: -Accession codes, unique identifiers, or web links for publicly available datasets -A list of figures that have associated raw data -A description of any restrictions on data availability The data that support the findings of this study are available from the corresponding author (Armand.Valsesia@rd.nestle.com) upon reasonable request.

nature research | reporting summary
April 2018 Field-specific reporting Please select the best fit for your research. If you are not sure, read the appropriate sections before making your selection.

Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences
For a reference copy of the document with all sections, see nature.com/authors/policies/ReportingSummary-flat.pdf

Life sciences study design
All studies must disclose on these points even when the disclosure is negative.

Sample size
These analyses are exploratory and no sample size computation could be performed.
The Canadian samples consisted of patients enrolled in the Weight Management Clinic16 who had completed 6 to 12 weeks mealreplacement regimen consisting of a product uniquely available in Canada, Optifast900 (Nestlé Health Science, Switzerland). Program adherence criteria included: attendance to 75% of clinical sessions and strict adherence to the meal replacement (evaluated by the number of meal-replacement products that were consumed). Patients under medication known to affect rate of weight loss or glucose homeostasis or abnormal thyroid indices were excluded from the analyses. In total, 1436 Optifast900 subjects were eligible for genotyping and had available weight measurements. The DiOGenes study is an interventional, multi-center pan-European study17,18. Eight partners participated to the study: Bulgaria, Czech Republic, Denmark, Germany, Greece, the Netherlands, Spain and United Kingdom. Participants followed an 8-week LCD. The LCD provided 800 kcal per day with the use of a meal-replacement product (Modifast, Nutrition et Santé France). Participants could also eat up to 400g of vegetables (corresponding to a maximal addition of 200kcal/day). In total, 888 DiOGenes subjects were included for genotyping. The macronutrient composition of the two meal-replacement products was considered similar (fat=17% and14%; proteins=34% and 42% and carbohydrates=49% and 44%, respectively for the Optifast and Modifast products).
Data exclusions Data inclusions/exclusions were based on availability of weight measures (at baseline and upon 5-weeks of low-caloric diet intervention) and whether genotyping led to high quality data. Genotyping QC was performed as follows: Quality control procedure followed recommendations from the GenABEL package67 and excluded SNPs with call rate < 95%, violating Hardy-Weinberg equilibrium (FDR<20%), low minor allele frequency < 1%. Subjects were excluded if they had low call rate (<95%), abnormally high autosomal heterozygosity (FDR<1%), an XXY karyotype, or gender inconsistencies between genotype data and clinical records. For subjects with high identity-by-state (IBS>95%), only the one having the highest call rate was kept. Principal component analyses (PCA) were performed independently on each cohort to discard subjects that were outliers in term of genetic structure. Subjects from both cohorts were all of European ancestry and the two cohorts had similar genetic structure ( Figure S1). Upon all genetic QCs, 1166 Ottawa and 789 DiOGenes subjects were kept for subsequent analyses.