Obesity and related complications are major health burdens. Almost 700 million adults are currently obese globally and the prevalence is predicted to rise towards 2030. The sudden change of lifestyle, with physical inactivity and excessive calorie intake, undoubtedly plays a major part in the development of the epidemic; however, some individuals seem to be more prone to be affected by an unhealthy lifestyle than others. Hence, genetic predisposition also has an essential role in determining disease susceptibility and response to lifestyle factors. Since the introduction of genome-wide association studies (GWAS), the success in identifying obesity susceptibility variants has increased, and during four overall obesity GWAS waves a total of 32 variants have been identified that associate genome-wide significantly with body mass index (BMI) and 18 with measures of fat distribution. However, the immediate success of the GWAS approach has eased off, and the proportion of explained variance for BMI by the identified obesity variants remains low. This review suggests and discusses new initiatives to take GWAS of obesity to the next level, including gene–environment interactions as modulating/masking factors, low-frequency or rare variants and ways to address such analyses, and finally reflections on the applicability of epigenetic modifications when elucidating the genetic background of obesity.
Obesity and the complications associated with excessive body fat accumulation have become a major global health burden. Projections predict that the number of obese adults will rise from 500 million in 2008 to over 700 million in 2015, and that this trend will continue towards 2030.1, 2 The rapid increase in the incidence and prevalence of obesity seems to be explained predominantly by the radical change in lifestyle during the last century, in which high intake of energy-dense food and physical inactivity have become more common. Yet, some individuals seem to be more susceptible to this obesogenic environment, underlining an important genetic component that has also been established in several twin, family and adoption studies, with heritability estimates ranging from 40 to 70%.3, 4, 5
Obesity is a result of positive energy balance, and biological pathways such as appetite regulation, metabolism and adipogenesis are important factors in the aetiology; however, the complete molecular background of obesity is far from understood. It is anticipated that a deeper understanding of the genetic predisposition to the disease will contribute to the identification of new biological pathways, and hence new drug targets, as well as better prediction and prevention strategies. However, common obesity is a complex, heterogeneous and multi-factorial disease and consequently the unravelling of its genetic architecture has turned out to be a challenging task.
Before 2007, when genome-wide association studies (GWAS) were introduced, obesity gene identification relied on the biological candidate gene method or linkage studies. These methods resulted in the suggestion of numerous genes; however, none of these could be firmly validated.6 Retrospectively, the lack of success was linked to substantial shortcomings of both methods. Both suffered from inadequate statistical power to detect the proposed associations, and a major additional limitation of the biological candidate method was inadequate biological and genomic knowledge. Linkage studies identified extremely broad genomic regions, and the subsequent fine-mapping to pinpoint the causative gene and/or variant was virtually impossible at the time; the only gene that has withstood validation is PCSK1, identified using a combination of the two methods.7
The overall lack of success in identifying disease-associated genes, combined with the aspiration to increase general biological knowledge and pathological understanding of complex diseases, has facilitated new and innovative approaches including GWAS, where the entire genome is scanned for common disease-associated variants in a hypothesis-free manner. This review depicts the progress made within the genetic field of obesity following the introduction of GWAS, with an overview of the identified variants and the method refinements made continually through the GWAS waves. Finally, possible ways ahead and new strategies within the GWAS framework are discussed.
Genome-wide association studies
The advent of GWAS was facilitated by technological progress and increased knowledge about the human genome, with the International HapMap Consortium (www.hapmap.org) as a major driving force. The complete outline of common single-nucleotide polymorphisms (SNPs) and the existing linkage disequilibrium enabled near-genomic coverage (∼80%) of common variation using a moderate number of SNPs (∼500 000–1 600 000). Simultaneously, the shift in genotyping methods to chip-based technology made massive SNP typing with high accuracy possible at relatively low cost. The number of SNPs analysed and their hypothesis-free scattering across the genome have revolutionised the association study approach, but have also created challenges with regard to significance thresholds, replication demands and interpretation of functionality. However, as the GWAS waves progressed, most challenges have been addressed and adaptive refinements have been made continually. Stringent genome-wide significance thresholds (P<10⁻⁸) have been established to limit false-positive findings, and a design that involves a discovery stage and at least one replication stage has been introduced to ensure higher validity of the findings. Moreover, imputation strategies have been applied8, 9 to allow combination of data across GWAS populations, effectively enlarging the study samples through meta-analyses and consequently increasing the power to detect associations. Nevertheless, as these refinements to ensure reproducibility have been an adaptive process, some non-replicable findings did emerge when the GWAS approach was first implemented in the search for obesity susceptibility variants.
GWAS suggested obesity susceptibility loci
The first GWAS of obesity phenotypes was published in 2006. Compared with the later GWAS it was small not only in respect to the number of SNPs analysed but also in respect to the sample size, as a total of 86 604 SNPs were analysed in 694 participants from family studies, and therefore it is often regarded as a pre-GWAS. One SNP, rs7566605 near INSIG2, was suggested to associate with obesity, which was validated in the independent replication stage10 (Table 1).
The true GWAS era was introduced a year later, and so far it consists of four waves. Most GWAS of obesity have used body mass index (BMI) as a continuous trait, whereas others have examined extreme obesity in children or adults, under the assumption that morbidly obese individuals might be enriched in obesity susceptibility variants. The first obesity GWAS wave resulted in the suggestion of four susceptibility loci. FTO was originally highlighted in a GWAS of type 2 diabetes;11 however, adjustment for BMI revealed that the association was mediated through obesity.12 Variants in or near FTO have since become the most replicated obesity susceptibility locus, emerging in all subsequent GWAS performed on obesity13, 14, 15, 16, 17, 18, 19, 20, 21 except one.22 In the discovery study, the lead SNP (rs9939609) showed a BMI increase of 0.36 kg m−2 and an odds ratio of 1.31 (1.23–1.39) per risk allele carried (Table 1). In the wake of the FTO discovery, a few GWAS suggested variants in or near PFKP,13 CTNNBL1 and FDFT1,22 but replication has been problematic23, 24, 25 even in the replication stage of the discovery studies.13, 22 In the second GWAS wave, the GIANT (Genetic Investigation of ANthropometric Traits) consortium performed meta-analyses of ∼17 000 Caucasian individuals and identified variants in or near MC4R associating with measures of obesity26 (Table 1), and the same variants were also shown to associate with fat distribution represented by waist circumference27 (Table 2).
The third obesity GWAS wave included three studies. A meta-analysis of GWAS and an independent GWAS identified variants in or near TMEM18, SH2B1, KCTD15, NEGR1,15, 16 GNPDA2, MTCH2,15 BDNF, SEC16B, FAIM2 and ETV5(ref. 16) genome-wide significantly associated with BMI. Both studies included ∼32 000 individuals and showed effect sizes ranging from 0.06 to 0.54 kg m−2 when comparing homozygous risk allele carriers with non-carriers (Table 1). The third study was performed in study samples of early-onset extreme obesity and reported four putative loci, NPC1, MAF, PTER and PRL; however, only MAF showed stringent genome-wide significance.17 In addition to GWAS performed using measures of general obesity, a parallel GWAS strategy focused on measures of fat distribution using waist circumference and waist-to-hip ratio (WHR) adjusted for BMI. Four novel loci were identified associating with fat distribution measures; LYPLAL1 with waist circumference in women (Table 2), TFAP2B, MSRA28 and NRXN3(ref.29) with WHR (Table 3).
The fourth obesity GWAS wave was dominated by two meta-analyses performed by the GIANT consortium, one comprising ∼124 000 individuals in the discovery stage and ∼250 000 in total using BMI,18 and another comprising ∼77 000 individuals using WHR adjusted for BMI as obesity measure.30 These identified 18 and 13 novel loci, listed in Tables 1 and 3, respectively. Generally, the WHR variants show stronger association in women than in men, in accordance with the gender-specific difference in fat distribution. Three loci have been suggested in GWAS of extreme obesity: SDCCAG8 and TNKS, observed in study samples of children and adolescents,19 and KCNMA1, found in an adult population.20 Finally, two loci, OLFM4 and HOXB5, have been identified in studies of common childhood obesity21 (Table 1). Thus, a total of 43 loci have at present been suggested to predispose to overall adiposity and 18 loci to visceral fat accumulation. Of these, 32 BMI and 14 waist/WHR variants are genome-wide significant (Figure 1), as well as one variant (MAF) associating with morbid obesity. The vast majority were identified in the fourth wave through extremely large meta-analyses, with decreasing effect sizes as a consequence (Figure 1).
Replication of GWAS findings in independent studies
Replication in independent study samples was especially important in the first obesity GWAS wave, before the genome-wide significance threshold was introduced and replication demands were systematically met. Nevertheless, even after refinement of the GWAS approach, such studies still have their justification, as they estimate independent effect sizes not inflated by ‘winner’s curse’ and often extend to analyses of additional related phenotypes, thereby contributing to the elucidation of the overall metabolic impact of identified variants/loci. FTO remains the best replicated obesity gene, as well as the strongest, and a tremendous number of studies have validated the association.31 Likewise, the relatively strong association with obesity observed for variants near MC4R has been well replicated in independent studies.32, 33 For the loci identified in the third GWAS wave, replication attempts have primarily been performed in Caucasian populations, with divergent results. Among the most successfully validated are TMEM18,19, 34, 35, 36 NEGR1,34, 36, 37 SH2B1,34, 36, 37 MTCH2,34, 36, 37 GNPDA2,35, 37, 38 FAIM235, 38 and BDNF,35, 38 a pattern also recognised in Asian populations.39, 40, 41 Some attempts have been made to validate the fat distribution loci identified in the third wave, however, with limited success.42, 43 These missing associations in independent studies probably reflect a lack of power due to the relatively low effect sizes.
Gained biological knowledge from the obesity GWAS waves
The knowledge gained through obesity GWAS findings is accumulating only gradually, as the speed of translation into new biological insight was, in retrospect, overestimated. A major impeding factor for the overall biological elucidation has been that the vast majority of the identified obesity susceptibility variants are located in non-coding areas of the genome, including intronic or intergenic regions (Tables 1, 2, 3), and the functionality of the SNPs is therefore difficult to establish within the frames of current genomic knowledge. The identified variants could either be linkage disequilibrium markers of the causal variant or, at least theoretically, be the true causal variants themselves, lying in unknown regulatory motifs or small coding areas of non-described regulatory molecules. Hence, a more thorough understanding of the human genome is required to label variants as functional or as non-functional linkage disequilibrium markers with any certainty. In addition, the genomic location of the variants makes a precise link between SNP and affected gene difficult to establish, and consequently, no specific novel biological pathway or mechanism has yet been pinpointed. Nevertheless, it has been suggested that non-coding variants influence transcript regulation rather than gene function44 and some interesting observations are emerging when expression patterns are studied. A majority of the suggested genes harbouring variants associated with overall obesity, represented by BMI, are highly expressed in the central nervous system, whereas many of the suggested fat distribution genes are highly expressed in peripheral tissues.45
Some of the suggested genes have known functions related to obesity: MC4R, which is important in appetite regulation,46 BDNF, which has been linked with the reward system and eating disorders,47 SH2B1, which is implicated in leptin and insulin signalling,48 and NRXN3, which is also implicated in reward behaviour.29 TMEM18 is possibly responsible for neural development and NEGR1 controls neuronal outgrowth;15 however, a direct link with obesity has not been established. Several of the identified genes are specifically expressed in hypothalamic regions, which could indicate important roles in controlling appetite. These include FTO,49, 50, 51 MTCH2, FAIM2, GNPDA2, KCTD15, ETV5 and NPC1;15 however, their exact biological function and link with obesity remain to be elucidated. Although overall adiposity, for a major part, seems to be mediated through the central nervous system, specific fat deposits or fat accumulation seem to be controlled peripherally, for example, by the adipose tissue itself. This is illustrated by TFAP2B and LYPLAL1, which both show high expression in adipose tissue28 and are responsible for lipid accumulation and lipase activity, respectively. The implication of different tissues in overall adiposity and visceral fat accumulation is thus one major biological gain from the obesity GWAS waves.
However, even though GWAS have succeeded in identifying obesity susceptibility variants, especially compared with the previous methods, the proportion of explained variance is still rather low. The GIANT consortium estimated that the confirmed obesity variants explained 1.45% of the inter-individual variation in BMI,18 and obviously a large task still exists in identifying the remaining heritability. Theoretically, a fifth GWAS wave could include even larger meta-analyses, but this would inevitably result in the identification of variants with smaller effect sizes, and it must be considered doubtful whether such knowledge can be translated into increased explained genetic variance. Hence, new strategies must be adopted to take gene identification to the next level, incorporating innovative thinking and new statistical approaches.
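To give a sense of scale for the explained variance, the following minimal sketch uses the standard additive-model approximation, in which a biallelic variant with allele frequency p and per-allele effect β contributes 2p(1−p)β² to the trait variance. The per-allele effect of 0.36 kg m−2 is the FTO discovery estimate quoted earlier in this review; the risk allele frequency (0.40) and the BMI standard deviation (4 kg m−2) are illustrative assumptions and are not taken from the cited studies.

```python
def variance_explained(allele_freq, beta_per_allele, trait_sd):
    """Fraction of trait variance explained by one biallelic SNP.

    Assumes a purely additive effect and Hardy-Weinberg genotype
    proportions, so the genotype variance is 2 * p * (1 - p).
    """
    if not 0.0 < allele_freq < 1.0:
        raise ValueError("allele_freq must lie strictly between 0 and 1")
    if trait_sd <= 0.0:
        raise ValueError("trait_sd must be positive")
    genotype_variance = 2.0 * allele_freq * (1.0 - allele_freq)
    return genotype_variance * beta_per_allele ** 2 / trait_sd ** 2


# Illustrative inputs: the effect size comes from the text, while the
# allele frequency and BMI standard deviation are assumptions.
fto_share = variance_explained(allele_freq=0.40,
                               beta_per_allele=0.36,
                               trait_sd=4.0)
print(f"{100 * fto_share:.2f}% of BMI variance")
```

Under these assumed inputs even the strongest common locus accounts for well below 1% of the BMI variance, which is consistent in magnitude with the low combined estimate reported by the GIANT consortium.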
Beyond genetic main effects—gene–environment (G × E) interactions
One way to unravel some of the missing or hidden heritability of obesity could be to take lifestyle factors into account. The environment has changed rapidly during a relatively short period of time, resulting in a prevailing sedentary lifestyle and unhealthy dietary habits. During this time, the gene pool has been stable, and as the obesogenic environment affects individuals to different degrees, an important interplay between genes and environmental factors as a cause of the obesity epidemic is indicated. This view is supported by studies observing an increase in the genetic contribution to BMI variance during the time the environmental changes occurred.52 Thus, a further elucidation of the genetic architecture of complex diseases could involve a comprehensive understanding of more aspects of their multi-factorial background, and an evaluation of plausible G × E interactions. However, several challenges arise when implementing such interactions in genetic epidemiological studies of obesity. First, the identification and prioritisation of environmental exposures is not always straightforward. For complex diseases such as obesity, the heterogeneous multi-factorial aetiology makes it a rather demanding task, as numerous potential factors could be intertwined and interplay with disease risk. Commonly accepted environmental risk factors of obesity are physical inactivity and unhealthy diet, and consequently, these are the most studied factors. Second, environmental factors can be difficult to quantify, and behavioural aspects are especially complicated to estimate. A large gap exists between the gold standard and timely and economically feasible approximations, and large-scale epidemiological studies often rely on subjective self-reports for quantification of both physical activity and dietary patterns.
Problems with obesity-specific over- or underreporting have been recognised when assessing both physical activity and food intake,53, 54 but this must still be balanced against the feasibility of the measuring method. Third, methodologically we are far from the ideal scenario where the statistical models used to estimate or elucidate obesity risk can include all modulating factors. This would require models of extreme complexity, and the number of parameters to be estimated could potentially be infinite. Therefore, current statistical models are unable to fully mimic biological and environmental systems, and for the sake of simplicity and practicability models are restricted to combinations of a few genetic and environmental factors. Fourth, adequate statistical power is extremely hard to achieve in G × E interaction analyses. A substantial genetic main effect is needed to obtain the statistical power to detect possible modulating effects of the environment, and even the introduction of GWAS has only resulted in the identification of a few variants with sufficient impact to enter such analyses. In addition, even with adequate genetic main effects, well-powered G × E interaction studies would still require extremely large study populations,55, 56, 57 only achievable through collaborations and meta-analyses.
In the post-GWAS era, the most studied locus with respect to environmental influences has been FTO, and especially the impact of physical activity has been evaluated. After the discovery of FTO, it was reported that the increased obesity risk associated with the rs9939609 T-allele was attenuated by physical activity.58 Comprehensive replication attempts have been made in study populations of different ethnicities and with different assessments of physical activity, and validation was achieved in some59, 60, 61, 62, 63, 64, 65, 66, 67, 68 but far from all69, 70, 71, 72, 73, 74 studies, and these inconsistencies left it unresolved whether physical activity reduces the effect of FTO on obesity. To clarify this incongruence, a large meta-analysis comprising 218 166 individuals from 48 different studies has been performed. Overall, a nominally significant interaction was observed, with the per-allele effect on BMI reduced by 0.14 kg m−2 (P for interaction=0.005) when comparing physically active and inactive individuals.75 This conclusion could be proof-of-concept in more than one sense. It indicates that well-argued and biologically plausible G × E interactions do in fact exist, and that several studies can be combined successfully using approximations to standardised quantifications of environmental risk factors.
Another approach recently adopted in G × E interaction analyses is the conversion of several obesity variants into a genetic predisposition score to circumvent lack of power to detect the interactions individually. The applicability was illustrated by a study comprising 20 430 individuals, where 12 SNPs from the first two obesity GWAS waves were combined in a genetic predisposition score summarising the number of BMI-increasing alleles. Each BMI-increasing allele was associated with a 0.154-kg m−2 increase in BMI, more pronounced in physically inactive individuals (0.205 kg m−2 per allele) than in physically active individuals (0.131 kg m−2 per allele; P for interaction=0.005).76 Collectively, these results indicate that a vast amount of genetic information is hidden or modulated by different lifestyle patterns, and that G × E interaction analyses likely will help improve our understanding of the pathophysiology of obesity and related phenotypes in the future.
Ideally, G × E interaction analyses should be included already in the discovery phase of future GWAS. This could lead to the identification of associations masked by environmental exposures, and hence variants with limited overall genetic main effect but pronounced effect in subgroups of the population. However, the implementation of G × E interactions in GWAS discovery phases poses a huge challenge to international collaborations, consortia and meta-analyses, as it is recommended that the study samples are quadrupled when interaction terms are included in the statistical models.77
Methodologically, there is a long way to go before the complex biology of combined genetic predisposition and modulating environmental exposures can be modelled completely. Several methods have begun emerging with different focus areas. Some aim at implementing the G × E interaction analysis in the GWAS discovery phase using likelihood-ratio tests,78, 79, 80 thereby increasing the power to detect associations masked by environmental exposures.81 Others refine associations of genetic variants with known main effect on disease risk, for example, using Bayesian approaches and random forest, to cope with the uncertainties in the general assumption about independence between genetics and the environmental exposure.82, 83, 84 These methods are also employed when searching for the combination of genotypes and environmental factors, or interaction chains, with the highest impact on disease risk.82, 84, 85 Finally, pathway-driven approaches collecting multiple genetic variants according to their biological function and pathway involvement are gaining ground in G × E interaction analyses.86 These innovative methods are not widely used yet, but they could hold promise for better selection of well-argued combinations of genetic variants and environmental factors in future multi-factorial analyses.82, 85, 87
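The kind of contrast reported in such interaction analyses can be illustrated with a minimal sketch that estimates the per-allele effect on BMI separately in physically inactive and active individuals and takes the difference. The function names and the toy records below are hypothetical and invented for illustration only; a real analysis would additionally adjust for covariates and attach a standard error and significance test to the difference.

```python
def per_allele_slope(allele_counts, bmi_values):
    """Least-squares slope of BMI on risk-allele count (0, 1 or 2)."""
    n = len(allele_counts)
    mean_g = sum(allele_counts) / n
    mean_y = sum(bmi_values) / n
    sxy = sum((g - mean_g) * (y - mean_y)
              for g, y in zip(allele_counts, bmi_values))
    sxx = sum((g - mean_g) ** 2 for g in allele_counts)
    if sxx == 0:
        raise ValueError("all individuals carry the same allele count")
    return sxy / sxx


def interaction_contrast(records):
    """Return (slope_inactive, slope_active, difference).

    records: iterable of (allele_count, is_active, bmi) tuples.
    The difference is the inactive slope minus the active slope.
    """
    strata = {True: ([], []), False: ([], [])}
    for allele_count, is_active, bmi in records:
        counts, values = strata[bool(is_active)]
        counts.append(allele_count)
        values.append(bmi)
    slope_inactive = per_allele_slope(*strata[False])
    slope_active = per_allele_slope(*strata[True])
    return slope_inactive, slope_active, slope_inactive - slope_active


# Hypothetical toy data: (risk alleles carried, physically active?, BMI)
toy_records = [
    (0, False, 26.0), (1, False, 26.5), (2, False, 27.0),
    (0, True, 25.0), (1, True, 25.2), (2, True, 25.4),
]
inactive, active, difference = interaction_contrast(toy_records)
print(f"inactive {inactive:.2f}, active {active:.2f}, "
      f"difference {difference:.2f} kg/m2 per allele")
```

A positive difference corresponds to an attenuation of the genetic effect in physically active individuals; whether such a difference is statistically supported depends on the sample size, which is why very large meta-analyses are needed.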
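An unweighted genetic predisposition score of the type described here is simply a count of BMI-increasing alleles across the selected SNPs. The sketch below shows that counting step under simple assumptions: genotypes are given as two-letter strings, and the SNP identifiers and risk alleles are hypothetical placeholders rather than the 12 SNPs used in the cited study.

```python
def predisposition_score(genotypes, risk_alleles):
    """Count BMI-increasing alleles carried across a set of SNPs.

    genotypes:    dict mapping SNP id -> two-letter genotype, e.g. "AT"
    risk_alleles: dict mapping SNP id -> the BMI-increasing allele
    A SNP missing from `genotypes` raises KeyError rather than being
    silently counted as zero.
    """
    score = 0
    for snp_id, risk_allele in risk_alleles.items():
        genotype = genotypes[snp_id]
        if len(genotype) != 2:
            raise ValueError(f"unexpected genotype for {snp_id}: {genotype!r}")
        score += genotype.count(risk_allele)
    return score


# Hypothetical SNP ids and alleles, for illustration only.
risk_alleles = {"snp_a": "A", "snp_b": "C", "snp_c": "G"}
person = {"snp_a": "AT", "snp_b": "CC", "snp_c": "TT"}
score = predisposition_score(person, risk_alleles)
print(score)
```

With 12 SNPs the score ranges from 0 to 24, and the per-allele estimates quoted above are the regression slopes of BMI on this score, overall and within activity strata.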
The current GWAS design has focussed on common SNPs as the predominant type of variation. Nevertheless, a substantial part of the missing or hidden heritability could be found in other types of variants, either structural or of lower frequency.
Copy number variations (CNVs)
The implication of structural variations in common diseases, as represented by SNPs in linkage disequilibrium with CNVs on GWAS arrays, has been low,88, 89 which could be a result of underrepresentation of such CNV tagSNPs on genotyping chips. Nevertheless, for obesity a few examples have in fact been suggested. The obesity-associated signal rs2815752 tags a 45-kb deletion upstream of NEGR1. Hence, the deletion is a causal candidate for the association signal, but further work is needed in terms of fine-mapping and functional studies before this can be firmly determined.15 Further evidence that CNVs contribute to the genetic architecture of obesity comes from the finding that large deletions on chromosome 16q11 are associated with severe obesity;90, 91 the deletion spans a large number of genes including SH2B1, which has also been identified in GWAS of obesity.15, 16 A genome-wide analysis has suggested that CNVs at chromosome 11q11 are involved in early-onset extreme obesity; however, this did not reach genome-wide significance.92 Moreover, a spectacular pattern of CNVs has been observed at chromosome 16p11.2. Whereas deletions in this chromosomal region cause morbid obesity,93 duplications result in underweight among both children and adults,94 an impressive example of how gene dosage can be linked with extreme mirror body composition phenotypes.
Hence, implications of the involvement of structural variation in obesity are present, but due to technical challenges in identifying, quantifying and hence genotyping the CNVs, the complete impact of these types of variation is difficult to estimate with any accuracy before new and better approaches have been developed.
Low-frequency and rare variants
The risk allele frequencies of the obesity variants identified through the GWAS waves are all quite high (Tables 1, 2, 3). Much speculation about missing heritability and improvement of explained variance has focussed on detecting variants with lower frequencies but substantially higher impact on disease risk. Low-frequency (∼1–5%) and rare (<1%) variants could have large effect sizes, increasing the risk two- to threefold, without demonstrating Mendelian inheritance,95 and it has been suggested that low-frequency and rare variants in fact are disease predisposing,96 and that they can be used for efficient prediction in complex diseases.97 However, detection of potentially disease-predisposing low-frequency and rare variants requires sequencing of a large number of cases and controls,98 which is a demanding task both with respect to costs and the amount of data created. Nevertheless, initiatives to sequence the entire human genome,99 as well as extensive sequencing of all coding regions (the exome) in the ∼20 000 known human genes,100 are already ongoing, and the number of identified low-frequency and rare variants is very large. It is expected that each of these variants will have a relatively low impact on the disease endpoint and, in combination with the heterogeneous nature of common complex diseases, the power to detect associations when testing one variant at a time will be rather low. New analytical strategies that combine several variants are therefore needed to obtain adequate statistical power. A tremendous number of methods for these genetic burden tests have recently been developed.101 Some methods use simple collapsing of the rare variants (usually <1%), analysing them as one unit, and some weight the variants using allele frequencies or predicted functionality.
As simple pooling of variants can be hampered by associations in different directions, some methods use data-based algorithms, which allow variants to be either protective or deleterious to overcome the diminishing association signal introduced by opposing associations.101
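The two simplest families of burden tests mentioned above can be sketched as follows. The first function collapses all rare variants into a single carrier indicator per individual; the second builds a frequency-weighted allele count. The weight 1/sqrt(f(1−f)) is one commonly used inverse-frequency choice and is an assumption here, as are the toy frequencies and genotypes; neither function reproduces any specific published method in full.

```python
import math


def collapse_rare_carriers(genotype_rows, allele_freqs, max_freq=0.01):
    """Return 1 for individuals carrying any rare minor allele, else 0.

    genotype_rows: one list of minor-allele counts (0, 1, 2) per person
    allele_freqs:  minor-allele frequency of each variant (same order)
    Only variants with frequency below `max_freq` are collapsed.
    """
    rare_columns = [j for j, freq in enumerate(allele_freqs)
                    if freq < max_freq]
    return [1 if any(row[j] > 0 for j in rare_columns) else 0
            for row in genotype_rows]


def weighted_burden(genotype_rows, allele_freqs):
    """Weighted allele count per person; rarer variants weigh more."""
    weights = [1.0 / math.sqrt(freq * (1.0 - freq))
               for freq in allele_freqs]
    return [sum(w * g for w, g in zip(weights, row))
            for row in genotype_rows]


# Hypothetical toy data: three variants, four individuals.
freqs = [0.004, 0.008, 0.20]
rows = [
    [0, 0, 1],
    [1, 0, 0],
    [0, 1, 2],
    [0, 0, 0],
]
carriers = collapse_rare_carriers(rows, freqs)
burdens = weighted_burden(rows, freqs)
print(carriers)
```

Both summaries would then be tested against the phenotype as a single predictor. Both also assume that all pooled variants act in the same direction, which is exactly the limitation that the data-based algorithms described above are designed to relax.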
Nevertheless, the gain of identifying a catalogue of low-frequency or rare variants with larger impact on disease risk would be tremendous. A contribution of low-frequency and rare variants in common complex diseases seems to be established,102 and it has been estimated that ∼30 variants with a frequency of 1% and an odds ratio of ∼3 putatively could explain all inherited variance of complex disease.103
Factors not directly changing the DNA sequence could also contribute to the missing heritability of complex diseases, for example, epigenetic alterations. Epigenetics refers to modifications that regulate gene activity and/or expression rather than altering the DNA sequence.104 Examples include methylation of the DNA sequence, as in imprinting, packing of DNA around histones, or blockage of specific gene transcription through methylation of CpG islands in promoter regions. Epigenetic modifications can be programmed already in the intrauterine environment,105, 106 and interestingly, rodent models show inheritance through generations.107 If this is validated to apply to humans, it will challenge the accepted notion that genetic variation is the only source of heritable diseases, and could give rise to new fundamental theories about heritability of metabolic diseases.108 To what extent epigenetic modifications contribute to the total heritability of obesity is presently unknown. A complicating factor when elucidating the role of epigenetic modifications in complex diseases is the fact that they are highly dynamic and display great tissue specificity.109 As obesity in part is a central nervous system-mediated disorder, the relevant tissue samples are inaccessible, further complicating the complete understanding of the role of these modifications. However, several loci related to obesity have interestingly been shown to be subject to genetic imprinting,107, 110 indicating the importance of epigenetic modifications.
Moreover, it has been suggested that epigenetics could constitute the link between genetic susceptibility and environmental factors,111 as the plasticity of methylation patterns and histone packing fits well with the dynamic structure of environmental exposures.112 Future steps could therefore include linking the causally unexplained GWAS association signals with epigenetics;113 yet major efforts lie ahead, even though new technological advances move towards the point where epigenetics can be taken to a large-scale genome-wide level.114, 115 Among other epigenetic modifications and regulators of gene expression are microRNAs, which are small non-coding RNA molecules shown to have a role in many biological and pathological processes through regulation of gene expression.116 Several microRNAs have been shown to interfere with genes in adipogenesis and lipid metabolism; however, the precise mechanisms and extent have not been clarified.117, 118
Concluding remarks and looking ahead
For human genomics research, 2007 was a banner year, where the use of genotyping platforms made GWAS feasible and lifted genetic epidemiological studies to a higher level. The breakthroughs of the HapMap project were integrated in an agnostic approach revealing SNPs located in unanticipated locations of the genome and near loci with no prior link to the disease of interest.
In obesity research, the success of GWAS has resulted in four major waves and a total of 32 validated genome-wide significant loci associated with measures of overall adiposity and 18 loci associated with visceral fat accumulation. However, the instant and immediate success seems to have eased off, and the identification of new SNPs and novel loci only proceeded through the establishment of consortia and collection of large sample sizes in meta-analyses. However, despite a reasonable number of obesity susceptibility variants identified, the proportion of explained genetic variance of BMI remains low.18 The discrimination ability between normal weight and obese individuals is likewise inadequate and far from clinically useful.18, 37, 119 Still, overall important lessons have been learned during the four obesity GWAS waves; fewer variants than expected has been identified, which could be a result of overestimated or anticipated statistical power given the effect sizes appearing. Nevertheless, it is possible that some missing heritability lies in the variants associating near genome-wide significantly in the fourth GWAS wave, but given the heterogeneous and complex nature of the disease, where a high number of common variants most likely contributes in divergent combinations in different individuals, it requires an extremely large study sample to obtain statistical power to clarify this and, the contribution of such variants to explained genetic variance and discrimination ability is, for the same reasons probably, low.18 Therefore, different and innovative strategies increasing the likelihood of identifying new obesity variants with high impact should be incorporated in future GWAS obesity waves.
Emerging strategies include a shift in focus from common adult obesity to common childhood obesity, and such an initiative has already yielded success. A GWAS meta-analysis of a total of 5530 cases and 8318 controls, using age- and gender-matched measures of BMI, identified two loci, OLFM4 and HOXB5, associated with common childhood obesity at genome-wide significance,21 and this and similar strategies will undoubtedly contribute to the genetic knowledge of overall obesity in the future.
However, studying the extremes of the BMI distribution still seems a reasonable way to move towards a further unravelling of obesity genetics. Several examples already exist where genes that cause monogenic forms of obesity through rare, severe and often private mutations also appear in GWAS of common obesity, represented by less severe and often non-coding SNPs in proximity to the gene; examples include variants near MC4R, POMC and BDNF.
More general initiatives can be made to increase the probability of identifying novel obesity susceptibility variants. The use of refined and more accurate phenotypes could entail a more precise classification of existing obesity subtypes, thereby increasing the statistical power to detect distinctive associations. Several approaches could be pursued, one being the improvement of body composition measures. BMI is an accessible measure but depends on both fat mass and lean mass, and it has been shown to provide misleading information about overall fat content.120 If, for example, skinfold and bioimpedance measurements, which give more accurate estimates of body fat content, were implemented in the GWAS strategy, it would probably increase the likelihood of detecting novel and more specific obesity variants, as in the case of rs2943650 near IRS1, which was identified in a GWAS of body fat percentage.121 Another approach could be the identification of serum biomarkers, such as adipokines, potentially able to differentiate between various fat deposits, such as visceral and omental fat.122 Finally, a complementary phenotyping approach could be to rethink the obesity phenotype itself, for example, by focussing on the central nervous system-controlled part of obesity and the neurobiological mechanisms that override the tightly controlled energy homeostasis. Such information on individual addictive behaviour, including food preferences, could be gained from questionnaires and from functional neuroimaging.
Within genomics, the possibilities of developing and improving the GWAS approach are many. One obvious way to move forward is by focussing on low-frequency and/or rare variants. Novel reference genomes and newly developed algorithms123 make more accurate imputation a plausible gateway to the analysis of low-frequency variants in GWAS settings, and this may very well be the next step forward in unravelling the genetic background of complex diseases, including obesity. Nevertheless, rare variants are currently not covered by such imputation strategies, and initiatives using deep next-generation sequencing approaches have, as discussed, already been applied to identify disease-predisposing variants with frequencies below 5%. Whereas whole genome sequencing continues the GWAS outline, with no a priori hypothesis as to genomic location, whole exome sequencing is based on the anticipation that the majority of functional variants will be located in regions presently known to be coding, which also makes the interpretation of functionality more straightforward with the current genomic understanding. Both approaches rely on sequencing cases and controls; however, when studying obesity, this setup, and consequently the statistical power to identify predisposing variants, could be compromised by the fact that the disease theoretically consists of a large number of subtypes with phenotypic distinctions that could be at an almost personal level. No obvious solution exists to circumvent this, but genetics could turn out to be an important contributor when identifying obesity subtypes, as the general subdivision or classification could be predicted by the underlying genetic architecture.
However, substantial challenges emerge when association studies shift focus from common to low-frequency and/or rare variants. Single SNP analyses will be statistically underpowered even in extremely large study populations; hence, large efforts are being put into the development of genetic burden tests, where the combined and weighted effect of multiple risk and susceptibility variants in a single gene, a restricted genomic region or genes involved in a biological pathway can be analysed. Some of the developed methods even allow inclusion of interaction terms, so that G × E or gene–gene (G × G) interactions, or in theory even longer interaction chains, could be incorporated into these collapsing methods, making this an interesting avenue for future studies.
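As an illustration of the collapsing idea, the following minimal sketch sums weighted minor-allele counts across the variants of one gene into a single burden score per individual and then regresses a quantitative trait on that score. It is a simplified stand-in and not one of the published burden tests: the inverse-standard-deviation weights, the add-one smoothing of allele frequencies, the function names and the toy data are all assumptions made for the example.

```python
import math
from statistics import mean


def maf_weights(genotypes):
    """Weight each variant by 1/sqrt(maf * (1 - maf)) so rarer variants count more.

    genotypes -- one row per individual, one minor-allele count (0/1/2) per variant.
    Add-one smoothing (an assumption of this sketch) keeps every weight finite.
    """
    n = len(genotypes)
    weights = []
    for j in range(len(genotypes[0])):
        maf = (sum(row[j] for row in genotypes) + 1) / (2 * n + 2)
        weights.append(1.0 / math.sqrt(maf * (1.0 - maf)))
    return weights


def burden_scores(genotypes, weights):
    """Collapse all variants in the region into one weighted score per individual."""
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]


def slope_and_t(x, y):
    """Least-squares slope of y on x and its t statistic (n - 2 degrees of freedom)."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("burden score does not vary between individuals")
    beta = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - beta * mx
    rss = sum((yi - intercept - beta * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return beta, beta / se


if __name__ == "__main__":
    # Toy data: 8 individuals, 3 rare variants in one hypothetical gene, and BMI.
    genotypes = [[0, 0, 0], [0, 0, 0], [1, 0, 0], [0, 1, 0],
                 [0, 0, 1], [0, 0, 0], [1, 1, 0], [0, 0, 0]]
    bmi = [22.0, 24.5, 29.0, 31.0, 27.5, 23.0, 35.0, 25.0]
    scores = burden_scores(genotypes, maf_weights(genotypes))
    beta, t_stat = slope_and_t(scores, bmi)
    print(f"burden effect = {beta:.3f}, t = {t_stat:.2f}")
```

Because the variants are tested jointly through one score, the analysis uses a single test per gene or region instead of one test per rare variant. Covariates or interaction terms would require a multiple regression framework, which this sketch leaves out.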
Even though progress and innovation are important, the body of work that has accumulated during the first four obesity GWAS waves cannot be dismissed. It is argued that the non-coding association signals are markers and not the actual causal variants, and this is a highly plausible explanation in the context of the current knowledge about the human genome; however, this knowledge is far from complete. Furthermore, the function of most human genes, as well as their regulation, is unknown; therefore, important but as yet unrecognised transcription factors, and hence also transcription factor binding sites, could theoretically exist. Such undiscovered regulatory motifs and coding sequences for small regulatory molecules could justify the theory that the identified association signals are positioned in functional regions. A deeper understanding of the genomic location of the identified variants could be an important indicator of where to search for genomic variation in future GWAS and whole genome sequencing waves. One approach that has been used to narrow down the functional variant is resequencing of flanking regions; however, even this can be a daunting task, as the distance between the association signal and a causative variant is unknown and in theory can be quite substantial.
Although deep imputation strategies and genetic burden tests combining multiple common, low-frequency and rare variants identified through sequencing are realistic approaches in the near future, long-term strategies could include taking large parts of the human genomic sequence into consideration as a personal ‘barcode’. Instead of focussing only on single-nucleotide exchanges, this could also include a more precise mapping of structural variation such as insertions/deletions or CNVs, as well as non-coding RNAs and CpG islands, which could bring the determination of epigenetic modifications much further than is possible today.
In conclusion, the success in genetic epidemiology studies introduced by GWAS has started a scientific avalanche that hopefully will lead to the development of new statistical tools, more detailed genomic insight, deeper biological understanding of disease pathology and translation into clinical use. Eventually, these efforts may have great impact on the treatment strategies for common metabolic disorders like obesity. Moreover, they may enable early prediction of individuals at high risk of developing obesity, making more effective prevention strategies feasible, which could be one of the turning points for the current metabolic health crisis.
About this article
Journal of Genetics (2014)