Differential coexpression analysis of obesity-associated networks in human subcutaneous adipose tissue

Abstract

Objective:

To use a unique obesity-discordant sib-pair study design to combine differential expression analysis, expression quantitative trait loci (eQTLs) mapping and a coexpression regulatory network approach in subcutaneous human adipose tissue to identify genes relevant to the obese state.

Study design:

Genome-wide transcript expression in subcutaneous human adipose tissue was measured using Affymetrix U133 Plus 2.0 microarrays (Affymetrix, Santa Clara, CA, USA), and genome-wide genotyping data was obtained using an Applied Biosystems (Applied Biosystems; Life Technologies, Carlsbad, CA, USA) SNPlex linkage panel.

Subjects:

A total of 154 Swedish families ascertained through an obese proband (body mass index (BMI) >30 kg m−2) with a discordant sibling (BMI>10 kg m−2 less than proband).

Results:

Approximately one-third of the transcripts were differentially expressed between lean and obese siblings. The cell adhesion molecules (CAMs) KEGG grouping contained the largest number of differentially expressed genes under cis-acting genetic control. By using a novel approach to contrast CAMs coexpression networks between lean and obese siblings, a subset of differentially regulated genes was identified, with neuronal growth regulator 1 (NEGR1), previously associated with obesity in GWAS, as a central hub. Independent analysis using mouse data demonstrated that this finding of NEGR1 is conserved across species.

Conclusion:

Our data suggest that in addition to its reported role in the brain, NEGR1 is also expressed in subcutaneous adipose tissue and acts as a central ‘hub’ in an obesity-related transcript network.

Introduction

Obesity, commonly defined as a body mass index (BMI) >30 kg m−2, has steadily risen in prevalence globally, a trend that could lead to over a billion people being obese by 2030.1 Obesity is already a major public health problem, resulting in increased morbidity and mortality2 and different hypotheses have been suggested to account for this.3 Genome-wide linkage analysis alone has identified many genomic regions linked to obesity, but replication has been problematic.4 More recently, common low-penetrant variants associated with obesity have been identified in genome-wide association studies (GWAS).5, 6, 7, 8 Additionally, rare copy number variants9 have also been implicated in the causality of obesity. All of these approaches rely on the correlation between genomic variation and either obesity status or an obesity-related quantitative phenotype, for example, BMI.

Gene expression levels reflect the combined effects of a wide range of genomic modifications including point mutations, structural variants and epigenetic changes. Abundance of any specific mRNA is, therefore, likely to more closely reflect the overall genomic effects than each type of variation separately. This is especially true for those changes having a direct effect on the transcription levels, although alterations in protein structure and function might also have a feedback effect on transcriptional activity.10 Environmental effects are also likely to be indirectly captured by transcript levels, as recently shown in leucocyte gene expression studies among three Moroccan sub-populations where at least 37% of the differentially expressed transcripts were not explainable by genetic and methylation differences.11 Therefore, the assessment of genome-wide gene expression provides a snapshot of underlying cellular processes and their environmental and genomic influences.

As the transcript levels are strongly modulated by polymorphisms in regulatory regions, they can be powerfully mapped by correlating gene expression with genetic data. The regions identified by such correlations, named expression quantitative trait loci (eQTLs), directly pinpoint the functional link between variants in the genome and their biological effect. For this reason, eQTL analysis has been suggested as a means to identify genetic variants involved in the susceptibility to complex diseases, and to fill the gap between disease associations identified by GWA and the mechanism by which they contribute to the disease.12, 13 The choice of tissue is central to a gene expression study, as the expression profile is context dependent and differs between tissues.14 In addition, within the same tissue, eQTLs can be specific to the cellular differentiation state.15 Subcutaneous adipose tissue (SAT) is the tissue of choice to investigate common human obesity because it displays obesity-related changes in gene expression,16 it has clear endocrine organ characteristics17 and samples can be obtained from large numbers of human subjects. Altered expression of a number of genes implicated in obesity and the metabolic syndrome has been reported in studies of SAT from obese subjects, including CD3618 and PFKFB3.19

Instead of analyzing each transcript independently from the others, novel approaches can exploit the interactions among transcripts to identify gene networks. They delineate the complex interrelationships occurring amongst gene transcription levels, which can be correlated with phenotypic and genomic data for the identification of relevant biological pathways.12 Measurement of gene expression in multiple tissues in mice has allowed the delineation of a gene network enriched for genes involved in the inflammatory response and macrophage activation that is highly correlated with obesity-related phenotypes.20 A similar overlapping network has been identified in human SAT.21

Our study takes advantage of the SibPair cohort, which consists of 154 families (n=732) identified by having an obese proband (BMI >30 kg m−2), with a BMI-discordant sibling (BMI difference of at least 10 kg m−2).22 SAT samples were available from the siblings and peripheral blood from all subjects. These unique discordant families allowed a combined approach for the identification of genes and pathways involved in obesity. Using a relatively small sample, we have combined eQTL mapping, differential-expression analysis and a novel differential coexpression network approach in sib-pairs to identify biologically relevant transcriptional modules and their key regulators to provide insights into the pathogenesis of obesity.

Materials and methods

Participants and study design

The study cohort was 154 nuclear families (732 subjects) ascertained via an extremely BMI-discordant sib-pair (difference ≥10 kg m−2).22 Average family size was 4.75. SAT samples were available from the siblings and peripheral blood from all subjects. Median BMI (1st–3rd quartiles) was 27.2 (23.0–33.2), range 16.9–57.8. Median age (1st–3rd quartiles) was 45 years (36–63). Informed written consent was obtained from all participants. This study was approved by the ethics committee of Gothenburg University.

Nucleic acid isolation

Genomic DNA was isolated from whole blood using the QIAamp DNA Blood Maxi Kit (Qiagen, Hilden, Germany), according to the manufacturer's recommendations. SAT biopsies were immediately frozen in liquid nitrogen and RNA was extracted using the Qiagen RNeasy Lipid Tissue kit.

Linkage genotyping

The SNPlex System Linkage Mapping Set (http://www.appliedbiosystems.com) was used, comprising 3922 SNPs, of which 75% are in clusters, distributed across 95 probe pools. Allelic discrimination was performed using an Applied Biosystems 3730xl DNA Analyzer and GeneMapper3.7 software (Applied Biosystems, Carlsbad, CA, USA). Pedcheck23 was used to detect Mendelian inconsistency. Genetic markers giving rise to tight double recombinants were identified with MERLIN24 and treated as missing data.

Gene expression measurement

Gene expression was measured using the Affymetrix Human Genome U133 Plus 2.0 array. In brief, RNA was reverse transcribed into complementary DNA (cDNA) and biotin-labelled cRNA was prepared by in vitro transcription (Enzo Diagnostics Inc., Farmingdale, NY, USA). After hybridization, the arrays were scanned using the Affymetrix GeneArray GCS3000 scanner and visualized using GeneChip Operating Software (Affymetrix). Gene expression levels were normalized using the robust multiarray average method.25

Real-time PCR gene expression analysis

Adipose tissue biopsies were obtained from subcutaneous fat depots of two French volunteers, as previously described.26 For each sample, 1 μg of total RNA was transcribed into cDNA using the cDNA Archive Kit (Applied Biosystems) or Random Primed First Strand Synthesis (Applied Biosystems). 4 μl of a 1/10th dilution of each resulting cDNA was used in a 20-μl reaction, including 10 μl of TaqMan gene expression mastermix (Applied Biosystems) and 1 μl of the appropriate assay (Applied Biosystems). Quantitative real-time PCR analyses were performed using ABI 7900 HT SDS2.3 software (Applied Biosystems) and each sample was run in triplicate. Neuronal growth regulator 1 (NEGR1) expression levels were obtained relative to three housekeeping genes (ACTB, TOP1 and POLR2A). The cDNA sample content was normalized by subtracting the number of copies of the mean of three housekeeping genes from the number of copies of the target gene (ΔCt=Ct of target gene−Ct of housekeeping genes). Expression was calculated using the formula 100 × 2^(−ΔCt).
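The ΔCt normalization described above can be illustrated with a minimal Python sketch. The function name and the Ct values are hypothetical and are used only to show the arithmetic; they are not measurements from this study.

```python
from statistics import mean

def relative_expression(ct_target, ct_housekeeping):
    """Return 100 * 2**(-dCt), with dCt = Ct(target) - mean Ct(housekeeping)."""
    delta_ct = ct_target - mean(ct_housekeeping)
    return 100 * 2 ** (-delta_ct)

# Hypothetical Ct values for the target gene and three housekeeping genes.
print(relative_expression(27.0, [20.0, 24.0, 25.0]))  # dCt = 4, so 100 / 16 = 6.25
```

A lower ΔCt (target detected at an earlier cycle relative to the housekeeping mean) gives a higher relative expression value.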

Linkage analysis

After quality control, 149 families were considered suitable for analysis. We selected the subset of transcripts having a unique position or specificity >70% in the genome (n=27 904 transcripts) using SCAMPA (http://web.bioinformatics.ic.ac.uk/scampa). Linkage was evaluated using MERLIN-REGRESS.24 Although robust to misspecification, MERLIN-REGRESS requires the population trait's mean, variance and heritability. Population parameters were estimated using the variance component model implemented in SOLAR.27 As the variance components analysis requires a normal distribution for the trait, we applied a Box-Cox transformation to each transcript level.28 Gene expression values falling outside the mean±3 s.d.'s were excluded from the analysis. Age and sex were included as covariates in the SOLAR analyses.
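As an illustration of the preprocessing described above, the following Python sketch applies a Box-Cox transformation and then drops values outside the mean±3 s.d. It assumes that the transformation parameter (lambda) is already known; in practice it would be estimated per transcript, for example by maximum likelihood as `scipy.stats.boxcox` does. The function names are hypothetical.

```python
import math
from statistics import mean, stdev

def box_cox(values, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]

def drop_outliers(values, n_sd=3.0):
    """Keep only values within mean +/- n_sd sample standard deviations."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) <= n_sd * s]

# Hypothetical expression levels for one transcript, with one extreme value.
levels = [10.0] * 20 + [100.0]
kept = drop_outliers(box_cox(levels, lam=1.0))
print(len(levels), len(kept))
```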

To identify cis-eQTLs, a window of 2.5 cM left and right of each transcript position was used. Given this map resolution, 1483 transcripts had no marker within 5 cM; therefore, a subset of 26 421 transcripts was analyzed. All 27 904 transcripts were included in the trans-eQTL analysis. Linkage disequilibrium among the SNPs was modeled by specifying that MERLIN-REGRESS treat all SNPs with an observed pairwise r²>0.1 as a ‘super-locus’.29 All P-values were calculated from LOD-scores, then corrected for multiple testing by the False Discovery Rate (FDR) procedure.30
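The conversion from LOD-scores to FDR-corrected P-values can be sketched as follows. Two assumptions are made here that the text does not confirm: the LOD-score is converted through a one-sided 1-d.f. chi-square approximation, and the FDR procedure is taken to be the Benjamini-Hochberg step-up adjustment. Function names are hypothetical.

```python
import math

def lod_to_p(lod):
    """One-sided P-value for a LOD-score via a 1-d.f. chi-square (assumed)."""
    chi2 = 2.0 * math.log(10.0) * max(lod, 0.0)
    # The chi-square(1 d.f.) survival function is erfc(sqrt(x / 2)); it is
    # halved because linkage is tested one-sidedly.
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))

def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical cis LOD-scores for four transcripts.
lods = [3.0, 0.5, 1.2, 4.1]
q_values = bh_adjust([lod_to_p(l) for l in lods])
print([q <= 0.10 for q in q_values])
```

Under these assumptions, a LOD-score of 3 corresponds to a pointwise P-value of about 0.0001.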

Assessing the significance of trans-eQTLs

To determine the empirical significance of trans-eQTLs, the approach of Emilsson21 was used. Linkage analysis of the 27 904 transcripts was repeated using ten genome-wide data sets simulated by gene dropping under the null hypothesis of no-linkage. The top-hit trans-eQTL for every transcript was extracted from each of the ten genome-wide analyses, giving a distribution of 279 040 LOD-scores that was used to assess empirical P-values for the trans-eQTLs observed in the original data.
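A minimal sketch of the empirical P-value calculation follows, assuming the null LOD-scores (279 040 in the study) are held in a plain list. The function name and the toy values are hypothetical.

```python
from bisect import bisect_left

def empirical_p(observed, null_scores):
    """Proportion of null LOD-scores that are >= the observed LOD-score."""
    ordered = sorted(null_scores)
    n_ge = len(ordered) - bisect_left(ordered, observed)
    return n_ge / len(ordered)

# Toy null distribution standing in for the simulated top-hit LOD-scores.
null_lods = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
print(empirical_p(4.0, null_lods))  # three of ten null scores are >= 4.0
```

With many observed trans-eQTLs, sorting the null scores once and reusing the sorted list would avoid repeating the sort in every call.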

For the detection of hotspots of trans-regulation, we are interested in the probability for different signals, each of them genome-wide significant, to randomly arise at the same location. Hidden underlying correlation structure between the identity by descent (IBD) at a genetic location and the transcription levels might influence the occurrence of false coincident linkages. The 5% LOD-score observed in the simulated data set was used as threshold for the genome-wide significance of each analyzed transcript in our data. The number of coincident linkages was then recorded at each marker location. Applying the same procedure to the simulated data set, we obtained the distribution of coincident linkages under the null hypothesis of no-linkage. We used this distribution to assess empirical P-values for the size of the observed coincident linkages. Finally, multiple test correction was assessed using the FDR procedure.30

Differential expression

Log-transformed expression levels for the whole set of 54 675 transcripts were corrected for age and sex, and 119 pairs of extreme sibs were selected. The Limma package was used to identify significant genes that were over- or underexpressed.31 Linear and robust regressions were performed separately, before applying the empirical Bayes shrinkage method, obtaining similar results. Paired design was taken into account and specified accordingly. Correction for multiple testing was performed using Storey's FDR procedure32 on the P-values of the shrunk test statistics.

Differential coexpression analysis

Diseases can often result from the dysregulation of a gene network.33 Differential coexpression analysis34, 35 might help in identifying those genes within the network that lead to the disruption of the regulatory mechanisms.

We propose a novel approach for testing the difference between gene networks in two groups. Firstly, we built obese and lean relevance networks, with correlation matrices calculated using Kendall's tau correlation36 in order to robustify the analysis. Then we contrasted the two networks by calculating the differences between the transcript–transcript correlation matrices. Significant difference was evaluated using permutation tests37 with different resampling schemes chosen according to the two samples' dependencies. Empirical P-values were computed as the proportion of the differences observed in the permuted data sets that were equal to or greater than what was observed in the original data set. An FDR-thresholding procedure32 was applied to the empirical P-values to highlight the most significant differences.

Our approach, although similar in spirit to other methods that look at differences in coexpression networks between different conditions or case–control groups (for a review see ref. 38), is new in many respects. Firstly, through a model-free permutation test, we test directly whether the observed correlation differences are significant, so we are not considering differences in the graph's topology.39 Secondly, by simply changing the sampling scheme for the permutation test, we can accommodate different levels of dependence between the groups. Thirdly, we do not consider just strong (positive or negative) correlations or strong differences using ad hoc thresholding.40 Selection of what is relevant is obtained by applying the FDR procedure. Finally, the network module is defined as the connected component after FDR calculation, avoiding the ad hoc metric distances required in cluster algorithms.40, 41
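The contrast of two relevance networks can be sketched for a single transcript pair as shown below. This is an illustrative outline rather than the implementation used in the study. It assumes Kendall's tau-a (no tie correction) and the absolute value of the difference as the test statistic, and all function names are hypothetical. The two resampling schemes correspond to the paired design (siblings relabeled within each pair) and to independent samples (pooled sample split at random, preserving group sizes).

```python
import random
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    s = 0
    for i, j in combinations(range(len(x)), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)
    n = len(x)
    return s / (n * (n - 1) / 2)

def tau_difference(group_a, group_b):
    """Tau in group_a minus tau in group_b; samples are (gene1, gene2) tuples."""
    def tau(group):
        return kendall_tau([s[0] for s in group], [s[1] for s in group])
    return tau(group_a) - tau(group_b)

def swap_within_pairs(obese, lean, rng):
    """Paired scheme: randomly relabel the two siblings of each pair."""
    new_obese, new_lean = [], []
    for o, l in zip(obese, lean):
        if rng.random() < 0.5:
            o, l = l, o
        new_obese.append(o)
        new_lean.append(l)
    return new_obese, new_lean

def split_pooled(group_a, group_b, rng):
    """Independent scheme: shuffle the pooled sample, keep the group sizes."""
    pooled = list(group_a) + list(group_b)
    rng.shuffle(pooled)
    return pooled[:len(group_a)], pooled[len(group_a):]

def permutation_p(group_a, group_b, scheme, n_perm=1000, seed=0):
    """Proportion of permuted |tau differences| >= the observed one."""
    rng = random.Random(seed)
    observed = abs(tau_difference(group_a, group_b))
    hits = 0
    for _ in range(n_perm):
        perm_a, perm_b = scheme(group_a, group_b, rng)
        if abs(tau_difference(perm_a, perm_b)) >= observed:
            hits += 1
    return hits / n_perm

# Toy data: two transcripts positively coexpressed in one group and
# negatively coexpressed in the other (12 sib-pairs).
obese = [(i, i) for i in range(12)]
lean = [(i, 11 - i) for i in range(12)]
print(permutation_p(obese, lean, swap_within_pairs, n_perm=200))
```

Repeating this test for every transcript pair and applying an FDR threshold to the resulting empirical P-values would yield the edges of the contrasted network.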

Identification of obesity-related biological pathways

At 10% FDR level, we selected those differentially expressed transcripts for which cis-eQTLs were also identified. Enrichment of KEGG pathways was assessed with DAVID. Using all differentially expressed transcripts belonging to identified KEGG pathways and the same sub-sample selected by the Limma analysis, we applied the differential coexpression analysis approach at 10% FDR level. To take into account the paired design, we randomly relabeled the data within each pair in each permuted data set.

We tested whether the number of connections observed for the analyzed genes was larger than that expected, under the null hypothesis of these genes being randomly connected.42 We also contrasted 1000 relevance networks between obese and lean subjects generated using randomly selected transcripts. The maximum number of connections was recorded for each simulation to evaluate the empirical significance for the most connected genes in the original data set.

Validation of the differential coexpressed network in mouse

To validate the differential coexpression network identified in human SAT, we used adipose tissue gene expression data that were available from a mouse F2 intercross, although this was from white adipose tissue rather than pure SAT.43 The first and third quartiles of mouse weight were used to select the most obese and most lean mice (n=144). Orthologous genes were identified using Ensembl Biomart (build 37).44 For comparison, the differential coexpression analysis in humans was re-evaluated using the subset of genes also present in the mouse data set. To assess the empirical significance of the difference observed between relevance networks, we applied the differential coexpression analysis approach at 10% FDR level. Assuming the independence of the two samples, in each permuted data set the pooled sample was randomly split preserving the original sample size of the two groups.

Statistical assessment was carried out to determine whether any gene showed a number of connections in both the human and mouse differential coexpression networks higher than expected, under the assumption of independence. Assuming that the number of connections in each network follows a Poisson distribution, we simulated 1 000 000 times a sample of n paired observations from two independent Poisson distributions, with n equal to the number of genes used to build the two networks. In each simulation we calculated the proportion of connections for the same gene in both networks and we recorded the highest joint proportion, which under the null hypothesis, corresponds to the product of the two marginal distributions. Finally, the empirical distribution of the highest joint proportion was used to evaluate the empirical P-value for each pair of significant genes identified in both the human and mouse difference relevance networks.
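One possible reading of this simulation is sketched below in Python. It assumes that the "joint proportion" for a gene is the product of its share of all connections in each network. The Poisson means, the observed value and the function names are hypothetical, and the number of simulations is reduced for speed.

```python
import math
import random

def poisson_draw(mean, rng):
    """Knuth's method for one Poisson draw (adequate for small means)."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def max_joint_proportion(n_genes, mean_a, mean_b, rng):
    """Highest product of per-gene connection shares in two independent networks."""
    a = [poisson_draw(mean_a, rng) for _ in range(n_genes)]
    b = [poisson_draw(mean_b, rng) for _ in range(n_genes)]
    total_a, total_b = sum(a), sum(b)
    if total_a == 0 or total_b == 0:
        return 0.0
    return max((x / total_a) * (y / total_b) for x, y in zip(a, b))

def joint_connectivity_p(observed, n_genes, mean_a, mean_b, n_sim=10000, seed=0):
    """Proportion of simulations whose highest joint proportion >= observed."""
    rng = random.Random(seed)
    hits = sum(
        max_joint_proportion(n_genes, mean_a, mean_b, rng) >= observed
        for _ in range(n_sim)
    )
    return hits / n_sim

# Hypothetical inputs: 57 genes, assumed mean connectivities and observed value.
print(joint_connectivity_p(0.05, n_genes=57, mean_a=0.5, mean_b=1.0, n_sim=2000))
```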

Correlation of NEGR1 gene expression in human SAT and hypothalamus

In order to investigate the possibility of correlation between expression of NEGR1 in adipose tissue and in the hypothalamus, a publicly available data set was used (NCBI GEO accession number GSE3526).45 This study analyzed gene expression in different normal tissues from ten healthy donors using the Affymetrix Human Genome U133 Plus 2.0 Array. Genome-wide expression levels in hypothalamus were available for eight subjects. For three subjects, expression levels were also available for adipose tissue. We assessed NEGR1 correlation in expression levels between the two tissues, using the genome-wide data to generate a null distribution of no association. An empirical P-value was derived using one million permutations.

Results

Differentially expressed transcripts

We determined which transcripts were differentially expressed between obese and lean subjects. The results are reported in Table 1. Obesity showed a global effect on genome-wide gene expression. A majority (55%) of the differentially expressed transcripts were upregulated in lean subjects. DAVID/KEGG analysis of the differentially expressed transcripts did not identify significant enrichment for any obvious obesity-related pathway.

Table 1 Numbers (and percentage) of differentially expressed transcripts between lean and obese subjects identified using the Limma package at different FDR levels, using the linear regression option. Number (and percentage) of upregulated transcripts in obese subjects is also provided

Detection of cis-eQTLs

Given the inter-SNP map distances, we defined a cis-eQTL signal for each transcript as the maximum LOD-score obtained within 2.5 cM 5′ or 3′ of each transcript position in the genome. There were 26 421 transcripts with a SNP marker within 5 cM. Median (1st–3rd quartile) heritability was 0.19 (0.05–0.34). The maximum LOD-score was detected at a median (1st–3rd quartile) distance from the center of the transcript of 1.5 cM (0.8–2.1 cM). We identified 1063 (4%) eQTLs at 10% FDR level. The twenty cis-eQTLs with the highest LOD-scores are shown in Table 2. As expected, cis-eQTLs were detected for those expression traits that had a heritability score of zero or close to zero, but traits with higher heritability also had higher LOD-scores.

Table 2 Top twenty human cis-eQTLs across the whole genome in descending order of LOD-score

Detection of trans-eQTLs

For each transcript, we recorded the maximum peak LOD-score located on a chromosome different to the chromosome where the transcript was located. Using simulations (see Materials and methods) with a 10% FDR, we identified 50 significant trans-eQTLs distributed across 12 chromosomes (see Table 3). Although most trans-eQTLs were not significant after multiple testing correction, we noted that trans-linkage signals for many transcripts were concentrated in the 1p13.3-q23.3 region. The empirical probability of observing coincident linkage was tested by simulating under the null hypothesis of no-linkage. In the simulations, when a false positive was detected for a transcript, a number of correlated transcripts also showed a linkage peak in the same region, as expected. Using the empirical probability of coincident linkages through the genome, we determined a significant clustering of 374 transcripts in the 1p13.3-q23.3 region at 10% FDR.

Table 3 List of human trans-eQTLs across the whole genome in chromosomal position order and descending order of LOD-score, where trans-eQTLs have the same position

Biological pathways involved in obesity

Given the set of 1063 cis-eQTLs, pathway enrichment analysis using DAVID46, 47 identified the KEGG insulin signaling pathway as the most significantly enriched (P=1.6 × 10−2). The proportion of differentially expressed genes in this pathway did not differ from that observed in the whole data set. No significant enrichment was observed for the small number of trans-regulated genes identified in this study. For the hotspot of trans-regulators on chromosome 1p13.3-q23.3, the proportion of obesity-related transcripts was again not different from what would be expected at random. Significant enrichment was observed for genes in the apoptosis pathway (P=5.5 × 10−3) but no obvious obesity candidates were present in this very large region.

To identify obesity-related networks that include transcripts under genetic control, we focused on 425 transcripts that for an FDR of 10% were both differentially expressed between lean and obese subjects and under cis-acting regulation. Using an EASE48 score threshold of 0.1 in DAVID to rank categories of genes, only the cell adhesion molecules (CAMs) KEGG functional grouping was highlighted, which in our data set contains 160 transcripts (corresponding to 76 genes), eight of them (corresponding to seven genes) under cis-regulation. Relevance gene networks were constructed separately in obese and lean subjects using these 160 transcripts and the empirical significance of the observed differences in coexpression among pairs of transcripts in the two networks evaluated by permutations (see Materials and methods).

The lean and obese relevance networks and their contrast are shown in Figure 1. Table 4 lists the CAMs genes and their number of connections in the contrasted network, that is, the number of significantly different correlations (FDR <10%) of each gene with the remaining genes between the two groups. The NEGR1 gene was the most connected gene with nine edges, while four significant connections were observed for HLA-DQB2, three for ALCAM, and two for HLA-DQA2, ITGAM and CD86. We tested whether the number of connections observed for these genes was larger than that expected by chance. Using as the null distribution a Poisson random variable with mean equal to the average connectivity in the network, the nine connections observed for NEGR1 were considered as a rare event (P=4.4 × 10−10). Having four, three and two connections in this data set correspond to P-values of 2.7 × 10−4, 0.002 and 0.02, respectively.
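The Poisson test of connectivity can be sketched as an upper-tail probability. The average connectivity of the contrasted network is not stated in the text, so the mean used below is a hypothetical value, and the output is not intended to reproduce the reported P-values.

```python
import math

def poisson_upper_tail(k, mean, tol=1e-18):
    """P(X >= k) for X ~ Poisson(mean), summed directly over the upper tail."""
    term = math.exp(-mean)
    for i in range(1, k + 1):
        term *= mean / i  # after the loop, term equals P(X = k)
    total, i = 0.0, k
    while term > tol * max(total, 1e-300):
        total += term
        i += 1
        term *= mean / i
    return total

# Hypothetical average connectivity of 0.2 edges per gene.
for edges in (2, 3, 4, 9):
    print(edges, poisson_upper_tail(edges, mean=0.2))
```

Summing the tail directly, rather than computing one minus the lower tail, keeps precision for very small probabilities such as those expected for a hub with nine edges.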

Figure 1

Differential coexpression analysis of CAM gene expression in human SAT. Differentially coexpressed network of the CAMs functional grouping resulting from the contrast between obese (a) and lean (b) networks at FDR 10%. Red and blue edges represent negative and positive correlations, respectively. For simplicity, gene names are only shown for the external nodes in (a) and (b).

Table 4 Genes showing significantly different coexpression between lean and obese human SAT CAMs networks at FDR 10% in descending order of their number of connections

We also evaluated the empirical significance of the connectivity observed for these genes by contrasting relevance networks (between obese and lean subjects) randomly generated by using the same number of transcripts, and recording the gene with highest connectivity in each simulated data set. Out of 1000 replicates, sporadic differences were observed between the obese and lean correlation matrices, as expected, but none of them showed a similar number of differences with respect to the original data set. In no cases did a sample size of 160 transcripts contain a gene with nine edges. Marginal significance was observed for HLA-DQB2 (P=0.028).

Validation of the CAMs network in mouse

From the whole set of 76 genes belonging to the human CAMs pathway, 57 orthologous genes were present in a mouse data set,43 corresponding to 66 mouse transcripts. To assess the importance of the NEGR1 gene in both humans and mice, we first restricted the set of CAMs genes in the human data to those which were also present in the mouse, resulting in a set of 115 human transcripts. Contrast of the coexpression networks was carried out in human and mouse, and significant results filtered using a 10% FDR level. Table 5 shows the list of significantly connected genes from the mouse analysis, highlighting that NEGR1 is highly connected in the contrasted mouse network as well. The mouse differential relevance network contained an overall larger number of connections, probably reflecting higher intra-group homogeneity and reduced environmental noise in this data set. We ordered each gene with respect to the observed joint connectivity in both networks. The empirical significance of its rank was assessed through simulations under the null hypothesis of networks’ independence (see Materials and methods). Only NEGR1 showed a significant departure from this assumption (P=2.1 × 10−5), indicating that this gene is integral to both the human and mouse networks.

Table 5 Genes showing significantly different coexpression by contrasting the mouse SAT CAMs networks between mice in the first and third quartile of the weight distribution at 10% FDR level in descending order of their number of connections

Expression of the NEGR1 transcript in SAT

The NEGR1 gene, central to the contrasted coexpression network, is expressed at high levels in brain.7 Using quantitative real-time PCR, we demonstrated that NEGR1 is also expressed in SAT (as well as in the heart and skeletal muscle) using a commercially available tissue panel and two independent unrelated human SAT samples (Figure 2).

Figure 2

NEGR1 expression levels in human tissues. Relative expression of the NEGR1 transcript in human SAT compared with expression in other human tissues from a commercially available multiple tissue panel.

Correlation of NEGR1 expression in human SAT and hypothalamus

NEGR1 expression levels were significantly correlated between adipose and hypothalamus tissues (r=0.99; P-value=0.020). This places NEGR1 in the top 3% of the most correlated transcripts genome-wide. We also assessed the empirical significance of our finding using genome-wide expression data in the two tissues to generate a null distribution of no association (empirical P-value=0.022).

Discussion

An important goal of systems biology is to identify biological pathways and genetic networks underlying complex human diseases. We studied genome-wide gene expression in SAT and its genetically determined variation in families ascertained through sib-pairs discordant for obesity. The expression of about 30% of all genes was significantly altered in the obese state, confirming a broad effect of obesity on SAT gene expression.21

Linkage analysis identified a large number of significant eQTLs, most of them localized in cis, and a lesser number of trans-acting signals, perhaps due to reduced power of detection. Gene Ontology and pathway analyses of the cis-regulated genes demonstrated that they were enriched for genes involved in the insulin signaling pathway. The identification of genetic regulation of the insulin pathway is intriguing, as it may indicate a role for SAT in glucose homoeostasis and identify its contribution to the development of polygenic type 2 diabetes.49 Clear identification of genetic regulation of this pathway in SAT suggests that exploration of the regulated genes may give valuable insights into the fact that only a minority of obese subjects develop T2D, and those that do, typically have insulin resistance, metabolic syndrome and insulin secretion defects.50 No significant biological clustering was observed for the small number of trans-regulated genes identified in this study. A group of 374 transcripts suggested the presence of a significant hotspot for trans-regulation on chromosome 1p13.3-q23.3, and it may be of note that this overlaps the well-replicated T2D linkage locus of 1q21-q25.51 Significant enrichment was observed for genes in the apoptosis pathway but no obvious candidates could be identified in this very large region.

Although differential expression analysis can identify those genes and pathways with a causal or reactive role in obesity, genetic analysis can highlight which of them are under genetic control and therefore likely to be ‘functionally’ transcribed in SAT cells.52 Therefore, whereas differential expression may be a result of the ‘obesogenic environment’, those biological pathways enriched for differentially expressed and genetically controlled genes are more likely to have a causative role in the development of obesity. The subset of cis-regulated transcripts that were also differentially expressed between lean and obese subjects suggested a possible role for genes belonging to the CAMs functional grouping. Contrasting CAMs coexpression networks between lean and obese subjects identified a subset of genes whose pattern of coexpression was significantly associated with the obese state. We found NEGR1 as the central highly connected gene of this subset and replicated this observation using a mouse expression data set, thus validating its central role for this pathway. In the context of disease, the topology of gene networks is often determined by key genes showing a high degree of connectivity. Indeed, highly connected genes are likely to encode essential genes,53 which are often evolutionarily conserved.54 Genes showing an intermediate number of connections have been shown to be more likely to harbor inherited mutations for common diseases.55 Whereas NEGR1 was the most connected gene in the contrasted network, it showed intermediate connectivity within each group-specific coexpression network, thus supporting its possible role as a disease gene. The GIANT consortium meta-analysis of obesity GWA studies reported that genetic markers near the NEGR1 gene are associated with obesity.7

The NEGR1 protein is a member of the immunoglobulin superfamily, is highly expressed in the hypothalamus56 where it appears to modulate synapse number in neurons57 and this makes it a good functional candidate for obesity,58 especially when considering obesity as a disorder having a neurobehavioral origin.59 Our findings demonstrate that NEGR1 is expressed in human SAT, where it appears to be central to the network of the most differentially expressed set of functionally related genes between lean and obese subjects. Using publicly available data,60 we observed high correlation in the expression levels of NEGR1 between human subcutaneous adipose and hypothalamus tissues. These results suggest a similar expression pattern for NEGR1 across tissues. Thus, transcriptional regulation of NEGR1 might not be restricted to neural development and might involve additional mechanisms shared by other tissues.

In addition to NEGR1, other genes in the CAMs network have previously been shown to be overexpressed in SAT. A study of BMI-discordant identical twins61 clearly demonstrated upregulation of inflammatory and cytoskeleton pathways and downregulation of energy metabolism and cell differentiation pathways. Specifically, that study reported overexpression of major histocompatibility complex class II transcripts in obese subjects, and these transcripts are present in our CAMs relevance network. This further supports the utility of our approach, and suggests that other genes within the identified obesity subset of CAMs genes might be good candidates for further investigation.

In summary, we have identified a subset of genes that are both differentially expressed between lean and obese subjects and under cis-regulation. These genes are therefore strong candidates for further investigation of gene variants that regulate their expression and thereby contribute to obesity. We have applied a novel differential coexpression analysis strategy to identify NEGR1 as a gene central to the CAMs network in the obese state, and confirmed this finding in a different species.

Accession codes

Gene Expression Omnibus

References

  1. Kelly T, Yang W, Chen CS, Reynolds K, He J. Global burden of obesity in 2005 and projections to 2030. Int J Obes (Lond) 2008; 32: 1431–1437.
  2. Haslam DW, James WP. Obesity. Lancet 2005; 366: 1197–1209.
  3. Walley AJ, Asher JE, Froguel P. The genetic contribution to non-syndromic human obesity. Nat Rev Genet 2009; 10: 431–442.
  4. Saunders CL, Chiodini BD, Sham P, Lewis CM, Abkevich V, Adeyemo AA et al. Meta-analysis of genome-wide linkage studies in BMI and obesity. Obesity (Silver Spring) 2007; 15: 2263–2275.
  5. Hinney A, Nguyen TT, Scherag A, Friedel S, Bronner G, Muller TD et al. Genome wide association (GWA) study for early onset extreme obesity supports the role of fat mass and obesity associated gene (FTO) variants. PLoS One 2007; 2: e1361.
  6. Thorleifsson G, Walters GB, Gudbjartsson DF, Steinthorsdottir V, Sulem P, Helgadottir A et al. Genome-wide association yields new sequence variants at seven loci that associate with measures of obesity. Nat Genet 2009; 41: 18–24.
  7. Willer CJ, Speliotes EK, Loos RJ, Li S, Lindgren CM, Heid IM et al. Six new loci associated with body mass index highlight a neuronal influence on body weight regulation. Nat Genet 2009; 41: 25–34.
  8. Meyre D, Delplanque J, Chevre JC, Lecoeur C, Lobbens S, Gallina S et al. Genome-wide association study for early-onset and morbid adult obesity identifies three new risk loci in European populations. Nat Genet 2009; 41: 157–159.
  9. Walters RG, Jacquemont S, Valsesia A, de Smith AJ, Martinet D, Andersson J et al. A new highly penetrant form of obesity due to deletions on chromosome 16p11.2. Nature 2010; 463: 671–675.
  10. Schadt EE, Monks SA, Drake TA, Lusis AJ, Che N, Colinayo V et al. Genetics of gene expression surveyed in maize, mouse and man. Nature 2003; 422: 297–302.
  11. Idaghdour Y, Storey JD, Jadallah SJ, Gibson G. A genome-wide gene expression signature of environmental geography in leukocytes of Moroccan Amazighs. PLoS Genet 2008; 4: e1000052.
  12. Schadt EE. Molecular networks as sensors and drivers of common human diseases. Nature 2009; 461: 218–223.
  13. Cookson W, Liang L, Abecasis G, Moffatt M, Lathrop M. Mapping complex disease traits with global gene expression. Nat Rev Genet 2009; 10: 184–194.
  14. Petretto E, Mangion J, Dickens NJ, Cook SA, Kumaran MK, Lu H et al. Heritability and tissue specificity of expression quantitative trait loci. PLoS Genet 2006; 2: e172.
  15. Gerrits A, Li Y, Tesson BM, Bystrykh LV, Weersing E, Ausema A et al. Expression quantitative trait loci are highly sensitive to cellular differentiation state. PLoS Genet 2009; 5: e1000692.
  16. Wellen KE, Hotamisligil GS. Obesity-induced inflammatory changes in adipose tissue. J Clin Invest 2003; 112: 1785–1788.
  17. Vazquez-Vela ME, Torres N, Tovar AR. White adipose tissue as endocrine organ and its role in obesity. Arch Med Res 2008; 39: 715–728.
  18. van Beek EA, Bakker AH, Kruyt PM, Hofker MH, Saris WH, Keijer J. Intra- and inter individual variation in gene expression in human adipose tissue. Pflugers Arch 2007; 453: 851–861.
  19. Jiao H, Kaaman M, Dungner E, Kere J, Arner P, Dahlman I. Association analysis of positional obesity candidate genes based on integrated data from transcriptomics and linkage analysis. Int J Obes (Lond) 2008; 32: 816–825.
  20. Chen Y, Zhu J, Lum PY, Yang X, Pinto S, MacNeil DJ et al. Variations in DNA elucidate molecular networks that cause disease. Nature 2008; 452: 429–435.
  21. Emilsson V, Thorleifsson G, Zhang B, Leonardson AS, Zink F, Zhu J et al. Genetics of gene expression and its effect on disease. Nature 2008; 452: 423–428.
  22. Carlsson LM, Jacobson P, Walley A, Froguel P, Sjostrom L, Svensson PA et al. ALK7 expression is specific for adipose tissue, reduced in obesity and correlates to factors implicated in metabolic disease. Biochem Biophys Res Commun 2009; 382: 309–314.
  23. O’Connell JR, Weeks DE. PedCheck: a program for identification of genotype incompatibilities in linkage analysis. Am J Hum Genet 1998; 63: 259–266.
  24. Abecasis GR, Cherny SS, Cookson WO, Cardon LR. Merlin—rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 2002; 30: 97–101.
  25. Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U et al. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 2003; 4: 249–264.
  26. Poulain-Godefroy O, Lecoeur C, Pattou F, Fruhbeck G, Froguel P. Inflammation is associated with a decrease of lipogenic factors in omental fat in women. Am J Physiol Regul Integr Comp Physiol 2008; 295: R1–R7.
  27. Almasy L, Blangero J. Multipoint quantitative-trait linkage analysis in general pedigrees. Am J Hum Genet 1998; 62: 1198–1211.
  28. Box GEP, Cox DR. An analysis of transformations. J R Stat Soc B 1964; 26: 211–252.
  29. Abecasis GR, Wigginton JE. Handling marker-marker linkage disequilibrium: pedigree analysis with clustered markers. Am J Hum Genet 2005; 77: 754–767.
  30. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc B 1995; 57: 289–300.
  31. Smyth GK. Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol 2004; 3: Article3.
  32. Storey JD, Tibshirani R. Statistical significance for genomewide studies. Proc Natl Acad Sci USA 2003; 100: 9440–9445.
  33. Kleinjan DA, van Heyningen V. Long-range control of gene expression: emerging mechanisms and disruption in disease. Am J Hum Genet 2005; 76: 8–32.
  34. Li KC. Genome-wide coexpression dynamics: theory and application. Proc Natl Acad Sci USA 2002; 99: 16875–16880.
  35. Choi JK, Yu U, Yoo OJ, Kim S. Differential coexpression analysis using microarray data and its application to human cancer. Bioinformatics 2005; 21: 4348–4355.
  36. Zhu D, Hero AO, Cheng H, Khanna R, Swaroop A. Network constrained clustering for gene microarray data. Bioinformatics 2005; 21: 4014–4020.
  37. Pesarin F. Multivariate Permutation Tests: With Applications in Biostatistics. Wiley: Chichester, England, 2001.
  38. Fang G, Kuang R, Pandey G, Steinbach M, Myers CL, Kumar V. Subspace differential coexpression analysis: problem definition and a general approach. Pac Symp Biocomput 2010; 145–156.
  39. Fuller TF, Ghazalpour A, Aten JE, Drake TA, Lusis AJ, Horvath S. Weighted gene coexpression network analysis strategies applied to mouse weight. Mamm Genome 2007; 18: 463–472.
  40. Xu M, Kao MC, Nunez-Iglesias J, Nevins JR, West M, Zhou XJ. An integrative approach to characterize disease-specific pathways and their coordination: a case study in cancer. BMC Genomics 2008; 9 (Suppl 1): S12.
  41. Oldham MC, Horvath S, Geschwind DH. Conservation and evolution of gene coexpression networks in human and chimpanzee brains. Proc Natl Acad Sci USA 2006; 103: 17973–17978.
  42. Barabasi AL, Oltvai ZN. Network biology: understanding the cell's functional organization. Nat Rev Genet 2004; 5: 101–113.
  43. Wang S, Yehya N, Schadt EE, Wang H, Drake TA, Lusis AJ. Genetic and genomic analysis of a fat mass trait with complex inheritance reveals marked sex specificity. PLoS Genet 2006; 2: e15.
  44. Kasprzyk A, Keefe D, Smedley D, London D, Spooner W, Melsopp C et al. EnsMart: a generic system for fast and flexible access to biological data. Genome Res 2004; 14: 160–169.
  45. Roth RB, Hevezi P, Lee J, Willhite D, Lechner SM, Foster AC et al. Gene expression analyses reveal molecular relationships among 20 regions of the human CNS. Neurogenetics 2006; 7: 67–80.
  46. Dennis Jr G, Sherman BT, Hosack DA, Yang J, Gao W, Lane HC et al. DAVID: database for annotation, visualization, and integrated discovery. Genome Biol 2003; 4: P3.
  47. Huang DW, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc 2009; 4: 44–57.
  48. Hosack DA, Dennis Jr G, Sherman BT, Lane HC, Lempicki RA. Identifying biological themes within lists of genes with EASE. Genome Biol 2003; 4: R70.
  49. Frojdo S, Vidal H, Pirola L. Alterations of insulin signaling in type 2 diabetes: a review of the current evidence from humans. Biochim Biophys Acta 2009; 1792: 83–92.
  50. Iozzo P. Viewpoints on the way to the consensus session: where does insulin resistance start? the adipose tissue. Diabetes Care 2009; 32 (Suppl 2): S168–S173.
  51. Prokopenko I, Zeggini E, Hanson RL, Mitchell BD, Rayner NW, Akan P et al. Linkage disequilibrium mapping of the replicated type 2 diabetes linkage signal on chromosome 1q. Diabetes 2009; 58: 1704–1709.
  52. Dimas AS, Deutsch S, Stranger BE, Montgomery SB, Borel C, Attar-Cohen H et al. Common regulatory variation impacts gene expression in a cell type-dependent manner. Science 2009; 325: 1246–1250.
  53. Goh KI, Cusick ME, Valle D, Childs B, Vidal M, Barabasi AL. The human disease network. Proc Natl Acad Sci USA 2007; 104: 8685–8690.
  54. Bergmann S, Ihmels J, Barkai N. Similarities and differences in genome-wide expression data of six organisms. PLoS Biol 2004; 2: E9.
  55. Feldman I, Rzhetsky A, Vitkup D. Network properties of genes harboring inherited disease mutations. Proc Natl Acad Sci USA 2008; 105: 4323–4328.
  56. Miyata S, Funatsu N, Matsunaga W, Kiyohara T, Sokawa Y, Maekawa S. Expression of the IgLON cell adhesion molecules Kilon and OBCAM in hypothalamic magnocellular neurons. J Comp Neurol 2000; 424: 74–85.
  57. Hashimoto T, Yamada M, Maekawa S, Nakashima T, Miyata S. IgLON cell adhesion molecule Kilon is a crucial modulator for synapse number in hippocampal neurons. Brain Res 2008; 1224: 1–11.
  58. Bauer F, Elbers CC, Adan RA, Loos RJ, Onland-Moret NC, Grobbee DE et al. Obesity genes identified in genome-wide association studies are associated with adiposity measures and potentially with nutrient-specific food preference. Am J Clin Nutr 2009; 90: 951–959.
  59. O’Rahilly S, Farooqi IS. Human obesity: a heritable neurobehavioral disorder that is highly sensitive to environmental conditions. Diabetes 2008; 57: 2905–2910.
  60. Yang X, Deignan JL, Qi H, Zhu J, Qian S, Zhong J et al. Validation of candidate causal genes for obesity that affect shared metabolic pathways and networks. Nat Genet 2009; 41: 415–423.
  61. Pietilainen KH, Naukkarinen J, Rissanen A, Saharinen J, Ellonen P, Keranen H et al. Global transcript profiles of fat in monozygotic twins discordant for BMI: pathways behind acquired obesity. PLoS Med 2008; 5: e51.

Acknowledgements

We wish to acknowledge the participation of all the families and clinical staff involved in the SOS SibPair study. We thank Professor Eric Schadt for advice and the provision of the mouse data set and the staff of the Imperial College High-Performance Computing Service for their advice and support. This study was funded by Grant no. 079534/z/06/z from the Wellcome Trust, the Swedish Research Council (K2010-55X-11285-13), the Swedish foundation for Strategic Research to Sahlgrenska Center for Cardiovascular and Metabolic Research, the Swedish Diabetes foundation and the Swedish federal government under the LUA/ALF agreement. Sylvia Richardson acknowledges support from the MRC Grant G0600609.

Author information

Correspondence to P Froguel.

Ethics declarations

Competing interests

The authors declare no conflict of interest.

Keywords

  • gene expression
  • network
  • eQTL
  • sibpair
  • linkage
  • adipose tissue
