Main

Variability in the human genome amounts to over 5 million polymorphisms, but only a fraction of them have biological and clinical significance.1–3 Documentation of functional relevance may lead to better insights about various biological pathways and complex disease outcomes. Moreover, epidemiological investigations are further strengthened if gene variants with population-level phenotype associations are also shown to have functional relevance.3,4

There are many methods for assessing and establishing the functional relevance of genetic variants.3 For a few metabolic candidate genes, assays have long been available to measure enzymatic activity, but this applies to only a minority of genes in the current discovery-oriented era. Gene variants that entirely abrogate protein expression or function are also a very small minority. For most genes and variants, it is a challenge to evaluate their impact on gene transcription, let alone protein levels and activity. A large component of the genetic variability that affects phenotypes and complex diseases may reflect regulatory variation in the human genome.5 Such variation seems to be very extensive across different nonhuman genomes6–9 and the same may apply to humans.8,10

However, making sense of the epidemiological and clinical meaning of this variation remains a major challenge. A large number of in vitro, ex vivo, and in vivo functional assays are available. While newer technologies still emerge,10 luciferase reporter systems have been the most popular method for establishing in vitro the functional significance of polymorphisms to date,3,11,12 especially for variants in regulatory regions (e.g., promoter or enhancer regions). Luciferase reporter systems use constructs that contain segments of a genetic region of interest along with the luciferase gene. These constructs are transfected in cell lines. Experiments can be performed with segments containing different genetic variants. The transcriptional efficiency of the different variants is then measured through luciferase activity. Important questions may be posed. Do these in vitro functional effects correspond to the presence or not of postulated gene-disease associations? Do stronger functional effects of different gene variants correspond also to stronger gene-disease associations? Are the in vitro results consistent across different cell lines and experimental protocols?

Here we evaluated a systematic sample of studies where investigators had reported a probed epidemiologic association of a common disease phenotype and concurrently examined the differential effects of this polymorphism in transfected cell lines with luciferase gene reporter systems. We estimated the empirical concordance between epidemiological and functional biological data, and aimed to obtain insight on how the conduct, reporting and interpretation of studies addressing both functional and epidemiological data could be improved.

MATERIALS AND METHODS

Identification of eligible studies

Eligible studies for this analysis were retrieved from PubMed using the combination of “luciferase” and “polymorphism”. We screened the articles retrieved as of June 2005 for studies that presented both epidemiological data from cases and controls (with and without a disease, or with and without a disease outcome) and data on the activity of the same genetic variant of interest based on a luciferase assay. We excluded studies where only epidemiological data or only functional data were presented in the article. We also excluded articles with non-original data, non-English language articles, and articles describing very rare variants occurring in <1% of the control population. To maximize consistency, for epidemiological data we focused on case-control studies of unrelated subjects (including studies with other designs, e.g., cohorts, where case and control status could be inferred) and excluded the sparse available data on family-based designs.

In order to further achieve standardization of the data to be analyzed, we created two datasets of eligible studies. The first dataset focused on bi-allelic polymorphisms and on dichotomous outcomes (including continuous traits, if categorized upfront into two groups by the authors). This dataset included also data on haplotypes of several polymorphisms, whenever there was complete linkage disequilibrium and thus only two haplotypes were available. In this first dataset it would be possible to estimate consistently an odds ratio for the gene-disease association and a luciferase activity ratio for the comparison of the two alleles and perform a quantitative comparison of these data.

The second dataset included all studies where luciferase experiments had been performed with haplotypes of two or more different gene variants (not in perfect linkage disequilibrium). This second dataset allowed us to extend the qualitative comparison of the epidemiological and functional inferences to haplotypes, since haplotype analyses have recently become the standard in population genetics.13,14

Of the 342 electronically retrieved items, 201 were excluded upon reading the title and abstract, as it was clear that they did not have original data with both luciferase experiments and epidemiological associations in human populations. Of the remaining 141 articles, 80 were excluded as they did not fulfill eligibility criteria, 5 could not be retrieved in full text for further scrutiny, and 56 were eligible for the analysis (36 in the first dataset of bi-allelic markers, 23 in the second dataset of haplotypes [3 articles were common in both datasets]).

Analyzed data

From each eligible article, we extracted data on the authors, year of publication, and the genetic variant(s) or haplotypes of interest where both epidemiologic and functional data were available.

For the bi-allelic marker dataset, we also recorded the 2-by-2 table for cases and controls at the allele level for each eligible genetic variant and outcome of interest and odds ratios were estimated for each 2-by-2 table. When different case-control samples were available from populations of similar ethnic descent, data were merged to obtain a single 2-by-2 table, while data from populations of different ethnic descent or significantly different allele frequencies in their control groups were combined by the Mantel-Haenszel method15 (a Mantel-Haenszel synthesis was performed also in one study that addressed two types of cancer with separate case and control samples). The odds ratios for the epidemiological association were expressed consistently to show the association of the disease/outcome with the minor allele. We recorded whether this odds ratio was formally statistically significant (P < 0.05) or not and whether the original authors had claimed a significant epidemiological association based on any allele- or genotype-based contrast in the entire population or subgroups thereof. Discrepancies in the level of statistical significance were noted along with their reasons.
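As an illustration of the calculations described above, the following minimal Python sketch computes an allele-level odds ratio from a 2-by-2 table and pools several tables with the Mantel-Haenszel estimator. The function names, the Woolf (log-based) 95% confidence interval, and the allele counts are assumptions made for the example; they are not taken from the analyzed studies or from the software actually used.

```python
from math import exp, log, sqrt


def allele_odds_ratio(a, b, c, d):
    """Odds ratio with a Woolf 95% confidence interval for one 2-by-2 table.

    a: minor alleles in cases      b: major alleles in cases
    c: minor alleles in controls   d: major alleles in controls
    All four counts must be non-zero for this simple version.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - 1.96 * se_log_or)
    upper = exp(log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, lower, upper


def merge_tables(tables):
    """Cell-wise sum of 2-by-2 tables (samples of similar ethnic descent)."""
    return tuple(sum(cells) for cells in zip(*tables))


def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio across strata (a, b, c, d)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


# Hypothetical allele counts for two case-control samples.
sample_1 = (20, 80, 10, 90)
sample_2 = (40, 160, 20, 180)

print(allele_odds_ratio(*sample_1))
print(mantel_haenszel_or([sample_1, sample_2]))
```

Under this sketch, an association would be called formally significant at roughly the 5% level when the confidence interval excludes 1.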

For the haplotypes dataset, we recorded whether an analysis had been performed considering all haplotypes with frequency of at least 1% in the study population and if so whether there were formally statistically significant differences. We also noted whether the original authors had claimed a significant epidemiological association based on any allele-, genotype- or haplotype-based contrast in the entire population or subgroups thereof.

For each probed association, we also recorded the data on luciferase experiments. For the bi-allelic marker dataset, we recorded the ratio of luciferase activity with the minor versus major allele construct under baseline conditions as well as whether the difference between the two alleles was formally statistically significant (P < 0.05). When more than one cell type was used, data were recorded separately for each cell type. When data were also provided with various co-stimulation conditions or changed plasmid constructs, these were also recorded separately for each experimental condition. For the haplotype dataset, we similarly recorded the haplotypes, cell lines, and experimental conditions assessed and whether the functional differences were formally statistically significant or not when all tested haplotypes were considered. When P-values were not given for an analysis involving all tested haplotypes, we performed an analysis of variance using the presented mean values and standard deviations.
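A one-way analysis of variance can be reconstructed from published group means and standard deviations when the number of experiments per group is known. The sketch below shows one way to obtain the F statistic from such summary data; the function name and the example values are hypothetical, and the number of replicates per haplotype is an assumption that would have to be read from each article.

```python
def anova_f_from_summary(means, sds, ns):
    """One-way ANOVA F statistic from per-group means, SDs, and sample sizes.

    Returns (F, df_between, df_within).
    """
    k = len(means)
    n_total = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / n_total

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    # Within-group sum of squares recovered from the reported SDs.
    ss_within = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical relative luciferase activities for three haplotype constructs,
# each measured in three independent experiments.
f_stat, df1, df2 = anova_f_from_summary(
    means=[1.0, 2.0, 3.0], sds=[1.0, 1.0, 1.0], ns=[3, 3, 3]
)
print(f_stat, df1, df2)
```

The P-value then follows from the upper tail of the F distribution with the returned degrees of freedom, for example with `scipy.stats.f.sf(f_stat, df1, df2)` if SciPy is available.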

Assessment of continuous traits is far less common than assessment of binary phenotypes. Nevertheless, for all eligible studies, we also examined whether any additional continuous phenotypes had been evaluated representing the disease under study and whether inferences were similar to those obtained using the binary disease outcomes.

Finally, we recorded information on whether any additional in vitro assays had been used to establish functional differences between gene variants or haplotypes, and if so, what the results had been. The sparse in vivo and ex vivo data were also recorded.

All data were extracted independently by two investigators and discrepancies were resolved with discussion. Consensus was reached on all items.

Analyses

In the bi-allelic marker dataset, we examined whether there is correlation between epidemiological odds ratios and luciferase activity ratios. Data were analyzed either using the minor allele's data as the numerator for both odds ratios and luciferase ratios; or expressing both odds ratios and luciferase ratios as ≥1 (inverting any ratio below 1), a probably biased analysis that forces the biological signal to square with the direction of the epidemiological signal. Analyses were performed using nonparametric Spearman's correlation coefficients (secondary analyses used the parametric Pearson correlation coefficient with both metrics log-transformed).
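The two correlation analyses can be sketched as follows. This is a minimal standard-library illustration, not the SPSS procedure used for the reported results: `fold_ratio` inverts any ratio below 1 so that all values are ≥1, and `spearman` is computed as the Pearson correlation of average ranks. All names and data values are hypothetical.

```python
from math import sqrt


def fold_ratio(ratio):
    """Express a ratio as >= 1 by inverting values below 1."""
    return ratio if ratio >= 1 else 1 / ratio


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical minor-versus-major allele ratios for five gene variants.
odds_ratios = [1.4, 0.7, 1.1, 2.0, 0.9]
luciferase_ratios = [0.6, 1.8, 1.2, 0.8, 2.5]

print(spearman(odds_ratios, luciferase_ratios))
# Same analysis after forcing every ratio to be >= 1.
print(spearman([fold_ratio(r) for r in odds_ratios],
               [fold_ratio(r) for r in luciferase_ratios]))
```

Folding discards the direction of each effect, which is why the second analysis is tilted toward concordance.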

We also examined whether the absolute values of luciferase activity ratios can predict whether the respective probed epidemiological association would be statistically significant or not; and whether the absolute values of luciferase ratios can predict whether they are themselves statistically significant or not. We estimated the luciferase activity ratio that would yield a minimum of 90% sensitivity and calculated the respective specificity. All luciferase ratios were expressed as ≥1 for these analyses. Analyses were based on receiver operating characteristic (ROC) curves that plot the sensitivity against the specificity for various cut-offs of absolute luciferase activity ratios. Areas under the ROC curves were estimated. An area of 0.5 shows total lack of concordance (no diagnostic ability) and an area of 1.0 shows perfect concordance (perfect diagnostic ability).
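For readers less familiar with ROC analysis, the sketch below treats the folded luciferase activity ratio as a diagnostic score. The area under the curve is computed through its equivalence with the Mann-Whitney statistic (the probability that a randomly chosen "significant" item has a higher ratio than a randomly chosen "non-significant" one, with ties counting one half). Function names and data are hypothetical.

```python
def roc_auc(scores_positive, scores_negative):
    """Area under the ROC curve via pairwise comparison of scores."""
    wins = 0.0
    for pos in scores_positive:
        for neg in scores_negative:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


def sensitivity_specificity(scores_positive, scores_negative, cutoff):
    """Sensitivity and specificity when scores >= cutoff are called positive."""
    sensitivity = sum(s >= cutoff for s in scores_positive) / len(scores_positive)
    specificity = sum(s < cutoff for s in scores_negative) / len(scores_negative)
    return sensitivity, specificity


# Hypothetical folded luciferase ratios (all >= 1), split by whether the
# outcome being predicted was formally significant or not.
ratios_significant = [1.5, 2.0, 1.2]
ratios_nonsignificant = [1.1, 1.6]

print(roc_auc(ratios_significant, ratios_nonsignificant))
print(sensitivity_specificity(ratios_significant, ratios_nonsignificant, 1.44))
```

Scanning candidate cut-offs with `sensitivity_specificity` and keeping the largest one that still reaches 90% sensitivity gives the operating point described in the text.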

We used analysis of variance to estimate whether variability in luciferase activity ratios was larger between different gene variants or between different cell types and experimental conditions for the same gene variant.
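One simple way to express this comparison is the share of the total sum of squares that lies within gene variants, that is, between cell types and conditions tested for the same variant. The sketch below is an assumption-laden illustration: it works on log-transformed ratios and uses hypothetical variant labels and values, neither of which is specified in the text.

```python
from math import log


def within_variant_fraction(ratios_by_variant):
    """Fraction of total variation in log ratios found within gene variants.

    ratios_by_variant maps a variant label to the luciferase activity ratios
    observed for it across cell types and experimental conditions.
    """
    logs = {v: [log(r) for r in rs] for v, rs in ratios_by_variant.items()}
    all_values = [x for xs in logs.values() for x in xs]
    grand_mean = sum(all_values) / len(all_values)

    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    # Deviations of each observation from the mean of its own variant.
    ss_within = sum(
        (x - sum(xs) / len(xs)) ** 2 for xs in logs.values() for x in xs
    )
    return ss_within / ss_total


# Hypothetical ratios: three variants, each tested under several conditions.
example = {
    "variant_A": [1.2, 1.3, 1.1],
    "variant_B": [2.4, 2.0],
    "variant_C": [0.6, 0.7, 0.65],
}
print(within_variant_fraction(example))
```

A small value indicates that most of the variability in luciferase activity ratios separates gene variants rather than experimental conditions.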

In the haplotypes dataset, we examined whether luciferase and epidemiological inferences agreed or not in the presence of formal statistical significance.

For both datasets, we recorded whether different cell lines or experimental conditions gave luciferase activity ratio estimates that differed in their level of statistical significance. Finally, we examined the concordance of other functional assays that had been used as compared with luciferase results and epidemiological association results.

All analyses were conducted in SPSS 12.0 (SPSS Inc., Chicago, IL) and reported P-values are 2-tailed.

RESULTS

Bi-allelic markers

Of the 36 evaluated bi-allelic polymorphisms16–51 (Table 1), 28 were located in the 5′-flanking region, 5 were exonic, 2 were intronic, and 1 lay in the 3′-untranslated region. A wide variety of disease phenotypes were probed.

Table 1 Studies addressing concurrently epidemiological associations and luciferase experiments on the same alleles for bi-allelic markers

For 29 of the 36 cases, the investigators claimed the presence of a statistically significant epidemiological association (Table 1 and Appendix 1 [online only]). However, in 8 of the 29 claimed associations, there was no formal statistical significance for the contrast of the two alleles, when all data were analyzed. Significant associations had been based on selected genotype contrasts, often with peculiar choices (e.g., a contrast of both homozygote groups combined vs. heterozygotes) without further justification; or on exploratory subgroup analyses based on age or racial descent, although the results in the selected isolated subgroups did not differ beyond chance compared to the other subjects.52 In one study, a significant association was seen only in a selected genotype contrast, for the subgroup of younger people, further limited to the sub-subgroup of those carrying a specific genotype of another gene.37 Based on a priori definitions in our protocol, we considered these eight associations as not formally significant, since they were clearly post hoc explorations. Moreover the direct equivalent of luciferase assays would be allele-based comparisons, since the transfection constructs use alleles.

Luciferase activity ratios versus genetic odds ratios in bi-allelic markers

There was no correlation between the observed luciferase activity ratio and the observed odds ratio in the epidemiological case-control association analysis. Across the 36 topics, the Spearman correlation coefficient was −0.09 (P = 0.60, Pearson correlation coefficient 0.04, P = 0.83) when we considered the geometrical mean of the luciferase activity ratios of different cell lines with baseline experimental conditions and the allele-level odds ratio (Fig. 1). When data from different cell lines on the same gene variant were considered as separate data points, the Spearman correlation coefficient was −0.27 (P = 0.06, Pearson correlation coefficient 0.19, P = 0.28), suggesting a small trend for smaller luciferase activity ratios with larger epidemiological effects.

Fig. 1

Lack of correlation between the observed odds ratio in the case-control epidemiological study and the luciferase activity ratio for the same gene variant. Odds ratios pertain to allele-level estimates for the effect of the minor allele. For direct analogy, the luciferase activity ratio pertains to the activity of the construct containing the minor allele versus the construct with the major allele. Only the baseline experimental conditions for luciferase assays are considered here. When many different cell lines were tested, we used the geometrical mean of the luciferase activity ratios across cell lines. Two outliers are not shown.

We also performed an analysis where all odds ratios and all luciferase activity ratios (geometric means for several cell lines) were expressed as ≥1. This analysis assumes that the allele that increases the risk of a disease phenotype may either increase or decrease mRNA levels and both increase and decrease count as evidence of biological function that is concordant with the epidemiological effect. Thus, the analysis forces the results toward concordance. Even with this analysis, the correlation coefficient was only 0.24 and not statistically significant (P = 0.17).

Luciferase activity ratios also had no diagnostic ability for telling whether or not the respective epidemiological study would show a statistically significant (P < 0.05) association. The area under the ROC curve was 0.52 (Fig. 2).

Fig. 2

Receiver operating characteristic (ROC) curve for luciferase activity ratios as a diagnostic test for determining whether the respective epidemiological association would be statistically significant (P < 0.05) or not. The diagonal shows total lack of diagnostic information (no concordance at all) and the observed data hover around this diagonal, with area under the curve 0.52 (P = 0.82). To achieve a sensitivity of 91%, the specificity is only 26%. Only the baseline experimental conditions for luciferase assays have been considered. Luciferase ratios have been consistently expressed as ≥1, so as to always show the difference between the high- versus low-activity allele. When different cell lines were tested, these have been entered separately in the calculations. Analyses using the geometrical mean of the luciferase activity ratios across different cell lines on the same gene variant yield similar results (area under the curve 0.60, P = 0.31, not shown).

The number of total luciferase experiments and replicates varied from 1 to 39 (Appendix Table 2; online only), but many studies were unclear whether they reported on the number of independent experiments or number of replicates of the same experiment.

Appendix Table 2 Key data on luciferase activity ratios

Variability across luciferase assay experimental conditions for bi-allelic markers

Across all 99 available datasets (Appendix Table 2; online only), we found that the variation due to different experimental conditions accounted for < 8% of the total variation based on analysis of variance. For 19 gene variants, experiments had been done with two or more different cell lines and/or various experimental conditions. In 12 of them, all cell lines and experimental conditions yielded the same conclusions (always statistically significant differences between the two alleles or always nonstatistically significant differences). For five gene variants there were no significant differences for constructs bearing the two different alleles at baseline conditions, but differences emerged upon stimulation with various substances; one gene variant had opposite effects in different cell lines; and for one gene variant the differential effect of the minor allele was seen only on infected, but not uninfected cell lines. Despite these modest differences, statistically significant luciferase activity ratios in opposite direction were seen for only one gene variant.

The absolute value of the luciferase activity ratio could tell with high accuracy whether it would also be formally statistically significant (P < 0.05) or not: the area under the ROC curve was 0.95. A ratio cut-off of 1.44 for the high versus low activity allele had a sensitivity of 91% and specificity of 94% for identifying formally statistically significant differences in function.

Associations involving haplotypes

Twenty-three studies16,35,45,53–72 performed luciferase experiments using constructs with haplotypes and also addressed epidemiological associations (Table 2). Of the 23 evaluated haplotypes (Table 2), 19 were entirely in the 5′-flanking region, 3 also included intronic or coding regions, and one was intronic.

Table 2 Studies addressing concurrently epidemiological associations and luciferase experiments involving haplotypes

Overall, the inferences of epidemiological and luciferase analyses agreed in terms of whether there were statistically significant effects or not in six studies, and disagreed in five studies, while agreement varied in two studies (different results depending on whether allele- or genotype-based analyses were done; or depending on the disease outcome considered). In the remaining 10 studies, the investigators did not perform epidemiological analyses using the haplotypes examined in the luciferase experiments (N = 7 studies) or reported only on specific haplotype contrasts, without considering all haplotypes in the epidemiological analyses (N = 3 studies).

In nine studies, the investigators performed luciferase experiments on selected haplotypes only, and in six of these the selected haplotypes were not chosen with strict preference to the ones that were more common in the study population. In another three studies, the investigators tested in luciferase experiments haplotypes that were nonexistent in the study population (frequency = 0%).

In 10 studies, luciferase experiments were performed with two cell lines and the results were consistent in terms of whether overall statistical significance was present or not in 9 of them (both significant N = 7, both non-significant N = 2, discordant N = 1). In another study, 5 cell lines were evaluated and results agreed in terms of statistical significance with 4 of the 5 cell lines. However, with one exception, in all studies where several cell lines found statistically significant results, the highest luciferase activity was seen for different haplotypes across different cell lines.

In three studies, luciferase experiments were also done with different stimulation conditions. In one study, the results were similar with stimulated and unstimulated conditions,55 while in the other two studies the same order of activity was seen across haplotypes, but the results became formally significant only upon stimulation, having been non-significant under unstimulated conditions.68,70

Continuous disease outcomes

Four of the 36 studies with bi-allelic markers and binary outcomes also evaluated association analyses for continuous traits that would represent the disease under study. Binary and continuous traits usually gave concordant inferences. VKORC1 -1639G>A was significantly associated both with binary-categorized warfarin sensitivity and with the dose of warfarin required.51 Clock 3111 T>C was not significantly associated with either evening sleep preference or with the τ value for sleep.38 UGT1A1 -3263 T>G was significantly associated with the risk of binary-categorized hyperbilirubinemia and was also significantly related to the levels of bilirubin in the control group.46 Finally, IL-8 -251 A>T was significantly associated with the risk of gastric cancer and was also significantly associated with the antral atrophy and metaplasia score, although the latter was seen only in the younger subjects.36

Of the 23 studies evaluating haplotypes, one found a statistically significant association between RANTES promoter and a continuous outcome (CD-4 cell depletion), but this was not the same as the binary outcome examined in that same study (HIV infection) for which there was no significant association.59 One other study of asthma also evaluated the continuous outcomes of forced expiratory volume at one second and bronchial hyper-responsiveness score, but tested only single markers (not haplotypes) for these outcomes.64 Another study also addressed associations with serum IgE levels, but tested different haplotypes than those tested in the luciferase experiments.57

Other in vitro functional assays

In 11 studies, investigators examined also binding signals in electrophoretic mobility shift assays (EMSAs); seven of these studies had addressed in luciferase experiments single bi-allelic markers, three had addressed haplotypes, and one had addressed both. All 11 investigations claimed differences in binding affinity, but only two of them tried to quantify the difference in the signal intensity (described as 1.5-fold24 and 1.8-fold31 intensity difference), while the other 9 studies gave qualitative data on whether the signal was weaker, stronger, absent, or different with one of the two alleles.16,25,37,41,42,49,54,69,71 In one study, three different cell lines were tested and results differed qualitatively across cell lines.71

There was modest concordance at best with the epidemiological data. Formally statistically significant epidemiological associations were seen in seven16,25,31,41,54,69,71 of the 11 investigations.

The results of EMSAs were generally consistent with the inferences of luciferase assays. However, in the three studies where the respective luciferase assays had examined haplotypes, EMSAs did not examine all the polymorphisms involved in the luciferase-tested haplotypes; therefore the full correspondence of the results is difficult to assess. Among the studies of bi-allelic markers, in two investigations24,25 the luciferase assays did not show consistently significant differences between the two alleles except under special conditions.

Sparse data on other reporter constructs (one study33) and real time PCR quantification of mRNA in vitro (three studies18,33,55) showed consistent inferences with the respective luciferase data, but agreed with epidemiological inferences only in two18,55 of the three studies.

DISCUSSION

In the appraised sample of investigations, luciferase results could not tell whether the respective epidemiological association would be formally statistically significant or not. Moreover, larger luciferase activity ratios did not correlate with stronger epidemiological effects. Luciferase activity ratios tended to be qualitatively similar across cell lines and experimental conditions, but exceptions did occur. The available comparative data on other outcomes and functional assays suggested that binary and continuous disease outcomes usually gave concordant results; other in vitro methods, in particular EMSA, agreed with luciferase results.

There is no consensus in the literature on what constitutes a large enough luciferase activity ratio.5 In theory, very small differences may become formally statistically significant, if many experiments are performed. Conversely, quite large differences may be dismissed as non-significant, if only one or few experiments are performed. Luciferase studies should explicitly describe how many independent experiments were performed and how many replicates were done in each experiment; this information was often difficult to decipher in the analyzed studies. A sufficient number of experiments is needed, since luciferase assays have some unavoidable variance. Nevertheless, in the assembled database a cutoff of 1.44 adequately differentiated significant from non-significant luciferase activity ratios. Efforts need to be made to standardize further functional assays and their interpretation across laboratories. Our finding does not necessarily mean that ratios as low as 1.5 are always biologically important. Such values are very low compared to what is typically seen for the effects of mutations in monogenetic disorders, but for multigenetic effects, relatively small differences should not be dismissed lightly.

In our analysis, we focused on in vitro functional data. Information on in vivo and ex vivo functional assays in the analyzed studies was very limited, but it suggested that there was modest to good concordance with epidemiological data (Appendix Table 3; online only). Jais has conducted a far more comprehensive review of gene expression in healthy versus diseased tissues for genetic variants involved in replicated genetic associations.4 This evaluation concluded that many epidemiological associations are accompanied by significant differences in tissue gene expression. As with our in vitro data, the absolute differences in biological signals were modest at best. It is reasonable to expect that biological effects measured ex vivo are likely to be closer to the epidemiological associations than in vitro functional effects. However, obtaining such ex vivo data is more difficult.

Appendix Table 3 In vivo or ex vivo functional effects

We observed some common problems in the literature that we analyzed. First, results were often selectively reported for particular genetic contrasts, variants, haplotypes, or population subgroups. Second, some studies used different genetic variants and contrasts in epidemiological versus functional analyses and thus these lines of evidence were not directly comparable. Third, luciferase experiments were often performed only for selected haplotypes, not necessarily the most frequent ones. As haplotype analyses have now become the norm for investigations of human variation, these design and reporting problems can create confusion and spurious claims. Some investigators may have preferentially reported their best data73,74 and may have striven to show that there is concordance between their epidemiological and biological data.75 Thus, if anything, published data may be biased in favor of agreement between epidemiological and functional data. Nevertheless, we found little concordance.

Most studies did not evaluate more than one functional assay. We should acknowledge that luciferase assays are one of many possible functional assays. Generalization across assays should be made cautiously. Different functional assays may provide complementary insights. Their results should not be forced to fit with those of other assays or clinical data using spurious contrasts and analyses. There is a continuum between binary disease categorizations, continuous traits, in vivo functional measurements, ex vivo experiments, and in vitro functional data. This continuum should be examined without preconceptions on whether results should agree across these different experimental levels. Comprehensive, comparable analyses with no selection bias in reporting should allow maximizing our insight about the credibility of postulated gene-disease associations and their biological background. Table 3 summarizes some suggestions on how to achieve this goal based on the empirical data that we examined. Moreover, it should be anticipated that in contrast to monogenetic disorders where functional approaches show large effects in line with very high odds ratios, for multigenetic heritability due to common genetic variation, both functional and epidemiological effects are likely to be very modest, and need careful design and optimal measurements.

Table 3 Considerations for studies addressing both epidemiological and functional effects of genetic variation

Some caveats should be discussed. First, our evaluation used a convenience sample of studies that involved both functional and epidemiological data. It would be impractical or even impossible to identify all studies that have performed both types of research. We simply used a systematic sample that would be large enough to answer our questions appropriately. Moreover, for some of these gene variants and associations, other investigators may have performed independent studies. However, we wanted to see whether there is concordance under what are, in theory, the most favorable circumstances, i.e., in the hands of the same team performing the epidemiological and biological analyses. This caveat reinforces our basic observation of lack of agreement.

We also found that the luciferase results were relatively robust to different experimental conditions. Selective reporting of best results is less likely to be a problem here. While discrepancies between epidemiology and biology may have been seen as unattractive to publish, several authors seemed to dwell with interest on the differential luciferase assay results obtained with different conditions and tried to build complex biological explanations around them.19,32,36 However, while exceptions did occur, different cell lines usually gave largely similar inferences. Differences are more common with different stimulation conditions; for haplotype analyses, the exact order of haplotypes in terms of luciferase activity varied across cell lines and experimental conditions, but full reversal of the order with different conditions was seen only in one study. Interpretation of such differences should be cautious. It is difficult to reproduce in an in vitro system the exact biological milieu that leads to a complex disease phenotype.5 The same applies to more recently developed functional assays76–78 and their reproducibility needs to be empirically evaluated across many studies, as we did for the luciferase reporter systems.

Functional gene variants are very common,79 especially among promoter polymorphisms.80,81 However, the link to specific postulated associations for pinpointed phenotypes is difficult. The lack of concordance between epidemiological and luciferase data may be due to many reasons. The epidemiological associations may not be accurate and may not even be replicated.82–84 Even for well-documented functional variants, altered gene expression may have a different impact on the risk for different diseases, and often it is not possible to guess which disease would be most relevant for each functional variant. Alternatively, the luciferase experiments may not be capturing the biological effect, which may even involve a pathway other than transcription. For markers in linkage disequilibrium with the true functional variant, the luciferase assays may or may not capture the transcriptional effect, depending on whether the true marker is also included in the construct and whether the linkage disequilibrium is very strong or weak. Therefore, it should not be very surprising that these two lines of epidemiological and in vitro evidence provide largely independent information. Investigators in complex disease genetics should approach epidemiological and biological lines of evidence without any preconception or prejudice about their concordance. These lines of experimentation provide complementary evidence that needs to be carefully integrated rather than forced to fit.