Introduction

A persistent observation in human genetics is that single-nucleotide polymorphisms (SNPs) identified through genome-wide association studies explain only a small fraction of the genetic variation of complex human traits,1, 2, 3 the so-called missing heritability problem. Several non-mutually exclusive hypotheses, such as rare alleles with large effects and gene–gene and gene–environment interactions (reviewed by Manolio et al.4 and Gibson5), have been proposed to explain this lack of explained genetic variation, and they may all account in part for the missing heritability. However, most of these hypotheses assume that the genome is read in the same way across all individuals and that, therefore, the same SNP has exactly the same functionality from one person to another. This assumption may not hold for the use of alternative regulatory elements (for example, gene promoters).6 For example, in a gene with multiple promoters, some persons would preferentially use a particular promoter, and other subjects would tend to use another promoter of the same gene. Under this scenario, a particular SNP would have functional significance only among those individuals who use the promoter inside which the SNP is located. The observed effect of that SNP would be attenuated relative to its actual effect on a phenotypic trait of interest.

In the present report, I explore some of the quantitative consequences of the hypothesis of alternative use of regulatory elements.

Materials and methods

The present model is based on the recently proposed hypothesis of inter-individual variation in the use of alternative promoters.6 Let us assume a gene X that controls a continuous phenotypic trait Y. Expression of the gene X is under the control of M alternative promoters, and different SNPs may be present inside each promoter (Figure 1). The model assumes the existence of person-to-person variation in the use of the alternative promoters, possibly through the action of epigenetic marks (for example, DNA methylation). Although in the present work the model is restricted to a gene with only two promoters, P1 and P2, and two SNPs, G1 (inside P1) and G2 (inside P2), the results can be easily generalized to a gene with more than two promoters and more than one SNP inside each promoter. The SNP G1 has two alleles, A1 with frequency equal to p1 and A2 with frequency equal to p2. The allele A1 increases by a units the value of the phenotypic trait Y compared with the allele A2. The SNP G2 has two alleles, B1 with frequency equal to q1 and B2 with frequency equal to q2. The allele B1 increases by b units the value of the phenotypic trait Y compared with the allele B2. Each allele affects the phenotypic trait Y only when its corresponding promoter is being used (that is, the allele A1 increases the value of Y only when the promoter P1 is used, and the allele B1 increases the value of Y only when the promoter P2 is used). The promoter P1 is used in a proportion f1 of the chromosomes in the population, and the promoter P2 is used in a proportion f2 of the chromosomes in the population. Chromosomes that use the promoter P1 have an increase of e units of the phenotypic trait Y compared with chromosomes that use the promoter P2. Hardy–Weinberg equilibrium is assumed for each SNP.
As the goal of the present analysis is to show how genetic variability may be hidden because of the use of alternative promoters, in the following results it is assumed that we do not observe which one of the alternative promoters is being used. We only observe the genotypes in G1 and G2 as well as the individuals’ values of the phenotypic trait Y.

Figure 1

Gene with alternative promoters. A gene X is transcribed from M alternative promoters, P1, P2, …, PM. Person-to-person variation in which of the promoters is used is proposed. Each promoter contains different single-nucleotide polymorphisms (SNPs). A polymorphism G1 is located inside the P1 promoter, and a different polymorphism G2 is located inside the P2 promoter.

Table 1 shows both the unobserved types of chromosomes, defined by the unobserved promoter use together with the observed genotypes at the G1 and G2 SNPs (upper half of the table, chromosome types H1 through H8). Note that additive effects are measured relative to the chromosome H8 (P2A2B2), which by definition has a value equal to zero for the phenotypic trait Y. Observed chromosomes, based only on the genotypes at the G1 and G2 SNPs, are shown in the bottom half of Table 1 (chromosome types J1 through J4). Chromosome frequencies are shown for four different models of linkage disequilibrium (LD) between the G1 and G2 SNPs. The most general scenario (model 1) makes no assumption about any particular value of LD (given by the D coefficient, or covariance, between the G1 and G2 SNPs). Models 2, 3 and 4 are particular cases of model 1. Model 2 assumes linkage equilibrium between the G1 and G2 SNPs. The scenarios portrayed by models 3 and 4 refer to complete LD between the G1 and G2 SNPs. In model 3, the A1 and B1 (and A2 and B2) alleles are always present together (that is, only unobserved chromosomes P1A1B1, P1A2B2, P2A1B1 and P2A2B2 exist in the population). The opposite pattern of complete LD is shown in model 4, where the A1 and B2 (and A2 and B1) alleles are always transmitted together (that is, only unobserved chromosomes P1A1B2, P1A2B1, P2A1B2 and P2A2B1 are present in the population). Note that the observed chromosomes are obtained from the unobserved chromosomes by collapsing over promoters P1 and P2. For example, the observed J1 chromosome (A1B1) is a mixture of the unobserved H1 (P1A1B1) and H5 (P2A1B1) chromosomes. The phenotypic value of the J1 chromosome is the average of the phenotypic values of the H1 and H5 chromosomes weighted by the f1 and f2 proportions, respectively. The frequency of the J1 chromosome is simply the sum of the frequencies of the H1 and H5 chromosomes. The rest of the observed chromosomes can be obtained in a similar way: J2=H2+H6, J3=H3+H7 and J4=H4+H8.
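The construction described above can be sketched in a few lines of Python. This is an illustrative reading of the text, not the author's own computation: the function name `chromosome_tables` is hypothetical, the two-locus frequencies use the standard parametrization by the D coefficient, and promoter use is assumed to be independent of the G1–G2 haplotype (which is what weighting by f1 and f2 implies).

```python
from itertools import product


def chromosome_tables(f1, a, b, e, p1, q1, D=0.0):
    """Return (unobserved, observed) dicts: label -> (phenotypic value, frequency).

    Hypothetical helper; assumes promoter use is independent of the haplotype.
    """
    f2, p2, q2 = 1.0 - f1, 1.0 - p1, 1.0 - q1
    # Two-locus haplotype frequencies for a given LD coefficient D (model 1).
    hap_freq = {
        ("A1", "B1"): p1 * q1 + D,
        ("A1", "B2"): p1 * q2 - D,
        ("A2", "B1"): p2 * q1 - D,
        ("A2", "B2"): p2 * q2 + D,
    }
    promoter_freq = {"P1": f1, "P2": f2}

    def value(promoter, g1, g2):
        # An allele acts only when its own promoter is in use;
        # P2A2B2 is the reference chromosome with value zero.
        if promoter == "P1":
            return e + (a if g1 == "A1" else 0.0)
        return b if g2 == "B1" else 0.0

    # Eight unobserved chromosomes: promoter x haplotype.
    unobserved = {}
    for prom, (g1, g2) in product(promoter_freq, hap_freq):
        freq = promoter_freq[prom] * hap_freq[(g1, g2)]
        unobserved[prom + g1 + g2] = (value(prom, g1, g2), freq)

    # Four observed chromosomes: collapse over the promoters.
    observed = {}
    for (g1, g2), freq in hap_freq.items():
        avg = sum(promoter_freq[pr] * value(pr, g1, g2) for pr in promoter_freq)
        observed[g1 + g2] = (avg, freq)
    return unobserved, observed


if __name__ == "__main__":
    unobs, obs = chromosome_tables(f1=0.3, a=5.0, b=5.0, e=5.0, p1=0.5, q1=0.5)
    for label, (val, freq) in {**unobs, **obs}.items():
        print(f"{label:8s} value={val:6.2f} frequency={freq:.4f}")
```

Setting `D` to zero corresponds to model 2, and choosing `D` so that two of the four haplotype frequencies vanish corresponds to models 3 and 4.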

Table 1 List of actual and observed types of chromosomes with their respective phenotypic values and frequencies under linkage disequilibrium (LD) patterns

Summary statistics

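The mean chromosome value of the phenotypic trait Y is equal to

$$\bar{Y} = \sum_{i} \text{phenotype}(i) \times \text{frequency}(i),$$

where phenotype(i) and frequency(i) are the phenotype value and frequency of the i-th chromosome (either observed or unobserved).

The variance of the chromosome phenotype values is equal to

$$\operatorname{Var}(Y) = \sum_{i} \text{frequency}(i) \times \left(\text{phenotype}(i) - \bar{Y}\right)^{2}.$$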

It is noteworthy that the mean of the chromosome phenotype values does not depend on LD and is the same for the actual (unobserved) and observed chromosomes. However, as will be shown below, the actual variance due to unobserved chromosomes (that is, the total variance) will always be greater than or equal to the variance due to observed chromosomes. In other words, the variance due to measurable genetic variation (that is, the G1 and G2 SNPs) will fail to explain 100% of the actual variance due to the totality of unobserved chromosomes in the population.
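One way to see why the total variance can never be smaller than the observed variance is the law of total variance. The sketch below assumes, following the description of Table 1, that promoter use is independent of the G1–G2 haplotype J, so that each observed chromosome value is the conditional mean of Y given J. The indicators I_A (allele A1 present) and I_B (allele B1 present) are notation introduced here for illustration.

```latex
\begin{align*}
\operatorname{Var}_{\text{total}}(Y)
  &= \operatorname{Var}\!\big(\mathrm{E}[Y \mid J]\big)
   + \mathrm{E}\big[\operatorname{Var}(Y \mid J)\big] \\
  &= \operatorname{Var}_{\text{observed}}(Y)
   + f_1 f_2\, \mathrm{E}\big[(e + a I_A - b I_B)^2\big]
  \;\ge\; \operatorname{Var}_{\text{observed}}(Y).
\end{align*}
```

The second term is the within-haplotype variance created by the unobserved promoter choice. In particular, it vanishes when f1 f2 = 0, that is, when the use of one promoter is fixed in the population.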

Results

Let us define K2 as the ratio of the variance due to observed chromosomes to the total variance due to unobserved chromosomes. Figure 2 shows K2 under three different particular scenarios: (1) no LD between the G1 and G2 SNPs, (2) positive LD between the G1 and G2 SNPs (that is, A1 and B1 alleles tend to be transmitted together) and (3) negative LD between the G1 and G2 SNPs (that is, A1 and B2 alleles tend to be transmitted together).

Figure 2

Proportion of total variance that is explained by observed chromosomes. In the presence of person-to-person variation in the use of alternative promoters, the variance due to observed chromosomes is always lower than the total variance of the genetic system (K2<1). Only when the use of one promoter is fixed in the population (f1=0 or f1=1) would the observed chromosomes explain 100% of the total genetic variance (K2=1). K2 is shown under three possible scenarios of linkage disequilibrium (LD) between the G1 and G2 SNPs: (a) linkage equilibrium, (b) positive LD, and (c) negative LD. The additive effects of the A1 and B1 alleles were assumed to be equal to 5 units of the phenotypic trait. The epigenetic effect was allowed to take four different values in (a) (e=0, 5, 10 and 20 units), and kept constant in (b, c) (e=5 units).

SCENARIO 1

Figure 2a shows K2 under linkage equilibrium between the G1 and G2 SNPs (model 2 in Table 1) for different epigenetic effects and proportions of chromosomes using the promoter P1. The additive effects of the A1 and B1 alleles were assumed to be equal to 5 units of the continuous phenotypic trait (a=b=5 units). The epigenetic effect was allowed to take four different values: e=0, 5, 10 and 20 units of the phenotypic trait. It is clear that K2≤1, and that the higher the epigenetic effect, the lower the K2 ratio (that is, the observed chromosomes explain less of the total variance due to the actual unobserved chromosomes). Even in the absence of any epigenetic effect (that is, e=0, meaning that the P1 and P2 promoters have the same baseline level of the phenotypic trait Y), the observed chromosomes do not explain the totality of the variance due to unobserved chromosomes. The only instances when K2=1 are those in which only one promoter is used in the population (that is, f1=1, use of promoter P1 is fixed; or f1=0, use of promoter P2 is fixed).
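A minimal numerical sketch of the K2 ratio follows. It is not the author's code. The function name `k2_ratio` is hypothetical, promoter use is assumed to be independent of the haplotype, and the allele frequencies p1=q1=0.5 are an assumption made here because the text does not state the values used for Figure 2.

```python
def k2_ratio(f1, a=5.0, b=5.0, e=0.0, p1=0.5, q1=0.5, D=0.0):
    """Observed-to-total variance ratio K2 (hypothetical helper).

    p1 = q1 = 0.5 are assumed defaults, not values taken from the text.
    D is the LD coefficient between G1 and G2 (D = 0: linkage equilibrium).
    """
    f2, p2, q2 = 1.0 - f1, 1.0 - p1, 1.0 - q1
    # (carries A1, carries B1, haplotype frequency)
    haps = [
        (1, 1, p1 * q1 + D),
        (1, 0, p1 * q2 - D),
        (0, 1, p2 * q1 - D),
        (0, 0, p2 * q2 + D),
    ]
    # Unobserved chromosomes: each haplotype split by promoter in use.
    unobserved = []
    for has_a1, has_b1, freq in haps:
        unobserved.append((e + a * has_a1, f1 * freq))  # promoter P1 in use
        unobserved.append((b * has_b1, f2 * freq))      # promoter P2 in use
    # Observed chromosomes: promoter-averaged value per haplotype.
    observed = [
        (f1 * (e + a * has_a1) + f2 * b * has_b1, freq)
        for has_a1, has_b1, freq in haps
    ]

    def variance(chromosomes):
        mean = sum(value * freq for value, freq in chromosomes)
        return sum(freq * (value - mean) ** 2 for value, freq in chromosomes)

    total = variance(unobserved)
    return variance(observed) / total if total > 0 else float("nan")


if __name__ == "__main__":
    for e in (0.0, 5.0, 10.0, 20.0):
        row = [round(k2_ratio(f1 / 10, e=e), 3) for f1 in range(11)]
        print(f"e={e:4.1f}", row)
```

Under these assumed allele frequencies, K2 at f1=0.5 comes out as 0.50, 0.25, 0.10 and about 0.03 for e=0, 5, 10 and 20, and equals 1 at f1=0 and f1=1. These numbers illustrate the trend described above and are not claimed to reproduce Figure 2 exactly. A non-zero `D` can be passed to explore the LD scenarios.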

SCENARIO 2

Figure 2b shows K2 under positive LD between the G1 and G2 SNPs (that is, the A1 and B1 alleles tend to be transmitted together in the same chromosome) for different r2 values (squared correlation between the G1 and G2 SNPs) and proportions of chromosomes using the promoter P1. The additive effects of the A1 and B1 alleles as well as the epigenetic effect were kept constant and equal to 5 units of the phenotypic trait (a=b=e=5 units). In this scenario K2≤1 as well, and it is noteworthy that the stronger the LD between both SNPs (that is, the higher the r2), the more of the total variance the observed chromosomes explain. When r2=1.0 (complete positive LD, as shown in model 3 of Table 1), the reduction of K2 is attenuated in comparison with the case of linkage equilibrium (r2=0.0).

SCENARIO 3

Figure 2c shows K2 under negative LD between the G1 and G2 SNPs (that is, the A1 and B2 alleles tend to be transmitted together in the same chromosome) for different r2 values and proportions of chromosomes using the promoter P1. The additive effects of the A1 and B1 alleles as well as the epigenetic effect were kept constant and equal to 5 units of the phenotypic trait (a=b=e=5 units). As in the previous two scenarios, K2≤1; however, in the presence of negative LD, the higher the r2, the lower the proportion of variance that is explained by the observed chromosomes. The maximum reduction of K2 is observed when r2=1.0 (complete negative LD, as shown in model 4 of Table 1). Only the haplotypes A1B2 and A2B1 are observed in the presence of complete negative LD, and, as Figure 2c shows, the variance due to the observed chromosomes can completely disappear (K2=0). A simple calculation shows that K2 vanishes when the proportion of chromosomes using the P1 promoter is equal to f1=b/(a+b). K2 vanishes at f1=0.5 when both the A1 and B1 alleles have the same additive effect (a=b); at f1<0.5 when the A1 allele has a higher additive effect than the B1 allele (a>b); and at f1>0.5 when the A1 allele has a lower additive effect than the B1 allele (a<b).
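The vanishing point can be checked directly. Under complete negative LD only the haplotypes A1B2 and A2B1 are observed, with frequencies p1 and p2. Assuming, as in the description of Table 1, that each observed value is the f1/f2-weighted average over the two promoters, a short derivation (a sketch in notation introduced here) is:

```latex
\begin{align*}
Y_{A_1B_2} &= f_1 (e + a), \qquad Y_{A_2B_1} = f_1 e + f_2 b, \\
\operatorname{Var}_{\text{observed}}
  &= p_1 p_2 \,\big(Y_{A_1B_2} - Y_{A_2B_1}\big)^2
   = p_1 p_2 \,(f_1 a - f_2 b)^2, \\
\operatorname{Var}_{\text{observed}} = 0
  &\iff f_1 a = (1 - f_1)\, b
  \iff f_1 = \frac{b}{a + b}.
\end{align*}
```

The epigenetic effect e cancels in the difference between the two observed values, so under these assumptions the position of the zero depends only on the additive effects a and b.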

Discussion

The current model offers a potential mechanism to explain in part why genetic variants discovered so far do not explain much of the expected genetic variability. Although part of the unexplained variability may be due to rare genetic polymorphisms still to be found,4 the model predicts that person-to-person variation in the use of alternative promoters would reduce the observed genetic variance of a genetic system. Thus, even complete knowledge of all the genetic variants involved in a particular phenotypic trait would not be enough to explain the whole genetic variance of the trait.

Three major factors explain the reduction of the genetic variance according to the model discussed in the present work. First, the observed additive effects of the SNPs inside each of the alternative promoters are attenuated in comparison with their actual effects. For example, because the allele A1 of the G1 SNP exerts its effect only when the promoter P1 is being used, its observed additive effect would be reduced by a factor equal to f1 relative to its actual effect. The same applies to the B1 allele of the G2 SNP, whose observed additive effect would be attenuated by a factor equal to f2. Second, because the use of alternative promoters is not being measured (for example, in current genetic epidemiology studies such a scenario is not even considered as a possibility), the dimensionality of the observed data would always be lower than the actual dimensionality of the population data. The number of observed chromosomes will be less than the number of actual chromosomes in the population. Third, different promoters may have different baseline levels of the phenotypic trait under study, further reducing the proportion of the actual variance that is due to measured genetic polymorphisms.

Recently published evidence supports the proposed hypothesis of person-to-person variation in the use of alternative promoters. Turner et al.7 reported the presence of high inter-individual variability in the methylation patterns of alternative promoters of the glucocorticoid receptor (NR3C1) gene in 26 healthy subjects, suggesting person-to-person variation in epigenetic regulatory mechanisms. A small study that measured promoter activity of the aromatase (CYP19A1) gene in skin fibroblasts from four normal volunteers found that one subject showed increased activity of the promoters I.3 and II in response to cyclic adenosine monophosphate, in contrast to the other three subjects, who expressed the cyclic adenosine monophosphate-unresponsive promoter I.4.8 In non-malignant lung tissue from 15 patients with non-small-cell lung cancer, two cases used mostly promoters I.3 and II of the CYP19A1 gene and the rest of the patients used the promoter I.4.9 It is noteworthy that there may even be ethnic differences in the use of alternative promoters. A recent study in 101 women with uterine leiomyoma (31 African American, 34 white American and 36 Japanese women) reported that leiomyoma tissue from African American women expressed the promoter II in a higher proportion compared with Japanese women.10 Finally, the CD36 gene showed inter-individual variability in the use of four out of five alternative promoters in cultured monocytes from 10 subjects.11

The present results, published evidence about variability in the use of alternative promoters, and the fact that more than half of human genes have alternative promoters,12 with a mean of 3.1 promoters per gene,13 stress the need to carry out extensive studies in human populations to determine and quantify inter-individual variation in the use of alternative promoters. To date there are few approaches to assess the use of alternative promoters on a genome-wide scale. Singer et al.14 developed a promoter tiling array that can identify about 35 000 alternative promoters from almost 7000 human genes, and Jacox et al.15 described a computational approach to determine alternative promoter usage in nearly 1500 genes using the Affymetrix Exon 1.0 array (Affymetrix, Santa Clara, CA, USA). Although those microarrays only interrogate a subset of genes in the genome (that is, those genes with known alternative promoters), they would provide enough data to test the proposed hypothesis on a genome-wide scale. A comprehensive assessment should ideally measure person-to-person variation across different types of tissue.

The present model can be easily extended to include cases of genes with more than two promoters and more than one SNP in each of the promoters. In a gene with multiple promoters, the observed additive effect of a particular SNP would be reduced by a factor equal to the proportion of chromosomes in the population using the promoter in which the SNP is located. The model may also be used for other types of alternative regulatory elements, such as multiple enhancers affecting gene expression, the so-called shadow enhancers.16, 17, 18 A limitation of the presented model is that it depends on existing knowledge about alternative promoters, or regulatory elements in general. More experimental work, such as chromatin immunoprecipitation-chip assays validated with transgenic models, is needed to identify new regulatory elements.

In summary, the present report shows that, in the presence of inter-individual variation in the use of alternative promoters, the observable effects of genetic variants will be lower than their actual effects. The proposed model may explain in part why variants identified by genome-wide association studies are for the most part poor predictors of human complex traits. Future studies are needed to determine and quantify the person-to-person variability in the use of alternative promoters as well as to identify new regulatory elements in the human genome.