In his excellent Review, David Balding describes the statistical approaches to population association studies as well as the constraints that apply to each type of data analysis¹. We would like to comment on two points that were raised. First, we believe that exploring haplotype–phenotype associations may be of greater value than estimated in the Review. Second, the logarithmic data transformation that the author recommends be carried out before linear regression analysis could lead to false conclusions about genotype–phenotype associations. The reasoning behind our statements is explained below.
Concerning haplotype-based phenotype analysis, take the example of two loci (A and B); each has two alleles, A_A′ and B_B′, with minor allele frequencies of 0.31 and 0.4, respectively (Fig. 1). Let us assume that allele A′ has one-third the activity of A and that B′ has 60% higher activity than B. The activity of the haplotypes AB, AB′, A′B and A′B′ would thus be 100%, 160%, 33.3% and 53.3%, respectively, as depicted on the right-hand side of Fig. 1 for individuals who are homozygous for each haplotype. Let us also assume that there is moderate linkage disequilibrium, resulting in haplotype frequencies of AB, AB′, A′B and A′B′ of 0.54, 0.15, 0.06 and 0.25, respectively. On the basis of polymorphism analysis alone, the calculated activity of B′/B′ homozygotes would be identical to that of B/B homozygotes (B/B: (100%·0.54+33.3%·0.06)/(0.54+0.06) = 93.3% versus B′/B′: (160%·0.15+53.3%·0.25)/(0.15+0.25) = 93.3%), so the 60% difference between B and B′ would go undetected. In A′/A′ homozygotes, activity would be slightly overestimated as 44% of that in A/A homozygotes, rather than the assumed one-third (A/A: (100%·0.54+160%·0.15)/(0.54+0.15) = 113%, A′/A′: (33.3%·0.06+53.3%·0.25)/(0.06+0.25) = 49.5%).
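The arithmetic above can be reproduced with a short Python sketch. It follows the same assumption as the calculation in the text, namely that the activity attributed to a homozygous genotype at one locus is the frequency-weighted mean over all haplotypes carrying that allele. The variable and function names are illustrative only, and exact fractions (for example 100/3 instead of 33.3%) are used, so the last digit may differ slightly from the rounded figures.

```python
# Relative allele activities, as assumed in the text.
A_PRIME_FACTOR = 1 / 3   # A' has one-third the activity of A
B_PRIME_FACTOR = 1.6     # B' has 60% higher activity than B

# Haplotype activity in percent of the AB haplotype.
activity = {
    ("A", "B"): 100.0,
    ("A", "B'"): 100.0 * B_PRIME_FACTOR,
    ("A'", "B"): 100.0 * A_PRIME_FACTOR,
    ("A'", "B'"): 100.0 * A_PRIME_FACTOR * B_PRIME_FACTOR,
}

# Haplotype frequencies under moderate linkage disequilibrium (from the text).
frequency = {
    ("A", "B"): 0.54,
    ("A", "B'"): 0.15,
    ("A'", "B"): 0.06,
    ("A'", "B'"): 0.25,
}


def mean_activity(locus, allele):
    """Frequency-weighted mean activity of all haplotypes carrying `allele`.

    `locus` is 0 for locus A and 1 for locus B.
    """
    carriers = [h for h in activity if h[locus] == allele]
    total_frequency = sum(frequency[h] for h in carriers)
    weighted_sum = sum(activity[h] * frequency[h] for h in carriers)
    return weighted_sum / total_frequency


# Activity that a polymorphism-wise analysis would attribute to each homozygote.
expected = {
    "A/A": mean_activity(0, "A"),
    "A'/A'": mean_activity(0, "A'"),
    "B/B": mean_activity(1, "B"),
    "B'/B'": mean_activity(1, "B'"),
}

# Apparent A'/A' activity relative to A/A (the true ratio is one-third).
ratio = expected["A'/A'"] / expected["A/A"]

for genotype, value in expected.items():
    print(f"{genotype}: {value:.1f}%")
print(f"A'/A' relative to A/A: {ratio:.0%}")
```

Changing the haplotype frequencies in this sketch shows how strongly the apparent single-locus effect depends on the degree of linkage disequilibrium.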
Although this case is hypothetical, it illustrates a documented example: the polymorphism analysis of SLCO1B1 (solute carrier organic anion transporter family, member 1B1) resulted in the year-long assumption that a specific polymorphism in this gene was non-functional². The situation was clarified only by haplotype-based phenotypic analysis³. Therefore, we do not completely agree with David Balding that haplotype analysis provides no advantage over polymorphism-wise analysis¹. In fact, we believe that haplotype-based phenotype association analysis should routinely complement polymorphism analysis.
The second point concerns data transformation. Following standard practice in statistics, David Balding recommends that phenotypic data that are not normally distributed should be logarithmically transformed before linear regression for genotype–phenotype analysis¹. However, such data transformation alters the shape of the relationship between the number of copies of a given allele and the phenotype (Fig. 2). Linear regression analysis might then fail to detect some linear gene-dose–phenotype relationships (Fig. 2, top) or, conversely, some non-linear relationships might appear to be significant (Fig. 2, bottom). The extent of both effects depends on the particular data set and the kind of data transformation used. Thus, we would usually discourage data transformation before applying linearization methods in genetics and pharmacogenetics.
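The effect of the transformation on the shape of a gene-dose relationship can be illustrated with a minimal Python sketch. The phenotype values below are hypothetical group means chosen for illustration; they are not the data shown in Fig. 2. The helper r_squared is likewise an illustrative name. The sketch only compares how well a straight line describes each relationship before and after taking logarithms; it does not carry out a significance test.

```python
import math


def r_squared(x, y):
    """Coefficient of determination of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


# Number of variant alleles carried: 0, 1 or 2.
allele_count = [0, 1, 2]

# Hypothetical mean phenotype per genotype group (assumed values).
phenotypes = {
    "additive": [100.0, 55.0, 10.0],       # linear gene-dose effect
    "multiplicative": [100.0, 10.0, 1.0],  # non-linear gene-dose effect
}

fits = {}
for name, values in phenotypes.items():
    logged = [math.log(v) for v in values]
    fits[name] = {
        "raw": r_squared(allele_count, values),
        "log": r_squared(allele_count, logged),
    }
    print(f"{name}: R^2 raw = {fits[name]['raw']:.3f}, "
          f"R^2 after log = {fits[name]['log']:.3f}")
```

Under these assumed values, the additive relationship is exactly linear on the original scale but no longer so after the logarithm, whereas the multiplicative relationship becomes exactly linear only after the logarithm.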
1. Balding, D. J. A tutorial on statistical methods for population association studies. Nature Rev. Genet. 7, 781–791 (2006).
2. Niemi, M. Role of OATP transporters in the disposition of drugs. Pharmacogenomics 8, 787–802 (2007).
3. Vormfelde, S. V. et al. The polymorphisms Asn130Asp and Val174Ala in OATP1B1 and the CYP2C9 allele *3 independently affect torsemide pharmacokinetics and dynamics. Clin. Pharmacol. Ther. (in the press).
Cite this article
Vormfelde, S., Brockmöller, J. On the value of haplotype-based genotype–phenotype analysis and on data transformation in pharmacogenetics and -genomics. Nat Rev Genet 8, 983 (2007). https://doi.org/10.1038/nrg1916-c1