Human genetic approaches to elucidate common human diseases include segregation studies to assess heritability of disease, family-based linkage studies to identify disease susceptibility loci, and genetic association studies to identify genes for disease. Common human diseases typically involve a number of genetic and environmental factors, and human genetic approaches in this context have met with only limited success. However, with the completion of the sequencing of the human and other genomes, the completion of the first phase of the HapMap Project,1 and the development of a number of high-throughput functional genomic technologies able to provide unprecedented looks at molecular processes underlying disease, human genetics is poised to have an impact on common diseases like never before. One exciting variation on the classic human genetics approaches that has recently emerged seeks to integrate molecular profiling data (eg, gene expression) with genotypic and clinical trait data to elucidate the network of molecular interactions that underlie complex traits.2, 3, 4 Identifying variations in DNA that lead to variations in transcript levels can facilitate the identification of variations that lead to disease, given that the molecular changes that drive disease are directly mapped.

Previous human genetic studies on the genetics of gene expression have demonstrated that expression in segregating human populations is significantly heritable and under the control of genetic loci.5, 6 In a recent letter to Nature, Cheung et al7 have extended this work by carrying out one of the most comprehensive genome-wide association (GWA) scans published to date to map the genetic determinants for several hundred lymphoblastoid cell line gene expression traits. Using the results from their previous linkage study on the genetics of gene expression in 14 CEPH families,6 Cheung et al identified 374 gene expression traits that gave rise to linkages that were proximal to the corresponding genes' physical locations. Expression traits are unique compared with other quantitative traits considered in a genetics context because the trait itself corresponds to a specific genetic locus, namely the structural gene that gives rise to the expression trait. DNA variations within the structural gene that in turn give rise to changes in transcript abundances for that gene are cis-acting and are detected as cis linkages. Using the more than 770 000 HapMap SNPs typed in the CEPH individuals profiled in this study, Cheung et al identified all SNPs located in the regions of these 374 genes (defined as the 50 kb upstream of transcription start and transcription stop) and tested each SNP for association with the expression trait in 57 unrelated CEPH individuals from the International HapMap Project.8
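The cis-association step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the symmetric flanking window, and the choice of a simple additive regression of expression on allele count are all assumptions made here for illustration.

```python
from math import sqrt


def snps_in_cis_window(snps, gene_start, gene_end, flank=50_000):
    """Return the SNPs that fall inside the gene or its flanking window.

    snps: list of (snp_id, position) tuples on the gene's chromosome.
    flank: assumed window size in bases on each side of the gene
           (a hypothetical reading of the 50 kb region in the text).
    """
    lower, upper = gene_start - flank, gene_end + flank
    return [(snp_id, pos) for snp_id, pos in snps if lower <= pos <= upper]


def additive_association(genotypes, expression):
    """Least-squares regression of expression level on allele count (0, 1, 2).

    Returns (slope, t_statistic); the t statistic has n - 2 degrees of freedom.
    This is one common association test, assumed here, not taken from the study.
    """
    n = len(genotypes)
    if n != len(expression) or n < 3:
        raise ValueError("need matched genotype/expression values for >= 3 people")

    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")

    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression))
    slope = sxy / sxx
    intercept = mean_e - slope * mean_g

    # Residual sum of squares around the fitted line.
    rss = sum((e - intercept - slope * g) ** 2 for g, e in zip(genotypes, expression))
    std_err = sqrt(rss / (n - 2) / sxx)
    if std_err == 0:
        # Perfect fit: the statistic is unbounded unless the slope is also zero.
        return slope, (float("inf") if slope != 0 else 0.0)
    return slope, slope / std_err


if __name__ == "__main__":
    # Toy data for six hypothetical individuals (not from the study).
    slope, t_stat = additive_association([0, 0, 1, 1, 2, 2],
                                         [1.0, 1.2, 2.0, 2.2, 3.0, 3.2])
    print(f"slope={slope:.3f}, t={t_stat:.2f}")
```

In a real analysis each SNP returned by the window filter would be tested against the corresponding expression trait, and the resulting statistics would need correction for the number of SNPs tested.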

Of the 374 expression traits considered, 65 (17%) were associated with at least one ‘cis’ SNP marker. When the set of genes was narrowed to the 27 expression traits with the strongest cis-linkage signals, 19 were found to be associated with at least one cis SNP marker, demonstrating, at least in the expression context, that strong linkage appears to predict association. Cheung et al then went on to leverage the 770 394 HapMap SNPs typed in the 57 CEPH individuals profiled in their study to assess whether a comprehensive GWA scan would be able to recover cis or other associations for these 27 genes. Testing for association between each expression trait and each marker, 12 genes showed significant cis-only association, one showed a cis and trans association, and another showed only a trans association. The association results were largely concordant with the linkage results. For one of the genes, CHI3L2, the most strongly associated SNP was only 91 bases upstream of the transcription start, suggesting it may be the causal variant or tightly linked with the causal variant. Two independent functional assays supported this SNP as the determinant affecting CHI3L2 expression levels.
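To give a sense of the scale of the scan described above, the short sketch below works through the counts quoted in the text. The Bonferroni correction and the 0.05 family-wise level are assumptions chosen for illustration only; they are not stated to be the thresholds used by Cheung et al.

```python
# Counts taken from the text above.
n_traits = 27          # expression traits carried into the genome-wide scan
n_snps = 770_394       # HapMap SNPs typed in the 57 CEPH individuals

# Assumed family-wise error level (illustrative, not from the study).
alpha = 0.05

# Every trait is tested against every marker.
n_tests = n_traits * n_snps

# Bonferroni per-test threshold if all tests were corrected jointly.
bonferroni_threshold = alpha / n_tests

# Proportions of traits with at least one associated cis SNP.
cis_fraction_all = 65 / 374   # all cis-linked traits
cis_fraction_top = 19 / 27    # traits with the strongest cis-linkage signals

print(f"tests performed: {n_tests}")
print(f"Bonferroni threshold: {bonferroni_threshold:.2e}")
print(f"cis-associated, all traits: {cis_fraction_all:.0%}")
print(f"cis-associated, strongest linkages: {cis_fraction_top:.0%}")
```

Under these assumptions a single SNP would need a very small p-value to survive correction, which is one reason the small sample size of 57 individuals is notable.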

This study has several important implications. First, there is still considerable debate as to the utility of GWA studies to map genes for complex traits.9 Although there are some notable successes employing such an approach,10 this study demonstrated success for 14 traits (an unprecedented number for a single study) using a much higher density SNP map than has been used before, despite what must be considered an incredibly small sample size for a GWA study. Of course, these results should be interpreted cautiously, given that the GWA results were restricted to traits that had already been detected with significant linkages in a previous study, selected from thousands of traits, and so represent the most informative traits studied in this context to date. Such selection information will not typically be available a priori in most GWA studies.

Another important contribution is the demonstration that genetic determinants for expression traits can be mapped in human populations, providing a level of functional validation for expression-associated SNPs that in turn makes them attractive candidates to explore in disease populations. The ability to leverage linkage disequilibrium in a GWA study results in much higher resolution mapping than can typically be achieved by linkage alone. Several of the associations identified by Cheung et al spanned regions less than 100 kb in length, and in one case, the determinant of expression variation may have been identified. Because expression traits are closer to the genetics, given that the trait itself is transcribed from the DNA, genetic associations may be easier to detect, and once a DNA variation has been associated with expression, it becomes generally interesting as a variation that may lead to disease.

Although this study represents an exciting addition to a growing number of influential genetics of gene expression studies, a number of limitations should be noted. This study began with traits from a previous study and only examined a relatively small set of genes. No indication was given as to the strength of genetic signal over the more than 8500 genes profiled in the CEPH individuals, and a number of statistical issues were left unaddressed, including the absence of any estimate of the false discovery rate. This study does not shed much light on how thousands or even tens of thousands of traits should be studied in a genetic context, whether a linkage or GWA (or a combination) approach would be preferred, the fraction of genetic variation that may be explained by epistasis and what it would take to detect such effects over tens of thousands of traits and hundreds of thousands of markers, and whether causal genes for disease could be more readily identified in human populations using this approach. In addition, this first look may highlight a small class of expression traits that are easy to detect but that are unlikely to reflect the general case. Further, SNP density issues were not addressed, so there is no clear guidance on whether 770K SNPs or more are needed, or whether fewer would have sufficed. Finally, one of the primary aims in these types of studies is to leverage the expression data to elucidate disease, and this still remains to be shown in human studies (whereas this has been shown in mouse).
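As an illustration of the kind of false discovery estimate noted above as missing, the sketch below implements the Benjamini-Hochberg step-up procedure, one widely used way to control the false discovery rate across many tests. It is offered as an example of the general technique, not as a method the study did or should have used; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return the indices of hypotheses rejected at the given FDR level.

    Implements the Benjamini-Hochberg step-up rule: sort the p-values,
    find the largest rank k with p_(k) <= k * fdr / m, and reject the
    k smallest p-values.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order of p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * fdr / m:
            cutoff_rank = rank  # keep the largest rank that passes

    return sorted(order[:cutoff_rank])


if __name__ == "__main__":
    # Hypothetical p-values for four trait-marker tests.
    example = [0.5, 0.001, 0.008, 0.9]
    print(benjamini_hochberg(example, fdr=0.05))
```

Applied to a scan over tens of thousands of traits and hundreds of thousands of markers, a procedure of this kind would give a direct estimate of how many reported associations are expected to be false.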

These issues notwithstanding, the study by Cheung et al provides strong support for carrying out larger-scale studies integrating genotypic and expression data in human populations. Human gene expression studies where the population being studied has been clinically characterised will provide for the possibility of directly mapping genes for complex disease traits, as has been done in mouse studies.11 Larger-scale studies will also provide a more complete characterization of SNPs that associate with expression, offering an alternative, or at least an effort that is strongly complementary, to projects that aim to map all functional variants.12 SNPs associated with expression achieve a level of functional validation that does not exist a priori for coding SNPs, and this type of objective functional validation may lead to a set of SNPs that is far more enriched for disease-causing SNPs, in addition to providing a genetic map for the transcriptome.