Introduction

Protein kinases are key regulators of cell function that act by adding phosphate groups to substrate proteins. Phosphorylation by protein kinases is the most widespread and well-studied signaling mechanism in eukaryotic cells. Phosphorylation can regulate almost every property of a protein and is involved in the activity, localization and overall function of many proteins. It serves to orchestrate the activity of almost all cellular processes.1 The protein kinase complement (∼500 genes) of the human genome (kinome) constitutes one of the largest and most functionally diverse gene families and has been comprehensively cataloged by the Human Kinome Project.2

As protein phosphorylation has a central role in diverse biological processes, such as control of cell growth, metabolism, differentiation and apoptosis, abnormal phosphorylation has been implicated in the cause of human cancer. The development of selective protein kinase inhibitors that can block or modulate diseases caused by abnormalities in these signaling pathways is widely considered a promising approach for drug development.3 Several new cancer treatments are designed to inhibit aberrantly activated kinases within cancer cells in an effort to prevent cell division. FDA (Food and Drug Administration)-approved kinase inhibitors that are used to treat various cancers include, for example, erlotinib and gefitinib, which target the epidermal growth factor receptor (EGFR),4, 5 and sorafenib, which was designed as an inhibitor of Raf kinase, but also targets the vascular endothelial growth factor receptors.6, 7 Protein kinases have now become the second most important group of drug targets, after G-protein-coupled receptors.8

Although socioeconomic status could affect health-related disparities, for some diseases, there are well-established relationships between ancestry and disease risk/pharmacological response. For example, African-American, Hispanic, Asian and Native American women have a lower incidence of breast cancer but higher mortality compared with non-Hispanic white women.9 A significant difference in response and pulmonary toxicity to gefitinib, an inhibitor of the EGFR kinase, has been observed between patients with advanced non-small-cell lung cancer from Asia and Europe/North America.10 In addition, there are clear population differences in EGFR, which may explain some of the clinical population differences.11 Thus, we hypothesized that there may be clinically important population differences in other kinase genes, and sought to comprehensively assess the entire kinome and the relevant cognate ligands.12

To catalog the genetic variation in protein kinase genes, we used a resource of single-nucleotide polymorphisms (SNPs) from the International HapMap Project (http://www.hapmap.org/).13, 14 The Phase 1/2 HapMap genotypic database, which comprises >3 million SNPs,15 has proven to be a key resource for researchers investigating the genetic contribution to human diseases, variation in gene expression and drug response.16 A comprehensive survey was carried out to identify protein kinase genes as well as ligand genes that contained SNPs with differential frequencies (eSNPs) among a panel of human lymphoblastoid cell lines derived from apparently healthy individuals of three populations: CEU (60 unrelated Caucasian individuals from Utah, USA) of northern and western European ancestry, YRI (60 unrelated Yoruba people from Ibadan, Nigeria) of African ancestry and ASN (CHB: 45 unrelated Han Chinese from Beijing, China; JPT: 45 unrelated Japanese from Tokyo, Japan) of Asian ancestry. As the three major continental populations (Asians, Europeans and Africans) have been separated geographically during the past 50 000–100 000 years, recent positive selection has been shown to contribute to the genetic17 and phenotypic (for example, gene expression)18 differences in the current populations. Therefore, we also searched for evidence of recent positive selection17 among the kinase and ligand genes using the HapMap SNP genotypic data.

Materials and methods

Human protein kinase genes

The protein kinase complement of the human genome was previously cataloged using public and proprietary genomic, cDNA and expressed sequence tag sequences.2 The list of human protein kinase genes was downloaded from the Human Kinome Project database (http://kinase.com/mammalian/).2 This updated list (December 2007) comprises 514 putative human kinase genes belonging to 10 groups and 133 families.2 The 102 protein kinase pseudogenes in the database were excluded from this study.

Human ligand genes

The Database of Ligand-Receptor Partners (DLRP)19 is a subset of the Database of Interacting Proteins (http://dip.doe-mbi.ucla.edu/),20 which lists protein pairs that are known to interact with each other. In particular, the DLRP is a database of protein ligand and protein receptor pairs that are experimentally known to interact with each other.19 In total, 181 unique ligands and 133 unique receptors (473 ligand–receptor relationships) are included in the current DLRP database (November 2001).19 Among them, 35 unique kinase receptors (cross-checked with the Human Kinome Project database2) and 58 unique ligands representing 183 ligand–kinase receptor relationships were included in our analysis.

Identifying kinase and ligand genes containing eSNPs

SNP@Ethnos (http://variome.kobic.re.kr/SNPatETHNIC/),21 a catalog of SNPs and genes that show human population variation, was queried for variant kinases and ligands containing eSNPs across human populations. The database contains the results of tests for natural selection and population differences using the ∼3.6 million Phase 1 (release 16c) HapMap Project13, 14 SNPs. In particular, the nearest shrunken centroid method (NSCM) score22 was calculated by SNP@Ethnos21 to detect population differences in the allele frequencies of ∼1 million common SNPs in the genic regions across the following three HapMap populations:13, 14 CEU (60 unrelated Caucasian individuals from Utah, USA) of northern and western European ancestry, YRI (60 unrelated Yoruba people from Ibadan, Nigeria) of African ancestry and ASN (CHB: 45 unrelated Han Chinese from Beijing, China; JPT: 45 unrelated Japanese from Tokyo, Japan) of Asian ancestry. A detailed mathematical explanation of the NSCM is given in the study by Tibshirani et al.22 For a given SNP, three similar scores obtained for CHB+JPT, CEU and YRI indicate that the SNP does not discriminate among the populations, whereas one score differing from the other two indicates that the SNP is specific to that population. An SNP is called specific to population A (that is, an eSNP for population A) if (∣s(A)−s(B)∣+∣s(A)−s(C)∣)/2>0.3,21 where s(A), s(B) and s(C) are the scores of populations A, B and C, respectively. In addition to the NSCM score, various other related information such as minor allele frequency can be obtained by searching SNP@Ethnos.21
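The eSNP rule quoted above can be illustrated with a minimal Python sketch. The function names and the example scores are hypothetical and are not taken from SNP@Ethnos; only the 0.3 cutoff and the form of the rule come from the text.

```python
from typing import Dict, List

THRESHOLD = 0.3  # cutoff quoted in the text


def specificity(scores: Dict[str, float], pop: str) -> float:
    """Mean absolute difference between one population's score and the others'."""
    others = [s for name, s in scores.items() if name != pop]
    return sum(abs(scores[pop] - s) for s in others) / len(others)


def esnp_populations(scores: Dict[str, float],
                     threshold: float = THRESHOLD) -> List[str]:
    """Populations for which the SNP would be called an eSNP under the rule."""
    return [pop for pop in scores if specificity(scores, pop) > threshold]


# Hypothetical NSCM scores for a single SNP (illustrative values only).
example = {"CEU": 0.1, "YRI": 0.6, "ASN": 0.1}
print(esnp_populations(example))
```

With these illustrative scores only YRI exceeds the cutoff (0.5 versus 0.25 for each of the other two populations). Under this rule a population that diverges by d from two otherwise equal populations also gives each of them a value of d/2, so a sufficiently large divergence would mark the SNP in more than one population.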

Enriched kinase groups

The enrichment of a particular kinase group was detected by a binomial test using the entire human kinome as reference. The annotations for kinase groups were retrieved from the Human Kinome Project database.2 The entire human kinome comprises 10 major groups.2 A false discovery rate of 5% after the Benjamini–Hochberg correction23 was used for significance in this enrichment analysis. In addition, only groups with a minimum of three genes were considered to minimize the small sample size effect.
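As a rough sketch of this enrichment analysis, the following Python code combines a binomial test with the Benjamini–Hochberg adjustment. It assumes a one-sided (over-representation) test and applies the three-gene minimum to the number of eSNP-containing genes in a group; neither detail is specified in the text. The group names and counts are hypothetical.

```python
import math
from typing import Dict, List


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed exactly over the tail."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that monotonicity is enforced.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enriched_groups(hits: Dict[str, int], kinome: Dict[str, int],
                    fdr: float = 0.05, min_genes: int = 3) -> Dict[str, float]:
    """Groups over-represented among the hits relative to the whole kinome."""
    total_hits = sum(hits.values())
    total_kinome = sum(kinome.values())
    tested = [g for g in hits if hits[g] >= min_genes]  # assumed reading of the minimum
    pvals = [binom_upper_tail(hits[g], total_hits, kinome[g] / total_kinome)
             for g in tested]
    adjusted = benjamini_hochberg(pvals)
    return {g: q for g, q in zip(tested, adjusted) if q < fdr}


# Hypothetical group sizes, for illustration only.
kinome = {"G1": 100, "G2": 100, "G3": 100}
hits = {"G1": 60, "G2": 20, "G3": 20}
print(enriched_groups(hits, kinome))
```

With ten groups tested, a smallest nominal P-value of 0.00502 is adjusted to 0.0502 by this procedure, which is consistent with the values reported for the 'Other' group in the Results.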

Genes under recent positive selection

Haplotter (http://hg-wen.uchicago.edu/selection/)17 was used to evaluate whether a particular gene had been a target of recent positive selection. Haplotter is a web application that was developed to display the results of a scan for positive selection in the human genome using the HapMap data. In particular, we used the Haplotter-calculated iHS (integrated haplotype score) (HapMap Phase 1 data) to assess the likelihood that a gene has undergone recent positive selection. The empirical P-values, quantified by the proportion of SNPs with ∣iHS∣ >2 for each bin of 50 neighboring SNPs, were generated by Haplotter.17 Simulations indicate that this criterion provides a powerful signal of selection.17 An empirical P-value of 0.05 was used as the cutoff for significance.
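The window statistic described above can be sketched as follows. This is not Haplotter's code: the use of non-overlapping bins, the handling of a trailing partial bin and the ranking used for the empirical P-value are assumptions made for illustration, and the iHS values are invented.

```python
from typing import List


def extreme_ihs_fraction(ihs: List[float], window: int = 50,
                         cutoff: float = 2.0) -> List[float]:
    """Fraction of SNPs with |iHS| > cutoff in each consecutive bin of SNPs."""
    fractions = []
    # A trailing partial bin is dropped (an assumption of this sketch).
    for start in range(0, len(ihs) - window + 1, window):
        block = ihs[start:start + window]
        fractions.append(sum(abs(v) > cutoff for v in block) / window)
    return fractions


def empirical_p(value: float, background: List[float]) -> float:
    """Share of background bins whose fraction is at least as large as value."""
    return sum(b >= value for b in background) / len(background)


# Invented iHS values: a quiet bin followed by a bin with ten extreme SNPs.
ihs_values = [0.1] * 50 + [2.5] * 10 + [-0.3] * 40
fractions = extreme_ihs_fraction(ihs_values)
print(fractions)
print(empirical_p(fractions[1], fractions))
```

In a real scan the background would be all bins across the genome, so that a gene is flagged when its bin ranks within the top 5% of that distribution.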

Results

Variant kinases containing eSNPs

By searching the SNP@Ethnos database, 268 unique kinase genes (Supplementary Table 1) were found to contain eSNPs across the three HapMap populations. The proportion of kinase genes (∼52%) containing eSNPs across populations was much higher than that of the whole genome, in which ∼38% of genes (10 138 out of 26 280 genes in the Phase 1 HapMap data21) contain eSNPs (binomial test P=8.4E-11). Table 1 lists some examples of these kinase genes. In total, 77 genes had eSNPs in the CEU samples, 240 genes had eSNPs in the YRI samples and 53 genes had eSNPs in the ASN samples. Among them, 39 genes had eSNPs in both the CEU and YRI samples, 8 genes had eSNPs in both the CEU and ASN samples and 15 genes had eSNPs in both the YRI and ASN samples. Furthermore, 20 kinase genes had eSNPs in all of the three HapMap populations (Figure 1).
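The counts in this paragraph can be cross-checked with a short Python sketch. It assumes that each pairwise count refers to genes with eSNPs in exactly those two populations (that is, excluding the 20 genes shared by all three); the text does not state this explicitly, but it is the reading under which the per-population totals add up to 268 unique genes.

```python
from typing import Dict, Tuple

# Counts quoted in the text.
totals = {"CEU": 77, "YRI": 240, "ASN": 53}
# Assumed to be "exactly two populations" counts (see the lead-in).
exactly_two = {("CEU", "YRI"): 39, ("CEU", "ASN"): 8, ("YRI", "ASN"): 15}
all_three = 20


def union_size(totals: Dict[str, int],
               exactly_two: Dict[Tuple[str, str], int],
               all_three: int) -> int:
    """Number of unique genes across the three populations."""
    # Genes in an exactly-two region are counted twice in the totals,
    # genes in the triple region three times.
    return sum(totals.values()) - sum(exactly_two.values()) - 2 * all_three


def only_in(pop: str) -> int:
    """Genes with eSNPs in one population and in no other."""
    shared = sum(n for pair, n in exactly_two.items() if pop in pair)
    return totals[pop] - shared - all_three


unique_genes = union_size(totals, exactly_two, all_three)
print(unique_genes)                          # unique kinase genes with eSNPs
print({pop: only_in(pop) for pop in totals})  # single-population regions
print(round(unique_genes / 514, 3))          # proportion of the kinome
print(round(10138 / 26280, 3))               # whole-genome proportion
```

Under this assumption the union is 268, matching the number of unique kinase genes reported, with 166 genes carrying eSNPs in the YRI samples only.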

Table 1. Some examples of kinase and ligand genes containing eSNPs.

Figure 1. A Venn diagram of the kinases that contained population-specific SNPs (eSNPs) in different populations. CEU, Caucasian individuals from Utah, USA; YRI, Yoruba people from Ibadan, Nigeria; ASN, Asian individuals from Beijing, China and Tokyo, Japan.

Enriched kinase groups among genes containing eSNPs

After the Benjamini–Hochberg correction (Padjusted<0.05), no kinase groups were found to be enriched among the 268 kinase genes containing eSNPs relative to the distribution of the entire human kinome (514 kinase genes). The top-ranking kinase group among the 268 genes was classified as ‘Other’ (that is, kinases not belonging to other major groups) (25 genes, nominal P=0.00502, Padjusted=0.0502). In addition, at Padjusted<0.05, none of the kinase groups were enriched among the 77, 240 and 53 genes that contained eSNPs in the CEU, YRI and ASN samples, respectively.

Variant ligands containing eSNPs

Among the 58 ligands of protein kinases, 23 ligand genes were found to contain eSNPs. In contrast to the protein kinase genes, this proportion (∼39%) was not different from that of the whole genome background (binomial test P=0.79). Table 1 lists some examples of these variant ligand genes. In total, 7 ligand genes had population-specific SNPs in the CEU samples, 19 genes had population-specific SNPs in the YRI samples and 3 genes had population-specific SNPs in the ASN samples. Among them, two ligand genes had population-specific SNPs in both the CEU and YRI samples. Furthermore, two ligand genes had population-specific SNPs in all of the three HapMap populations. These 23 ligand genes represent 74 ligand–kinase receptor relationships with 29 unique kinase genes belonging to two groups (TK: tyrosine kinase and TLK: tyrosine kinase like), among which 22 kinase genes contained eSNPs. Supplementary Table 2 shows the complete list of these 74 ligand–kinase receptor relationships.

Kinase–ligand pairs under recent positive selection

Among the 74 ligand–kinase receptor relationships (Supplementary Table 2), two ligand genes had evidence for recent positive selection: BMP3 in the CEU samples and BMP5 in the ASN samples. In addition, 11 kinase genes had evidence for recent positive selection (CEU: 4, YRI: 3 and ASN: 4 genes). Furthermore, two ligand–kinase pairs involving BMPR2 (bone morphogenetic protein receptor, type II), BMP3-BMPR2 and BMP5-BMPR2, showed evidence for recent positive selection in both the ligand and kinase genes. In particular, BMP3 and its target kinase gene BMPR2 have been under recent positive selection in the CEU samples, whereas BMP5 showed evidence for recent positive selection in the ASN samples. Some examples are illustrated in Table 1 and in Figure 2.

Figure 2. Population-specific SNP (eSNP)-containing ligand genes and their kinase targets. Triangles indicate ligand genes. Circles indicate kinases. Arrows link ligands to their targets. (a) The kinases belonging to the TK (tyrosine kinase) group; (b) The kinases belonging to the TLK (tyrosine kinase-like) group.

Discussion

The results of our study show that the human kinome, important in many different diseases, manifests significant genetic variation among major continental populations, in excess of that expected across the general genome background. This increased population diversity suggests a stronger adaptation of these genes within each population. Given that the kinases are hubs of various cellular functions, the stronger adaptation of these genes could have been critical for different populations to adapt to their new environments as Homo sapiens migrated from Africa to other continents. Another observation was that the African individuals represented by the YRI samples from Nigeria had a much larger number of kinases (240 genes) that contained eSNPs than did the Asian (53 genes) and Caucasian samples (77 genes), consistent with the observation that the YRI samples (∼74%) are more diverse than the CEU (∼15%) and ASN (∼7%) samples in terms of the proportion of total eSNPs.21 Again, this difference may reflect the evolutionary history of human populations migrating from Africa to other continents, as African populations are older and contain more genetic variation. On the other hand, no particular kinase group was significantly enriched among the kinase genes containing eSNPs, suggesting that no particular kinase group(s) underwent faster evolution relative to the other groups.

As many kinases perform their cellular function after being activated by ligands, surveying the relationship of population genetic variation between the kinase and ligand pairs could shed some light on the evolution of these dynamic cellular components. Using a list of experimentally verified ligands that were obtained from the DLRP,19 we found that the proportion of ligand genes that contained eSNPs was not different from that of the whole genome background. This suggests that the ligand genes may not be the major targets of adaptation for the signaling pathways involving kinases. On the other hand, some eSNP-containing ligand genes were found to share common kinase targets, but the ligand genes and their kinase targets may not necessarily contain eSNPs in the same population(s) (Table 1). For example, BTC (betacellulin) and NRG1 (neuregulin 1) are common ligands of ERBB4 (v-erb-a erythroblastic leukemia viral oncogene homolog 4) (Figure 2a); bone morphogenetic proteins BMP3, BMP5, BMP7 and BMP15 are common ligands of BMPR1A (BMP receptor, type IA), BMPR1B (BMP receptor, type IB) and BMPR2 (BMP receptor, type II) (Figure 2b). However, these ligand genes and their kinase targets often contained eSNPs in different populations, as illustrated in Figure 3 for BTC and the genes encoding its two kinase targets (EGFR and ERBB4). Although BTC and EGFR contained eSNPs in the YRI samples, ERBB4 contained eSNPs in all the three populations, suggesting that ligands and their kinase targets could be under different adaptation in each population. In addition, a search for signatures of recent positive selection in the kinase and ligand pairs showed that more kinases (11 genes) had significant ∣iHS∣17 scores than did ligands (2 genes) (Table 1). Therefore, it seems that the health-related disparities associated with kinase signaling pathways are more likely to be driven by the genetic variation in the kinase genes than by the genes encoding their cognate ligands. However, one limitation is that the current DLRP comprises only a subset of the ligands of kinase genes. In fact, only the TK and TLK groups of kinases are represented in the database. A more comprehensive list of kinase and ligand pairs may be necessary to evaluate the relationship of the evolutionary history of these genes.

Figure 3. The nearest shrunken centroid method (NSCM) scores of the population-specific SNPs (eSNPs) of a ligand gene BTC and its kinase targets. X axis is the genomic position based on NCBI build 36. Y axis is the NSCM score, which was used to identify eSNPs. Gray: the CEU samples; Black: the ASN samples; and Light gray: the YRI samples. (a) BTC (Chr4). Ten intronic eSNPs are shown. (b) EGFR (Chr7), encoding a kinase receptor for BTC. Ten intronic eSNPs are shown. (c) ERBB4 (Chr2), encoding a kinase receptor for BTC. Fifty-two intronic eSNPs are shown.

Technically, the NSCM score22 was used to identify eSNPs in the HapMap samples. It is a discriminating value, which is small if there is little difference between the classes or if the variation of the SNP distribution is large.21 The NSCM has been proposed as a suitable approach to solving the classification problem when there are a large number of features (for example, ∼1 million HapMap SNPs) from which to predict a relatively small number of classes (for example, three HapMap populations).22 A limitation of this score is that the cutoff is empirical and a comprehensive evaluation relative to other metrics is lacking. There are also some important limitations of using the HapMap genotypic data. As the CEU samples were collected decades earlier24 than the YRI and ASN samples,13, 14 certain biases may occur because of the differences in cell line culture and transformation techniques.25 In addition, the current HapMap samples were obtained from individuals of only three major human populations. More samples from other populations (for example, the Phase 3 HapMap samples such as the Mexican Americans) would greatly benefit the investigation of population genetic variation in these genes. Furthermore, although the HapMap genotypic data are extensive (>3 million SNPs), the project was designed to cover only common genetic variants (minor allele frequency >5%).13, 14 As the human genome may contain ∼10 million SNPs, untyped or unknown genetic variants may also contribute significantly to the population differences in the human kinome and its ligands. Deep resequencing projects (for example, the 1000 Genomes Project26) using next-generation sequencing technologies27, 28 may allow researchers to catalog human genetic variants more comprehensively,29 thus improving our understanding of the genetic variation in these important genes in the future.

Conflict of interest

The authors declare no conflict of interest.