Introduction

Platelets are small, anucleate cell fragments that play an integral role in the blood clotting process. The number of platelets in a given volume of blood (ie the platelet count) is correlated with several clinically relevant phenotypes including blood clotting time,1 and may be a risk factor in the development of thrombosis2, 3 and atherosclerosis.4, 5 A very low platelet count (thrombocytopenia) is a common side effect of cancer chemotherapy and is a limiting factor in the dose of cytotoxic agents that can be used. Current treatment typically involves the transfusion of platelet concentrates, but this carries significant risks including transmission of infectious agents as well as alloimmunization and transfusion reactions. In addition, clinical trials of several new agents including recombinant thrombopoietin (a major regulator of platelet production) have yet to demonstrate a clear benefit in the treatment of thrombocytopenic cancer patients receiving dose-intensive chemotherapy for acute leukemia. There is clearly an urgent need to develop other agents which are effective in treating thrombocytopenic states.

We have shown that platelet count is highly heritable with genetic factors explaining 80% of the phenotypic variance in adolescent twins.6 In this paper we extend these results by performing a genome-wide linkage scan of the dizygotic (DZ) twin pairs and their families from that study. We measured twins' platelet count longitudinally at 12, 14 and 16 years of age, and sought to increase the power to detect linkage by performing multivariate multipoint sib-pair linkage analyses across the genome.7, 8, 9, 10, 11 Identification of QTLs for platelet count will improve our understanding of platelet biology and may suggest novel treatment strategies for thrombocytopenic conditions.

Subjects and methods

Twins were recruited as part of an ongoing study concerned with the development of melanocytic naevi (moles), the clinical protocol of which has been described in detail elsewhere.12, 13, 14 Twins were enlisted by contacting the principals of primary schools in the greater Brisbane area, by media appeals and by word of mouth. It is estimated that approximately 50% of the eligible birth cohort were recruited into the study and they were typical of the population with respect to platelet concentrations.6 Informed consent was obtained from all participants and parents prior to testing.

Venous blood samples were collected using EDTA tubes and platelet count was subsequently measured using a Coulter Model STKS blood counter. Blood was collected from twins at 12, 14 and 16 years of age. Where possible, DNA was also obtained from parents and siblings for genotyping (platelet count was not measured in these relatives). No attempt was made to exclude subjects suffering from illness.

Platelet count was available from 745 twin pairs comprising 162 MZ female, 165 MZ male, 101 DZ female, 110 DZ male and 207 opposite sex (OS) twin pairs (consisting of 94 pairs where the female was born first and 113 pairs where the male was born first). Although twins were tested as close as possible to their 12th, 14th and 16th birthdays, not all twins were tested at all three measurement occasions (see Table 1 for a breakdown of these data).

Table 1 Breakdown of participation data showing the number of complete twin pairs for whom platelet count and genotyping information (in the case of DZ pairs) were available

Genotypes

DNA was extracted from buffy coats using a modification of the ‘salt method’.15 For same sex twin pairs, zygosity was determined by typing nine independent DNA microsatellite polymorphisms plus the X/Y amelogenin marker for sex-determination by polymerase chain reaction (ABI Profiler system). All twins were also typed for ABO, Rh and MNS blood groups.

The genome scan consisted of 726 highly polymorphic autosomal microsatellite markers at an average spacing of 5 cM in 539 families (2360 individuals). Markers on the X chromosome were also typed but linkage to these is not reported here. The microsatellites consisted of a combination of markers from the ABI-Prism and CIDR genotyping sets. Overlapping parts of the sample received either a 10 cM scan using the ABI-2 marker set (400 markers) at the Australian Genome Research Facility (Melbourne), a 10 cM scan using the Weber marker set at the Center for Inherited Disease Research (Baltimore), or both. Only 30 markers were common to both marker sets, and these were used for quality control; the remaining markers intercalated to form a scan at approximately 5 cM spacing. Families that received only one scan all had both parents genotyped, and so still had a high information content. The average heterozygosity of markers was 0.78, and the mean information content was 0.77. Full details of the genome scan are provided elsewhere.16

Linkage analyses

Univariate multipoint variance components linkage analysis was used to test for linkage between marker loci and blood cell phenotypes.9, 17, 18, 19 In the variance components method, the expected phenotypic covariance matrix $\Omega$ is partitioned into the following components:

$$\Omega = \hat{\Pi}\,\sigma_q^2 + 2\Phi\,\sigma_a^2 + I_n\,\sigma_e^2$$

where $\sigma_q^2$ is the additive genetic variance due to the QTL, $\sigma_a^2$ is the (residual) polygenic additive genetic variance, and $\sigma_e^2$ is the unique environmental variance.17, 20 $\hat{\Pi}$ is a matrix containing the elements $\hat{\pi}_{ij}$, which denote the estimated proportion of alleles shared IBD at the trait locus by individuals i and j. $\hat{\Pi}$ is an estimate of the true identity by descent sharing matrix $\Pi$ and is a function of the estimated IBD matrices as well as the distance between the marker and the QTL.17, 21 $\Phi$ is the kinship matrix containing elements $\phi_{ij}$, the kinship coefficients between individuals i and j (ie equal to half the additive genetic sharing coefficients),22 and $I_n$ is an identity matrix of order n.
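As an illustration of the partition described above, the following minimal sketch builds the expected covariance matrix for a single sib pair. It assumes the standard variance components form Ω = Π̂σq² + 2Φσa² + Iσe², with kinship coefficients of 1/2 for an individual with themselves and 1/4 for full sibs (including DZ twins). The function name and the numerical variance components are hypothetical and chosen only for illustration; they are not estimates from this study.

```python
def sib_pair_covariance(pi_hat, var_q, var_a, var_e):
    """Expected 2x2 phenotypic covariance matrix for one sib pair.

    Assumes Omega = Pi_hat * var_q + 2 * Phi * var_a + I * var_e.
    pi_hat is the estimated proportion of alleles shared IBD at the QTL.
    """
    pi_mat = [[1.0, pi_hat], [pi_hat, 1.0]]   # IBD sharing at the trait locus
    two_phi = [[1.0, 0.5], [0.5, 1.0]]        # twice the kinship matrix for full sibs
    identity = [[1.0, 0.0], [0.0, 1.0]]       # unique environment is uncorrelated
    return [
        [pi_mat[i][j] * var_q + two_phi[i][j] * var_a + identity[i][j] * var_e
         for j in range(2)]
        for i in range(2)
    ]


# Hypothetical variance components (illustrative values only).
omega = sib_pair_covariance(pi_hat=0.5, var_q=0.2, var_a=0.6, var_e=0.2)
```

Under this form, only the off-diagonal element depends on the IBD sharing, which is why pairs with different sharing at a marker carry the linkage information.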

Variance components were estimated by maximum-likelihood analysis of the raw data23 as implemented in the software package MERLIN24 along with fixed effects for sex and age. Since both circadian and seasonal effects have been reported for platelet count, linear, quadratic and sinusoidal fixed effects were included for the time of day and month from which blood was sampled.25 Univariate linkage analyses were performed at each marker at each age. Only phenotypic data from DZ pairs were included in the analyses because MZ twins share all their genes identical by descent (IBD) across the genome, and are thus uninformative for linkage.

The null hypothesis that additive genetic variance in platelet count caused by a QTL linked to a given marker is zero (ie $\sigma_q^2=0$) was tested by comparing the likelihood of a reduced model in which $\sigma_q^2$ was constrained to zero with the likelihood of a model in which the genetic variance due to the QTL ($\sigma_q^2$) was estimated. Twice the difference in natural log-likelihood between these models is distributed asymptotically as a 1/2:1/2 mixture of $\chi_1^2$ and a point mass at zero,26 while the difference between the two $\log_{10}$ likelihoods produces a LOD score equivalent to the classical LOD score of parametric linkage analysis.27
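The relationship between the likelihood ratio statistic, the LOD score and the nominal P-value can be sketched in a few lines. The function names are hypothetical; the example uses the minus two log-likelihood values given in the Figure 4 caption for the equated-loadings model and the model with no QTL effect at D19S220, and it should give values close to the LOD of 2.59 and P-value of 0.000277 reported for that marker.

```python
import math


def lod_from_chi2(chi2):
    """Convert a difference in -2 ln(likelihood) into a LOD score."""
    return chi2 / (2.0 * math.log(10.0))


def p_value_mixture(chi2):
    """P-value under a 50:50 mixture of a point mass at zero and chi-square (1 df)."""
    if chi2 <= 0.0:
        return 1.0
    # For 1 df, P(X > x) = erfc(sqrt(x / 2)); the mixture halves this tail area.
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))


# -2LL values taken from the Figure 4 caption (models b and c).
chi2_d19s220 = 2977.704 - 2965.777
lod = lod_from_chi2(chi2_d19s220)
p_value = p_value_mixture(chi2_d19s220)
```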

Previous studies have suggested that variance components linkage analysis is sensitive to deviations from multivariate normality, particularly to high levels of kurtosis in the trait distribution.28 The distribution of platelet counts for 12, 14 and 16 year olds exhibited a kurtosis of 0.102, 0.068 and −0.002 respectively. Recent statistical genetics theory suggests that this level of kurtosis will have relatively minor effects on the distribution of LOD scores and that the standard nominal P-values for LOD scores are appropriate in these cases.29

Multivariate analyses

Several groups have demonstrated that multivariate methods are a powerful means of detecting QTLs that influence a set of phenotypes pleiotropically.7, 8, 11, 30 Given that the Pearson correlations between platelet count at the various time points are quite high ($r_{12,14}=0.74$; $r_{14,16}=0.71$; $r_{12,16}=0.65$), our longitudinal data for platelet count are well suited to multivariate QTL linkage analysis.31 In this study, multivariate QTL linkage analyses were performed using structural equation modeling as implemented in Mx,9, 10, 32 using data from both MZ and DZ pairs. The QTL was modeled as a single latent factor that pleiotropically affected the phenotypes (Figure 1). In this model, the correlation between the QTL effects was set to one for MZ twins (since MZ twins are genetically identical), and to $\hat{\pi}$, the estimated proportion of alleles shared IBD at the marker locus, for DZ twins. The probabilities of sharing zero ($p_0$), one ($p_1$) or two ($p_2$) marker alleles IBD for each DZ twin pair were calculated in a multipoint fashion using a modification of the Lander-Green algorithm as implemented in MERLIN.24 These probabilities were then used to obtain $\hat{\pi} = p_2 + \tfrac{1}{2}p_1$ for each DZ pair at each marker location. Residual sources of variation were modeled using saturated Cholesky structures (Figure 1). Since phenotypic information was present from MZ twins, it was possible to partition the residual familial resemblance into separate variance components due to additive genetic (A) and common environmental (C) sources of variation.
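The estimated proportion of alleles shared IBD used for the DZ pairs is a simple function of the three sharing probabilities: the probability of sharing two alleles plus half the probability of sharing one. A minimal sketch follows; the function name and the example probabilities are hypothetical, and the multipoint calculation of the probabilities themselves (done in MERLIN) is not reproduced here.

```python
def estimate_pi_hat(p0, p1, p2):
    """Estimated proportion of alleles shared IBD from sharing probabilities.

    p0, p1 and p2 are the probabilities that a sib pair shares zero, one or
    two alleles IBD at a given location; they must sum to one.
    """
    if abs((p0 + p1 + p2) - 1.0) > 1e-6:
        raise ValueError("IBD sharing probabilities must sum to one")
    return p2 + 0.5 * p1


# With uninformative markers a sib pair has the prior probabilities 1/4, 1/2, 1/4.
pi_prior = estimate_pi_hat(0.25, 0.5, 0.25)

# A hypothetical pair that almost certainly shares both alleles IBD.
pi_high = estimate_pi_hat(0.0, 0.1, 0.9)
```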

Figure 1

Path diagram for multivariate variance components QTL linkage analysis. The QTL is parameterized as a single latent variable that pleiotropically affects the phenotypes. Residual sources of variation are parameterized using Cholesky structures. Only additive genetic and unique environmental residual sources of variation are shown. When MZ twins are present, similar structures for the common environment/dominance may also be included in the model.

Trivariate linkage analyses of platelet count at age 12, 14 and 16 were conducted at each marker across the genome. For these longitudinal analyses we included the same fixed effects as in the univariate models above as well as age deviations at 14 and 16 years of age in the analysis.20

Two multivariate tests for linkage were performed. In the first (‘Multivariate Test I’), the factor loadings of the QTL on platelet count at 12, 14 and 16 were unconstrained. As the true values of some of these parameters under the null hypothesis of no linkage are located on the boundary of the parameter space defined by the alternative hypothesis, the likelihood ratio test statistic is distributed as a complicated mixture of $\chi^2$ distributions.26 We therefore generated the empirical distribution of the test statistic under the null hypothesis of no QTL effects on any trait, while assuming the same background genetic and environmental correlations as in the actual data (10 000 simulations). We then used this empirical distribution (mean $\mu=2.83$, variance $\sigma^2=6.50$) to evaluate the significance of Multivariate Test I.
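Evaluating an observed statistic against a simulated null distribution can be sketched as follows. This is only an illustration of the general idea: the function name is hypothetical, the add-one correction is a common convention that we assume here rather than a detail stated above, and the short list of null statistics is a toy stand-in for the 10 000 simulated values.

```python
def empirical_p_value(null_stats, observed):
    """Proportion of simulated null statistics at least as large as the observed one.

    Uses the (count + 1) / (n + 1) convention so that the estimate is never
    exactly zero; this convention is an assumption of the sketch.
    """
    exceed = sum(1 for stat in null_stats if stat >= observed)
    return (exceed + 1) / (len(null_stats) + 1)


# Toy null distribution (illustrative values only, not the simulated data).
toy_null = [0.0, 1.2, 2.5, 3.1, 7.9]
p_emp = empirical_p_value(toy_null, observed=3.0)
```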

In order to perform the second test (ie Multivariate Test II) we first tested whether we could equate the factor loadings of the three measurement occasions on the QTL (ie we calculated the difference in minus two log-likelihood between the full model and the model with the equated loadings and evaluated it against a $\chi^2$ distribution with two degrees of freedom). This was equivalent to testing whether the QTL was responsible for the same amount of phenotypic variation at each age. If this were the case, then the test for linkage was whether the (equated) loadings could then be set to zero. Since only one QTL variance component was estimated, the test statistic was distributed as in the univariate case (ie a 50:50 mixture of a point mass at zero and $\chi_1^2$). Note that this test is equivalent to taking the mean of the phenotypes across the three ages and performing a univariate test of linkage on this statistic.11

Results

The results from the genome-wide univariate and multivariate tests of linkage are displayed in Figures 2 and 3. Figure 2 shows the difference in $\chi^2$ at each marker along the genome, while Figure 3 displays the corresponding significance levels for these tests of linkage. The most prominent result was the linkage peak at ages 12 (LOD=2.12, p=0.000896), 14 (LOD=2.23, p=0.00068) and 16 (LOD=1.01, p=0.015563) in the same region of chromosome 19q. Other regions of suggestive linkage (ie LOD scores > 2.0) were also observed on chromosome 4 at age 16 (LOD=2.49, p=0.000357), on different regions of chromosome 5 at ages 12 (LOD=2.14, p=0.000852) and 16 (LOD=2.02, p=0.001151), on chromosome 10 at age 12 (LOD=2.1, p=0.000942), and on chromosome 15 at age 16 (LOD=2.44, p=0.000404). With the exception of one of the peaks on chromosome 5, which showed some evidence of linkage at the other two ages (12: LOD=0.94, p=0.018789; 14: LOD=0.89, p=0.021518), there was little evidence for linkage in similar regions at other ages.

Figure 2

Difference in minus two log-likelihood ($\chi^2$) for platelet count. Distance in cM is shown on the x-axis.

Figure 3

Significance levels for univariate and multivariate tests of genetic linkage for platelet count. Distance in cM is shown on the x-axis.

Comparison of the pink and dark blue lines with the others in Figure 2 shows that both multivariate tests were associated with an increase in $\chi^2$ across most regions of the genome. However, in the case of the first multivariate test, this increase in $\chi^2$ was also accompanied by an increase in the associated degrees of freedom. This resulted in a decreased level of significance across many areas of the genome, including the region on chromosome 19q (Figure 3). In contrast, the second multivariate test increased the power to detect linkage in several regions including chromosome 19q at the D19S220 (LOD=2.59, p=0.000277) and D19S420 markers (LOD=2.11, p=0.00092), as well as regions on chromosomes 2q, 10 and 15 (see Table 2 for a summary of regions of the genome with a LOD score in excess of one for the second multivariate analysis).

Table 2 Regions with LOD>1 for the second multivariate analysis of platelet count

Figure 4 displays the standardized path coefficients under the full multivariate model, under the model where the QTL factor loadings were equated, and under the model of no QTL effects. Results are shown for the D19S220 marker (ie the marker where evidence for linkage was maximal). Note in particular how the factor loadings for the QTL under the full multivariate model were similar at all ages, suggesting that the QTL was responsible for similar amounts of variation in the phenotype at all ages. Consistent with this interpretation, the difference in $\chi^2$ between the full multivariate model and the model where the QTL factor loadings were equated was not significant ($\chi_2^2=2.19$, p=0.335).
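The test of equal QTL factor loadings can be checked directly from the minus two log-likelihood values given in the Figure 4 caption. The sketch below uses the closed form for the upper tail of a chi-square distribution with two degrees of freedom, exp(−x/2); the function name is hypothetical. The result should agree with the reported statistic of 2.19 and P-value of 0.335.

```python
import math


def chi2_sf_2df(x):
    """Upper-tail probability of a chi-square distribution with 2 df."""
    if x <= 0.0:
        return 1.0
    return math.exp(-x / 2.0)


# -2LL values taken from the Figure 4 caption (models a and b).
chi2_equated = 2965.777 - 2963.589
p_equated = chi2_sf_2df(chi2_equated)
```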

Figure 4

Standardized parameter estimates for the multivariate analysis of platelet count at the D19S220 marker. Results are shown for the full multivariate model: (a) –2LL=2963.589 df=2761; the model where the QTL factor loadings are equated: (b) –2LL=2965.777 df=2763; and the model where the QTL effect has been dropped (c) –2LL=2977.704 df=2764.

Discussion

To our knowledge, this is the first study to have used a complete genome scan to attempt to map genes responsible for variation in platelet count. A major strength of this study has been the availability of multiple measurements on the same individuals across time. Not only has this permitted multivariate QTL linkage analysis,9, 10, 11 but has also provided supporting evidence for linkage when a signal has been present in the same region at multiple ages. While these common regions of linkage do not formally constitute replication (since the longitudinal data are not independent), they do indicate the extent to which the present results are robust with respect to measurement error and temporal changes in the phenotype.

The most striking results from this study were the linkage peaks in the same region of chromosome 19q which were present at all three ages. The evidence for linkage in this region was greatest at ages 12 and 14, possibly reflecting the smaller number of DZ pairs from whom platelet count was available at age 16 (see Table 1). In addition, multivariate analyses (ie Multivariate Test II) increased the evidence for linkage at some of the markers in this region, increasing support for the existence of a QTL. The most promising candidate in this region is the gene for glycoprotein VI. Glycoprotein VI is a membrane-bound receptor that plays an important role in platelet aggregation in response to collagen.33 It has been demonstrated that polymorphism in this gene is associated with differences in collagen-induced platelet aggregation.34 Very recently, Nurden et al (2004) reported a patient with symptoms of gray platelet syndrome (ie a low platelet count and absence of α-granules) accompanied by a severe deficiency in glycoprotein VI.35 These and similar observations have led to speculation that polymorphism in the gene for glycoprotein VI may be a risk factor in the development of arterial thrombosis.36 Our results suggest that glycoprotein VI or a gene close by may be responsible for nonpathological variation in platelet count.

Several other regions of ‘suggestive’ linkage (defined as LOD>2.0) were identified in this study, including a peak on chromosome 4 at age 16, two peaks on chromosome 5 at ages 12 and 16, respectively, a peak on chromosome 10 at age 12, and a peak on chromosome 15 at age 16. Curiously, in most cases there was little evidence of linkage in these regions at other ages (the exception was one of the peaks on chromosome 5). One possibility is that different genes may regulate platelet count at different ages. However, we consider this explanation unlikely given that the majority of the genetic variance was transmitted through the time series, and that the role of new genetic innovation was minimal (fitting a time series model to the data did not change interpretation of the results — data not shown).31 More likely is that some of these peaks represent stochastic variation associated with the sampling of a complex phenotype. That is, while some of these regions will harbor QTLs, other peaks will simply be a result of random fluctuation and type I error. While multivariate analysis can go some way to addressing these questions (see below), the key to assessing the significance of these linkages will be replication using independent data sets.

Since this is the first study of its kind, there are no previous linkage studies with which to compare the location of the peaks. However, it is interesting to note that there was little evidence for linkage in the chromosomal regions containing the genes for thrombopoietin (3q27) and its receptor (1p34). While mutations in the genes for thrombopoietin and the thrombopoietin receptor are known to produce clinical pathology (eg Congenital Amegakaryocytic Thrombocytopenia, Essential Thrombocythemia), the present results suggest that other regions of the genome may be more important in determining variation in platelet count amongst healthy adolescent twins.37, 38

In most positions along the genome the multivariate analyses (ie Multivariate test I and Multivariate test II) did not increase the power to detect linkage relative to the univariate results. This was perhaps surprising given that several theoretical papers have documented a clear advantage of multivariate approaches in the power to detect QTLs.7, 8, 11 There are several possible reasons for this discrepancy. First, we note that since there are three phenotypes (ie platelet count at 12, 14 and 16), strictly speaking each univariate test should be evaluated with a higher threshold of significance than if a single trait were analyzed. Second, any increase in power from a multivariate analysis will tend to be small when the traits are highly correlated.39 Third, univariate analyses are more susceptible to random variation. In other words, it is not uncommon to see more similar trait values in individuals who share two alleles IBD at the marker locus than individuals who share zero marker alleles IBD as a result of random variation in the bivariate distribution of phenotypes. In contrast, multivariate analyses are less susceptible to this randomness because additional information is incorporated into the test through the trait covariances. In other words, to increase the evidence of linkage, any random variation would have to induce an appropriate pattern of IBD sharing not only in the cross-sibling covariance, but also in the cross-trait cross-sibling covariances as well. It is for this reason that the multivariate result should be a more precise reflection of the true effect due to the QTL.10

However, a complicating factor in the interpretation of the multivariate analyses is the large amount of missing data in the study. Multivariate approaches provide increased power to detect linkage because they capture additional information on the QTL from the cross-time covariances as well as the cross-sibling cross-time covariances. Since a substantial proportion of sib-pairs were measured on only one occasion (see Table 1), information from the cross-time covariances was not always available. Since the degrees of freedom associated with the first multivariate test increased with the number of traits in the analysis, this may have resulted in a decrease in the power to detect linkage. In this regard, we note that the significance of the first multivariate test was evaluated against an empirical distribution of $\chi^2$ scores rather than against an asymptotic distribution. This was because the asymptotic distribution of the likelihood ratio statistic for multivariate tests of linkage is complicated and has yet to be well characterized.10, 30 We suggest that there is an urgent need to characterize the asymptotic distribution associated with these multivariate tests of linkage so that the full power associated with these approaches can be realized.

In contrast, the test statistic for Multivariate test II is distributed exactly as in the univariate case (ie a 50:50 mixture of a point mass at zero and $\chi_1^2$). This test is a natural way of increasing the power to detect QTLs, and is particularly appropriate when analyzing repeated measures data (so long as the QTL affects the measurements pleiotropically). The test provided increased evidence of linkage at several regions across the genome including areas on chromosomes 2, 10 and 15, but most importantly at two markers on chromosome 19q13.13–19q13.31. We also note that another group has recently demonstrated suggestive evidence of linkage to platelet count in a homologous region of the mouse genome.40

In conclusion, we have identified an area on chromosome 19q that shows suggestive linkage to platelet count at 12, 14 and 16 years of age. Multivariate analyses provided a small increase in the power to detect linkage at two of the markers in this region. We intend to fine map this location (ie increasing marker density) and perform association analyses involving the gene for glycoprotein VI to better localize the area involved. In addition, the univariate and multivariate analyses revealed several other regions of interest on chromosomes 2, 4, 5, 8, 10 and 15. We hope that this study represents the first step in the eventual identification of genes that will not only increase our understanding of platelet biology but also suggest novel therapeutic possibilities in the treatment of thrombocytopenic states.