Introduction

In bone marrow transplantation, the major histocompatibility complex antigens, or human leukocyte antigens (HLA), need to be compatible between the donor and recipient to avoid the development of acute graft-versus-host disease (GVHD). However, GVHD frequently occurs after bone marrow transplantation even in HLA-matched donor-recipient pairs. Incompatibility of minor histocompatibility antigens (mHas) is a major cause of acute GVHD in such pairs. If compatibility at all the mHas were confirmed prior to bone marrow transplantation, acute GVHD could be avoided effectively. Therefore, as many mHas as possible need to be identified. To date, several mHas have been detected by in vitro experiments (Goulmy et al. 1983; den Haan et al. 1998). Although in vitro studies are essential for characterizing mHas, they are laborious and expensive as a means of screening for mHa genes.

A linkage analysis for mapping an mHa gene has been proposed by Lunetta and Rogus (1998). However, a subsequent association analysis is still necessary to finally identify the mHa gene. In addition, the recent identification of a large number of single nucleotide polymorphisms (SNPs) in the human genome now enables us to perform a genome-wide association analysis for detecting mHa genes directly or indirectly. Furthermore, bone marrow transplantation is performed in unrelated pairs as well as in related pairs, and a linkage analysis cannot be applied to the former. Thus, an association test needs to be developed for detecting mHa genes.

Association studies using a candidate gene approach have been conducted for mHa genes (Behar et al. 1996; Nichols et al. 1996; Maruya et al. 1998), but the statistical methods used differ among studies, and a suitable method has not yet been established. In some studies, the proportion of mismatches at a candidate mHa locus is compared between donor-recipient pairs with GVHD and those without GVHD. However, considering the etiology of GVHD caused by mHas, we should assess compatibility (described later in detail), rather than mismatch, at the mHa locus.

Bone marrow transplantation is generally performed either in sib pairs or in unrelated pairs, and an association test can be performed for both types of pairs. The statistical power for the former is expected to differ from that for the latter. However, it is unclear which type of pair should be used for the association test in terms of statistical power. To detect mHa genes efficiently, it is necessary to perform a suitable statistical test and to evaluate the statistical power prior to the study. The present study provides a guideline for designing an efficient association study of mHa genes.

Model

Probability of compatibility

The mHas are small self-peptides derived from intracellular proteins and presented by HLA class I and class II molecules to T cell receptors (Wallny and Rammensee 1990; Simpson and Roopenian 1997). Thus, all genes encoding self-peptides can potentially produce mHas. Since the coding region of each self-peptide or potential mHa gene is short (den Haan et al. 1998), point mutations causing an amino acid substitution are unlikely to occur repeatedly within the same fragment. Thus, we assume that there are at most two alleles at each mHa locus. In addition, we discuss multiple mHa loci in this paper because several mHas have already been detected. These assumptions differ from that of Lunetta and Rogus (1998), who assumed a single mHa locus with multiple alleles.

Different peptides are bound by different HLA molecules, implying that the causative mHas differ among donor-recipient pairs with different HLA molecules. To detect the mHa genes, association analysis as well as linkage analysis must be performed for HLA-matched donor-recipient pairs sharing the same HLA alleles. In addition, pairs with the same affected sites of GVHD (e.g., skin, gut, and liver) may be selected as case pairs, because the tissues in which mHas are expressed differ among mHas. Throughout this paper, we consider association analysis performed for such pairs.

We assume that there are m mHa loci in a population and that there are two alleles, Ai and ai, at the ith (i = 1, ..., m) mHa locus. All mHa loci are assumed not to be linked to HLA loci (Schreuder et al. 1993) and not to be linked to each other. In bone marrow transplantation, when a recipient has an mHa allele not possessed by the donor (“incompatibility”), GVHD may develop. On the other hand, when a recipient has no mHa allele that is absent in the donor (“compatibility”), GVHD never develops. The definition of incompatibility and compatibility in terms of genotype combinations is presented in Table 1.

Table 1. Definition of compatibility (C) and incompatibility (IC) in a donor-recipient pair

Let p i and q i (= 1 − p i) be the population frequencies of Ai and ai, respectively, where Ai is assumed to be the minor allele (i.e., 0 < p i ≤ 0.5). Assuming that Hardy-Weinberg equilibrium holds at each mHa locus in the population and that donor-recipient pairs are randomly sampled, the probability of an unrelated pair (referred to as u) being compatible at the ith mHa locus, Pr(Ci|u), and that of a sib pair (referred to as s), Pr(Ci|s), are given as follows (Lunetta and Rogus 1998):

$$ \Pr {\left( {{\text{C}}_{i} \left| {\text{u}} \right.} \right)} = p^{4}_{i} + q^{4}_{i} + 2p_{i} q_{i} $$

and

$$ \Pr {\left( {{\text{C}}_{i} \left| {\text{s}} \right.} \right)} = \frac{1} {4}{\left( {p^{4}_{i} + q^{4}_{i} + 2p^{3}_{i} + 2q^{3}_{i} + 6p_{i} q_{i} + 1} \right)}. $$

Since 0 < p i ≤ 0.5, it follows that 0.625 ≤ Pr(Ci|u) < 1 and 0.78125 ≤ Pr(Ci|s) < 1. In addition, Pr(Ci|u) is always smaller than Pr(Ci|s) for the same p i.
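The closed forms above can be checked by direct enumeration of genotype pairs. The following is a minimal sketch, assuming Hardy-Weinberg proportions and the compatibility rule stated in the text (the recipient carries no allele that the donor lacks); the function names are illustrative and not part of the original work.

```python
from itertools import product

GENOTYPES = ("AA", "Aa", "aa")


def genotype_freqs(p):
    """Hardy-Weinberg genotype frequencies for minor-allele frequency p."""
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


def is_compatible(donor, recipient):
    """Compatible when the recipient carries no allele that the donor lacks."""
    return set(recipient) <= set(donor)


def pr_c_unrelated_enum(p):
    """Pr(C|u) by summing over independent donor and recipient genotypes."""
    f = genotype_freqs(p)
    return sum(
        f[d] * f[r]
        for d, r in product(GENOTYPES, GENOTYPES)
        if is_compatible(d, r)
    )


def pr_c_unrelated(p):
    """Closed form for Pr(C|u) as given in the text."""
    q = 1.0 - p
    return p**4 + q**4 + 2 * p * q


def pr_c_sib(p):
    """Closed form for Pr(C|s) as given in the text."""
    q = 1.0 - p
    return 0.25 * (p**4 + q**4 + 2 * p**3 + 2 * q**3 + 6 * p * q + 1)
```

For any p in (0, 0.5], `pr_c_unrelated_enum(p)` and `pr_c_unrelated(p)` should agree up to floating-point error, and `pr_c_sib(p)` should exceed `pr_c_unrelated(p)`.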

GVHD Model

We assume that the occurrence of GVHD is influenced by the number of mHa loci being incompatible in a donor-recipient pair. That is, a larger number of incompatible loci in a donor-recipient pair are more likely to cause acute GVHD. Furthermore, the effect of each mHa locus on the development of GVHD is assumed to be independent. The probability that GVHD is caused by incompatibility at ith mHa locus is denoted by r i. In this model, r i is not influenced by either compatibilities or incompatibilities at other mHa loci. Under this assumption, the probability that GVHD develops in an unrelated pair, K, and that in a sib pair, K S, are given by

$$ K = 1 - \prod\limits_{i = 1}^{m} \left[ 1 - \left( 1 - \Pr(\text{C}_{i} \mid \text{u}) \right) r_{i} \right] $$
(1)

and

$$ K_{\text{S}} = 1 - \prod\limits_{i = 1}^{m} \left[ 1 - \left( 1 - \Pr(\text{C}_{i} \mid \text{s}) \right) r_{i} \right] $$
(2)

respectively.
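As an illustration of equations (1) and (2), the sketch below evaluates K and K S for a given set of loci. It assumes the compatibility probabilities defined earlier; the function names and the example values are hypothetical.

```python
from math import prod


def pr_c_unrelated(p):
    """Pr(C|u) for minor-allele frequency p."""
    q = 1.0 - p
    return p**4 + q**4 + 2 * p * q


def pr_c_sib(p):
    """Pr(C|s) for minor-allele frequency p."""
    q = 1.0 - p
    return 0.25 * (p**4 + q**4 + 2 * p**3 + 2 * q**3 + 6 * p * q + 1)


def gvhd_probability(ps, rs, pr_c):
    """Eq. (1) or (2): 1 minus the product over loci of [1 - (1 - Pr(C_i)) r_i]."""
    return 1.0 - prod(1.0 - (1.0 - pr_c(p)) * r for p, r in zip(ps, rs))


# Hypothetical example: three loci with different allele frequencies and effects.
ps = [0.1, 0.3, 0.5]
rs = [0.8, 0.5, 0.2]
K = gvhd_probability(ps, rs, pr_c_unrelated)    # unrelated pairs
K_S = gvhd_probability(ps, rs, pr_c_sib)        # sib pairs
```

Because Pr(Ci|s) exceeds Pr(Ci|u) at every locus, K S computed this way is always smaller than K for the same loci.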

If K and K S are known, p and r may be inferred from equations (1) and (2). When we assume that all the m mHa loci have the same minor allele frequency (i.e., p 1 = ... = p i = ... = p m = p) and that the probability of GVHD developing is the same at all the mHa loci (i.e., r 1 = ... = r i = ... = r m = r), we may obtain p and r that satisfy equations (1) and (2), although whether this is possible depends on the values of K and K S.
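One way to search for such p and r numerically is sketched below. This is not the authors' procedure: it assumes that m is fixed in advance, eliminates r using equation (1), and then solves equation (2) for p by bisection. The function names are illustrative.

```python
def incompat_unrelated(p):
    """1 - Pr(C|u) for minor-allele frequency p."""
    q = 1.0 - p
    return 1.0 - (p**4 + q**4 + 2 * p * q)


def incompat_sib(p):
    """1 - Pr(C|s) for minor-allele frequency p."""
    q = 1.0 - p
    return 1.0 - 0.25 * (p**4 + q**4 + 2 * p**3 + 2 * q**3 + 6 * p * q + 1)


def solve_p_r(K, K_S, m, iterations=200):
    """Return (p, r) satisfying eqs. (1) and (2) with equal loci, or None."""
    # Eq. (1) with equal loci: 1 - K = (1 - a_u(p) * r)**m, so a_u(p) * r = c.
    c = 1.0 - (1.0 - K) ** (1.0 / m)

    def k_s_model(p):
        # Eq. (2) after substituting r = c / a_u(p).
        ratio = incompat_sib(p) / incompat_unrelated(p)
        return 1.0 - (1.0 - ratio * c) ** m

    lo, hi = 1e-6, 0.5
    if not (k_s_model(lo) <= K_S <= k_s_model(hi)):
        return None  # K_S lies outside the range reachable for this K and m
    for _ in range(iterations):  # bisection; k_s_model increases with p on (0, 0.5]
        mid = 0.5 * (lo + hi)
        if k_s_model(mid) < K_S:
            lo = mid
        else:
            hi = mid
    p = 0.5 * (lo + hi)
    r = c / incompat_unrelated(p)
    return (p, r) if r <= 1.0 else None
```

A return value of `None` corresponds to the situation noted in the text in which no admissible pair (p, r) exists for the given K and K S.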

Association test

Test statistic

In an association analysis of the mHa gene, the proportion of pairs incompatible at a candidate mHa locus or marker locus is compared between pairs in which GVHD develops in the recipient (GVHD+) and those in which GVHD does not develop (GVHD-). When an mHa locus, or a marker locus in linkage disequilibrium with the mHa locus, is examined, GVHD+ pairs are expected to be incompatible more frequently than GVHD- pairs. A comparison of the two proportions is a statistical test suitable for this purpose. In short, the null hypothesis H 0 of no association between the candidate locus and GVHD is examined using the test statistic:

$$ Z = \frac{{\hat{q}_{ + } - \hat{q}_{ - } }} {{{\sqrt {\hat{q}{\left( {1 - \hat{q}} \right)}{\left( {\frac{1} {{N_{ + } }} + \frac{1} {{N_{ - } }}} \right)}} }}} $$

since the difference in the two proportions can be assumed to follow a normal distribution with a mean of 0 and a standard deviation of \( {\sqrt {\hat{q}{\left( {1 - \hat{q}} \right)}{\left( {\frac{1} {{N_{ + } }} + \frac{1} {{N_{ - } }}} \right)}} } \) under H 0. Here, N + and N - are the numbers of GVHD+ and GVHD- pairs, and \( \hat{q}_{ + } \) and \( \hat{q}_{ - } \) represent the sample frequencies of pairs incompatible at the candidate or marker locus in GVHD+ and GVHD- pairs, respectively. \( \hat{q} \) is given by \( \frac{{N_{ + } \hat{q}_{ + } + N_{ - } \hat{q}_{ - } }} {{N_{ + } + N_{ - } }} \). It should be noted that this test is one-sided (i.e., Z = 1.64 corresponds to α = 0.05), because the alternative hypothesis H 1 is q + − q - > 0.
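The test statistic can be computed from the observed counts as in the following sketch. The function name, argument names, and the example counts are hypothetical and serve only to illustrate the formula for Z.

```python
from math import sqrt


def z_statistic(n_pos, ic_pos, n_neg, ic_neg):
    """Z for comparing the proportions of incompatible pairs.

    n_pos, n_neg: numbers of GVHD+ and GVHD- pairs (N+ and N-).
    ic_pos, ic_neg: numbers of incompatible pairs within each group.
    """
    q_pos = ic_pos / n_pos
    q_neg = ic_neg / n_neg
    q_pool = (ic_pos + ic_neg) / (n_pos + n_neg)  # pooled estimate under H0
    se = sqrt(q_pool * (1.0 - q_pool) * (1.0 / n_pos + 1.0 / n_neg))
    if se == 0.0:
        raise ValueError("pooled proportion is 0 or 1; Z is undefined")
    return (q_pos - q_neg) / se


# Hypothetical counts: 40 of 100 GVHD+ pairs and 25 of 100 GVHD- pairs are incompatible.
z = z_statistic(100, 40, 100, 25)
reject_h0 = z > 1.64  # one-sided test at alpha = 0.05
```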

Power of association test

Under the alternative hypothesis H 1 of an association between the ith mHa locus and GVHD, the expected frequencies of incompatible pairs at the ith mHa locus are given, by Bayes' theorem, as:

$$ \begin{array}{ll} q_{+} & = \frac{\left( 1 - \Pr(\text{C}_{i} \mid \text{u}) \right) r_{i} + \left( 1 - \Pr(\text{C}_{i} \mid \text{u}) \right)(1 - r_{i})\left( 1 - \frac{\prod\limits_{j = 1}^{m} \left[ 1 - \left( 1 - \Pr(\text{C}_{j} \mid \text{u}) \right) r_{j} \right]}{1 - \left( 1 - \Pr(\text{C}_{i} \mid \text{u}) \right) r_{i}} \right)}{1 - \prod\limits_{j = 1}^{m} \left[ 1 - \left( 1 - \Pr(\text{C}_{j} \mid \text{u}) \right) r_{j} \right]} \\ & = \frac{\left( 1 - \Pr(\text{C}_{i} \mid \text{u}) \right) r_{i} + \left( 1 - \Pr(\text{C}_{i} \mid \text{u}) \right)(1 - r_{i})\left( 1 - \frac{1 - K}{1 - \left( 1 - \Pr(\text{C}_{i} \mid \text{u}) \right) r_{i}} \right)}{K} \end{array} $$

in GVHD+ unrelated pairs and

$$ q_{ - } = \frac{{{\left( {1 - \Pr ({\text{C}}_{i} \left| {{\text{u}})} \right.} \right)}(1 - r_{i} )}} {{1 - {\left( {1 - \Pr ({\text{C}}_{i} \left| {{\text{u}})} \right.} \right)}r_{i} }} $$

in GVHD- unrelated pairs. Replacing Pr(Ci|u) and K by Pr(Ci|s) and K S in the above equations, we obtain those for sib pairs.
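The two expected frequencies can be evaluated as in the sketch below, in which `a` stands for the probability of incompatibility at the ith locus, 1 − Pr(Ci|u) for unrelated pairs or 1 − Pr(Ci|s) for sib pairs. The names are illustrative. A useful consistency check, which follows from the law of total probability, is that K q + + (1 − K) q - equals `a`.

```python
def q_plus(a, r, K):
    """Expected frequency of incompatible pairs among GVHD+ pairs.

    a: probability of incompatibility at the locus, 1 - Pr(C_i | pair type).
    r: probability that incompatibility at the locus causes GVHD (r_i).
    K: probability of GVHD in the pair type (K or K_S); requires a * r <= K.
    """
    if a * r > K:
        raise ValueError("a * r must not exceed K")
    other_loci = 1.0 - (1.0 - K) / (1.0 - a * r)  # GVHD caused by another locus
    return (a * r + a * (1.0 - r) * other_loci) / K


def q_minus(a, r):
    """Expected frequency of incompatible pairs among GVHD- pairs."""
    return a * (1.0 - r) / (1.0 - a * r)


# Hypothetical example: unrelated pairs, p_i = 0.1, r_i = 0.5, K = 0.5.
p = 0.1
a_u = 1.0 - (p**4 + (1.0 - p) ** 4 + 2 * p * (1.0 - p))
qp = q_plus(a_u, 0.5, 0.5)
qm = q_minus(a_u, 0.5)
```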

Assuming that Z is approximately normally distributed under both H 0 and H 1, the power, 1 - β, at a significance level of α can be calculated by

$$ 1 - \Phi {\left( {z_{{1 - \beta }} } \right)} = 1 - {\int\limits_{ - \infty }^{z_{{1 - \beta }} } {\frac{1} {{{\sqrt {2\pi } }}}e^{{ - \frac{{x^{2} }} {2}}} dx} } $$

where \( z_{{1 - \beta }} = \frac{{{\sqrt {q_{0} (1 - q_{0} ){\left( {\frac{1} {{N_{ + } }} + \frac{1} {{N_{ - } }}} \right)}} }z_{\alpha } - (q_{ + } - q_{ - } )}} {{{\sqrt {\frac{{q_{ + } (1 - q_{ + } )}} {{N_{ + } }} + \frac{{q_{ - } (1 - q_{ - } )}} {{N_{ - } }}} }}} \) and \( q_{0} = \frac{{N_{ + } q_{ + } + N_{ - } q_{ - } }} {{N_{ + } + N_{ - } }} \). Similar formulas have been applied to calculate the power of case-control association analyses of complex disease genes (Ohashi et al. 2001; Ohashi and Tokunaga 2002), although a more accurate method for the calculation of power has been developed (Jackson et al. 2002). To obtain the power for a study using the above formula, q +, q -, N +, N -, and z α must be predetermined. For the determination of q + and q -, we need to consider p i, r i, and K for unrelated pairs (K S for sib pairs). We should note that (1−Pr(Ci|u)) r i ≤ K must be satisfied for unrelated pairs (and (1−Pr(Ci|s)) r i ≤ K S for sib pairs). Figures 1 and 2 show the power of the association test for the ith mHa gene for unrelated pairs and for sib pairs, respectively.
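The power formula can be evaluated with the standard library alone, as in the following sketch. The function names and the example inputs are hypothetical; the normal distribution function Φ is obtained from the error function.

```python
from math import erf, sqrt


def normal_cdf(x):
    """Standard normal distribution function, Phi(x)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def association_power(q_pos, q_neg, n_pos, n_neg, z_alpha=1.64):
    """Power of the one-sided two-proportion test under the normal approximation.

    q_pos, q_neg: expected frequencies of incompatible pairs (q+ and q-).
    n_pos, n_neg: numbers of GVHD+ and GVHD- pairs (N+ and N-).
    """
    q0 = (n_pos * q_pos + n_neg * q_neg) / (n_pos + n_neg)
    se_null = sqrt(q0 * (1.0 - q0) * (1.0 / n_pos + 1.0 / n_neg))
    se_alt = sqrt(q_pos * (1.0 - q_pos) / n_pos + q_neg * (1.0 - q_neg) / n_neg)
    z = (se_null * z_alpha - (q_pos - q_neg)) / se_alt
    return 1.0 - normal_cdf(z)


# Hypothetical example: q+ = 0.40, q- = 0.25, with N = N+ = N-.
power_100 = association_power(0.40, 0.25, 100, 100)
power_300 = association_power(0.40, 0.25, 300, 300)
```

When q + equals q -, the function returns approximately α, as expected under H 0, and the power grows with N whenever q + exceeds q -.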

Fig. 1. Power of association test for unrelated pairs as a function of N (N = N + = N -). z α is set to 1.64 (corresponding to α = 0.05). Combinations of r i and p i are given as follows: a (r i = 1 and p i = 0.01), b (r i = 1 and p i = 0.1), c (r i = 1 and p i = 0.5), d (r i = 0.5 and p i = 0.01), e (r i = 0.5 and p i = 0.1), and f (r i = 0.5 and p i = 0.5). In the case of K = 0.2, the combination of r i = 1 and p i = 0.5 is not achieved.

Fig. 2. Power of association test for sib pairs as a function of N (N = N + = N -). z α is set to 1.64 (corresponding to α = 0.05). Combinations of r i and p i are given as follows: a (r i = 1 and p i = 0.01), b (r i = 1 and p i = 0.1), c (r i = 1 and p i = 0.5), d (r i = 0.5 and p i = 0.01), e (r i = 0.5 and p i = 0.1), and f (r i = 0.5 and p i = 0.5). In the case of K S = 0.2, the combination of r i = 1 and p i = 0.5 is not achieved.

Throughout this paper, it is assumed that N = N + = N - and z α = 1.64 (corresponding to α = 0.05). In both figures, as p i and r i increase, the power increases. In contrast, as K decreases, the power increases because the contribution of the ith mHa gene to the occurrence of GVHD is increased in the present model. Since K is considered to be in the range of 0.50–0.80 and K S in the range of 0.16–0.50 (Hägglund et al. 1995; Kernan and Dupont 1996; Michallet et al. 1996), we may say that, even with a small sample size, the present association test can detect an mHa gene with large p and r values.

Based on the above calculations, it is possible to determine which pairs (i.e., unrelated or sib) should be analyzed in a population. Here we present an application of the present method to the Japanese population, where K and K S were reported as 0.652 and 0.495 in the Annual Report of Nationwide Survey 2001 by The Japan Society for Hematopoietic Cell Transplantation (acute GVHD of grades 1–4 is regarded as GVHD in this paper), although it should be noted that these values were obtained regardless of the HLA types of the donor-recipient pairs.

Figure 3 shows the power of the association test for the Japanese population. From this figure, we can see that the power for unrelated pairs is higher than that for sib pairs when an mHa with a high r value is analyzed. Although there are cases in which the association test for sib pairs shows a higher power than that for unrelated pairs (e.g., r = 0.2), it is preferable to collect unrelated pairs rather than sib pairs in the Japanese population for detecting an mHa gene with a large contribution (i.e., high r) to the development of GVHD. As shown here, when K and K S are known, the present method allows us to select suitable pairs as subjects of the study.

Fig. 3. Comparison of power between unrelated pairs and sib pairs in the Japanese population, where K and K S are 0.652 and 0.495, respectively. Power is represented as a function of N (N = N + = N -). z α is set to 1.64 (corresponding to α = 0.05). Curves are given as follows: a (p i = 0.01 for unrelated pairs), b (p i = 0.01 for sib pairs), c (p i = 0.1 for unrelated pairs), d (p i = 0.1 for sib pairs), e (p i = 0.5 for unrelated pairs), and f (p i = 0.5 for sib pairs). In the case of r i = 0.5, curves c and f are generally identical.

Although only unrelated pairs and sib pairs have been considered in this study, an association test can also be performed for other relative pairs, such as parent/offspring pairs. The power can likewise be estimated based on the probability of being compatible at the mHa locus in a relative pair. To calculate the probability of compatibility, it is useful to consider the probability that a relative pair shares i (i = 0, 1, 2) alleles identical by descent, as presented by Lunetta and Rogus (1998).
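A minimal sketch of this idea for a biallelic locus follows. The conditional compatibility probability for pairs sharing one allele identical by descent, 1 − pq, is derived here from the compatibility definition and is not taken from the original text; for sib pairs, with sharing probabilities (1/4, 1/2, 1/4), the result reproduces Pr(Ci|s). The function names are illustrative.

```python
def pr_c_relative(p, k0, k1, k2):
    """Pr(C) for a relative pair sharing 0, 1, 2 alleles IBD with prob. k0, k1, k2."""
    q = 1.0 - p
    c0 = p**4 + q**4 + 2 * p * q  # no allele shared IBD: same as unrelated pairs
    c1 = 1.0 - p * q              # one allele shared IBD (derived here; an assumption)
    c2 = 1.0                      # identical genotypes: always compatible
    return k0 * c0 + k1 * c1 + k2 * c2


def pr_c_sib(p):
    """Closed form for Pr(C|s) as given in the text."""
    q = 1.0 - p
    return 0.25 * (p**4 + q**4 + 2 * p**3 + 2 * q**3 + 6 * p * q + 1)


# IBD sharing probabilities for common pair types.
SIB = (0.25, 0.5, 0.25)
PARENT_OFFSPRING = (0.0, 1.0, 0.0)
UNRELATED = (1.0, 0.0, 0.0)

pr_c_parent_offspring = pr_c_relative(0.1, *PARENT_OFFSPRING)
```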

Discussion

We introduced a comparison of two proportions as an association test of the mHa gene for transplants in HLA-matched pairs and examined its statistical power. Our results suggest that the association test for an mHa locus with large values of p and r shows a high power. Once such an mHa is detected, the development of GVHD can be avoided effectively in the population because its contribution to the occurrence of GVHD is large. Furthermore, if mHa loci that have already been detected are taken into account in the association test (i.e., when only donor-recipient pairs that are compatible at such mHa loci are analyzed), the power of the test for a novel mHa gene is expected to increase.

However, we should note a disadvantage of association studies of mHa genes based on a candidate gene approach. Because any SNP with an amino acid alteration could be a candidate, it may be difficult to choose a true mHa gene as a candidate. Therefore, genome-wide linkage analysis is still an attractive method for detecting an mHa locus. The power of genome-wide linkage analysis (discordant sib pair analysis) under the present GVHD model will be considered in more detail elsewhere.

The development of GVHD due to incompatibilities of mHa genes is largely influenced by allele frequencies at mHa loci. Allele frequencies at mHa genes are likely to differ among populations, and HLA alleles are known to differ among populations, suggesting that the causative mHa genes also differ among populations. It is, therefore, essential to detect mHa genes in each population.

In the present study, linkage disequilibrium testing of the mHa gene using polymorphic markers has not been described. For biallelic and multiallelic markers such as SNPs and microsatellites, additionally specifying the allele frequency at the marker and the degree of linkage disequilibrium allows us to estimate the power of the association test based on conditional probabilities, as described elsewhere (Ohashi and Tokunaga 2001, 2002, 2003 in press). As the difference in allele frequency between the mHa gene and the marker increases, the power of the study is markedly reduced (data not shown). To date, a large number of SNPs with amino acid alterations have been detected in the human genome. Thus, in genome-wide association analysis using SNPs, we should examine SNPs with amino acid alterations rather than marker SNPs in intergenic regions.

A linkage analysis proposed by Lunetta and Rogus (1998) requires only GVHD+ sib pairs, while both GVHD+ and GVHD- pairs are necessary for the present association analysis. If GVHD- sib pairs are available, a linkage analysis for affected sib pairs (Risch 1990) can be performed for GVHD- pairs to search genomic regions frequently shared by GVHD- sib pairs (Ohashi et al. 2002). Such an analysis could be powerful, especially for the mHa gene with a low r value. Because both GVHD+ and GVHD- sib pairs can be used for both linkage and association analyses, it may be effective to collect both pairs for detecting the mHa genes.

In this study, the severity of GVHD was not considered, whereas in practice the severity of GVHD is classified into several grades. If incompatibility at a particular mHa locus is associated with the severity of GVHD, it is appropriate to analyze only GVHD+ pairs of the same grade. If many GVHD+ pairs are available, an association test should be performed for each grade.