## Abstract

Correlated multiple testing is widely performed in genetic research, particularly in multilocus analyses of complex diseases. Failure to control appropriately for the effect of multiple testing will either result in a flood of false-positive claims or in true hits being overlooked. Cheverud proposed the idea of adjusting correlated tests as if they were independent, according to an â€˜effective numberâ€™ (*M*_{eff}) of independent tests. However, our experience has indicated that Cheverud's estimate of the *M*_{eff} is overly large and will lead to excessively conservative results. We propose a more accurate estimate of the *M*_{eff}, and design *M*_{eff}-based procedures to control the experiment-wise significant level and the false discovery rate. In an evaluation, based on both real and simulated data, the *M*_{eff}-based procedures were able to control the error rate accurately and consequently resulted in a power increase, especially in multilocus analyses. The results confirm that the *M*_{eff} is a useful concept in the error-rate control of correlated tests. With its efficiency and accuracy, the *M*_{eff} method provides an alternative to computationally intensive methods such as the permutation test.

### Similar content being viewed by others

## Introduction

Complex diseases such as diabetes, schizophrenia and cancer lay a heavy burden on public health, with frequencies over 1%. As they are caused by interactions among genes, an increasing number of studies (Hoh et al, 2001; Nelson et al, 2001; Ritchie et al, 2001; Cordell and Clayton, 2002) have employed multilocus methods to detect the causative genes. Despite the abundance of such studies, only 16â€“30% of the reports of significant associations have been consistently replicated (Page et al, 2003). One reason for the low replication rate is the failure in controlling for the effect of multiple testing (Lander and Kruglyak, 1995).

Multilocus methods search loci in combinations, which inevitably tests a great number of hypotheses simultaneously: a *k*-locus analysis of *n* loci would have *C*_{n}^{k} locus-combinations to inspect. In addition to *C*_{n}^{k} being a large and cumbersome number, the tests are correlated with each other. For example, locus-combination *AB* is obviously correlated with combination *BC* for they share a mutual locus *B*. Even when loci are completely independent, their combinations are still correlated. These characteristics complicate the control of the error rate in multilocus analyses. Appropriate adjustment for multiple testing is very important to prevent a flood of false-positive claims and to prevent true associations being overlooked.

A traditional criterion for multiple testing is the experiment-wise significant level (EWSL, *Î±*_{e}), which is the probability that one or more hypotheses would be falsely rejected. To control the EWSL, the corresponding point-wise significant level (PWSL, *Î±*_{p}), which is the probability that an individual hypothesis would be falsely rejected, should be calculated appropriately. For independent tests, the Bonferroni (or Sidak) correction provides a simple and accurate control: *Î±*_{p}=1âˆ’(1âˆ’*Î±*_{e})^{1/M}â‰ˆ*Î±*_{e}/*M*, where *M* is the number of tests. However, this adjustment is overly conservative for correlated tests. Lander and Botstein (1989) and Lander and Kruglyak (1995) offered analytical adjustments for correlated tests in single-locus analyses, but it is hard to apply them to multilocus analyses.

In practice, most researchers have therefore employed resampling-based methods to control the EWSL of correlated tests. Hoh et al (2001) and Ritchie et al (2001) used the permutation test; Nelson et al (2001) used the cross-validation test. Although resampling-based methods are accurate, they demand intensive computation, which constrains them to cases with a limited number of tests. Taking the permutation test as an example, Churchill and Doerge (1994) recommended at least 1000 shuffles to estimate a 0.05 EWSL and at least 10â€‰000 shuffles to estimate a 0.01 EWSL.

An alternative criterion for multiple testing is the false discovery rate (FDR) (Benjamini and Hochberg, 1995), which is the expected proportion of falsely rejected hypotheses among all those rejected. When all hypotheses are true, the FDR is equal to the EWSL, but otherwise it is smaller. For independent tests, Benjamini and Hochberg (1995) and Benjamini and Liu (1999) proposed a step-up procedure and a step-down procedure to control the FDR. For correlated tests, Benjamini and Yekutieli (2001) proposed a simple but highly conservative procedure. Storey (2002) and Storey and Tibshirani (2003) proposed a variation of the FDR: the positive false discovery rate (pFDR), which is the conditional FDR given that at least one hypothesis is rejected. These authors gave direct estimates of the pFDR for independent tests, and suggested resampling-based methods such as the permutation test for correlated tests. Correlation among tests is still a problem for FDR methods.

In 2001, Cheverud proposed a new idea to adjust correlated tests as though they were independent according to an effective number (*M*_{eff}) of independent tests. He also gave an estimation of the *M*_{eff} for single-locus analyses. In Nyholt (2004)'s evaluation, the *M*_{eff} method approximated the permutation test accurately, and was more than 100 times faster. It is highly desirable to apply this method to control the EWSL and FDR in multilocus analyses. However, our experience suggests that Cheverud's (2001) estimate of *M*_{eff} is overly large, especially when there are a substantial number of moderately correlated tests. This situation is, unfortunately, typical of multilocus analyses, leading to overly conservative results.

In this study, we propose a more accurate estimate of *M*_{eff}, and design *M*_{eff}-based procedures to control the EWSL and FDR. An assessment based on both real and simulated data shows that the procedures can control the error rate accurately in multilocus analyses.

## Approach

### The *M*_{eff} method

When loci are in linkage disequilibrium (LD), hypothesis tests in single-locus analyses are not independent. Cheverud (2001) proposed the method below to adjust the correlated tests:

*Step* 1: Calculate the correlation matrix for the loci.

*Step*2: Estimate the effective number (*M*_{eff}) of independent tests from the eigenvalues of the correlation matrix with Equation (1), where *M* is the number of tests (equal to the number of loci in single-locus analyses) and *Î»*_{i} (*i*=1,â€¦, *M*) are the eigenvalues

*Step* 3: Adjust the test criteria as though there were *M*_{eff} independent tests with the Sidak (1967) correction below

*Step* 4: Test the association between genotypes and phenotypes locus by locus. If the *p*-value of any test is lower than *Î±*_{p}, the nonassociation hypothesis is rejected.

### A new *M*_{eff} equation

The key step of the *M*_{eff} method is Step 2, estimating the *M*_{eff} from the eigenvalues. As Cheverud (2001) explained, Equation (1) considers two extreme situations. When the tests are completely independent, the eigenvalues are all equal to 1 and Equation (1) gives *M*_{eff}=*M*. When the tests are completely identical, the first eigenvalue is *M*, while all the others are zero, and Equation (1) gives *M*_{eff}=1. Although Equation (1) gives correct estimates at the two extremes, it overestimates the *M*_{eff} in other situations. If the *M* tests are composed of *c* (1â©½*c*â©½*M*) copies of *M*/*c* independent tests, the eigenvalues are Sequence (3)

According to Equation (1), *M*_{eff}=*M*+1â€“*c*, but actually there are only *M*/*c* independent tests. The ratio *r* is

In multilocus analyses, *M* is large and *c* is moderate, so *r* tends to be considerably larger than 1.

Intuitively, *c* identical tests will result in an eigenvalue *Î»*_{i}=*c*, while a test partially correlated with others will result in an eigenvalue 0<*Î»*_{i}<1. Thus, an eigenvalue can be decomposed into two parts: the integral part and the nonintegral part. The integral part represents identical tests that should be counted as one in the *M*_{eff}. The nonintegral part represents a partially correlated test that should be counted as a fractional number between 0 and 1. From this intuitive consideration, we propose Equation (5) to estimate the *M*_{eff}:

where *I*(*x*â©¾1) is the indicator function, which gives 1 when *x*â©¾1 and gives 0 otherwise, and is the floor function, which gives the largest integer less than or equal to *x*. Thus, completely correlated tests will be counted as *I*(*x*â©¾1) and partially correlated tests will be counted as . Equation (5) gives the right *M*_{eff} for integral eigenvalue Sequence (6), where *t* is the true number of independent tests and *c*_{1}, *c*_{2},â€¦, *c*_{t} are integers

### Multilocus analyses

By regarding locus-combinations as â€˜generalized lociâ€™ (GLs), we can apply the *M*_{eff} method to multilocus analyses. For example, if two loci have *g*_{1} and *g*_{2} genotypes, respectively, their combination can be regarded as a new â€˜locusâ€™ with *g*_{1}*g*_{2} genotypes. Consequently, the calculation of the correlation coefficients in Step 1 needs adaptation. As GLs are multinomial variables, neither the Pearson correlation (used by Cheverud, 2001) nor the LD measure (used by Nyholt, 2004) is suitable.

Similar to the Pearson correlation for numerical variables, the correlation for multinomial variables can be defined as in Equation (7a), but the variance (Var) and the covariance (Cov) should be measured differently. Let *X*_{1} and *X*_{2} denote two GLs, that is, two multinomial variables. Their variance and covariance can be measured with the entropy and mutual information (Cover and Thomas, 1991), respectively, as in Equation (7b) and (7c).

where *P*(*X*_{i}) is the probability function of *X*_{i} and *E*[*x*] is the expectation of a random variable *x*.

Besides its compatibility with both single- and multilocus analysis, this correlation coefficient has a property:

where *A*, *B* and *C* are independent loci or GLs.

### An *M*_{eff}-based FDR procedure

Let *R* denote the number of rejected hypotheses and *V* the number of those falsely rejected. Benjamini and Hochberg (1995) defined the FDR as

where *I*(*R*>0) is the indicator function, which gives 1 when *R*>0 and gives 0 otherwise. The FDR is equal to the EWSL when all null hypotheses are true and is smaller otherwise. They also proposed the following step-up procedure to control the FDR for independent tests. Let *p*_{(1)}â©½â‹¯â©½*p*_{(M)} be the ordered *p*-values of *M* independent hypotheses H_{(1)},â€¦,â€‰H_{(M)}. To control the FDR at level *q* (usually 5%), hypotheses H_{(1)},â€¦,â€‰H_{(k)} should be rejected, where

The ordered *p*-values are compared with an arithmetic sequence from *q*/*M* to *q*. The start of the sequence, *q*/*M*, ensures that the FDR is equal to the EWSL when all the hypotheses are true. The end of the sequence, *q*, ensures that the FDR is controlled at *q* when all the hypotheses are rejected. To involve the effect of correlation, we can replace the start *q*/*M* with *q*/*M*_{eff}, keep the end as *q*, and consequently replace the step *q*/*M* with (*q*âˆ’*q*/*M*_{eff})/(*M*âˆ’1). The *M*_{eff}-based step-up procedure is to reject H_{(1)},â€¦, H_{(k)}, where

## Performance

### Single-locus analyses

The *M*_{eff} method has previously been applied to two single-locus analyses. Cheverud (2001) applied it to the *LG*/*J* and *SM*/*J* mouse data (Cheverud et al, 2001), and Nyholt (2004) applied it to the *angiotensin*-*I*-*converting enzyme* (*ACE*) gene data (Keavney et al, 1998). In these analyses, Cheverud (2001) and Nyholt (2004) derived two eigenvalue sequences from the two data sets, respectively. In this study, we calculated different *M*_{eff}'s with both Equations (1) and (5) from the two eigenvalue sequences, and compared the results. As shown in Table 1, the two equations gave similar *M*_{eff}'s, so they should have similar performance on the two data sets. Setting *Î±*_{e}=5% for the Keavney data set, Nyholt (2004) obtained *Î±*_{p}=0.015 from a permutation test with 50â€‰000 shuffles, and obtained *Î±*_{p}=0.011 with the *M*_{eff}=4.59 from Equation (1). Using *M*_{eff}=4.01 from Equation (5), we obtained *Î±*_{p}=0.013, a value closer to the permutation test's 0.015 than the value of 0.011 obtained from Equation (1).

### Multilocus analyses

To compare the performance of Equations (1) and (5) in multilocus analyses, we used four sub-data sets extracted from a schizophrenia data set (Xu et al, 2004). The schizophrenia data set concerns 12 genes of the dopamine pathway, *COMT* (7), *ALDH*3 (11), *ADH* (14), *TYR* (1), *TPO* (2), *MAO* (3), *DAO* (2), *AOX*1 (2), *DDC* (4), *HDG* (1), *DBH* (2) and *GSTZ*1 (1), with 83 cases and 108 controls. (The integers in the parentheses are the numbers of SNPs genotyped for each gene.) To investigate the performance on correlated loci, we extracted the SNPs of gene *COMT*, *ALDH*3 and *ADH* as three sub-data sets COMT, ALDH3 and ADH, respectively. The three genes were reported as potential effective SNPs by Xu et al (2004), and they have enough SNPs (more than five) to ensure the multiplicity of tests in the analyses. To investigate the performance on independent loci, we selected one SNP per gene as the fourth sub-data set and called it DOP (short for dopamine). For each data set, we performed single-, two- and three-locus analyses.

We used the permutation test as the golden standard to evaluate the performance. In each permutation shuffle, we applied the *M*_{eff} method to control the EWSL (*Î±*_{e}). As a result, we obtained a decision of whether or not to reject the nonassociation hypothesis. After repeating this procedure 3000 times, we counted the proportion of rejection (*P*_{r}). Ideally, *P*_{r} should be equal to the preset *Î±*_{e}. The association between a GL and the phenotype was tested with a combination of the *Ï‡*^{2} test and the Fisher's exact test (When Cochran (1954)'s condition was satisfied, the association was tested with the *Ï‡*^{2} test. When the condition was not satisfied, it was tested with the Fisher's exact test. Cochran's condition: in a contingency table, every cell has an expected value no less than 1, and 80% of the cells have expected values of no less than 5. This algorithm was implemented in ACM algorithm 643 (Clarkson et al, 1993).)

Figure 1 shows the comparison between *P*_{r} and *Î±*_{e}. In single-locus analyses, both Equations (1) and (5) controlled *P*_{r} at *Î±*_{e}. In two- and three-locus analyses, the *M*_{eff}'s obtained from Equation (1) were much greater than those obtained from the permutation test (see Table 2), and consequently the *P*_{r} of Equation (1) was much lower than *Î±*_{e}. In contrast, the *P*_{r} of Equation (5) approximated *Î±*_{e} fairly well in most cases.

### Effects on FDR

We performed two-locus analyses on simulated caseâ€“control data to evaluate the effects of the *M*_{eff} on the FDR. The simulated data contains 10 genomic regions with five observed biallelic SNPs in each region. All SNPs are in the Hardyâ€“Weinberg equilibrium and the minor alleles have the same frequency. SNPs in different regions are in linkage equilibrium, and SNPs in the same region are in LD with *r*^{2}=0.8 (Hill and Robertson, 1968) between two successive SNPs. The disease is caused by the epistatic interaction of two unobserved loci, which hide in the first region and the second region, respectively. The LD (*r*^{2}) between a causative loci and its nearest marker SNP is 0.8. As neither of the causative loci has a marginal effect, a two-locus analysis is necessary to detect the epistatic interaction. Consequently, there are *C*_{50}^{2}=1225 SNP pairs to inspect, and only the 25 pairs composed of the loci from the first and second regions are associated with the disease. The data were simulated from the six epistasis models in Table 3 (Ritchie et al, 2003). For each model, 10â€‰000 data sets were generated with 200 cases and 200 controls in each of them.

Both Benjamini and Hochberg (1995)'s direct step-up procedure and the new *M*_{eff}-based step-up procedure were applied to control the FDR. Although Benjamini and Yekutieli (2001) designed an FDR procedure for correlated multiple testing, we did not employ it because it is much more conservative than the direct step-up procedure. The *M*_{eff} was estimated with Equation (5). As shown in Figure 2, the *M*_{eff}-based step-up procedure controlled the FDR more accurately than the direct method, and was considerably more powerful.

## Discussion and conclusion

In 2001, Cheverud proposed the idea of adjusting correlated tests as if they were independent according to the effective number (*M*_{eff}) of independent tests. Piepho (2001) has also proposed a quick method for computing approximate threshold levels that control the genome-wise type I error rate of tests for quantitative trait locus detection in interval mapping and composite interval mapping. In the multilocus analyses presented in this paper, our extended concept of *M*_{eff} succeeded in controlling both the EWSL and the FDR. This result supports the use of *M*_{eff} in error-rate control of correlated multiple testing. As a data-driven algorithm based on a correlation matrix, the *M*_{eff} method can be applied to a wide range of problems. Nowadays, dense and linked markers are widely used in genetic study, and it is hard to model their genetic behavior precisely, especially in multilocus analyses. The diversity of populations also prohibits models obtained from one population from being applied to another. Avoiding the error of an imprecise model, the data-driven *M*_{eff} method is suitable for different situations. In addition to the data-driven property, the *M*_{eff} can be calculated for various types of data by generalizing the definition of correlation coefficient. With Equation (7), it can be applied to categorical data. With the Pearson correlation, it can be applied to numerical data such as microarray data. With King and Chinchilli (2001)'s generalized correlation coefficient, it can be applied to a mixture of numerical and categorical data.

Cheverud (2001) proposed Equation (1) to estimate the *M*_{eff}. As he explained, Equation (1) considers two extreme situations: hypothesis tests are completely independent or they are completely identical. However, Equation (4) shows that Equation (1) tends to overestimate the *M*_{eff} when there are a large number of moderately correlated tests. Unfortunately, in multilocus analyses, this situation is usually encountered, and Equation (1) will lead to conservative reports. In this paper, we propose Equation (5) to estimate the *M*_{eff} more accurately. It considers a more general situation than the two extremes Equation (1) concerns, where tests are composed of copies of independent tests. Therefore, it outperforms Equation (1) in multilocus analyses. Figure 1 clearly shows that both Equations (1) and (5) perform excellently in single-locus analyses, but only the latter is effective in multilocus analyses. The higher performance of Equation (5) over Equation (1) is not because it better fits the correlation coefficient defined by Equation (7). As Equation (4) has shown, no matter what correlation coefficient is employed to calculate the correlation matrix, Equation (1) will overestimate the *M*_{eff} for eigenvalue Sequence (3), while Equation (5) will give an appropriate estimate. We also used the Pearson correlation in single-locus analyses, but as shown in Table 2, its performance was similar to that of Equation (7).

If *n* tests need adjustment, the computation complexity of a permutation test is *O*(*stn*), where *s* is the number of shuffles and *t* is the time needed to test a hypothesis. The computation cost for the *M*_{eff} method concentrates on eigenvalues' calculation whose complexity is *O*(*mn*^{3}), where *m* is the time needed to multiply two real numbers. Theoretically, when *n* is large, the permutation test is more efficient than the *M*_{eff} method. However, because *m* is tiny in comparison with *st*, the *M*_{eff} method is much faster than the permutation test until *n* is very large. The largest *n* we encountered in this study was 1225 in Section Effects on FDR. It took about 10â€‰000â€‰s to repeat the calculation of the 1225 *p*-values 10â€‰000 times, and took about 3â€‰s to solve the eigenvalues with the numeric package *Lapack*. (The computation time was measured on a 2.8â€‰GHz Pentium with Linux 2.6.10.) When there are about 1000 tests, the *M*_{eff} method is at least 1000 times faster than the permutation test. This would meet the need of a genome scan with about 1000 SNPs per chromosome. As SNPs in different chromosomes are in linkage equilibrium, the correlation matrix of a single-locus analysis is diagonally blocked. We can calculate the *M*_{eff} for each chromosome individually and then sum them up as the *M*_{eff} for the scan.

Freed from the heavy computation burden, statisticians can realize many sophisticated multilocus methods. For example, we can split a classification tree's nodes according to the EWSL or the FDR rather than the PWSL. In a tree's growth procedure, a node splits when the *p*-value of the optimal variable exceeds a threshold. The more the variables are, the easier it is for the optimal variable to exceed the threshold, and the more easily the tree overfits. Zhang and Bonney (2000) considered this problem in splitting, but they arbitrarily set the PWSL<0.001 as the threshold. With the *M*_{eff} method, the threshold can be estimated efficiently. If the threshold is calculated with the permutation test, the computation will be very demanding because every node needs thousands of shuffles.

Multiple testing is not only important in data analysis but also in experiment design. Neglecting the issue would lead to an underpowered design, while overemphasizing it would waste resources. Owing to the expense of genetic experiments, it is highly desirable to apply the *M*_{eff} method to experiment designs. By calculating the *M*_{eff} from a pilot study, we can estimate the appropriate PWSL and the sample size for an experiment involving multiple testing.

## References

Benjamini Y, Hochberg Y (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing.

*J Roy Stat Soc Ser B (Methodological)***57**: 289â€“300.Benjamini Y, Liu W (1999). A step-down multiple hypotheses testing procedure that controls the false discovery rate under independence.

*J Stat Plan Infer***82**: 163.Benjamini Y, Yekutieli D (2001). The control of the false discovery rate in multiple testing under dependency.

*Ann Stat***29**: 1165â€“1188.Cheverud JM (2001). A simple correction for multiple comparisons in interval mapping genome scans.

*Heredity***87**: 52â€“58.Cheverud JM, Vaughn TT, Pletscher LS, Peripato AC, Adams ES, Erikson CF

*et al*(2001). Genetic architecture of adiposity in the cross of LG/J and SM/J inbred mice.*Mamm Genome***12**: 3â€“12.Churchill GA, Doerge RW (1994). Empirical threshold values for quantitative trait mapping.

*Genetics***138**: 963â€“971.Clarkson DB, Fan Y-a, Joe H (1993). A remark on algorithm 643: FEXACT: an algorithm for performing Fisher's exact test in

*r*Ã—*c*contingency tables.*ACM Trans Math Software (TOMS)***19**: 484â€“488.Cochran W (1954). Some methods of strengthening the common

*Ï‡*^{2}tests.*Biometrics***10**: 417â€“451.Cordell HJ, Clayton DG (2002). A unified stepwise regression procedure for evaluating the relative effects of polymorphisms within a gene using case/control or family data: application to HLA in type 1 diabetes.

*Am J Hum Genet***70**: 124â€“141.Cover TM, Thomas JA (1991).

*Elements of Information Theory*. Wiley-Interscience: New York.Hill WG, Robertson A (1968). Linkage disequilibrium in finite populations.

*Theoret Appl Genet***38**: 226â€“231.Hoh J, Ott J (2003). Mathematical multi-locus approaches to localizing complex human trait genes.

*Nat Rev Genet***4**: 701â€“709.Hoh J, Wille A, Ott J (2001). Trimming, weighting, and grouping SNPs in human caseâ€“control association studies.

*Genome Res***11**: 2115â€“2119.Keavney B, McKenzie CA, Connell JM, Julier C, Ratcliffe PJ, Sobel E

*et al*(1998). Measured haplotype analysis of the angiotensin-I converting enzyme gene.*Hum Mol Genet***7**: 1745â€“1751.King TS, Chinchilli VM (2001). A generalized concordance correlation coefficient for continuous and categorical data.

*Stat Med***20**: 2131â€“2147.Lander E, Kruglyak L (1995). Genetic dissection of complex traits: guidelines for interpreting and reporting linkage results.

*Nat Genet***11**: 241â€“247.Lander ES, Botstein D (1989). Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps.

*Genetics***121**: 185â€“199.Nelson MR, Kardia SL, Ferrell RE, Sing CF (2001). A combinatorial partitioning method to identify multilocus genotypic partitions that predict quantitative trait variation.

*Genome Res***11**: 458â€“470.Nyholt DR (2004). A simple correction for multiple testing for single-nucleotide polymorphisms in linkage disequilibrium with each other.

*Am J Hum Genet***74**: 765â€“769.Page GP, George V, Go RC, Page PZ, Allison DB (2003). â€˜Are we there yetâ€™? Deciding when one has demonstrated specific genetic causation in complex diseases and quantitative traits.

*Am J Hum Genet***73**: 711â€“719.Piepho H-P (2001). A quick method for computing approximate thresholds for quantitative trait loci detection.

*Genetics***157**: 425â€“432.Ritchie MD, Hahn LW, Moore JH (2003). Power of multifactor dimensionality reduction for detecting gene-gene interactions in the presence of genotyping error, missing data, phenocopy, and genetic heterogeneity.

*Genet Epidemiol***24**: 150â€“157.Ritchie MD, Hahn LW, Roodi N, Bailey LR, Dupont WD, Parl FF

*et al*(2001). Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer.*Am J Hum Genet***69**: 138â€“147.Sidak Z (1967). Rectangular confidence regions for the means of multivariate normal distributions.

*J Am Stat Assoc***62**: 626â€“633.Storey JD (2002). A direct approach to false discovery rates.

*J Roy Stat Soc Ser B***64**: 479â€“498.Storey JD, Tibshirani R (2003). Statistical significance for genomewide studies.

*Proc Natl Acad Sci USA***100**: 9440â€“9445.Xu Q, Jia YB, Zhang BY, Zou K, Tao YB, Wang YP

*et al*(2004). Association study of an SNP combination pattern in the dopaminergic pathway in paranoid schizophrenia: a novel strategy for complex disorders.*Mol Psychiatry***9**: 510â€“521.Zhang H, Bonney G (2000). Use of classification trees for association studies.

*Genet Epidemiol***19**: 323â€“332.

## Acknowledgements

We are grateful to Qi Xu *et al* for their generous sharing of the published schizophrenia data. We thank Ying Huang, Yulong Zhang and the journal reviewers for their helpful comments. This work was supported by the Ministry of Science and Technology of People's Republic of China under Contract No. 2003CB517106.

## Author information

### Authors and Affiliations

### Corresponding author

## Rights and permissions

## About this article

### Cite this article

Li, J., Ji, L. Adjusting multiple testing in multilocus analyses using the eigenvalues of a correlation matrix.
*Heredity* **95**, 221â€“227 (2005). https://doi.org/10.1038/sj.hdy.6800717

Received:

Accepted:

Published:

Issue Date:

DOI: https://doi.org/10.1038/sj.hdy.6800717