Abstract
The genome-wide association study is an elementary tool to assess the genetic contribution to complex human traits. However, such association tests are mainly proposed for autosomes, and less attention has been given to methods for identifying loci on the X chromosome because of its distinct biological features. In addition, the existing association tests for quantitative traits on the X chromosome either fail to incorporate the information of males or only detect variance heterogeneity. Therefore, we propose four novel methods, which are denoted as QXcat, QZmax, QMVXcat and QMVZmax. When using these methods, it is assumed that the risk alleles for females and males are the same and that the locus being studied satisfies the generalized genetic model for females. The first two methods are based on comparing the means of the trait value across different genotypes, while the latter two methods test for the difference of both means and variances. All four methods effectively incorporate the information of X chromosome inactivation. Simulation studies demonstrate that the proposed methods control the type I error rates well. Under the simulated scenarios, the proposed methods are generally more powerful than the existing methods. We also apply our proposed methods to data from the Minnesota Center for Twin and Family Research and find 10 single nucleotide polymorphisms that are statistically significantly associated with at least two traits at the significance level of \(1 \times 10^{-3}\).
Introduction
The genome-wide association study is an elementary tool to assess the genetic contribution to complex human traits (Kang et al. 2010). Thousands of single nucleotide polymorphisms (SNPs) have been found to be associated with hundreds of complex traits by association tests (Chen et al. 2017; Ma et al. 2015a; Zheng et al. 2007). However, only a few association tests have focused on the X chromosome (Chang et al. 2014; Wang et al. 2019a), which consists of 1669 (almost 5%) known genes and affects ~7% of complex traits (Wise et al. 2013; Xu and Hao 2018). Unlike autosomes, the X chromosome has several distinct biological features. For instance, the number of copies of the X chromosome is different between sexes. In addition, gene expression in females is affected by X chromosome inactivation (XCI), where one copy of the two X chromosomes in females is silenced to compensate for the X chromosome dosage difference between sexes, i.e., complete dosage compensation is achieved (Hickey and Bahlo 2011; Wang et al. 2014). However, Carrel and Willard (2005) claimed that weak expression of the silenced X chromosome occurs in ~10% of genes, which is referred to as incomplete dosage compensation. XCI was discovered over fifty years ago (Lyon 1961). In XCI, which is usually regarded as a random process referred to as random XCI, ~50% of cells have the risk allele active, while the other ~50% of cells have the normal allele active (Jin et al. 2017; Wang et al. 2014). However, in recent studies, it has been reported that some X-linked genes in females may also undergo skewed XCI and escape from XCI (XCI-E) (Amos-Landgraf et al. 2006; Carrel and Willard 2005). The former is defined that one allele is inactivated in more than 50% of cells, such as 75% or even 90% of cells in some extreme cases (Minks et al. 2008; Wong et al. 2011). The latter implies that both alleles in female cells remain active, which is also referred to as no dosage compensation (Brown et al. 1997; Carrel et al. 
2006). XCI is a complex biological mechanism that is not yet fully understood (Wu et al. 2014). Therefore, robust and powerful association tests on the X chromosome are needed to account for these characteristics.
Some methods for testing association have been developed to accommodate the X chromosome (Chung et al. 2007; Ding et al. 2006; Horvath et al. 2000; Zhang et al. 2008). Zheng et al. (2007) proposed several allele-based and genotype-based tests on the X chromosome, and compared their performance under Hardy-Weinberg equilibrium (HWE) and departure from HWE. However, these methods may lose power when XCI exists (Chen et al. 2017; Loley et al. 2011). To address this issue, Clayton (2008) suggested a 1 degree of freedom chi-square test and a 2 degrees of freedom chi-square test by treating males as homozygous females, without the assumption of HWE. In this case, three female genotypes were coded as 0, 1 and 2, and two male genotypes were coded as 0 and 2 (Hickey and Bahlo 2011). Nevertheless, Clayton’s methods require the allele frequencies of the same allele to be equal between sexes, and only random XCI is considered (Clayton 2008). Using this coding strategy may lead to power loss when the XCI pattern is misspecified. As such, Wang et al. (2014) proposed a unified coding strategy, in which female genotypes were coded as 0, γ and 2, where γ ranges from 0 to 2. Here, γ < 1 represents XCI towards the risk allele, γ > 1 represents XCI towards the normal allele, and γ = 1 denotes random XCI. In the method proposed by Wang et al. (2014), the test power under skewed XCI is improved by maximizing the likelihood ratio over different biological models (random XCI, skewed XCI and XCI-E). However, the strategy is time-consuming because a permutation procedure is required to obtain the p value (Jin et al. 2017). Chen et al. (2017) proposed a test statistic that does not need to specify the underlying XCI pattern and HWE. It constructs the models for females and males separately and then combines them using Fisher’s method (Fisher et al. 1967). The method proposed by Chen et al. (2017) effectively utilizes the information of both females and males. 
To further improve the test power, Wang et al. (2019a) provided an allelic test that considers different deviations from HWE. Instead of combining the test statistics of females and males by Fisher’s method, Wang et al. (2019a) used the effective sample sizes of females and males to combine the information of both sexes. Different dosage compensation patterns can be incorporated in this method by selecting different weights.
All of the methods mentioned above were developed primarily for case‒control studies. Some studies have shown that genetic loci on the X chromosome also affect quantitative traits (Al-Ayadhi et al. 2020; Auer et al. 2014; Gaukrodger et al. 2005; Konzman et al. 2020). Factors such as mutation, genetic interactions and parent-of-origin effects may influence the expression level of genes, thus changing the phenotypic means or variances across different genotypes (Brown et al. 2014; Cao et al. 2014; Ma et al. 2015b; Morley et al. 2004; Soave et al. 2015; Yang et al. 2012). As such, Ma et al. (2015b) assumed that XCI would cause extra phenotypic variance for heterozygous females and proposed three X-linked association tests, denoted as TVar, TW and TS. TVar, which can be regarded as a modification of the Brown-Forsythe test, directly tests for the inflated variance of the trait value for heterozygous females (Brown and Forsythe 1974). TW uses a weighted linear regression to examine the means of the trait value and allows for variance heterogeneity in females. Finally, TS first transforms the p values of TVar and TW to Z scores and then combines them using Stouffer’s method (Stouffer et al. 1949). Since the methods proposed by Ma et al. (2015b) ignore the information of males, these methods should each lose test power. To effectively account for XCI, Chen et al. (2020) used a Bayesian model to average over different XCI patterns. However, the Bayesian model is known to be time-consuming because multiple Markov chains of parameters are generated. Deng et al. (2019) proposed a series of methods that simultaneously incorporate the information of females and males to investigate the variances among genotypes. One of the methods proposed by Deng et al. (2019) computes the p values of Levene’s test for females and males separately (Levene 1961) and then combines them using Fisher’s method (denoted as Fisher in this article). Deng et al. 
(2019) assumed that the association between the SNP and the quantitative trait being studied could be biased by sex-specific means or variances because of the different numbers of copies of the X chromosome between females and males. In this regard, two two-stage methods, wM3V3.2 and wM3VNA3.3, were proposed. For brevity, we refer to these methods as wM3V and wM3VNA, respectively, in this article. In the first stage, these methods regress the value of the quantitative trait on the genotype, sex and their interaction via a regression framework. In the second stage, the wM3V method tests for genotypic variances of the residuals obtained from the first stage via the generalized Levene’s test under the additive genetic model, while the wM3VNA method does the same under the generalized genetic model (Chen and Ng 2012). Although the methods proposed by Deng et al. (2019) incorporate males’ information and efficiently test for variance heterogeneity, the mean differences are only adjusted when conducting the generalized Levene’s test. These methods are not designed to test for the mean differences, which may cause loss of power. In addition, Özbek et al. (2018) proposed an X chromosome association test statistic that considers the sex × SNP interaction term and is applicable to both quantitative and qualitative traits. This method can be directly implemented in PLINK, and in this article, we denote it for quantitative traits as Tplink. Song et al. (2021) further conducted extensive simulations to compare the performance of the model including the interaction term with that not including the interaction term and found that fitting the model with the interaction term can make the estimates of the effect sizes more robust to different XCI patterns. However, Tplink assumes the homogeneity of variances across different genotypes and only takes into account random XCI and XCI-E patterns. Chen et al. 
(2021) added a variable indicative of heterozygous females in Tplink and suggested an X chromosomal association approach that considers all three XCI patterns and is suitable for both quantitative and qualitative traits. We denote it for quantitative traits as Tchen in this article. However, Tchen only compares the difference in the means of the trait value across different genotypes under the assumption of variance homogeneity.
Therefore, in this article, we propose four novel statistical methods, denoted as QXcat, QZmax, QMVXcat and QMVZmax, to test for association between an SNP on the X chromosome and a quantitative trait. QXcat and QZmax are designed for testing the mean differences of the trait value. In QXcat, we obtain the p values for females and males by testing the mean differences of the trait value via weighted linear regression models. Then, we combine these two p values using Fisher’s method. In QZmax, we use weights based on the female and male sample sizes, which represent different dosage compensation patterns according to Wang et al. (2019a), to combine the test statistics for females and males. In addition, we develop QMVXcat (QMVZmax) by combining the p value of QXcat (QZmax) with that of wM3VNA, to test for the difference in both means and variances. We perform extensive simulation studies to investigate the type I error rates and the test powers of the proposed methods. We also apply our proposed methods to data from the Minnesota Center for Twin and Family Research (MCTFR) to illustrate their practical use.
Materials and methods
Notations
Consider an SNP on the X chromosome with alleles a and A. Let qf and qm be the frequencies of A in females and males, respectively, and let ρ be the inbreeding coefficient in the female population. Then, females have three genotypes, aa, Aa and AA, and males, who are hemizygous, only have two different genotypes, a and A. The frequencies of genotypes aa, Aa and AA for females are denoted as qaa, qAa and qAA, respectively. Thus, \(q_{aa} = \left( {1 - q_f} \right)^2 + \rho \left( {1 - q_f} \right)q_f\), \(q_{Aa} = 2\left( {1 - \rho } \right)\left( {1 - q_f} \right)q_f\) and \(q_{AA} = q_f^2 + \rho \left( {1 - q_f} \right)q_f\). Suppose that we collect a sample of N independent individuals consisting of nf females and nm males. Let nf0, nf1 and nf2 be the number of females with genotypes aa, Aa and AA (nf0 + nf1 + nf2 = nf), respectively. There are nm0 males with genotype a and nm1 males with genotype A (nm0 + nm1 = nm). Let \({{{\mathbf{Y}}}}_f = ( {y_{f1},y_{f2},...,y_{fn_f}})^T\) and \({{{\mathbf{Y}}}}_m = ( {y_{m1},y_{m2},...,y_{mn_m}})^T\) denote the values of the quantitative trait for females and males, respectively. Here, we assume that Yf and Ym are normally distributed or approximately follow normal distributions after the rank-based inverse normal transformation (McCaw et al. 2019). For females, let Gfi denote the number of alleles A in female i (i = 1, 2,..., nf), i.e., Gfi takes the value of 0, 1 and 2 for aa, Aa and AA, respectively; for males, let Gmi denote the number of alleles A in male i (i = 1, 2,..., nm), i.e., Gmi takes the value of 0 and 1 for a and A, respectively. In females, the means of the quantitative trait for aa, Aa and AA are denoted as μf0, μf1 and μf2, respectively, while the variances of the quantitative trait for aa, Aa and AA are represented by \(\sigma _{f0}^2\), \(\sigma _{f1}^2\) and \(\sigma _{f2}^2\), respectively. Let Vf denote the variance-covariance matrix of Yf, a diagonal matrix with elements \(\sigma _{f0}^2\), \(\sigma _{f1}^2\) and \(\sigma _{f2}^2\) for aa, Aa and AA, respectively. 
In males, the means of the quantitative trait for a and A are denoted as μm0 and μm1, respectively, while the variances of the quantitative trait for a and A are represented by \(\sigma _{m0}^2\) and \(\sigma _{m1}^2\), respectively. Let Vm be the variance-covariance matrix of Ym, a diagonal matrix with elements \(\sigma _{m0}^2\) and \(\sigma _{m1}^2\) for a and A, respectively. Here, we consider three types of null hypotheses of no association between the SNP and the quantitative trait. \({{{\mathrm{H}}}}_0^{{{{\mathrm{MV}}}}}\): both the means and the variances of the quantitative trait across genotypes are equal (i.e., μf0 = μf1 = μf2, μm0 = μm1, \(\sigma _{f0}^2 = \sigma _{f1}^2 = \sigma _{f2}^2\) and \(\sigma _{m0}^2 = \sigma _{m1}^2\)), \({{{\mathrm{H}}}}_0^{{{\mathrm{M}}}}\): only the means of the quantitative trait across genotypes are equal (i.e., μf0 = μf1 = μf2, μm0 = μm1 and no restrictions on the variances) and \({{{\mathrm{H}}}}_0^{{{\mathrm{V}}}}\): only the variances of the quantitative trait across genotypes are equal (i.e., \(\sigma _{f0}^2 = \sigma _{f1}^2 = \sigma _{f2}^2,\sigma _{m0}^2 = \sigma _{m1}^2\) and no restrictions on the means).
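As a small illustration of the female genotype frequencies defined above, the following Python sketch evaluates qaa, qAa and qAA for a given allele frequency and inbreeding coefficient. The function name `female_genotype_freqs` and the example inputs are hypothetical and are not part of the original study.

```python
def female_genotype_freqs(q_f: float, rho: float) -> tuple[float, float, float]:
    """Return (q_aa, q_Aa, q_AA) for allele-A frequency q_f and inbreeding coefficient rho."""
    het = q_f * (1.0 - q_f)              # shared term (1 - q_f) * q_f
    q_aa = (1.0 - q_f) ** 2 + rho * het  # homozygous aa
    q_Aa = 2.0 * (1.0 - rho) * het       # heterozygous Aa
    q_AA = q_f ** 2 + rho * het          # homozygous AA
    return q_aa, q_Aa, q_AA


# Illustrative inputs: HWE (rho = 0) and a departure from HWE (rho = 0.05).
freqs_hwe = female_genotype_freqs(0.2, 0.0)
freqs_inbred = female_genotype_freqs(0.2, 0.05)
```

For any admissible ρ, the three frequencies sum to one, and ρ = 0 recovers the Hardy-Weinberg proportions.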
Sex-stratified X chromosome mean-based association test for quantitative traits considering various XCI patterns
Note that SNPs on the X chromosome of females may undergo different XCI patterns. To make our method robust to various XCI patterns, we first propose a general X chromosome association test for quantitative traits named QXcat, which aims to identify the mean differences of the trait value across genotypes. We construct the models for females and males separately because the numbers of X chromosomes are different between sexes and then combine their p values in an efficient way. Specifically, we first assume that A is the risk allele, and the risk allele in females is the same as that in males. In addition, similar to the work in Chen et al. (2017), the generalized genetic model is assumed for the SNP being studied for females, i.e., the genetic effect of carrying two risk alleles is not less than that of carrying one risk allele, and the genetic effect of carrying one risk allele is not less than that of carrying no risk allele (μf2 ≥ μf1 ≥ μf0). Then, we consider two variables \(X_{fi}^{\left( 1 \right)} = I_{\left\{ {G_{fi} \ge 1} \right\}}\) and \(X_{fi}^{\left( 2 \right)} = I_{\left\{ {G_{fi} = 2} \right\}}\) for female i, where I{·} is the indicator function, \(X_{fi}^{\left( 1 \right)}\) indicates that female i carries at least one risk allele and \(X_{fi}^{\left( 2 \right)}\) means that the genotype of female i is AA. Based on the study by Wang et al. (2019b), \(X_{fi}^{\left( 1 \right)}\) and \(X_{fi}^{\left( 2 \right)}\) can be used to test for association between the SNP and the trait under different XCI patterns. Hence, the association between the quantitative trait and the SNP in females can be modeled as
\[y_{fi} = \beta _{f0} + \beta _{f1}X_{fi}^{\left( 1 \right)} + \beta _{f2}X_{fi}^{\left( 2 \right)} + {\mathbf{Z}}_{fi}^T{\mathbf{b}}_f + \varepsilon _{fi},\qquad (1)\]
where βf0 is the intercept, and βf1 and βf2 are the regression coefficients of \(X_{fi}^{\left( 1 \right)}\) and \(X_{fi}^{\left( 2 \right)}\), respectively. Zfi denotes a vector of covariates for female i, bf is the vector of the regression coefficients of Zfi, and εfi is a random error that follows \(N(0,\sigma _{f0}^2)\), \(N(0,\sigma _{f1}^2)\) and \(N(0,\sigma _{f2}^2)\) for genotypes aa, Aa and AA, respectively. According to Wang et al. (2019b), under random XCI or XCI-E, βf1 = βf2 ≠ 0 means that the SNP is associated with the quantitative trait. For the skewed XCI, βf1 = 0 and βf2 ≠ 0 when the risk allele is inactivated in 100% of the heterozygous female cells, while βf1 ≠ 0 and βf2 = 0 when all the cells in females with genotype Aa are normal allele inactive. In addition, βf1 ≠ 0, βf2 ≠ 0 and βf1 ≠ βf2 mean that A is associated with the quantitative trait for other skewed XCI patterns. Hence, Model (1) effectively incorporates all the XCI patterns when testing for association. Since some factors (such as mutation and XCI) may lead to unequal trait value variances across different genotypes, we use the weighted least square method to estimate \({\boldsymbol{\beta}}_f = \left( {\beta _{f0},\beta _{f1},\beta _{f2},{{{\mathbf{b}}}}_f^T} \right)^T\). Let Wf be a weight matrix for females. Here, we set \({{{\mathbf{W}}}}_f = {{{\mathbf{V}}}}_f^{ - 1}\) with elements \(\frac{1}{{\sigma _{f0}^2}}\), \(\frac{1}{{\sigma _{f1}^2}}\) and \(\frac{1}{{\sigma _{f2}^2}}\) for genotypes aa, Aa and AA, respectively. We first fit Model (1) by the ordinary least square method and obtain the corresponding residuals. Then, \(\frac{1}{{\sigma _{f0}^2}}\), \(\frac{1}{{\sigma _{f1}^2}}\) and \(\frac{1}{{\sigma _{f2}^2}}\) are estimated by the inverse of the residual variances for genotypes aa, Aa and AA, denoted as \(\frac{1}{{\hat \sigma _{f0}^2}}\), \(\frac{1}{{\hat \sigma _{f1}^2}}\) and \(\frac{1}{{\hat \sigma _{f2}^2}}\), respectively. 
As a result, \({{{\hat{\mathbf W}}}}_f = {{{\hat{\mathbf V}}}}_f^{ - 1}\). To estimate βf, we minimize the following weighted residual sum of squares \({{{\mathrm{arg}}}}\mathop {{\min }}\nolimits_{{\boldsymbol{\beta}}_f} \| {\widehat {{{\mathbf{W}}}}_f^{1\!/\!2}( {{{{\mathbf{Y}}}}_f - {{{\mathbf{X}}}}_f{\boldsymbol{\beta}}_f})}\|^2\) where \({{{\mathbf{X}}}}_f = ( {{{{\mathbf{X}}}}_f^{( 0)},{{{\mathbf{X}}}}_f^{( 1)},{{{\mathbf{X}}}}_f^{( 2)},{{{\mathbf{Z}}}}_f})\) is a design matrix, and \({{{\mathbf{X}}}}_f^{\left( 0 \right)} = \left( {1,1,...,1} \right)^T\), \({{{\mathbf{X}}}}_f^{( 1)} = ( {X_{f1}^{( 1)},X_{f2}^{( 1 )},...,X_{fn_f}^{( 1 )}})^T\), \({{{\mathbf{X}}}}_f^{( 2)} = ( {X_{f1}^{( 2)},X_{f2}^{( 2)},...,X_{fn_f}^{( 2)}})^T\), and \({{{\mathbf{Z}}}}_f = \left( {{{{\mathbf{Z}}}}_{f1},{{{\mathbf{Z}}}}_{f2},...,{{{\mathbf{Z}}}}_{fn_f}} \right)^T\). Specifically, Zfi denotes a vector of covariates for female i in Model (1). Let \(\widehat {\boldsymbol{\beta}} _f = ( {\hat \beta _{f0},\hat \beta _{f1},\hat \beta _{f2},{{{\hat{\mathbf b}}}}_f^T})^T\) be the estimate of βf, and it can be expressed as
\[\widehat {\boldsymbol{\beta}} _f = \left( {{\mathbf{X}}_f^T{\hat{\mathbf W}}_f{\mathbf{X}}_f} \right)^{ - 1}{\mathbf{X}}_f^T{\hat{\mathbf W}}_f{\mathbf{Y}}_f.\]
The variance-covariance matrix of \(\widehat {\boldsymbol{\beta}} _f\) is estimated by
\[\widehat {{\mathrm{Var}}}( {\widehat {\boldsymbol{\beta}} _f}) = \left( {{\mathbf{X}}_f^T{\hat{\mathbf W}}_f{\mathbf{X}}_f} \right)^{ - 1}{\mathbf{X}}_f^T{\hat{\mathbf W}}_f{\hat{\mathbf V}}_f{\hat{\mathbf W}}_f{\mathbf{X}}_f\left( {{\mathbf{X}}_f^T{\hat{\mathbf W}}_f{\mathbf{X}}_f} \right)^{ - 1}.\]
Since \({{{\hat{\mathbf W}}}}_f = {{{\hat{\mathbf V}}}}_f^{ - 1}\), \(\widehat {{{{\mathrm{Var}}}}}( {\widehat {\boldsymbol{\beta}} _f}) = \left( {{{{\mathbf{X}}}}_f^T{{{\hat{\mathbf W}}}}_f{{{\mathbf{X}}}}_f} \right)^{ - 1},\) and the estimate of the variance-covariance matrix for \(\hat \beta _{f1}\) and \(\hat \beta _{f2}\) is \(\hat {{\Sigma}}\), which is constructed by the four elements in Rows 2-3 and Columns 2-3 of \(\widehat {{{{\mathrm{Var}}}}}( {\widehat {\boldsymbol{\beta}} _f})\), we define the following test statistics:
Under the null hypothesis of \({{{\mathrm{H}}}}_0^{{{{\mathrm{MV}}}}}\) or \({{{\mathrm{H}}}}_0^{{{\mathrm{M}}}}\), \(T_{f1}^A\) and \(T_{f2}^A\) are independent of each other and asymptotically follow the standard normal distribution. The corresponding proof of this independence is given in Appendix A. The one-sided p values of \(T_{f1}^A\) and \(T_{f2}^A\) are denoted as \(p_{f1}^A = 1 - {{\Phi }}\left( {T_{f1}^A} \right)\) and \(p_{f2}^A = 1 - {{\Phi }}\left( {T_{f2}^A} \right)\), respectively, where Φ(·) is the cumulative distribution function of the standard normal distribution. We combine \(p_{f1}^A\) with \(p_{f2}^A\) using Fisher’s method and obtain the test statistic
\[Q_f^A = - 2\log p_{f1}^A - 2\log p_{f2}^A.\]
Under \({{{\mathrm{H}}}}_0^{{{{\mathrm{MV}}}}}\) or \({{{\mathrm{H}}}}_0^{{{\mathrm{M}}}}\), \(Q_f^A \sim \chi _4^2\) (Chen et al. 2017). We denote the p value of \(Q_f^A\) as \(p_f^A\).
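Fisher’s combination of two independent p values, as used for \(Q_f^A\), can be sketched in Python as follows. The sketch relies on the closed-form upper tail of the chi-square distribution with 4 degrees of freedom, \(P(\chi _4^2 > x) = e^{ - x/2}(1 + x/2)\); the function names and the example p values are hypothetical and only illustrate the calculation.

```python
import math


def chi2_sf_4df(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 4 degrees of freedom."""
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


def fisher_combine_two(p1: float, p2: float) -> tuple[float, float]:
    """Combine two independent p values (each in (0, 1]) by Fisher's method.

    Returns the statistic -2*log(p1) - 2*log(p2) and its chi-square(4) p value.
    """
    stat = -2.0 * (math.log(p1) + math.log(p2))
    return stat, chi2_sf_4df(stat)


# Illustrative one-sided p values (not taken from the paper).
q_stat, q_pvalue = fisher_combine_two(0.05, 0.05)
```

A p value that underflows to exactly zero would make the logarithm undefined, so a practical implementation would need to guard against that case.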
For males, we use the following model to test for the association between the SNP and the trait
\[y_{mi} = \beta _{m0} + \beta _{m1}G_{mi} + {\mathbf{Z}}_{mi}^T{\mathbf{b}}_m + \varepsilon _{mi},\]
where βm0 is the intercept and βm1 is the regression coefficient of Gmi. Zmi is a vector of covariates for male i, and bm is the vector of the regression coefficients of Zmi. εmi is a random error that follows \(N(0,\sigma _{m0}^2)\) and \(N(0,\sigma _{m1}^2)\) for genotypes a and A, respectively. Similar to the case for females, we use the weighted least square method to estimate \({\boldsymbol{\beta}} _m = \left( {\beta _{m0},\beta _{m1},{{{\mathbf{b}}}}_m^T} \right)^T\). Here, we set the weight matrix Wm for males as \({{{\mathbf{V}}}}_m^{ - 1}\) with elements \(\frac{1}{{\sigma _{m0}^2}}\) and \(\frac{1}{{\sigma _{m1}^2}}\) for genotypes a and A, respectively. We denote the estimate of βm1 and its variance as \(\hat \beta _{m1}\) and \(\widehat {{{{\mathrm{Var}}}}}( {\hat \beta _{m1}})\), respectively, and then construct the test statistic as \(T_m^A = \frac{{\hat \beta _{m1}}}{{\sqrt {\widehat {{{{\mathrm{Var}}}}}( {\hat \beta _{m1}})} }}\). If nm is large enough, \(T_m^A \sim N\left( {0,1} \right)\). We denote the one-sided p value of \(T_m^A\) as \(p_m^A = 1 - {{\Phi }}\left( {T_m^A} \right)\). Then, we combine \(p_f^A\) with \(p_m^A\) and obtain the test statistic
\[Q^A = - 2\log p_f^A - 2\log p_m^A.\]
Under \({{{\mathrm{H}}}}_0^{{{{\mathrm{MV}}}}}\) or \({{{\mathrm{H}}}}_0^{{{\mathrm{M}}}}\), \(Q^A \sim \chi _4^2\).
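To make the male statistic \(T_m^A\) concrete, the sketch below treats the special case without covariates. In that case the weighted least squares fit with genotype-specific weights reduces to a comparison of the two group means, with \(\hat \beta _{m1}\) equal to the difference of the means and its estimated variance equal to \(\hat \sigma _{m0}^2/n_{m0} + \hat \sigma _{m1}^2/n_{m1}\). The function name, the use of the n − 1 denominator for the residual variances and the toy data are assumptions made for illustration only.

```python
import math
from statistics import NormalDist, mean, variance


def male_mean_test(y_a: list[float], y_A: list[float]) -> tuple[float, float]:
    """Return (T_m, one-sided p value) for males with genotypes a and A.

    Assumes no covariates, so the weighted fit is a two-group comparison.
    """
    beta_hat = mean(y_A) - mean(y_a)  # estimated effect of allele A
    # Group-specific variances (n - 1 denominator) divided by group sizes.
    var_hat = variance(y_a) / len(y_a) + variance(y_A) / len(y_A)
    t_stat = beta_hat / math.sqrt(var_hat)
    p_one_sided = 1.0 - NormalDist().cdf(t_stat)  # A assumed to be the risk allele
    return t_stat, p_one_sided


# Toy trait values (not real data).
t_m, p_m = male_mean_test([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

With covariates in the model, the full weighted regression would be needed instead of this shortcut.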
Note that the risk allele is generally unknown. Here, we also consider the case where the risk allele is a. We can obtain the test statistics \(Q_f^a\) for females and \(T_m^a\) for males, and the corresponding one-sided p values \(p_f^a\) and \(p_m^a\), respectively, in the same way. Then, the test statistic can be derived as
\[Q^a = - 2\log p_f^a - 2\log p_m^a.\]
Similarly, \(Q^a \sim \chi _4^2\) under \({{{\mathrm{H}}}}_0^{{{{\mathrm{MV}}}}}\) or \({{{\mathrm{H}}}}_0^{{{\mathrm{M}}}}\). We define the final mean-based test statistic as
\[{\mathrm{QX}}_{{\mathrm{cat}}} = \max \left( {Q^A,Q^a} \right).\]
Based on the theorem proposed by Mosteller and Fisher (1948), the p value of QXcat can be approximated as follows:
where \(\xi = 1 - \chi _4^2\left( \eta \right)\). Here, we choose 2ξ to approximate the p value of QXcat, which is denoted as pQXcat.
X chromosome mean-based association test for quantitative traits considering different dosage compensation patterns
Note that QXcat takes all the XCI patterns into account by introducing two indicator variables for females. In addition to this way of considering XCI, Wang et al. (2019a) combined the test statistics for females and males by different weights to account for different dosage compensation patterns in their method Zmax for case‒control design. Adopting a similar idea, we put forward another mean-based association test, which also incorporates the information of dosage compensation by combining the test statistics for females and males based on different weights. Therefore, we propose our QZmax test statistic as follows. Here, we assume that A is the risk allele, and the risk allele in females is the same as that in males. Furthermore, for females, the generalized genetic model is assumed at the SNP (Chen et al. 2017). For females, let \(T_f^A = \frac{1}{{\sqrt 2 }}\left( {T_{f1}^A + T_{f2}^A} \right)\). Since \(T_{f1}^A\) and \(T_{f2}^A\) are independent of each other, \(T_f^A \sim N\left( {0,1} \right)\) under \({{{\mathrm{H}}}}_0^{{{{\mathrm{MV}}}}}\) or \({{{\mathrm{H}}}}_0^{{{\mathrm{M}}}}\). For males, we still use \(T_m^A\), which is independent of \(T_f^A\). Based on the work of Wang et al. (2019a), we combine \(T_f^A\) and \(T_m^A\) in the following way
where λk = 2nf/(knm + 2nf) (1 ≤ k ≤ 2). k = 1 denotes no dosage compensation. 1 < k < 2 indicates incomplete dosage compensation, and k = 2 means complete dosage compensation. Note that the values of \(T_{\lambda _k}\) when A is the risk allele and when a is the risk allele have different signs, while their absolute values are still the same. Therefore, we only consider the corresponding test statistics when A is assumed to be the risk allele. Wang et al. (2019a) demonstrated that incomplete dosage compensation (1 < k < 2) is much less common than no dosage compensation and complete dosage compensation, so we choose k = 1 and k = 2. Since the risk allele is generally unknown in practice, i.e., the signs of \(T_{\lambda _1}\) and \(T_{\lambda _2}\) are unknown, we propose the final mean-based test statistic as follows:
\[{\mathrm{QZ}}_{{\mathrm{max}}} = \max \left( {\left| {T_{\lambda _1}} \right|,\left| {T_{\lambda _2}} \right|} \right).\]
Here, \(T_{\lambda _1}\) and \(T_{\lambda _2}\) jointly follow a bivariate normal distribution. The correlation coefficient of \(T_{\lambda _1}\) and \(T_{\lambda _2}\) can be estimated by
The p value of QZmax (denoted by \(p_{{{{\mathrm{QZ}}}}_{{{{\mathrm{max}}}}}}\)) can be obtained directly by the mvtnorm package (https://cran.r-project.org/web/packages/mvtnorm/index.html) in the R statistical software (R Core Team 2020) as follows:
where \({{{\boldsymbol{R}}}}_{\left( {T_{\lambda _1},T_{\lambda _2}} \right)}\) is a 2 × 2 correlation matrix, and element \(r_{\left( {T_{\lambda _1},T_{\lambda _2}} \right)}\) is the correlation coefficient of \(T_{\lambda _1}\) and \(T_{\lambda _2}\).
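The weights λk = 2nf/(knm + 2nf) introduced above depend only on the sample sizes and on k. The following Python sketch evaluates them for k = 1 (no dosage compensation) and k = 2 (complete dosage compensation). The function name is hypothetical, and the three sample-size pairs are used purely as illustrative inputs.

```python
def dosage_weight(n_f: int, n_m: int, k: float) -> float:
    """Return lambda_k = 2*n_f / (k*n_m + 2*n_f) for 1 <= k <= 2."""
    if not 1.0 <= k <= 2.0:
        raise ValueError("k must lie in [1, 2]")
    return 2.0 * n_f / (k * n_m + 2.0 * n_f)


# (lambda_1, lambda_2) for three illustrative (n_f, n_m) pairs.
weights = {
    (n_f, n_m): (dosage_weight(n_f, n_m, 1.0), dosage_weight(n_f, n_m, 2.0))
    for n_f, n_m in [(4000, 2000), (3000, 3000), (2000, 4000)]
}
```

Because λk decreases in k, females receive a larger weight under no dosage compensation (k = 1) than under complete dosage compensation (k = 2) for the same sample.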
Two X chromosome mean-variance-based association tests for quantitative traits
Note that QXcat and QZmax can only test for the mean differences across different genotypes. However, the variances of the trait value across genotypes may also be affected by the mutation at the given SNP. To improve the test power, we propose the other two tests by combining the variance-based test wM3VNA proposed by Deng et al. (2019) with QXcat and QZmax to test for both the mean differences and the variance heterogeneity. Here, we denote the p value of wM3VNA as pwM3VNA. Referring to the proof by Soave et al. (2015), the mean-based association tests and the variance-based association tests for autosomal SNPs and normally distributed traits are independent, and we prove the independence of our proposed mean-based tests (i.e., QXcat and QZmax) and the variance-based test wM3VNA for X chromosomal SNPs and show the proof in Appendix B. Based on this, we construct two mean-variance-based tests QMVXcat, by combining pwM3VNA with pQXcat, and QMVZmax, by combining pwM3VNA with \(p_{Q{{{\mathrm{Z}}}}_{{{{\mathrm{max}}}}}}\), based on Fisher’s method (Fisher et al. 1967), i.e.,
\[{\mathrm{QMVX}}_{{\mathrm{cat}}} = - 2\log p_{{\mathrm{wM3VNA}}} - 2\log p_{{\mathrm{QX}}_{{\mathrm{cat}}}}\]
and
\[{\mathrm{QMVZ}}_{{\mathrm{max}}} = - 2\log p_{{\mathrm{wM3VNA}}} - 2\log p_{{\mathrm{QZ}}_{{\mathrm{max}}}}.\]
Under \({{{\mathrm{H}}}}_0^{{{{\mathrm{MV}}}}}\), both QMVXcat and QMVZmax asymptotically follow a chi-square distribution with 4 degrees of freedom (Chen et al. 2017).
Results
Simulation settings
We evaluate the type I error rates (sizes) and the powers of our proposed methods QMVXcat, QMVZmax, QXcat and QZmax by extensive simulation studies. Furthermore, we include wM3VNA, wM3V, Fisher, Tchen and Tplink for the comparison. Note that Tchen and Tplink do not consider the unequal variances of the trait value across different genotypes, which leads to false-positive results in the presence of variance heterogeneity. Therefore, we also include Tchenw and Tplinkw, which use the weighted least square method to estimate the regression coefficients. To clearly differentiate these 11 tests, we categorize them into three groups: methods testing for means (i.e., QXcat, QZmax, Tchenw, Tplinkw, Tchen and Tplink), methods testing for variances (i.e., wM3VNA, wM3V and Fisher) and methods simultaneously testing for means and variances (i.e., QMVXcat and QMVZmax). (qf, qm) is set as (0.2, 0.2), (0.2, 0.3) and (0.3, 0.2). ρ is taken as 0 and 0.05, where ρ = 0 means HWE and ρ ≠ 0 indicates the departure from HWE. We set the sample size N at 6000, and the sex ratio nf:nm is fixed at 2:1, 1:1 and 1:2, which corresponds to (nf, nm) = (4000, 2000), (3000, 3000) and (2000, 4000), respectively. The genotypes of females are generated from a trinomial distribution with probabilities (qaa, qAa, qAA), while the genotypes of males are simulated from a binomial distribution with probabilities (1 − qm, qm). Let z and g denote the sex and the genotype score, respectively. z is set to 1 for females and 0 for males. Under XCI, g takes the possible values of 0, γ and 2 for genotypes aa, Aa and AA in females, respectively, and values of 0 and 2 are taken for genotypes a and A in males, respectively. Different γ values represent different XCI patterns when XCI exists. Here, γ is fixed as 0, 0.5, 1, 1.5 and 2. Under XCI-E, g is set to 0, 1 and 2 for aa, Aa and AA in females, respectively, and 0 and 1 for a and A in males, respectively.
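The genotype-generation step described above can be sketched in Python as follows. The code draws female genotypes from the trinomial probabilities (qaa, qAa, qAA), draws male genotypes with probability qm of carrying A, and converts them to genotype scores g under XCI (0, γ, 2 for females and 0, 2 for males) or under XCI-E (0, 1, 2 and 0, 1). The function name, the fixed seed and the example arguments are assumptions for illustration; this is not the simulation code used in the study.

```python
import random


def simulate_genotype_scores(n_f, n_m, q_f, q_m, rho, gamma, xci=True, seed=0):
    """Return (female_scores, male_scores) for one simulated sample."""
    rng = random.Random(seed)
    het = q_f * (1.0 - q_f)
    # Female genotype probabilities for aa, Aa and AA.
    probs = [(1.0 - q_f) ** 2 + rho * het,
             2.0 * (1.0 - rho) * het,
             q_f ** 2 + rho * het]
    female_counts = rng.choices([0, 1, 2], weights=probs, k=n_f)
    male_counts = [1 if rng.random() < q_m else 0 for _ in range(n_m)]
    if xci:
        # Under XCI: females coded 0, gamma, 2; males coded 0, 2.
        female_scores = [(0.0, gamma, 2.0)[c] for c in female_counts]
        male_scores = [2.0 * c for c in male_counts]
    else:
        # Under XCI-E: females coded 0, 1, 2; males coded 0, 1.
        female_scores = [float(c) for c in female_counts]
        male_scores = [float(c) for c in male_counts]
    return female_scores, male_scores


g_f, g_m = simulate_genotype_scores(4000, 2000, 0.2, 0.2, 0.0, gamma=0.5)
```

Setting `xci=False` ignores `gamma`, which matches the XCI-E coding in which heterozygous females are always scored as 1.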
The trait value yi for individual i can be generated by the following linear regression model:
\[y_i = \beta _c + \beta _gg_i + \beta _zz_i + \varepsilon _i,\]
where gi and zi denote the values of g and z of individual i, respectively, βc is the intercept, βg and βz are the corresponding regression coefficients of gi and zi, respectively, and εi is the random error. Assume that yi follows a normal distribution. The corresponding mean and variance of yi with different coding schemes of g are shown in Table 1. We fix βc = βz = 0.133. \(\beta _g = \sqrt {\frac{{\psi \sigma ^2}}{{2q_g\left( {1 - q_g} \right)}}}\), where \(\sigma ^2\) is the variance of the trait value for genotype aa in females (\(\sigma _{f0}^2\)) and that for genotype a in males (\(\sigma _{m0}^2\)), ψ denotes the proportion of the phenotypic variation due to the SNP effect on the means of the trait value and qg is the allele frequency (Struchalin et al. 2010). In our simulations, we set \(\sigma ^2 = 1\) and qg = 0.3, which is the maximum of the values considered for qf and qm. To simulate the type I error rates of the methods testing for means, ψ is set to 0, which indicates that βg = 0. To simulate the test powers of the mean-based tests, we fix ψ at 0.3% and 0.4%, and the corresponding values of βg are 0.085 and 0.098, respectively. According to Ma et al. (2015b), \(\frac{\gamma }{2}\left( {1 - \frac{\gamma }{2}} \right)b^2\) in \(\sigma _{f1}^2\) under XCI denotes the increased variance caused by XCI for heterozygous females when the SNP has an effect on the means of the trait value, where b is the additive effect of the SNP on the trait value. Hence, when βg ≠ 0 and XCI exists, b takes the same value as βg (i.e., b = 0.085 (0.098) if βg = 0.085 (0.098)), while it is fixed to 0 when βg = 0 or under XCI-E. θ in \(\sigma _{f1}^2\) represents the increased variance caused by factors other than XCI for heterozygous females (Ma et al. 2015b). If \(\sigma _{f1}^2\) is affected by factors other than XCI, θ is set to 0.2; otherwise, θ = 0. τ in \(\sigma _{f2}^2\) and \(\sigma _{m1}^2\) is the additional variance of the trait value introduced by genotype AA in females or A in males. 
When the SNP influences \(\sigma _{f2}^2\) and \(\sigma _{m1}^2\), τ is 0.2, while it is set to 0 for variance homogeneity. Finally, we use Models (1) and (2) to fit these simulated data.
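The effect sizes quoted above follow directly from the formula for βg. As a quick check, the Python sketch below evaluates βg for ψ = 0, 0.3% and 0.4% with σ² = 1 and qg = 0.3, and also evaluates the XCI-induced variance term \(\frac{\gamma }{2}\left( {1 - \frac{\gamma }{2}} \right)b^2\). The function names are hypothetical and serve only as an illustration.

```python
import math


def effect_size(psi: float, sigma2: float = 1.0, q_g: float = 0.3) -> float:
    """Return beta_g = sqrt(psi * sigma^2 / (2 * q_g * (1 - q_g)))."""
    return math.sqrt(psi * sigma2 / (2.0 * q_g * (1.0 - q_g)))


def xci_variance_inflation(gamma: float, b: float) -> float:
    """Return (gamma/2) * (1 - gamma/2) * b^2, the extra variance for Aa females under XCI."""
    return (gamma / 2.0) * (1.0 - gamma / 2.0) * b ** 2


# beta_g rounded to three decimals for the three values of psi used in the text.
beta_values = {psi: round(effect_size(psi), 3) for psi in (0.0, 0.003, 0.004)}
# Extra heterozygote variance under random XCI (gamma = 1) with b = 0.085.
extra_var = xci_variance_inflation(1.0, 0.085)
```

The rounded values reproduce the 0.085 and 0.098 reported in the text, and the inflation term vanishes at γ = 0 and γ = 2, where heterozygous females behave like one of the homozygous genotypes.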
Since QXcat, QZmax, Tchenw, Tplinkw, Tchen and Tplink only test for the mean difference of the trait value, wM3VNA, wM3V and Fisher only test for the variance heterogeneity, and QMVXcat and QMVZmax test for the differences of both means and variances, we consider the following five scenarios: (1) the means and the variances of the trait value are not influenced by the SNP, (2) the variances of the trait value are affected by the SNP due to factors other than XCI for Aa females and AA females or A males, while the SNP has no effect on the means, (3) under XCI-E, the SNP affects the means while it has no influence on the variances, (4) under XCI, the SNP affects the means and the variances of the trait value because of XCI, specific genotypes (i.e., Aa and AA females or A males) and other factors, and (5) under XCI-E, the SNP affects the means and the variances of the trait value owing to the factors other than XCI for Aa females and AA females or A males. Note that for the case of XCI, if the SNP has an effect on the means, then this SNP will also have an effect on the variances. Therefore, we do not simulate the scenario under XCI in which the SNP affects the means but not the variances. The corresponding values of ψ, βg, γ, b, θ and τ under the five simulated scenarios are displayed in Table 2. In scenario (1) (i.e., no SNP effect), we evaluate the sizes of all the considered methods. In scenario (2) (i.e., SNP effect on variances only), the sizes of the six mean-based tests (QXcat, QZmax, Tchenw, Tplinkw, Tchen and Tplink), the test powers of the two mean-variance-based tests (QMVXcat and QMVZmax) and the three variance-based tests (wM3VNA, wM3V and Fisher) are assessed. In scenario (3) (i.e., SNP effect on means only under XCI-E), the sizes of the three methods testing for variances are presented, and the test powers of the two mean-variance-based tests and the six mean-based tests are compared. 
In scenarios (4) and (5) (i.e., SNP effect on both means and variances), we compare the test powers of all the methods. The number of replications is fixed at \(10^5\), and the significance level is \(\alpha = 10^{-4}\). To further assess the robustness of our proposed methods, we consider the situations where the trait value follows a log-normal distribution with the parameters being the natural logarithm of the means and the variances listed in Table 1. In this case, the trait value is first transformed by the inverse normal transformation method, as recommended by Deng et al. (2019).
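As an illustration of the transformation step mentioned above, the following is a minimal sketch of a rank-based inverse normal transformation applied to a log-normal trait. The offset of 3/8 (Blom) and the log-normal parameters (0, 1) are assumptions made for this example only; they are not taken from Table 1 or from Deng et al. (2019).

```python
import random
from statistics import NormalDist


def rank_inverse_normal(values, offset=0.375):
    """Rank-based inverse normal transformation.

    Each value is replaced by the standard normal quantile of
    (rank - offset) / (n - 2 * offset + 1); ties receive average ranks.
    The default offset 3/8 is an assumption made for this sketch.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - offset) / (n - 2 * offset + 1)) for r in ranks]


# Hypothetical log-normal trait values (parameters chosen for illustration only).
random.seed(1)
trait = [random.lognormvariate(0.0, 1.0) for _ in range(1000)]
transformed = rank_inverse_normal(trait)
```

The transformation depends only on the ranks, so the ordering of individuals by trait value is preserved while the marginal distribution becomes approximately standard normal.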
Empirical type I error rates
Scenario (1): no SNP effect
Table 3 provides a summary of the sizes of our proposed methods (i.e., QMVXcat, QMVZmax, QXcat and QZmax) and the seven existing methods (i.e., Tchenw, Tplinkw, Tchen, Tplink, wM3VNA, wM3V and Fisher) in scenario (1) under HWE (i.e., ρ = 0) when the trait value follows a normal distribution. Table 3 shows that all of these methods control the sizes well regardless of allele frequencies and sex ratios. Supplementary Table S1 shows the empirical sizes of all these methods when ρ = 0.05. The sizes of all the methods remain close to the nominal level \(10^{-4}\), and the value of ρ has little effect on the empirical sizes.
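To make "close to the nominal level" concrete, the sketch below computes an empirical size and an approximate Monte Carlo interval around the nominal level for the replication count and significance level used in the simulations. The normal approximation and the rejection count of 12 are assumptions made for illustration; with only about 10 expected rejections the approximation is crude, so the interval should be read as a rough guide.

```python
import math


def empirical_size(n_rejections, n_replicates):
    """Proportion of replicates in which the null hypothesis is rejected."""
    return n_rejections / n_replicates


def monte_carlo_interval(alpha, n_replicates, z=1.96):
    """Approximate range of empirical sizes expected when the true size is alpha.

    Uses the binomial standard error sqrt(alpha * (1 - alpha) / B) with a
    normal approximation (an assumption of this sketch).
    """
    se = math.sqrt(alpha * (1 - alpha) / n_replicates)
    return max(0.0, alpha - z * se), alpha + z * se


alpha = 1e-4          # nominal significance level used in the simulations
n_rep = 10 ** 5       # number of replications used in the simulations
low, high = monte_carlo_interval(alpha, n_rep)

# Hypothetical outcome: 12 rejections out of 10^5 replicates.
size = empirical_size(12, n_rep)
print(f"empirical size = {size:.2e}, expected range = [{low:.2e}, {high:.2e}]")
```

Under these assumptions, an empirical size anywhere within this range is compatible with a correctly controlled test.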
Scenario (2): SNP effect on variances only
Table 4 shows the estimated sizes of the six mean-based tests (QXcat, QZmax, Tchenw, Tplinkw, Tchen and Tplink) in scenario (2) when ρ = 0 and 0.05, and the trait value follows a normal distribution. Note that only the sizes of QXcat, QZmax, Tchenw and Tplinkw are well controlled when the variances of the trait value are unequal, while the type I error rates of Tchen and Tplink are inflated.
Power comparison
Scenario (2): SNP effect on variances only
The simulated powers of the two mean-variance-based tests (QMVXcat and QMVZmax) and the three variance-based tests (wM3VNA, wM3V and Fisher) against nf:nm in scenario (2) under HWE when the trait value is normally distributed are displayed in Supplementary Fig. S1. Supplementary Fig. S1 shows that wM3VNA is more powerful than the other methods. Because the mean-based tests QXcat and QZmax only attain the type I error rates under scenario (2), the powers of QMVXcat and QMVZmax are close to each other and are lower than those of the three methods for testing variances. Generally, when (qf, qm) remains unchanged, the powers of the five methods gradually decrease as nf:nm changes from 2:1 to 1:1 to 1:2 (i.e., more male individuals). The powers of these methods for (qf, qm) = (0.2, 0.3) and (0.3, 0.2) are higher than those for (qf, qm) = (0.2, 0.2) when nf:nm is fixed (Supplementary Fig. S1b vs. Supplementary Fig. S1a and Supplementary Fig. S1c vs. Supplementary Fig. S1a). The corresponding test powers when ρ = 0.05 are presented in Supplementary Fig. S2. We find that the performances of the tests in Supplementary Fig. S2 are similar to those in Supplementary Fig. S1.
Scenario (3): SNP effect on means only under XCI-E
Under scenario (3), the methods for testing variances (wM3VNA, wM3V and Fisher) present the type I error rates instead of the test powers (data not shown for brevity). In addition, Supplementary Table S2 shows that when βg = 0.085 and ρ = 0 for a normally distributed trait value in scenario (3), the powers of the existing mean-based tests Tchen and Tplink are very close to those of Tchenw and Tplinkw, respectively. Hence, we remove the simulation results of the three variance-based tests, Tchen and Tplink from all the figures under this scenario for simplicity. The estimated powers of the two methods for simultaneously testing means and variances (QMVXcat and QMVZmax) and the four methods for testing means (QXcat, QZmax, Tchenw and Tplinkw) against nf:nm in scenario (3) when βg = 0.085, ρ = 0 and the trait value follows a normal distribution are plotted in Fig. 1. From Fig. 1, we find that the mean-based test QZmax performs the best and the mean-variance-based test QMVXcat performs the worst. QXcat is more powerful than the mean-variance-based test QMVZmax and the two existing mean-based tests (i.e., Tchenw and Tplinkw). QMVZmax and Tchenw have similar powers, while Tplinkw is more powerful than both. All the methods in Fig. 1 become less powerful as nf:nm decreases (i.e., more male individuals). When nf:nm is unchanged, the powers of these methods when (qf, qm) = (0.2, 0.3) and (0.3, 0.2) are higher than those when (qf, qm) = (0.2, 0.2) (Fig. 1b vs. Fig. 1a and Fig. 1c vs. Fig. 1a). The powers of these methods in scenario (3) (i.e., SNP effect on means only under XCI-E) when βg = 0.098 and ρ = 0 are given in Supplementary Fig. S3, and the corresponding results for ρ = 0.05 when βg = 0.085 and 0.098 are shown in Supplementary Figs. S4 and S5, respectively. From these figures, we can see that the powers when βg = 0.098 are higher than those when βg = 0.085 (Supplementary Fig. S3 vs. Fig. 1 and Supplementary Fig. S5 vs. Supplementary Fig. S4). Different values of ρ have minimal effect on the power.
Scenarios (4) and (5): SNP effect on both means and variances
Since Tchen and Tplink for testing means have increased empirical sizes when the variances of the trait value across genotypes are unequal, we remove them from all the figures in scenarios (4) and (5). Figure 2 gives the estimated power of the two mean-variance-based tests (QMVXcat and QMVZmax), the four mean-based tests (QXcat, QZmax, Tchenw and Tplinkw) and the three variance-based tests (wM3VNA, wM3V and Fisher) against different γ values in scenario (4) (i.e., SNP effect on both means and variances under XCI) when βg = b = 0.085, ρ = 0 and the trait value follows a normal distribution. We can see from Fig. 2 that the two mean-variance-based tests have almost the same performance in terms of power and are more powerful than the other tests. For the four methods testing for means, when γ = 2 and nf:nm = 2:1 or 1:1 (subplots 2a-2f of Fig. 2), the powers of QXcat, Tchenw and Tplinkw are close to each other and are slightly larger than that of QZmax. However, when γ = 2 and nf:nm = 1:2 (subplots 2g-2i of Fig. 2), the four mean-based tests perform similarly. For the cases when γ = 0, the proposed QXcat test generally performs the best, and the other three mean-based methods have similar powers, except for the situations where (qf, qm) = (0.3, 0.2). For the cases when γ = 0 and (qf, qm) = (0.3, 0.2), the existing Tplinkw test has the least power when nf:nm = 2:1 or 1:1 (subplots 2c and 2f of Fig. 2), while the two existing tests (Tchenw and Tplinkw) have similar powers and perform worse than the two proposed tests (QXcat and QZmax) when nf:nm = 1:2 (subplot 2i of Fig. 2). When γ = 0.5, 1 and 1.5, the powers of the four mean-based tests are not much different when nf:nm = 2:1 and 1:1 (subplots 2a-2f of Fig. 2), while the existing Tchenw test has the smallest power when nf:nm = 1:2 (subplots 2g-2i of Fig. 2). 
In addition, the powers of the two mean-variance-based tests and four mean-based tests increase as γ increases, while the powers of the methods testing for variances hardly change with γ because the extra variance for heterozygous females caused by XCI (i.e., \(\frac{\gamma }{2}\left( {1 - \frac{\gamma }{2}} \right)b^2\)) attains its maximum value of only 0.0018 when γ = 1, which is very small. For each fixed (nf, nm), all the methods when (qf, qm) = (0.2, 0.3) and (0.3, 0.2) perform better than those when (qf, qm) = (0.2, 0.2) (e.g., Fig. 2b vs. Fig. 2a and Fig. 2c vs. Fig. 2a). For each value of (qf, qm), the two methods for simultaneously testing means and variances and the four methods for testing means become more powerful when nf:nm changes from 2:1 to 1:1 to 1:2 (e.g., Fig. 2a vs. Fig. 2d, Fig. 2a vs. Fig. 2g and Fig. 2d vs. Fig. 2g), while the powers of the methods for testing variances generally decrease. These results indicate that larger values of qf and qm may improve the powers of all the methods and that the three variance-based tests can be more efficient with a higher nf:nm (i.e., more female individuals). However, a lower nf:nm (i.e., more male individuals) may cause the two methods simultaneously testing for means and variances and the four mean-based tests to be more powerful.
We plot the powers of all these methods in scenario (4) (i.e., SNP effect on both means and variances under XCI) when βg = b = 0.098 and ρ = 0, and the corresponding results for ρ = 0.05 when βg = b = 0.085 and βg = b = 0.098 in Supplementary Figs. S6–S8, respectively. By comparing Fig. 2 with Supplementary Fig. S6 or comparing Supplementary Fig. S7 with Supplementary Fig. S8, we find that for the methods testing for variances, the powers when βg = b = 0.085 are similar to those when βg = b = 0.098 because for different values of γ, the additional variances caused by XCI (i.e., \(\frac{\gamma }{2}\left( {1 - \frac{\gamma }{2}} \right)b^2\)) for b = 0.085 are close to those for b = 0.098; for the two mean-variance-based tests and the four mean-based tests, the powers when βg = b = 0.098 are higher than those when βg = b = 0.085.
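The size of the additional variance discussed above can be checked directly. The following short sketch evaluates \(\frac{\gamma }{2}\left( {1 - \frac{\gamma }{2}} \right)b^2\) for the values of γ and b used in the simulations; the function name is hypothetical and chosen only for this example.

```python
def xci_extra_variance(gamma, b):
    """Extra variance for heterozygous females caused by XCI:
    (gamma / 2) * (1 - gamma / 2) * b ** 2."""
    half = gamma / 2
    return half * (1 - half) * b ** 2


# Evaluate the term on the grid of gamma and b values used in the simulations.
for b in (0.085, 0.098):
    for gamma in (0, 0.5, 1, 1.5, 2):
        print(f"b = {b}, gamma = {gamma}: {xci_extra_variance(gamma, b):.6f}")
```

The term vanishes at γ = 0 and γ = 2, peaks at γ = 1, and at its peak differs between b = 0.085 and b = 0.098 by less than 0.001, which is consistent with the variance-based tests having similar powers for the two effect sizes.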
The estimated powers of the two methods for simultaneously testing means and variances (QMVXcat and QMVZmax), four methods for testing means (QXcat, QZmax, Tchenw and Tplinkw) and three methods for testing variances (wM3VNA, wM3V and Fisher) against nf:nm in scenario (5) (i.e., SNP effect on both means and variances under XCI-E) when βg = 0.085 and ρ = 0 are presented in Fig. 3. The corresponding results when βg = 0.098 and ρ = 0 and those with ρ = 0.05 when βg = 0.085 and 0.098 are given in Supplementary Figs. S9–S11. It can be seen from these figures that under scenario (5), QMVZmax for simultaneously testing means and variances is the most powerful, the two mean-variance-based tests are more powerful than the other seven methods, and the power of Tchenw for testing means is the worst. Among the four mean-based tests (QXcat, QZmax, Tchenw and Tplinkw), the order of the performance in terms of power is QZmax > QXcat > Tplinkw > Tchenw. In addition, the power performances of the three variance-based tests in Fig. 3 and Supplementary Figs. S9–S11 are similar to those in Supplementary Figs. S1 and S2.
Other simulation results
We also simulate the type I error rates and powers for all the considered test statistics for all the abovementioned situations when the trait value follows a log-normal distribution. The simulation results are shown in Supplementary Tables S3–S5 and Supplementary Figs. S12–S25. From Supplementary Tables S3–S5, all the sizes stay close to the nominal level, except for the mean-based tests Tchen and Tplink under scenario (2), where the variances across genotypes can be unequal. From Supplementary Figs. S12–S25, we find that the power performances of all the methods and the impact of (qf, qm), nf:nm, γ, ρ, βg and b on the powers of all the methods in scenarios (2)–(5) are similar to those when the trait value is normally distributed.
Application to the MCTFR data
The Minnesota Center for Twin and Family Research Genome-Wide Association Study of Behavioral Disinhibition is a family-based study that includes age (covariate) and five quantitative traits: the nicotine composite score (NIC), the alcohol consumption composite score (CON), the alcohol dependence composite score (DEP), the behavioral disinhibition composite score (BD) and the illicit drug composite score (DRG). This dataset is available from the database of Genotypes and Phenotypes (https://www.ncbi.nlm.nih.gov/gap/) with the accession number phs000620.v1.p1. This dataset includes 2183 families and 7377 individuals (3546 males and 3831 females). There are four types of offspring in this dataset: monozygotic twins, full biological nontwin siblings, adopted siblings and mixed siblings (one biological offspring and one adopted offspring). More details of the family structure in this dataset can be found in Fig. 7 of Li et al. (2021) and, for easy reference, in Supplementary Fig. S26 of this article. In this dataset, 12,354 SNPs on the X chromosome are genotyped.
To ensure that the included individuals are independent, we only use the data of parents in the dataset. We then conduct quality control: we first exclude the individuals with a missing genotype rate greater than 10% and then select the SNPs for which the minor allele frequencies are greater than 5%, the missing rates are less than 10%, the minimum genotype counts are larger than 20 and the p values of the HWE test are larger than \(1 \times 10^{-6}\) (Ma et al. 2015b; Soave et al. 2015; Marees et al. 2018). As a result, a total of 3649 independent individuals (1949 females and 1700 males) and 9963 SNPs are included in this application. We apply our proposed methods (i.e., QMVXcat, QMVZmax, QXcat and QZmax) and the existing methods (i.e., Tchenw, Tplinkw, Tchen, Tplink, wM3VNA, wM3V and Fisher) to this subset of the MCTFR data.
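The SNP-level quality control thresholds listed above can be expressed as a simple filter. The sketch below is illustrative only: the `SnpSummary` record, its field names and the two example SNPs are hypothetical, and the per-SNP summary statistics are assumed to have been computed beforehand by other software.

```python
from dataclasses import dataclass


@dataclass
class SnpSummary:
    """Hypothetical per-SNP summary statistics (field names are assumptions)."""
    snp_id: str
    maf: float                 # minor allele frequency
    missing_rate: float        # proportion of missing genotypes
    min_genotype_count: int    # smallest genotype count
    hwe_p: float               # p value of the HWE test


def passes_qc(snp, maf_min=0.05, missing_max=0.10, count_min=20, hwe_p_min=1e-6):
    """Apply the four SNP-level thresholds described in the text."""
    return (
        snp.maf > maf_min
        and snp.missing_rate < missing_max
        and snp.min_genotype_count > count_min
        and snp.hwe_p > hwe_p_min
    )


# Two made-up SNPs: the first passes, the second fails on minor allele frequency.
snps = [
    SnpSummary("snp_example_1", maf=0.21, missing_rate=0.01, min_genotype_count=85, hwe_p=0.4),
    SnpSummary("snp_example_2", maf=0.03, missing_rate=0.02, min_genotype_count=30, hwe_p=0.7),
]
kept = [s.snp_id for s in snps if passes_qc(s)]
print(kept)
```

Keeping the thresholds as keyword arguments makes it easy to repeat the analysis under stricter or looser filters.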
Note that sex dimorphism of the five quantitative traits generally exists: the histograms of the five traits for all the individuals, females only and males only differ in the MCTFR data, as shown in Supplementary Fig. S27. Furthermore, all the residuals estimated from Models (1) and (2) fail to pass the normality tests. Following McCaw et al. (2019), we use the I-INT method to transform the five quantitative traits in females and males and then apply the 11 methods mentioned above to conduct the corresponding association analysis. Here, we include age as the covariate.
Since the five traits in this dataset share many similarities, similar to Schifano et al. (2013), we set the significance level to \(1 \times 10^{-3}\) to find the SNPs that are simultaneously associated with multiple traits. As a result, SNP rs808144 is identified as being simultaneously associated with four traits (BD, DEP, DRG and NIC). Table 5 shows the p values of all the methods for SNP rs808144, from which we discover that SNP rs808144 only influences the mean values of these four traits while having no effect on their variances (all the p values of the variance-based tests wM3VNA, wM3V and Fisher are larger than \(1 \times 10^{-3}\)). The p values of the proposed mean-based tests QXcat and QZmax are close to those of the existing mean-based tests (Tchenw, Tplinkw, Tchen and Tplink). In addition, nine SNPs (rs808141, rs5934722, rs5926861, rs7064741, rs5942608, rs17261621, rs204332, rs5977759 and rs5925540) are found to be simultaneously associated with two traits. The p values of all the methods for these nine SNPs are given in Supplementary Table S6. Specifically, SNPs rs808141, rs5926861, rs7064741, rs204332, rs5977759 and rs5925540 only have effects on the mean values of the traits. Among these six SNPs, BD is statistically significantly associated with SNPs rs5926861, rs204332 and rs5925540; CON is only associated with SNP rs5977759; DEP is associated with SNPs rs5926861, rs7064741 and rs5977759; DRG is associated with SNPs rs808141, rs7064741 and rs204332; and NIC is associated with SNPs rs808141 and rs5925540. For SNPs rs808141, rs5926861, rs7064741 and rs5977759, the p values of the six mean-based tests (QXcat, QZmax, Tchenw, Tplinkw, Tchen and Tplink) are close to each other, while for SNP rs204332, the p values of QZmax are much larger than those of the other five mean-based tests, and for SNP rs5925540, the p values of QZmax, Tplinkw and Tplink are not much different and are much larger than those of QXcat, Tchenw and Tchen.
In addition, we find that SNP rs5934722 only affects the variances of BD (\(p_{\mathrm{wM3VNA}} = 4.18 \times 10^{-4}\), \(p_{\mathrm{wM3V}} = 5.01 \times 10^{-4}\) and \(p_{\mathrm{Fisher}} = 5.04 \times 10^{-4}\)) and DRG (\(p_{\mathrm{wM3VNA}} = 8.81 \times 10^{-4}\) and \(p_{\mathrm{wM3V}} = 8.70 \times 10^{-4}\)). SNP rs17261621 is statistically significantly associated with the variance differences of BD (\(p_{\mathrm{wM3V}} = 7.66 \times 10^{-4}\)). From the p values of all the methods for SNP rs17261621 and DRG, only the mean-variance-based test QMVZmax gives the statistically significant result (\(p_{{{{\mathrm{QMVZ}}}}_{{{{\mathrm{max}}}}}} = 3.98 \times 10^{ - 4}\)). Additionally, from the p values of all the methods for SNP rs5942608, only the p values of QMVXcat for simultaneously testing means and variances are lower than the significance level \(1 \times 10^{-3}\), where the p values of QMVXcat for DEP and NIC are \(4.53 \times 10^{-4}\) and \(8.02 \times 10^{-4}\), respectively. This indicates that either the means or the variances of the trait values across different genotypes are different, which needs to be further investigated.
We summarize the positions, minor alleles, major alleles, minor allele frequencies, p values of the HWE test and the genes containing the abovementioned 10 SNPs in Supplementary Table S7. We find that SNP rs5934722 is within the SHROOM2 gene, which is reported to be associated with autistic disorder and neurodevelopmental disorders (Kearney et al. 2011; Richards et al. 2015). SNP rs5926861 is located in the DCAF8L2 gene, which has been reported to be associated with autistic disorder, neurodevelopmental disorders and syndromic X-linked intellectual disability Lubs type (Kushima et al. 2018). SNP rs7064741 is located in the GLRA4 gene, which is related to intellectual disability, behavioral problems and craniofacial anomalies (Labonne et al. 2016). SNP rs5977759 is in the HS6ST2 gene, which is associated with the development of myopia and cognitive impairment (Paganini et al. 2019).
Discussion
In this article, we propose four association tests (QMVXcat, QMVZmax, QXcat and QZmax) for X-linked quantitative traits under the assumptions that the risk alleles for females and males are the same and the SNP being studied satisfies the generalized genetic model in females. Among these tests, QXcat and QZmax focus on testing for the mean differences of quantitative traits, while QMVXcat and QMVZmax simultaneously test for both the mean and variance differences of quantitative traits. In addition, we choose two ways to incorporate the XCI information. In QMVXcat and QXcat, we introduce two indicator variables for females, which can be used in testing for the association under all the XCI patterns, and then directly combine the p values of the test statistics based on females and males. In QMVZmax and QZmax, we combine the test statistics for females and males by different weights to consider different dosage compensation patterns and then obtain the test statistic by maximizing these combined test statistics. Extensive simulations are conducted to evaluate the type I error rates and the test powers of these proposed methods and the existing methods Tchenw, Tplinkw, Tchen, Tplink, wM3VNA, wM3V and Fisher. The simulation results show that our proposed methods control the type I error rates in various scenarios well. In the simulated scenarios where the mean values of the trait value are affected by the SNP, two proposed mean-based tests QXcat and QZmax have better performance in terms of power than the existing methods for testing means under XCI-E and in some cases of XCI. In the simulated scenarios where both the means and the variances of the trait value are affected by the SNP, the two proposed mean-variance-based tests QMVXcat and QMVZmax outperform the others, as expected.
For the combination of p values, we use Fisher’s method (Fisher et al. 1967), Stouffer’s method (Stouffer et al. 1949) and Cauchy’s method (Liu and Xie 2020) to combine the p value of wM3VNA for testing variances with those of QXcat and QZmax for testing means to obtain the p values of QMVXcat and QMVZmax for simultaneously testing means and variances. In Stouffer’s method, each of the two p values is transformed to the corresponding upper quantile of the standard normal distribution, and then \(\frac{1}{{\sqrt 2 }}\) times the sum of these two quantiles is used as the final test statistic, which follows the standard normal distribution under \({{{\mathrm{H}}}}_0^{{{{\mathrm{MV}}}}}\). In Cauchy’s method, we first transform the two p values to the corresponding quantiles of the standard Cauchy distribution and then calculate the average of these two quantiles as the final test statistic, which follows the standard Cauchy distribution under \({{{\mathrm{H}}}}_0^{{{{\mathrm{MV}}}}}\). We compare the test powers of the two mean-variance-based tests (QMVXcat and QMVZmax) using the three combination methods under HWE for scenario (3) (i.e., SNP effect on means only under XCI-E) with βg = 0.085 and scenario (4) (i.e., SNP effect on both means and variances under XCI) with βg = b = 0.085 when the trait value follows a normal distribution. The estimated powers of both methods under scenario (3) are listed in Supplementary Table S8, and the corresponding results of QMVXcat and QMVZmax under scenario (4) are given in Supplementary Tables S9 and S10, respectively. From Supplementary Table S8, both QMVXcat and QMVZmax achieve the highest powers when using Cauchy’s method in scenario (3), which are slightly larger than those with Fisher’s method. The powers using Stouffer’s method are much lower than those using the other two combination methods. 
In Supplementary Tables S9 and S10, we find that the test powers utilizing Fisher’s method and Stouffer’s method are close to each other, and both are much larger than those with Cauchy’s method. Therefore, we finally choose the robust Fisher’s method to construct the mean-variance-based tests QMVXcat and QMVZmax. Additionally, Chen (2022a) recently proposed a method based on the constrained likelihood ratio test for combining independent p values and showed that this combination method is robust and powerful under many conditions. Moreover, two novel robust tests for combining dependent p values (i.e., MCM and CMC) were suggested by Chen (2022b). Both the simulation results and the real data application demonstrated that the MCM and CMC methods are robust and powerful under many situations and can be considered alternatives to Cauchy’s method. In future work, we will use the combination methods proposed by Chen (2022a) and Chen (2022b) to calculate the p values of QMVXcat and QMVZmax for simultaneously testing the means and variances and compare the powers of QMVXcat and QMVZmax utilizing these three methods with those using Fisher’s method.
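The three combination rules compared above can be sketched for the two-p-value case as follows. This is a minimal illustration, not the implementation in the QMVtest package; the function names and the example p values are hypothetical. Fisher's rule uses the closed form of the chi-square survival function with 4 degrees of freedom.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def fisher_combine(p1, p2):
    """Fisher: -2 ln(p1 * p2) is chi-square with 4 df under the null hypothesis."""
    stat = -2.0 * math.log(p1 * p2)
    # Survival function of chi-square(4): exp(-x / 2) * (1 + x / 2).
    return math.exp(-stat / 2.0) * (1.0 + stat / 2.0)


def stouffer_combine(p1, p2):
    """Stouffer: sum of upper normal quantiles divided by sqrt(2)."""
    # The upper quantile for p equals -inv_cdf(p), which is stable for small p.
    z = (-_STD_NORMAL.inv_cdf(p1) - _STD_NORMAL.inv_cdf(p2)) / math.sqrt(2)
    return _STD_NORMAL.cdf(-z)


def cauchy_combine(p1, p2):
    """Cauchy: average of standard Cauchy quantiles tan((0.5 - p) * pi)."""
    t = (math.tan((0.5 - p1) * math.pi) + math.tan((0.5 - p2) * math.pi)) / 2.0
    return 0.5 - math.atan(t) / math.pi


# Hypothetical inputs: a strong mean signal with no variance signal,
# and a moderate signal in both components.
one_sided_signal = (1e-5, 0.5)
two_sided_signal = (1e-3, 1e-3)
for label, (pa, pb) in (("mean only", one_sided_signal), ("mean and variance", two_sided_signal)):
    print(label, fisher_combine(pa, pb), stouffer_combine(pa, pb), cauchy_combine(pa, pb))
```

With these made-up inputs, the Cauchy rule gives the smallest combined p value when only one component carries signal, whereas the Fisher and Stouffer rules give much smaller values than the Cauchy rule when both components carry moderate signal. This mirrors, qualitatively, the pattern reported for scenarios (3) and (4).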
For the mean-based test QXcat, we consider three combination methods to construct the test statistic. The first way is to directly combine two p values for females (i.e., \(p_{f1}^A\) and \(p_{f2}^A\) if the risk allele is A) with the p value for males (i.e., \(p_m^A\) if the risk allele is A) based on Fisher’s method and obtain the corresponding test statistic. The second way is to first combine two test statistics for females (i.e., \(T_{f1}^A\) and \(T_{f2}^A\)) to \(T_f^A\) and compute the corresponding p value, then combine it with the p value for males based on Fisher’s method. The third way is the one we choose for QXcat in this article, which has been introduced in the Materials and methods section. The power performances of QXcat under three combinations are also compared in different scenarios, and we find that QXcat under the third combination achieves the highest power in general (data not shown for brevity).
For the mean-based test QZmax, two test statistics that incorporate more dosage compensation patterns, i.e., \({{{\mathrm{QZ}}}}_{{{{\mathrm{max}}}}3} = {{{\mathrm{max}}}}\left( {\left| {T_{\lambda _1}} \right|,\left| {T_{\lambda _{1.5}}} \right|,\left| {T_{\lambda _2}} \right|} \right)\) and \({{{\mathrm{QZ}}}}_{{{{\mathrm{max}}}}5} = {{{\mathrm{max}}}}\left( {\left| {T_{\lambda _1}} \right|,\left| {T_{\lambda _{1.25}}} \right|,\left| {T_{\lambda _{1.5}}} \right|,\left| {T_{\lambda _{1.75}}} \right|,\left| {T_{\lambda _2}} \right|} \right)\), are also considered. We compare their power performance with \({{{\mathrm{QZ}}}}_{{{{\mathrm{max}}}}} = {{{\mathrm{max}}}}\left( {\left| {T_{\lambda _1}} \right|,\left| {T_{\lambda _2}} \right|} \right)\) under HWE and (qf, qm) = (0.2, 0.2) for scenario (3) (i.e., SNP effect on means only under XCI-E) with βg = 0.085 and scenario (4) (i.e., SNP effect on both means and variances under XCI) with βg = b = 0.085 when the trait value follows a normal distribution. The corresponding results are given in Supplementary Table S11, which shows that the powers of QZmax, QZmax3 and QZmax5 are close to each other. Note that QZmax3 and QZmax5 are much more computationally intense than QZmax. Therefore, we recommend choosing the test statistic \({{{\mathrm{QZ}}}}_{{{{\mathrm{max}}}}} = {{{\mathrm{max}}}}\left( {\left| {T_{\lambda _1}} \right|,\left| {T_{\lambda _2}} \right|} \right)\) in practice.
The proposed mean-based tests QXcat and QZmax assume that the risk alleles for females and males are the same, and the SNP being studied satisfies the generalized genetic model in females (i.e., μf2 ≥ μf1 ≥ μf0). When these two assumptions are satisfied in practice, the methods of constructing the test statistics QXcat and QZmax can effectively incorporate the information from these two assumptions and hence can improve the test powers. For instance, if the risk alleles in females and males are both A and μf2 > μf1 > μf0, the signs of βf1 and βf2 in Model (1) and that of βm1 in Model (2) are the same, and all of them are positive. For QXcat, the one-sided p values \(p_{f1}^A = 1 - {{\Phi }}\left( {T_{f1}^A} \right)\) and \(p_{f2}^A = 1 - {{\Phi }}\left( {T_{f2}^A} \right)\) are smaller than the one-sided p values \(p_{f1}^a = 1 - {{\Phi }}\left( {T_{f1}^a} \right)\) and \(p_{f2}^a = 1 - {{\Phi }}\left( {T_{f2}^a} \right)\), respectively, in females. Thus, \(Q_f^A = - 2{{{\mathrm{ln}}}}\left( {p_{f1}^Ap_{f2}^A} \right)\) is larger than \(Q_f^a = - 2{{{\mathrm{ln}}}}\left( {p_{f1}^ap_{f2}^a} \right)\), and the corresponding p values satisfy \(p_f^A \,<\, p_f^a\). Similarly, \(p_m^A\) is smaller than \(p_m^a\) in males. Then, \(Q^A = - 2{{{\mathrm{ln}}}}\left( {p_f^Ap_m^A} \right)\) is larger than \(Q^a = - 2{{{\mathrm{ln}}}}\left( {p_f^ap_m^a} \right)\). By utilizing the information that \(p_{f1}^A\), \(p_{f2}^A\) and \(p_m^A\) are smaller than \(p_{f1}^a\), \(p_{f2}^a\) and \(p_m^a\), respectively, a relatively large final test statistic is obtained by taking the maximum of \(Q^A\) and \(Q^a\), so the test power of QXcat = max(\(Q^A\), \(Q^a\)) will increase. For QZmax, because both \(T_{f1}^A\) and \(T_{f2}^A\) should be positive with a high probability, \(T_f^A = \frac{1}{{\sqrt 2 }}\left( {T_{f1}^A + T_{f2}^A} \right)\) is also larger than zero. 
In addition, note that both \(T_f^A\) and \(T_m^A\) should be greater than zero, so the signs of \(T_{\lambda _1} = \sqrt {\lambda _1} T_f^A + \sqrt {1 - \lambda _1} T_m^A\) and \(T_{\lambda _2} = \sqrt {\lambda _2} T_f^A + \sqrt {1 - \lambda _2} T_m^A\) are the same as those of \(T_f^A\) and \(T_m^A\). Therefore, the test power of \({{{\mathrm{QZ}}}}_{{{{\mathrm{max}}}}} = {{{\mathrm{max}}}}\left( {\left| {T_{\lambda _1}} \right|,\left| {T_{\lambda _2}} \right|} \right)\) is improved because the test statistics \(T_{f1}^A\), \(T_{f2}^A\) and \(T_m^A\) entering the weighted sums have the same signs and therefore reinforce one another.
However, if either of these two assumptions is violated, both QXcat and QZmax may lose power. For example, when the risk allele in females is A while that in males is a, the signs of βf1 and βf2 are different from that of βm1 (i.e., \(p_f^A \,<\, p_f^a\) and \(p_m^A \,>\, p_m^a\)). Then, both the test statistics \(Q^A = - 2{{{\mathrm{ln}}}}\left( {p_f^Ap_m^A} \right)\) and \(Q^a = - 2{{{\mathrm{ln}}}}\left( {p_f^ap_m^a} \right)\) are less than \(- 2{{{\mathrm{ln}}}}\left( {p_f^Ap_m^a} \right)\) (assuming that the risk alleles for females and males are known), and QXcat = max(\(Q^A\), \(Q^a\)) is not the best combination of \(p_{f1}^A\), \(p_{f2}^A\), \(p_{f1}^a\), \(p_{f2}^a\), \(p_m^A\) and \(p_m^a\), which may reduce the test power. For QZmax, the signs of \(T_f^A\) and \(T_m^A\) should be different with a high probability; then, \(T_f^A\) and \(T_m^A\) may cancel each other out in the calculation of \(T_{\lambda _1} = \sqrt {\lambda _1} T_f^A + \sqrt {1 - \lambda _1} T_m^A\) and \(T_{\lambda _2} = \sqrt {\lambda _2} T_f^A + \sqrt {1 - \lambda _2} T_m^A\). In this case, a smaller value of the final test statistic \({{{\mathrm{QZ}}}}_{{{{\mathrm{max}}}}} = {{{\mathrm{max}}}}\left( {\left| {T_{\lambda _1}} \right|,\left| {T_{\lambda _2}} \right|} \right)\) will be obtained, and hence, the power of QZmax will be reduced. Similarly, if the SNP being studied does not satisfy the generalized genetic model in females (e.g., μf1 > μf2 > μf0), then βf1 > 0 and βf2 < 0 (i.e., \(p_{f1}^A \,<\, p_{f1}^a\) and \(p_{f2}^A \,>\, p_{f2}^a\)) when A is assumed to be the risk allele. As such, \(Q_f^A = - 2{{{\mathrm{ln}}}}\left( {p_{f1}^Ap_{f2}^A} \right)\) and \(Q_f^a = - 2{{{\mathrm{ln}}}}\left( {p_{f1}^ap_{f2}^a} \right)\) are smaller than \(- 2{{{\mathrm{ln}}}}\left( {p_{f1}^Ap_{f2}^a} \right)\). Hence, the final test statistic QXcat = max(\(Q^A\), \(Q^a\)) may be very small, and the test power may be low. 
For QZmax, \(T_{f1}^A\) (larger than zero) and \(T_{f2}^A\) (less than zero) can be canceled out in calculating \(T_f^A = \frac{1}{{\sqrt 2 }}\left( {T_{f1}^A + T_{f2}^A} \right)\). Therefore, the final test statistic \({{{\mathrm{QZ}}}}_{{{{\mathrm{max}}}}} = {{{\mathrm{max}}}}\left( {\left| {T_{\lambda _1}} \right|,\left| {T_{\lambda _2}} \right|} \right)\) is reduced, and the corresponding test power is lower. Note that it is generally reasonable to assume that the risk alleles for females and males are the same and that the SNP being studied satisfies the generalized genetic model for females (Chen et al. 2017). Furthermore, the ideas of constructing the test statistics QXcat and QZmax are similar to those in Chen et al. (2017) and Wang et al. (2019a), respectively, and both the simulation results of Chen et al. (2017) and Wang et al. (2019a) showed that the powers of their proposed methods are generally higher than those of other existing association tests. Additionally, under the simulated scenarios, both the proposed mean-based tests QXcat and QZmax have better performance in power than the existing mean-based tests under XCI-E and in some cases of XCI. We also apply all the considered methods to the MCTFR data, and some further discussions on the violation of the assumptions can be found in Appendix C.
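The sign argument for QZmax can be illustrated numerically. The sketch below computes \(T_f^A\), \(T_{\lambda}\) and the maximum of the two absolute weighted statistics exactly as written in the text, but the weights 0.5 and 0.8 and the component statistics are placeholders chosen for this example; the actual weights \(\lambda_1\) and \(\lambda_2\) are those defined in the Materials and methods section.

```python
import math


def combine_female(t_f1, t_f2):
    """T_f = (T_f1 + T_f2) / sqrt(2)."""
    return (t_f1 + t_f2) / math.sqrt(2)


def weighted_stat(t_f, t_m, lam):
    """T_lambda = sqrt(lambda) * T_f + sqrt(1 - lambda) * T_m."""
    return math.sqrt(lam) * t_f + math.sqrt(1 - lam) * t_m


def qz_max(t_f1, t_f2, t_m, lam1, lam2):
    """Maximum of |T_lambda1| and |T_lambda2|."""
    t_f = combine_female(t_f1, t_f2)
    return max(abs(weighted_stat(t_f, t_m, lam1)), abs(weighted_stat(t_f, t_m, lam2)))


# Placeholder weights (assumptions for illustration, not the paper's values).
LAM1, LAM2 = 0.5, 0.8

concordant = qz_max(2.0, 2.0, 2.0, LAM1, LAM2)          # both assumptions hold
opposite_sexes = qz_max(2.0, 2.0, -2.0, LAM1, LAM2)     # risk allele differs by sex
non_monotone = qz_max(2.0, -2.0, 2.0, LAM1, LAM2)       # female effects of opposite sign
print(concordant, opposite_sexes, non_monotone)
```

With identical magnitudes for the component statistics, the statistic is largest when all three components share a sign and shrinks when either assumption is violated, which is the cancellation described above.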
In this article, we consider the departure from HWE by fixing the inbreeding coefficient ρ at 0.05. To further assess the validity of our proposed methods without the HWE assumption, we simulate the following population stratification model by referring to the simulation settings of Haldar and Ghosh (2012) and Xia et al. (2013). Suppose that the whole population consists of two subpopulations, each of which is in HWE. The sample of size N = 6000 is composed of N1 and N2 individuals from the first and second subpopulations, respectively. The ratio N1:N2 is set to 2:3 or 1:1, and the sex ratio in each subpopulation is fixed at 2:1, 1:1 or 1:2. Let qf1 and qm1 (qf2 and qm2) denote the frequencies of allele A for females and males in the first (second) subpopulation, respectively, and (qf1, qm1, qf2, qm2) is set to (0.1, 0.1, 0.9, 0.9) or (0.2, 0.2, 0.5, 0.5). The simulated type I error rates of the four proposed tests (QMVXcat, QMVZmax, QXcat and QZmax) under scenario (1) (i.e., no SNP effect) when ρ = 0 and the trait value follows a normal distribution are shown in Supplementary Table S12, while the empirical sizes of the two mean-based tests QXcat and QZmax under scenario (2) (i.e., SNP effect on variances only) when ρ = 0 and the trait value follows a normal distribution are presented in Supplementary Table S13. It can be seen from these two tables that our proposed methods control the sizes well, which supports their validity under population stratification.
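To see why this stratification model departs from HWE, note that pooling subpopulations with different allele frequencies produces a heterozygote deficit (the Wahlund effect). The sketch below computes the pooled female genotype frequencies and the implied inbreeding coefficient for the allele frequencies used above. It assumes that the proportion of females drawn from the first subpopulation equals N1/N, which holds when the sex ratio is the same in both subpopulations; the function names are hypothetical.

```python
def pooled_female_genotypes(q1, q2, w1):
    """Genotype frequencies (aa, Aa, AA) in females after pooling two
    subpopulations, each in HWE, with allele A frequencies q1 and q2
    and mixing proportions w1 and 1 - w1."""
    w2 = 1 - w1
    freq_aa = w1 * (1 - q1) ** 2 + w2 * (1 - q2) ** 2
    freq_het = w1 * 2 * q1 * (1 - q1) + w2 * 2 * q2 * (1 - q2)
    freq_AA = w1 * q1 ** 2 + w2 * q2 ** 2
    return freq_aa, freq_het, freq_AA


def implied_inbreeding(q1, q2, w1):
    """Inbreeding coefficient implied by the heterozygote deficit:
    1 - (observed heterozygosity) / (2 q (1 - q)) with pooled frequency q."""
    _, freq_het, _ = pooled_female_genotypes(q1, q2, w1)
    q = w1 * q1 + (1 - w1) * q2
    return 1 - freq_het / (2 * q * (1 - q))


# Settings from the text: (qf1, qf2) = (0.1, 0.9) with N1:N2 = 1:1,
# and (qf1, qf2) = (0.2, 0.5) with N1:N2 = 2:3.
print(implied_inbreeding(0.1, 0.9, 0.5))
print(implied_inbreeding(0.2, 0.5, 0.4))
```

Under these assumptions, the first setting implies a departure from HWE far stronger than ρ = 0.05, so it serves as a demanding check of size control.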
Our proposed methods have several advantages. First, the proposed mean-variance-based tests have higher power than the existing methods in the simulated scenarios where both the means and the variances of the trait value differ across genotypes. Second, our methods incorporate XCI information in two different ways, and this information must be considered when conducting X chromosome association tests. Third, we use the information of both sexes, which improves the test power. Nonetheless, our methods have some limitations. When the two assumptions (i.e., the risk alleles in females and males are the same, and the genetic effect of heterozygous females lies between those of the two homozygous females) are not satisfied in practice, the power of the proposed association tests may decrease. In addition, these methods cannot test for the association between SNP sets and a trait. Furthermore, they cannot incorporate the information of family structure, which results in a loss of power and needs to be addressed in the future. In summary, our proposed methods not only effectively account for XCI but are also powerful under XCI-E and in some cases of XCI.
Software
The R package QMVtest, implemented in R (version 4.1.2), is publicly available at https://github.com/yuxinyuanqt/QMVtest.
Data availability
The MCTFR data used for the analyses described in this article can be found on the database of Genotypes and Phenotypes with accession number phs000620.v1.p1, and dbGaP request numbers 86747-6 and 95621-5 (https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000620.v1.p1).
References
Al-Ayadhi LY, Qasem HY, Alghamdi HAM, Elamin NE (2020) Elevated plasma X-linked neuroligin 4 expression is associated with autism spectrum disorder. Med Princ Pract 29:480–485
Amos-Landgraf JM, Cottle A, Plenge RM, Friez M, Schwartz CE, Longshore J et al. (2006) X chromosome-inactivation patterns of 1,005 phenotypically unaffected females. Am J Hum Genet 79:493–499
Auer PL, Teumer A, Schick U, O’Shaughnessy A, Lo KS, Chami N et al. (2014) Rare and low-frequency coding variants in CXCR2 and other genes are associated with hematological traits. Nat Genet 46:629–634
Brown AA, Buil A, Viñuela A, Lappalainen T, Zheng HF, Richards JB et al. (2014) Genetic interactions affecting human gene expression identified by variance association mapping. Elife 3:e01381
Brown CJ, Carrel L, Willard HF (1997) Expression of genes from the human active and inactive X chromosomes. Am J Hum Genet 60:1333–1343
Brown MB, Forsythe AB (1974) Robust tests for the equality of variances. J Am Stat Assoc 69:364–367
Cao Y, Wei P, Bailey M, Kauwe JSK, Maxwell TJ (2014) A versatile omnibus test for detecting mean and variance heterogeneity. Genet Epidemiol 38:51–59
Carrel L, Park C, Tyekucheva S, Dunn J, Chiaromonte F, Makova KD (2006) Genomic environment predicts expression patterns on the human inactive X chromosome. PLoS Genet 2:e151
Carrel L, Willard HF (2005) X-inactivation profile reveals extensive variability in X-linked gene expression in females. Nature 434:400–404
Chang D, Gao F, Slavney A, Ma L, Waldman YY, Sams AJ et al. (2014) Accounting for eXentricities: analysis of the X chromosome in GWAS reveals X-linked genes implicated in autoimmune diseases. PLoS One 9:e113684
Chen B, Craiu RV, Strug LJ, Sun L (2021) The X factor: A robust and powerful approach to X-chromosome-inclusive whole-genome association studies. Genet Epidemiol 45:694–709
Chen B, Craiu RV, Sun L (2020) Bayesian model averaging for the X-chromosome inactivation dilemma in genetic association study. Biostatistics 21:319–335
Chen ZX (2022a) Optimal tests for combining p-values. Appl Sci 12:322
Chen ZX (2022b) Robust tests for combining p-values under arbitrary dependency structures. Sci Rep 12:3158
Chen ZX, Ng HKT (2012) A robust method for testing association in genome-wide association studies. Hum Hered 73:26–34
Chen ZX, Ng HKT, Li J, Liu Q, Huang H (2017) Detecting associated single-nucleotide polymorphisms on the X chromosome in case control genome-wide association studies. Stat Methods Med Res 26:567–582
Chung RH, Morris RW, Zhang L, Li YJ, Martin ER (2007) X-APL: an improved family-based test of association in the presence of linkage for the X chromosome. Am J Hum Genet 80:59–68
Clayton D (2008) Testing for association on the X chromosome. Biostatistics 9:593–600
Deng WQ, Mao S, Kalnapenkis A, Esko T, Mägi R, Paré G et al. (2019) Analytical strategies to include the X-chromosome in variance heterogeneity analyses: evidence for trait-specific polygenic variance structure. Genet Epidemiol 43:815–830
Ding J, Lin S, Liu Y (2006) Monte Carlo pedigree disequilibrium test for markers on the X chromosome. Am J Hum Genet 79:567–573
Fisher B, Costich ER, Ganz M, Stanford JW (1967) Questions & answers. J Am Dent Assoc 75:799
Gaukrodger N, Mayosi BM, Imrie H, Avery P, Baker M, Connell JMC et al. (2005) A rare variant of the leptin gene has large effects on blood pressure and carotid intima-medial thickness: a study of 1428 individuals in 248 families. J Med Genet 42:474–478
Haldar T, Ghosh S (2012) Effect of population stratification on false positive rates of population-based association analyses of quantitative traits. Ann Hum Genet 76:237–245
Hickey PF, Bahlo M (2011) X chromosome association testing in genome wide association studies. Genet Epidemiol 35:664–670
Horvath S, Laird NM, Knapp M (2000) The transmission/disequilibrium test and parental-genotype reconstruction for X-chromosomal markers. Am J Hum Genet 66:1161–1167
Jin H, Park T, Won S (2017) Efficient statistical method for association analysis of X-linked variants. Hum Hered 82:50–63
Kang HM, Sul JH, Service SK, Zaitlen NA, Kong SY, Freimer NB et al. (2010) Variance component model to account for sample structure in genome-wide association studies. Nat Genet 42:348–354
Kearney HM, Thorland EC, Brown KK, Quintero-Rivera F, South ST (2011) American College of Medical Genetics standards and guidelines for interpretation and reporting of postnatal constitutional copy number variants. Genet Med 13:680–685
Konzman D, Abramowitz LK, Steenackers A, Mukherjee MM, Na HJ, Hanover JA (2020) O-GlcNAc: regulator of signaling and epigenetics linked to X-linked intellectual disability. Front Genet 11:605263
Kushima I, Aleksic B, Nakatochi M, Shimamura T, Okada T, Uno Y et al. (2018) Comparative analyses of copy-number variation in autism spectrum disorder and schizophrenia reveal etiological overlap and biological insights. Cell Rep 24:2838–2856
Labonne JDJ, Graves TD, Shen YP, Jones JR, Kong IK, Layman LC et al. (2016) A microdeletion at Xq22.2 implicates a glycine receptor GLRA4 involved in intellectual disability, behavioral problems and craniofacial anomalies. BMC Neurol 16:132
Levene H (1961) Robust tests for equality of variances. Contributions to Probability and Statistics: 279–292.
Li BH, Yu WY, Zhou JY (2021) A statistical measure for the skewness of X chromosome inactivation for quantitative traits and its application to the MCTFR data. BMC Genom Data 22:24
Liu Y, Xie J (2020) Cauchy combination test: a powerful test with analytic p-value calculation under arbitrary dependency structures. J Am Stat Assoc 115:393–402
Loley C, Ziegler A, König IR (2011) Association tests for X-chromosomal markers–a comparison of different test statistics. Hum Hered 71:23–36
Lyon MF (1961) Gene action in the X-chromosome of the mouse (Mus musculus L.). Nature 190:372–373
Ma C, Boehnke M, Lee S, GoT2D Investigators (2015a) Evaluating the calibration and power of three gene-based association tests of rare variants for the X chromosome. Genet Epidemiol 39:499–508
Ma L, Hoffman G, Keinan A (2015b) X-inactivation informs variance-based testing for X-linked association of a quantitative trait. BMC Genomics 16:241
Marees AT, Kluiver HD, Stringer S, Vorspan F, Curis E, Marie-Claire C et al. (2018) A tutorial on conducting genome-wide association studies: quality control and statistical analysis. Int J Methods Psychiatr Res 27:e1608
McCaw ZR, Lane JM, Saxena R, Redline S, Lin X (2019) Operating characteristics of the rank-based inverse normal transformation for quantitative trait analysis in genome-wide association studies. Biometrics 76:1262–1272
Minks J, Robinson WP, Brown CJ (2008) A skewed view of X chromosome inactivation. J Clin Invest 118:20–23
Morley M, Molony CM, Weber TM, Devlin JL, Ewens KG, Spielman RS et al. (2004) Genetic analysis of genome-wide variation in human gene expression. Nature 430:743–747
Mosteller F, Fisher RA (1948) Questions and answers. Am Stat 2:30–31
Özbek U, Lin HM, Lin Y, Weeks DE, Chen W, Shaffer JR et al. (2018) Statistics for X-chromosome associations. Genet Epidemiol 42:539–550
Paganini L, Hadi LA, Chetta M, Rovina D, Fontana L, Colapietro P et al. (2019) A HS6ST2 gene variant associated with X-linked intellectual disability and severe myopia in two male twins. Clin Genet 95:368–374
R Core Team (2020) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/
Richards S, Aziz N, Bale S, Bick D, Das S, Gastier-Foster J et al. (2015) Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med 17:405–423
Schifano ED, Li L, Christiani DC, Lin X (2013) Genome-wide association analysis for multiple continuous secondary phenotypes. Am J Hum Genet 92:744–759
Soave D, Corvol H, Panjwani N, Gong J, Li W, Boëlle PY et al. (2015) A joint location-scale test improves power to detect associated SNPs, gene sets, and pathways. Am J Hum Genet 97:125–138
Song YL, Biernacka JM, Winham SJ (2021) Testing and estimation of X-chromosome SNP effects: Impact of model assumptions. Genet Epidemiol 45:577–592
Stouffer SA, Suchman EA, DeVinney LC, Star SA, Williams Jr RM (1949) The American soldier: adjustment during army life (Studies in social psychology in World War II). Princeton Univ. Press
Struchalin MV, Dehghan A, Witteman JCM, Duijn CV, Aulchenko YS (2010) Variance heterogeneity analysis for detection of potentially interacting genetic loci: method and its limitations. BMC Genet 11:92
Wang J, Yu R, Shete S (2014) X-chromosome genetic association test accounting for X-inactivation, skewed X-inactivation, and escape from X-inactivation. Genet Epidemiol 38:483–493
Wang P, Xu SQ, Wang BQ, Fung WK, Zhou JY (2019a) A robust and powerful test for case-control genetic association study on X chromosome. Stat Methods Med Res 28:3260–3272
Wang P, Zhang Y, Wang BQ, Li JL, Wang YX, Pan D et al. (2019b) A statistical measure for the skewness of X chromosome inactivation based on case-control design. BMC Bioinformatics 20:11
Wise AL, Gyi L, Manolio TA (2013) eXclusion: toward integrating the X chromosome in genome-wide association analyses. Am J Hum Genet 92:643–647
Wong CCY, Caspi A, Williams B, Houts R, Craig IW, Mill J (2011) A longitudinal twin study of skewed X chromosome-inactivation. PLoS One 6:e17873
Wu H, Luo J, Yu H, Rattner A, Mo A, Wang Y et al. (2014) Cellular resolution maps of X-chromosome inactivation: implications for neural development, function, and disease. Neuron 81:103–119
Xia F, Zhou JY, Fung WK (2013) Powerful tests for association on quantitative trait loci incorporating imprinting effects. J Hum Genet 58:384–390
Xu W, Hao M (2018) A unified partial likelihood approach for X-chromosome association on time-to-event outcomes. Genet Epidemiol 42:80–94
Yang J, Loos RJF, Powell JE, Medland SE, Speliotes EK, Chasman DI et al. (2012) FTO genotype is associated with phenotypic variability of body mass index. Nature 490:267–272
Zhang L, Martin ER, Chung RH, Li YJ, Morris RW (2008) X-LRT: a likelihood approach to estimate genetic risks and test association with X-linked markers using a case-parents design. Genet Epidemiol 32:370–380
Zheng G, Joo J, Zhang C, Geller NL (2007) Testing association for markers on the X chromosome. Genet Epidemiol 31:834–843
Acknowledgements
The authors are grateful to the editor, the associate editor and three anonymous reviewers for insightful comments which greatly improved the presentation of the materials.
Funding
This work was supported by the National Natural Science Foundation of China (grant numbers 82173619 and 81773544), the Science and Technology Planning Project of Guangdong Province (grant number 2020B1212030008), the Hong Kong RGC GRF (grant number 17302919) and the National and Guangzhou University Students’ Innovation and Enterprise Training Project of China (grant number 201912121019). The Minnesota Center for Twin and Family Research was supported by the National Institute on Drug Abuse (grant number U01 DA024417). The sample ascertainment and data collection in MCTFR data were supported by the National Institute on Drug Abuse (grant numbers R37 DA05147 and R01 DA13240), the National Institute on Alcohol Abuse and Alcoholism (grant numbers R01 AA09367 and R01 AA11886), and the National Institute of Mental Health (grant number R01 MH66140). The funding bodies played no role in the design of the study; in the collection, analysis, and interpretation of data; or in writing the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Associate editor: Sara Knott.
About this article
Cite this article
Yang, ZY., Liu, W., Yuan, YX. et al. Robust association tests for quantitative traits on the X chromosome. Heredity 129, 244–256 (2022). https://doi.org/10.1038/s41437-022-00560-y
DOI: https://doi.org/10.1038/s41437-022-00560-y