Introduction

The genome-wide association study is a fundamental tool for assessing the genetic contribution to complex human traits (Kang et al. 2010). Thousands of single nucleotide polymorphisms (SNPs) have been found to be associated with hundreds of complex traits by association tests (Chen et al. 2017; Ma et al. 2015a; Zheng et al. 2007). However, only a few association tests have focused on the X chromosome (Chang et al. 2014; Wang et al. 2019a), which contains 1669 known genes (almost 5%) and affects ~7% of complex traits (Wise et al. 2013; Xu and Hao 2018). Unlike autosomes, the X chromosome has several distinct biological features. For instance, the number of copies of the X chromosome differs between the sexes. In addition, gene expression in females is affected by X chromosome inactivation (XCI), in which one of the two X chromosomes in females is silenced to compensate for the X chromosome dosage difference between the sexes, i.e., complete dosage compensation is achieved (Hickey and Bahlo 2011; Wang et al. 2014). However, Carrel and Willard (2005) reported that weak expression from the silenced X chromosome occurs in ~10% of genes, which is referred to as incomplete dosage compensation. XCI was discovered over fifty years ago (Lyon 1961). XCI is usually regarded as a random process (random XCI), in which ~50% of cells have the risk allele active, while the other ~50% of cells have the normal allele active (Jin et al. 2017; Wang et al. 2014). However, recent studies have reported that some X-linked genes in females may also undergo skewed XCI or escape from XCI (XCI-E) (Amos-Landgraf et al. 2006; Carrel and Willard 2005). Skewed XCI means that one allele is inactivated in more than 50% of cells, such as 75% or even 90% of cells in some extreme cases (Minks et al. 2008; Wong et al. 2011). XCI-E implies that both alleles in female cells remain active, which is also referred to as no dosage compensation (Brown et al. 1997; Carrel et al. 2006). XCI is a complex biological mechanism that is not yet fully understood (Wu et al. 2014). Therefore, robust and powerful association tests on the X chromosome are needed to account for these characteristics.

Some methods for testing association have been developed to accommodate the X chromosome (Chung et al. 2007; Ding et al. 2006; Horvath et al. 2000; Zhang et al. 2008). Zheng et al. (2007) proposed several allele-based and genotype-based tests on the X chromosome, and compared their performance under Hardy-Weinberg equilibrium (HWE) and departure from HWE. However, these methods may lose power when XCI exists (Chen et al. 2017; Loley et al. 2011). To address this issue, Clayton (2008) suggested a 1 degree of freedom chi-square test and a 2 degrees of freedom chi-square test by treating males as homozygous females, without the assumption of HWE. In this case, three female genotypes were coded as 0, 1 and 2, and two male genotypes were coded as 0 and 2 (Hickey and Bahlo 2011). Nevertheless, Clayton’s methods require the allele frequencies of the same allele to be equal between sexes, and only random XCI is considered (Clayton 2008). Using this coding strategy may lead to power loss when the XCI pattern is misspecified. As such, Wang et al. (2014) proposed a unified coding strategy, in which female genotypes were coded as 0, γ and 2, where γ ranges from 0 to 2. Here, γ < 1 represents XCI towards the risk allele, γ > 1 represents XCI towards the normal allele, and γ = 1 denotes random XCI. In the method proposed by Wang et al. (2014), the test power under skewed XCI is improved by maximizing the likelihood ratio over different biological models (random XCI, skewed XCI and XCI-E). However, the strategy is time-consuming because a permutation procedure is required to obtain the p value (Jin et al. 2017). Chen et al. (2017) proposed a test statistic that does not need to specify the underlying XCI pattern and HWE. It constructs the models for females and males separately and then combines them using Fisher’s method (Fisher et al. 1967). The method proposed by Chen et al. (2017) effectively utilizes the information of both females and males. 
To further improve the test power, Wang et al. (2019a) provided an allelic test that considers different deviations from HWE. Instead of combining the test statistics of females and males by Fisher’s method, Wang et al. (2019a) used the effective sample sizes of females and males to combine the information of both sexes. Different dosage compensation patterns can be incorporated in this method by selecting different weights.

All of the methods mentioned above were developed primarily for case-control studies. Some studies have shown that genetic loci on the X chromosome also affect quantitative traits (Al-Ayadhi et al. 2020; Auer et al. 2014; Gaukrodger et al. 2005; Konzman et al. 2020). Factors such as mutation, genetic interactions and parent-of-origin effects may influence the expression level of genes, thus changing the phenotypic means or variances across different genotypes (Brown et al. 2014; Cao et al. 2014; Ma et al. 2015b; Morley et al. 2004; Soave et al. 2015; Yang et al. 2012). As such, Ma et al. (2015b) assumed that XCI would cause extra phenotypic variance for heterozygous females and proposed three X-linked association tests, denoted as TVar, TW and TS. TVar, which can be regarded as a modification of the Brown-Forsythe test, directly tests for the inflated variance of the trait value for heterozygous females (Brown and Forsythe 1974). TW uses a weighted linear regression to examine the means of the trait value and allows for variance heterogeneity in females. Finally, TS first transforms the p values of TVar and TW to Z scores and then combines them using Stouffer’s method (Stouffer et al. 1949). Because the methods proposed by Ma et al. (2015b) ignore the information from males, each of them may lose power. To effectively account for XCI, Chen et al. (2020) used a Bayesian model to average over different XCI patterns. However, this Bayesian model is time-consuming because multiple Markov chains of parameters must be generated. Deng et al. (2019) proposed a series of methods that simultaneously incorporate the information of females and males to investigate the variances among genotypes. One of these methods computes the p values of Levene’s test for females and males separately (Levene 1961) and then combines them using Fisher’s method (denoted as Fisher in this article). Deng et al. (2019) assumed that the association between the SNP and the quantitative trait being studied could be biased by sex-specific means or variances because of the different numbers of copies of the X chromosome between females and males. In this regard, two two-stage methods, wM3V3.2 and wM3VNA3.3, were proposed. For brevity, we refer to these methods as wM3V and wM3VNA, respectively, in this article. In the first stage, these methods regress the quantitative trait on the genotype, sex and their interaction. In the second stage, the wM3V method tests for genotypic variances of the residuals obtained from the first stage via the generalized Levene’s test under the additive genetic model, while the wM3VNA method does the same under the generalized genetic model (Chen and Ng 2012). Although the methods proposed by Deng et al. (2019) incorporate males’ information and efficiently test for variance heterogeneity, the mean differences are only adjusted for when conducting the generalized Levene’s test. These methods are not designed to test for the mean differences, which may cause a loss of power. In addition, Özbek et al. (2018) proposed an X chromosome association test statistic that considers the sex × SNP interaction term and is applicable to both quantitative and qualitative traits. This method can be directly implemented in PLINK, and in this article, we denote it for quantitative traits as Tplink. Song et al. (2021) further conducted extensive simulations to compare the performance of the model with the interaction term against that of the model without it, and found that fitting the model with the interaction term can make the estimates of the effect sizes more robust to different XCI patterns. However, Tplink assumes the homogeneity of variances across different genotypes and only takes into account random XCI and XCI-E patterns. Chen et al. (2021) added a variable indicative of heterozygous females in Tplink and suggested an X chromosomal association approach that considers all three XCI patterns and is suitable for both quantitative and qualitative traits. We denote it for quantitative traits as Tchen in this article. However, Tchen only compares the difference in the means of the trait value across different genotypes under the assumption of variance homogeneity.

Therefore, in this article, we propose four novel statistical methods, denoted as QXcat, QZmax, QMVXcat and QMVZmax, to test for association between an SNP on the X chromosome and a quantitative trait. QXcat and QZmax are designed for testing the mean differences of the trait value. In QXcat, we obtain the p values for females and males by testing the mean differences of the trait value via weighted linear regression models. Then, we combine these two p values using Fisher’s method. In QZmax, we combine the test statistics for females and males using weights based on the sample sizes, where different weights represent different dosage compensation patterns according to Wang et al. (2019a). In addition, we develop QMVXcat (QMVZmax) by combining the p value of QXcat (QZmax) with that of wM3VNA, to test for the difference in both means and variances. We perform extensive simulation studies to investigate the type I error rates and the test powers of the proposed methods. We also apply our proposed methods to data from the Minnesota Center for Twin and Family Research (MCTFR) to illustrate their practical use.

Materials and methods

Notations

Consider an SNP on the X chromosome with alleles a and A. Let qf and qm be the frequencies of A in females and males, respectively, and let ρ be the inbreeding coefficient in the female population. Then, females have three genotypes, aa, Aa and AA, and males, who are hemizygous, only have two different genotypes, a and A. The frequencies of genotypes aa, Aa and AA for females are denoted as qaa, qAa and qAA, respectively. Thus, \(q_{aa} = \left( {1 - q_f} \right)^2 + \rho \left( {1 - q_f} \right)q_f\), \(q_{Aa} = 2\left( {1 - \rho } \right)\left( {1 - q_f} \right)q_f\) and \(q_{AA} = q_f^2 + \rho \left( {1 - q_f} \right)q_f\). Suppose that we collect a sample of N independent individuals consisting of nf females and nm males. Let nf0, nf1 and nf2 be the numbers of females with genotypes aa, Aa and AA (nf0 + nf1 + nf2 = nf), respectively. There are nm0 males with genotype a and nm1 males with genotype A (nm0 + nm1 = nm). Let \({{{\mathbf{Y}}}}_f = ( {y_{f1},y_{f2}, \ldots ,y_{fn_f}})^T\) and \({{{\mathbf{Y}}}}_m = ( {y_{m1},y_{m2}, \ldots ,y_{mn_m}})^T\) denote the values of the quantitative trait for females and males, respectively. Here, we assume that Yf and Ym are normally distributed or approximately follow normal distributions after the rank-based inverse normal transformation (McCaw et al. 2019). For females, let Gfi denote the number of alleles A in female i (i = 1, 2,..., nf), i.e., Gfi takes the value of 0, 1 and 2 for aa, Aa and AA, respectively; for males, let Gmi denote the number of alleles A in male i (i = 1, 2,..., nm), i.e., Gmi takes the value of 0 and 1 for a and A, respectively. In females, the means of the quantitative trait for aa, Aa and AA are denoted as μf0, μf1 and μf2, respectively, while the variances of the quantitative trait for aa, Aa and AA are represented by \(\sigma _{f0}^2\), \(\sigma _{f1}^2\) and \(\sigma _{f2}^2\), respectively. Let Vf denote the variance-covariance matrix of Yf, a diagonal matrix with elements \(\sigma _{f0}^2\), \(\sigma _{f1}^2\) and \(\sigma _{f2}^2\) for aa, Aa and AA, respectively. 
In males, the means of the quantitative trait for a and A are denoted as μm0 and μm1, respectively, while the variances of the quantitative trait for a and A are represented by \(\sigma _{m0}^2\) and \(\sigma _{m1}^2\), respectively. Let Vm be the variance-covariance matrix of Ym, a diagonal matrix with elements \(\sigma _{m0}^2\) and \(\sigma _{m1}^2\) for a and A, respectively. Here, we consider three types of null hypotheses of no association between the SNP and the quantitative trait. \({{{\mathrm{H}}}}_0^{{{{\mathrm{MV}}}}}\): both the means and the variances of the quantitative trait across genotypes are equal (i.e., μf0 = μf1 = μf2, μm0 = μm1, \(\sigma _{f0}^2 = \sigma _{f1}^2 = \sigma _{f2}^2\) and \(\sigma _{m0}^2 = \sigma _{m1}^2\)), \({{{\mathrm{H}}}}_0^{{{\mathrm{M}}}}\): only the means of the quantitative trait across genotypes are equal (i.e., μf0 = μf1 = μf2, μm0 = μm1 and no restrictions on the variances) and \({{{\mathrm{H}}}}_0^{{{\mathrm{V}}}}\): only the variances of the quantitative trait across genotypes are equal (i.e., \(\sigma _{f0}^2 = \sigma _{f1}^2 = \sigma _{f2}^2,\sigma _{m0}^2 = \sigma _{m1}^2\) and no restrictions on the means).
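As a small numerical illustration of the female genotype frequencies defined above, the following sketch evaluates qaa, qAa and qAA for given qf and ρ. The function name `female_genotype_freqs` is hypothetical and the input values are chosen only for illustration.

```python
def female_genotype_freqs(q_f, rho):
    """Return (q_aa, q_Aa, q_AA) for allele-A frequency q_f and inbreeding coefficient rho."""
    het = (1.0 - q_f) * q_f
    q_aa = (1.0 - q_f) ** 2 + rho * het
    q_Aa = 2.0 * (1.0 - rho) * het
    q_AA = q_f ** 2 + rho * het
    return q_aa, q_Aa, q_AA


# rho = 0 corresponds to HWE; rho != 0 shifts mass from heterozygotes to homozygotes.
hwe = female_genotype_freqs(0.2, 0.0)
inbred = female_genotype_freqs(0.2, 0.05)
print(hwe, inbred)
```

The three frequencies always sum to one, and a positive ρ lowers the heterozygote frequency relative to HWE.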

Sex-stratified X chromosome mean-based association test for quantitative traits considering various XCI patterns

Note that SNPs on the X chromosome of females may undergo different XCI patterns. To make our method robust to various XCI patterns, we first propose a general X chromosome association test for quantitative traits named QXcat, which aims to identify the mean differences of the trait value across genotypes. We construct the models for females and males separately because the numbers of X chromosomes are different between sexes and then combine their p values in an efficient way. Specifically, we first assume that A is the risk allele, and the risk allele in females is the same as that in males. In addition, similar to the work in Chen et al. (2017), the generalized genetic model is assumed for the SNP being studied for females, i.e., the genetic effect of carrying two risk alleles is not less than that of carrying one risk allele, and the genetic effect of carrying one risk allele is not less than that of carrying no risk allele (μf2 ≥ μf1 ≥ μf0). Then, we consider two variables \(X_{fi}^{\left( 1 \right)} = I_{\left\{ {G_{fi} \ge 1} \right\}}\) and \(X_{fi}^{\left( 2 \right)} = I_{\left\{ {G_{fi} = 2} \right\}}\) for female i, where I{·} is the indicator function, \(X_{fi}^{\left( 1 \right)}\) indicates that female i carries at least one risk allele and \(X_{fi}^{\left( 2 \right)}\) means that the genotype of female i is AA. Based on the study by Wang et al. (2019b), \(X_{fi}^{\left( 1 \right)}\) and \(X_{fi}^{\left( 2 \right)}\) can be used to test for association between the SNP and the trait under different XCI patterns. Hence, the association between the quantitative trait and the SNP in females can be modeled as

$$y_{fi} = \beta _{f0} + \beta _{f1}X_{fi}^{\left( 1 \right)} + \beta _{f2}X_{fi}^{\left( 2 \right)} + {{{\mathbf{b}}}}_f^T{{{\mathbf{Z}}}}_{fi} + \varepsilon _{fi},\,i = 1,2, \ldots ,n_f \tag{1}$$

where βf0 is the intercept, and βf1 and βf2 are the regression coefficients of \(X_{fi}^{\left( 1 \right)}\) and \(X_{fi}^{\left( 2 \right)}\), respectively. Zfi denotes a vector of covariates for female i, bf is the vector of the regression coefficients of Zfi, and εfi is a random error that follows \(N(0,\sigma _{f0}^2)\), \(N(0,\sigma _{f1}^2)\) and \(N(0,\sigma _{f2}^2)\) for genotypes aa, Aa and AA, respectively. According to Wang et al. (2019b), under random XCI or XCI-E, βf1 = βf2 ≠ 0 means that the SNP is associated with the quantitative trait. For the skewed XCI, βf1 = 0 and βf2 ≠ 0 when the risk allele is inactivated in 100% of the heterozygous female cells, while βf1 ≠ 0 and βf2 = 0 when all the cells in females with genotype Aa are normal allele inactive. In addition, βf1 ≠ 0, βf2 ≠ 0 and βf1 ≠ βf2 mean that A is associated with the quantitative trait for other skewed XCI patterns. Hence, Model (1) effectively incorporates all the XCI patterns when testing for association. Since some factors (such as mutation and XCI) may lead to unequal trait value variances across different genotypes, we use the weighted least square method to estimate \({\boldsymbol{\beta}}_f = \left( {\beta _{f0},\beta _{f1},\beta _{f2},{{{\mathbf{b}}}}_f^T} \right)^T\). Let Wf be a weight matrix for females. Here, we set \({{{\mathbf{W}}}}_f = {{{\mathbf{V}}}}_f^{ - 1}\) with elements \(\frac{1}{{\sigma _{f0}^2}}\), \(\frac{1}{{\sigma _{f1}^2}}\) and \(\frac{1}{{\sigma _{f2}^2}}\) for genotypes aa, Aa and AA, respectively. We first fit Model (1) by the ordinary least square method and obtain the corresponding residuals. Then, \(\frac{1}{{\sigma _{f0}^2}}\), \(\frac{1}{{\sigma _{f1}^2}}\) and \(\frac{1}{{\sigma _{f2}^2}}\) are estimated by the inverse of the residual variances for genotypes aa, Aa and AA, denoted as \(\frac{1}{{\hat \sigma _{f0}^2}}\), \(\frac{1}{{\hat \sigma _{f1}^2}}\) and \(\frac{1}{{\hat \sigma _{f2}^2}}\), respectively. 
As a result, \({{{\hat{\mathbf W}}}}_f = {{{\hat{\mathbf V}}}}_f^{ - 1}\). To estimate βf, we minimize the following weighted residual sum of squares \({{{\mathrm{arg}}}}\mathop {{\min }}\nolimits_{{\boldsymbol{\beta}}_f} \| {\widehat {{{\mathbf{W}}}}_f^{1\!/\!2}( {{{{\mathbf{Y}}}}_f - {{{\mathbf{X}}}}_f{\boldsymbol{\beta}}_f})}\|^2\) where \({{{\mathbf{X}}}}_f = ( {{{{\mathbf{X}}}}_f^{( 0)},{{{\mathbf{X}}}}_f^{( 1)},{{{\mathbf{X}}}}_f^{( 2)},{{{\mathbf{Z}}}}_f})\) is a design matrix, and \({{{\mathbf{X}}}}_f^{\left( 0 \right)} = \left( {1,1,...,1} \right)^T\), \({{{\mathbf{X}}}}_f^{( 1)} = ( {X_{f1}^{( 1)},X_{f2}^{( 1 )},...,X_{fn_f}^{( 1 )}})^T\), \({{{\mathbf{X}}}}_f^{( 2)} = ( {X_{f1}^{( 2)},X_{f2}^{( 2)},...,X_{fn_f}^{( 2)}})^T\), and \({{{\mathbf{Z}}}}_f = \left( {{{{\mathbf{Z}}}}_{f1},{{{\mathbf{Z}}}}_{f2},...,{{{\mathbf{Z}}}}_{fn_f}} \right)^T\). Specifically, Zfi denotes a vector of covariates for female i in Model (1). Let \(\widehat {\boldsymbol{\beta}} _f = ( {\hat \beta _{f0},\hat \beta _{f1},\hat \beta _{f2},{{{\hat{\mathbf b}}}}_f^T})^T\) be the estimate of βf, and it can be expressed as

$$\widehat {\boldsymbol{\beta}} _f = \left( {{{{\mathbf{X}}}}_f^T\widehat {{{\mathbf{W}}}}_f{{{\mathbf{X}}}}_f} \right)^{ - 1}{{{\mathbf{X}}}}_f^T\widehat {{{\mathbf{W}}}}_f{{{\mathbf{Y}}}}_f$$

The variance-covariance matrix of \(\widehat {\boldsymbol{\beta}} _f\) is estimated by

$$\widehat {{{{\mathrm{Var}}}}}( {\widehat {\boldsymbol{\beta}} _f}) = ( {{{{\mathbf{X}}}}_f^T\widehat {{{\mathbf{W}}}}_f{{{\mathbf{X}}}}_f})^{ - 1}{{{\mathbf{X}}}}_f^T\widehat {{{\mathbf{W}}}}_f\widehat {{{\mathbf{V}}}}_f\widehat {{{\mathbf{W}}}}_f^T{{{\mathbf{X}}}}_f( {{{{\mathbf{X}}}}_f^T\widehat {{{\mathbf{W}}}}_f^T{{{\mathbf{X}}}}_f})^{ - 1}$$

Since \({{{\hat{\mathbf W}}}}_f = {{{\hat{\mathbf V}}}}_f^{ - 1}\), this expression simplifies to \(\widehat {{{{\mathrm{Var}}}}}( {\widehat {\boldsymbol{\beta}} _f}) = \left( {{{{\mathbf{X}}}}_f^T{{{\hat{\mathbf W}}}}_f{{{\mathbf{X}}}}_f} \right)^{ - 1}\). Let \(\hat {{\Sigma}}\) denote the estimate of the variance-covariance matrix for \(\hat \beta _{f1}\) and \(\hat \beta _{f2}\), which is constructed from the four elements in Rows 2-3 and Columns 2-3 of \(\widehat {{{{\mathrm{Var}}}}}( {\widehat {\boldsymbol{\beta}} _f})\). We define the following test statistics:

$$\left( {\begin{array}{*{20}{c}} {T_{f1}^A} \\ {T_{f2}^A} \end{array}} \right) = \widehat {{\Sigma}} ^{ - 1/2}\left( {\begin{array}{*{20}{c}} {\hat \beta _{f1}} \\ {\hat \beta _{f2}} \end{array}} \right)$$

Under the null hypothesis of \({{{\mathrm{H}}}}_0^{{{{\mathrm{MV}}}}}\) or \({{{\mathrm{H}}}}_0^{{{\mathrm{M}}}}\), \(T_{f1}^A\) and \(T_{f2}^A\) are independent of each other and asymptotically follow the standard normal distribution. The corresponding proof of this independence is given in Appendix A. The one-sided p values of \(T_{f1}^A\) and \(T_{f2}^A\) are denoted as \(p_{f1}^A = 1 - {{\Phi }}\left( {T_{f1}^A} \right)\) and \(p_{f2}^A = 1 - {{\Phi }}\left( {T_{f2}^A} \right)\), respectively, where Φ(·) is the cumulative distribution function of the standard normal distribution. We combine \(p_{f1}^A\) with \(p_{f2}^A\) using Fisher’s method and obtain the test statistic

$$Q_f^A = - 2{{{\mathrm{ln}}}}\left( {p_{f1}^Ap_{f2}^A} \right)$$

Under \({{{\mathrm{H}}}}_0^{{{{\mathrm{MV}}}}}\) or \({{{\mathrm{H}}}}_0^{{{\mathrm{M}}}}\), \(Q_f^A \sim \chi _4^2\) (Chen et al. 2017). We denote the p value of \(Q_f^A\) as \(p_f^A\).
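The construction of the female statistic can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the text does not specify which matrix square root is used for \(\hat {{\Sigma}} ^{ - 1/2}\), so the symmetric square root is assumed here; the function names are hypothetical; and the input estimates are made up for illustration. The chi-square survival function with 4 degrees of freedom is evaluated in closed form as exp(-x/2)(1 + x/2).

```python
import math
from statistics import NormalDist


def inv_sqrt_2x2(s11, s12, s22):
    """Inverse of the symmetric square root of the SPD matrix [[s11, s12], [s12, s22]]."""
    det = s11 * s22 - s12 * s12
    if s11 <= 0 or det <= 0:
        raise ValueError("matrix must be symmetric positive definite")
    s = math.sqrt(det)
    t = math.sqrt(s11 + s22 + 2.0 * s)
    # Symmetric square root of a 2x2 SPD matrix: (Sigma + sqrt(det) I) / sqrt(trace + 2 sqrt(det))
    r11, r12, r22 = (s11 + s) / t, s12 / t, (s22 + s) / t
    rdet = r11 * r22 - r12 * r12
    return r22 / rdet, -r12 / rdet, r11 / rdet


def chi2_sf_4df(x):
    """Survival function of the chi-square distribution with 4 degrees of freedom."""
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


def female_fisher_stat(beta1, beta2, s11, s12, s22):
    """Return (T_f1, T_f2, Q_f, p_f) from the estimates and their 2x2 covariance."""
    a11, a12, a22 = inv_sqrt_2x2(s11, s12, s22)
    t1 = a11 * beta1 + a12 * beta2
    t2 = a12 * beta1 + a22 * beta2
    norm = NormalDist()
    p1 = 1.0 - norm.cdf(t1)  # one-sided p values
    p2 = 1.0 - norm.cdf(t2)
    # Note: for very large statistics p1 * p2 can underflow to 0 in floating point.
    q = -2.0 * math.log(p1 * p2)
    return t1, t2, q, chi2_sf_4df(q)


# Illustrative inputs only: uncorrelated estimates with variances 4 and 9.
t1, t2, q_f, p_f = female_fisher_stat(2.0, 3.0, 4.0, 0.0, 9.0)
print(t1, t2, q_f, p_f)
```

With a diagonal covariance matrix, the whitening step reduces to dividing each estimate by its standard error, which gives a quick sanity check of the routine.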

For males, we use the following model to test for the association between the SNP and the trait

$$y_{mi} = \beta _{m0} + \beta _{m1}G_{mi} + {{{\mathbf{b}}}}_m^T{{{\mathbf{Z}}}}_{mi} + \varepsilon _{mi},\,i = 1,2, \ldots ,n_m \tag{2}$$

where βm0 is the intercept and βm1 is the regression coefficient of Gmi. Zmi is a vector of covariates for male i, and bm is the vector of the regression coefficients of Zmi. εmi is a random error that follows \(N(0,\sigma _{m0}^2)\) and \(N(0,\sigma _{m1}^2)\) for genotypes a and A, respectively. Similar to the case for females, we use the weighted least square method to estimate \({\boldsymbol{\beta}} _m = \left( {\beta _{m0},\beta _{m1},{{{\mathbf{b}}}}_m^T} \right)^T\). Here, we set the weight matrix Wm for males as \({{{\mathbf{V}}}}_m^{ - 1}\) with elements \(\frac{1}{{\sigma _{m0}^2}}\) and \(\frac{1}{{\sigma _{m1}^2}}\) for genotypes a and A, respectively. We denote the estimate of βm1 and its variance as \(\hat \beta _{m1}\) and \(\widehat {{{{\mathrm{Var}}}}}( {\hat \beta _{m1}})\), respectively, and then construct the test statistic as \(T_m^A = \frac{{\hat \beta _{m1}}}{{\sqrt {\widehat {{{{\mathrm{Var}}}}}( {\hat \beta _{m1}})} }}\). If nm is large enough, \(T_m^A \sim N\left( {0,1} \right)\). We denote the one-sided p value of \(T_m^A\) as \(p_m^A = 1 - {{\Phi }}\left( {T_m^A} \right)\). Then, we combine \(p_f^A\) with \(p_m^A\) and obtain the test statistic

$$Q^A = - 2{{{\mathrm{ln}}}}\left( {p_f^Ap_m^A} \right)$$

Under \({{{\mathrm{H}}}}_0^{{{{\mathrm{MV}}}}}\) or \({{{\mathrm{H}}}}_0^{{{\mathrm{M}}}}\), \(Q^A \sim \chi _4^2\).
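For intuition, consider the special case of Model (2) without covariates. In that case the weighted least squares fit with genotype-specific weights reproduces the two genotype means, so \(\hat \beta _{m1}\) is the difference of the means and its estimated variance is \(\hat \sigma _{m0}^2/n_{m0} + \hat \sigma _{m1}^2/n_{m1}\). The sketch below uses this reduction and then combines the female and male p values. It assumes that the residual variances are estimated with the n − 1 divisor, which the text does not specify; the function names and the data are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance


def male_statistic(y_a, y_A):
    """T_m^A and its one-sided p value for Model (2) with no covariates.

    y_a, y_A: trait values of males with genotype a and A, respectively.
    """
    n0, n1 = len(y_a), len(y_A)
    beta_m1 = mean(y_A) - mean(y_a)
    # Assumption: genotype-specific variances use the n - 1 divisor.
    var_beta = variance(y_a) / n0 + variance(y_A) / n1
    t_m = beta_m1 / math.sqrt(var_beta)
    p_m = 1.0 - NormalDist().cdf(t_m)
    return t_m, p_m


def combine_sexes(p_f, p_m):
    """Fisher combination Q = -2 ln(p_f p_m) and its chi-square (4 df) p value."""
    q = -2.0 * math.log(p_f * p_m)
    return q, math.exp(-q / 2.0) * (1.0 + q / 2.0)


# Hypothetical toy data, far smaller than any realistic sample.
t_m, p_m = male_statistic([0.1, -0.4, 0.3, 0.0], [0.6, 0.2, 0.9, 0.5])
q_A, p_A = combine_sexes(0.05, 0.05)
print(t_m, p_m, q_A, p_A)
```

In this covariate-free case the male statistic coincides with a Welch-type two-sample statistic, evaluated against the standard normal distribution as in the text.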

Note that the risk allele is generally unknown. Here, we also consider the case where the risk allele is a. We can obtain the test statistics \(Q_f^a\) for females and \(T_m^a\) for males, and the corresponding one-sided p values \(p_f^a\) and \(p_m^a\), respectively, in the same way. Then, the test statistic can be derived as

$$Q^a = - 2{{{\mathrm{ln}}}}\left( {p_f^ap_m^a} \right)$$

Similarly, \(Q^a \sim \chi _4^2\) under \({{{\mathrm{H}}}}_0^{{{{\mathrm{MV}}}}}\) or \({{{\mathrm{H}}}}_0^{{{\mathrm{M}}}}\). We define the final mean-based test statistic as

$${{{\mathrm{QXcat}}}} = {{{\mathrm{max}}}}\left( {Q^A,Q^a} \right)$$

Based on the theorem proposed by Mosteller and Fisher (1948), the p value of QXcat satisfies the following bounds:

$$2\xi - \xi ^2 \le {{{\mathrm{Pr}}}}\left( {{{{\mathrm{QXcat}}}} \,>\, \eta } \right) \le 2\xi$$

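where \(\xi = 1 - \chi _4^2\left( \eta \right)\), \(\chi _4^2\left( \cdot \right)\) denotes the cumulative distribution function of the chi-square distribution with 4 degrees of freedom and η is the observed value of QXcat. Here, we choose the upper bound 2ξ to approximate the p value of QXcat, which is denoted as pQXcat.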
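The final step of QXcat can be illustrated with a short sketch that takes the two statistics \(Q^A\) and \(Q^a\) as given and returns the bounds on the p value. The function names are hypothetical, the inputs are illustrative, and capping the upper bound at 1 is an added safeguard that is not stated in the text.

```python
import math


def chi2_sf_4df(x):
    """Survival function of the chi-square distribution with 4 degrees of freedom."""
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


def qxcat_pvalue(q_A, q_a):
    """Return (QXcat, lower bound, upper bound) for the p value of max(Q^A, Q^a)."""
    qxcat = max(q_A, q_a)
    xi = chi2_sf_4df(qxcat)
    lower = 2.0 * xi - xi ** 2
    upper = min(1.0, 2.0 * xi)  # assumption: cap at 1, since 2 * xi can exceed 1
    return qxcat, lower, upper


# Illustrative values only.
qxcat, p_lower, p_upper = qxcat_pvalue(9.4877, 3.0)
print(qxcat, p_lower, p_upper)
```

For small ξ the two bounds differ only by ξ², so using 2ξ is slightly conservative.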

X chromosome mean-based association test for quantitative traits considering different dosage compensation patterns

Note that QXcat takes all the XCI patterns into account by introducing two indicator variables for females. In addition to this way of considering XCI, Wang et al. (2019a) combined the test statistics for females and males by different weights to account for different dosage compensation patterns in their method Zmax for case‒control design. Adopting a similar idea, we put forward another mean-based association test, which also incorporates the information of dosage compensation by combining the test statistics for females and males based on different weights. Therefore, we propose our QZmax test statistic as follows. Here, we assume that A is the risk allele, and the risk allele in females is the same as that in males. Furthermore, for females, the generalized genetic model is assumed at the SNP (Chen et al. 2017). For females, let \(T_f^A = \frac{1}{{\sqrt 2 }}\left( {T_{f1}^A + T_{f2}^A} \right)\). Since \(T_{f1}^A\) and \(T_{f2}^A\) are independent of each other, \(T_f^A \sim N\left( {0,1} \right)\) under \({{{\mathrm{H}}}}_0^{{{{\mathrm{MV}}}}}\) or \({{{\mathrm{H}}}}_0^{{{\mathrm{M}}}}\). For males, we still use \(T_m^A\), which is independent of \(T_f^A\). Based on the work of Wang et al. (2019a), we combine \(T_f^A\) and \(T_m^A\) in the following way

$$T_{\lambda _k} = \sqrt {\lambda _k} T_f^A + \sqrt {1 - \lambda _k} T_m^A$$

where λk = 2nf/(knm + 2nf) (1 ≤ k ≤ 2). k = 1 denotes no dosage compensation. 1 < k < 2 indicates incomplete dosage compensation, and k = 2 means complete dosage compensation. Note that the values of \(T_{\lambda _k}\) when A is the risk allele and when a is the risk allele have different signs, while their absolute values are still the same. Therefore, we only consider the corresponding test statistics when A is assumed to be the risk allele. Wang et al. (2019a) demonstrated that incomplete dosage compensation (1 < k < 2) is much less common than no dosage compensation and complete dosage compensation, so we choose k = 1 and k = 2. Since the risk allele is generally unknown in practice, i.e., the signs of \(T_{\lambda _1}\) and \(T_{\lambda _2}\) are unknown, we propose the final mean-based test statistic as follows:

$${{{\mathrm{QZ}}}}_{{{{\mathrm{max}}}}} = {{{\mathrm{max}}}}\left( {\left| {T_{\lambda _1}} \right|,\left| {T_{\lambda _2}} \right|} \right)$$

Here, \(T_{\lambda _1}\) and \(T_{\lambda _2}\) jointly follow a bivariate normal distribution. The correlation coefficient of \(T_{\lambda _1}\) and \(T_{\lambda _2}\) can be estimated by

$$\begin{array}{l}r_{\left( {T_{\lambda _1},T_{\lambda _2}} \right)} = \frac{{{{{\mathrm{Cov}}}}\left( {T_{\lambda _1},T_{\lambda _2}} \right)}}{{\sqrt {{{{\mathrm{Var}}}}\left( {T_{\lambda _1}} \right){{{\mathrm{Var}}}}\left( {T_{\lambda _2}} \right)} }}\\ \qquad \qquad \, = \frac{{\sqrt {\lambda _1\lambda _2} {{{\mathrm{Var}}}}\left( {T_f^A} \right) + \sqrt {\left( {1 - \lambda _1} \right)\left( {1 - \lambda _2} \right)} {{{\mathrm{Var}}}}\left( {T_m^A} \right)}}{{\sqrt {\left[ {\lambda _1{{{\mathrm{Var}}}}\left( {T_f^A} \right) + \left( {1 - \lambda _1} \right){{{\mathrm{Var}}}}\left( {T_m^A} \right)} \right]\left[ {\lambda _2{{{\mathrm{Var}}}}\left( {T_f^A} \right) + \left( {1 - \lambda _2} \right){{{\mathrm{Var}}}}\left( {T_m^A} \right)} \right]} }}\\ \qquad \qquad \, = \sqrt {\lambda _1\lambda _2} + \sqrt {\left( {1 - \lambda _1} \right)\left( {1 - \lambda _2} \right)} \end{array}$$
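A brief numerical sketch of the weights, the combined statistics and their correlation is given below. The function names are hypothetical and the values of \(T_f^A\) and \(T_m^A\) are made up for illustration; the sample sizes correspond to an equal sex ratio with nf = nm = 3000.

```python
import math


def dosage_weight(n_f, n_m, k):
    """lambda_k = 2 n_f / (k n_m + 2 n_f); k = 1: no compensation, k = 2: complete."""
    return 2.0 * n_f / (k * n_m + 2.0 * n_f)


def combined_statistic(t_f, t_m, lam):
    """T_lambda = sqrt(lambda) T_f + sqrt(1 - lambda) T_m."""
    return math.sqrt(lam) * t_f + math.sqrt(1.0 - lam) * t_m


def correlation(lam1, lam2):
    """Correlation of T_lambda1 and T_lambda2 when T_f and T_m are independent N(0, 1)."""
    return math.sqrt(lam1 * lam2) + math.sqrt((1.0 - lam1) * (1.0 - lam2))


lam1 = dosage_weight(3000, 3000, 1)
lam2 = dosage_weight(3000, 3000, 2)
r = correlation(lam1, lam2)
# Illustrative statistics T_f^A = 1.5 and T_m^A = 2.0.
qzmax = max(abs(combined_statistic(1.5, 2.0, lam1)),
            abs(combined_statistic(1.5, 2.0, lam2)))
print(lam1, lam2, r, qzmax)
```

The two combined statistics are highly correlated for balanced samples, which is why the joint bivariate normal distribution rather than a Bonferroni-type correction is used for the p value.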

The p value of QZmax (denoted by \(p_{{{{\mathrm{QZ}}}}_{{{{\mathrm{max}}}}}}\)) can be obtained directly by the mvtnorm package (https://cran.r-project.org/web/packages/mvtnorm/index.html) in the R statistical software (R Core Team 2020) as follows:

$$\begin{array}{l}p_{{{{\mathrm{QZ}}}}_{{{{\mathrm{max}}}}}} = 1 - pmvnorm\left(lower = - rep\left( {Q{{{\mathrm{Z}}}}_{{{{\mathrm{max}}}}},2} \right),\right.\\\left. upper = rep\left( {Q{{{\mathrm{Z}}}}_{{{{\mathrm{max}}}}},2} \right),corr = {{{\boldsymbol{R}}}}_{\left( {T_{\lambda _1},T_{\lambda _2}} \right)} \right)\end{array}$$

where \({{{\boldsymbol{R}}}}_{\left( {T_{\lambda _1},T_{\lambda _2}} \right)}\) is a 2 × 2 correlation matrix, and element \(r_{\left( {T_{\lambda _1},T_{\lambda _2}} \right)}\) is the correlation coefficient of \(T_{\lambda _1}\) and \(T_{\lambda _2}\).
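For readers without access to R, the same rectangle probability can be approximated by one-dimensional numerical integration. The sketch below is not the mvtnorm algorithm: it conditions on the first statistic and applies composite Simpson's rule, which is a substitute technique chosen here for illustration. The function name `qzmax_pvalue` and the default number of steps are assumptions.

```python
import math
from statistics import NormalDist


def qzmax_pvalue(qzmax, r, n_steps=2000):
    """Approximate 1 - P(|T1| <= c, |T2| <= c) for a standard bivariate normal
    with correlation r, where c = |qzmax|, via Simpson's rule over T1."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n_steps <= 0 or n_steps % 2:
        raise ValueError("n_steps must be a positive even integer")
    norm = NormalDist()
    c = abs(qzmax)
    s = math.sqrt(1.0 - r * r)

    def integrand(x):
        # Given T1 = x, T2 is normal with mean r * x and standard deviation s.
        upper = norm.cdf((c - r * x) / s)
        lower = norm.cdf((-c - r * x) / s)
        return norm.pdf(x) * (upper - lower)

    h = 2.0 * c / n_steps
    total = integrand(-c) + integrand(c)
    for i in range(1, n_steps):
        total += (4.0 if i % 2 else 2.0) * integrand(-c + i * h)
    return 1.0 - total * h / 3.0


p_indep = qzmax_pvalue(1.96, 0.0)
p_corr = qzmax_pvalue(1.96, 0.9855986)
print(p_indep, p_corr)
```

When the correlation approaches 1, the p value approaches that of a single two-sided normal test; when it is 0, the p value equals that of the maximum of two independent tests.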

Two X chromosome mean-variance-based association tests for quantitative traits

Note that QXcat and QZmax only test for the mean differences across different genotypes. However, the variances of the trait value across genotypes may also be affected by the mutation at the given SNP. To improve the test power, we propose two further tests that combine the variance-based test wM3VNA proposed by Deng et al. (2019) with QXcat and QZmax, so that both the mean differences and the variance heterogeneity are tested. Here, we denote the p value of wM3VNA as pwM3VNA. Soave et al. (2015) proved that the mean-based and the variance-based association tests are independent for autosomal SNPs and normally distributed traits. We prove the corresponding independence between our proposed mean-based tests (i.e., QXcat and QZmax) and the variance-based test wM3VNA for X chromosomal SNPs in Appendix B. Based on this, we construct two mean-variance-based tests using Fisher’s method (Fisher et al. 1967): QMVXcat, which combines pwM3VNA with pQXcat, and QMVZmax, which combines pwM3VNA with \(p_{{{{\mathrm{QZ}}}}_{{{{\mathrm{max}}}}}}\), i.e.,

$${{{\mathrm{QMVX}}}}_{{{{\mathrm{cat}}}}} = - 2{{{\mathrm{ln}}}}\left( {p_{{{{\mathrm{QXcat}}}}}p_{{{{\mathrm{wM}}}}3{{{\mathrm{VNA}}}}}} \right)$$

and

$${{{\mathrm{QMVZ}}}}_{{{{\mathrm{max}}}}} = - 2{{{\mathrm{ln}}}}\left( {p_{{{{\mathrm{QZ}}}}_{{{{\mathrm{max}}}}}}p_{{{{\mathrm{wM}}}}3{{{\mathrm{VNA}}}}}} \right)$$

Under \({{{\mathrm{H}}}}_0^{{{{\mathrm{MV}}}}}\), both QMVXcat and QMVZmax asymptotically follow a chi-square distribution with 4 degrees of freedom (Chen et al. 2017).

Results

Simulation settings

We evaluate the type I error rates (sizes) and the powers of our proposed methods QMVXcat, QMVZmax, QXcat and QZmax by extensive simulation studies. Furthermore, we include wM3VNA, wM3V, Fisher, Tchen and Tplink for the comparison. Note that Tchen and Tplink do not consider the unequal variances of the trait value across different genotypes, which leads to false-positive results in the presence of variance heterogeneity. Therefore, we also include Tchenw and Tplinkw, which use the weighted least square method to estimate the regression coefficients. To clearly differentiate these 11 tests, we categorize them into three groups: methods testing for means (i.e., QXcat, QZmax, Tchenw, Tplinkw, Tchen and Tplink), methods testing for variances (i.e., wM3VNA, wM3V and Fisher) and methods simultaneously testing for means and variances (i.e., QMVXcat and QMVZmax). (qf, qm) is set as (0.2, 0.2), (0.2, 0.3) and (0.3, 0.2). ρ is taken as 0 and 0.05, where ρ = 0 means HWE and ρ ≠ 0 indicates the departure from HWE. We set the sample size N at 6000, and the sex ratio nf:nm is fixed at 2:1, 1:1 and 1:2, which corresponds to (nf, nm) = (4000, 2000), (3000, 3000) and (2000, 4000), respectively. The genotypes of females are generated from a trinomial distribution with probabilities (qaa, qAa, qAA), while the genotypes of males are simulated from a binomial distribution with probabilities (1 − qm, qm). Let z and g denote the sex and the genotype score, respectively. z is set to 1 for females and 0 for males. Under XCI, g takes the possible values of 0, γ and 2 for genotypes aa, Aa and AA in females, respectively, and values of 0 and 2 are taken for genotypes a and A in males, respectively. Different γ values represent different XCI patterns when XCI exists. Here, γ is fixed as 0, 0.5, 1, 1.5 and 2. Under XCI-E, g is set to 0, 1 and 2 for aa, Aa and AA in females, respectively, and 0 and 1 for a and A in males, respectively.

The trait value yi for individual i can be generated by the following linear regression model:

$$y_i = \beta _c + \beta _gg_i + \beta _zz_i + \varepsilon _i,i = 1,2, \ldots ,N$$

where gi and zi denote the values of g and z of individual i, respectively, βc is the intercept, βg and βz are the corresponding regression coefficients of gi and zi, respectively, and εi is the random error. Assume that yi follows a normal distribution. The corresponding mean and variance of yi with different coding schemes of g are shown in Table 1. We fix βc = βz = 0.133 and set \(\beta _g = \sqrt {\frac{{\psi \sigma ^2}}{{2q_g\left( {1 - q_g} \right)}}}\), where \(\sigma ^2\) is the variance of the trait value for genotype aa in females (\(\sigma _{f0}^2\)) and that for genotype a in males (\(\sigma _{m0}^2\)), ψ denotes the proportion of the phenotypic variation due to the SNP effect on the means of the trait value and qg is the allele frequency (Struchalin et al. 2010). In our simulations, we set \(\sigma ^2\) = 1 and qg = 0.3, which is the maximum of qf and qm. To simulate the type I error rates of the methods testing for means, ψ is set to 0, which indicates that βg = 0. To simulate the test powers of the mean-based tests, we fix ψ at 0.3% and 0.4%, and the corresponding values of βg are 0.085 and 0.098, respectively. According to Ma et al. (2015b), \(\frac{\gamma }{2}\left( {1 - \frac{\gamma }{2}} \right)b^2\) in \(\sigma _{f1}^2\) under XCI denotes the increased variance caused by XCI for heterozygous females when the SNP has an effect on the means of the trait value, where b is the additive effect of the SNP on the trait value. Hence, when βg ≠ 0 and XCI exists, b takes the same value as βg (i.e., b = 0.085 (0.098) if βg = 0.085 (0.098)), while it is fixed to 0 when βg = 0 or under XCI-E. θ in \(\sigma _{f1}^2\) represents the increased variance caused by factors other than XCI for heterozygous females (Ma et al. 2015b). If \(\sigma _{f1}^2\) is affected by factors other than XCI, θ is set to 0.2; otherwise, θ = 0. τ in \(\sigma _{f2}^2\) and \(\sigma _{m1}^2\) is the additional variance of the trait value introduced by genotype AA in females or A in males. 
When the SNP influences \(\sigma _{f2}^2\) and \(\sigma _{m1}^2\), τ is 0.2, while it is set to 0 for variance homogeneity. Finally, we use Models (1) and (2) to fit these simulated data.
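As a quick check on the effect sizes quoted above, the following minimal sketch evaluates βg from ψ, σ² and qg using the formula in the text. The function name `beta_g` and its default arguments are ours, introduced only for illustration.

```python
import math

def beta_g(psi: float, sigma2: float = 1.0, q: float = 0.3) -> float:
    """Effect size implied by beta_g = sqrt(psi * sigma^2 / (2 * q * (1 - q)))."""
    return math.sqrt(psi * sigma2 / (2.0 * q * (1.0 - q)))

# psi = 0 (null), then 0.3% and 0.4% of the phenotypic variation
effects = [round(beta_g(psi), 3) for psi in (0.0, 0.003, 0.004)]
print(effects)  # [0.0, 0.085, 0.098]
```

With σ² = 1 and qg = 0.3, the rounded values agree with the 0.085 and 0.098 used in the simulations.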

Table 1 Means and variances of the trait values across different genotypes under XCI and XCI-E.

Since QXcat, QZmax, Tchenw, Tplinkw, Tchen and Tplink only test for the mean difference of the trait value, wM3VNA, wM3V and Fisher only test for the variance heterogeneity, and QMVXcat and QMVZmax test for the differences of both means and variances, we consider the following five scenarios: (1) the means and the variances of the trait value are not influenced by the SNP, (2) the variances of the trait value are affected by the SNP due to factors other than XCI for Aa females and AA females or A males, while the SNP has no effect on the means, (3) under XCI-E, the SNP affects the means while it has no influence on the variances, (4) under XCI, the SNP affects the means and the variances of the trait value because of XCI, specific genotypes (i.e., Aa and AA females or A males) and other factors, and (5) under XCI-E, the SNP affects the means and the variances of the trait value owing to the factors other than XCI for Aa females and AA females or A males. Note that for the case of XCI, if the SNP has an effect on the means, then this SNP will also have an effect on the variances. Therefore, we do not simulate the scenario under XCI in which the SNP affects the means but not the variances. The corresponding values of ψ, βg, γ, b, θ and τ under the five simulated scenarios are displayed in Table 2. In scenario (1) (i.e., no SNP effect), we evaluate the sizes of all the considered methods. In scenario (2) (i.e., SNP effect on variances only), the sizes of the six mean-based tests (QXcat, QZmax, Tchenw, Tplinkw, Tchen and Tplink), the test powers of the two mean-variance-based tests (QMVXcat and QMVZmax) and the three variance-based tests (wM3VNA, wM3V and Fisher) are assessed. In scenario (3) (i.e., SNP effect on means only under XCI-E), the sizes of the three methods testing for variances are presented, and the test powers of the two mean-variance-based tests and the six mean-based tests are compared. 
In scenarios (4) and (5) (i.e., SNP effect on both means and variances), we compare the test powers of all the methods. The number of replications is fixed at \(10^5\), and the significance level is α = \(10^{-4}\). To further assess the robustness of our proposed methods, we consider the situations where the trait value follows a log-normal distribution with the parameters being the natural logarithm of the means and the variances listed in Table 1. In this case, the trait value is first transformed by the inverse normal transformation method, as recommended by Deng et al. (2019).
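The inverse normal transformation mentioned above can be sketched as follows. This is a minimal rank-based version, assuming the Blom offset c = 3/8 and average ranks for ties; the article does not state which offset was used, and the function name `rank_int` is ours.

```python
from statistics import NormalDist

def rank_int(values, c=3.0 / 8.0):
    """Rank-based inverse normal transformation.

    c = 3/8 (Blom offset) is an assumed default, not taken from the article.
    Tied observations receive their average rank.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # extend j over a run of tied values
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    nd = NormalDist()
    return [nd.inv_cdf((r - c) / (n - 2.0 * c + 1.0)) for r in ranks]

# A skewed toy sample: the transformed scores are symmetric around zero.
z = rank_int([2.5, 0.1, 10.0, 1.0, 4.0])
```

The transformation depends only on ranks, so a log-normal trait and its logarithm map to the same scores.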

Table 2 Values of ψ, βg, γ, b, θ and τ in five simulated scenarios.

Empirical type I error rates

Scenario (1): no SNP effect

Table 3 provides a summary of the sizes of our proposed methods (i.e., QMVXcat, QMVZmax, QXcat and QZmax) and the seven existing methods (i.e., Tchenw, Tplinkw, Tchen, Tplink, wM3VNA, wM3V and Fisher) in scenario (1) under HWE (i.e., ρ = 0) when the trait value follows a normal distribution. Table 3 shows that all of these methods control the sizes well regardless of allele frequencies and sex ratios. Supplementary Table S1 shows the empirical sizes of all these methods when ρ = 0.05. The sizes of all the methods remain close to the nominal level \(10^{-4}\), and the values of ρ have little effect on the empirical sizes.
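As a rough guide for reading the empirical sizes, a standard binomial argument (our addition, not a calculation reported in the article) gives the Monte Carlo standard error of an estimated size \(\hat \alpha\) based on \(R = 10^5\) independent replications at α = \(10^{-4}\):

$$\mathrm{SE}\left( {\hat \alpha } \right) = \sqrt {\frac{{\alpha \left( {1 - \alpha } \right)}}{R}} = \sqrt {\frac{{10^{ - 4}\left( {1 - 10^{ - 4}} \right)}}{{10^5}}} \approx 3.2 \times 10^{ - 5}$$

On the \(\times 10^{-4}\) scale used in the tables, this corresponds to a standard error of about 0.32 around the nominal value of 1, so estimated sizes within roughly two standard errors (about 0.4 to 1.6) are compatible with the nominal level under this approximation. Because only about 10 rejections are expected per setting, the normal approximation is itself crude.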

Table 3 Empirical sizes (\(\times 10^{-4}\)) of the mean-variance-based tests (QMVXcat and QMVZmax), mean-based tests (QXcat, QZmax, Tchenw, Tplinkw, Tchen and Tplink) and variance-based tests (wM3VNA, wM3V and Fisher) at the significance level of α = \(10^{-4}\) based on \(10^5\) replications in scenario (1) (i.e., no SNP effect) when ρ = 0 and the trait value follows a normal distribution.

Scenario (2): SNP effect on variances only

Table 4 shows the estimated sizes of the six mean-based tests (QXcat, QZmax, Tchenw, Tplinkw, Tchen and Tplink) in scenario (2) when ρ = 0 and 0.05, and the trait value follows a normal distribution. Only the sizes of QXcat, QZmax, Tchenw and Tplinkw are controlled well when the variances of the trait value are unequal, while the type I error rates of Tchen and Tplink are inflated.

Table 4 Empirical sizes (\(\times 10^{-4}\)) of the mean-based tests (QXcat, QZmax, Tchenw, Tplinkw, Tchen and Tplink) at the significance level of α = \(10^{-4}\) based on \(10^5\) replications in scenario (2) (i.e., SNP effect on variances only) when the trait value follows a normal distribution.

Power comparison

Scenario (2): SNP effect on variances only

The simulated powers of the two mean-variance-based tests (QMVXcat and QMVZmax) and the three variance-based tests (wM3VNA, wM3V and Fisher) against nf:nm in scenario (2) under HWE when the trait value is normally distributed are displayed in Supplementary Fig. S1. Supplementary Fig. S1 shows that wM3VNA has better performance in terms of power than the other methods. Because the mean-based tests QXcat and QZmax only attain their type I error rates under scenario (2), the powers of QMVXcat and QMVZmax are close to each other and are lower than those of the three methods for testing variances. Generally, when (qf, qm) remains unchanged, the powers of the five methods gradually decrease when nf:nm changes from 2:1, 1:1 to 1:2 (i.e., more male individuals). The powers of these methods for (qf, qm) = (0.2, 0.3) and (0.3, 0.2) are higher than those for (qf, qm) = (0.2, 0.2) when nf:nm is fixed (Supplementary Fig. S1b vs. Supplementary Fig. S1a and Supplementary Fig. S1c vs. Supplementary Fig. S1a). The corresponding test powers when ρ = 0.05 are presented in Supplementary Fig. S2. We find that the performances of the tests in Supplementary Fig. S2 are similar to those in Supplementary Fig. S1.

Scenario (3): SNP effect on means only under XCI-E

Under scenario (3), the methods for testing variances (wM3VNA, wM3V and Fisher) present the type I error rates instead of the test powers (data not shown for brevity). In addition, Supplementary Table S2 shows that when βg = 0.085 and ρ = 0 for a normally distributed trait value in scenario (3), the powers of the existing mean-based tests Tchen and Tplink are very close to those of Tchenw and Tplinkw, respectively. Hence, we remove the simulation results of the three variance-based tests, Tchen and Tplink from all the figures under this scenario for simplicity. The estimated powers of the two methods for simultaneously testing means and variances (QMVXcat and QMVZmax) and the four methods for testing means (QXcat, QZmax, Tchenw and Tplinkw) against nf:nm in scenario (3) when βg = 0.085, ρ = 0 and the trait value follows a normal distribution are plotted in Fig. 1. From Fig. 1, we find that the mean-based test QZmax performs the best and the mean-variance-based test QMVXcat performs the worst. Testing means using QXcat is more powerful than testing means using the mean-variance-based test QMVZmax or the two existing mean-based tests (i.e., Tchenw and Tplinkw). QMVZmax and Tchenw have similar performance in terms of power, while the power of Tplinkw is larger than both. All the methods in Fig. 1 become less powerful as nf:nm decreases (i.e., more male individuals). When nf:nm is unchanged, the powers of these methods when (qf, qm) = (0.2, 0.3) and (0.3, 0.2) are higher than those when (qf, qm) = (0.2, 0.2) (Fig. 1b vs. Fig. 1a and Fig. 1c vs. Fig. 1a). The powers of these methods in scenario (3) (i.e., SNP effect on means only under XCI-E) when βg = 0.098 and ρ = 0 are given in Supplementary Fig. S3, and the corresponding results for ρ = 0.05 when βg = 0.085 and 0.098 are shown in Supplementary Figs. S4 and S5, respectively. From these figures, we can see that the powers when βg = 0.098 are higher than those when βg = 0.085 (Supplementary Fig. S3 vs. Fig. 1 and Supplementary Fig. S5 vs. Supplementary Fig. S4). Different values of ρ have minimal effect on the power.

Fig. 1: Powers of the two mean-variance-based tests and the four mean-based tests against nf:nm.
figure 1

The two mean-variance-based tests are QMVXcat and QMVZmax. The four mean-based tests are QXcat, QZmax, Tchenw and Tplinkw. These results are based on \(10^5\) replications in scenario (3) (i.e., SNP effect on means only under XCI-E), where N = 6000, βg = 0.085 and ρ = 0 at the significance level of α = \(10^{-4}\) when the trait value follows a normal distribution. a (qf, qm) = (0.2, 0.2). b (qf, qm) = (0.2, 0.3). c (qf, qm) = (0.3, 0.2).

Scenarios (4) and (5): SNP effect on both means and variances

Since Tchen and Tplink for testing means have increased empirical sizes when the variances of the trait value across genotypes are unequal, we remove them from all the figures in scenarios (4) and (5). Figure 2 gives the estimated power of the two mean-variance-based tests (QMVXcat and QMVZmax), the four mean-based tests (QXcat, QZmax, Tchenw and Tplinkw) and the three variance-based tests (wM3VNA, wM3V and Fisher) against different γ values in scenario (4) (i.e., SNP effect on both means and variances under XCI) when βg = b = 0.085, ρ = 0 and the trait value follows a normal distribution. We can see from Fig. 2 that the two mean-variance-based tests have almost the same performance in terms of power and are more powerful than the other tests. For the four methods testing for means, when γ = 2 and nf:nm = 2:1 or 1:1 (subplots 2a-2f of Fig. 2), the powers of QXcat, Tchenw and Tplinkw are close to each other and are slightly larger than that of QZmax. However, when γ = 2 and nf:nm = 1:2 (subplots 2g-2i of Fig. 2), the four mean-based tests perform similarly. For the cases when γ = 0, the proposed QXcat test generally performs the best, and the other three mean-based methods have similar powers, except for the situations where (qf, qm) = (0.3, 0.2). For the cases when γ = 0 and (qf, qm) = (0.3, 0.2), the existing Tplinkw test has the least power when nf:nm = 2:1 or 1:1 (subplots 2c and 2f of Fig. 2), while the two existing tests (Tchenw and Tplinkw) have similar powers and perform worse than the two proposed tests (QXcat and QZmax) when nf:nm = 1:2 (subplot 2i of Fig. 2). When γ = 0.5, 1 and 1.5, the powers of the four mean-based tests are not much different when nf:nm = 2:1 and 1:1 (subplots 2a-2f of Fig. 2), while the existing Tchenw test has the smallest power when nf:nm = 1:2 (subplots 2g-2i of Fig. 2). 
In addition, the powers of the two mean-variance-based tests and four mean-based tests increase as γ increases, while the powers of the methods testing for variances are nearly unchanged across values of γ because the extra variance for heterozygous females caused by XCI (i.e., \(\frac{\gamma }{2}\left( {1 - \frac{\gamma }{2}} \right)b^2\)) is at most 0.0018 (attained when γ = 1), which is very small. For each fixed (nf, nm), all the methods when (qf, qm) = (0.2, 0.3) and (0.3, 0.2) perform better than those when (qf, qm) = (0.2, 0.2) (e.g., Fig. 2b vs. Fig. 2a and Fig. 2c vs. Fig. 2a). For each value of (qf, qm), the two methods for simultaneously testing means and variances and the four methods for testing means become more powerful when nf:nm changes from 2:1, 1:1 to 1:2 (e.g., Fig. 2a vs. Fig. 2d, Fig. 2a vs. Fig. 2g and Fig. 2d vs. Fig. 2g), while the powers of the methods for testing variances generally decrease. These results indicate that larger values of qf and qm may improve the powers of all the methods and that the three variance-based tests can be more efficient with higher nf:nm (i.e., more female individuals). However, a lower nf:nm (i.e., more male individuals) may cause the two methods simultaneously testing for means and variances and the four mean-based tests to be more powerful.
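To see how small the XCI-induced variance term is, the sketch below tabulates \(\frac{\gamma }{2}\left( {1 - \frac{\gamma }{2}} \right)b^2\) over the γ values used in the simulations and the two effect sizes. The helper name `xci_extra_variance` is ours and is introduced only for illustration.

```python
def xci_extra_variance(gamma: float, b: float) -> float:
    """Extra variance for heterozygous females under XCI: (gamma/2)(1 - gamma/2) b^2."""
    half = gamma / 2.0
    return half * (1.0 - half) * b * b

gammas = (0.0, 0.5, 1.0, 1.5, 2.0)
table = {
    b: [round(xci_extra_variance(g, b), 4) for g in gammas]
    for b in (0.085, 0.098)
}
print(table[0.085])  # [0.0, 0.0014, 0.0018, 0.0014, 0.0]
print(table[0.098])  # [0.0, 0.0018, 0.0024, 0.0018, 0.0]
```

The term vanishes at γ = 0 and γ = 2 and peaks at γ = 1, and even the peak is negligible relative to a baseline variance of 1.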

Fig. 2: Powers of the two mean-variance-based tests, four mean-based tests and three variance-based tests against γ.
figure 2

The two mean-variance-based tests are QMVXcat and QMVZmax. The four mean-based tests are QXcat, QZmax, Tchenw and Tplinkw. The three variance-based tests are wM3VNA, wM3V and Fisher. These results are based on \(10^5\) replications in scenario (4) (i.e., SNP effect on both means and variances under XCI), where βg = b = 0.085 and ρ = 0 at the significance level of α = \(10^{-4}\) when the trait value follows a normal distribution. a (nf, nm) = (4000, 2000) and (qf, qm) = (0.2, 0.2). b (nf, nm) = (4000, 2000) and (qf, qm) = (0.2, 0.3). c (nf, nm) = (4000, 2000) and (qf, qm) = (0.3, 0.2). d (nf, nm) = (3000, 3000) and (qf, qm) = (0.2, 0.2). e (nf, nm) = (3000, 3000) and (qf, qm) = (0.2, 0.3). f (nf, nm) = (3000, 3000) and (qf, qm) = (0.3, 0.2). g (nf, nm) = (2000, 4000) and (qf, qm) = (0.2, 0.2). h (nf, nm) = (2000, 4000) and (qf, qm) = (0.2, 0.3). i (nf, nm) = (2000, 4000) and (qf, qm) = (0.3, 0.2).

We plot the powers of all these methods in scenario (4) (i.e., SNP effect on both means and variances under XCI) when βg = b = 0.098 and ρ = 0, and the corresponding results for ρ = 0.05 when βg = b = 0.085 and βg = b = 0.098 in Supplementary Figs. S6–S8, respectively. By comparing Fig. 2 with Supplementary Fig. S6 or comparing Supplementary Fig. S7 with Supplementary Fig. S8, we find that for the methods testing for variances, the powers when βg = b = 0.085 are similar to those when βg = b = 0.098 because for different values of γ, the additional variances caused by XCI (i.e., \(\frac{\gamma }{2}\left( {1 - \frac{\gamma }{2}} \right)b^2\)) for b = 0.085 are close to those for b = 0.098; for the two mean-variance-based tests and the four mean-based tests, the powers when βg = b = 0.098 are higher than those when βg = b = 0.085.

The estimated powers of the two methods for simultaneously testing means and variances (QMVXcat and QMVZmax), four methods for testing means (QXcat, QZmax, Tchenw and Tplinkw) and three methods for testing variances (wM3VNA, wM3V and Fisher) against nf:nm in scenario (5) (i.e., SNP effect on both means and variances under XCI-E) when βg = 0.085 and ρ = 0 are presented in Fig. 3. The corresponding results when βg = 0.098 and ρ = 0 and those with ρ = 0.05 when βg = 0.085 and 0.098 are given in Supplementary Figs. S9–S11. It can be seen from these figures that under scenario (5), QMVZmax for simultaneously testing means and variances is the most powerful, the two mean-variance-based tests are more powerful than the other seven methods, and the power of Tchenw for testing means is the worst. Among the four mean-based tests (QXcat, QZmax, Tchenw and Tplinkw), the order of the performance in terms of power is QZmax > QXcat > Tplinkw > Tchenw. In addition, the power performances of the three variance-based tests in Fig. 3 and Supplementary Figs. S9–S11 are similar to those in Supplementary Figs. S1 and S2.

Fig. 3: Powers of the two mean-variance-based tests, four mean-based tests and three variance-based tests against nf:nm.
figure 3

The two mean-variance-based tests are QMVXcat and QMVZmax. The four mean-based tests are QXcat, QZmax, Tchenw and Tplinkw. The three variance-based tests are wM3VNA, wM3V and Fisher. These results are based on \(10^5\) replications in scenario (5) (i.e., SNP effect on both means and variances under XCI-E), where N = 6000, βg = 0.085 and ρ = 0 at the significance level of α = \(10^{-4}\) when the trait value follows a normal distribution. a (qf, qm) = (0.2, 0.2). b (qf, qm) = (0.2, 0.3). c (qf, qm) = (0.3, 0.2).

Other simulation results

We also simulate the type I error rates and powers for all the considered test statistics for all the abovementioned situations when the trait value follows a log-normal distribution. The simulation results are shown in Supplementary Tables S3–S5 and Supplementary Figs. S12–S25. From Supplementary Tables S3–S5, all the sizes stay close to the nominal level, except for the mean-based tests Tchen and Tplink under scenario (2), where the variances across genotypes can be unequal. From Supplementary Figs. S12–S25, we find that the power performances of all the methods and the impact of (qf, qm), nf:nm, γ, ρ, βg and b on the powers of all the methods in scenarios (2)–(5) are similar to those when the trait value is normally distributed.

Application to the MCTFR data

The Minnesota Center for Twin and Family Research Genome-Wide Association Study of Behavioral Disinhibition is a family-based study that includes age (covariate) and five quantitative traits: the nicotine composite score (NIC), the alcohol consumption composite score (CON), the alcohol dependence composite score (DEP), the behavioral disinhibition composite score (BD) and the illicit drug composite score (DRG). This dataset is available from the database of Genotypes and Phenotypes (https://www.ncbi.nlm.nih.gov/gap/) with the accession number phs000620.v1.p1. This dataset includes 2183 families and 7377 individuals, including 3546 males and 3831 females. There are four types of offspring in this dataset, which are monozygotic twins, full biological nontwin siblings, adopted siblings and mixed siblings, which include one biological offspring and one adopted offspring. More details of the family structure in this dataset can be found in Fig. 7 of Li et al. (2021) and Supplementary Fig. S26 in this article for easy reference. In this dataset, 12,354 SNPs on the X chromosome are genotyped.

To ensure that the included individuals are independent, we only use the data of parents in the dataset. We then conduct quality control, in which we first exclude the individuals with a missing genotype rate greater than 10% and select the SNPs for which the minor allele frequencies are greater than 5%, the missing rates are less than 10%, the minimum genotype counts are larger than 20 and the p values of the HWE test are larger than \(1 \times 10^{-6}\) (Ma et al. 2015b; Soave et al. 2015; Marees et al. 2018). As a result, a total of 3649 independent individuals (1949 females and 1700 males) and 9963 SNPs are included in this application. We apply our proposed methods (i.e., QMVXcat, QMVZmax, QXcat and QZmax) and the existing methods (i.e., Tchenw, Tplinkw, Tchen, Tplink, wM3VNA, wM3V and Fisher) to this subset of the MCTFR data.
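The SNP-level filters above can be written as a single predicate. The sketch below is illustrative only: the record layout `SnpSummary` and the function `passes_qc` are hypothetical, and the per-SNP summaries are assumed to have been computed beforehand; only the thresholds come from the text.

```python
from dataclasses import dataclass

@dataclass
class SnpSummary:
    """Hypothetical per-SNP summary; field names are ours."""
    maf: float                # minor allele frequency
    missing_rate: float       # fraction of missing genotype calls
    min_genotype_count: int   # smallest genotype-class count
    hwe_p: float              # p value of the HWE test

def passes_qc(s: SnpSummary) -> bool:
    """Apply the SNP thresholds stated in the text."""
    return (
        s.maf > 0.05
        and s.missing_rate < 0.10
        and s.min_genotype_count > 20
        and s.hwe_p > 1e-6
    )

kept = passes_qc(SnpSummary(maf=0.21, missing_rate=0.01, min_genotype_count=85, hwe_p=0.4))
dropped = passes_qc(SnpSummary(maf=0.03, missing_rate=0.01, min_genotype_count=85, hwe_p=0.4))
```

The individual-level filter (missing genotype rate above 10%) would be applied before these SNP-level checks, as described above.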

Note that sex dimorphism of the five quantitative traits generally exists, and the histograms of the five traits for all the individuals, females only and males only are different in the MCTFR data, which are shown in Supplementary Fig. S27. Furthermore, all the residuals estimated from Models (1) and (2) fail to pass the normality tests. According to McCaw et al. (2019), we use the I-INT method to transform the five quantitative traits in females and males and then apply the 11 methods mentioned above to conduct the corresponding association analysis. Here, we include age as the covariate.

Since the five traits in this dataset share many similarities, similar to Schifano et al. (2013), we set the significance level to \(1 \times 10^{-3}\) to find the SNPs that are simultaneously associated with multiple traits. As a result, SNP rs808144 is identified to be simultaneously associated with four traits (BD, DEP, DRG and NIC). Table 5 shows the p values of all the methods for SNP rs808144, from which we discover that SNP rs808144 only influences the mean values of these four traits while having no effect on their variances (all the p values of the variance-based tests wM3VNA, wM3V and Fisher are larger than \(1 \times 10^{-3}\)). The p values of the proposed mean-based tests QXcat and QZmax are close to those of the existing mean-based tests (Tchenw, Tplinkw, Tchen and Tplink). In addition, nine SNPs (rs808141, rs5934722, rs5926861, rs7064741, rs5942608, rs17261621, rs204332, rs5977759 and rs5925540) are found to be simultaneously associated with two traits. The p values of all the methods for these nine SNPs are given in Supplementary Table S6. Specifically, SNPs rs808141, rs5926861, rs7064741, rs204332, rs5977759 and rs5925540 only have effects on the mean values of the traits. Among these six SNPs, BD is statistically significantly associated with SNPs rs5926861, rs204332 and rs5925540; CON is only associated with SNP rs5977759; DEP is associated with SNPs rs5926861, rs7064741 and rs5977759; DRG is associated with SNPs rs808141, rs7064741 and rs204332; and NIC is associated with SNPs rs808141 and rs5925540. For SNPs rs808141, rs5926861, rs7064741 and rs5977759, the p values of six mean-based tests (QXcat, QZmax, Tchenw, Tplinkw, Tchen and Tplink) are close to each other, while for SNP rs204332, the p values of QZmax are much larger than those of the other five mean-based tests, and for SNP rs5925540, the p values of QZmax, Tplinkw and Tplink are not much different and are much larger than those of QXcat, Tchenw and Tchen. 
In addition, we find that SNP rs5934722 only affects the variances of BD (\(p_{\mathrm{wM3VNA}} = 4.18 \times 10^{-4}\), \(p_{\mathrm{wM3V}} = 5.01 \times 10^{-4}\) and \(p_{\mathrm{Fisher}} = 5.04 \times 10^{-4}\)) and DRG (\(p_{\mathrm{wM3VNA}} = 8.81 \times 10^{-4}\) and \(p_{\mathrm{wM3V}} = 8.70 \times 10^{-4}\)). SNP rs17261621 is statistically significantly associated with the variance differences of BD (\(p_{\mathrm{wM3V}} = 7.66 \times 10^{-4}\)). From the p values of all the methods for SNP rs17261621 and DRG, only the mean-variance-based test QMVZmax gives the statistically significant result (\(p_{{{{\mathrm{QMVZ}}}}_{{{{\mathrm{max}}}}}} = 3.98 \times 10^{ - 4}\)). Additionally, from the p values of all the methods for SNP rs5942608, only the p values of QMVXcat for simultaneously testing means and variances are lower than the significance level \(1 \times 10^{-3}\), where the p values of QMVXcat for DEP and NIC are \(4.53 \times 10^{-4}\) and \(8.02 \times 10^{-4}\), respectively. This indicates that the means or the variances (or both) of the trait values differ across genotypes, which needs to be further investigated.

Table 5 p values of the mean-variance-based tests (QMVXcat and QMVZmax), mean-based tests (QXcat, QZmax, Tchenw, Tplinkw, Tchen and Tplink) and variance-based tests (wM3VNA, wM3V and Fisher) for SNP rs808144, which is statistically significantly associated with four traits at the significance level of \(1 \times 10^{-3}\).

We summarize the positions, minor alleles, major alleles, minor allele frequencies, p values of the HWE test and the genes containing the abovementioned 10 SNPs in Supplementary Table S7. We find that SNP rs5934722 is within the SHROOM2 gene, which is reported to be associated with autistic disorder and neurodevelopmental disorders (Kearney et al. 2011; Richards et al. 2015). SNP rs5926861 is located in the DCAF8L2 gene, which has been reported to be associated with autistic disorder, neurodevelopmental disorders and syndromic X-linked intellectual disability Lubs type (Kushima et al. 2018). SNP rs7064741 is located in the GLRA4 gene, which is related to intellectual disability, behavioral problems and craniofacial anomalies (Labonne et al. 2016). SNP rs5977759 is in the HS6ST2 gene, which is associated with the development of myopia and cognitive impairment (Paganini et al. 2019).

Discussion

In this article, we propose four association tests (QMVXcat, QMVZmax, QXcat and QZmax) for X-linked quantitative traits under the assumptions that the risk alleles for females and males are the same and the SNP being studied satisfies the generalized genetic model in females. Among these tests, QXcat and QZmax focus on testing for the mean differences of quantitative traits, while QMVXcat and QMVZmax simultaneously test for both the mean and variance differences of quantitative traits. In addition, we choose two ways to incorporate the XCI information. In QMVXcat and QXcat, we introduce two indicator variables for females, which can be used in testing for the association under all the XCI patterns, and then directly combine the p values of the test statistics based on females and males. In QMVZmax and QZmax, we combine the test statistics for females and males by different weights to consider different dosage compensation patterns and then obtain the test statistic by maximizing these combined test statistics. Extensive simulations are conducted to evaluate the type I error rates and the test powers of these proposed methods and the existing methods Tchenw, Tplinkw, Tchen, Tplink, wM3VNA, wM3V and Fisher. The simulation results show that our proposed methods control the type I error rates in various scenarios well. In the simulated scenarios where the mean values of the trait value are affected by the SNP, two proposed mean-based tests QXcat and QZmax have better performance in terms of power than the existing methods for testing means under XCI-E and in some cases of XCI. In the simulated scenarios where both the means and the variances of the trait value are affected by the SNP, the two proposed mean-variance-based tests QMVXcat and QMVZmax outperform the others, as expected.

For the combination of p values, we use Fisher’s method (Fisher et al. 1967), Stouffer’s method (Stouffer et al. 1949) and Cauchy’s method (Liu and Xie 2020) to combine the p value of wM3VNA for testing variances with those of QXcat and QZmax for testing means to obtain the p values of QMVXcat and QMVZmax for simultaneously testing means and variances. In Stouffer’s method, each of the two p values is transformed to the corresponding upper quantile of the standard normal distribution, and then \(\frac{1}{{\sqrt 2 }}\) times the sum of these two quantiles is used as the final test statistic, which follows the standard normal distribution under \({{{\mathrm{H}}}}_0^{{{{\mathrm{MV}}}}}\). In Cauchy’s method, we first transform the two p values to the corresponding quantiles of the standard Cauchy distribution and then calculate the average of these two quantiles as the final test statistic, which follows the standard Cauchy distribution under \({{{\mathrm{H}}}}_0^{{{{\mathrm{MV}}}}}\). We compare the test powers of the two mean-variance-based tests (QMVXcat and QMVZmax) using the three combination methods under HWE for scenario (3) (i.e., SNP effect on means only under XCI-E) with βg = 0.085 and scenario (4) (i.e., SNP effect on both means and variances under XCI) with βg = b = 0.085 when the trait value follows a normal distribution. The estimated powers of both methods under scenario (3) are listed in Supplementary Table S8, and the corresponding results of QMVXcat and QMVZmax under scenario (4) are given in Supplementary Tables S9 and S10, respectively. From Supplementary Table S8, both QMVXcat and QMVZmax achieve the highest powers when using Cauchy’s method in scenario (3), which are slightly larger than those with Fisher’s method. The powers using Stouffer’s method are much lower than those using the other two combination methods. 
In Supplementary Tables S9 and S10, we find that the test powers utilizing Fisher’s method and Stouffer’s method are close to each other, and both are much larger than that of Cauchy’s method. Therefore, we finally choose the robust Fisher’s method to construct the mean-variance-based tests QMVXcat and QMVZmax. Additionally, Chen (2022a) recently proposed a method based on the constrained likelihood ratio test for combining independent p values and showed that this combination method is robust and powerful under many conditions. Moreover, two novel robust tests for combining dependent p values (i.e., MCM and CMC) were suggested by Chen (2022b). Both the simulation results and the real data application demonstrated that the MCM and CMC methods are robust and powerful under many situations and can be considered alternatives to Cauchy’s method. In future work, we will use the combination methods proposed by Chen (2022a) and Chen (2022b) to calculate the p values of QMVXcat and QMVZmax for simultaneously testing the means and variances and compare the powers of QMVXcat and QMVZmax utilizing these three methods with those using Fisher’s method.
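The three combination rules compared above can be sketched for two p values as follows. This is a minimal illustration that assumes the two p values are independent and uses only the Python standard library; the function names are ours, and the article's own implementation may differ in detail.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def fisher_combine(p1: float, p2: float) -> float:
    """Fisher: X = -2 ln(p1 p2) is chi-square with 4 df under the null."""
    x = -2.0 * math.log(p1 * p2)
    # Chi-square survival function with 4 df has the closed form exp(-x/2)(1 + x/2).
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)

def stouffer_combine(p1: float, p2: float) -> float:
    """Stouffer: sum of upper normal quantiles scaled by 1/sqrt(2)."""
    z = (_STD_NORMAL.inv_cdf(1.0 - p1) + _STD_NORMAL.inv_cdf(1.0 - p2)) / math.sqrt(2.0)
    return 1.0 - _STD_NORMAL.cdf(z)

def cauchy_combine(p1: float, p2: float) -> float:
    """Cauchy: average of standard Cauchy quantiles tan((0.5 - p) * pi)."""
    t = 0.5 * (math.tan((0.5 - p1) * math.pi) + math.tan((0.5 - p2) * math.pi))
    return 0.5 - math.atan(t) / math.pi

# Signal in one component only (e.g., means but not variances)
one_sided = [f(1e-6, 0.5) for f in (cauchy_combine, fisher_combine, stouffer_combine)]
# Moderate signal in both components
two_sided = [f(1e-3, 1e-3) for f in (cauchy_combine, fisher_combine, stouffer_combine)]
```

In this toy input, when only one p value is small the combined p values increase in the order Cauchy, Fisher, Stouffer, and when both are moderately small Fisher and Stouffer are far smaller than Cauchy. This mirrors the qualitative pattern reported for scenarios (3) and (4), although it is not a substitute for the power simulations.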

For the mean-based test QXcat, we consider three combination methods to construct the test statistic. The first way is to directly combine two p values for females (i.e., \(p_{f1}^A\) and \(p_{f2}^A\) if the risk allele is A) with the p value for males (i.e., \(p_m^A\) if the risk allele is A) based on Fisher’s method and obtain the corresponding test statistic. The second way is to first combine two test statistics for females (i.e., \(T_{f1}^A\) and \(T_{f2}^A\)) to \(T_f^A\) and compute the corresponding p value, then combine it with the p value for males based on Fisher’s method. The third way is the one we choose for QXcat in this article, which has been introduced in the Materials and methods section. The power performances of QXcat under three combinations are also compared in different scenarios, and we find that QXcat under the third combination achieves the highest power in general (data not shown for brevity).

For the mean-based test QZmax, two test statistics that incorporate more dosage compensation patterns, i.e., \({{{\mathrm{QZ}}}}_{{{{\mathrm{max}}}}3} = {{{\mathrm{max}}}}\left( {\left| {T_{\lambda _1}} \right|,\left| {T_{\lambda _{1.5}}} \right|,\left| {T_{\lambda _2}} \right|} \right)\) and \({{{\mathrm{QZ}}}}_{{{{\mathrm{max}}}}5} = {{{\mathrm{max}}}}\left( {\left| {T_{\lambda _1}} \right|,\left| {T_{\lambda _{1.25}}} \right|,\left| {T_{\lambda _{1.5}}} \right|,\left| {T_{\lambda _{1.75}}} \right|,\left| {T_{\lambda _2}} \right|} \right)\), are also considered. We compare their power performance with \({{{\mathrm{QZ}}}}_{{{{\mathrm{max}}}}} = {{{\mathrm{max}}}}\left( {\left| {T_{\lambda _1}} \right|,\left| {T_{\lambda _2}} \right|} \right)\) under HWE and (qf, qm) = (0.2, 0.2) for scenario (3) (i.e., SNP effect on means only under XCI-E) with βg = 0.085 and scenario (4) (i.e., SNP effect on both means and variances under XCI) with βg = b = 0.085 when the trait value follows a normal distribution. The corresponding results are given in Supplementary Table S11, which shows that the powers of QZmax, QZmax3 and QZmax5 are close to each other. Note that QZmax3 and QZmax5 are much more computationally intense than QZmax. Therefore, we recommend choosing the test statistic \({{{\mathrm{QZ}}}}_{{{{\mathrm{max}}}}} = {{{\mathrm{max}}}}\left( {\left| {T_{\lambda _1}} \right|,\left| {T_{\lambda _2}} \right|} \right)\) in practice.

The proposed mean-based tests QXcat and QZmax assume that the risk alleles for females and males are the same, and the SNP being studied satisfies the generalized genetic model in females (i.e., μf2 ≥ μf1 ≥ μf0). When these two assumptions are satisfied in practice, the methods of constructing the test statistics QXcat and QZmax can effectively incorporate the information from these two assumptions and hence can improve the test powers. For instance, if the risk alleles in females and males are both A and μf2 > μf1 > μf0, the signs of βf1 and βf2 in Model (1) and that of βm1 in Model (2) are the same, and all of them are positive. For QXcat, the one-sided p values \(p_{f1}^A = 1 - {{\Phi }}\left( {T_{f1}^A} \right)\) and \(p_{f2}^A = 1 - {{\Phi }}\left( {T_{f2}^A} \right)\) are smaller than the one-sided p values \(p_{f1}^a = 1 - {{\Phi }}\left( {T_{f1}^a} \right)\) and \(p_{f2}^a = 1 - {{\Phi }}\left( {T_{f2}^a} \right)\), respectively, in females. Thus, \(Q_f^A = - 2{{{\mathrm{ln}}}}\left( {p_{f1}^Ap_{f2}^A} \right)\) is larger than \(Q_f^a = - 2{{{\mathrm{ln}}}}\left( {p_{f1}^ap_{f2}^a} \right)\), and the corresponding p values satisfy \(p_f^A \,<\, p_f^a\). Similarly, \(p_m^A\) is smaller than \(p_m^a\) in males. Then, \(Q^A = - 2{{{\mathrm{ln}}}}\left( {p_f^Ap_m^A} \right)\) is larger than \(Q^a = - 2{{{\mathrm{ln}}}}\left( {p_f^ap_m^a} \right)\). By utilizing the information that \(p_{f1}^A\), \(p_{f2}^A\) and \(p_m^A\) are smaller than \(p_{f1}^a\), \(p_{f2}^a\) and \(p_m^a\), respectively, a final test statistic with a relatively large absolute value is obtained by maximizing QA and Qa, so the test power of QXcat = max(QA, Qa) will increase. For QZmax, because both \(T_{f1}^A\) and \(T_{f2}^A\) should be positive with a high probability, \(T_f^A = \frac{1}{{\sqrt 2 }}\left( {T_{f1}^A + T_{f2}^A} \right)\) is also larger than zero. 
In addition, note that both \(T_f^A\) and \(T_m^A\) should be greater than zero, so the signs of \(T_{\lambda _1} = \sqrt {\lambda _1} T_f^A + \sqrt {1 - \lambda _1} T_m^A\) and \(T_{\lambda _2} = \sqrt {\lambda _2} T_f^A + \sqrt {1 - \lambda _2} T_m^A\) are the same as those of \(T_f^A\) and \(T_m^A\). Therefore, the test power of \({{{\mathrm{QZ}}}}_{{{{\mathrm{max}}}}} = {{{\mathrm{max}}}}\left( {\left| {T_{\lambda _1}} \right|,\left| {T_{\lambda _2}} \right|} \right)\) is improved by the weighted average of the test statistics \(T_{f1}^A\), \(T_{f2}^A\) and \(T_m^A\) having the same signs.
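The two-stage combination of one-sided p values behind QXcat can be illustrated with a small Python sketch. Several ingredients are assumptions made only for this illustration and are not taken from the paper: the female p value is obtained from a chi-square distribution with 4 degrees of freedom (which presumes that the two female statistics are independent under the null hypothesis), the male p value is taken as 1 − Φ(T), the statistics under the coding with a as the risk allele are taken to be the negatives of those under the coding with A, and all numeric values are hypothetical.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal distribution function

def one_sided_p(t):
    """One-sided p value 1 - Phi(t)."""
    return 1.0 - PHI(t)

def fisher_q(p_values):
    """Fisher-type combination -2 * ln(product of p values)."""
    return -2.0 * sum(math.log(p) for p in p_values)

def chi2_4_sf(q):
    """Survival function of a chi-square variable with 4 degrees of freedom.
    ASSUMPTION: used here only as an illustrative null distribution."""
    return math.exp(-q / 2.0) * (1.0 + q / 2.0)

def combined_q(t_f1, t_f2, t_m):
    """Combine the two female statistics first, then females with males."""
    q_f = fisher_q([one_sided_p(t_f1), one_sided_p(t_f2)])
    p_f = chi2_4_sf(q_f)
    p_m = one_sided_p(t_m)
    return fisher_q([p_f, p_m])

# Hypothetical statistics with A coded as the risk allele (all positive).
t_f1, t_f2, t_m = 2.0, 2.5, 1.8
q_A = combined_q(t_f1, t_f2, t_m)
# ASSUMPTION: recoding with a as the risk allele flips every sign.
q_a = combined_q(-t_f1, -t_f2, -t_m)
qx_cat = max(q_A, q_a)
print(q_A > q_a, qx_cat == q_A)
```

Under these assumptions the coding that matches the true risk allele dominates the maximum, which is the mechanism described above for the gain in power when the signs agree.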

However, if either of these two assumptions is violated, both QXcat and QZmax may lose power. For example, when the risk allele in females is A while that in males is a, the signs of βf1 and βf2 are different from that of βm1 (i.e., \(p_f^A \,<\, p_f^a\) and \(p_m^A \,>\, p_m^a\)). Then, both the test statistics \(Q^A = - 2{{{\mathrm{ln}}}}\left( {p_f^Ap_m^A} \right)\) and \(Q^a = - 2{{{\mathrm{ln}}}}\left( {p_f^ap_m^a} \right)\) are less than \(- 2{{{\mathrm{ln}}}}\left( {p_f^Ap_m^a} \right)\) (assuming that the risk alleles for females and males are known), and QXcat = max(\(Q^A\), \(Q^a\)) is not the best combination of \(p_{f1}^A\), \(p_{f2}^A\), \(p_{f1}^a\), \(p_{f2}^a\), \(p_m^A\) and \(p_m^a\), which may reduce the test power. For QZmax, the signs of \(T_f^A\) and \(T_m^A\) should be different with a high probability; then, \(T_f^A\) and \(T_m^A\) may cancel each other out in the calculation of \(T_{\lambda _1} = \sqrt {\lambda _1} T_f^A + \sqrt {1 - \lambda _1} T_m^A\) and \(T_{\lambda _2} = \sqrt {\lambda _2} T_f^A + \sqrt {1 - \lambda _2} T_m^A\). In this case, a smaller value of the final test statistic \({{{\mathrm{QZ}}}}_{{{{\mathrm{max}}}}} = {{{\mathrm{max}}}}\left( {\left| {T_{\lambda _1}} \right|,\left| {T_{\lambda _2}} \right|} \right)\) will be obtained, and hence, the power of QZmax will be reduced. Likewise, if the SNP being studied does not satisfy the generalized genetic model in females (e.g., μf1 > μf2 > μf0), then βf1 > 0 and βf2 < 0 (i.e., \(p_{f1}^A \,<\, p_{f1}^a\) and \(p_{f2}^A \,>\, p_{f2}^a\)) when A is assumed to be the risk allele. As such, \(Q_f^A = - 2{{{\mathrm{ln}}}}\left( {p_{f1}^Ap_{f2}^A} \right)\) and \(Q_f^a = - 2{{{\mathrm{ln}}}}\left( {p_{f1}^ap_{f2}^a} \right)\) are smaller than \(- 2{{{\mathrm{ln}}}}\left( {p_{f1}^Ap_{f2}^a} \right)\). Hence, the final test statistic QXcat = max(\(Q^A\), \(Q^a\)) may be very small, and the test power may be low.
For QZmax, \(T_{f1}^A\) (larger than zero) and \(T_{f2}^A\) (less than zero) can cancel each other out in calculating \(T_f^A = \frac{1}{{\sqrt 2 }}\left( {T_{f1}^A + T_{f2}^A} \right)\). Therefore, the final test statistic \({{{\mathrm{QZ}}}}_{{{{\mathrm{max}}}}} = {{{\mathrm{max}}}}\left( {\left| {T_{\lambda _1}} \right|,\left| {T_{\lambda _2}} \right|} \right)\) is reduced, and the corresponding test power is lower. Note that it is generally reasonable to assume that the risk alleles for females and males are the same and that the SNP being studied satisfies the generalized genetic model for females (Chen et al. 2017). Furthermore, the ideas behind the construction of QXcat and QZmax are similar to those in Chen et al. (2017) and Wang et al. (2019a), respectively, and the simulation results of both Chen et al. (2017) and Wang et al. (2019a) showed that the powers of their proposed methods are generally higher than those of other existing association tests. Additionally, under the simulated scenarios, both proposed mean-based tests QXcat and QZmax have higher power than the existing mean-based tests under XCI-E and in some cases of XCI. We also apply all the considered methods to the MCTFR data, and further discussion of violations of the assumptions can be found in Appendix C.
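The cancellation described above for QZmax can be seen in a short numerical sketch. The weights and the values of the statistics below are hypothetical and serve only to contrast three situations: both assumptions satisfied, opposite risk alleles in the two sexes, and a female effect pattern that violates the generalized genetic model.

```python
import math

def female_stat(t_f1, t_f2):
    """T_f = (T_f1 + T_f2) / sqrt(2)."""
    return (t_f1 + t_f2) / math.sqrt(2.0)

def qz_max(t_f, t_m, weights):
    """Maximum over the weights of |sqrt(w) * T_f + sqrt(1 - w) * T_m|."""
    return max(abs(math.sqrt(w) * t_f + math.sqrt(1.0 - w) * t_m) for w in weights)

# ASSUMPTION: placeholder weights, not the lambda values of the paper.
WEIGHTS = (0.4, 0.6)

# Both assumptions satisfied: all component statistics are positive.
concordant = qz_max(female_stat(2.0, 2.0), 2.0, WEIGHTS)
# Opposite risk alleles: the male statistic has the opposite sign.
opposite_sex = qz_max(female_stat(2.0, 2.0), -2.0, WEIGHTS)
# Generalized genetic model violated: the two female statistics cancel.
non_monotone = qz_max(female_stat(2.0, -2.0), 2.0, WEIGHTS)

print(concordant > opposite_sex, concordant > non_monotone)
```

With component statistics of equal magnitude, either violation shrinks the final statistic relative to the concordant case, in line with the loss of power discussed above.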

In this article, we consider the departure from HWE by fixing the inbreeding coefficient ρ at 0.05. To further assess the validity of our proposed methods without the HWE assumption, we simulate the following population stratification model by referring to the simulation settings of Haldar and Ghosh (2012) and Xia et al. (2013). Suppose that the whole population consists of two subpopulations, each of which is in HWE. The sample of size N = 6000 is composed of N1 and N2 individuals from the first and second subpopulations, respectively. The ratio N1:N2 is set to 2:3 or 1:1, and the sex ratio in each subpopulation is fixed at 2:1, 1:1 or 1:2. Let qf1 and qm1 (qf2 and qm2) denote the frequencies of allele A for females and males in the first (second) subpopulation, respectively; (qf1, qm1, qf2, qm2) is set to either (0.1, 0.1, 0.9, 0.9) or (0.2, 0.2, 0.5, 0.5). The simulated type I error rates of the four proposed tests (QMVXcat, QMVZmax, QXcat and QZmax) under scenario (1) (i.e., no SNP effect) when ρ = 0 and the trait value follows a normal distribution are shown in Supplementary Table S12, while the empirical sizes of the two mean-based tests QXcat and QZmax under scenario (2) (i.e., SNP effect on variances only) when ρ = 0 and the trait value follows a normal distribution are presented in Supplementary Table S13. These two tables show that our proposed methods control the size well, which supports their validity under population stratification.
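The stratified sampling scheme can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the simulation code used for the Supplementary Tables: the sex ratio is read as females:males, the trait under scenario (1) is drawn from a standard normal distribution, and the function names are hypothetical. The helper `pooled_inbreeding` computes the apparent inbreeding coefficient of the pooled females, which shows how mixing two subpopulations that are each in HWE produces a departure from HWE in the whole sample.

```python
import random

def split(total, ratio):
    """Split `total` into two integer parts according to `ratio` = (a, b)."""
    a, b = ratio
    first = total * a // (a + b)
    return first, total - first

def simulate(n_total, pop_ratio, sex_ratio, freqs, seed=0):
    """Return tuples (subpopulation, sex, genotype, trait) under no SNP effect.
    `freqs` holds one (q_f, q_m) pair per subpopulation.
    ASSUMPTION: `sex_ratio` is females:males and the trait is N(0, 1)."""
    rng = random.Random(seed)
    sample = []
    sizes = split(n_total, pop_ratio)
    for pop, (n_pop, (q_f, q_m)) in enumerate(zip(sizes, freqs)):
        n_fem, n_mal = split(n_pop, sex_ratio)
        for _ in range(n_fem):
            # Females carry two alleles, sampled independently (HWE within subpopulation).
            g = int(rng.random() < q_f) + int(rng.random() < q_f)
            sample.append((pop, "F", g, rng.gauss(0.0, 1.0)))
        for _ in range(n_mal):
            # Males carry a single X-linked allele.
            g = int(rng.random() < q_m)
            sample.append((pop, "M", g, rng.gauss(0.0, 1.0)))
    return sample

def pooled_inbreeding(pop_weights, q_by_pop):
    """Apparent inbreeding coefficient of the pooled population:
    1 - (expected heterozygosity) / (2 * q_bar * (1 - q_bar))."""
    q_bar = sum(w * q for w, q in zip(pop_weights, q_by_pop))
    het = sum(w * 2.0 * q * (1.0 - q) for w, q in zip(pop_weights, q_by_pop))
    return 1.0 - het / (2.0 * q_bar * (1.0 - q_bar))

sample = simulate(6000, (2, 3), (2, 1), [(0.1, 0.1), (0.9, 0.9)])
n_females = sum(1 for s in sample if s[1] == "F")
print(len(sample), n_females, round(pooled_inbreeding([0.4, 0.6], [0.1, 0.9]), 4))
```

For the setting (qf1, qf2) = (0.1, 0.9) with N1:N2 = 2:3, the pooled female population shows a marked deficit of heterozygotes relative to HWE expectations, so this setting represents a strong departure from HWE compared with ρ = 0.05.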

Our proposed methods have several advantages. First, the proposed mean-variance-based tests have higher powers than the existing methods in the simulated scenarios where both the means and the variances of the trait value differ across genotypes. Second, our methods incorporate XCI information, which needs to be considered when conducting X chromosome association tests, in two different ways. Third, we use the information from both sexes, which improves the test power. Nonetheless, our methods have some limitations. When the two assumptions (i.e., the risk alleles in females and males are the same and the genetic effect of heterozygous females lies between those of the two homozygous females) are not satisfied in practice, the powers of the proposed association tests may decrease. In addition, these methods cannot test for the association between SNP sets and a trait. Moreover, they cannot incorporate the information of family structure, which results in a loss of power and needs to be addressed in future work. In summary, our proposed methods not only effectively account for XCI but are also powerful under XCI-E and in some cases of XCI.

Software

The R package QMVtest, implemented in R (version 4.1.2), is publicly available at https://github.com/yuxinyuanqt/QMVtest.