Testing Association between Mixed Type Outcomes and Covariates Jointly by the Use of a Latent Variable

Multiple outcomes are often collected simultaneously in biomedical fields in order to identify whether a continuous response and an ordinal response are associated with some covariates simultaneously. Here we propose a joint statistical model by the use of a latent variable underlying the ordinal response. Asymptotic results are obtained and a jointly test is proposed for testing the continuous response and the ordinal response are associated with some covariates simultaneously. Extensive simulations and real data analysis results indicate more efficient performances of the proposed method than that of the combined p-values method.

Multiple outcomes are often encountered in a variety of fields including social sciences, economics, and biomedical, in order to characterize the effect of a covariate or investigate the association between multiple outcomes and the interested variables. In many cases, these outcomes are of mixed types in the sense that some are continuous, and others may be ordinal. For example, in a mental health study 1 in Florida, USA, the collected outcomes consist of mental impairment, life events, and socioeconomic status(SES) which are of different data types. The mental impairment is ordinal, with 4 categories(1 = well, 2 = mild symptom, 3 = moderate symptom formation, 4 = impaired). For the life events index, it is a composite measure which includes the number and severity of important life events occurred to the subject within the past 3 years, such as the birth of a child, a new job, a divorce, or a death in the family. The life events index can be taken as a continuous response. SES is measured as binary which can be taken as a covariate. Investigators want to judge whether mental impairment or life events index is associated with SES.
Various approaches have been developed to model mixed outcomes in the literature. A direct procedure is to ignore the correlations among multiple outcomes and fit each outcome with a model that best suits its type separately. Dale 2 proposed global cross-ratio models as a measure of association for bivariate, discrete and ordinal responses. For testing association between a bivariate trait with a continuous and discrete (taking values 0, 1) outcomes and a covariate jointly, a widely used class of approaches are latent variable models. Catalano and Ryan 3 proposed a bivariate latent variable models for clustered discrete and continuous outcomes with the joint distirbution being a product of a standard random effects model for the continuous variable and a probit model for the discrete variable. Fitzmaurice and Laird 4 proposed a model for a correlated binary outcome and a continuous outcome based on the factorization of the joint distribution of the outcomes. Sammel et al. 5 presented a latent variable model by assuming that the observed outcomes are physical manifestations of a latent variable. Gueorguieva et al. 6 proposed a correlated probit model to model clustered binary and continuous responses jointly. Teixeira-Pinto et al. 7 provided a new joint model for a binary and a continuous outcomes. In genome-wide association analysis field, Liu et al. 8 developed an extended generalized estimating equation method for bivariate association analyses of continuous and binary traits. Their simulation results demonstrated that, compared with univariate analysis, bivariate analysis could substantially improve power while having comparable type I error rates under certain situations. Yuan et al. 9 extended the joint linkage analysis of multivariate qualitative and quantitative traits described by Williams et al. 10,11 to association analysis. Yuan et al. 9 also assumed two latent variables specified for the qualitative and quantitative traits followed a bivariate normal distribution. With such modeling, likelihood-based inference procedures are introduced to test for pleiotropic genetic effects.
In dealing with an single ordinal(taking values 1, 2, …) outcome, the proportional odds model 12 is usually adopted. In particular, the proportional odds model with the outcome belonging to a set of ordered categories can be regarded as an extension of the logistic regression model for binary outcomes. It can be expressed as a series of logistic regression models for dependent binary variables with common regression parameters reflecting the proportional odds assumption 13 . When the outcome is continuous, the ordinary linear model is the most often used. In order to test the association between a bivariate trait with a continuous and ordinal outcomes and some covariates jointly, we also applied a latent variable model and proposed a statistical approach to model the ordinal outcome and continuous outcome simultaneously. Let us begin with two ordinary linear models , G (p) ) τ represents a p-dimensional covariate, α 0 and β 0 are the intercept parameters, β = (β 1 , , β p ) τ and α = (α 1 , , α p ) τ are parameters, and (ε 1 ,ε 2 ) τ follow the bivariate normal distribution with mean vector (0, 0) τ and symmetric covariance matrix consisting of diagonal elements being σ 1 2 , and σ 2 2 , and off-diagonal elements both being ρσ 1 σ 2 , respectively. We note that standard errors σ 1 , σ 2 and correlation coefficient ρ are unknown. In addition, suppose that Z is a latent variable underlied an observed ordinal response Y 2 which can be observed in the following manners: where k is an integer related to the number of categories for Y 2 and γ 1 ,γ 2 , , γ k are k ordered cutpoint values. Note that the latent variable Z is unobserved.
Based on the foregoing assumption, (Y 1i , Z i ) τ are independently distributed as a bivariate normal distribution with mean vector α α β β 2 . By some algebra, for j = 1, , k, and i = 1, …, n, it can be obtained that , for j = 1, , k, and i = 1, …, n. In this paper, we propose a joint test for testing the association between a bivariate with an ordinal response and a continuous response and the covariates of interest. We derive the asymptotic properties for the estimators for parameters of interested covariates in the joint model. Extensive simulations are conducted to compare the performances of our proposed method to those of the existing combined p-values method. Application to the aforementioned mental impairment study in Florida further demonstrate good performances of our new method.

Joint Model for a Bivariate with a continuous response and an ordinal responses and a Covariate.
represents error term and has expectation value 0, so β * in (2) can be taken as a measure for the association between the ordinal response Y 2 and covariates G after adjusting the effect of continuous response Y 1 . But is unobserved for = .  i n 1, , Alternatively, a more common procedure instead of (2) is to use 0 , α * = αθ * , and α α θ = ⁎ ⁎ 0 0 . It is obvious that the model (3) links the ordinal response Y 2 with G and Y 2 with a similar manner as the proportional odds model. The main difference is that (3) uses the standard normal distribution Φ as the link function, while logit function is utilized by the proportional odds model.
With α = α * /θ * , the joint model of continuous variable Y 1 and ordinal variable Y 2 can be constructed as follows:

Testing Association between a Bivariate and a Covariate. The maximum likelihood estimates (MLEs) for
, and σ 1 2 can be obtained by maximizing the log-likelihood function which is implemented by solving score equations in the method section. Denote and denote the corresponding MLE by Following the statistical asymptotical theory 14 , under the null hypothesis where neither the continuous response Y 1 nor the ordinal response Y 2 is associated with covariates G, we have − τ τ n a a ( ) asymptotically follows from the normal distribution with mean vector 0 (2p+k+4)×1 and covariance I −1 (a), where I(a) is the Fisher information matrix(see method section). As mentioned earlier, β * in (2) measure the association between the ordinal response Y 2 and G by adjusting the effect of continuous response Y 1 . Denote β β α = + ⁎ ⁎ ⁎ ⁎ and let V be a submatrix corresponding to the 3rd to the (2p + 2)th rows and columns of I −1 (a), and B = (I p , I p ), where I p is a p × p identity matrix. Then we can conclude that β  ⁎ is an asymptotic unbiased estimate for the parameter β * and it follows asymptotically from the normal distribution with mean vector β ** + α * and covariance matrix BVB τ as the sample size n goes to infinity according to the multi-normal distribution theory 15 .
Our goal of this paper is to test whether the bivariate with the outcomes of mixed types (Y 1 and Y 2 ) is associated with covariates G of interest. The null hypothesis is neither the continuous response Y 1 nor the ordinal response Y 2 is associated with the variable G.
. The detailed derivation of W is displayed in method section. With this, we propose a joint test denoted by 1 which follows from a Chi-Squared distribution with degree freedom of 2p under the null hypothesis, where = τ W HVH is a consistent estimate of the covariance matrix W, and  V is a consistent estimate of the covariance matrix V corresponding to the 3rd to the 2p + 2th rows and columns of −Î a ( ) 1 .
Simulation Results. In this section, we explore the performances of JT by comparing it to a method of combined p-values 16 (denoted by CP). In our case, CP can be implemented as: firstly, we apply the ordinary linear model to regressing Y 1 on G and calculate the p-value denoted by pv 1 for testing the association between Y 1 and G; secondly, we apply the proportional odd model to regressing the ordinal response Y 2 on G and denote the resulted p-value by pv 2 ; lastly, we use the statistic CP = −2log(pv 1 ) − 2log(pv 2 ) as the final test. The p-value of CP can be calculated by the permutation method with 200 iterations. Assume that p = 1, and n = 500. We generate a 1-dimensional covariate G from standard normal distribution with sample size n and fix it as a covariate in our simulation iterations. The correlated error terms can be sampled from a bivariate normal distribution with mean vector (0, 0) τ and covariance matrix in which its diagonal elements are both equal to 1 and the non-diagonal element is equal to ρ. We generate continuous variable Y 1 and the latent continuous variable Z based on the former two linear models in introduction section with parameter α 0 , β 0 , α 1 and β 1 . We consider 2 scenarios with 3 ordinal category values(k = 2) and 4 ordinal category values(k = 3). Tables 1 and 2 show the empirical type I error rates for k = 2 and k = 3 respectively. Tables 3 and 4 show the empirical powers for k = 2 and k = 3 respectively. For the first scenario with k = 2, the final ordinal outcome Y 2 can be obtained by dichotomizing Z into 1, 2, 3 with the thresholds of γ 1 = 0 and γ 2 = 1. The nominal significant level is set to be 0.05 to calculate the empirical type I error rate and power. All results are calculated based on 500 replicates. From Table 1, we can see that both JT and CP have correct type I error rate since the values are always close to the nominal significance level of 0.05. For example, when α 0 = 0.2, β 0 = 0.2, and ρ = 0.3, the empirical type I error rates of JT and CP are 0.047 and 0.052, respectively. For example, when α 0 = 1.2, β 0 = 1.2, and ρ = 0.9, the empirical type I error rates of JT and CP are 0.048 and 0.053, respectively. From Table 3, it can be seen that the performance of CP is unsatisfactory when the correlation coefficient ρ is large. However, our proposed test JT always has more desirable powers than CP when parameter α 1 is much smaller than β 1 and the correlation between Y 1 and Y 2 is not weak. For the second scenario with k = 3, the final ordinal outcome Y 2 can be obtained by dichotomizing Z into 1, 2, 3, 4 with the thresholds of γ 1 = −0.6, γ 2 = 0 and γ 3 = 0.6. The nominal significant level is set to be 0.05 to calculate the empirical type I error rate and power. All results are calculated based on 500 replicates. From Table 2, we can see that both JT and CP have correct type I error rate since the values are always close to the nominal significance level of 0.05. For example, when α 0 = 0.2, β 0 = 0.2, and ρ = 0.4, the empirical type I error rates of JT and CP are 0.05 and 0.046, respectively. For example, when α 0 = 1.2, β 0 = 1.2, and ρ = 0.8, the empirical type I error rates of JT and CP are 0.046 and 0.049, respectively. From Table 3, it can be seen that the performance of CP is unsatisfactory when the correlation coefficient ρ is large. However, our proposed test JT always has more desirable powers than CP when parameter α 1 is much smaller than β 1 and the correlation between Y 1 and Y 2 is not weak.  Table 1. Empirical type-1 error rates for tests JT and CP with 3 ordinal categories(k = 2).

Real Data Analysis.
To further explore the performance of JT and CP on the testing for the association between multiple outcomes of mixed types and interested covariates, we apply them to the mental health study 1 Table 3. Empirical powers for tests JT and CP with 3 ordinal categories(k = 2).

Discussion
In this paper, we propose a joint model for modeling the association between a bivariate with a continuous outcome and an ordinal outcome by using a latent variable. We conduct some statistical inferences on the parameters in the proposed joint model. Furthermore, a test method is proposed to test whether the the continuous response or ordinal response is associated with covariates. Extensive simulations are conducted to assess the performances of the proposed test procedure. From the simulation results, the proposed method always outperforms the combined p-value method when the correlation of the continuous response and the ordinal response is not weak. Application to a real data analysis further demonstrates the superiority of the new method. When Y 1 and Y 2 act as traits related to genetic, and G acts as genotypes, our proposed joint model and test can be applied in modern genome wide association study analysis as Hu et al. 17 . In addition, when the dimension number p and categories related number k become large, it is hard for JT to solve a large number of equations. A feasible approach for ordinal data is to build the test based on ranks 18,19 . Our analysis is based on an assumption that the observations are from a bivariate normal distribution. When this assumption is not satisfied, a robust method is more appealing. So it deserves further study.