The comparison of censored quantile regression methods in prognosis factors of breast cancer survival

The Cox proportional hazards model is a widely used statistical method for the censored data that model the hazard rate rather than survival time. To overcome complexity of interpreting hazard ratio, quantile regression was introduced for censored data with more straightforward interpretation. Different methods for analyzing censored data using quantile regression model, have been introduced. The quantile regression approach models the quantile function of failure time and investigates the covariate effects in different quantiles. In this model, the covariate effects can be changed for patients with different risk and is a flexible model for controlling the heterogeneity of covariate effects. We illustrated and compared five methods in quantile regression for right censored data included Portnoy, Wang and Wang, Bottai and Zhang, Yang and De Backer methods. The comparison was made through the use of these methods in modeling the survival time of breast cancer. According to the results of quantile regression models, tumor grade and stage of the disease were identified as significant factors affecting 20th percentile of survival time. In Bottai and Zhang method, 20th percentile of survival time for a case with higher unit of stage decreased about 14 months and 20th percentile of survival time for a case with higher grade decreased about 13 months. The quantile regression models acted the same to determine prognostic factors of breast cancer survival in most of the time. The estimated coefficients of five methods were close to each other for quantiles lower than 0.1 and they were different from quantiles upper than 0.1.

In many medical studies, the outcome of interest is the time to event. For instance, in cancer, the event of interest is death or the relapse of illness, and in transplantation, the rejection of transplanted organ can be considered as the event. In case of the uncertainty of time for study inclusion and the incidence of an event in some studied units, time is regarded as the censored data 1 . The Cox proportional hazards (Cox) model is a widely used statistical method for the censored data. However, this model is limited by the assumption of a constant hazard ratio (HR) over time (i.e., proportionality), and models the hazard rate rather than the survival time directly 2 . Also, the complexity of the HR estimate interpretation was recognized as a problem in the Cox models. To overcome this limitations, other methods such as accelerated failure time (AFT) models and censored quantile regression (CQR) models were introduced for censored data with a more straightforward interpretation 3 . The AFT model is a model that assumes that a treatment or an exposure either extends or reduces the time to the development of event. Although the AFT model allows a direct interpretation of covariate effects on event time, it requires a strong assumption of homogeneous treatment effect 4 . The AFT model can only capture location shifts effects, and hence may fail to capture heterogeneity of covariate effects. The CQR model does not require the assumption

Methods
Data description. The present study aimed to determine the factors affecting breast cancer survival using CQR models. The studied data included 800 females patients with breast cancer (based on breast cancer pathology diagnosis) referring to Imam Khomeini Hospital, Tehran, Iran during 1996-2005. The required data were extracted from patients' files. The latest condition of the patient was informed via phone contact.
This study was approved by the Ethics Committee of School of Public Health & Allied Medical Sciences-Tehran university of Medical Sciences (approval ID: IR.TUMS.SPH.REC.1397.212) and was carried out according to relevant guidelines and regulations. The informed consent was obtained from all participants.
The event of interest was death from breast cancer, and the survival time was defined as the duration (months) from diagnosis to death due to breast cancer. The prognostic factors included age at diagnosis (year), type of surgery (Modified radical mastectomy (MRM) and Breast conserving surgery (BCS), tumor grade (grade 1-3), and stage of disease (Stage 1-4) based on the seventh edition of the TNM classification. The p-values less than 0.05 were considered to be significant. The analysis was performed using STATAversion 12 and quantreg package in R.

Models.
Quantile regression is a statistical technique intended to inference about conditional quantile functions. This method offer a mechanism for estimating models for the conditional median function, and the full range of other conditional quantile functions.
Let T denote the failure time and X = (X 1 , X 2 , . . . , X p ) T , denote a p × 1 vector of covariates. let Q Y (θ|X ) = inf{t: Pr(Y ≤ t|X ) ≥ θ }, denote the θ th conditional quantiles of Y = log(T), (or another monotone transformation of T) given X , where θ ∈ (0, 1) . For randomly censored data, let C denote time to censoring and let T = min(T, C)andδ = I(T ≤ C) . The observed data consist of n i.i.d replicates of ( T , δ, X) , denoted by ( The linear QR model takes the form where 0 < θ L < θ U < 1, and β(θ) is a vector of unknown regression coefficients that represents the change in the θ th conditional quantile of Y given a one-unit change in the corresponding covariate 30 . When θ L = θ U , model (1) is referred to as a locally linear quantile regression model. When θ L < θ U , model (1) referred to as a globally linear quantile regression model. The most applying methods during the recent years are Portnoy 5 , Wang and Wang 7 , Bottai and Zhang 8 , Yang et al. 9 and De Backer et al. 10 methods. In the following, we present a brief overview of methodological framework for these models.
Portnoy method. Portnoy 5 , using Efron's 31 interpretation of Kaplan-Meier as shifting mass of censored observations to the right, proposed an estimation algorithm under the standard random right censoring assumption to estimate β(θ).
Bottai and Zhang method. Bottai and Zhang 8 , to estimator β(θ) considered a regression model where the error term is assumed to follow asymmetric Laplace distribution. They explored its use in the estimation of conditional quantiles of a continuous outcome variable given a set of covariates in the presence of random censoring.
They supposed that exists a fixed r-dimensional parameter vector β(θ) such that where ε i is an independent and identically distributed residual whose θ th quantile equals zero ( P(ε i ≤ 0|x i ) = θ).
Let T i conditionally on X i , follows a form of asymmetric Laplace distribution with probability density function In the presence of censored observations, the likelihood function is proportional to The maximum likelihood estimators for the parameters are defined as maximizes of l(β(θ), σ(θ)|T i ) . They used algorithm proposed by Nelder and Mead 33 , to estimate parameters and inference on the parameters obtained by bootstrapping the point estimates for quantile of interest 8 .
Wang and Wang method. A locally weighted method was proposed by Wang and Wang 7 to estimate a locally linear quantile regression model, which assumes that θ L = θ U , in model (1), i.e.
Wang and Wang 7 , for random censoring, by twisting the idea of the self-consistent Kaplan-Meier estimator 31 , proposed to modify the standard quantile loss function. The fundamental idea of Wang and Wang 7 is to redistribute the probability mass (Pr(T i > C i |C i , X i )) , of the censored cases to the right through a local weighting scheme 7 . An estimator of β(θ) can be obtained by minimizing the following objective function of β: where F 0 (t|x) ≡ Pr(T > t|X = x) is known and Wang and Wang 7 proposed to minimize the objective function (5), when F 0 (t|x) is unknown, with F 0 () replaced by the Beran's local Kaplan-Meier estimator 34 where N(t) = I Ỹ ≤ t, δ = 1 , and B nk (x) is a sequence of nonnegative weights adding up to 1, for example, Nadaraya Watson's type weight, where K () is a density kernel function and h n is a positive bandwidth converging to 0 as n → ∞ 7 .
De Backer method. De Backer et al. 10 proposed to estimate model (4) based on a minimum distance loss function, given by They further suggested using a smooth double kernel version of F(.|X i ) . Let Y u i denote the i-th order statistic of the uncensored responses, n u = n i=1 δ i , and let H * (t) = t −∞ K(u)du , for some kernel density K . They propose to estimate F(t|x) by F s (t|x) , where.
Yang method. Yang et al. 9 proposed a new and unified approach, to estimate the quantile regression model (1) with θ U = 1. They used a variation of the data augmentation algorithm base on the general principle of data augmentation 15 . The algorithm starts with a set of initial values, β (0) (θ k ), k = 1, . . . , M n , obtained by parallel quantile regression estimators or existing quantile regression estimators. Draw Y (u) i , for u = 1, . . . , U , from the quantile process approximated by X T i β (u−1) (θ k ), conditional on the set of possible values for Y i . Then, obtain updated estimates β (u) (θ k ), viastandarduncensoredquantileregression, based on a pairwise bootstrapping sam- 9 . The proposed method adapts easily to different forms of censoring including doubly censored and interval censored data 9 .
Ethics approval and consent to participate. This study was approved by the Ethics Committee Written informed consent for publication of their clinical details was obtained from the patient relative of the patient. A copy of the consent form is available for review by the Editor of this journal.   Table 3.
The CQR coefficients estimated and the 95% confidence intervals (CI) with Portnoy, Bottai and Zhang, Yang, Wang and Wang and De Backer methods and conditional quantile effects estimated by Cox model for θ ∈ (0.01, 0.10, …, 0.40) were displayed in Figs. 2, 3,4,5 and 6. Figure 7 shows the estimated coefficients of five methods. In the Cox model, the estimated quantile measure for each covariate was computed using Eq. (9) of Portnoy 5 . The effects of the Cox model were almost the same in different quantiles while they changed in quantile regression models as the quantiles vary.

Discussion
In the present study, CQR methods were compared in modeling the prognosis factors of breast cancer survival based on the breast cancer data collected from Imam Khomeini Hospital. Tehran, Iran. The analyses of accelerated failure-time models and Cox model showed that the stage of disease and grade of tumor were identified as the prognostic factors of breast cancer survival. According to the Cox model, the stage of disease and tumor grade increased the risk of death. In the AFT model, the median of survival time decreased as the stage of disease and tumor grade increased. The analyses of CQR models showed that type of surgery, stage and grade were the effective factors on the survival of patients with breast cancer. These results are consistent with most findings in the existing literature, although a direct comparison of the effect size is difficult due to the fact that the majority of works report hazard ratios or odds ratios. Among the significant factors in this study, the effect of surgical method on the survival of breast cancer patients is one of the most addressed issues in recent studies [35][36][37] . Our study showed that 20 th percentile survival time increased for women with BCS surgery, based on Portnoy methods. In the study of Hofvind, by controlling other factors, the hazard of death in MRM is 1.7 times higher than BCS 38 . Meanwhile, there has been no significant difference between two surgical methods in the study of Quan 39 . In most of quantiles stage and grade have significant effect on survival. In recent study, Saadatmand describe overall survival of female patients with breast cancer from two time cohorts (1999-2005 and 2006-2012) in a nationwide population based study 40 . Their results emphasize the importance of tumor stage at diagnosis of breast cancer, as it still greatly affects overall survival. Rottenberg examine differences in survival among older women diagnosed with breast cancer, according to age and disease stage at time of diagnosis. And showed that stage disease among older women became a less powerful predictor of mortality with rising age 41 .
The CQR models allow covariate effects to change in people with different risk. Thus, it is a flexible model for controlling heterogeneity due to covariates. In our studies, coefficients of prognostic factor were different in quantiles that showed different effect of prognostic factor survival time in each quantile. Base on Portnoy method  The difference in the interpretation of the parameters is the biggest one among these models. The Cox model examines the covariate effects on the hazard function. In addition, the Cox model shows the hazard of death per unit of increase in covariate when the event is death. In this study, the hazard of death was 3.12 times for a    www.nature.com/scientificreports/ higher stage of the disease, and the hazard of death was 1.71 times for a higher degree of tumor grade. Contrary to the proportional hazards model which describes how predictors influence the hazard function, the AFT model assumes a direct association between predictors and survival time, which makes interpretation easier. If the median survival time to event is considered and the accelerate factor is greater than one, the median survival time increases by the accelerate factor with increasing one unit covariate while the median survival time to event decreases if it is less than one. In the present study, based on AFT model, the median survival time decreased by a factor of 0.38 and 0.63 with increasing stage of disease and grade of tumor, respectively. The results of this model can be expressed as a proportional hazard, in which the interpretation is similar to the Cox model. The interpretation of coefficients in CQR model is considered as the changing rate quantile of dependent variable per one unit change in independent variable, like other linear models. If the event is death, it can be expressed as the covariate effects on the patients' lifetime.
Modeling the breast cancer data with CQR models indicated that most of the time all models acted the same to determine prognostic factors of breast cancer survival but sometime, significant factors and their coefficients were different. All models considered the stage of disease and grade of tumor as prognosis factors. With regard to the coefficients of covariates in different quantiles, the coefficients of Portnoy and Yang method were close to each other and the coefficients of Bottai and Zhang and De Backer methods were close to each other and they were different from Wang and Wang method. Peng and Huang compared their method with Portnoy's method and showed that both methods could represent very similar results 6 . Bottai and Zhang compared Laplace regression method with Peng-Huang and Portnoy methods by using simulation and indicated that the advantages of their method include giving the same results and accurate convergence, while two other methods sometimes failed to converge, and involve fast calculations 8 . Wang and Wang showed that the new approach adopts a preliminary local Kaplan-Meier estimator and results a weighted quantile regression. They established, utilizing results in modern empirical process theory, the consistency and asymptotic normality of the resulted estimator 7 . Base on simulation studies and the analysis of real data, the proposed method has shorter interval estimates than Portnoy's procedure 7 . De backer et al. in their study indicated in an extensive simulation study that the resulted quantile regression estimator respect to established check-based formulations have less variance results. From a theoretical prospect, both consistency and asymptotic normality of the proposed estimator for linear regression are obtained under classical regularity conditions 10 . Yang et al. indicated that the Yang's method presents an estimator is able to achieve significant efficiency gains in comparisons with Portnoy's estimator 9 .
The assumptions required in each of the models should be considered while using these models. The proportional hazards assumption is the most important assumption of the Cox model and what it means is that the ratio of the hazards for any two individuals is constant over time. In this model, no assumptions are made about form of the baseline hazard. However, the distribution of survival time is sometimes specific or assuming a parametric form is logical. In these cases, parametric methods are used. Common parametric distributions in survival models include Weibull, Generalized Gamma, Log-Normal, Log-Logistic. In Bottai and Zhang method, it is assumed that the error terms follow asymmetric Laplace distribution. However, Yu and Moyeed (2001) showed that the model performs well when the error terms follow other distributions 42 . Portnoy and Yang methods require just global linearity assumption 5,9 . Wang and Wang and De Backer methods have local linearity assumption 7,10 .
Computational time is another important issue in comparing these models. According to our data, the computational time of Portnoy, Bottai and Zhang and Yang methods is shorter than other methods.
It is necessary task to measure the goodness of survival models. Although for the model diagnostics of quantile regression with complete data some tools, such as the worm plot, have been proposed, for censored quantile regression is still greatly underdeveloped 43,44 . Designing effective model diagnostic tools for censored quantile regression warrants more in-depth research.
The high percentage of right censoring is regarded as one of the limitations of this study. By this way, modeling 50th percentile of survival time requires more follow-up time to increase the percentage of event. Thus, the comparison of models in higher quantiles was not possible.

Conclusions
For the CQR models, various approaches have been proposed that the most practical of them include: Portnoy 5 , Wang and Wang 7 , Bottai and Zhang 8 , Yang et al. 9 and De Backer et al. 10 methods. Portnoy 5 generalized the Kaplan-Meier method with a recursively weighted estimation algorithm under the global linearity assumption of the conditional quantile functions. To overcome the linearity assumption, Wang and Wang 7 developed a method by non-parametrically estimating the conditional survival distribution via kernel smoothing. In 2010, Laplace regression was introduced as a parametric method for modeling the conditional quantile of censored data by Bottai and Zhang. Yang et al. 9 employed a variation of the data augmentation algorithm base on the general principle of data augmentation 15 . De Backer et al. 10 investigate a new procedure that used "check" loss function. The CQR methods acted the same to determine prognostic factors of breast cancer survival in most of the time. The estimated coefficients of five methods were close to each other for quantiles lower than 0.1 and they were different from quantiles upper than 0.1.

Data availability
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.