Introduction

Clinical trialists often use the Cox proportional hazards model to estimate hazard ratios between different treatment cohorts whilst controlling potential confounding by other co-variates from retrospective observational datasets [1]. Several approaches produce estimates of adjusted survival functions which consistently represent results of the Cox model [2,3,4]. For example, the direct adjusted survival function method relies on directly averaging predicted survival probabilities of subjects in pooled samples [2, 3, 5]. This approach overcomes limitations of the alternative average covariate method and better represents the underlying populations [4]. Estimates of direct adjusted survival functions can be interpreted as the survival probabilities of populations with similar prognostic co-variates [3]. Using a stratified Cox model also allows time-varying effects between treatment groups [6]. Confidence bands for the survival estimates can be constructed using a Monte Carlo method for pair-wise comparisons over given time periods [7, 8]. Similarly, p values can be derived for simultaneous hypothesis testing [8].

Lee et al. [9] and Ghali et al. [10] developed statistical programmes to compute direct adjusted survival estimates. Zhang et al. [6] developed a SAS macro, %adjsurv(), that produces direct adjusted survival estimates together with their standard errors and point-wise confidence limits. This macro was incorporated by the SAS Institute Inc. into SAS/STAT 12.1. More recently, Wang and Zhang improved the Zhang/SAS macro by adding the ability to produce confidence bands and simultaneous p values [8].

Despite these improvements the latest macro has limitations. First, it is inefficient for datasets with large sample sizes or highly stratified treatment groups, features common in multi-center observational studies like those of the Centre for International Blood and Marrow Transplant Research (CIBMTR) and European Bone Marrow Transplant Group (EBMT). Second, none of the programmes we discuss can handle left-truncated retrospective data. (Left truncation occurs when subjects from the underlying population never come under observation because their event time falls below a threshold, for example subjects who relapse before a transplant can be done.) Choosing a starting time before the enrollment time in retrospective observational datasets introduces left truncation. Left truncation also occurs when age is used as the time-scale instead of time-on-study [11, 12].

To address these issues we developed a new SAS macro, %adjsurvlt(), which is more computationally efficient and handles left-truncated and right-censored data. Additional features include the ability to produce confidence bands for survival differences and to output survival curve plots directly in user-defined formats. An R package using the same algorithm is currently being developed and will be uploaded to the Comprehensive R Archive Network (CRAN).

Computational methods

Detailed formulae for estimating direct adjusted survival functions and their variances are published [6]. Methods for constructing confidence bands and computing simultaneous p values are described by Wang and Zhang [8]. For left-truncated data, the risk set must be adjusted at each event time so that it includes only subjects who have already entered observation and have not yet had an event or been censored.
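For orientation, the estimators referred to above can be written in a commonly used form. This is a sketch in our own notation, not a reproduction of the formulae in [6] or [8], and the symbols are assumptions: L_i is the entry (left-truncation) time of subject i, taken as 0 when there is no truncation; X_i is the observed event or censoring time; Z_i is the co-variate vector; t_{kj} are the distinct event times in stratum k with d_{kj} events each; n is the size of the pooled sample.

```latex
\begin{align}
R_k(t) &= \{\, i \text{ in stratum } k : L_i < t \le X_i \,\}, \\
\hat\Lambda_{0k}(t) &= \sum_{j:\, t_{kj} \le t}
   \frac{d_{kj}}{\sum_{i \in R_k(t_{kj})} \exp(\hat\beta^{\top} Z_i)}, \\
\hat S_k(t) &= \frac{1}{n} \sum_{i=1}^{n}
   \exp\{-\hat\Lambda_{0k}(t)\, \exp(\hat\beta^{\top} Z_i)\}.
\end{align}
```

The first line is the left-truncation-adjusted risk set, the second is the Breslow-type baseline cumulative hazard for stratum k and the third averages the predicted survival over all subjects in the pooled sample, not only those in stratum k.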

Improvements in computational performance

%adjsurvlt() has several computational enhancements. First, instead of repeating large blocks of code within do-loops, it relies mostly on matrix operations, which greatly reduces computing time. Second, the co-variate matrix is constructed from only the distinct co-variate vectors of the pooled sample. Doing so reduces the dimensions of the co-variate matrix because the number of distinct co-variate vectors is often smaller than the number of subjects.
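The second enhancement can be illustrated with a minimal Python sketch. It is not the macro's SAS/IML code; the function names, coefficients and cumulative hazard value are hypothetical and chosen only to show that evaluating each distinct co-variate vector once, weighted by its frequency, gives the same direct adjusted estimate as looping over every subject.

```python
import math
from collections import Counter

def direct_adjusted_survival(cum_hazard, beta, covariates):
    """Average exp(-H0 * exp(beta'z)) over every subject (naive version)."""
    total = 0.0
    for z in covariates:
        lp = sum(b * x for b, x in zip(beta, z))
        total += math.exp(-cum_hazard * math.exp(lp))
    return total / len(covariates)

def direct_adjusted_survival_distinct(cum_hazard, beta, covariates):
    """Same average, but each distinct co-variate vector is evaluated once
    and weighted by the number of subjects who share it."""
    counts = Counter(tuple(z) for z in covariates)
    total = 0.0
    for z, m in counts.items():
        lp = sum(b * x for b, x in zip(beta, z))
        total += m * math.exp(-cum_hazard * math.exp(lp))
    return total / len(covariates)

# Hypothetical pooled sample: 6 subjects, 2 binary co-variates, 3 distinct vectors.
pooled = [(0, 1), (0, 1), (1, 0), (1, 0), (1, 0), (1, 1)]
beta = (0.5, -0.3)  # illustrative coefficients, not taken from the paper
H0 = 0.8            # illustrative baseline cumulative hazard at one time point

naive = direct_adjusted_survival(H0, beta, pooled)
fast = direct_adjusted_survival_distinct(H0, beta, pooled)
n_distinct = len(set(pooled))
```

With categorical co-variates, as in the example later in this article, the number of distinct vectors is bounded by the product of the category counts, so the saving grows with sample size.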

Comparison of computational performance

Using %adjsurvlt() we repeated Wang and Zhang’s simulation study [8]. Data sets with a sample size of 100 and two therapy cohorts were created in the original study using constant baseline hazard functions of three different parameter settings. Confidence bands for the differences between the two adjusted survival functions were constructed based on 1000 simulation processes with a 5% significance level. A thousand replications were performed for each setting. The rejection rate for each setting was calculated from the percentage of replications where the confidence bands failed to fully cover the zero reference line. Rejection rates achieved by %adjsurvlt() were consistent with those achieved using the Wang and Zhang macro.

Next, we compared performance of %adjsurvlt() and the Wang and Zhang macro. The original simulation study used a sample size of only 100 with two co-variates and a therapy-assignment variable with two strata. To better illustrate performance differences we increased sample sizes to 1000, 5000, 10,000 and 20,000 with 10 co-variates and 10, 25, 50 and 100 therapy strata.

Treatment assignment was based on a discrete uniform distribution over K values, where K was the total number of strata. We constructed 10 co-variates. One-half were continuous variables drawn from standard normal distributions independent of the therapy assignment and the other half were dichotomous categorical co-variates dependent on therapy assignment. Baseline hazard functions were constant with a hazard rate of 0.1. Right-censoring times were generated from exponential distributions with varying hazard rates to keep the censoring rate close to 30%. We constructed 95% confidence bands based on 1,000 simulations. Performance tests were conducted on a server equipped with two Intel Xeon E5-2670 v3 CPUs at 2.30 GHz with 24 cores, 48 logical processors and 64 GB of memory. SAS 9.4 (TS1M4) with SAS/IML 14.2 was used for the simulation.
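The data-generating scheme described above can be sketched as follows. This is a Python illustration under stated assumptions, not the SAS code used in the study: the regression coefficients, the way the dichotomous co-variates depend on the stratum and the per-subject calibration of the censoring hazard are all hypothetical, because the text does not specify them.

```python
import math
import random

def simulate_cohort(n, n_strata, target_censoring=0.30, base_hazard=0.1, seed=1):
    """Return rows of (stratum, continuous covs, binary covs, time, event)."""
    rng = random.Random(seed)
    # Illustrative log-hazard ratios (assumed; the paper does not list them).
    beta_cont = [0.2, -0.2, 0.1, -0.1, 0.3]
    beta_bin = [0.3, -0.3, 0.2, -0.2, 0.1]
    rows = []
    for _ in range(n):
        stratum = rng.randint(1, n_strata)                # discrete uniform on 1..K
        z_cont = [rng.gauss(0.0, 1.0) for _ in range(5)]  # independent of stratum
        # Assumed dependence: P(Z = 1) rises linearly with the stratum index.
        p_one = 0.3 + 0.4 * (stratum - 1) / max(n_strata - 1, 1)
        z_bin = [1 if rng.random() < p_one else 0 for _ in range(5)]
        lp = sum(b * z for b, z in zip(beta_cont, z_cont))
        lp += sum(b * z for b, z in zip(beta_bin, z_bin))
        hazard = base_hazard * math.exp(lp)               # constant baseline hazard
        event_time = rng.expovariate(hazard)
        # Assumed calibration: for independent exponentials
        # P(C < T) = c / (c + hazard), so this rate hits the target exactly.
        cens_rate = hazard * target_censoring / (1.0 - target_censoring)
        cens_time = rng.expovariate(cens_rate)
        time = min(event_time, cens_time)
        event = 1 if event_time <= cens_time else 0
        rows.append((stratum, z_cont, z_bin, time, event))
    return rows

data = simulate_cohort(n=5000, n_strata=10)
censoring_rate = 1 - sum(row[4] for row in data) / len(data)
```

Tying the censoring hazard to each subject's event hazard is a shortcut that keeps the sketch short. A study would more likely tune a small number of censoring hazards per setting until the observed censoring rate is close to 30%.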

Table 1 shows improved performance of %adjsurvlt() over the Wang and Zhang macro. As the sample sizes and the number of comparison strata increased, the percentage of CPU time saved by %adjsurvlt() increased. To complete the analysis of a simulated data set with a sample size of 10,000 and 50 strata, %adjsurvlt() used only 0.07% of the CPU time required by the Wang and Zhang macro, a saving of 15 h. With a sample size of 20,000 and 100 strata, %adjsurvlt() required only 0.06% of the CPU time, a saving of 120 h.

Table 1 Results of simulation study of CPU time between %adjsurvlt() macro and Wang and Zhang (W&Z) macro.

We also evaluated the performance of %adjsurvlt() for analyzing data with and without left-truncation. To simulate left-truncated data, initial sample sizes were increased by 50%. Left-truncation times were generated from exponential distributions with varying hazard rates to keep the truncation rate near 33.3%. Left-truncated observations were dropped from the final samples. As displayed in Table 2, performance of %adjsurvlt() was similar for data sets with the same sample sizes regardless of left-truncation. It took 22 s to process a data set of 10,000 subjects which was not left-truncated versus 31 s for a left-truncated dataset. The source code of %adjsurvlt() and the specifications of the macro are provided in the supplementary material.
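The sample-size arithmetic behind this design is that inflating the initial sample by 50% and then losing about one-third of subjects to truncation returns roughly the target size (1.5 × 2/3 = 1). The Python sketch below illustrates this under simplifying assumptions that are ours, not the study's: a single homogeneous event hazard of 0.1 with no co-variates, no right-censoring, and a truncation hazard derived from the identity P(T ≤ L) = λ/(λ + μ) for independent exponentials. The function and parameter names are hypothetical.

```python
import random

def simulate_left_truncated(n_target, event_hazard=0.1, truncation_rate=1/3, seed=7):
    """Generate 1.5 * n_target subjects and keep those observed after entry."""
    rng = random.Random(seed)
    n_initial = round(n_target * 1.5)  # inflate the initial sample by 50%
    # Solve event_hazard / (event_hazard + mu) = truncation_rate for mu.
    trunc_hazard = event_hazard * (1 - truncation_rate) / truncation_rate
    kept = []
    for _ in range(n_initial):
        event_time = rng.expovariate(event_hazard)
        entry_time = rng.expovariate(trunc_hazard)
        # A subject is observed only if the event occurs after study entry.
        if event_time > entry_time:
            kept.append((entry_time, event_time))
    return n_initial, kept

n_initial, kept = simulate_left_truncated(n_target=10000)
observed_truncation_rate = 1 - len(kept) / n_initial
```

Each retained subject carries an entry time, which is what an analysis needs in order to place that subject in the risk set only after entry.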

Table 2 Simulation study of CPU time of %adjsurvlt() to analyze data with or without left-truncation.

An example

A CIBMTR analysis compared outcomes of allotransplants in persons with myeloproliferative neoplasms (MPN) in blast phase with those in persons with de novo acute myeloid leukemia (AML) or with myelodysplastic syndromes (MDS) transformed to AML [13]. A significant interaction was found for survival between the diseases and disease state pretransplant.

We show use of %adjsurvlt() on a subset of 96 subjects with blast phase MPN and 2825 subjects with de novo AML in remission pretransplant, with death as the outcome of interest and end of follow-up treated as right-censoring. Subject- and disease-related co-variates associated with survival were age at transplant (40‒49 years, 50‒59 years, 60‒69 years or ≥ 70 years), recipient sex, Karnofsky performance score (90‒100, < 90, or missing), cytogenetics (favorable/intermediate, poor, or missing), donor-type (HLA-matched sibling, other HLA-matched relative, HLA-well-matched unrelated or HLA-partially/mismatched unrelated donor), conditioning regimen (total body irradiation-based myeloablative, chemotherapy-based myeloablative or reduced-intensity/non-myeloablative) and year of transplant (2001‒2005, 2006‒2010 or 2011‒2015). Compared with subjects with de novo AML, subjects with blast phase MPN were older (chi-square p < 0.01), more likely male (p = 0.04), more likely to have a low Karnofsky performance score (p = 0.02) and more likely to have been transplanted recently (p = 0.04). A Cox regression model adjusting for these prognostic co-variates showed a hazard ratio of 1.40 (95% Confidence Interval [CI], 1.11, 1.76; p < 0.01) for death in subjects with blast phase MPN compared with subjects with de novo AML.

A SAS data set inlib.final was prepared for the example. The data set contains these categorical variables: disgp for disease indication, agegp for age at transplant, sex for sex, kpsgp for Karnofsky score at transplant, cytoab for cytogenetics, donorgp for donor type, condint for conditioning regimen and yeartxgp for year of transplant. It also includes dead as the event indicator and intxsurv as the time from transplant to death or censoring in months.

We used this statement to invoke the macro:

%adjsurvlt(indata = inlib.final, event = dead, time = intxsurv, strata = disgp, covlst = agegp sex kpsgp cytoab donorgp condint yeartxgp, seed = 86311, nsim = 10000, timelist = 12 36 60, maxtime = 60, outsurvplot = 2, outdiffplot = 1, showci = 0, tickvalues = 0 12 24 36 48 60, width = 800px, height = 450px, imagefmt = pdf);

By specifying strata = disgp and covlst = agegp sex kpsgp cytoab donorgp condint yeartxgp, we estimated survival differences between diseases whilst adjusting for prognostic co-variates. By setting timelist = 12 36 60 and maxtime = 60, we restricted the tabulated output to 12, 36 and 60 months and set the upper boundary of the confidence bands to 60 months. Because we did not specify a value for mintime, the macro used its default, the minimum observed event time, as the lower boundary of the confidence bands. We also specified outsurvplot = 2 and outdiffplot = 1 to generate the adjusted survival plot and the survival difference plot. Using showci = 0 we suppressed the display of confidence limits in the plots. We manually set the values of the tick marks on the x axis using tickvalues = 0 12 24 36 48 60. Lastly, we used width = 800px, height = 450px and imagefmt = pdf to set the dimensions and file format of the plots.

Table 3 displays output results of %adjsurvlt(). Adjusted survival estimates for the blast phase MPN cohort at 1, 3 and 5 years were 52% (42, 61%), 34% (25, 44%) and 26% (18, 35%). Corresponding estimates for the de novo AML cohort were 62% (60, 64%), 45% (43, 47%) and 40% (38, 41%). Adjusted survival difference estimates were 10% (0, 19%), 11% (1.4, 20%) and 13% (4, 23%) at 1, 3 and 5 years, favoring the de novo AML cohort. Adjusted survival plots and the difference plot were saved as PDF files (Figs. 1 and 2). There was a time-varying effect between the two cohorts before and after the first 6 months (proportionality test p = 0.10). Consequently, a simultaneous p value derived from the direct adjusted survival estimates would be more appropriate than a p value from a Cox regression. As indicated in Table 3, the difference in adjusted survival between the cohorts was significant with a simultaneous p value of 0.03.

Fig. 1

Direct adjusted survival curves for subjects with blast phase MPN (blue solid line with 95% confidence band) and those with de novo AML in remission pretransplant (red solid line with 95% confidence band).

Fig. 2

Differences in direct adjusted survival curves between subjects with blast phase MPN and those with de novo AML in remission pretransplant (solid line with 95% confidence band).

Table 3 Output results of %adjsurvlt().

Discussion

We show that %adjsurvlt() is computationally efficient for estimating direct adjusted survival functions in datasets with large sample sizes and highly stratified therapy cohorts and that it can handle left-truncation in right-censored data. %adjsurvlt() provides estimates of standard errors, confidence limits and confidence bands for adjusted survival functions and for their pair-wise differences, together with p values for simultaneous hypothesis testing of survival differences over specified time intervals.

%adjsurvlt() has some limitations. First, although it can conduct pair-wise comparisons between therapy cohorts, further development is needed to support comparisons across three or more therapy cohorts. Second, the underlying Cox model only applies when no competing risk precludes the outcome(s) of interest. When there are competing risks we need to use alternative multiple regression models such as the Fine-Gray model and its corresponding direct adjusted cumulative incidence [14, 15]. However, compared with other statistical programmes %adjsurvlt() provides improved computational performance and more flexibility in handling right-censored time-to-event data with or without left-truncation. We hope readers will find it useful.