Identifying causal relationships of cancer treatment and long-term health effects among 5-year survivors of childhood cancer in Southern Sweden

Background: Survivors of childhood cancer can develop adverse health events later in life. Infrequent occurrences and scarcity of structured information result in analytical and statistical challenges. Alternative statistical approaches are required to investigate the basis of late effects in smaller data sets.
Methods: Here we describe sex-specific health care use, mortality and causal associations between primary diagnosis, treatment and outcomes in a small cohort (n = 2315) of 5-year survivors of childhood cancer (n = 2129) in southern Sweden and a control group (n = 11,882; age-, sex- and region-matched from the general population). We developed a constraint-based method for causal inference based on Bayesian estimation of distributions, and used it to investigate health care use and causal associations between diagnoses, treatments and outcomes. Mortality was analyzed by the Kaplan–Meier method.
Results: Our results confirm a significantly higher health care usage and premature mortality among childhood cancer survivors as compared to controls. The developed method for causal inference identifies 98 significant associations (p < 0.0001) where most are well known (n = 73; 74.5%). Hitherto undescribed associations are identified (n = 5; 5.1%). These were between use of alkylating agents and eye conditions, topoisomerase inhibitors and viral infections; pituitary surgery and intestinal infections; and cervical cancer and endometritis. We discuss study-related biases (n = 20; 20.4%) and limitations.
Conclusions: The findings contribute to a broader understanding of the consequences of cancer treatment. The study shows relevance for small data sets and causal inference, and presents the method as a complement to traditional statistical approaches.


Data preparation details
The schematic outline of the study is shown in Figure 1. Demographic data and detailed treatment data were extracted for all 5-year survivors (n = 2400; living and deceased, above and below 18 years of age) in the regional quality registry BORISS 1. Inclusion in this registry was based on registration in the national cancer registry. For each childhood cancer survivor (CCS), five control subjects from the general population were selected (or fewer if five matches could not be found), matched on sex, year of birth and place of residence, and included in the study (n = 11,882). Each control individual occurs only once in the cohort and was matched to a single CCS. For both survivors and controls, the respective outcomes data were extracted and aligned by the Statistical Services at the Swedish National Health and Welfare Board. The datasets were pseudonymized and then returned for analysis. In this step, 58 CCS were excluded due to lack of cancer diagnosis codes (as confirmed by pathology reports), 24 CCS due to lack of valid personal ID numbers, and 3 CCS due to other missing essential registry data, resulting in 2315 remaining CCS cases. They were diagnosed with a childhood cancer between 1970 and 2012. The patients diagnosed with childhood cancer in 2013-2015 (n = 186) were not yet 5-year survivors at the time of data extraction. The detailed treatment data from these patients contributed to the analyses, but their outcomes did not.
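The matching rule described above (up to five controls per survivor, each control used only once) can be illustrated with a minimal sketch. This is not the procedure used by the registry holders; the field names (`id`, `sex`, `birth_year`, `residence`) and the function `match_controls` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def match_controls(survivors, population, k=5):
    """Assign up to k unused controls to each survivor.

    Matching is exact on sex, birth year and place of residence.
    All field names are hypothetical placeholders.
    """
    pool = defaultdict(list)
    for person in population:
        key = (person["sex"], person["birth_year"], person["residence"])
        pool[key].append(person["id"])

    matches = {}
    for s in survivors:
        key = (s["sex"], s["birth_year"], s["residence"])
        candidates = pool[key]
        # Take at most k controls and remove them from the pool, so that
        # no control can be matched to more than one survivor.
        matches[s["id"]] = candidates[:k]
        pool[key] = candidates[k:]
    return matches

# Toy data: two survivors share one stratum that holds only seven controls.
survivors = [
    {"id": "S1", "sex": "F", "birth_year": 1980, "residence": "Lund"},
    {"id": "S2", "sex": "F", "birth_year": 1980, "residence": "Lund"},
]
population = [
    {"id": f"C{i}", "sex": "F", "birth_year": 1980, "residence": "Lund"}
    for i in range(7)
]
matched = match_controls(survivors, population)
```

In this toy example the first survivor receives five controls and the second only the two that remain, which mirrors the "or fewer" clause in the text.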
For this study, the outcomes data included in-patient care, out-patient care, and causes of death. All ICD-7, -8 and -9 codes among the outcome codes were converted to ICD-10 by two of the authors and cross- Trisomy-21, trisomy21). We excluded any outcome code that was identical to the first CCD if it occurred later than the 5-year-since-diagnosis date, based on the assumption that it was a routine repetition.
To compare CCS health care usage with that of the control population, we introduced a mock time of diagnosis as the starting point for the comparisons. The average age at diagnosis among CCS was 9.4 years, and the same age was used as the starting point for the control population. The observation time ran from the date of diagnosis (or 9.4 years of age for the control group) to (a) the first registered event, (b) the date of death, (c) possible emigration of the patient, or (d) the end date of the registers. In the last two cases the observation was censored one-sidedly. For outpatient care, starting in the year 1997, two-sided censoring was applied.
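As a rough illustration of how the observation time might be derived, the sketch below takes the earliest of the four endpoints and flags emigration and register end as censored. That the earliest endpoint terminates follow-up is an assumption made here, as is the conversion of 9.4 years to days; the function names `mock_start` and `observation_time` are hypothetical. Two-sided censoring for outpatient care is not covered.

```python
from datetime import date, timedelta

MEAN_AGE_AT_DIAGNOSIS = 9.4  # years, as reported for the CCS group

def mock_start(birth_date):
    """Mock time of diagnosis for a control: birth date plus 9.4 years
    (approximated here with 365.25 days per year, an assumption)."""
    return birth_date + timedelta(days=round(MEAN_AGE_AT_DIAGNOSIS * 365.25))

def observation_time(start, event=None, death=None,
                     emigration=None, register_end=None):
    """Return (days observed, endpoint name, censored flag).

    Follow-up is assumed to end at the earliest endpoint on or after
    the start date; emigration and register end count as censoring.
    """
    endpoints = {"event": event, "death": death,
                 "emigration": emigration, "register_end": register_end}
    known = {name: d for name, d in endpoints.items()
             if d is not None and d >= start}
    if not known:
        raise ValueError("no endpoint on or after the start date")
    name = min(known, key=known.get)
    censored = name in ("emigration", "register_end")
    return (known[name] - start).days, name, censored

# A survivor with an event three years after diagnosis.
with_event = observation_time(date(2000, 1, 1), event=date(2003, 1, 1),
                              register_end=date(2015, 12, 31))
# A control who emigrates before any event is censored at emigration.
emigrated = observation_time(date(2000, 1, 1), emigration=date(2001, 1, 1),
                             register_end=date(2015, 12, 31))
```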

Choice of significance level
In the current study, we did not test a single hypothesis, but searched broadly for correlations. This requires careful consideration of significance levels. If a significance level of 5% were selected, 5% of all performed tests for which no true correlation exists would nonetheless indicate a significant correlation. If all performed hypothesis tests were independent, we could have divided the final desired significance level by the number of tests. However, treatments and childhood cancer diagnoses are all correlated, as are the different outcomes, which warrants the approach of causal inference on the whole graph. The alternative is to use a Bayesian approach and select a significance level that makes the probability that a found correlation is spurious acceptably small. A threshold for this probability can be obtained by considering the number of correlations found at different significance levels.
In this study we found 274 potential causal relations at the 0.01 significance level. Of those, 146 had a significance level in the interval 0.001–0.01, 32 in the interval 0.0001–0.001, and 98 in the interval 0–0.0001. If, as a worst case, we assume that all 146 relations found in the interval 0.001–0.01 are in fact spurious, then, because the p-values of spurious relations are uniformly distributed, we would expect a tenth as many spurious cases in the ten times narrower interval 0.0001–0.001, that is, some 15 of the 32 relations found there, and a further nine times fewer in the interval 0–0.0001, that is, around 1.6 of the 98 relations. This is less than 2%, which was considered small enough. We therefore continued the analysis with the 98 relations that had a significance level better than 0.0001 (keeping in mind that around 2 of them may be spurious).
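The worst-case arithmetic above can be reproduced with a short sketch. It assumes that spurious relations have uniformly distributed p-values, so their expected count scales with the width of the p-value interval; the function name `expected_spurious` is hypothetical.

```python
def expected_spurious(worst_case_count, reference_interval, target_interval):
    """Scale a worst-case count of spurious relations from a reference
    p-value interval to a target interval, in proportion to interval width
    (assumes uniformly distributed p-values for spurious relations)."""
    ref_width = reference_interval[1] - reference_interval[0]
    target_width = target_interval[1] - target_interval[0]
    return worst_case_count * target_width / ref_width

# Worst case: all 146 relations in 0.001-0.01 are spurious.
mid_band = expected_spurious(146, (0.001, 0.01), (0.0001, 0.001))  # about 14.6
low_band = expected_spurious(146, (0.001, 0.01), (0.0, 0.0001))    # about 1.6

# Share of the 98 retained relations that may be spurious.
spurious_share = low_band / 98
print(f"{mid_band:.1f} {low_band:.1f} {spurious_share:.1%}")
```

The resulting share stays below the 2% bound quoted in the text.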