Table 1 Parameters for sampling images and modeling outcome data.

| Symbol | Variable | Variable model |
| --- | --- | --- |
| $${u}_{1}$$ | Aggressiveness | $$N(0,0.7071)$$ |
| $${u}_{2}$$ | Fitness | $$N(0,0.7071)$$ |
| $$z$$ | Heterogeneity | $$N(0,1)$$ |
| $$x$$ | Size | $$N({u}_{1}-{u}_{2};0.05)$$ |
| $$t$$ | Treatment | $$Bern(invlogit(N({u}_{2}-0.5,0.25)))$$ |
| $$y$$ | Survival | $$N(t-z-2{u}_{1}-0.5;0.05)$$ |

1. For each observation $$i$$, the image whose size and heterogeneity are closest to $${x}_{i}$$ and $${z}_{i}$$ is drawn from the total pool of images. This ensures the required association between the factors of variation in the image and the simulated outcome data. The parametric equations follow the DAG presented in Fig. 1: $${u}_{1}$$, $${u}_{2}$$ and $$z$$ are continuous, independent noise variables. The collider $$x$$ is the difference between $${u}_{1}$$ and $${u}_{2}$$ plus a small amount of Gaussian noise (standard deviation $$0.05$$). $${u}_{1}$$ and $${u}_{2}$$ have a standard deviation of $$0.7071\approx \sqrt{2}/2$$, so that $$x$$ has a standard deviation of $$\approx 1$$. Treatment $$t$$ is modeled as a Bernoulli variable with a logistic link function, where a higher $${u}_{2}$$ increases the probability of being treated. $$0.5$$ is subtracted from the log-odds to ensure that ~$$50 \%$$ of patients are treated. Gaussian noise with standard deviation $$0.25$$ is added to the log-odds of being treated, so that every patient has some probability of receiving the more intense treatment. This better reflects clinical practice, as some patients may have strong preferences regarding their treatment, regardless of their underlying health status. Overall survival ($$y$$) increases with treatment (the true treatment effect is $$1$$) and decreases with heterogeneity in radiodensity and with tumor aggressiveness. Again, Gaussian noise with standard deviation $$0.05$$ is added to introduce some uncertainty in the data.
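
The parametric equations in Table 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes that the second argument of every $$N(\cdot ,\cdot )$$ is a standard deviation, as the note describes. The function names `simulate` and `nearest_image`, the representation of the image pool as `(size, heterogeneity)` pairs, and the use of Euclidean distance for "closest" are all hypothetical choices made for the example.

```python
import math
import random
import statistics


def invlogit(a: float) -> float:
    """Inverse logit (logistic) function."""
    return 1.0 / (1.0 + math.exp(-a))


def simulate(n: int, seed: int = 0) -> list[dict]:
    """Draw n observations following the parametric equations in Table 1.

    Assumption: the second argument of N(., .) is a standard deviation.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u1 = rng.gauss(0.0, 0.7071)   # aggressiveness
        u2 = rng.gauss(0.0, 0.7071)   # fitness
        z = rng.gauss(0.0, 1.0)       # heterogeneity
        x = rng.gauss(u1 - u2, 0.05)  # size (collider of u1 and u2)
        # Noisy log-odds of treatment, mapped to a probability.
        p_treat = invlogit(rng.gauss(u2 - 0.5, 0.25))
        t = 1 if rng.random() < p_treat else 0
        y = rng.gauss(t - z - 2.0 * u1 - 0.5, 0.05)  # survival
        rows.append({"u1": u1, "u2": u2, "z": z, "x": x, "t": t, "y": y})
    return rows


def nearest_image(x_i: float, z_i: float, pool: list[tuple[float, float]]) -> int:
    """Return the index of the pool image closest to (x_i, z_i).

    Assumption: `pool` holds one (size, heterogeneity) pair per image and
    "closest" means smallest squared Euclidean distance.
    """
    return min(
        range(len(pool)),
        key=lambda j: (pool[j][0] - x_i) ** 2 + (pool[j][1] - z_i) ** 2,
    )


if __name__ == "__main__":
    data = simulate(20_000, seed=0)
    print("sd(x):", statistics.pstdev([r["x"] for r in data]))
    print("fraction treated:", statistics.fmean([r["t"] for r in data]))
```

In this sketch the empirical standard deviation of `x` should come out close to 1, since the variance of $$x$$ is $$0.5+0.5+{0.05}^{2}$$ under the stated parameters.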