Decision theory for precision therapy of breast cancer

Correctly estimating the hormone receptor status for estrogen (ER) and progesterone (PGR) is crucial for precision therapy of breast cancer. It is known that conventional diagnostics (immunohistochemistry, IHC) yields a significant rate of wrongly diagnosed receptor status. Here we demonstrate how Dempster Shafer decision Theory (DST) enhances diagnostic precision by adding information from gene expression. We downloaded data of 3753 breast cancer patients from Gene Expression Omnibus. Information from IHC and gene expression was fused according to DST, and the clinical criterion for receptor positivity was re-modelled along DST. Receptor status predicted according to DST was compared with conventional assessment via IHC and gene-expression, and deviations were flagged as questionable. The survival of questionable cases turned out significantly worse (Kaplan Meier p < 1%) than for patients with receptor status confirmed by DST, indicating a substantial enhancement of diagnostic precision via DST. This study is not only relevant for precision medicine but also paves the way for introducing decision theory into OMICS data science.

The main parts of this paper address readers familiar with DST and demonstrate how to combine several evidences for single hormone receptors and further combine several receptor estimates towards an overall hormone receptor status.
Readers not yet familiar with DST may first see the 'supplementary methods' and then resume to read the following chapters.

Materials and methods
Data curated and used. In order to establish a comprehensive database 68 , we decided to re-use published gene expression data 69 and screened the Gene Expression Omnibus (GEO) 70,71 for breast cancer studies using the Affymetrix chip U133A + 2.0 72 . We found and curated 38 studies with 3753 samples.
Out of numerous methods available for normalization 46,[73][74][75][76] as compared by Bolstad 52 and evaluated in our previous work 77 , we used RMA (MATLAB affyrma) 78 and standardized data for each sample. Further batch corrections have been scrutinized in our previous work 77 and were found ambiguous. We therefore refrained from performing them. We adopted receptor genes and selected co-genes (see Table 1) from our previous paper 57 . Whereas in multiple logistic regression all genes would be incorporated simultaneously in the predictor (multiple logistic regression), the ODDS method performs logistic regressions on single genes and then combines the odds. In other words, in this former work, probabilities from IHC, gene and co-gene were joined according to conventional statistics, i.e. by multiplying odds. Hence we refer to this method as 'ODDS' in the following. Like most binary classifiers 79 , ODDS builds a score out of input variables (i.e. gene expression). This score is either compared to a threshold and the corresponding class assigned (crisp classification, + or −). Or else, the sharp threshold is replaced by an interval, within which the score is considered 'uncertain' . We chose the latter possibility in the setup of ODDS and excluded samples classified 'uncertain' . This lets the remaining samples (+ , −) gain certainty in their assessment, which is appropriate for a data basis on which our new discrimination method (DST) is evaluated.
When performing ODDS like in our previous paper, results may contradict IHC-for a few samples. As opposed to this, in the current work we took IHC-estimates for granted and applied ODDS only to samples where no IHC estimate was available. Hence, values imputed by ODDS could never contradict IHC-estimates-just to be on the save side. Moreover, 'uncertain' results from ODDS were also discarded. All in all, to establish the database for the present work, we retained results either from IHC or safely imputed by ODDS and call this procedure 'sODDS' (safeOdds).
As first step of data cleansing we ruled out crosstalk by HER2: We dismissed HER2 + samples and accepted only those definitely assessed negative, either by IHC (1798 samples) or safely imputed via ODDS (1010 samples). For Table 1. Receptor genes, co-genes and parameters from logistic regression. Probe sets refer to the Affymetrix chip U133A + 2.0. For 'deviance of fit' , see p.118 in McCullagh 81 . Note the double role of ESR1: It is the very receptor gene for ER but also serves as best co-gene for PGR, see our previous work 57 regarding the selection of co-genes. Upper limits for beliefs (α,β) are explained in the next section (Eqs. 4-6). Survival rates and quality of previous receptor status assessments. A study-wise evaluation of receptor status assessment via IHC and ODDS was performed, see Supp. Table 2. The concordance of receptor status between IHC and ODDS is much better for ER than for PGR, due to larger variance of ER in gene expression data. Results were further elaborated in a control chart, see Fig. 3. The dotted line represents p, the overall average rate of differences, computed from Supp. Table 2 as p = 159 / 2227 ~ 0.071. Markers within bars represent expectation values (Δ IHC, ODDS + 1)/(N IHC + 2) of discordance rates derived from the beta-distribution, see the mathematical formalism in section 'Concordance between IHC and gene expression' in supplementary materials. Bars denote 95% confidence intervals of the respective mean values, allowing the qualification of studies against each other: If an upper bound lies below the dotted line in Fig. 3, the respective study has a discrepancy rate significantly below average (green). If a lower bound lies above the dotted line, the discrepancy rate of that study is significantly above average (red).
Similarly, a control chart for progesterone is shown in Supp. Fig. 2. Additionally, we evaluated the scmgene-marker (from package genefu 80 ) for estrogen only, since scmgene does not give PGR estimates. Note that scmgene was evaluated for all (3753) samples but results were only compared for those 2559 with HER2 − and reliable hormone receptor status, see Supp. Table 1.
Survival has been related to receptor status for ER (Supp. Fig. 3), PGR (Supp. Fig. 4) and hormone overall, i.e. either estrogen or progesterone being positive, see Fig. 4.

Responsibility functions for gene expression.
In order to compute DST evidences from gene expression, we will first define responsibility functions as a prerequisite. The concept was coined by Hastie 79 for weighing two distributions against each other regarding membership of a measurement in question. We use this for distributions of gene-expression data and their indication of receptor status. Based on these, 'mass functions' will be computed, and these converted into evidence: Gene expression + IHC → responsibility functions (→ mass functions) → evidence.
In the main part of this paper for brevity we skip the concept of masses (hence above shown in parenthesis) and present formulae for evidence directly based on responsibility functions. Masses are relegated to supplementary methods.  www.nature.com/scientificreports/ Responsibility functions are exemplified now for the expression of one gene, labelled 'Expr' , to keep notation slim. It applies analogously to the other genes considered.
We construct the responsibility function, r + , for receptor positivity from logistic regression (fitting coefficients: c 0 , c 1 ) of IHC receptor status versus gene expression, x Expr , see Eq. (1) and the increasing dotted red line in Fig. 5. The (dual) responsibility function, r − , for receptor-negativity is simply given by r − = 1 − r + , see Eq. (1) and the decreasing blue dotted line in Fig. 5.
Obtaining evidences. We define DST masses (see supplementary methods) and evidences by drawing on responsibility functions as follows: We observe in Fig. 5 that r + → 1 as gene expression approaches its maximum. However, not even maximum gene expression in reality provides total certainty of receptor status being positive. To account for this fact, the responsibility function has to be multiplied by an upper bound, α Expr , for the belief in ' + ' . Similarly β Expr acts as upper bound for the belief in '−' . Next we show how α Expr and β Expr are obtained, see Fig. 6.
Out of all positive IHC-measurements (represented by the interval (0,1)), a certain fraction has been confirmed positive by gene expression, due to virtue of the method. This fraction of measurements is represented by the interval (0,α Expr ) as part of the interval (0,1). Given a positive IHC measurement, the plausibility β Expr = 0 , and hence the rest of the interval (0,1) represents nothing but uncertainty according to DST: Fig. 6, some fraction, θ + , of uncertain predictions from gene expression will result positive merely by chance, not by virtue of the method. Both parts together represent all true positive  www.nature.com/scientificreports/ θ + Expr θ − Expr among the uncertain ones from gene expression is similar (equal) to the ratio of positives and negatives found by IHC, see also Shoyaib 82 . Considering the contingency table of IHC-measurements, note that positive measurements comprise TP + FN , whereas negative ones are composed as TN + FP . Hence we may put Moreover, the fraction of gene expression measurements being positive by virtue or chance, α Expr + θ + Expr , is set equal to the fraction of IHC-true positives: Since α Expr + θ + Expr + θ − Expr = 1 , Eqs. (2 and 3) can now be solved to yield the maximum possible belief in positive receptor status (derived from the gene in question): Likewise we obtain for the maximum belief in negative status: Considering these maximum beliefs, we finally obtain for the evidence derived from gene expression, for illustration see Fig. 5:  Figure 5. Decision theory evidences obtained from logistic regression. A logistic regression of IHC receptor status (IHC + ≙ 1, IHC -≙ 0) versus gene expression (x Expr ) was performed to obtain the responsibility function for receptor positivity r + (dotted red curve) and r -(dotted blue). It will be shown later (Eq. 6) that r + has to be multiplied by an upper limit, α IHC , to obtain the actual belief α Expr x Expr = α IHC · r + x Expr , see the solid red curve. Likewise β Expr x Expr = β IHC · r − x Expr (solid blue). Uncertainty: ochre. For a given expression value, e.g. x Expr = 2, one can read off belief in positive (α), belief in negative (β) and uncertainty (θ). www.nature.com/scientificreports/ Note that the whole formalism (Eqs. 1-6) is performed separately for each gene (and co-gene), using the data of corresponding IHC measurements. For generality and brevity of above notation, the subscript 'Expr' stands for each of those genes.
Completing the decision theory framework. With responsibility functions and evidences available we may now assemble the decision framework as follows.
Each hormone receptor (ER, PGR) is assessed via three sources of information, each yielding separate evidence (in our case of dichotomous data: 2 numbers): (1) IHC → evidence (α IHC , β IHC ) (2) gene expression of the receptor gene → evidence (α Gen , β Gen ) (3) gene expression of a co-gene → evidence ( α Co ,β Co ) In a first step, evidences for the receptor gene ( α Gen ,β Gen ) and its co-gene ( α Co ,β Co ) are combined by the evidence combination rule (ECR) introduced by Dempster 83 (labelled ' ⊕ D ') to obtain joint evidence ( α Expr ,β Expr ) from gene expression: As explained in supplementary methods, the operation ⊕ D yields in detail: This evidence for gene expression is-in a second step-combined with the evidence from IHC, ( α IHC ,β IHC ), either using rule ' ⊕ D ' once more or, alternatively the ECR defined by Yager 83 , labelled ' ⊕ Y ' . Since ⊕ Y more easily accommodates contradicting evidence (from IHC and gene expression), we opt for ⊕ Y and obtain: In supplementary methods we explain in detail how general concepts of DST (capable of handling a set of multiple outcomes) boil down to Eqs. (8 and 9) in case of just two outcomes, receptor status ′ + ′ , ′ − ′ .
The above procedure is carried out similarly to obtain the combined evidence for the PGR-status ( α PGR ,β PGR ). Note that notation is turned from general to specific in the following ( α Rez → α ER and α Rez → α PGR , respectively).
In a last step, the clinical decision for 'hormone versus chemo' is modelled along the lines of DST. Clinically, a patient is considered receptor positive (and will receive hormone therapy) if either ER or PGR (or both) are positive. Classically, this is a crisp, logical decision as stated. DST however, allows more elaborate combination rules to combine evidences for estrogen ( α ER ,β ER ) and progesterone ( α PGR ,β PGR ).
Thus, considering ER as well as PGR, each being assessed by IHC, gene and co-gene, performing some algebra (detailed in supplementary methods) yields the overall evidence for 'hormone receptor status' α H = bel H (pos), β H = bel H (neg) : As always we have to accept an amount of uncertainty given by θ H = 1 − α H − β H , indicating hormone receptor status being indeterminable. From these evidences, a most reasonable decision rule can be derived: In our simple case with only two outcomes these criteria reduce to α H > 0.5 for definitely receptor positive and β H > 0.5 for definitely receptor negative. (6) α Expr x Expr =α Expr · r + x Expr |c 0 , c 1 β Expr x Expr =β Expr · r − x Expr |c 0 , c 1

Results
DST results for patient cohort. In our previous work 57 , we considered receptors (ER, PGR) separately.
For each of them routine estimates (based on IHC only, 1714 samples, see Fig. 2) could be significantly improved by adding information from gene expression via conventional statistics (odds ratio) and we label this former method 'ODDS' . Note that for patients lacking one or both IHC estimates, receptor status was imputed from gene expression according to ODDS, including an uncertainty region, as in our previous paper. However, to obtain a dataset of optimum quality for the present work, only samples rendering safe imputations were retained (513 + 332 = 845), see Fig. 2.
Then we demonstrated that further improvement in receptor assessment can be achieved by applying decision theory. Besides yielding new estimates for each receptor, DST allows to combine estimates for ER and PGR into a joined 'hormone' receptor estimate. Based on Eqs. (10 and 11), hormone receptor status was newly predicted by DST for each patient, see the contingency Table 2 and Fig. 7.
To demonstrate the clinical benefit of DST, we consider two groups: (1) Patients diagnosed receptor positive by sODDS, and being confirmed by DST. They have correctly received hormone treatement, see the flow labelled 'a' in Fig. 7. (2) Patients diagnosed receptor positive by sODDS, but being questioned by DST, see flow 'b' . They have very probably also received hormone treatment but it remained ineffective. At the same time they might have been deprived of life-saving chemo. Figure 8 shows a strikingly worse survival of patients assumed receptor positive although being negative (logrank-Wilcoxon p = 0.009). Note that out of 1432 patients considered positive by sODDS, only for 651 survival data were available. Among these, 26 were questioned by DST. Nonetheless the difference was statistically significant at the 1%-level.

Table 2. Receptor status due to decision theory (DST) compared with previous method (ODDS). 2559
Patients have been assessed for hormone receptor status by method sODDS to comprise the input dataset. DST was then applied to this dataset. A sample was considered H + if at least one receptor was positive. Difference between methods sODDS and DST is reflected in Cohen's κ = 0.88. Note that uncertain results were considered as separate category in computing κ. www.nature.com/scientificreports/ DST compared to ODDS. It is interesting to see what DST achieves as compared to ODDS when starting from the same patient cohort. We therefore took the same sample of patients (2559) as in Table 2 and Fig. 7, with 1432 receptor status positive and 1127 negative according to sODDS. Instead of DST we this time applied the ODDS method, including a uncertainty region, see Table 3. Most strikingly, DST had flagged almost double as many patients 'uncertain' (156, see Table 2) as compared to ODDS (82), although the very same safety threshold was used. This clearly reflects an advantage of DST, since patients with questionable status would be re-evaluated before assigning the adequate therapy.
To characterize these differences in judgement we performed a face-to-face comparison between DST and ODDS on the very same input dataset, see Table 4. Samples additionally labelled 'uncertain' by DST originate to almost equal parts from negative (44) and positive (39) estimates according to ODDS. Only very few samples considered uncertain by ODDS, migrate to negative (1) or positive (11) according to DST.

Discussion
Dempster Shafer decision theory is a very potent framework. Its strength lies in combining information from several sources about the same item (here exemplified by IHC, Gene and Co-Gene estimates for one receptor) and also evidences for several different items, interacting in a biological setting (here exemplified by estrogen and progesterone regarding precision therapy of breast cancer). We have started from some best methodology available up to now (sODDS), i.e. receptor status assessment via IHC enhanced by gene expression. ODDS had already been shown to improve precision therapy 57 , and here we demonstrate that further improvement is possible by application of DST. Since absolute truth of receptor status is unknown, the usefulness of our approach is demonstrated by disease free survival being severely degraded in patients flagged by DST as possibly wrongly diagnosed and treated. Explicit data on treatment are rare in the breast cancer studies downloaded from GEO. Hence we performed our calculations on the reasonable assumption that treatment was given according to the best knowledge available about receptor status.
The potency of DST is further underpinned by directly comparing both methods applied to the same dataset: DST flags about twice as many patients as 'uncertain' (153) as ODDS (82). Additional patients subjected to a re-evaluation of hormone receptor status will improve precision of therapy.
Formulae and results derived therefrom incorporate three assumptions which we think were reasonable: To compute basic belief assignments for each patient, we obtained DST mass functions by logistic regression. It proved more stable than other possible methods, e.g. fitting bimodal distributions such as Gaussian mixtures, as done in a previous study 58 .
For joining evidences from 2 genes for the same receptor we chose the Dempster ECR. For adding IHC evidence to those from gene expression, we opted for the Yager ECR.
To obtain upper limits α,β for belief we reasonably assumed that the fraction of positive results by virtue or chance equals the fraction of IHC-true positives (Eq. 3). Likewise we assumed that the ratio of uncertain positive and negative cases equals the ratio of positives and negatives (Eq. 2).
It is interesting to note that up to now clinical decisions follow the conventional (Cantor) non-exclusive ORlogics: 'If ER or PGR receptor estimates is/are positive, the patient is considered positive' . In part, the strength of DST originates from elaborate evidence joining: Simple and/or combinations (such as estrogen-or-progesterone receptor positivity) are fundamentally refined, since DST draws on two numbers (belief and plausibility) in each decision, instead of just a single number as conventional probability theory does. In addition, information from different sources is joined according to mathematical rules rather than 'clinical intuition' .
Joining evidences from both receptors in a more intricate way according to DST, allows for an intuitively convincing view on the classification, see Fig. 9. From the definition of θ H = 1−α H −β H follows α H + β H + θ H = 1 and, accordingly, DST beliefs (α H , β H , θ H ) of all samples lie in one single plane in 3-dimensions, cutting through the axes at α H = 1, β H = 1 and θ H = 1, see panel A. Decision boundaries (dotted lines in Fig. 9) originate at 0.5 on each axis of evidence (α H = 0.5, β H = 0.5) and confine tetrahedrons, each containing exclusively positive (red) or negative (blue) samples, respectively. These tetrahedrons appear as equilateral triangles in the view shown www.nature.com/scientificreports/ in panel B. Uncertain samples (shown in ochre) appear within the pentahedron (panel A), which reduces to a kite-shaped area in the view shown in panel B. While in this paper we have presented a very simple example to demonstrate feasibility and benefit of the method, possible applications are wide in scope. Pathology and laboratory medicine in many cases have to deliver crisp decisions even if measurements by complementary methods lack agreement. DST is the method of choice to handle such situations.
The final evidence obtained by DST also allows assigning weights (loss functions) to certain outcomes in order to model clinical consequences (Eq. 11). Suppose, for example, a false positive outcome entails much more detrimental consequences than a false negative does.
Several different methods are available for joining: The 'Dempster additivity rule' , ⊕ D , works best with rather concordant variables. Needless to mention that correlation must not be too close to unity, as in this case the second variable would not add any significant information to the first one. As opposed, the 'Yager rule' , ⊕ Y , lends itself to join evidences from rather contradicting sources.
OMICS data often comprise a considerable number of estimates, say counts for multiple fragments out of a gene, needing to be condensed to characterize the gene (activity) as a whole. A plethora of similar situations can rigorously be handled by DST. Even if some of these estimates should be (partly) contradictive, DST is an appropriate tool for consolidating. Formulae need to be adapted to the particular situation in question.
We think that this work may help pave the way for decision theory entering OMICS data science.