Incomplete donor chimerism (DC) after hematopoietic stem cell transplantation (HSCT) is associated with reduced overall and disease-free survival, and decreasing DC can precede overt relapse of malignant diseases.1, 2 Sensitive chimerism quantification may therefore allow early therapeutic intervention, potentially improving treatment response.3, 4 Post-HSCT chimerism monitoring commonly involves bone marrow (BM) biopsies, although patients with hematologic malignancies often present with risk factors for procedure-related adverse events (for example, low platelet counts), and dry taps can preclude chimerism analysis.5, 6 However, new PCR strategies for chimerism quantification have dramatically increased technical sensitivity, with current methods enabling the detection of chimerism below 0.1%.7, 8, 9 Applying these methods to peripheral blood (PB) may therefore reduce the need for BM analysis. The influence of sample source on the sensitivity of chimerism analysis with such highly sensitive quantification methods, however, is not well established. To address this, we compared sensitive chimerism analyses performed on paired BM and PB samples from 219 HSCT patients (Supplementary Table 1). Detailed materials and methods are available as Supplementary Information.

To determine the overall congruence of DC in PB and BM, we performed a correlation analysis on 825 paired indel quantitative PCR (qPCR) chimerism results (Figure 1a). Globally, PB and BM DC were significantly correlated (Pearson's r=0.74, P<0.0001). However, the linear regression trend suggested that BM DC may be systematically lower than PB DC. Indeed, the mean difference between PB and BM DC (PB DC minus BM DC) in matched sample pairs was 1.9% (95% CI 1.1–2.8%, P<0.0001, Figure 1b), after excluding 513 pairs with concurrent complete DC in both samples. A potential systematic confounder of this observation is carry-over of recipient-derived non-hematopoietic tissue into BM samples. We therefore limited the impact of trace recipient contamination by applying cut-offs for scoring incomplete DC (99.8%) and for absolute differences between PB and BM DC (0.2%). In total, 294 sample pairs (35.6%) displayed absolute differences between BM and PB DC above the cut-off. Of these 294 pairs, BM DC was lower than PB DC in 75.5% and higher in 24.5%, corresponding to 26.9% and 8.7% of all test results, respectively (Supplementary Figure S1). To address whether differing repopulation kinetics in BM and PB after allogeneic HSCT affected the results, we separately analyzed samples collected within 35 days after HSCT (n=79). There was no statistically discernible difference between PB (98.6%) and BM DC (98.6%, P=0.94, not shown) in this early period, reducing the likelihood that repopulation kinetics had a significant impact on the study. These data are therefore consistent with PB DC systematically overestimating BM DC.
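The pair-level scoring described above can be sketched in Python as a minimal illustration, not as the authors' analysis pipeline. The names `PairedDC`, `is_complete`, `is_discordant` and `summarize` are hypothetical; complete DC is assumed to mean DC strictly above 99.8% (inferred from the later ">99%" wording), and a discrepancy is assumed to mean an absolute PB minus BM difference strictly above 0.2%. The confidence interval and P value reported above are not reproduced here.

```python
from dataclasses import dataclass
from statistics import mean

# Cut-offs named in the text; the strict ">" comparisons are an assumption.
COMPLETE_DC_CUTOFF = 99.8  # DC (%) above this is scored as complete
DIFF_CUTOFF = 0.2          # |PB DC - BM DC| (%) above this is scored as a discrepancy


@dataclass
class PairedDC:
    """Donor chimerism (%) measured in matched PB and BM samples from one visit."""
    pb: float
    bm: float


def is_complete(dc: float, cutoff: float = COMPLETE_DC_CUTOFF) -> bool:
    return dc > cutoff


def is_discordant(pair: PairedDC, cutoff: float = DIFF_CUTOFF) -> bool:
    # Round first so that float noise (100.0 - 99.8 == 0.20000000000000284)
    # does not push a difference of exactly 0.2% above the cut-off.
    return round(abs(pair.pb - pair.bm), 6) > cutoff


def summarize(pairs: list[PairedDC]) -> dict:
    # Mean PB minus BM difference, excluding pairs complete in both samples
    informative = [p for p in pairs if not (is_complete(p.pb) and is_complete(p.bm))]
    discordant = [p for p in pairs if is_discordant(p)]
    return {
        "n_informative": len(informative),
        "mean_pb_minus_bm": mean(p.pb - p.bm for p in informative) if informative else 0.0,
        "n_discordant": len(discordant),
        "n_bm_lower": sum(p.bm < p.pb for p in discordant),
        "n_bm_higher": sum(p.bm > p.pb for p in discordant),
    }


# Toy data only; these values are illustrative and not taken from the study.
example = [PairedDC(100.0, 100.0), PairedDC(100.0, 99.5), PairedDC(99.9, 99.8),
           PairedDC(98.0, 99.0), PairedDC(97.0, 95.0)]
print(summarize(example))
```

Rounding before comparing is a deliberate design choice: without it, a pair such as 100.0% versus 99.8% would be scored as discordant purely because of binary floating-point representation.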

Figure 1

Correlation between BM and PB DC. (a) Scatter plot of DC results for 825 matched BM and PB sample pairs obtained by real-time qPCR. Note that 513 data points are located at 100% PB/100% BM DC. The linear trend for the overall correlation is depicted by a solid line, the 95% confidence interval of the regression by dotted lines. (b) Scatter dot plot of differences between PB and BM DC for 312 sample pairs (825 minus the 513 pairs with 100% BM and 100% PB DC). The mean difference with 95% confidence interval is indicated by the horizontal line. (c) Chimerism status in 468 matched sample pairs with chimerism data available from the next follow-up visit. The pie chart indicates the number of cases in each of the four categories depicted (+, complete DC; −, incomplete DC) for the primary sample pairs; the bar graph represents the corresponding follow-up sample pairs. (d) Duration of complete PB DC/incomplete BM DC mismatch status for the 37 cases with persisting DC incongruence. For sample pairs with multiple consecutive discordant results, the mismatch durations were summed. The mean duration with 95% confidence interval is indicated.

Next, we assessed how accurately PB DC status reflects BM DC status. Overall, 656 of 825 DC analyses showed either matching complete (513) or matching incomplete (143) DC in both samples. However, 156 (18.9%) results exhibited complete PB DC (cPBDC) concurrent with incomplete BM DC (iBMDC), while 13 (1.6%) analyses showed the reverse. Consequently, incomplete PB DC (iPBDC) is a specific indicator of iBMDC (specificity 97.5%, Supplementary Table 2). Importantly, however, iPBDC is a highly insensitive indicator of iBMDC (sensitivity 47.8%), so cPBDC does not reliably indicate complete BM DC (cBMDC). The majority of the 156 occurrences of cPBDC/iBMDC displayed high levels of BM DC, with 112/156 (71.8%) ranging between 99.0 and 99.8% (Supplementary Figure S2A). This suggests that substantially lower BM DC is more tightly associated with iPBDC. Consistently, placing the cut-off for complete DC at >99% reduced the number of incongruent results from 156 to 62 in our study. At the same time, the total number of iBMDC results dropped from 299 to 164, indicating a considerable trade-off between the sensitivity and the consistency of BM and PB chimerism analyses. Stratifying our cohort by conditioning regimen (myeloablative versus reduced intensity, Supplementary Table 3) or underlying disease (Supplementary Table 4) revealed no significant divergence in specificity (92.5–100%) or sensitivity (40.9–59.6%) from the overall results. Analysis of samples from lymphoma patients, whose malignancy is not thought to originate in the marrow, revealed a similarly low sensitivity of PB DC analysis (50.0%, n=31, not shown). However, the number of lymphoma patients included in this study is small, and this patient population may warrant further investigation. Notably, incongruent results occurred in 89 (40.6%) patients and were not limited to specific subgroups.
Importantly, our data suggest that PB DC analysis is considerably less sensitive than BM DC analysis, which may leave loss of complete DC undetected in a significant number of cases when PB DC is relied on alone.
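The sensitivity and specificity above follow from a 2×2 table with BM DC status as the reference. The sketch below is a hypothetical helper (`ConcordanceTable` and `tabulate` are illustrative names, not taken from Supplementary Table 2). It assumes that DC strictly above the cut-off is scored as complete, and it exposes the cut-off as a parameter so that the trade-off described for >99% can be explored on one's own data.

```python
from dataclasses import dataclass


@dataclass
class ConcordanceTable:
    """2x2 table of PB versus BM DC status; BM status is the reference."""
    both_incomplete: int            # iPBDC and iBMDC (iBMDC detected in PB)
    pb_complete_bm_incomplete: int  # cPBDC and iBMDC (iBMDC missed in PB)
    pb_incomplete_bm_complete: int  # iPBDC and cBMDC
    both_complete: int              # cPBDC and cBMDC

    def sensitivity(self) -> float:
        """Share of iBMDC results that are also incomplete in PB."""
        positives = self.both_incomplete + self.pb_complete_bm_incomplete
        return self.both_incomplete / positives

    def specificity(self) -> float:
        """Share of cBMDC results that are also complete in PB."""
        negatives = self.both_complete + self.pb_incomplete_bm_complete
        return self.both_complete / negatives


def tabulate(pairs: list[tuple[float, float]], cutoff: float = 99.8) -> ConcordanceTable:
    """Build the table from (PB DC %, BM DC %) pairs.

    Assumption: DC strictly above `cutoff` is scored as complete.
    """
    counts = {(pb_c, bm_c): 0 for pb_c in (True, False) for bm_c in (True, False)}
    for pb, bm in pairs:
        counts[(pb > cutoff, bm > cutoff)] += 1
    return ConcordanceTable(
        both_incomplete=counts[(False, False)],
        pb_complete_bm_incomplete=counts[(True, False)],
        pb_incomplete_bm_complete=counts[(False, True)],
        both_complete=counts[(True, True)],
    )


# Counts reported in the text: 143 iPBDC/iBMDC, 156 cPBDC/iBMDC, 13 iPBDC/cBMDC, 513 cPBDC/cBMDC
reported = ConcordanceTable(143, 156, 13, 513)
print(f"sensitivity {reported.sensitivity():.1%}, specificity {reported.specificity():.1%}")
```

With the reported counts, sensitivity is 143/299 (47.8%) and specificity is 513/526 (97.5%), matching the values given above.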

To estimate the persistence of discordant PB and BM chimerism status, we focused the analysis on 468 sample pairs from 152 patients for whom chimerism data from the next follow-up visit were available (Figure 1c). Divergent results from BM and PB were observed in 107 cases (22.7%), 100 (21.4%) of which exhibited cPBDC and iBMDC. On follow-up, 53/100 BM samples shifted to cBMDC, whereas the remaining 47/100 displayed persisting iBMDC. Although mean BM DC was lower in the group with persisting iBMDC (96.2 versus 96.8%, Supplementary Figure S2B), the difference was not statistically significant (P=0.69). Therefore, isolated discrepancies between BM and PB DC are likely not caused by transient low-level chimerism, and the absolute level of BM DC was not associated with persistence of incongruent PB and BM results. Among the 47 cases of persisting iBMDC, we identified 37 instances of continuing cPBDC/iBMDC, whereas 10 converted to matching incomplete chimerism in both samples at follow-up (Figure 1c). In these 37 cases, cPBDC/iBMDC persisted for 74 days on average (range 26–294, Figure 1d). In the absence of other sensitive monitoring techniques, such as BM DC or MRD diagnostics, this interval indicates the potential delay before further treatment can be considered for affected patients. In 8/219 patients, PB DC failed to reflect iBMDC over more than two consecutive follow-up visits in the study. The chimerism time courses of these 8 patients are depicted in Figure 2. Three of them did not experience relapse during the monitoring period: patient 322 reverted to stable complete chimerism after an initial period of incomplete BM DC, and patients 299 and 498 displayed extended chimerism fluctuations and presented clinically with severe graft-versus-host disease.
The five remaining patients experienced overt relapse, either fatal or treated by a second HSCT, preceded by periods of complete PB DC and incomplete BM DC. These cases demonstrate that, for a subset of patients, PB DC analysis alone would not reveal imminent relapse. We therefore conclude that, owing to its lower sensitivity compared with BM, PB chimerism analysis can fail to detect incomplete chimerism and impending relapse in a significant minority of HSCT patients.
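The persistence analysis can be illustrated with a sketch that finds runs of consecutive cPBDC/iBMDC visits in one patient's time-ordered results. How the study measured duration is described only briefly in the Figure 1 legend; this sketch assumes that a run's duration is the number of days between its first and last discordant visit (so consecutive inter-visit intervals are effectively summed) and that a single discordant visit does not count as persistent. `Visit` and `persistent_mismatch_durations` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    day: int           # days after HSCT
    pb_complete: bool  # cPBDC at this visit
    bm_complete: bool  # cBMDC at this visit


def is_mismatch(v: Visit) -> bool:
    """cPBDC concurrent with iBMDC."""
    return v.pb_complete and not v.bm_complete


def persistent_mismatch_durations(visits: list[Visit]) -> list[int]:
    """Duration (days) of each run of >= 2 consecutive cPBDC/iBMDC visits.

    Assumption: duration = last discordant visit day - first discordant visit day.
    """
    durations: list[int] = []
    run_start = run_end = None
    for v in sorted(visits, key=lambda visit: visit.day):
        if is_mismatch(v):
            if run_start is None:
                run_start = v.day
            run_end = v.day
        else:
            # A concordant or reverse-mismatch visit closes the current run
            if run_start is not None and run_end > run_start:
                durations.append(run_end - run_start)
            run_start = run_end = None
    # Close a run that is still open at the last visit
    if run_start is not None and run_end > run_start:
        durations.append(run_end - run_start)
    return durations


# Toy timeline only; not patient data from the study.
timeline = [Visit(0, True, False), Visit(30, True, False), Visit(60, True, False),
            Visit(90, True, True), Visit(120, True, False), Visit(150, True, False)]
print(persistent_mismatch_durations(timeline))  # [60, 30]
```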

Figure 2

Chimerism timelines in patients with persistent complete PB DC and incomplete BM DC. DC following primary allogeneic stem cell transplantation (HSCT, t=0) for 8 individual patients with persistent mismatched complete PB/incomplete BM DC status (ALL, acute lymphoblastic leukemia; AML, acute myeloid leukemia; MPN, myeloproliferative neoplasm). Vertical lines represent the time points of second HSCT (t), relapse-related death (†) or evident molecular relapse (mr).

In summary, our study demonstrates that the sensitivity of chimerism analysis depends on the sample source when highly sensitive quantification methods are employed. Our findings thus corroborate two earlier studies showing, with a less sensitive method, that mixed chimerism is more readily detectable in BM than in PB.10, 11 Interestingly, MRD levels have also been reported to be higher in BM than in PB, supporting the notion that BM analysis may generally provide a more sensitive assessment of post-HSCT status.12 Conversely, several studies showed that MRD monitoring in PB samples correlates better with prognosis than analysis of matched BM samples.13, 14 These findings imply that high sensitivity does not necessarily provide the best correlation with clinical parameters. This, however, is likely a consequence of the lower sensitivity of PB analysis directly resulting in higher detection thresholds: higher MRD or lower DC levels are required for detection, and these are conceivably more tightly associated with a worse clinical prognosis. Importantly, our data indicate that reduced sensitivity is also associated with delayed detection of complications. Prospective studies are therefore needed to define a clinically relevant sensitivity threshold that does not compromise outcome through delayed treatment. Presently, however, highly sensitive DC and MRD analysis may justify shorter monitoring intervals and potentially allow therapeutic intervention at earlier time points. Of note, relapse prediction by chimerism quantification appears to be less accurate than MRD monitoring, likely owing to its inherent lack of disease specificity.11 Conversely, chimerism analysis is more broadly applicable, because MRD markers may be absent altogether or may disappear at relapse under therapy.11, 15

Taken together, our study demonstrates that PB chimerism status does not sensitively reflect BM status. This can leave incomplete DC undetected, potentially prolonging the time until affected patients are considered for further treatment. We therefore propose that BM samples are preferable for routine chimerism analysis to provide a sensitive assessment of chimerism status. In addition, complete PB DC should be interpreted cautiously in the absence of other data.