Dual adversarial deconfounding autoencoder for joint batch-effects removal from multi-center and multi-scanner radiomics data

Medical imaging represents the primary tool for investigating and monitoring several diseases, including cancer. Advances in quantitative image analysis have moved towards the extraction of biomarkers able to support clinical decisions. To produce robust results, multi-center studies are often set up. However, the imaging information must be cleaned of confounding factors, known as batch effects, such as scanner-specific and center-specific influences. Moreover, in non-solid cancers such as lymphomas, effective biomarkers require an imaging-based representation of the disease that accounts for its multi-site spread over the patient’s body. In this work, we address the dual-factor deconfusion problem and propose a deconfusion algorithm to harmonize the imaging information of patients affected by Hodgkin Lymphoma in a multi-center setting. We show that the proposed model successfully removes domain-specific variability from the data (p-value < 0.001) while coherently preserving the spatial relationship between imaging descriptions of peer lesions (p-value = 0), which is a strong prognostic biomarker for tumor heterogeneity assessment. This harmonization step significantly improves the performance of prognostic models with respect to state-of-the-art methods, enabling exhaustive patient representations and more accurate analyses (p-values < 0.001 in training, p-values < 0.05 in testing). This work lays the groundwork for large-scale, reproducible analyses on multi-center data, which are urgently needed to translate imaging-based biomarkers into clinical practice as effective prognostic tools. The code is available on GitHub at https://github.com/LaraCavinato/Dual-ADAE.

1 Patients' characteristics
Table S 1. Patients' characteristics in Institution 1: variables are divided into categorical (number, percentage of the total) and numerical (mean, standard deviation). The categorical variables listed are the stage (four statuses), sex (female F and male M), presence of B symptoms such as fever, sweats, and weight loss (yes Y and no N), status of the disease (extranodal disease: yes Y and no N; bone disease: yes Y and no N), administration of radiotherapy (yes Y and no N), outcome of interim PET (iPET, Deauville Score DS of the PET), and end-of-treatment PET (EOT PET, Deauville Score DS of the PET). Statistics are stratified by treatment response, so patients are divided into responders and non-responders.

Software
The code for Dual AD-AE is distributed on GitHub (https://github.com/LaraCavinato/Dual-ADAE). We implemented the training of the Dual AD-AE model following conventional practice. After data preparation and the definition of the model architecture, the training loop iterated over the training data. In each iteration, a batch of input data was encoded into latent representations, which were then passed through the decoder to generate reconstructed data. The loss function quantified the dissimilarity between the input and the reconstructed data, and backpropagation computed the gradients with respect to the model parameters. These gradients, in turn, drove the update of the model weights through the chosen optimizer. The validation set was used periodically to evaluate performance and monitor training progress, enabling early stopping to counteract overfitting.
The choice of a single validation set with multiple epochs, rather than cross-validation with numerous train/test splits and fewer epochs, was motivated by the size of our dataset, the nature of the problem, and the available computational resources. Although this approach carries a risk of overfitting and a potentially less precise estimate of model generalization, it is straightforward to implement and computationally efficient, given its reliance on the complete dataset. It also yields a single model with a consistent validation set, which enables continuous monitoring of performance dynamics.
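The training procedure described above (encode, decode, reconstruction loss, gradient update, periodic validation with early stopping) can be sketched on a toy problem. The following is only an illustrative sketch and not the released Dual AD-AE code: it uses a linear autoencoder with a single latent unit, hand-derived gradients, and synthetic data, and it omits the adversarial branches entirely. All names and hyperparameters (`train`, `make_data`, `lr`, `batch_size`, `patience`, `max_epochs`) are assumptions chosen for the example.

```python
import random

def encode(w_enc, x):
    """Encoder: project one sample onto a single latent unit."""
    return sum(w * xi for w, xi in zip(w_enc, x))

def decode(w_dec, z):
    """Decoder: map the latent value back to input space."""
    return [w * z for w in w_dec]

def mse(w_enc, w_dec, data):
    """Mean squared reconstruction error over a dataset."""
    total = 0.0
    for x in data:
        recon = decode(w_dec, encode(w_enc, x))
        total += sum((r - xi) ** 2 for r, xi in zip(recon, x))
    return total / len(data)

def train(train_set, val_set, lr=0.01, batch_size=8,
          max_epochs=300, patience=15, seed=0):
    """Mini-batch gradient descent with early stopping on one validation set."""
    rng = random.Random(seed)
    dim = len(train_set[0])
    w_enc = [rng.uniform(-0.5, 0.5) for _ in range(dim)]
    w_dec = [rng.uniform(-0.5, 0.5) for _ in range(dim)]
    best_loss, best_weights = float("inf"), (list(w_enc), list(w_dec))
    history, stale = [], 0
    for _ in range(max_epochs):
        order = list(train_set)
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            g_enc, g_dec = [0.0] * dim, [0.0] * dim
            for x in batch:
                z = encode(w_enc, x)                       # latent representation
                recon = decode(w_dec, z)                   # reconstructed input
                err = [r - xi for r, xi in zip(recon, x)]  # reconstruction error
                # error propagated back through the decoder to the latent unit
                back = sum(w * e for w, e in zip(w_dec, err))
                for j in range(dim):
                    g_dec[j] += 2 * err[j] * z
                    g_enc[j] += 2 * back * x[j]
            # plain gradient-descent update, averaged over the batch
            w_enc = [w - lr * g / len(batch) for w, g in zip(w_enc, g_enc)]
            w_dec = [w - lr * g / len(batch) for w, g in zip(w_dec, g_dec)]
        val_loss = mse(w_enc, w_dec, val_set)              # validation after each epoch
        history.append(val_loss)
        if val_loss < best_loss:
            best_loss, best_weights, stale = val_loss, (list(w_enc), list(w_dec)), 0
        else:
            stale += 1
            if stale >= patience:                          # early stopping
                break
    return best_loss, best_weights, history

def make_data(n, rng):
    """Synthetic 3-D samples lying close to a line (assumed toy data)."""
    direction = [1.0, 2.0, -1.0]
    return [[t * d + rng.gauss(0.0, 0.05) for d in direction]
            for t in (rng.uniform(-1.0, 1.0) for _ in range(n))]

data_rng = random.Random(42)
train_set = make_data(64, data_rng)
val_set = make_data(32, data_rng)
best_loss, best_weights, history = train(train_set, val_set)
```

Keeping a copy of the weights at the best validation loss, rather than the last ones, is one common way to implement early stopping; the released code may handle this differently.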
A cross-validation approach is nonetheless of interest as an alternative training option, particularly because it maximizes the utility of limited data and provides a more robust assessment of generalization. We therefore subsequently introduced an alternative training regimen for the AD-AE model: a cross-validation setup comprising 50 splits, each spanning 100 epochs.
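A cross-validation regimen of this kind can be sketched as follows. The text does not state how the 50 splits are drawn, so the sketch assumes repeated random train/test splits with a 20% test fraction. The `fit_and_score` function is a hypothetical stand-in for training one model (it fits a single constant by gradient descent) and is not the AD-AE itself. Only the values 50 and 100 come from the text.

```python
import random
import statistics

N_SPLITS = 50   # number of train/test splits, as stated in the text
N_EPOCHS = 100  # epochs per split, as stated in the text

def random_splits(n_samples, n_splits, test_fraction=0.2, seed=0):
    """Yield (train_idx, test_idx) pairs obtained by repeated shuffling."""
    rng = random.Random(seed)
    n_test = max(1, int(round(test_fraction * n_samples)))
    for _ in range(n_splits):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        yield idx[n_test:], idx[:n_test]

def fit_and_score(train, test, epochs, lr=0.1):
    """Hypothetical placeholder model: learn a constant by gradient
    descent on the training MSE, then return the MSE on the test set."""
    estimate = 0.0
    for _ in range(epochs):
        grad = sum(2 * (estimate - x) for x in train) / len(train)
        estimate -= lr * grad
    return sum((estimate - x) ** 2 for x in test) / len(test)

# Assumed toy data: 100 scalar samples.
data_rng = random.Random(1)
data = [data_rng.gauss(5.0, 1.0) for _ in range(100)]

scores = []
for train_idx, test_idx in random_splits(len(data), N_SPLITS):
    train = [data[i] for i in train_idx]
    test = [data[i] for i in test_idx]
    # A fresh model is trained from scratch on every split.
    scores.append(fit_and_score(train, test, N_EPOCHS))

mean_score = statistics.mean(scores)
std_score = statistics.stdev(scores)
```

Reporting the mean and spread of the per-split scores gives an estimate of generalization that depends less on any single partition, at the cost of training one model per split.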

Table S 2. Patients' characteristics in Institution 1: variables are divided into categorical (number, percentage of the total) and numerical (mean, standard deviation). The numerical variables are age, number of nodal lesions, number of extranodal lesions, and time to relapse (for censored patients, the time to last follow-up is used). Statistics are stratified by treatment response, so patients are divided into responders and non-responders.

Table S 3. Patients' characteristics in Institution 2: variables are divided into categorical (number, percentage of the total) and numerical (mean, standard deviation). The categorical variables listed are the stage (four statuses), sex (female F and male M), presence of B symptoms such as fever, sweats, and weight loss (yes Y and no N), status of the disease (extranodal disease: yes Y and no N; bone disease: yes Y and no N), administration of radiotherapy (yes Y and no N), outcome of interim PET (iPET, positive or negative), and end-of-treatment PET (EOT PET, positive or negative). Statistics are stratified by treatment response, so patients are divided into responders and non-responders.

Table S 5. Image acquisition protocols and scanner specifications in Institution 1: 85 patients were scanned with a Siemens Biograph scanner; 51 patients were scanned with a General Electric Discovery 690 scanner; 5 patients were scanned with other unspecified scanners.

Table S 6. Image acquisition protocols and scanner specifications in Institution 2: 34 patients were scanned with a General Electric Discovery 710 scanner; 38 patients were scanned with a Philips Gemini scanner; 1 patient was scanned with another unspecified scanner.

Table S 7. Descriptive statistics and statistical comparisons of radiomics variables in terms of mean values and standard deviations for the two cohorts.