Assessing radiomic feature robustness to interpolation in 18F-FDG PET imaging

Radiomic studies link quantitative imaging features to patient outcomes in an effort to personalise treatment in oncology. To be clinically useful, a radiomic feature must be robust to image processing steps, which has made robustness testing a necessity for many technical aspects of feature extraction. We assessed the stability of radiomic features to interpolation processing and categorised features based on stable, systematic, or unstable responses. Here, 18F-fluorodeoxyglucose (18F-FDG) PET images for 441 oesophageal cancer patients (split: testing = 353, validation = 88) were resampled to 6 isotropic voxel sizes (1.5 mm, 1.8 mm, 2.0 mm, 2.2 mm, 2.5 mm, 2.7 mm) and 141 features were extracted from each volume of interest (VOI). Features were categorised into four groups with two statistical tests. Feature reliability was analysed using an intraclass correlation coefficient (ICC) and patient ranking consistency was assessed using a Spearman’s rank correlation coefficient (ρ). We categorised 93 features robust and 6 limited robustness (stable responses), 34 potentially correctable (systematic responses), and 8 not robust (unstable responses). We developed a correction technique for features with potential systematic variation that used surface fits to link voxel size and percentage change in feature value. Twenty-nine potentially correctable features were re-categorised to robust for the validation dataset, after applying corrections defined by surface fits generated on the testing dataset. Furthermore, we found the choice of interpolation algorithm alone (spline vs trilinear) resulted in large variation in values for a number of features but the response categorisations remained constant. This study attempted to quantify the diverse response of radiomics features commonly found in 18F-FDG PET clinical modelling to isotropic voxel size interpolation.


Acquisition / Reconstruction
A GE 690 PET/CT scanner (GE Healthcare, Buckinghamshire, UK) was used for all patients. Patients were fasted for at least 6 hours prior to tracer administration. Serum glucose levels were routinely checked and confirmed as less than 7.0 mmol/L prior to imaging. Patients received a dose of 4 MBq of 18F-FDG/kg. Uptake time was 90 minutes, which is standard practice at our institution. PET images were acquired at 3 minutes per field of view. The length of the axial field of view was 15.7 cm (skull base to mid-thigh). Images were reconstructed with the ordered subset expectation maximisation algorithm, with 24 subsets and 2 iterations. Matrix size was 256 x 256 pixels, using the VUE PointTM time of flight algorithm. CT images were acquired in a helical acquisition with a pitch of 0.98 and tube rotation speed of 0.5 seconds. Tube output was 120 kVp with output modulation between 20 and 200 mA. Matrix size for the CT acquisition was 512 x 512 pixels with a 50 cm field of view. No oral or intravenous contrast was administered. Further details can be found in Foley et al. [1].

Software
CERR (Computational Environment for Radiological Research) [2] was used to import DICOM files into Matlab. Feature extraction was done with SPAARC (Spaarc Pipeline for Automated Analysis and Radiomics Computing), an in-house software built on the MATLAB platform (MathWorks, Natick, MA, USA).

Data availability
Original DICOM data cannot be made publically available at this time. However, visualisations of Scan VOIs are available on request.

Data conversion
PET imaging was converted from Becquerel/ml (Bq/ml) to Standardised Uptake Value (SUV) using CERR functionality that pulls the necessary information for conversion from the DICOM image tags.

Segmentation
The PET VOIs were segmented using ATLAAS [3]. Contours were rated by an expert radiologist (author KF) using a binary score (acceptable=1, unacceptable=0). Segmentations were performed on the original scan dimension 2.73 x 2.73 x 3.27 mm.

Interpolation
This study assessed isotropic interpolation to voxel dimensions of 2.7mm, 2.5mm, 2.2mm, 2.0mm, 1.8mm, and 1.5mm. Both tri-linear and spline interpolation methods were explored. The mask defining the VOI was always interpolated linearly, following IBSI guidelines [4].

Discretisation
We used a fixed bin width method to discretise the scan values [4]. The bin width was set to 0.5 SUV.
NOTE: Morphological features that depend only on the mask and not the scan values will always have exactly the same results between linear and spline because the binary mask defining the VOI is always interpolated linearly in our implementation (as per IBSI recommendations).

Visualising Feature Response to Interpolation
For all 141 features we visualised the change in feature values when resampling to different isotropic voxel sizes, using both linear and spline methods (examples of this are used in Figure 3 of main manuscript). These results were generated using the testing datasets. We also compared the difference in feature values between linear and spline methods for the 2.7mm voxel size results.

Plot explanation
Each of the 141 features has 6 plots, labelled a to f, that are summarised as follows: a) (Linear interpolation) -Visualisation of feature response to voxel-size resampling. Each patient was given a rank based on the results of 2.7mm dimension which was plotted against the extracted feature values for all voxel dimensions.
b) (Spline interpolation) -Visualisation of feature response to voxel-size resampling. Patient ranking was kept from the linear results to enable a direct comparison between plots a) and b).                                           glcm3d-sumEntropy (LINEAR)            gldzm3d-largeDistanceEmphasis (LINEAR)                                              glrl3d-runEntropy (LINEAR)                                  glszm3d-zoneSizeEntropy (LINEAR)                                                                                               The 34 features marked as correctable due to systematic behaviour underwent the surface fitting method for correction as described in the methods section of the main document. The surfaces, developed using the testing cohort, were evaluated by rescaling feature values in the validation cohort. For each VOI, feature value and voxel size were plotted against the percentage change in feature value compared to the 2.7mm voxel dimension result. Surfaces were generated using the fit function from MATLAB's curve fitting toolbox (MathWorks, Natick, MA, USA). For VOIs with the 2.7mm voxel dimension, the percentage change in feature value compared to the 2.7mm result is clearly zero. This process was demonstrated in the main manuscript in Fig.4 for the feature glrl feature run length Non uniformity. Below we plot the same visualisations for all 34 features that we attempted to correct.
For each figure, plot a) is the testing dataset feature values and voxel sizes plotted against the percentage change in feature values. Plot b) is a ranked plot of the validation dataset for a feature before applying a correction. Plot c) is the validation dataset after applying the correction based on equation (1) in the main manuscript.