Introduction

Intensity-modulated radiotherapy (IMRT) and volumetric modulated arc therapy (VMAT) treatment planning require trial and error during the optimization process to obtain an ideal dose distribution. The plan quality for IMRT and VMAT depends on the knowledge and experience of the planner or institution during optimization, which can cause large intra- and inter-institutional variability1,2,3,4,5, and sometimes even affect treatment outcomes6.

RapidPlan (RP) (Varian Medical Systems, Palo Alto, CA, USA), a knowledge-based planning software, uses a model library containing the dose-volume histograms (DVHs) of previous treatment plans. Based on a trained model, it automatically provides optimization objectives for VMAT planning of future patients. Previous studies concluded that RP with a single optimization could create clinically acceptable VMAT plans for prostate cancer and could also reduce the optimization time independently of the planner’s skill level and knowledge7. Furthermore, RP models were expected to be shared among institutions and thereby standardize plan quality between them8,9,10,11. However, sharing a single-institution model with multiple institutions remained a challenge because an RP model depends on its registered plans, which reflect the planning strategies of each institution, such as the method of prescribing dose to the targets and the dose constraints for the organs at risk (OARs)12.

Panettieri et al. shared a model trained with 110 treatment plans from multiple institutions that had different irradiation methods (IMRT and VMAT), contouring, planning strategies, and prescription doses, and it contributed to reducing the intra- and inter-institutional variability13. However, all the plans in the multi-institution model were standardized to achieve the DVH constraints of their group. Therefore, the sharing of their multi-institution model with institutions that had different planning strategies and experience was limited.

We hypothesized that a model with a large number of plans could be applied to various planning strategies. To examine this, we established and evaluated a multi-institution model (big model) that aggregated over 500 treatment plans from five institutions with different planning strategies and constraints for targets and OARs in prostate cancer VMAT. We compared the big model with each institutional model (single-institution model) by using the dosimetric parameters of the planning target volume (PTV), rectum, and bladder for two validation cases. The aim was to clarify the efficacy of the big model, with its large number of registered treatment plans, and its potential to reduce the inter-institutional variability of plan quality when shared.

Methods

Institutions and plan design

Five institutions (A–E) in Japan that treated prostate cancer cases with VMAT were enrolled. The definition of gross tumor volume, the margins defining the clinical target volume (CTV) and PTV in each direction, and the dose constraints have been described in previous studies12,14. Table 1 shows the dose constraints used by each institution. The five institutions had different planning strategies. All methods were performed in accordance with the relevant guidelines.

Table 1 Dose constraints at each institution.

Development of the single-institution model and the big model

An RP model is a mathematical model that uses knowledge from the included treatment plans to generate the estimated DVH and estimate-based objectives in the optimization process. The RP algorithm was explained in detail by Fogliata et al.15. The single-institution models and the big model for RP were created using the prostate VMAT plans for clinical use at each institution. The numbers of cases registered in the single-institution models of institutions A, B, C, D, and E were 123, 53, 20, 60, and 100, respectively. To build the big model, 561 approved clinical plans, including 150 from A, 153 from B, 49 from C, 60 from D, and 149 from E, were anonymized and submitted by each institution. These clinical plans were used at each institution from April 2017 to April 2019. The clinical plans used to configure the single-institution models were also registered in the big model, and outliers were not excluded.

Validation of each model

Two sets of computed tomography (CT) data and structures (cases I and II) used at institution B were anonymized and distributed to the other institutions. The CT slice thickness was 2.5 mm and the field of view was 50 cm. The target and OARs were contoured by a radiation oncologist according to the protocol of institution B. The bladder volume was 83.8 cm³ in case I and 181.8 cm³ in case II. Planners with sufficient experience in using RapidPlan at each institution calculated the dose distributions with the single-institution model and the big model using Eclipse ver. 13.0 or 15.6 (Varian). The objective settings for the big model, as shown in Table 2, were the same as the settings of the single-institution model of each institution.

Table 2 Objective settings for optimization with the big model and each single-institution model.

To evaluate the dose distributions calculated with the single-institution model and big model, the minimum dose (in Gy) to 2%, 50%, 95%, and 98% (D2, D50, D95, and D98) of the PTV and the volume ratio receiving 90%, 80%, and 50% of the prescribed dose (V90, V80, and V50) for the rectum and bladder were calculated in two cases. The homogeneity index (HI; defined as HI = [D2–D98]/D50) was calculated. In this study, a dose prescription of 78 Gy (in 39 fractions) was used for the calculation. The differences of dosimetric parameters between the single-institution model (Ds) and big model (Db) were calculated as follows:

$$\text{Difference} = D_{\text{s}} - D_{\text{b}}$$
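The dosimetric parameters defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the calculation performed in Eclipse: it assumes a structure is represented by a list of voxel doses with equal voxel volumes, and the function names (`percentile`, `dose_at_volume`, `volume_at_dose`, `homogeneity_index`, `difference`) are hypothetical.

```python
def percentile(values, q):
    """Linearly interpolated q-th percentile (0-100) of a list of numbers."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def dose_at_volume(doses_gy, volume_percent):
    """D_x: minimum dose (Gy) received by the hottest x% of the structure.

    Assumes equal-volume voxels, so D_x is the (100 - x)-th dose percentile.
    """
    return percentile(doses_gy, 100.0 - volume_percent)


def volume_at_dose(doses_gy, dose_percent, prescription_gy=78.0):
    """V_x: percentage of the structure receiving >= x% of the prescription."""
    threshold = prescription_gy * dose_percent / 100.0
    covered = sum(1 for dose in doses_gy if dose >= threshold)
    return 100.0 * covered / len(doses_gy)


def homogeneity_index(doses_gy):
    """HI = (D2 - D98) / D50, as defined in the text."""
    d2 = dose_at_volume(doses_gy, 2)
    d50 = dose_at_volume(doses_gy, 50)
    d98 = dose_at_volume(doses_gy, 98)
    return (d2 - d98) / d50


def difference(single_institution_value, big_model_value):
    """Difference = D_s - D_b; positive when the big model gives a lower value."""
    return single_institution_value - big_model_value


# Hypothetical voxel doses (Gy), for illustration only.
ptv_doses = [70 + 0.1 * i for i in range(101)]
rectum_doses = [10, 20, 30, 40, 50, 60, 70, 80]
print(homogeneity_index(ptv_doses), volume_at_dose(rectum_doses, 50))
```

With this sign convention, a positive difference for an OAR parameter such as V50 means that the big model spared more of the organ than the single-institution model.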

Model analysis

In RP, principal component analysis is performed on the geometrical dose-volume histogram (GEDVH) and the actual DVH. A regression model relating the principal component scores (PCSs) of the GEDVH and the DVH is used to estimate the ideal DVH for a new case. To assess the performance of this estimation, the goodness-of-fit of the regression models was evaluated with the coefficient of determination (R²) and the average chi-squared (χ²) value. The R² value ranges from 0 to 1, with a larger value indicating a better model fit. A χ² value closer to 1.0 provides more certainty that the quality of the regression model is good. In addition, to evaluate the outliers of the rectum and bladder in each model, the following four parameters were evaluated: Cook’s distance (CD), modified Z-score (mZ), studentized residual (SR), and areal difference of estimate (dA). CD identifies the influential data points in a regression model; a data point with a high CD value has a strong effect on the regression line. The mZ value measures the difference of an individual geometric parameter from the median value in a training set and identifies geometric outliers. The SR value measures the difference between the PCSs of the original and estimated DVHs (e.g., first PCS of the original DVH versus first PCS of the estimated DVH), which reveals dosimetric outliers. The dA value indicates the difference between the estimated dose distribution and the actual one, and is essentially the difference between the estimated DVH curve and the actual DVH curve.
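As an illustration of the geometric outlier index, the sketch below computes a modified Z-score using the common definition based on the median and the median absolute deviation (MAD), with the scaling constant 0.6745. Whether RapidPlan uses exactly this form internally is an assumption here; the function names and the sample volumes are hypothetical, and only the 3.5 threshold matches the criterion used in this study.

```python
import statistics


def modified_z_scores(values):
    """Modified Z-score of each value: 0.6745 * (x - median) / MAD."""
    center = statistics.median(values)
    mad = statistics.median(abs(value - center) for value in values)
    if mad == 0:
        raise ValueError("MAD is zero; the modified Z-score is undefined")
    return [0.6745 * (value - center) / mad for value in values]


def geometric_outliers(values, threshold=3.5):
    """Indices of values whose absolute modified Z-score exceeds the threshold."""
    scores = modified_z_scores(values)
    return [index for index, score in enumerate(scores) if abs(score) > threshold]


# Hypothetical bladder volumes (cm^3) in a training set, for illustration only.
bladder_volumes = [10, 11, 12, 13, 14, 40]
print(geometric_outliers(bladder_volumes))
```

Because the median and MAD are insensitive to extreme values, a single unusual anatomy does not mask itself the way it can with a mean and standard deviation.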

To investigate whether the training sets of each single-institution model and the big model covered the geometrical characteristics of cases I and II, such as the targets, rectum, and bladder, we examined whether the following parameters were within two standard deviations of the median of the training set: target and OAR volumes, OAR out-of-field volume percentage, OAR overlap volume percentage to target, and geometric distribution PCS. A more detailed description of the RP and DVH estimation algorithm can be found elsewhere16.
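The coverage criterion described above can be sketched as follows. This is a minimal illustration, assuming the sample standard deviation is used (the text does not specify sample versus population SD); the function names and the training-set values are hypothetical, while the two bladder volumes are those reported for cases I and II.

```python
import statistics


def is_covered(training_values, new_value, n_sd=2.0):
    """True if new_value lies within n_sd standard deviations of the median."""
    center = statistics.median(training_values)
    spread = statistics.stdev(training_values)  # sample SD (assumption)
    return abs(new_value - center) <= n_sd * spread


def uncovered_parameters(training_sets, case_values):
    """Names of geometric parameters for which the case is not covered."""
    return [
        name
        for name, value in case_values.items()
        if not is_covered(training_sets[name], value)
    ]


# Hypothetical training-set bladder volumes (cm^3), for illustration only.
training_bladder = [100.0, 120.0, 140.0, 160.0, 180.0]
for case, volume in {"case I": 83.8, "case II": 181.8}.items():
    print(case, is_covered(training_bladder, volume))
```

Running the same check over every geometric parameter of a model yields a list like the one reported in the Results for each single-institution model.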

Statistical analysis

To compare the plans generated with the single-institution models and the big model, the differences in dosimetric parameters were tested with the paired Wilcoxon signed-rank test using JMP Pro 16 software (SAS Institute, Inc., Cary, NC, USA). A P-value < 0.05 was considered significant.
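For readers who want to see the mechanics of the test, the following is a minimal sketch of an exact two-sided paired Wilcoxon signed-rank test written with the Python standard library. It is not the JMP implementation used in this study, which may handle ties, zeros, and approximations differently; the function name and the paired values are hypothetical and are not taken from Table 3.

```python
from itertools import product


def wilcoxon_signed_rank_exact(x, y):
    """Return (statistic, two-sided exact P-value) for paired samples.

    Zero differences are dropped; tied absolute differences share average ranks.
    The exact P-value enumerates all 2^n sign patterns, so keep n small.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, averaging ranks within ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[start]]):
            end += 1
        average_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1

    rank_total = sum(ranks)
    w_plus = sum(rank for rank, diff in zip(ranks, diffs) if diff > 0)
    statistic = min(w_plus, rank_total - w_plus)

    # Under the null hypothesis every sign pattern is equally likely.
    as_extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(rank for rank, sign in zip(ranks, signs) if sign)
        if min(w, rank_total - w) <= statistic:
            as_extreme += 1
    return statistic, as_extreme / 2 ** n


# Hypothetical rectum V50 values (%) for five paired plans, illustration only.
single_model = [30, 28, 35, 40, 33]
big_model = [25, 27, 31, 22, 30]
print(wilcoxon_signed_rank_exact(single_model, big_model))
```

In this hypothetical example all five differences share the same sign, which is the most extreme outcome possible for five untied pairs.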

Ethics approval and consent to participate

This study was approved by the Institutional Ethical Review Committees of all participating institutions (Kindai University Review Board No. 31–273, Kyushu University Review Board No. 2020–286, JFCR Review Board No. 2020–1049, Seirei Hamamatsu General Hospital Review Board No. 3333, Osaka International Cancer Institute Review Board No. 20050).

Results

Dosimetric parameters for the PTV, rectum, and bladder

Table 3 shows the mean and standard deviation (SD) values of dosimetric parameters for the PTV, rectum, and bladder that were calculated with each single-institution model and big model. There were no significant differences in the dosimetric parameters (P > 0.05) between each single-institution model and the big model. In the rectum, all averages of dosimetric parameters with the big model were lower than those with the single-institution models. An average difference of more than 10% was observed in V50 for the rectum for each case. For the PTV, there were similar SD values between the single-institution models and big model. However, for both the rectum and bladder V50, the big model had lower SD values compared with those for the single-institution model in each case.

Table 3 Mean ± SD values of dosimetric parameters and differences between big model versus single-institution model.

Figure 1 shows the dosimetric parameter differences for the PTV, rectum, and bladder between the single-institution models and big model in each case. For the PTV, there were small differences between the single-institution models and big model. The maximum difference in D95 for the PTV among institutions was 3.9 Gy in institution D. Dosimetric parameters for the rectum calculated with the big model were lower than those calculated with the single-institution model. The maximum difference in V50 between the big model and single-institution model was 37.2% in institution D. The maximum differences among institutions for the single-institution model and big model were 9.3% and 10.2% for V90, 4.4% and 8.6% for V80, and 37.3% and 10.5% for V50, respectively. For V50, the big model was able to reduce the difference between each institution compared with each single-institution model. However, for both V90 and V80, the big model could not reduce the differences between each institution compared with each single-institution model. In the bladder, the dosimetric parameters calculated with the big model were lower than or equivalent to those calculated with the single-institution model, except for institution D. The maximum differences among institutions for the single-institution model and big model were 10.4% and 5.1% for V90, 9.6% and 4.1% for V80, and 12.0% and 5.1% for V50, respectively. In all dosimetric parameters, the big model had lower differences between institutions than the single-institution model.

Figure 1

Dosimetric parameters for the (a, b) PTV, (c) rectum, and (d) bladder. For PTV, there were small differences between the single-institution models and big model in each case. Dosimetric parameters for the rectum calculated using the big model were lower than those calculated with the single-institution model. The volume ratio receiving 50% of the prescribed dose (V50) for institution D had the maximum difference (37.2%) between the big model and single-institution model. For the bladder, the dosimetric parameters calculated with the big model were lower than or equivalent to those calculated with the single-institution model, except for institution D.

Model analytics

Table 4 shows the R² and χ² values of the regression models in each model. The R² value calculated from the regression lines between the PCSs of the DVH and GEDVH for the big model was comparable to those from each single-institution model. The χ² value for the big model was the closest to 1.0 compared with each single-institution model. Table 5 shows the ratio and number of outliers for each index, defined as CD > 4.0 (ref. 17), mZ > 3.5, SR > 3.0, and dA > 3.0 (ref. 18), for the rectum and bladder in the training data for each model. The ratio and number of outliers in the big model were comparable to those from each single-institution model.

Table 4 R² and χ² values for the regression model in each model.
Table 5 Ratio (%) and the number of outliers for each index.

The big model and the single-institution model of institution A covered all geometrical characteristics of cases I and II, whereas the other single-institution models did not cover the following geometric parameters of cases I and II:

institution B model: out-of-field volume percentage of the bladder; institution C model: bladder volume, overlap volume between target and OARs, and geometric distribution PCS of OARs; institution D model: out-of-field volume percentage of the bladder, overlap volume between target and the rectum, target volume, and geometric distribution PCS of the bladder; institution E model: target volume.

Discussion

In this study, the multi-institution model (big model) was developed with 561 VMAT plans from five institutions with different planning strategies for prostate cancer. We evaluated the dose parameters of the VMAT plans generated with this big model. The big model could generate better or comparable dosimetric parameters compared with each single-institution model. The dosimetric parameters of the OARs were improved, especially V50, which may help reduce radiation toxicity in the rectum and bladder19. Additionally, the big model can maintain coverage for the PTV and reduce inter-institution variation in the OARs.

The dose coverage of the PTV for the VMAT plans with the big model was comparable to that with the single-institution models, as shown in Table 3. The coverage reflected the planning strategy of each institution, even though each institution used a different prescribing method, because the original PTV objectives of each institution (Table 2) were retained when optimizing with the big model. Thus, the big model could be used at several institutions by setting the PTV objectives according to each institution’s planning strategy. Moreover, compared with the single-institution models, the VMAT plans with the big model reduced the doses to the rectum at all institutions and to the bladder at all institutions except institution D (Fig. 1), although the differences were not significant. This is likely because the big model contains a wide range of geometrical information from the 561 plans and thus could cover the geometrical characteristics of the validation cases. The geometric characteristics of cases I and II were out of range in all single-institution models except that of institution A. This indicates that the estimation accuracy of those models could potentially deteriorate, whereas the big model covered the anatomical characteristics of those cases. Tol et al. noted that a wide range of anatomical information in the RP model was important for generating better plan quality compared with the clinical plans20. The line objectives along the DVH lower bounds were also useful for optimizing toward the estimated DVHs predicted from the big model, which includes a large number of combinations of anatomical and dosimetric characteristics of registered plans15,21,22. In the rectum, the big model could not reduce the differences in the V90 and V80 values among institutions compared with the single-institution models, as shown in Fig. 1. This is because the rectal volumes receiving 90% and 80% of the prescribed dose lie in regions that overlap with the PTV, and were therefore affected by the different PTV planning strategies of each institution.

In the model analysis, the regression line of the big model had an equivalent or superior goodness-of-fit compared with that of each single-institution model, as shown in Table 4. The ratios of outliers in the big model were also comparable to those in each single-institution model, as shown in Table 5. These results indicate that, in terms of regression quality, the big model could be used in the same way as each single-institution model without the impact of outliers previously seen in other studies11,23.

Sharing one RP model among multiple institutions can reduce inter-institution variation, as indicated by the reduction in SD values shown in Table 3, leading to standardization14. A previous study noted that an RP model is difficult to share with other institutions because of different planning strategies12. Because it is based on a large number of plans, our big model covers a wide range of combinations of anatomical and dosimetric characteristics, which may overcome this issue. Therefore, sharing a big model generated from even more plans collected worldwide could help standardize plan quality at any institution. For example, planners at a new institution would start from the optimization parameters predefined by the big model and then customize them or use their own parameters if the resulting plans do not meet the dose criteria and/or planning strategy of that institution. Knowledge-based planning (KBP) can also serve as a training tool for planners and institutions implementing manual optimization14. One limitation is that this study included only two cases for evaluation. However, these were familiar prostate cancer cases that had been used in previous studies: one study compared the dosimetric performance of KBP models among five institutes12, and another evaluated whether KBP models could improve dosimetric performance over the treatment period14. More cases and various treatment sites need to be investigated. A big model like the one presented here might also be applied to stereotactic radiotherapy because of its simple anatomical characteristics, whereas further study is needed for anatomically complicated cases such as head and neck cancer. The mechanical performance and delivery accuracy of the plans generated with the big model should also be verified before clinical use24.

Conclusions

The big model, trained with over 500 clinical plans from multiple institutions with various planning strategies for targets and OARs in prostate cancer, could generate VMAT plans of superior or comparable quality to those generated with the single-institution models. Our work suggests that using the big model has the potential to standardize plan quality and reduce inter-institution variability.