Template-based automation of treatment planning in advanced radiotherapy: a comprehensive dosimetric and clinical evaluation

Despite recent advances in radiation therapy planning, treatment planning for head-neck and pelvic cancers remains challenging due to large concave target volumes, multiple dose prescriptions and numerous organs at risk close to the targets. Inter-institutional studies have highlighted that plan quality strongly depends on planner experience and skills. Automated optimization of the planning procedure may improve plan quality and promote best practice. We performed a comprehensive dosimetric and clinical evaluation of the Pinnacle3 AutoPlanning engine, comparing automatically generated plans (AP) with the historically clinically accepted manually generated ones (MP). Thirty-six patients (12 for each of the following anatomical sites: head-neck, high-risk prostate and endometrial cancer) were re-planned with the AutoPlanning engine. A planning and optimization workflow was developed to automatically generate “dual-arc” VMAT plans with simultaneous integrated boost. Various dose and dose-volume parameters were used to build three metrics able to supply a global Plan Quality Index evaluation in terms of dose conformity, target coverage and sparing of critical organs. All plans were scored in a blinded clinical evaluation by two senior radiation oncologists. Dose accuracy was validated using the PTW Octavius-4D phantom together with the 1500 2D-array. Autoplanning was able to produce high-quality, clinically acceptable plans in all cases. The main benefit of the Autoplanning strategy was the improvement of overall treatment quality due to significantly increased dose conformity and a reduction of integral dose by 6–10%, while keeping similar target coverage. Overall planning time was reduced to 60–80 minutes, about a third of the time needed for manual planning. In 94% of clinical evaluations, the AP plans scored equal to or better than MP plans. Despite the increased fluence modulation, dose measurements showed optimal agreement with dose calculations, with a γ-pass-rate greater than 95% for 3%(global)-2 mm criteria. The Autoplanning engine is an effective device enabling the generation of high-quality VMAT treatment plans according to institution-specific planning protocols.

www.nature.com/scientificreports
(spinal cord, brainstem, optic chiasm and optic nerves) isotropically expanding the corresponding OAR by 5 mm. For all serial PRV_OARs, the dose to 0.035 cc was considered as maximal dose.
High-risk prostate cancer. The prostate plus 5 mm of periprostatic tissue and 5 to 20 mm of the caudal seminal vesicles, based on risk category 24 , were defined as CTV1. CTV2 included the obturator, internal and external iliac, and presacral lymph nodes. PTV2 was obtained by adding an isotropic 8-mm margin to CTV2; PTV1 was obtained by adding an 8-mm margin in all directions to CTV1, except posteriorly, where a 6-mm margin was used. The PTVs were simultaneously irradiated over 25 daily fractions, prescribing 65.0 Gy (2.6 Gy/fraction) and 45.0 Gy (1.8 Gy/fraction) to PTV1 and PTV2, respectively. The main OARs were the rectum, the bladder, the small bowel and the femurs.
Endometrial cancer. PTV1 consisted of the upper two thirds of the vagina plus the resection lines in the parametria (CTV1) with an 8-mm margin. CTV2 included the obturator, external iliac, internal iliac and presacral nodes. The corresponding PTV2 was obtained by adding an 8-mm margin. The PTVs were simultaneously irradiated over 25 daily fractions with a dose of 60.0 Gy (2.4 Gy/fraction) and 45.0 Gy (1.8 Gy/fraction) to PTV1 and PTV2, respectively. The main OARs were the rectum, the bladder, the small bowel and the femurs.
Manual and auto-VMAT planning. Manual VMAT plans were generated with the "dual-arc" feature using the inverse optimization process previously described in more detail 25 , for coplanar 6-10 MV photon beams of an Elekta VersaHD linac (Elekta Ltd., Crawley, UK). A full gantry rotation was described by a sequence of 90 control points, i.e. one every 4°. The collimator was set at 10° to minimize the cumulative tongue-and-groove effect. For the three anatomical sites, treatment planning was performed according to the clinical objectives reported in Table 1.
Automated plans were created for each patient using the Autoplanning module implemented in version 16.0 of the Pinnacle 3 TPS. For each anatomical site, the Technique was created based on the same beam parameters, dose prescription and clinical objectives adopted for manual plans. During the optimization process, the AP engine automatically generates several dummy structures, including: (a) rings around the PTVs to manage the dose fall-off, (b) residual target structures from which overlaps with uncompromised OARs are removed, (c) residual OAR structures from which overlaps with targets are removed, (d) body structures used to control body dose and (e) hot-spot and cold-spot structures to manage target dose uniformity. New objectives are then automatically added to these structures in order to achieve better OAR sparing and better target uniformity and conformity. This process is performed iteratively over multiple optimization loops, adjusting the optimization parameters in order to continually spare the OARs without compromising target coverage, i.e. mimicking what an experienced manual planner would usually do. Five patients for each site, not included in the present series, were used to create and tweak the initial Techniques in order to generate plans fulfilling the clinical objectives. For both MP and AP plans, dose calculations were performed using the Pinnacle 3 collapsed cone convolution dose calculation algorithm with a dose grid resolution of 2 mm. An example of a Technique used for the optimization of head-neck cases is presented in Fig. 1. The objectives for the PTVs are defined only by numbers close to the prescription doses (in our experience, we chose as target goals the prescription doses plus 1 Gy, so as to avoid possible underdosage at the PTV boundaries).
The OAR objectives include maximum dose, mean dose and dose-volume histogram points; they can have three different priority levels (high, medium and low) and can be set as compromised or uncompromised. This last choice is applicable when there is an overlap between PTVs and OARs; with the compromise option, the PTV owns the overlapping voxels for the benefit of target coverage. Only for serial OARs, such as the spinal cord and brainstem, did we choose a higher priority than for the target volumes. In the advanced settings template (Fig. 1b), the planner can set up: (a) the tuning balance (i.e. the balance between target dose conformity and OAR sparing), (b) the dose fall-off margin (i.e. the distance across which the dose should decrease from 80% to 20% in an automatically generated tuning ring structure around the PTVs) and (c) the Cold-Spot ROI (i.e. the identification of cold regions inside the PTVs and the automatic creation of new tuning volumes and related dose objectives to increase the dose in the last optimization loops).
For head-neck cases, a manual fine tuning of 15 minutes at the end of the automated optimization process is usually needed in order to further lower the dose to serial OARs (e.g. the spinal cord).
Based on the Quantitative Analyses of Normal Tissue Effects in the Clinic (QUANTEC) guidelines for normal tissue sparing 26 , similar templates were created for high-risk prostate and endometrial cancer cases.
Plan evaluation and analysis. Following the suggestions of Leung et al. for plan comparison 27 , several dosimetric parameters were used to build three metrics able to supply a global evaluation: (1) a healthy tissue conformity index (H) to describe the overall plan conformity, (2) a merit function (M) to describe target coverage and (3) a penalty function (P) to evaluate the sparing of critical organs.

Healthy tissue conformity index (H). This index was defined as
$$H = \frac{1}{r}\sum_{i=1}^{r}\frac{TV_{RI,i}}{V_{RI,i}}$$
where r is the number of PTVs of different prescription dose, TV_RI is the target volume covered by the reference isodose and V_RI is the volume of the reference isodose cloud. Reference isodoses were set at 95% of each prescription dose level. For example, for the two PTVs in the high-risk prostate case, the equation is expanded as:
$$H = \frac{1}{2}\left(\frac{TV_{RI,1}}{V_{RI,1}} + \frac{TV_{RI,2}}{V_{RI,2}}\right)$$
For the merit function M, r is the number of targets of different prescription dose, p is the number of cold-spot checks, q is the number of hot-spot checks, V_Tj,Di is the volume of the jth target (in %) receiving a dose of at least the ith dose level, V_Tj,RDi is the minimum volume of the jth target (in %) receiving at least the ith dose level and V_Tj,ADi is the allowable volume of the jth target (in %) receiving at least the ith dose level. The equation is expanded over the two PTV objectives reported in Table 1 for the high-risk prostate case. Note that the denominators represent the maximum possible score.
Normal tissue sparing index (P). In the definition of this index, n is the number of critical organs to be monitored, m is the number of check points used for the jth critical organ, V_Oj,Di is the volume of the jth critical organ (in %) receiving a dose of at least the ith dose level and V_Oj,ADi is the allowable volume of that organ (in %) receiving at least the ith dose level. For example, in the high-risk prostate case, the rectum term is expanded over the three QUANTEC check points reported in Table 1 (allowable volumes of 50%, 35% and 25% at the 50, 60 and 65 Gy dose levels, respectively), and similarly for the other critical structures. The main relevant OARs for the determination of the P index were the rectum, the bladder, the small bowel and the femurs for prostate and endometrial cases, and PRV_spinal cord, PRV_brainstem, PRV_optic chiasm, parotid glands, lens and pharyngeal constrictor muscles for head-neck cases.
Plan quality index (PQI). A comprehensive Plan Quality Index (PQI) can be formulated by consolidating the three metrics H, M and P into a single figure, obtained as the Euclidean distance between the points (H,M,P) and (1,1,1):
$$PQI = \sqrt{(1-H)^2 + (1-M)^2 + (1-P)^2}$$
This index represents the overall quality of a treatment plan. This choice was made because it was considered the most appropriate to represent the deviation of plan quality from the ideal case. It must be interpreted as how far a plan is from perfection, i.e. H = 1, M = 1, P = 1, or the point (1,1,1). So, for an ideal case PQI = 0, while for the worst scenario PQI = √3.
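The Euclidean distance to the ideal point (1,1,1) described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `plan_quality_index` is hypothetical, the numeric inputs are made-up values rather than study data, and the range check assumes that each index is normalised to [0, 1], as implied by the stated worst-case value of √3.

```python
import math


def plan_quality_index(h: float, m: float, p: float) -> float:
    """Euclidean distance between (H, M, P) and the ideal point (1, 1, 1)."""
    # Assumption: each index lies in [0, 1]; this is implied by the
    # worst-case PQI of sqrt(3) but is not stated explicitly in the text.
    for name, value in (("H", h), ("M", m), ("P", p)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return math.sqrt((1.0 - h) ** 2 + (1.0 - m) ** 2 + (1.0 - p) ** 2)


# Hypothetical index values, for illustration only (not study data).
ideal = plan_quality_index(1.0, 1.0, 1.0)    # perfect plan
worst = plan_quality_index(0.0, 0.0, 0.0)    # worst scenario
example = plan_quality_index(0.8, 0.9, 0.7)  # an intermediate plan
```

Because the three terms enter with equal weight, a loss of 0.1 in H moves the PQI as much as a loss of 0.1 in M or P.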
In order to compare the previous values with more common indices, we calculated the conformation number (CN) for each target volume, as suggested by van't Riet et al. 28 :
$$CN = \frac{TV_{RI}}{TV} \times \frac{TV_{RI}}{V_{RI}}$$
where TV_RI was the target volume covered by the reference isodose, TV was the target volume, and V_RI was the volume of the reference isodose. The first part of this equation defines the quality of target coverage and the second part defines the volume of healthy tissues receiving a dose greater than or equal to the prescribed dose. CN ranges from 0 (complete PTV geographic miss) to the ideal value 1 (perfect conformity of the reference isodose to the PTV). The reference isodose was selected as 95% of the prescribed dose. Note that the second part of this equation represents the H value for each target.
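As an illustration of the conformation number, the following minimal Python sketch computes CN from the three volumes named above. The function name `conformation_number` and the example volumes are hypothetical and are not taken from the study.

```python
def conformation_number(tv_ri: float, tv: float, v_ri: float) -> float:
    """Conformation number: (TV_RI / TV) * (TV_RI / V_RI).

    tv_ri: target volume covered by the reference isodose (cc)
    tv:    target volume (cc)
    v_ri:  total volume enclosed by the reference isodose (cc)
    """
    if tv <= 0 or v_ri <= 0:
        raise ValueError("TV and V_RI must be positive")
    if tv_ri < 0 or tv_ri > min(tv, v_ri):
        raise ValueError("TV_RI must lie between 0 and min(TV, V_RI)")
    coverage = tv_ri / tv            # quality of target coverage
    healthy_tissue = tv_ri / v_ri    # per-target H term
    return coverage * healthy_tissue


# Hypothetical volumes in cc, for illustration only.
cn_example = conformation_number(tv_ri=95.0, tv=100.0, v_ri=120.0)
cn_miss = conformation_number(tv_ri=0.0, tv=100.0, v_ri=120.0)
cn_perfect = conformation_number(tv_ri=100.0, tv=100.0, v_ri=100.0)
```

In the first example, 95% of the target is covered but the reference isodose encloses 25 cc of non-target tissue, so CN falls to about 0.75.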
Last, the integral dose (ID) received by non-tumour tissues was calculated as the product between mean dose and non-tumour tissue volume (Gy • cc).
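The integral dose definition above is a simple product. The sketch below shows it together with a relative difference between two plans. The function names and all numbers are hypothetical; the percentage formula is one assumed way to express a reduction relative to a reference plan.

```python
def integral_dose(mean_dose_gy: float, volume_cc: float) -> float:
    """Integral dose (Gy*cc) as mean dose times non-tumour tissue volume."""
    if mean_dose_gy < 0 or volume_cc < 0:
        raise ValueError("mean dose and volume must be non-negative")
    return mean_dose_gy * volume_cc


def percent_reduction(reference: float, new: float) -> float:
    """Relative reduction of `new` with respect to `reference`, in percent."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    return 100.0 * (reference - new) / reference


# Hypothetical values, for illustration only (not study data).
id_manual = integral_dose(mean_dose_gy=10.0, volume_cc=20000.0)
id_auto = integral_dose(mean_dose_gy=9.3, volume_cc=20000.0)
reduction = percent_reduction(id_manual, id_auto)
```

With the same non-tumour volume in both plans, the relative reduction in integral dose equals the relative reduction in mean dose.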
Differences between manual and automated plans were quantified using the Wilcoxon matched-pairs signed-rank test, with statistical significance at p < 0.05.
Plan variability. To evaluate the variability of MP and AP plans, we first calculated the coefficient of quartile variation (CQV) of the aforementioned metrics (CNs, H, M, P and PQI) for each of the three anatomical sites. CQV was defined as the ratio between the difference and the sum of the third and first quartiles, i.e. (Q3 − Q1)/(Q3 + Q1), and was adopted because of its statistical robustness when dealing with data with outliers and/or skewed distributions 29 .
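The coefficient of quartile variation can be sketched with the Python standard library as follows. The function name is hypothetical and the sample values are made up. The quartile convention is an assumption (the text does not say which one was used): here it is the "inclusive" method of `statistics.quantiles`, and other conventions can give slightly different values on small samples.

```python
import statistics


def coefficient_of_quartile_variation(values) -> float:
    """CQV = (Q3 - Q1) / (Q3 + Q1)."""
    data = list(values)
    if len(data) < 2:
        raise ValueError("at least two values are required")
    # Assumption: 'inclusive' quartiles; the source does not name a method.
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    if q1 + q3 == 0:
        raise ValueError("CQV is undefined when Q1 + Q3 equals zero")
    return (q3 - q1) / (q3 + q1)


# Hypothetical metric values, for illustration only (not study data).
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
cqv = coefficient_of_quartile_variation(sample)  # Q1 = 2, Q3 = 4
```

Because it uses only quartiles, a single extreme plan changes the CQV far less than it would change a coefficient of variation based on the mean and SD.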
Then, in order to quantify the differences in the standard deviations (SDs) of the CN, H, M, P and PQI metrics, we performed Levene's test for homogeneity of SDs, which is suitable when data come from non-normal distributions, with statistical significance at p < 0.05.
Planning and treatment efficiency. For all patients, the total planning time (human inputs, optimization loops and dose calculation times) and the total number of monitor units were recorded for both MP and AP plans; all optimization processes were performed on a local server (HP Z800 workstation, 2.80 GHz).
Dosimetric verification. Dose distributions were measured using the 1500 2D ion-chamber array together with the Octavius-4D phantom 30 , both developed by PTW (PTW, Freiburg, Germany). The 1500 2D-array consists of a matrix of 1405 ion chambers with a size of 4.4 mm × 4.4 mm × 3.0 mm. This array is inserted into the Octavius-4D motorized cylindrical polystyrene phantom, which is capable of rotating synchronously with the gantry so that the beam always hits the array perpendicularly, thus allowing 3-dimensional dose reconstruction and comparison. Measured and calculated dose distributions were compared by means of the gamma evaluation, based on the theoretical concept introduced by Low et al. 31 . Following the recent suggestions of the AAPM report No. 218 32 , dosimetric verification was considered optimal if the percentage of points fulfilling the gamma index criteria exceeded 95%, using 3% for the dose criterion (global) and 2 mm for the distance-to-agreement criterion.
Physician's plan scoring. Two senior radiation oncologists independently performed a blind clinical evaluation of all AP and MP plans, based on the dose distributions, the DVHs for all structures and a summary table reporting the most important parameters. The radiation oncologists rated the plans first by judging the clinical acceptability of each plan (pass or not pass) and secondly by expressing their preferences using a clinical judgement based on a three-score scale (AP better than MP, MP better than AP and no preference). Cohen's kappa coefficient k was calculated to assess the inter-clinician agreement 33 , with agreement defined as excellent (k > 0.81), good (0.61 < k < 0.80), moderate (0.41 < k < 0.60), fair (0.21 < k < 0.40) and poor (k < 0.20).

Head-neck cancer cases. A summary of H, M and P indexes for MP and AP plans is reported in Table 2.
Global PQI scores for MP and AP plans were found to be 0.796 ± 0.059 and 0.722 ± 0.056, respectively. No significant differences were found for the target coverage M index for nodal target volumes (PTV2 and PTV3) but MP plans showed larger hot-spot regions inside the PTV1 providing a worse M value. H index values show a higher capability of AP plans to better conform the doses to target volumes, especially to the prophylactic volumes. Regarding P index values for organs at risk sparing, AP plans provided a major sparing of parotid glands and a reduction of mean dose of 10% (3.7 Gy, p = 0.022). Wilcoxon test also showed that the integral dose delivered to the patient body was significantly lower for AP plans than for the MP plans, with a reduction of 6.6% (p = 0.003). Figure 2 shows the dose distributions comparison for a representative patient.

High-risk prostate cancer cases. A summary of H, M and P indexes for MP and AP plans is reported in Table 3. Global PQI scores for MP and AP plans were found to be 0.429 ± 0.053 and 0.400 ± 0.049, respectively. No significant differences were found for the target coverage M index alone. H conformity index values were significantly better for AP plans (p = 0.003), suggesting a higher capability of dose conformation to the large concave-shaped nodal target volume, as shown graphically in Fig. 3 for a representative patient. This ability also translated into a significant reduction of 7.2% in the integral dose to non-tumour volumes. Regarding P values for OAR sparing, no significant differences were found for any relevant OAR between the two techniques (p = 0.477).
Endometrial cancer cases. A summary of H, M and P indexes for MP and AP plans is reported in Table 4.
Global PQI scores for MP and AP plans were found to be 0.532 ± 0.095 and 0.472 ± 0.081, respectively. As for prostate cases, while no significant differences were found for the target coverage M index, H values showed a higher capability of AP plans to better conform the dose distribution to target volumes, especially to the large concave nodal volumes. This capability is highlighted in Fig. 4 showing the dose distribution for a representative patient and it is also evidenced by a significant reduction in the integral dose to non-tumor tissues of 10% (p = 0.010). Regarding P values for OARs sparing, no significant differences were found for all relevant OARs between the two techniques (p = 0.953).
Plan variability comparison. Figure 5 shows the box-and-whisker plots of CNs, H, M, P and global PQI for the three anatomical sites. In particular, 1 out of 36 patients (a high-risk prostate patient) reported a worse PQI value for the AP plan. The figure also shows a qualitative reduction of CQV values for plans optimized with the AP technique for almost all metrics. Table 5 reports the CQV and SD values calculated for the principal metrics used for plan comparison. AP plans reported a narrowing of the variability (CQV) of the global PQI values, ranging from 21% for endometrial cases to 37% for head-neck cases. For each metric, the results of Levene's test for homogeneity of SD between MP and AP plans are reported. In particular, AP plans reported a significant decrease in plan variability for the conformation numbers related to dose conformity to the large concave and irregular lymph-nodal volumes (CN2 and CN3).
Treatment efficiency and dosimetric verification. Table 6 reports a summary of the treatment planning efficiency and the delivery metrics. Average total treatment planning time was about 60 minutes for AP plans in high-risk prostate and endometrial cases and 80 minutes for head-neck cases. Compared to MP plans, a significantly larger number of monitor units was observed for AP plans, especially for head-neck cases, reflecting an increased level of fluence modulation.
Pre-treatment verification was performed for all plans. With criteria equal to 3%(global) −2 mm for γ-index, the average pass-rate was 98.2 ± 1.4% for MP plans and 98.1 ± 1.4% for AP plans (p = 0.882).
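To make the γ pass rate quoted above more concrete, the following Python sketch evaluates a global gamma index on a one-dimensional dose profile. It is a deliberately simplified illustration, not the software used with the Octavius-4D system. The assumptions are: a 1D profile instead of a 3D reconstruction, a brute-force search over the calculated points without interpolation, global normalisation to the maximum calculated dose, and no low-dose threshold. All names and numbers are hypothetical.

```python
import math


def gamma_pass_rate_1d(positions_mm, measured, calculated,
                       dose_tol=0.03, dta_mm=2.0) -> float:
    """Percentage of measured points with gamma <= 1 (global criterion)."""
    if not (len(positions_mm) == len(measured) == len(calculated)):
        raise ValueError("all inputs must have the same length")
    if not positions_mm:
        raise ValueError("inputs must not be empty")
    # Assumption: global normalisation to the maximum calculated dose.
    dose_criterion = dose_tol * max(calculated)
    if dose_criterion <= 0:
        raise ValueError("maximum calculated dose must be positive")
    passed = 0
    for x_m, d_m in zip(positions_mm, measured):
        # Gamma is the minimum combined distance/dose deviation over all
        # calculated points (no interpolation between grid points).
        gamma = min(
            math.sqrt(((x_c - x_m) / dta_mm) ** 2
                      + ((d_c - d_m) / dose_criterion) ** 2)
            for x_c, d_c in zip(positions_mm, calculated)
        )
        if gamma <= 1.0:
            passed += 1
    return 100.0 * passed / len(measured)


# Hypothetical profiles (Gy) on a 10 mm grid, for illustration only.
grid = [0.0, 10.0, 20.0]
rate_identical = gamma_pass_rate_1d(grid, [1.0, 1.0, 1.0], [1.0, 1.0, 1.0])
rate_one_fail = gamma_pass_rate_1d(grid, [1.0, 1.0, 2.0], [1.0, 1.0, 1.0])
```

In the second example the third measured point differs from every calculated point by far more than 3% of the maximum dose, so it fails and the pass rate drops to two out of three points.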
Physician's plan scoring. Cohen's kappa coefficient for the inter-observer agreement was equal to 0.83, indicating an almost perfect agreement. Regarding the assessment of all plans by the two radiation oncologists, the clinical score of AP plans was equal to or better than that of MP plans in 97% and 94% of cases for the two clinicians, respectively. The clinical evaluation of plan quality was favourable to AP plans for both radiation oncologists.
Table 3. Comparison of scoring metrics between manual and automated planning for high-risk prostate cancer cases.
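Cohen's kappa, used for the physicians' plan scoring, can be sketched as follows for two raters and categorical scores. The function names and the ratings are hypothetical. The labels follow the scale given in the methods, and the handling of values falling between the listed intervals (e.g. between 0.80 and 0.81) is an assumption.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b) -> float:
    """Cohen's kappa for two raters scoring the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement from the marginal frequencies of each rater.
    expected = sum(count_a[c] * count_b[c]
                   for c in set(count_a) | set(count_b)) / n ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


def agreement_label(kappa: float) -> str:
    """Label following the scale in the text (boundaries are assumed)."""
    if kappa > 0.80:
        return "excellent"
    if kappa > 0.60:
        return "good"
    if kappa > 0.40:
        return "moderate"
    if kappa > 0.20:
        return "fair"
    return "poor"


# Hypothetical three-level preferences, for illustration only.
rater_1 = ["AP", "AP", "MP", "none", "AP", "none"]
rater_2 = ["AP", "AP", "MP", "AP", "AP", "none"]
kappa = cohens_kappa(rater_1, rater_2)
label = agreement_label(kappa)
```

Here the raters agree on five of six plans (observed agreement 0.83), but part of that agreement is expected by chance, so kappa is lower than the raw agreement.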

Discussion
In the present study we explored the potential of a fully template-based automated VMAT planning engine implemented in the Pinnacle TPS for challenging treatments performed in clinical routine. Head-neck, high-risk prostate and endometrial cancer sites were chosen because they involve large concave-shaped target volumes, multiple dose prescriptions, the use of a simultaneous integrated boost strategy and a large number of OARs adjacent to or partially overlapping the targets, thus presenting the most complex and challenging problem for plan optimization algorithms. The resulting plans were then compared with clinically accepted VMAT treatment plans generated by experienced medical physicists. The selection of optimal plans from different competing techniques or planning strategies has always been a daunting task, relying on dose-volume histogram metrics and visual inspection of isodose distributions, often providing ambiguous evaluations. For this reason, a few indexes have been introduced to quantitatively describe the quality of a given plan 27,34 . In particular, Leung et al. proposed a new dose-volume based index for intensity-modulated plans called the Plan Quality Index, able to simultaneously describe the overall plan conformity, the target coverage and the doses to critical organs 27 . The authors reported that this index improved the plan quality discerning power with respect to conventional comparison strategies. Following the suggestions of Leung et al., we adopted the proposed PQI as the fundamental metric to determine plan quality and for plan comparison purposes. Our results show that for all the anatomical sites, AP plans were able to provide similar, and in some cases better, plan quality compared with MP plans.
Table 4. Comparison of scoring metrics between manual and automated planning for endometrial cancer cases.
AP plans significantly improved dose conformity, especially to large concave nodal target volumes, in all anatomical sites, but no statistically significant differences were found in terms of target coverage. Similarly, no statistically significant differences were found for the dose sparing of any relevant OAR in high-risk prostate and endometrial cancer cases. However, AP plans showed an improvement in OAR sparing in head-neck cancer cases, i.e. in the cases with the most complex anatomic scenario. In these cases, the maximal dose to the PRV brainstem was reduced on average by 4.3 Gy and the parotid mean dose was reduced on average by 3.7 Gy, which may be clinically relevant to reduce xerostomia. These dosimetric findings were confirmed by the two radiation oncologists in the blind clinical evaluation session, who considered AP plans better than or equal to MP plans in more than 90% of cases.
A potential bias of this kind of planning comparison study is that the quality of MP plans should be as high as possible (poor quality of MP plans would obviously favour AP plans). In our case, all clinical MP plans were created by two medical physicists with 10 years' experience in VMAT planning, with the aim of obtaining not only high-quality plans but also a reduction of inter-planner variability. As reported in Table 5, AP plans achieved a reduction of variability, expressed by the CQV, for almost all dosimetric metrics for the three anatomical sites, with statistical significance for dose conformity. The AP engine not only significantly improved dose conformity for complicated target geometry (including nodal involvement) but also has the potential to drive a reduction of human-caused variability in VMAT planning for conformal coverage and dose distributions.
It must be highlighted that manual planning for complex cases is a challenging task, based mainly on the planner's experience. Planners, although very experienced, never know "a priori" how much a plan can be optimized, nor can they ensure that all dosimetric constraints on all OARs have been tightened as much as possible. This result has also been observed in other recent studies 5,15 focused on the optimization of prostate treatment with different automated algorithms such as RapidPlan and Erasmus-iCycle. From this point of view, automated planning could allow more consistent outcomes in treatment planning studies and clinical trials thanks to its greater ability to reduce inter- and intra-planner variations.
Regarding planning and treatment efficiency, AP plans resulted in an 8-15% increase of MUs, a result in agreement with other experiences with the Autoplanning engine 22 and suggesting an increase of plan complexity and fluence modulation. However, the MU increase did not translate into a lower pass rate during pre-treatment verification on the Octavius-4D phantom, which was in strong agreement with the MP plan pass rate (p = 0.882). Moreover, contrary to expectations, the increase in the number of MUs did not increase the integral dose to the patient, which was lower by 6.6-10.1%, theoretically reducing the risk of secondary malignancies 35 .
Mean overall planning time including human inputs, optimization loop processes and calculation times was 60 minutes and 80 minutes for pelvic and head-neck AP plans, respectively (about a third of time needed for manual planning).
Perhaps the most important feature of the Autoplanning module is its ability, according to the vendor, to push OAR dose sparing beyond the constraints specified in the Technique, towards physically achievable limits. This feature is unique and represents a significant change compared to other dosimetric planning engines or to the natural human planning strategy, in which the primary goal is the achievement of objectives judged to be clinically effective. In the present paper, this ability was observed in the treatments of the head-neck region, where AP plans showed a significant reduction of about 10% in the average dose to the parotid glands and a reduction of the maximum dose to the brainstem and spinal cord, well beyond the objectives that had been assigned in the Technique. Clearly, to definitively prove the aforementioned claim it would be necessary to demonstrate that AP plans are Pareto optimal, i.e. that one or more objectives (such as OAR sparing) cannot be improved without worsening at least one other (such as target coverage). This demonstration is a challenging mathematical task 36 and is beyond the scope of the present paper.
An advantage of Autoplanning with respect to other strategies for automatic planning based on the knowledge-based planning (KBP) approach is that it does not rely on a database of prior patients. Such a database must usually be filled with a large number of high-quality plans for each protocol and disease site, which makes its clinical implementation a labor-intensive process. Any change in contouring protocol, dose prescription or planning technique could require the generation of a new database. On the contrary, in our experience only a small set of training patients for each anatomical site (five patients) was necessary as a starting point for the implementation of the Techniques in Autoplanning by an expert team of medical physicists. However, in the case of the head-neck cancer site, we faced the problem of high point doses in serial OARs such as the spinal cord or brainstem when they lie very close to or partially inside the PTV. These cases were solved with a further manual tuning of dose objectives after optimization, so as to decrease the dose to these serial OARs below the acceptable values. This re-optimization step does not require more than 15 minutes of dose calculation time. From this point of view, the AP strategy can always be considered a high-quality starting point for further plan optimization, that is, a tool able to increase the overall quality of planning rather than a tool that could completely remove the need for manual optimization.
A potential limitation of the PQI evaluation method is that different combinations of H, M and P values can provide identical PQI values when comparing two plans (i.e. one plan may have a better M while another plan may have a better P). In this scenario, clinicians would inevitably decide which one would benefit the patient most, focusing attention on the coverage of the targets rather than on dose conformity or OARs sparing. Moreover, the actual H, M and P indexes are defined so that each dosimetric parameter has the same weight, while, in some specific clinical cases, clinicians may prefer that an objective for an OAR or a target have more weight than another.
Two additional potential benefits of the template-based Autoplanning module are currently under investigation. The first one is the feasibility of rapid and easy knowledge-sharing between different institutions. In our experience, for high-risk prostate and endometrial treatments, Autoplanning normally created optimal plans in a "one-button click" procedure without any planner intervention for manual tuning. The Techniques for specific anatomical sites can be successfully shared and implemented across multiple centres with simple adaptations to local protocols, allowing each centre to obtain optimal plans with the same quality 37 . The second benefit concerns the use of Autoplanning in the adaptive radiotherapy setting. In this case, the goal of correcting for daily tumour and normal tissue variations through modification of the original plan is hampered by the time-consuming re-planning process, which is nowadays the major obstacle to large-scale implementation of this strategy. Improvement in Autoplanning, therefore, has the potential to make routine online adaptive radiotherapy a possibility 38 .

Conclusion
We found the Pinnacle Autoplanning engine to be a robust clinical tool, providing a significant increase of dose conformity with respect to manual planning. The blinded clinical scoring confirmed the dosimetric results, showing that in more than 90% of the evaluations AP plans were judged to be of equal or better quality with respect to MP plans. The reductions in plan variability and overall planning time suggest the use of Autoplanning as a valuable tool to standardize high plan quality and improve clinical efficiency. Owing to its dosimetric and clinical advantages, the Autoplanning engine is an effective device enabling the generation of high-quality VMAT treatment plans according to institution-specific planning protocols.
Future studies are needed to expand Autoplanning to other treatment techniques such as extracranial stereotactic radiotherapy.