The ProMotion LMU dataset, prostate intra-fraction motion recorded by transperineal ultrasound

Intra-fraction motion of the prostate was recorded during 721 fractions of image guided radiotherapy (IGRT) in 28 patients, 14 of whom were treated by intensity modulated radiation therapy (IMRT) and 14 of whom were treated by volumetric arc therapy (VMAT). The prostate was imaged by three-dimensional and time-resolved transperineal ultrasound (4D-US) of type Clarity by Elekta, Stockholm, Sweden. The prostate volume was registered and the prostate position (center of volume) was recorded at a frequency of 1.6 samples per second. This raw data set contains a total of 380,199 prostate and patient couch positions over a time span of 53 hours, 47 minutes and 29 seconds of live radiotherapy as exported by the instrument software. This data set has been used for the validation of models of prostate intra-fraction motion and for the estimation of the dosimetric impact of actual intra-fraction motion on treatment quality and side effects. We hope that this data set may be reused by other groups for similar purposes.


Background & Summary
Image guided radiotherapy (IGRT) employs various imaging modalities such as computed tomography (CT), cone beam CT (CBCT), stereoscopic x-ray imaging, or ultrasound to locate the target volume and surrounding organs at risk. In particular, the location of the tumour can be used to correct for positioning errors between fractions (inter-fraction) or even in real-time during treatment (intra-fraction).
In our study, we used time-resolved 3D ultrasound (4D-US) to monitor intra-fraction motion of the prostate during primary radiotherapy of adenocarcinoma. We used a trans-perineal robotic probe (Elekta Clarity) which remained fixed to the patient couch and automatically scanned and recorded the prostate position during treatment fractions.
The data we acquired was used to validate the random walk model of intra-fraction motion 1, to investigate a potential impact of patient couch shifts on intra-fraction motion 2, to estimate the impact of ultrasound probe pressure on intra-fraction drift of the prostate 3, to show that a shorter treatment time reduces the severity of intra-fraction motion 4, and to estimate the dosimetric impact of intra-fraction motion on boosts on intra-prostatic lesions 5.
The dataset described here corresponds to the final study data of 28 patients. We hope that it can be reused by other groups for similar purposes.

Methods
The study is based on 28 patients with adenocarcinoma of the prostate who received definitive external beam radiotherapy between June 2014 and March 2017 at our department. The first 14 of these patients were treated with IMRT (before May 2015). The latter 14 patients, starting afterwards, received VMAT. The setup remained otherwise unchanged.
The 14 patients in the IMRT group received 35 to 38 fractions of 2.0 Gy each, or 28 fractions of 2.2 Gy plus 10 fractions of 2.0 Gy each. Of the 519 fractions delivered, 358 (69%) were recorded with 4DUS. The decision to record a full fraction with 4DUS (in addition to mandatory daily initial patient setup control by both kV-CBCT and 3DUS) was made by technical personnel based on daily clinical workload. Similarly, the 14 patients in the VMAT group received 36 to 38 fractions of 2.0 Gy each. Of the 522 fractions delivered, 363 (70%) were recorded with 4DUS.
www.nature.com/scientificdata
Patients were positioned pre-fraction on a 3-DOF robotic couch, matching daily kV-CBCTs to the planning CT. Shifts in vertical, longitudinal and lateral direction were then corrected by automatic repositioning of the couch. Rotational shifts were not corrected. The new position of the prostate was then used as the reference position. This procedure was repeated before each fraction.
Intra-fraction motion of the prostate was then monitored and recorded by robotic trans-perineal 4DUS using the Clarity system by Elekta, Stockholm, Sweden with an auto-scan probe 6,7. Patients were placed in supine treatment position, knees on elevated cushions, legs moderately spread. There were no catheters, rectal balloons, spacers or other devices in use to affect intra-fraction motion. The ultrasound probe was fixed to the treatment table and made gel-mediated contact with the perineum at intermediate pressure 3.
In the IMRT group, an average of 25.5 fractions were recorded per patient (median 27, range 12 to 34). In the VMAT group, an average of 25.9 fractions were recorded per patient (median 27, range 14 to 34). A 15th IMRT patient was excluded from this analysis because he did not complete the course of treatment and had only a single fraction recorded. No other recorded fractions, however, were discarded and there is no indication that the recorded fractions were not representative of an average fraction. The decision to switch the treatment regime from IMRT to VMAT in 2015 was made purely for clinical reasons and blind to the results of this study. While this is a retrospective study, the decision was made to evaluate as soon as an equal number of patients had been recorded with VMAT as with IMRT.
A total of 53 hours, 47 minutes and 29 seconds of intra-fraction motion were recorded.
Human subjects. The study did not involve any experiments on human subjects. All data was generated retrospectively from quality control data acquired in a non-invasive and dose-free fashion during standard treatment independent of this study. Bavarian legislation expressly permits the use of such data for scientific research, cf. Art. 27 (4) of Bayerisches Krankenhausgesetz (BayKrG). All patients included in this study gave written informed consent that their quality control data would be reused for research purposes and would be made available to third parties. Furthermore, the data is completely anonymized and does not contain any identifiable features.

Data Records
The complete raw data is stored in a public open access repository 8 at Open Data LMU. The data is stored in two archive files in .zip and .tar format, respectively, with identical content. The content is organized in 721 separate files in comma separated values (CSV) format. The files are named 'patient_[pp]_fraction_[nn].csv', where [pp] counts the patients, starting with '01', and [nn] counts the fractions of each patient, starting over with '01' for each patient. Table 1 gives an overview of the available data. For example, the data corresponding to the 24 fractions recorded for the first patient is contained in 'patient_01_fraction_01.csv' through 'patient_01_fraction_24.csv'.
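The naming scheme above can be handled with a few lines of Python. This is a minimal sketch that assumes the files have been extracted from one of the archives and that all names follow the two-digit pattern exactly; the helper names fraction_filename and parse_fraction_filename are illustrative and not part of the dataset.

```python
import re

# Pattern for 'patient_[pp]_fraction_[nn].csv' with two-digit counters.
FILENAME_PATTERN = re.compile(r"^patient_(\d{2})_fraction_(\d{2})\.csv$")


def fraction_filename(patient, fraction):
    """Build the file name for a patient number and a fraction number."""
    return f"patient_{patient:02d}_fraction_{fraction:02d}.csv"


def parse_fraction_filename(name):
    """Return (patient, fraction) as integers, or raise ValueError."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    return int(match.group(1)), int(match.group(2))


# The 24 files recorded for the first patient.
first_patient_files = [fraction_filename(1, n) for n in range(1, 25)]
print(first_patient_files[0], first_patient_files[-1])
```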
Each CSV file holds a large number of rows, each corresponding to one recorded data point in time, at a sample frequency of about 1.6 Hz. Table 2 gives an overview of the columns the data is organized into.
Iso8601Time is a time stamp in ISO 8601 format (YYYY-MM-DDThh:mm:ss.sss). Its absolute value is not meaningful, as the workstation's internal clock may or may not have been correctly set at all times (in particular, regional daylight saving settings). However, the time stamps are essential in calculating relative durations. It is useful to define the beginning of treatment as an arbitrary zero.
SecondsFromMidnight is a time stamp in seconds (and milliseconds in the decimal places). As before, it is useful to define durations and one should choose an arbitrary zero.
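As an illustration of using the time stamps only for relative durations, the following Python sketch converts Iso8601Time values into seconds elapsed since the first sample. It assumes the stamps carry exactly three decimal places as in the format given above; the example stamps are invented and not taken from the dataset.

```python
from datetime import datetime


def elapsed_seconds(iso_stamps):
    """Seconds since the first time stamp, which serves as the arbitrary zero."""
    times = [datetime.fromisoformat(stamp) for stamp in iso_stamps]
    start = times[0]
    return [(t - start).total_seconds() for t in times]


# Invented stamps spaced 0.625 s apart, i.e. a 1.6 Hz sampling rate.
example = [
    "2015-01-01T09:00:00.000",
    "2015-01-01T09:00:00.625",
    "2015-01-01T09:00:01.250",
]
print(elapsed_seconds(example))  # [0.0, 0.625, 1.25]
```

Working from the full time stamp rather than from SecondsFromMidnight avoids a discontinuity should a recording ever span midnight.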
XShift denotes the recorded position of the prostate on the longitudinal axis in units of mm. As the patient is lying on the treatment couch, this axis is horizontal in the laboratory frame of reference and points away from the gantry. Increasing values describe a motion in caudal direction, away from the gantry. Decreasing values describe a motion in cranial direction, towards the gantry. The absolute value of this quantity is not meaningful; one should define a suitable zero.
YShift denotes the recorded position of the prostate on the lateral axis in units of mm. As the patient is lying on the treatment couch, this axis is horizontal in the laboratory frame of reference and parallel to the gantry. Increasing values describe a motion towards the left side of the patient. Decreasing values describe a motion towards the right side of the patient.
ZShift denotes the recorded position of the prostate on the vertical axis in units of mm. As the patient is lying on the treatment couch, this axis is also vertical in the laboratory frame of reference and points up. Increasing values describe a motion in anterior direction, or upwards. Decreasing values describe a motion in posterior direction, or downwards. The absolute value of this quantity is not meaningful; one should define a suitable zero.
CouchRelativeX, CouchRelativeY, CouchRelativeZ describe the absolute position of the patient couch with respect to the laboratory frame of reference, again on the longitudinal, lateral, and vertical axis and with the same orientations as before.
XShift, YShift, and ZShift describe the position of the prostate relative to the ultrasound probe, which is fixed to the patient couch.
Thus, if one is interested in the physiological motion of the prostate, one should simply consider XShift, YShift, and ZShift. However, if one is interested in the absolute motion of the prostate, e.g. relative to the treatment beam, one should consider XShift + CouchRelativeX, YShift + CouchRelativeY, and ZShift + CouchRelativeZ, respectively.
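The two views of the prostate position can be computed as in the Python sketch below. It assumes that each file starts with a header row carrying the column names used above, that fields are separated by commas, and that decimals use a period; the helper names and the two sample rows are invented for illustration.

```python
import csv
import io

SHIFT_COLUMNS = ("XShift", "YShift", "ZShift")
COUCH_COLUMNS = ("CouchRelativeX", "CouchRelativeY", "CouchRelativeZ")


def read_positions(csv_file):
    """Return (relative, absolute) prostate positions in mm, one tuple per row."""
    relative, absolute = [], []
    for row in csv.DictReader(csv_file):
        shift = tuple(float(row[name]) for name in SHIFT_COLUMNS)
        couch = tuple(float(row[name]) for name in COUCH_COLUMNS)
        relative.append(shift)
        # Absolute position: prostate shift plus couch position, per axis.
        absolute.append(tuple(s + c for s, c in zip(shift, couch)))
    return relative, absolute


def zero_at_first_sample(track):
    """Subtract the first position so the trajectory starts at (0, 0, 0)."""
    origin = track[0]
    return [tuple(v - o for v, o in zip(point, origin)) for point in track]


# Invented sample rows; real files would be opened with open(path, newline="").
sample = io.StringIO(
    "XShift,YShift,ZShift,CouchRelativeX,CouchRelativeY,CouchRelativeZ\n"
    "1.5,-0.5,2.0,10.0,20.0,30.0\n"
    "1.75,-0.5,2.5,10.0,20.0,30.0\n"
)
relative, absolute = read_positions(sample)
print(zero_at_first_sample(relative))  # [(0.0, 0.0, 0.0), (0.25, 0.0, 0.5)]
print(absolute)  # [(11.5, 19.5, 32.0), (11.75, 19.5, 32.5)]
```

Using the first sample as zero is one possible choice of origin; any other reference sample would work in the same way.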

Technical Validation
The spatial resolution of the ultrasound system is specified by the manufacturer to about 0.2 mm 7. The overall geometric inaccuracy of a very similar setup due to inherent technical limitations was measured to be 0.6 mm laterally, 0.7 mm vertically, 0.5 mm longitudinally, and 1.1 mm radially ('vector length' or Euclidean '3D-distance'; the square root of the sum of squares of the three axes) consisting of random errors (per single measurement point) and systematic errors (effectively, per fraction) 10. The temporal resolution of the device is specified to about 2 Hz 7; data was in fact recorded at 1.6 Hz on average.
Table 1. Summary of input data and corresponding data file names.
The particular setup used in this study has been characterised before in detail 11. The discrepancy between ultrasound localisation and implanted gold markers detected by CBCT was 0.0 ± 1.7 mm laterally, 0.2 ± 2.0 mm longitudinally, and 0.3 ± 1.7 mm vertically. Using implanted gold markers as a reference, systematic errors for ultrasound localisation were 1.2 mm, 1.1 mm, and 0.9 mm; and random errors were 1.4 mm, 1.8 mm, and 1.6 mm, on lateral, longitudinal, and vertical axes, respectively. The majority of these errors stems from inter-modality comparisons; within the modality accuracy and repeatability were generally sub-millimeter. The setup was routinely gauged during weekly QA. The motion management system of the Clarity system was used to shut off the beam whenever the prostate position exceeded a certain threshold per axis. In such cases, the table position was manually corrected and the prostate position checked before treatment was resumed. However, this motion of the table did not produce any excessive acceleration that could have caused prostate motion of its own.
In particular, we checked 2 that the table motion was not visible in the prostate motion data as the prostate position was recorded relative to the table and not in absolute room coordinates. Therefore, our simulation resembles a situation before or without active prostate motion management and correction.
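The radial 'vector length' mentioned above is the Euclidean norm of the three axis components. A minimal Python sketch, with an invented displacement as input and an illustrative function name:

```python
import math


def vector_length(x, y, z):
    """Euclidean 3D distance: square root of the sum of squares of the axes."""
    return math.sqrt(x * x + y * y + z * z)


# Invented displacement of 1 mm, 2 mm and 2 mm on the three axes.
print(vector_length(1.0, 2.0, 2.0))  # 3.0
```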

Usage Notes
In our analysis, we first visually inspected the prostate trajectories one by one. The data features end-of-fraction outliers, e.g. caused by patients leaving their position after treatment is stopped. Such outliers occurred in 95 fractions (13% of fractions). In these cases, on average 3% of the data of the respective fraction was truncated. In total, 53 hours, 33 minutes and 2 seconds (99.6% of recorded data) entered the evaluation in our own papers.
We opted to leave the full original raw data in the deposited dataset, including outliers. However, to facilitate the preprocessing step of clipping the outliers, please refer to the Supplementary Table S1. For each fraction, it lists the beginning and end of the original recording (including outliers) and our suggested clipped durations.
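Clipping a fraction to a suggested duration amounts to keeping only the samples inside a time window. The layout of Supplementary Table S1 is not reproduced here, so the following Python sketch simply takes the window as two hypothetical parameters, start and end, in seconds on the same time axis as the samples; all values are invented.

```python
def clip_to_window(times, values, start, end):
    """Keep (time, value) pairs with start <= time <= end."""
    return [(t, v) for t, v in zip(times, values) if start <= t <= end]


# Invented example: the last two samples mimic an end-of-fraction outlier.
times = [0.0, 0.625, 1.25, 1.875, 2.5]
values = [0.1, 0.2, 0.1, 7.5, 9.0]
clipped = clip_to_window(times, values, start=0.0, end=1.25)
print(clipped)  # [(0.0, 0.1), (0.625, 0.2), (1.25, 0.1)]
```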
It is further useful to resample the data to reduce high frequency noise and to equalize the time intervals (readouts do not occur at perfectly regular intervals). In our analysis, we chose bins of five-second intervals.
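One way to resample into five-second bins is sketched below in Python. Taking the mean of all samples in a bin is an assumption made for this sketch, as the aggregation used within each bin is not specified here; the function name and the sample values are invented.

```python
from collections import defaultdict
from statistics import fmean


def resample_mean(times, values, bin_width=5.0):
    """Average all values falling into each bin of bin_width seconds.

    Returns (bin start time, mean value) pairs; empty bins are omitted.
    """
    bins = defaultdict(list)
    for t, v in zip(times, values):
        bins[int(t // bin_width)].append(v)
    return [(index * bin_width, fmean(samples))
            for index, samples in sorted(bins.items())]


# Invented samples: three in the first bin, two in the second.
times = [0.0, 0.625, 1.25, 5.0, 5.625]
values = [1.0, 2.0, 3.0, 4.0, 6.0]
print(resample_mean(times, values))  # [(0.0, 2.0), (5.0, 5.0)]
```

Times are expected in seconds relative to a chosen zero, so that bin boundaries fall at 0 s, 5 s, 10 s and so on.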
The data is publicly available from Open Data LMU under CC BY 4.0 license. There are no access controls in place. Use of the data is not limited.

Code availability
No custom code was used in the generation or processing of datasets. All data is provided as ASCII text in CSV format and can be processed without custom code.
An optional interactive Excel worksheet for convenient browsing, importing and resampling of the data is available upon request from the corresponding author.

Author contributions
Hendrik Ballhausen processed the dataset and wrote the data descriptor. Minglun Li was responsible for the collection of the data. Claus Belka designed the study and provided clinical oversight and guidance. All authors read and approved the final version of the data descriptor.