Prediction of violent reoffending in prisoners and individuals on probation: a Dutch validation study (OxRec)

Scalable and transparent methods for risk assessment are increasingly required in criminal justice to inform decisions about sentencing, release, parole, and probation. However, few such approaches exist and their validation in external settings is typically lacking. A total national sample of all offenders (9072 individuals released from prison and 6329 individuals on probation) from 2011–2012 in the Netherlands was followed up for violent and any reoffending over 2 years. The sample was mostly male (n = 574 [6%] were female prisoners and n = 784 [12%] were female probationers), and median ages were 30 in the prison sample and 34 in those on probation. Predictors for a scalable risk assessment tool (OxRec) were extracted from a routinely collected dataset used by criminal justice agencies, and outcomes were extracted from official criminal registers. OxRec's predictive performance in terms of discrimination and calibration was tested. Reoffending rates in the Dutch prisoner cohort were 16% for 2-year violent reoffending and 44% for 2-year any reoffending, with lower rates in the probation sample. Discrimination as measured by the c-index was moderate, at 0.68 (95% CI: 0.66–0.70) for 2-year violent reoffending in prisoners and between 0.65 and 0.68 for other outcomes and the probation sample. The model required recalibration, after which calibration performance was adequate (e.g. calibration in the large was 1.0 for all scenarios). A recalibrated model for OxRec can be used in the Netherlands for individuals released from prison and individuals on probation to stratify their risk of future violent and any reoffending. The approach that we outline can be considered for external validations of criminal justice and clinical risk models.


Variable | Sweden | Netherlands
Sex | Assigned at birth | Same definition
Age | Age at release from prison | Same definition
Immigrant | First or second generation immigrants (self or either parent born outside of Sweden) | Not included
Length of incarceration | Duration of incarceration for most recent offence | Same definition. Non-prisoners were assigned to the lowest category (<6 months)
Violent index offence | Most recent offence was homicide, assault, robbery, arson, any sexual offence (rape, sexual coercion, child molestation, indecent exposure, or sexual harassment), illegal threats, or intimidation | Most recent offence was a violent offence, including sexual offences and robbery
Previous violent crime | Any conviction for a violent offence prior to the most recent offence (i.e. before the index offence) |
 | | Using any of "heroin, cocaine, opiates or speed", "at least three times per week"
Any mental disorder | Diagnosis of any mental disorder excluding substance use disorders (lifetime: before or during incarceration) | Those classified as having "psychiatric problems over a long period"
Any severe mental disorder | ICD diagnosis of schizophrenia-spectrum or bipolar disorder (lifetime: before or during incarceration) | Not included
Note: Items in the Netherlands were extracted from the RISC database

Validation of OxRec (violent reoffending on release from prison): statistical analysis plan

Background and Study Summary
The OxRec (Oxford Risk of Recidivism) tool for predicting violent reoffending on release from prison was published in 2016, based on data from a Swedish cohort (Fazel et al., 2016). It is a Cox proportional hazards model based on a 14-item panel of risk factors, encompassing sociodemographic, criminal history and clinical variables.
A web calculator has also been published (https://oxrisk.com/oxrec). Upon entry of the 14 risk factors, this calculator estimates the probability of reoffending, expressed as a percentage, within 12 months and within 24 months. A categorical classification (risk level) of 'low' (probability < 10%), 'medium' (10-50%) or 'high' (> 50%), based on the 24-month predicted probability, is also assigned. Certain risk factors are permitted to be missing ('unknown'). In this case, the calculator returns a range of predicted probabilities, corresponding to the minimum and maximum that could have been obtained had the risk factor been available. If this range spans more than one risk level, all indicated risk levels are displayed.
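The risk-level rule described above can be illustrated with a minimal Python sketch. The function names are hypothetical and this is not the calculator's actual code; it assumes probabilities are supplied on a 0-1 scale and that the boundaries at exactly 10% and 50% fall in the 'medium' level, as the ranges in the text suggest.

```python
from typing import List

LEVELS = ["low", "medium", "high"]


def risk_level(prob: float) -> str:
    """Map a 24-month predicted probability (0-1 scale) to a risk level."""
    if prob < 0.10:
        return "low"
    if prob <= 0.50:
        return "medium"
    return "high"


def risk_levels_for_range(p_min: float, p_max: float) -> List[str]:
    """Return every risk level spanned by a range of predicted probabilities.

    Mirrors the described behaviour when a risk factor is 'unknown':
    the calculator reports a minimum and maximum probability, and all
    levels covered by that range are shown.
    """
    if p_min > p_max:
        raise ValueError("p_min must not exceed p_max")
    lo = LEVELS.index(risk_level(p_min))
    hi = LEVELS.index(risk_level(p_max))
    return LEVELS[lo:hi + 1]
```

For example, a range from 8% to 12% would be reported as both 'low' and 'medium', whereas a range from 20% to 30% would be reported as 'medium' only.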
The objective of the current study is to assess the predictive performance of OxRec using data from an external cohort.
Data are available from prison and probation reports from the Netherlands, from 2012, with two-year follow-up. Participants are individuals undergoing either (i) prison or (ii) probationary (non-prison) assessment at time of release from prison.
Outcomes are (i) Violent reoffending within 12 months of release; (ii) Violent reoffending within 24 months of release.

Data
For the validation study, data are available from approximately equal numbers of prisoners (792) and (probationary) non-prisoners (798) (cf. a cohort of 47,326 prisoners in the Swedish development dataset).

Sample size
Preliminary data suggest a total sample size of 792 prisoners, with 88 (11%) violent crime events within 12 months and 154 (19%) violent crime events within 24 months. In addition, data are available from 798 non-prisoners (approximately 133 events) (see section 'Cohort definition' below). These figures appear comparable to estimated probabilities of reoffending of 11% and 18% within 1 year and 2 years respectively in the Swedish development dataset.
A simulation study from 2005 recommended ballpark minimum figures of 100 events for validating a clinical prediction rule derived using a logistic regression model (Vergouwe et al., 2005). These conclusions were broadly supported by another, more recent, simulation study (Collins et al., 2016b). In practice, the sample size of the current study is limited by data availability, and in particular by the number of recurrent violent crime events within the maximum two-year follow-up period. As the objectives of the study include both validation and possible updating of the tool for the Dutch cohort if validation is found to be poor, no formal sample size calculation is performed.

Diagnostic performance measures
The analysis will evaluate the calibration and discrimination of the model (Collins et al., 2016a). Calibration is a measure of how close the predicted outcomes are to the observed outcomes. This measure is usually evaluated graphically through calibration plots, which compare the relationship between observed and predicted outcomes to a diagonal line. Discrimination is a measure of how well the model distinguishes between individuals who reoffended and those who did not. The most common measure of discrimination is the c-index, which can be adapted to survival models (Pencina et al., 2012). A list of the specific measures employed in this analysis is presented below (see section 'Presentation of results').
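To make the idea of discrimination concrete, the following Python sketch computes a pairwise concordance index for right-censored data. It uses Harrell's formulation as an illustrative assumption; it is not necessarily the estimator used in the study, and the function and argument names are hypothetical.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float],
                    events: Sequence[int],
                    risks: Sequence[float]) -> float:
    """Pairwise concordance between predicted risk and time to event.

    A pair (i, j) is comparable when individual i has an observed event
    (events[i] == 1) strictly before individual j's event or censoring
    time. The pair is concordant when i also has the higher predicted
    risk; tied risks count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored individuals cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to predictions that are no better than chance at ranking individuals, and 1.0 to perfect ranking.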

Strategy for analysis
Our approach to assessing the diagnostic performance of OxRec in the Dutch cohort will follow an incremental strategy that has been suggested previously (Steyerberg, 2009; Su et al., 2016). Details relating to the validation of survival models are also available (van Houwelingen, 2000). We will begin by assessing individual-level diagnostic performance with simple validation of the existing model in mind, without any adjustment of the model coefficients for the new population (Royston and Altman, 2013). If diagnostic performance is poor, we will consider a series of conservative updating steps to investigate the feasibility of adapting the prediction model to the Dutch population.
Specifically, we will carry out the following steps, proceeding in each case to the subsequent stage only if diagnostic performance is insufficient. For example, if the first (simple validation) stage indicates adequate performance, the study will be treated solely as a validation study and no additional updating will be performed. We anticipate basing the assessment of diagnostic performance between steps (i) to (iv) below on statistical significance (retaining the simplest version of the model for which the subsequent step provides no statistically significant improvement), and basing the final assessment of diagnostic performance on clinical utility, in particular in relation to the equivalent measures reported in the model development paper (Fazel et al., 2016).
(i) Simple validation: assess the diagnostic performance of the existing OxRec model directly using the Dutch cohort data. This means applying the existing model directly, without adjusting model coefficients and without adjusting the estimate of baseline risk. Calibration will be assessed both 'in-the-large' (i.e. to check whether the model adequately estimates the prevalence of events across the whole cohort) and at individual level using calibration plots and the estimated calibration slope (Steyerberg, 2009). Discrimination will be assessed using summary statistics similar to those reported in the original study, including the c-index.
(ii) Updating the baseline risk (analogous to the intercept), holding coefficients of predictor variables at their values from the current model. This may also be seen as adjusting for any systematic risk difference that may be attributable to missing covariates (see subsection below).
(iii) Updating the baseline risk and performing a simple re-calibration of the coefficients of the predictors. This allows for the inclusion of a single re-calibration parameter, i.e. the effect of predictors on the outcome would be assumed to be the same as in the original study, apart from rescaling.
(iv) Additionally re-estimating a subset of the model coefficients. This will be performed only as a last resort, as it is likely to lead to a substantively different prediction tool and would allow the effect of risk factors on the outcome to differ from those found in the original study. If possible, re-estimation will be restricted to a subset of parameters whose coefficients appear to be poorly calibrated, as the current study is unlikely to have enough events to re-estimate all parameters and is not intended to be a model development study.
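Steps (i) to (iv) can be summarised in terms of the predicted probability of reoffending by time t under a Cox model. The notation below is assumed for illustration and does not appear in the original plan: x is the predictor vector, \beta the original OxRec coefficients, \eta = \beta^\top x the linear predictor, S_0(t) the original baseline survival, S_0^{*}(t) a baseline survival re-estimated in the Dutch cohort, \gamma a single re-calibration slope, and \delta the adjustments to the coefficients of a subset A of predictors.

```latex
\begin{align*}
\text{(i) Simple validation:}\quad
  & \hat{p}(t \mid x) = 1 - S_0(t)^{\exp(\eta)} \\
\text{(ii) Updated baseline risk:}\quad
  & \hat{p}(t \mid x) = 1 - S_0^{*}(t)^{\exp(\eta)} \\
\text{(iii) Updated baseline and slope:}\quad
  & \hat{p}(t \mid x) = 1 - S_0^{*}(t)^{\exp(\gamma \eta)} \\
\text{(iv) Partial re-estimation:}\quad
  & \hat{p}(t \mid x) = 1 - S_0^{*}(t)^{\exp(\gamma \eta + \delta^\top x_A)}
\end{align*}
```

Under this notation, each step nests the previous one: step (iii) reduces to step (ii) when \gamma = 1, and step (iv) reduces to step (iii) when \delta = 0, which is what allows successive steps to be compared for statistically significant improvement.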

Cohort definition
The Dutch cohort contains approximately equal numbers of prisoners and probationary non-prisoners. The original tool was intended to be used upon release from prison; therefore primary validation will be performed and reported based on the Dutch prisoner cohort. A brief assessment will also be made in the non-prisoner cohort to test transportability to a group with qualitatively different characteristics.

Missing predictors and missing data
We anticipate that most predictor variables, and in particular the stronger predictors of reoffending, will be complete or substantially complete in the Dutch cohort. The variable 'severe mental illness' is likely to be either entirely missing or to be based on a significantly different definition from the one used in the original study. We will investigate the impact of imputing this low-prevalence variable in two ways: firstly, by setting it to zero in all participants (a form of mean imputation), and secondly, if required, using multiple imputation assuming the association with other variables is similar to that observed in the Swedish cohort.
The 'immigrant' variable is required to be omitted from the tool for use in the Dutch cohort. This variable will therefore be treated as if missing for the purpose of validation. We plan to assign an average value to this variable for all participants, equivalent to its effect being incorporated into the model intercept.
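The equivalence between assigning an average value and shifting the intercept can be shown in a short derivation. The notation is assumed for illustration: \bar{x}_{\mathrm{imm}} is the average value assigned to the 'immigrant' variable, \beta_{\mathrm{imm}} its original coefficient, and \eta_i^{(-)} the linear predictor for individual i based on the remaining predictors.

```latex
\begin{align*}
\eta_i &= \beta_{\mathrm{imm}} \bar{x}_{\mathrm{imm}} + \eta_i^{(-)}
        = c + \eta_i^{(-)},
        \qquad c = \beta_{\mathrm{imm}} \bar{x}_{\mathrm{imm}}, \\
S(t \mid x_i) &= S_0(t)^{\exp(c + \eta_i^{(-)})}
        = \left[ S_0(t)^{\exp(c)} \right]^{\exp(\eta_i^{(-)})}.
\end{align*}
```

Because c is the same for every individual, it acts only as a rescaling of the baseline survival, which is the survival-model analogue of the intercept; it therefore changes the overall level of predicted risk but not the ranking of individuals.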
Multiple imputation within the Dutch cohort alone will also be considered if there are other variables with non-negligible quantities of missing values (more than 20%). It is not possible to use multiple imputation, based on the Dutch data by itself, for variables that are entirely missing.
It is acknowledged that using multiple imputation is unlikely to reflect the use of the OxRec tool in practice, as the web calculator requires missing variables to be either arbitrarily set at a particular assumed value or (in some cases) to be retained as missing, in which case a range of predictions is presented. For this reason we will additionally consider a worst-case/best-case scenario analysis using the upper and lower limits of the probability estimate generated by the risk calculator. Further information on dealing with missing predictors in validation studies has been previously published (Held et al., 2016).

New predictors
The study will use the 14 predictor variables that make up the OxRec risk tool. No new predictors are available. It is acknowledged that some variable definitions may vary slightly between settings. In such cases, the variables thought to match most closely with those used in the OxRec tool have been chosen.

Censoring
In the original study, all individuals in the development cohort were used for model fitting, but individuals who were censored before the end-point without having had the outcome of interest were excluded for the purpose of summarising model performance. If the censoring in the validation sample is substantial, we will compare this approach to the Chambless/Diao estimator described in a recent methodological review on this subject (Kamarudin et al., 2017).

Predicted probabilities and risk categories
The current version of the web calculator presents both probabilistic results and two-year risk expressed as a risk category ('low', 'medium' or 'high'). In the validation study results will be primarily presented based on probabilities, as this is likely to be the more sensitive measure for detecting patterns of miscalibration. However, predictive performance of the two-year risk category will also be assessed using contingency tables and summary statistics.
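As an illustration of how risk categories could be cross-tabulated against observed outcomes, the sketch below builds a contingency table and derives sensitivity and specificity. Which levels count as a 'positive' prediction is a hypothetical choice made by the caller, not one specified in this plan, and the function names are illustrative.

```python
from collections import Counter
from typing import Dict, Sequence, Set, Tuple


def contingency_table(levels: Sequence[str],
                      outcomes: Sequence[int]) -> Dict[Tuple[str, int], int]:
    """Count individuals in each (risk level, observed outcome) cell."""
    return dict(Counter(zip(levels, outcomes)))


def sensitivity_specificity(levels: Sequence[str],
                            outcomes: Sequence[int],
                            positive_levels: Set[str]) -> Tuple[float, float]:
    """Sensitivity and specificity when the given levels count as positive."""
    tp = fp = tn = fn = 0
    for level, outcome in zip(levels, outcomes):
        predicted_positive = level in positive_levels
        if outcome == 1 and predicted_positive:
            tp += 1
        elif outcome == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one event and one non-event")
    return tp / (tp + fn), tn / (tn + fp)
```

Because there are three risk levels, two cut-points are possible ('medium' or above, and 'high' only), and each yields a different sensitivity and specificity pair.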

Presentation of results
Assessment will be presented using both calibration and discrimination, graphically and using tables and summary statistics as appropriate. These will include calibration plots together with estimated calibration intercept and slope; sensitivity and specificity; Brier score; c-index.
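Two of the summary statistics listed above can be sketched in a few lines of Python. This is a simplified illustration that assumes a binary outcome at a fixed time horizon with no censoring. The observed/expected ratio is shown as one common way of summarising calibration-in-the-large and is an assumption here, not necessarily the definition used in the study.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def observed_expected_ratio(probs: Sequence[float],
                            outcomes: Sequence[int]) -> float:
    """Observed number of events divided by the sum of predicted risks.

    A ratio of 1.0 means the model predicts the right number of events
    overall; above 1.0 indicates under-prediction and below 1.0
    over-prediction.
    """
    expected = sum(probs)
    if expected == 0:
        raise ValueError("sum of predicted probabilities is zero")
    return sum(outcomes) / expected
```

Lower Brier scores indicate better overall accuracy, with 0 corresponding to perfect predictions.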
The report will include descriptive information about the cohort and how it was defined, including any differences from the original cohort in definitions of variables. As far as possible we will follow published guidance on reporting validation study results (Bouwmeester et al., 2012; Collins et al., 2014; TRIPOD Group).