The prediction of suicide in severe mental illness: development and validation of a clinical prediction rule (OxMIS)

Assessment of suicide risk in individuals with severe mental illness is currently inconsistent, and based on clinical decision-making with or without tools developed for other purposes. We aimed to develop and validate a predictive model for suicide using data from linked population-based registers in individuals with severe mental illness. A national cohort of 75,158 Swedish individuals aged 15–65 with a diagnosis of severe mental illness (schizophrenia-spectrum disorders and bipolar disorder), comprising 574,018 clinical patient episodes between 2001 and 2008, was split into development (58,771 patients, 494 suicides) and external validation (16,387 patients, 139 suicides) samples. A multivariable derivation model was developed to determine the strength of pre-specified, routinely collected socio-demographic and clinical risk factors, and then tested in external validation. We measured discrimination and calibration for prediction of suicide at 1 year using specified risk cut-offs. A 17-item clinical risk prediction model for suicide was developed and showed moderately good measures of discrimination (c-index 0.71) and calibration. For risk of suicide at 1 year, using a pre-specified 1% cut-off, sensitivity was 55% (95% confidence interval [CI] 47–63%) and specificity was 75% (95% CI 74–75%). Positive and negative predictive values were 2% and 99%, respectively. The model was used to generate a simple, freely available, web-based probability-based risk calculator (Oxford Mental Illness and Suicide tool, or OxMIS) without categorical cut-offs. A scalable prediction score for suicide in individuals with severe mental illness is feasible. If validated in other samples and linked to effective interventions, using a probability score may assist clinical decision-making.


Statistical Analysis
Statistical analysis will be based on logistic regression, adjusting for risk factors, in each of two scenarios: (i) violent crime within 12 months of diagnosis; (ii) suicide within 12 months of diagnosis.
Here, 'diagnosis' refers to episodes of psychosis for which a diagnosis was recorded within the period 1st January 2001 to 31st December 2008. The terms 'diagnosis' and 'episode' are used interchangeably in this document.
The method of analysis will be the same in both scenarios, so for clarity the explanation below refers to scenario (i).
In each case, the follow-up period represented in the data-set lasts for at least 12 months after diagnosis, and so it is not necessary to consider the possibility of censoring when performing the analysis.

Random selection of diagnoses
Given a starting data-set of all patients and all patient episodes (diagnoses) between 1/1/01 and 31/12/08, patient episodes will be selected at random as follows:
- Select at random, with equal probability, one inpatient or outpatient visit per patient.
- Note that some patients have more than one inpatient or outpatient visit on the same day, and some visits comprise more than one diagnosis; only one inpatient or outpatient visit per day will be considered for random selection (if an inpatient visit and an outpatient visit are listed contemporaneously, the inpatient visit will be used).
- Extraneous episodes recorded as occurring within the period of an inpatient stay (because of recording errors or otherwise) will be identified in the data-set as having an episode date earlier than that of lower-numbered episodes for the same individual; these will also be removed before the random selection step.
- If the selected visit comprises a schizophrenia-spectrum diagnosis (with or without a bipolar diagnosis), this is considered the diagnosis at time of selection; otherwise bipolar disorder is the diagnosis at time of selection.
- For inpatient stays, the assumed date of diagnosis is the date the inpatient stay ends.
- The random selection of diagnoses defines the final data-set for analysis, i.e. there will be no subsequent resampling of diagnoses.
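As a minimal sketch, the selection rule above could be implemented as follows. The record layout, field names and example values are invented for illustration and are not the registers' actual schema; the real analysis would operate on the linked register extract.

```python
import random

# Hypothetical episode records; fields and values are illustrative only.
episodes = [
    {"patient": 1, "date": "2003-05-01", "setting": "inpatient", "dx": "schizophrenia"},
    {"patient": 1, "date": "2003-05-01", "setting": "outpatient", "dx": "bipolar"},
    {"patient": 1, "date": "2006-02-10", "setting": "outpatient", "dx": "bipolar"},
    {"patient": 2, "date": "2002-11-30", "setting": "outpatient", "dx": "bipolar"},
]

def select_one_visit_per_patient(episodes, seed=0):
    rng = random.Random(seed)
    # Keep one visit per patient-day, preferring inpatient over outpatient
    # when both are listed contemporaneously.
    by_day = {}
    for ep in episodes:
        key = (ep["patient"], ep["date"])
        if key not in by_day or ep["setting"] == "inpatient":
            by_day[key] = ep
    # Then draw one remaining visit per patient with equal probability.
    by_patient = {}
    for ep in by_day.values():
        by_patient.setdefault(ep["patient"], []).append(ep)
    return {pid: rng.choice(visits) for pid, visits in by_patient.items()}

selected = select_one_visit_per_patient(episodes)
```

The `seed` argument is only there to make the sketch reproducible; removal of extraneous within-stay episodes would happen before this step, as described above.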

Adjustment for risk factors
Risk factors will be considered in three groups, of decreasing levels of priority. The 'List of risk factor variables' table in the Appendix specifies the group to which each variable is assigned.
Group 1 consists of variables that will be included in the statistical model regardless of statistical significance. These include demographic characteristics that it is necessary to include to ensure the model has face validity, and other risk factors strongly suspected on the basis of previous research to be associated with the two outcome measures.
Group 2 consists of variables likely to show an association with outcomes but which are not required to be included to achieve face validity. This group includes measures of prescribed treatment, a categorical variable indicating the diagnosis of the individual and proxy measures of disease severity, such as whether the diagnosis occurred while the patient was an inpatient or an outpatient. The model will use a backwards stepwise selection procedure to determine whether to retain these variables in the model, with Group 1 variables always retained and Group 2 variables sequentially rejected in order of p-value until no group 2 variables remain that have p-values greater than 0.1 (Royston and Sauerbrei, 2008).
Group 3 consists of variables for which there is weaker prior evidence to suggest that they will be associated with outcomes. This group includes a deprivation measure, a measure of marital status, an income measure and several variables relating to parental outcomes or characteristics. The model will use a backwards stepwise procedure to determine whether to retain these variables, holding fixed any variables from Groups 1 and 2 that are already included. Group 3 variables will be retained only if they have p-values less than 0.1.
This strategy of risk factor adjustment recognises that the final model must demonstrate face validity, whilst simultaneously allowing the inclusion of additional risk factors if they show an association with outcome variables. The variables are considered in three groups in this way to recognise that a parsimonious model is preferable (i.e. easier to use in practice), provided that it has acceptable predictive ability.
The same set of candidate covariates will be used for both models (barring the few exceptions listed in Appendix 1).
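The grouped backwards elimination described above can be sketched as follows. The `pvalue_fn` oracle and the variable names are placeholders: in the real analysis each step would refit the logistic regression and recompute p-values, and Group 3 variables would be handled in a second pass with Groups 1 and 2 held fixed.

```python
def backwards_select(always_keep, candidates, pvalue_fn, threshold=0.1):
    """Backwards elimination: `always_keep` variables are retained regardless;
    `candidates` are dropped one at a time, worst p-value first, until every
    remaining candidate has a p-value at or below `threshold`.

    pvalue_fn(current_vars) must return {var: p} for the fitted model --
    here it stands in for refitting the regression at each step."""
    kept = list(candidates)
    while kept:
        pvals = pvalue_fn(always_keep + kept)
        worst = max(kept, key=lambda v: pvals[v])
        if pvals[worst] <= threshold:
            break
        kept.remove(worst)
    return always_keep + kept

# Toy p-value oracle for illustration (values are made up).
fixed_p = {"age": 0.001, "sex": 0.2, "treatment": 0.03,
           "inpatient": 0.5, "dx": 0.15}
model_vars = backwards_select(
    ["age", "sex"], ["treatment", "inpatient", "dx"],
    lambda vars_: {v: fixed_p[v] for v in vars_})
```

With these toy p-values, "inpatient" (p = 0.5) and "dx" (p = 0.15) are rejected in turn, while "treatment" (p = 0.03) survives alongside the always-retained Group 1 variables.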

Clustering by family
It is likely that the data-set will exhibit clustering by family: sibling or parent-child relationships for which a diagnosis of major psychosis occurs for more than one member of the same family. If this is the case, the model will include a single random effect term to take into account family membership.

Competing risks
A competing risk model, which might model jointly the probability of two or more events (such as violent crime and death), will not be used for the main analysis, for two main reasons: firstly, the outcome variable will be analysed as binary, and secondly, every individual in the data-set is followed up for at least twelve months after the initial time point. The main concern in failing to allow for competing risks (in this case, death) is inappropriately treating individuals as censored if they experience the competing risk event (Putter, 2007). However, in this study the length of follow-up means that there are no individuals lost to follow-up during the twelve-month period of interest, and so any individual who has experienced the competing risk before the outcome can simply be assigned a zero value for the outcome.
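The outcome coding implied by this handling of the competing risk can be made concrete as below. Times are expressed as days since diagnosis, with `None` meaning the event did not occur; this representation is an assumption of the sketch, not the data-set's actual coding.

```python
def outcome_within_12m(event_day, death_day, window=365):
    """Binary outcome under complete 12-month follow-up: the outcome event
    counts only if it occurs within the window and before death; anyone who
    dies first (the competing risk) simply scores 0, with no censoring."""
    if event_day is not None and event_day <= window:
        if death_day is None or death_day >= event_day:
            return 1
    return 0
```

For example, an event on day 100 with no death scores 1, while a death on day 100 followed by no event (or an event recorded after day 365) scores 0.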

Linearity and additivity assumptions
Smoothed partial residual plots will be used to check the assumption that continuous covariates are linearly associated with the log odds of the outcome. If there is a clear departure from linearity, fractional polynomial terms will be included, as appropriate (Royston and Sauerbrei, 2008). Covariates that may have an extremely skewed distribution, such as the number of previous hospital admissions, may be dichotomised if model checking reveals poor fit when the variable is included as a continuous measure; it is not possible to anticipate the extent to which this will be necessary, so this will be a pragmatic decision based on the goodness-of-fit of the model, with the intention of guarding against extreme values of the covariate that may unduly affect patterns of associations or measures of predictive accuracy. Interactions between covariates will not be considered as possible predictors.
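As a reference point, the standard first-degree fractional polynomial (FP1) candidate set of Royston and Sauerbrei uses the powers below, with power 0 conventionally defined as the natural logarithm. The sketch only generates the candidate transforms of a positive covariate; choosing among them would be done by comparing model deviances.

```python
import math

# Standard FP1 powers (Royston & Sauerbrei); x must be positive.
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)

def fp_transform(x, p):
    """Fractional polynomial transform: x**p, with p == 0 meaning log(x)."""
    return math.log(x) if p == 0 else x ** p

def fp1_candidates(x):
    """All eight candidate FP1 transforms of a single positive value."""
    return {p: fp_transform(x, p) for p in FP_POWERS}
```

For instance, `fp_transform(4, 0.5)` gives the square-root transform, 2.0.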

Missing data
Covariates that have more than 30% missing data will be excluded (an exception is made for the recent treatment variables, for which there is a particular pattern of missing data; see the 'Recent treatment variables' subsection below). Missing data on covariates with at most 30% missing data will be imputed via multiple imputation (with twenty imputations) using a regression model that uses other risk factors and the outcome variables as explanatory variables (Sterne et al., 2009). Estimates of coefficients in the final prediction rule will be combined across imputations, using standard methodology (Barnard and Rubin, 1999). It is possible that the variable selection procedure would give different sets of variables in different imputations of the data-set; for this reason, the 'RR' method described by Wood et al. (2008) will be used. In this method, summary estimates are computed by combining information across all imputed data-sets at each stage as part of a single variable selection process. The missing indicator method will not be used, even though treatment data may reasonably be regarded as missing completely at random during the period 2001-2004. In observational studies, this method has been shown empirically to bias the effect size estimates of other variables in the model even if the missing completely at random assumption holds (Groenwold et al., 2012).
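The "standard methodology" for combining estimates across imputations is Rubin's rules, which for a single coefficient can be written as follows (a sketch for one coefficient; the real analysis pools the full coefficient vector across the twenty imputations).

```python
import math

def rubin_pool(estimates, variances):
    """Pool one coefficient across m imputed data-sets by Rubin's rules:
    point estimate = mean of the per-imputation estimates; total variance =
    mean within-imputation variance + (1 + 1/m) * between-imputation variance.
    Returns (pooled estimate, pooled standard error)."""
    m = len(estimates)
    qbar = sum(estimates) / m
    within = sum(variances) / m
    between = sum((q - qbar) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between
    return qbar, math.sqrt(total)
```

With estimates 1.0, 1.2 and 0.8 (each with within-imputation variance 0.04), the pooled estimate is 1.0 and the pooled standard error reflects both within- and between-imputation uncertainty.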

Recent treatment variables
Recent treatment variables are expected to have a high proportion of missing data values because information on treatment within the preceding six months is not available for episodes dated before the beginning of 2006. For this reason, it appears reasonable to regard the missing values for these four variables as missing completely at random; in this scenario, there is evidence that multiple imputation models demonstrate good performance, and are preferable to complete-case analysis, even when the proportion of missing data is high (White and Carlin, 2010; Lee and Carlin, 2012). Therefore multiple imputation will be performed on these variables in the same way as described in the section 'Missing data'. As a sensitivity analysis, the final model (using the same set of covariates) will be fitted using complete-case data from 2006-2008 only, and any substantial differences in point estimates noted. Standard errors of estimates in this complete-case analysis are expected to be larger owing to the smaller sample size.

Validation and goodness of fit
To assess the external validity of the model, a sub-sample of geographical regions (the 'validation sample') (Bleeker et al., 2003), based on the residential geographical location of the individual at the time of diagnosis, will be selected and removed from the data-set used to fit the model. Geographical regions are shown in Appendix 2. Regions are primarily based on the counties of Sweden, derived from the first two digits of the SAMS code.
Exceptions are the municipalities of Gothenburg and Malmö, which are separated from their respective counties, and Stockholm municipality, which is separated from its county and sub-divided into northern and southern parts by identifying each SAMS area with the historical province in which it is located. Regions are allocated to four groups, which are proxy measures of urban/rural status: the four urban areas (Group 1); the three counties in which the urban areas are located (Group 2); four counties with low population (Group 3); and all other counties (Group 4). The external validation sample will be selected by choosing at random, with equal probability, one region from each of the first three groups, and selecting sequentially from the fourth group until a minimum of 180 violent crime cases in total has been reached (see Appendix 2 for details). This guarantees that the number of cases in the external validation sample will be large enough for a useful assessment of external validity to be made (Vergouwe et al., 2005).
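The region-sampling rule can be sketched as below. The group structure, region names and case counts are invented for illustration (the true allocations are in Appendix 2), and the sketch assumes Group 4 regions are taken in random order, which is one reasonable reading of "selecting sequentially".

```python
import random

def draw_validation_regions(groups, cases, min_cases=180, seed=0):
    """Pick one region at random from each of Groups 1-3, then keep adding
    Group 4 regions until the validation sample holds >= min_cases events.
    `groups` maps group number -> list of regions; `cases` maps region ->
    observed number of outcome events in that region."""
    rng = random.Random(seed)
    chosen = [rng.choice(groups[g]) for g in (1, 2, 3)]
    pool = list(groups[4])
    rng.shuffle(pool)  # assumed ordering of the 'sequential' selection
    while sum(cases[r] for r in chosen) < min_cases and pool:
        chosen.append(pool.pop())
    return chosen

# Illustrative inputs only.
groups = {1: ["Stockholm"], 2: ["Stockholm County"], 3: ["Gotland"],
          4: ["County A", "County B", "County C"]}
cases = {"Stockholm": 100, "Stockholm County": 40, "Gotland": 10,
         "County A": 20, "County B": 20, "County C": 20}
regions = draw_validation_regions(groups, cases)
```

With these made-up counts, the three mandatory regions contribute 150 cases, so two Group 4 regions are added to clear the 180-case floor.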
All remaining data (the 'model development sample') will be used in the development of the model, to determine which risk factors are to be included. As one of the objectives is to develop a model that can be used externally (in different geographical settings), the geographical location itself is not considered as a candidate predictor in the model.
Once a 'final' model has been found using the steps outlined above, its internal validity will be assessed using bootstrapping to estimate its predictive accuracy (Harrell et al., 1996). The bootstrapping step will use repeated resampling of families (rather than individuals; see the section 'Clustering by family') in order to preserve within-family correlation. The performance of the model will also be assessed using the external validation sample. Predictive accuracy will be summarised using several summary measures, including the concordance index (Harrell et al., 1982), the Brier score (Brier, 1950) and the net reclassification index (Pencina et al., 2008). The proportions of predicted and observed events at different levels of predicted probability will be compared using a calibration plot.
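Two of these summary measures are simple enough to define directly. The concordance index below uses the standard definition for a binary outcome (ties count one half); the quadratic-loop implementation is purely illustrative and would be replaced by a rank-based computation at register scale.

```python
def c_index(probs, outcomes):
    """Concordance index for a binary outcome: among all case/non-case
    pairs, the fraction in which the case received the higher predicted
    probability, counting tied predictions as one half."""
    pairs = conc = 0.0
    for pi, yi in zip(probs, outcomes):
        for pj, yj in zip(probs, outcomes):
            if yi == 1 and yj == 0:
                pairs += 1
                conc += 1.0 if pi > pj else 0.5 if pi == pj else 0.0
    return conc / pairs

def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)
```

For example, with predictions (0.9, 0.8, 0.3, 0.1) and outcomes (1, 0, 1, 0), three of the four case/non-case pairs are concordant, giving a c-index of 0.75.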
If it is deemed appropriate to split the predicted probabilities into just two categories ('Low' and 'High'; see below), sensitivity, specificity and positive and negative predictive values will also be reported.
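Should such a dichotomisation be used, the four reported measures follow directly from the two-by-two classification table, as sketched below (the 1% default cut-off mirrors the abstract's pre-specified threshold; the sketch assumes no empty margins).

```python
def classification_measures(probs, outcomes, cutoff=0.01):
    """Dichotomise predicted probabilities at `cutoff` ('High' vs 'Low') and
    report sensitivity, specificity, PPV and NPV from the resulting table."""
    tp = sum(p >= cutoff and y == 1 for p, y in zip(probs, outcomes))
    fp = sum(p >= cutoff and y == 0 for p, y in zip(probs, outcomes))
    fn = sum(p < cutoff and y == 1 for p, y in zip(probs, outcomes))
    tn = sum(p < cutoff and y == 0 for p, y in zip(probs, outcomes))
    return {"sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "ppv": tp / (tp + fp),
            "npv": tn / (tn + fn)}
```

Note that with a rare outcome such as suicide, the PPV will be low and the NPV high at almost any cut-off, which is why the tool itself reports a probability rather than a category.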

Presentation of findings
The main output of the model will be a predictive probability, indicating the probability of occurrence of the outcome of interest within 12 months. The estimated coefficients of individual risk factors will be examined with a view to: (i) simplifying the prediction rule in order to make it easier to use in practice, for example by using integer-valued coefficients (including possibly resetting Group 1 variables with extremely poor predictive power to have a coefficient of zero) or even by dichotomising numerical variables, provided this does not compromise its predictive accuracy; and (ii) justifying a categorisation of the predicted probability into risk categories (for example 'Low risk' (<5%), 'Medium risk' (5-10%) and 'High risk' (>10%)). The latter would also benefit from an assessment of the number and characteristics of individuals that fall into the proposed categories in the model development sample and the validation sample.
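The example categorisation in (ii) can be written down directly; the 5% and 10% thresholds are the ones quoted in the text as examples, not final choices.

```python
def risk_category(prob, low=0.05, high=0.10):
    """Illustrative three-way categorisation of a predicted probability:
    'Low risk' below 5%, 'Medium risk' from 5% to 10%, 'High risk' above 10%."""
    if prob < low:
        return "Low risk"
    if prob <= high:
        return "Medium risk"
    return "High risk"
```

The boundary convention (whether exactly 5% or 10% falls in the middle band) is an assumption of this sketch and would need to be fixed explicitly if categories were adopted.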

Generalisability
Variables are defined in such a way as to help the generalisability of the model to other settings (see table below). For instance, deprivation will be included in the model in deciles rather than as a continuous variable in order that other deprivation measures that are used in other settings can be included in the same model to obtain predicted probabilities. When the final model is used for prediction, it is possible that some variables that are included as covariates may be missing. We will provide guidelines for using the model for prediction in this scenario, for example by using mean imputation of the missing variable.
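Mean imputation at prediction time amounts to substituting the development-sample mean into the linear predictor when a covariate is unavailable, as in the sketch below. The coefficient values, means and variable names are invented; they are not the fitted OxMIS coefficients.

```python
import math

# Illustrative development-sample means for continuous predictors.
DEV_MEANS = {"age": 41.0, "prior_admissions": 2.3}

def linear_predictor(coefs, intercept, values):
    """Evaluate intercept + sum(beta * x), filling in the development-sample
    mean for any predictor whose value is missing (None or absent)."""
    lp = intercept
    for name, beta in coefs.items():
        x = values.get(name)
        if x is None:
            x = DEV_MEANS[name]  # mean imputation of the missing variable
        lp += beta * x
    return lp

def predicted_probability(coefs, intercept, values):
    """Logistic transform of the linear predictor."""
    return 1 / (1 + math.exp(-linear_predictor(coefs, intercept, values)))
```

For example, with a single coefficient of 0.1 on age and an intercept of -4.1, a patient with missing age is scored at the mean age of 41, giving a linear predictor of 0 and a predicted probability of 0.5.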