Introduction

Observational studies allow physicians to answer epidemiologic questions and to study the real-world effectiveness and generalizability of interventions. They are also well suited for measurement of uncommon or long-term outcomes. Administrative data are defined as data that are routinely collected for a purpose other than a specific research question.1 Canada’s universal health-care system provides complete administrative data for the population, and is ideally suited for observational studies.

A high-quality administrative data study requires a valid population or outcome. Selected data elements may or may not accurately identify a diagnosis or outcome.1 Misclassification of patients can bias the study results and hamper the interpretation of the study conclusions.2

Traumatic spinal cord injury (TSCI) patients are well suited for study with administrative data. TSCIs are rare, making prospective cohort studies challenging. Important secondary health complications occur over a patient’s lifetime; therefore, prospective follow-up and accurate retrospective studies are difficult and costly. Previous validation studies in this population have not included a ‘true negative’ population of patients to properly assess the specificity, sensitivity and negative predictive value of a data element for the identification of TSCI patients.3, 4, 5 No previous studies have examined the validation of the National Rehabilitation Reporting System (NRS) or physician billing data.

The primary objective of this study was to evaluate the diagnostic characteristics of data elements for identification of TSCI patients. The secondary objectives were to assess the agreement among lesion levels, and to describe the population of patients currently available in the administrative data of Ontario, Canada.

Materials and methods

We conducted a population-based retrospective validation study, using medical records as the ‘gold standard’.2 The administrative data originated from Ontario, Canada (population 13 million) and were made available for evaluation at the Institute for Clinical Evaluative Sciences (ICES). Ontario has universal healthcare provided through a single government payer. For this study, a TSCI was defined as an acute injury to the spinal cord or cauda equina caused by an external physical force resulting in neurological impairment.4 Data analysis was carried out at ICES. Study reporting and methods conform to recommendations for administrative data validation study reporting.1 Ethics approval was granted by Western University.

Data sources and quality

Reference standard

We created a validation cohort using individual patient medical records. All patients seen at the tertiary spinal cord clinic serving Southwestern Ontario (catchment 1.6 million people) after 1 January 2002 were reviewed. Patients were identified as true positive patients (inclusion criteria: TSCI occurring in Ontario between 1 April 2002 and 31 January 2012 and age >18 years at injury date) or true negative patients (inclusion criteria: non-TSCI or associated neurologic condition such as multiple sclerosis, transverse myelitis, spina bifida, or Guillain-Barré syndrome, age >18 years as of 1 April 2002) based on the hospital chart review. These patients were chosen as true negative patients because they all had neurologic conditions that could result in admission to hospital or rehabilitation, and could be miscoded as a TSCI.

There were a total of 1203 consecutive patients seen by one of four physiatrists, and 462 patients met our inclusion criteria. The most common reasons for patient exclusion were that the patient did not have a TSCI (for the true positive group) or a related condition (for the true negative group) (427), that the injury date was before 1 April 2002 (253), or that the age at injury was <18 years (53). Prespecified data elements (name, date of birth, Ontario Health Insurance Plan (OHIP) number, gender, diagnosis, and if applicable date and level of spinal cord injury (SCI)) were extracted from the medical records (by Dr Welk) using a template; he was blinded to the administrative data coding. Lesion level was determined based on the assessment made at the time of rehabilitation admission by the treating specialist. The accuracy of the SCI lesion level extraction in this study was verified using data from the Rick Hansen SCI Registry (RHSCIR),6 which has prospective data capture of TSCI patients.5 Of the 203 patients with an SCI that we identified, 56 were matched to their RHSCIR data record. Lesion level agreement was excellent, with an intraclass correlation coefficient of 0.90, and <6% of patients had disagreement about a cervical versus thoracic segment lesion.

Administrative data

Since 2002, all rehabilitation centers in Ontario have been required to submit standardized data to the NRS. All patients are assigned a Rehabilitation Client Group (RCG) code, specifying the diagnosis responsible for admission. All hospitals in Canada are required to submit hospitalization details to the Canadian Institute for Health Information Discharge Abstract Database (CIHI-DAD). The data quality of both the NRS and CIHI-DAD is maintained through variable verification and audit-feedback; reabstraction studies have shown >80% agreement with coding elements.7 Physician billing occurs through the OHIP, and surgical fee claims are expected to have a high sensitivity and positive predictive value (PPV), as shown with other service payments.8

Data analysis

Individuals within the data sources were linked deterministically through unique, encrypted numbers. Data elements from the following sources were evaluated for their diagnostic properties:

  1. NRS (RCG codes 04.21x, 04.2xx or 14.1/14.3) and CIHI-DAD (up to 25 ICD10 diagnosis codes may be coded per patient; based on the previous methodology,5, 9 codes S14.0/S14.1, S24.0/S24.1, S34.0/S34.1/S34.3 and T06.0/T06.1 were used). For patients from our chart abstraction true positive population, we determined if any of these codes were present within 60 days or 180 days of the SCI date as determined from the chart abstraction data. The two different outcome periods (60 versus 180 days) required us to define two TSCI cohorts based on the availability of NRS and CIHI-DAD records up to 31 March 2012: TSCI between 1 April 2002 and 31 January 2012 (for a minimum of 60 days of follow-up data), or TSCI between 1 April 2002 and 30 September 2011 (for a minimum of 180 days of follow-up data). For patients from our chart abstraction true negative population, we determined the presence of these codes anytime within a single observation window (1 April 2002–31 March 2012).

  2. OHIP: fee code E383A (introduced in October 2005 as an additional payment for surgical decompression/stabilization of TSCI patients within 6 weeks of injury). A process similar to that described for the NRS was used. The observation window was 1 October 2005–31 January 2013 (a longer observation window was possible due to the availability of more recent OHIP records).
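The windowing rule for the true positive population can be illustrated with a short Python sketch. It is not the study's SAS code: the function names, the choice to count the window forward from the chart-abstracted injury date, and the example dates are assumptions made for illustration; only the 31 March 2012 end of NRS and CIHI-DAD data availability is taken from the text.

```python
from datetime import date, timedelta

# End of available NRS and CIHI-DAD records, as stated in the text.
DATA_END = date(2012, 3, 31)

def has_full_followup(injury_date, window_days, data_end=DATA_END):
    """True if the whole window after the injury date is covered by the data.

    Assumption: the window is counted forward from the injury date.
    """
    return injury_date + timedelta(days=window_days) <= data_end

def code_within_window(injury_date, code_dates, window_days):
    """True if any qualifying code is dated within the window after injury."""
    window_end = injury_date + timedelta(days=window_days)
    return any(injury_date <= d <= window_end for d in code_dates)

# Hypothetical patient: injured 1 May 2010, one qualifying code 45 days later.
injury = date(2010, 5, 1)
codes = [date(2010, 6, 15)]
flag_60 = code_within_window(injury, codes, 60)
flag_180 = code_within_window(injury, codes, 180)
```

Under this reading, an injury on 31 January 2012 is the last date with a complete 60-day window, which matches the cohort boundary given above.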

Using a diagnostic test framework, we determined sensitivity, specificity and positive/negative predictive values. Our reference standard was the chart abstraction data, and our primary outcome was the presence of an incident TSCI (binary variable).
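For readers less familiar with the diagnostic test framework, the four measures follow directly from the 2×2 cross-tabulation of the reference standard against the administrative data flag. The Python sketch below is illustrative only; the function names and the five example patients are hypothetical and are not study data.

```python
def confusion_counts(reference, flagged):
    """Cross-tabulate reference-standard status against the data-element flag."""
    pairs = list(zip(reference, flagged))
    tp = sum(1 for r, f in pairs if r and f)          # TSCI, code present
    fn = sum(1 for r, f in pairs if r and not f)      # TSCI, code absent
    fp = sum(1 for r, f in pairs if not r and f)      # no TSCI, code present
    tn = sum(1 for r, f in pairs if not r and not f)  # no TSCI, code absent
    return tp, fp, fn, tn

def diagnostic_metrics(tp, fp, fn, tn):
    """Standard diagnostic test formulas (no guard against empty cells)."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Hypothetical example: chart review status vs. presence of a qualifying code.
reference = [True, True, True, False, False]
flagged = [True, True, False, False, True]
metrics = diagnostic_metrics(*confusion_counts(reference, flagged))
```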

Demographic characteristics were summarized for patients with traumatic SCI who were identified using the best performing algorithm. Socioeconomic status is a quintile rank based on the median neighborhood income. Rural residence is measured based on the population of a patient’s place of residence and the availability of local health services.10 Ontario drug benefit database contains the prescriptions for all patients in Ontario who are on disability or social assistance, or are over the age of 65 years.11

Statistical methods

Given our chart abstraction patient cohort with >200 patients in each group, and using an alpha of 0.05, we had 80% power to detect a diagnostic sensitivity of 90% (with a lower 95% confidence interval (CI) of 80%).12 Descriptive statistics (median, interquartile range (IQR)) are provided. Standard diagnostic test formulas were used.13 The Wilson score method was used to calculate 95% CIs.14 Unweighted Kappa was calculated to measure the observed agreement between the SCI lesion level as determined in our chart review cohort and the administrative data sources.15 Statistical analysis was performed with SAS 9.2 (SAS Institute Inc, Cary, NC, USA).
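The Wilson score interval and unweighted Kappa named above can be written out compactly. The following Python sketch uses the textbook formulas and is not the SAS code used in the study; the function names and the example inputs (90 of 100, and the short label lists) are hypothetical.

```python
from collections import Counter
from math import sqrt
from statistics import NormalDist

def wilson_interval(successes, n, confidence=0.95):
    """Wilson score confidence interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two equal-length label sequences."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of the marginal proportions, summed over labels.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    if expected == 1:
        return 1.0  # both sources use a single identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical inputs, not study data.
low, high = wilson_interval(90, 100)
chart = ["cervical", "cervical", "thoracic", "thoracic"]
admin = ["cervical", "thoracic", "thoracic", "thoracic"]
kappa = cohen_kappa(chart, admin)
```

Unlike the simple normal approximation, the Wilson interval stays within 0 to 1 for proportions close to either bound, which is relevant for the specificities near 100% reported here.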

Results

A total of 462 patients with a known diagnosis (221 with a TSCI and 241 without a TSCI) were successfully matched to administrative data records from the NRS, CIHI-DAD and OHIP databases. General characteristics of these patients are reported in Table 1. Each year of the observation period (2002–2012) contributed a median of 20 (IQR 18–25) TSCI patients. The date of the TSCI as extracted from the medical records compared with administrative data sources was very accurate (CIHI-DAD hospital admission date had a median difference of 0 days, 95% interval −2 to 16 days difference; NRS injury date had a median difference of 0 days, 95% interval −9 to 3 days difference).

Table 1 Characteristics of the patient cohort of 462 patients (from medical record review)

The diagnostic characteristics of the data elements in each of the administrative data records were determined (Table 2). All data sources were highly specific (≥94%), indicating a strong tendency for these data sources to correctly identify patients who do not have TSCI. Sensitivity was variable (64–94%); among the three data sources, the NRS RCG codes performed very well, with a sensitivity of 92% (indicating a good ability to correctly identify patients who did have TSCI) and a Kappa (as a measure of agreement) of 0.89. There was a negligible difference among the diagnostic test results when comparing the 60- and 180-day windows. Several combinations of these data elements were also investigated to see whether sensitivity and specificity could be further optimized (Table 2). Among all possible combinations of data elements, the optimal algorithm for specificity was a patient with both NRS and CIHI-DAD data elements present (which had a specificity of 99% (95% CI 97–100%)), and the optimal algorithm for sensitivity was NRS or CIHI-DAD data elements (which had a sensitivity of 94% (95% CI 90–97%)).
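The trade-off between the 'both' and 'either' algorithms follows from simple logic: requiring both data elements can only remove flagged patients (so specificity cannot fall), whereas accepting either can only add them (so sensitivity cannot fall). The Python sketch below demonstrates this on eight invented patient records; the records, variable names and resulting percentages are hypothetical and do not reproduce Table 2.

```python
def sens_spec(truth, flags):
    """Sensitivity and specificity of a flag against the reference standard."""
    tp = sum(1 for t, f in zip(truth, flags) if t and f)
    tn = sum(1 for t, f in zip(truth, flags) if not t and not f)
    positives = sum(truth)
    negatives = len(truth) - positives
    return tp / positives, tn / negatives

# Hypothetical records: (true TSCI, NRS code present, CIHI-DAD code present).
records = [
    (True, True, True), (True, True, False),
    (True, False, True), (True, False, False),
    (False, True, False), (False, False, False),
    (False, False, False), (False, False, True),
]
truth = [r[0] for r in records]
nrs = [r[1] for r in records]
dad = [r[2] for r in records]

nrs_only = sens_spec(truth, nrs)
both = sens_spec(truth, [a and b for a, b in zip(nrs, dad)])    # NRS and CIHI-DAD
either = sens_spec(truth, [a or b for a, b in zip(nrs, dad)])   # NRS or CIHI-DAD
```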

Table 2 Diagnostic performance of data elements from NRS, CIHI-DAD and OHIP databases for the identification of incident TSCI patients (percentage, 95% CI)

The RCG and ICD10 codes also specify the TSCI lesion level. The accuracy of these data elements compared with the true lesion level (based on the chart abstraction data as the reference standard) was determined (Table 3). There was generally moderate to substantial agreement15 for CIHI-DAD records (kappa=0.56–0.70), and substantial to almost perfect agreement with NRS records (kappa=0.65–0.88).

Table 3 Diagnostic performance of the individual coding elements for lesion level from the NRS and CIHI-DAD (percentage, 95% CI)

The optimal method to identify incident TSCI patients was through the NRS: a total of 1801 TSCI patients were identified between 1 April 2002 and 30 March 2012 in Ontario, Canada (Table 4). Each year contributed a median of 189 (IQR 186–195) patients. The optimal definition to identify prevalent TSCI patients was the presence of either an NRS or a CIHI-DAD code; 3427 patients were identified (all patients then have a common 1-year observation window from 1 April 2011 to 30 March 2012, Table 4).

Table 4 Characteristics of patient cohorts based on Ontario’s administrative data

Discussion

One of the potential limitations of administrative data is the unknown accuracy of the data elements. This can, however, be mitigated with an appropriate validation study to examine how well a data element identifies the variable it is assumed to represent.

We examined three different data sources for their ability to correctly identify TSCI patients. The NRS (which uses RCG diagnosis codes) is a Canada-wide rehabilitation database; in this study, it had excellent diagnostic test characteristics, with a high sensitivity (92%) and specificity (97%). The PPV of 96% means that of the patients who had TSCI-related NRS RCG codes, 96% were truly TSCI patients. CIHI-DAD ICD10 diagnosis codes had excellent specificity (99%); however, the sensitivity was lower (76%). This is similar to the results from ICD10 validation studies of TSCI patients done in British Columbia, Canada5 and Norway.4 One reason for this lower sensitivity compared with the NRS is that the diagnosis responsible for admission to rehabilitation is likely clearly identified, whereas during a hospital admission a patient may not have complete and accurate documentation of all their injuries. Data coders are also probably more familiar with the limited list of 17 RCG clusters used in the NRS as compared with the 12 000 ICD10 codes. OHIP records had poor sensitivity, which is probably because not all patients with a TSCI would be managed operatively.

The ability of the NRS and CIHI-DAD to identify basic information about lesion level was generally good, with specificities of 93%. The NRS and CIHI-DAD records are able to identify patients with a specific lesion level, but sensitivity is limited (50–94%). These results are similar to those published in a validation study that compared the CIHI-DAD lesion level data with the RHSCIR.5 Only the RCG codes for Paraplegia (suggesting a thoracic or lumbar SCI) had a low proportion of false negative patients (6%).

There are only a few validation studies available for comparison. Furlan and Fehlings3 found that the ICD10 codes from the Canadian National Trauma Registry were inaccurate in identifying 92 TSCI patients (sensitivity 81%, specificity 7% and PPV 30%). Small, informal validation studies also suggested that there was significant misclassification of patients based on hospital discharge ICD9 TSCI codes.16, 17 Hagen et al.4 found that a combination of TSCI ICD10 codes performed better than previous ICD iterations, with a sensitivity of 84%, a specificity of 97% and a PPV of 88%. Finally, Noonan et al.5 compared the accuracy of level-specific information from hospital discharge ICD10 codes with the prospectively collected lesion level in the RHSCIR. The modest sensitivity and high specificity demonstrated in their study are very similar to our CIHI-DAD results (Table 3).

Strengths of this validation study include the fact that we used an adequately powered sample to create our validation cohort, and included patients who could act as reference standard true negatives to determine specificity and negative predictive value. We manually reviewed medical records to ensure a correct diagnosis was assigned, and verified the accuracy of this chart abstraction using the RHSCIR.

There are also important limitations that should be acknowledged. This validation study only included patients who survived the initial period between TSCI and their first outpatient follow-up with their physiatrist. Therefore, we do not know whether the administrative data records can accurately identify all TSCI patients (such as those who die at the time of the accident, or before hospital discharge). The NRS database would not, under any circumstances, be able to identify patients who died before rehabilitation. This is an important consideration when assessing the incidence of TSCI, or if death is a study outcome. We extrapolated a TSCI level range from the RCG codes, which group lesion levels together as high/low quadriplegia or paraplegia. The prevalence of TSCI among our data sources is lower than in our validation cohort. TSCI is a rare event in all of the data sources, and because of this, the PPV (which is dependent on prevalence) is likely overestimated.18 Finally, this study was conducted using patients from Southwestern Ontario. The generalizability of this population’s administrative data coding may not extend to other jurisdictions. However, given that our results are generally similar to those from other regions in Canada5 and Europe,4 we feel our ICD10 CIHI-DAD results are broadly applicable.
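The dependence of PPV on prevalence noted above can be made concrete with Bayes' rule. The Python sketch below uses the NRS sensitivity (92%) and specificity (97%) reported in this study and the validation cohort composition (221 of 462 patients with TSCI); the 1% prevalence is a purely hypothetical value chosen for illustration and is not an estimate for any of the data sources.

```python
def ppv_from_prevalence(sensitivity, specificity, prevalence):
    """Positive predictive value via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Reported NRS characteristics and validation cohort prevalence (221/462).
ppv_validation = ppv_from_prevalence(0.92, 0.97, 221 / 462)

# Hypothetical low-prevalence setting (assumed 1%, for illustration only).
ppv_rare = ppv_from_prevalence(0.92, 0.97, 0.01)
```

With the validation cohort prevalence of roughly 48%, the computed PPV is about 0.97, in line with the reported 96%; at the assumed 1% prevalence it falls to about 0.24, even though sensitivity and specificity are unchanged.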

Conclusion

The RCG codes in the NRS have high sensitivity (92%) and specificity (97%) for the identification of true TSCI patients, whereas CIHI-DAD ICD10 codes are highly specific (99%) and moderately sensitive (76%). The RCG and the ICD10 codes have a reasonable ability to discriminate among lesion levels; however, there is a moderate amount of misclassification. There is a large population of TSCI patients identified within the administrative data holdings of Ontario. Future data linkage with prospective data sources, such as the RHSCIR, may expand the ability of administrative data to answer important health services and outcome-based research questions.

Data archiving

There were no data to deposit.