Abstract
The use of precision medicine is poised to increase in complex injuries such as traumatic brain injury (TBI), whose multifaceted comorbidities and personal circumstances create significant challenges in the domains of surveillance, management, and environmental mapping. Population-wide health administrative data remains a rather unexplored, but accessible data source for identifying clinical associations and environmental patterns that could lead to a better understanding of TBIs. However, the amount of data structured and coded by the International Classification of Disease poses a challenge to its successful interpretation. The emerging field of data mining can be instrumental in helping to meet the daunting challenges faced by the TBI community. The report outlines novel areas for data mining relevant to TBI, and offers insight into how the above approach can be applied to solve pressing healthcare problems. Future work should focus on confirmatory analyses, which subsequently can guide precision medicine and preventive frameworks.
Introduction
With the enormous progress in the consolidation of large clinical datasets and national registries in modern healthcare1,2, vast amounts of personal, clinical, and environmental data are increasingly becoming available for research3. This presents an opportunity to identify novel associations and complex patterns of patient morbidity, personal circumstances, treatment seeking behaviours, and care over time, promoting scientific advancements in personalized medicine for the most complex disorders and injuries3,4,5.
Traumatic brain injury (TBI), defined as structural and/or physiological disruption of brain function as a result of an external force6, is rapidly becoming a major challenge faced by healthcare systems worldwide7. When internationally reported numbers are extrapolated, it is estimated that 50–60 million individuals are affected each year, and it is predicted that close to 50% of the world’s population will sustain a TBI in their lifetime8. The clinical view of TBI has shifted in the last decade from that of an injury event, to a chronic disorder with lifelong effects on both morbidity and mortality9, which expedites development of new clinical entities (comorbidities) over time, bringing complexity to its management10. Recently, TBI has been recognized as a consequence of multiple comorbid disorders which can potentiate or modify the risks associated with falls or adverse behaviors (e.g., assault and domestic abuse), including but not limited to depressive and substance-use disorders, epilepsy, vascular disease, psychosis, and medication effects11,12,13,14,15,16,17,18,19,20. Adding to this complexity, any single known adverse determinant of health (e.g. advanced age, socioeconomic deprivation, and gender inequality)21 can be implicated in the development of multiple comorbid disorders, thus increasing vulnerability to injury as a result of decreased physical and cognitive reserves22.
The World Health Organization (WHO) identified the prevention of injuries as a priority given the projected 40% increase in global deaths due to injury23 between 2002 and 2030. Likewise, the United States Congress24, through Public Law 110–206, highlighted injury surveillance as a federal priority given the drastic increase in emergency department visits and hospitalizations for TBI over the past decades25. Primary prevention efforts are those designed to prevent the initial injury26. Although several studies describe medical and environmental factors associated with an increased risk of TBI, including low socioeconomic status, the youngest and oldest age groups, male sex, and place of residence27,28,29,30, these studies are not population-based, and their results may be subject to a ceiling effect imposed by the research hypotheses, by the selected populations with their wide variety of providers and specialists, and by researcher knowledge and expertise. Based on the fundamental assumption that causal factors of TBI can be identified through the systematic examination of different populations, or of subgroups within populations31, successful prevention of TBI is theoretically possible with comprehensive concurrent evaluation of personal, clinical, and environmental contexts in the period prior to TBI32 in populations and population subgroups.
Here we describe exploratory research that applies a non-hypothesis-driven data mining approach used in genomics33,34 to a population that used emergency and acute care resources following TBI, comparing it to a reference population (individually matched on age, sex, income level, and place of residence) that used emergency and acute care resources for reasons other than TBI. The focus of this research is not only the data mining methodology, but also the results obtained through this approach, which examined more than 70,000 diagnosis codes of the International Statistical Classification of Diseases and Related Health Problems, Tenth Revision (ICD-10), recorded within the five years preceding the TBI event.
Methods
Data sources
Residents of Ontario, Canada’s largest province, have universal public health insurance covering all medically necessary services. The Institute for Clinical Evaluative Sciences (ICES)35 houses high quality health administrative data on a wide variety of publicly funded services provided to residents, including but not limited to individual-level information on emergency departments (ED) (identified in the National Ambulatory Care Reporting System, NACRS), and acute care visits (identified in the Discharge Abstract Database, DAD), within the province. The NACRS and DAD contain hospital records with diagnoses35, among other personal and environmental data, which are indicated by entries under the ICD-10 Canadian Enhancement (ICD-10-CA)36.
Study design and big data
An observational study was conducted using health administrative data of all patients discharged from the ED or acute care in the province between the fiscal years 2007/08 and 2015/16 with a diagnostic code for TBI37 (ICD-10-CA codes S02.0, S02.1, S02.3, S02.7, S02.8, S02.9, S04.0, S07.1, and S06). Personal, clinical, and environmental data for each patient were stored at the ICES; data collected in the five years prior to each TBI event were extracted for each patient and used in the analysis. A 10% random sample of patients discharged from the ED or acute care during the same study period for a reason other than TBI, individually matched to TBI patients by age, sex, income level, and place of residence (urban vs. rural), was used as a reference population. The first incident of TBI was chosen as the index date for patients with TBI, whereas for the reference population the midpoint of each patient’s ED or acute care visits was selected. To protect against overfitting and for internal validation, the matched dataset was split into three datasets, i.e., training, validation, and testing, with an allocation of 50%, 25%, and 25%, respectively38.
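The 50%/25%/25% allocation described above can be illustrated with a minimal sketch. It assumes that the split is made at the level of the matched pair, so that a TBI patient and the matched reference patient always land in the same dataset; the text does not state this explicitly. The function name `split_matched_pairs`, the pair identifiers, and the seed are hypothetical.

```python
import random


def split_matched_pairs(pair_ids, seed=0):
    """Randomly allocate matched pairs to training/validation/testing (50/25/25).

    Assumption: each identifier stands for one TBI patient together with the
    matched reference patient, so a pair is never split across datasets.
    """
    ids = list(pair_ids)
    rng = random.Random(seed)  # fixed seed so the allocation is reproducible
    rng.shuffle(ids)
    n_train = len(ids) // 2
    n_valid = (len(ids) - n_train) // 2
    return {
        "training": ids[:n_train],
        "validation": ids[n_train:n_train + n_valid],
        "testing": ids[n_train + n_valid:],
    }


# Hypothetical example with 1,000 matched pairs
splits = split_matched_pairs(range(1000), seed=42)
```

An exact-count split like this one differs slightly from a per-pair random draw, which would yield approximate rather than exact proportions, as in the dataset sizes reported in the Results.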
Statistical approach
An association analysis was conducted using ICD-10-CA codes for every patient with TBI and every matched patient from the reference population. All ICD-10-CA codes across the 10 and 25 diagnosis fields of the NACRS and the DAD, respectively, were converted into 2600 binary variables using the first three characters of each code; the individual codes are nested within these three-character blocks. This was done for each patient’s visits during the five-year period preceding the first TBI event, with the exception of the provisional codes for research and temporary assignments, U98 and U99. Following this, a histogram of the days from the index date of hospital visits for all TBI patients was constructed. A peak was observed around the index date, with the frequency dropping to a stationary point 30 days before and after the index date (Supplementary Figs 1 and 2). This 60-day window was therefore defined as the TBI-related window, whereas all ED and acute care visits within five years and up to 30 days prior to a TBI event were considered to fall in the pre-injury phase and were the focus of this study. A similar procedure was performed for each patient in the reference population sample, with the exception that the midpoint of each patient’s hospital visits was selected as the index date.
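The conversion of diagnosis codes into binary three-character variables restricted to the pre-injury phase can be sketched as follows. This is an illustration only, not the SAS/R code used in the study. It assumes that the 2600 variables correspond to every letter-plus-two-digit combination (26 × 100), that five years can be approximated as 5 × 365 days, and that the 30-day boundary is inclusive; the function names and the example patient are hypothetical.

```python
from datetime import date, timedelta

EXCLUDED_BLOCKS = {"U98", "U99"}  # provisional codes excluded in the text

# Assumption: 2600 variables = every letter A-Z followed by two digits 00-99
ALL_BLOCKS = [
    f"{letter}{number:02d}"
    for letter in "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    for number in range(100)
]


def pre_injury_blocks(visits, index_date, years=5, washout_days=30):
    """Return the three-character blocks recorded in the pre-injury phase.

    visits: iterable of (visit_date, [icd_codes]) tuples for one patient.
    Visits closer than `washout_days` to the index date are treated as part
    of the TBI-related window and are ignored.
    """
    window_start = index_date - timedelta(days=365 * years)  # approximate
    window_end = index_date - timedelta(days=washout_days)
    blocks = set()
    for visit_date, codes in visits:
        if not (window_start <= visit_date <= window_end):
            continue
        for code in codes:
            block = code.replace(".", "").upper()[:3]
            if len(block) == 3 and block not in EXCLUDED_BLOCKS:
                blocks.add(block)
    return blocks


def indicator_row(blocks, all_blocks=ALL_BLOCKS):
    """One 0/1 value per three-character block, in a fixed column order."""
    return [1 if b in blocks else 0 for b in all_blocks]


# Hypothetical patient with an index date of 1 June 2012
index = date(2012, 6, 1)
visits = [
    (date(2010, 3, 14), ["F10.1", "S52.5"]),
    (date(2011, 1, 5), ["U98", "I10"]),   # U98 is dropped, I10 is kept
    (date(2012, 5, 20), ["S06.0"]),       # inside the TBI-related window
    (date(2005, 1, 1), ["E11"]),          # older than five years
]
blocks = pre_injury_blocks(visits, index)
row = indicator_row(blocks)
```

For the reference population the same functions would apply, with the midpoint of each patient’s visits passed as `index_date`.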
The next step involved a matched McNemar test on the training dataset for each of the 2600 ICD-10-CA code variables using multiple testing methods39 to determine differences between the two groups (i.e., TBI and the reference population) within the five years preceding a TBI or index date event. The Benjamini-Yekutieli multiple testing method40 was used to identify the threshold at which results are considered significant given a set of experimental circumstances, and to obtain the set of codes that were significantly overrepresented in TBI patients compared to matched patients from the reference population (i.e., had an odds ratio (OR) greater than one); the False Discovery Rate40 (a criterion not commonly used in public health research, but standard in genomic research)41 was controlled at five percent. This set of codes was then re-tested on the validation dataset following the same procedure42. Codes found to be significant in both the training and validation datasets then had their ORs calculated and reported using the testing dataset. It is important to note that the training dataset was twice as large as the validation or the testing dataset and, consequently, these last two datasets had less power to detect significant effects.
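The screening step can be sketched with a per-code McNemar test followed by the Benjamini-Yekutieli step-up procedure. The text does not say which variant of the McNemar test was used, so the sketch assumes the asymptotic chi-square form without continuity correction. The discordant-pair counts are invented for illustration, and the function names are hypothetical.

```python
import math


def mcnemar(b, c):
    """McNemar chi-square test (1 df, no continuity correction).

    b: pairs in which only the TBI patient has the code
    c: pairs in which only the reference patient has the code
    Returns (matched-pair odds ratio, p-value).
    """
    if b + c == 0:
        return float("nan"), 1.0
    chi2 = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
    odds_ratio = b / c if c > 0 else float("inf")
    return odds_ratio, p_value


def benjamini_yekutieli(p_values, q=0.05):
    """Flag hypotheses rejected while controlling the FDR at level q.

    Step-up rule: find the largest rank k with p_(k) <= k * q / (m * c_m),
    where c_m = 1 + 1/2 + ... + 1/m, and reject the k smallest p-values.
    """
    m = len(p_values)
    c_m = sum(1 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / (m * c_m):
            largest_k = rank
    rejected = [False] * m
    for idx in order[:largest_k]:
        rejected[idx] = True
    return rejected


# Hypothetical discordant-pair counts (b, c) for four code blocks
counts = {"F10": (120, 40), "I10": (300, 290), "T40": (30, 5), "J06": (50, 80)}
codes = list(counts)
results = [mcnemar(*counts[code]) for code in codes]
rejected = benjamini_yekutieli([p for _, p in results], q=0.05)

# Keep codes that are significant and overrepresented in TBI patients (OR > 1)
overrepresented = [
    code for code, (odds_ratio, _), keep in zip(codes, results, rejected)
    if keep and odds_ratio > 1
]
```

In this toy example J06 is significant but has an OR below one, so it is dropped at the overrepresentation filter, mirroring the two-stage selection described above.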
Further, data dimensionality and code reduction were examined using a factor analysis technique (i.e., the principal components method)43 with the following criteria to determine the number of factors: (i) eigenvalue larger than one; (ii) break-point on the scree plot; (iii) the greatest cumulative proportion of variance accounted for; and (iv) a conditional logistic regression looped through all possible factors covering the largest area under the receiver operating characteristic (ROC) curves using the validation dataset. A code was included in a given factor if its factor loading was greater than or equal to 0.2, with no limitations placed on any code loaded on multiple factors.
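Two of the rules above lend themselves to a short illustration: the eigenvalue-larger-than-one criterion and the assignment of codes to factors by the 0.2 loading cut-off. The sketch takes eigenvalues and loadings as given (computing them is outside its scope), applies the cut-off to the signed loading exactly as worded in the text, and uses invented numbers; the function names are hypothetical.

```python
def count_eigenvalues_above_one(eigenvalues):
    """Criterion (i): number of factors with an eigenvalue larger than one."""
    return sum(1 for value in eigenvalues if value > 1)


def codes_by_factor(loadings, cutoff=0.2):
    """Assign each code to every factor on which its loading is >= cutoff.

    loadings: {code: [loading on factor 1, loading on factor 2, ...]}
    A code may appear under several factors, or under none.
    """
    n_factors = len(next(iter(loadings.values())))
    factors = {k: [] for k in range(1, n_factors + 1)}
    for code, row in loadings.items():
        for k, value in enumerate(row, start=1):
            if value >= cutoff:
                factors[k].append(code)
    return factors


# Hypothetical eigenvalues and loadings for four code blocks on two factors
n_candidate_factors = count_eigenvalues_above_one([3.1, 1.4, 0.9, 0.6])
factors = codes_by_factor({
    "F10": [0.45, 0.05],
    "T40": [0.30, 0.22],   # loads on both factors
    "I10": [0.02, 0.61],
    "J06": [0.10, 0.15],   # meets the cut-off on neither factor
})
```

Criteria (ii) to (iv) involve judgment on the scree plot and model comparison on the validation dataset and are not reproduced here.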
The decision regarding how many factors should be retained was supported by a binary form of a factor-based score. Each patient who had any of the ICD-10-CA codes included in the definition of the factor (based on the criteria above) obtained a score of one; otherwise, a score of zero was assigned. These binary factor-based scores were applied to the testing dataset and were used to calculate ORs and 95% confidence intervals from a looped conditional logistic regression model on the association between each factor and TBI.
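The binary factor-based score and its association with TBI can be sketched as below. The sketch relies on a standard result: with 1:1 matching and a single binary exposure, the conditional logistic regression estimate of the OR equals the ratio of discordant pairs, with a Wald interval on the log scale. It assumes one factor is modelled at a time, which is one reading of the "looped" model; the data and function names are hypothetical.

```python
import math


def factor_score(patient_blocks, factor_codes):
    """Score 1 if the patient has any code included in the factor, else 0."""
    return 1 if set(patient_blocks) & set(factor_codes) else 0


def matched_pair_or(pairs, factor_codes, z=1.96):
    """OR and 95% CI for one binary factor score in 1:1 matched pairs.

    pairs: iterable of (tbi_blocks, reference_blocks).
    b counts pairs where only the TBI patient scores 1, c the reverse;
    the conditional estimate is b / c with SE(log OR) = sqrt(1/b + 1/c).
    """
    b = c = 0
    for tbi_blocks, ref_blocks in pairs:
        s_tbi = factor_score(tbi_blocks, factor_codes)
        s_ref = factor_score(ref_blocks, factor_codes)
        if s_tbi == 1 and s_ref == 0:
            b += 1
        elif s_tbi == 0 and s_ref == 1:
            c += 1
    if b == 0 or c == 0:
        raise ValueError("OR is not estimable without both kinds of discordant pairs")
    log_or = math.log(b / c)
    se = math.sqrt(1 / b + 1 / c)
    return b / c, math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical factor and 100 matched pairs
factor = ["F10", "T40"]
pairs = (
    [({"F10"}, {"I10"})] * 30      # only the TBI patient is exposed
    + [({"I10"}, {"T40"})] * 10    # only the reference patient is exposed
    + [({"F10"}, {"T40"})] * 20    # both exposed (concordant)
    + [({"I10"}, {"J06"})] * 40    # neither exposed (concordant)
)
odds_ratio, ci_low, ci_high = matched_pair_or(pairs, factor)
```

Concordant pairs contribute nothing to the estimate, which is why only the 30 and 10 discordant pairs determine the OR of 3 in this example.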
Finally, to visualize the results, word cloud figures were generated for the frequencies and ORs of the factors, where the size of the words indicated the different magnitudes of these values.
All statistical analyses were conducted using SAS software (version 9.4, SAS Inc., Cary, NC) and R (version 3.4.1, R Foundation for Statistical Computing; www.r-project.org). Figures were created using R with the Wordcloud package.
Results
Among the overall Ontario population44 of between 12.9 and 14 million in 2008 and 2016, respectively, 239,103 unique patients had their first TBI-related visit in either an ED or acute care setting between the fiscal years 2007/08 and 2015/16. Each patient with TBI was matched to a patient from the 10% random sample of patients entering an ED or acute care setting for any reason other than TBI; 4,100 (1.7%) patients were left unmatched and were excluded from analysis; the final cohort consisted of 235,003 patients. This sample was randomly split into training (50%; n = 117,689), validation (25%; n = 58,798), and testing (25%; n = 58,516) datasets. Frequencies, outputs, and measurements are presented for the testing dataset (see the Methods section for specifics).
Of the 58,516 patients (and matched reference patients), 57% were male and 62% were 40 years of age or younger when they had their first TBI. In 88% of patients, TBI was cited as the main diagnosis for their ED or acute care visit. Severity of injury was not established in the data files of 64% of the patients; among them, 92% were coded as concussion without a specified length of unconsciousness (ICD-10-CA code S06.0). Accidents accounted for 92% of TBIs and assaults for 7%. Twenty-five percent of all injuries were sports-related, and 10% were related to motor vehicle accidents. Most injuries were sustained as a result of falls (45%) or from being struck by an object or person (36%). During the studied period, patients with TBI had more than twice the average number of hospital visits (emergency or acute) as those in the reference population (4.3 vs. 2.0) (Table 1).
Matched McNemar tests were performed for all 2,600 ICD-10-CA binary variables on the training dataset. The Benjamini-Yekutieli multiple testing method identified 775 significant associations, of which 684 (88.3%) had an OR greater than one. These 684 codes were re-tested on the validation dataset, and 582 of them (85.1%) were internally validated.
Factor analysis was performed on the training dataset on 582 of the ICD-10-CA codes. Of the 582 codes included in the analysis, 329 (56.5%) individual codes met the factor loading cut-off of 0.2. Supplementary Tables 1 and 2 present the individual frequencies, ORs, and factor loadings of codes that met the factor analysis cut-off and codes that did not meet the cut-off, respectively.
Using the break-points on the scree plots and the interpretability, 43 factors were selected. The scree plots are presented in Supplementary Figs 3 and 4 with the values in Supplementary Table 3. Table 2 presents the descriptions, frequencies, ORs, and ICD-10 codes included for each of the 43 factors.
When factors were sorted by frequency in the TBI population, those related to general trauma (Factors 4, 27, and 18), dermatology (Factor 6), geriatrics (Factor 3), respirology (Factor 19), otolaryngology (Factor 8), gastroenterology (Factors 11 and 16), and cardiology (Factors 1 and 17) had a high rate of occurrence, while factors related to environmental exposure (Factors 43, 32, 33, 42, and 34), pharmacology emergencies (Factors 41 and 25), abuse trauma (Factor 38), toxicology (Factor 20), and infectious diseases (Factor 35) occurred less frequently. For a visual representation of these frequencies, we refer the reader to the Wordcloud in Fig. 1.
When factors were sorted by the magnitude of the effect size (OR), factors related to pharmacology-related emergencies (Factors 22, 13, and 25), abuse (Factors 38 and 36), toxicology (Factors 20 and 14), general trauma (Factor 4), environmental exposures (Factor 33), and Alzheimer’s/dementia (Factor 29) had a stronger association with TBI, while factors related to nephrology (Factor 28), emergency medicine (Factors 31 and 10), endocrinology (Factor 15), gastroenterology (Factors 11 and 16), stroke (Factor 12), otolaryngology (Factor 8), and infectious diseases (Factor 35) had a weaker association. For a visual representation of these ORs, please see the Wordcloud in Fig. 2.
Supplementary Table 3 provides details of the demographics (i.e., age, sex, income, and rurality distributions) of each of the 43 factors in the TBI and reference patient populations.
Discussion
As data mining is known to be useful for clinical data45, the focus is naturally turned to exploring health administrative data for improving the surveillance, management, and environmental mapping of complex injuries and disorders. Rich and structured patient data encoded in ICD-10 diagnostic fields significantly expand researchers’ ability to phenotype the profiles of patients at the pre-injury phase, both within the specific clinical pathology (comorbidity), as well as for environmental exposures and circumstances. Combining ICD-10 codes and personal characteristics of patients in the timeframe preceding TBI creates enormous opportunities for not only precision medicine46, but also for injury prevention47.
The data mining procedure applied here represents a novel non-hypothesis-driven approach for dealing with complex medical issues and big data simultaneously, where manual inspection of valuable clinical and non-clinical information, first for each patient individually and then for the population as a whole, would otherwise be an impossible task. We showed how administrative healthcare information can be used in categorizing multidimensional comorbidities, how multiple comorbidities load on individual factors, and how to perform factor reductions to maximize the cumulative percentage of explained variance and enhance the clinical interpretability of results. Finally, the data mining approach developed not only allowed the validation of previously known risk factors of TBI, but also shed light on the magnitude of associations that previously received little attention, including those related to exposure to occupational hazards, whether chemical (e.g., gases, mineral dusts), physical (e.g., extreme temperatures), or mechanical (e.g., trauma); the long-lasting concerns of assault and child abuse at the population level and their links to TBI; and the adverse effects of medications and drugs in the years preceding TBI. Novel associations such as exposure to toxic gases and fumes, and neurotoxicity of prescription drugs, are extremely important for the future of research and practice pertaining to concussions without a specified length of unconsciousness (S06.0), where there is considerable debate over the clinical definition, neurological signs, and clear epidemiological evidence of probable causation between certain clinical and environmental factors and the injury itself48, or where there is a need to differentiate the effects of neurotoxic drugs from those of TBI.
When examining ICD-10-CA factors among patients with TBI by rows (Supplementary Table 3), it is evident that multiple medical conditions that are well-known TBI risk factors15,18,19,49 are present within the identified factors in the years preceding TBI. Instead of a binary association of a given code (or multiple codes) with a given patient, we presented the significance of the occurrences of the ICD-10 codes and associated factors using their frequency distribution in TBI patients and in the reference population matched on age, sex, income level, and place of residence. It was observed that factors composed of cardiovascular and metabolic disorders, orthopaedic injuries, mental health disorders, dementias, and Parkinson’s disease are highly overrepresented in the five-year timeframe preceding an individual’s first TBI, as compared to the reference population. These factors are known to be implicated in the risk of falls and motor vehicle accidents50,51. Overdose of prescription drugs, highlighted here, also plays a role: drugs that cross the blood-brain barrier affect brain functioning and alertness and/or cause postural hypotension, increasing the risk of falls52,53. Likewise, pain killers, especially opioid medications, frequently cause opioid-induced respiratory depression, a combination of a lowered level of consciousness, decreased respiratory drive, and upper airway obstruction, and are implicated in cerebral hypoxia and falls with or without the loss of consciousness54,55. Along with prescribed medications, alcohol abuse is a major risk factor for TBI. In more than half of all patients, new TBI occurred at a time when the patient was intoxicated, while excessive drinking increased the risk of dying from head trauma in 36% of assaults, 41% of falls, and 40% of suicidal circumstances56.
In line with this evidence, alcohol-related issues and poisoning due to narcotics and other psychotropic medications were associated with an increased OR of TBI in the near future, compared to the reference population. The recognition of hazardous situations and exposures preceding TBI diagnosis, such as assaults and self-harm injuries, as an opportunity to intervene and prevent TBI should not be underestimated57. However, the identified codes loaded on adult and child abuse factors are novel and point to the complexity of the social circumstances surrounding any given individual in such a situation, and to the seriousness of the lasting adverse effects, highlighting the value of the multifaceted investigation needed in TBI prevention and post-injury management.
Perhaps the most distinctive finding is the high percentage of men and women of working age who sought care after environmental exposures to gases and fumes, electrical currents, sharp objects, machinery, and the cold in the five years preceding their TBI event. Work-related TBIs have important societal impacts, particularly in high-income countries where high-risk industries such as construction, transportation, manufacturing, farming, fishing, forestry, and mining are active. The WHO considers “environment” to cover physical, chemical, and biological factors external to the individual58. Surveillance in occupational settings where workers are exposed to electrical currents, sharp objects, and moving machinery, and geographic-specific interventions that focus on unique hazards, have to be controlled by governmental health and safety organizations. Safety interventions are expected to be tailored to the particular hazards to which a worker is exposed, with ongoing laboratory quality control and regular hygiene investigations at the workplace, to protect workers’ health59. While it is a tremendous challenge to alter environmental exposures to prevent TBIs, future research may propose ways to regulate work environments for such exposures as a means to reduce injury rates.
Our data mining approach and discussion have some limitations. Our analysis was focused on ICD-10 codes, which have largely unknown diagnostic accuracy, specificity, and sensitivity values. Our factor analysis highlighted that many codes from different categories were loaded on the same factor, and therefore, clinically they should be considered collectively as a single factor rather than as separate factors. This is the case for not only mental health disorder and cardiovascular disease codes, but also codes related to stroke and medical emergencies, infectious diseases and respirology, seizures, epilepsy, and complications of medical procedures, among others. Some of these codes loading on the same factor might represent shared pathophysiological mechanisms of systemic disorders, while others might represent variation in the use of codes among different medical disciplines across the health system in Ontario. Future improvement in methodologies for factors with competing loading effects of ICD codes should be undertaken to disentangle the complex interplay between the person, the environment, and the healthcare system, and to develop a richer understanding of TBI diagnosis and of the processes preceding TBI. The complete data presented can be used for hypothesis generation, not for drawing conclusions about causality, and can provide deeper insights into the roles of comorbidity, personal circumstances, and environmental exposures in TBI.
To prevent TBI, a complex and often lifelong disabling injury, it is essential to understand its distribution and patterns, in addition to having extensive knowledge of any clinical disorder, characteristic, or other definable entity that differentiates TBI from other clinical populations. The findings of this study add to clinical and technological advancement by providing new techniques for categorizing personal, clinical, and environmental exposure data, and for combining codes into clinically meaningful factors with enhanced comprehensibility that could aid future studies of injury prevention. Possible extensions to this work would involve the application of these novel frameworks for detecting factors at the event and post-event phases that could be targeted for secondary and tertiary prevention. With the support of data mining and big data, it is possible to monitor patients’ health and harmful environmental exposures and to advance the fields of both precision medicine and injury surveillance. Future statistical and data mining advancements would require improving the sensitivity and interpretability of the proposed methodology by validating the described data mining algorithm using data from a patient population across Canada.
Ethical approval and informed consent
Approval
The study protocol was approved by the ethics committees at the clinical (Toronto Rehabilitation Institute-University Health Network) and academic (University of Toronto) institutions.
Accordance
All methods were carried out in accordance with the relevant guidelines and regulations.
Informed consent
This research utilised de-identified administrative health data with no access to personal information.
Data Availability
The datasets generated during and/or analysed during the current study are available in the ICES repository, www.ices.on.ca/DAS, under accession DAS 2016-257(2018 0970 084 000). Data sharing agreements prohibit ICES from making the datasets publicly available; however, access may be granted to those who meet pre-specified criteria for confidential access. The full dataset creation plan and underlying analytic code are available from the authors upon request, understanding that the computer programs may rely upon coding templates or macros that are unique to ICES and are therefore either inaccessible or may require modification.
References
Shah, N. D., Steyerberg, E. W. & Kent, D. M. Big Data and Predictive Analytics: Recalibrating Expectations. JAMA. 320(1), 27–28 (2018).
Ristevski, B. & Chen, M. Big Data Analytics in Medicine and Healthcare. J Integr Bioinform. 15(3) (2018).
Cobb, A. N., Benjamin, A. J., Huang, E. S. & Kuo, P. C. Big data: More than big data sets. Surgery. 164(4), 640–642 (2018).
Mollayeva, T., Mollayeva, S. & Colantonio, A. Traumatic brain injury: Sex, gender and intersecting vulnerability. Nat Rev Neurol., https://doi.org/10.1038/s41582-018-0091-y. [Epub ahead of print] (2018).
Estape, E. S., Mays, M. H. & Sternke, E. A. Translation in Data Mining to Advance Personalized Medicine for Health Equity. Intell Inf Manag. 8(1), 9–16 (2016).
Manley, G. T. et al. The Traumatic Brain Injury Endpoints Development (TED) Initiative: Progress on a Public-Private Regulatory Collaboration to Accelerate Diagnosis and Treatment of Traumatic Brain Injury. J Neurotrauma., https://doi.org/10.1089/neu.2016.4729. [Epub ahead of print] (2017).
Maas, A. I. R. et al. Traumatic brain injury: integrated approaches to improve prevention, clinical care, and research. Lancet Neurol. 16(12), 987–1048 (2017).
Quaglio, G., Gallucci, M., Brand, H., Dawood, A. & Cobello, F. Traumatic brain injury: a priority for public health policy. Lancet Neurol. 16(12), 951–952 (2017).
Masel, B. E. & DeWitt, D. S. Traumatic brain injury: a disease process, not an event. J Neurotrauma. 27(8), 1529–1540 (2010).
Wilson, L. et al. The chronic and evolving neurological consequences of traumatic brain injury. Lancet Neurol. 16(10), 813–825 (2017).
Dams-O’Connor, K., Gibbons, L. E., Landau, A., Larson, E. B. & Crane, P. K. Health Problems Precede Traumatic Brain Injury in Older Adults. J Am Geriatr Soc. 64(4), 844–848 (2016).
Tollefsen, M. H. et al. Patients with Moderate and Severe Traumatic Brain Injury: Impact of Preinjury Platelet Inhibitor or Warfarin Treatment. World Neurosurg. 114, e209–e217 (2018).
Fu, W. W., Fu, T. S., Jing, R., McFaull, S. R. & Cusimano, M. D. Predictors of falls and mortality among elderly adults with traumatic brain injury: A nationwide, population-based study. PLoS One. 12, e0175868 (2017).
Jonsdottir, G. M. et al. A population-based study on epidemiology of intensive care unit treated traumatic brain injury in Iceland. Acta Anaesthesiol Scand. 61, 408–417 (2017).
Hawley, C., Sakr, M., Scapinello, S., Salvo, J. & Wrenn, P. Traumatic brain injuries in older adults-6 years of data for one UK trauma centre: retrospective analysis of prospectively collected data. Emerg Med J. 34, 509–516 (2017).
Yang, Y. et al. Clinical Risk Factors for Head Impact During Falls in Older Adults: A Prospective Cohort Study in Long-Term Care. J Head Trauma Rehabil. 32, 168–177 (2017).
Hwang, H. F., Cheng, C. H., Chien, D. K., Yu, W. Y. & Lin, M. R. Risk Factors for Traumatic Brain Injuries During Falls in Older Persons. J Head Trauma Rehabil. 30, E9–E17 (2015).
Mahler, B., Carlsson, S., Andersson, T. & Tomson, T. Risk for injuries and accidents in epilepsy: A prospective population-based cohort study. Neurology. 90, e779–e789 (2018).
Brenner, L. A. et al. Self-inflicted traumatic brain injury: Characteristics and outcomes. Brain Inj. 23, 991–998 (2009).
Karakurt, G., Patel, V., Whiting, K. & Koyutürk, M. Mining Electronic Health Records Data: Domestic Violence and Adverse Health Effects. J Fam Violence. 32, 79–87 (2017).
González-Chica, D. A. et al. Individual diseases or clustering of health conditions? Association between multiple chronic diseases and health-related quality of life in adults. Health Qual Life Outcomes. 15(1), 244 (2017).
Stawicki, S. P. et al. Comorbidity polypharmacy score and its clinical utility: A pragmatic practitioner’s perspective. J Emerg Trauma Shock. 8(4), 224–231 (2015).
World Health Organization. Projections of mortality and burden of disease, 2002–2030: deaths by income group, http://www.who.int.myaccess.library.utoronto.ca/healthinfo/global_burden_disease/projections2002/en/. Accessed February 18, 2019.
Public Law 110–206 110th Congress, https://www.govinfo.gov/content/pkg/PLAW-110publ206/pdf/PLAW-110publ206.pdf Accessed February 18, 2019.
Taylor, C. A., Bell, J. M., Breiding, M. J. & Xu, L. Traumatic Brain Injury-Related Emergency Department Visits, Hospitalizations, and Deaths - United States, 2007 and 2013. MMWR Surveill Summ. 66(9), 1–16 (2017).
Viano, D., von Holst, H. & Gordon, E. Serious brain injury from traffic-related causes: priorities for primary prevention. Accid Anal Prev. 29(6), 811–816 (1997).
Osborn, A. J., Mathias, J. L., Fairweather-Schmidt, A. K. & Anstey, K. J. Anxiety and comorbid depression following traumatic brain injury in a community-based sample of young, middle-aged and older adults. J Affect Disord. 213, 214–221 (2017).
Pugh, M. J. et al. TRACC Research Team. A retrospective cohort study of comorbidity trajectories associated with traumatic brain injury in veterans of the Iraq and Afghanistan wars. Brain Inj. 30(12), 1481–1490 (2016).
Kumar, R. G. et al. Epidemiology of Comorbid Conditions Among Adults 50 Years and Older With Traumatic Brain Injury. J Head Trauma Rehabil. 33(1), 15–24 (2018).
Karwat, I. D., Krupa, S. & Gorczyca, R. Causes and consequences of head injuries among rural inhabitants hospitalised in a Multi-organ Injury Ward. II. Circumstances, types and consequences of head injuries. Ann Agric Environ Med. 16(1), 23–29 (2009).
Scott-Parker, B. & MacKay, J. M. Research and practice in a multidimensional world: a commentary on the contribution of the third dimension of the Haddon matrix to injury prevention. Inj Prev. 21(2), 131–132 (2015).
Short, D. Using science to prevent injuries: dissecting an event using the Haddon Matrix. JEMS. 24(9), 68–70 (1999).
Ideker, T. & Sharan, R. Protein networks in disease. Genome Res. 18(4), 644–52 (2008).
Roche, K. E., Weinstein, M., Dunwoodie, L. J., Poehlman, W. L. & Feltus, F. A. Sorting Five Human Tumor Types Reveals Specific Biomarkers and Background Classification Genes. Sci Rep. 8(1), 8180 (2018).
Institute for Clinical Evaluative Sciences. Privacy Code – Protecting Personal Health Information at ICES. Toronto: Institute for Clinical Evaluative Sciences.
Walker, R. L. et al. Implementation of ICD-10 in Canada: how has it impacted coded hospital discharge data? BMC Health Serv Res. 12, 149 (2012).
Chen, A. Y. & Colantonio, A. Defining neurotrauma in administrative data using the International Classification of Diseases Tenth Revision. Emerg Themes Epidemiol. 8, 4 (2011).
Hastie, T., Tibshirani, R. & Friedman, J. Model Assessment and Selection. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York, NY, Springer, 219–223 (2009).
Moussa, M. A. Testing marginal homogeneity in square tables; with emphasis on matched data. Comput Programs Biomed. 19(2–3), 239–247 (1985).
Benjamini, Y. & Yekutieli, D. The control of the false discovery rate in multiple testing under dependency. Ann Statist. 29(4), 1165–1188 (2001).
Sabatti, C., Service, S. & Freimer, N. False discovery rate in linkage and association genome screens for complex disorders. Genetics. 164(2), 829–833 (2003).
Steyerberg, E. W. et al. Internal validation of predictive models: efficiency of some procedures for logistic regression analysis. J Clin Epidemiol. 54(8), 774–781 (2001).
Abdi, H. & Williams, L. J. Principal component analysis. Wiley Interdisciplinary Reviews: Computational Statistics. 2(4), 433–459 (2010).
Census Profile, 2016 Census. Ontario and Canada, https://www12.statcan.gc.ca/census-recensement/2016/dp-pd/prof/details/Page.cfm?Lang=E&Geo1=PR&Code1=35&Geo2=&Code2=&Data=Count&SearchText=Ontario&SearchType=Begins&SearchPR=01&B1=All&GeoLevel=PR&GeoCode=35 Accessed February 18, 2019.
Islam, S., Hasan, M., Wang, X., Germack, H. D. & Noor-E-Alam, M. A Systematic Review on Healthcare Analytics: Application and Theoretical Perspective of Data Mining. Healthcare (Basel). 6(2), 54 (2018).
You, N., He, S., Wang, X., Zhu, J. & Zhang, H. Subtype classification and heterogeneous prognosis model construction in precision medicine. Biometrics. 74(3), 814–822 (2018).
Lasry, O., Dendukuri, N., Marcoux, J. & Buckeridge, D. L. Accuracy of administrative health data for surveillance of traumatic brain injury: a Bayesian latent class analysis. Epidemiology. 29(6), 876–884 (2018).
Patricios, J. et al. What are the critical elements of sideline screening that can be used to establish the diagnosis of concussion? A systematic review. Br J Sports Med. 51(11), 888–894 (2017).
Yokomoto-Umakoshi, M., Kanazawa, I., Kondo, S. & Sugimoto, T. Association between the risk of falls and osteoporotic fractures in patients with type 2 diabetes mellitus. Endocr J. 64(7), 727–734 (2017).
Breen, J. M., Naess, P. A., Gjerde, H., Gaarder, C. & Stray-Pedersen, A. The significance of preexisting medical conditions, alcohol/drug use and suicidal behavior for drivers in fatal motor vehicle crashes: a retrospective autopsy study. Forensic Sci Med Pathol. 14(1), 4–17 (2018).
Karjalainen, K., Blencowe, T. & Lillsunde, P. Substance use and social, health and safety-related factors among fatally injured drivers. Accid Anal Prev. 45, 731–736 (2012).
Seppala, L. J. et al. EUGMS Task and Finish Group on Fall-Risk-Increasing Drugs. Fall-Risk-Increasing Drugs: A Systematic Review and Meta-analysis: III. Others. J Am Med Dir Assoc. 19(4), 372.e1–372.e8 (2018).
Chan, D. C. et al. Drug-related problems (DRPs) identified from geriatric medication safety review clinics. Arch Gerontol Geriatr. 54(1), 168–174 (2012).
Wolff, M. L. et al. Falls in skilled nursing facilities associated with opioid use. J Am Geriatr Soc. 60(5), 987 (2012).
Vaaramo, K., Puljula, J., Tetri, S., Juvela, S. & Hillbom, M. Head trauma sustained under the influence of alcohol is a predictor for future traumatic brain injury: a long-term follow-up study. Eur J Neurol. 21(2), 293–298 (2014).
Fazel, S., Wolf, A., Pillas, D., Lichtenstein, P. & Långström, N. Suicide, fatal injuries, and other causes of premature mortality in patients with traumatic brain injury: a 41-year Swedish population study. JAMA Psychiatry. 71, 326–333 (2014).
Gal, M. et al. Epidemiology of assault and self-harm injuries treated in a large Romanian Emergency Department. Eur J Emerg Med. 19(3), 146–152 (2012).
World Health Organization. Public Health, Environmental and Social Determinants of Health, https://www.who.int/phe/en/ Accessed February 16, 2019.
Government of Canada. Canadian Centre for Occupational Health and Safety, https://ccohs.ca/oshanswers/prevention/effectiv.html Accessed February 16, 2019.
Acknowledgements
This work was supported by the Eunice Kennedy Shriver National Institute of Child Health & Human Development of the National Institutes of Health under Award Number R21HD089106. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. AC was funded by the Canadian Institutes of Health Research (CIHR) Chair in Gender, Work and Health (CGW-126580). TM was supported by the postdoctoral research grant from the Alzheimer’s Association (AARF-16-442937). The funders had no role in study design, data collection, decision to publish, or preparation of the manuscript. This study made use of de-identified data from the ICES Data Repository, which is managed by the Institute for Clinical Evaluative Sciences with support from its funders and partners: Canada’s Strategy for Patient-Oriented Research (SPOR), the Ontario SPOR Support Unit, the Canadian Institutes of Health Research and the Government of Ontario. Parts of this material are based on data and information compiled and provided by the Canadian Institute for Health Information (CIHI). The opinions, results and conclusions reported are those of the authors. No endorsement by ICES or any of its funders or partners, nor CIHI, is intended or should be inferred.
Author information
Contributions
A.C., T.M., V.C., M.S. and M.E. conceived the original concept and initiated the work. M.E. designed and optimized statistical analyses for this work. M.S. carried out the analyses with the support of M.C., S.J. and V.C. All authors discussed the results. T.M. and M.S. wrote the manuscript. All authors read the paper and commented on the text.
Ethics declarations
Competing Interests
The authors declare no competing interests.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Mollayeva, T., Sutton, M., Chan, V. et al. Data mining to understand health status preceding traumatic brain injury. Sci Rep 9, 5574 (2019). https://doi.org/10.1038/s41598-019-41916-5