Decoding health status transitions of over 200 000 patients with traumatic brain injury from preceding injury to the injury event

For centuries, the study of traumatic brain injury (TBI) has been centred on historical observation and analyses of personal, social, and environmental processes, which have been examined separately. Today, computation implementation and vast patient data repositories can enable a concurrent analysis of personal, social, and environmental processes, providing insight into changes in health status transitions over time. We applied computational and data visualization techniques to categorize decade-long health records of 235,003 patients with TBI in Canada, from preceding injury to the injury event itself. Our results highlighted that health status transition patterns in TBI emerged along with the projection of comorbidity where many disorders, social and environmental adversities preceding injury are reflected in external causes of injury and injury severity. The strongest associations between health status preceding TBI and health status at the injury event were between multiple body system pathology and advanced age-related brain pathology networks. The interwoven aspects of health status on a time continuum can influence post-injury trajectories and should be considered in TBI risk analysis to improve prevention, diagnosis, and care.


Scientific Reports | (2022) 12:5584 | https://doi.org/10.1038/s41598-022-08782-0
management 13-15 at the time a person sustains an injury; in these situations, the disorder (e.g., cardiovascular, metabolic, and neurologic disorders) is more likely to be captured in the injury surveillance data, as these are considered in TBI management and care 16. Other health-related conditions are temporal in nature (e.g., sprains and strains, acute intoxication, abuse) and may not be noted in the injury event but may increase the probability of an injury as a result of falls, such as among those with poor balance or confusion, thus reflecting changes at both the physiological and psychosomatic levels 17,18. To complicate matters, most patients discharged directly from the emergency department (ED) receive a concussion diagnosis with non-specific complaints of headaches, dizziness, balance issues, and sensitivity to noise and light 18-21. While the Glasgow Coma Scale (GCS) score allows for a well-defined designation of TBI severity in more severe injury events that include a loss of consciousness and post-traumatic amnesia, this scale lacks sensitivity for milder injuries, such as concussion 22. The GCS score is also inadequate to explain progressive symptom evolution from a relatively minor external physical force in patients who present with multiple disorders that require concurrent treatment at the time of the injury event 23,24. Furthermore, disorders often coalesce with each other and with age-related factors and social adversities, which creates a vastly complex web of possible correlations to account for in the study of health status transition in TBI.
One method to increase confidence in the characterisation of health status transitions is to apply computational approaches to longitudinal health status data of patients with TBI both preceding and at the time of TBI diagnoses, and to compare these data with those of patients without TBI who are individually matched to TBI patients by sex, age, place of residence, and income level. This would allow health status to be studied in relation to the difference between the cohorts, and at two time points. In our recently published study 6, we developed an algorithm to sequence thousands of diagnosis codes within the International Statistical Classification of Diseases and Related Health Problems, Tenth Revision (ICD-10) in 235,003 unique patients with TBI and the same number of patients without TBI, who visited the ED or acute care hospitals over a decade. A total of 43 factors of health status from the five-year period preceding the TBI event that differentiated patients with TBI from those without TBI were extracted and internally validated. Taking advantage of these 43 factors that describe the health status preceding TBI, here we first present a new analysis of data from the injury event, characterising health status of the same unique patients at the injury event. We then report associations between the factors of health status preceding TBI and those of the injury event, followed by a portrayal of hierarchical clusters in the data matrix of health status transitions, from the time preceding injury to the injury event itself. We utilised the following steps in the analysis and validation process of the injury event phase and health status transition from the time preceding injury to the TBI event phase: (1) determining the phase preceding TBI and the TBI event phase; (2) multiple testing to detect a set of definable health status patterns in TBI vs. non-TBI diagnoses; (3) factor analysis of health status patterns that are significantly related to TBI vs. non-TBI events; (4) a conditional logistic regression model, correlation matrix, and hierarchical clustering using correlation-based distance to group health statuses at the TBI event; (5) health status transitions from the time preceding TBI 6 to the TBI event, grouping all factors from each period into a single heatmap using agglomerative hierarchical clustering with interpretation of factors preceding and at the TBI event that are clustered together, to examine how many meaningful dimensions can be distinguished; and (6) internal validation of the results at each level of analysis. Using this process, we confirmed that health status preceding injury is reflected in the injury event health status, and we provide evidence that health status preceding injury can explain the external cause of TBI and contribute to injury severity designation. These results provide a means to connect information on health status transitions in TBI and associated factors, from the time preceding injury to the injury event itself.

Methods
Population and health status data. We accessed the data from ICES 25 , which collects and stores health administrative data on publicly funded services provided to residents of Ontario, Canada, including information on acute care hospitalisations and ED visits. With nearly 14 million residents, Ontario is Canada's most populous province, comprising 43% of Canada's population 26 . Universal health care covers all medically necessary healthcare services at the point of care. The standardised discharge summary includes patient demographics and main and secondary diagnoses according to ICD-10 codes 27 . The ICD-10 codes consist of a combination of alphanumeric characters that characterise broad diagnosis categories. Each code is designed as an alphanumeric code and arranged hierarchically, with the code length ranging from 3 to 6 characters. The first three characters designate the category of the diagnosis, which is the same as the World Health Organisation's ICD-10 international standard for reporting diseases and health conditions 28 . The health service records data are linked deterministically at the individual level through a unique, encoded identifier based on name, sex, date of birth, and postal code. By applying unique de-identified health records, the health status trajectory of each patient can be tracked over time.
We used a previously established cohort of patients discharged between the fiscal years (defined as from April 1 to March 31) 2007/2008 and 2015/2016 from the ED (identified in the National Ambulatory Care Reporting System) and acute care (identified in the Discharge Abstract Database) with a diagnostic code for TBI (ICD-10 codes S02.0, S02.1, S02.3, S02.7, S02.8, S02.9, S04.0, S07.1, and S06) 6 ; these patients comprised the TBI cohort in the present study. Patient demographics, main and secondary diagnoses, conditions, problems, or circumstances data 27 were extracted for each individual patient. We selected a 10% random sample of patients discharged from the ED or acute care hospitals during the same study period for a reason other than TBI, and individually matched these to patients with TBI by sex, age, place of residence (urban vs. rural), and income quintile; these patients comprised the reference population 6 .
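The cohort definition above can be illustrated with a short Python sketch that checks whether a recorded diagnosis falls under the listed TBI codes. The function names and the prefix-matching rule (a listed code is taken to cover itself and any longer subcode, with the dot ignored) are assumptions made for illustration and are not taken from the study's own software.

```python
# ICD-10 codes that define the TBI cohort, as listed in the text.
TBI_CODES = ("S02.0", "S02.1", "S02.3", "S02.7", "S02.8", "S02.9",
             "S04.0", "S07.1", "S06")


def normalise(code: str) -> str:
    """Upper-case a code and drop the dot, e.g. 's06.0' -> 'S060'."""
    return code.strip().upper().replace(".", "")


# Assumption: a listed code matches itself and any longer subcode.
TBI_PREFIXES = tuple(normalise(c) for c in TBI_CODES)


def is_tbi_code(code: str) -> bool:
    """True if the diagnosis code falls under one of the listed TBI codes."""
    return normalise(code).startswith(TBI_PREFIXES)


def category(code: str) -> str:
    """First three characters, i.e. the ICD-10 diagnosis category."""
    return normalise(code)[:3]
```

For example, `is_tbi_code("S06.0")` is true under this rule, whereas `is_tbi_code("S02.2")` is not, because S02.2 is not among the listed codes.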
We followed previously published severity classifications 29,30 to assign TBI injury severity. External causes of injury were determined using Centers for Disease Control and Prevention (CDC) major external cause of injury group codes, which were divided into falls, struck by/against an object, motor vehicle collision (MVC) and other

Step 1. Determining the phase preceding TBI and the TBI event phase
The index date for patients with TBI was defined as their first occurrence of TBI over the study period, whereas, for the reference population, the index date was the midpoint of the ED or acute care visits. Data from 235,003 unique patients with TBI (and the same number of reference patients) were randomly split into training (50%; n = 117,689), validation (25%; n = 58,798), and testing (25%; n = 58,516) datasets. All analyses were completed and reported using the testing dataset, and the training and validation datasets were used for internal validation.
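A minimal Python sketch of a 50/25/25 random split is shown here for orientation. It assumes that each identifier stands for one matched pair (so a patient with TBI and the matched reference patient stay in the same dataset) and that the split is an exact-proportion cut of a shuffled list; the reported dataset sizes are close to, but not exactly, these proportions, so the original procedure may have differed. The function name and seed are hypothetical.

```python
import random


def split_patients(patient_ids, seed=0):
    """Shuffle identifiers and cut them into 50% / 25% / 25% parts."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # seeded for reproducibility
    n_train = len(ids) // 2
    n_valid = len(ids) // 4
    return {
        "training": ids[:n_train],
        "validation": ids[n_train:n_train + n_valid],
        "testing": ids[n_train + n_valid:],
    }
```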
Step 2. Multiple testing to detect a set of definable health status patterns in TBI vs. non-TBI diagnoses
Health status preceding the injury event. We evaluated the health status preceding a TBI event reported by previous studies 6,33. From all the possible ICD-10 codes classifying patients' main and secondary diagnoses, a previous data mining and validation study identified 43 factors that were significantly overrepresented in patients with TBI compared to reference patients (individually matched based on sex, age, place of residence, and income quintile) within the 5 years preceding their TBI event. For details, please see the study 6.
Health status at the injury event. To identify health status at the TBI event and to gain insight into its observed correlations, we analyzed all ICD-10 codes depicted across the 10 and 25 diagnosis fields of the National Ambulatory Care Reporting System and the Discharge Abstract Database, respectively, for patients with TBI and reference patients at the index date. The first three characters (one alphabetic and two numeric) of the ICD-10 comprised 2,600 distinct codes that defined specific diagnoses. We converted all 2,600 codes into binary variables, except for provisional codes for research and temporary assignments, U98 and U99. These 2,600 binary ICD-10 code variables were tested for significant correlations with TBI diagnosis codes using a matched McNemar test with correction for multiple testing 34. The Benjamini-Yekutieli method was applied to acquire a set of codes controlled at a False Discovery Rate (FDR) of 5% 35,36. We identified ICD-10 codes that were associated with a TBI event, for which we then calculated odds ratios (ORs) to compare with the reference population (Supplementary File). To eliminate measurement artifacts, the procedure was first performed using the training dataset and then repeated using the validation dataset 37. Only codes that were significant in both the training and validation sets were retained for further analysis.
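The testing procedure in Step 2 can be sketched in Python as follows. This is an illustrative outline under stated assumptions, not the study's code: the McNemar test is written in its chi-square form with one degree of freedom and no continuity correction (the text does not say which variant was used), and the function names are hypothetical. For each ICD-10 code, b counts matched pairs in which only the patient with TBI has the code, and c counts pairs in which only the reference patient has it.

```python
import math


def mcnemar_p(b: int, c: int) -> float:
    """Two-sided p-value of the McNemar chi-square test (1 df).

    b: pairs where only the TBI patient has the code.
    c: pairs where only the reference patient has the code.
    Assumption: no continuity correction.
    """
    if b + c == 0:
        return 1.0
    stat = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 df.
    return math.erfc(math.sqrt(stat / 2))


def benjamini_yekutieli(p_values, fdr=0.05):
    """Return booleans marking hypotheses rejected at the given FDR."""
    m = len(p_values)
    harmonic = sum(1 / i for i in range(1, m + 1))  # dependence penalty
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * fdr / (m * harmonic):
            cutoff_rank = rank  # keep the largest rank that passes
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected
```

In use, one p-value per ICD-10 category would be computed with `mcnemar_p` and the full list passed to `benjamini_yekutieli`; codes flagged in both the training and validation datasets would then be carried forward.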
Step 3. Factor analysis of health status patterns that are significantly related to TBI vs. non-TBI events
To gain insight into the dimensionality structure of individual diagnosis codes, we performed factor analysis using the principal components method 38. The optimal number of factors was determined by the breakpoint on the scree plot, eigenvalue, the greatest cumulative proportion of variance accounted for, and via a conditional logistic regression looped through all possible factors covering the largest area under the receiver operating characteristic curves 39 (Supplementary Fig. 1, Supplementary Table 2).
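The criteria for choosing the number of factors can be made concrete with a small Python helper that summarises a list of eigenvalues. It is a sketch only: the eigenvalue threshold of 1 and the "largest drop" rule for the scree breakpoint are assumptions, since the text names the criteria without giving their exact form, and the function name is hypothetical.

```python
def summarise_eigenvalues(eigenvalues, threshold=1.0):
    """Summarise eigenvalues for choosing the number of factors.

    Assumptions: eigenvalues above `threshold` count as retained, and
    the scree breakpoint is placed at the largest consecutive drop.
    """
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative, running = [], 0.0
    for value in ordered:
        running += value
        cumulative.append(running / total)  # cumulative variance share
    n_above = sum(1 for v in ordered if v > threshold)
    drops = [ordered[i] - ordered[i + 1] for i in range(len(ordered) - 1)]
    # Number of factors kept before the largest drop.
    elbow = drops.index(max(drops)) + 1 if drops else len(ordered)
    return {
        "n_above_threshold": n_above,
        "cumulative_variance": cumulative,
        "elbow": elbow,
    }
```

In practice these summaries would be read alongside interpretability of the factors, as the Results describe.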

Step 4. Conditional logistic regression model and correlation matrix and hierarchical clustering
The conditional logistic regression model was built using binary factor-based scores 40 . Patients were assigned a score of one if they possessed any of the ICD-10 codes in the factor definition; otherwise, they were assigned a zero. These factor-based scores were used to calculate ORs and 95% confidence intervals from a conditional logistic regression model 40 on the association between each factor and TBI, controlling for sex, age, rurality, and income in the testing dataset and then repeated in the training and validating datasets. To visualise the results of the factor analysis and conditional logistic regression model, a Pearson's correlation matrix was generated for all significant factors 41 , and hierarchical clustering was performed on similar group factors using correlation-based distance 42 to identify groups of people with similar associative factors for TBI. To aid in the visualisation of clusters in the heatmap, clustering was performed using Ward (minimum variance) linkages 43 . The algorithms for these agglomerative clustering methods have been described elsewhere 43 .
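The scoring and association steps above can be sketched in Python. The sketch makes several simplifying assumptions that go beyond the text: the odds ratio is computed for 1:1 matched pairs with a single binary exposure, where the conditional estimate reduces to the ratio of discordant pairs, and no further covariate adjustment is shown; the confidence interval is a Wald interval on the log scale; and the correlation-based distance is taken to be 1 minus Pearson's r. All function names are hypothetical.

```python
import math


def factor_score(patient_codes, factor_codes):
    """1 if the patient has any ICD-10 code in the factor, else 0."""
    return int(bool(set(patient_codes) & set(factor_codes)))


def matched_pair_or(pairs, z=1.96):
    """Conditional OR and 95% CI for 1:1 matched pairs.

    pairs: iterable of (tbi_score, reference_score) binary tuples.
    """
    pairs = list(pairs)
    b = sum(1 for t, r in pairs if t == 1 and r == 0)
    c = sum(1 for t, r in pairs if t == 0 and r == 1)
    if b == 0 or c == 0:
        raise ValueError("need discordant pairs in both directions")
    log_or = math.log(b / c)
    se = math.sqrt(1 / b + 1 / c)  # Wald standard error of log(OR)
    return b / c, math.exp(log_or - z * se), math.exp(log_or + z * se)


def pearson(x, y):
    """Pearson's correlation between two equal-length score vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_distance(x, y):
    """Assumed distance for clustering: 1 - r (0 = identical pattern)."""
    return 1 - pearson(x, y)
```

The pairwise distances between factor score vectors would then be passed to an agglomerative clustering routine with Ward linkage to order the heatmap.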
Step 5. Health status transitions from the time preceding TBI to the TBI event
To further expand our understanding of health status transitions from the time preceding TBI to the TBI event, we clustered all factors from each period into a single heatmap, where values of factors representing each time period were correlated. This was done using a Fisher transformation, which converted the correlations into "z-like statistics" 44. Next, factors preceding TBI were pooled into separate "injury severity" and "mechanisms of injury" event groupings.
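For readers unfamiliar with the Fisher transformation, a brief Python sketch follows. The transformation itself, z = atanh(r), is standard; scaling by the square root of (n - 3) to obtain an approximately standard normal statistic, and the two-sided p-value derived from it, are assumptions about how the "z-like statistics" may have been formed, as the text does not give the formula.

```python
import math


def fisher_z(r: float) -> float:
    """Fisher transformation of a correlation coefficient (|r| < 1)."""
    return math.atanh(r)


def z_statistic(r: float, n: int) -> float:
    """Approximately standard normal under no correlation (assumed form)."""
    return math.atanh(r) * math.sqrt(n - 3)


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2))
```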

Internal validation of the results
To determine the consistency of observed patterns, heatmaps were generated and compared between the training, validation, and testing dataset, with an FDR-corrected alpha set at 0.05. All correlations with adjusted p-values greater than 0.05 were set to 0 on the heatmaps, leaving only significant correlations with an FDR < 0.05.
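The masking rule described above is simple enough to state in a few lines of Python. The function name and the list-of-lists representation of the correlation and adjusted p-value matrices are illustrative assumptions.

```python
def mask_non_significant(corr, adj_p, alpha=0.05):
    """Set correlations with adjusted p-values above alpha to 0."""
    return [
        [r if p <= alpha else 0.0 for r, p in zip(r_row, p_row)]
        for r_row, p_row in zip(corr, adj_p)
    ]
```

Applying the same function to the training, validation, and testing matrices gives heatmaps that can be compared cell by cell.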
Ethical approval and informed consent. Approval [...]ance: All methods were carried out in accordance with the relevant guidelines and regulations. Informed consent: This research utilised de-identified health administrative data with no access to personal information. No humans were directly involved in this study.

Results
Of the 58,516 patients in the testing dataset, 57% were male, and 43% were female. The most common TBI mechanisms were falls (n = 26,480 [45%]) and being struck by/against an object (n = 20,845 [36%]). Of all injuries, 25% were sports-related, and 10% were sustained in a MVC. Assaults accounted for 7% of TBIs. Injury severity was not established in 25,036 [43%] patients; most of these cases were recorded as concussion without a specified length of unconsciousness (ICD-10 code S06.0; Table 1 and Supplementary Table 3).
Step 1. Determining the phase preceding TBI and the TBI event phase
We found that ED and acute care visits prior to and following the TBI event (i.e., index date for TBI) followed a certain trend, whereby they appeared to plateau 30 days before and after the index date and remained largely unchanged after that (Fig. 1). Therefore, this 61-day period was defined as the TBI event window, whereas all ED and acute care visits within five years up to 30 days prior to a TBI event were considered to be the pre-injury phase. A similar procedure was performed for each patient in the reference population sample, with the exception that the midpoint of each patient's ED and acute care visits was selected as an index date.
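The phase definitions above translate directly into a date rule, sketched here in Python. Treating five years as 1,825 days and labelling visits outside both phases as "outside" are assumptions for illustration; the function and constant names are hypothetical.

```python
from datetime import date, timedelta

EVENT_HALF_WIDTH = timedelta(days=30)  # 30 days either side of the index
LOOKBACK = timedelta(days=5 * 365)     # assumption: five years = 1,825 days


def classify_visit(visit: date, index: date) -> str:
    """Assign a visit to the TBI event window or the pre-injury phase."""
    delta = visit - index
    if abs(delta) <= EVENT_HALF_WIDTH:
        return "event"       # 61-day window centred on the index date
    if -LOOKBACK <= delta < -EVENT_HALF_WIDTH:
        return "pre-injury"  # within five years, more than 30 days before
    return "outside"
```

For an index date of 15 June 2012, a visit on 16 May 2012 falls inside the event window, whereas a visit on 15 May 2012 belongs to the pre-injury phase.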

Step 2. Multiple testing to detect a set of definable health status patterns in TBI vs. non-TBI diagnoses
The matched McNemar tests were performed on the training dataset for 2,600 ICD-10 codes at the first three-character level, for patients with TBI and their matched reference patients, for significant associations with TBI diagnosis. The Benjamini-Yekutieli multiple testing, applied to acquire a set of codes controlled at an FDR of 5%, recognised 273 diagnosis codes that were significantly associated with TBI diagnosis (i.e., had an OR > 1). These codes were re-tested on the validation dataset, and 226 (83%) of them were internally validated (Supplementary Tables 3 and 4). Only codes that were significant in both the training and validation sets were retained for further factor analysis using the principal components method.
Table 1. Characteristics of patients with a first traumatic brain injury-related visit in the ED or acute care and matched reference patients. n/a = not applicable; TBI = traumatic brain injury; SD = standard deviation. Data given as mean (standard deviation) or n (%). *A patient had a transfer to either location on the same day. **A patient may have several designations (i.e., sports injury and struck by/against an object).

Step 3. Factor analysis of health status patterns that are significantly related to TBI vs. non-TBI events
Factor analysis was applied to the training dataset. Of the 226 codes included in the analysis, 164 (73%) unique codes met the factor loading cut-off of 0.2 ( Supplementary Fig. 2). For details on frequencies, ORs, and factor loadings of codes that met the factor analysis cut-off and codes that did not meet the cut-off, see Supplementary Tables 4 and 5, respectively. Using the breakpoints on the scree plots and the interpretability, 35 factors were selected. One factor (asphyxiation, suicide) had low frequencies in the reference population (< 6) and was excluded from further analyses. The remaining 34 factors were studied further. Figure 2 presents each factor by injury severity share. Table 2 presents the descriptions, frequencies, ORs, and ICD-10 codes included for each of the 34 factors. Supplementary Table 4 presents factor loadings and detailed descriptions of each factor.
Step 4. Conditional logistic regression model, correlation matrix, and hierarchical clustering
Heatmaps of factors preceding TBI and factors of the TBI event are presented in Fig. 3.

Step 5. Health status transitions from the time preceding TBI to the TBI event
The strongest positive correlations between health status preceding TBI and health status at the injury event were between Cluster D and Cluster 1 (i.e., multiple body system pathology) and Cluster C and Cluster 2 (i.e., advanced age-related brain pathology). The multiple body system pathology was composed of endocrine system pathology, i.e., diabetes and diabetic emergencies (Factor P15 and Factor C17), cardiovascular system pathology (Factor P1 and Factor C2), alterations in renal and urinary tract function (Factor P5 and Factor C8), and brain haemorrhages and stroke (Factor P12 and Factor C19). The advanced age-related brain pathology consisted of liver disorders (Factor C11 and Factor P9), Alzheimer's disease and dementia (Factor P29 and Factor C3), and aplastic anaemias and haemorrhages and liver disorders (Factor P9 and Factor C27), among other advanced neurological sequelae.
Cluster B preceding TBI (i.e., poisons, drug overdose, social adversity) was strongly associated with multiple pathologies at the injury event (Clusters 1-3), including seizures and drug adversities (Factor P26 and Factor C20) and illnesses due to poisons and drug overdose (Factor P9 and Factor C23; Factor P13 and Factor C26). Weaker correlations were observed between multiple body system pathology (Cluster 1) and advanced age-related brain pathology at the injury event and Cluster A preceding injury (i.e., young age-related concerns), assault and intentional injury (Factors P28 and P36 and Factor C26), overexertion and superficial injuries, exposure to environmental adversities (i.e., burns, cold/hypothermia, exposure to heat/light), and lifestyle and adverse drug effect preceding injury (Factors P18, P7, P33, and Factors C23 and C28).

Health status preceding injury associates with injury severity and external causes of injury
Many of the health status factors preceding TBI showed a significant association with TBI severity, thereby likely contributing to the GCS score at the time of injury. For example, severe TBI and age at injury were characterized by a link to Clusters C and D preceding injury (i.e., multiple body system pathology and advanced age-related brain pathology, Fig. 4), which comprised metabolic disorders (Factors P9, P11, P15), neurological disorders (Factor P12), cardiovascular pathology (Factors P1 and P17), Alzheimer's disease and dementia (Factor P29), and disorders of older people (Factor P3). In contrast, respiratory infections, musculoskeletal (MSK) injuries, and overexertion in Cluster A (i.e., young age-related concerns, Factors P8, P4, and P27) preceding TBI were negatively correlated with severe TBI. A reverse health status association was observed for the mild and unspecified TBI severity, whereas moderate TBI severity showed positive correlations with the cluster of disorders associated with poisoning due to narcotics, substance abuse, and liver pathology preceding injury (Factors P22, P14, and P9). This clustering analysis was performed separately for the training, validation, and testing datasets, and consistent patterns in the clustering of health status factors with injury severity were observed across each dataset. External causes of injury were also distinguished by combined clusters of health status factors preceding TBI. Falls were characterised by a strong positive correlation with Clusters C and D (i.e., multiple body system pathology and advanced age-related brain pathology), mimicking the clusters associated with severe TBI, and a strong negative correlation with Cluster A (i.e., young age-related concerns), mimicking the clusters associated with mild and unspecified TBI severity. In contrast, struck by/against an object showed a strong positive correlation with Cluster A and negative correlations with Clusters C and D.
A few patterns of weak negative correlations were observed between health status preceding TBI and MVC as an external cause of injury, whereas sport-related and assault-related causes of injury showed distinct positive correlations with health status preceding TBI. Meaningful observations included clustering of respiratory infections preceding injury with orthopaedic injuries and overexertion (Cluster A, Factors P8 with P4 and P27) in sport-related injury, and preceding injury poisoning by drugs and other substances, assault and abuse, and injuries from contact with sharp objects (Cluster B, Factors P23, P24, P36, and P38) with assault-related TBI.

Internal validation
To determine the consistency of the observed patterns, clustering analysis and heatmaps were generated and compared between the training, validation, and testing datasets. All reported results were confirmed in the training and validation datasets, and clusters and heatmaps were shown to be robust (Supplementary Figs. 2 and 3).

Discussion
Table 2. Factor analyses with ICD-10-CA codes, disease category and effect size (OR and 95% CI). ABX = antibiotics; CI = confidence interval; OR = odds ratio; TBI = traumatic brain injury; SD = standard deviation. Frequencies given as numbers.
In this paper, we described a method for aligning health status transition in TBI, a disorder of significant public health concern and a major cause of disability worldwide 4,45. The methods presented here describe a non-hypothesis-driven approach for detecting health status at injury events and combining them with health status results preceding the injury. This approach offers an explanation for the challenges associated with injury diagnosis, classification, and surveillance, which can be confounded by population health heterogeneity and epigenetic ambiguity 46. With these challenges in mind, we conducted an impartial and interpretable assessment of health status transitions in TBI accounting for 2,600 individual diagnoses encoded using the ICD-10 3 in a retrospective cohort of people of all ages, biological sex, socioeconomic standing, and place of living who had universally funded access to healthcare. We internally validated our results and found them to be robust. We believe that the presented method to study health status transitions in TBI will spur the development of additional methods and prove useful for future analyses on health status transition after the injury event. Application of a health status transitions perspective to the contextual injury event, and recognition that health status preceding injury makes a person more or less susceptible to TBI due to a specific external cause of injury and to developing a more or less severe TBI, entails new approaches to injury taxonomy, treatment and rehabilitation, and predictive classifications.
This is important to avoid transition bias that can arise when people are prognostically different at the injury event phase because of their health status preceding injury. The results encourage dialogue among researchers, clinicians, and policymakers on health status transition perspective in TBI and other complex disorders and injuries. Our results demonstrate that the transitions in health status from the time preceding injury and the injury event are depicted in the patterns of associations, external cause of injury and injury severity. We observed both hidden transitions, when the person's exposures preceding injury were not a constituent of the health status captured in the TBI event (i.e., exposures to gases and fumes, electrical currents, sharp objects, machinery, as shown by white fields in Figs. 3 and 4), as well as observed transitions, when the health status preceding injury was contained in the assessment of the injury event. Such transitions include cardiovascular, endocrine, metabolic, and neurological disorders, and disorders of the elderly, that were not resolute with time, and which were significantly reflected in the TBI event's external cause of injury and injury severity (magenta and green fields in Figs. 3 and 4). Together, these results suggest that patterns in the health status transition of patients with TBI emerge along the course of their comorbidity, which is consistent with previous reports 45,47-51 . Our results suggest that many disorders preceding injury are reflected in external causes of injury and injury severity. Disorders clustering within the same external cause of injury and injury severity, as highlighted here, illuminate TBI as an event that is constructed within the context of health and social statuses, both formative and reflective 6,52 . 
For example, we observed that clusters composed of cardiovascular and metabolic disorders, stroke, dementia, and disorders of the elderly preceding TBI were strongly associated with falls and severe TBI. While the above disorders, individually, have long been known to be implicated in the risk of falls 53-55, we demonstrated their formative links, both with other disorders and with TBI diagnosis. The association of Clusters C and D with TBI severity found here has significant diagnostic relevance. In this regard, both the depth and duration of coma following the injury event have been considered as an injury severity indicator using the GCS score 56. While it has been previously suggested that GCS scores can be affected by intoxication, hypoxia, and hypotension, among other things 57-59, the health status and age of patients presenting with these signs are not currently accounted for when determining injury severity. This has both clinical and policy implications, as there is a continuing debate over use of the GCS score in trauma patients of all ages, including preverbal children, to
A large number of patients with TBI in our sample (43%) did not have an established injury severity; most of these events were coded as concussions without a specified length of unconsciousness (S06.0 codes). The links between MSK illnesses preceding the injury event and unspecified TBI severity sustained in a sports-related context have been described previously 6; however, their clustering on young age-related disorders (i.e., respiratory infections and adverse reaction to antibiotics (Factors P8 and P37), overexertion (Factor P27), adult and child abuse and assault (Factor P38), and foreign body in eye or airway (Factor P30) preceding injury) is a novel finding.
They may highlight the limitations of establishing level of responsiveness according to three aspects -eye-opening, motor, and verbal responses -in compromised person-environment and healthcare interactions, as well as a greater probability of an unwitnessed injury event in such interactions.
Finally, our results provide a basis for using pre-injury health status as an integral part of precision medicine and injury surveillance. We found clusters of factors associated with severe injury and those with mild injury severity and concussion. Factors clustering on moderate-to-severe TBI are composed of system-level disorders, poisons, and drug overdose, and those associated with mild TBI and unspecified injury severity (i.e., concussion codes) are composed of MSK-related injuries and respiratory illnesses. The external cause of injury, especially falls and struck by/against an object, nearly clustered in accordance with the severity, with falls linked to patients with system-level and neurological disorders and severe TBI, and being struck by/against an object linked to superficial injuries, overexertion, orthopaedic injuries, and mild TBI and concussions 67,68. Notably, MVCs showed very few associations, all negative, the most significant of which, across clusters, were orthopaedic injuries, seizure disorders, and disorders of older people; these conditions are linked to obstacles to, or lack of authorisation for, operating machinery 69-71.
As presented in this research, we developed a feasible method to work with big data and complex clinical and public health topics in TBI simultaneously, which can be applied to other complex disorders and injuries. We have shown that it is possible to convert thousands of diagnoses encoded within the ICD-10 structure into hundreds of TBI-related diagnoses, and then further reduce these diagnoses into a few dozen factors that collectively explain the TBI event's significantly shared variance with factors preceding TBI. We created a procedure for visualised cluster analysis and heatmaps to accurately trace health status transitions and to detect and localise the clusters associated with the transitions. Encouraged by meaningful observations, we adapted and extended the analyses to external causes of injury and injury severity, and provided evidence that health status preceding an injury event is reflected in the injury event, as TBI-event health status and factors were implicated in the cause of injury and injury severity designation. Depending on the cluster and its formation, we anticipate that our analyses could offer new important information on injury severity in the case of falls and being struck by/against an object, and assault-related injury surveillance. Despite the scientific and technological advances captured in this work, there are still questions to be addressed in future research. The data-driven approach we developed and the results are based on ED and acute care hospital records; there are still persons with TBI who may choose to be treated at a primary care facility within the healthcare system, which could be an additional source for coding injury and morbidity data. Primary care data should be explored in the future, given that hospital data tends to be efficient and does not always strive for completeness 27,28.
This is important for assault-related injury surveillance, as the ICD codes do not capture the victim/perpetrator relationship, e.g., TBIs due to intimate-partner violence. In Ontario, a code is mandatory only when the condition or circumstance exists at the patient's visit and is significant to the patient's treatment or care 28 . In the future, we plan further investigation and external validation of health status preceding injury, especially for circumstances not reflected in the TBI event window, but which were implicated in the external cause and severity of the injury, linking them to recovery and functional trajectory. In addition, we used the first three characters in the ICD-10, which designate the category of the diagnosis at the time preceding an injury event 72 . By using the whole sequence of codes instead of the first three characters, more data can be preserved for model testing, training, and validation; however, this necessitates a higher computing power to run the analyses 72 . Finally, despite the reliability/validity of ICES data on ED and acute care visits 73,74 , there may remain uncovered variation between Ontarians due to differences in access to care and help-seeking behaviours. This may be especially true for certain external causes of TBI, for example, assault related TBIs sustained in rural settings 75 .
In an effort to mitigate this issue, we internally validated the results that emerged in the testing dataset using the training and validation datasets and detailed these results in the supplementary material of the manuscript. Nonetheless, future research should establish generalizability by externally validating the described health status transitions using data from a patient population across Canada.
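The trade-off noted above, between three-character ICD-10 categories and full codes, can be illustrated with a minimal sketch. This is not the analytic code used in this study (which is held at ICES); the helper name `icd10_category` and the example records are hypothetical and serve only to show how truncation collapses distinct codes into fewer categories.

```python
from collections import Counter


def icd10_category(code: str) -> str:
    """Return the three-character category of an ICD-10 code (e.g. 'S06.5' -> 'S06').

    Hypothetical helper for illustration: it normalises case, drops the
    decimal point, and keeps the first three characters.
    """
    cleaned = code.strip().upper().replace(".", "")
    if len(cleaned) < 3:
        raise ValueError(f"not a valid ICD-10 code: {code!r}")
    return cleaned[:3]


# Hypothetical diagnosis codes, written with and without the decimal point.
codes = ["S06.0", "S065", "s06.2", "F10.1", "F102", "I10"]

# Counting full codes preserves every distinction present in the records.
full_counts = Counter(c.strip().upper().replace(".", "") for c in codes)

# Counting three-character categories merges codes that share a category.
category_counts = Counter(icd10_category(c) for c in codes)

print(len(full_counts), "distinct full codes")
print(len(category_counts), "distinct three-character categories")
```

In this toy input, six distinct full codes reduce to three categories, which shows why truncation lowers the number of variables to analyse at the cost of diagnostic detail.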
Implications for prevention. Primary prevention seeks to circumvent injury before it occurs by protecting persons and vulnerable groups among the population 76 . Secondary prevention involves early recognition and targeting of conditions that have already produced pathological change 77 , to stop the adverse injury course. Tertiary and quaternary prevention involve treatment directed at preventing long-term complications and minimizing disability 78,79 . This research focused on the time preceding injury and the injury event, shown diagrammatically in Fig. 4, and therefore allows discussion of primary and secondary prevention initiatives.
Advanced age-related brain pathology (Cluster D) and disorders associated with poisoning due to narcotics, substance abuse and liver pathology (Cluster B) preceding TBI could be targeted in primary prevention. These clusters preceding injury were associated with severe TBI, multiple pathologies, and other neurological sequelae at the time of injury. Interventions focusing on balance, posture, and moving equipoise training in the elderly have been shown to be accompanied by a decline in falls [80][81][82] . Aside from targeting postural instability due to age-related motor impairments, advanced age-related brain pathologies highlighted in this work have been reported to challenge the risk-benefit balance of treatment options with links to falls 83 . There has been a recent call to utilise a minimally disruptive approach when deciding on pharmacological management of conditions of the elderly, to reduce the likelihood of drug interactions and to prevent falls 84 .
Raising awareness about the links between poisoning due to narcotics, substance abuse, and liver pathology (Cluster B) preceding injury and assault-related TBI is important 85 . While the ability of healthcare providers to prevent or modify such behaviors has not been proven, it might be possible to direct medical effort to the prevention of alcohol and drug-associated problems and, by that, prevent injury, violence, and medical complications of drug abuse 86 . Early detection and interventions that have proven effective for addictions include brief counselling, referral to ambulatory and inpatient treatment programs, community organizations, and appropriate medication use for substance use withdrawal [87][88][89] . Likewise, screening for exposure to relationship violence (i.e., adult and child abuse and sexual assault) and developing long-term plans and referrals to appropriate community and governmental agencies may prevent assault-related TBIs 90 .
Ideas for secondary prevention strategies that emerged from the results of this research include attention to interventions directed at the risk associated with the loss of patient autonomy in severe TBI cases, which was strongly linked to multiple body system pathology both preceding and at the time of injury (Cluster C and Cluster 1). Experimental therapies that inhibit the release of excitotoxins, which play an important role in secondary injury, to attenuate cellular oxidative and metabolic stress might prove to be effective 91 . Likewise, because of the substantial risks of repeated TBIs and adverse TBI outcomes from unaddressed narcotics, substance abuse and liver pathology (Cluster B), healthcare adoption of a broad construction of health status transition, considering the family and social environments of patients, is key 92 . Routinely eliciting information about home, work, and neighbourhood exposures, and documenting family and social circumstances 93 , can help direct secondary prevention interventions and recommendations, as individual patients' situations will dictate the feasible targets to which primary care providers should be alerted when the intent is to ameliorate the course of TBI.
In summary, advances in data-driven analysis reveal a remarkable extent of meaningful associations in health status in the time preceding and following a TBI event that direct ideas for primary and secondary prevention. Possible extensions to this line of research would involve detecting health status transitions from the event to post-injury phase that could support tertiary and quaternary prevention, with compelling injury surveillance and public health ramifications.

Data availability
ICES is an independent, non-profit research institute funded by an annual grant from the Ontario Ministry of Health and Long-Term Care (MOHLTC). As a prescribed entity under Ontario's privacy legislation, ICES is authorized to collect and use health care data for the purposes of health system analysis, evaluation and decision support. Secure access to these data is governed by policies and procedures that are approved by the Information and Privacy Commissioner of Ontario. The dataset from this study is held securely in coded form at the Institute for Clinical Evaluative Sciences (ICES). While data sharing agreements prohibit ICES from making the dataset publicly available, access may be granted to those who meet pre-specified criteria for confidential access, available at www.ices.on.ca/DAS. The full dataset creation plan and underlying analytic code are available from the authors upon request, understanding that the computer programs may rely upon coding templates or macros that are unique to ICES and are therefore either inaccessible or may require modification.

Glossary of terms
Benjamini-Yekutieli procedure: A statistical procedure that controls the false discovery rate under arbitrary dependence assumptions.
Cluster: A statistical classification technique in which a network of objects or points with similar characteristics are grouped in clusters.
Cluster analysis: A statistical classification technique for exploratory data analysis in which a set of objects with similar characteristics are grouped in clusters; in this work, the technique was used for grouping similar kinds of factors into respective categories.
Cluster heatmap: A visualization technique used to reveal hierarchical clusters in a data matrix by drawing a rectangular grid corresponding to rows and columns and coloring the cells by their values in the data matrix; this technique has high data density and reveals clusters better than unordered heatmaps alone.
Discharge Abstract Database: Captures administrative, clinical, and demographic information on hospital discharges (including deaths, sign-outs, and transfers); data is received directly from acute care facilities or their respective health/regional authority or ministry/department of health.
Factor: A set of observed variables that have similar response patterns; they are associated with a hidden variable (called a confounding variable) that is not directly measured.
Factor analysis: A statistical technique that takes a mass of data and finds hidden patterns, and shows how those patterns overlap and what characteristics are seen in multiple patterns.
False discovery rate: A statistical method of conceptualizing the rate of type I errors in null hypothesis testing when conducting multiple comparisons.
Glasgow Coma Scale: A calculated scale that determines a patient's level of consciousness based on the best eye-opening response, the best verbal response, and the best motor response.
International Classification of Diseases, 10th Revision, with Canadian Enhancements: A coding system, developed by the World Health Organization, that is used to classify diseases and related health problems (morbidity); includes enhancements developed by the Canadian Institute for Health Information for use in Canadian hospitals and other Canadian medical facilities.
Heatmaps: Depict relationships between two variables, one plotted on each axis; by observing how cell colors change across each axis, one can observe if there are any patterns in value for one or both variables.
National Ambulatory Care Reporting System: Contains data for all hospital-based and community-based ambulatory care: day surgery, outpatient and community-based clinics, emergency departments.
Traumatic brain injury: A disruption in the brain's normal function by a bump, blow, or jolt to the head or penetrating head injury.
Surveillance (in injury): Refers to the ongoing collection of data describing the occurrence of, and factors associated with, injury.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.