Brain imaging is necessary for understanding disease symptoms, including stroke. However, frequent imaging procedures encounter practical limitations. Estimating the brain information (e.g., lesions) without imaging sessions is beneficial for this scenario. Prospective estimating variables are non-imaging data collected from standard tests. Therefore, the current study aims to examine the variable feasibility for modelling lesion locations. Heterogeneous variables were employed in the multivariate logistic regression. Furthermore, patients were categorized (i.e., unsupervised clustering through k-means method) by the charasteristics of lesion occurrence (i.e., ratio between the lesioned and total regions) and sparsity (i.e., density measure of lesion occurrences across regions). Considering those charasteristics in models improved estimation performances. Lesions (116 regions in Automated Anatomical Labeling) were adequately predicted (sensitivity: 80.0–87.5% in median). We confirmed that the usability of models was extendable to different resolution levels in the brain region of interest (e.g., lobes, hemispheres). Patients’ charateristics (i.e., occurrence and sparsity) might also be explained by the non-imaging data as well. Advantages of the current approach can be experienced by any patients (i.e., with or without imaging sessions) in any clinical facilities (i.e., with or without imaging instrumentation).
Brain imaging becomes one of immediate procedures as stroke patients are admitted to hospitals1,2. The imaging measure is critical to distinctly disregard hemorrhagic causes, determine stroke stages (e.g., acute, subacute, chronic), detect acute ischemic changes, recognize ischemic penumbra, and identify stroke occlusion3. Especially for subacute stroke patients, the strategic imaging is necessary to observe vasogenic edema3, potentially lengthen the intervention window, and monitor intervention effects4. The proper selection of imaging modalities and sequential procedure is preferable to manage salvageable ischemia and treatment decisions3,5,6,7.
The prompt disability assessment requiring the responsiveness is unlikely done for severe patients. Therefore, the information from brain imaging, such as lesion locations, are frequently used to anticipate consequent impairments. Global cognitive impairments were observed in patients with lesions at the left angular gyrus and left basal ganglia8. Axial- (i.e., frontal-posterior)9 and lateral-dependent10 lesions may influence distinct cognitive impairments. The damages at the right hemisphere were associated with the reduced attentional maintenance11. Furthermore, lesion locations characterize sub-domain dysfunctions of sensory-motor processing12, swallow13, and memory14. Worse mental conditions were found in stroke patients with lesions at the left frontal lobe, the bilateral basal ganglia, and the right Rolandic operculum15,16. Besides cerebral lesions, lesions at the anterior part of cerebellum may contribute to the severe case of ataxia17.
Recovery prognosis has been evaluated based on the initial brain imaging. Lesions occurred at specific regions affected the recovery outcomes18. Its effects were varied in a temporal manner. For example, lesions at the primary motor cortex and the caudate nucleus brought a poor prognosis of long-term gait recovery compared to the lesioned cingulum that specifically influenced a shorter course (6 vs. 3 months)19. Furthermore, the recovery of sub speech domains (i.e., fluency, comprehension, naming, repetition) were associated with lesion locations20. Both the follow-up imaging and the assessment of improved or worsened conditions are essential to monitor the effectiveness of rehabilitation.
The issue of limited instrumentation unfortunately raises, and the practicability concerns should also be addressed. Patients with restless and/or claustrophobic conditions feel inconveniences as undergoing the imaging process (e.g., magnetic resonance imaging/MRI)21. The presence of mental and electronics implants is a major constraint. The injection of contrast agent is cautiously done for patients with renal failures. The follow-up imaging becomes difficult to be routinely performed; this session is suggested to be carried out for only selective patients22 with worsened impairments23,24 after any interventions. The imaging information does important; obtaining that information with less inconvenience and high productivity is urgently strived.
Understanding the brain condition is beneficial not only for estimating post-stroke impairments and prognoses but also for early screening stroke onsets. Late reports of onset (about 1 h)25 may limit the eligibility for reperfusion treatment26,27,28. This phenomenon is commonly caused by patients’ unawareness of dysfunctions and confusion; several devices measuring patients’ systemic signals, motor and cognitive functions (i.e., non-imaging data) evaluate the risk of stroke29 which may be translated to brain conditions. Furthermore, the autonomic nervous system on physiological responses is associated with perceptive-cognitive functions and emotions30,31,32,33. The link between heartbeat, somatosensory perception, and brain (parietal and posterior cingulate regions) has also been confirmed34. Therefore, we hypothesize that the non-imaging data representing patients’ mind-brain-body functions is feasibly applied for estimating lesions.
The current study aims to investigate the use of patients’ data and scores acquired from various standard tests as independent variables; those variables are fitted to the logistic regression model quantifying the probability of lesion occurrences for each brain region. Figure 1 depicts our concept on the current study. The evaluation metric comprises measures of accuracy, specificity, and sensitivity to examine performances of estimation models. The optimization process is also involved in selecting variables and employing analytical methods. The advanced technique, including artificial technology (AI), has been proposed to be implemented in the clinical field. The AI-aided detection of vessel occlusion was found to be cost-effective35. We are expecting that this lesion estimation can contribute to the advancement of healthcare providing more conveniences for patients, their families, and health professionals.
Lesion occurrence and sparsity parameters for controlling lesion distributions
Lesions were monitored by magnetic resonance imaging (MRI) and diversely found across brains in the dataset (n = 195). Lesion locations were specified following the Automated Anatomical Labeling36,37 (AAL; 116 regions). Lesion degrees in a region were sparsely distributed (right skewed). Its degrees were concentrated in 0%; high coefficients of variation (CV) were consequently observed. The well-distributed dataset is required in an analysis. Therefore, an approach was attempted to control lesion distributions across subjects. The subject data were clustered into three subsets based on lesion parameters—occurrence and sparsity (density of lesion occurrences across regions). Those subsets were specified by low occurrence—low sparsity (subset 1; n = 67), low-to-moderate occurrence—moderate sparsity (subset 2; n = 86), and moderate-to-high occurrence—moderate sparsity (subset 3; n = 42). Figure 2A shows lesion distributions for each specific dataset. The great gap of CV across regions (2.2 and 14 for left thalamus and left precentral, respectively) was observed in the complete dataset. Meanwhile, the gap between maximum and minimum CVs in the subsets 1–3 could be maintained by 5.1–7.2. The lesion modeling was then independently executed for each subset; the model performances were expected to be better in each subset than those of the complete dataset.
Non-imaging data to model lesion characteristics
In order to realize the system with less imaging requirement, the categorization of subject data into subsets should also be feasible using non-imaging data. Therefore, the prediction of subsets using non-imaging features was attempted. About 40% of non-imaging features showed significant subset effects (Kruskal–Wallis H test; p < 0.05; H(2,192) = 6.0–20). Those features were transformed to principal components (PCs); the weights of 7 PCs accounted for 95% of total weight. By performing the random forest analysis, those PCs were modelled to classify subjects into three subsets. From the model, PC importance was observed to be different for classifying each subset. The accuracy performance (i.e., true subset 1, true subset 2, and true subset 3) was adequate (83.6%). However, the model quality was less moderate with a Matthews correlation coefficient (MCC) of 0.29. Figure 2B illustrates a confusion matrix for the ternary classification. The hit rate was similar for each subset (82.1– 85.7%). Among subsets, subset 3 slightly showed a higher classification mismatch (10 out of 42). The greater distribution of lesion occurrence (subsets 3 vs. 1 and 2; 0.03–0.32 vs. 0.01–0.12) and the overlapped characteristic of lesion sparsity (subsets 3 vs. 2; 0.31–0.50 vs. 0.18–0.39) may explain this phenomenon. The attributes of region-wise CVs between the true and predicted subsets were found to be highly correlated particularly for the subsets 2 and 3 (Spearman’s ρ = 0.70; p < 0.05; Fig. 2C). Afterwards, the lesion estimation was performed on the predicted subsets (65, 82, and 48 subjects for subsets 1, 2, and 3, respectively). The successful subset prediction using only non-imaging data suggested the practical use of this system for the new data without imaging.
Improved estimation performances in subsets with controlled lesion distribution
Instead of estimating lesion degrees, lesion occurrences [i.e., absence or presence (degree > 0%)] were predicted for each region (116 regions in total). Figure 3A shows boxplots of region-wise estimation performances [accuracy (true absence and true presence); specificity (true absence); sensitivity (true presence)] for each subset. The estimation was performed by logistic regression analyses; the effect of regularization on estimation performances was evaluated. Specificity and sensitivity performances were significantly influenced by the regularization process (Kruskal–Wallis H test; p < 0.05; H = 8.0–2.3 × 102 ; d1 = 2; d2 = 79–313). Without the regularization process, the improved sensitivity was observed in all subsets (blue-colored boxplots in Fig. 3A; Tukey–Kramer post hoc). Meanwhile, by implementing the regularization, the specificity was found to be significantly higher (cyan- and yellow-colored boxplots for L1 and L2 regularizations, respectively) than that of resulted from the regression analysis without regularization. The accuracy also increased as the regression without regularization was conducted particularly in the subsets 2 and 3. As the specificity is optimized, the sensitivity consequently decreases, and vice versa. For a clinical screening application, the sensitivity performance was prioritized to minimize the negligence of lesion risks. Therefore, the regression analysis without regularization was selected as the optimum method.
The subset effect on accuracy was significant (Kruskal–Wallis H test; p < 0.01; H = 63–87 ; d1 = 3; d2 = 321–454) for any regression analyses (i.e., with or without regularization). The sensitivity of subsets was significantly greater than that of the complete dataset (subsets 2 and 3 > complete dataset, subset 1 ~ subset 2 ~ subset 3; Fig. 3: A3 and A4 vs. A1). Furthermore, the subsets 1 and 2 showed higher accuracy and specificity than the completed dataset and the subset 3 (Fig. 3: A2 and A3 vs. A1 and A4). According to these results, performing analysis on subsets with controlled lesion distributions obtained the better estimation; the importance of distribution control was confirmed.
Applicable estimation models for any estimation resolutions
The term of spatial resolution hereby denotes how brain regions are specified. For the lower-level resolution, brain regions can be registered following the anatomical lobes (Talairach level 2; 18 regions) and hemispheres (Talairach level 1; five regions). On the other hand, the higher-level resolution with associated function labelling (e.g., AAL) provides more detailed regions. The estimation analysis was initially performed based on 116 AAL regions (see. “Non-imaging data to model lesion characteristics”). How the developed estimation models manage the modified resolutions was evaluated.
Figure 3B shows boxplots of region-wise estimation performances for resolutions of Talairach levels 2 and 1 in each subset. The estimation was carried out by two ways—(1) applying 116 AAL estimation models for the modified resolutions (i.e., modified estimation; blue-colored boxplots in Fig. 3B) and (2) developing estimation models using the corresponding resolutions (i.e., direct estimation; cyan-colored boxplots in Fig. 3B). The modified estimation brought a better (Wilcoxon rank-sum test; p < 0.05; z = 2.4–2.9) or equivalent sensitivity compared to the direct estimation. The slightly decreased specificity (Wilcoxon rank-sum test; p = 0.046; z = −2.0) in the modified estimation was shown in the complete dataset (Fig. 3B1; Talairach level 2). According to these results, the models of higher-level resolution (AAL) applicably or even more remarkably estimated lesions in lower-level resolutions. Newly developed models for lower-level resolutions are dispensable .
The effect of subsets on estimation performances was affected by the resolution. Its effect on the lesion estimation of Talairach level 1 was insignificant (Kruskal–Wallis H test; p > 0.05). However, the prominent subset effects were observed in both modified and direct estimations (Kruskal–Wallis H test; p < 0.05; H = 8.7–17 ; d1 = 3; d2 = 65–72) for predicting lesions of Talairach level 2. The complete dataset particularly demonstrated poorer estimation performances (Tukey–Kramer post hoc; subset 1 > complete dataset for accuracy and specificity; subset 3 > complete dataset for sensitivity) than other subsets did.
From the perspective of subject-wise estimation, the predicted occurrences were found to be similar to the true occurrences in all spatial resolutions as an example illustrated in Fig. 3C–E. Lesion at some regions of the right temporal and parietal lobes were seemingly overestimated. Those regions were located at the adjacent regions which the true occurrences were observed. As the spatial resolutions decreased, the sensitivity performances for the subject-wise estimation in all subsets significantly improved (Kruskal–Wallis H test; p < 0.05; H = 19–96 ; d1 = 2; d2 = 138–582).
The association between imaging and estimation results was further assessed. The Fisher’s exact test was performed on the contingency table (i.e., absence and presence for true and estimated conditions) regardless regions. The estimation results were plausible for any estimation resolutions and subsets (p < 0.05). The estimation procedure could properly mimic the results of lesion occurrence based on imaging. The insignificant association was found in several AAL regions (3–5 regions). Nevertheless, these insignificances were relatively minor (< 5%), the estimation models were worthwhile outcomes.
Dominant estimation model inputs from cognitive- and motor-related tests
As the estimation analysis was independently performed, the region-wise models were acquired for each subset. The number of independent variables (i.e., inputs) was varied depending on regions. Cross-validation and re-sampling processes could not be performed for the limited number of region-wise lesions. Therefore, the risk of overfitting might be heightened for models of some AAL regions. The subset 3 required more variables (11 vs. 5–7 in median) for each region-wise model compared to the subsets 1 and 2 (Kruskal–Wallis H test; p < 0.01; H(2,223) = 15; Tukey–Kramer post hoc).
Table 1 shows the frequency of standard tests used in models across regions. The frequency use was diverse between subsets; however, the most predominant tests were similar. Around (or more than) 50% of total modelled regions used non-imaging features acquired from the Behavioral Inattention Test (BIT), the Cognitive Assessment Tool (CAT), and the Mini-Mental State Examination (MMSE). The Functional Independence Measure (FIM) providing variables related to both motor and cognitive functions was also useful for estimation models especially in the subsets 1 and 3. Even though features of the Rivermead Behavioral Memory Test (RBMT) supported the estimation at more than 60% of total modelled regions for the subset 3, those features were unemployed for performing the estimation in the subsets 1 and 2.
The mental-related tests, such as the Hospital Anxiety and Depression Scale (HADS) and the Japanese Perceived Stress Scale (JPSS), were moderately influential for about 20–30% of estimation models. Meanwhile, the variables from the metabolite assessment (e.g., blood) were inapplicable for any models. Some tests, including the Brunnstrom Recovery Stage (BRS) and the apathy scale, showed subset-specific uses. For example, BRS features were particularly involved in estimation models (e.g., right frontal–temporal-occipital, right cingulum, right insula, right thalamus, right heschl, and left cerebellum) for the subset 3.
The lesion information has been well employed to interpret post-stroke impairments and prognosis. In the current study, we attempted the opposite approach of lesion prediction that is rarely pursued to provide the lesion information related to impaired and intact functions. We confirmed the feasibility of non-imaging data (e.g., cognitive-motor functions, mental conditions, etc.) used for estimating the occurrence of lesioned brain regions. These results implied the bi-directional relationships between lesions and post-stroke functions.
The non-imaging features collected from several standard tests were used in estimation models for major regions. Those standard tests, including MMSE, FIM, a part of CAT (e.g., digit span), were commonly conducted to screen and/or evaluate deficits of global and higher-level cognitions38. In the practical framework, cognitive and mood assessments have been implemented using specifically different batteries for each clinical stage (e.g., pre-stroke, hyperacute, rehabilitation)39. Even though the validity of assessment tools are still argued40,41, the importance of neuropsychological assessment is well agreed. Cases of post-stroke dementia42 and psychiatric disorders (e.g., depression, apathy)43,44,45 became one of rationalizations for both cognitive and mood assessments. Post-stroke cognitive and neuropsychiatric impairments raised major concerns from all entities (e.g., stroke survivors, caregivers, and health professionals)46.
Widely located lesions were contributed to global cognitive deficits47. Those locations were associated with vascular trees rather than random incidents8. The hypothesis of vascular regulation on functional brain networks was proposed48. The lesion-evoked effects on vascular regulation and its causal impacts on brain network have not be investigated yet. However, the relationship between post-stroke connectivity disruptions at extensive brain networks (e.g., dorsal attention, auditory, sensorimotor, visual, etc.) and multidomain impairments (e.g., motor, memory, language, etc.) has been reported49. These disruptions may occur due to reduced interhemispheric connectivity within networks and increased intrahemispheric connectivity between networks50. Furthermore, a straightforward insight of disconnected connectomes caused by lesions had been suggested51.
In previous study, it was investigated that a sole mental domain, and the risk of worse mental conditions may be related to lesions at several regions15,16. The causal association of those lesion-specific locations is still unclear. The involvement of several networks (affective, cognitive control, default mode, and reward networks) modelled in a specific pattern tried to explain the symptomatic depression52,53. Depressive-related deficits on cognition have been frequently observed54; the engagement between impaired cognitive processes and emotional dysregulation is proposed55. High-degree brain hubs were found to be responsible for compounding cognitive functions56. The functional and structural segregation of each network toward a specific deficit domain is seemingly implausible, and vice versa. Therefore, the uses of features from multiple standard tests in estimation models of multiple brain regions are well-grounded. According to the results, some performed tests showed a little inclusion (or even none) in estimation models. The irrelevant tests can be reduced for the next study; the data collection period will be accordingly shortened.
We introduced a novel concept of sparsity measure. This measure quantifies the density of lesion occurrences. The lesion degree (i.e., ratio between lesioned and total voxels) was disregarded; the lesion location was implicitly comprehended. Subjects’ lesion sparsity and occurrence were characterized by using non-imaging features with high classification performances (83.6% accuracy). This result confirmed the practicability of non-imaging data to explain imaging data to some extent. One of foremost criteria for a good modeling is the sufficient sample number57. The dataset should be well distributed. However, the actual datasets have limited samples and often follow non-normal distributions58. The study with a small number offers a shorter research time59. Collecting more samples does require greater cost. Therefore, the small study is frequently attempted to provide a preliminary evaluation for hypotheses. The subject classification based on the lesion sparsity into some subsets brought a technical advantage. The subsets consisted of a smaller sample number. We tried our best to minimize the risk of overfitting by performing the processes of cross-validation and re-sampling. The better-performing models were consequently obtained from the subsets with reduced lesion variances. Therefore, the measures of lesion sparsity and occurrence presented relevant subject attributes.
More advanced machine learning methods (e.g., neural networks) have been extensively implemented for stroke studies60. Those methods had been applied on both imaging data and non-imaging data for multiple purposes (e.g., outcome prediction, risk assessment, onset identification)61,62,63. We currently tried two machine learning methods—random forest (ensemble model) and logistic regression for subset classification and lesion estimation, respectively. Each method performed well for its target. Some studies elaborated the comparison between classical (i.e., logistic regression) and advanced machine learning methods. Both classical and advanced methods showed equivalent performances64,65. However, the method suitability may depend on the dataset. Therefore, the suitability of machine learning methods will be advantageously evaluated for the next study. The trade-off between performance and computation costs should be considered as well.
For stroke cases, the prognostic outcome is the main concern. The theory of neuroplasticity has been introduced and studied. While the lesion volume is likely unchanged or even expanded66, the reorganization triggered by compensatory responses leads the recovery67. The recovery is progressive and time-dependent68. The utilization of estimation models is also envisioned during the recovery course. The reduced number of required imaging session is anticipated. The usefulness of estimation was improved by providing the subset classification based on only non-imaging data to select subset-specific models. Therefore, patients who inconveniently perform the follow-up imaging can receive similar benefits functionally addressing the neuroplasticity-evoked effects. The current lesion estimation is also purposeful for three different resolutions. Observed lesions at the lower spatial resolutions (i.e., greater region volume) predict wide-ranging impairments. The care feedback managing broader functions may consequently suggested. Those cares are expectedly optimized and personalized for each case. We hypothesize that the altered subset categorization (i.e., changed sparsity and total occurrence) and region-dependent occurrence during the rehabilitation course may be relevant to the network reorganization. Unfortunately, the longitudinal data, including both imaging (topology, functional) and non-imaging data, is unavailable for the moment. The verification of our hypothesis should be addressed in the near future.
Despite our achievements through the current study, we encountered four limitations. First, the low sample number entailed modeling disadvantages. Even though the modeling processes had been independently done for each subset, the model performances were region-dependent. The sample numbers for lesion classes (i.e., absence and presence) were found to be imbalanced (0.012–1.7 ratio between numbers of presence and absence across regions). The processes of cross-validation and resampling have been carried out; the risk of overfitting may unavoidably remain to some degree. As the feasibility of lesion estimation has been confirmed in this small dataset, the more accurate reliability of estimation models and reproducibility of estimation performances should be investigated using a greater dataset. Second, the estimation was only applicable to lesion occurrences not to lesion degrees (i.e., volume, size). The estimation of lesion degrees was unsuitably carried out in the current dataset with the right-skewed distribution of lesion degrees. More data acquisition with well-distributed lesion degrees for each region is highly requisite for modeling. Lesion degrees have been highly corresponded to functional impairments and prognostic recovery in both human and animal studies69,70,71. The lesion variables have been extended to its affected structural72-functional73 topology and activation74. In the future study, the estimation of lesion location and degree should consider clinical manifestations toward symptomatic impairments and restored functions. Thrid, the current analysis only involved T2W images as reference data. However, an image system analyzing both diffusion and perfusion images has been constructed. Thus our estimation method will be applicable to more diverse imaging methods in the future. Forth, some standard scales [e.g., National Institute of Health Stroke Scale (NIHSS), Fugl-Meyer Assessment (FMA)] for the severity and/or recovery stages of patients were not described. The NIHSS is commonly used to determine the severity of acute stroke patients; in this study, the patients were in rehabilitation stage after acute-phase treatment. Therefore, the assessment of activities in daily living using the FIM test was more appropriately conducted for evaluating patients’ severity. The FMA has been widely employed to monitor recoveries. However, its complex and lengthy procedure hindered the application in clinical practices. The inclusion of other standard tests (i.e., NIHSS FMA) in models will be an issue for future study.
In conclusion, we confirmed that non-imaging data feasibly predicted lesion occurrences at brain regions. The lesion occurrence density (i.e., sparsity) together with the lesion occurrence characterized stroke patients, and this sparsity measure was also associated with non-imaging data. The high-resolution imaging systems are not always available for all clinical facilities. The current approach is expected to deliver the merits of understanding lesion information even in the challenging environments. Even though further sample acquisition and analysis are required, the current approach may be prospectively feasible for acute and subacute stroke settings. The extension of model usages for specific acute (e.g., detection of ischemic penumbra and occlusion presences) and subacute (e.g., observation of corticospinal tract, Wallerian degeneration, brain atrophy, and connectivity alteration) cases will demonstrate advanced clinical benefits. The effects of employing the estimated lesion profile on rehabilitation outcomes will be assessed further by also comparing the results of conventional function monitoring.
Two hundred eight patients (170 males; 66.9 ± 9.8 years old; range 39–86 years old) with cerebral infarction participated in this study. A part of this dataset has been reported in the previous publication16. Patients were initially admitted to hospitals and then transferred to the rehabilitation ward of Hibino Hospital, Hiroshima. Severity conditions were widely distributed; the admission timing to the rehabilitation ward was consequently different (0–90 days after stroke onset) across patients. Patients were recruited during their rehabilitation periods. Among 208 patients, 42 patients suffered recurrent stroke. Thirteen subjects were excluded (12 males; 65.4 ± 9.9 years old; range 42–80 years old) due to limited lesions occurred in the brainstem. This study was approved by the ethics committee of the Shinaikai Hibino Hospital and the Hiroshima University Epidemiological Research. All experiments were conducted in compliance with relevant regulations and the latest version of the Declaration of Helsinki. Informed consent was obtained from either patients or those authorized to provide consents on behalf of patients, including family members or guardians, before we started any measurements.
Patients’ brains were scanned using a 1.5-T MRI instrument to obtain T2-weighted images (GE Healthcare, Milwaukee, WI). Imaging data were examined to quantify the brain lesions. Brain scans were normalized in accordance with the brain template of Montreal Neurological Institute (2 × 2 × 2 mm3 voxel size). Lesions were visually assessed and manually labeled (MRIcron, https://people.cas.sc.edu/rorden/mricron/index.html) afterwards by a proficient doctor (KS). Binary masks (1 for lesioned voxels, and vice versa) were created for non-smoothed lesions. The process of lesion identification is described in detail elsewhere11. In order to understand region-dependent lesions, binary masks were spatially registered following the AAL36,37 into 116 regions. The ratio between lesioned and total voxels for each region was computed (in % unit). Those ratios were hereafter referred as the ”lesion degree measure”.
As mentioned above, severity conditions were varied shown in wide-range numbers of lesioned regions across patients. The region-wise distribution of lesion degree measure was also dispersed. Frontal lesions were unlikely observed in patients; more than one-fifth patients showed lesions in the hippocampus, thalamus, and temporal lobe. These dispersed distributions may interfere the modeling process of lesion estimation. Therefore, 195 patient data were categorized into smaller subsets for minimizing the lesion distribution dispersion.
Patient data were characterized by two parameters, named lesion occurrence and lesion sparsity. The parameter of lesion occurrence quantifies a ratio between the number of regions with lesions (degree > 0%) and the number of total regions (i.e., 116 regions). Meanwhile, the parameter of lesion sparsity indicates a density measure of lesion occurrences across regions. This measure is interpreted by the averaged Hadamard products of mutual lesion occurrence (1: AND logic, NaN: OR logic; see Eq. (1)) and normalized distance (i.e., average of voxel-to-voxel distances using the AAL template references) matrices between regions.
where D is lesion degree of region m, n, etc. Figure 4A shows examples of lesion sparsity computation from two subjects having low and high mutual occurrences, respectively. The greater magnitude of lesion sparsity denotes the lower lesion occurrence density between regions. Figure 4B visualizes the lesion sparsity versus lesion occurrence plot. k-means clustering (k = 3) was then performed on two parameters resulting in three smaller subsets. Subsets 1–3 were distinguished by low occurrence—low sparsity (67 samples), low-to-moderate occurrence—moderate sparsity (86 samples), and moderate-to-high occurrences—moderate sparsity (42 samples), respectively. We hypothesized that the estimation performance may improve as implementing a dataset having relatively uniform lesion conditions (e.g., occurrence, sparsity) on the modeling process.
During the rehabilitation periods, patients’ cognitive and motor functions were assessed using standard batteries. Memory, attention, arithmetic, language, visual, spatial, and higher order functions were evaluated by performing the Cognitive Assessment Tool (CAT; digit/tapping span forward/backward tests, visual cancellation test, symbol digit modalities test, continuous performance test, position Stroop test, paced auditory serial addition test; auditory detection test, memory updating test), Mini-Mental State Examination (MMSE), Behavioral Inattention Test (BIT), Rivermead Behavioral Memory Test (RBMT), and Trail Making Test (TMT; Parts A and B). Functional disabilities and recovery progresses were estimated using the Functional Independence Measure (FIM) and the Brunnstrom Recovery Stage (BRS) at the times of admission to and discharge from the rehabilitation ward. Besides cognitive and motor functions, patients’ mental conditions were monitored using Japanese versions of the Apathy Scale, Hospital Anxiety and Depression Scale (HADS), and the Perceived Stress Scale (JPSS). Furthermore, the blood test was performed to measure levels of white/red blood cells, hemoglobin, etc. Demographic data (e.g., age) and medication information (e.g., doses) were also included as the non-imaging features. By excluding categorical variables and variables collected at the discharge time, 241 variables were used as potential inputs of lesion estimation models (see Table 1).
Model for subset selection
The subset categorization was done based on the imaging data. Recovery courses may alter lesion conditions; its effects on occurrence and sparsity have not been investigated yet. To minimize measurement numbers, lesion occurrence and sparsity were favorably characterized by non-imaging data. Prior to the process of lesion estimation, non-imaging features were modelled to classify subjects into one of subsets (low occurrence—low sparsity, low-to-moderate occurrence—moderate sparsity, and moderate-to-high occurrences—moderate sparsity). The supervised multiclass classification was executed using the random forest in open-source PyCaret library (https://pycaret.org/about; Python programming language). Features with excessive missing data were eliminated. Subset effects on non-imaging features were evaluated (Kruskal–Wallis H test); features with significant subset effects were selected as potential independent variables. Afterwards, the feature dimensionality was reduced by performing the principal component analysis (95% of total weight). Cross-validation (75% training data, threefold) and re-sampling (100 times) were also conducted to obtain robust models. Hyperparameters were tuned to optimize MCC as a measure of model quality. An optimum model was selected based on the highest MCC respected to the highest probability of correct rate.
Lesion modeling process
Three smaller subsets, results of subset selection model, and the complete dataset (i.e., 195 subjects) were independently analyzed. Even though, the lesion characteristics (e.g., occurrence and sparsity) across regions have been controlled for each subset, the region-wise distributions were not normal (0.5–33.6% total subjects with lesions in a region). The lesion degree measure was set as the dependent variable (y); the non-imaging features were used as the independent variables (x1,2,3…). Due to non-normal lesion distributions, analyzing nominal data (in %) of the dependent variable was unlikely feasible. Therefore, the lesion degree data was converted to two categories (Step A in Fig. 5). The categories were indexed as 0 and 1 for lesion absence and presence, respectively.
To obtain straightforward explanations of useful tests, lesion estimation models were constructed based on the multivariate logistic regression (Eq. 2) for each AAL region.
where x is inputs, a is estimated coefficients, b is intercept, and P is probability of the presence/high lesion category. If P is lower than 0.5, subject data were estimated for the absence/low lesion category, and vice versa. The modeling was separately performed for regression analyses with and without the regularization process. The least absolute shrinkage and selection operator (LASSO) and Ridge regression methods implements L1 (α = 1) and L2 (α = 0) regularization, respectively. Those methods preferably use normalized variables as inputs because of penalized coefficients (Step B in Fig. 5). The normalization process was possibly applied on 192 variables (x1,2…,192; in 0–1 value range). On the other hand, for the analysis without the regularization process, inputs were selected (Step C in Fig. 5) among 241 independent variables (x1,2…,241) according to its significant relationships (Kendall’s τ rank correlation; p < 0.05) with the categorized and dependent variable (y). The selected inputs were then fitted to the logistic model (Step D in Fig. 5).
Estimation models were validated by 4*3 fold nested cross-validation (Step E in Fig. 5). The LASSO and Ridge methods optimized the tuning parameter (λ) during the nested cross-validation process. Meanwhile, the nested cross-validation for the regression analysis without the regularization was optimized by the highest validation sensitivity (i.e., true presence/high lesion category). The re-sampling process (100 times) was also performed to tune hyperparameter (Step F in Fig. 5). Four hundred models were obtained from the processes of nested cross-validation and re-sampling. However, those processes (Steps E and F) were unfeasibly executed for all regions because the number of data with presence/high lesion category was limited (< 4) in some regions. Only a single model was acquired in those regions. Akaike information criterion (AIC) values were computerized for all models (Step G in Fig. 5). The best model was selected among 400 models in accordance with the lowest AIC value (Step H in Fig. 5). All processes related to modeling and cross-validation declared in this sub-section (“Lesion modeling process”) were developed in the MATLAB environment (R2018b, The MathWorks, Inc.).
Brain regions can be registered by different manners, such as the Talairach atlas. The Talairach levels 2 (i.e., lobes; 18 regions) and 1 (i.e., hemispheres; five regions) classify the brain with the smaller number of regions than the AAL registration does (Fig. 6A). For example, a single region of Talairach level 2 (e.g., left occipital lobe) comprises several AAL regions (e.g., left calcarine, left cuneus, left lingual, left superior-middle-inferior occipital gyrus). The spatially modified labeling, the transformation from AAL to Talairach labels, was done by the nearest neighbor method on the voxel-wise template [Statistical Parametric Mapping (SPM); www.fil.ion.ucl.ac.uk/spm/].
The selected models were applied on independent variables to estimate lesion categories for each AAL region. Three model parameters—accuracy (true absence/low and true presence/high), specificity (true absence/low), and sensitivity (true presence/high), were evaluated. Those parameters were also assessed as the same models of AAL regions were used to estimate lesions in spatially modified regions. If presence/high lesion categories (index 1) were estimated, the regions of Talairach levels 2 and 1 respected to AAL regions were also predicted having presence/high lesion categories (i.e., OR logic). Figure 6B shows an example of lesion category indexing for the spatially modified regions. Observing lesions in one or more AAL regions within a lobe (Talairach level 2) and a hemisphere (Talairach level 1) consequently brought an estimation of occurred lesion in the respected lobe (Fig. 6B2) and hemisphere (Fig. 6B3). To compare the effect of spatial modification in lesion estimation, the modeling process (see “Model for subset selection”) was directly carried out for each region of Talairach levels 2 and 1.
The datasets used and/or analysed during the current study available from the corresponding author on reasonable request.
Muir, K. & Santosh, C. Imaging of acute stroke and transient ischaemic attack. J. Neurol. Neurosurg. Psychiatry 76, iii19–iii28 (2005).
Merino, J. G. & Warach, S. Imaging of acute stroke. Nat. Rev. Neurol. 6, 560–571 (2010).
Birenbaum, D., Bancroft, L. W. & Felsberg, G. J. Imaging in acute stroke. West J Emerg Med 12, 67–76 (2011).
Ni, L. et al. The value of resting-state functional MRI in subacute ischemic stroke: Comparison with dynamic susceptibility contrast-enhanced perfusion MRI. Sci. Rep. 7, 41586. https://doi.org/10.1038/srep41586 (2017).
Latchaw, R. E. et al. Recommendations for imaging of acute ischemic stroke: a scientific statement from the American Heart Association. Stroke 40, 3646–3678 (2009).
Wintermark, M. et al. Imaging recommendations for acute stroke and transient ischemic attack patients: A joint statement by the American Society of Neuroradiology, the American College of Radiology and the Society of NeuroInterventional Surgery. J. Am. Coll. Radiol. 10, 828–832 (2013).
Lin, M. P. & Liebeskind, D. S. Imaging of ischemic stroke. Continuum (Minneap Minn) 22, 1399–1423. https://doi.org/10.1212/con.0000000000000376 (2016).
Zhao, L. et al. Strategic infarct location for post-stroke cognitive impairment: A multivariate lesion-symptom mapping study. J. Cereb. Blood Flow Metab. 38, 1299–1311 (2018).
Cipolotti, L. et al. Cognitive estimation: Performance of patients with focal frontal and posterior lesions. Neuropsychologia 115, 70–77 (2018).
Sagnier, S. et al. The influence of stroke location on cognitive and mood impairment. A voxel-based lesion-symptom mapping study. J. Stroke Cerebrovasc. Dis. 28, 1236–1242 (2019).
Shimonaga, K. et al. The right hemisphere is important for driving-related cognitive function after stroke. Neurosurg. Rev. 44, 977–985 (2021).
Kenzie, J. M. et al. Localization of impaired kinesthetic processing post-stroke. Front. Hum. Neurosci. 10, 505 (2016).
Wilmskoetter, J. et al. Mapping acute lesion locations to physiological swallow impairments after stroke. NeuroImage Clin. 22, 101685 (2019).
Al-Qazzaz, N. K., Ali, S. H., Ahmad, S. A., Islam, S. & Mohamad, K. Cognitive impairment and memory dysfunction after a stroke diagnosis: A post-stroke memory assessement. Neuropsychiatr. Dis. Treat. 10, 1677–1691 (2013).
Hama, S. et al. Post-stroke affective or apathetic depression and lesion location: left frontal lobe and bilateral basal ganglia. Eur. Arch. Psychiatry Clin. Neurosci. 257, 149–152 (2007).
Sutoko, S. et al. Lesions in the right Rolandic operculum are associated with self-rating affective and apathetic depressive symptoms for post-stroke patients. Sci. Rep. 10, 20264 (2020).
Stoodley, C. J., MacMore, J. P., Makris, N., Sherman, J. C. & Schmahmann, J. D. Location of lesion determines motor vs. cognitive consequences in patients with cerebellar stroke. NeuroImage Clin. 12, 765–775 (2016).
Plowman, E., Hentz, B. & Ellis, C. Jr. Post-stroke aphasia prognosis: A review of patient-related and stroke-related factors. J. Eval. Clin. Pract. 18, 689–694 (2012).
Lee, K. B. et al. Brain lesions affecting gait recovery in stroke patients. Brain Behav. 7, e00868 (2017).
Sul, B. et al. Association of lesion location with long-term recovery in post-stroke aphasia and language deficits. Front. Neurol. 10, 776 (2019).
Singer, O. C., Sitzer, M., de Rochemont, R. M. & Neumann-Haefelin, T. Practical limitations of acute stroke MRI due to patient-related problems. Neurology 62, 1848 (2004).
Schneider, L. B., Libman, R. B. & Kanner, R. Utility of repeat brain imaging in stroke. Am. J. Neuroradiol. 17, 1259–1263 (1996).
George, A. J. et al. Trimming the fat in acute ischemic stroke: An assessment of 24-h CT scans in tPA patients. Int. J. Stroke Off. J. Int. Stroke 10, 37–41 (2014).
Schurig, J. et al. Frequency of hemorrhage on follow up imaging in stroke patients treated with rt-PA depending on clinical course. Front. Neurol. 10, 368 (2019).
Mosley, I., Nicol, M., Donnan, G., Patrick, I. & Dewey, H. Stroke symptoms and the decision to call for an ambulance. Stroke 38, 361–366 (2007).
(CDC), C. f. D. C. a. P. Pre-Hospital and Hospital Delays After Stroke Onset—United States, 2005–2006. 474–478 (2007).
Chen, C.-H. et al. Pre-hospital and in-hospital delays after onset of acute ischemic stroke: A hospital-based study in southern Taiwan. Kaohsiung J. Med. Sci. 23, 552–559 (2007).
Matsuo, R. et al. Association between onset-to-door time and clinical outcomes after ischemic stroke. Stroke 48, 3049–3056 (2017).
Bart-Erdene, B.-O. & Saver, J. L. Automatic acute stroke symptom detection and emergency medical systems alerting by mobile health technologies: A review. J. Stroke Cerebrovasc. Dis. 30, 105826 (2021).
Sandman, C. A., McCanne, T. R., Kaiser, D. N. & Diamond, B. Heart rate and cardiac phase influences on visual perception. J. Comp. Physiol. Psychol. 91, 189–202 (1977).
Garfinkel, S. N. et al. Fear from the heart: Sensitivity to fear stimuli depends on individual heartbeats. J. Neurosci. 34, 6573–6582 (2014).
Critchley, H. D. & Garfinkel, S. N. The influence of physiological signals on cognition. Curr. Opin. Behav. Sci. 19, 13–18 (2018).
Al, E. et al. Heart-beat interactions shape somatosensory perception and evoked potentials. Proc. Natl. Acad. Sci. 117, 10575–10584 (2020).
Al, E., Iliopoulos, F., Nikulin, V. V. & Villringer, A. Heartbeat and somatosensory perception. Neuroimage 238, 118247 (2021).
van Leeuwen, K. G. et al. Cost-effectiveness of artificial intelligence aided vessel occlusion detection in acute stroke: An early health technology assessment. Insights Imaging 12, 133 (2021).
Tzourio-Mazoyer, N. et al. Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. Neuroimage 15, 273–289 (2002).
Maldjian, J. A., Laurienti, P. J., Kraft, R. A. & Burdette, J. H. An automated method for neuroanatomic and cytoarchitectonic atlas-based interrogation of fMRI data sets. Neuroimage 19, 1233–1239 (2003).
Saa, J. P. et al. Longitudinal evaluation of cognition after stroke—A systemic scoping review. PLoS ONE 14, e0221735 (2019).
Quinn, T. J., Elliott, E. & Langhorne, P. Cognitive and mood assessment tools for use in stroke. Stroke 49, 483–490 (2018).
Stolwyk, R. J., O’Neill, M. H., McKay, A. J. D. & Wong, D. K. Are cognitive screening tools sensitive and specific enough for use after stroke? A systematic literature review. Stroke 45, 3129–3134 (2014).
Burton, L. & Tyson, S. F. Screening for cognitive impairment after stroke: A systematic review of psychometric properties and clinical utility. J. Rehabil. Med. 47, 193–203 (2015).
Pendlebury, S. T. & Rothwell, P. M. Prevalence, incidence, and factors associated with pre-stroke and post-stroke dementia: A systematic review and meta-analysis. Lancet Neurol. 8, 1006–1018 (2009).
Hama, S. et al. Depression or apathy and functional recovery after stroke. Int. J. Geriatr. Psychiatry 22, 1046–1051 (2007).
Hama, S., Yamashita, H., Yamawaki, S. & Kurisu, K. Post-stroke depression and apathy: Interactions between functional recovery, lesion location, and emotional response. Psychogeriatrics 11, 68–76 (2011).
Ayerbe, L., Ayis, S., Wolfe, C. D. A. & Rudd, A. G. Natural history, predictors and outcomes of depression after stroke: Systematic review and meta-analysis. Br. J. Psychiatry 202, 14–21 (2013).
Pollock, A., St George, B., Fenton, M. & Firkins, L. Top 10 research priorities relating to life after stroke—Consensus from stroke survivors, caregivers, and health professionals. Int. J. Stroke Off. J. Int. Stroke 9, 313–320 (2014).
Weaver, N. A. et al. Strategic infarct locations for post-stroke cognitive impairment: A pooled analysis of individual patient data from 12 acute ischaemic stroke cohorts. Lancet Neurol. 20, 448–459 (2021).
Bright, M. G., Whittaker, J. R., Driver, I. D. & Murphy, K. Vascular physiology drives functional brain networks. Neuroimage 217, 116907 (2020).
Siegel, J. S. et al. Disruptions of network connectivity predict impairment in multiple behavioral domains after stroke. Proc. Natl. Acad. Sci. 113, E4367–E4376 (2016).
Baldassarre, A., Ramsey, L. E., Siegel, J. S., Shulman, G. L. & Corbetta, M. Brain connectivity and neurological disorders after stroke. Curr. Opin. Neurol. 29, 706–713 (2016).
Griffis, J. C., Metcalf, N. V., Corbetta, M. & Shulman, G. L. Structural disconnections explain brain network dysfunction after stroke. Cell Rep. 28, 2527–2540 (2019).
Egorova, N. et al. Lower cognitive control network connectivity in stroke participants with depressive features. Transl. Psychiatry 7, 4 (2017).
Li, B.-J. et al. A brain network model for depression: From symptom understanding to disease intervention. CNS Neurosci. Ther. 24, 1004–1019 (2018).
Papakostas, G. I. Cognitive symptoms in patients with major depressive disorder and their implications for clinical practice. J. Clin. Psychiatry 75, 8–14 (2014).
Gotlib, I. H. & Joormann, J. Cognitive and depression: Current status and future directions. Annu. Rev. Clin. Psychol. 6, 285–312 (2010).
van den Heuvel, M. P. & Sporns, O. Network hubs in the human brain. Trends in Cognit. Sci. 17, 3285 (2013).
Button, K. S. et al. Power failure: Why small sample size undermines the reliability of neuroscience. Nat. Rev. Neurosci. 4, 365–376 (2013).
Bono, R., Blanca, M. J. & Arnau, J. Non-normal distributions commonly used in health, education, and social sciences: A systematic review. Front. Psychol. 8, 1602 (2017).
Hackshaw, A. Small studies: Strengths and limitations. Eur. Respir. J. 32, 1141–1143 (2008).
Sirsat, M. S., Fermé, E. & Câmara, J. Machine learning for brain stroke: A review. J. Stroke Cerebrovasc. Dis. 29, 105162 (2020).
Heo, J. N. et al. Machine learning-based model for prediction of outcomes in acute stroke. Stroke 50, 1263–1265 (2019).
Orfanoudaki, A. et al. Machine learning provides evidence that stroke risk is not linear: The non-linear Framingham stroke risk score. PLoS ONE 15, e0232414 (2020).
Zhu, H. et al. An automatic machine learning approach for ischemic stroke onset time identification based on DWI and FLAIR imaging. NeuroImage Clin. 31, 102744 (2021).
Panesar, S. S., D’Souza, R. N., Yeh, F.-C. & Fernandez-Miranda, J. C. Machine learning versus logistic regression methods for 2-year mortality prognostication in a small, heterogeneous glioma database. World Neurosurg. X 2, 100012 (2019).
Desai, R. J., Wang, S. V., Vaduganathan, M., Evers, T. & Schneeweiss, S. Comparison of machine learning methods with traditional models for use of administrative claims with electronic medical records to predict heart failure outcomes. JAMA Netw. Open 3, e1918962 (2020).
Seghier, M. L., Ramsden, S., Lim, L., Leff, A. P. & Price, C. J. Gradual lesion expansion and brain shrinkage years after stroke. Stroke 45, 877–879 (2014).
Johansson, B. B. Brain plasticity and stroke rehabilitation. Stroke 31, 223–230 (2000).
Hara, Y. Brain plasticity and rehabilitation in stroke patients. J. Nippon Med. Sch. 82, 4–13 (2015).
Marchina, S. et al. Impairment of speech production predicted by lesion load of the left arcuate fasciculus. Stroke 42, 2251–2256 (2011).
Hope, T. M. H., Seghier, M. L., Leff, A. P. & Price, C. J. Predicting outcome and recovery after stroke with lesions extracted from MRI images. NeuroImage Clin. 2, 424–433 (2013).
Chen, C. L., Tang, F. T., Chen, H. C., Chung, C. Y. & Wong, M. K. Brain lesion size and location: Effects on motor recovery and functional outcome in stroke patients. Arch. Phys. Med. Rehabil. 81, 447–452. https://doi.org/10.1053/mr.2000.3837 (2000).
Salvalaggio, A., De Grazia, M. F., Zorzi, M., de Schotten, M. T. & Corbetta, M. Post-stroke deficit prediction from lesion and indirect structural and functional disconnection. Brain 143, 2173–2188 (2020).
Aben, H. P. et al. Extent to which network hubs are affected by ischemic stroke predicts cognitive recovery. Stroke 50, 2768–2774 (2019).
Zarahn, E. et al. Prediction of motor recovery using initial impairment and fMRI 48 h poststroke. Cereb. Cortex 21, 2712–2721 (2011).
The authors declare no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Sutoko, S., Atsumori, H., Obata, A. et al. Artificially-reconstructed brain images with stroke lesions from non-imaging data: modeling in categorized patients based on lesion occurrence and sparsity. Sci Rep 12, 10116 (2022). https://doi.org/10.1038/s41598-022-14249-z
By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.