Machine learning to reveal hidden risk combinations for the trajectory of posttraumatic stress disorder symptoms

Takahashi, Yuta; Yoshizoe, Kazuki; Ueki, Masao; Tamiya, Gen; Zhiqian, Yu; Utsumi, Yusuke; Sakuma, Atsushi; Tsuda, Koji; Hozawa, Atsushi; Tsuji, Ichiro; Tomita, Hiroaki

doi:10.1038/s41598-020-78966-z

Article
Open access
Published: 10 December 2020

Scientific Reports volume 10, Article number: 21726 (2020)

Abstract

The nature of the recovery process of posttraumatic stress disorder (PTSD) symptoms is multifactorial. The Massive Parallel Limitless-Arity Multiple-testing Procedure (MP-LAMP), which was developed to detect significant combinational risk factors comprehensively, was utilized to reveal hidden combinational risk factors to explain the long-term trajectory of the PTSD symptoms. In 624 population-based subjects severely affected by the Great East Japan Earthquake, 61 potential risk factors encompassing sociodemographics, lifestyle, and traumatic experiences were analyzed by MP-LAMP regarding combinational associations with the trajectory of PTSD symptoms, as evaluated by the Impact of Event Scale-Revised score after eight years adjusted by the baseline score. The comprehensive combinational analysis detected 56 significant combinational risk factors, including 15 independent variables, although the conventional bivariate analysis between single risk factors and the trajectory detected no significant risk factors. The strongest association was observed with the combination of short resting time, short walking time, unemployment, and evacuation without preparation (adjusted P value = 2.2 × 10⁻⁴, and raw P value = 3.1 × 10⁻⁹). Although short resting time had no association with the poor trajectory, it had a significant interaction with short walking time (P value = 1.2 × 10⁻³), which was further strengthened by the other two components (P value = 9.7 × 10⁻⁵). Likewise, components that were not associated with a poor trajectory in bivariate analysis were included in every observed significant risk combination due to their interactions with other components. Comprehensive combination detection by MP-LAMP is essential for explaining multifactorial psychiatric symptoms by revealing the hidden combinations of risk factors.

Introduction

The symptoms of posttraumatic stress disorder (PTSD) after a disaster can follow multiple trajectories¹. In a population-based longitudinal study, Welch et al.² identified six clusters of PTSD symptom trajectories after a disaster: low-stable (48.9%), moderate-stable (28.3%), moderate-increasing (8.2%), high-stable (6.0%), high-decreasing (6.6%), and very high-stable (2.0%).

Although factors that modulate the prognosis of PTSD symptoms after disaster have been investigated, the effect size of each single factor seems to be too weak to explain the variety of trajectories observed in clinical practice^3,4. For example, Kessler et al.³ reported that middle age and low income were slightly associated with the trends in PTSD symptoms in a longitudinal surveillance study after a hurricane, but these risks explained only 2.1% of the variance in the trajectories of PTSD symptoms. Both Kessler et al.³ and Adams and Boscarino⁴ reported that the degree of exposure to stressful events was a significant predictor only for the onset of PTSD and not for the trends in PTSD after a disaster.

Considering the multifactorial nature of the condition, the most straightforward approach for obtaining useful information with sufficient effect sizes regarding the prognosis of PTSD would be to accumulate risk factors effectively by considering interactions among them. Several studies have demonstrated interactions among risk and protective factors for the prognosis of PTSD symptoms^5,6,7,8,9. Satisfaction with social support has a significantly larger positive effect on the prognosis of PTSD symptoms in females than in males⁵. Excessive alcohol intake can have a large impact on the exacerbation of PTSD symptoms in males⁶. Loss of family members or lack of family support influences the prognosis of PTSD more in younger subjects than in older subjects^7,8. Drožđek et al. considered combinations of risk factors and showed the hidden long-term impacts of exposure to war and violence⁹.

In previous studies to elucidate combinational risk factors by focusing on interactions, the major limitation was that candidate risk factors were selected based on their association with the target symptom. However, a factor showing no association with the target symptom in bivariate analysis could plausibly contribute to reliable combinational risk predictors by strong interaction with other factors; such a risk factor that is apparent only in combinational analysis can be referred to as a “hidden risk component”. Therefore, although several risk predictors for PTSD prognosis have already been suggested by previous bivariate association studies, a comprehensive combination detection study based on a number of potential risk factors, without selection by other statistics, would be useful to detect reliable combinational risk predictors.

Despite the potential usefulness of comprehensive combination detection studies in detecting hidden risk components, such studies have been infeasible due to high computational costs and excessively severe multiple-testing correction. For example, if 30 potential risk factors were tested for combinational risks, there would be 2³⁰ (> 10⁹) possible combinations. Therefore, if all of these combinations were tested, the computational cost would be so high as to render the calculation impractical, and the raw P value would need to be no greater than 4.6 × 10⁻¹¹ for “significance” at the α = 0.05 level after a Bonferroni correction.
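The arithmetic behind this example can be checked directly. The short Python sketch below is an illustration added for this text and only reproduces the two figures quoted above; the variable names are hypothetical.

```python
# Worked arithmetic for the example in the text: 30 binary risk factors.
n_factors = 30
alpha = 0.05

# Each factor is either in or out of a combination, giving 2**30 subsets
# (the count used in the text; it includes the empty combination).
n_combinations = 2 ** n_factors

# A plain Bonferroni correction divides alpha by the number of tests.
bonferroni_threshold = alpha / n_combinations

print(f"{n_combinations:,} combinations")
print(f"raw P must be <= {bonferroni_threshold:.2e}")
```

The threshold evaluates to roughly 4.66 × 10⁻¹¹, in line with the value of about 4.6 × 10⁻¹¹ given in the text.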

The Massive Parallel Limitless-Arity Multiple-testing Procedure (MP-LAMP) was developed to explore significant combinational risk factors among a large number of independent variables^10,11. LAMP is a novel algorithm that renders comprehensive detection of significant combinations feasible by reducing computational costs and preventing excessively severe multiple-testing correction by avoiding unnecessary significance tests of potential risk combinations that (1) cannot be significant or (2) are completely dependent on each other^12,13,14,15. First, if the number of subjects with a potential risk combination is sufficiently small, the association between the combination and the target variable (e.g., psychiatric symptom score) can never be significant, regardless of the values of the target variables (detailed in Supplementary methods). These combinations do not influence the familywise error rate¹⁶ and are ignored in the LAMP algorithm. Second, the possible risk combinations are often completely dependent on each other. For example, when all of the subjects with risk factors A and B have risk factor C, the subject group with the risk combination of A and B and the subject group with A, B, and C would be the same. In this case, LAMP conducts significance tests only for combinations with more components (i.e., A, B, and C) and avoids unnecessary duplicate tests. Through the abovementioned two procedures, the LAMP algorithm makes comprehensive significant combination detection feasible under the condition that the familywise error rate is controlled rigorously under the threshold. MP-LAMP is a software tool to accelerate LAMP calculations and render it feasible in large datasets by utilizing parallel calculations.
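The second pruning rule (testing only the largest of several combinations that select exactly the same subjects) can be illustrated with a small sketch. The Python below is a brute-force illustration written for this text, not the MP-LAMP implementation: LAMP relies on frequent itemset mining precisely to avoid enumerating every combination as this toy version does. The function name, the factor names A, B, and C, and the subject IDs are all hypothetical.

```python
from itertools import combinations


def closed_combinations(factor_subjects, min_support=1):
    """Keep one combination per distinct subject group.

    factor_subjects maps a factor name to the set of subject IDs that
    have the factor. Combinations selecting the same subjects are
    collapsed into the one with the most components.
    """
    factors = sorted(factor_subjects)
    largest = {}
    # Visit combinations in order of increasing size, so that a later
    # (larger) combination overwrites a smaller one with the same subjects.
    for size in range(1, len(factors) + 1):
        for combo in combinations(factors, size):
            subjects = frozenset(
                set.intersection(*(factor_subjects[f] for f in combo))
            )
            # Rule 1 in spirit: combinations held by too few subjects
            # are dropped (here via a simple support cutoff).
            if len(subjects) < min_support:
                continue
            largest[subjects] = combo
    return largest


# Toy data: every subject with A and B also has C.
toy = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {2, 3, 5}}
for subjects, combo in closed_combinations(toy).items():
    print(sorted(subjects), combo)
```

In this toy dataset the pairs (A, B), (A, C), and (B, C) all select subjects 2 and 3, so only the three-factor combination (A, B, C) is retained for that group, mirroring the example in the paragraph above.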

The current study targets the relatively long-term prognosis of PTSD symptoms because of its clinical importance. According to previous studies, the short-term prognosis of PTSD is largely explained by the severity of PTSD symptoms just after the disaster^3,4, and people who have severe PTSD just after the disaster can readily obtain access to specialized treatments. In contrast, the long-term prognosis of PTSD is only weakly explained by the symptoms just after the disaster¹⁷, and appropriate support is possibly not provided to people who suffer from delayed PTSD symptoms. In this case, the prediction of PTSD prognosis based on various risk factors would be useful to provide adequate support to high-risk populations. Nevertheless, the long-term prognosis of PTSD after a natural disaster has rarely been surveyed, and there is little evidence to consult¹⁷.

In the current study, we applied MP-LAMP to identify combinational risk factors that modulate the prognosis of residents severely affected by the Great East Japan Earthquake regarding PTSD symptoms, as measured by Impact of Event Scale-Revised (IES-R) scores. We conducted annual surveys to evaluate the mental health condition of all residents whose houses were located in the town of Shichigahama and had been destroyed or severely damaged by the catastrophe^18,19. We utilized datasets including 624 subjects who completed the surveys in 2011, 2012 and 2018. To investigate the risk factors that modulate the prognosis of PTSD symptoms, we used IES-R scores in the 8th year adjusted for those in the 1st year, referred to hereinafter as “PTSD trajectory scores”, as the target variables, following the methods of previous studies^3,4,20. The PTSD trajectory score represents the change in PTSD symptoms that is not explained by the baseline PTSD symptoms. This derived measure is beneficial in the search for useful risk factors that can be used in conjunction with baseline symptomatology to predict the prognosis of PTSD symptoms. We utilized MP-LAMP to explore combinational explanatory factors for PTSD trajectory scores based on information about stressors (experience related to the tsunami or earthquake, loss of loved ones), sociodemographics, lifestyle, and clinical information collected just after the disaster. The results of MP-LAMP regarding combinational risk factors were compared with those of the conventional association tests for individual risk factors, referred to hereinafter as “bivariate analyses”.

Material and methods

Subjects

This study is based on a health survey administered as part of a project called the Shichigahama Health Promotion Project^18,19. The first survey was conducted in November 2011 following the Great East Japan Earthquake and Tsunami of March 11th, 2011. Annual surveys were conducted thereafter, and the latest survey before the current analysis was conducted in October 2018. This study is based on the questionnaires collected in the first, second, and eighth (i.e., the latest) surveys. Of the study population of 2,478 Japanese subjects who were at least 18 years old and whose houses had totally collapsed or been severely damaged, 1,791 subjects participated in the first-year survey and returned the questionnaire after giving written informed consent. Among those subjects, 1,173 participated in the second survey, and 636 participated in the first, second, and eighth surveys. Subjects who omitted > 20% of the IES-R items or of the potential risk factors were then excluded, following previous studies^21,22,23, and those who omitted > 50% of the items on the questionnaire were also excluded, following the literature^24,25,26. The analyses below use the resulting dataset of 624 subjects.

Questionnaire

Because the purpose of the current study is to elucidate risk predictors available just after the disaster for the prognosis of PTSD, the data utilized as potential risk factors were mostly based on the questionnaire collected in the first year. The data from the first survey included sociodemographic characteristics (age, sex, and employment status), lifestyle (smoking status, alcohol drinking, daily time spent walking/sitting/sleeping), clinical information (past medical history), the Kessler Psychological Distress scale (K6), the Athens insomnia scale (AIS), and the Lubben Social Network Scale-6 (LSNS-6). In addition, the data related to experiences of the earthquake and tsunami (the evacuation, witnessing the tsunami, life-threatening experiences, witnessing threats to other people’s lives, death of family or friends) and changes in income or work volume were collected in the second year survey. The abovementioned 61 variables were utilized as potential risk factors for the prognosis of PTSD in the following analyses.

Outcome measures

The IES-R score was used as an indicator of PTSD symptoms. The respondents were asked about their PTSD symptoms over the previous week based on 22 questions, to which they responded by selecting “extremely” (4 points), “quite a bit” (3 points), “moderately” (2 points), “a little bit” (1 point), or “not at all” (0 points). The total scores ranged from 0 to 88. IES-R scores correlate well with the criteria for PTSD in the Diagnostic and Statistical Manual of Mental Disorders (DSM), and IES-R is one of the most commonly used metrics of PTSD symptomatology^27,28. To evaluate the long-term change in PTSD symptoms that was not explained by the baseline PTSD symptoms, we utilized the eighth-year IES-R adjusted by the first-year IES-R as a target variable in the following analysis; we refer to this measure as the “PTSD trajectory score” throughout the manuscript.
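One common way to obtain a follow-up score "adjusted by" a baseline score is to take the residuals of a simple linear regression of the follow-up score on the baseline score. The sketch below assumes that form of adjustment purely for illustration; this section does not specify the exact model used, and the function name and the toy scores are hypothetical.

```python
def trajectory_scores(baseline, followup):
    """Residuals of follow-up scores after regressing on baseline scores.

    Assumption: adjustment is done by ordinary least squares with a
    single predictor (the baseline score). A positive residual means the
    follow-up score is higher than the baseline alone would predict.
    """
    n = len(baseline)
    mean_x = sum(baseline) / n
    mean_y = sum(followup) / n
    sxx = sum((x - mean_x) ** 2 for x in baseline)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(baseline, followup))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(baseline, followup)]


# Toy IES-R totals (not study data): first-year and eighth-year scores.
first_year = [10, 20, 30, 40]
eighth_year = [5, 12, 30, 21]
print([round(r, 2) for r in trajectory_scores(first_year, eighth_year)])
```

By construction the residuals sum to zero and are uncorrelated with the baseline score, which is what makes such a measure usable alongside baseline symptomatology.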

Statistical analyses

After the abovementioned exclusion of subjects with high rates of missing responses on the questionnaire, the missing rates among IES-R items and potential risk factors were 0.5% and 2.9%, respectively. After confirming that there were no statistically significant bias effects caused by the missing data, we imputed the missing numbers nonparametrically using the missForest package²⁹ in R because the LAMP analyses require datasets without missing data (Supplementary methods).

To detect all significant combinational risk factors, we used MP-LAMP, a software package that accelerates the LAMP algorithm^10,11. The LAMP algorithm renders combinational significance detection feasible by ignoring combinations that cannot be significant or are completely dependent on each other^12,13,14. To select testable combinations, the LAMP algorithm uses frequent itemset mining, a machine learning technique. It then applies a calibrated Bonferroni method to correct for multiple testing while keeping the familywise error rate rigorously under the threshold. LAMP was originally developed for biological data, but the method has already been used for survey data^30,31. Following the previous LAMP-based survey studies^30,31, the main analysis was not adjusted for potential confounding factors; an additional analysis adjusted for age and sex was performed to check the consistency of the results. In this additional analysis, the PTSD trajectory score adjusted for age and sex was utilized as a target variable. The source code for MP-LAMP is available at https://github.com/tsudalab/mp-lamp.
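The idea of a calibrated Bonferroni factor can be sketched as follows, assuming that each candidate combination comes with a lower bound on the P value it could ever attain (its minimum achievable P value). The sketch uses a Tarone-style search for the correction factor; this is an assumption about the general form of the correction and not a transcription of the MP-LAMP source code, which obtains the factor far more efficiently. The function name and the toy bounds are hypothetical.

```python
def calibrated_correction_factor(min_p_values, alpha=0.05):
    """Smallest k such that at most k hypotheses are testable at alpha / k.

    A hypothesis is "testable" at level alpha / k when its minimum
    achievable P value does not exceed alpha / k. Untestable hypotheses
    can never be rejected, so they need not be counted in the correction.
    """
    k = 1
    while sum(1 for p in min_p_values if p <= alpha / k) > k:
        k += 1
    return k


# Toy lower bounds for six candidate combinations (not study data).
bounds = [1e-6, 1e-4, 0.02, 0.3, 0.5, 0.9]
k = calibrated_correction_factor(bounds)
print("correction factor:", k)
print("per-test threshold:", 0.05 / k)
```

With these toy bounds the factor is 3 rather than the 6 that a plain Bonferroni correction would use, because three of the combinations could never reach significance at the corrected level. Since at most k tests are run at level α/k, the familywise error rate stays at or below α.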

Because the independent variables must be binary in order for MP-LAMP to detect combinational risks, some variables were converted to binary values by setting cutoffs. For those of the scales that already had proposed cutoffs, those cutoffs were utilized (5/6 and 12/13 for K6, 5/6 for AIS, and 11/12 for LSNS-6)^32,33,34,35. For other ordinal variables with more than three levels and for all continuous variables, the variables were first discretized into ordinal variables with three levels of approximately equal frequency by using the infotheo R package³⁶ and then converted into binary variables with the highest or lowest level as the risk group and the remaining two levels as the nonrisk group. This division was chosen because MP-LAMP requires substantially more computational time to analyze independent variables with a higher frequency of membership in the risk group. The detailed process of converting ordinal variables into binary variables is shown in the Supplementary methods.
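The two-step conversion described above (three levels of approximately equal frequency, then one extreme level as the risk group) can be sketched in a few lines. The Python below is an illustrative approximation only: it does not reproduce the infotheo package's handling of ties or bin boundaries, and the function name, parameter names, and toy values are hypothetical.

```python
def tertile_binarize(values, risk_level="low"):
    """Binarize a variable via three equal-frequency levels.

    Step 1: assign each value to level 0, 1, or 2 by its rank, so that
            the levels have approximately equal frequency.
    Step 2: code the chosen extreme level ("low" or "high") as 1 (risk)
            and the remaining two levels as 0 (nonrisk).
    Simplification: tied values are split by sort order rather than
    being kept in the same level.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    level = [0] * n
    for rank, i in enumerate(order):
        level[i] = rank * 3 // n
    target = 0 if risk_level == "low" else 2
    return [1 if lv == target else 0 for lv in level]


# Toy daily walking times in minutes (not study data).
walking_minutes = [90, 10, 50, 30, 70, 20, 80, 40, 60]
print(tertile_binarize(walking_minutes, risk_level="low"))
```

Roughly one third of subjects fall in the risk group under this scheme, which keeps risk-group membership comparatively infrequent, consistent with the computational motivation given in the text.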

For comparison with the results of the combinational analysis, conventional association analysis for the same response and independent variables was also performed. We implemented linear regression adjusted by age and sex to evaluate the association between adjusted IES-R and each independent variable, a procedure referred to as “bivariate analysis” throughout the manuscript in contrast to the combinational analysis by MP-LAMP. Multiple-testing correction was performed using the Bonferroni method to control the familywise error rate.

The Mann–Whitney U test was implemented to evaluate the association between the potential risk combinations and the PTSD trajectory score. In addition to the MP-LAMP software, R was utilized in statistical analyses³⁷. P < 0.05 was considered to indicate statistical significance.
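To make the test for a single combination concrete, the sketch below splits subjects into those who have every component of a combination and everyone else, then compares the two groups' scores with a Mann–Whitney U test. It is a minimal pure-Python illustration that uses the normal approximation without tie or continuity corrections and reports a two-sided P value; whether the study used an exact, one-sided, or corrected version is not stated in this section. All names and the toy records are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(risk_scores, other_scores):
    """U statistic for the risk group and a two-sided approximate P value."""
    n1, n2 = len(risk_scores), len(other_scores)
    ranks = average_ranks(list(risk_scores) + list(other_scores))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    return u1, math.erfc(abs(z) / math.sqrt(2))


def split_by_combination(subjects, combination, score_key="trajectory"):
    """Scores of subjects with all components vs. scores of the rest."""
    risk, other = [], []
    for s in subjects:
        has_all = all(s[f] == 1 for f in combination)
        (risk if has_all else other).append(s[score_key])
    return risk, other


# Toy records (not study data): binary factors plus a trajectory score.
subjects = [
    {"unemployed": 1, "short_walk": 1, "trajectory": 12.0},
    {"unemployed": 1, "short_walk": 1, "trajectory": 10.0},
    {"unemployed": 1, "short_walk": 1, "trajectory": 11.0},
    {"unemployed": 1, "short_walk": 0, "trajectory": 1.0},
    {"unemployed": 0, "short_walk": 1, "trajectory": 2.0},
    {"unemployed": 0, "short_walk": 0, "trajectory": 3.0},
    {"unemployed": 0, "short_walk": 0, "trajectory": 4.0},
]
risk, other = split_by_combination(subjects, ["unemployed", "short_walk"])
print(mann_whitney_u(risk, other))
```

Note that a subject with only one of the two factors is placed in the comparison group, which is why a combination can be associated with the outcome even when its individual components are not.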

Ethics approval and consent to participate

All protocols for the studies were approved by the Ethics Committee of Tohoku University. Written informed consent was obtained from all subjects. This study was carried out according to the principles expressed in the Declaration of Helsinki.

Results

Of the variance of IES-R scores in the 8th year, only 23.5% was explained by the baseline IES-R, and the remaining explanatory factors were explored using the PTSD trajectory score as a target variable in the following analyses.

Demographic and trauma-exposure information

The demographic characteristics and trauma exposure of the subjects are summarized in Table 1. Older age, female gender and a high degree of traumatic exposure had a strong association with high baseline IES-R scores; however, they had a weaker association or no association at all with the PTSD trajectory score. After correcting for multiple testing, there were no significant associations between PTSD trajectory scores and demographic or trauma information in bivariate analyses.

Table 1 Demographic characteristics and trauma exposure of participants.

Comprehensive combinational risk detection analysis

The 61 abovementioned potential risk factors were subjected to comprehensive combinational risk detection analysis by MP-LAMP and bivariate analysis. Although bivariate analyses detected no significant predictors of PTSD trajectory scores, combinational association analyses by MP-LAMP detected 56 significant combinations, in which 15 independent variables were used at least once each as components. The P values of the representative significant combinations shown by MP-LAMP and the components of the significant combinations are illustrated in Fig. 1. Compared with bivariate analyses, the comprehensive combination detection approach substantially increased the power to detect significant predictors of PTSD trajectory scores. All of the significant combinations and the results of the bivariate analyses for individual risk factors are shown in Supplementary Tables S1 and S2.

The significant combinations yielded by comprehensive combination detection were completely different from the combinations selected solely based on the strength of association in the bivariate analyses, as the interactions among the risk factors also contributed to the strength of association in the combinational analysis. To maximize the association with the target variable through interactions among components, each significant risk combination identified by MP-LAMP included at least one component that had no association with the target variable (raw P value > 0.05) in bivariate analyses. The average (SD) numbers of interactions with P < 0.05 and P < 0.01 by analysis of variance among the components of the significant risk combinations were 4.9 (2.9) and 2.5 (1.3), respectively, which were substantially higher than the 95% confidence intervals of 1.2–1.9 and 0.3–0.7 calculated from randomly selected combinations consisting of the equivalent number of components (100,000 bootstrap replications; Supplementary Table S1).

The additional analysis adjusted for age and sex was also performed to check the consistency of the results, and the significant risk combinations in this analysis are shown in Supplementary Table S3. The significant risk combinations in this additional analysis largely overlapped with the main analysis. Specifically, the top 10 significant risk combinations in the main analysis were also significant in this additional analysis, while all 15 significant risk combinations in the additional analysis were also significant in the main analysis.

The combination most strongly associated with the PTSD trajectory score

The combination that was most strongly associated with the PTSD trajectory score was unemployment, walking less than 30 min/day, short resting time (sitting or napping for less than 3 h/day), and evacuation without preparation (adjusted P value = 2.2 × 10⁻⁴, and raw P value = 3.1 × 10⁻⁹). The effect size of this combination and its components on the IES-R scores are illustrated in Fig. 2A, and the combination was demonstrated to have a substantially stronger effect size on the IES-R in the 8th year than any single component.

Although short resting time was not significantly associated with the PTSD trajectory score in bivariate analyses (adjusted P value > 1, raw P value = 0.055), it had a significant interaction with short walking time (P value = 1.2 × 10⁻³), which was further strengthened by the other two components (P value = 9.7 × 10⁻⁵). To illustrate this significant interaction, Fig. 2B shows the interaction regarding effect sizes on IES-R scores between short sitting/napping time and the other components. The effect size of short sitting/napping time on the 8th-year IES-R score increased in the subgroup selected based on the other components, which reflected the interaction among these factors.

Discussion

The current study used MP-LAMP to explore the combinational risk factors that modulate the long-term prognosis of PTSD symptoms after a disaster, showing that (1) the combinational risk approach increased the power and detected novel significant risk factors and that (2) the significant combinations detected by the comprehensive combination approach included interactions among the components.

Although bivariate analyses detected no significant risk factors, the combinational approach detected 56 combinational risk factors consisting of 15 independent variables, demonstrating that the combinational approach substantially increased the power to detect risk factors associated with PTSD trajectory scores. The remarkable point was not merely that the detection power was increased by the combinational analyses but that the risk factors newly detected in combinational analyses were completely different from the ones detected by loosening significance levels in the bivariate analyses. Among 15 independent variables included at least once in the significant combinations, there were 10 variables that had no association with the target variable in bivariate analyses (raw P value > 0.05); these 10 variables could be referred to as “hidden risk components”. Based on this finding, in the search for risk factors to increase predictive performance, the conventional approach of combining the previously reported risk factors would be useless for identifying the most reliable predictor combinations including hidden risk components; only a comprehensive combination detection approach considering all possible interactions among the variables, regardless of whether each variable would be counted as a risk factor based on bivariate analyses, could detect hidden risk components.

In the search for combinational predictors, the major reason to include hidden risk components that have no bivariate association with the target variable is that, although a factor may carry a low risk in bivariate analyses and lack a strong association with the target variable, it can interact with other components that increase the association between the combination and the target variable. The significant risk combinations detected by MP-LAMP consisted of the components among which there were significantly more and stronger interactions than randomly selected combinations. The interactions detected by analysis of variance included not only interactions among two components (49%) but also interactions among three or more components (51%). Most of the previous studies investigating interactions among risk factors for PTSD symptoms analyzed only the interactions between pairs of components among several risk factors^5,38, mainly because comprehensive interactions including three or more components consist of an exponentially larger number of possible combinations. MP-LAMP resolved this problem by ignoring “untestable” combinations, whose frequency is too small to be significant, and investigated all possible interaction patterns without limitation of the number of components, which successfully revealed the significant risk combinations that explain the trajectory of PTSD symptoms.

The risk combination approach can provide useful interpretation for clinical practice based on the relationship between risk factors. Previous studies using conventional bivariate analysis stated that the degree of traumatic experience influenced only PTSD symptoms just after the disaster but did not influence PTSD symptom prognosis^3,4. However, the results from the current risk combination analysis presented another view about the relationship between the traumatic experience and the prognosis of PTSD. In the current study, most of the significant risk combinations include the risk factors of a traumatic experience (e.g., evacuation without preparation or life-threatening experience), working status (e.g., unemployment), and lifestyle factors (e.g., short walking time or short resting time). The distribution of PTSD trajectory scores in the set of subjects selected by combinational or single risk factors is shown in Supplementary Fig. S1. As shown in this figure, although no single traumatic factor increased the PTSD trajectory score by itself, the combination of the traumatic factors, working status, and lifestyle factors increased the PTSD trajectory scores through the interactions. In clinical practice, these results imply that surveillance about not only the traumatic experience but also the social or lifestyle information is useful to assess the high-risk population for long-term prognosis.

In the current analyses, female gender was associated with elevated baseline PTSD symptoms (P value = 3.1 × 10⁻⁴) but did not influence the PTSD trajectory score (P value = 0.45) in bivariate analyses. However, the gender factor had a significant interaction with decreased income (P value = 2.7 × 10⁻³), physical condition (not good) (P value = 8.1 × 10⁻³) and older age (P value = 0.025), and was included in some of the significant risk combinations for PTSD trajectory scores. Based on these findings, the factor of gender alone cannot be considered to influence the trajectory of recovery from PTSD symptoms; however, the risk factors of income, physical condition, and age can influence recovery from PTSD symptoms more severely in females than in males.

The variance explained by the risk factors was calculated to compare the results with those of the previous studies (Supplementary Tables S1 and S2). Among the significant risk factor combinations, the combinations of unemployment, short walking time, short resting time, evacuation without preparation, life-threatening experience, and decreased income explained the largest variance in the PTSD trajectory score (8.5%). Among single risk factors, physical conditions (poor) and decreased work explained the largest variance (2.0%). The abovementioned values did not conflict with the findings of a previous study³. For example, Kessler et al. showed that the PTSD prognosis explained by the strongest risk factors (age and incomes) was 2.1% in a 2-year longitudinal surveillance study after a disaster³. The current study’s approach to creating risk combinations was shown to be useful to combine the effects of single risk factors.

The components of the significant risk combinations in the current study did not conflict with the previous studies of PTSD prognosis after a disaster^3,4. The significant risk combinations in the current study were composed of gender, age, working condition, lifestyle factors (e.g., working time or sleeping time), life events (e.g., loss of family), and distress scale (i.e., K6 score). Although there are no previous risk combination studies, a couple of studies have used bivariate analysis to search for risk factors for PTSD prognosis after a disaster. Kessler et al. performed a 2-year longitudinal study after Hurricane Katrina that suggested that PTSD prognosis was influenced by the risk factors of age and working condition³. Adams and Boscarino performed a 2-year longitudinal study after the World Trade Center Disaster, which suggested that the change in PTSD symptoms was influenced by negative life events, Latino ethnicity, and reduced self-esteem⁴. Considering the similarity between the results of the current study and those of previous studies, the current results could be applied to PTSD prognosis after various types of disasters. In contrast, the risk factors for PTSD prognosis after other types of trauma (e.g., violence) should be explored in future studies based on an appropriate study population.

The current study discussed the long-term prognosis of PTSD symptoms based on information from mainly two time points (i.e., just after the disaster and 7 years after the disaster). Compared with previous studies on the short-term prognosis of PTSD symptoms^3,4, the relationship between the risk factors and the predicted prognosis would be more complicated. Future studies that utilize information about new exposure after the disaster and detailed trajectory of PTSD symptoms would support our further understanding of the long-term prognosis of PTSD.

Although the LAMP minimizes false negatives by calibrating the Bonferroni factor, maintains statistical power under multiple comparisons and provides the significant P values for each combination against the outcomes, the risk factors identified by LAMP should be confirmed using ordinary statistical methods. In the current study, the validity of statistical methods was confirmed by checking the interaction, the distribution, and the variance explained by significant risk combinations as well as bivariate analysis for each risk component.

The current study has several limitations. First, the sample size was relatively small (624 subjects). This is a common problem for PTSD prognosis studies after natural disasters because a limited number of people are exposed to the disaster³⁹. On the other hand, we achieved high levels of significance when we applied the combinational analysis, which suggests that the results in the present study are reliable. Second, the current MP-LAMP source code does not implement a function to adjust for covariates. Therefore, we additionally performed the analysis using the target variable adjusted for potential confounding factors (Supplementary Table S3) and confirmed the consistency of the results. Considering the large overlap in significant risk combinations between the main analysis and the adjusted analysis, serious confounding was not observed in the current analysis. Third, each significant combination detected in the current study must be tested for reproducibility in an independent validation cohort in the future. To evaluate the generalizability of the results, future combinational risk studies conducted with different ethnicities or different traumatic experiences are needed.

Conclusions

A comprehensive approach using MP-LAMP to detect significant combinations increased the power of the analysis and revealed significant risk combinations for high PTSD trajectory scores. Considering that hidden risk components were included in all of the detected significant risk combinations, a comprehensive combinational approach will be essential for detecting reliable risk combinations strongly associated with psychiatric conditions.

Data availability

The datasets analyzed during the current study are not publicly available due to ethical and privacy reasons.

Abbreviations

PTSD: Posttraumatic stress disorder
MP-LAMP: Massive Parallel Limitless-Arity Multiple-testing Procedure
K6: Kessler Psychological Distress Scale
AIS: Athens Insomnia Scale
LSNS-6: Lubben Social Network Scale-6
DSM: Diagnostic and Statistical Manual of Mental Disorders
IES-R: Impact of Event Scale-Revised

References

1. Masten, A. S. & Obradović, J. Disaster preparation and recovery: lessons from research on resilience in human development. Ecol. Soc. 13, 9 (2008).
2. Welch, A. E. et al. Trajectories of PTSD among lower Manhattan residents and area workers following the 2001 World Trade Center disaster, 2003–2012. J. Trauma Stress 29, 158–166 (2016).
3. Kessler, R. C. et al. Trends in mental illness and suicidality after Hurricane Katrina. Mol. Psychiatry 13, 374–384. https://doi.org/10.1038/sj.mp.4002119 (2008).
4. Adams, R. E. & Boscarino, J. A. Predictors of PTSD and delayed PTSD after disaster: the impact of exposure and psychosocial resources. J. Nerv. Ment. Dis. 194, 485–493. https://doi.org/10.1097/01.nmd.0000228503.95503.e9 (2006).
5. Andrews, B., Brewin, C. R. & Rose, S. Gender, social support, and PTSD in victims of violent crime. J. Trauma Stress 16, 421–427. https://doi.org/10.1023/A:1024478305142 (2003).
6. Soo, J. et al. Trends in probable PTSD in firefighters exposed to the World Trade Center disaster, 2001–2010. Disaster Med. Public 5(Suppl 2), S197-203. https://doi.org/10.1001/dmp.2011.48 (2011).
7. Bokszczanin, A. PTSD symptoms in children and adolescents 28 months after a flood: age and gender differences. J. Trauma Stress 20, 347–351. https://doi.org/10.1002/jts.20220 (2007).
8. Husain, S. A. et al. Stress reactions of children and adolescents in war and siege conditions. Am. J. Psychiatry 155, 1718–1719. https://doi.org/10.1176/ajp.155.12.1718 (1998).
9. Drožđek, B., Rodenburg, J. & Moyene-Jansen, A. “Hidden” and diverse long-term impacts of exposure to war and violence. Front. Psychiatry 10, 975 (2020).
10. Yoshizoe, K., Terada, A. & Tsuda, K. MP-LAMP: parallel detection of statistically significant multi-loci markers on cloud platforms. Bioinformatics (Oxford, England) 34, 3047–3049. https://doi.org/10.1093/bioinformatics/bty219 (2018).
11. Yoshizoe, K., Terada, A. & Tsuda, K. Redesigning pattern mining algorithms for supercomputers. arXiv preprint arXiv:1510.07787 (2015).
12. Terada, A., Okada-Hatakeyama, M., Tsuda, K. & Sese, J. Statistical significance of combinatorial regulations. Proc. Natl. Acad. Sci. USA 110, 12996–13001 (2013).
13. Minato, S., Uno, T., Tsuda, K., Terada, A. & Sese, J. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (eds T. Calders, F. Esposito, E. Hüllermeier, & R. Meo) 422–436 (Springer, 2014).
14. Terada, A. & Tsuda, K. Data Mining for Systems Biology 83–94 (Springer, Berlin, 2018).
15. Terada, A., Yamada, R., Tsuda, K. & Sese, J. LAMPLINK: detection of statistically significant SNP combinations from GWAS data. Bioinformatics (Oxford, England) 32, 3513–3515. https://doi.org/10.1093/bioinformatics/btw418 (2016).
16. Tarone, R. E. A modified Bonferroni method for discrete data. Biometrics 46, 515–522 (1990).
17. Hu, S. et al. Recovery from post-traumatic stress disorder after a flood in China: a 13-year follow-up and its prediction by degree of collective action. BMC Public Health 15, 615. https://doi.org/10.1186/s12889-015-2009-6 (2015).
18. Nakaya, N. et al. The association between medical treatment of physical diseases and psychological distress after the Great East Japan Earthquake: the Shichigahama Health Promotion Project. Disaster Med. Public 9, 374–381. https://doi.org/10.1017/dmp.2015.52 (2015).
19. Tsuchiya, N. et al. Impact of social capital on psychological distress and interaction with house destruction and displacement after the Great East Japan Earthquake of 2011. Psychiatry Clin. Neurosci. 71, 52–60. https://doi.org/10.1111/pcn.12467 (2017).
20. Vasterling, J. J. et al. PTSD symptom increases in Iraq-deployed soldiers: comparison with nondeployed soldiers and associations with baseline symptoms, deployment experiences, and postdeployment stress. J. Trauma Stress 23, 41–51 (2010).
21. Nygaard, E., Hussain, A., Siqveland, J. & Heir, T. General self-efficacy and posttraumatic stress after a natural disaster: a longitudinal study. BMC Psychol. 4, 15. https://doi.org/10.1186/s40359-016-0119-2 (2016).
22. Imširagić, A. S., Begić, D., Vuković, I. S., Šimićević, L. & Javorina, T. Multivariate analysis of predictors of depression symptomatology after childbirth. Psychiatr. Danub 26, 416–421 (2014).
23. Helle, N., Barkmann, C., Ehrhardt, S. & Bindt, C. Postpartum posttraumatic and acute stress in mothers and fathers of infants with very low birth weight: cross-sectional results from a controlled multicenter cohort study. J. Affect. Disord. 235, 467–473. https://doi.org/10.1016/j.jad.2018.04.013 (2018).
24. Lin, W.-C. & Tsai, C.-F. Missing value imputation: a review and analysis of the literature (2006–2017). Artif. Intell. Rev. 53, 1487–1509 (2020).
25. Farhangfar, A., Kurgan, L. & Dy, J. Impact of imputation of missing values on classification error for discrete data. Pattern Recogn. 41, 3692–3705. https://doi.org/10.1016/j.patcog.2008.05.019 (2008).
26. Lakshminarayan, K., Harp, S. A. & Samad, T. Imputation of missing data in industrial databases. Appl. Intell. 11, 259–275. https://doi.org/10.1023/A:1008334909089 (1999).
27. Weiss, D. S. In Assessing Psychological Trauma and PTSD (eds J. P. Wilson & T. M. Keane) 168–189 (The Guilford Press, 2004).
28. Asukai, N. et al. Reliability and validity of the Japanese-language version of the impact of event scale-revised (IES-R-J): four studies of different traumatic events. J. Nerv. Ment. Dis. 190, 175–182 (2002).
29. Stekhoven, D. J. missForest: Nonparametric Missing Value Imputation Using Random Forest (Astrophysics Source Code Library, 2015).
30. Fukuda, H. et al. Elucidation of the strongest predictors of cardiovascular events in patients with heart failure. EBioMedicine 33, 185–195. https://doi.org/10.1016/j.ebiom.2018.06.001 (2018).
31. Shindo, K. et al. Artificial Intelligence Uncovered Clinical Factors for Cardiovascular Events in Myocardial Infarction Patients with Glucose Intolerance. Cardiovasc. Drugs Ther. 34, 535–545. https://doi.org/10.1007/s10557-020-06987-x (2020).
32. Prochaska, J. J., Sung, H. Y., Max, W., Shi, Y. & Ong, M. Validity study of the K6 scale as a measure of moderate mental distress based on mental health treatment need and utilization. Int. J. Methods Psychiatr. Res. 21, 88–97. https://doi.org/10.1002/mpr.1349 (2012).
33. Furukawa, T. A., Kessler, R. C., Slade, T. & Andrews, G. The performance of the K6 and K10 screening scales for psychological distress in the Australian National Survey of Mental Health and Well-Being. Psychol. Med. 33, 357–362 (2003).
34. Soldatos, C. R., Dikeos, D. G. & Paparrigopoulos, T. J. The diagnostic validity of the Athens Insomnia Scale. J. Psychosom. Res. 55, 263–267 (2003).
35. Lubben, J. et al. Performance of an abbreviated version of the Lubben Social Network Scale among three European community-dwelling older adult populations. The Gerontologist 46, 503–513 (2006).
36. Meyer, P. E. Information-Theoretic Variable Selection and Network Inference from Microarray Data. PhD thesis of the Universite Libre de Bruxelles (2008).
37. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/ (2018).
38. Lapp, L. K., Agbokou, C. & Ferreri, F. PTSD in the elderly: the interaction between trauma and aging. Int. Psychogeriatr. 23, 858–868. https://doi.org/10.1017/S1041610211000366 (2011).
39. Neria, Y., Nandi, A. & Galea, S. Post-traumatic stress disorder following disasters: a systematic review. Psychol. Med. 38, 467–480. https://doi.org/10.1017/S0033291707001353 (2008).

Acknowledgements

We are grateful to Dr. Yumi Sugawara, Dr. Junko Okuyama and Ms. Harumi Nemoto, as well as the participants in the Shichigahama Health Promotion Projects, for supporting this study. This study utilized the RIKEN AIP Deep Learning Environment (RAIDEN) supercomputer system for the computations.

Funding

This work was supported by a grant from the Strategic Research Program for Brain Science from the Japan Agency for Medical Research and Development (AMED) [JP20dm0107099]; the Tohoku Medical Megabank Project of the Ministry of Education, Culture, Sports, Science and Technology (MEXT) of Japan and AMED [JP18km0105001]; a Health Sciences Research Grant for Health Services from the Ministry of Health, Labor and Welfare of Japan [H24-Kenki-Shitei-002, H25-Kenki-Shitei-002 (Fukko)]; an Intramural Research Grant for Special Project Research from the International Research Institute of Disaster Science, Tohoku University, Japan; the Core Research Cluster of Disaster Science, Tohoku University, Japan; and JST CREST [JPMJCR1502]. The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author information

Authors and Affiliations

Graduate School of Medicine, Tohoku University, Sendai, 980-0872, Japan
Yuta Takahashi, Gen Tamiya, Yusuke Utsumi, Atsushi Sakuma, Ichiro Tsuji & Hiroaki Tomita
Tohoku Medical Megabank Organization, Tohoku University, Sendai, 980-8573, Japan
Yuta Takahashi, Masao Ueki, Gen Tamiya, Yu Zhiqian, Atsushi Hozawa, Ichiro Tsuji & Hiroaki Tomita
International Research Institute of Disaster Science, Tohoku University, Sendai, 980-8572, Japan
Yuta Takahashi, Yu Zhiqian & Hiroaki Tomita
RIKEN Center for Advanced Intelligence Project, Tokyo, 103-0027, Japan
Kazuki Yoshizoe, Masao Ueki, Gen Tamiya & Koji Tsuda
Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, 277-8568, Japan
Koji Tsuda

Authors

Yuta Takahashi
Kazuki Yoshizoe
Masao Ueki
Gen Tamiya
Yu Zhiqian
Yusuke Utsumi
Atsushi Sakuma
Koji Tsuda
Atsushi Hozawa
Ichiro Tsuji
Hiroaki Tomita

Contributions

Y.T. drafted the manuscript. Y.T., K.Y., and M.U. participated in study design, the analyses and interpretation of the data. Y.Z., Y.U., A.S., A.H., I.T., and H.T. contributed to the data collection and interpretation of the epidemiological data. G.T. and K.T. contributed to the analyses and interpretation of the statistics. H.T. contributed to study concept and design, and the analyses and interpretation of the data. All authors contributed to the discussion, writing, review and editing of the paper.

Corresponding author

Correspondence to Yuta Takahashi.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Takahashi, Y., Yoshizoe, K., Ueki, M. et al. Machine learning to reveal hidden risk combinations for the trajectory of posttraumatic stress disorder symptoms. Sci Rep 10, 21726 (2020). https://doi.org/10.1038/s41598-020-78966-z

Received: 24 July 2020
Accepted: 02 December 2020
Published: 10 December 2020
DOI: https://doi.org/10.1038/s41598-020-78966-z
