Machine learning-based automated classification of headache disorders using patient-reported questionnaires

Abstract

Classification of headache disorders depends on subjective self-reports from patients and their interpretation by physicians. We aimed to apply objective, data-driven machine learning approaches to analyze patient-reported symptoms and to test the feasibility of automated classification of headache disorders. The self-report data of 2162 patients were analyzed. Headache disorders were merged into five major entities. The patients were divided into training (n = 1286) and test (n = 876) cohorts. We trained a stacked classifier model with four layers of XGBoost classifiers. The first layer classified between migraine and others, the second layer classified between tension-type headache (TTH) and others, the third layer classified between trigeminal autonomic cephalalgia (TAC) and others, and the fourth layer classified between epicranial and thunderclap headaches. Each layer selected different features from the self-reports using the least absolute shrinkage and selection operator. In the test cohort, our stacked classifier achieved an accuracy of 81%, sensitivity of 88%, 69%, 65%, 53%, and 51%, and specificity of 95%, 55%, 46%, 48%, and 51% for migraine, TTH, TAC, epicranial headache, and thunderclap headache, respectively. We showed that a machine learning-based approach is applicable to the analysis of patient-reported questionnaires. Our results could serve as a baseline for future studies in headache research.

Introduction

Headache disorders are the most common neurological symptoms and have a substantial impact on sufferers. A proper diagnosis of headaches is essential for their treatment. Currently, the diagnosis of headache disorders is highly dependent on self-reports from patients and the interpretation of those self-reports by clinicians. The International Classification of Headache Disorders (ICHD) was published to aid a standardized diagnosis of headache disorders1. The ICHD has three chapters: primary headache disorders, secondary headache disorders, and painful cranial neuralgias/facial pain. These chapters include a number of disorders and their subtypes. However, its clinical application may be challenging for physicians who are inexperienced in headache medicine.

There have been efforts to aid the diagnosis of primary headache disorders using neurophysiological tests2, neuroimaging3,4, and blood-based biomarkers5,6; however, these have not replaced clinical interviews. Previous studies have mainly focused on migraine with little focus on the differential diagnosis of other headache disorders7,8. Recently, a simple questionnaire for the screening of migraine was developed and validated for research purposes9. The clinical diagnosis of headache disorders should, however, be based on a holistic approach since a single characteristic cannot replace the proper diagnosis.

Recently, data-driven approaches using machine learning or deep learning have been tested in the medical field to avoid biases attributed to human factors10,11,12. In headache research, these approaches have been used mostly for neuroimaging analysis3,4. In this study, we aimed to analyze the self-reported symptoms of patients to classify five headache disorders, including migraine, using machine learning approaches. Real-world questionnaires obtained from more than 2000 patients were used for this study.

Methods

Subjects

This study was approved by the institutional review board (IRB) of the Samsung Medical Center (IRB 2018-10-029). Written informed consent was obtained from patients or their guardians. Our study was performed in full accordance with local IRB guidelines. A total of 2162 patients who visited our headache clinic for the first time between January 2017 and December 2018 were included in our prospective headache clinic registry. The registry was retrospectively screened for this study. All patients completed structured questionnaires. Based on the questionnaire and clinical interview, the diagnosis of headache disorders was made by headache specialists (MJL with 10 years of experience and C-SC with 30 years of experience) using the ICHD-3 beta or ICHD-3, whichever was the most recent version at the time of the visit.

Headache clinic registry

The questionnaire for assessing patients on their first visit was developed by a headache specialist (MJL) and has been used in the Samsung Medical Center headache clinic since 2015. The questionnaire consists of 75 screening questions (Supplementary Table S1) including headache characteristics (e.g., intensity, location, nature of pain, and aggravation during or avoidance of physical activities), disease course (onset, the mode of onset, and the time of aggravation), associated symptoms (e.g., nausea, vomiting, photophobia, phonophobia, osmophobia, autonomic symptoms), aura, information regarding the medication used for headaches, past medical history (e.g. hypertension, diabetes, insomnia, depression, anxiety, and others), and social history (e.g. caffeine intake, smoking, alcohol consumption, and occupation). In addition to data from the questionnaire, patient demographics including age, sex, and body mass index (BMI) were prospectively recorded in the headache clinic registry and used for the analysis in this study. The ICHD-3-based diagnosis of each patient was coded in the registry.

The data from 2018 were used as the training cohort and the data from 2017 were used as the test cohort. There was no overlap between the two cohorts because the questionnaire was administered only to new patients. For the analysis, we merged similar headache disorders into seven groups: migraine, tension-type headache (TTH), trigeminal autonomic cephalalgia (TAC), epicranial headache (including primary stabbing headache and occipital neuralgia), thunderclap headache ([TCH] including primary and secondary causes of TCH), other primary headache disorders, and secondary headaches other than those causing TCHs. Among these, we excluded other primary headaches (n = 49, training cohort) due to the high heterogeneity within this group. Secondary headache disorders other than TCH (n = 122, training cohort) were also excluded because this group comprised heterogeneous diseases whose diagnosis relies on the clinical course rather than on headache characteristics. Data from 2162 patients, comprising the training cohort (n = 1286) and the test cohort (n = 876), were finally used for this study. Further details of the patients are given in Table 1.
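The cohort construction described above can be sketched as a filter followed by a split on the year of the first visit. This is only an illustration: the `Visit` record, the group label strings, and the choice to apply the exclusions to both cohorts are assumptions made here, not details taken from the registry.

```python
from dataclasses import dataclass

# Groups dropped before modelling (label strings are hypothetical).
EXCLUDED_GROUPS = {"other_primary", "secondary_non_tch"}


@dataclass
class Visit:
    patient_id: int
    year: int   # calendar year of the first clinic visit
    group: str  # merged diagnostic group


def split_by_year(visits, train_year=2018, test_year=2017):
    """Drop excluded groups, then split first visits by calendar year."""
    kept = [v for v in visits if v.group not in EXCLUDED_GROUPS]
    train = [v for v in kept if v.year == train_year]
    test = [v for v in kept if v.year == test_year]
    return train, test


# Toy records for illustration only; not registry data.
visits = [
    Visit(1, 2018, "migraine"),
    Visit(2, 2018, "other_primary"),      # excluded
    Visit(3, 2017, "tth"),
    Visit(4, 2017, "secondary_non_tch"),  # excluded
    Visit(5, 2017, "tch"),
]
train, test = split_by_year(visits)
print([v.patient_id for v in train], [v.patient_id for v in test])
```

Because each patient contributes a single first visit, splitting on the visit year keeps the two cohorts disjoint by construction.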

Table 1 Distribution of primary headache subtypes.

Stacked classifier model

We adopted a stacked model that consisted of four layers of binary XGBoost13 classifiers as shown in Fig. 1. Each layer of binary XGBoost classifier was used to classify subjects into two groups: the target subtype and the rest. We explored all possible orders of the stacked classifier and chose the order with the best accuracy in the training cohort. The first layer classified the most dominant subtype (i.e., migraine) and the rest (i.e., non-migraine). This enabled less challenging issues to be tackled first. The second layer classified between TTH and the rest (i.e., non-TTH). The third layer classified TAC and the rest (i.e., epicranial headaches and TCH). The final layer classified epicranial headaches and TCH. Our ordering of the classifier is similar to a multi-scale approach where one starts solving a large-scale problem before progressing on to small-scale problems.
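The routing logic of such a cascade can be sketched in a few lines. In the sketch below, each layer is a stand-in rule rather than a trained XGBoost model, and the feature names are hypothetical; only the layer order (migraine, TTH, TAC, then epicranial vs. TCH) follows the text.

```python
def cascade_predict(features, layers, fallback):
    """Route one sample through ordered binary layers.

    layers: list of (label, is_target) pairs, where is_target(features)
    returns True when the sample belongs to that layer's target subtype.
    The first layer that fires assigns its label; a sample rejected by
    every layer receives the fallback label.
    """
    for label, is_target in layers:
        if is_target(features):
            return label
    return fallback


# Hypothetical stand-in rules; in the study each layer is a trained classifier.
layers = [
    ("migraine", lambda f: f.get("nausea", 0) == 1),
    ("TTH", lambda f: f.get("pain_vague", 0) == 1),
    ("TAC", lambda f: f.get("conjunctival_injection", 0) == 1),
    ("epicranial", lambda f: f.get("pain_electric", 0) == 1),
]

print(cascade_predict({"pain_vague": 1}, layers, fallback="TCH"))  # prints "TTH"
```

One consequence of this design is that an error in an early layer cannot be corrected later: a non-migraine sample labelled as migraine by the first layer never reaches the remaining classifiers.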

Figure 1 Structure of the stacked classifier model. TTH tension-type headache, TAC trigeminal autonomic cephalalgia.

Feature selection

Each patient assessment was turned into a long feature vector. Continuous variable responses were normalized to values between −1 and 1. Categorical variable responses were converted to one-hot vectors. Multi-hot encoding was adopted for categorical questions that allowed multiple responses. Thus, the 75 questions for each patient were transformed into a 128-dimensional feature vector. We applied the least absolute shrinkage and selection operator (LASSO)14 to choose a few important features for each stacked classifier layer. For example, in the first layer, the LASSO was used to select features that can distinguish migraine from non-migraine subtypes. The LASSO was applied using stratified tenfold cross-validation. Features that were selected in at least three of the ten folds were retained as the set of stable features. The threshold of three was chosen to maximize the average classifier performance on the left-out folds of the training cohort within the tenfold cross-validation. The final model was re-trained on the entire training cohort using the stable features.
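The encoding and the fold-count rule can be illustrated with plain Python. This is a minimal sketch under stated assumptions: min-max scaling is assumed for the normalization to [−1, 1] (the text does not name the method), the per-fold feature lists would come from a LASSO fit that is not shown here, and all feature names are hypothetical.

```python
from collections import Counter


def scale_to_unit_range(value, lo, hi):
    """Linearly map value from [lo, hi] to [-1, 1] (min-max scaling, assumed)."""
    return 2.0 * (value - lo) / (hi - lo) - 1.0


def one_hot(choice, categories):
    """Encode a single-answer categorical question as a one-hot list."""
    return [1 if c == choice else 0 for c in categories]


def multi_hot(choices, categories):
    """Encode a multiple-answer categorical question as a multi-hot list."""
    return [1 if c in choices else 0 for c in categories]


def stable_features(selected_per_fold, min_folds=3):
    """Keep features that were selected in at least min_folds folds."""
    counts = Counter(f for fold in selected_per_fold for f in set(fold))
    return sorted(f for f, n in counts.items() if n >= min_folds)


# Hypothetical features selected in four folds (the study used ten).
folds = [["onset_gradual", "female"], ["onset_gradual"],
         ["onset_gradual", "bmi"], ["female"]]
print(stable_features(folds, min_folds=3))  # prints ['onset_gradual']
```

Counting a feature at most once per fold (via `set(fold)`) keeps the threshold interpretable as a number of folds rather than a number of occurrences.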

Classifier performance

The selected stable features were used to train the stacked XGBoost classifier. The trained classifier was evaluated on the independent test cohort. Sensitivity, specificity, and accuracy were assessed to quantify the performance of the classifiers in both cohorts. The classifiers were also evaluated using minimum sensitivity and specificity among the subtypes, to provide summary statistics over the subtypes. A confusion matrix was also provided.
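For reference, the sketch below computes one-vs-rest sensitivity and specificity, overall accuracy, and the minimum sensitivity from a multi-class confusion matrix. It is an illustration of the standard definitions only; the authors may have computed the per-subtype values differently (for example, within each layer's subset), and the toy matrix is made up for this example.

```python
def per_class_metrics(confusion, labels):
    """One-vs-rest metrics; confusion[i][j] counts true class i predicted as j.

    Assumes every class has at least one positive and one negative sample.
    """
    total = sum(sum(row) for row in confusion)
    metrics = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                 # true k, predicted other
        fp = sum(row[k] for row in confusion) - tp  # true other, predicted k
        tn = total - tp - fn - fp
        metrics[label] = {
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
        }
    accuracy = sum(confusion[k][k] for k in range(len(labels))) / total
    return metrics, accuracy


# Toy three-class matrix for illustration only; not data from the study.
labels = ["A", "B", "C"]
confusion = [[8, 2, 0],
             [1, 6, 3],
             [0, 1, 9]]
metrics, accuracy = per_class_metrics(confusion, labels)
min_sensitivity = min(m["sensitivity"] for m in metrics.values())
print(metrics, accuracy, min_sensitivity)
```

Reporting the minimum over subtypes is a worst-case summary: it is driven by the hardest class and so is not inflated by a dominant, easy class.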

Comparison with other methods

To ensure the methods used in our study are well-suited in classifying headache subtypes, we compared our feature selection method (LASSO) with support vector machine recursive feature elimination (SVM-RFE)15 and minimum-redundancy maximum-relevancy (mRMR)16 approaches. The numbers of the selected features using mRMR and SVM-RFE for each classifier layer were fixed as those of LASSO. The selected features were fed into the stacked XGBoost classifier. We also compared XGBoost with other binary classifiers such as k-nearest neighbor (k-NN), support vector machine (SVM), and random forest in each of the stacked layers with features selected by LASSO.

Results

Selected features

The feature selection procedure led to 32, 19, 6, and 22 features that corresponded to the first, second, third, and fourth layers of the stacked classifier from the training cohort (Table 2), respectively. Table 2 showed selected features positively correlated with the corresponding target subtypes. The top three prominent features in the first layer (migraine vs. non-migraine) were mode of onset: gradual, female sex, and absence of lacrimation. The top three prominent features in the second layer (TTH vs. non-TTH) were mode of onset: gradual, nature of pain: vague/cloudy, and cognitive complaint during headache attack. The top three prominent features in the third layer (TAC vs. specific headache syndromes including epicranial headache and thunderclap headache) were headache attack during sleep, headache triggered by upset stomach, and conjunctival injection. The top three prominent features in the fourth layer (epicranial headache vs. thunderclap headache) were location: retroauricular, nature of pain: electric shock-like, and nature of pain: jabbing, assuming epicranial headache as the positive subtype in the specific headache syndromes classifier.

Table 2 Selected features from different layers of the classifier.

Classifier performances

The performance of the classifiers in both cohorts is given in Table 3, and the corresponding confusion matrices are given in Tables 4 and 5. In the training cohort, the stacked XGBoost classifier using the selected features attained an accuracy of 82%, sensitivity of 87%, 66%, 85%, 65%, and 64%, and specificity of 94%, 54%, 58%, 63%, and 57% for the five subtypes. The baseline accuracy (i.e., assigning all cases to the dominant subtype) was 67%. In the test cohort, the stacked XGBoost classifier using the selected features attained an accuracy of 81%, sensitivity of 88%, 69%, 65%, 53%, and 51%, and specificity of 95%, 55%, 46%, 48%, and 51% for the five subtypes. The baseline accuracy (i.e., assigning all cases to the dominant subtype) was 68%. Our approach performed better than the baseline naïve classifier in the test cohort (p value < 10^−8, Fisher’s exact test).
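A comparison of this kind can be sketched as a one-sided Fisher's exact test on a 2×2 table of correct and incorrect predictions for the two classifiers. The text does not state how the authors built their table or whether the test was one- or two-sided, so the layout below is an assumption, and the counts are rough reconstructions from the rounded percentages (0.81 × 876 ≈ 710 and 0.68 × 876 ≈ 596), not values from the paper's tables.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    upper = min(row1, col1)
    # comb(n, k) is 0 when k > n, so impossible tables contribute nothing.
    numer = sum(comb(row1, x) * comb(row2, col1 - x)
                for x in range(a, upper + 1))
    return numer / comb(row1 + row2, col1)


n_test = 876
model_correct = round(0.81 * n_test)     # approximate, from a rounded percentage
baseline_correct = round(0.68 * n_test)  # approximate, from a rounded percentage

p_value = fisher_exact_greater(
    model_correct, n_test - model_correct,
    baseline_correct, n_test - baseline_correct,
)
print(p_value)
```

Treating the two rows as independent samples is itself a simplification, since both classifiers were evaluated on the same patients; a paired test such as McNemar's would be an alternative when per-patient predictions are available.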

Table 3 Classifier performance of both cohorts.
Table 4 Confusion matrix for the training cohort. The bold numbers in the main diagonal denote correctly classified subjects.
Table 5 Confusion matrix for the test cohort. The bold numbers in the main diagonal denote correctly classified subjects.

Comparison of feature selection methods

Our feature selection method (LASSO) was compared to the SVM-RFE and mRMR approaches. The numbers of features selected using mRMR and SVM-RFE for each classifier layer were fixed to those of LASSO. The selected features were fed into the stacked XGBoost classifier. The overall accuracy, minimum sensitivity, and minimum specificity of the stacked XGBoost classifier were evaluated in the test cohort. Table 6 shows that the features obtained through LASSO led to the best performance in the test cohort.

Table 6 Comparison of the proposed method with other feature selection methods in the test cohort in terms of classifier performance. The bold values indicate the highest score in each performance metric.

Comparison of binary classifiers

We compared XGBoost with k-NN, SVM, and random forest classifiers in each of the stacked layers in terms of overall accuracy, minimum sensitivity, and minimum specificity. The evaluation was performed in the test cohort (Table 7). The same features chosen from the feature selection stage using LASSO were used for all the classifiers. Although there were small differences in classifier performances, XGBoost still outperformed the other classifiers.

Table 7 Comparison of the proposed method with other classifiers in the test cohort. The bold values indicate the highest score in each performance metric.

Discussion

In this study, we applied a machine learning approach to classify major headache disorders using questionnaires completed by patients in a real-world setting. We found that machine learning is applicable to the analysis of questionnaires. The performance of the machine learning approach in the classification of migraine was excellent; however, its accuracy in classifying headache disorders other than migraine was lower. Nonetheless, our automated classification results could still be meaningful, as the gold standard for the diagnosis of headache is the skillful manual application of the current classification criteria (currently ICHD-3, published in 2018). In the era of ICHD-3, there have been no studies evaluating the reliability and accuracy of the diagnosis of primary headache disorders made by primary care providers or general non-headache neurologists. Furthermore, there have been no classification methods other than ICHD-3.

Our study is one of the first studies to apply machine learning in the analysis of patient-reported questionnaires to classify primary headache disorders7. The diagnosis of headache disorders requires a skillful interview with patients and a comprehensive decision algorithm. We tested whether machine learning can substitute the role of the clinical interview. However, the samples of each headache disorder other than migraine and TTH were insufficient for the training. Headache disorders or syndromes other than migraine and TTH were merged into broader categories such as epicranial headaches or TCHs, which was not ideal for the detailed classification of second- or third-digit ICHD codes. In addition, secondary headaches other than those causing TCHs were excluded from the analysis since they cannot be incorporated into one entity. Secondary headaches should be diagnosed by clinical courses and causative workups rather than headache features. Taken together, our approach could not replace physician-based diagnosis due to insufficient results. However, this study demonstrated the feasibility of developing a better algorithm-based automated classification for headache disorders. Besides, our results might be used to inform or assist physicians by pre-screening with the most important factors of the stacked classifier (i.e., Table 2) or increasing the accuracy of less-specialized providers.

Our approach adopted a stacked XGBoost classifier that resulted in an overall accuracy of 81% and a sensitivity and specificity of over 87% for the diagnosis of migraine. Our results were superior to those of a previous study in which more selective data were used7. Existing studies on the classification of headache disorders with machine learning have focused on a few selected headache disorders, such as migraine and tension-type headache, due to challenges with sample size7,8. Previous studies used random forest for classification, whereas our study adopted XGBoost. XGBoost belongs to the boosting classifiers, in which both the variance and the bias of the classifier are reduced, while random forest belongs to the bagging classifiers, in which only the variance of the classifier is reduced13. XGBoost has shown improved performance in many recent machine learning challenges where high-dimensional features were involved. The performance of XGBoost in classifying migraine was superior in our study because migraine is characterized by diverse features that cannot be fully incorporated in conventional statistical models, owing to the complexity and the challenge of multiple testing. Manual analysis, even by human experts, may be time-consuming and prone to errors. However, with the automated classification algorithm suggested by this study, multiple features of headache disorders can be systematically identified. This automated classification algorithm is thus time-efficient and could minimize human error in the diagnosis of headache disorders.

Our stacked classification model reflected the features of each headache disorder well. The top three features used in our classification model offer insights into each headache disorder when compared to the ICHD-3 criteria1. First, the mode of onset was important in the migraine, TTH, and epicranial vs. TCH classifiers. This important feature should always be considered in the differential diagnosis of secondary and primary headaches, but it is not listed in the ICHD-3 criteria for migraine, TTH, and epicranial headaches1. While migraine and TTH are typical examples of gradual-onset headaches, thunderclap onset is the most important syndrome-defining feature of TCH, as its nomenclature implies. For TAC, the mode of onset was not included in the classifier, as most patients with TAC experience a relatively rapid evolution of headache attacks. Second, demographic features were also important, whereas the ICHD only deals with headache characteristics. For example, female sex was ranked as the second most important feature for classifying migraine. This may suggest that the female predominance is more robust in migraine than in other primary headache disorders, at least in clinic-based samples. Third, the nature of pain was important in the TTH and epicranial vs. TCH classifiers: a vague and/or cloudy nature of pain for TTH, and electric shock-like and jabbing natures for epicranial headaches. These features reflect the nature of the corresponding headache disorders well, although they differ from the features listed in the ICHD-3 criteria1. The ICHD-3 denotes a pressing or tightening quality of pain as a feature of TTH, and a stabbing, shooting, or sharp quality of pain as a feature of epicranial (primary stabbing headache or occipital neuralgia) headaches1. However, these features may be less useful in the differential diagnosis, as they can co-exist in migraine attacks, TACs, and even TCHs in the real world. Fourth, the presence or absence of autonomic symptoms was important in differentiating migraine and TACs. The ICHD-3 also denotes autonomic symptoms as characteristic features of TAC1. Although autonomic symptoms can accompany migraine attacks, they are less prominent than those of TACs17. Finally, sleep-awakening hypnic attacks were important in the TAC classifier. The time of headache attack has not been included in the ICHD-3. However, most of the primary headaches other than TAC tend to regress during sleep. In summary, our data showed that these features can have greater relative weights in the differential diagnosis between primary headache disorders, even though they are not listed as, or differ from, the syndrome-defining features in the ICHD-31.

To apply our study results to clinical practice, it should be kept in mind that secondary headache disorders were excluded in this model. This may have some clinical implications: in addition to clinical history, biochemical, radiological, or sometimes histologic evaluations are needed to rule out secondary headache syndromes. Historically, the clinical course rather than headache characteristics has been more important, whilst this cannot be easily captured by the questionnaire. Still, we explored whether automated classification was possible using the same approach. The classification performance was unsatisfactory as shown in the Supplement.

Our study has some limitations. First, the results were derived from data from a single center. Thus, our results need to be validated in an independent cohort. Second, we applied conventional machine learning approaches in this study. Deep learning can be thought of as a high degree-of-freedom extension of conventional machine learning, and it has significantly improved classification performance in many domains11,18. Deep learning could certainly be applied in headache research, and we believe an autoencoder network could be effective. An autoencoder network is capable of handling high-dimensional features that are correlated and can also learn a low-dimensional feature embedding that is robust to noise. The features used in this study were high-dimensional (128 dimensions) and could be substantially correlated with one another because of how they were designed. We plan to pursue research in this direction in the future.

We presented a method to classify subtypes of primary headache by fusing four XGBoost classifiers in a stacked fashion. Each classifier captured important characteristics for the target subtype in a data-driven approach. Existing studies were insufficient as they only considered fewer subtypes and reported worse classification performance than ours. Thus, although our approach was effective for the migraine subtype only, we believe our study is a first step towards building a comprehensive computer-aided diagnosis model for headaches. The software code for this study is open and can be adopted by other researchers to foster novel machine learning research in the migraine field.

Data availability

The data from the Samsung Medical Center is unavailable to the public due to IRB restrictions. Interested researchers should contact Dr. Mi Ji Lee (mirony.lee@gmail.com), who oversaw the data collection.

Code availability

The code of this study is available at https://github.com/junmokwon/automated-headache-classification.

References

  1. Olesen, J. Headache Classification Committee of the International Headache Society (IHS) The International Classification of Headache Disorders. Cephalalgia 38, 1–211 (2018).

  2. Zhu, B., Coppola, G. & Shoaran, M. Migraine classification using somatosensory evoked potentials. Cephalalgia 39, 1143–1155 (2019).

  3. Lee, M. J. et al. Dynamic functional connectivity of the migraine brain: a resting-state functional magnetic resonance imaging study. Pain 160, 2776–2786 (2019).

  4. Chong, C. D. et al. Migraine classification using magnetic resonance imaging resting-state functional connectivity data. Cephalalgia 37, 828–844 (2017).

  5. Cernuda-Morollón, E. et al. Interictal increase of CGRP levels in peripheral blood as a biomarker for chronic migraine. Neurology 81, 1191–1196 (2013).

  6. Lee, M. J., Lee, S. Y., Cho, S., Kang, E. S. & Chung, C. S. Feasibility of serum CGRP measurement as a biomarker of chronic migraine: a critical reappraisal. J. Headache Pain 19, 53 (2018).

  7. Krawczyk, B., Simić, D., Simić, S. & Woźniak, M. Automatic diagnosis of primary headaches by machine learning methods. Cent. Eur. J. Med. 8, 157–165 (2013).

  8. Garcia-Chimeno, Y., Garcia-Zapirain, B., Gomez-Beldarrain, M., Fernandez-Ruanova, B. & Garcia-Monco, J. C. Automatic migraine classification via feature selection committee and machine learning techniques over imaging and questionnaire data. BMC Med. Inf. Decis. Mak. 17, 38 (2017).

  9. Lipton, R. B. et al. A self-administered screener for migraine in primary care: the ID Migraine™ validation study. Neurology 61, 375–382 (2003).

  10. Patel, B. K. et al. Computer-aided diagnosis of contrast-enhanced spectral mammography: a feasibility study. Eur. J. Radiol. 98, 207–213 (2018).

  11. Xie, Y. et al. Knowledge-based collaborative deep learning for benign-malignant lung nodule classification on chest CT. IEEE Trans. Med. Imaging 38, 991–1004 (2019).

  12. Mansour, R. F. Deep-learning-based automatic computer-aided diagnosis system for diabetic retinopathy. Biomed. Eng. Lett. 8, 41–57 (2018).

  13. Chen, T. & Guestrin, C. XGBoost: a scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining-KDD’ 16 785–794. https://doi.org/10.1145/2939672.2939785 (ACM, 2016).

  14. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 58, 267–288 (1996).

  15. Guyon, I., Weston, J., Barnhill, S. & Vapnik, V. Gene selection for cancer classification using support vector machines. Mach. Learn. 46, 389–422 (2002).

  16. Peng, H., Long, F. & Ding, C. Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 27, 1226–1238 (2005).

  17. Lai, T. H., Fuh, J. L. & Wang, S. J. Cranial autonomic symptoms in migraine: characteristics and comparison with cluster headache. J. Neurol. Neurosurg. Psychiatry 80, 1116–1119 (2009).

  18. Dey, D., Chaudhuri, S. & Munshi, S. Obstructive sleep apnoea detection using convolutional neural network based deep learning framework. Biomed. Eng. Lett. 8, 95–100 (2018).

Acknowledgements

This work was supported by the Institute for Basic Science (Grant Number IBS-R015-D1), the NRF (National Research Foundation of Korea, Grant Number NRF-2020M3E5D2A01084892), the MIST (Ministry of Science and ICT) of Korea under the ITRC (Information Technology Research Center) and the support program (Grant Number IITP-2020-2018-0-01798) supervised by the IITP (Institute for Information & communication Technology Promotion), the IITP grant funded by the Korean government under the AI Graduate School Program (Grant Number 2019-0-00421), and the MSIT of Korea under the ICT Creative Consilience program (grant number IITP-2020-0-01821).

Author information

Contributions

J.K. and H.P. designed the experiment, analyzed the data, and wrote the manuscript. H.L., S.C., C.-S.C., and M.J.L. reviewed the manuscript. M.J.L. conceptualized the study, co-wrote the manuscript, and provided parts of the funding. H.P. was responsible for ensuring the integrity of the analyzed data.

Corresponding authors

Correspondence to Mi Ji Lee or Hyunjin Park.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Cite this article

Kwon, J., Lee, H., Cho, S. et al. Machine learning-based automated classification of headache disorders using patient-reported questionnaires. Sci Rep 10, 14062 (2020). https://doi.org/10.1038/s41598-020-70992-1
