Chronic back pain sub-grouped via psychosocial, brain and physical factors using machine learning

Chronic back pain (CBP) is heterogenous and identifying sub-groups could improve clinical decision making. Machine learning can build upon prior sub-grouping approaches by using a data-driven approach to overcome clinician subjectivity, however, only binary classification of pain versus no-pain has been attempted to date. In our cross-sectional study, age- and sex-matched participants with CBP (n = 4156) and pain-free controls (n = 14,927) from the UkBioBank were included. We included variables of body mass index, depression, loneliness/social isolation, grip strength, brain grey matter volumes and functional connectivity. We used fuzzy c-means clustering to derive CBP sub-groups and Support Vector Machine (SVM), Naïve Bayes, k-Nearest Neighbour (kNN) and Random Forest classifiers to determine classification accuracy. We showed that two variables (loneliness/social isolation and depression) and five clusters were optimal for creating sub-groups of CBP individuals. Classification accuracy was greater than 95% for when CBP sub-groups were assessed only, while misclassification in CBP sub-groups increased to 35–53% across classifiers when pain-free controls were added. We showed that individuals with CBP could sub-grouped and accurately classified. Future research should optimise variables by including specific spinal, psychosocial and nervous system measures associated with CBP to create more robust sub-groups that are discernible from pain-free controls.


Results
Demographics, false discovery rate, feature weighting and cluster validation. The results of demographic data are presented in Table 1 and the participant flow in Fig. 1. The following steps were used to determine important variables to use at each point in the analyses: Step 1: Of the 1502 variables used in this study, 100 were significantly different between CBP and pain-free controls from t-tests after adjustment of p values using the false discovery rate method (Supplementary Table 2). Assessment for multicollinearity showed that no variables were highly correlated (Supplementary Table 3). These 100 variables were analysed in subsequent feature weighting methods to determine the most important variables for differentiating CBP and pain-free controls.
Step 2: Following feature weighting tests, ten variables showed potential importance for differentiating CBP and pain-free individuals and were used in subsequent analyses ( Supplementary Figures 1 and 2). These variables included depressive symptoms, symptoms of loneliness and social isolation, body mass index, grip strength, frontoparietal to default mode network connectivity (FPN-DMN; edge 1014), right frontoparietal to sensorimotor network connectivity (FPN-SMN; edge 222) and visual to default mode network connectivity (VN-DMN; edge 44), primary motor, primary somatosensory and fronto-orbital cortex grey matter volumes.
Step 3: Using Laplacian scores to rank features based on variability (distances in the data space) for CBP only, the order of importance of features in a CBP-only space was symptoms of loneliness/social isolation, depressive symptoms, FPN-DMN connectivity (edge 1014), grip strength, body mass index, FPN-SMN connectivity (edge 222) and VN-DMN connectivity (edge 44), fronto-orbital cortex, primary motor and somatosensory grey matter volumes (Supplementary Figure 3).
Step 4: For cluster validity techniques, adding dimensions in the order of importance in Step 3 demonstrated that clustering in three or more dimensions added no new information; hence, two dimensions (loneliness/ social isolation and depressive symptoms) were used for the main clustering analysis (Supplementary Table 4). Results from clustering validity methods indicated the five clusters were optimal for the two-dimensional space (Supplementary Table 4).

Results of clustering and classification in each sub-domain.
Step 7: Given the main clustering analysis was derived from available psychosocial factors, no further clustering analyses were completed in this sub-domain. For physical measures of body mass index and grip strength, cluster validity methods showed that three subgroups were optimal for this two-dimensional space (Supplementary Table 7). These three sub-groups consisted of: (1) higher body mass index and lower grip strength (n = 870; 20.9%); (2) lower body mass index and grip strength (n = 1933; 46.5%); and (3) lower body mass index and higher grip strength (n = 1353; 32.6%) (Supplementary Figure 15). There was moderate similarity of each data points to its cluster with a Silhouette value of 0.581 (Supplementary Figure 16). Within-cluster distances ranged from 0.09 to 0.12, while between cluster   Figure 29). Cluster validity methods indicated three clusters in two dimensions were optimal (Supplementary Table 9). Based on the centroids of fuzzy c-means clustering, sub-groups were: (1) higher FPN-DMN connectivity (n = 1,347; 32.4%); (2) higher FPN-SMN connectivity (n = 1335; 32.1%); and (3) lower FPN-DMN and FPN-SMN connectivity (n = 1,474; 35.5%) (Supplementary Figure 30). For the similarity of clusters, the Silhouette value was 0.497 (Supplementary Figure 31). The within-cluster distances ranged from 0.11 to 0.12, the between-cluster distances ranged from 0.23 to 0.25 and the discrimination values ranges from − 0.27 to − 0.24 (Supplementary Table 10). The results showed misclassification error rates of < 1% for SVM, 4.9% for Naïve Bayes, 11.1% for kNN and 2.1% for RF classifiers, which indicated accurate classification of CBP sub-groups (Supplementary Figure 32). Adding pain-free individuals into the dataset increased misclassification to 11.3% for SVM, 15.2% for Naïve Bayes, 20.2% for kNN and 15.6% for RF (Supplementary Figure 33). AUC metrics were similar to the main analyses (Supplementary Figures 34 and 35). Calibration curves are reported in Supplementary Figures 36-43 and calibration metrics are reported in Supplementary Table 6. For brain structure, Laplacian scores to rank features are in Supplementary Figure 44. Two variables and two clusters were optimal for clustering (Supplementary Table 11). Results of clustered showed sub-groups of: (1) higher fronto-orbital and primary motor cortex sizes (n = 1917; 46.1%); and (2) lower fronto-orbital and primary motor cortex sizes (n = 2239; 53.9%) (Supplementary Figure 45). The similarity of data points to its cluster, based on the Silhouette value, was 0.589 (Supplementary Figure 46). The within-cluster distance ranged from 0.07 to 0.08, the between-cluster distances was 0.18 and the discrimination values was − 0.20 (Supplementary Table 12). The results showed misclassification error rates of < 1% for SVM, 5.6% for Naïve Bayes, 16.4% for kNN and 1.7% for RF classifiers, which indicated accurate classification of CBP sub-groups (Supplementary Figure 47). Adding pain-free individuals into the dataset increased misclassification to 10  Higher scores indicate greater symptoms of depression or loneliness/social isolation. Based on the centroids of fuzzy c-means clustering, classes and colours are (1; blue) low social isolation and loneliness and moderate depressive symptoms (n = 776; 18.7%), (2; red) low loneliness/social isolation and depressive symptoms (n = 2296; 55.3%), (3; yellow) high loneliness/social isolation and moderate depressive symptoms (n = 185; 4.5%), (4; green) moderate loneliness/social isolation and high depressive symptoms (n = 297; 7.2%) and (5; pink) moderate loneliness/social isolation and low depressive symptoms (n = 602; 14.5%). Black squares on the heat map indicate no class was available at those values. The X value on the scatter plot indicates the centroid of that cluster. For the three sub-groups derived from physical variables, results from ANCOVA are presented in Table 3. Sub-groups one (8.5 [8.1, 8.8]  The x-axis is the predicted class while the y-axis is the true class. Blue squares indicate the number in the class that was accurately classified, while the oranges squares show the number of misclassifications. The boxes on the right of the matrix show the percentage of classification (blue) and misclassification (orange) for the class. Classes are (1) low social isolation and loneliness and moderate depressive symptoms (n = 776; 18.7%), (2) low loneliness/social isolation and depressive symptoms (n = 2296; 55.3%), (3) high loneliness/social isolation and moderate depressive symptoms (n = 185; 4.5%), (4) moderate loneliness/social isolation and high depressive symptoms (n = 297; 7.2%) and (5) moderate loneliness/social isolation and low depressive symptoms (n = 602; 14.5%). Between-group differences for sub-groups derived from brain variables are reported in Table 4. Results showed that for FPN-DMN connectivity (edge 1,014), sub-group one had higher connectivity Differences between-groups derived from brain structure are presented in Table 5. Fronto-orbital cortex volume was higher in sub-group one (765. 3   Blue squares indicate the number in the class that was accurately classified, while the oranges squares show the number of misclassifications. The boxes on the right of the matrix show the percentage of classification (blue) and misclassification (orange) for the class. Classes are (0) pain-free individuals (n = 14,927; 78.2%), (1) low social isolation/loneliness and moderate depressive symptoms (n = 776; 4.1%), (2) low loneliness/social isolation and depressive symptoms (n = 2296; 12.0%), (3) high loneliness/social isolation and moderate depressive symptoms (n = 185; 1.0%), (4) moderate loneliness/social isolation and high depressive symptoms (n = 297; 1.6%) and (5) moderate loneliness/social isolation and low depressive symptoms (n = 602; 3.2%). The results of prior probably adjusted classification models are in Supplementary Table 17. For classifiers testing CBP individuals, no major differences were observed between original and prior probability adjusted Table 4. Results of the Analysis of Covariance (ANCOVA) sub-groups derived from brain function variables. Data are presented as adjusted mean (standard error). Adjusted mean differences and 95% confidence interval (95 CI) are reported. Effect sizes and 95% confidence intervals were calculated using standard formulae. All data were adjusted for age and sex. p value s were adjusted for multiple comparisons using www.nature.com/scientificreports/ models. For models on CBP versus pain-free individuals, SVM and Naïve Bayes overall classification accuracy remained stable across all analyses. However, when looking at different classes, there was greater misclassification of pain-free controls and increased accuracy of classification for CBP sub-groups to maintain this result. Classification accuracy dropped for kNN and RF classifiers for CBP versus pain-free controls, particularly due to the increased misclassification of pain-free controls.

Discussion
Our primary results showed that there are sub-groups of individuals with CBP based on psychosocial factors, yet not brain and the limited physical variables examined. These groups consisted of: (1) low loneliness/social isolation and moderate depressive symptoms; (2) low loneliness/social isolation and depressive symptoms; (3) high loneliness/social isolation and moderate depressive symptoms; (4) moderate loneliness/social isolation and high depressive symptoms; and (5) moderate loneliness/social isolation and low depressive symptoms. Furthermore, these groups could be accurately classified using ML approaches when people with CBP were evaluated alone. When pain-free controls were examined as well, the error rate in the models increased, particularly in CBP subgroups. We interpret the latter finding to indicate that these variables alone are insufficient for predicting who has CBP yet are likely necessary in determining sub-group classification of patients with known CBP.
We have previously showed using UKBioBank data that differences in brain structure, psychosocial, and physical health exist depending on if back pain is acute or chronic and localised or widespread 15 . However, identifying sub-groups of individuals with back pain has been a goal of clinicians and researchers to improve diagnosis and treatment 16 , with the aim of improved patient outcomes and reduced financial costs at both the individual and societal level. To build on prior ML research that only used binary classification of back pain versus no-back pain 13 , our results showed that individuals with CBP could be sub-grouped based on psychosocial factors of symptoms of depression and loneliness/social isolation and attempt classification on these new classes. These results are supported by Backyrd et al. 17 , who included psychosocial measures only and found four subgroups of individuals with chronic pain based on hierarchical clustering methods characterised by low and high levels of psychological and social distress. Given back pain is a biopsychosocial issue 11 , we attempted to include additional brain-related and physical variables in our analyses to derive sub-groups. Even though these differed between CBP and pain-free controls, they were not important for deriving subgroups based on feature weighting and cluster validity methods for the main analyses. However, as shown by our secondary analyses, physical and brain measures may still be important for deriving sub-groups of CBP. Importantly, even though psychosocial variables are often treated as continuous in analyses, they have results at discrete intervals on a measurement scale (e.g. a numeric rating scale of 0-10) 18 . Given feature weighting and cluster validity methods are based on distances within-and between-groups 19,20 , this may mean psychosocial factors are the most important solely due to method of measurement and the potential for greater distances on normalised data scales within-and between-groups. Therefore, future research should: (1) derive sub-groups based on homogenous outcomes across multidimensional domains linked to CBP, and (2) develop methods to account for differences in measurement scales in feature weighting and cluster validity methods.
The findings from the current study showed that individuals with CBP can be accurately into all the subgroups derived. However, the addition of pain-free controls increased the misclassification of CBP individuals to greater than 35-53% for CBP groups in the main analyses. Prior research has showed sub-groups in chronic pain based on psychosocial distress 17 , while research in low back pain had showed clusters based on sensory testing 21 and spinal tissue profiles 9 . However, only one of these studies included pain-free controls for comparison 21 . For example, prior work in acute low back pain showed that sub-groups may exist based on pressure-pain thresholds and conditioned pain modulation, with differences observed in ANCOVA compared to pain-free controls in all but one cluster (n = 31/125; 25%) 21 . In a study without pain-free controls, which assessed sub-groups of low back pain based on MRI morphological findings, a cluster containing 51.6% of 3155 vertebral motion-segments consisted of no or minimal (< 5%) MRI morphological findings 9 . Even though no ML techniques were used in either study 9,21 , it seems the inclusion of a pain-free control arm is paramount for developing meaningful back pain sub-groups. However, in our study, pain-free and back pain status was self-report and therefore it is unclear if pain-free individuals had prior pain. Based on our findings, future research should include a pain-free control arm to identify variables which produce less overlap to create more meaningful CBP sub-groups.
When examining sub-groups based on physical variables of body mass index and grip strength, we showed that three sub-groups may exist. A systematic review showed no statistically significant relationship on developing chronic low back pain in those with a body mass index above ≥ 25 kg/m 222 . However, a prior longitudinal study showed that those with obesity (body mass index ≥ 30 kg/m 2 ) were 30% more likely to develop low back pain compared to those with a body mass index < 25 kg/m 223 . Therefore, the importance of body mass index may be in those with CBP who are obese (e.g. physical sub-group one). For grip strength, muscular strength is not considered to be a risk factor for back pain 24 , however, may be considered a consequence of deconditioning 25 . In our analyses we showed physical sub-groups two and three have lower grip strength compared to pain-free controls. Therefore, treatment orientated at restoring muscular conditioning may be reserved for sub-groups two and three. Notably, we were unable to assess other key physical parameters, such as intervertebral disc health 26 , as these measures were not available in the UKBioBank dataset. Clusters of CBP based on body mass index and grip strength exist, however, follow-up with other key physical parameters, such as spine imaging, is required.
For sub-groups based on resting state brain function, we showed that clusters may exist for connectivity between frontoparietal, default mode and sensorimotor networks in those with CBP. These function of these brain networks, while linked, differ; the frontoparietal network is linked to cognitive control of behaviour in a goal-driven manner 27 ; the default mode network is involved in memory retrieval and internal cognitions, while deactivating during attention demanding external tasks 28  www.nature.com/scientificreports/ processing, and could be involved in encoding the location and intensity of pain 29 . Previous studies have examined connectivity differences in the regions between individuals with chronic pain and pain-free controls [30][31][32][33] , yet none have attempted sub-grouping. In our analyses, we showed that sub-group one had greater FPN-DMN connectivity compared to pain-free controls and could indicate impairments in the interaction between memory, internal cognitions, goal-driven behaviour and pain. This may explain why some individuals with CBP have impaired movements in the presence of fear and catastrophising 34 . Sub-group two had higher connectivity in the FPN-SN, while sub-group three had lower connectivity compared to pain-free controls. This could indicate impairments in connectivity between goal-driven behaviours 27 and sensory processing of pain 29 . This may be important for sub-group two given increases in sensorimotor network activity have been shown to have a moderate strength correlation with higher levels of back pain 30 . However, for both FPN-DMN and FPN-SN connectivity, we were unable to explore the association between clinical outcomes (pain, fear and catastrophising) in our study due to outcome being unavailable in the UKBioBank. Overall, these results indicate that sub-groups of CBP may have impairments in frontoparietal, default mode and sensorimotor networks. Regardless, more research is required to determine the clinical meaningfulness of these findings. For sub-groups based on brain structure, we derived two clusters based on the fronto-orbital and primary motor cortex grey matter volumes. The fronto-orbital cortex is linked to cognitive/emotional processing of pain, while the primary motor cortex to motor aspects 35 . In addition to this, the primary somatosensory cortex grey matter volume differentiated CBP individuals and pain-free controls and was a candidate for clustering. The primary somatosensory cortex has been previously linked to somatosensory aspects of pain encoding 35 . When assessing the derived sub-groups, sub-group one had higher values in all three brain grey matter volumes, while sub-group two had lower values. For sub-group two, this could indicate impairments in motor, sensory and cognitive/emotional areas related to pain 35 . Furthermore, given that all brain grey matter volumes were lower, this may indicate impairments with central processing of pain in this sub-group 36 . However, given we did not assess if the pain was localised or widespread in this study, this should be considered in future research to further identify important brain markers across back pain individuals. Differences in brain grey matter volumes also need to be related to clinical outcomes to determine if they are clinically relevant for treatment. Our study showed that clusters may exist in brain structure in individuals with CBP and should be further explored to determine clinically relevant CBP sub-groups.
It is appropriate to consider the potential treatment implications of the sub-groups identified. A biopsychosocial approach is promoted for the treatment of back pain 11 and our study indicates that clinicians could consider differing psychosocial, physical and brain outcomes in individuals they treat for back pain. An approach to integrate this into clinical practice could be to individualise assessment based on findings from subjective assessment 11 or use physical/psychosocial screening tools such as the STarTBack tool 37 and Orebro Musculoskeletal Questionnaire to identify psychosocial aspects of pain through question responses 38 . However, even though we could not assess this in our study, prior research has showed that other sensory testing 21 and spinal tissue 9 factors could also be used to sub-group back pain individuals. Even though some back pain sub-groups may not be clearly separated from pain-free controls, until it is clear which variables best segregate these groups, they should continue to be considered clinically to truly adopt a biopsychosocial approach to treatment. Using such an approach, of targeting psychosocial factors with treatment, may improve patient centred care by identifying and targeting more individualised factors in the treatment of back pain conditions. The current study had multiple strengths. Firstly, this was the first study to develop and classify sub-groups of CBP based on psychosocial, physical and nervous system measures using data-analytic approaches 13 . Secondly, we had 19,083 participants that could be used to help identify sub-groups of individuals with CBP. Furthermore, we followed up clustering methods with post-hoc tests and relevant adjustments to ensure more robust sub-groups that consider covariates of important variables. For limitations, firstly, there was a lack of spinal tissue measures, which could be important for sub-grouping some individuals with CBP [8][9][10] . Even though spinal tissues outcomes do not always relate to pain, they are a source of nociception 39 and need consideration in future studies for a more informative approach to sub-classification 40 . Secondly, clinical outcomes, such as pain intensity, disability and pain duration, were not available in the UKBioBank. It would be beneficial to relate the sub-groups detected to these clinical characteristics. However, validation of our classification models with the inclusion of pain-free controls suggests future research should determine which variables best discriminate cases from controls for more robust subgroups, regardless of additional pain-related variables. Furthermore, limited psychological variables that can modulate pain and disability were available in the UKBioBank. Future research using classification methods should consider additional psychological variables such as anxiety and pain catastrophizing. Finally, the cross-sectional nature of this study precludes conclusions regarding causality. Future research should use prospective data to determine how sub-groups transition through the stages of back pain.
Based on our findings and limitations, future studies should consider which variables they use in design and implementation of attempting accurate sub-classification of back pain individuals with ML approaches. In particular, pain processing mechanisms 7,21 and changes in spinal tissues 8-10 could be used in addition to psychosocial factors to identify relevant back pain sub-groups. This has the potential to reduce overlap and misclassification in the data; hence, creating more robust sub-groups. Furthermore, studies should investigate combining different outcomes, while also completing sub-domain analyses separately, to better understand how to combine variables in data-analytic method. In addition, future studies should consider relating clinical outcomes (e.g. pain intensity and disability) to new sub-groups. Understanding the relationship between sub-groups and important clinical measures may allow for improved knowledge on how to target outcomes for individuals with back pain.
This study was the first to use ML algorithms to attempt to divide individuals with CBP into sub-groups based on psychosocial, brain and physical factors. For the main analysis, optimal sub-grouping was determined to require only two psychosocial variables. Secondary analyses indicate that sub-groups based on physical, brain structure and function can also be derived in sub-domains analyses. Misclassification error may indicate that www.nature.com/scientificreports/ variables used in this study were not optimal to differentiate these sub-groups from pain-free controls. Future research should optimise variables via including spinal tissue (e.g. intervertebral disc height), sensory testing (e.g. pressure-pain thresholds) and additional psychosocial variables (e.g. self-efficacy) to create more robust CBP sub-groups that discriminate them from pain-free controls to optimise targeted treatments for the condition.

Methods
This was a cross-sectional study of the UKBioBank imaging visit (Project ID: 55843). Following the first wave of recruitment between 2006 and 2010, a subset of participants was invited for initial imaging analyses between 2014 and 2018. Participants provided written informed consent prior to data collection. For this sub-study, we used a sample of individuals with chronic back pain and those who were pain-free. Inclusion and exclusion criteria. The UKBioBank initially recruited adults from the United Kingdom between the ages of 40-69 who were able, and consented to, attending an assessment centre for testing. For inclusion, self-report questions available in the UkBioBank (https:// bioba nk. ndph. ox. ac. uk/ showc ase/ label. cgi? id= 100048) were used to determine pain status. Information on the specific questions can be found here: https:// bioba nk. ctsu. ox. ac. uk/ cryst al/ cryst al/ docs/ Touch scree nQues tions MainF inal. pdf (search SY5). The following groups were included for this sub-study: (a) pain-free, in which participants reported 'none of the above' to pain experienced over the month prior; and (b) CBP: participants reporting 'yes' to back pain lasting greater than three months. Participants without full neuroimaging datasets were excluded from the sample. For magnetic resonance imaging, individuals with metal implants, such as a pacemaker, were excluded.
Data collection. We selected and extracted key variables available in the UKBioBank which have previously shown to be related to CBP. For brain imaging, meta-analyses and systematic reviews show both of structural 41 and functional magnetic resonance imaging (MRI) 42 measures may be important for individuals with CBP. Psychosocial factors including mental and social health 43 , and physical factors of body mass index 22 and muscular strength 44 have also been shown to be associated with CBP. . For analyses here, we used volumetric and functional brain analyses. For functional brain analyses, we used partial correlation matrices in the 100-dimension from independent component analysis to assess 1,485 direct connections between sub-regions of functional brain nodes at varying anatomical labels. For important connections in our analyses, we used the Automated Anatomical Labelling (AAL) atlas through MRICron (https:// www. nitrc. org/ proje cts/ mricr on) to define the location of the peak activations of each node and determine the anatomical location of the connectives 45 . To further label the anatomical locations of each node to a corresponding functional network (such as the default mode network), we used 7-network parcellation 46 . Brain coordinates, anatomical locations and corresponding functional networks of connectivity edges used in data-analytic approaches are in Supplementary Table 1. For structural brain analyses, we used regional grey matter volumes linked to pain processing 47 . The amygdala, thalamus, insula, medial frontal cortex, anterior cingulate cortex, caudate, putamen, hippocampus, precentral gyrus (primary motor cortex), supplementary motor cortex, post-central gyrus (primary somatosensory cortex) and parietal operculum (secondary somatosensory cortex) were used to determine how these brain volumes differed between groups 47 . Data from left-and right-hand sides were pooled for the analyses.
Mental and social health. We used measures of current depression and loneliness/social isolation symptoms as these were available psychosocial measures in the UKBioBank dataset linked to CBP 43 . Both measures were collected via questionnaires on a touch screen. Four questions were used to determine current symptoms of depression. From the validated Patient Health Questionnaire, questions assessed frequency of depressed mood, disinterest/unenthusiasm, tenseness/restlessness and tiredness/lethargy over the prior 2 weeks 48 . For each question, a four-point scale, ranging from not at all, several days, more than half the time to almost every day was set. The total score was summed from four to 16, with higher scores indicating a greater level of current depressive symptoms 49 . As reported in previous UKBioBank studies, symptoms of social isolation/loneliness were also assessed in the current study 15,50 . Social isolation consisted of three questions: (1) "Including yourself, how many people are living together in your household?" (2) "How often do you visit friends or family or have them visit you?" and (3) "Which of the following (leisure/social activities) do you engage in once a week or more often? You may select more than one. " Loneliness was assessed with two questions: (1) "Do you often feel lonely?" and (2) "How often are you able to confide in someone close to you?" The scale was scored from 0 to 5, where individuals with higher scores were deemed to have greater symptoms of social isolation and loneliness 50 . www.nature.com/scientificreports/ squeezed the dynamometer for three seconds with encouragement to produce as much force as possible. For both right-and left-hand sides, the maximum grip strength was measured in kilograms of force. The highest grip strength achieved on either the left-or right-hand side was used for analysis.
Anthropometrics. Body mass index was used as a measure of adiposity. Height in centimetres was measured using Seca 202 stadiometer (Seca GMBH & Co, Hamburg, GER). Weight in kilograms was measured using a Tanita BC-418 (Tanita, Tokyo, JAP). Body mass index was then calculated as kg/m 2 .
Sampling, statistical and data analytic methods. Sampling and variable selection.
Step 1: Initial data preparation was completed in StataMP version 15 (StataCorp, Texas, USA) to allow for a reduced working file size for subsequent analyses. Individuals who did not complete the first brain imaging visit (instance 2) were removed from the dataset. Following this, the sample was matched one-to-three back pain-to-pain-free control on age and sex using a custom code in software package R version 3.5.1 (www.r-proje ct. org), while preferentially matching for cases without missing data if possible. We selected three controls for matching, given little is gained by including more 51 . Following matching, 20,352 individuals remained (pain-free controls, n = 15,284; CBP, n = 5068), with 19,083 having full datasets for ML analysis (pain-free controls, n = 14,927; CBP, n = 4156) (Fig. 1). Further, statistical analyses were conducted using StataSE version 16 (StataCorp, Texas, USA). Independent t-tests were used to determine between-group differences and the order of importance of variables in the dataset with all variables treated as continuous. Given 1502 variables existed, with an initial p value set at 0.05, adjustment for false positives across each t-test was completed using the Benjamini-Hochberg false discovery rate method 52 . Variables which remained significant following p value adjustment were retained and multicollinearity was assessed with a correlation matrix using a threshold of r > 0.8 53 . Variables which passed both false discovery rate and multicollinearity were used in subsequent data-analytic methods.
Data-analytic methods. We conducted the data-analytic methods using software package MATLAB version R2020a (MathWorks, Massachusetts, USA) to determine sub-groups of CBP. Further details on the data-analytic methods used below are available in the Supplementary Methods 1.
Step 2: Feature weighting was completed to determine variables which best distinguish CBP from pain-free controls to create the most robust CBP sub-groups. Treating variables as continuous, data was first normalised on a 0-1 scale to allow comparability of different variables and different scales of values. Variable weighting methods of (a) minimum redundancy maximum relevance (MRMR) and (b) a random forest (RF) predictor was compared to determine important features. Following this, pain-free controls were removed from the dataset, to explore clustering of CBP individuals.
Step 3: Laplacian scores were used to rank remaining variables in the CBP space in order of importance. The dataspace for clustering was initially explored with two variables and were added into the dataspace in order of importance to determine where the breakpoint of random noise being added to the models.
Step 4: To determine the optimal number of variables and clusters in each dataspace, the noise breakpoint was explored using Calinski-Harabasz, Davies-Bouldin and Silhouette cluster evaluation methods with k-means linkage 54 , as well as fuzzy c-means clustering using entropy values [55][56][57] .
Step 5: Following this, new sub-groups of CBP were labelled based on the membership values to each cluster from fuzzy c-means clustering. To evaluate the post-hoc similarity of each data point, on average, to its derived clusters, we used Silhouette Index values to determine the goodness of fit (overall tightness and separation). In addition, we evaluated the average within-and between-cluster distances using the centroids of each cluster. From the average within-and between-cluster distances, we also calculated the discrimination value between clusters as reported here 58 . Given the potential for differences in effect sizes/variance in outcomes across domains due to the way they are measured, even when they are all scaled to 0-1, we also explored clustering for variables which passed prior steps in each sub-domain of psychosocial, physical, brain function and brain structure separately 59 .
Step 6: Retaining all variables which passed FDR and feature weighting, Support Vector Machine (SVM), Naïve Bayes, k-Nearest Neighbour (kNN) and RF one-vs-one multi-class classifiers were fit to the new CBP subgroups to determine how accurately they could be classified. To cross-validate the model, a portion of the training set (validation set) was used through tenfold cross-validation (9 parts of the data set were used for training and one part for validation, repeated each time reserving a different tenth of the data for validation). Pain-free controls were readded into the dataset and classifiers were trained and validated again to determine if classification accuracy changed. The reason for completing classification with the addition of pain-free controls was to determine if and what sub-groups were separable from pain-free controls. There is contention surrounding the most important pain-related features, therefore, determining which features separate CBP from pain-free controls is important to guide future research to find specific treatable phenotypes of the condition 59 . Furthermore, we attempted classification on the sub-groups derived from the main and sub-domain clustering analyses. Area under the received operating curve was calculated for every class following each classifier. Reliability diagrams (calibration curves), histograms of predicted probabilities and calibration metrics (Brier score, logistic loss and expected calibration error) were also determined for each model and class in binary one-vs-rest approach 60 .
Between sub-group and pain-free control statistical analyses.
Step 7: To further determine where statistical significance existed between each derived sub-group and pain-free controls, an analysis of covariance (ANCOVA) was used on remaining variables using StataSE version 16 (StataCorp, Texas, USA). The ANCOVA models were adjusted for age and sex (to account for any remaining variability following matching). p values were adjusted for multiple comparisons using the Tukey HSD method. An alpha level was set to 0.05. Standard formula were used to calculate effect sizes (Cohen's d; ES) and relevant confidence intervals between CBP sub-groups and www.nature.com/scientificreports/ pain-free controls 61 . Effect sizes were considered small from 0.2-0.49, moderate from 0.5-0.79 and large from 0.8 and above 62 .
Clustering and classification validation.
Step 8: To further validate the result from our clustering and classification, the processing pipeline was completed in CBP individuals again by splitting the data 50/50 into training and test sets. Firstly, we analysed Laplacian scores in the training set to rank features in order of explaining the most variance. Following this cluster validity methods were repeated in the training set by adding variables in order of Laplacian scores to determine the optimal number of clusters and variables. After this, fuzzy c-means clustering was applied to the training set only. Fuzzy c-means clustering was then applied to the test set 'out-ofsample' to determine if clustering performance was similar to the training set. Cluster centroids, within-and between-cluster distances, Silhouette index and discrimination values were compared between clustering in the training and test sets. The classifiers used in Step 6 were then repeated with only the training data being used to train the model. To cross-validate the model, a portion of the training set (validation set) was used through tenfold cross-validation. Following this, the trained classification model was applied to the test set and the accuracy and areas under the receiver operating characteristics curve compared to that of the training validation set. Furthermore, to explore the effects of class imbalance on all of our classification models, we balanced the models class prior probabilities uniformly to determine if this altered the classification results 63 .

Data availability
Publicly available UKBioBank data was analysed in our study. Researchers can access datasets through open registration (see: https:// www. ukbio bank. ac. uk/ regis ter-apply/).

Funding
This study was supported by institutional funding (DLB).