Introduction

“The greatest evil is physical pain”

– Saint Augustine

Pain is a subjective feeling with objective roots and profound evolutionary biological utility. It reflects perceived or actual damage to the organism [1, 2]. Mental states can affect the perception of pain, and in turn be affected by pain. Psychiatric patients may have an increased perception of pain, as well as increased physical health reasons for pain, due to their often adverse life trajectory [3]. As such, they may be a particularly suitable population in which to try to identify peripheral blood biomarkers for pain, that may be complementary to genetic findings in the field [4]. Of note, a number of psychiatric medications are currently used to treat pain disorders [5]. Given these close inter-relationships, we expect our findings to be generalizable, trans-diagnostic, and have general relevance to pain, independent of specific psychiatric disorders.

First, we used a powerful longitudinal within-subject design in individuals with psychiatric disorders to discover blood gene expression changes between self-reported low pain and high pain states. Second, we prioritized the list of candidate biomarkers with a Bayesian-like Convergent Functional Genomics approach, comprehensively integrating previous published human and animal model evidence in the field for involvement in pain, and directly citing it. Third, we validated our top biomarkers from discovery and prioritization in an independent cohort of psychiatric subjects with a clinical diagnosis of a pain disorder and with high scores on pain severity and functional impact ratings. Fourth, we tested if the candidate biomarkers from the first three steps are able to predict high pain state, and future emergency department (ED) visits for pain, in another independent cohort of psychiatric subjects. We tested the biomarkers in all subjects in the independent test cohort, as well as in a more personalized fashion by gender and psychiatric diagnosis, showing increased accuracy with the personalized approach. Fifth, we assessed if our biomarkers have evidence for involvement in other psychiatric and related disorders, as well as analyzed the biological pathways and networks they are involved in. Sixth, we bioinformatically identified which of our individual biomarkers are modulated by existing drugs and thus can be used for pharmacogenomic population stratification and measuring of response to treatment, as well as used the gene expression signatures of the top predictive biomarkers to interrogate the Connectivity Map database from Broad/MIT to identify drugs and natural compounds that could be repurposed for treating pain.

Materials and methods

Cohorts

We used three independent cohorts: discovery (major psychiatric disorders), validation (major psychiatric disorders with clinically severe pain disorders), and testing (an independent major psychiatric disorders cohort for predicting pain state, and for predicting future ED visits for pain) (Fig. 1a).

Fig. 1

Steps 1–3: Discovery, prioritization, and validation. a Cohorts used in study, depicting flow of discovery, prioritization, and validation of biomarkers from each step. b Discovery cohort longitudinal within-subject analysis. Phchp### is study ID for each subject. V# denotes visit number. c Discovery of possible subtypes of Pain based on High Pain visits in the discovery cohort. Subjects were clustered using measures of mood and anxiety (Simplified Affective State Scale (SASS)), as well as psychosis (PANSS Positive). d Differential gene expression in the Discovery cohort: number of genes identified with differential expression (DE) and absent-present (AP) methods with an internal score of 2 and above. Red: increased in expression in High Pain; blue: decreased in expression in High Pain. At the discovery step, probesets are identified based on their score for tracking pain, with a maximum of 6 internal points (33% (2 pt), 50% (4 pt), and 80% (6 pt)). e Prioritization with CFG for prior evidence of involvement in pain. In the prioritization step, probesets are converted to their associated genes using Affymetrix annotation and GeneCards. Genes are prioritized and scored using CFG for pain evidence with a maximum of 12 external points. Genes scoring at least six points out of a maximum possible 18 total internal and external points are carried to the validation step. f Validation in an independent cohort of psychiatric patients with co-morbid pain disorders and severe subjective and functional pain ratings. In the validation step, biomarkers are assessed for stepwise change from the discovery groups of subjects with Low Pain, to High Pain, to Clinically Severe Pain disorder, using ANOVA. N = number of testing visits. Five biomarkers were nominally significant, with MFAP3 and PIK3CD the most significant, and 68 biomarkers were stepwise changed.

Similar to our previous studies [6,7,8], the psychiatric subjects are part of a larger longitudinal cohort of adults that we are continuously collecting. Subjects were recruited from the patient population at the Indianapolis VA Medical Center. All subjects understood and signed informed consent forms detailing the research goals, procedure, caveats and safeguards, per Indiana University IRB approved protocol. Subjects completed diagnostic assessments by an extensive structured clinical interview (the Diagnostic Interview for Genetic Studies), and up to six testing visits, 3–6 months apart or whenever a new psychiatric hospitalization occurred. At each testing visit, they completed a series of rating scales, including a visual analog scale (1–10) for assessing pain and the SF 36 quality of life scale, which has two pain-related items (items 21 and 22), and blood was drawn. We collected whole blood (10 ml) in two RNA-stabilizing PAXgene tubes, labeled with an anonymized ID number, and stored at −80 °C in a locked freezer until the time of future processing. Whole-blood RNA was extracted for microarray gene expression studies from the PAXgene tubes, as detailed below.

For this study, our within-subject discovery cohort, from which the biomarker data were derived, consisted of 28 subjects (19 males, 9 females) with multiple testing visits, who each had at least one diametric change in pain from Low Pain (VAS of 2 and below) to High Pain (VAS of 6 and above) from one testing visit to another (Fig. 1b and Fig. S1). There were three subjects with five visits each, one subject with four visits, twelve subjects with three visits each, and twelve subjects with two visits each, resulting in a total of 79 blood samples for subsequent gene expression microarray studies (Fig. 1 and Table S1).

Our validation cohort, in which the top biomarker findings were validated for being even more changed in expression, consisted of 13 male and 10 female subjects with a pain disorder diagnosis and clinically severe pain (Table S1). This was determined as having a pain VAS of 6 and above and a sum of SF36 scale items 21 (pain intensity) and 22 (impairment by pain of daily activities) of 10 and above (Table S1).

Our independent test cohort for predicting state (High Pain) consisted of 134 male and 28 female subjects with psychiatric disorders, demographically matched with the discovery cohort, with one or multiple testing visits in our lab, with either Low Pain, intermediate Pain, or High Pain, resulting in a total of 414 blood samples in which whole-genome blood gene expression data were obtained (Fig. 1 and Table S1).

Our test cohort for predicting trait (future ED visits with pain as the primary reason in the first year of follow-up, and all future ED visits for pain) (Fig. 1) consisted of 171 males and 19 female subjects for which we had longitudinal follow-up with electronic medical records. The subjects’ subsequent number of ED pain-related visits in the year following testing was tabulated from electronic medical records by a clinical researcher, who used the key word “pain” in the reasons for ED visit, or “ache” with a mention of acute pain in the text of the note.

Medications

The subjects in the discovery cohort were all diagnosed with various psychiatric disorders, and had various medical co-morbidities (Table 1). Their medications were listed in their electronic medical records, and documented by us at the time of each testing visit. Medications can have a strong influence on gene expression. However, our discovery of differentially expressed genes was based on within-subject analyses, which not only factor out genetic background effects but also minimize medication effects, as the subjects rarely had major medication changes between visits. Moreover, there was no consistent pattern of any particular type of medication, as our subjects were on a wide variety of different medications, psychiatric and non-psychiatric. Some subjects may be non-compliant with their treatment and may thus have changes in medications or drugs of abuse not reflected in their medical records. That being said, our goal is to find biomarkers that track pain, regardless of whether the reason for it is endogenous biology or is driven by substance abuse or medication non-compliance. In fact, one would expect some of these biomarkers to be targets of medications, as we show in this paper. Overall, the discovery of biomarkers with our universal design occurs despite the subjects having different genders and diagnoses, being on various medications, and differing in other lifestyle variables.

Table 1 Aggregate Demographics

Blood gene expression experiments

RNA extraction

Whole blood (2.5–5 ml) was collected into each PAXgene tube by routine venipuncture. PAXgene tubes contain proprietary reagents for the stabilization of RNA. RNA was extracted and processed as previously described [6,7,8].

Microarrays

Microarray work was carried out using previously described methodology [6,7,8,9], and as described below.

Biomarkers

Step 1: Discovery

We have used the subject’s score from the VAS Pain Scale, assessed at the time of blood collection (Fig. 1). We analyzed gene expression differences between visits with Low Pain (defined as a score of 0–2) and visits with High Pain (defined as a score of 6 and above), using a powerful within-subject design, then an across-subjects summation (Fig. 1).
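The pain-state definitions above can be illustrated with a minimal Python sketch. The function names are hypothetical and the VAS values are invented for the example; this is not the R code used in the study, only one reading of the stated thresholds (Low Pain: VAS of 2 and below; High Pain: VAS of 6 and above).

```python
def pain_state(vas):
    """Map a pain VAS score to a state label using the thresholds in the text."""
    if vas <= 2:
        return "Low"
    if vas >= 6:
        return "High"
    return "Intermediate"


def has_diametric_change(vas_by_visit):
    """True if a subject has at least one Low Pain and one High Pain visit."""
    states = {pain_state(v) for v in vas_by_visit}
    return {"Low", "High"} <= states


# Hypothetical subjects: only the first would qualify for within-subject discovery.
print(has_diametric_change([1, 7, 4]))  # Low, High, Intermediate
print(has_diametric_change([3, 7]))     # Intermediate, High
```

Only subjects for whom `has_diametric_change` is true would contribute Low Pain versus High Pain comparisons under this reading.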

We analyzed the data in two ways: an Absent-Present (AP) approach, and a differential expression (DE) approach, as in previous work by us on suicide biomarkers [6,7,8]. The AP approach may capture turning on and off of genes, and the DE approach may capture gradual changes in expression. Analyses were performed as previously described [7,8,9]. We have developed in our labs R scripts to automate and conduct all these large dataset analyses in bulk, checked against human manual scoring [9].

Gene symbols for the probesets were identified using NetAffyx (Affymetrix) for Affymetrix HG-U133 Plus 2.0 GeneChips, followed by GeneCards to confirm the primary gene symbol. In addition, for those probesets that were not assigned a gene symbol by NetAffyx, we used GeneAnnot (https://genecards.weizmann.ac.il/geneannot/index.shtml) to obtain gene symbols, followed by GeneCards. Genes were then scored using our manually curated CFG databases as described below (Fig. 1e).

Step 2: Prioritization using Convergent Functional Genomics (CFG)

Databases

We have established in our laboratory (Laboratory of Neurophenomics, www.neurophenomics.info) manually curated databases of the human gene expression/protein expression studies (postmortem brain, peripheral tissue/fluids: CSF, blood and cell cultures), human genetic studies (association, copy number variations and linkage), and animal model gene expression and genetic studies, published to date on psychiatric disorders. Only findings deemed significant in the primary publication, by the study authors, using their particular experimental design and thresholds, are included in our databases. Our databases include only primary literature data and do not include review papers or other secondary data integration analyses, to avoid redundancy and circularity. These large and constantly updated databases have been used in our CFG cross validation and prioritization platform (Fig. 1e). For this study, data from 355 papers on pain were present in the databases at the time of the CFG analyses (December 2017): human genetic studies, 212; human nervous tissue studies, 3; human peripheral tissue/fluids, 57; non-human genetic studies, 26; non-human brain/nervous tissue studies, 48; non-human peripheral tissue/fluids, 9. Analyses were performed as previously described [7, 8].

Step 3: Validation analyses

Validation analyses of our candidate biomarker genes were conducted separately for AP and for DE. We examined which of the top candidate genes (total CFG score of 6 or above) were stepwise changed in expression from the Low Pain and High Pain groups to the Clinically Severe Pain group. A CFG score of 6 or above reflects an empirical cutoff of 33.3% of the maximum possible total CFG score of 18 (6 internal plus 12 external points), which permits the inclusion of potentially novel genes with the maximal internal score of 6 but no external evidence score. Subjects with Low Pain, as well as subjects with High Pain from the discovery cohort who did not have severe clinical pain (SF36 sum of items 21 and 22 < 10), were used, along with the independent validation cohort, all of whom had severe clinical pain and a co-morbid pain disorder diagnosis (n = 23).
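The cutoff arithmetic can be made explicit with a short Python sketch. It assumes the scoring ceilings given in the Fig. 1 caption (up to 6 internal points from discovery and up to 12 external points from CFG, 18 in total); the function and constant names are hypothetical.

```python
MAX_INTERNAL = 6    # discovery (internal) score ceiling
MAX_EXTERNAL = 12   # CFG prior-evidence (external) score ceiling
CUTOFF = 6          # minimum total score carried to validation


def carried_to_validation(internal, external):
    """Apply the total-score cutoff after checking the score ranges."""
    if not 0 <= internal <= MAX_INTERNAL or not 0 <= external <= MAX_EXTERNAL:
        raise ValueError("score outside the stated range")
    return internal + external >= CUTOFF


# The cutoff is one third of the 18-point maximum.
print(CUTOFF / (MAX_INTERNAL + MAX_EXTERNAL))
# A novel gene with maximal internal score and no external evidence still passes.
print(carried_to_validation(6, 0))
```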

For the AP analyses, we imported the Affymetrix microarray .chp data files from the subjects in the validation cohort of Clinically Severe Pain into MAS5 Affymetrix Expression Console, alongside the data files from the Low Pain and High Pain groups in the live discovery cohort. We transferred the AP data to an Excel sheet and transformed A into 0, M into 0.5, and P into 1. We then Z-scored everything together by gender and diagnosis. If a probeset had shown no variance, and thus given a non-determined (0/0) value in Z-scoring within a gender and diagnosis, we would have excluded the values from that probeset for that gender and diagnosis from the analysis.
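The AP encoding and group-wise Z-scoring described above might be sketched as follows for a single probeset. This is an illustration in Python, not the Excel workflow used in the study. The record layout and names are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
from collections import defaultdict
from statistics import mean, pstdev

AP_CODES = {"A": 0.0, "M": 0.5, "P": 1.0}  # Absent, Marginal, Present


def zscore_ap_by_group(samples):
    """Z-score one probeset's AP calls within each (gender, diagnosis) group.

    samples: list of dicts with keys 'gender', 'diagnosis', 'call'.
    Returns a list of Z-scores aligned with samples; None marks values
    excluded because the group showed no variance (0/0 is undefined).
    """
    groups = defaultdict(list)
    for i, s in enumerate(samples):
        groups[(s["gender"], s["diagnosis"])].append(i)

    z = [None] * len(samples)
    for indices in groups.values():
        values = [AP_CODES[samples[i]["call"]] for i in indices]
        mu, sd = mean(values), pstdev(values)  # population SD: an assumption
        if sd == 0:
            continue  # no variance in this group: leave excluded
        for i, v in zip(indices, values):
            z[i] = (v - mu) / sd
    return z


demo = [
    {"gender": "M", "diagnosis": "BP", "call": "A"},
    {"gender": "M", "diagnosis": "BP", "call": "P"},
    {"gender": "F", "diagnosis": "MDD", "call": "P"},
    {"gender": "F", "diagnosis": "MDD", "call": "P"},
]
print(zscore_ap_by_group(demo))
```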

For the DE analyses, the cohorts (Validation Clinically Severe Pain, alongside the Low Pain and High Pain groups in the Discovery cohort) were assembled out of Affymetrix .cel data that was RMA normalized by gender and diagnosis. We transferred the log transformed expression data to an Excel sheet, and non-log transformed the data by taking 2 to the power of the transformed expression value. We then Z-scored the values by gender and diagnosis.
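A minimal Python sketch of the two DE transformations, for one probeset within one gender and diagnosis group, is given here for illustration. The values are invented, the function names are hypothetical, and the population standard deviation is again an assumption.

```python
from statistics import mean, pstdev


def unlog2(values):
    """Undo the log2 transform of RMA output by raising 2 to each value."""
    return [2.0 ** v for v in values]


def zscore(values):
    """Z-score a list of values; returns None entries if there is no variance."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [None] * len(values)
    return [(v - mu) / sd for v in values]


log2_expression = [3.0, 4.0, 5.0]   # hypothetical RMA values for one group
linear = unlog2(log2_expression)    # back on the linear intensity scale
z_scores = zscore(linear)
print(linear, z_scores)
```

Because the Z-scoring is applied after the values are returned to the linear scale, equal steps in log2 units do not map to equal steps in Z-score units.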

We then imported the Excel sheets with the AP and DE expression data, Z-scored by gender and diagnosis, into Partek. Statistical analyses were performed using a one-way ANOVA for the stepwise changed probesets, and we also attempted a stringent Bonferroni correction for all the probesets tested (Fig. 1f). We also wrote an R script that automatically analyzes the data directly from the Excel sheet, and used that to confirm our calculations.
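For readers unfamiliar with the test, the following Python sketch computes a one-way ANOVA F statistic by hand for three ordered groups, checks for a monotonic change in group means, and derives a Bonferroni threshold. It is a stand-in for illustration, not the Partek or R analysis used in the study. Treating "stepwise changed" as strictly monotonic group means is an assumption, and the p-value (which requires the F distribution) is deliberately left out.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F statistic: between-group mean square over within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def is_stepwise(groups, increased=True):
    """Assumed definition: group means change monotonically across the ordered groups."""
    means = [mean(g) for g in groups]
    pairs = list(zip(means, means[1:]))
    if increased:
        return all(a < b for a, b in pairs)
    return all(a > b for a, b in pairs)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


# Hypothetical Z-scored values: Low Pain, High Pain, Clinically Severe Pain.
low, high, severe = [1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]
f_stat = one_way_anova_f([low, high, severe])
print(f_stat, is_stepwise([low, high, severe]))
```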

Choice of biomarkers to be carried forward

We carried forward into testing the top biomarkers from each step. The longer list of candidate biomarkers includes the top biomarkers from the discovery step (≥ 90% of scores, n = 28), the top biomarkers from the prioritization step (CFG score ≥ 8, n = 32), and the nominally significant biomarkers after the validation step (n = 5), for a total of n = 65 probesets (n = 60 genes). The short list of top biomarkers after the validation step comprises five biomarkers. In Step 4, testing, we then used the biomarkers from the long list to predict, in independent cohorts, High Pain state and future ED visits for pain, both in the first year and in all future years.

Diagnostics

The test cohort for predicting High Pain (state), and the subset of it that is a test cohort for predicting future ED visits (trait), were assembled out of data that was RMA normalized by gender and diagnosis. The cohort was completely independent: there was no subject overlap with the discovery cohort. Phenomic (clinical) and gene expression markers used for predictions were Z-scored by gender and diagnosis, to be able to combine different markers into panels and to avoid potential artefacts due to different ranges of expression in different genders and diagnoses. Markers were combined by simple summation of the increased risk markers minus the decreased risk markers. Predictions were performed using RStudio.
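The panel combination rule (sum of increased risk markers minus decreased risk markers) can be written out in a few lines of Python. The Z-score values below are invented for illustration and the function name is hypothetical; the directions of change for PNOC, GNG7, and CNTN1 follow those reported in this paper.

```python
def panel_score(z_scores, direction):
    """Sum Z-scores of increased-risk markers and subtract those of decreased-risk markers.

    z_scores:  {marker: Z-scored level for one subject visit}
    direction: {marker: "I" (increased in High Pain) or "D" (decreased in High Pain)}
    """
    increased = sum(z for m, z in z_scores.items() if direction[m] == "I")
    decreased = sum(z for m, z in z_scores.items() if direction[m] == "D")
    return increased - decreased


direction = {"PNOC": "I", "GNG7": "D", "CNTN1": "D"}
visit = {"PNOC": 1.0, "GNG7": -0.5, "CNTN1": -1.0}  # hypothetical Z-scores
print(panel_score(visit, direction))
```

With this sign convention, a higher panel score always points toward higher risk, whichever direction the individual markers move.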

Predicting state-high pain state

Receiver-operating characteristic (ROC) analyses between genomic and phenomic marker levels and Pain were performed by assigning subjects with a Pain score of 6 and greater into the High Pain category. We used the pROC package of R (Xavier Robin et al. BMC Bioinformatics 2011). We used the Z-scored biomarker and phene scores, running them in this ROC generating program against the diagnostic groups in the independent test cohort (High Pain vs. the rest of subjects). Additionally, a one-tailed t-test was performed between High Pain group vs. the rest, and Pearson R (one-tail) was calculated between Pain scores and marker levels (Supplementary Information- Complete Datasets and Analyses).
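To make the ROC step concrete, here is a small pure-Python sketch of the area under the ROC curve, computed as the probability that a randomly chosen High Pain visit has a higher marker score than a randomly chosen other visit. It is not the pROC code used in the study; the data are invented and the function name is hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one visit in each group")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical test-cohort visits: pain VAS and a Z-scored marker level.
vas = [1, 7, 3, 8, 6, 2]
marker_z = [-1.0, 0.8, 0.2, 1.5, -0.2, -0.5]
high_pain = [v >= 6 for v in vas]  # High Pain vs. the rest
auc = roc_auc(marker_z, high_pain)
print(auc)
```

For a marker that is decreased in High Pain, one option (an assumption here, not something stated in the text) is to negate its Z-score before computing the AUC, so that values above 0.5 indicate predictive ability in the expected direction.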

Predicting trait-future ED visits for pain in first year following testing

We conducted analyses for predicting ED visits for pain in the first year following each testing visit, in subjects who had at least 1 year of follow-up in the VA system, for which we have access to complete electronic medical records. ROC analyses between genomic and phenomic marker levels at a specific testing visit and future ED visits for pain were performed as described above, based on whether or not subjects had visited the ED with pain as the primary reason within 1 year following a testing visit. Additionally, a one-tailed t-test with unequal variance was performed between groups of subject visits with and without ED visits for Pain. Pearson R (one-tail) correlation was performed between hospitalization frequency (number of ED visits for Pain divided by duration of follow-up) and marker levels. A Cox regression was performed using the time in days from the testing visit date to first ED visit date in the case of patients who had been to the ED, or 365 days for those who had not. The hazard ratio was calculated such that a value >1 always indicates increased risk for ED visits, regardless of whether the biomarker is increased or decreased in expression.
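The construction of the time-to-event variable for the first-year Cox regression can be sketched in Python as below. The function names and dates are hypothetical. Two points are assumptions rather than statements from the text: ED visits after day 365 are treated as censored at 365 days, and the ">1 always indicates increased risk" convention is implemented by inverting the hazard ratio of decreased markers (equivalent to fitting the model on the negated marker).

```python
from datetime import date


def first_year_outcome(visit_date, ed_pain_dates, horizon_days=365):
    """Return (time_in_days, event) for one testing visit.

    event is 1 if an ED visit for pain occurred within the horizon,
    otherwise the observation is censored at horizon_days.
    """
    later = sorted(d for d in ed_pain_dates if d > visit_date)
    if later:
        days = (later[0] - visit_date).days
        if days <= horizon_days:
            return days, 1
    return horizon_days, 0


def oriented_hazard_ratio(hazard_ratio, direction):
    """Express a hazard ratio so that >1 means increased risk (assumed convention)."""
    return hazard_ratio if direction == "I" else 1.0 / hazard_ratio


print(first_year_outcome(date(2015, 1, 1), [date(2015, 3, 2)]))
print(first_year_outcome(date(2015, 1, 1), []))
print(oriented_hazard_ratio(0.5, "D"))
```

The resulting (time, event) pairs are the inputs a Cox proportional hazards routine would need; fitting the model itself is outside this sketch.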

We also conducted odds ratio analyses for all future ED visits due to pain, including those occurring beyond 1 year of follow-up, in the years following testing (on average 5.56 years per subject, range 0.44 to 11.27 years; see Table 1 and Table S1), as this calculation, unlike the ROC and t-test, accounts for the actual length of follow-up, which varied from subject to subject. The ROC and t-test might in fact, if used, under-represent the power of the markers to predict, as the more severe psychiatric patients are more likely to move geographically and/or be lost to follow-up. A Cox regression was also performed using the time in days from visit date to first ED Pain visit date in the case of patients who had been to the ED for Pain, or from visit date to last note date in the electronic medical records for those who had not. The hazard ratio was calculated such that a value >1 always indicates increased risk for ED Pain-related visits, regardless of whether the biomarker is increased or decreased in expression.

Biological understanding

Pathway analyses

IPA (Ingenuity Pathway Analysis, version 24390178, Qiagen), DAVID Functional Annotation Bioinformatics Microarray Analysis (National Institute of Allergy and Infectious Diseases) version 6.7 (August 2016), and Kyoto Encyclopedia of Genes and Genomes (KEGG) (through DAVID) were used to analyze the biological roles, including top canonical pathways and diseases (Table S4), of the candidate genes resulting from our work, as well as to identify genes in our dataset that are the target of existing drugs. We ran the pathway analyses for the 60 unique genes (65 probesets) from the combined AP and DE probesets. For network analysis of the 60 unique genes, we used the STRING Interaction Network (https://string-db.org), inputting the genes into the search window and performing a Multiple Proteins Homo sapiens analysis.

CFG beyond Pain: evidence for involvement in other psychiatric and related disorders

We also used a CFG approach to examine evidence from other psychiatric and related disorders, for the long list of 65 candidate biomarkers (Table S3).

Therapeutics

Pharmacogenomics

We analyzed which of our individual top biomarkers is known to be modulated by existing drugs using our CFG databases, and using Ingenuity Drugs analyses (Table S4).

New drug discovery/repurposing

We also analyzed which drugs and natural compounds are an opposite match for the gene expression profile of panels of our top biomarkers (n = 65), using the Connectivity Map (https://portals.broadinstitute.org, Broad Institute, MIT) (Table 3). Thirty-three out of 65 probesets were present in the HG-U133A array used for the Connectivity Map.

Convergent functional evidence (CFE)

We tabulated into a convergent functional evidence (CFE) score all the evidence from discovery (up to 6 points), prioritization (up to 12 points), validation (up to 6 points), and testing (state, trait first year ED visits, and trait all future ED visits: up to 8 points each if a biomarker significantly predicts in all subjects, 6 points if it predicts by gender, 4 points if it predicts by gender/diagnosis). The total score can be up to 48 points: 36 from our data and 12 from literature data. We thus weigh our data three times as much as the literature data. The goal is to highlight, based on the totality of our data and of the evidence in the field to date, biomarkers that have all-around evidence: they track pain, are reflective of pain pathology, and predict it. Such biomarkers merit priority evaluation in future clinical trials.
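The CFE tally can be expressed as a short Python function, shown here as a sketch of the point scheme described above. The function name, argument names, and level labels ("all", "gender", "gender_diagnosis") are hypothetical; the point values are those stated in the text.

```python
# Points for each testing outcome, by the broadest level at which the marker predicts.
TESTING_POINTS = {"all": 8, "gender": 6, "gender_diagnosis": 4, None: 0}


def cfe_score(discovery, prioritization, validation,
              state=None, trait_first_year=None, trait_all_future=None):
    """Total convergent functional evidence score (maximum 48)."""
    if not (0 <= discovery <= 6 and 0 <= prioritization <= 12 and 0 <= validation <= 6):
        raise ValueError("step score outside the stated range")
    testing = sum(TESTING_POINTS[level]
                  for level in (state, trait_first_year, trait_all_future))
    return discovery + prioritization + validation + testing


max_total = cfe_score(6, 12, 6, "all", "all", "all")
own_data_max = max_total - 12  # everything except the literature-based prioritization
print(max_total, own_data_max)
```

The 36-to-12 split between the study's own data and the literature is what gives the stated three-to-one weighting.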

Results

First, we used a discovery cohort composed of subjects with psychiatric disorders followed longitudinally over time [6,7,8,9,10,11], in which each subject had blood samples collected and neuropsychological testing done in at least one low pain state visit (Pain VAS ≤ 2 out of 10) and at least one high pain state visit (Pain VAS ≥ 6 out of 10) (Fig. 1 and Fig. S1).

We used a powerful longitudinal within-subject design [6,7,8,9,10,11,12] in individuals to discover blood gene expression changes between self-reported low pain and high pain states. A longitudinal within-subject design is orders of magnitude more powerful than a cross-sectional case-control design. Some of these candidate gene expression biomarkers are increased in expression in high pain states (being putative risk genes, or “algogenes”), and others are decreased in expression (being putative protective genes, or “pain suppressor genes”).

Second, we prioritized this list of candidate biomarkers with a Bayesian-like Convergent Functional Genomics approach [13, 14], comprehensively integrating previous published human and animal model evidence in the field for involvement in pain, and directly citing it.

Third, we further validated our top biomarkers from discovery and prioritization in an independent cohort of psychiatric subjects also carrying a clinical diagnosis of a pain disorder, and with high scores on pain severity and functional impairment ratings.

We ended up with a list of 65 candidate biomarkers (Table 2 and S2, S3, S5) from the first three steps, including a shorter list of five validated biomarkers (MFAP3, PIK3CD, SVEP1, TNFRSF11B, ELAC2). The biomarkers with the best evidence after validation were Hs.666804/MFAP3 (p = 6.03E-04) and PIK3CD (p = 1.59E-02).

Table 2 Convergent functional evidence (CFE) for top candidate biomarkers for pain (n = 60 genes, 65 probesets)

Fourth, we tested if the 65 candidate biomarkers are able to predict pain severity state, and future emergency department (ED) visits for pain, in another independent cohort of psychiatric subjects. We used biomarker levels information cross-sectionally, as well as expanded longitudinal information about biomarker levels at multiple visits, as predictors. We tested the biomarkers in all subjects in the test cohort, as well as in a more personalized fashion by gender and psychiatric diagnosis, showing increased accuracy with the personalized approach, in particular in women (Fig. 2). Across all subjects tested, CNTN1 was the best predictor for state (AUC 63%, p = 0.0014), GBP1 the best predictor for trait first year ED visits (AUC 59%, p = 0.0035), and GNG7 the best predictor for trait all future ED visits (OR 1.28, p = 0.00013, surviving Bonferroni correction for the 65 biomarkers tested). By gender, in females, DNAJC18 was the best predictor for state (AUC 78%, p = 0.0049), GBP1 the best predictor for trait first year ED visits (AUC 71%, p = 0.043) and ASTN2 for trait all future ED visits (OR 2.45, p = 0.044). In males, CNTN1 was the best predictor for state (AUC 63%, p = 0.0022), Hs.554262 the best predictor for trait first year ED visits (AUC 59%, p = 0.016), and MFAP3 the best predictor for trait all future ED visits (OR 1.35, p = 0.0089). Personalized by gender and diagnosis, in female bipolar CDK6 was a strong predictor for state (AUC 100%, p = 0.007), in female PTSD SHMT1 was a strong predictor for trait first year ED visits (AUC 100%, p = 0.022), and in female depression GNG7 for trait all future ED visits (OR 14.54, p = 0.022). 
In male depression CASP6 was a strong predictor for state (AUC 87%, p = 0.00007, surviving Bonferroni correction for the 65 biomarkers tested), in male PTSD LY9 was a strong predictor for trait first year ED visits (AUC 77%, p = 0.041), and in male PTSD MFAP3 was a strong predictor for trait all future ED visits (OR 15.93, p = 0.00085). In general, panels of all 65 top biomarkers or of the five validated biomarkers did not work as well as individual biomarkers, particularly when the latter are tested by gender and diagnosis, consistent with there being heterogeneity in the population and supporting the need for personalization. The notable exception was predicting longitudinally all future ED visits for pain, where the panel of five validated biomarkers performed better than individual biomarkers (Supplementary Information: Complete Analyses). Importantly, predictions of future ED visits for pain in the independent cohorts were consistently stronger using biomarkers than clinical phenotypic markers (pain VAS scale, pain items 21 and 22 from SF 36) (see Supplementary Information: Complete Analyses), supporting the utility of biomarkers.

Fig. 2

Best single biomarker predictors, from the long list (n = 65). Those on the short list (n = 5) are bolded. Bar graph shows best predictive biomarkers in each group. *Nominally significant p < 0.05. **Bonferroni significant for the 65 biomarkers tested. Table underneath the figures displays the actual number of biomarkers for each group whose ROC AUC p-values (a, b) and Cox odds ratio p-values (c) are at least nominally significant. Some female diagnostic groups are missing from the graph as they did not have any significant biomarkers. Cross-sectional is based on levels at one visit. Longitudinal is based on levels at multiple visits (integrates levels at most recent visit, maximum levels, slope into most recent visit, and maximum slope). Dividing lines represent the cutoffs for a test performing at chance levels (white), and at the same level as the best biomarkers for all subjects in cross-sectional (gray) and longitudinal (black) based predictions. All biomarkers perform better than chance. Biomarkers performed better when personalized by gender and diagnosis.
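The four longitudinal measures named in the caption (most recent level, maximum level, slope into most recent visit, maximum slope) could be derived as in the Python sketch below. The slope definition used here, change in level divided by days between consecutive visits, is an assumption, since the caption does not define it; the function name and data are hypothetical.

```python
def longitudinal_features(visits):
    """Summarize one subject's biomarker levels across visits.

    visits: list of (day, level) tuples with distinct days, at least two visits.
    """
    if len(visits) < 2:
        raise ValueError("need at least two visits to compute a slope")
    ordered = sorted(visits)
    levels = [level for _, level in ordered]
    # Assumed slope: change in level per day between consecutive visits.
    slopes = [(l2 - l1) / (d2 - d1)
              for (d1, l1), (d2, l2) in zip(ordered, ordered[1:])]
    return {
        "most_recent_level": levels[-1],
        "max_level": max(levels),
        "slope_into_most_recent": slopes[-1],
        "max_slope": max(slopes),
    }


features = longitudinal_features([(0, 1.0), (100, 2.0), (200, 1.5)])
print(features)
```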

Fifth, we assessed if our biomarkers have evidence for involvement in other psychiatric and related disorders (Table S3). A majority of our biomarkers have some evidence in other disorders, whereas a few seem to be specific for pain, such as CCDC144B (Coiled-Coil Domain Containing 144B), COL2A1 (Collagen Type II Alpha 1 Chain), PPFIBP2 (PPFIA Binding Protein 2), DENND1B (DENN Domain Containing 1B), ZNF441 (Zinc Finger Protein 441), TOP3A (Topoisomerase (DNA) III Alpha), and ZNF429 (Zinc Finger Protein 429). A majority of our biomarkers (50 out of 60 genes, i.e., 83.3%) have prior evidence for involvement in suicide, suggesting an extensive molecular co-morbidity between pain and suicide, to go along with the clinical and phenomenological co-morbidity (physical pain, psychic pain) [15]. We also analyzed the biological pathways and networks our biomarkers are involved in (Table S4 and Fig. 3). There is a network centered on GNG7 (Fig. 3), that may be involved in connectivity/signaling, comprising HTR2A, EDN1, PNOC (involved in pain signaling) and CALCA (involved in Reflex Sympathetic Dystrophy and Complex Regional Pain Syndrome). It is reassuring that we see PNOC (Prepronociceptin) increased in expression in high pain states, i.e., as an algogene. Given its known roles in pain, it serves as a de facto positive control. A second network is centered on CCND1, may be involved in activity/trophicity, and comprises HRAS, CDK6, PBRM1, CSDA, LOXL2, EDN1, PIK3CD, and VEGFA. A third network is centered on HLA DRB1, may be involved in reactivity/immune response, and comprises GBP1, ZNF429, COL2A1, and HLA DQB1, from our list of 65 top biomarkers.

Fig. 3

Biological roles. STRING interaction network for the top biomarkers for pain (65 probesets, 60 genes)

Sixth, we identified which of our biomarkers are targets of existing drugs and thus can be used for pharmacogenomic population stratification and measuring of response to treatment (Table 2 and Table S5), and we used the biomarker gene expression signature to interrogate the Connectivity Map database from Broad/MIT to identify drugs and natural compounds that can be repurposed for treating pain (Table 3). The top drugs identified as potential new pain therapeutics were SC-560 (an NSAID), haloperidol (an antipsychotic), and amoxapine (an antidepressant). The top natural compounds were pyridoxine (vitamin B6), cyanocobalamin (vitamin B12), and apigenin (a plant flavonoid).

Table 3 Therapeutics

Discussion

Biomarkers are emerging as important tools in disorders where subjective self-report of an individual and/or clinical impression of a healthcare professional are not always reliable. Recent work by our group has identified blood gene expression biomarkers that track suicidality using powerful longitudinal within-subject designs, validated them in suicide completers, and tested them in independent cohorts demonstrating their ability to predict state (suicidal ideation), and to predict trait (future hospitalizations for suicidality) (Niculescu et al. [7], Levey et al. [2], Niculescu et al. [8, 11]). Similar to suicidality, pain is a subjective feeling, with objective roots. It may reflect past or current injury events, their adverse consequences and compensatory mechanisms. The rationale for identifying validated and reproducible blood biomarkers is precisely because you cannot directly biopsy brain and spinal centers of pain perception. Blood biomarkers are easily accessible, and constitute a surrogate (liquid biopsy).

We present work employing a powerful longitudinal within-subject design, previously described by us for suicidality [6,7,8,9,10,11,12], and used now to discover blood gene expression changes between self-reported low pain and high pain states. Gene expression is more powerful than genetics, as it integrates a multitude of genetic variants and environmental effects. Longitudinal within-subject designs are more powerful than case-control designs, and can provide information with small Ns, as illustrated and discussed by Snyder and colleagues [16,17,18], as well as Schork, Topol, and colleagues [19, 20]. Some of these candidate gene expression biomarkers are increased in expression in high pain states (being putative risk genes), and others are decreased in expression (being putative protective/resilience genes). We cannot readily differentiate with our observational studies which of them are a reflection of damage and which are compensatory mechanisms. However, given the fact that these biomarkers are discovered in Step 1 by tracking present/state changes in the perception of pain and not past/trait exposure, they may be more likely a reflection of pathogenesis rather than adaptation.

Our systematic approach led to the identification of objective predictive biomarkers for pain, state and trait. We present evidence for universal biomarkers for pain, as well as show evidence that personalization by gender and diagnosis enhances precision, going from AUCs over 60% to AUCs over 80%. A majority of the top biomarkers we have identified overlap with biomarkers previously identified by us in suicide, and almost all have evidence in other psychiatric disorders (Table S3). Overlap and co-morbidity of genetic findings are in general the rule, not the exception, for neuropsychiatric disorders. Given the fact that pain disorders and psychiatric disorders are highly co-morbid clinically, that psychiatric medications are used to treat pain, and that pain medications can have psychiatric effects, the overlap is perhaps not surprising. It underlies the co-morbidity and impact of pain on mental health and on suicidality. Indeed, mood disorders were among the top diseases identified by pathway analyses of our pain biomarker data (Table S4B).
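The effect of personalization on AUC can be illustrated with a small Python sketch. It computes the AUC as the probability that a randomly chosen high pain sample has a higher biomarker score than a randomly chosen low pain sample, first across all subjects and then within each gender. The records are hypothetical toy values constructed to show the effect, not data from our cohorts, and the function name `auc` and the record layout are assumptions for the example. For a biomarker that is decreased in high pain, the score would be negated first.

```python
def auc(scores, labels):
    """Rank-based AUC: P(score of a positive > score of a negative), ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical records: (gender, biomarker score, high_pain label).
records = [
    ("F", 5, 1), ("F", 6, 1), ("F", 3, 0), ("F", 4, 0),
    ("M", 2, 1), ("M", 3, 1), ("M", 0, 0), ("M", 1, 0),
]

# AUC across all subjects.
overall_auc = auc([s for _, s, _ in records], [y for _, _, y in records])

# AUC computed separately within each gender.
auc_by_gender = {}
for gender in sorted({g for g, _, _ in records}):
    subset = [(s, y) for g, s, y in records if g == gender]
    auc_by_gender[gender] = auc([s for s, _ in subset], [y for _, y in subset])

print(overall_auc)
print(auc_by_gender)
```

In this toy example the biomarker separates high from low pain perfectly within each gender, but the two genders have different baseline levels, so pooling them lowers the overall AUC. Baseline shifts of this kind are one plausible reason that stratifying by gender and diagnosis can improve accuracy.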

The biomarkers with the best overall convergent information evidence (CFE) across the multiple steps were GNG7, CNTN1, LY9, CCDC144B, GBP1, and MFAP3 (Table 2). GNG7 (G Protein Subunit Gamma 7), with roles in signal transduction, is decreased in expression in blood in High Pain states in our work, i.e., it is a pain suppressor gene. There is evidence in other tissues in human studies for involvement in pain (diabetic neuropathy [21], vertebral disc [22]). GNG7 is a strong predictor in the independent cohorts, particularly for all future ED visits for pain. GNG7 also has trans-diagnostic evidence for involvement in other psychiatric disorders. It is decreased in expression in mouse brain by alcohol, hallucinogens, and stress [23, 24, 25], and increased in expression by omega-3 fatty acids. CNTN1 (Contactin 1), with roles in neuronal cell adhesion, is decreased in expression in blood in High Pain states in our work, i.e., it is a pain suppressor gene. Reassuringly, there is a possible mechanistic basis for its involvement in pain [26], and there is convergent evidence in other tissues in human studies for involvement in pain: CNTN1 has also been reported to be decreased in expression in CSF in women with chronic widespread pain (CWP) [27]. Anti-contactin 1 autoantibodies, which block/decrease levels of contactin 1, have been described in chronic inflammatory demyelinating polyneuropathy [28]. Such reproducibility across studies, tissues and populations provides strong reasons to consider it as a bona fide marker for pain, and it serves as a reassuring de facto positive control for the design and power of our study. CNTN1 also has trans-diagnostic evidence for involvement in psychiatric disorders. It is decreased in expression in schizophrenia brain [29] and blood [30], and in blood in suicidality in females [8].
CNTN1 is increased in expression by clozapine in mouse brain [24]. LY9 (lymphocyte antigen 9), with immunomodulatory roles, is increased in expression in blood in High Pain states in our work, i.e., it is an algogene. LY9 is a good predictor in the independent cohorts for state and trait, particularly for males with MDD and PTSD. It also has epigenetic evidence for involvement in exposure to stress [31], and is decreased in expression by omega-3 fatty acids in mouse brain [32]. CCDC144B (Coiled-Coil Domain Containing 144B) is decreased in expression in blood in High Pain states in our work. There is evidence in other tissues in human [33] and animal model [34] studies for involvement in pain. CCDC144B is a good predictor in the independent cohorts for state and trait, particularly for males with psychosis (SZ, SZA). It does not have trans-diagnostic evidence for involvement in other psychiatric disorders, seeming to be relatively specific for pain. GBP1 (guanylate binding protein 1), with interferon induced signaling roles, is increased in expression in blood in High Pain states in our work. There is other evidence in human studies, gene expression [33] and genetic [35], for involvement in pain. GBP1 is a predictor in the independent cohorts for trait, particularly in females. It is increased in expression in the brain in MDD [36], schizophrenia [37, 38], and suicide [36], and in blood in PTSD [39]. GBP1 is decreased in expression by omega-3 in mouse brain [40]. Hs.666804/MFAP3 (microfibril associated protein 3), another of the top markers, is a component of elastin-associated microfibrils. MFAP3 has the most robust empirical evidence from our discovery and validation steps, and is a strong predictor in the independent cohort, particularly for pain in females and males with PTSD. 
Interestingly, it has no prior evidence for pain in the literature curated to date for our Prioritization/CFG step, which demonstrates that our approach casts a wide enough net to bring completely novel findings to the fore. MFAP3 is decreased in expression in blood in High Pain states in our work, i.e., it is a pain suppressor gene. It also has previous evidence for involvement in alcoholism [41], stress [42], and suicide [7, 9].

A phenotypic clustering analysis of the discovery cohort revealed two broad putative subtypes of High Pain states, a predominantly psychotic subtype, possibly related to mis-connectivity and increased perception of pain centrally, and a predominantly anxious subtype, possibly related to reactivity and increased physical health reasons for pain peripherally. Deeper analyses of the clustering in future studies may also substantiate further parsing of the subtypes, possibly into eight instead of only two subtypes, and of underlying differentiating biomarkers.

The biomarker gene expression signatures also open the door to drug repurposing approaches, including the use of nutraceuticals. Nutraceuticals are particularly amenable to use in preventive, population-level approaches.
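The idea behind signature-based repurposing can be sketched in a few lines of Python. The sketch ranks compounds by how strongly their expression effects oppose the direction of change seen in high pain states. It is a deliberately simplified signed-sum score, not the enrichment statistic used by the Connectivity Map. The gene directions follow those reported above for our top biomarkers, whereas the compound names and their effect values are hypothetical placeholders.

```python
# Direction of change in High Pain states, as reported in the text above
# (+1 = increased in expression, -1 = decreased in expression).
pain_signature = {
    "GNG7": -1, "CNTN1": -1, "LY9": +1,
    "CCDC144B": -1, "GBP1": +1, "MFAP3": -1,
}

# Hypothetical compound-induced expression changes (placeholder names and values).
compound_profiles = {
    "compound_A": {"GNG7": 0.8, "LY9": -0.5, "GBP1": -0.6},
    "compound_B": {"GNG7": -0.4, "CNTN1": -0.2, "LY9": 0.7},
    "compound_C": {"CNTN1": 0.3, "MFAP3": 0.1, "GBP1": 0.2},
}

def reversal_score(signature, profile):
    """Higher score = the compound moves genes opposite to the high pain direction."""
    return -sum(direction * profile.get(gene, 0.0)
                for gene, direction in signature.items())

scores = {name: reversal_score(pain_signature, profile)
          for name, profile in compound_profiles.items()}

# Best candidate first: compounds that most strongly reverse the signature.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

A compound that increases the expression of genes decreased in high pain, and decreases the expression of genes increased in high pain, receives a high score and would be a candidate for follow-up. Compounds that mimic the signature receive negative scores.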

In conclusion, our work opens the door for precision medicine for pain, with objective diagnostics and targeted novel therapeutics. Given the massive negative impact of untreated pain on quality of life, the current lack of objective measures to determine appropriateness of treatment, and the severe addiction gateway potential of existing opioid-based pain medications, the importance of approaches such as ours cannot be overstated.