To understand heterogeneous disorders, researchers have sought homogeneous subtypes using a variety of features and analytic methods. Post-traumatic stress disorder (PTSD) presents with symptoms of varying intensity, including intrusive trauma memories, hyperarousal, and stress-induced dysregulation. In the search for PTSD subtypes, features have been drawn in various combinations from clinical items, multi-omic blood markers, GWAS, neurocognition, electroencephalography, and regional brain connectivity. The resulting subtypes differ substantially and are not easily compared, but together they offer an expanded view of PTSD. Distinct subtypes have been identified from mixed feature classes including MRI [1], multi-omic biomarkers [2], and electroencephalography, where the resulting subtypes predict differential treatment outcomes [3].

Identifying subtypes from clinical features alone supports a practical goal of precision medicine (PM): enabling treating physicians to use features they can measure to identify patients for whom they can recommend a specific treatment. This goal is best achieved when the features defining the subtypes are clinically accessible, such as symptoms obtained from reliable and valid scales that physicians commonly use, or measures they can easily obtain, such as clinical laboratory values. Measures that are harder and more costly to obtain in the clinical setting should be considered as well, particularly if they lead to subtypes for whom an efficacious treatment can be identified.

We performed a study designed from a PM perspective in which only clinical measures were used for subtyping. The subjects were 145 male military veterans several years post-deployment: 74 with PTSD and 71 healthy controls [4]. As features, we used symptom items from 16 reliable and validated mood and behavior scales related to PTSD. To obtain a measure of the distance between individuals, we used Random Forests (RF) [5], a machine learning method that classifies subjects into pre-specified categories (here, PTSD or not) using an ensemble of decision trees; the trees were built only from items on the clinical scales. Each entry in the proximity matrix generated by the RF is the fraction of trees in which a pair of individuals falls in the same terminal node, and it measures the degree of their feature similarity with respect to predicting PTSD. One minus this fraction is a distance, which was input to a clustering algorithm, Partitioning Around Medoids (PAM) [6]. We call this two-step method “purposeful clustering” because the distance measure was tailored to the specific PM goal.
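The two-step procedure can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the study's code: it takes as given a matrix of terminal-node ids with one row per subject and one column per tree (for example, what scikit-learn's `RandomForestClassifier.apply(X)` returns), computes the proximity and the distance 1 - proximity, and then runs a basic PAM (a greedy BUILD step followed by SWAP). The function names `rf_proximity`, `total_cost`, and `pam`, and the six-subject leaf matrix, are hypothetical.

```python
import numpy as np


def rf_proximity(leaves):
    """Proximity from a (n_subjects, n_trees) matrix of terminal-node ids.

    Entry (i, j) is the fraction of trees in which subjects i and j
    land in the same terminal node.
    """
    leaves = np.asarray(leaves)
    # Compare every pair of subjects tree by tree: shape (n, n, n_trees).
    same_leaf = leaves[:, None, :] == leaves[None, :, :]
    return same_leaf.mean(axis=2)


def total_cost(dist, medoids):
    # Each subject contributes its distance to the nearest medoid.
    return float(dist[:, medoids].min(axis=1).sum())


def pam(dist, k, max_iter=100):
    """Basic Partitioning Around Medoids on a precomputed distance matrix."""
    n = dist.shape[0]
    # BUILD: start from the most central subject, then greedily add medoids.
    medoids = [int(np.argmin(dist.sum(axis=1)))]
    while len(medoids) < k:
        candidates = [c for c in range(n) if c not in medoids]
        costs = [total_cost(dist, medoids + [c]) for c in candidates]
        medoids.append(candidates[int(np.argmin(costs))])
    # SWAP: replace a medoid with a non-medoid whenever the total cost drops.
    best = total_cost(dist, medoids)
    for _ in range(max_iter):
        improved = False
        for i in range(k):
            for c in range(n):
                if c in medoids:
                    continue
                trial = medoids.copy()
                trial[i] = c
                cost = total_cost(dist, trial)
                if cost < best:
                    best, medoids, improved = cost, trial, True
        if not improved:
            break
    # Label each subject by the index of its nearest medoid.
    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels


# Hypothetical toy leaf matrix: 6 subjects x 4 trees (not study data).
leaves = np.array([
    [1, 1, 2, 1],
    [1, 1, 2, 2],
    [1, 3, 2, 1],
    [5, 6, 7, 8],
    [5, 6, 7, 8],
    [5, 6, 9, 8],
])
prox = rf_proximity(leaves)
dist = 1.0 - prox  # "one minus the proximity" as the distance
medoids, labels = pam(dist, k=2)
print(medoids, labels)  # -> [3, 0] [1 1 1 0 0 0]
```

With the toy leaf matrix, subjects 0 to 2 and subjects 3 to 5 end up in separate clusters. Computing proximity over all trees is one convention; some implementations (for example the R `randomForest` package's `oob.prox` option) restrict it to out-of-bag pairs, which this sketch does not attempt.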

We found two subtypes, one with greater mean symptom severity than the other on every scale item. To examine their biological distinctiveness, a new RF was run comparing the subtypes using a large selection (342) of multi-omic blood biomarkers with some prior evidence of a relationship to PTSD. The RF that distinguished the severe subtype from controls had high accuracy (AUC = 0.922, achieved with 37 markers), and epigenetic and microRNA markers were among the five markers with the greatest importance to prediction. These markers have established relationships to other neurological and psychiatric disorders. Although these results do not establish causality, they strongly support an epigenetic association with clinically recognizable symptom-severity subtypes.
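The reported accuracy and importance ranking can be illustrated with two small helpers. This sketch assumes the RF has already produced a predicted score per subject (for instance, out-of-bag class probabilities) and an importance value per marker; it computes AUC through its Mann-Whitney interpretation, the probability that a randomly chosen severe-subtype subject scores higher than a randomly chosen control (ties count one half), and lists the top-k markers. The helper names, labels, scores, and marker names below are hypothetical toy values, not study data.

```python
import numpy as np


def auc_mann_whitney(y_true, scores):
    """ROC AUC as P(score of a random positive > score of a random negative).

    Ties count as one half; this equals the area under the empirical ROC curve.
    """
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true]
    neg = scores[~y_true]
    # Compare every positive score against every negative score.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))


def top_k_features(importances, names, k=5):
    """Return the k feature names with the largest importance, best first."""
    order = np.argsort(importances)[::-1][:k]
    return [names[i] for i in order]


# Hypothetical toy values: 1 = severe subtype, 0 = control.
y = np.array([1, 1, 1, 0, 0, 0])
scores = np.array([0.9, 0.8, 0.4, 0.5, 0.3, 0.1])
auc = auc_mann_whitney(y, scores)

names = [f"m{i}" for i in range(7)]  # hypothetical marker names
importances = np.array([0.02, 0.10, 0.05, 0.30, 0.01, 0.20, 0.07])
top5 = top_k_features(importances, names, k=5)
print(round(auc, 4), top5)  # -> 0.8889 ['m3', 'm5', 'm1', 'm6', 'm2']
```

In scikit-learn, comparable inputs would be `oob_decision_function_[:, 1]` from a `RandomForestClassifier` fitted with `oob_score=True`, and its `feature_importances_` attribute. How the study arrived at its 37-marker model is not described here, so the sketch does not model that step.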