Unsupervised data-driven stratification of mentalizing heterogeneity in autism

Individuals affected by autism spectrum conditions (ASC) are considerably heterogeneous. Novel approaches are needed to parse this heterogeneity to enhance precision in clinical and translational research. Applying a clustering approach taken from genomics and systems biology to two large independent cognitive datasets of adults with and without ASC (n = 694; n = 249), we find replicable evidence for 5 discrete ASC subgroups that are highly differentiated in item-level performance on an explicit mentalizing task tapping the ability to read complex emotions and mental states from the eye region of the face (Reading the Mind in the Eyes Test; RMET). Three subgroups comprising 45–62% of ASC adults show evidence for large impairments (Cohen’s d = −1.03 to −11.21), while other subgroups are effectively unimpaired. These findings delineate robust natural subdivisions within the ASC population that may allow for more individualized inferences and accelerate research towards precision medicine goals.


Supplementary Figure 1: Item-difficulty Patterning Across TD Subgroups.
Panels A (TD Discovery) and B (TD Replication) show item-difficulty profiles (i.e., the percentage of subjects within a subgroup who answer each item correctly) for each TD subgroup, denoted by the different colored lines. Panels C and D show correlation matrices of the item-difficulty profiles between subgroups. Asterisks indicate specific comparisons that pass FDR correction for multiple comparisons at q<0.05.
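As an illustration of how an item-difficulty profile and a between-subgroup correlation could be computed, the following is a minimal sketch. It assumes responses are coded 0/1 per item and uses a Pearson correlation; the function names and the toy data are hypothetical and are not taken from the study.

```python
from math import sqrt

def item_difficulty(responses):
    """Percentage of subjects answering each item correctly.

    responses: list of per-subject lists of 0/1 scores (1 = correct),
    all of the same length.
    """
    n_subjects = len(responses)
    n_items = len(responses[0])
    return [100.0 * sum(subj[i] for subj in responses) / n_subjects
            for i in range(n_items)]

def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy data (hypothetical): 4 subjects x 4 items per subgroup.
subgroup_1 = [[1, 1, 0, 1], [1, 0, 0, 1], [1, 1, 1, 1], [1, 1, 0, 0]]
subgroup_2 = [[1, 0, 0, 1], [1, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 1]]

profile_1 = item_difficulty(subgroup_1)
profile_2 = item_difficulty(subgroup_2)
r = pearson(profile_1, profile_2)
```

Each cell of a correlation matrix such as those in panels C and D would correspond to one such pairwise value.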

Supplementary Figure 2: TD between-subject dissimilarity of RMET response patterns.
This figure depicts between-subject dissimilarity matrices in TD for the easy (A) and difficult (B) item subsets. Cooler colors indicate more between-subject similarity, whereas hotter colors indicate more between-subject dissimilarity. Each cell of the matrices represents the dissimilarity between a pair of subjects. The rows and columns are arranged by subgroup rank order. Within each subgroup, the Discovery and Replication datasets are adjacent to each other and are denoted above the rows and columns by D and R. The black outlines delineate between-subject dissimilarities within a particular subgroup.
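A between-subject dissimilarity matrix of this kind can be sketched as follows. The caption does not state the dissimilarity metric, so this example assumes Hamming distance (the proportion of items on which two subjects' scored responses differ) purely for illustration; the function names and data are hypothetical.

```python
def hamming_dissimilarity(a, b):
    """Proportion of items on which two subjects' responses differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def dissimilarity_matrix(responses):
    """Symmetric subjects-by-subjects matrix of pairwise dissimilarities."""
    n = len(responses)
    return [[hamming_dissimilarity(responses[i], responses[j])
             for j in range(n)]
            for i in range(n)]

# Toy data (hypothetical): 3 subjects x 4 items, 1 = correct.
responses = [[1, 1, 0, 1],
             [1, 1, 0, 0],
             [0, 0, 1, 0]]
D = dissimilarity_matrix(responses)
```

Sorting the rows and columns of `D` by subgroup before plotting would produce the block structure that the black outlines highlight.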

Supplementary Figure 3: Confusion matrices for multi-class classifier predictions of TD subgroup membership.
Confusion matrices show counts of actual TD subgroup membership along the rows and classifier predicted subgroup membership along the columns. The coloring of cells in the confusion matrices represents the percentage of actual subgroup individuals predicted within each subgroup category. Above the matrices are descriptions of which dataset was used for training and testing.
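The layout described above (counts with actual membership on the rows, predicted membership on the columns, and cell coloring by row percentage) can be sketched as below. The labels and predictions are made-up toy values, and the function names are hypothetical.

```python
def confusion_matrix(actual, predicted, n_classes):
    """Counts with actual class on rows and predicted class on columns."""
    counts = [[0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        counts[a][p] += 1
    return counts

def row_percentages(counts):
    """Percentage of each actual class assigned to each predicted class."""
    result = []
    for row in counts:
        total = sum(row)
        result.append([100.0 * c / total if total else 0.0 for c in row])
    return result

# Toy data (hypothetical): subgroup labels coded 0, 1, 2.
actual = [0, 0, 0, 1, 1, 2, 2, 2, 2]
predicted = [0, 0, 1, 1, 1, 2, 2, 0, 2]

counts = confusion_matrix(actual, predicted, n_classes=3)
percentages = row_percentages(counts)
```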

Supplementary Figure 4: Multi-class classifier performance
This figure shows the histogram of the null distribution of classifier accuracy when labels were randomly shuffled (10,000 iterations). The true accuracy level under the real labels is shown as a red line. Panel A shows performance for the analysis on ASC subgroups, whereas panel B shows performance for TD subgroups.
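A label-shuffling null distribution of this kind can be sketched as follows. For simplicity, this example shuffles the true labels against a fixed set of predictions rather than retraining a classifier on each shuffle, which is an assumption made here and not necessarily the procedure used for the figure. The data and function names are hypothetical.

```python
import random

def accuracy(actual, predicted):
    """Proportion of predictions that match the actual labels."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def permutation_null(actual, predicted, n_iter=10_000, seed=0):
    """Accuracy values obtained after randomly shuffling the labels."""
    rng = random.Random(seed)
    shuffled = list(actual)
    null = []
    for _ in range(n_iter):
        rng.shuffle(shuffled)
        null.append(accuracy(shuffled, predicted))
    return null

# Toy data (hypothetical): 3 subgroups of 10 subjects, 3 misclassified.
actual = [0] * 10 + [1] * 10 + [2] * 10
predicted = list(actual)
predicted[0], predicted[10], predicted[20] = 1, 2, 0

observed = accuracy(actual, predicted)
null = permutation_null(actual, predicted)

# One-sided p-value with the usual +1 correction for permutation tests.
p_value = (sum(a >= observed for a in null) + 1) / (len(null) + 1)
```

A histogram of `null` with a vertical line at `observed` would give a plot of the same form as the panels described above.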

Supplementary Figure 5: Verbal IQ
This figure shows verbal IQ data as a boxplot with dots overlaid to represent individual subjects' data points. It also shows heatmaps of effect size for comparisons made within ASC subgroups, within TD subgroups, and between ASC and TD subgroups. An asterisk next to an effect size indicates that the comparison passed Bonferroni correction for multiple comparisons.
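The effect sizes and the Bonferroni criterion used in heatmaps of this kind can be sketched as below. The captions do not state which effect-size formula was used, so this example assumes Cohen's d with a pooled standard deviation; the scores, p-values, and function names are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(x, y):
    """Cohen's d using the pooled sample standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var)

def bonferroni_pass(p_values, alpha=0.05):
    """True where a p-value falls below alpha divided by the number of tests."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]

# Toy data (hypothetical scores for two subgroups).
group_a = [100, 110, 120, 130, 140]
group_b = [90, 100, 110, 120, 130]
d = cohens_d(group_a, group_b)

# Toy p-values (hypothetical) for three pairwise comparisons.
passed = bonferroni_pass([0.001, 0.02, 0.04])
```

Comparisons flagged `True` in `passed` would be the ones marked with an asterisk.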

Supplementary Figure 6: Age
This figure shows age data as a boxplot with dots overlaid to represent individual subjects' data points. It also shows heatmaps of effect size for comparisons made within ASC subgroups, within TD subgroups, and between ASC and TD subgroups. Panel A shows data from the Discovery (CARD) dataset, whereas panel B shows data from the Replication (AIMS) dataset. An asterisk next to an effect size indicates that the comparison passed Bonferroni correction for multiple comparisons.

Supplementary Figure 8: EQ
This figure shows EQ data as a boxplot with dots overlaid to represent individual subjects' data points. It also shows heatmaps of effect size for comparisons made within ASC subgroups, within TD subgroups, and between ASC and TD subgroups. Panel A shows data from the Discovery (CARD) dataset, whereas panel B shows data from the Replication (AIMS) dataset. An asterisk next to an effect size indicates that the comparison passed Bonferroni correction for multiple comparisons.

Supplementary Figure 9: BDI and BAI
This figure shows BDI (panel A) and BAI (panel B) data as boxplots with dots overlaid to represent individual subjects' data points. It also shows heatmaps of effect size for comparisons made within ASC subgroups, within TD subgroups, and between ASC and TD subgroups. An asterisk next to an effect size indicates that the comparison passed Bonferroni correction for multiple comparisons.