Introduction

Advances in psychiatric genetics have led to the identification of a growing number of individually rare, but collectively common genetic variants that are highly penetrant for psychiatric morbidity [1, 2]. Of these, recurrent gene dosage disorders (GDD), including aneuploidies and sub-chromosomal copy number variations (CNVs), have been most amenable to clinical characterization because their incidence yields cohorts that are sufficiently large for group phenotyping [3]. Despite increasing reports on such cohorts, there have been only a few studies which compare behavioral profiles across several GDDs [4, 5]. Such research is important for determining if different genetic lesions induce unique psychopathology profiles. If significant behavioral variegation is observed across GDDs, genetic diagnosis may be used to tailor assessments and improve prognostication. Conversely, weak variegation implies the existence of common biological pathways that “concentrate” different genetic risks, with potentially positive implications for treatment generalizability.

The degree to which different genetic diagnoses impart unique behavioral profiles is hotly-debated [6]. The current study engages with this debate by providing an analysis of variation in autism-related traits (ARTs) across seven GDDs—Down, Williams–Beuren, Smith–Magenis, and several sex chromosome aneuploidy syndromes—using the Social Responsiveness Scale—Second Edition (SRS-2 [7]), a well-validated and widely used questionnaire of ARTs. These seven clinical cohorts enabled analysis of available SRS-2 data across an informatively diverse set of GDDs which varied substantially in their genomic basis (duplications and deletions, aneuploidies, and CNVs) and clinical characteristics (e.g., severity of behavioral disturbance and intellectual disability). Our focus on ARTs rather than diagnostic status was motivated by several considerations. First, autism-related social communication and behavioral flexibility impairments exist as a continuous distribution within the general population [8]. Moreover, these traits are not only elevated in individuals with ASD, but also in groups with diverse non-ASD diagnoses [9,10,11]. Moreover, variation in ARTs in both the general population and groups with non-ASD psychiatric diagnoses is known to correlate with adaptive functioning and other clinical outcomes [12,13,14,15]. Therefore, the degree to which ARTs show dissociable alterations across different genetic disorders carries broad relevance. Second, ARTs provide a powerful context to test for patterned effects of rare genetic disorders on psychopathology, because there is already evidence for their genetic dissociability from quantitative genetic research in population-based samples [16, 17]. Finally, variation in ARTs is linked to variation in cognitive ability at genetic [18,19,20] and clinical [21] levels. This property makes ARTs particularly well-suited for testing whether apparent differences in psychopathology across genetic disorders are amplified or diminished by the degree of co-occurring cognitive impairment. Thus, the current research sought to examine (a) whether ART profiles, as measured by the SRS-2 subscales, vary as a function of GDD, (b) how cognitive impairment relates to different ARTs, and (c) whether machine learning could be used to predict genotype from SRS-2 subscales and items.

Methods and materials

Procedures

Participants included 350 individuals with one of seven GDDs, 171 typically developing (TD) controls, and 74 individuals with a behaviorally defined diagnosis of ASD. Absent any single, prospectively identified sample of youth with large cohorts of diverse GDDs and TD controls, the current sample was collated via collaborations across several research labs studying GDDs at the National Institutes of Health Intramural Research Program. Across IRB-approved protocols, informed consent was obtained from participants/guardians and study procedures adhered to guidelines set forth in the Declaration of Helsinki. In addition, an ASD sample was compiled from the National Database on Autism Research.

Participants

The GDD sample consisted of 7 subgroups: (i) four groups with differing sex chromosome aneuploidies: “plusX” (XXY, XXX, n = 89 (A subset of the sex chromosome aneuploidy cohort was previously independently reported with regard to their SRS-2 total raw and T-scores [22])), “plusY” (XYY, n = 27), “plusXX” (XXXY, XXXX, XXXXY, XXXXX, n = 28), and “plusXY” (XXYY, n = 25), (ii) “DS” (Down syndrome, trisomy 21, n = 22), (iii) “WS” (Williams–Beuren syndrome, del7q11.23, n = 93 (A subset of the WS cohort was previously independently reported [23, 24].)), and (iv) “SMS” (Smith–Magenis syndrome, del17p11.2 or RAI mutation, n = 66). Genetic testing procedures for the GDD groups are provided in the Supplemental Material. Participant characteristics are summarized in Table 1.

Table 1 Demographics and SRS-2 total scores.

In the current investigation, we included males and females with an extra X chromosome in our ‘plusX’ and ‘plus XX’ groups. Combining male and female carriers of a supernumerary X-chromosome into a single “plus X” group was supported by prior behavioral studies of SCAs (e.g., refs. [22, 25]) and lack of statistical evidence for such an interaction in our own data (i.e., no statistically significant T-score differences between XXX and XXY groups).

Measures

ART measurement

The SRS-2 [7] consists of 65 items and taps social functioning as well as restricted interests and repetitive behaviors. Parents reported on their child’s behavior using the Preschool (ages 2.5–4.5 years; n = 13), School-Age (ages 4–18 years; n = 498), or Adult (ages 19+; n = 87) forms. The Preschool and Adult forms of the SRS-2 were created, respectively, as downward and upward extensions of the original SRS, which had ages that correspond to the SRS-2 school age form. Of the 65 items on the SRS-2, 32 items are identical across the age versions; 33 items are adjusted due to differences in developmental expectations, with the majority only involving slight modifications to item content to fit with the relevant age group (e.g., referring to children vs. adults in the item or referring to ‘playing with’ rather than ‘interacting with’ peers). Thus, the SRS-2 is well-suited to describing clinical groups across a wide age range (see ref. [26] for an example of another study that used the SRS-2 in a similar manner).

The Preschool, School-Age, and Adult SRS-2 forms each consist of the same five treatment subscales which are derived by summing their constituent items across the three forms. Descriptions of the treatment subscales, including the number of items by subscale and example behaviors, are provided in Table 2. These subscales yield normative T-scores (i.e., age-group-normed [Preschool, School-Age, Adult] for all three SRS-2 forms and sex-normed for the SRS-2 School-Age form) which were used to compare ART scores across groups and to examine relations between ARTs and cognitive impairment. For machine learning analyses, raw SRS-2 subscale and item scores with age and sex covaried were used. The SRS-2 has strong psychometric characteristics, with high internal consistency (α > 0.90) and test-retest reliability (r ≥ 0.88).

Table 2 SRS-2 treatment subscales.

Cognitive impairment measurement

Because participants were enrolled in different investigations, several tests were used to estimate cognitive impairment (i.e., intellectual functioning). These are detailed in the Supplemental Material. For the current investigation, IQ standard scores (mean = 100; SD = 15) were transformed to T-scores that have the same polarity (higher scores = greater impairment) and distribution (mean = 50, SD = 15) as the SRS-2. Thus, they are described as cognitive impairment rather than IQ.

Statistical analyses

Prior to running primary analyses, SRS-2 data were inspected and found to be normally distributed and free of outliers (>3 SDs from mean). Primary analyses were as follows.

Evaluation of SRS-2 profiles among the GDD groups

To examine SRS-2 profiles using normative T-scores, a 7 (GDD groups) × 5 (SRS-2 subscale) mixed-model ANOVA was completed (followed by an ANCOVA with cognitive impairment covaried). Then two complementary approaches were used to provide finer-grained descriptions of group and subscale effects. First, a series of ANOVAs (and ANCOVAs with cognitive impairment covaried) were completed to test the effect of GDD on each SRS-2 subscale and the effect of SRS-2 subscale within each GDD (see Table 3). Multiple comparisons were controlled with a Bonferroni-correction for the number of SRS-2 subscales (p = 0.01 [0.05/5]) and the number of GDDs (p = 0.007 [0.05/7]).

Table 3 Age and sex normed T-scores on the SRS-2 by GDD and scale.

Second, SRS-2 profile differences (i.e., differences in the pattern of test scores/strengths and weakness) were evaluated using a series of mixed-model ANOVAs. Specifically, 21 mixed model ANOVAs (each consisting of one between-subjects factor [group: GDD1 vs. GDD2] and one within-subjects factor [SRS-2 subscale]) were run. For these analyses, group effects (e.g., GDD1 is more/less impaired than GDD 2 overall) and group*subscale interaction effects (e.g., GDD1 and GDD2 have different profiles) were evaluated. To adjust for the number of effects examined (21 group effects, 21 group*subscale interactions), statistical significance was evaluated against a Bonferroni-corrected p-value of p < 0.001 (=0.05/42) (Fig. 1, upper triangle). These analyses were re-run with cognitive impairment covaried (Fig. 1, lower triangle).

Fig. 1: Synopsis of results from pairwise mixed model ANOVAs (above diagonal) and ANCOVAs with cognitive impairment covaried (below diagonal).
figure 1

To test for differential ART subscale profiles between each unique pair of GDD groups, we ran 21 (number of unique GDD group pairings) 2 (GDD group) × 5 (SRS-2 subscale) mixed-model ANOVAs. For these analyses, group effects (e.g., GDD 1 is more or less impaired overall than GDD 2 on the SRS-2 subscales) and group*subscale interaction effects (e.g., there is a difference in the SRS-2 profile for GDD 1 vs. GDD 2) were evaluated and results are presented above the diagonal. Parallel analyses were also completed using ANCOVA including cognitive impairment as a covariate and results are presented below the diagonal. When interpreting the figure, note the following. Main effects of group were denoted with a ‘G’; group*scale interactions were denoted with an ‘I’. Instances in which there was a main effect of group or group*scale interaction that did not survive Bonferroni correction are denoted with a single asterisk (*p < 0.05); those that survived Bonferroni correction are denoted with a double asterisk (**p < 0.05—Bonferroni corrected). Finally, to aid interpretation, color coding was implemented as follows. When a main effect of group was identified that survived Bonferroni correction, the cell in the matrix was color coded blue. When there was a Bonferroni-corrected group*subscale interaction, the cell was color coded yellow. Instances in which there was both a main effect of group (magnitude of impairment) and a group* subscale interaction (SRS-2 profile difference) following Bonferroni correction were color coded green.

In order to interpret the group*subscale interactions that indicated a profile difference between different GDD pairs, ANOVAs/ANCOVAs yielding a significant group*subscale interaction were followed up with post-hoc comparisons between (ANOVAs: Bonferroni-corrected p = 0.0008 [=0.05/60]; ANCOVAs: corrected p = 0.001 [=0.05/35]) and within (ANOVAs and ANCOVAs: Bonferroni-corrected p = 0.0007 [=0.05/70]) groups, see Fig. 2 (between-group) and Fig. 3 (within-group). Note that all p-values reported in the manuscript in which group means were compared are two-tailed tests.

Fig. 2: Gene dosage disorder (GDD) scores for different autism-related traits (ARTs).
figure 2

Top panel: Point-line graph showing score profiles for each GDD across ARTs. Color encodes group. GDDs are in solid lines, and the benchmark autism spectrum disorder (ASD) and healthy volunteer (HV) groups are in dashed lines. Middle panel: Boxplots for each ART showing GDD group score distributions. Bottom panel: Heatmaps for each ART showing Cohen’s d effect sizes for all pairwise GDD group comparisons (column group vs. row group). Asterisks denote statistically significant comparisons (*nominal p < 0.05, **surviving Bonferroni correction for multiple comparisons).

Fig. 3: Autism-related trait (ART) scores for different gene dosage disorders (GDDs).
figure 3

Top panel: Point-line graph showing score profiles for each ART across GDDs. Colored solid lines encode ARTs. Cognitive impairment scores are shown as a reference (dashed gray). Middle panel: Boxplots for each GDD showing score distributions for each ART. Bottom panel: Heatmaps for each GDD showing Cohen’s d effect sizes for all pairwise ART comparisons (column group vs. row group). Asterisks denote statistically significant comparisons (*nominal p < 0.05, **surviving Bonferroni correction for multiple comparisons).

Relationship between ARTs and cognitive impairment across GDD groups

Linear mixed models were used to test for variation in ART–cognitive impairment relationships as a function of both GDD group and SRS-2 subscale. Models with successively lower-order interactions between the fixed effects of GDD group, ART subscale, and cognitive impairment were compared using ANOVAs, with participant ID as a random effect. Figure 4 visualizes these relationships. For each unique cell in this matrix, a linear relationship between cognitive impairment and ARTs was quantified using percentage bend correlation (selected for robustness to outliers) as implemented in the R package correlations.

Fig. 4: Relationships between autism-related trait (ART) scores and cognitive impairment within each gene dosage disorder (GDD).
figure 4

Scatterplots and linear fit lines showing the relationship between increasing cognitive impairment (x-axis: “IQ_as_Tscore”) and ART score value (y-axis) faceted by GDD (rows) and ART subscale (columns). Robust correlation coefficients are provided for each cell. Color encodes GDD. Dashed lines show population norm values (50, black) and 2 standard deviations above this norm (70, gray). Note that IQ is inverted and transformed to a distribution with mean = 50, sd = 10 to form “IQ_as_Tscore” (i.e., IQ_as_Tscore > 70 is equivalent to IQ < 70).

Prediction of genotype from ARTs

To examine whether ART profiles across GDDs are sufficiently distinctive that they can be used to predict genotype from SRS-2 ratings, machine learning techniques including least absolute shrinkage and selection operator (LASSO) [27] and group LASSO [28] were used. These models have been widely applied and have been shown to yield high prediction accuracy and robust pattern identification [29,30,31,32].

LASSO was conducted to examine whether each of the GDD groups can be distinguished from the remaining groups’ data utilizing the five SRS-2 subscales. For each of the 7 GDD groups (i.e., the target group), the remaining data were constructed to be balanced and representative by randomly sampling subjects from each of the other 6 GDD groups with equal sizes. The resulting mixed-GDD group was required to have the same sample size as the target group. This allowed balanced data between the target group and the mixed-GDD group as suggested by the machine learning literature [33, 34].

Group LASSO, selected to account for potentially high correlations among the items, was also applied to test how well each of the GDD groups can be differentiated from all others using the 65 SRS-2 items. To evaluate whether the predictive ability of the SRS-2 subscales or items varied by including cognitive impairment in the model, we re-ran the LASSO and group LASSO models with cognitive impairment covaried.

For each model, a nested 5-fold cross-validation (CV) was used to tune the model penalty parameter and evaluate model performance. Prediction accuracy was averaged across the 5-fold CV, and each model’s 95% confidence interval was reported (Fig. 5). A heat map was created (Fig. 6) for the most parsimonious model (with cognitive impairment covaried) to visualize feature importance and the direction of the feature effect (e.g., a positive or negative effect). The feature importance was calculated as the proportion of times that a feature was selected by a machine learning model as an important predictor to improve prediction accuracy across the 5-fold CV.

Fig. 5: Average prediction accuracy for each GDD group achieved by each of the four machine learning (ML) models.
figure 5

All the models yield plausible (above chance) prediction accuracy. Without IQ, group LASSO with 65 items performs better than LASSO with 5 subscales for most of the GDD groups except WS where their prediction accuracies are similar. With IQ information, the comparison of ML model performance with 65 items vs. 5 subscales varies across the GDD groups. The error bar indicates 95% confidence interval of the 1000 bootstrap samples across the 5-fold cross-validation.

Fig. 6: Feature importance for the LASSO model with IQ included.
figure 6

The coloring represents both the feature importance and the direction of feature effect. Positive values indicate an increased likelihood of a certain GDD group while negative values suggest a reduced probability of a certain GDD group. Higher absolute values (i.e., feature importance values) indicate greater consistency of a feature being selected as an important predictor to improve prediction accuracy across the 5-fold cross-validation. Awareness is consistently selected as an important predictor for SMS and PLUSX. Cognition is consistently selected for predicting SMS and WS. Motivation is consistently selected for predicting WS, PLUSX, and PLUSY. RIRB is consistently selected for predicting SMS and DS.

Supplemental Table 2 summarizes the analytic plan.

Results

SRS-2 profiles across and within GDDs

Table 3 details the mean SRS-2 subscale scores by group. All were elevated above the normative reference T-score of 50 (ps < 0.004), with the exception of the Soc_Mot subscale in DS. Mixed model ANOVA revealed main effects of GDD group (F [6,343] = 10.26, p < 0.001, np2 = 0.15) and SRS-2 subscale (F[4,1215] = 68.90, p < 0.001, np2 = 0.17); these main effects were qualified by a significant group*subscale interaction (F[21,1215] = 12.18, p < 0.001, np2 = 0.18), indicating that SRS-2 profiles differed among groups. A follow-up ANCOVA with cognitive impairment covaried also identified a statistically-significant main effect of group (F[6, 207] = 5.00, p < 0.001, np2 = 0.13), qualified by a statistically significant group*subscale interaction (F[21, 716] = 5.04, p < 0.001, np2 = 0.13). Thus, there were differences in SRS-2 profiles among GDD groups that could not be fully explained by variation in cognitive impairment.

The effects of GDD group on each subscale and of subscale within each GDD group were next evaluated with univariate ANOVA (see Table 3) (see Supplemental Table 3 for estimated marginal means/effect sizes estimates with cognitive impairment held constant). After Bonferroni-correction, significant effects of group were observed for all subscales (except Soc_Cog with cognitive impairment covaried), indicating significant variation in ARTs across GDDs. The proportion of variance explained by GDD group differed between ART subscales, with a high of 26% (RIRB) and a low of 9% (Soc_Cog). Statistically significant within-group variation across ART subscales was also observed for all GDDs, except the +Y group.

Next, as is common practice in neurodevelopmental disorders research [22, 35,36,37], SRS-2 profile differences (i.e., differences in the pattern of test scores/strengths and weakness) were evaluated using a series of mixed-model ANOVAs (or ANCOVAs covarying for cognitive impairment). Specifically, 21 mixed model ANOVAs/ANCOVAs (each consisting of one between-subjects factor [group: GDD1 vs. GDD2] and one within-subjects factor [SRS-2 subscale]) were completed. For these analyses, group effects indicating an overall difference in the magnitude of impairment between two groups (e.g., GDD1 is more/less impaired than GDD 2 overall) and group*subscale interaction effects (e.g., GDD1 and GDD2 have a different pattern of scores on the SRS-2 subscales) were evaluated. The results of these ANOVAs and ANCOVAs are presented in Fig. 1 (upper and lower triangles, respectively). As seen in Fig. 1, when a main effect of group (i.e., a difference between the two GDDs being compared in the overall magnitude of impairment) was identified, the cell in the matrix was color coded blue. When there was a group*subscale interaction (i.e., the profile or pattern of SRS-2 subscale scores differed between the two GDDs being considered), the cell was color coded yellow. Lastly, instances in which there was both a main effect of group (magnitude of impairment) and a group*subscale interaction (SRS-2 profile difference) were color coded green. As evidenced by the preponderance of yellow cells in the matrix, the most common form of between-group ART differences was for the profile of ART scores in the absence of group differences in overall ART severity, regardless of whether cognitive impairment was covaried. The GDD groups that most differed from others in their SRS-2 profiles were the WS and plusX groups. The SMS group was notable for often showing differences in overall SRS-2 scores and profile.

The data underlying the statistical comparisons detailed above are presented visually in Figs. 2 and 3. For those pairwise GDD comparisons characterized by group*subscale interactions, follow-up pairwise t-tests were completed to identify which SRS-2 subscales differed between and within GDD groups (see bottom panels of Figs. 2 and 3, respectively). For results with cognitive impairment covaried, see Supplementary Table 2.

ART–IQ relationships across GDDs and SRS-2 subscales

Analysis of variance comparisons of nested mixed models failed to find evidence that relationships between SRS-2 scores and cognitive impairment are significantly modulated by interactive effects of GDD and ART subscale (p = 0.57). However, the relationship between cognitive impairment and ART scores did vary significantly as a function of subscale (controlling for a main effect of GDD group, p < 10−10). Thus, different SRS-2 subscales vary from each other in the nature of their relationships with cognitive impairment, but this does not differ significantly across GDD groups.

Given these results, standardized regression coefficients were estimated for IQ as a predictor of each SRS-2 score while controlling for the main effect of GDD group. This revealed that greater cognitive impairment was associated with more severe ARTs for all subscales, but that the magnitude of this relationship (i.e., regression slope, β) varied by ART subscale: Soc_Cog (β = 0.3, p = 0.000002), Soc_Com (β = 0.28, p = 0.00002), RIRB (β = 0.26, p = 0.00005), Soc_Mot (β = 0.24, p = 0.0003), Soc_Awr (β = 0.17, p = 0.01). Although GDD group did not significantly modulate the relationship between the SRS-2 subscales and cognitive impairment, scatterplots (with robust correlation coefficients) of the relationship between each SRS-2 subscale and cognitive impairment are provided for each GDD group (Fig. 4), given the rarity of these conditions.

Prediction of genotype from SRS-2 scores

Finally, machine learning was used to assess if SRS-2 score variation across GDDs was sufficient to predict an individual’s GDD grouping. Models included either the five SRS-2 subscales (LASSO) or the 65 items (group LASSO) and were run with and without cognitive impairment covaried. Figure 5 displays prediction accuracy for the four models. Average prediction accuracy was similar across models, with most models having 60–80% prediction accuracy. As the magnitude of the differences between the models was modest, we focus on the results of the LASSO model (utilizing the 5 subscales) with cognitive impairment covaried for parsimony’s sake in order to highlight which model features (i.e., SRS-2 subscales) were of the greatest importance when cognitive impairment was held constant. To visualize, a heatmap was created (Fig. 6) that depicts feature-importance for predicting group membership. Higher absolute feature-importance values indicated greater consistency of a feature being selected as an important predictor across the 5-fold CV. To simplify interpretation, features with an importance value >0.75 were considered to be consistently selected by LASSO. The maximally important predictive features varied between GDDs, with notably important SRS-2 subscale predictors including: Soc_Mot for WS, plusX, and plusY; Soc_Awr for SMS and plusX; RIRB for SMS and DS; and Cognition for SMS and WS. The Soc_Com subscale was notably of limited importance for predictive accuracy of all GDDs.

Discussion

The analyses presented in this report detail the profile of ARTs within and across 7 GDDs and provide fresh insights into how ART profiles vary across these genetically defined groups. First, our study replicates and adds to prior single-disorder reports of ARTs in the specific GDDs considered. For example, consistent with studies that examine diagnoses of ASD among different GDDs, our continuous examination of ARTs revealed the lowest impairment in the DS group [38]. In contrast, the SMS group presented with highest ART impairment, particularly in the realm of repetitive behavior, consistent with past reports [39, 40]. Within the WS group, social cognition was a peak impairment (along with repetitive behavior), whereas social motivation was largely preserved, consistent with prior research [41]. Lastly, within the sex chromosome aneuploidy subgroups, elevated ARTs were observed relative to normative expectations. Moreover, ARTs were nominally more impaired in those with an extra Y compared to those with an extra X, consistent with prior research [35].

By directly comparing the GDDs studied, we document considerable variegation in ART profiles as a function of GDD that is largely maintained when controlling for cognitive impairment. For example, without controlling for cognitive impairment, we observed 12 unique pairwise GDD group differences in ART profiles. Six of these differences were with WS and appeared to be driven by the relative preservation of social motivation in this group. Other notable profile differences include the relative preservation of social motivation in DS as compared to SMS, and greater severity of many ARTs in SMS as compared to several other GDDs. Seven unique pairwise GDD group differences in ART elevation profiles were apparent after controlling for cognitive impairment. Salient aspects of this variegation above and beyond cognitive impairment included the relative preservation of social motivation in WS and the relatively low level of RIRBs in plusX. Consistent with prior research, these findings suggest that there are meaningful differences in the profile of ART elevations (i.e., the pattern of scores on the different scales as opposed to the overall severity of ART elevation) between different GDDs [5].

In further support of this notion, we find evidence for the discriminability of GDDs by ART profile from multivariate machine learning analyses. In particular, we found moderate to high levels of prediction for most of the GDD groups using sparse regression models which varied as a function of granularity of features examined (5 subscales vs. 65 items) and whether cognitive impairment was included in the model. Overall, a comparison of models suggested similar levels of prediction accuracy for models in which the 5 subscales or 65 items were used as predictors. For parsimony, we focus on the 5 subscale model in which cognitive impairment was covaried. From this model, we learned the following: the maximally important predictive features vary between GDDs, with notably important SRS-2 subscale predictors including Soc_Mot for all GDDs except SMS and plusXY, Soc_Awr for SMS and DS, RIRB for SMS, DS, and WS, and Soc_Cog for WS. The Soc_Com subscale was of limited importance for predictive accuracy in all GDDs. One possible explanation for this is that all of the disorders studied are characterized by some degree of language impairment [42,43,44,45]. Even impairments in non-social facets of language are likely to impact social communication abilities by limiting the toolkit needed to effectively communicate. Thus, this ART may carry a lower level of specificity across different GDDs.

Our findings carry implications for both basic and clinical neuroscience. The observation that different GDDs can induce different ART profiles suggests that the human brain systems underlying different ARTs must be dissociable at some level. For example, disruptions of the different gene sets that define each GDD may achieve dissociable changes in social motivation as compared to repetitive behavior by altering the development of different features of the brain [46]. However, our findings also indicate that some ART elevations appear to show less variability across GDDs, suggesting that the genetic fractionability of these traits may be lower than that of other ARTs. Thus, there may be many more routes to impacting highly integrative brain outputs than there are for impacting more granular aspects of behavior such as the tendency to show restricted and repetitive behaviors. A goal for future work will be testing if traits that are highly variegated across GDDs also show distinct brain-behavior correlations within GDDs.

Dissociability of ART profiles across GDDs is also important from clinical and translational perspectives. If a clinically relevant trait is highly differentiated across GDDs, knowing an individual’s GDD subtype could help to tailor care. Of the ARTs, patterns of RIRBs may most closely fit this scenario. Given the significant functional impact of these behaviors [47] this may be a priority area for attempted tailoring of clinical care to GDD. Although on a longer timeframe, our findings also inform prospects for mechanistically informed treatments for ARTs. Specifically, the non-specific elevation of a certain ARTs across multiple GDDs implies shared mechanistic pathways that, if successfully targeted, could provide a generalizable path to cross-GDD interventions.

Our findings should be considered in light of several limitations. First, we examine ARTs in the absence of diagnostic information about ASD or other psychiatric disorders. However, there is extensive evidence for continuity between continuous measures of ARTs and categorical ASD diagnoses [48]. Also, ART elevation is well-recognized across many non-ASD psychiatric diagnoses [9,10,11], and ASD itself is often comorbid with other psychiatric diagnoses [49]. Taken together, we believe these considerations support our focus on dimensional ARTs. Second, the key analytic outcomes of our study design may vary with overall severity of clinical impairment and/or participant age. In this context, the potential for ascertainment bias and our inability to meaningfully model age effects represent notable limitations. However, these are broader challenges for neuropsychiatric research in rare disorders and will only be overcome with detailed dimensional data on large, longitudinal and population-based GDD cohorts (e.g., ref. [50]). Such data will enable our field to address critical open questions including the clinically important potential for developmentally dynamic shifts in ARTs, which may themselves vary between different GDDs. A related challenge in seeking to understand these developmental dynamisms is the tension between needing instruments that capture age-specific symptomatology while also generating output metrics that can be combined across different age ranges and are scaled relative to age expectations. This tension is reflected in need to use the Preschool, School-Age, and Adult forms of the SRS-2 in the current research. This limitation in our method represents a common a challenge faced by researchers studying neurodevelopmental disorders—i.e., the limited number of assessment tools that are available to evaluate cognition and behavior for individuals with a wide range of chronological and/or mental ages (for a review, see ref. [51]). Third, smaller sample sizes in some GDD groups impacted the ability to detect statistically significant differences between and within groups. We addressed this by also providing effect size estimates. However, future research could benefit from studying GDD groups with equal sample sizes. Fourth, the current study’s sample was compiled across multiple research labs and was not prospectively ascertained. Although this may be conceptualized as a sample of ‘convenience,’ our approach permitted comparing diverse GDDs, both in term of genotypic variation and behavioral presentation. We hope that this will encourage future research in which prospectively identified samples of youth with diverse GDDs may be studied to further elucidate the GDD-specific and shared ARTs that characterize these unique groups. Lastly, the SRS-2 has been critiqued for both its lack of statistically derived factor structure and differential measurement in phenotypically diverse populations [52], such as high and low IQ. Although this criticism should be considered when interpreting the current study’s findings, it is important to note that the SRS-2 is one of the best tools available to measure continuous autistic traits in large samples of participants. Acknowledging this limitation, we hope that the current study will spur further research with diverse GDD samples using alternative tools and assessment approaches with the goal of further distilling GDD-specific and shared traits. It is our hope that such research will inform both basic and clinical science and ultimately support quality of life for individuals with GDDs.