Autism spectrum disorder (ASD) is a neurodevelopmental condition defined by deficits in social communication and interaction and restricted and repetitive patterns of behaviour.1 Although genetic and neurobiological factors contribute to the ASD phenotype, neurocognitive functions also play an important role in the core behaviours of ASD. Executive function (EF) has long been of interest given its proposed role in contributing to specific impairments in ASD in the areas of theory of mind2 and social cognition, social impairment,3 restricted and repetitive behaviour patterns4 as well as broader impacts on quality of life.5 EF encompasses a broad range of purposeful higher-order neuropsychological domains, including goal-directed behaviour, abstract reasoning, decision making and social regulation.6 It is generally accepted that EF difficulties have an important role in ASD, described as poor regional coordination and integration of prefrontal executive processes that integrate with other emotion and social circuits.7 In ASD, brain abnormalities have been observed in cortical volume and thickness in both frontal and other cortical brain regions.8 Aberrant functional network connectivity influencing EF has also been reported between prefrontal and other cortical and subcortical areas9 that may be influenced by different EF subdomains. A summary of key EF domains, associated brain areas and related ASD phenotype is presented in Supplementary Table 1. Figure 1 illustrates developmental changes in EF and observed impairment in ASD.

Figure 1. Developmental changes in executive function and associated impairment in autism spectrum disorders (ASD).

Despite extensive research, however, including a number of meta-analyses10, 11 and reviews,12, 13 the role of EF in ASD remains unclear. Individual research studies place different emphasis on the EF constructs of interest and few studies evaluate EF across most of the accepted domains of interest. The identification of a cognitive profile of executive dysfunction could provide a better understanding of the neural circuitry underpinning ASD, and may assist with clinical utility, diagnosis and treatment. Previous published ASD meta-analyses and systematic reviews of EF focus on one or two specific subdomains and thus an overall framework of the executive dysfunction profile in ASD has not been established.

Other factors may contribute to these mixed findings. Studies inconsistently control for potential moderators of the relationship between EF and ASD and the observed high interindividual cognitive variability within the spectrum.14 Moderators considered in ASD studies include variables that affect sample selection or task characteristics and may influence the observed relationship between executive dysfunction and ASD. Selection of the ASD and comparison samples varies between studies depending on the ASD classification(s) of interest (see Supplementary Table 2) and choice of a clinical or typical comparison group (or a combination). Matching criteria between ASD and comparison groups also vary and may be based on a range of variables including cognitive measures (different IQ indices), age (chronological/mental age) and gender. Sample characteristics including age and gender may moderate EF performance given that EF domains may follow a differential developmental trajectory in the typically developing brain15, 16 and those with ASD.17 However, many studies do not examine developmental trajectories in ASD and also utilise mixed age cohorts,18, 19 making outcomes on EF performance difficult to interpret. Finally, task characteristics may vary between studies on a range of variables including assessment type (psychometric tests vs experimental tasks), features of presented stimulus (verbal vs visuospatial), presentation format (computerised vs traditional) and the response type required from the participant (verbal or motor response). Yet, these potential moderators have not been systematically studied.20

The observed interindividual cognitive variability within the spectrum and differences in EF performance that may be differentially modulated by distinct EF domains (and associated brain areas) and/or different mediating factors may translate to the need for a more individualised approach for diagnostic measures and clinical interventions. Thus, research is needed to explore the clinical utility of group-based EF measures (based on standardised psychometric tests and experimental tasks) in discriminating between ASD and comparison typical populations.21

The objectives of the study were: (1) to examine evidence for executive dysfunction in ASD including the individual contribution of EF subdomains; (2) to assess the influence of moderating variables based on sample or task characteristics; and (3) to review the clinical sensitivity of individual EF measures. We hypothesised that overall EF would be impaired in ASD, that individual EF subdomains would make a differential contribution to executive dysfunction and that this would be correlated with improved clinical sensitivity in associated behavioural and informant EF measures. An exploratory approach was taken for reviewing moderator impact and no specific hypotheses were made.

Materials and methods

The PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses)22 and MOOSE (Meta-analyses Of Observational Studies in Epidemiology)23 guidelines (refer to Supplementary Materials) were followed in conducting this study.

Study selection

Included in the meta-analysis were studies published in peer-reviewed journals in the English language with an a priori aim to assess EF in ASD. The selected publication date was between 1980 (first inclusion of an Autism diagnosis in the Diagnostic and Statistical Manual of Mental Disorders, Third Edition (DSM-III))24 and the end of June 2016. The majority of selected studies utilised a cross-sectional design. For data extracted from clinical trials or longitudinal study designs, only the baseline data were included in the meta-analysis. Eligible studies included participants with a diagnosis of ASD based on DSM or International Classification of Diseases (ICD) classifications and/or a diagnosis of ASD based on structured and validated diagnostic instruments (Autism Diagnostic Observation Schedule (ADOS) and/or the Autism Diagnostic Interview (ADI)). Given that the search period ranged from 1980 to 2016, the diagnostic criteria for the selected studies varied depending on the edition of DSM and ICD publications. Studies with participants <6 years of age were excluded from the meta-analysis to account for the qualitative differences in the types of assessment instruments used in younger aged groups.25 Eligible studies evaluated one or more of six key EF domains (Concept Formation/Set Shifting, Mental Flexibility/Set Switching, Fluency, Planning, Response Inhibition and Working Memory; refer to Supplementary Table 1). These EF domains were selected as they have been widely investigated in the ASD literature.

Search strategy and study variables

The literature search was conducted on the computerised databases of Medline, Embase and PsycINFO using search criteria based on EF domains and measures of interest. The first author (EAD) screened search results for initial eligibility based on title and abstract. Full-text versions of the potentially eligible studies were then assessed and included if they satisfied the selection criteria. Coding of individual outcomes into EF domains was done by the first author based on accepted neuropsychological categorisation6, 26 and verified by a second independent reviewer (JEP). Reported outcomes were extracted as mean and s.d., F-test value or t-test value for each group at a single time point. To avoid selective data extraction, when studies included more than one measure of EF (either within the same domain or for more than one domain), all relevant outcomes were extracted. This was based on the assumption that measures within an assessment are at least moderately correlated.
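When a study reports only a t-test or F-test value rather than means and s.d., a standardised mean difference can still be recovered. The following is a minimal sketch of the standard textbook conversions for two independent groups, not necessarily the routine applied by the software used in this study; the function names are hypothetical, and the F conversion assumes a two-group comparison (1 numerator degree of freedom), with the sign taken from the reported direction of the group difference.

```python
import math

def d_from_t(t, n1, n2):
    """Cohen's d from an independent-samples t value and the two group sizes."""
    return t * math.sqrt(1 / n1 + 1 / n2)

def d_from_f(f, n1, n2, control_better=True):
    """Cohen's d from a two-group F value (1 numerator df only).

    F carries no sign, so the direction has to be supplied from the
    group means or text of the study (assumption: caller knows it).
    """
    d = math.sqrt(f) * math.sqrt(1 / n1 + 1 / n2)
    return d if control_better else -d
```

For example, `d_from_t(2.0, 8, 8)` and `d_from_f(4.0, 8, 8)` describe the same comparison and return the same value.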

Moderator analysis


Age

A stratified approach (based on the mean age reported in the study plus or minus 1 s.d.) was utilised to categorise each study into one of the following age categories: ‘Children <12’, ‘Youth >12 <18’, ‘Adults >18’, ‘Mixed age <18 years’ and ‘Mixed age’.


Gender

A comparison between studies that included only female or only male participants.

Diagnostic group

Participants were grouped based on their study classification: Autism Diagnosis, Asperger or ASD combined (a combination of two or more of the above classifications).

Control type

A comparison between studies utilising neurotypical controls vs sibling controls.

Diagnostic tool

Studies were classified based on the assessment tool(s) utilised for the diagnosis. These may have included one or more of the following: DSM, ICD, ADOS and ADI.

Sample matching criteria

A comparison between studies that used one or more matching criteria for sample selection.

IQ differences

A comparison based on whether a significant IQ difference was observed between the study groups.

Assessment tool format

A comparison between computer vs traditional administration of assessments.

Stimulus processing mode

A comparison based on the presentation features of test stimuli, verbal vs nonverbal.

Response mode

A comparison based on the response mode required from the participants, verbal vs motor.

Study appraisal and risk of bias in individual studies

Quality review was based on the Quality Assessment Tool27 and was completed by two independent assessors (see Acknowledgements), not involved in any other aspects of the study. To assess risk of publication bias, funnel plots for overall outcomes as well as for each cognitive domain were inspected for asymmetry and formally assessed using Egger’s regression test.
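As an illustration of the asymmetry test named above, the sketch below implements the usual form of Egger's regression: each study's standardised effect (effect divided by its standard error) is regressed on its precision (1 divided by the standard error), and an intercept far from zero suggests small-study effects. This is a plain least-squares sketch with hypothetical function and variable names, not the implementation used in this study; it returns the intercept, its standard error and the degrees of freedom, and leaves the P value to a t-distribution lookup.

```python
import math

def egger_intercept(effects, std_errors):
    """Egger's regression: returns (intercept, se_of_intercept, df)."""
    y = [g / se for g, se in zip(effects, std_errors)]  # standardised effects
    x = [1 / se for se in std_errors]                   # precisions
    n = len(x)
    if n < 3:
        raise ValueError("at least three studies are needed")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance with n - 2 degrees of freedom
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)
    se_intercept = math.sqrt(s2 * (1 / n + mean_x ** 2 / sxx))
    return intercept, se_intercept, n - 2
```

If every study reported the same effect regardless of its precision, the intercept would be zero; larger effects in the less precise studies push it above zero.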

Data analysis

Data analysis was performed on Comprehensive Meta-analysis (CMA) version 3 (Biostat, Englewood, NJ, USA) using the random-effects model. The unit of analysis was standardised mean difference (calculated as Hedges’ g) on each measure between ASD and healthy controls. When more than one control group was reported in the study, the control groups were combined following established statistical procedures.28 A positive effect size indicated that the control group performed better on the EF measure compared with the ASD group.
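For readers unfamiliar with the unit of analysis, the following is a minimal sketch of how Hedges' g can be computed from group means, s.d. and sample sizes using the standard pooled-s.d. formula and small-sample correction. The function name and argument order are hypothetical, and the sketch assumes that higher scores indicate better performance, so that control minus ASD gives a positive g when controls perform better; for error-type scores the sign would need to be reversed.

```python
import math

def hedges_g(mean_ctrl, sd_ctrl, n_ctrl, mean_asd, sd_asd, n_asd):
    """Return (g, variance_of_g) for the control-minus-ASD difference."""
    df = n_ctrl + n_asd - 2
    pooled_sd = math.sqrt(
        ((n_ctrl - 1) * sd_ctrl ** 2 + (n_asd - 1) * sd_asd ** 2) / df
    )
    d = (mean_ctrl - mean_asd) / pooled_sd      # Cohen's d
    j = 1 - 3 / (4 * df - 1)                    # small-sample correction factor
    var_d = (n_ctrl + n_asd) / (n_ctrl * n_asd) + d ** 2 / (2 * (n_ctrl + n_asd))
    return j * d, j ** 2 * var_d
```

The correction factor shrinks d slightly towards zero, and the shrinkage matters most for the small samples that are common in this literature.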

The data analysis was planned a priori and was completed in three stages. The initial analysis combined all EF outcomes to assess the overall EF effect size in ASD. The second analysis examined subgroup comparison of the individual EF domains. In the final step, subgroup analyses were conducted to examine between study variability and moderator impact for overall EF and individual EF domains and ‘Year of Publication’ was assessed as a covariate in meta-regression analyses.

Hedges’ g effect sizes of ≤0.30, >0.30 and <0.60, and ≥0.60 are described as small, moderate and large, respectively, following the same convention applied to Cohen’s d effect sizes. Heterogeneity across studies was assessed using the I2 statistic with 95% confidence intervals (CIs). I2 values of 25, 50 and 75% define small, moderate and large heterogeneity. Between-subgroup heterogeneity was tested using Cochran’s Q-statistic.
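The heterogeneity statistics and the random-effects pooled estimate can be sketched as follows. The source does not state which between-study variance estimator was used, so the DerSimonian-Laird estimator here is an assumption, and the function name and returned keys are hypothetical.

```python
import math

def random_effects_summary(effects, variances):
    """Pool effect sizes under a random-effects model (DerSimonian-Laird assumed)."""
    w = [1 / v for v in variances]                       # fixed-effect weights
    fixed = sum(wi * g for wi, g in zip(w, effects)) / sum(w)
    q = sum(wi * (g - fixed) ** 2 for wi, g in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0      # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0  # I2 as a percentage
    w_re = [1 / (v + tau2) for v in variances]           # random-effects weights
    pooled = sum(wi * g for wi, g in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {
        "pooled": pooled,
        "ci95": (pooled - 1.96 * se, pooled + 1.96 * se),
        "q": q,
        "df": df,
        "tau2": tau2,
        "i2": i2,
    }
```

I2 expresses the share of the observed variation that exceeds what sampling error alone would produce, which is why it is read against the 25, 50 and 75% benchmarks.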

Clinical sensitivity was determined by the overlap percentage statistic (OL%) based on Cohen’s29 idealised distributions. This can be converted to a percentage representing the degree that the performance of the ASD group overlaps with the control group; OL<15% represents clinical marker criteria.
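One common way to obtain an overlap percentage from an effect size is as the complement of Cohen's U1 non-overlap statistic for two normal distributions with equal variance. The sketch below assumes that reading of OL%; the source does not give the formula, and the function name is hypothetical.

```python
from statistics import NormalDist

def overlap_percent(d):
    """Percentage overlap of two equal-variance normal distributions.

    Computed as 100 * (1 - U1), where U1 is Cohen's non-overlap statistic.
    """
    u2 = NormalDist().cdf(abs(d) / 2)
    u1 = (2 * u2 - 1) / u2
    return (1 - u1) * 100

def meets_marker_criterion(d, threshold=15.0):
    """True when the overlap falls below the OL<15% criterion stated above."""
    return overlap_percent(d) < threshold
```

Under this assumption an effect size of zero gives complete overlap, and only very large effect sizes bring the overlap below 15%.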


Results

The literature search resulted in 235 studies (see Supplementary Table 2) that satisfied the selection criteria with a total of 14 081 participants (ASD: N=6816, Control: N=7265).

Overall effect of EF

The overall effect of EF was large and statistically significant (k=235, g=0.60, 95% CI 0.53–0.67, P<0.001). True heterogeneity across studies was large (I2=75.5%). The forest plot revealed that studies including results based on self- or carer-reported ratings had higher effect sizes with the majority of results ranging from large to very large (0.64<g<5.60) compared with studies with psychometric tests and/or experimental tasks. Egger’s regression test was significant (Egger’s intercept=1.5, P<0.001), but a trim-and-fill analysis did not result in the imputation of any studies.

A statistical comparison of ‘Assessment Tool Type’ revealed significant differences between the different tool types (psychometric test vs experimental task vs questionnaire) with the questionnaire format having the largest effect size (g=1.84, 95% CI 1.48–2.20, P<0.001). Similarly, true heterogeneity was highest for the questionnaire format (I2=89.6%) compared with experimental tasks (I2=56.7%) and psychometric tests (I2=50.4%). It is of note that most studies including self/informant reports used the Behavioural Rating Inventory of Executive Function (BRIEF).30 A sensitivity analysis revealed that by excluding questionnaire outcomes, the revised effect size was moderate (g=0.49, 95% CI 0.44–0.55, P<0.001). Heterogeneity was similarly reduced to I2=54.2%.

Given the above results the questionnaire data were excluded from the remainder of the meta-analysis and results are reported based on dependent measures assessed by psychometric tests and/or experimental tasks only.

The forest plot revealed two conspicuous outliers with Hedges’ g values above 5 (refs 31, 32). Following the removal of these two outliers there was a marginal reduction in effect size (k=221, g=0.48, 95% CI 0.43–0.53, P<0.001) and heterogeneity was also reduced (I2=46.2%). The funnel plot suggested evidence of a small-study effect (Egger’s intercept=1.21, P<0.001). A trim-and-fill analysis, however, did not result in imputation of any studies and the overall effect size remained the same.

EF domain-specific effects

Small to moderate effect sizes were observed for each of the EF domains of interest (see Figure 2 and Supplementary Table 3). The subgroup analysis between EF domains was not significant (P>0.05).

Figure 2. Subgroup analysis of executive function domains.

Moderator analysis

Figure 3 summarises the EF subdomain analysis by age. A detailed summary of subgroup analysis for moderator effects is presented in Supplementary Table 3.

Figure 3. Subgroup analysis of executive function domains by age.

All within-subgroup analyses on moderator effects were significant with effect sizes ranging from small–moderate to large, but the majority of the between-subgroup analyses assessing moderator impact were not significant. Significant between-group effects were observed for the subgroup age comparison for the Working Memory domain. This was driven by lack of significant difference between ASD and controls for the Youth age grouping. The subgroup comparison between computer and traditional assessment format was also significant with presentation of tasks by computer having an attenuating effect on overall EF and this effect was also observed for the EF domains of Concept Formation and Response Inhibition. Year of Publication was statistically significant in moderating effect on overall EF and also for the subdomains of Concept Formation, Fluency and Planning.

Clinical specificity and sensitivity

Only a very limited number of measures achieved the criterion of clinical sensitivity as defined by Cohen’s ‘idealised population distribution’ (see Supplementary Table 4). The majority of the measures reaching clinical sensitivity were based on the BRIEF30 questionnaire.


Discussion

The meta-analysis extracted all EF data since ASD was introduced as a psychiatric diagnosis and showed consistent evidence of an overall moderate effect size of executive dysfunction in ASD. Individuals with a diagnosis of ASD performed on average significantly worse on EF in comparison with neurotypical controls. However, contrary to our prediction that individual EF subdomains would be differentially impaired, no significant differences in effect sizes were observed between them. Moderate effect sizes were observed for all of the established individual EF subdomains of interest. These findings suggest that there is relative equivalence of EF impairments in ASD across the constructs that were examined. This was further supported in this study by the largely homogeneous impact of most moderators on EF outcomes.

These findings are also consistent with the largely linear trajectory observed in the development of EF in ASD (see Supplementary Table 1) and recent trends in ASD research focussing on aberrant brain connectivity in predicting cognitive deficits and symptom severity in ASD.33, 34 A global impairment due to either under- or overconnectivity between brain networks broadly contributing to EF, as opposed to discrete anatomical deficits, could account for the lack of differences between subdomains of EF. The age comparison was only significant for the working memory domain. The age effect for working memory may relate to the developmental trajectory reported for some EF in neurotypical populations where performance may decline around puberty because of synapse reorganisation.35 Thus, the lack of significant difference in working memory observed in adolescents may reflect the underlying neural changes observed in this developmental period that may contribute to a decrease in EF performance in typically developing individuals. Differences therefore in this subdomain between the two groups may be least pronounced in this age range.

The generally smaller effect sizes on EF observed for the adult ASD group support other research that, either due to developmental maturity and/or increased use of compensatory strategies, adults with ASD perform better in EF than younger age groups, whereas residual executive dysfunction still persists. In addition, a smaller effect size between ASD and controls was observed when only one of the ADOS or ADI was used for diagnosis. Given the variability across ASD and recommendations for multifactorial assessment, use of a single diagnostic tool may result in a less severe cohort meeting much broader ASD criteria.

It was of interest to also note the small but significant moderating impact of Year of Publication on overall EF and for the domains of Concept Formation, Fluency and Planning that may reflect the broadening of the Autism Spectrum criteria over successive editions of the DSM since the diagnosis was first made in DSM-III.

It was hypothesised that the different diagnostic classifications of ASD may introduce variability between studies of EF, because of potential heterogeneity in cognitive function reflecting different classifications. However, our results failed to find differences in effect sizes between different diagnostic groups. This lends support to the recent focus on individual variability within the spectrum rather than between classification groups guiding EF outcomes.36

Similarly, an evaluation of the potential differences between different matching criteria did not reach significance, although the largest effect size was observed for matching based on chronological age. Differences in EF are likely to be more pronounced between experimental and comparison groups within the same age range when no other moderators such as IQ or mental age are taken into account.

Our findings on the clinical utility of EF measures show that the majority of EF measures did not achieve clinical utility in differentiating between ASD and typical controls, with mostly informant-based measures based on the BRIEF30 achieving absolute clinical marker criteria. This lends further support to the proposition that measures with ecological validity (that is, based on more representative environmental situations) may be more appropriate especially in clinical practice. Informant measures such as the BRIEF may offer greater clinical utility, but further investigations are needed to consider whether outcomes represent higher validity or might be influenced by demand or reporter characteristics. Based on the results of this study, however, the superior ecological validity of informant measures21 supports their use for circuitry-based models (Research Domain Criteria)37 and clinical staging models38 (matching developmental stage of impairment with clinical intervention and risk factors39) and for diagnostic and intervention frameworks. In addition, laboratory-based EF neuropsychological tests should be chosen based on feasibility and ease of use, given the relative equivalence of performance across domains. Taken together, these findings suggest that the focus of diagnostic and intervention measures needs to shift to a more ecologically and clinically valid framework while taking into account the likely individual differences within the spectrum.

A number of limitations may have influenced the findings of this study. The self- or informant-reported questionnaires were excluded from the majority of analyses given the significant differences in effect sizes compared against psychometric tests and experimental tasks. In addition, only accuracy-based measures were included in the analysis, with all reaction time variables excluded given that the direction and specification of the reaction time variable in relation to EF can be unclear. Furthermore, we did not explore the impact that intraindividual variability within the spectrum may have on the observed findings. Finally, although we attempted to consider a comprehensive number of moderators, there remain a number of factors that may influence EF in ASD. These may relate to task characteristics (for example, task complexity, open-ended vs structured task format20) or participant characteristics including symptom severity, emotional states (for example, depression/anxiety) or comorbidities (for example, attention deficit hyperactivity disorder). Anxiety in particular has been noted to have a strong association with poor EF performance in ASD populations.40

Conclusion and further directions

In this meta-analysis we conducted an evaluation of the role of EF in ASD, including an assessment of a large range of potential moderators. Our findings, across a large number of research participants with ASD, suggest there is an overall effect of executive dysfunction and this applies evenly across individual domains, where moderate effect sizes were observed. Predictions of a differential profile of executive dysfunction based on subdomain performance were not supported and our review of moderators mostly returned null results. Taken together, these findings suggest that ASD populations are impaired in EF, but this reflects an overall and not fractionated impairment in EF performance and may be best accounted for by the observed aberrant long-range and local over- and underconnectivity between brain networks in ASD. Further work is needed to identify feasible and sensitive EF general markers for use in diagnosis and clinical trials. Given the stability of EF performance in ASD across neurodevelopment, early intervention may provide the best opportunity to alter trajectories over the lifetime to improve outcomes for people with ASD.