Freezing of gait (FoG) is a disabling symptom of Parkinson’s Disease (PD), which presents as a “brief, episodic absence or marked reduction of forward progression of the feet, despite the intention to walk”.1 FoG is well-known to lead to falls2 and lower quality of life, making it an important target for treatment.3 The pathophysiological mechanism of FoG has been linked to executive dysfunction, particularly in aspects of cognitive control,4 which aligns with neuroimaging evidence showing fronto-parietal and fronto-striatal impairments.5 Recent meta-analytic data suggests that cognitive training (CT) is an effective6 and important7 behavioral intervention for improving cognition, and in particular executive functions, in patients with PD.

Given that these executive deficits have been hypothesized to underlie the pathophysiological mechanisms of FoG, it is plausible that reducing executive dysfunction via CT may lessen the severity of FoG, by mediating more effective fronto-striatal function.8 A number of studies have now shown that CT in PD can lead to neuroplastic changes by way of increased activity and functional connectivity in frontal-striatal regions.9,10,11 Given that FoG relates to dysfunction in these areas, it is reasonable to hypothesize that CT may facilitate more efficient processing between frontal and striatal regions, leading to a reduction of FoG severity. Interested readers are directed to a previous review from our group, which has provided more extensive evidence and rationale for this proposal.12

In this study, a double-blind randomized controlled trial was conducted to explore the efficacy of CT targeting executive functions in PD patients with FoG. We hypothesized that participants receiving CT would show improvements as illustrated by the reduced severity of FoG after completion of the intervention. Additionally, we anticipated that secondary outcomes including cognitive and psychosocial measures would show improvement following the CT program.



Figure 1 illustrates the flow of participants moving through the study. The first participant was randomized in April 2013 and the last in June 2015. There were nine dropouts in the active control condition (AC) (five prior to beginning the program) and one participant was lost to follow up. There were no dropouts in the CT group. In addition, in the AC group one participant was removed entirely from the analysis as their diagnosis was changed from PD to Progressive Supranuclear Palsy, and a second was removed as they could not complete the TUG assessments due to severe motor disability. Two participants were removed from the CT group due to inadvertent incorrect randomization. These participants were retrospectively identified as not meeting the original inclusion criteria and thus it was determined their data would not be analysed at any point.

Fig. 1
figure 1

CONSORT Flow diagram

Fig. 2
figure 2

Each condition involved two trials with a left and right turn version. In the 180° condition, the participant walked to the box, turned around and returned to their chair; In the 540° condition, they completed a 540° turn in the box before returning to the chair; In the box condition, participants shuffled around the box, keeping their inside foot to the outside of the box; in the dual task condition, participants did the same as in the 180°, however completed a cognitive task as they walked. This was either naming the months backwards or multiples of 9 or 7 aloud. The %TF outcome was calculated by summing all FoG episodes across the four conditions, and dividing by the total time to complete across all conditions

Upon intervention completion, TUG scoring indicated that despite self-reporting FoG, nine participants from the CT and four from the AC group did not objectively exhibit FoG on baseline assessment. Therefore, we designed two analysis populations post-hoc. The full analysis set (FAS) included all participants randomized into the study, whether or not they dropped out, or showed objective FoG at BL. The sample of interest set (SIS) was decided upon as a post-hoc analysis, to account for the fact that a number of participants did not display FoG at baseline. The SIS therefore included only those participants who showed objective FoG on BL testing, as identified in the baseline TUG, and completed the study in full. The SIS population is considered the analysis of interest and is therefore what is reported in the results, however all analyses were initially run on the FAS sample to confirm no sampling bias. Demographic data for both the FAS and SIS samples are provided in Table 1.

Table 1 Demographic data of participants in both analysis samples

Primary outcome

Results of the SIS analysis for the primary outcome are displayed in Table 2. This analysis showed that patients in the CT group showed a large and statistically significant reduction in FoG severity in the on-state compared to participants in the AC. There was no difference in the off-state. The FAS analysis was consistent with these results, suggesting that the SIS sample was not biased. We did not compare performance across each of the four conditions separately, on and off as this was not part of the predetermined outcome plan, and secondly, we felt it was an inappropriate exploratory analysis given the smaller than anticipated sample.

Table 2 Primary outcome data between groups before and after intervention

Secondary outcomes

Results of the SIS analysis for the secondary outcomes are displayed in Table 3. In the SIS analysis, there were no statistically significant differences for any of the secondary outcomes over time between groups.

Table 3 Secondary outcome data between groups before and after intervention

Covariate analysis

The covariate analysis showed that the results for the primary outcome remained unchanged by the introduction of covariates (i.e., still statistically significant). However, in terms of secondary outcomes, the inclusion of DDE as a covariate led to TMT-A and daytime sleep disturbance scores becoming significant, with those in the CT group improving compared to the AC.


This pilot study represents one of the largest RCTs of CT to date in PD. Though interpretation of the results must remain cautious owing to the limitations outlined below, the results allude to the potential for CT to reduce the severity of FoG in people with PD. We showed that CT led to a large and significant reduction of FoG severity compared to AC while in the on-state, but this was not replicated in the off-state. These results were consistent, whether we included participants who did not display FoG at baseline or not, and when accounting for covariates. We suggest these results warrant larger scale replication, employing the suggested methodological adjustments we provide below.

The result of FoG only improving during the on-state is noteworthy. Firstly, we preface this discussion by stating this is the clinically relevant behavioral state, as patients in their day-to-day life would take dopaminergic medications as prescribed to minimize time in the off-state. Our provisional supposition to explain this result is that participants in the off-state were too impaired to benefit from any of the potential changes initiated through CT. Training was expected to impact frontal processing and also occurred in the on-state. In the dopamine depleted state, it is conceivable that the striatal dysfunction overshadowed any benefit of CT,8,13 and FoG could not be improved. Our future analysis of functional neuroimaging outcomes in this study may be able to unravel this further.

A number of trials in older adults have now shown that CT can have a beneficial impact on multiple gait parameters.14,15,16 In PD specifically, a pilot study by Milman and colleagues17 showed that 12 weeks of CT could improve TUG performance. Unfortunately, this pilot study did not employ a control group. Therefore, the current results are an important extension showing improvements on TUG performance via the reduction of FoG, compared to an active control group.

Given that FoG was reduced in the CT group, we expected there to be additional improvements in tests of EF, which were presumed to underlie any improvement of FoG. However, changes on these outcomes did not reach statistical significance. We were therefore unable to confirm the hypothesis that improving EF would be the driver of reduced severity of FoG. It is possible that our smaller sample size was a factor however (though it was deemed inappropriate to conduct post-hoc power analysis18). Indeed, there were near medium-sized effect sizes (d ≥ 0.45) for many of the executive tests we anticipated improvements on including TMT-B, and shift-measures of the affective go-no-go test (AGN) and verbal fluency (VF) (see Table 2). We do note however that when adjusting for the effect of dopaminergic medication, the CT group did show medium-sized significant improvements compared to AC in processing speed and daytime sleepiness.


This study has limitations which warrant consideration. The first is that we did not meet the projected sample size target due to feasibility issues with recruitment. In addition, there were a number of dropouts in the AC group, although we note that over half of these dropouts occurred prior to commencement of the intervention and only one was due a lack of interest. The third issue was that participants were randomized prior to TUG scoring. This was necessary to avoid delay to participants being enrolled as TUG scoring is time consuming, requiring skilled and trained raters. Nonetheless, we attempted to address these limitations by running the FAS analysis, which confirmed our primary result.

Additionally, despite random allocation, the groups were unbalanced in their baseline FoG severity. The CT group had more FoG than the AC in the on and less in the off-state. We stratified the randomization by cognitive functioning (MOCA scores), however it may be more appropriate in future to stratify by objective FoG scores at baseline. We highlight however that the results remained when accounting for the impact of LEDD, and that the on-state is the clinically meaningful state. Related to this unbalanced severity, it is important to highlight that the AC group actually had worse FoG at follow-up compared to the CT group in the on-state and this pattern of results was replicated in the FAS analysis. Replication with a larger sample is needed to demonstrate if this is a reliable finding, or represents the variability found in small samples such as this one.

Future directions

We believe there is reason to be hopeful for the use of these trials in the future. Feedback from participants and family members involved in the groups was overwhelmingly positive, our pilot results highlight positive trends, and the importance of nonpharmacological trials including CT has become increasingly clear.7,19,20 We suggest that replication of this trial is warranted. However, with the hope of improving any future work learning from some of the issues that were raised during this study, we suggest authors consider some of the following suggestions.

Future studies where possible should aim to score FoG severity prior to enrollment. A certain threshold for severity (e.g., >5%) should be specified for eligibility, and stratification across groups could also be based on this. Where possible, additional methods of FoG measurement could increase the reliability of %FoG scores. This could be done through measures such as gait mats and accelerometry data, and also repeat TuG assessments to address measurement variability. Multisite recruitment would increase the potential for sample size without relying on home-based CT, which we do not believe would be a viable option,21 particularly in this sample. The inclusion of additional data to aid analysis such as measurement of expectancy effects and CT training data can be useful. Finally, we did not include a long-term follow up assessment. This has often been used as a criticism against CT, though we rebut that very few interventions elicit sustained improvements after the cessation of treatment. Thus, it is likely that clinically, CT needs to be continuously delivered in order to continue any found benefits, just like most other interventions (e.g., exercise, medications etc). Nevertheless, obtaining a better understanding of how long such results are maintained22 is useful for future trial design and clinical applications, and thus future studies could try to obtain this information if feasible.


The current study provides preliminary evidence that CT can reduce the severity of FoG in PD during the on-state. This improvement was seen without concurrent, significant changes to executive functioning (despite near-medium sized effects on these measures), but in the context of improved processing speed and daytime sleep disturbance. Despite the limitations of this study, these results add to the growing body of evidence showing that CT is a useful therapeutic technique worthy of continued exploration in PD.


Study registration

This study was registered in 2013 on the 5 April through the Australian and New Zealand Clinical Trials Registry (ACTRN12613000359730) and was approved by the Human Research Ethics Committee of the University of Sydney. Written informed consent was obtained from all participants.


Eligible participants were those diagnosed with idiopathic Parkinson’s disease based on the UK Brain Bank clinical criteria,23 with self-reported FoG at the time of assessment, and who were free from dementia as determined by a score of ≥24 on the mini-mental state examination (MMSE).24


The study was advertised in a local PD community magazine as well as local PD community support groups. Potential participants were also recruited from the Parkinson’s Disease Research Clinic at the Brain and Mind Centre, University of Sydney. Interested participants were invited to participate if they had previously reported a positive score on Question 3 of the Freezing of Gait Questionnaire (FOG-Q): “do you feel that your feet get glued to the floor while walking, making a turn, or trying to initiate walking (freezing)?”.25

Prior to recruitment, we used baseline data from a previously published trial26 to conduct a power analysis using a conservative effect size estimate of at least 0.2 in the study’s primary outcome. This suggested the minimum sample size required for each group was 39 (based on power = 0.80 and α = 0.05).

Study design

The study was a double-blind randomized active controlled trial. Interested patients were enrolled by CCW and LM after they met eligibility criteria during a baseline assessment and were then randomized into either the CT or an AC group. Conditions were masked as either “morning” or “afternoon” sessions, and the order of these was randomized between recruitment waves prior to trial commencement. In order to facilitate blinding, participants were told that each session involved different computerized activities, but were not explicitly told of a treatment or control group. Randomization of participants and morning/afternoon sessions was carried out using a randomly generated number sequence allocated by a blinded researcher not involved in trial recruitment, data gathering, assessments or training. Randomization was undertaken using permuted blocks and stratified by cognitive functioning, with strata defined by Montreal cognitive assessment (MOCA) scores of <26 or ≥26. Participants were advised of their allocation into the morning or afternoon session by way of sealed opaque envelopes delivered by CCW upon completion of the baseline assessments. Post-intervention assessments were conducted by clinicians who were blinded to treatment allocation. All participants allocated to the AC group were offered the opportunity to complete CT after their involvement in the trial was complete. Ten participants elected to complete this, with those who declined citing time commitments as the primary reason.


Baseline and post-intervention assessments were each completed in two parts: on and off-medications. The on-state assessment included a neuropsychological test battery, psychosocial measures, part III of the Movement Disorders Society’s revision of the Unified Parkinson’s Disease Research Scale (MDS-UPDRS),27 and a modified timed up-and-go (TUG) task. These assessments were completed in a random counterbalanced order and took approximately 2.5 h to complete. Baseline assessments were conducted within 3 weeks prior to training commencement, and follow-up assessment was within 3 weeks of the intervention finishing. The practically defined off-state assessment was completed in the morning on a different day when participants were asked not to take their usual Parkinson’s medication until after the assessment, and comprised a repeated MDS-UPDRS part III and TUG, taking approximately 1 h. Those with deep brain stimulation did not complete the off-state assessment. A random subset of participants also underwent neuroimaging, however the investigation of any training-induced changes are a tertiary outcome and are therefore not included in the current manuscript.

Primary outcome

The primary outcome was percentage of time spent frozen (%TF) across all four trials of a TUG assessment. Percentage was chosen as the primary outcome as it was anticipated to be more sensitive than a reduction on the FOG-Q, whilst accounting for inter-individual variability in gait speed and the variable duration of freezing episodes across TUG tasks.26 In each task, the participant was requested to get up from a chair, walk to a square box shape taped to the floor five meters ahead and complete both a left and a right turn (see Fig. 2). TUGs were video recorded and then scored independently post-assessment. Six scorers (MG, JMH, AJM, MG, JYYS, and KAEM) were randomly distributed videos of the TUGs. Scorers were given baseline and follow-up TUG videos in a random order for the same participant, to minimize pre-post scoring variability. FoG was tagged in the video at any point when a participant made a paroxysmal and involuntary cessation of normal progression of the feet through the task. This included a typical trembling of the feet, short shuffling steps of a few centimetres in length or a complete motor block.28

The %TF outcome was calculated by summing all FoG episodes across the four conditions, and dividing by the total time to complete across all conditions. Inter-rater variability amongst blinded scorers was strong, and calculated by all scorers being given a random selection of the same six videos to independently score. The intraclass correlation coefficient was .902 across all FoG episodes (average %TF: 9.56%). We note that two of the six videos by chance did not contain FoG, however they were included to confirm no false-positive scoring had occurred. As this inflated reliability across scorers however, we re-calculated the coefficient with the two videos removed to be sure. Scoring was still accurate across raters (.865) (average %TF: 14.31%).

Secondary outcomes

Cognitive assessment

To assess global functioning for descriptive purposes, the MMSE24 and MOCA29 were used. Total and delayed Hopkins verbal learning test-revised scores were used to assess verbal memory.30 To assess attention and working memory, the total score from the Digit Span subtest of the Wechsler Memory Scale was used.31 To assess verbal fluency, the total words generated on each condition of the VF subtest of the Delis–Kaplan executive function system was used.32 In this test, part 1 measures phonemic fluency across three letters, part 2 measures semantic fluency across two categories while parts 3 and 4 assess switching between two differing categories. Part 3 represents the total number of correct items, while part 4 represents the total number of correct switches. Processing speed was assessed by the number of correct responses on the oral symbol digit modalities test.33 For these measures, higher scores are indicative of better performance.

To assess processing speed and cognitive flexibility, times for parts A and B respectively of the trail making test (TMT) were used.34 Finally, the AGN of the Cambridge neuropsychological test automated battery35 was used as a computerized measure of inhibitory control/switching. Mean latency post-switch was used to determine performance in addition to the number of commissions or omissions. For these tests, lower scores were indicative of better performance. Alternate versions of each test with the exception of Digit Span (not available) were used at baseline and follow-up to minimise potential practice effects.

Psychosocial measures

Participants completed several questionnaires targeting mood and wellbeing. The Hospital Anxiety & Depression Scale was used to assess anxious and depressive symptoms.36 The Scales of Outcomes in PD (SCOPA)-Sleep was used to assess sleep quality in terms of both daytime sleepiness and night time sleep disturbance.37 The Parkinson’s Disease Questionnaire (PDQ-39)38 was used as a measure of quality of life. Finally, if a participant lived with a carer, the Cambridge behavioral inventory-revised39 was used as an informant report of cognitive and behavioral changes. For all of these scores, a higher score was indicative of more substantial impairment.


Both the CT and AC groups attended sessions at the Brain and Mind Centre at the University of Sydney in our designated CT laboratory, and both conditions were matched in terms of time, clinician contact, computer use and social interaction. In accordance with our previous CT programs for older adults (see,40,41) the intervention was completed in a group format (n ≤ 10), and comprised of 2-h sessions, twice weekly over 7 weeks (14 sessions in total). Both groups were supervised and facilitated by CCW & LM. The first hour of the session was identical across CT and AC groups: (i) 30–45 min was designated to psycho-education on a number of topics relevant to PD including cognition, sleep and mood, and was delivered by multidisciplinary specialists and researchers from the Brain and Mind Centre; (ii) participants then took an enforced break of 10–15 min. This first hour, whilst not CT per-se, was included for both conditions as a means of increasing participant engagement and has previously been shown to support our excellent program adherence rates.40,41 The second hour of the session differed across CT and AC groups:

(A) CT: Participants in this group completed a program of computerized CT tasks, selected for their focus on executive functions and on the basis of our previous experience employing the “Neuropsychological Educational Approach to Remediation” approach41,42 in providing computerized CT to >400 older adults,41,43,44 including those with PD.40 Tasks included designated “brain training” programs (e.g., Attention Process Training-III45) as well as commercially-available software (e.g., computerised Sudoku), which were determined by clinical neuropsychologists (LM, SLN) to target the cognitive processes of most interest to FoG (inhibitory control, attentional set-shifting, working memory, processing speed and visuospatial skills).4,12 Performance was monitored by the facilitators, with the focus of progressively making the tasks more difficult where possible for the participant. These changes were done in an individualized manner based on performance and in consultation with the participant. Therefore while the tasks delegated for each session where standardized across all participants, there were differences in how far each progressed in terms of difficulty. The majority of exercises provided the participant with feedback in the form of scores, and this was further discussed between facilitators and participants to help them better understand their performance.

(B) AC: Participants in this group completed a series of non-specific computer-based tasks including predominantly watching informative nature videos and answering content-related questions as previously used,46 as well as online “treasure hunts” devised by our team. These tasks were designed to provide broader, generalized cognitive engagement compared to the targeted focus on specific cognitive functions in the CT condition. Therefore, those in this group were not expected to have reduced FoG severity but were rather intended to match for clinician and peer contact, along with computer use.

Statistical analysis

To minimize any potential bias, statistical analyses were conducted by a consultant statistician experienced in RCTs who was based at the University of Sydney (see acknowledgements) and who was not involved in any other aspect of the trial. Data was analysed using SAS software version 9.4.

The analysis took the form of a mixed effects model using fixed effects fitted to all endpoints, to test the null hypothesis of no difference in change over time across groups against the alternative hypothesis of a difference between the two arms. An additional term in the model was fitted to account for the repeated measures pre- and post-intervention. “Participant” was declared a random effect. An unstructured covariance pattern between baseline and post-intervention was used. The Kenward and Roger’s method for correcting for the fixed effects, standard error bias by inflation of the variance and Satterthwaite’s adjustment to the degrees of freedom has also been applied to cater for the small sample size. Analyses of endpoints with non-normal variance for analysis have been transformed to the log10 scale for analysis. Results were back transformed for interpretation and represent geometric means. Cohens d was calculated as a measure of effect size with 0.2, 0.5, and 0.8 considered small, medium and large effects in the CT compared to the AC group.47

Covariate analysis

An additional analysis was undertaken to investigate the effects of the following covariates on all outcome measures: age, education, levodopa equivalency daily dose LEDD;48 years since diagnosis, and the amount of days between CT completion and FU assessment (days until FU) on each outcome. Only significant results are reported.

Data availability statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.