Introduction

Spinal cord injury (SCI) remains a prominent cause of long-term disability in the United States.1 In an attempt to combat rising disability, numerous therapies have been identified over the last 25 years,2, 3, 4, 5, 6 and several have been tested in clinical trials.7, 8, 9 However, quantifying ‘neurologic’ recovery with these therapies remains challenging, because the commonly used clinical scales offer an indirect or incomplete picture of neurologic recovery. For example, most outcome measures do not assess the neurophysiologic substrates that directly contribute to functional recovery, such as mechanisms within the brain and its corticomotor output to the weak limbs.

To offer a more comprehensive view of neurologic recovery, there has been a recent drive to introduce tools that can supplement clinical diagnosis with assessment of neurophysiology in patients with SCI.10 One promising experimental technique is transcranial magnetic stimulation (TMS).11, 12, 13, 14, 15 TMS is a low-cost, virtually painless and non-invasive technique that has the ability to assess motor system neurophysiology. By stimulating the surface of the brain, TMS elicits impulses that travel via emergent motor pathways to evoke potentials in muscles of the contralateral limb. As such, TMS has an inherent advantage for study of neurological diseases, particularly incomplete SCI (iSCI). This is because it can objectively assess (1) activity, recruitment and viability of residual motor pathways from the brain that supply muscles of the weak limbs and (2) plasticity, or change in physiology of pathways and reorganization within parent cortices, which occurs in association with recovery.10, 16, 17, 18 Therefore, TMS metrics could be used to track both baseline neurophysiologic potential as well as plasticity that underlie functional recovery with therapies in iSCI.

However, before this promising 'experimental' technique can be translated into a 'clinical' modality for iSCI, it is critical to understand the reliability of TMS metrics in comparison with the reliability of commonly used clinical outcome measures. TMS metrics have generally been found to be reliable in the healthy, but in iSCI the neurologic injury and the diffuseness of injury, and/or the intake of centrally acting medications could likely affect reliability.19 Therefore here, we investigated reliability of TMS measures in patients with iSCI. Specifically, we evaluated test–retest reliability of metrics that quantify activity, recruitment and viability of corticospinal pathways devoted to muscles of the upper limbs in patients with cervical iSCI. We studied metrics for muscles that were less affected following injury, as well as muscles that were more affected. Finally, we also identified how heterogeneous characteristics of iSCI influence reliability of TMS metrics. We postulated that understanding factors that contribute to variability of TMS in SCI would be critical to consider, as one designs clinical studies to longitudinally assess neurophysiology and associated functional recovery.

Materials and methods

Subjects

Eight male patients with chronic cervical iSCI (mean age 53.5±4.1 years (s.d.), range 48–62 years) were enrolled. Clinical and demographic data are presented in Table 1. Inclusion criteria for the study were age ≥18 years, chronic phase (>6 months) after an iSCI between C2 and C8 levels, and incompleteness of injury classified as American Spinal Injury Association Impairment Scale (AIS) grade B, C or D.20 Patients with contraindications to TMS, such as a history of seizures, medication-resistant epilepsy and intracranial metallic implants, were excluded from the study.21 Patients taking neuro-active medications were included if they maintained the same dose and intake regime throughout the study duration (see Table 1). Prior to enrollment, subject selection criteria and the level of injury were confirmed by a physician specializing in SCI (FF). All subjects gave informed consent prior to enrollment. The Institutional Review Board of the Cleveland Clinic and the Department of Defense’s Human Research Protection Office approved the experimental protocol.

Table 1 Patient characteristics

Study design and procedures

Subjects underwent two testing sessions separated by at least 2 weeks. During each session, functional and TMS measures were collected. We chose to collect functional measures so that reliability of TMS metrics could be compared with reliability of commonly used clinical indices.

Functional measures included the upper extremity motor score (UEMS), manual muscle testing (Medical Research Council (MRC) scale) and the Action Research Arm Test (ARAT). We determined UEMS based on well-established methodology.22, 23 Strength of the elbow and finger flexors, wrist and elbow extensors, and little finger abductors was tabulated for both the left and right sides. Scores were reported as a combined measure for the left and right sides of the body (total=50). A manual muscle test was completed for the left and right upper limbs at both testing time points. The same physical therapist (EBP) performed manual muscle testing at both sessions. Investigated muscles included trapezius, anterior deltoid, middle deltoid, posterior deltoid, infraspinatus, supraspinatus, biceps brachii, brachioradialis, triceps brachii, supinator, pronator teres, flexor carpi ulnaris, flexor carpi radialis, extensor carpi radialis longus, extensor carpi ulnaris, extensor digitorum, flexor digitorum superficialis, flexor digitorum profundus, first dorsal interosseous, abductor digiti minimi, abductor pollicis brevis, opponens pollicis, flexor pollicis brevis and interossei. Strength for each muscle was based on the MRC grade scale (0–5),20, 24, 25 wherein:

0=Total paralysis.

1=Palpable or visible contraction.

2=Active movement, full range of motion, gravity eliminated.

3=Active movement, full range of motion, against gravity.

4=Active movement, full range of motion, against gravity and provides some resistance.

5=Active movement, full range of motion, against gravity and provides normal resistance.

Similarly, we compiled ARAT scores for the left and right upper limbs using previously established methods.26, 27, 28 Briefly, patients were rated on their performance of 19 tasks designed to assess grasp, pinch, grip and gross motor abilities, with a maximum score of 57 for each side of the body. Scores for each task were defined as follows:

0=Can perform no part of the test.

1=Performs the test partially (for example, dropped the object).

2=Completes the test but takes abnormally long or has great difficulty.

3=Performs the test normally.

Neurophysiological measures included TMS metrics reflecting activity/recruitment of residual motor pathways, and activity or excitability of parent motor cortices. TMS metrics were collected only on the side of greater deficit (Table 1). The side of greater deficit was determined for each individual based on (1) patient’s complaint and (2) UEMS collected at initial clinical evaluation (Table 1).

Prior to neurophysiological measurements using TMS, anatomical (T1-weighted) magnetic resonance imaging (MRI) was collected in order to allow for stereotactic guidance during TMS. MRI-based frameless stereotactic guidance was adopted to ensure maximum accuracy in application of TMS. MR images were collected with 176 axial slices with a thickness of 1 mm and a field of view of 256 × 256 mm. A repetition time/echo time/inversion time of 1900 ms/1.71 ms/900 ms and a flip angle of 8° were used. All MR images were uploaded and registered with neuronavigation software (Brainsight, Rogue Research, Montreal, QC, Canada). The navigation software helped register cranial landmarks (nasion, left ear, right ear) for each individual with their respective sites on the MRI.29 The software relayed real-time information about the position of the TMS coil on the subject’s head, the relation of the TMS coil to the subject’s cranial landmarks and their relation to the target region identified on the patient’s MRI. Investigators utilized this information to accurately apply TMS. For patients #2 and #5, a template MRI was utilized, as they could not undergo MRI for our research study because of the use of baclofen pumps (as per the institutional policies).

TMS was applied using a figure-of-eight coil (70 mm diameter) connected to a Magstim 200² stimulator (Magstim, Dyfed, UK). The coil was held tangential to the scalp with the handle pointing backward and laterally at a 45° angle from the mid-sagittal axis. This position places the coil perpendicular to the central sulcus and the primary motor cortex (M1), allowing for maximal stimulation of the descending motor tracts, that is, corticospinal tracts.30

Surface electromyography was used to measure the TMS-evoked motor potentials (MEPs) in contralateral muscles. Electrodes (Ag/AgCl) were placed over the belly of the muscle. Electrodes with a diameter of 45 mm or 8 mm were used for recording, depending on the muscle being tested. Electromyography recordings were amplified, band-pass filtered (10 Hz–2 kHz), digitized (4 kHz; PowerLab 4/25T, ADInstruments, Colorado Springs, CO, USA) and saved for offline analysis (LabChart, version 7.3, ADInstruments, Colorado Springs, CO, USA).

All TMS measures were collected in an active state of the muscle, that is, the muscle was contracted at 20% of its maximal voluntary contraction. Patients were given visual feedback about the target level of contraction that they were required to maintain (LabChart, version 7.3). Recording TMS metrics in an active state is common in iSCI patients.31, 32 This is because muscle activation lowers the threshold for evoking MEPs; without it, the characteristically reduced corticospinal excitability in SCI limits the ability to evoke MEPs.11, 32, 33, 34 Patients were monitored throughout the process to ensure that they were not using compensatory strategies to maintain 20% contraction.

Using TMS, we defined the cortical site devoted to the target muscle, also known as the motor hotspot. For this, we applied TMS to sites on a grid (10 mm resolution) centered over the motor cortex (M1). Motor hotspot was identified as the site that elicited MEPs at least 200 μV larger than pre-stimulus muscle activity in the contracted muscle at the lowest TMS intensity over three out of five trials. Intensity of TMS required to elicit criterion-level MEPs at the hotspot was called active motor threshold (AMT).
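The hotspot and AMT criterion described above can be sketched in code. This is a minimal illustration rather than the analysis pipeline used in the study: the function names, the data layout and the assumption that amplitudes are peak-to-peak values in μV are all hypothetical.

```python
# Hypothetical sketch of the criterion: an MEP counts as a "hit" when it is
# at least 200 uV larger than pre-stimulus activity, and a site/intensity
# passes when at least 3 of its 5 trials are hits.
CRITERION_UV = 200.0
MIN_HITS = 3
N_TRIALS = 5


def meets_criterion(mep_uv, prestim_uv):
    """Return True if at least 3 of 5 trials exceed pre-stimulus activity by >= 200 uV."""
    if len(mep_uv) != N_TRIALS or len(prestim_uv) != N_TRIALS:
        raise ValueError("expected exactly five trials")
    hits = sum(1 for m, p in zip(mep_uv, prestim_uv) if m - p >= CRITERION_UV)
    return hits >= MIN_HITS


def active_motor_threshold(trials_by_intensity):
    """Return the lowest intensity (% of stimulator output) meeting the criterion.

    trials_by_intensity maps intensity -> (mep_uv, prestim_uv), each a list of
    five amplitudes (assumed layout). Returns None if no intensity qualifies.
    """
    for intensity in sorted(trials_by_intensity):
        mep_uv, prestim_uv = trials_by_intensity[intensity]
        if meets_criterion(mep_uv, prestim_uv):
            return intensity
    return None
```

With this layout, the AMT at a candidate site is simply the first intensity, scanning upward, at which the three-out-of-five rule is satisfied.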

At the hotspot, we also determined recruitment of residual motor pathways using recruitment curve (RC) and active MEP (AMEP), and inhibition of residual motor pathways using cortical silent period (CSP). For RCs, TMS was delivered at stimulus intensities ranging from 90 to 150% AMT, where different intensities were tested in a random order.14 We delivered 15 pulses at each intensity while recording MEPs in the contralateral muscle. For AMEP, we delivered 15 consecutive pulses at 120% AMT. We measured CSP during measurement of AMEP. Specifically, we labeled the short-term suppression of ongoing muscle activity in the contracting muscle that follows AMEP as CSP (see Figure 4).35, 36
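As an illustration of how a recruitment curve might be summarized, the sketch below averages the pulses at each intensity and then computes a slope and an area under the curve. The exact definitions used in this study are those given in Table 2; the least-squares slope, the trapezoidal area and all function names here are assumptions made for the example.

```python
def mean_by_intensity(trials):
    """Average MEP amplitudes per intensity.

    trials maps intensity (% AMT) -> list of MEP amplitudes (assumed layout).
    Returns two parallel lists sorted by intensity.
    """
    intensities = sorted(trials)
    means = [sum(trials[i]) / len(trials[i]) for i in intensities]
    return intensities, means


def rc_slope(intensities, mean_meps):
    """Least-squares slope of mean MEP amplitude against intensity (assumed definition)."""
    n = len(intensities)
    if n < 2 or n != len(mean_meps):
        raise ValueError("need at least two matched points")
    mx = sum(intensities) / n
    my = sum(mean_meps) / n
    sxx = sum((x - mx) ** 2 for x in intensities)
    if sxx == 0:
        raise ValueError("intensities must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(intensities, mean_meps))
    return sxy / sxx


def rc_auc(intensities, mean_meps):
    """Trapezoidal area under the recruitment curve (assumed definition)."""
    pairs = sorted(zip(intensities, mean_meps))
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pairs, pairs[1:]))
```

Because both summaries are built directly from MEP amplitudes, any trial-to-trial fluctuation in MEP size carries through to the slope and the area.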

Finally, cortical representational maps that are used typically to witness shifts in plasticity were created for each muscle. Maps were defined by delivering TMS pulses to scalp sites represented by a 5 × 5 grid (10 mm resolution) centered on the motor hotspot (termed motor map). At each site, we delivered five TMS pulses at 110% AMT and recorded MEPs. A site was deemed part of the map when it elicited an MEP that was larger by at least one standard deviation compared with pre-stimulus activity in at least 3 out of 5 trials.
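The map summaries referred to later (area, volume and center of gravity) can be sketched as follows. This is a hedged example: it assumes the conventional amplitude-weighted center of gravity and a dictionary of grid coordinates in millimetres relative to the hotspot, and it takes as input only the sites already judged to be part of the map. The definitions actually used are those in Table 2, and the function name is hypothetical.

```python
def map_metrics(active_sites):
    """Summarize a motor map.

    active_sites maps (x_mm, y_mm) -> mean MEP amplitude for sites that
    already met the map-inclusion criterion (assumed layout).
    Returns area (number of active sites), volume (summed amplitude) and the
    amplitude-weighted center of gravity along x (medio-lateral) and
    y (antero-posterior).
    """
    if not active_sites:
        raise ValueError("no active sites in the map")
    volume = sum(active_sites.values())
    if volume <= 0:
        raise ValueError("summed amplitude must be positive")
    cog_x = sum(x * amp for (x, _), amp in active_sites.items()) / volume
    cog_y = sum(y * amp for (_, y), amp in active_sites.items()) / volume
    return {"area": len(active_sites), "volume": volume,
            "cog_x": cog_x, "cog_y": cog_y}
```

For example, three active sites at (0, 0), (10, 0) and (0, 10) mm with mean amplitudes of 100, 300 and 100 give an area of 3, a volume of 500 and a center of gravity at (6, 2) mm.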

We collected all TMS metrics for two muscles: a weak muscle and a strong muscle. To identify these muscles, we first determined which muscles of the upper limb were eligible for TMS. A muscle was considered eligible for TMS if it elicited MEPs that were at least 200 μV larger than average pre-stimulus muscle activity. Thus, for each patient, several muscles were identified as TMS-eligible. Next, we determined which muscles could be paired as a weak and a strong muscle. To be considered as a pair, it was necessary that the relatively stronger muscle had an MRC score ≥3, and the weaker muscle was at least a grade weaker. The weaker muscle is operationally defined here as the muscle with the lower MRC grade, and the stronger muscle is defined as the muscle with the higher MRC grade. To ensure that the study of metrics of one muscle did not confound the study of metrics of the other because of close spacing between hotspots in the motor cortex, we identified the pair that had the largest available separation along the neurological axis.

Data analysis

We tested reliability for the following functional outcome measures: average MRC grade and the ARAT score for each side of the body; UEMS score and the MRC grade of the TMS tested muscle. We defined the average MRC grade by computing the mean of the MRC scores (0–5) across all tested muscles in the shoulder, forearm, wrist and hand for each side of the body, respectively. Similarly, we determined the ARAT score for each side of the body by summing the individual scores across the 19 assessed tasks (max 57 for each side of the body). All data analyses for neurophysiologic measures with TMS are outlined in Table 2.

Table 2 TMS metric analysis

Statistical analysis

We used the software SPSS (IBM Corporation, Armonk, NY, USA) for statistical analysis. Correlation and reliability between measurements at test 1 and test 2 were determined using Spearman’s correlation coefficient (SCC) and the concordance correlation coefficient (CCC).37 SCC is commonly used in reliability assessments, as it defines the relationship between two variables using a monotonic function. Specifically, variables are ranked based on raw values, and the difference between the ranks is utilized to assess reliability. For SCC analysis, both ρ and the associated P-value are reported. P≤0.05 is considered statistically significant. We also chose to include CCC in our analysis, as it offers a non-parametric assessment of reliability. CCC builds on the SCC analysis, as it is able to take into account the differences in mean and variance between test 1 and test 2.38 Significance of the CCC was determined based on guidelines established by Lin et al.37, 39 Specifically, the strength of association between parameters collected over tests 1 and 2 was deemed small (0 to 0.6), substantial (0.6 to 0.8) or near-perfect (0.8 to 1) based on the value of the CCC.

We also investigated how reliability differed between muscles (higher MRC grade vs lower MRC grade) and across patients. For this, we utilized descriptive statistical analyses. For each muscle, we calculated percent change in metrics from test 1 to test 2 and visualized the data in two ways. First, we determined how reliability differed between muscle grades by plotting the average percent change from test 1 to test 2 for neurophysiologic metrics outlined in Table 2. We compared percent change of metrics across higher and lower MRC grade muscles. Second, to understand how variability was affected across patients, we plotted the average percent change for each patient using a stacked bar-plot for assessed parameters.
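The percent-change summary can be sketched as below. The text does not state whether the change was signed or absolute; this example assumes an absolute change expressed relative to the test 1 value, and the function names are hypothetical.

```python
def percent_change(test1, test2):
    """Absolute percent change from test 1 to test 2 (assumed definition)."""
    if test1 == 0:
        raise ValueError("percent change is undefined when test 1 is zero")
    return 100.0 * abs(test2 - test1) / abs(test1)


def mean_percent_change(pairs):
    """Average percent change over (test1, test2) pairs, e.g. across metrics or patients."""
    if not pairs:
        raise ValueError("no pairs supplied")
    changes = [percent_change(t1, t2) for t1, t2 in pairs]
    return sum(changes) / len(changes)
```

Because the change is scaled by the test 1 value, metrics with small baseline values (such as low-amplitude MEPs in weak muscles) can produce very large percentages from modest absolute differences.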

Statement of ethics

We certify that all applicable institutional and governmental regulations concerning the ethical use of human volunteers were followed during the course of this research.

Results

Patient characteristics

All patients except patient #5 were right handed. The majority of patients (7 out of 8) had experienced greatest weakness on their right side. Patients were 109.3±118.2 months (range: ~2 to 31 years) post injury, and the most common etiology for injury was falls. Seven patients completed both tests 1 and 2; patient #6 withdrew following test 1 because of schedule constraints.

All TMS metrics could be acquired for the muscle with the higher MRC grade. For the muscle with the lower MRC grade, however, only functional measures, AMT and CSP could be acquired across all patients. Motor maps were only acquired in six patients because in patient 1 MEPs could not be elicited at any site, except for the hotspot. Also, RCs could only be recorded from five patients because AMT values were high in patients 1 and 3, which precluded testing of higher intensities such as 140 and 150% AMT.

Functional measures

Functional outcomes collected on both sides of the body demonstrated high reliability (Figure 1). UEMS demonstrated significant reliability between test 1 and test 2 (ρ=0.899, P=0.015; CCC=0.964). The ARAT total score was also reliable for the more affected (ρ=0.883, P=0.008; CCC=0.975) and the less affected sides (ρ=0.964, P<0.001; CCC=0.988). Average MRC grade also showed substantial reliability for both the more affected (ρ=0.928, P=0.008; CCC=0.943) and the less affected sides of the body (ρ=1, P<0.001; CCC=0.905). When we studied reliability of MRC grades for muscles tested using TMS, however, only the muscle with the lower MRC grade showed significant reliability (ρ=0.921, P=0.026; CCC=0.862).

Figure 1

Test–retest reliability of functional measures in patients with iSCI. (a) Significant test–retest reliability was noted for the UEMS in patients with chronic iSCI. (b) We found that only the lower MRC grade muscle used in TMS showed reliability in scoring between test 1 and 2. (c) We noted significant reliability of the average MRC grade for the more and less affected sides of the upper limb. Similarly, we noted that the ARAT showed reliability for both the less and the more affected side of the body. ρ values and associated P-values for SCCs are displayed for each plot. Bold ρ and P-values represent significant reliability. The CCCs are also displayed for each plot. Bold values for CCC represent substantial reliability based on guidelines (CCC>0.6). Open-circled plots denote metrics that were non-reliable. The line represents y=x, the ideal case of reliability in which all values at test 1 are equal to all values at test 2.

Neurophysiological measures: corticospinal excitability, output and inhibition

In general, TMS metrics were more reliable for the muscle with the higher MRC grade than for the muscle with the lower MRC grade. Specifically, as shown in Figure 2, muscles with the higher MRC grade showed significant reliability for corticospinal excitability (AMT; ρ=0.883, P=0.008; CCC=0.906), whereas results for AMT in muscles with the lower MRC grade were inconclusive, with reliability evident only for CCC (CCC=0.935). Similarly, AMEP was reliable only for muscles with the higher MRC grade (ρ=0.919, P=0.003; CCC=0.647; Figure 2). RCSlope (ρ=0.750, P=0.052; CCC=0.943) and RCAUC (ρ=0.929, P=0.003; CCC=0.661) were likewise reliable only for muscles with the higher MRC grade (Figure 3). Corticospinal inhibition (CSP duration) was also reliable only for the muscle with the higher MRC grade (ρ=0.893, P=0.007; CCC=0.884; Figure 4).

Figure 2

Test–retest reliability of corticospinal excitability (AMT) and output (AMEP) in patients with iSCI. We noted significant reliability for the AMT in the higher MRC grade muscles in comparison with lower MRC grade muscles. For AMEP measurements, significant reliability was found only for the higher MRC grade muscles. Illustrative representation of AMEPs (right) collected from higher and lower MRC grade muscles. We noted more consistent amplitudes from higher MRC grade muscles in comparison with lower MRC grade muscles. ρ values and associated P-values for SCCs are displayed for each plot. Bold ρ and P-values represent significant reliability. The CCCs are also displayed for each plot. Bold values for CCCs represent substantial reliability based on guidelines (CCC>0.6). Open-circled plots denote metrics that were non-reliable. Hatched circle plots denote metrics that showed reliability with either CCC or SCC. The line represents y=x, the ideal case of reliability in which all values at test 1 are equal to all values at test 2. AMT is plotted as a percentage of maximum stimulator output (MSO).

Figure 3

Test–retest reliability of corticospinal output (RC metrics) in patients with iSCI. Representative examples of RCs obtained from higher and lower MRC grade muscles. RCs from higher MRC grade muscles largely maintained similar topographies, allowing for reproducibility in slope and AUC (top). In contrast, RCs from lower MRC grade muscles were more variable (lower).

Figure 4

Test–retest reliability of cortical inhibition (CSP) in patients with iSCI. CSP duration was in general more reliable for higher MRC grade muscles (a) than for lower MRC grade muscles (b). Representative examples (bottom) of CSP test–retest reliability. ρ values and associated P-values for SCCs are displayed for each plot. Bold ρ and P-values represent significant reliability. The CCCs are also displayed for each plot. Bold values for CCC represent substantial reliability based on guidelines (CCC>0.6). Open-circled plots denote metrics that were non-reliable. Hatched circle plots denote metrics that showed reliability with either CCC or SCC. The line in a and b is for the function y=x, representing ideal reliability.

Neurophysiological measures: motor map output and spatial distribution

Overall, we observed that motor map output showed poor reliability for both muscles with a higher MRC grade and muscles with a lower MRC grade (Figure 5). However, motor map distribution was reliable, particularly for muscles with the higher MRC grade. Specifically, we noted that the center of gravity (CoG) for the muscle with the higher MRC grade showed moderate reliability for both the medio-lateral (CoGX; ρ=0.821, P=0.023, CCC=0.579) and antero-posterior coordinates (CoGY; ρ=0.536, P=0.215, CCC=0.644; Figure 5).

Figure 5

Representative test–retest reliability of motor map location and distribution in patients with iSCI. The number of sites eliciting a muscle response (map area) changed from 13 to 15 for the higher MRC grade muscles and from 13 to 10 for the lower MRC grade muscles. The CoG did not shift significantly across the x-axis for the higher MRC grade muscle, but visible shifts can be seen along the y-axis. M-MEP denotes the maximum MEP. A full color version of this figure is available at the Spinal Cord journal online.

Effect of MRC grade and patient demographics on reliability

In general, we found that test–retest reliability was better for muscles with the higher MRC grade (Figure 6). Similarly, we observed that functional outcome measures were more reliable on the less affected side of the body. Categories of metrics that were variable, regardless of the MRC grade, were corticospinal output (AMEP and RCs) and motor map output, where the average percent difference between test 1 and test 2 ranged between 17.9 and 489.2% (corticospinal output) and between 12.4 and 276.6% (motor map output). High variability of these variables (Figure 6) was strongly influenced by patient #7 (Figures 7a and b). Patient #8 was also found to have high variability in motor map output for lower MRC grade muscles. Besides these two patients, however, reliability values did not vary remarkably among the rest of the cohort.

Figure 6

Between-muscle analysis of test–retest reliability in patients with iSCI. In general, we noted that higher MRC grade muscles and less affected sides of the body showed more test–retest reliability across patients with iSCI. Metrics that were most unreliable for either muscle subset were CST output (AMEP and RCs) or motor map (MM) metrics. Values are plotted as an average±s.e.m. A full color version of this figure is available at the Spinal Cord journal online.

Figure 7

Test–retest reliability of outcome measures across patients with iSCI. The percent difference between test 1 and test 2 for outcome measures was assessed across all patients. Overall, we noted that Patient #7 demonstrated the highest variability, particularly in corticospinal tract output for both higher (a) and lower (b) MRC grade muscles. High variability in motor map output was also noted for Patient #8 in their lower MRC grade muscles. m.p.i., months post injury. A full color version of this figure is available at the Spinal Cord journal online.

A correlation analysis suggested that patients with greater functional deficits, as noted by a lower UEMS score, demonstrated more variability in functional metrics collected on the less affected side of the body (ρ=−0.821; P=0.02). Patients with greater functional deficits also demonstrated more variability in motor map distribution for the muscle with the lower MRC grade (ρ=−0.543; P=0.26). In addition, patients who were at a more chronic stage post injury showed greater variability for motor map output of muscles with higher MRC grades (ρ=0.571; P=0.18) but reduced variability for motor map output of muscles with lower MRC grades (ρ=−0.657, P=0.15).

Discussion

The main goal of the present study was to determine whether neurophysiological metrics collected in the upper limb using TMS were reliable in patients with chronic cervical iSCI. We have found that TMS metrics were significantly more reliable in muscles with a higher MRC grade than in muscles with a lower MRC grade. TMS metrics that showed the poorest reliability, particularly for muscles with a lower MRC grade, were corticospinal output (AMEP, RCSlope, RCAUC) and motor map output (area, volume; Figure 6). Our results suggest that variability was influenced by factors such as the baseline UEMS score and disease chronicity, wherein patients who were weaker and many months post injury exhibited greatest variability of TMS metrics. On the basis of our observations, functional outcomes collected on the less affected side of the body and TMS metrics captured in muscles with a higher MRC grade could act as reliable measures when assessing longitudinal functional recovery in iSCI. However, TMS metrics of corticospinal excitability, corticospinal inhibition or motor map distribution that were also found to show relatively good reliability could still prove useful as indices to track recovery of weaker muscles in the iSCI population.

Our finding that muscles with a lower MRC grade are not as reliable as muscles with a higher MRC grade is not surprising and can be understood in the context of neurophysiology and the long-standing sequelae of injury. Muscles with lower MRC grade are typically innervated by cervical levels caudal to the injury; hence, they are more affected in comparison with stronger muscles that are typically supplied by levels rostral to injury.40 Axonal sparing below the level of the injury is substantially reduced compared with regions innervated rostral to the epicenter.40 Level of axonal sparing reduces further with greater severity of iSCI.41, 42 As TMS-triggered volleys travel from the brain to the spinal cord, it is possible that the level of axonal sparing strongly influences reliability of metrics defining output for weaker muscles below injury. Reliability of metrics for muscles with a lower MRC grade could also be affected by long-term plasticity. For example, maladaptive plasticity can occur at the level of the spinal cord in association with pain and peripheral inflammation, which can directly reduce motor function recovery in areas caudal to the lesion.43, 44 Thus, regardless of axonal sparing, heightened levels of pain-associated plasticity could lead to reduced reliability of metrics indexing output to muscles weakened below injury.

Other physiologic reasons may have also contributed to weak reliability of TMS metrics for muscles with lower MRC grade. All patients were tested on at least 1 proximal muscle, and two patients were only tested with proximal muscles. TMS metrics for proximal muscles are poorly reliable even in the healthy.38 This is because representations of proximal muscles occupy a smaller region of the motor cortices in comparison with representations for the more commonly studied distal muscles.45 Retrograde degeneration following iSCI could have also resulted in variability in TMS metrics.46, 47, 48, 49

Our finding that motor map metrics for both muscles with higher and lower MRC grades were poorly reliable is surprising (Figure 5). However, these findings can be understood, considering that motor maps undergo prompt changes in less than 24 hours after injury and show constant movement for the remainder of the life span.50 Specifically, anterior–posterior re-mapping can occur in the motor cortex in as little as a few weeks after an injury.11, 46, 51, 52, 53, 54, 55, 56, 57 Corticospinal tracts from M1 undergo plastic changes and ultimately connect to sensory regions located more posteriorly, such as the somatosensory cortex (S1).55, 56 Thus, it is possible that fluctuating cortical re-mapping in the y-plane following iSCI could have contributed to our observation of a more variable CoGY (Figure 5). Further, cortical representations of muscles rostral and caudal to the lesion drastically shift after injury,34, 58, 59 wherein cortical representations of stronger muscles expand and overtake regions of the brain representing the now weaker muscles. As a result, maps for lower MRC grade muscles are diminished and under-represented and hence are likely less stable than maps of higher MRC grade muscles that are enlarged after injury.

Several patient-related factors also introduce variability in TMS metrics. We found that patients with more functional impairments demonstrate more variability. Further, patients with greater post-injury duration show a trend toward more variability. Impairment and chronicity likely introduce neurophysiologic changes, as discussed above, that reduce reliability.60 Patients with greater motor impairment (AIS B) and patients with greater chronicity (>90 months post injury) should thus be enrolled with caution. We also noted that patients taking diazepam (patient #7) and gabapentin (patients #3, #5 and #8) showed high variability (Figure 7). Pharmacologic agents are known to negatively affect TMS reliability.61, 62, 63, 64, 65, 66, 67 Given this potential confound, it is understandable that intake of neuro-active medications may have also influenced test–retest reliability.

Regardless of factors affecting reliability, it becomes important to understand which variables can be adopted to reliably study functional recovery in SCI. Although measures dependent on MEP size (for instance, AMEP, motor map area/volume, RCAUC and RCSlope) are generally variable, variables that are not directly dependent on MEP sizes or derived from them are more reliable. MEP size and its derivations are understandably less reliable because physiological oscillations in excitability around a fluctuating critical firing level or inhibitory factors such as intra- or inter-cortical inhibition could influence size of the descending volley.68, 69, 70 In contrast, AMT and other metrics such as CSP and CoG likely show fair-to-excellent reliability for muscles with a lower MRC grade because they are less influenced by variations in size of the MEP. Therefore, although most TMS metrics would be reliable for study of muscles with a higher MRC grade, our results suggest that AMT, CSP and CoG may be helpful in tracking pre-to-post changes in recovery in muscles with a low MRC grade. To further improve reliability, TMS metrics should be collected during a state of slight voluntary contraction in lower MRC grade muscles.71 An active state can also facilitate measurement of neurophysiology, as patients with SCI present with a severe loss of corticospinal output.72 Finally, although beyond the scope of this study, we acknowledge that the use of paired-pulse TMS techniques may also improve reliability, particularly in lower MRC grade muscles. Paired-pulse techniques when used at specific frequencies provide the means of delivering a more intense stimulus to elicit an MEP and consequently have been shown to increase MEP output.73, 74 Thus, in lower MRC grade muscles where the corticospinal pathways are inherently weak, the use of paired-pulse techniques may prove advantageous, as it can provide a strong enough stimulus to reliably evoke a muscle potential.

Finally, in the present study, we have confirmed that functional outcome measures often used in iSCI rehabilitation are reliable24, 25, 75 (Figure 1). This observation was critical. We included functional metrics to understand how reliability of TMS metrics compares with that of functional measures in the same cohort of patients. Overall, we have found that reliability of TMS metrics in muscles with a higher MRC grade is comparable with reliability observed in functional outcome measures. For example, reliability of corticospinal excitability in muscles with a higher MRC grade (SCC=0.883; CCC=0.906) was comparable to UEMS reliability (SCC=0.899; CCC=0.964). Similar observations were noted for TMS metrics of corticospinal inhibition and corticospinal output in muscles with a higher MRC grade. Thus, our results suggest that certain TMS metrics show similar reliability to functional outcome measures and have the potential to be translated into clinical measures for SCI.

Study limitations

There are several limitations in our study that must be recognized. First, our test–retest analysis included a small number of subjects. Thus, future studies using a larger cohort of subjects would still need to confirm our findings. Second, as outlined in Table 1, all enrolled patients were taking several pain or anti-spasticity medications during our study. Such neuro-active medications have been shown to directly affect the reliability of TMS metrics.65 Thus, even though we required patients to maintain their medication regime, which is also similar to previous studies in TMS and iSCI,31 we cannot discount that the heterogeneity in medication regimes could have influenced our results.

Conclusions

In summary, our findings suggest that test–retest reliability of TMS metrics in iSCI is dependent on the MRC grade of the investigated muscle, baseline motor function and the amount of time post SCI. As a result, we suggest that TMS metrics captured in higher MRC grade muscles will likely be the most reliable measures when assessing longitudinal functional recovery and recovery potential. However, given their relatively good reliability, TMS metrics of corticospinal excitability, corticospinal inhibition or motor map distribution in lower MRC grade muscles could still prove useful as tools in the iSCI population. Therefore, future studies using TMS metrics to longitudinally assess recovery should be cautious, as choice of muscle, TMS metrics and pre-existing patient demographics may influence the reliability of measured outcomes.