Introduction

Assessing TED and the role of imaging

A wide array of imaging techniques, summarised in Table 1, has been proposed to aid the diagnosis and management of thyroid eye disease (TED) [1]. Although most patients with TED may not require imaging, it can be necessary to assess orbital apex crowding affecting the optic nerve, to plan decompression surgery and to exclude other orbital diseases. This is particularly so with asymmetrical presentations, in which myositis, lymphoma, carotid-cavernous fistulae, varices, IgG4-related disease, meningiomas and other neoplasms should be considered [2].

Table 1 Imaging modalities for thyroid orbitopathy and their characteristics

A further, emerging role for orbital imaging is to help guide treatment and assess response to it. Several methods, principally based on magnetic resonance imaging (MRI), have been developed to quantify disease activity objectively (see Table 2). However, they have drawbacks, including confounding by signal from surrounding tissues, reliance on semi-quantitative relative values that lack day-to-day reproducibility, and the complexity of the sequences required to derive absolute values.

Table 2 MRI-based techniques used for TED assessment

Given these limitations, clinicians still rely principally on subjective scoring systems to quantify disease severity and activity. A widely used tool is the clinical activity score (CAS) [3], but it has been criticised as too binary, subjective and insensitive [4]. It is also poorly suited to monitoring improvement [5]: even marked amelioration of any CAS component does not lower the score unless that feature resolves completely [6]. Not surprisingly, more complex scoring systems have been introduced, such as the VISA [7], modified NOSPECS [8] and EUGOGO (see www.eugogo.eu) scoring tools. Fundamentally, however, these remain subjective and are cumbersome to administer [7].
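To make the binary-scoring problem concrete, the following minimal Python sketch scores a hypothetical patient in whom every sign improves by 80% yet remains present. The item list follows the commonly used 7-item CAS; the 0 to 1 severity scale, the invented severities and the `continuous_burden` function are hypothetical illustrations, not part of any published scoring tool.

```python
# Hypothetical illustration: the 7-item CAS counts each inflammatory sign as
# present (1 point) or absent (0 points), so partial improvement in a sign
# does not lower the score until that sign resolves completely.

CAS_ITEMS = (
    "spontaneous retrobulbar pain",
    "pain on attempted upward or downward gaze",
    "eyelid redness",
    "conjunctival redness",
    "eyelid swelling",
    "chemosis",
    "caruncle or plica inflammation",
)


def cas(severity: dict) -> int:
    """Score 1 point for every item with any residual severity (> 0)."""
    return sum(1 for item in CAS_ITEMS if severity.get(item, 0.0) > 0.0)


def continuous_burden(severity: dict) -> float:
    """Hypothetical graded alternative: sum of item severities on a 0-1 scale."""
    return sum(severity.get(item, 0.0) for item in CAS_ITEMS)


# Invented severities (0 = absent, 1 = maximal) before and after treatment.
before = {item: 1.0 for item in CAS_ITEMS}
after = {item: 0.2 for item in CAS_ITEMS}  # every sign improved by 80 %

print(cas(before), cas(after))  # 7 7: the CAS does not change
print(continuous_burden(before), continuous_burden(after))
```

The graded sum falls from 7 to about 1.4 while the CAS stays at 7, which is the insensitivity to improvement described above.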

Other proposed methods for assessing disease progression include serial binocular single vision mapping, measurement of proptosis [9] and Hess charts [10]. However, these may not reflect activity or severity dynamically, because the active inflammatory phase of TED can be followed by a chronic scarring phase that has a static effect on ocular motility. Serial thyroid-stimulating hormone receptor antibody levels have been proposed as a further objective, quantitative means of assessing TED activity but have not yet been widely adopted [11].

Proposed quantification of TED by modified MRI protocol

There has been great interest in using MRI to quantify disease activity, severity and response to treatment in TED, and several such approaches are summarised in Table 2. However, MRI measurements are inherently variable. This variability arises from technical factors, such as differences between scanners and the sequences used, and from physiological factors, such as changes over time (from diurnal to age related) and changes in disease activity. Signal intensities on MR images are not usually directly proportional to specific tissue properties such as density or molecular composition. This variability makes measured MR signal values unsuitable for quantitative comparison between scans, even within an individual, unless the sequences are specifically designed for quantification. To get around this, groups have used signal intensity ratios (SIRs) derived from standard clinical images, such as those generated by STIR (short tau inversion recovery) sequences. These are T2-weighted fat-suppressed images in which higher intensity reflects higher water content. The intensity of an extra-ocular muscle (EOM) is compared with that of other unaffected muscles or even non-muscular tissues [12, 13, 14], but such ratios are less sensitive. Because a ratio combines the variability of both its numerator and its denominator, small fluctuations in either can substantially alter it.
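The amplification of small signal fluctuations by a ratio can be shown with a few lines of arithmetic. The sketch below, written in Python with NumPy, computes an SIR as the mean of an EOM region of interest divided by the mean of a reference region. The 2 x 4 image patch, the ROI layout and the ±5% drift are invented for illustration; they are not taken from this study.

```python
import numpy as np


def signal_intensity_ratio(image, eom_mask, ref_mask):
    """SIR = mean signal in the EOM ROI divided by mean signal in a reference ROI."""
    return float(image[eom_mask].mean() / image[ref_mask].mean())


# Hypothetical 2 x 4 STIR patch: left two columns = EOM ROI, right two = reference ROI.
image = np.array([[170.0, 172.0, 100.0, 101.0],
                  [168.0, 170.0,  99.0, 100.0]])
eom = np.zeros(image.shape, dtype=bool)
eom[:, :2] = True
ref = ~eom

baseline = signal_intensity_ratio(image, eom, ref)

# Simulated between-scan drift: EOM signal +5 %, reference signal -5 %.
drifted = image.copy()
drifted[eom] *= 1.05
drifted[ref] *= 0.95
shifted = signal_intensity_ratio(drifted, eom, ref)

print(round(baseline, 2), round(shifted, 2))                     # 1.7 1.88
print(f"relative change in SIR: {shifted / baseline - 1:.1%}")   # 10.5%
```

Drift that affects both regions equally cancels in the ratio, which is the motivation for using SIRs. Drift that affects them differently, for example from spatially varying coil sensitivity, compounds: two 5% errors in opposite directions produce a change of roughly 10.5% in the SIR.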

Although the MR signal itself is variable, the rate at which it decays, characterised by the relaxation time, is a physical property of the tissue that can be expressed numerically. T2 (transverse) relaxation times depend strongly on tissue water content, and longer T2-relaxation times correlate with greater EOM inflammation [15, 16]. However, fat can confound these quantitative measurements, because the signal measured in a muscle represents a variable combination of water and fat.
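A T2-relaxation time is typically estimated by fitting the decay of signal across the echoes of a multi-echo acquisition. The sketch below assumes a mono-exponential model, S(TE) = S0 exp(-TE/T2), and fits it by linear least squares on the logarithm of the signal. The echo times and the noise-free signals are invented, with T2 values chosen only to be of the same order as those reported in the Results; scanner software and published studies may instead use weighted or non-linear fits, which behave better at low signal-to-noise ratio.

```python
import numpy as np


def fit_t2(echo_times_ms, signal):
    """Mono-exponential T2 fit, S(TE) = S0 * exp(-TE / T2), done as a
    straight-line fit of log(S) = log(S0) - TE / T2."""
    te = np.asarray(echo_times_ms, dtype=float)
    log_s = np.log(np.asarray(signal, dtype=float))
    slope, intercept = np.polyfit(te, log_s, 1)
    return -1.0 / slope, float(np.exp(intercept))  # (T2 in ms, S0)


# Hypothetical echo train (ms) and noise-free ROI signals.
te = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0])
normal = 1000.0 * np.exp(-te / 75.0)
inflamed = 1000.0 * np.exp(-te / 95.0)

print(fit_t2(te, normal)[0], fit_t2(te, inflamed)[0])  # approximately 75 and 95 ms
```

If the region also contains fat, this single-exponential fit returns an apparent T2 that mixes the water and fat components, which is the confounding described above.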

Fat fraction (FF) mapping is a method that takes advantage of the slightly different resonant frequencies of water and fat to generate images that separate the contribution of water and fat to the signal, allowing quantitative measurements of the FF within a tissue [17, 18].
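The principle of water–fat separation can be sketched with the simplest chemical-shift method, two-point Dixon, applied to magnitude images acquired with water and fat in phase (IP) and in opposed phase (OP). This is a simplification chosen for illustration: the study used a multi-echo fast gradient-echo approach [18], and multi-echo methods typically also correct for field inhomogeneity and T2* decay. The magnitude-only version below also assumes water-dominant voxels (FF below 50%), because it cannot tell which component is larger. The voxel values are hypothetical.

```python
import numpy as np


def dixon_fat_fraction(in_phase, opposed_phase):
    """Two-point Dixon fat fraction (%) from magnitude images.

    Assumes water-dominant voxels (FF < 50 %), so that
        IP = W + F   and   OP = W - F,
    which gives W = (IP + OP) / 2, F = (IP - OP) / 2 and FF = F / (W + F).
    """
    ip = np.asarray(in_phase, dtype=float)
    op = np.asarray(opposed_phase, dtype=float)
    water = (ip + op) / 2.0
    fat = (ip - op) / 2.0
    return 100.0 * fat / (water + fat)


# Hypothetical voxel with water = 86 and fat = 14 signal units:
# IP = 86 + 14 = 100, OP = 86 - 14 = 72.
print(dixon_fat_fraction(100.0, 72.0))  # 14.0
```

The function is vectorised, so the same call applied to whole IP and OP images returns a voxel-wise FF map.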

In this study, we sought to compare T2-relaxation mapping to STIR SIR and, for the first time, to calculate FFs for EOMs. The combination of FF data with T2-relaxation times may refine signal quantification and could then represent a robust quantitative measure of disease activity.

Methods

Between 2014 and 2016, 62 patients with TED and six healthy controls were recruited to this study, which was approved by our hospital's internal ethics review board.

All patients underwent a full ophthalmological work-up, including determination of the CAS and blood tests. MRI of the orbits was performed on a 1.5 T scanner (450, GE Healthcare, Waukesha, WI, USA) and included coronal STIR, T2-weighted multi-echo fast spin-echo and multi-echo fast gradient-echo sequences [18] (Fig. 1). The sequences for T2-relaxation and FF mapping added 10 min to the scanning time.
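The echo times used in the multi-echo gradient-echo acquisition are not reported here, but the timing that chemical-shift methods exploit can be worked out from the field strength. The sketch below assumes a single fat peak 3.4 ppm from water and a proton gyromagnetic ratio of 42.577 MHz/T; chemical-shift-encoded methods commonly model several fat peaks, so this is an approximation.

```python
def phase_cycle_echo_times_ms(b0_tesla, n=3, shift_ppm=3.4, gamma_bar_mhz_per_t=42.577):
    """Echo times (ms) at which water and a single fat peak `shift_ppm` away
    are opposed in phase (half cycles) or back in phase (whole cycles)."""
    delta_f_hz = shift_ppm * gamma_bar_mhz_per_t * b0_tesla  # ppm x MHz = Hz
    period_ms = 1000.0 / delta_f_hz  # time for one full relative phase cycle
    opposed = [(k + 0.5) * period_ms for k in range(n)]
    in_phase = [(k + 1) * period_ms for k in range(n)]
    return opposed, in_phase


opposed, in_phase = phase_cycle_echo_times_ms(1.5)
print("opposed-phase TEs (ms):", [round(t, 2) for t in opposed])
print("in-phase TEs (ms):     ", [round(t, 2) for t in in_phase])
```

At 1.5 T the frequency offset is about 217 Hz, so the first opposed-phase and in-phase echoes fall at roughly 2.3 ms and 4.6 ms.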

Fig. 1

Images of the orbits of a patient with thyroid eye disease. a Coronal short tau inversion recovery (STIR): T2-weighted fat-suppressed image showing enlargement and heterogeneous hyperintensity of extra-ocular muscles (arrowheads). b Coronal T1-weighted image: areas of T1 hyperintensity within the muscles are consistent with fatty change (arrowheads). c T2-relaxation map shows areas of longer T2-relaxation (higher intensity) in the right inferior rectus muscle (arrow) compared to the left (arrowhead). d Fat fraction map shows higher fat fraction (higher intensity) in the left inferior rectus muscle (arrowhead) compared to the right (arrow)

Twelve patients who required active management with immunosuppression (cyclosporine and prednisolone) were re-scanned after treatment.

STIR SIRs, T2-relaxation times and percentage FFs were measured from regions of interest in the muscle belly of the inferior, lateral, superior and medial rectus muscles bilaterally in all individuals by a neuroradiologist with more than 7 years' experience who was blinded to the clinical data. Pearson's correlation between T2 and SIR was then calculated for each individual muscle. Unpaired t tests were performed to investigate whether T2, SIR and FF differed between TED patients and controls. Paired t tests were performed to investigate whether T2 and SIR differed before and after immunosuppressive treatment. For FF comparisons, the distributions were log transformed to meet the normality assumption of standard parametric tests. Statistical analysis was performed in R (The R Foundation for Statistical Computing, Vienna, Austria).
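The analysis was performed in R; for readers working in Python, the sketch below shows the corresponding SciPy calls for the correlation and the unpaired comparisons, including the log transform for FF. All arrays are invented values, not study data. Welch's unequal-variance t test is assumed because it is the default of R's `t.test`; the text does not state which variant was used.

```python
import numpy as np
from scipy import stats


def compare_groups(ted, control, log_transform=False):
    """Unpaired t test between groups (Welch's variant assumed).
    Set log_transform=True for FF values, as described above."""
    a = np.log(ted) if log_transform else np.asarray(ted, dtype=float)
    b = np.log(control) if log_transform else np.asarray(control, dtype=float)
    return stats.ttest_ind(a, b, equal_var=False)


# Invented illustrative values (one value per muscle ROI), not study data.
t2_ted = np.array([92.0, 101.0, 88.0, 97.0, 110.0, 85.0])   # ms
t2_ctl = np.array([74.0, 78.0, 70.0, 81.0, 73.0, 76.0])     # ms
sir_ted = np.array([1.6, 2.0, 1.5, 1.8, 2.2, 1.4])
ff_ted = [12.0, 20.5, 9.8, 15.1]                            # %
ff_ctl = [8.0, 9.5, 11.2, 7.9]                              # %

r, p_r = stats.pearsonr(t2_ted, sir_ted)
t2_res = compare_groups(t2_ted, t2_ctl)
ff_res = compare_groups(ff_ted, ff_ctl, log_transform=True)

print(f"T2 vs SIR: r = {r:.2f}, p = {p_r:.3f}")
print(f"T2: t = {t2_res.statistic:.2f}, p = {t2_res.pvalue:.4f}")
print(f"log FF: t = {ff_res.statistic:.2f}, p = {ff_res.pvalue:.4f}")
```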

Results

We first investigated whether T2-relaxation times correlated with SIR. As shown in Fig. 2a, there was a positive correlation between T2 and SIR for both the TED and normal groups (r = 0.43, p < 0.001).

Fig. 2

a Scatter plot showing a positive correlation between T2-relaxation times and STIR SIR in extra-ocular muscles in TED patients and healthy controls (r = 0.42 in TED patients). However, the groups are poorly separated along the SIR axis and better separated by T2-relaxation time. b Comparison of mean (±SD) T2-relaxation times for individual extra-ocular muscles across the TED and control groups. T2-relaxation time is higher in the TED patients for each extra-ocular muscle. c Comparison of mean (±SD) SIR for individual extra-ocular muscles across the TED and control groups. There was no statistically significant difference in mean SIR of any of the measured extra-ocular muscles between the two groups. d Boxplot showing median and range of fat fractions for each extra-ocular muscle compared between TED and control groups. Significant differences were detected in some of the extra-ocular muscles. Mean FF for all muscles as a group was significantly different between TED and control groups (p < 0.001). (R = right, L = left, MR = medial rectus, IR = inferior rectus, SR = superior rectus, LR = lateral rectus. For p values: ***p < 0.001, **p < 0.01, *p < 0.05, ns non-significant.)

Mean T2 for each rectus muscle category differed significantly between the TED and normal groups (combined mean T2 TED: 93.92 ± 12.80 ms (95% CI: 92.74–95.11); normal: 75.30 ± 7.96 ms (95% CI: 73.20–78.25), p < 0.001; Fig. 2b). However, there was no significant difference in mean SIR (combined mean SIR TED: 1.71 ± 0.32 (95% CI: 1.68–1.74); normal: 1.74 ± 0.25 (95% CI: 1.66–1.82), p = 0.52; Fig. 2c).
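Confidence intervals such as those reported alongside each mean ± SD can be derived from summary statistics. The sketch below assumes a t-based interval for a mean, mean ± t(1 − α/2, n − 1) · SD / √n; the text does not state how the intervals were computed or how many ROI values entered each one, so the inputs here are invented.

```python
import math

from scipy import stats


def mean_ci(mean, sd, n, level=0.95):
    """t-based confidence interval for a mean from summary statistics:
    mean +/- t_{(1 + level)/2, n - 1} * sd / sqrt(n)."""
    t_crit = stats.t.ppf((1.0 + level) / 2.0, df=n - 1)
    half_width = t_crit * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


# Invented summary statistics: the same mean and SD with few vs many ROI values.
for n in (25, 400):
    lo, hi = mean_ci(80.0, 10.0, n)
    print(f"n = {n:3d}: 95% CI {lo:.2f} to {hi:.2f}")
```

Because the half-width scales with 1/√n, a group contributing many ROI measurements (such as the patient group) has a much narrower interval than a small control group with a similar SD.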

Overall mean FF was significantly greater in TED patients than in healthy volunteers (mean FF TED: 13.98 ± 7.0% (95% CI: 13.33–14.64), normal: 9.27 ± 3.14% (95% CI: 8.38–10.15), p < 0.001). There was substantial variability in the mean FF for each muscle in the patient group (Fig. 2d).

As shown in Fig. 3, mean T2 and mean SIR both differed significantly between pre- and post-treatment examinations in the 12 TED patients treated with immunosuppression (mean difference in T2: 12.05 ± 17.79 ms, p < 0.001; mean difference in SIR: 0.26 ± 0.56, p = 0.001). At the individual level (Fig. 3c), a reduction in muscle T2 generally accompanied a reduction in CAS.
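A paired t test depends only on the pre-minus-post differences: t = mean difference / (SD of differences / √n), with n − 1 degrees of freedom. The sketch below computes it from summary statistics and checks the result against SciPy's `ttest_rel` on raw pairs. The four pre/post T2 values are invented, not study data. Note that n is the number of pairs, so the same mean and SD of differences give a smaller p value when each muscle, rather than each patient, counts as a pair.

```python
import math
import statistics

from scipy import stats


def paired_t_from_differences(mean_diff, sd_diff, n_pairs):
    """Paired t test from the mean and SD of the (pre - post) differences;
    returns t and the two-sided p value with n_pairs - 1 degrees of freedom."""
    t = mean_diff / (sd_diff / math.sqrt(n_pairs))
    p = 2.0 * stats.t.sf(abs(t), df=n_pairs - 1)
    return t, p


# Invented pre/post T2 values (ms) for four hypothetical muscles.
pre = [100.0, 95.0, 110.0, 90.0]
post = [90.0, 92.0, 96.0, 85.0]
diffs = [a - b for a, b in zip(pre, post)]

t, p = paired_t_from_differences(statistics.mean(diffs), statistics.stdev(diffs), len(diffs))
check = stats.ttest_rel(pre, post)  # the same test computed from the raw pairs
print(f"summary: t = {t:.3f}, p = {p:.4f}")
print(f"ttest_rel: t = {check.statistic:.3f}, p = {check.pvalue:.4f}")
```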

Fig. 3

The mean T2-relaxation time (a) and SIR (b) in extra-ocular muscles were higher on pre-treatment baseline examinations than on post-treatment examinations (error bars: standard deviation; a ***p < 0.001; b *p = 0.04). c Bubble plots of 12 individual cases with pre- and post-treatment examinations. Serial examinations are represented on the x-axis in arbitrary units of time. Maximal T2-relaxation times are plotted on the y-axis, with CAS represented by the area of the bubbles (annotated for each data point). In most cases, T2-relaxation time for the worst-affected muscle correlated with CAS, with both decreasing after treatment. However, in cases 5, 7 and 9 (highlighted by darker shading), there is initially an increase in T2-relaxation time despite a decrease in recorded CAS. In case 8, there is initially a decrease in T2-relaxation time before a decrease in CAS is detected (see text for further discussion)

Discussion

In this feasibility study, we sought to evaluate whether existing MRI protocols can measure muscle inflammation and disease activity in TED patients. We show that measurement of T2-relaxation time is a more robust method for assessing disease activity than STIR SIR. Because fat is implicated in the disease process in TED and is a potential confounder of T2-relaxation measurements, we also describe the novel measurement of FF in EOMs.

T2-relaxation times clearly differentiated abnormal from normal EOM populations and pre-treatment from post-treatment status, and correlated with CAS. The correlation of T2-relaxation time with CAS and with post-treatment status has been investigated previously [15, 16]. However, the significant difference demonstrated in this study between TED patients and controls suggests that normative values could be established. T2-relaxation time appears to be a reliable measure of disease activity that allows individual muscles to be assessed.

Although T2-relaxation time correlates with SIR, which has also been shown to correlate with CAS and post-treatment status [12, 19, 20], the lack of a statistically significant difference in SIR between TED patients and controls at the level of individual muscles is striking. It suggests that SIR is less discriminatory and unsuitable for quantitative assessment of disease activity. SIR measurements require the choice of an appropriate but arbitrary denominator, such as the signal intensity of temporalis muscle or of normal-appearing white matter on the same image as the measured EOM. T2-relaxation time, as an estimate of an absolute value, is largely independent of the machine- and sequence-specific parameters that may affect SIR measurements and is therefore a more robust measure for longitudinal and cross-sectional comparisons.

In the temporal correlation between maximal T2-relaxation times and CAS (Fig. 3c), there were at least three cases with contradictory assessments (highlighted by darker shading). In each of these, CAS decreased despite an apparent increase in T2-relaxation time that suggested increased disease activity. Interestingly, clinical follow-up supported the radiological assessments, suggesting that T2 relaxation is a more sensitive marker of inflammation and a more dynamic measure of change. Specifically, in case 5, MRI detected disease activity before it was reflected in the CAS. In case 7, the MRI results contradicted the high initial CAS; however, this CAS was falsely elevated by anterior segment inflammation unrelated to TED. Finally, in case 9, MRI detected myositis before it became apparent on the CAS. In addition, case 8 shows an overall correlation between T2 and CAS, but the decrease in T2 was measurable before any decrease in CAS, which remained at 7 and only later fell to 2. As discussed earlier, this reflects the insensitivity of the CAS to change: the score falls only when signs or symptoms resolve entirely, not when they improve substantially but persist.

This study demonstrates the feasibility of measuring FF in EOMs, thereby quantifying the relative contributions of fat and water to the T2 signal. The increased FF within EOMs in TED patients is consistent with the suggestion that fatty infiltration and/or adipogenesis occurs within the muscles during the course of the disease [21, 22]. Interestingly, the data also show that EOMs in healthy volunteers contain a measurable proportion of fat. Higher FFs may ultimately reflect disease chronicity. Thus, the ability to distinguish the contribution of water and hyaluronic acid (presumed to reflect inflammation) from that of fat (presumed to reflect chronicity) may help guide treatment decisions.
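The text does not specify how FF and T2 data would be combined. One possible approach, offered here only as a hypothetical sketch, is a two-compartment decay model in which the FF from an independent fat-fraction map is held fixed and only the water T2 is fitted: S(TE) = S0 [(1 − FF) exp(−TE/T2w) + FF exp(−TE/T2f)]. The fat T2 of 130 ms, the synthetic signal and the treatment of the gradient-echo FF as the fat share of S0 are all assumptions. The example compares the result with a naive mono-exponential fit, which is biased upwards by the longer-T2 fat component.

```python
import numpy as np
from scipy.optimize import minimize_scalar


def apparent_t2(te_ms, signal):
    """Mono-exponential T2 from a log-linear fit (ignores fat)."""
    slope, _ = np.polyfit(te_ms, np.log(signal), 1)
    return -1.0 / slope


def water_t2_given_ff(te_ms, signal, ff, t2_fat_ms=130.0):
    """Fit water T2 with FF fixed from a fat-fraction map (hypothetical model):
        S(TE) = S0 * [(1 - ff) * exp(-TE/T2w) + ff * exp(-TE/T2f)]
    T2f is an assumed constant. For each trial T2w the best S0 has a
    closed-form least-squares solution, so only T2w is searched."""
    te = np.asarray(te_ms, dtype=float)
    s = np.asarray(signal, dtype=float)

    def model_shape(t2w):
        return (1.0 - ff) * np.exp(-te / t2w) + ff * np.exp(-te / t2_fat_ms)

    def residual(t2w):
        m = model_shape(t2w)
        s0 = (m @ s) / (m @ m)  # least-squares amplitude for this T2w
        return float(np.sum((s - s0 * m) ** 2))

    res = minimize_scalar(residual, bounds=(10.0, 300.0), method="bounded")
    return res.x


# Synthetic, noise-free ROI: 15 % fat, water T2 80 ms, assumed fat T2 130 ms.
te = np.arange(10.0, 90.0, 10.0)
ff_true, t2w_true, t2f = 0.15, 80.0, 130.0
sig = 1000.0 * ((1 - ff_true) * np.exp(-te / t2w_true) + ff_true * np.exp(-te / t2f))

print(f"apparent T2 = {apparent_t2(te, sig):.1f} ms")                # biased upwards by fat
print(f"water T2    = {water_t2_given_ff(te, sig, ff_true):.1f} ms")  # approximately 80 ms
```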

The limitations of this approach include the manual placement of regions of interest within a single slice through each EOM. The superior and lateral rectus muscles have smaller cross-sectional areas; the superior rectus is not easily separated from the levator palpebrae superioris, and the lateral rectus lies at an oblique angle to the coronal plane in which the images are acquired. The regions of interest could also include voxels at the edges of the muscle, where there may be artefact from the interface between muscle and orbital fat. These factors may account for some of the variability in FF measurements in both patients and controls. However, the variability is also consistent with the asymmetric and heterogeneous nature of the disease. There may also be an unknown effect of age, as the control population was not age matched. Furthermore, the patient population was unselected in terms of clinical severity and chronicity.

In conclusion, quantitative T2-relaxation time measurements of EOMs in TED correlate with CAS and with MRI SIR measurements. However, T2-relaxation measurements better discriminate abnormal from normal populations. Although this is an attractive objective measure of disease activity, it may be confounded by the presence of fat within muscles. We address this by concurrent measurement of FFs, so that this combined MR approach may allow improved quantification of disease activity and monitoring of treatment effect in individual patients. Our physiological assessments of inflammation occasionally contradicted the CAS, raising questions about the validity of clinical scoring of disease activity.