Introduction

Cortical grey matter has a profound role in the pathophysiology of multiple sclerosis, being present in early stages of the disease and being associated with disease progression and conversion1,2,3,4. Cortical lesions are highly disease specific, and hence included in the multiple sclerosis diagnostic criteria5. Nevertheless, cortical lesions are inconspicuous on MRI: histopathological sensitivity of conventional clinical sequences (i.e., T1, T2, and fluid-attenuated inversion recovery (FLAIR)) lies beneath 10%6,7,8. Double inversion recovery (DIR) has been found to be more sensitive (i.e., 18–23%), and highly pathologically specific (i.e., > 90%). With DIR imaging, signals from both the cerebrospinal fluid and the white matter are selectively suppressed, providing an image in which the grey matter is emphasized9. However, DIR sequences are not pre-programmed on all MR systems and are time-consuming in their acquisition. Consequently, they are not readily available in all hospitals and are omitted in most clinical trials.

This issue has recently been mitigated by the introduction of artificially generated DIR images10,11. This technology, that has been used in different settings in medical imaging before12,13, aids the availability of DIR in the clinical settings and enables ex post facto implementation of DIR in e.g. clinical trials cohorts. Aided by convolutional neural networks, DIR images can be generated from a combination of other –e.g. conventional– sequences, such as T1-weighted and PD/T2 or T1-weighted and FLAIR. Artificially generated DIR images were found to detect an equal number of cortical lesions compared to conventionally acquired DIR images; however, histopathological validation is lacking to date10,11.

The objective of the current work was to compare the sensitivity and specificity of artificially generated DIR images for cortical lesion detection in patients with multiple sclerosis, set against conventionally acquired DIR images and conventional (‘standard’) 3D-T1-weighted images, using histopathology as gold standard. Thereby, we aim to assess whether artificially generated DIR images could potentially supplement conventional clinical images in hospitals and clinical trials.

Results

Following in situ MRI and tissue availability, MRI data from thirty-eight patients (mean age 63.3 years, standard deviation (SD) 10.7 years; 31% male; mean postmortem delay at commencement of autopsy protocol 3:55 h SD 0:57 h) with progressive multiple sclerosis (mean disease duration 26.6 years, SD 10.9 years; 25 secondary progressive, 8 primary progressive, 5 unknown) were included for cortical lesion scoring. Thirteen patients were scanned using a 1.5 T Siemens Sonata system, twenty-five patients were scanned using a 3 T GE Signa HDxt system. From twenty-three patients, tissue samples were available. An overview of patient details is provided in Table 1. Overviews of the imaging training results (i.e., input and output) are provided in Figures S1 and S2. A total of 142 cortical lesions were identified in 66 tissue samples. Of these lesions, 6 were type I, 47 were type II, 76 were type III and 13 were type IV. In addition, 7 white matter lesions were detected.

Table 1 Demographic and clinical measures.

Histopathology-blinded lesion detection

The number of detected cortical lesions and the sensitivity measures corresponding to those numbers are displayed in Table 2. No statistical differences were found between conventional DIR sequences and artificial DIR sequences (both generated from T1-PD/T2 and T1-FLAIR) and 3D-T1 in the histopathology-blinded scoring. Descriptively, conventional DIR detected few more cortical lesions than T1-PD/T2 generated DIR, with 25 versus 19 cortical lesions, and also few more cortical lesions than T1-FLAIR generated DIR, with 25 versus 20 cortical lesions. The largest differences were found in type III lesions, of which 12 were detected on conventional DIR and 3 on 3D-T1. In Fig. 1, an example of a cortical lesion that was scored blinded to histopathology is depicted. The difference between conventional DIR and 3D-T1 closely approached, but fell short of significance (P = 0.07).

Table 2 Lesion count (sensitivity in %) MRI scoring.
Figure 1
figure 1

Available at https://adobe.com/products/illustrator).

Example of a type IV cortical lesion in the insular cortex, that is a priori visible on both artificially generated and conventionally acquired DIR images but not on standard 3D-T1. (A) Coronal slice in which the histopathological sample is indicated by the rectangle; the arrow indicates a visible type IV lesion. (B) Histopathological tissue sample with the type IV lesion indicated by the red line and the cortex by the black line. (C) Conventionally acquired DIR image with the type IV lesion indicated by the arrow; the rectangle demarcates the inset that is being shown in images D-F. (D) Inset of the corresponding artificial DIR image generated from 3D-T1 and 3D-FLAIR with the type IV lesion indicated by the arrow. (E) Inset of the corresponding artificial DIR image generated from 3D-T1 and PD/T2, with the type IV lesion indicated by the arrow. (F) Inset of the corresponding standard 3D-T1 image, in which the type IV lesion is not visible (generated using Adobe Illustrator—Adobe Inc., 2019. Adobe Illustrator,

Histopathology-blinded intra-rater intraclass correlation coefficients were 0.986 for conventional DIR, 0.974 for T1-PD/T2 generated DIR, 0.997 for T1-FLAIR-generated DIR, and 0.979 for 3D-T1. Histopathology-blinded inter-rater intraclass correlation coefficients were 0.951 for conventional DIR, 0.946 for T1-PD/T2 generated DIR, 0.941 for T1-FLAIR generated DIR and 0.800 for 3D-T1.

A total of eight false positives (areas that were scored as cortical lesions but did not appear to be lesions upon histopathological validation) were found in the data: 1 out of 25 on conventional DIR (4.0%), 2 out of 19 on T1-PD/T2 generated DIR (10.5%), 2 out of 20 on T1-FLAIR generated DIR (10.0%) and 3 out of 15 on 3D-T1 (20.0%). Specificity was 96.0% for conventional DIR, 89.5% for T1-PD/T2 generated DIR, 90.0% for T1-FLAIR generated DIR, and 80.0% for 3D-T1. An example of a partially demyelinated/remyelinating area that was scored as cortical lesion on both artificially generated and conventionally acquired DIR images is shown in Fig. 2.

Figure 2
figure 2

Available at https://adobe.com/products/illustrator).

False-positive. Example of a ‘false-positive’, an area of partial demyelination or remyelination that has been scored as a cortical lesion on all (conventional and artificial) DIR sequences. The panels depict the conventionally acquired DIR image in axial fashion (A), a coronal image as a reference for the histopathological sample (B), the histopathological sample that has been stained for myelin (C), 3D-T1 (D), artificial DIR generated from 3D-T1 and PD/T2 (E), and artificial DIR generated from 3D-T1 and FLAIR (F). The partial demyelination/remyelination area is indicated by the red arrow on all panels (generated using Adobe Illustrator—Adobe Inc., 2019. Adobe Illustrator,

Lesion discernibility unblinded to histopathology

Prior to commencing the unblinded scoring, the reproducibility of MRI-histopathology matching (as displayed in Fig. 3) was evaluated by repeating the manual identification of MRI regions for 34% (n = 27 samples) of the data, and showed a reproducibility of 92.6%. In the unblinded scoring, i.e. direct comparison of the tissue samples matched to MRI, an increase in detection was found for all sequences. Conventionally acquired DIR outperformed artificial DIR from T1 and PD/T2 (P = 0.02) and 3D-T1 (P = 0.01) with 70 vs. 50 and 49 lesions, respectively. These differences were predominantly driven by type III lesions. With histopathological feedback available, there were no differences between conventionally acquired DIR and artificial DIR from T1 and FLAIR, nor between the other sequences. There were no differences in terms of specificity of the images that were generated from 1.5 T and 3 T data.

Unblinded intra-rater intraclass correlation coefficients were 0.929 for conventional DIR, 0.960 for T1-FLAIR generated DIR, 0.940 for T1-PD/T2 generated DIR, and 0.710 for 3D-T1. Unblinded inter-rater intraclass correlation coefficients were 0.909 for conventional DIR, 0.800 for T1-PD/T2 generated DIR, 0.750 for T1-FLAIR generated DIR and 0.833 for 3D-T1.

Figure 3
figure 3

Available at https://adobe.com/products/illustrator).

Overview of the histopathological tissue sample to MRI matching procedure. (A) The coronal double inversion recovery (DIR) MRI is matched to (B) a photograph of the tissue sample. Then, the coronal DIR MRI (C) is matched to the histopathological sample (D) coming from the tissue sample. Next, the cross-hair is used to translate the histopathological tissue sample location from the coronal DIR (E) to the axial DIR (F); (generated using Adobe Illustrator—Adobe Inc., 2019. Adobe Illustrator,

Image contrasts

Contrast ratios of histopathologically validated cortical lesions to surrounding normal appearing grey matter and grey matter to white matter for the included MRI sequences are described in Table 3. Contrast ratios for cortical lesions to normal-appearing grey matter were 0.19 for conventionally acquired DIR, 0.03 for artificially generated DIR from 3D-T1 and PD/T2, 0.04 for artificially generated DIR from 3D-T1 and FLAIR, and 0.02 for 3D-T1. For cortical lesions to normal appearing grey matter contrast ratios, differences between sequences were statistically significant for conventional DIR vs. both variants of artificial DIR (P = 0.003 for DIR from T1 and FLAIR, and P = 0.006 for DIR from T1 and PD/T2) and for conventional DIR vs. 3D-T1 (P = 0.018). There were no further significant differences between sequences.

Table 3 Contrast ratios.

Contrast ratios for normal-appearing grey matter to white matter were 3.81 for conventionally acquired DIR, 0.48 for artificially generated DIR from 3D-T1 and PD/T2, 0.58 for artificially generated DIR from 3D-T1 and FLAIR, and 0.10 for 3D-T1. Grey matter to white matter contrast ratio differences were significant between conventional DIR and both variants of artificially generated DIR (P = 0.001 for T1-FLAIR DIR and P = 0.001 for T1-PD/T2 DIR) as well as conventional DIR vs. standard 3D-T1 (P = 0.001). Differences in grey matter to white matter contrast ratios were also significant for both variants of artificially generated DIR vs. standard 3D-T1 (P = 0.001 for T1-FLAIR DIR and P = 0.001 for T1-PD/T2 DIR).

Common artefacts

Although the artificially generated DIR images were less noisy than the conventionally acquired DIR images, some artefacts were detected in the data. Detected artefacts included ‘rim-artefacts’ (i.e. a hyperintense rim-forming in the cortex), ‘point-artefacts’ (i.e., black pixels throughout the image; potentially arising from zero-values in the 3D-T1 input images) and ‘patch-artefacts’ (i.e. subtle lines remaining from patch-wise augmentation of the images), which were detected in data from different MR systems and different ground contrasts. Rim-forming artefacts were visible in all artificial DIR images. Point-artefacts were detected in both variants of the artificially generated DIR images (3D-T1 and PD/T2 and 3D-T1 and 3D-FLAIR) which had their ground contrasts acquired on the GE MR system, whereas patch-artefacts were visible in artificially generated DIR from 3D-T1 and PD/T2 but not 3D-T1 and 3D-FLAIR from the Avanto system. In all artificially generated DIR images, the cortex showed constant but subtle hyperintensity, making it more difficult to distinguish between the different cortical layers. The artefacts that were noted are shown in Fig. 4.

Figure 4
figure 4

Available at https://adobe.com/products/illustrator).

Artefacts in artificially generated images. (A) Artificial double inversion recovery (DIR) image generated from 3D-T1 and PD/T2 in which the arrows indicate ‘point artefacts’ and the chevrons indicate ‘rim artefacts’ in the cortex, which are also visible on artificial DIR images generated from 3D-T1 and fluid-attenuated inversion recovery (FLAIR). (C) Conventionally acquired counterpart of the artificially generated images in (A) and (B). (D) ‘Patch artefact’ in artificial DIR image generated from 3D-T1 and PD/T2, indicated by the arrows. (E) Corresponding artificial DIR generated from 3D-T1 and FLAIR and (F) corresponding conventionally acquired DIR (generated using Adobe Illustrator—Adobe Inc., 2019. Adobe Illustrator,

Discussion

Cortical multiple sclerosis lesions are better discernible on double inversion recovery images, but these are often unavailable. We have evaluated the sensitivity and histopathological specificity of artificially generated double inversion recovery images in clinically and pathologically confirmed patients with multiple sclerosis. With a combined postmortem in situ brain MRI and histopathology approach, we have found no differences in histopathology-blinded lesion detection between conventional DIR images and two variants of artificially generated DIR images. This implicates that artificially generated DIR images could substitute conventional DIR images when unavailable or in case of limited acquisition time.

Vis-à-vis comparison of histopathology-blinded and unblinded detection rates showed that a priori detection of cortical lesions could be improved on all sequences, although most on artificial DIR images generated from 3D-T1 and 3D-FLAIR. Differences between the two variants of artificially generated DIR images in cortical lesion detection potential could be attributed to the input contrasts, 2D-PD/T2 vs. 3D-FLAIR. The latter had a higher resolution and hence more potential to pick up small signal anomalies. Thus, with the current sequence parameters, artificial DIR images generated from 3D-T1 and 3D-FLAIR have the most potential for cortical lesion detection.

Artificially generated DIR images have been shown sensitive to cortical multiple sclerosis lesions, but none of these works included histopathological validation10,11. The relatively low sensitivity (i.e., < 25%) and high specificity (i.e., > 90%) of conventional DIR that were found are consistent with previous works6,7. Contrasting to the existing literature, no significant difference between conventional DIR and T1 was found in the number of lesions that were detected blinded to histopathology. However, previous works using histopathology as gold standard included 2D-T1-weighted images, while we included 3D-T16,7. Nonetheless, we do see –concordant with former literature– 3D-T1-weighted images falling behind in terms of lower intra- and inter-rater scores. Furthermore, 3D-T1-weighted images have lower histopathological specificity compared to conventionally acquired and both variants of the artificially generated DIR images, implying that standard 3D-T1 weighted images are more difficult to assess for cortical lesions.

In both conventional and artificial DIR images, > 75% of cortical lesions is missed a priori—a recurring issue in MRI-histopathology studies6,7,8,18,19. Thus, detection potential of artificial DIR images does not surpass that of conventionally acquired DIR images. This could be due to low contrast ratios in the images, in combination with small cortical lesion size. Furthermore, like conventionally acquired DIR images, artificially generated DIR images are host to artefacts, albeit of a different nature. These artefacts did not affect the sensitivity and specificity of the artificially generated DIR images in the current sample. Further development of the algorithm (e.g. adjusting patch-wise augmentation component) should suffice to increase the contrast ratios and minimize or remove these artefacts, which might increase sensitivity and specificity of the artificially generated images.

The body of literature examining the clinical relevance of cortical lesion detection is consistently increasing. In a recent study, cortical lesions were found to be among the main correlates of disease progression and long-term invalidity20. A priori cortical lesion detection remains intricate, implicating that reader training could be improved or that the scoring criteria could be adjusted16. Another way to mitigate this detection issue would be through automated detection of lesions. However, performance of such detection tools has not yet approached manual lesion detection rates21,22,23. Other combinations of input image contrasts could also be used to increase cortical lesion detection: artificial DIR (or PSIR) images could be generated from e.g. T2* combined with MPRAGE sequences, since (combinations of) these sequences have been shown fruitful for cortical lesion detection several times before6,24,25,26,27.

Limitations to this work include that the 3D-T1 and 3D-FLAIR sequence had suboptimal nulling in some patients, due to changes in the brain after death28. Standardized inversion time, and nulling of 3D-T1 was suboptimal for some patients due to range in postmortem delay. Also, there were no control patients included to assess whether the algorithm might have shown cortical lesions in controls as well. However, former work with artificial DIR images that were generated using a similar algorithm included controls and did not show any lesions in the latter10. Moreover, given the retrospective character of this study we were limited to the use of coronal brain slices whilst—concordant with the literature- the images were assessed in axial fashion. However, translation from axial to coronal plane was automated using the cross-hair function embedded in FSL View in order to prevent mistakes in translation. The a priori sensitivity of cortical lesions was not hampered by this additional processing step. A strong characteristic of the current work is that the ground contrasts from which the artificial DIR images were generated were acquired on different MRI scanners, underlining the scanner-robustness of the algorithm whilst maintaining sensitivity for cortical lesions. Future works should also include multi-centre validation of artificially generated images, having both input data and readers from different centres.

In conclusion, artificial DIR images, particularly generated from a combination of conventional clinical 3D-T1 and 3D-FLAIR images, could potentially substitute conventional DIR images for cortical lesion detection in patients with multiple sclerosis. This offers opportunities when conventional DIR images are unavailable, but also for ex post facto addition to clinical trial cohorts in which DIR had not been included a priori. Cortical lesion sensitivity for both artificially generated and conventionally acquired DIR images could be increased, as our results show potential for improvement of the a priori detection rates.

Materials and methods

Patients and autopsy procedure

Post-mortem imaging data and tissue samples were retrospectively obtained through the standardized Amsterdam MS Center rapid autopsy protocol between 2012 and 201814,15. The protocol included in situ MR brain imaging, followed by brain extraction from cranium. Two MR systems were used for scanning, a 1.5 T system and 3 T system. Details of the MR protocol, including 3D-T1, 2D-PD/T2, 3D-FLAIR and 3D-DIR sequences, are provided in the supplementary methods. After in situ imaging, the brain was cut into coronal slices, from which tissue samples were sectioned: in part following a standardized protocol and in part following discernible pathology.

Standard protocol approvals, registrations and patient consents

Prior to death, all patients had registered with the Dutch Brain Bank, thereby giving informed consent for the use of their tissue and medical records for research purposes. Permission for the autopsy protocol was granted by the Amsterdam UMC institutional ethics review board, approval number 2009/148. Histopathology of 11/38 patients has been used for validation of different MR sequences before6. All procedures were performed in accordance with the applying guidelines and regulations.

MR image pre-processing and generation of artificial MR images

In brief, MR image pre-processing was performed using the Insight Toolkit (ITK; https://itk.org) and functional MRI of the brain (FMRIB) Software Library (FSL version 5.0.4; http://fsl.fmrib.ox.ac.uk) and consisted of rigid co-registration with MNI standard space using FLIRT (part of FSL) and spline interpolation. An extensive description of the network is provided elsewhere10. In short, two variants of artificial DIR images were generated through training of a convolutional neural network to predict 3D-DIR images, either from a combination of 3D-T1 and 3D-FLAIR images or from a combination of 3D-T1 and 2D-PD/T2 images. The network architecture combined a competing generator and discriminator network: the generator was trained to produce artificial 3D-DIR images from either 3D-T1 and FLAIR or from 3D-T1 and PD/T2, whereas the discriminator was trained to discriminate between artificially generated DIR images and their conventionally acquired counterparts. The network was trained using input data from different vendors. A total of 200 epochs were trained, using a two-folded data structure in which the patients were randomly appointed to train and test set in a 2:1 ratio. An overview of the train and test sets is presented in Table S1. Furthermore, patch-wise data-augmentation was performed. Optimization of the algorithm and initial quality assessment was performed through visual inspection of the artificially generated images, compared to their conventionally acquired counterparts.

Histopathology-blinded lesion detection

To minimize patient recognition by the reader, all images were randomly ordered and flipped prior to scoring. Cortical lesions were manually scored in Medical Image Processing, Analysis and Visualization software (MIPAV; version 10.0.0, Centre for Information Technology, National Institutes of Health, Bethesda, MD, USA). Cortical lesions were scored in axial fashion following consensus guidelines developed by the MAGNIMS group for both artificially generated and conventionally acquired DIR images: cortical lesions had to be hyperintense areas compared to the surrounding grey matter, of at least 3 mm216. For every scored lesion, adjacent slices were assessed to prevent scoring artefacts, e.g. from cortical vessels that can followed through adjacent slices. Lesion scoring was performed by P.M.B. (3 years of experience in cortical lesion scoring). Histopathology-blinded and unblinded inter-rater agreement was evaluated with J.J.G.G. (> 15 years of experience) for a subset of five patients for all sequences. Furthermore, histopathology-blinded intra-rater agreement was determined for all sequences of a random subset of five patients.

Histopathological staining and lesion identification

Brain tissue was stained for proteolipid protein (myelin; PLP)—the histopathological staining protocol is provided in the supplementary methods. After tissue staining and histopathology-blinded MRI scoring, histopathological lesions were identified as areas of complete absence of myelin (i.e., lack of PLP). Cortical lesions were also subdivided into four types based on their position in the cortex: mixed grey-white matter lesions (type I; leukocortical lesions), purely intracortical (type II; intracortical lesions), subpial (type III) or pan-cortical (type IV) lesions17.

Matching and unblinded MRI scoring

After histopathology-blinded MRI scoring, tissue samples were matched to the conventionally acquired DIR images through manual identification of the corresponding MRI regions. The matching procedure is illustrated in Fig. 1. To assure that no lesions were missed due to thin slicing and curving of the tissue sample in the coronal plane, unblinded assessment was expanded to the slices adjacent to lesion location. Contrast ratios for the different sequences were calculated; detailed descriptions are provided in the supplementary material.

Statistical analysis

Sensitivity of the included sequences for cortical lesion detection was determined by dividing the number of lesions detected in the histopathology-blinded (or unblinded scorings) by the number of lesions detected on histopathology, times 100%. Specificity of the different sequences was determined by dividing the number of histopathologically validated lesions by the tot number of lesions that were scored. Comparisons between sequences and sequences to histopathology were made using linear mixed models in SPSS 26.0 (SPSS, Chicago, Illinois), controlling for age, sex and postmortem delay at time of commencement of the in situ MRI. Results were corrected for multiple comparisons using Fischer’s Least Significant Difference test, after which P-values of ≤ 0.05 were considered statistically significant. Inter- and intra-rater agreement were expressed as intra-class correlation coefficient, two-way mixed model, absolute agreement. Differences between contrast ratios were assessed using Wilcoxon signed-rank tests.