Region of interest-specific loss functions improve T₂ quantification with ultrafast T₂ mapping MRI sequences in knee, hip and lumbar spine

Tolpadi, Aniket A.; Han, Misung; Calivà, Francesco; Pedoia, Valentina; Majumdar, Sharmila

doi:10.1038/s41598-022-26266-z

Open access
Published: 23 December 2022

Scientific Reports volume 12, Article number: 22208 (2022)

Abstract

MRI T₂ mapping sequences quantitatively assess tissue health and depict early degenerative changes in musculoskeletal (MSK) tissues like cartilage and intervertebral discs (IVDs) but require long acquisition times. In MSK imaging, small features in cartilage and IVDs are crucial for diagnoses and must be preserved when reconstructing accelerated data. To these ends, we propose region of interest-specific postprocessing of accelerated acquisitions: a recurrent UNet deep learning architecture that provides T₂ maps in knee cartilage, hip cartilage, and lumbar spine IVDs from accelerated T₂-prepared snapshot gradient-echo acquisitions, optimizing for cartilage and IVD performance with a multi-component loss function that most heavily penalizes errors in those regions. Quantification errors in knee and hip cartilage were under 10% and 9% from acceleration factors R = 2 through 10, respectively, with bias for both under 3 ms for most of R = 2 through 12. In IVDs, mean quantification errors were under 12% from R = 2 through 6. A Gray Level Co-Occurrence Matrix-based scheme showed knee and hip pipelines outperformed state-of-the-art models, retaining smooth textures for most R and sharper ones through moderate R. Our methodology yields robust T₂ maps while offering new approaches for optimizing and evaluating reconstruction algorithms to facilitate better preservation of small, clinically relevant features.

Introduction

Magnetic Resonance Imaging (MRI) has emerged as a crucial part of diagnosing pathologies such as osteoarthritis, ligament damage, tumors, and others^1,2,3. Within MRI, several sequences can be deployed that exploit intrinsic tissue properties, providing images of varying weightings that effectively visualize tissues such as muscle, ligaments, bone marrow, and others⁴. In musculoskeletal (MSK) applications, clinical imaging protocols consist mostly of 2D fast spin echo (FSE) acquisitions with T₁ or T₂ weighting in various acquisition planes, which do well in depicting the structure and morphology of the underlying anatomy⁵. However, compositional MRI (cMRI) techniques that assess actual tissue parameters are gaining more attention as a complement to qualitative imaging.

cMRI techniques like T₂ relaxometry can provide maps of T₂ values (or another intrinsic MR parameter) across an imaging volume rather than a morphological image. For MSK applications, T₂ relaxometry offers sensitivity to water content, collagen content, and collagen fiber orientation in cartilage⁶, making it sensitive to biochemical changes that can precede morphological changes across several tissues and anatomies^7,8. Pre-morphological change sensitivity has been best characterized in the knee, where T₂ values are significantly higher across most cartilage compartments in healthy patients who later develop osteoarthritis (OA) compared to controls^9,10. Additionally, T₂ relaxometry offers quantitative MSK tissue health assessments, correlating with measures of hip cartilage and intervertebral disc (IVD) health^11,14,15,16, whereas in conventional clinical imaging, only semiquantitative tissue health assessments are obtainable with expert annotation^12,13. All of this makes cMRI a promising potential addition to clinical imaging protocols.

A major challenge facing clinical adoption of cMRI, however, is acquisition time: while mapping sequences like the magnetization-prepared angle-modulated partitioned k-space spoiled gradient echo snapshots (MAPSS) can provide robust MR parameter maps, their acquisition times can exceed 5–6 minutes, making their addition to a clinical scan protocol difficult¹⁷. Acquisitions can be accelerated by sampling fewer points in k-space, inducing aliasing artifacts in resulting images that must be removed through subsequent postprocessing. Some proposed approaches to these ends are reconstruction strategies such as parallel imaging (PI), compressed sensing (CS), model-based reconstructions, deep learning (DL), low-rank and sparse modeling methods, and MR Fingerprinting (MRF). Most of these approaches design an algorithm or exploit the redundancy of k-space acquisition across multiple coils to predict the appearance of the fully-sampled reconstructed image.

PI was one of the earliest techniques to accelerate MRI acquisition and has seen clinical adoption. Here, the redundancy of a multiple coil acquisition is leveraged to mitigate aliasing artifacts^18,19,20, reducing clinical scan time up to acceleration factor R = 3 for MSK applications^21,22. CS²³ has also shown promise, where aliased images are iteratively reconstructed by minimizing an objective function, retaining fidelity to acquired k-space and imposing sparsity on the reconstructed image in another domain. CS has attained clinically acceptable MSK image quality through roughly R = 4^21,22,24,25, and up to R = 8 in research settings for knee cartilage T₁ρ mapping²⁶. Similarly, PI and CS have also been applied sequentially (and simultaneously) for further acceleration²⁷.

For cMRI acceleration, model-based reconstructions have gained traction, integrating the physics of T₂/T₂* decay and T₁ recovery into an objective function iteratively optimized to reconstruct maps, showing promise in brain and lumbar spine T₂ mapping^28,29,30. More generally, incorporation of the physics of MRI parameter recovery/decay has seen applications not just in model-based approaches, but in various aspects of other methodologies as well³¹. DL approaches have gained prominence in solving inverse problems such as reconstruction, allowing for cMRI reconstructions at higher R than other methods. Standalone DL approaches have seen promising results in knee MAPSS acceleration, T₁ mapping, and T₂ mapping sequences^32,33,34,35,36. In other methodologies, DL has been integrated with model-based approaches while introducing loss functions to maintain fidelity to acquired k-space, showing promise up to R = 8 in knee and brain T₁ and T₂ mapping^37,38,39. DL has also been applied to accelerate T₂ mapping in MR Fingerprinting, where it can remove aliasing artifacts from undersampled acquisitions, replace time-consuming dictionary lookup steps to predict MR parameter maps, or both, while exploiting spatial correlations within maps to improve reconstructions^40,41. Lastly, aside from DL, low-rank and sparse modeling methods have emerged as a means of accelerating acquisitions, where several MRI images acquired at different echo times are decomposed into temporal basis functions and spatial coefficients to model an MRI parameter, showing promise through R = 8⁴².

These works represent great progress, although avenues for improvement remain. Above all, these methods have optimized reconstructed images for full-volume performance; however, in MSK applications, clinical assessment relies on the inspection of precise anatomic features in specific anatomic regions, and consequently, the reconstruction quality cannot be compromised within these regions. Put differently, given clinical context, strong image quality may be most important in specific regions of an image, leaving room for algorithm optimization. Furthermore, most recent published approaches leverage k-space data in formal reconstruction approaches, but for niche applications such as region of interest (ROI)-focused optimization, such approaches may be outperformed by DL-based post-processing algorithms that denoise and fit undersampled T₂-weighted images without using raw k-space. Moreover, performance of standard reconstruction algorithms is typically evaluated using metrics such as structural similarity index (SSIM), normalized root mean square error (NRMSE), and peak signal-to-noise ratio (PSNR), but recent works show these metrics may not provide the best correspondence with radiologist annotations^43,44, leading other groups to propose alternate metrics to fill this niche⁴⁵.

To these ends, this study proposes a recurrent UNet pipeline to postprocess undersampled coil-combined T₂-weighted echo images, fitting and predicting T₂ maps from accelerated MAPSS acquisitions in the knee, hip and lumbar spine^46,47. These algorithms are trained with multi-component, ROI-specific losses that optimize predicted maps for T₂ value and textural retention in cartilage and IVDs. In doing so, our approach allows for ROI-specific optimization, facilitating retention of small, crucial clinical features in tissues of interest while building on past applications of weighted loss functions for image processing tasks⁴⁸.

To summarize, the contributions of this work are as follows:

By using a 4-component loss function in network training, we introduce the concept of “ROI-specific optimization” of cMRI accelerated acquisition pipelines.
We conduct a thorough ablation study of these 4 loss function components, proving the value of all in retaining textures in predicted maps while retaining high fidelity to ground truth T₂ values.
Acknowledging that standard evaluation metrics such as SSIM and NRMSE provide suboptimal sensitivity to clinically relevant metrics, we conduct a thorough Gray Level Co-Occurrence Matrix (GLCM)-metric-based analysis of smooth and sharp textural retention in predicted maps, with an eye towards better evaluation of retention of small features crucial to clinical diagnoses^49,50.
We build on limited literature in hip and lumbar spine cMRI accelerated acquisition schemes by developing and evaluating our pipeline not only in knee cartilage, as several other works have done, but also for hip cartilage and lumbar spine IVD in ultrafast acquisitions.

Methods

MAPSS acquisitions

Retrospective datasets including MAPSS in the knee (n = 244 patients, 446 scans), hip (n = 67 patients, 89 scans), and lumbar spine (n = 21 patients, 24 scans) acquired from clinical 3 T MRI scanners were used. Patients were scanned in accordance with all pertinent guidelines, including approval from the University of California, San Francisco Institutional Review Board (Human Research Protection Program), and informed consent was obtained from all study participants. MAPSS simultaneously acquired multiple T₁ρ and T₂ weighted images, using T₁ρ or T₂ preparation followed by 3D RF-spoiled gradient-echo Cartesian acquisition in a segmented radial centric view ordering during a transient state. A fat-selective inversion pulse was applied before either T₁ρ^51,52 or T₂ preparation⁵³. Each acquisition included T₁ρ-prepared images at four spin-lock times (TSLs) for T₁ρ quantification, and three additional T₂-prepared images for T₂ quantification (the TSL = 0 ms images doubled as the TE = 0 ms images). In this study, only T₂-prepared images at four different TEs and corresponding T₂ maps from the MAPSS sequence were used. k_y-k_z space was acquired within an elliptical coverage (area = 0.7 compared to rectangular k_y-k_z, not acquiring corner space). Knee images were acquired from patients having ACL injuries, with scans taken at baseline and 3 years post-reconstruction. Hip images were acquired from patients having hip OA. Lumbar spine images were acquired from healthy subjects or patients with low back pain. Table 1 shows acquisition parameters.

Table 1 Knee, hip and lumbar spine datasets and splits.

T₂ Fitting and spatial undersampling

Later T₂ weighted echo time images for each slice were registered to corresponding TE = 0 ms images using a 3D rigid registration algorithm with a normalized mutual information criterion⁵⁴. Levenberg–Marquardt fitting of registered T₂ weighted images yielded ground truth T₂ maps⁵⁵.
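As a rough illustration of per-pixel T₂ fitting, the sketch below assumes a mono-exponential decay model, S(TE) = S₀·exp(−TE/T₂), which the text does not state explicitly. It also swaps in a log-linear least-squares fit in place of the Levenberg–Marquardt fitting used in the study, so it is a simplification rather than the authors' procedure. The echo times are hypothetical placeholders, not the values from Table 1.

```python
import math

def fit_t2_loglinear(tes_ms, signals):
    """Fit S(TE) = S0 * exp(-TE / T2) by least squares on log(S).

    Simplified stand-in for Levenberg-Marquardt fitting; assumes strictly
    positive signals that actually decay (a nonzero slope in log space).
    """
    if len(tes_ms) != len(signals) or len(tes_ms) < 2:
        raise ValueError("need at least two matching (TE, signal) pairs")
    logs = [math.log(s) for s in signals]
    n = len(tes_ms)
    te_mean = sum(tes_ms) / n
    log_mean = sum(logs) / n
    sxx = sum((te - te_mean) ** 2 for te in tes_ms)
    sxy = sum((te - te_mean) * (lg - log_mean) for te, lg in zip(tes_ms, logs))
    slope = sxy / sxx                 # equals -1 / T2 under the assumed model
    intercept = log_mean - slope * te_mean
    return math.exp(intercept), -1.0 / slope   # (S0, T2 in ms)

# Hypothetical echo times (ms) and a noiseless decay with T2 = 35 ms
tes = [0.0, 12.0, 25.0, 50.0]
signals = [1000.0 * math.exp(-te / 35.0) for te in tes]
s0, t2 = fit_t2_loglinear(tes, signals)
print(f"S0 = {s0:.1f}, T2 = {t2:.1f} ms")
```

A log-linear fit weights noisy late echoes differently from a nonlinear fit, which is one reason nonlinear solvers such as Levenberg–Marquardt are commonly preferred for ground truth maps.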

To simulate accelerated acquisition, coil-combined T₂ weighted magnitude images after reconstruction (ARC for knee and hip) were Fourier transformed and retrospectively undersampled using a center-weighted Poisson disc pattern, fully sampling a central 5% square in k_y-k_z (R = 2, 3, 4, 6, 8, 10, 12). Acquisition times associated with ground truth and accelerated MAPSS acquisitions in each body part can be found in Supplementary Table S1. As MAPSS acquires phase-encode lines with elliptical coverage in k_y-k_z (relative area of 0.7 compared to rectangular coverage), phase encoding lines solely within the sampling ellipse were undersampled. Although the k-space data were synthesized from coil-combined magnitude images, retrospective undersampling was performed, and R reported, with respect to elliptical coverage in k_y-k_z to accurately simulate an actual undersampling pattern and not overstate model performance⁵⁶. However, for hip acquisitions, reconstructed space outside the y-FOV had already been discarded; thus, simulating acquisitions with application of ‘no phase wrap’ was not possible and undersampling patterns would differ from those implemented on a scanner. T₂ weighted images from each echo time were undersampled with a unique pattern. k_y-k_z lines not sampled at a given echo time were initialized with the corresponding lines from the image with the temporally closest echo time at which they were sampled. Only k_y-k_z lines not sampled at any echo time were zero-filled. k-Space was subsequently inverse Fourier transformed, yielding undersampled, aliased images.
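The echo-sharing initialization described above can be sketched as follows. This is a minimal illustration on nested Python lists, not the study's code: the function name `share_views`, the toy k-space values, the masks, and the echo times are all hypothetical, and ties between equally distant echoes are broken in favor of the earlier echo, which is an assumption.

```python
def share_views(kspaces, masks, echo_times):
    """Fill unsampled k-space points from the temporally closest echo that sampled them.

    kspaces: one 2D list of complex values per echo
    masks:   one 2D list of bools per echo (True where that echo was sampled)
    Points sampled at no echo remain zero-filled.
    """
    n_echoes = len(kspaces)
    n_rows, n_cols = len(masks[0]), len(masks[0][0])
    filled = [[[0j] * n_cols for _ in range(n_rows)] for _ in range(n_echoes)]
    for e in range(n_echoes):
        for r in range(n_rows):
            for c in range(n_cols):
                if masks[e][r][c]:
                    filled[e][r][c] = kspaces[e][r][c]
                    continue
                donors = [d for d in range(n_echoes) if masks[d][r][c]]
                if donors:
                    # min() keeps the first (earliest) echo on ties: an assumption
                    nearest = min(donors, key=lambda d: abs(echo_times[d] - echo_times[e]))
                    filled[e][r][c] = kspaces[nearest][r][c]
    return filled

# Toy example: 3 echoes, a single k-space row with 3 points (all values hypothetical)
echo_times = [0.0, 10.0, 30.0]
kspaces = [[[1 + 0j] * 3], [[2 + 0j] * 3], [[3 + 0j] * 3]]
masks = [[[True, False, False]], [[False, True, False]], [[True, False, False]]]
filled = share_views(kspaces, masks, echo_times)
print(filled)
```

In the toy example, the third point is sampled at no echo and therefore stays zero, while the second point of every echo is borrowed from the only echo that sampled it.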

DL pipeline training

DL architecture

An overview of the data processing and training schemes is shown in Fig. 1, while a detailed diagram depicting our proposed network architecture is in Supplementary Fig. S1 (“Full Model”; 39,808,710 trainable parameters). Magnitude images from data undersampled as specified were fed into a recurrent UNet network. The network contains an initial recurrent portion: aliased images from each T₂ echo time have a 5-layer processing stream of 2D 3 × 3 convolutions with stride 1, yielding layers of depth 64, 128, 256, 512, and 1. Residual connections connect input aliased images with processing stream outputs. 2D 3 × 3 convolutions with stride 1 and residual connections transfer information between temporally adjacent corresponding hidden echo time processing layers with weighting parameter λ_w = 0.2⁵⁷. This soft-weighted view-sharing of neighboring T₂ weighted echo time images facilitated sharing of feature map information between temporally adjacent echo time images, which can augment sharing of k_y-k_z initializations to improve network image predictions. Outputs of all 4 echo time image processing streams were concatenated and fed to a UNet that predicted T₂ maps. 2D 3 × 3 convolutions with stride 2 were used for the encoder, and 2D 4 × 4 transpose convolutions with stride 2 for the decoder. Two additional architecture versions were also trained: one UNet with no recurrent portion (“No RNN”; 35,116,037 trainable parameters) and a second in which all layers apart from inputs to the recurrent portion and UNet had half the depth listed in Supplementary Fig. S1 (“Reduced Parameters”; 9,958,246 trainable parameters).

Loss function

Networks were trained with the multi-part loss function shown in Eq. (1):

$$L_{network} = \lambda_{{L_{1} }} L_{{L_{1} }} + \lambda_{{L_{1,\phi } }} L_{{L_{1, \phi } }} + \lambda_{SSIM} L_{SSIM} + \lambda_{Feature} L_{Feature}$$

(1)

in which $L_{{L_{1} }}$ is a scaled global L₁ loss detailed in Eq. (2):

$$L_{{L_{1} }} = \left| {S\left( {T_{2} } \right) - S\left( {\hat{T}_{2} } \right)} \right|$$

(2)

where $T_{2}$ represents ground truth T₂, $\hat{T}_{2}$ represents predicted T₂, and $S\left( x \right)$ is a translated and scaled sigmoid operator that assigns more weight to higher T₂ values. Sharp contrasts and high $T_{2}$ values can easily be lost in accelerated acquisition schemes, so $S\left( x \right)$ proved useful through empirical testing in focusing networks to preserve these details. $S\left( x \right)$ is defined below in Eq. (3):

$$S\left( x \right) = y_{l} + \left( {y_{h} - y_{l} } \right)\left( {1 + \exp \left( { - \frac{10}{{x_{h} - x_{l} }}\left( {x - \frac{{x_{l} + x_{h} }}{2}} \right)} \right)} \right)^{ - 1}$$

(3)

where $x_{l}$ and $x_{h}$ are the low and high T₂ limits between which the sigmoid operator weighting transitions from $y_{l}$ to $y_{h}$. Parameters selected for the knee were as follows: $x_{l}$ = 0 ms, $x_{h}$ = 100 ms, $y_{l}$ = 0.1, $y_{h}$ = 1.0. In the hip: $x_{l}$ = 0 ms, $x_{h}$ = 60 ms, $y_{l}$ = 0.5, $y_{h}$ = 1.0. In the lumbar spine: $x_{l}$ = 0 ms, $x_{h}$ = 150 ms, $y_{l}$ = 0.25, $y_{h}$ = 1.0. A schematic of the operator that results from parameters of all three anatomies can be found as Supplementary Fig. S2.
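To make the behavior of the operator in Eq. (3) concrete, the sketch below evaluates it at the lower limit, midpoint, and upper limit for each anatomy, using the parameter values listed above. The function and dictionary names are illustrative choices, not taken from the study's code.

```python
import math

def scaled_sigmoid(x, x_l, x_h, y_l, y_h):
    """Eq. (3): rises from roughly y_l near x_l to roughly y_h near x_h."""
    slope = 10.0 / (x_h - x_l)
    midpoint = (x_l + x_h) / 2.0
    return y_l + (y_h - y_l) / (1.0 + math.exp(-slope * (x - midpoint)))

# Parameter values as reported in the text (T2 limits in ms)
PARAMS = {
    "knee": dict(x_l=0.0, x_h=100.0, y_l=0.1, y_h=1.0),
    "hip": dict(x_l=0.0, x_h=60.0, y_l=0.5, y_h=1.0),
    "spine": dict(x_l=0.0, x_h=150.0, y_l=0.25, y_h=1.0),
}

for name, p in PARAMS.items():
    mid = (p["x_l"] + p["x_h"]) / 2.0
    values = [scaled_sigmoid(x, **p) for x in (p["x_l"], mid, p["x_h"])]
    print(name, [round(v, 3) for v in values])
```

Because the exponent equals ±5 at the two limits, the operator does not reach $y_{l}$ or $y_{h}$ exactly there; it sits within about 0.7% of the $y_{h} - y_{l}$ span from each.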

$L_{{L_{1, \phi } }}$ is the ROI-specific L₁ loss, and is described in Eq. (4):

$$L_{{L_{1, \phi } }} = \left| {S\left( {T_{2,\phi } } \right) - S\left( {\hat{T}_{2,\phi } } \right)} \right|$$

(4)

where $T_{2,\phi }$ are ground truth T₂ values in the tissue of interest $\phi$ (IVD or cartilage), scaled by $S\left( x \right)$ (Eq. (3)), and $\hat{T}_{2,\phi }$ are the corresponding predicted T₂ values. Pixels corresponding to $\phi$ are obtained from segmentation masks, the generation of which is described in “Training and Segmentation Details”. For both $L_{{L_{1} }}$ and $L_{{L_{1, \phi } }}$, L₁ norms were used instead of L₂ due to reduced sensitivity to outliers, leading to more stable training.
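The difference between the global loss of Eq. (2) and the ROI-specific loss of Eq. (4) can be illustrated with a small sketch. It assumes the absolute differences are averaged over the contributing pixels, a reduction the equations do not spell out, and it uses the knee sigmoid parameters. The toy T₂ values and mask are invented for illustration.

```python
import math

def scaled_sigmoid(x, x_l=0.0, x_h=100.0, y_l=0.1, y_h=1.0):
    """Eq. (3) with the knee parameters from the text as defaults."""
    slope = 10.0 / (x_h - x_l)
    return y_l + (y_h - y_l) / (1.0 + math.exp(-slope * (x - (x_l + x_h) / 2.0)))

def scaled_l1(target, pred, mask=None):
    """Mean |S(T2) - S(T2_hat)| over all pixels, or only where mask is True.

    Mean reduction is an assumption; Eqs. (2) and (4) do not state one.
    """
    total, count = 0.0, 0
    for r, row in enumerate(target):
        for c, t in enumerate(row):
            if mask is not None and not mask[r][c]:
                continue
            total += abs(scaled_sigmoid(t) - scaled_sigmoid(pred[r][c]))
            count += 1
    return total / count if count else 0.0

# Invented 2 x 2 T2 maps (ms); the mask marks hypothetical cartilage pixels
target = [[30.0, 40.0], [50.0, 0.0]]
pred = [[30.0, 45.0], [50.0, 20.0]]
mask = [[False, True], [True, False]]

global_l1 = scaled_l1(target, pred)      # Eq. (2): every pixel contributes
roi_l1 = scaled_l1(target, pred, mask)   # Eq. (4): masked pixels only
print(global_l1, roi_l1)
```

Here the background error (0 ms predicted as 20 ms) affects only the global term, while the cartilage error (40 ms predicted as 45 ms) enters both, so weighting the ROI term heavily steers training toward the tissue of interest.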

$L_{SSIM}$ is an SSIM loss, described in Eq. (5):

$$L_{SSIM} = 1 - SSIM$$

(5)

where SSIM was the structural similarity index between predicted and target maps.

$L_{Feature}$ is a feature-based loss function designed to retain sharper textures, calculated as in Eq. (6):

$$L_{Feature} = \left| {VGG_{{T_{2} }} - VGG_{{\hat{T}_{2} }} } \right|$$

(6)

where $VGG_{{T_{2} }}$ and $VGG_{{\hat{T}_{2} }}$ were the outputs of the 21st layer of a VGG-19⁵⁸ network pretrained on ImageNet when fed resized and normalized target and predicted T₂ maps, respectively. Maps were resized to 224 × 224 × 1, concatenated with themselves along the channel axis to yield 224 × 224 × 3 inputs, and normalized such that the channels had mean pixel values of 0.485, 0.456 and 0.406, with standard deviations of 0.229, 0.224, and 0.225, respectively.

$\lambda_{{L_{1} }}$, $\lambda_{{L_{1,\phi } }}$, $\lambda_{SSIM}$, and $\lambda_{Feature}$ were loss component weightings. All were positive-valued and optimized through constrained random hyperparameter searches with the following ranges:

Knee: $\lambda_{{L_{1} }}$ = 1, $\lambda_{{L_{1,\phi } }}$ = 50–150, $\lambda_{SSIM}$ = 0–2, $\lambda_{Feature}$ = 0–0.5.
Hip: $\lambda_{{L_{1} }}$ = 1, $\lambda_{{L_{1,\phi } }}$ = 0–3, $\lambda_{SSIM}$ = 0–2, $\lambda_{Feature}$ = 0–1.
Spine: $\lambda_{{L_{1} }}$ = 1, $\lambda_{{L_{1,\phi } }}$ = 1–10, $\lambda_{SSIM}$ = 10–100, $\lambda_{Feature}$ = 5–55.
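A minimal sketch of how the weighted sum in Eq. (1) and a random search over the ranges above might look is given here. Uniform sampling within each range is an assumption, since the text does not describe how the search was constrained, and the component loss values are made-up numbers used only to exercise the function. The dictionary keys are illustrative names.

```python
import random

# Search ranges as listed in the text; a (1, 1) range fixes the weight at 1
SEARCH_RANGES = {
    "knee": {"l1": (1, 1), "l1_roi": (50, 150), "ssim": (0, 2), "feature": (0, 0.5)},
    "hip": {"l1": (1, 1), "l1_roi": (0, 3), "ssim": (0, 2), "feature": (0, 1)},
    "spine": {"l1": (1, 1), "l1_roi": (1, 10), "ssim": (10, 100), "feature": (5, 55)},
}

def sample_weights(anatomy, rng):
    """Draw one weight set; uniform sampling is an assumption, not the study's stated scheme."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in SEARCH_RANGES[anatomy].items()}

def total_loss(components, weights):
    """Eq. (1): weighted sum of the four loss components."""
    return sum(weights[name] * components[name] for name in weights)

rng = random.Random(0)                                   # fixed seed for repeatability
trials = [sample_weights("knee", rng) for _ in range(15)]

# Made-up component values standing in for one training batch
example_components = {"l1": 0.02, "l1_roi": 0.01, "ssim": 0.15, "feature": 0.30}
losses = [total_loss(example_components, w) for w in trials]
print(min(losses), max(losses))
```

In a real search each sampled weight set would be scored by training a model and evaluating it on validation data, not by comparing loss values across weight sets, since those are not on a common scale.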

Training and segmentation details

Scans of all three anatomies were split into training, validation and test sets as shown in Table 1. In the knee, cartilage was segmented manually. In the hip, cartilage was segmented manually for 4 central slices per volume. Segmentation in both was performed by research assistants trained by radiologists with over 20 years of experience. Since the hip dataset had substantially fewer segmented than unsegmented slices, the hip training set was bootstrapped to equalize the number of slices with and without segmentations (1068 bootstrapped slices). Finally, in the lumbar spine, IVDs were segmented with an ensemble of coarse-to-fine context memory (CFCM) networks⁵⁹. To calculate performance metrics and implement ROI-specific training losses, these segmentation masks were leveraged to identify pixels in tissues of interest (cartilage or IVD).

Signal values were scaled per slice for the middle 95% of pixel values to fall between 0 and 500 for the knee and lumbar spine, and 0 and 100 for the hip; these ranges were optimized empirically. During training, imaging volumes were augmented with random translation (± 10 pixels across phase and frequency directions) and random rotation (± 5 degrees about slice direction). All models were trained with learning rate 0.001 and Adam optimizer on an NVIDIA Titan Xp 12 GB GPU with batch size of 1 so the model would fit on a single GPU. Separate pipelines were trained for all 3 anatomies at R = 2, 3, 4, 6, 8, 10, and 12. For each pipeline, and at each trained R, a constrained random hyperparameter search was done for 15 iterations at 10 epochs per iteration to optimize $\lambda_{{L_{1} }}$,$\lambda_{{L_{1,\phi } }}$, $\lambda_{SSIM}$, and $\lambda_{Feature}$ for visual fidelity of predicted maps to ground truth. Visual fidelity was assessed in the search using NRMSE (calculated as shown in Eq. (7)) and Pearson’s r in the tissue of interest⁶⁰.

$$NRMSE = \frac{{\left\| {T_{2} - \hat{T}_{2} } \right\|_{2,\phi } }}{{\left\| {T_{2} } \right\|_{2,\phi } }}$$

(7)
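Eq. (7) restricted to a tissue mask can be written in a few lines. The sketch below is an illustration on nested lists with invented T₂ values and an invented mask; the function name is a placeholder.

```python
import math

def roi_nrmse(target, pred, mask):
    """Eq. (7): L2 norm of the error over the L2 norm of the target, within the mask."""
    num, den = 0.0, 0.0
    for r, row in enumerate(target):
        for c, t in enumerate(row):
            if mask[r][c]:
                num += (t - pred[r][c]) ** 2
                den += t ** 2
    if den == 0.0:
        raise ValueError("mask is empty or target is zero inside the mask")
    return math.sqrt(num) / math.sqrt(den)

# Invented 2 x 2 T2 maps (ms) and a hypothetical tissue mask
target = [[40.0, 30.0], [0.0, 50.0]]
pred = [[44.0, 27.0], [10.0, 50.0]]
mask = [[True, True], [False, True]]

nrmse = roi_nrmse(target, pred, mask)
print(f"NRMSE in ROI: {100 * nrmse:.2f}%")
```

The unmasked pixel (0 ms predicted as 10 ms) does not contribute, which is what makes the metric specific to the tissue of interest.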

Final pipelines across all anatomies and R were trained using optimized parameter sets until validation loss did not decrease for 10 epochs. Key training details are summarized as part of Table 1.

Experiments

Loss function ablation study

An ablation study is key to understanding the contributions of loss components. Given optimized loss function weights, every combination of loss components was ablated and corresponding models were retrained until validation loss no longer decreased. “No RNN” and “Reduced Parameters” networks were also trained while maintaining loss function components at optimized values to assess the utility of simpler architectures. NRMSE and Pearson’s correlation coefficient (r) were calculated in tissues of interest across the test set for original and ablated models to determine loss component contributions to performance. Pearson’s r was deemed an appropriate statistical test for this and subsequent experiments, as it is useful in assessing the linear relationship between related pairs of interval data. While no formal statistical test was applied to NRMSE, the metric nonetheless allows for quantitative assessment of T₂ quantification quality and easy comparison with results from other approaches. NRMSE is reported ± 1 standard deviation (s.d.); Pearson’s r was deemed significant in accordance with corresponding P values, α = 0.001, 0.01, and 0.05. NRMSEs within tissues of interest of a given scan were also multiplied by mean T₂ values within the tissue of interest of that patient, generating T₂ value equivalents of error rates.

To more specifically evaluate the utility of the ROI-specific loss component, two loss function configurations from the ablation study were further analyzed at all R: no ROI-specific loss component ($\lambda_{{L_{1,\phi } }} = 0;\ \lambda_{{L_{1} }} , \lambda_{SSIM} , \lambda_{Feature} \ne 0$) and no ROI-specific or feature-based components ($\lambda_{{L_{1,\phi } }} ,\lambda_{Feature} = 0;\ \lambda_{{L_{1} }} , \lambda_{SSIM} \ne 0$). The first represents a baseline in which all loss components were preserved except the ROI-specific one; the second represents a standard reconstruction loss of pixel-based and SSIM-based components. Pearson’s r was calculated both globally and within tissues of interest to determine the degree and significance of correlation between predicted maps and ground truth (α = 0.001, 0.01, and 0.05).

Evaluation of accelerated acquisition scheme performance

Three versions of our pipeline (full pipeline, “No RNN,” and “Reduced Parameters”) were compared to state-of-the-art CS, DL, and DL/model-based solutions. At each R, MANTIS (54,413,056 trainable parameters) and MANTIS-GAN (54,413,056 [Generator] and 2,763,648 [Discriminator] trainable parameters) pipelines were trained using published network architectures, loss functions and undersampling strategies^42,43. Loss function weightings for both were optimized through grid hyperparameter searches yielding the following: (MANTIS) $\lambda_{data}$ = 0.1, $\lambda_{cnn}$ = 1; (MANTIS-GAN) $\lambda_{data}$ = 0.1, $\lambda_{cnn}$ = 1, $\lambda_{GAN}$ = 0.01. To apply CS reconstruction, original MAPSS T₂-prepared images were Fourier transformed into coil-combined k-space, 1D-inverse Fourier transformed along the readout direction, and individual slices in $k_{y} - k_{z}$ reconstructed using an $L_{1}$ wavelet-based algorithm with regularization coefficient 0.001⁶¹. CS reconstructed images were registered to the TE = 0 ms echo time image using a 3D rigid registration algorithm with a normalized mutual information criterion and fitted using Levenberg–Marquardt fitting to yield $T_{2}$ maps. Performance of these approaches and our proposed methods was evaluated through the following:

Comparison of global and ROI-specific performance

To test for completeness of training, performance of our proposed pipelines was compared against state-of-the-art models that did not use ROI-specific components in predicting T₂ maps. Pearson’s r (α = 0.001, 0.01, and 0.05) was used to compare model performances and assess strength of correlations to ground truth T₂.

Standard reconstruction metrics

Performance was reported in tissues of interest with standard reconstruction metrics: NRMSE (mean ± 1 s.d.) and Pearson’s r (α = 0.001, 0.01, and 0.05). NRMSEs were also converted into T₂ value equivalents by tissue compartment as in the ablation study.

T₂ value retention

Fidelity of predicted maps to ground truth T₂ was also assessed. First, predicted and ground truth T₂ values were compared across tissues of interest within the test set (mean ± 1 s.d.), generating violin plots for all three anatomies with overlaid boxplots for T₂ value distribution comparison. T₂ agreement was also assessed through Bland–Altman analysis.
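For readers unfamiliar with Bland–Altman analysis, the sketch below computes the bias (mean difference) and limits of agreement between predicted and ground truth T₂. The ±1.96 s.d. limits follow the conventional definition and are an assumption here, as the text does not specify them. The paired values are invented.

```python
import statistics

def bland_altman(reference, predicted):
    """Return (bias, lower limit, upper limit) for paired measurements.

    Limits of agreement use the conventional bias +/- 1.96 sample s.d.,
    which is an assumption rather than a detail given in the text.
    """
    diffs = [p - r for r, p in zip(reference, predicted)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Invented mean cartilage T2 values (ms) for four hypothetical scans
reference = [30.0, 35.0, 40.0, 45.0]
predicted = [31.0, 34.0, 42.0, 44.0]

bias, lower, upper = bland_altman(reference, predicted)
print(f"bias = {bias:.2f} ms, limits of agreement = [{lower:.2f}, {upper:.2f}] ms")
```

A bias near zero with narrow limits indicates that predicted T₂ neither systematically over- nor underestimates the ground truth.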

Texture retention

Gray Level Co-Occurrence Matrix (GLCM)⁶² metrics were used to assess texture retention within tissues of interest. GLCM contrast and dissimilarity are maximized by large local pixel value changes and thus by sharper textures. GLCM homogeneity is maximized by small local pixel value changes, while GLCM energy and angular second moment (ASM) are maximized by few total pixel values within an image; hence, all three are maximized by smoothness. For each anatomy and R, we calculated these texture metrics at 4 orientations (θ = 0°, 45°, 90° and 135°; d = 1 pixel) and averaged across all orientations. Finally, we calculated intraclass correlation coefficients (ICCs) for all metrics with respect to ground truth (two-way mixed effects, single rater⁶³) and reported 95% ICC confidence intervals (α = 0.001, 0.01, and 0.05). These tests were chosen as appropriate, as they assess both reliability and agreement of associated metrics, and in this use case, individual GLCM metric values themselves are considered the only rater, justifying the ICC test type selected.
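The texture metrics above can be illustrated with a small pure-Python sketch. It builds a symmetric, normalized GLCM at d = 1 for the four orientations and averages the metrics across them. The metric formulas follow common conventions (for example, homogeneity as Σ p/(1 + (i − j)²) and energy as the square root of ASM), which is an assumption about the implementation used in the study. Quantizing maps to a small number of integer gray levels and restricting pairs to the tissue mask are steps this sketch leaves out.

```python
import math

# (row, col) steps for d = 1 pixel at 0, 45, 90 and 135 degrees
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}

def glcm(image, levels, offset):
    """Symmetric, normalized co-occurrence matrix for one offset.

    Assumes integer gray levels in [0, levels) and at least one valid pixel pair.
    """
    dr, dc = offset
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]

def texture_metrics(p):
    """Contrast, dissimilarity, homogeneity, ASM and energy of a normalized GLCM."""
    contrast = dissimilarity = homogeneity = asm = 0.0
    for i, row in enumerate(p):
        for j, v in enumerate(row):
            contrast += v * (i - j) ** 2
            dissimilarity += v * abs(i - j)
            homogeneity += v / (1 + (i - j) ** 2)
            asm += v * v
    return {"contrast": contrast, "dissimilarity": dissimilarity,
            "homogeneity": homogeneity, "ASM": asm, "energy": math.sqrt(asm)}

def mean_texture_metrics(image, levels):
    """Average each metric over the four orientations."""
    per_angle = [texture_metrics(glcm(image, levels, off)) for off in OFFSETS.values()]
    return {k: sum(m[k] for m in per_angle) / len(per_angle) for k in per_angle[0]}

smooth = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]    # uniform patch
checker = [[0, 3, 0], [3, 0, 3], [0, 3, 0]]   # sharp alternating patch
print(mean_texture_metrics(smooth, 4))
print(mean_texture_metrics(checker, 4))
```

The uniform patch maximizes homogeneity, ASM, and energy with zero contrast, while the checkerboard patch yields high contrast and dissimilarity, matching the smooth-versus-sharp interpretation given in the text.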

Repeatability study

To assess the robustness of pipelines to different datasets, two additional splits of the knee, hip and spine datasets were made, ensuring no patient was part of multiple validation and/or test datasets and that all scans from a given patient were only in one of training, validation and test for each split (folds 2 and 3 in Supplementary Table S2, where fold 1 is the original split). Additional hyperparameter searches optimized loss function weights on the two new splits. Optimized loss weights and corresponding T₂ quantification and texture retention performance for each split are presented at all tested R in the same manner as for the primary split.

Raw multicoil data assessment

An in-house pipeline was developed that leveraged GE Orchestra 1.10 and other postprocessing tools to reconstruct coil-combined images from raw k-space data. As a proof of concept, knee MAPSS scans were performed on 3 volunteers, hip scans for 2, and lumbar spine for 2, all using the acquisition parameters listed for the retrospective datasets used for algorithm training, with raw k-space data saved for all. Multicoil k-space data (after ARC for knee and hip) was undersampled with the same center-weighted Poisson disc pattern described earlier, with each coil seeing the same undersampling pattern and k_y-k_z lines being shared across different T₂ weighted echo time k-spaces as previously described. Coil-combined images resulting from undersampled multi-coil data at all tested R were fed through corresponding post-processing pipelines to predict T₂ map appearance. A radiologist with 2 years of experience segmented knee cartilage, hip cartilage, and intervertebral discs from these acquisitions, allowing for visualizations of predicted T₂ maps and NRMSE calculations in ROIs.

Results

Ablation study results

Voxel-wise performance metrics for ablation study models at R = 8 are shown in Supplementary Table S3, with T₂ value NRMSE equivalents in Supplementary Table S4. Within the knee and hip, all loss components were necessary to obtain the optimal combination of high Pearson’s r and low NRMSE in cartilage. For the lumbar spine, while all loss components proved vital in maximizing Pearson’s r and minimizing NRMSE in IVDs, performance improved when the initial recurrent network was omitted. Though quantitative analysis is shown for all three pipeline versions in subsequent experiments, the full model is designated as best for the knee and hip, and the “No RNN” model for the spine.

ROI-specific and global assessments of the best models, corresponding models trained without an ROI-specific loss ($\lambda_{{L_{1,\phi } }} = 0$), and models trained with a generic loss ($\lambda_{{L_{1,\phi } }} = 0$, $\lambda_{Feature} = 0$) are shown in Supplementary Table S5. In the knee and hip, across nearly all R, adding the ROI-specific loss improved correlations between predicted and ground truth cartilage T₂, with diminished performance globally. In the lumbar spine, which was trained with substantially fewer batches than the knee and hip pipelines, these trends were inconsistent across tested R. Example predictions and ground truth for one slice of a patient in each pipeline are shown in Supplementary Fig. S3, showing that patterns of local T₂ value elevations in cartilage and IVDs are better preserved with an ROI-specific loss than in pipelines trained without it.

Visuals of network performance and comparison with state-of-the-art models

Predicted T₂ maps are displayed at select R for knee, hip and lumbar spine models in Fig. 2 for our three pipelines and three methods from the literature. In knee, hip, and lumbar spine, T₂ quantification performance is strongest with our proposed methods, maintaining low error rates, showing promising results compared with state-of-the-art methods through R = 10. Optimal architecture performances are further explored in Figs. 3–5. As shown in Fig. 3a, predicted T₂ knee maps retained strong fidelity to ground truth within tibiofemoral joint cartilage. Patterns within predicted maps became slightly more diffuse as R increased to 10, as indicated by a slight rise in NRMSE for cartilage in the slice, but visually, T₂ values and map patterns are preserved. As seen in Fig. 4a, hip predicted maps preserve T₂ values well in femoral and acetabular cartilage through R = 10, although T₂ patterns become more diffuse by R = 10. Figure 5a shows T₂ map predictions in the lumbar spine. The L4-L5 IVD is shown in more detail, where T₂ quantification performance was acceptable at R = 3, moderate at R = 6, and worse at R = 10, as indicated by rising IVD NRMSEs.

ROI and global performance comparisons of our selected pipelines against state-of-the-art approaches are in Supplementary Table S5. Across pipelines trained with relatively large datasets (knee and hip), DL and model-based approaches (MANTIS and MANTIS-GAN) outperformed our proposed pipeline globally, but within cartilage ROIs, our pipeline exhibited stronger Pearson’s r at each tested R. These trends were not as strong in the lumbar spine pipelines, possibly owing to the randomness of training with a smaller dataset. Global and ROI-specific T₂ predictions are further visualized in Supplementary Fig. S4, showing that state-of-the-art models yield substantially more visual fidelity to ground truth and lower NRMSE than our pipeline globally, but that the trend reverses in cartilage. In the lumbar spine, those trends held at some but not all R, yielding similar conclusions to the Pearson’s r analysis.

Evaluation of T₂ quantification performance and comparison with state-of-the-art models

Voxel-wise T₂ evaluation fidelity

Pearson’s r and NRMSE across all anatomies and R for our approaches and state-of-the-art methods are in Table 2. T₂ value NRMSE equivalents are in Supplementary Table S6. For all anatomies and across nearly all R, T₂ quantification performance is strongest in our methods, particularly in the No RNN and full model pipelines, compared to state-of-the-art models.
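As a rough illustration of how the two voxel-wise metrics can be evaluated within an ROI, the sketch below computes Pearson's r and NRMSE over masked voxels. The helper name `roi_metrics` is hypothetical, and the NRMSE normalization (Euclidean norm of the error divided by the Euclidean norm of the ground truth) is an assumption; the normalization used for the reported values may differ.

```python
import numpy as np

def roi_metrics(pred, truth, roi_mask):
    """Return (Pearson's r, NRMSE) computed over ROI voxels only.

    NRMSE here is ||pred - truth||_2 / ||truth||_2, which is one common
    convention and an assumption in this sketch.
    """
    mask = np.asarray(roi_mask, dtype=bool)
    p = np.asarray(pred, dtype=float)[mask]
    t = np.asarray(truth, dtype=float)[mask]
    r = np.corrcoef(p, t)[0, 1]
    nrmse = np.linalg.norm(p - t) / np.linalg.norm(t)
    return r, nrmse

# Toy example: predictions overestimate every ROI voxel by 10%.
truth = np.array([[30.0, 40.0], [50.0, 0.0]])   # T2 in ms; 0 = background
pred = 1.1 * truth
mask = truth > 0

r, nrmse = roi_metrics(pred, truth, mask)
print(r, nrmse)
```

The toy case has near-perfect correlation but a 10% error, which shows why the two metrics are complementary: Pearson's r is insensitive to a uniform scaling of T₂ values, whereas NRMSE captures it.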

Table 2 ROI-specific model performance in standard metrics from R = 2 through R = 12.

An exhaustive examination of knee T₂ quantification performance, stratified by cartilage compartments, is in Supplementary Tables S7 and S8. For the full model, T₂ estimation errors remained under 10% through R = 10 across all cartilage compartments, while Pearson’s r ranged from 0.748 at R = 2 to 0.491 at R = 12, indicating strong correlations⁶⁴ between predictions and ground truth at R = 2 and moderate correlations through R = 12. For some cartilage compartments and R, performance was stronger in the No RNN pipeline. Interestingly, quantification performance was strongest in patellofemoral joint cartilage, which generally exhibited lower NRMSE and stronger correlations. Our ROI-specific loss pipelines outperformed state-of-the-art models in each cartilage compartment.

Supplementary Tables S9 and S10 show hip T₂ quantification performance across cartilage compartments. As in the knee, quantification performance was strong, with error rates across all cartilage under 9% through R = 12 for the No RNN and full model pipelines. While the No RNN pipeline had lower quantification errors, the full model had higher Pearson’s r, which ranged from 0.794 at R = 2 to 0.517 at R = 12, showing strong correlations between predictions and ground truth through R = 3 and moderate correlations through R = 12. T₂ quantification performance was slightly stronger in femoral than acetabular cartilage. Our pipelines again outperformed state-of-the-art models in each cartilage compartment.

Supplementary Tables S11 and S12 show lumbar spine T₂ quantification performance, which was mixed. Pearson’s r across all discs was very high, ranging from 0.884 at R = 2 to 0.643 at R = 12 for the No RNN model, indicating strong correlations with ground truth through R = 8 and moderate correlations through R = 12. That said, IVD error rates were markedly higher across all R than in hip and knee cartilage, ranging from 4.86% to 18.8%. Though there was some volatility, error rates and Pearson’s r generally showed poorest T₂ quantification in L1/L2 and L2/L3 discs. Through R = 8, ROI-specific loss pipelines outperformed state-of-the-art models at nearly all disc levels, with stronger Pearson’s r in most IVD levels through R = 12.

T₂ value retention on region of interest averages

Bland–Altman plots are provided for the knee, hip and lumbar spine in Figs. 3b, 4b, and 5b. In knee and hip, T₂ values are predicted with minimal bias with respect to ground truth. The ± 1.96 s.d. limits of agreement were less than approximately ± 6 ms with mean biases under ± 3 ms through R = 8 for knee cartilage (Fig. 3b). Among cartilage compartments, predictions in trochlear and patellar cartilage showed the least bias, while tibiofemoral cartilage T₂ was generally slightly overestimated. In the hip (Fig. 4b), ± 1.96 s.d. limits of agreement were less than approximately ± 5 ms with mean biases under ± 3 ms through R = 12, and T₂ quantification performance was similar across femoral and acetabular cartilage. In the lumbar spine (Fig. 5b), limits of agreement were considerably wider than in the hip and knee pipelines, particularly above R = 4. While the line of equality was contained in these limits at all R, spine pipelines generally overestimated T₂ values. Although at particular R a disc level occasionally saw poorer T₂ quantification than others (e.g. L2/L3 at R = 6), on balance, predicted maps yielded similar bias and error across all discs.
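For readers less familiar with Bland–Altman analysis, the sketch below shows how the mean bias and ± 1.96 s.d. limits of agreement are typically computed from paired ROI-average T₂ values. The function name `bland_altman` and the toy numbers are hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption about the convention.

```python
import numpy as np

def bland_altman(pred_means, truth_means):
    """Return (bias, lower limit, upper limit) of agreement.

    Inputs are paired ROI-average T2 values (ms), e.g. one pair per
    subject and compartment. Limits are bias +/- 1.96 * s.d. of the
    paired differences.
    """
    p = np.asarray(pred_means, dtype=float)
    t = np.asarray(truth_means, dtype=float)
    diff = p - t
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample s.d.; an assumed convention
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Toy ROI averages in ms (illustrative values, not data from the study).
truth_means = [30.0, 35.0, 40.0, 45.0]
pred_means = [31.0, 35.0, 42.0, 46.0]

bias, lower, upper = bland_altman(pred_means, truth_means)
# The line of equality (difference = 0) lies within the limits if lower < 0 < upper.
contains_equality = lower < 0.0 < upper
print(bias, lower, upper, contains_equality)
```

A positive bias in this convention (prediction minus ground truth) corresponds to overestimated T₂, as described above for tibiofemoral cartilage and the spine pipelines.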

Supplementary Fig. S5 shows T₂ value distributions in violin and boxplots. Plots reveal minimal bias in hip cartilage predicted T₂ maps and slight but limited bias towards overestimating T₂ in knee cartilage. In the lumbar spine, more volatility was observed in predicted T₂ distributions, likely due to small test set size (n = 5), but at least through R = 6, these deviations had limited magnitude.