In vivo imaging of phosphocreatine with artificial neural networks

Phosphocreatine (PCr) plays a vital role in neuron and myocyte energy homeostasis. Currently, there are no routine diagnostic tests to noninvasively map PCr distribution with clinically relevant spatial resolution and scan time. Here, we demonstrate that artificial neural network-based chemical exchange saturation transfer (ANNCEST) can be used to rapidly quantify PCr concentration with robust immunity to commonly seen MRI interferences. High-quality PCr mapping of human skeletal muscle, as well as the information of exchange rate, magnetic field and radio-frequency transmission inhomogeneities, can be obtained within 1.5 min on a 3 T standard MRI scanner using ANNCEST. For further validation, we apply ANNCEST to measure the PCr concentrations in exercised skeletal muscle. The ANNCEST outcomes strongly correlate with those from 31P magnetic resonance spectroscopy (R = 0.813, p < 0.001, t test). These results suggest that ANNCEST has potential as a cost-effective and widely available method for measuring PCr and diagnosing related diseases.

1. I suggest that a very brief summary of the points I have made in relation to the background and possible use of this technique would usefully set the scene for the less specialist reader.
2. Obviously this would be marginal, given the current time resolution, but is it in any way feasible that the ANNCEST timecourse data in Fig 5a (and for comparison the 31P MRSI data in Fig 5b), specifically the recovery-from-exercise timepoints, could be used to calculate a spatially resolved map of PCr recovery rate constant? Such a thing would be interesting and physiologically relevant, being straightforwardly interpretable in terms of spatial distribution of mitochondrial function, and is surely a major way this technique could and should be developed. If (as I suspect) the data are too sparse, temporally, to support even a proof-of-principle calculation at this stage, I think it would be worth saying how this would work and why it would be useful. The point is, of course, that such kinetic measurements probe muscle energy metabolism and its abnormalities in a more effective (in some sense e.g. (sensitive, interpretable) way than resting [PCr] measurements.

Summary
The authors presented a pipeline that utilizes artificial neural networks (ANN) to quantify Phosphocreatine (PCr) concentration with high immunity to MRI interferences including magnetic field (B0) and radio-frequency transmission (B1) inhomogeneities. The methods have been tested on both phantoms and exercised skeletal muscle. The study documents the ability of the ANNN -based chemical exchange saturation transfer (ANNCEST) for measuring PCr and diagnosing related diseases.

Strength
• The paper introduces a framework for an important application that has a great interest to researchers for medical image analysis society, especially those pursuing in-vivo PCr quantification.
• The study is well-designed and validated • Most of the literature work references is up-to-date • The paper is well-written, and results are nicely discussed Weaknesses • The core computational algorithm used in the pipeline (ANN) is already published, which is not a problem in itself, but the rationale behind the specific choice should be discussed. Additionally, the scientific contribution of the manuscript, however, has been compensated for by the application itself and method evaluation.

Introduction (Page 3):
In addition to PCr measurement, 31 P MRS also provides information about pH, inorganic phosphate and adenosine phosphates (ATP, ADP, AMP) in tissue. In practice, 31 P MRS is most commonly applied to monitor the time dependencies of pH and PCr variation during exercise and recovery for assessing mitochondrial function. 7 Reson 275, 55-67 (2017). Fig 5b) Following the reviewer's suggestion, we demonstrate the feasibility of calculating a spatially resolved map of the PCr recovery time constant using PCr ANNCEST. The PCr ANNCEST data shown in Fig.5 were adopted.

Obviously this would be marginal, given the current time resolution, but is it in any way feasible that the ANNCEST timecourse data in Fig 5a (and for comparison the 31 P MRSI data in
To compensate for the sparsity of sampling time points, the baseline data was inserted behind the last recovery data as an additional sample. The time of first recovery data was set to 0 and the time interval of PCr mapping was 90 s. The PCr recovery rate constant was fitted using the following equation (

The paper introduces a framework for an important application that has a great interest to researchers
for medical image analysis society, especially those pursuing in-vivo PCr quantification.

The paper is well-written, and results are nicely discussed.
We thank reviewer for positive comments and summarizing the strengths of our work.

The core computational algorithm used in the pipeline (ANN) is already published, which is not a problem in itself, but the rationale behind the specific choice should be discussed. Additionally, the scientific contribution of the manuscript, however, has been compensated for by the application itself and method evaluation.
As the reviewer mentions, ANN has been well established and successfully applied to many diverse areas nowadays. The initial idea of this study and the choice of ANN were inspired by the work of Bo Zhu et al. Even though ANNCEST is a powerful tool, careful design of the acquired Z-spectrum and quantifiable parameters is still required, as described in the Discussion section. We have added the abovementioned statements in the revised manuscript (Page 8) to clarify the specific choice of ANN.

The online Methods is misplaced after discussion. How many is the training data? Also, 80-15-5 split is the author's choice or a standard procedure? Although the authors adapt the ANN, they still need to add details about their method, optimization, hyper-parameter tuning, etc.
Similar to the reviewer, we prefer the Methods before the Results, but we need to follow the mandatory format of Nature Communications, with manuscripts organized in the order: Abstract -> Introduction -> Results -> Discussion -> Methods.
The number of Z-spectra used for neural network training was 10 5 . We have clarified this in the revised manuscript. The 80-15-5 split is our choice and the default split provided by MATLAB is 70-15-15. To improve the neural network generalization and avoid overfitting, the training data were divided randomly into three sets (i.e. training set, validation set and test set). The training set was used for computing the gradient and updating the network weights and biases, and the error on validation set is monitored during the training process. During the initial phase of training, the error on validation set normally decreases. However, when the network begins to overfit the data, the error on validation set typically begins to rise. When this error increases for a specific number of iterations (40 in this study), the training of neural network is stopped, and the weights and biases at the minimum of the validation error are returned. The test set is designed to compare the performance of different ANN models, and while the error on test set is not used during training, in practice, it is still useful to monitor. If the error in the test set reaches a minimum at a significantly different iteration number than the error in the validation set, this might indicate a poor division of the data set. In this study, since the training Z-spectra were generated by randomizing the quantifiable parameters within certain ranges and no other ANN model was adopted, the test set is not critical for the training of neural network.
Therefore, we reduced the portion of test set to 5% and increased the portion of training set to 80%. It can be seen from Fig. 1(c) that the 80-15-5 split works well, and small standard deviation is observed between repeated trainings of neural network. The details of neural network training, specified in the first paragraph of the Methods section (Page 11), have now been expanded.

One more limitation of the study is the data size. Power analysis should be conducted to determine the appropriate data size that can be used to draw the author's conclusions.
In this study, PCr mapping using ANNCEST was validated by comparison with 31 P 2D MRSI measures obtained before and during in-magnet plantar flexion exercise. Following the reviewer's suggestion, a power analysis was performed to determine the appropriate data size to draw the conclusion. Assume we accept a p < 0.001 as acceptable and a study with 95% power, the sample size for the study will be (Kadam P, Bhalerao S.

Int J Ayurveda
where σ refers to the estimated standard deviation and Δ indicates the difference in effect. In this study, we expected a 50% reduction in PCr concentration during exercise (i.e. Δ ≈ 15 mM), and the standard deviation of PCr concentrations is 7.84 mM based on the baseline data shown in Fig. 5 (c). According to Eq. SEq3, the required sample size is about 14. In this study, a pixel-by-pixel correlation analysis was performed to compare PCr maps obtained by ANNCEST and 31 P 2D MRS on resting and exercised human skeletal muscle. Each PCr map has 16×16 = 256 pixels, and the PCr maps of baseline, during holding, and 0.75 min of recovery were chosen. Even though only partial regions were chosen for correlation analysis, the effective data size from four subjects is 202, which is much larger than the required sample size. We added the power analysis in Supplementary Section Section 7.

Abstract does not contain any quantitative results. Also, please refrain from using abbreviations from abstract, unless you define them (see e.g., WHM)
Following reviewer's suggestion, we reduced the usage of abbreviations for B 0 and B 1 in the Abstract (we left the abundant PCr one) and also added some quantitative results: Abstract (Page2): The PCr ANNCEST outcomes strongly correlated with those from 31 P magnetic resonance spectroscopy (R = 0.813, p < 0.001).

Quantitative results and statistical analysis have been conducted an included, however, comparison with other techniques is missing.
We have now added a description of how our method compares to 31 P measurements of pH and PCr concentration and compared the PCr recovery time from our experiments with the literature. Please see the responses to Reviewer 1 comment 1.1 for details.

The robust imaging of PCr on a 3T clinical system in 1.5 minutes is an exciting result that could have significant interest in the medical imaging field. While there are lots of interesting results, the logic and assumptions of the paper are not fully clear, with specific relevance to the benefit of the neural network
approach vs the PLOF approach, lessening enthusiasm.

Wouldn't the validity of the approach depend on how valid the assumptions are, which is very different in vivo where, for example, the MTC parameters were unknown. The logic should be clearly stated, and the results and methods should match this logic and follow a clear progression.
We apologize for the confusion. This paper contains multiple steps, and all these steps serve one goal, i.e.
developing high-quality PCr mapping using CEST for clinical practice at low field strengths. To achieve this goal, we developed and optimized PCr CEST experiment from data acquisition to data analysis to obtain optimal PCr CEST contrast and robust quantification result. For the data analysis, we proposed a novel CEST quantification framework dubbed ANNCEST, and its feasibility and efficiency were demonstrated on numerical simulation and phantom experiments (Figs. 1 & 2). For clinical data acquisition, we assigned the PCr CEST signal on human skeletal muscle at 3T and experimentally optimized the CEST contrast (Fig. 3).
Combining the advanced CEST quantification method, ANNCEST, with the optimized PCr CEST acquisition, high-quality PCr mapping on human skeletal muscle was obtained (Fig. 4). As validation and application, we applied PCr ANNCEST to measure PCr concentration and its spatiotemporal changes during skeletal muscle exercise (Fig. 5). For better evaluating and validating our approach, Bloch equation fitting, PLOF, and 31 P MRSI were carried out to show the performance of state-of-the-art methods. Following reviewer's suggestion, we added the following sections (in italic font) to Introduction and Discussion to make our paper clearer. To demonstrate the flexibility of ANNCEST in quantifying Z-spectrum with different parameters (e.g. number of pools, saturation length), we added Cr phantom results in Supplementary Materials Section 6.  998-1008 (2015).

Supplementary Materials (Section 4):
Since the MTC is homogeneous across human skeletal muscle (

Supplementary Materials (Section 6):
For the training of ANNCEST, we didn't make any assumptions. We feed the neural network with training Z-spectra and quantifiable parameters, and the other parameters, such as T 1 , T 2 , MTC, number of pools and saturation length, are blind for the neural network. The key thing we want to test in this paper is that if ANN can learn the relationship between desired parameters (e.g. concentration and exchange rate) and Z-spectrum and apply the learned knowledge to quantify new Z-spectrum. It can be seen from the Supplementary Materials (Fig. S4) that the depth, width and offset of CEST peak are related to concentration, exchange rate, and B 0 introduced frequency shift, respectively, and these effects can be fully exploited by ANN and applied to simultaneously quantify these parameters, as demonstrated by the results in this paper.
The flowchart of ANNCEST was added in Fig. S6-1. In order to demonstrate that ANNCEST is still valid with a different number of pools and saturation length (non-steady-state), Cr phantom experiments were performed and the results are given in Fig. S6-2&3.
ANNCEST is a data-driven quantification method, which is designed to extract relevant features from Z-spectra and utilize them to create a predictive tool based on the pattern hidden inside. The flowchart of ANNCEST is shown in Fig. S6-1.  Fig. S6-2, which reflects that ANNCEST can find strong correlations between Z-spectra and quantifiable parameters.
For validation, we applied the trained ANNCEST to quantify Z-spectra of Cr phantom obtained at room temperature (25℃), and the results are given in Fig. S6-3. An excellent correlation (R=0.9996) was observed between the ground truth and predicted phantom Cr concentration. The related Bland-Altman analysis of concentration is shown in Fig. S6-3f. The exchange rate obtained by ANNCEST (237.8±17.6 Hz) was consistent with that from the previous study (239~301 Hz at 25℃, pH 6.9-7.0) (Goerke S, et al. NMR Biomed 2014;27(5):507-518.). The predicted B 0 map (Fig. S6-3e) showed a strong correlation (0.9851) with that obtained by water saturation shift referencing (WASSR) MRI, as illustrated in Fig. S6-3h.

3T? The legend in figure 1b indicates a 300Hz variation, which would match the stated 0.4 ppm at 17.6T, not 3T. If the phantom work was done at 17.6 T, I don't understand the logic.
All the simulations, training and measurements shown in the manuscript were performed at 3T. The 300 Hz in

MTC, T 2 and lineshape). In both cases, it is unclear what assumptions are made about the PCr T 1 and T 2 , but they must be in the coupled Bloch equations. The training assumes a single T 1 , but doesn't that change with, for example, exercise? Discussion of possible problems and systematic errors is necessary, along with discussion of why these constraints were chosen (e.g. is it impossible to do the fitting without specifying the ratio of exchange rates?).
For the training of ANNCEST, we didn't make any assumptions. We feed the neural network with training Z-spectra and quantifiable parameters, and the other parameters, such as T 1 , T 2 , MTC, number of pools and saturation length, are blind for the neural network. The key thing we want to test in this paper is that if ANN can learn the relationship between desired parameters (e.g. concentration and exchange rate) and Z-spectrum and apply the learned knowledge to quantify new Z-spectrum.
The assumptions we made are for the generation of training data. In this paper, the training Z-spectra were generated by Bloch-  (8):998-1008.), the T 1 will increase around 10% after exercise, and we didn't include T 1 change in the training data of human skeletal muscle, which means the PCr mapping after exercise may contain the contamination from T 1 . In further study, better design of training data is required to improve the accuracy of ANNCEST. Following reviewer's suggestion, we added related discussion in the revised manuscript.  The procedures of PLOF method are: (1) Z-spectrum excluded CEST peak is used to fit the background; (2) fit the whole Z-spectrum with fixed background and get the true apparent relaxation rate of CEST peak. The offset of CEST is flexible during the fitting, which possesses some resistance against B 0 inhomogeneities. The assumption of PLOF method is that the background of Z-spectrum is broad and smooth, which can be presented by a polynomial function. This assumption may break down when including the Z-spectrum close to water, which has higher curvature compared to that further from water. Therefore, we discarded the saturation offsets 1.3 ~ 1.6 ppm during PLOF fitting. From the fitting results shown in Fig. 3, the 1.6 ~ 2.1 ppm is enough for the background fitting. In our opinion, the systematic errors for PLOF come from the following factors: First, the true apparent relaxation rate (T 1ρ based) is affected by B 1 , so an additional B 1 map is needed to correct for local effect changes induced by B 1 inhomogeneity. Second, the inevitable noise will degrade the fidelity of PLOF method, as well as Bloch equation fitting and other fitting based methods. From the demonstrations shown in the below Figure, satisfactory consistency was obtained by PLOF with different boundaries in the case without noise. However, when adding noise to Z-spectrum, the quantification results oscillated with different boundaries, which meant that the fitting method could not find a unique solution and the quantification results depended strongly on the fitting parameters. Minor points:

Why "Online Methods" instead of "Methods"?
We removed "Online" in the revised manuscript to avoid confusion.

"Numerical Simulation" section refers to figure 2, but I think you mean some of the subfigures in figure 1.
We thank reviewer for pointing out this typo. We replaced " Figure