Grading of endometrial cancer using 1H HR-MAS NMR-based metabolomics

The tissue metabolomic characteristics associated with endometrial cancer (EC) at different grades were studied using high-resolution (400 MHz) magic angle spinning (HR-MAS) proton spectroscopy. The metabolic profiles were obtained from 64 patients (14 with grade 1 (G1), 33 with grade 2 (G2) and 17 with grade 3 (G3) tumors) and compared with the profile acquired from 10 patients with benign disorders. OPLS-DA revealed increased valine, isoleucine, leucine, hypotaurine, serine, lysine, ethanolamine and choline, and decreased creatine, creatinine, glutathione, ascorbate, glutamate, phosphoethanolamine and scyllo-inositol in all EC grades relative to the non-transformed tissue. Increased taurine levels were additionally detected in the G1 and G2 tumors in comparison with the control tissue, while elevated glycine, N-acetyl compound and lactate were detected in the G1 and G3 tumors. The metabolic features typical of the G1 tumors are increased dimethyl sulfone and phosphocholine and decreased glycerophosphocholine and glutamine levels, while a decreased myo-inositol level is characteristic of the G2 and G3 tumors. Elevated 3-hydroxybutyrate, alanine and betaine levels were observed in the G3 tumors. The differences between the G1 and G3 malignancies were mainly related to perturbations of phosphoethanolamine and phosphocholine biosynthesis and of inositol, betaine, serine and glycine metabolism. The statistical significance of the OPLS-DA modeling was also verified by a univariate analysis. HR-MAS NMR-based metabolomics provides useful insight into the metabolic reprogramming in endometrial cancer.


Quality Control
The QC procedure contains three elements: the requirements for the tissue specimen preparation, the QC of the NMR protocol and the QC of the multivariate data analysis. As in all bioanalytical approaches, a clean and reliable sample preparation strategy is a significant component in designing metabolomics (or -omics, in general) studies. Our tissue sample preparation procedures were defined and rigorously followed. To ensure robust and accurate quantification of the potential biomarkers by NMR, each step of the analytical protocol was carefully performed and evaluated. One of our QC approaches is to keep the post-processing procedure simple and to perform the multivariate analyses using raw or minimally processed data that do not rely, for example, on high-level processing.
Detailed information concerning all three stages of the QC procedure is given below:

Sample preparation
To minimize sample degradation, the time between the tissue resection and freezing was kept as short as possible (below 5 minutes). The sample was cut for the NMR measurements on a metal block cooled with liquid nitrogen to keep it frozen. The NMR measurements were performed at 4°C.

NMR quality control procedures
The quality control procedures included:
- Magic angle adjustment using a potassium bromide sample (performed once a month, or after a probe change).
- Temperature calibration using a sample of 4% methanol dissolved in d3-methanol (performed once a month, or after a probe change).
- Manual shimming of the probe using a sample of 3% CHCl3 in acetone-d6 (every day).
The acquisition of each HR MAS NMR spectrum was preceded by: manual tuning and matching of the probe, locking the B0 field using D2O, manual shimming using the added formate signal (FWHM < 1.5 Hz), manual adjustment of the pulse length and manual determination of the transmitter frequency offset (O1) for optimal water suppression.
To make the spectra comparable to each other, probabilistic quotient normalization was used.

OPLS-DA modeling details
Each OPLS-DA model was described using the number of the model components, the fractions of the X and Y variation explained by the predictive and orthogonal components (R2X and R2Y), the fractions of the Y variation predicted by the models (Q2) and the p-values from the CV-Anova test. The combination of VIP and p(corr)[1] was used for variable selection.
Table S4. OPLS-DA models (7-9) diagnostics. Number of the OPLS-DA model components, the p-values obtained from the CV-Anova test, the fractions of the total X and Y variation explained by the model (R2X, R2Y), the fractions of the total Y variation that can be predicted by the model (Q2), the intercept values obtained from the permutation tests representing the values of R2Y and Q2 of the purely random models.
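The variable-selection rule mentioned in this section (VIP combined with p(corr)[1]) can be illustrated with a minimal sketch. It assumes that p(corr)[1] is the Pearson correlation between the first predictive score vector and each spectral variable, and that VIP values are already available from the modeling software. The function names and both thresholds (`vip_min`, `pcorr_min`) are hypothetical and are not taken from this work.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def p_corr(scores, X):
    """Correlation of the predictive scores with every column (variable) of X.

    scores: one predictive score per sample
    X:      list of samples, each a list of variable intensities
    """
    return [pearson(scores, column) for column in zip(*X)]


def select_variables(vip, pcorr, vip_min=1.0, pcorr_min=0.5):
    """Indices of variables passing both cut-offs (thresholds are illustrative only)."""
    return [j for j, (v, c) in enumerate(zip(vip, pcorr))
            if v >= vip_min and abs(c) >= pcorr_min]
```

Requiring both criteria keeps variables that are influential in the model (VIP) and that also vary reliably with the class separation (p(corr)[1]).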

Multivariate models S1-S4
Although the direct analysis of the influence of the disease advancement on the metabolic profiles within each pathomorphological grade group is not possible in our work, several multivariate models were constructed to shed some light on this problem (Figure S3). Two separate OPLS-DA models distinguishing the G2 (stage 1) tumors from the control tissue (OPLS-DA model S1) and the G2 (stage 2+3) tumors from this tissue (OPLS-DA model S2) were compared to each other by means of the SUS plot. To examine the confounding effect of the disease stage on the metabolic differences between G1 and G2 endometrial cancer, the OPLS-DA model S3 was built based on the G1 and G2 (stage 1) tumors. The patients with the G2 and G3 tumors characterized by the more advanced disease stage (stages 2+3) were included in the development of the OPLS-DA model S4.
Figure S3. The scheme presenting the constructed multivariate models (S1-S4).
The number of the model components, the fractions of the X and Y variation explained by the predictive and orthogonal components (R2X and R2Y), the fractions of the Y variation predicted by the OPLS-DA models S1-S4 (Q2), the p-values from the CV-Anova test and the results from the permutation testing are presented in Table S8.
Table S8. OPLS-DA models (S1-S4) diagnostics. Number of the components in the models, the p-values obtained from the CV-Anova test, the fractions of the total X and Y variation explained by the model (R2X, R2Y), the fractions of the total Y variation that can be predicted by the model (Q2), the intercept values obtained from the permutation tests representing the values of R2Y and Q2 of the purely random models.

OPLS-DA models S1 and S2
The scores plots obtained from OPLS-DA models S1 and S2 differentiating G2 (stage 1) and G2 (stages 2+3) tumors from the control tissue are shown in Figures S4-S5. Figure S6 presents the SUS plot comparing these models. The p(corr)[1] and VIP values for the most important metabolites obtained from these models are listed in Table S9. The common features of the G2 (stage 1) and G2 (stage 2+3) tumors in reference to the control tissue include higher 3-hydroxybutyrate, N-acetyl compound, isoleucine, leucine, valine, lysine, taurine, serine, hypotaurine, choline and ethanolamine and decreased glutamate, creatinine, glutathione, scyllo-inositol, creatine and ascorbate. Increased lactate, alanine and phosphocholine and decreased glucose, myo-inositol, acetate and glutamine were found to be characteristic of the G2 (stage 2+3) group in relation to the non-transformed tissue [not observed in the G2 (stage 1) tumor group], while lower succinate was observed only in the G2 (stage 1) tumor group in relation to the normal tissue.
Figure S4. Scores plot obtained from the OPLS-DA model S1 differentiating G2 (stage 1) tumors from the normal endometrium. The image was created using SIMCA-P 15.0 software package (https://www.sartorius.com).
Table S9. Metabolites contributing to the differentiation between the G2 (stage 1) tumors and the normal endometrium (OPLS-DA model S1) and the G2 stage (

OPLS-DA model S3
The scores plot obtained from OPLS-DA model S3 differentiating G1 (stage 1) from G2 (stage 1) tumors is presented in Figure S7, while the p(corr)[1] and VIP values for the most important metabolites obtained from this model are shown in Table S10. The G2 (stage 1) tumors are characterized by lower succinate, serine, dimethyl sulfone, ascorbate and taurine than the G1 (stage 1) tumors. These metabolites are characterized by AUC > 0.7.
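The AUC criterion quoted for these metabolites can be illustrated with a short sketch. How the AUC values were computed is not stated in this section, so the code below simply uses the rank-based (Mann-Whitney) definition of the area under the ROC curve for a single metabolite. The function names are hypothetical.

```python
def roc_auc(group_a, group_b):
    """Area under the ROC curve for one metabolite.

    Equals the probability that a randomly chosen value from group_a
    exceeds a randomly chosen value from group_b (ties count as 0.5).
    """
    wins = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(group_a) * len(group_b))


def discriminative_auc(group_a, group_b):
    """Direction-independent AUC, so that 'lower in group_a' also scores above 0.5."""
    auc = roc_auc(group_a, group_b)
    return max(auc, 1.0 - auc)
```

A value of 0.5 corresponds to no discrimination between the two groups, and 1.0 to complete separation.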

OPLS-DA model S4
The scores plot obtained from the OPLS-DA model S4 differentiating the G2 (stage 2+3) from the G3 (stage 2+3) tumors is presented in Figure S8, while the p(corr)[1] and VIP values for the most important metabolites obtained from this model are shown in Table S11. Higher choline, scyllo-inositol, taurine, myo-inositol, creatine and succinate and lower betaine, ascorbate and glucose were observed in the G2 (stage 2+3) tumors than in the G3 (stage 2+3) tumors. However, creatine and ascorbate are characterized by AUC < 0.7. Although the |p (

Analysis of the contribution of the aromatic region (5.2-8.4 ppm) to the total area under the HR MAS NMR spectra
The free induction decay signals were multiplied by an exponential function (0.3 Hz), Fourier-transformed, phased and baseline corrected in Topspin 3.1 software (Bruker BioSpin GmbH). The spectra were referenced to the formate peak (at 8.44 ppm) in MestReNova software (Santiago de Compostela, Spain). The low-field (5.2-8.4 ppm) and high-field (0.8-4.8 ppm) regions were integrated. Table S12 shows the contribution of the aromatic region to the total area under the HR MAS NMR spectra (after excluding the residual water region) for the analyzed groups. Figure S10 presents an exemplary spectrum for which this contribution is equal to 2.3%.
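The contribution of the aromatic region can be expressed as a simple ratio of integrals. The sketch below is only an illustration: it assumes the spectrum is available as two parallel lists (chemical shift in ppm and intensity) and uses trapezoidal integration, which may differ from the integration routine of the software used in this work. The function names are hypothetical.

```python
def region_area(ppm, intensity, lo, hi):
    """Trapezoidal area of the spectrum between lo and hi ppm (inclusive)."""
    points = sorted((p, y) for p, y in zip(ppm, intensity) if lo <= p <= hi)
    return sum((p2 - p1) * (y1 + y2) / 2.0
               for (p1, y1), (p2, y2) in zip(points, points[1:]))


def aromatic_contribution(ppm, intensity,
                          aromatic=(5.2, 8.4), aliphatic=(0.8, 4.8)):
    """Percentage of the total integrated area that lies in the aromatic region.

    The total area is the sum of the two integrated regions, so the residual
    water region between 4.8 and 5.2 ppm is excluded.
    """
    area_aromatic = region_area(ppm, intensity, *aromatic)
    area_aliphatic = region_area(ppm, intensity, *aliphatic)
    return 100.0 * area_aromatic / (area_aromatic + area_aliphatic)
```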

Normalization
Probabilistic quotient normalization (PQN) was used to make the spectra comparable to each other in our work. This method starts with the adjustment of the total intensity of each individual spectrum to the same value. Then, the ratios of the signal intensities between a given spectrum and a reference one (the median spectrum from the control group in our work) are calculated. The median of these ratios for a given spectrum is the normalization factor.
To avoid a negative impact of noise on the value of this normalization factor, the aromatic region was excluded from the normalization procedure. Given the relatively small contribution of this region to the total area (Table S12), excluding this part of the spectrum from the normalization procedure is not a critical step in our analysis. The normalization factors obtained for each spectrum were used consistently for both the aliphatic and aromatic regions.
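The normalization steps described above can be sketched in a few lines. This is a minimal illustration, not the code used in this work. It assumes that each spectrum is stored as a list of intensities on a common ppm grid, that the total intensity is adjusted to 1, that a spectrum is divided by its median quotient (the usual PQN convention), and that only points with a positive reference intensity enter the quotient calculation. The function names are hypothetical.

```python
from statistics import median


def median_spectrum(spectra):
    """Point-wise median over a list of spectra (used as the PQN reference)."""
    return [median(column) for column in zip(*spectra)]


def pqn_normalize(aliphatic, aromatic, control_idx):
    """PQN with factors computed from the aliphatic region only.

    aliphatic, aromatic: lists of spectra (one row per sample, same sample order)
    control_idx:         row indices of the control-group samples
    Returns the normalized aliphatic rows, the normalized aromatic rows
    and the per-sample PQN factors.
    """
    # Step 1: adjust the total (aliphatic) intensity of every spectrum to 1.
    scales = [1.0 / sum(row) for row in aliphatic]
    scaled = [[v * s for v in row] for row, s in zip(aliphatic, scales)]

    # Step 2: reference = median spectrum of the control group.
    reference = median_spectrum([scaled[i] for i in control_idx])

    norm_aliphatic, norm_aromatic, factors = [], [], []
    for row, aromatic_row, s in zip(scaled, aromatic, scales):
        # Step 3: the median of the point-wise quotients is the factor.
        quotients = [v / r for v, r in zip(row, reference) if r > 0]
        factor = median(quotients)
        factors.append(factor)
        norm_aliphatic.append([v / factor for v in row])
        # The same total-intensity scale and factor are applied to the aromatic region.
        norm_aromatic.append([v * s / factor for v in aromatic_row])
    return norm_aliphatic, norm_aromatic, factors
```

Because the median quotient is insensitive to a few strongly changed signals, two spectra that differ only by an overall dilution end up on the same scale.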

Merging of aromatic and aliphatic regions into a single data matrix
The consistently normalized aliphatic (0.8-4.8 ppm, apodized with an exponential function of 0.3 Hz) and aromatic (5.2-8.4 ppm, apodized with an exponential function of 3 Hz) regions of the HR MAS NMR spectra were merged into a single data matrix. This matrix was mean centered and Pareto scaled before the multivariate analysis.
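The merging and scaling step can be sketched as follows. Pareto scaling divides each mean-centered variable by the square root of its standard deviation. The sketch assumes the sample standard deviation (n - 1 in the denominator) and leaves constant variables unscaled; both choices, and the function names, are assumptions rather than details taken from this work.

```python
from math import sqrt
from statistics import mean, stdev


def merge_regions(aliphatic, aromatic):
    """Concatenate the two regions sample by sample into one data matrix."""
    return [list(a) + list(b) for a, b in zip(aliphatic, aromatic)]


def center_and_pareto_scale(matrix):
    """Mean-center every column and divide it by the square root of its SD."""
    scaled_columns = []
    for column in zip(*matrix):
        m = mean(column)
        sd = stdev(column)
        divisor = sqrt(sd) if sd > 0 else 1.0  # constant columns are only centered
        scaled_columns.append([(v - m) / divisor for v in column])
    # Transpose back to one row per sample.
    return [list(row) for row in zip(*scaled_columns)]
```

Compared with unit-variance scaling, Pareto scaling reduces the dominance of intense signals while inflating baseline noise less.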

Multivariate analysis
The multivariate analysis of the merged aromatic and aliphatic regions was conducted according to the scheme presented in Figure S11. This scheme is similar to that presented in Figure 1 (the analysis of the aliphatic region, main manuscript file). The models computed from the merged aliphatic and aromatic regions are marked with a superscript M (PCA model 1 M, PLS-DA model 2 M, etc.).
Figure S11. The scheme presenting the constructed multivariate models.

Unsupervised analysis of natural clustering of patients according to the tumor grade and disease stage - Model PCA 1 M
The scores and loadings plots obtained from the model PCA 1 M are presented in Figure S12. The cross-validated scores and weights plots obtained from the PLS-DA models 2 M and 3 M are presented in Figure S13. It is apparent that the patterns visible in the scores plots obtained from the models PCA 1 M, PLS-DA 2 M and PLS-DA 3 M (Figures S12 and S13) are similar to those obtained from the models PCA 1, PLS-DA 2 and PLS-DA 3 (Figures 3 and 4, main manuscript file). The presented loadings plots indicate a minor contribution of the aromatic region to the principal components.

Pair-wise discrimination between the endometrial cancer of different grades and healthy endometrium (OPLS-DA model 4 M, OPLS-DA model 5 M, OPLS-DA model 6 M)
The scores and loadings plots obtained from the OPLS-DA models 4 M, 5 M and 6 M are presented in Figure S14. The p(corr)[1] vs VIP plots for the aromatic range are also shown in this figure.
The number of the model components, the fractions of the X and Y variation explained by the predictive and orthogonal components (R2X and R2Y), the fractions of the Y variation predicted by the models (Q2), the p-values from the CV-Anova test and the results from the permutation testing are presented in Table S13.
The scores and loadings plots obtained from the OPLS-DA models 7 M, 8 M and 9 M are presented in Figure S15. The p(corr)[1] vs VIP plots for the aromatic range are also shown in this figure. The corresponding model diagnostics (number of components, R2X, R2Y, Q2, the p-values from the CV-Anova test and the results from the permutation testing) are presented in Table S14.

Multivariate analysis of the spectral region from 5.2 to 8.4 ppm

Preprocessing
The free induction decay signals were multiplied by an exponential function (3 Hz), Fourier-transformed, phased, baseline corrected and referenced to the formate peak at 8.44 ppm. The data were analyzed at full resolution. For consistency of the results, the region from 5.2 to 8.4 ppm was normalized using the scaling factors obtained from probabilistic quotient normalization of the region from 0.8 to 4.8 ppm.
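The exponential multiplication applied before the Fourier transform can be illustrated with a short sketch. It uses the standard form of exponential line broadening, in which a broadening of LB Hz corresponds to weighting the FID by exp(-pi * LB * t). The dwell time is not given in this section, so it appears here only as a hypothetical parameter, as does the function name.

```python
from math import exp, pi


def exponential_apodization(fid, dwell_time, lb_hz):
    """Multiply a complex FID by a decaying exponential.

    fid:        list of complex time-domain points
    dwell_time: time between successive points in seconds (assumed known)
    lb_hz:      line broadening in Hz (3 Hz for the aromatic region here)
    """
    return [point * exp(-pi * lb_hz * k * dwell_time)
            for k, point in enumerate(fid)]
```

A larger line broadening suppresses more of the noisy tail of the FID at the cost of wider lines, which is consistent with the stronger apodization (3 Hz instead of 0.3 Hz) chosen for the low-intensity aromatic region.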

Multivariate analysis
The pre-processed data were imported to SIMCA-P 15.0 software (Umetrics, Sweden) for multivariate modeling. Before the modeling, the data were mean centered and Pareto scaled. The scheme of the analyses is presented in Figure S16.
Principal Component Analysis (PCA) was used to obtain the initial information about the natural grouping of the spectra acquired from the cancer samples (according to the tumor grade and the disease stage) and the healthy tissue (model S5). The clustering of tumors according to grade and stage was also examined using Partial Least-Squares Discriminant Analysis (PLS-DA) (models S6 and S7). Figure S17 presents the average CPMG spectra (region from 5.2 to 8.4 ppm) acquired from the different grades of endometrial cancer and the control tissue. The tentatively assigned metabolites are marked in this figure and listed in Table S15.
Figure S17. Average 1H HR-MAS NMR CPMG spectra obtained from the G1, G2 and G3 endometrial tumors and the control tissue. The image was created using SIMCA-P 15.0 software package (https://www.sartorius.com). α-glc - α-glucose, Lipid - unsaturated lipids, Glyc - glycogen, Ur - uridine, UDP sugar - uridine diphosphate sugar, Ura - uracil, Ino - inosine, ATP - adenosine triphosphate, Fum - fumarate, Tyr - tyrosine, Phe - phenylalanine, Ade - adenine.

PLS-DA models S6 and S7
Table S16 shows the number of the components, the fractions of the X and Y variation explained by the PLS-DA models S6 (differentiating the patients according to the disease stage) and S7 (differentiating the tumors according to the grade), the fractions of the Y variation predicted by the models and the p-values obtained from the CV-Anova test.