Predicting postoperative peritoneal metastasis in gastric cancer with serosal invasion using a collagen nomogram

Accurate prediction of peritoneal metastasis for gastric cancer (GC) with serosal invasion is crucial in clinical practice. The presence of collagen in the tumour microenvironment affects the metastasis of cancer cells. Herein, we propose a collagen signature, which is composed of multiple collagen features in the tumour microenvironment of the serosa derived from multiphoton imaging, to describe the extent of collagen alterations. We find that a high collagen signature is significantly associated with a high risk of peritoneal metastasis (P < 0.001). A competing-risk nomogram including the collagen signature, tumour size, tumour differentiation status and lymph node metastasis is constructed. The nomogram demonstrates satisfactory discrimination and calibration. Thus, the collagen signature in the tumour microenvironment of the gastric serosa is associated with peritoneal metastasis in GC with serosal invasion, and the nomogram can be conveniently used to individually predict the risk of peritoneal metastasis in GC with serosal invasion after radical surgery.

It is notable that the training set and validation cohorts are different clinically. The authors need to explain whether this affects the analysis they have conducted. Particularly with respect to location of the primary tumour, tumour size and lymph node status.
I am assuming all tumours involved the serosa but convention warrants a T stage should be recorded for all cancers that may have a bearing on the outcome. Were there subserosal vs serosal invasion and was the lesion a T4 lesion vs a T3 lesion?
There is considerable interest in Lauren classification of GC, particularly when Diffuse GC has an infiltrative phenotype and has poor outcome. It would be beneficial to stratify on the basis of this histological grouping given Diffuse histology at the serosa will have high propensity to metastasise to peritoneum. Indeed I am surprised multivariate analysis did not find a significant difference in undifferentiated GC which will comprise many of these. Could analysis incorporate Lauren classification as an independent variable? Why did the authors choose Cardia as a reference for location? Cardia are less common in Chinese population and usually they are poorly prognostic and usually require more radical operations that will influence clinical outcomes.
Lymph node status seems to be the most significant multivariate predictor of outcome, with highest SHR and range. How do the authors propose to use this assay in clinical prediction? Will it be in combination with other variables and how is prediction of outcome affected by removal of clinical variables?
I have a technical question regarding the multiphoton assay to assess collagen. Was this collagen only in the serosa or was collagen matrix evaluated for the entire section? Would a negative control include serosa where there is no invasion of tumour cells? Formalin fixation tends to alter human tissue, while I assume all cases were treated equally, I am wondering about the technique and whether fixation (and possibly degree of fixation) may affect the collagen signature? For instance, would a case that was fixed overnight compared to one fixed over a few days, prior to paraffin embedding, influence the multiphoton imaging?
The regions of selection become important also. I note in Figure 1 there is one region of selection which has adipose tissue in the H&E. Adiposity does occur at the serosa and this will be variably distributed and may influence collagen matrix. How does BMI or fat content in the serosa of individuals affect the assay?
In usual circumstances when analysing a predictive assay you would create a model using the training cohort which then defines set thresholds and then use those thresholds in an independent validation cohort. It is not clear to me whether the threshold was set by the training cohort and then tested on the validation cohort. It appears the validation cohort was used as another independent cohort with different distribution and different threshold for collagenomics signature. Can the authors explain? I am not sure the ROC in Supp Fig 6 need to have all three time points. I would have thought 3 year cumulative outcome would be enough. Most relapses will occur within a 3 year timeframe if they are going to happen. I do not think they are significantly different.
When considering this as a diagnostic test it would be valuable to have an indication of the sensitivity and specificity as well as positive and negative predictive values to reassure clinicians they are selecting appropriate population for interventions.
My last point is a philosophical one. What do the authors propose will be the difference in management given their nomogram? Will they expect cases with N3 disease and serosal involvement to have different treatment based on their nomogram result because they are predicting less peritoneal involvement? I can understand they may advocate more intense peritoneal treatment for collagen signature positive patients, but would they advocate withholding treatment in advanced disease on the basis of the nomogram result?
Reviewer #3 (Remarks to the Author): Although the manuscript presents a good amount of data with a large patient cohort, the manuscript is currently not technically sound. Important details are missing. Some content is biologically incorrect.
1) The term 'collagenomics' is misleading. It implies investigating the 'collagenome', meaning all types and varieties of collagens. However, by this technique, only a very limited number of collagens are accessible by their endogenous signal. The authors should consider revising this term.
2) The authors talk about 'high-throughput quantitative collagen features'. Is there any structural/molecular assignment which can be linked to the different features? Especially the Gabor wavelet transformation features, which majorly influence the calculation of the 'collagenomics signature' is highly vague.
3) Is there any explanation why for the establishment of the collagenomics signature from the 4 types of collagen features (morphology, histogram, GLCM & Gabor wavelet) the LASSO regression mainly selected Gabor wavelet features as potential predictive variables (3 out of 4), while the other feature types barely seemed to influence the metastasis probability?
4) Training and validation data are always represented in 2 different figures/panels. Is this due to the variability between the two cohorts? If it was known from the clinical data already that there are differences between the tumor size, tumor location etc., it might be worth considering a pooling of all patients and randomly define a test and a training/validation set.

5) In regards to the methodology:
Were the unstained serial sections, that were used for the multiphoton imaging, treated in any way before the measurements? Was the paraffin removed, and if so how? In the future, would the definition of the invasive region/ROI also be possible based on the MP image or is there always an H&E section necessary to determine the ROI?
6) Fig. 1 indicates that the authors also collected TPEF signal from the tumor tissues. It is not clear why the features of these images were not included in the prediction models.
7) Does the X-tile plot (Supplemental Fig. 5) represent the data for training or the validation set? To verify the selection of the cut-off value, both plots should be shown. The presented plot does not lack any green color, red is the predominant color. What is the meaning of this? Also from Supplemental Fig. 4, it is not clear why this cut-off value was chosen. Why are the values overall lower in the validation set? Plots for training and validation cohort should have the same scale.
8) In general, the approach to extract the presented amount of image features, based on the collagen fiber structure, implies that many selected features are correlated. Only few features might be important for the predictive capacity. The authors should analyze the correlations of the features and select independent features for their prediction model.
9) Page 11, the authors state that "the collagenomics signature was positively corrected with the cross-link density of collagen….". This is not surprising as the cross-link density is part of the 'Collagenomics signature calculation formula' (Appendix). However, it is not clear if this cross-link density (meaning the connections between individual collagen fibers?) is correlated to chemical crosslinks that are mostly present within a collagen fiber. The previous study from the authors (reference no 25) refers to chemical collagen crosslinks. Studies that analyze systematically the relationship of collagen network features (e.g. via SHG) and chemical crosslinking are still missing. The section in the manuscript (P. 11) needs clarification.
Reviewer #4 (Remarks to the Author): The authors propose a multiphoton imaging-derived "collagenomics" signature that associates with a high risk of peritoneal metastasis in gastric cancer with serosal invasion. This signature is validated in an independent, external data set.
This validated "collagenomics" signature in and of itself is a novel and interesting finding, especially for those who study and treat gastric cancer. If there were further metastasis-associated multiphoton imaging-derived collagen-related findings presented across multiple cancer types, these would be of widespread interest to the greater cancer research community.
Seemingly in order to find clinically relevant use for the signature, the authors then build a nomogram that includes this signature to predict individual risk of peritoneal metastasis in GC with serosal invasion. However, there are major concerns and issues with their nomogram approach and methods.
Fundamentally, a nomogram is built to be used in the clinic. Therefore, there needs to be a well-defined clinical justification for creating one, i.e. what clinical decision will be aided by using it? And this justification should be the overarching motivation for creating the nomogram in the first place. Instead, in the manuscript, there are only vague references to "clinical use" and "improving the prognosis" when introducing the nomogram. Even when presenting the results of decision curve analysis, the decision in question is not at all referred to.
It is not until later in the discussion that it becomes clear that there is an actual decision that could be influenced by the nomogram, namely which patients get chosen to undergo intraperitoneal chemotherapy (IPC), which is costly and associated with a high rate of postoperative complications. This decision needs to be foregrounded as the basis for why a nomogram is justified in the first place. (As an aside, complications of IPL surgery can be incorporated into the decision curve as well. See Vickers et al 2008, DOI 10.1186/1472)
More concerning, because it is an issue that cannot be addressed by reorganization of the manuscript, is the inclusion of the "collagenomics" signature into the nomogram without addressing the essential question of whether there is justification for including non-clinical variables into a nomogram at all. Does the "collagenomics" signature add on to the clinical variables already used in similar nomograms in any clinically meaningful way? If there is to be an additional variable beyond the usual clinical variables, there needs to be explicit justification for how inclusion of these new data (that require additional investment/expense) makes the model perform better.
As an example of a paper that addresses both of these concerns, cited by the authors themselves, Dong et al (2019) are clear about the clinical utility of the nomogram they develop and demonstrate that a nomogram with their "radiomic" signatures performs better with respect to diagnostic accuracy than a model with clinical factors alone.
Beyond these major concerns, there are some other issues, statistical and otherwise:
1. Why dichotomize the "collagenomic" signature? Dichotomizing results in loss of information. Is there an association between the signature itself and time-to-event outcomes? If there is later a reason to dichotomize into "high" and "low" signature, be explicit about what that reason is.
2. Issues with the abstract: Multiphoton imaging should be mentioned because it is an essential part of the novelty of the finding. Also, reporting a significant association of a high collagenomics signature with a high risk of peritoneal metastasis and poor oncological outcomes with P<0.05 is insufficient. The actual P-values should be shown, especially as multiple outcomes are being reported in that single sentence so that multiple testing issues are an immediate concern.

Reviewer #1
In this analysis, the authors use multiphoton imaging to quantify the extent of collagen alterations in the tumor microenvironment, to determine an association with peritoneal metastases. The least absolute shrinkage and selection operator (LASSO) regression was used to predict peritoneal metastases based on the collagenomics signature and clinicopathologic risk factors.
198 patients were analyzed in the training cohort, and 115 patients in the validation cohort. A significantly higher peritoneal metastasis rate was found in the high collagen signature patients compared to the low collagen signature. The investigators went on to incorporate tumor size, differentiation, and lymph node status into a competing-risk nomogram. The nomogram resulted in a good average concordance index. Clinical usefulness was considered achieved based on comparison to treat all or treat none strategies.
The authors are to be congratulated for investigating an area of need, in that the peritoneum is the most common site of recurrence after potentially curative resection of gastric cancer. The cohorts demonstrate significant differences in OS and DFS, when stratified according to low and high signature.
Response: We truly appreciate your efforts and comments on our manuscript. We have revised the manuscript according to your comments and suggestions.
Why was size >4 cm and not T stage utilized in the model?
Response: Thank you for your question. Four centimetres is the median tumour size of the enrolled patients; therefore, we divided patients into <4 cm and ≥4 cm groups in terms of tumour size. A tumour size larger than 4 cm was found to be significantly associated with peritoneal metastasis after competing-risk regression and was thus utilized in the model. In this study, only patients with serosal invasion were included, and all enrolled patients were the same T stage. Therefore, the T stage was not utilized in the model.
The authors envision that the nomogram will facilitate personalized medicine.
However, enthusiasm for this nomogram is tempered by the lack of clear cut-off level where an adjuvant therapy would not be indicated. Even in the low collagenomics signature group, the peritoneal metastasis rate is considerable.
Response: Thank you for your comments. Currently, a diagnosis of peritoneal metastasis after radical gastrectomy mainly depends on clinical signs, imaging examinations and even reoperation during the follow-up period, and a practical prediction model at the time point of surgery to predict peritoneal metastasis in GC patients with serosal invasion is still lacking. In this study, although the peritoneal metastasis rate was still considerable even in the low collagen signature group, a significantly higher peritoneal metastasis rate was observed in the high collagen signature group, which indicates that the collagen signature could identify patients who were more likely to suffer from peritoneal metastasis after radical surgery.
We chose 3 years as the time point. Then, the maximum Youden index of 0.3913 was selected as the optimal cutoff value in the training cohort, and all 343 patients were divided into high-risk and low-risk groups. Patients with predicted high risk were considered to receive hyperthermic intraperitoneal chemotherapy to reduce the risk of peritoneal metastasis. We found that the sensitivity, specificity, accuracy, negative predictive value and positive predictive value of the nomogram in the training cohort were 82.3%, 82.4%, 82.3%, 87.7% and 75.6%, respectively. In the validation cohort, the sensitivity was 81.4%, the specificity was 60.8%, the accuracy was 66.9%, the negative predictive value was 88.9% and the positive predictive value was 46.8%. In the total cohort, the sensitivity was 82.0%, the specificity was 72.4%, the accuracy was 75.8%, the negative predictive value was 88.0%, and the positive predictive value was 62.1%. We have added these results to our revised manuscript.
Page 10 Line 22 to Page 11 Line 8: "The maximum Youden index of 0.3913 of the ROC curve of the nomogram was selected as the optimal cutoff value in the training cohort, and patients were divided into high-risk and low-risk groups. We found that the sensitivity, specificity, accuracy, negative predictive value (NPV) and positive predictive value (PPV) of the nomogram in the training cohort were 82.3%, 82.4%, 82.3%, 87.7% and 75.6%, respectively. In the validation cohort, the sensitivity was 81.4%, the specificity was 60.8%, the accuracy was 66.9%, the NPV was 88.9%, and the PPV was 46.8%. In the total cohort, the sensitivity was 82.0%, the specificity was 72.4%, the accuracy was 75.8%, the NPV was 88.0%, and the PPV was 62.1% (Supplementary Table 3)."
Page 26 Line 7 to 11: "The maximum Youden index of the 3-year time-independent ROC curve of the nomogram in the training cohort was selected as the optimal cutoff value. Then, all 343 patients were divided into the high-risk and low-risk groups. The sensitivity, specificity, accuracy, PPV and NPV were calculated to evaluate the prediction performance of the nomogram."
In addition, we have also added these results to the Discussion section.
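To make the cutoff procedure described in this response concrete, the following is a minimal sketch, not the authors' code. It assumes each patient has a predicted 3-year risk and a binary outcome (it ignores censoring and competing risks, which the actual analysis accounts for). The function names and the toy data are hypothetical. The cutoff is chosen by the maximum Youden index J = sensitivity + specificity - 1 on the training data only, and then applied unchanged to the validation data.

```python
from typing import Dict, Optional, Sequence, Tuple


def _ratio(numerator: int, denominator: int) -> float:
    """Safe division; returns NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def confusion_counts(risk: Sequence[float], event: Sequence[int],
                     cutoff: float) -> Tuple[int, int, int, int]:
    """Count (TP, FP, TN, FN), calling a patient high-risk if risk >= cutoff."""
    tp = fp = tn = fn = 0
    for r, e in zip(risk, event):
        high = r >= cutoff
        if high and e:
            tp += 1
        elif high:
            fp += 1
        elif e:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def youden_cutoff(risk: Sequence[float],
                  event: Sequence[int]) -> Tuple[Optional[float], float]:
    """Return (cutoff, J) maximising J = sensitivity + specificity - 1."""
    best_cutoff, best_j = None, -1.0
    for c in sorted(set(risk)):  # every observed risk is a candidate cutoff
        tp, fp, tn, fn = confusion_counts(risk, event, c)
        j = _ratio(tp, tp + fn) + _ratio(tn, tn + fp) - 1
        if j > best_j:  # ties keep the lowest cutoff
            best_cutoff, best_j = c, j
    return best_cutoff, best_j


def diagnostic_metrics(risk: Sequence[float], event: Sequence[int],
                       cutoff: float) -> Dict[str, float]:
    """Sensitivity, specificity, accuracy, PPV and NPV at a fixed cutoff."""
    tp, fp, tn, fn = confusion_counts(risk, event, cutoff)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


# Hypothetical toy data, for illustration only (not study data).
train_risk = [0.10, 0.20, 0.30, 0.45, 0.50, 0.70, 0.90]
train_event = [0, 0, 0, 1, 0, 1, 1]
val_risk = [0.20, 0.50, 0.60, 0.40]
val_event = [0, 0, 1, 1]

cutoff, j = youden_cutoff(train_risk, train_event)         # fitted on training only
val_metrics = diagnostic_metrics(val_risk, val_event, cutoff)  # applied unchanged
print(cutoff, j, val_metrics)
```

Fixing the threshold on the training cohort and reusing it on the validation cohort is the design choice the reviewers ask about; a drop in specificity or PPV in the validation data, as reported above, is what such a procedure can reveal.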

Page 13 Line 10 to 20: "Currently, a diagnosis of peritoneal metastasis after radical surgery mainly depends on clinical signs, imaging examinations and even reoperation during the follow-up period; a practical prediction model at the time point of radical surgery to predict peritoneal metastasis in GC patients with serosal invasion is still lacking. In this study, although the peritoneal metastasis rate was considerable even in the low collagen signature group, a significantly higher peritoneal metastasis rate was found in the high collagen signature group, which indicates that the collagen signature could identify patients who were more likely to suffer from peritoneal metastasis after radical surgery. In addition, the nomogram yielded an overall sensitivity, specificity  Table 5
Page 12 Line 16 to 20: "Compared to the clinicopathological model including tumour size, tumour differentiation status and lymph node metastasis, significant improvement in the C-index and AUROC was observed in the nomogram based on the collagen signature, which indicated that the collagen signature could improve the prediction of peritoneal metastasis beyond the use of easily obtained clinical variables."

Reviewer #2
I read with interest this study that investigates a collagen signature derived from multiphoton analysis of collagen structure of tissue that has been resected from patients with advanced gastric cancer. The authors have created a collagen-based nomogram to predict peritoneal metastasis for tumours that have serosal involvement and conclude that the addition of the collagen signature improves the ability to predict peritoneal deposits.
Response: We appreciate your effort in reviewing our manuscript.
There are a number of clinical aspects of this study that need clarification and warrant assessment given the biology of gastric cancer differs by anatomical location and by histological type.
Response: Thank you for your valuable comments. We have clarified and addressed the concerns you have raised point-by-point; please see below.
It is notable that the training set and validation cohorts are different clinically. The authors need to explain whether this affects the analysis they have conducted.
Particularly with respect to location of the primary tumour, tumour size and lymph node status.
Response: Thank you for your valuable comments. Considering the clinical differences between the training cohort and validation cohort, we speculated that they might be due to the limited sample size in the validation cohort compared to the training cohort (198 vs. 115
Why did the authors choose Cardia as a reference for location? Cardia are less common in Chinese population and usually they are poorly prognostic and usually require more radical operations that will influence clinical outcomes.
To assess how removing the other clinical variables affects outcome prediction by lymph node status, we calculated the AUROC and plotted ROC curves based on a combination of lymph node metastasis and other variables. The AUROC was reduced from 0.807 (nomogram) to 0.720 (lymph node metastasis alone) by the removal of other variables. We have addressed these points in our revised manuscript.
Page 17 Line 9 to 20: "Although lymph node status seems to be the most significant multivariate predictor of outcome, with highest SHR and range, the prediction of the risk of peritoneal metastasis was always contributed by these four factors. For example, for a patient with a median collagen signature of 0.047 and a tumour size less than 4 cm with poor differentiation, the 3-year probability of peritoneal metastasis would be approximately 11% with no lymph node metastasis. If the N stage was N3a, the risk would increase to approximately 31%. Furthermore, the risk would be 40% if the N stage advanced to N3b. The AUROC would be reduced from 0.807 (nomogram based on the collagen signature) to 0.720 (lymph node metastasis alone) by removal of other variables (Supplementary Figure 12). Other variables that were significantly associated with peritoneal metastasis will also be considered for inclusion in the prediction model in the future."

I have a technical question regarding the multiphoton assay to assess collagen. Was this collagen only in the serosa or was collagen matrix evaluated for the entire section?
Would a negative control include serosa where there is no invasion of tumour cells?
Formalin fixation tends to alter human tissue, while I assume all cases were treated equally, I am wondering about the technique and whether fixation (and possibly degree of fixation) may affect the collagen signature? For instance, would a case that was fixed overnight compared to one fixed over a few days, prior to paraffin embedding, influence the multiphoton imaging?
Response: Thank you for your questions. In fact, only collagen in the serosa was evaluated, not the entire section. In this study, we focused on the local collagen changes in the tumour microenvironment of the serosa, and we presumed that there were no differences in collagen distribution and structure in normal serosa between patients with and without peritoneal metastasis; thus, a negative control of the normal serosa was not included.
Multiphoton imaging is a label-free and noninvasive approach to detect the tissue structure and cell morphology of specimens that is comparable to H&E staining (Yan
Page 15 Line 2 to Page 15 Line 6: "In addition, it has been reported that tissue fixation and paraffin embedding have negligible effects on collagen detection and quantification; thus, a sample that was fixed overnight compared to one fixed over a few days, prior to paraffin embedding, would not influence multiphoton imaging 37."
The regions of selection become important also. I note in Figure 1 there is one region of selection which has adipose tissue in the H&E. Adiposity does occur at the serosa and this will be variably distributed and may influence collagen matrix. How does BMI or fat content in the serosa of individuals affect the assay?
Response: Thank you for your comments and questions. To explain whether the adipose tissue in the serosa would affect the assay, we have added the BMI information of all enrolled patients. We found that there was no significant difference in the collagen signature between patients with high and low BMI in both the training and validation cohorts. In addition, competing-risk regression indicated that the BMI had no effect on peritoneal metastasis (SHR: 1.14; 95% CI: 0.69-1.89; P=0.61).
Therefore, although adiposity does occur in the serosa, we believe that the fat content in the serosa does not affect this assay. However, fat should be avoided as much as possible.
Response: Thank you for your suggestion. The reason we showed the ROC curve at three time points was to present the discrimination ability of the prediction model at different time points. We agree that there was no significant difference among the different time points, and a 3-year cumulative outcome would be sufficient. Therefore, we have removed the ROC curves at 1 and 2 years in our revised manuscript, and the statement in the corresponding Results section was also revised.
Page 10 Line 10 to 11: "In the validation cohort, the AUROC at 3 years was 0.776 (95% CI: 0.699-0.853) (Supplementary Figure 6b)."

When considering this as a diagnostic test it would be valuable to have an indication of the sensitivity and specificity as well as positive and negative predictive values to reassure clinicians they are selecting appropriate population for interventions.
Response: Thank you for your useful comments. We chose 3 years as the time point. Then, the maximum Youden index of 0.3913 of the ROC curve was selected as the optimal cutoff value in the training cohort, and all 343 patients were divided into high-risk and low-risk groups. Patients with predicted high risk were considered to receive hyperthermic IPC to reduce the risk of peritoneal metastasis. We found that the sensitivity, specificity, accuracy, negative predictive value and positive predictive value of the nomogram in the training cohort were 82.3%, 82.4%, 82.3%, 87.7% and 75.6%, respectively. In the validation cohort, the sensitivity was 81.4%, the specificity was 60.8%, the accuracy was 66.9%, the negative predictive value was 88.9% and the positive predictive value was 46.8%. In the total cohort, the sensitivity was 82.0%, the specificity was 72.4%, the accuracy was 75.8%, the negative predictive value was 88.0% and the positive predictive value was 62.1%. We have added these results in our revised manuscript.
Page 10 Line 22 to Page 11 Line 8: "The maximum Youden index of 0.3913 of the ROC curve of the nomogram was selected as the optimal cutoff value in the training cohort, and patients were divided into high-risk and low-risk groups. We found that the sensitivity, specificity, accuracy, negative predictive value (NPV) and positive predictive value (PPV) of the nomogram in the training cohort were 82.3%, 82.4%, 82.3%, 87.7% and 75.6%, respectively. In the validation cohort, the sensitivity was 81.4%, the specificity was 60.8%, the accuracy was 66.9%, the NPV was 88.9%, and the PPV was 46.8%. In the total cohort, the sensitivity was 82.0%, the specificity was 72.4%, the accuracy was 75.8%, the NPV was 88.0%, and the PPV was 62.1% (Supplementary Table 3)."

Page 26 Line 7 to 11: "The maximum Youden index of the 3-year time-independent ROC curve of the nomogram in the training cohort was selected as the optimal cutoff value. Then, all 343 patients were divided into the high-risk and low-risk groups. The sensitivity, specificity, accuracy, PPV and NPV were calculated to evaluate the prediction performance of the nomogram."
Response: Thank you for your good advice. We have changed the "collagenomics signature" to "collagen signature" throughout our revised manuscript to avoid misleading the readers. Details are presented in the manuscript.

2) The authors talk about 'high-throughput quantitative collagen features'. Is there any structural/molecular assignment which can be linked to the different features?
Especially the Gabor wavelet transformation features, which majorly influence the calculation of the 'collagenomics signature' is highly vague.
Response: Thank you for your comments. Multiphoton imaging can visualize biomolecular arrays in cells, tissues and organisms; thus, the structural/molecular assignment might be linked to the different features. In this work, "high-throughput quantitative collagen features" means that high-dimensional features of collagen, including morphological and textural features, could be extracted from second harmonic generation (SHG) images after image processing. Gabor wavelet transformation is a kind of textural analysis that reflects the spatial relationships of an image at different scales and orientations after convolution (Grigorescu SE, et al. IEEE Trans Image Process, 2002, 11: 1160). Other than the visually apparent features such as the length of collagen or area covered, the Gabor wavelet transformation features in this study indicate the collagen distribution of the image in different degrees. In the revised manuscript, we have addressed these points.
Response: Thank you for your comments. The SHG of multiphoton imaging was initially used to describe the collagen morphology for optical diagnosis of tissues.
Collagen from SHG imaging was presented to describe empirical observations that were associated with particular pathological conditions. Multiphoton imaging has emerged as a useful tool for extracting quantitative collagen features in recent years.
Currently, a common consensus about the selection of feature types has not yet been achieved to comprehensively quantify collagen alterations using multiphoton imaging.
In our previous studies, we extracted four types of the abovementioned features to evaluate liver fibrosis (Xu S, et al. J Hepatol. 2014, 61:260-9;Xu S, et al. J Biophotonics. 2016, 9: 351-63). Similarly, in this study, we constructed the collagen signature based on these four types of collagen features.
The morphological features, such as collagen length and width, are easily understood.
Histogram and GLCM are two main types of textural features of collagen that have been reported by several studies (Mostaço-Guidolin LB, et al. Am J Respir Crit Care Med. 2019, 200: 431-43;Hristu R, et al. Biomed Opt Express. 2018, 9: 3923-36). The GLCM provides a second-order statistical representation of the distribution of grey levels within a specific ROI, which, in turn, provides the basis for textural analysis.
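As an illustration of the second-order statistics discussed in this response, the following minimal sketch builds a normalised grey-level co-occurrence matrix for one pixel offset and derives two common textural features from it. It is not the authors' pipeline: the function names, the non-symmetric counting convention, the choice of contrast and energy as example features, and the 4 x 4 toy image are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def glcm(image: Sequence[Sequence[int]], levels: int,
         offset: Tuple[int, int]) -> Matrix:
    """Normalised co-occurrence matrix P[i][j] for one (row, column) offset.

    P[i][j] is the fraction of in-bounds pixel pairs in which a pixel of
    grey level i has a neighbour of grey level j at the given offset.
    """
    d_row, d_col = offset
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    return [[n / total for n in row] for row in counts]


def contrast(p: Matrix) -> float:
    """Sum of (i - j)^2 * P[i][j]; larger when neighbouring levels differ."""
    return sum((i - j) ** 2 * v for i, row in enumerate(p) for j, v in enumerate(row))


def energy(p: Matrix) -> float:
    """Sum of P[i][j]^2; larger when few grey-level pairs dominate."""
    return sum(v * v for row in p for v in row)


# Hypothetical 4 x 4 image with 4 grey levels (not study data).
toy_image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]

# Distance 1 along the horizontal direction: offset (0, 1).
p_horizontal = glcm(toy_image, levels=4, offset=(0, 1))
print(contrast(p_horizontal), energy(p_horizontal))
```

Repeating the computation for several distances and directions gives one matrix per (distance, direction) pair, which is how a single region of interest can yield many GLCM-based features.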
GLCM is built by calculating the occurrence of a certain grey-level pair i next to grey level j at the distance δ along the direction α. After the GLCM is obtained, the probability density function, P δ, α (i,j), of finding certain pairs of pixel intensity i and j are calculated. Therefore, GLCM textural analysis considers the variation in pixel grey levels within a certain distance. Thereby, the forms, distributions and variation in the imaged objects, such as collagen, can be tracked (Golaraei A, et al. Biomed Opt Express. 2020, 11: 1851. Histogram-based features summarize the collagen signal intensities within the ROI, and the inter-pixel correlation is ignored. The three types of textural features, including Gabor wavelet transformation features, were used to describe the spatial distribution of the collagen from different perspectives. We have addressed these points in our revised manuscript. In our previous studies, we extracted four types of the abovementioned features to evaluate liver fibrosis using multiphoton imaging 22,41 . Based on these results, we established the collagen signature from four types of collagen features." Page 23 Line 10 to 22: "The GLCM-based features provide a second-order statistical representation of the distribution of grey levels within a specific region of interest, which in turn provide the basis for textural analysis. GLCM is built by calculating the occurrence of a certain grey level pair i next to grey level j at the distance δ along the direction α. After GLCM is obtained, the probability density function, P δ, α (i,j), of finding certain pairs of pixel intensity i and j are calculated. Therefore, GLCM textural analysis considers the variation in pixel grey levels within a certain distance.
Histogram-based features summarize the collagen signal intensities within the region of interest, and the inter-pixel correlation is ignored. Gabor wavelet transformation is a kind of textural analysis that reflects spatial relationship of images in different scales and orientations after convolution of images 40. In a word, these three types of textural features were used to describe the spatial distribution of the collagen from different perspectives."
LASSO regression aims to identify the variables and corresponding regression coefficients that lead to a model that minimizes the prediction error from high-dimensional data. In a practical sense, this constrains the complexity of the model.
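To illustrate how the LASSO penalty constrains model complexity, the sketch below fits a LASSO by coordinate descent on a tiny made-up data set. It is an assumption-laden illustration: it uses squared-error loss rather than whatever outcome model the study used, and the data, the function names and the penalty values are hypothetical.

```python
def soft_threshold(value, lam):
    """Shrink `value` towards zero by `lam`; return 0.0 if it lies within [-lam, lam]."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1 / 2n) * ||y - X b||^2 + lam * ||b||_1 one coefficient at a time."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (z / n) if z > 0 else 0.0
    return beta


# Hypothetical data: the outcome depends strongly on feature 0, weakly on feature 1.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [2.1, 1.9, -1.9, -2.1]  # equals 2.0 * x0 + 0.1 * x1

beta_unpenalised = lasso_coordinate_descent(X, y, lam=0.0)
beta_lasso = lasso_coordinate_descent(X, y, lam=0.5)
print(beta_unpenalised)  # both features keep a nonzero coefficient
print(beta_lasso)        # the weak feature is dropped, the strong one is shrunk
```

With the penalty switched on, the weak feature's coefficient becomes exactly zero, which is the sense in which LASSO selects variables while accepting some bias in the retained coefficient.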
Additionally, the LASSO approach trades off potential bias in estimating individual parameters for a better expected overall prediction (Ranstam J, et al. Br J Surg. 2018, 105: 1348. DOI: 10.1002). In this study, LASSO regression mainly selected Gabor wavelet features as potential predictive variables, which indicates that the combination of the three selected Gabor wavelet features and the mean cross-link density was most associated with the risk of peritoneal metastasis. In the revised manuscript, we have added these explanations.
Page 16 line 3 to 11: "LASSO regression aims to identify the variables and corresponding regression coefficients that lead to a model that minimizes the prediction error from high-dimensional data. In a practical sense, this constrains the complexity of the model. Additionally, LASSO regression trades off potential bias in estimating individual parameters for a better expected overall prediction and focuses on the best combination among the features 42. In this study, the LASSO regression mainly selected Gabor wavelet features as potential predictive variables, which indicates that the combination of the three selected Gabor wavelet features and the mean of cross-link density was most associated with the risk of peritoneal metastasis."
4) Training and validation data are always represented in 2 different figures/panels. Is this due to the variability between the two cohorts? If it was known from the clinical data already that there are differences between the tumor size, tumor location etc., it might be worth considering pooling all patients and randomly defining a test and a training/validation set.
Response: Thank you for your good questions. The reason we represented the training and validation cohorts in 2 different figures/panels was not due to the variability between the two cohorts. We wanted to reveal that the performance of the prediction model developed in the training cohort could be validated in the validation cohort.
In addition, considering the clinical differences between the training cohort and validation cohort, we speculate that it might be due to the limited sample size in the validation cohort compared to the training cohort (198 vs. 115). Thus, we have enlarged the sample size by adding 30 additional patients from October 1, 2010, to March 31, 2011, to the validation cohort using the same criteria. Then, we completed multiphoton imaging of these patients. After adding these patients, there were no significant clinical differences between the training and validation cohorts. We have updated Table 1 in our revised manuscript.
Multiphoton imaging is a label-free tool to obtain the tissue structure and cell morphology of specimens, and it is comparable to H&E staining (Yan J, et al. Surg Endosc, 2014, 28:36-41; Yan J, et al. J Biomed Opt, 2012, 17:026004; Chen J, et al. Gastrointest Endosc, 2011, 73:802-7); thus, experienced pathologists can master multiphoton imaging with little training and define the ROIs based on multiphoton imaging. We have addressed these points in the Discussion section of our revised manuscript.
Page 14 Line 18 to Page 15 Line 6: "There was no treatment on the unstained serial sections before the measurements, and the paraffin did not need to be removed 17,22. Moreover, multiphoton imaging is a label-free and noninvasive tool to obtain the tissue structure and cell morphology of specimens; it is comparable to hematoxylin-eosin (H&E) staining and does not affect the collagen signature 35,36; thus, experienced pathologists could master multiphoton imaging with little training, and it is possible to define regions of interest based on multiphoton imaging. In addition, it has been reported that tissue fixation and paraffin embedding have negligible effects on collagen detection and quantification; thus, a sample that was fixed overnight compared to one fixed over a few days, prior to paraffin embedding, would not influence multiphoton imaging 37."
6) Fig. 1 indicates that the authors also collected TPEF signal from the tumor tissues.
It is not clear why the features of these images were not included in the prediction models.
Response: Thank you for your comment. Indeed, the TPEF signal from the tumour tissues was not included in the prediction model. In this study, we focused on the influence of collagen changes on peritoneal metastasis. As a novel imaging technology, multiphoton imaging mainly contains two types of signals, TPEF and SHG. From our previous studies, we found that combining TPEF with SHG could better reveal the tissue architecture and cell morphology of the specimens and was comparable to H&E staining (Yan J, et al. Surg Endosc, 2014, 28:36-41; Yan J, et al. J Biomed Opt, 2012, 17:026004; Chen J, et al. Gastrointest Endosc, 2011, 73:802-7). Thus, the TPEF signal was collected in our study to help determine the regions of interest.
Response: Thank you for your questions. The X-tile plot in Supplemental Fig. 5 represents the data for the training cohort. Usually, when developing and validating a biomarker for individual prognosis, the cutoff value of the biomarker is determined in the training cohort; then, the same cutoff value is used in the validation cohort (Jiang Y, et al. Ann Surg, 2018, 267:504-13; Zhang JX, et al. Lancet Oncol, 2013, 14:1295; Huang Y, et al. Radiology, 2016, 281:947-57) to show that the cutoff value also holds in the validation cohort.
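The train-then-freeze logic for a cutoff can be sketched as follows. This is a simplified illustration under stated assumptions: X-tile scores each candidate division with a survival statistic, whereas this sketch substitutes the difference in event proportion between the high and low groups, and all scores, events and function names are hypothetical.

```python
def event_rate(events):
    """Proportion of patients with the event (1 = event, 0 = no event)."""
    return sum(events) / len(events)


def choose_cutoff(scores, events, min_group=3):
    """Pick, on the training cohort only, the cutoff that maximises the gap in
    event rate between the high-score and low-score groups."""
    best_cutoff, best_gap = None, float("-inf")
    for cutoff in sorted(set(scores)):
        low = [e for s, e in zip(scores, events) if s < cutoff]
        high = [e for s, e in zip(scores, events) if s >= cutoff]
        if len(low) < min_group or len(high) < min_group:
            continue  # skip divisions that leave a group too small
        gap = event_rate(high) - event_rate(low)
        if gap > best_gap:
            best_cutoff, best_gap = cutoff, gap
    return best_cutoff


def stratify(scores, cutoff):
    """Apply a fixed cutoff; no re-estimation happens here."""
    return ["high" if s >= cutoff else "low" for s in scores]


# Hypothetical training cohort: signature scores and event indicators.
train_scores = [-0.9, -0.6, -0.4, -0.1, 0.2, 0.5, 0.8, 1.1]
train_events = [0, 0, 0, 0, 1, 0, 1, 1]
cutoff = choose_cutoff(train_scores, train_events)

# Hypothetical validation cohort: the training cutoff is reused unchanged.
valid_scores = [-0.7, 0.0, 0.3, 0.9]
valid_groups = stratify(valid_scores, cutoff)
print(cutoff, valid_groups)
```

Keeping `choose_cutoff` away from the validation scores is the point of the design: the validation cohort then tests the cutoff rather than helping to define it.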
For Supplementary Fig. 5, a detailed description was provided in the previous publication (Camp RL, et al. Clin Cancer Res, 2004, 10:7252-9). Briefly, in Supplementary Fig. 5a, the colours in the plot represent the strength of the association at each division, ranging from low (black) to high (bright red or green). Red represents the inverse association between the collagen signature and survival, indicating that the higher the collagen signature is, the worse the survival. In
We agree that selecting independent features for a prediction model is one of the standard methods to construct a new model. LASSO regression has also been shown to outperform standard methods in some settings and has been broadly used to deal with high-dimensional data (Ranstam J, et al. Br J Surg. 2018, 105: 1348. DOI: 10.1002). It trades off potential bias in estimating individual parameters for a better expected overall prediction and focuses on the best combination among the features. In this study, the extracted collagen features were regarded as an integrated whole, and we treated the collagen signature as a single parameter, similar to age or sex. Thus, we used LASSO regression to construct the collagen signature.
Page 16 Line 11 to 18: "We found that there were correlations among the three Gabor wavelet transformation features (Supplementary Figure 10). Although selecting independent features for a prediction model is one of the standard methods to construct a new model, LASSO regression has also been shown to outperform the standard methods in some settings, and has been broadly used to deal with high-dimensional data 24,25,42. The extracted collagen features were regarded as an integrity, which should be a single parameter; thus, we used LASSO regression to construct the collagen signature."
9) Page 11, the authors state that "the collagenomics signature was positively correlated with the cross-link density of collagen….".
This is not surprising as the cross-link density is part of the 'Collagenomics signature calculation formula' (Appendix).
However, it is not clear if this cross-link density (meaning the connections between individual collagen fibers?) is correlated to chemical crosslinks that are mostly present within a collagen fiber. The previous study from the authors (reference no 25) refers to chemical collagen crosslinks. Studies that analyze systematically the relationship of collagen network features (e.g. via SHG) and chemical crosslinking are still missing.
The section in the manuscript (P. 11) needs clarification.
Response: Thank you for your correction. The cross-link density in our study indicated the physical connections between individual collagen fibres but not the chemical collagen crosslinks. We have clarified the statement in our revised manuscript.
Page 14 Line 7 to 13: "In this study, the cross-link density indicates the connections between individual collagen fibres (i.e. physical cross-link density). A previous study has reported that an increased chemical cross-link density of collagen heightened the stromal stiffness and stimulated the invasive properties of tumour cells 32. Thus, whether there is any connection between the physical cross-link density and chemical cross-link density and how the physical cross-link density affects the biological behaviours of tumour cells needs to be further investigated."

Reviewer #4
The authors propose a multiphoton imaging-derived "collagenomics" signature that associates with a high risk of peritoneal metastasis in gastric cancer with serosal invasion. This signature is validated in an independent, external data set.
Response: Thank you for your comments.
This validated "collagenomics" signature in and of itself is a novel and interesting finding, especially for those who study and treat gastric cancer. If there were further metastasis-associated multiphoton imaging-derived collagen-related findings presented across multiple cancer types, these would be of widespread interest to the greater cancer research community.
Response: Thank you for your comments. We truly appreciate your effort in reviewing our manuscript.
Seemingly in order to find clinically relevant use for the signature, the authors then build a nomogram that includes this signature to predict individual risk of peritoneal metastasis in GC with serosal invasion. However, there are major concerns and issues with their nomogram approach and methods.
Response: Thank you. We have addressed these concerns and issues with our nomogram approach and methods, and changes have been made to our manuscript.
Fundamentally, a nomogram is built to be used in the clinic. Therefore, there needs to
Peritoneal metastasis is difficult to predict on clinical grounds. Cytologic examination of peritoneal lavage, which has been used to assess the risk of peritoneal metastasis in GC with serosal invasion, has been reported to lack sensitivity because a large number of patients still die from peritoneal metastasis even though they have negative cytologic results 12. Some imaging modalities, including computed tomography (CT) and endoscopic ultrasonography (EUS), are common examination tools for GC; however, the accuracy of these imaging modalities for the diagnosis of peritoneal metastasis is not satisfactory 13, and it is not until patients are suffering from peritoneal metastasis that these imaging modalities can identify the outcome. Considering the limited performance of the clinical variables and the high complication rates of IPC, a novel biomarker is needed for the prediction of peritoneal metastasis in GC with serosal invasion after radical gastrectomy to influence decision making."
More concerning, because it is an issue that cannot be addressed by reorganization of the manuscript, is the inclusion of the "collagenomics" signature into the nomogram without addressing the essential question of whether there is justification for including non-clinical variables into a nomogram at all. Does the "collagenomics" signature add on to the clinical variables already used in similar nomograms in any clinically meaningful way? If there is to be an additional variable beyond the usual clinical variables, there needs to be explicit justification for how inclusion of these new data (that require additional investment/expense) makes the model perform better.
As an example of a paper that addresses both of these concerns, cited by the authors themselves, Dong et al (2019)
P<0.001). The C-indexes of the two models were also compared, and similar results were observed. We have added these results to the revised manuscript.

Comparison with the clinicopathological model
To evaluate the superiority of the nomogram based on the collagen signature over other easily obtained clinical variables, we excluded the collagen signature and built a clinicopathological model based on tumour size, tumour differentiation status and lymph node metastasis (Supplementary Table 4
Page 25 Line 13 to 14: "C-index and AUROC were used to compare the performance between the nomogram based on the collagen signature and the clinicopathological model."
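A C-index comparison of this kind can be sketched with Harrell's concordance index for right-censored data. The example is illustrative only: it ignores competing risks, which the nomogram itself accounts for, and the follow-up times, event indicators and both sets of predicted risks are invented numbers, not study data.

```python
def harrell_c_index(times, events, risks):
    """Harrell's C: among comparable pairs, the share in which the patient with
    the earlier observed event was assigned the higher predicted risk.

    A pair (i, j) is comparable when patient i had the event (events[i] == 1)
    and a shorter follow-up time than patient j. Tied risks count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs; C-index is undefined")
    return concordant / comparable


# Hypothetical follow-up (months) and event indicators (1 = event, 0 = censored).
times = [5, 8, 12, 20, 24, 30]
events = [1, 1, 0, 1, 0, 0]

# Hypothetical predicted risks from two models for the same six patients.
risk_with_signature = [0.9, 0.7, 0.5, 0.4, 0.2, 0.1]
risk_clinical_only = [0.6, 0.8, 0.5, 0.3, 0.4, 0.1]

c_with_signature = harrell_c_index(times, events, risk_with_signature)
c_clinical_only = harrell_c_index(times, events, risk_clinical_only)
print(c_with_signature, c_clinical_only)
```

A difference between two C-indexes computed this way is only descriptive; whether it is statistically meaningful needs a separate test or resampling procedure.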