A lncRNA signature associated with tumor immune heterogeneity predicts distant metastasis in locoregionally advanced nasopharyngeal carcinoma

Increasing evidence has revealed the roles of long noncoding RNAs (lncRNAs) as tumor biomarkers. Here, we introduce an immune-associated nine-lncRNA signature for predicting distant metastasis in locoregionally advanced nasopharyngeal carcinoma (LA-NPC). The nine lncRNAs are identified through microarray profiling, followed by RT–qPCR validation and selection using a machine learning method in the training cohort (n = 177). This nine-lncRNA signature classifies patients into high and low risk groups, which have significantly different distant metastasis-free survival. Validations in the Guangzhou internal (n = 177) and Guilin external (n = 150) cohorts yield similar results, confirming that the signature is an independent risk factor for distant metastasis and outperforms anatomy-based metrics in identifying patients with high metastatic risk. Integrative analyses show that this nine-lncRNA signature correlates with immune activity and lymphocyte infiltration, which is validated by digital pathology. Our results suggest that the immune-associated nine-lncRNA signature can serve as a promising biomarker for metastasis prediction in LA-NPC.


Dear Editor and Reviewers,
Thank you very much for your important and supportive comments, questions and suggestions, which have greatly helped us to improve our study. Enclosed is the revised version of manuscript (Ref.: NCOMMS-21-29177) entitled, "A lncRNA signature associated with tumor immune heterogeneity predicts distant metastasis in locoregionally advanced nasopharyngeal carcinoma". We have revised and modified the manuscript in accordance with the reviewers' comments and re-submitted the revised manuscript. We all appreciate your support and efforts on our manuscript and look forward to your further decision, and we sincerely hope to have the opportunity to publish this paper in Nature Communications.
A copy of the manuscript indicating where revisions have been made is included in the resubmission. The revisions are indicated using the "Track Changes" function in Word. Please find below our responses to the revisions proposed by the reviewers.

Response to Reviewer #1:
It is a multicentre, retrospective study to identify a nine-lncRNA signature for predicting metastasis in locoregionally advanced NPC. The candidate lncRNAs were identified by the differential expression analysis using microarray. The lncRNA signature was established in the training cohort by LASSO, and further validated in two independent NPC cohorts, indicating the robustness and soundness of the analysis methods. This signature appears to be associated with immune activity and tumor lymphocyte infiltration, suggesting the importance of tumor microenvironment for NPC metastasis. The study is novel and original with high translational values. The reviewer has a few major and minor comments and suggestions to further improve the quality of the study.

Major comments/questions
1. The patient characteristics are summarized in Table 1. But it is unclear whether all the patients included in the study are sporadic? Is there any record for family history for all the patients in this study? The Guilin external validation cohort has more cases >45y. What is the reason for the age discrepancy between the internal and external cohorts? The p value for evaluating the association of risk groups characterized by the lncRNA signature and the clinical parameters should be added in this table. Are these lncRNAs correlated with the clinical parameters at the individual level?

Answer:
Thank you very much for your valuable comments.
(1) In light of the reviewer's comments, we investigated the family history of all patients with nasopharyngeal carcinoma (NPC) in this study, and information was available for patients in the Guangzhou training and internal validation cohorts. In total, 38 of 354 (10.7%) patients had a family history of NPC (parents or siblings). There were no differences in family history of NPC between the low-and high-risk groups (χ2 test, Table 1). (2) The age discrepancy between the internal (Guangzhou) and external (Guilin) cohorts might be due to the difference in the treatment preferences of local hospitals or in large cities for young versus old patients. As in endemic areas, the incidence of NPC increases during adolescence and peaks at middle age, and incidence in males is approximately 2-to 3-fold higher than that in females 1 . A middle-aged man is usually the primary breadwinner for his family, so NPC in males means a loss of major income and a substantial increase in medical expenses, which imposes a heavy economic burden on the family, especially when the prognosis is poor.
Therefore, young patients probably prefer to seek treatment in large cities, where they believe they would receive better medical care, helping them return to society and their life more smoothly. In contrast, elderly patients may choose to seek treatment in local hospitals, which is convenient for their children to care for them. As a result, the age distribution in the internal (Guangzhou) and external (Guilin) cohorts is different.
(3) As the reviewer suggests, we have performed χ2 test to assess the association of risk groups characterized by the lncRNA signature and the clinical parameters ( Table 2). P values have been added to Table 1 of the revised manuscript.  Table 3 along with the P values. The cutoff value for each lncRNA was the threshold that produced the maximum sum of sensitivity and specificity in the ROC curve. As a result, lnc-CDK1-1 was significantly associated with N stage and lnc-STX6-2 was significantly associated with T stage.

Answer:
Thank you very much for the questions.
(1) In our study, the selected differentially expressed (DE) lncRNAs were not subjected to qRT-PCR in the discovery cohort (n=56, 38 NPC and 18 healthy controls). Instead, similar to many other biomarker studies, we validated and further screened NPC metastasis-related lncRNAs with qRT-PCR analysis in the training cohort (n=177). According to multicohort biomarker studies, researchers often use high-throughput assays to screen DE profiles in small subgroups of patients and then validate the findings with low-throughput methods in larger populations 2,3 .
We expected that it would be more reasonable to validate the DE lncRNAs in the training cohort, which reduces the chance of false positive and false negative discoveries due to a larger sample size. Taking all these factors into account, we performed the validation of DE lncRNAs in the training cohort instead of the discovery cohort.
(2) The differentially expressed lncRNAs between LA-NPC and healthy controls, and between LA-NPC patients with metastasis and those without metastasis, were identified with cutoffs of P < 0.05 and fold change > 1. 3. Are these nine lncRNAs expressed in the cancer cells? It is worthwhile to identify the source of lncRNA expression detected in this study.

Answer:
Thank you for your valuable comment. Based on the microarray analysis results, we believe the nine lncRNAs are not only expressed in cancer cells, as they were also detected in normal nasopharyngeal tissue samples. To further confirm this inference, we tried to isolate different types of cells from fresh NPC tissues to explore the source of the nine lncRNAs.
We collected NPC biopsies from the primary tumor site by endoscopy. were used to identify tumor cells (EpCAM + CD45 -) and immune cells (EpCAM -CD45 + ). TRIzol reagent (Invitrogen) was used to isolate total RNA. PCR was performed using SYBR Green qPCR SuperMix-UDG reagents (Invitrogen) on a LightCycler 480 detection system (Roche) with β-actin as an endogenous control. We collected three samples for qRT-PCR analysis, and all lncRNAs were detected in both cancer cells and immune cells (Figure 1).
It has been reported that some lncRNAs play essential roles in tumor progression and lead to considerable differences in clinical outcomes, which have been found to be expressed in malignant   4. In Figure 6, the difference between the high-and low-risk groups was observed in the B cells and CD8+ T cell infiltration. Can these two cell populations be the predictors for NPC metastasis as well?

Answer:
Thank you very much for this question. As the reviewer suggests, we investigated the prognostic value of B cells and CD8 + T cells in our cohort (n=99). The results showed that the intratumoral and stromal CD8 + T cell infiltration were significantly lower in the metastatic group than those in the nonmetastatic group (Figure 2A-B). The Kaplan-Meier plots also displayed significantly different DMFS in patients with high and low CD8 + T cell infiltration (Figure 2C-D). Meanwhile, we found that the intratumoral and stromal CD20 + B cell infiltration had no significant difference between the metastatic and nonmetastatic groups ( Figure 2E-F). Also, high and low level of CD20 + B cell infiltration were not predictors for DMFS ( Figure 2G-H).
In our study, CD8 + T cell infiltration but not CD20 + B cells was predictor of metastasis, but we also acknowledge that the results might be limited by the relatively small sample size (n=99) and should be further validated in a larger cohort before drawing conclusions. P values were calculated by log-rank test.

Comparison of intratumoral (A) and stromal (B) CD8 + T cells in patients in
5. It is unclear whether the prediction accuracy can be further boosted by combining three variables (lncRNA signature, pre-EBV and N stage) together in Guangzhou training and validation cohorts. Both cohorts have the pre-EBV data.

Answer:
We appreciate your advice. Using the same method described in the manuscript, we constructed a model by combining three variables (lncRNA signature, pretreatment EBV DNA and N stage) using multivariate Cox analysis in the training cohort ( Table 4), followed by validation in the internal cohort. The combined three-variable model also improved the efficiency of metastasis prediction compared with pretreatment EBV DNA or N stage alone (Figure 3), which was confirmed in the internal validation cohort. We also noted that the accuracy of the three-variable model was not inferior compared to that of the two-variable model (combining the nine-lncRNA signature and N stage) (Figure 3).

Answer:
Thank you for your question. The nine-lncRNA signature is the focus of the current study. Using Cox analysis, we found that it was significantly associated with DMFS. The Kaplan-Meier plots showed no cross-section between survival lines of the high-and low-risk groups, which primarily indicated that the proportional hazard (PH) assumption is likely to hold 8 . To further assess proportionality, we tested the PH assumption by Schoenfeld residuals 9 and test time and covariate interaction for statistical significance 8 using the "survival" and "survminer" packages in R software (version 4.0.3). The Schoenfeld residual test (Figure 4) and the interaction of the lncRNA signature and time ( Table 5) were not statistically significant.
Other covariates examined include host factors (i.e., age and sex), tumor factors (i.e., T and N stage) and pretreatment EBV DNA using the above methods, as well as a global test for all covariates after fitting the Cox regression model. As a result, the PH assumption is also evidenced by Schoenfeld residual plots ( Figure 5) and a global test for all covariates after fitting the Cox regression model (Guangzhou training cohort: P global = 0.87; Guangzhou internal validation cohort: P global = 0.50; Guilin external validation cohort: P global = 0.19). The results of the time and covariate interaction also support the PH assumption (Table 5).    11 . In NPC, some noncoding genes were identified within the frequent deletion or duplication regions and contributed to cancer genetic susceptibility 12 . For example, lncRNA MEG3 was inactive because of its location in a common copy number deletion region, chromosome 14q32 13 . These findings suggest that the chromatin abnormalities may be an essential mechanism involved in the lncRNA dysregulation.
Similar to protein-coding gene, lncRNA is regulated by typical epigenetic and transcriptional mechanisms. For instance, the activating marker (H3K27ac) is strongly enriched in the promoter region of lncRNA CCAT1, which activates CCAT1 transcription and upregulates its expression 14 .
A similar phenomenon can be found in the AFAP1-AS1 promoter under trastuzumab treatment, which confers chemoresistance to the cancer cells 15 . Additionally, the methylation of CpG islands inhibits the binding of DNA to transcription-associated proteins and mediates gene silencing 16 .
Therefore, the DNA methylation status of a gene promoter region is negatively correlated with its expression 17,18 , which contributes to the carcinogenesis and tumor progression. It is worth noting that aberrant DNA methylation was found to be a frequent event in NPC, and hypermethylation status was found in some specific genetic loci, indicating that some tumor-suppressing lncRNAs may be silenced by promoter hypermethylation in NPC 19,20 .
Moreover, lncRNA expression can also be regulated at the posttranscriptional level. One of the key features of this mechanism is mediated by RNA binding proteins (RBPs), which can alter RNA stability by controlling the molecular environment [21][22][23] . For instance, IGF2BP1 accelerates the catabolism of lncRNA HULC and results its downregulation in liver cancer 24 . HuR stabilizes HGBC and maintains its high expression in gallbladder cancer 25 . In NPC, IGF2BP2 binds with TINCR to slow down its RNA degradation and upregulates TINCR's expression 26 .
No studies have yet explored the potential reasons for the dysregulated expression of the nine lncRNAs in the immune-related signature in NPC, which entails a significant amount of research.
Upcoming research by our group will hopefully bring more clarity to these issues. To address the reviewer's concern, we have added these details about the digital pathology analysis to the Methods section of the revised manuscript (Page 17, paragraph 1).

Minor comments
1. The font size for the pathways in Figure 5C is too small. Figure 5C is difficult to read. What does the size of pathways 1-22 mean in the plot? Why the size of the pathway 3 "Angiogenesis" is much smaller than the pathway 1 "EMT"? More details for the figure legend should be added.

Answers:
Thank you for your valuable comments. The size of the pathway represents the number of genes in the gene set of that pathway. For example, there were 36 genes in the "angiogenesis" gene set, which is much smaller than other gene sets such as "EMT" (200 genes). Therefore, the size of the "angiogenesis" pathway is relatively smaller.
To address the reviewer's concern, we have modified the figure and related legend of Figure   5C in the revised manuscript to make it clearer for the readers. the enrichment score, and its color represents the magnitude of the statistical significance (shown as −log10 (P value)). The color of the third circle indicates that the labeled pathway is upregulated in the high-or low-risk group. The size of the inner bar plot shows the normalized enrichment score.

Response to Reviewer #2:
A lncRNA signature associated with tumour immune heterogeneity predicts distant metastasis in locoregionally advanced nasopharyngeal carcinoma The authors have identified a set of 9 long-non-coding RNAs to be predictive of locoregionally advanced nasopharyngeal carcinoma. The authors derive a weighted formula based on the expression levels (by qRT-PCR) of these 9 lncRNAs and show that it is predictive of LA-NPC metastasis in two cohorts of patients. The authors also show that the 9 lncRNAs are independently prognostic relative to other known prognostic factors and correlate with the level of immune infiltration.

Comments:
1. Introduction: The language needs overall work (academic style writing).

Answer:
Thank you for your valuable comment. We have modified the manuscript carefully and have further used English language editing services to provide language editing assistance. The major modified parts are indicated in the revised manuscript. The editing certificate has been attached as follows. [REDACTED] 2. Line 198-215: Does the derived risk score weights/formula sensitive to the method/kit used for qRT-PCR? If the authors propose this model to be used by other investigators, it is important to test its sensitivity outside the developed lab.

Answer:
Thank you very much for the question. Although many commercially available qRT-PCR kits and platforms generally perform the same function, the quantification results differ and maybe affected by the reagents, hardware and software design used 1,2 . For instance, the detection of EBV DNA, an indicator for NPC disease monitoring, can yield large variability using quantitative PCR assays without harmonization even in experienced clinical labs 3 . A tremendous amount of work has been done to reduce this variability, including the use of common calibrators and a PCR master mix. According to the criteria, we defined the intratumoral CD8 + T cells and CD20 + B cells as those in tumor nests of the whole tissue sections having cell-to-cell contact with no intervening stroma and directly interacting with cancer cells, while stromal CD8 + T cells and CD20 + B cells are dispersed in the stroma between the tumor nests and do not directly contact cancer cells. In addition, zones with crush artifacts, necrosis and fibrosis were excluded from the analysis. The same standard has been used in many studies for immune infiltration evaluation in various cancers [9][10][11][12] , including our previous large-scale cohort study in NPC 13 .

Researchers from Stanford University and another four centers in Hong
We divided positive cell counts by the corresponding areas when calculating the density of CD8 + T cells and CD20 + B cells. More specifically, the density of intratumoral CD8 + T cells was equal to intratumoral CD8 + positive cell counts divided by intratumoral areas (mm 2 ), which were both derived from HALO software. This method is widely used in digital image analysis 14,15 and considered to be useful for standardization, since it determines the number of infiltrating cells as an exact measurement contrary to the approximate semiquantitative evaluation 6 . In this way, the effect of tumor size on immune infiltration assessment has been taken into account.
To address the reviewer's concern, we have added more details about the digital pathology analysis to the Methods section of the revised manuscript (Page 17, paragraph 1) and modified the pipeline of digital pathology analysis in Fig. 6B. In this work, the authors integrated multiple NPC data and identified nine lncRNA related to prognosis through a series of conventional bioinformatics methods. The authors developed a nine-lncRNA module which were crucial to the NPC patients' prognosis and metastasis and validated the module in other independent data to prove the accuracy. The work prioritized lncRNA biomarker in NPC and provides a novel prognosis module for NPC subtype. However, there are some issues need to be solved.
Major Point: 1. The description of differential expression analysis is indistinct. The author should add specific methods or the R package. Besides, the author can provide two Supplementary table containing the detail result, such as the p value, FDR value for each gene.

Answer:
Thank you very much for your advice. In the microarray data analysis, the raw data were extracted using the Agilent Feature Extraction software (version 11.0.1.1). Quantile normalization, quality control and differential expression analysis were performed with GeneSpring GX v12.1 software package (Agilent Technologies). After normalization, transcript-specific lncRNAs probes labeled as "present" in more than or equal to half of the samples were considered high-qualified for the biomarker selection. Differential expression analyses were performed based on t-test between samples from 18 LA-NPC patients and 18 healthy controls, and between 10 paired samples from LA-NPC developed with or without posttreatment distant metastasis. The lncRNAs showing significant differential expression between the two groups were identified with cutoffs of P value < 0.05 and Fold Change > 1.5.
As the reviewer suggests, we have modified the Methods section (Page 15, paragraph 2). We also summarized the results of differential expression analysis with fold change, P value and FDR in Supplementary table S3. 2. Although lncRNA is very important in the process of disease development, protein coding genes contain a wider range of genetic information. Why does the author only analyze the prognosis factor based on lncRNA expression profile, rather than integrate the two expression profiles?

Answer:
Thank you very much for your valuable comments. Although the function of protein-coding genes has been substantially elucidated, emerging technologies are expanding investigators' abilities to annotate lncRNAs, and complex molecular mechanisms for lncRNAs are now being demonstrated.
LncRNAs have now been linked with the context of most, if not all, of the classic hallmarks of cancer, including tumor metastasis 1 . It has been reported that lncRNAs promote tumor metastasis by facilitating invasion and migration, modifying EMT, regulating metastatic colonization, and altering tumor microenvironment 1 . Our previous studies have also reported the roles of lncRNAs involved in NPC metastasis. For instance, DANCR promotes metastasis by interacting with the NF90/NF45 complex 2 , and FAM225A promotes metastasis by acting as a ceRNA to sponge miR-590-3p/miR-1275 and upregulate ITGB3 3 . These studies motivate our interest to determine whether lncRNAs could serve as metastasis predictors in NPC.
Increasing studies reveal that lncRNAs modulate mRNAs expression through various ways, including controlling mRNA stability, splicing, and translation 4 , which illustrates the complicated relationship between lncRNAs and mRNAs. As a result, a Cox model incorporating the lncRNAs and mRNAs for prognosis might not be suitable to reflect their subtle and complicated interactions.
To our knowledge, studies usually integrate mRNA and lncRNA expression profiles to construct a ceRNA network [5][6][7] . This network probably helps to illustrate the regulatory mechanisms between those RNAs, but it is rarely used directly to construct a biomarker for clinical practice as it might not be easily generated by a mathematical formula. An applicable model requires a clear definition of predictors and reproducible measurement methods available in clinical practice 8 . Therefore, we believe that a prognostic model calculated with a simple mathematical formula is of clinical value.
Considering the above, we constructed a lncRNA-based signature in present study. Subsequently, the proposed lncRNA signature was proven to be a robust prognostic indicator for distinguishing patients with a high risk of posttreatment distant metastasis in NPC.
In light of the reviewer's comment, we will explore an appropriate method for integrating mRNA and lncRNA data and evaluate whether integrating mRNAs with lncRNAs could further enhance the prediction accuracy in our future work.